Research on the Design and Implementation of Preschool Education Book Recommendation
System Based on Data Mining Algorithm
Xin Zongli
(Northern Beijing Vocational Education Institute, Beijing 101400, China)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
One-way frequent pattern tree, Spark platform, Design and Implementation, Preschool Education Book, Recommendation system
1. Introduction
In the era of big data, the information environment of the book platform has changed
from information scarcity to information overload, and its service mode should correspondingly
change from “people looking for information” to “information looking for people” [1,2]. This puts forward new requirements and challenges for the online book platform.
The book platform should not be limited to providing information through users’ passive
searches, but should adapt to user needs and improve content delivery, ease of use,
and service responsiveness [3]. Finding the content they need amid complex information requires users to spend
a great deal of time and energy screening and evaluating search results, and does
not substantially solve the problem of information overload [4]. At the same time, when user needs are unclear and keyword descriptions are inaccurate,
searches often return results the user is not interested in, and users must search
many times before finding satisfactory results. Association rule mining helps here,
because the hidden association rules it uncovers provide an important reference for
decision making. As many scholars have deepened their research on this technology,
association rule mining has been applied in different fields and achieved good results [5,6]. At present, association rules are widely used in the retail industry, medical
care, the securities market, and other fields. For example, in the retail industry,
association rules applied to consumption behavior can reveal customers’ purchasing
habits, helping merchants design reasonable commodity layouts and formulate appropriate
sales strategies to maximize profits; in medical care, finding the common characteristics
of patients with a certain disease in a large number of case records has important
guiding significance for diagnosis and treatment; and in the securities market, mining
the associations behind existing stock trends has important reference value for investors’
decisions [7].
Books, as easily transportable and affordable goods, have become a significant component
of e-commerce. The vast amount of data generated in daily transactions contains valuable
insights, posing a challenge for businesses to effectively extract useful information.
Data mining technology is essential for merchants to analyze transaction records,
providing a crucial basis for decision-making. However, traditional frequent item
set mining algorithms are limited by computing power and memory when applied to large-scale
data [8,9]. To address this, parallelization techniques, particularly using the Spark framework,
can enhance the efficiency of frequent item set mining by processing data in memory
and reducing disk reads [11,12]. This paper explores improvements to existing mining algorithms, leveraging Spark’s
distributed computing capabilities, and applies these advancements to a parallelized
book recommendation system [13,14]. By mining transactional data, the improved algorithm enhances recommendation accuracy,
demonstrating both practical and broad applicability in e-commerce settings.
2. Design and Implementation of the Frequent Item Set Mining Algorithm Based on UFP-tree
2.1. Creation of the Constrained Subtree
Data cleaning is not simply a process of selecting high-quality data, but one of adding,
deleting, grouping, or reorganizing the original data, so as to improve data quality
and reduce the impact of data defects on the mining results. As shown in Eqs. (1) and (2), data integration brings data from multiple files or multiple database environments
together and stores them in the same data warehouse.
Data integration can effectively reduce problems such as data redundancy and data
value conflicts, and effectively improve the efficiency of data processing. Data selection
narrows the scope of the data in order to better serve users’ decision making and
improve the efficiency of data mining, as shown in Eqs. (3) and (4), extracting only the portion of the data that users are interested in. Data transformation
converts the data into an analysis model suitable for a specific data mining
algorithm.
Different mining tasks process different data types, so the specific data transformation
method should be chosen according to the mining task at hand. Data transformation
methods include normalization, reduction, switching, rotation,
and projection. As shown in Eqs. (5) and (6), an appropriate data mining algorithm is selected according to the characteristics
and uses of the data, so as to more accurately discover the patterns of interest in
the data. The parameters may need to be adjusted several times during algorithm
execution until satisfactory results are obtained.
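As a concrete illustration of the normalization method named above, the following Python sketch rescales a numeric attribute into [0, 1]; the function name and sample values are illustrative, not taken from the paper's data set.

    from typing import List

    def min_max_normalize(values: List[float]) -> List[float]:
        """Rescale a numeric attribute into [0, 1] (min-max normalization)."""
        lo, hi = min(values), max(values)
        if hi == lo:                    # constant attribute: avoid dividing by zero
            return [0.0 for _ in values]
        return [(v - lo) / (hi - lo) for v in values]

    print(min_max_normalize([12.0, 4.5, 30.0, 4.5]))  # [0.294..., 0.0, 1.0, 0.0]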
The patterns obtained by data mining may have no practical value or may contradict
the actual scenario, as shown in Eqs. (7) and (8), which is why pattern evaluation is required. The patterns obtained by data mining
are evaluated in practical application to determine whether the model can achieve
the predetermined goal. At this stage, the final decision on whether to use the mining
results is made.
The sampling-based algorithm uses samples to reduce the number of scans of the database.
It can effectively reduce the amount of I/O and thereby improve performance. As shown in Eqs. (9) and (10), however, due to the non-uniform distribution of transaction data, this method does
not guarantee the accuracy of mining; that is, the sample may distort the data. Therefore,
this algorithm is suitable for mining scenarios with high efficiency requirements
but only moderate accuracy requirements.
2.2. Mining of Frequent Item Sets
Association refers to a special regularity among the values of two or more variables.
Associations include simple association, quantitative association, temporal association,
and so on. Association rule mining is the most commonly used method in association
analysis. As shown in Eqs. (11) and (12), through association rule mining we can find correlations between data and uncover
hidden, important knowledge, where X and Y are the disjoint rule antecedent and rule
consequent.
Because the associations between the data lack exact correlation functions, as shown
in Eqs. (13) and (14), the support and confidence measures are needed to describe the strength of the mined
association rules.
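Eqs. (13) and (14) are not reproduced here, but the standard definitions of these two measures for a rule X ⇒ Y over a transaction database D are as follows (a conventional formulation, which may differ from the paper's exact notation):

    \mathrm{support}(X \Rightarrow Y) = P(X \cup Y)
        = \frac{|\{\, t \in D : X \cup Y \subseteq t \,\}|}{|D|}

    \mathrm{confidence}(X \Rightarrow Y) = P(Y \mid X)
        = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}

A rule is considered strong when its support and confidence both reach the user-given minimum thresholds.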
The number of times an item set appears must be equal to or greater than the given
minimum support. As shown in Eqs. (15) and (16), of the two steps of association rule mining (finding the frequent item sets, then
generating rules from them), the second is carried out on the basis of the first and
is relatively simple, so the key difficulty of association rule mining algorithms
lies in frequent item set mining.
The Partition algorithm divides the original transaction database into m different sub-blocks, as shown in Eqs. (17) and (18); the principle of division is that each sub-block can be loaded into memory. Each
sub-block independently yields its local frequent item sets, where the minimum support
count of a sub-block is the product of min_sup and the total number of transactions
in that sub-block. The local frequent item sets of all sub-blocks can be obtained
by scanning the original transaction database once. The algorithm then uses the property
that a globally frequent item set must be frequent in at least one sub-block, as shown
in Eqs. (19) and (20). The Partition algorithm is highly parallelizable and requires only two scans of the
complete database, but the set of local candidates can be quite large and the redundant
computation cost is high.
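To make the two-phase idea concrete, the following Python sketch mines locally frequent item sets per sub-block and then verifies the union of candidates with one more full scan. It is a minimal sketch under simplifying assumptions (naive enumeration, item sets capped at three items); all names are illustrative rather than the paper's implementation.

    from itertools import combinations
    from collections import Counter

    def local_frequent(block, min_sup_ratio, max_len=3):
        """Phase 1: mine locally frequent item sets within one sub-block."""
        threshold = min_sup_ratio * len(block)   # min_sup scaled to this block
        counts = Counter()
        for transaction in block:
            items = sorted(set(transaction))
            for k in range(1, max_len + 1):
                for itemset in combinations(items, k):
                    counts[itemset] += 1
        return {s for s, c in counts.items() if c >= threshold}

    def partition_mine(database, m, min_sup_ratio):
        size = (len(database) + m - 1) // m
        blocks = [database[i:i + size] for i in range(0, len(database), size)]
        # A globally frequent item set is frequent in at least one sub-block,
        # so the union of local results is a superset of the true answer.
        candidates = set().union(*(local_frequent(b, min_sup_ratio) for b in blocks))
        # Phase 2: one more full scan counts the candidates globally.
        counts = Counter()
        for transaction in database:
            t = set(transaction)
            for c in candidates:
                if set(c) <= t:
                    counts[c] += 1
        threshold = min_sup_ratio * len(database)
        return {s: n for s, n in counts.items() if n >= threshold}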
3. A Parallel Implementation of the UFIM Algorithm on Spark
3.1. Parallel-based Analysis of the UFIM Algorithm
When using sampling algorithms, the accuracy of the mining results depends on the
sample size: if the sample is too small, the mining results may be heavily biased
and fail to meet the mining requirements; if the sample is too large, the point of
using a sampling algorithm is lost [15]. The DHP algorithm proposed by Park et al. uses hashing to reduce the number of
candidate sets contained in Ck. For example, while scanning the transaction database
to generate frequent 1-item sets (Fig. 1 is a flowchart of the collaborative filtering algorithm), all 2-item sets can be generated
from each transaction, mapped to different buckets of a hash table, and the corresponding
bucket counts incremented [16,17]. If a bucket's count is less than the support threshold, none of the 2-item sets
in that bucket can be frequent 2-item sets. The number of candidates in Ck can thus
be effectively reduced by hash-based techniques [18,19]. The efficiency of the DHP algorithm depends on the chosen hash function. The transaction
compression method improves the efficiency of frequent item set mining by reducing
the size of the transaction database to be scanned in the future: transactions that
do not contain any item in Ck are marked or deleted while the support of Ck is computed.
As k increases, the number of transactions actually scanned decreases significantly [20,21].
Fig. 1. Flow chart of the collaborative filtering algorithm.
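A minimal Python sketch of the DHP-style bucket filter described above follows; the built-in hash function stands in for DHP's hash function, whose choice, as noted, determines the algorithm's efficiency, and the data is illustrative.

    from itertools import combinations

    def dhp_bucket_filter(transactions, num_buckets, min_sup):
        """Count candidate 2-item sets into hash buckets, DHP-style."""
        buckets = [0] * num_buckets
        for t in transactions:
            for a, b in combinations(sorted(set(t)), 2):
                buckets[hash((a, b)) % num_buckets] += 1
        # A 2-item set can only be frequent if its whole bucket's count
        # reaches the support threshold.
        def may_be_frequent(pair):
            return buckets[hash(tuple(sorted(pair))) % num_buckets] >= min_sup
        return may_be_frequent

    transactions = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "b", "d"]]
    check = dhp_bucket_filter(transactions, num_buckets=8, min_sup=2)
    print(check(("a", "b")))   # True: the pair's bucket count is at least 2

Within one run every 2-item set maps to a stable bucket, so a bucket count below the threshold safely prunes all pairs hashed into it.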
The Apriori algorithm was the first frequent item set mining algorithm. It can
mine all frequent item sets, but its inherent performance bottlenecks prevent it from
adapting well to the needs of the big data era [22,23]. Table 1 shows how running time changes with support. To address the defects of the Apriori
algorithm, the FP-Growth algorithm emerged. This algorithm mines all frequent item
sets by compressing the transaction database into a frequent pattern tree (FP-tree)
and applying pattern growth. Compared with the Apriori algorithm, the FP-Growth algorithm
greatly improves time efficiency [24,25]. It compresses the large database into a tree structure and avoids the cost of scanning
the database repeatedly. Frequent item set mining adopts a divide-and-conquer strategy,
transforming the problem of finding long frequent item sets into iteratively discovering
short frequent item sets [26,27]. However, the FP-tree occupies a great deal of memory and, when the data volume
is large, may not fit in memory. During frequent item set mining, a large number of
conditional pattern bases and conditional frequent pattern trees must be built, which
consumes considerable time and space. The whole process adopts a recursive pattern-growth
strategy, which also occupies a lot of CPU time and reduces mining efficiency
[28,29].
Table 1. Changes of running time with support.

Minimum support level | FP-growth running time | Algorithm running time | UFIM running time
10                    | 25.736                 | 1.248                  | 1.063
20                    | 13.149                 | 0.421                  | 0.354
30                    | 5.631                  | 0.188                  | 0.172
40                    | 2.950                  | 0.172                  | 0.156
50                    | 1.520                  | 0.148                  | 0.125
One improvement creates a two-dimensional array when building the conditional FP-tree,
whose elements are the counts of every pair of frequent items [30]. Fig. 2 is the flowchart of the content-based recommendation algorithm. Through this two-dimensional
array, the frequent items can be determined when generating a new conditional FP-tree,
reducing repeated traversal of the original conditional FP-tree and thus improving
the efficiency of frequent item set mining. Another improvement uses the H-struct
data structure, which is more space-efficient than candidate-generation and pattern-growth
approaches and extends well to various types of data. A further improvement adopts
an additional hash structure to store the items and their positions in the item header
table: to find an item i in the item header table, first look up the value corresponding
to the key in the hash table, then read the structure directly according to the order
of the item header table. There is also an algorithm that uses the original FP-tree
without generating conditional FP-trees. Its basic idea is to first scan the transaction
database twice, place the data in a highly compressed unidirectional tree, and then
use this unidirectional tree to construct subtrees constrained by single items, thereby
recursively mining all frequent item sets in a pattern-growth manner. In the MapReduce
job flow, the client verifies that the input data and output directory specified for
the job are correct and sends the job JAR file, job splits, configuration file, and
other resources required for job execution to the JobTracker; these resources are
stored in a folder that the JobTracker creates specifically for the job, named with
the job ID. After receiving the job, the JobTracker puts it in a queue to wait for
the job scheduler to dispatch it. When the job scheduler executes the job, the JobTracker
creates Map tasks one by one based on the job splits and assigns them to TaskTrackers
for execution.
Fig. 2. Flow chart of the content base recommendation algorithm.
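The two-dimensional pair-count idea can be sketched in a few lines of Python: one pass over the transactions fills a triangular matrix of co-occurrence counts, from which the frequent items of a new conditional FP-tree can be read off without re-traversing the tree. Names and data here are illustrative.

    from itertools import combinations

    def pair_count_matrix(transactions, frequent_items):
        """Count co-occurrences of every pair of frequent items in one pass."""
        index = {item: i for i, item in enumerate(sorted(frequent_items))}
        n = len(index)
        counts = [[0] * n for _ in range(n)]     # only the upper triangle is used
        for t in transactions:
            kept = sorted(set(t) & set(frequent_items))
            for a, b in combinations(kept, 2):
                counts[index[a]][index[b]] += 1
        return index, counts

    index, counts = pair_count_matrix(
        [["a", "b", "c"], ["a", "b"], ["b", "c"]], {"a", "b", "c"})
    print(counts[index["a"]][index["b"]])   # co-occurrence count of (a, b): 2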
3.2. Design of the UFIM Algorithm Based on Spark
In MapReduce, Map tasks prioritize “data localization,” running each task on the node
where its data resides to minimize data transmission time. Table 2 illustrates how running time changes with the number of transactions. Unlike Map tasks,
Reduce tasks do not consider data localization. TaskTrackers report regularly to the
JobTracker to confirm that they are either executing tasks or idle and ready for new
tasks. If a node fails, its task is quickly reassigned to another slave node. Each
task runs in its own Java virtual machine, and outputs are saved in temporary HDFS
files before being merged into a final result file. In Spark, users submit applications
through a client, which initializes the SparkContext on the driver node. During system
setup, two scheduling modules are created: the DAG Scheduler and the Task Scheduler.
The DAG Scheduler groups tasks into stages based on dependencies, while the Task Scheduler
submits these stages (as task sets) for execution. Task monitoring and reporting occur
during execution. Cluster resource management is handled by Spark’s own manager or
by other systems such as YARN or Mesos. Upon completion, the Executor process stores
the results, and the resource manager reclaims resources, completing the Spark application
execution.
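The driver-side lifecycle just described can be seen in a minimal PySpark application; the master URL and data are placeholders.

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("ufim-demo").setMaster("local[*]")
    sc = SparkContext(conf=conf)      # initializes the scheduling modules

    rdd = sc.parallelize(range(1_000_000))
    # The action below triggers the DAG Scheduler to build stages and the
    # Task Scheduler to dispatch the resulting task sets to executors.
    total = rdd.filter(lambda x: x % 2 == 0).count()
    print(total)

    sc.stop()                         # resources are reclaimed here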
Table 2. Running time changes with the number of transactions.

Transaction number | Number of frequent item sets | Algorithm running time | UFIM running time
1000               | 389279                       | 0.734                  | 0.610
2000               | 563663                       | 1.094                  | 0.860
3000               | 530501                       | 1.047                  | 0.958
4000               | 275087                       | 1.172                  | 0.957
5000               | 597519                       | 1.156                  | 0.984
The Spark computing framework offers several operational modes: Spark Local, Spark
Standalone, Spark Mesos, and Spark YARN. Spark Local mode simulates distributed operation
with multiple threads on a single machine. This straightforward mode requires only
unzipping the Spark package and modifying common configurations. However, it lacks
a distribution mechanism, making it suitable mainly for verifying the logical correctness
of applications rather than assessing time efficiency. Spark Standalone mode operates
independently of other resource management systems and can be deployed as a single
cluster providing complete services. It supports only FIFO scheduling for multiple
simultaneous tasks. Like MapReduce, this mode uses a master/slave architecture, and
it employs ZooKeeper to address single points of failure. In contrast to MapReduce’s
distinction between Map and Reduce tasks, Spark improves resource utilization by allowing
a single slot to handle various task types. Spark Mesos mode provides two scheduling
options: coarse-grained and fine-grained. In coarse-grained mode, Mesos allocates
all resources of the running environment up front, retains them even when not in use,
and recovers them only after application execution. This mode offers a low task startup
cost but lacks flexibility in resource allocation, leading to underutilization of
cluster resources and subsequent waste, as shown in Fig. 4, which depicts the accuracy evaluation. The fine-grained mode allocates resources
only before task execution.
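From the application's point of view, these operational modes differ mainly in the master URL handed to Spark, as the following sketch shows; the host names and ports are placeholders.

    from pyspark import SparkConf

    local_conf      = SparkConf().setMaster("local[*]")             # Spark Local
    standalone_conf = SparkConf().setMaster("spark://master:7077")  # Spark Standalone
    yarn_conf       = SparkConf().setMaster("yarn")                 # Spark on YARN
    mesos_conf      = SparkConf().setMaster("mesos://master:5050")  # Spark on Mesos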
Fig. 3. Flow chart of the hybrid recommendation algorithm.
Fig. 4. Accuracy assessment.
The UFP-tree is a compact data structure similar to the FP-tree but features key differences.
The UFP-tree is unidirectional, containing pointers only from the current node to
its parent, while the FP-tree has bidirectional pointers. FP-tree nodes are identified
by data items, whereas UFP-tree nodes are identified by frequent 1-item sets. Additionally,
children in FP-trees are unordered, while those in UFP-trees are arranged in ascending
order. Each node in the FP-tree has six domains, while the UFP-tree has only four,
resulting in lower memory usage. After constructing the UFP-tree, a constrained subtree
can be generated to mine all frequent item sets. The constrained subtree is analyzed
in two scenarios: pointing to the same or different endpoints. The proposed improvement
reduces recursion and storage requirements without altering the frequent item sets
mined. The UFIM algorithm primarily comprises the UFP mining process, which generates
frequent 1-item sets and constrained subtrees, and the subsequent mining process for
longer item sets.
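The node layout can be sketched as follows; the text does not enumerate the four domains, so this sketch assumes they are the frequent-item identifier, a count, a pointer to the parent only (the tree is unidirectional), and the children kept in ascending order of identifier.

    from dataclasses import dataclass, field

    @dataclass
    class UFPNode:
        item_id: int                    # rank of the frequent 1-item set
        count: int = 0
        parent: "UFPNode | None" = None # upward pointer only: unidirectional
        children: dict = field(default_factory=dict)  # item_id -> UFPNode

        def insert(self, item_ids):
            """Insert one transaction given as ascending frequent-item ranks."""
            node = self
            for i in item_ids:
                child = node.children.get(i)
                if child is None:
                    child = UFPNode(i, parent=node)
                    node.children[i] = child
                child.count += 1
                node = child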
4. Design of Recommendation System for Preschool Education Based on UFIM Algorithm
In a single-machine environment, frequent item set mining algorithms can efficiently
process small data volumes, as illustrated in Fig. 5, which shows the recall rate evaluation. However, the computing power and memory
limitations in this environment hinder algorithm performance, often leading to challenges
such as the difficulty of storing the constructed frequent mode tree in memory. Hadoop
addresses big data processing effectively but is limited by its method of writing
Map stage results to local disk and subsequently reading from it after the Reduce
phase, rendering iterative frequent item set mining inefficient. In contrast, Spark,
a memory-based big data processing framework, retains intermediate results in memory,
minimizing frequent read/write operations and enhancing performance during iterative
processes. This chapter will focus on parallelizing the design and implementation
of the UFIM algorithm on the Spark platform, ensuring that processing tasks do not
interfere with one another. In real-world applications, complex tasks are divided
into subtasks, which can introduce time and space overhead due to data synchronization.
Nevertheless, parallel computing, whether through new algorithm design or analyzing
existing serial algorithms for parallelizable components, significantly improves execution
efficiency compared to traditional serial methods.
Fig. 5. Recall assessment.
This paper adopts the second approach to parallelize the serial UFIM algorithm proposed above.
Fig. 6 shows the coverage evaluation graph. The subtree constrained by a single item is
constructed according to the item header table, and every frequent item set mined
from the subtree constrained by item m contains m, without affecting the mining procedures
of the other items in the item header table. Clearly, constructing the constrained
subtrees of different items and mining their frequent item sets do not interfere with
each other. Parallelizing the UFIM mining process therefore simply means taking the
data required to construct each single-item constrained subtree as the basic unit
of a subtask and distributing it to the worker nodes; each worker node independently
mines the frequent item sets containing its item, and the frequent item sets gathered
from all nodes together constitute the frequent item sets of the whole transaction
database. Since Spark operations are based on RDDs, when the UFIM algorithm is parallelized
on the Spark platform, the transaction database must first be converted into an RDD,
and all frequent item sets are then mined on top of that RDD. The parallelization
of the UFIM algorithm on Spark is divided into five main steps, the first of which
is converting the transaction database into the initial RDD.
Fig. 6. Coverage assessment.
Spark parallel computing relies on RDDs (Resilient Distributed Datasets) to process
transaction databases. The transaction database is first converted into an RDD, referred
to as T. As illustrated in Fig. 7, support counts for all items are calculated in parallel to yield frequent 1-item
sets. Initially, the transactions in T are sorted according to the F_List, generating
a new dataset, T’. The items in T’ are then grouped, producing G_List, which includes
group numbers and corresponding segments of the transaction database. The UFIM algorithm
is executed in parallel to mine the frequent item sets. Unlike the traditional UFIM
algorithm, this approach employs parallelization to prevent redundant mining across
multiple groups. Each group generates a UFP-tree item header table containing only
serial numbers for frequent items specific to that group. The transaction database
is transformed into an RDD using the map and reduceByKey operations. The map operation
converts each transaction into a <key, value> pair, with the key as the transaction
and the value initialized to 1. The reduceByKey operation aggregates these values
to form the initial RDD, which is used for subsequent frequent item set mining, enabling
efficient processing without re-scanning the original database.
Fig. 7. Diversity assessment plots.
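A minimal PySpark sketch of this conversion step, with a placeholder input path, is:

    from operator import add
    from pyspark import SparkContext

    sc = SparkContext(appName="ufim-initial-rdd")

    lines = sc.textFile("hdfs:///data/transactions.txt")   # placeholder path
    initial_rdd = (lines
        .map(lambda line: (tuple(sorted(line.split())), 1))  # <transaction, 1>
        .reduceByKey(add))                                    # merge duplicates

Because identical transactions are merged with their multiplicity, later mining steps can weight each distinct transaction by its count instead of re-scanning the original database.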
The purpose of data grouping in a Spark cluster is to facilitate parallel computing
without the need for inter-node communication, as shown in Fig. 8, which illustrates user satisfaction evaluation. First, the F_List set is numbered
in descending order of support count, generating an F_Ranked List in the format <key:
frequent item, value: frequent item ranking>. Next, the T transaction database is
numbered, and infrequent items are removed using the F_Ranked List to obtain T’. Data
items are then grouped by determining the maximum number of frequent items in each
group based on F_List. The number of frequent item sets per group (Q) is calculated
and increased by one. Using a flatMap operation, items in T’ are divided into appropriate
groups. The division method involves iterating through each transaction, assigning
a group number (gid) based on the formula (t[i]−1)/g_size. If a group number does
not exist, the corresponding part of the transaction is added to that group. Subsequently,
a groupByKey operation aggregates data by group number to form G_List. Frequent item
set mining occurs in parallel across nodes, utilizing the grouped data in G_List.
Each node independently applies a serial frequent item set mining algorithm, generating
frequent item sets and their support values.
Fig. 8. User satisfaction assessment chart.
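The grouping step can be sketched as follows. The gid formula (t[i]−1)/g_size follows the text (integer division assumed); the sample rank-encoded transactions and the group width are illustrative.

    from pyspark import SparkContext

    def split_into_groups(t, g_size):
        """Emit (gid, transaction prefix) pairs, one per group touched."""
        groups = {}
        # Walk from the least frequent item rank backwards; the prefix up to
        # position i is all the data that group gid needs.
        for i in range(len(t) - 1, -1, -1):
            gid = (t[i] - 1) // g_size           # the formula from the text
            if gid not in groups:
                groups[gid] = t[:i + 1]
        return list(groups.items())

    sc = SparkContext(appName="ufim-grouping")
    g_size = 2                                            # illustrative width
    t_prime = sc.parallelize([[1, 3, 4], [2, 3], [1, 2, 5]])  # rank-encoded T'

    g_list = (t_prime
        .flatMap(lambda t: split_into_groups(t, g_size))
        .groupByKey())                    # G_List: gid -> segment of the database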
The completion of the previous step entailed mining all the frequent item sets and
numbering them for evaluation, as illustrated in Fig. 9 for click assessment. To convert these serial numbers into distinct items, a mapping
operation is executed, translating the serial numbers into corresponding items from
the F_Ranked List. The resultant set, denoted as Result, can then be saved using “save
As Text File()” for preservation in a specified location. Given the vast amount of
books stored, featuring potentially thousands of similar or identically titled books,
enhancing user experiences becomes crucial. Major book websites, such as Amazon, encompass
around 3.1 million books, necessitating efficient retrieval methods.
Fig. 9. Click rate evaluation.
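A minimal sketch of this final translation-and-save step, with illustrative data and a placeholder output path, is:

    from pyspark import SparkContext

    sc = SparkContext(appName="ufim-save")

    f_ranked_list = [("python", 1), ("spark", 2), ("book", 3)]  # illustrative
    rank_to_item = {rank: item for item, rank in f_ranked_list}

    mined = sc.parallelize([((1, 2), 120), ((2, 3), 85)])  # (ranks, support)
    result = mined.map(lambda kv: ([rank_to_item[r] for r in kv[0]], kv[1]))
    result.saveAsTextFile("hdfs:///output/frequent_itemsets")  # placeholder path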
Extensive searches based on classifications, titles, and authors, though viable, often
consume time without guaranteeing satisfactory outcomes, an issue both users and businesses
seek to mitigate. Reading through book content for selection further compounds time
expenditure. Novices in certain professions might inadvertently purchase low-quality
books, dampening their interest and, subsequently, impeding societal progress. A personalized
book recommendation system serves practical importance, with this paper deploying
the Spark-based parallel UFIM algorithm to enhance book recommendations, reinforcing
its applicability. An integral facet determining software acceptance in the market
pertains to the standardization of the development process.
Software development commences with requirement analysis, aimed at comprehending user
demands and defining the software system’s functionalities within the business context.
Subsequent phases span design, implementation, and testing. Design intricacies cover
software architecture and data structure specifications post-requirement assessments,
further segmented into summary and detailed designs. Implementation focuses on materializing
module functions per design, emphasizing the optimal selection of programming languages
and tools for efficient software creation. Testing stages scrutinize software alignment
with design expectations, differentiating between white box and black box testing
methodologies based on internal software comprehension emphasis.
5. Experimental Analyses
User-based collaborative filtering suits scenarios with few users and many items;
Fig. 10 shows the conversion rate evaluation. In most scenarios, however, the number of users
is large, and user-based collaborative filtering then leads to excessively long computation
times, even though offline recommendation is computed in advance and need not consider
response time. In addition, with user-based collaborative filtering, changes in users’
interests can also lead to inaccurate recommendation results. Considering all of this,
in scenarios where the rating matrix is relatively dense, this paper chooses item-based
collaborative filtering: it is better suited to a recommendation system that, in its
later stages, has fewer items than users, and it can meet users’ personalized needs.
Because the similarity between items is more stable and does not change frequently,
it is also better suited to offline recommendation, which ensures the accuracy of
the recommendation system while saving computation time.
Fig. 10. Conversion rate assessment.
Item-based collaborative filtering requires finding groups of users who are interested
in the same item. Fig. 11 is a retention assessment chart. To find users’ interests, we can use the rating
data: if a user has rated an item, this indicates interest in that item. Therefore,
the rating data can be mined to calculate the similarity between items. The behavioral
data in the existing data set consists of readers’ ratings of books. After the connection
from the system to MongoDB is configured and the Spark session is defined, Spark reads
the offline rating data set Rating from the database, calls the map() method to extract
the required fields (“User Id”, “Product Id”, “Rating”), defines the result as the
table ratingDF, and converts the data structure to a DataFrame for subsequent calculation
and processing. The cache() method is also called for performance.
Fig. 11. Retention rate assessment chart.
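A sketch of this loading step is given below. It assumes the MongoDB Spark connector (version 10.x style options) is on the classpath; the URI, database, collection, and column names are placeholders that would need to match the actual deployment.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
        .appName("book-recommender")
        .config("spark.mongodb.read.connection.uri",
                "mongodb://localhost:27017/recommender.Rating")  # placeholder
        .getOrCreate())

    rating_df = (spark.read.format("mongodb").load()
        .select("userId", "productId", "rating")  # field names are assumptions
        .cache())                                 # cached for reuse, as in the text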
The loaded rating data is preprocessed and fed into the core algorithm. According
to the similarity formula (Fig. 12 is the turnover rate evaluation chart), two values must be computed for the current
similarity: the numerator, which is the number of users who rated each pair of books
at the same time, and the denominator, which is based on the number of ratings of
each individual book. The rating table ratingDF calls the groupBy() method with the
parameter Product Id, grouping the records by the product dimension and counting the
ratings of each book. The table generated here is also a DataFrame, named productRatingCountDF,
containing the field Product Id and the rating count of each book.
Fig. 12. Turnover rate assessment.
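Continuing the previous sketch, the counting step and the co-rating numerator can be expressed with DataFrame operations; column names remain illustrative.

    from pyspark.sql import functions as F

    # Ratings per book: the denominator term described in the text.
    product_rating_count_df = (rating_df
        .groupBy("productId")
        .agg(F.count("rating").alias("count")))

    # The numerator pairs books rated by the same user: a self-join on
    # userId counts how many users rated both books at the same time.
    pair_counts = (rating_df.alias("a")
        .join(rating_df.alias("b"), on="userId")
        .where(F.col("a.productId") < F.col("b.productId"))
        .groupBy("a.productId", "b.productId")
        .count())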
6. Conclusion
This paper begins by outlining the research background of book recommendation systems,
including their significance, current status both domestically and internationally,
and the various stages of research. The objective is to develop a book recommendation
system capable of handling large-scale data processing while improving accuracy. To
implement the recommendation system, two core technologies are utilized. The first
is a computing framework for large-scale data processing. This paper selects the Spark
architecture, considering the need for both offline computing and online real-time
processing. It explains Spark’s principles, including Resilient Distributed Datasets
(RDDs), and discusses the evolution of the Spark community to address diverse scenarios
within the Spark ecosystem. Compared to Hadoop, Spark is more suitable for streaming
computing, allowing for rapid calculations of large datasets. The second core technology
is the recommendation algorithm. This study examines mainstream recommendation algorithms,
explaining their central concepts and suitability for different scenarios. The validation
of these algorithms contributes to the development of a prototype system capable of
addressing various multi-scenario challenges.
Based on the research and analysis of the core technology, this paper introduces the
design and implementation of the system. First, the system architecture of the project
is described, including the offline recommendation module, real-time recommendation
module, and data transmission and storage module. For the offline recommendation module
and real-time recommendation module mainly introduce the recommendation algorithm
used by the module, the core code and implementation steps. For the data transmission
and storage module, it mainly introduces the tools used in offline data and online
data processing, and sorts out all the links of data from acquisition and filtering
to computing and storage, so that the tools and processing involved in the data life
cycle are clearer. According to the evaluation index curves in the figure, algorithm
training converges at around 450 rounds, with the two evaluation indices fluctuating
in the ranges 42.6-42.3 and 0.92∼0.96 (SSIM). According to the evaluation index analysis,
PRI increased by about 2.5%∼3.5%, GCE decreased by about 15%∼20%, and VOI decreased
by about 13%∼16%. The numbers of pixels in groups A and B were 408,000 and 745,608,
and the image similarity reached 97.84% in group A, 96.21% in group B, and 99.33%
in group C.
Funding
This work was supported by the Northern Beijing Vocational Education Institute project
“Research and Application of Teachers’ Guidance Ability Development for Family Education
for Preschool Education Majors in Higher Vocational Colleges Based on Home-School
Cooperation” (YD202103).
References
Xu D., Zhang Y., Jin F., 2021, The role of AKR1 family in tamoxifen resistant invasive
lobular breast cancer based on data mining, BMC Cancer, Vol. 21

Fiaschetti M., Graziano M., Heumann B. W., 2021, A data-based approach to identifying
regional typologies and exemplars across the urban-rural gradient in Europe using
affinity propagation, Regional Studies, Vol. 55, No. 12, pp. 1939-1954

Vemprala P., Bhatt P., Valecha R., Rao H. R., 2021, Emotions during the COVID-19 crisis:
A health versus economy analysis of public responses, American Behavioral Scientist,
Vol. 65, No. 14, pp. 1972-1989

Bao Y., Yang Q., 2022, A data-mining compensation approach for yaw misalignment on
wind turbine, IEEE Transactions on Industrial Informatics, Vol. 17, No. 12, pp. 8154-8164

Kaspersky A., 2021, Selected morphotic parameters differentiating ulcerative colitis
from Crohn's disease, Acta Mechanica et Automatica, Vol. 15, No. 4, pp. 249-253

Mandorino M., Figueiredo A. J., Cima G., Tessitore A., 2021, A data mining approach
to predict non-contact injuries in young soccer players, International Journal of
Computer Science in Sport, Vol. 20, No. 2, pp. 147-163

Baruah A. J., Baruah S., 2021, Data augmentation and deep neuro-fuzzy network for
student performance prediction with MapReduce framework, International Journal of
Automation and Computing, Vol. 18, pp. 981-992

Sharifi M., Khatibi T., Emamian M. H., Sadat S., Hashemi H., Fotouhi A., 2021, Development
of glaucoma predictive model and risk factors assessment based on supervised models,
Bio Data Mining, Vol. 14

Singh T., Panda S. S., Mohanty S. R., Dwibedy A., 2023, Opposition learning based
Harris hawks optimizer for data clustering, Journal of Ambient Intelligence and Humanized
Computing, Vol. 14, pp. 8347-8362

Torres-Ramos S., Fajardo-Robledo N. S., Pérez-Carrillo L. A., Castillo-Cruz C., Retamoza-Vega
P. del R., Rodríguez-Betancourtt V. M., Neri-Cortés C., 2021, Mentors as female role
models in STEM disciplines and their benefits, Sustainability, Vol. 13, No. 23

Guimarães N., Figueira Á., Torgo L., 2021, Can fake news detection models maintain
the performance through time? A longitudinal evaluation of twitter publications, Mathematics,
Vol. 9, No. 22, pp. 2988

Torres-Berru Y., Batista V. F. L., 2021, Data mining to identify anomalies in public
procurement rating parameters, Electronics, Vol. 10, No. 22

Maciejewska K., Froelich W., 2021, Hierarchical classification of event-related potentials
for the recognition of gender differences in the attention task, Entropy, Vol. 23,
No. 11

Wei S., Shen X., Shao M., Sun L., 2021, Applying data mining approaches for analyzing
hazardous materials transportation accidents on different types of roads, Sustainability,
Vol. 13, No. 22

Bertón M. J., 2021, Text and data mining exception in South America: A way to foster
AI development in the region, GRUR International, Vol. 70, No. 12, pp. 1145-1157

Meng J., Zhang H., Wang X., Zhao Y., 2021, Data mining to atmospheric corrosion process
based on evidence fusion, Materials, Vol. 14, No. 22, pp. 6954

Wei J., Dai J., Zhao Y., Han P., Zhu Y., Huang W., 2021, Application of association
rules analysis in mining adverse drug reaction signals, Applied Sciences, Vol. 11,
No. 22

Xu J., Liu Z., Lin Y., Liu Y., Tian J., Gu Y., Liu S., 2021, Grey correlation analysis
of haze impact factor PM2.5, Atmosphere, Vol. 12, No. 11

Neissi L., Golabi M., Albaji M., Naseri A. A., 2022, Evaluating evapotranspiration
using data mining instead of physical-based model in remote sensing, Theoretical and
Applied Climatology, Vol. 147, pp. 701-716

Xu F., Qu S., 2021, Data mining of students' consumption behaviour pattern based on
self-attention graph neural network, Applied Sciences, Vol. 11, No. 22

Ferencsik D. K., Varga E. B., 2021, Cycling activity dataset creation and application
for feedback giving, Acta Marisiensis Seria Technologica, Vol. 18, No. 2, pp. 29-35

Ekerete I., Garcia-Constantino M., Diaz-Skeete Y., Nugent C., McLaughlin J., 2021,
Fusion of unobtrusive sensing solutions for sprained ankle rehabilitation exercises
monitoring in home environments, Sensors, Vol. 21, No. 22

Gao Q., Molloy J., Axhausen K. W., 2021, Trip purpose imputation using GPS trajectories
with machine learning, ISPRS International Journal of Geo-Information, Vol. 10, No.
11

Kaewyotha J., Songpan W., 2021, Multi-objective design of profit volumes and closeness
ratings using MBHS optimizing based on the PrefixSpan mining approach (PSMA) for product
layout in supermarkets, Applied Sciences, Vol. 11, No. 22

Mariana-Ioana M., Czibula G., Oneț-Marian Z.-E., 2021, Towards using unsupervised
learning for comparing traditional and synchronous online learning in assessing students'
academic performance, Mathematics, Vol. 9, No. 22

Konieczny J., Stojanowski J., Rydzyńska K., Kusztal M., Krajewska M., 2021, Artificial
intelligence—A tool for risk assessment of delayed-graft function in kidney transplant,
Journal of Clinical Medicine, Vol. 10, No. 22, pp. 5244

Yang J., Wang L., 2021, Applying MMD data mining to match network traffic for stepping-stone
intrusion detection, Sensors, Vol. 21, No. 22

Wang D., Zou Y., Li H., Yu S., Xia L., Cheng X., Xu T., 2021, Data mining: Traditional
spring festival associated with hypercholesterolemia, Cardiovascular Disorders, Vol.
21

Rahman M. A., Duradoni M., Guazzini A., 2022, Identification and prediction of phubbing
behavior: A data-driven approach, Neural Computing and Applications, Vol. 34, pp.
3885-3894

Sheikhhosseini Z., Mirzaei N., Heidari R., Monkaresi H., 2021, Delineation of potential
seismic sources using weighted K-means cluster analysis and particle swarm optimization
(PSO), Acta Geophysica, Vol. 69, pp. 2161-2172

Author
Xin Zongli holds a Master of Science degree in administration. She graduated from
Southwest Jiaotong University in 2006 and works at Beijing Jingbei Vocational College.
Her research interests include preschool education and home schooling.