Ruixue Zhang
                        (Department of College English, Zhejiang Yuexiu University, Shaoxing, 312000, China;
                        ruixue.zhang@gmx.com)
                        
 
            
            
            Copyright © The Institute of Electronics and Information Engineers(IEIE)
            
            
            
            
            
               
                  
Keywords
               
                Decision tree, Resource recommendation model, ID3 algorithm, English reading
             
            
          
         
            
                  1. Introduction
               English is the lingua franca for communication across disciplines, so English learning
                  is becoming increasingly important in universities (Yi, 2020). Rising teaching
                  requirements have not been matched by changes in teaching methods: most educators
                  have only adjusted teaching arrangements inside and outside the classroom, and such
                  improvements are limited (Duan, 2021). Internet development has enabled various
                  learning resources to spread worldwide, and this vast pool of resources can provide
                  specific ideas for teaching English (Zhou, 2021). Among them, the selection of reading
                  resources affects reading instruction: students’ learning progress will be delayed if
                  the selected resources do not match their characteristics. Hence, an interactive,
                  adaptive method of selecting reading resources can effectively improve English reading
                  instruction (Ma, 2021). This study used ID3, an interactive classification algorithm,
                  as the basis of the model and optimized its information gain formula to address the
                  problem of local optimality. The model before and after improvement and the traditional
                  recommendation method were simulated and compared to evaluate the performance and
                  practicality of the model, with the changes in students’ reading scores and their
                  feedback as the performance indicators. The study also used the model to mine
                  recommendation schemes for different types of students to provide ideas for improving
                  English reading instruction.
               
             
            
                  2. Related Work
               The ID3 algorithm is one of the most commonly used decision tree algorithms and serves
                  widely as the basis for complex systems because of its classification performance.
                  Park et al. (2018) developed an ID3 adaptive path selection model using a fuzzy
                  decision tree algorithm to overcome the sensitivity of decision trees in route
                  selection. Simulation experiments showed that this change improved the prediction
                  accuracy of the model while keeping good adaptability. An and Zhou (2022) examined
                  the decision tree algorithm in rural energy construction and used ID3 to set
                  thresholds for the terrain feature selection algorithm, promoting attribute
                  complementarity and filtering out irrelevant attributes. Applied to the spatial
                  configuration of solar energy in rural areas, the algorithm showed potential for
                  wider use. Abbas and Farooq (2019) used ID3 to distinguish skin from non-skin pixels,
                  improving the algorithm to raise skin detection accuracy and to exclude the
                  interference of skin color on the recognition results. They added data sets from
                  three color spaces to the algorithm, and the system accuracy on each index was above
                  99.50%. Karthi et al. (2018) used data mining for accident prediction in the railroad
                  sector. They applied text mining to data provided by users and by the railroad
                  sector, analyzing the unstructured railroad data with the ID3 algorithm to predict
                  the causes of accidents. Pratama and Saragi (2018) classified the quality of cassava
                  to ensure the quality of related processed products: they examined various cassava
                  parameters, used image processing to measure whiteness and speckle degree as visual
                  parameters, and then classified the samples using ID3. Maingi et al. (2019) proposed
                  an ID3-based decision tree for symptom burden classification in disease outbreaks.
                  The algorithm ranks the disease burden attributes by information gain and classifies
                  them to derive the required knowledge. The results showed that the proposed method
                  could support the related field.
               
               Reading comprehension is an important part of English language learning and an area
                  requiring improvement in college education. Educational researchers have tried to
                  improve the quality of reading comprehension in various ways. Zaiter (2020) argued
                  that reading is as indispensable as writing and that students should be able to find
                  motivation for both. Nevertheless, reading and writing differ when it comes to
                  academic writing. The study analyzed the situation of English majors in the Arab
                  world and proposed remedial measures, supported by extensive experimental data, to
                  help prevent plagiarism. Wu (2021) held that teaching English reading comprehension
                  should improve students’ thinking, which is one of the core competencies of the
                  subject. Chakraborty and Chowdhury (2021) reported that reading comprehension is an
                  essential English skill at all levels of education and that its importance for
                  obtaining a degree is becoming an issue, with academic reading an early manifestation
                  of this concern. Their finding is based on a survey of students in government
                  colleges in Bangladesh, who believed that teaching academic reading to undergraduates
                  strengthens their competitiveness. Chinese higher education policy emphasizes
                  developing students’ intercultural competence, but Yu and Maele (2018) suggested that
                  this is not the case in practice. Hence, they conducted a curriculum study at a
                  university college while building a Baker-based model of intercultural awareness to
                  train participants. The results showed that reading courses can help Chinese students
                  build intercultural awareness. Audina et al. (2020) reported that students who do not
                  understand the reading content tend to translate word by word rather than comprehend
                  the text as a whole. In response, they investigated the teaching strategies of
                  English teachers and the reasons behind them, and established the DRA strategy to
                  guide students in understanding the text. Their results indicated that this attempt
                  is meaningful.
               
               The applications of and improvements to the ID3 algorithm by domestic and foreign
                  scholars have proven effective. Data processing models built on ID3 are used widely
                  in various fields, and their classification accuracy keeps improving. The algorithm
                  can combine the characteristics of users and data objects for adaptive matching,
                  which suits the problem that college English reading resources cannot yet be
                  recommended adaptively to students. Most educators improve English education by
                  working on texts and related issues inside and outside the classroom, and few
                  integrate intelligent technologies into English education. Thus, an attempt to
                  introduce intelligent techniques has practical value.
               
             
            
                  3. Construction of Reading Resource Recommendation Model based on ID3 Algorithm
               
                     3.1 Decision Tree Composition based on ID3 Algorithm
                  Data mining often requires a supervised learning algorithm to predict the attributes
                     and categories of unknown data. Decision trees are representative of this class of
                     algorithms owing to their mechanism for generating discriminative rules, and ID3 is
                     one of the most commonly used among them (Hong et al., 2018). The algorithm first
                     calculates the information gain of each attribute and uses the attribute with the
                     highest gain as the basis for classifying the remaining data. This approach
                     minimizes the amount of information required for classification and follows the
                     principle of minimum randomness in each division. The decision tree is constructed
                     from top to bottom: starting at the root node, the sample set is classified on one
                     attribute, and the result serves as the basis for further divisions of the sample
                     (Tulloch et al., 2018). The process iterates until the construction is complete.
                     The non-categorical attributes become non-leaf nodes, and their attribute values
                     are represented as branches. Each path from the root of the tree to a leaf node
                     represents a complete classification rule, and the set of all such rules forms an
                     expression, which becomes the resource recommendation expression.
                  
                  The ID3 algorithm is simple and has strong learning ability. Its classification speed
                     is fast, so it is suitable as the basis of an algorithm for large-volume data processing.
                     Here, let the number of possible class labels of the sample set $X$ be $n$. The probability
                     distribution is expressed as Eq. (1).
                  
                  
                  The information entropy of $X$ is then written as Eq. (2).
                  
                  
                  If $P_{i}=0$ in Eq. (2), then $0\log 0$ is taken to be $0$. The base of the logarithm
                     is $2$ because the information is encoded in binary. If two variables
                     $\left(X,Y\right)$ in the sample set are considered, their joint probability
                     distribution is expressed as Eq. (3).
                  
                  
                  The conditional entropy of $Y$ given $X$ is calculated using Eq. (4).
                  
                  
                  In Eq. (4), $P_{i}$ is the probability that $X$ takes its $i$-th value, so the conditional
                     entropy is the mathematical expectation, over the probability distribution of $X$, of
                     the entropy of $Y$ given $X$. If the information entropy of another dataset
                     $A$ is $Entropy\left(A\right)$ and the empirical conditional entropy in this dataset
                     is $Entropy\left(B\left| A\right.\right)$, the information gain of the dataset $B$
                     can be calculated using Eq. (5).
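
For reference, the quantities named in Eqs. (1)-(5) have the following standard textbook forms. The notation (a variable $X$ with $n$ values and a class variable $Y$) is the common one and is an assumption here; it is not a reproduction of the original typesetting.

```latex
\begin{aligned}
P(X = x_i) &= p_i, \qquad i = 1, 2, \ldots, n \\
Entropy(X) &= -\sum_{i=1}^{n} p_i \log_2 p_i \\
P(X = x_i,\, Y = y_j) &= p_{ij} \\
Entropy(Y \mid X) &= \sum_{i=1}^{n} p_i \, Entropy(Y \mid X = x_i) \\
Gain(Y, X) &= Entropy(Y) - Entropy(Y \mid X)
\end{aligned}
```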
                  
                  
                  The larger the result of Eq. (5), the greater the information gain and the higher the
                     purity of the resulting subsets of the sample. The decision tree selects the attribute
                     with the largest gain as the classification attribute and constructs a node from it.
                     Repeating this cycle builds the complete decision tree, from which the recommendation
                     rules are extracted to recommend appropriate reading resources for college students.
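
The entropy, conditional entropy, and information gain described above can be sketched in a few lines of Python. This is a minimal illustration using the standard definitions; the function names and the toy data (resource theme against a YES/NO recommendation outcome) are hypothetical and not taken from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def conditional_entropy(values, labels):
    """Entropy of the labels given an attribute, weighted by value frequency."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(group) / total * entropy(group) for group in groups.values())


def information_gain(values, labels):
    """Reduction in label entropy obtained by splitting on the attribute."""
    return entropy(labels) - conditional_entropy(values, labels)


# Hypothetical toy data: the theme separates the two outcomes perfectly.
theme = ["social", "social", "natural", "natural"]
good = ["YES", "YES", "NO", "NO"]
gain = information_gain(theme, good)
```

Because the theme separates the two outcomes completely in this toy sample, the conditional entropy is zero and the gain equals the full label entropy of one bit.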
                  
                  The main advantage of the ID3 algorithm lies in its use of information entropy:
                     selecting attributes by information gain reduces the sensitivity to abnormal training
                     samples. Its simple top-down search mode allows it to handle complex samples, and the
                     tree structure lets the user visualize the classification rules and principles
                     (Andrew et al., 2018). Nevertheless, the algorithm also has drawbacks. The
                     relationships between attributes are complex and are not considered, and, which sets
                     the direction of the subsequent optimization, the attribute with the largest
                     information gain is not always the best for splitting because an increase in the
                     number of attribute values also leads to a larger gain.
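
The last drawback can be seen on a small example. The sketch below, with hypothetical data and names, compares the gain of a two-valued attribute with that of a record identifier that is unique per sample: the identifier carries no predictive value, yet it receives the maximum possible gain.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


labels = ["YES", "YES", "NO", "YES", "NO", "NO"]
theme = ["social", "social", "social", "natural", "natural", "natural"]
record_id = [1, 2, 3, 4, 5, 6]  # hypothetical: one distinct value per sample

gain_theme = information_gain(theme, labels)   # small but meaningful
gain_id = information_gain(record_id, labels)  # equals the full label entropy
```

Every subset produced by the identifier contains a single sample and is therefore pure, so plain information gain would choose it as the splitting attribute even though it cannot generalize to new samples.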
                  
                
               
                     3.2 Optimization of ID3 Algorithm in the Resource Recommendation Model
                  The principle of the ID3 algorithm is to use the attribute with the greatest information
                     gain as the splitting attribute. However, an attribute with many values also tends to
                     have a larger gain, so this multi-value bias directly affects the classification
                     accuracy of the algorithm (Li et al., 2018). Let $A$ be an attribute of the dataset
                     $X$ with values $A_{1},A_{2},\ldots ,A_{n}$; divide one value of its domain, $A_{n}$,
                     into two parts to obtain the new attribute
                     $A'=\left(A'_{1},A'_{2},\ldots ,A'_{n+1}\right)$. The multi-value bias of the
                     algorithm can then be examined by comparing the gains before and after the
                     transformation: $Gain\left(X,A\right)$ is the gain of the attribute $A$, and
                     $Gain\left(X,A'\right)$ is the gain of the new attribute $A'$. In this case,
                     $Gain\left(X,A\right)$ is calculated using Eq. (6).
                  
                  
                  In Eq. (6), $P\left(D_{i}\right)$ is the probability of class $i$ in the dataset,
                     $P\left(A_{j}\right)$ is the proportion of samples for which attribute $A$ takes the
                     value $A_{j}$, and $P\left(D_{i}\left| A_{j}\right.\right)$ is the probability of
                     class $i$ among the samples for which attribute $A$ takes the value $A_{j}$.
                     Similarly, the gain of the new attribute, $Gain\left(X,A'\right)$, is calculated
                     using Eq. (7).
                  
                  
                  In this case, the difference between the two gain values is calculated using Eq. (8).
                  
                  
                  To simplify the calculation of the gain difference, let
                     $L=\frac{P\left(A'_{n}\right)}{P\left(A_{n}\right)}$,
                     $x_{i}=P\left(X_{i}\left| A_{n}\right.\right)$,
                     $p_{i}=P\left(X_{i}\left| A'_{n}\right.\right)$, and
                     $o_{i}=P\left(X_{i}\left| A'_{n+1}\right.\right)$; the gain difference is then
                     expressed as Eq. (9).
                  
                  
                  Eq. (9) is processed and divided by $P\left(A_{n}\right)$ to obtain Eq. (10).


                  Set $f\left(x\right)=x\log _{2}x$, at which point Eq. (11) is obtained.


                  Because its second derivative is positive for $x>0$, $f\left(x\right)$ is a convex
                     function, and the following relationship can be obtained:


                  Eq. (13) can be obtained by processing each relationship.
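
One way to write out the convexity argument is sketched below. It assumes that the value $A_{n}$ is split into $A'_{n}$ and $A'_{n+1}$, so that $P\left(A_{n}\right)=P\left(A'_{n}\right)+P\left(A'_{n+1}\right)$ and $0\leq L\leq 1$, and that the sums run over the classes $i$. It is a reconstruction from the definitions of $L$, $x_{i}$, $p_{i}$, and $o_{i}$, not a copy of the original equations.

```latex
\begin{aligned}
x_i &= L\,p_i + (1-L)\,o_i
  && \text{(law of total probability)} \\
f(x_i) &\leq L\,f(p_i) + (1-L)\,f(o_i)
  && \text{(convexity of } f(x) = x\log_2 x\text{)} \\
P(A_n)\sum_i x_i \log_2 x_i
  &\leq P(A'_n)\sum_i p_i \log_2 p_i + P(A'_{n+1})\sum_i o_i \log_2 o_i
  && \text{(sum over } i\text{, multiply by } P(A_n)\text{)}
\end{aligned}
```

The left-hand side is the negative of the contribution of $A_{n}$ to the conditional entropy under $A$, and the right-hand side is the negative of the contribution of $A'_{n}$ and $A'_{n+1}$ under $A'$. The conditional entropy therefore cannot increase after the split, which means the gain cannot decrease.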
                  
                  
                  Substituting Eq. (13) into the gain difference gives
                     $Gain\left(X,A\right)\leq Gain\left(X,A'\right)$. Because the attribute selection
                     mechanism of the ID3 algorithm is based on information gain, the larger gain of $A'$
                     shows that the algorithm has a multi-value bias. Suppose students need reading
                     resources with attribute $A$ and the traditional ID3 is used to classify the
                     candidate resources. Resources with many attribute values will then score as well as
                     those that match strongly on the single attribute $A$, and the quality of the
                     recommendation will be reduced accordingly. This study solves the problem by
                     introducing the correlation coefficient between the attribute and the class
                     variable, and the improved gain formula is given as Eq. (14).
                  
                  
                  In Eq. (14), $\rho _{ay}$ is the correlation coefficient between attribute $A$ and
                     category $Y$. Introducing the correlation coefficient reduces the information gain of
                     attributes that have many values but little relevance to the category. This change
                     optimizes the gain function within the algorithm to solve the multi-value bias
                     problem. The formula must also be simplified to keep the construction of the decision
                     tree concise. Eq. (15) expresses the final information gain formula after simplifying
                     the logarithmic operation.
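
The effect of weighting the gain by a correlation coefficient can be illustrated with a short sketch. Two assumptions are made purely for illustration and are not taken from Eq. (14) or Eq. (15): the weighted gain is taken to be $\left| \rho _{ay}\right| \cdot Gain$, and $\rho _{ay}$ is computed as the Pearson correlation between integer-coded attribute values and integer-coded class labels. All names and data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2, sqrt


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def encode(values):
    """Map each distinct value to an integer code (ASSUMED encoding)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def weighted_gain(values, labels):
    """ASSUMED form |rho| * Gain; the paper's exact Eq. (14) may differ."""
    rho = pearson(encode(values), encode(labels))
    return abs(rho) * information_gain(values, labels)


labels = ["YES", "NO", "NO", "YES", "YES", "NO", "NO", "YES"]
difficulty = ["easy", "hard", "hard", "easy", "easy", "hard", "hard", "easy"]
record_id = [1, 2, 3, 4, 5, 6, 7, 8]  # many values, unrelated to the class
```

In this toy sample, plain information gain cannot tell the two attributes apart because both reach one bit. Under the assumed weighting, the identifier's gain drops to zero because it is uncorrelated with the class, while the gain of the difficulty attribute is unchanged.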
                  
                  
                  In Eq. (15), $B$ is one of the $n$ subsets into which the original dataset is divided,
                     and the original dataset has $m$ classes; each subset $B$ is divided again according
                     to these $m$ classes. Based on the above optimization, a decision tree T is generated
                     from the input dataset X, a feature set E, and a threshold. If all individuals in the
                     dataset belong to the same class, a leaf with that class label is generated. If the
                     feature set E cannot be applied to the data, the class with the highest number of
                     individuals is used as the label. Otherwise, the information gain of each feature on
                     the dataset X is computed with the above formula, and the feature with the maximum
                     gain is taken as the split node. If the maximum gain is less than the threshold, the
                     class with the highest number of individuals in the dataset is used as the label
                     instead of splitting. Otherwise, the dataset is divided by the values of the selected
                     feature, and each subset serves as the basis for a new division. These steps are
                     repeated until the decision tree is generated, as shown in Fig. 1 below.
                  
                  
                        Fig. 1. ID3 algorithm generation architecture diagram.
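
The generation procedure can be sketched as a short recursive function. This is a minimal illustration under stated assumptions: it uses the plain information gain rather than the improved formula, it treats an empty feature list as the case in which the majority class is returned, and the tree is represented as nested `(feature, {value: subtree})` pairs. The feature names `Th` and `FV` follow the resource attributes used in this paper; the data and function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain(rows, labels, feature):
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, features, threshold=0.0):
    """Return a class label (leaf) or a (feature, {value: subtree}) pair."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the subset is pure or no features remain.
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: gain(rows, labels, f))
    # Stop when even the best feature gains less than the threshold.
    if gain(rows, labels, best) < threshold:
        return majority
    branches = defaultdict(lambda: ([], []))
    for row, label in zip(rows, labels):
        branches[row[best]][0].append(row)
        branches[row[best]][1].append(label)
    rest = [f for f in features if f != best]
    return best, {v: build_tree(r, ys, rest, threshold) for v, (r, ys) in branches.items()}


def classify(tree, row, default="NO"):
    """Follow the branches for one record; unseen values fall back to default."""
    while isinstance(tree, tuple):
        feature, children = tree
        if row[feature] not in children:
            return default
        tree = children[row[feature]]
    return tree


rows = [
    {"Th": "social", "FV": "easy"},
    {"Th": "social", "FV": "hard"},
    {"Th": "natural", "FV": "easy"},
    {"Th": "natural", "FV": "hard"},
]
labels = ["YES", "NO", "YES", "NO"]
tree = build_tree(rows, labels, ["Th", "FV"], threshold=0.01)
```

In this toy sample the theme has zero gain and the difficulty has a gain of one bit, so the tree splits once on `FV` and both branches end in pure leaves.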
 
                
               
                     3.3 Adaptive ID3 for Reading Resource Recommendation Model Construction
                  The reading recommendation model needs to be a two-way interactive model that adapts
                     to the situation of the person receiving recommendations. As that person’s situation
                     changes, the recommended content should be updated in due time. Therefore, the
                     resource recommendation algorithm should capture the characteristics of the target
                     person, the characteristics of the reading resources, and the attributes that need to
                     be classified. In the pre-processing stage, the data are stored in two-dimensional
                     arrays, and discrete data must also be processed. The experimental subjects were
                     learners preparing for CET-4, and their situation was modeled to understand students’
                     styles, based on various reading ability scales and the actual learning involved, from
                     the following aspects: reading ability, learning goals, learning efficiency, learning
                     style, and cognitive style.

                  When the ID3 algorithm is used for student-style classification, feature selection is
                     crucial for constructing the decision tree and for the final classification results.
                     The style data are first pre-processed, which simplifies and standardizes students'
                     learning situations so that their styles can be classified accurately. First, the
                     learning style of each student is defined in terms of reading ability, learning
                     objectives, learning efficiency, learning style, and cognitive style. Then, the
                     information entropy, conditional entropy, and information gain of each feature are
                     calculated. Next, the feature with the maximum information gain is selected as the
                     root node of the ID3 decision tree, and branches are formed from its values until
                     every subset contains data from a single category. The result is a decision tree that
                     classifies students by their learning style characteristics, giving a better
                     understanding of each student's learning preferences and needs. This is valuable
                     guidance for educators because it helps them design and adjust teaching strategies to
                     meet the needs of different types of students and so improve educational
                     effectiveness.
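
The pre-processing step can be sketched as follows. The example stores learner records in a two-dimensional list of integer codes, one row per learner and one column per feature. The field names reuse the abbreviations Ab, Ob, Cs, and Lr from this paper, while the cut points used to bin a numeric score and all sample values are assumptions made for illustration.

```python
def discretize(score, cuts=(60, 80)):
    """Bin a numeric score into 'low', 'mid', or 'high' (cut points are assumed)."""
    if score < cuts[0]:
        return "low"
    return "mid" if score < cuts[1] else "high"


def encode_learners(learners, fields):
    """Turn learner records into a 2D list of integer codes plus the code books."""
    codebooks = {field: {} for field in fields}
    table = []
    for learner in learners:
        row = []
        for field in fields:
            book = codebooks[field]
            # A value seen for the first time gets the next unused integer.
            row.append(book.setdefault(learner[field], len(book)))
        table.append(row)
    return table, codebooks


# Hypothetical learner records.
learners = [
    {"Ab": "R5", "Ob": "vocabulary", "Cs": "dependent", "Lr": discretize(80)},
    {"Ab": "R5-", "Ob": "language sense", "Cs": "independent", "Lr": discretize(81)},
    {"Ab": "R5", "Ob": "vocabulary", "Cs": "dependent", "Lr": discretize(70)},
]
table, codebooks = encode_learners(learners, ["Ab", "Ob", "Cs", "Lr"])
```

Keeping the code books alongside the table makes it possible to translate the integer codes in a generated rule back into readable feature values.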
                  
                  Reading ability (Ability, Ab) was rated according to the Chinese English Reading
                     Ability Scale, which has nine levels in ascending order of ability; the study uses the
                     content specified for levels 4–7 (Ma, 2021). Students with different reading abilities
                     select reading content to improve a particular skill: some aim to increase their
                     vocabulary, whereas others want to strengthen their sense of the language. The student
                     assessment model therefore lets them select a reading goal (Objective, Ob).
                  
                  Cognitive style (Cs) is an element that affects a student’s learning abilities and
                     characteristics. Its advantage is that it makes the probability of a student’s success
                     visible and is an explicit indicator formed over time. From a reading comprehension
                     perspective, the two most relevant cognitive styles are field-dependent and
                     field-independent. Field-dependent students prefer texts on humanities subjects, and
                     their thinking has a certain ability to synthesize. They prefer to study the text in
                     detail when reading, but they cannot easily establish an independent reading space and
                     are easily influenced by the outside world. Field-independent students are the
                     opposite: they pay more attention to the actual content conveyed behind the text and
                     prefer natural science subjects. They build their own reading field when reading and
                     have a certain resistance to interference. Cognitive style is therefore an essential
                     element in analyzing the situation of college students. The learning result (Lr) is
                     assessed based on the students’ self-assessments and test results.
                  
                  The model involves three attributes of reading comprehension resources, which are the
                     main content to be classified by ID3. Theme (Th) refers to the content source of a
                     reading resource, divided into natural subjects and social sciences. The difficulty
                     value (FV) is the level assigned by an overall assessment of the resource. Category
                     (Ca) is the question type based on the CET-4 test, including completion reading for
                     detail, sequential reading for logical order, and narrowly defined fine reading. The
                     final model is constructed following this logical order, as shown in Fig. 2.
                  
                  
                        Fig. 2. Adaptive recommendation model.
 
                  The adaptive model in Fig. 2 has four indicators in the learner model and three
                     indicators in the reading resource model. In practice, the recommended reading
                     resources should change adaptively according to both, with feedback from learners as
                     the basis for real-time updates. A resource recommendation model built on this basis
                     can serve as a teaching tool to improve teaching quality.
                  
                
             
            
                  4. Results and Analysis
               A CET-4 training course at a training institution was used to assess the performance of
                  the constructed model. The necessary information was collected to build a learner
                  model. The ID3 algorithm was used to classify the reading resources in the resource
                  library, and the learner model provided the attributes for adaptive recommendation.
                  Seventy-five percent of the learner data was used as training data; the remaining 25%
                  served as reference data for evaluating the results. The categories generated by the
                  decision tree express whether a recommendation is good or bad, specifically YES and
                  NO. The test set was fed to the ID3 algorithm, and the output is shown in Fig. 3.
               
               
                     Fig. 3. Reading resource recommendation decision tree.
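
The 75/25 division of the learner data can be sketched as below. The text does not state how the records were assigned to the two parts, so the random shuffle, the fixed seed, and the function name are assumptions made only to keep the example reproducible.

```python
import random


def split_learners(records, train_fraction=0.75, seed=0):
    """Shuffle a copy of the records and split it into training and reference parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for 100 learner records.
train, reference = split_learners(range(100))
```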
 
               The decision tree built from the simulated data still establishes its nodes on the type
                  of reading, reading difficulty, learner’s learning result, and cognitive style, which
                  is similar to the decision tree built from the sample data, so the decision tree
                  construction is valid. Fig. 4 shows the relationship between the accuracy of this
                  simulation and the number of samples.
               
               
                     Fig. 4. Relationship between the number of learners and accuracy.
 
               The accuracy on the test set was close to the reference value (> 80%). As the number of
                  learners increased, the curve of the test data approached that of the reference set,
                  indicating that the accuracy also increases and that the recommendation model with ID3
                  as the classification tool is effective. Although the accuracy obtained in the
                  performance test did not reach 90%, increasing the feature data of learners can
                  improve the performance, which suggests that this error stems from the limited data
                  collected in the experiment.
               
               The model was applied to the daily teaching of an English tutorial institution in the
                  first quarter of 2020, and the change in students’ reading performance was used to
                  indicate the impact of the model on teaching. The improved ID3 model was used for the
                  experimental group, and the control groups used the ID3 model before improvement and
                  the traditional English recommendation method. Reading passages were recommended to
                  students from the same resource library to evaluate the advantages and disadvantages
                  of the three methods.
               
               The initial reading ability of students in the three groups was similar, all around R5,
                  and no students showed abnormal performance in the class (Table 1). Some differences
                  in the outcomes of the three groups were observed after the first training period. The
                  largest performance improvement was in the improved ID3 group (18.01±1.07), compared
                  with 13.21±1.03 for the standard ID3 group and 9.02±1.01 for the traditional method
                  group. According to the students’ feedback, the goal achievement rate of the improved
                  ID3 group reached over 90% (91.22%), indicating that the recommendation model is
                  effective.
               
               
                     Table 1. Changes of students in each group before and after learning.
                  
                        
                           
                              | Recommendation model | Improved ID3 | Standard ID3 | Traditional method |
                              | Number of students | 177 | 172 | 169 |
                              | Initial achievement | 56.13±4.01 | 53.14±3.92 | 58.53±4.09 |
                              | Average reading ability | R5+ | R5 | R5+ |
                              | Performance improvement | 18.01±1.07 | 13.21±1.03 | 9.02±1.01 |
                              | Target achievement rate | 91.22% | 80.13% | 69.21% |
                     
                  
                
               The accuracy of the model was assessed with four indicators, namely recall, accuracy,
                  precision, and F-value, using the experimental groupings above. A random sample from
                  each of the three groups was evaluated with the recommendations and student feedback
                  as criteria to obtain the four indicators. Fig. 5 presents the results.
               
               
                     Fig. 5. Comparison of the prediction performance between two models.
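
The four indicators can be computed from paired YES/NO outcomes as sketched below. The example assumes that the F-value is the usual F1 score (the harmonic mean of precision and recall) and that YES is the positive class. The function name and sample data are hypothetical.

```python
def recommendation_metrics(predicted, actual, positive="YES"):
    """Accuracy, precision, recall, and F1 for binary recommendation outcomes."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f_value = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_value": f_value,
    }


# Hypothetical outcomes: model recommendation vs. student feedback.
predicted = ["YES", "YES", "NO", "NO", "YES"]
actual = ["YES", "NO", "NO", "YES", "YES"]
metrics = recommendation_metrics(predicted, actual)
```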
 
               The overall accuracy of the improved model reached more than 95% (Fig. 5), and each index was higher than that of the standard ID3 model for the same case. As the
                  sampling proportion changed, the accuracy of the improved ID3 model did not
                  change significantly. In contrast, the accuracy of the standard ID3 algorithm decreased
                  as the sampling proportion decreased, which suggests that the standard algorithm falls more easily into
                  a local optimum as the sample size decreases. Adding the correlation
                  coefficient to the algorithm addresses this problem: the accuracy of the improved model remains stable as the sample
                  size changes.
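For orientation, the split criterion of the standard ID3 algorithm, which the improved model modifies, can be sketched as follows. This shows only the textbook information gain; the correlation-coefficient adjustment used by the improved model is not reproduced here because its exact form is not restated in this section. The attribute names and the toy learner records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on one attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted entropy of the subsets produced by the split
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attributes, target):
    """Standard ID3 choice: the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical learner records, for illustration only.
rows = [
    {"style": "dependent", "subject": "social", "suitable": "yes"},
    {"style": "dependent", "subject": "natural", "suitable": "yes"},
    {"style": "independent", "subject": "social", "suitable": "no"},
    {"style": "independent", "subject": "natural", "suitable": "no"},
]
print(best_attribute(rows, ["style", "subject"], "suitable"))
```

Because the standard criterion selects the split greedily from the samples at hand, small samples can make the estimated gains unreliable, which is consistent with the accuracy drop reported for the standard model.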
               
               While the recommendation model was in use, the improved ID3 model group was given
                  reading tests to monitor changes in the learners in real time. The learner profiles
                  were first entered into the model to identify students with each characteristic, who then served as
                  typical learners. Table 2 lists their characteristics.
               
               
                     Table 2. Characteristics of typical students.
                  
                        
                           
                              | Feature dimension | Student A | Student B | Student C | 
                        
                              | Reading ability | R5 | R5- | R5 | 
                        
                              | Cognitive style | Dependence | Independence | Dependence | 
                        
                              | Self-evaluation efficiency | 80 | 81 | 70 | 
                        
                              | Initial accuracy | 87.2% | 80.3% | 75.9% | 
                        
                              | Question type preference | Cloze | Reading Comprehension | Sort reading | 
                        
                              | Subject preference | Social | Natural | Social | 
                     
                  
                
               As shown in Table 2, Student A was field-dependent, with a preference for cloze (completion-type) reading and
                  sensitivity to humanities and social science texts. Student B was field-independent,
                  with a preference for reading comprehension and natural science texts. Student C was
                  field-dependent, with a preference for logical sequencing and humanities and social
                  science texts. All three students had similar initial abilities and minor differences in
                  their self-assessment abilities and initial test scores. Adaptive recommendations
                  were given to them using the improved model, and their accuracy rates were tallied,
                  as shown in Fig. 6.
               
               
                     Fig. 6. Comparison of the prediction performance of the two models.
 
               Among the three students, Student A showed the greatest improvement, but his accuracy
                  rate also fluctuated the most (Fig. 6), indicating that the accuracy rate of field-dependent students is affected
                  by the environment. Nevertheless, the overall improvement was significant. Although
                  the initial ability of Student B was average, his accuracy rate also improved,
                  and the overall fluctuations were not significant, suggesting that this student was
                  more dependent on the difficulty of the reading material. His accuracy rate improved
                  from 30% to 50%, indicating that the effect of the recommendation model was significant.
                  The situation of Student C was similar to that of Student B, and the improvement also supported the effectiveness
                  of the model.
               
               Although the model is effective, which attributes of the recommended resources
                  significantly affect teaching and learning still needs to be explored. The study analyzed each typical
                  learner's data statistically, with each indicator scored from 1 to 4
                  in ascending order. Table 3 presents the specific results.
               
               
                     Table 3. Analysis of variance of the attribute and accuracy of the recommended resources.
                  
                        
                           
                              | Source | Type III sum of squares | df | Mean square | F | Significance | 
                        
                              | Corrected model | 12.864a | 14 | 0.910 | 4.047 | 0.000 | 
                        
                              | Th | 6.241 | 3 | 1.421 | 7.151 | 0.000 | 
                        
                              | Ca | 0.465 | 1 | 0.516 | 2.364 | 0.163 | 
                        
                              | De | 0.246 | 1 | 0.246 | 1.036 | 0.221 | 
                        
                              | Th×Ca | 1.145 | 4 | 0.531 | 2.468 | 0.042 | 
                        
                              | Th×De | 1.984 | 1 | 0.359 | 1.634 | 0.201 | 
                        
                              | De×Ca | 2.093 | 1 | 2.147 | 11.397 | 0.001 | 
                        
                              | Total | 2042.000 | 732 | / | / | / | 
                     
                  
                
               The F-value of the model was 4.047 with a significance of 0.000 (Table 3), indicating that the model as a whole was statistically significant. The question type and the question type
                  × difficulty level interaction significantly affected the students’ reading accuracy, whereas
                  the other factors had little effect.
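To illustrate how an F-value of this kind arises, the sketch below computes a one-way analysis of variance in plain Python. It is a deliberate simplification: Table 3 reports a multi-factor analysis with interaction terms, whereas this example tests a single factor. The function name and the sample scores are hypothetical and are not data from the study.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between  # mean square = sum of squares / df
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Hypothetical accuracy scores grouped by the level of one attribute.
scores_by_level = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_anova(scores_by_level)
print(f"F({df_b}, {df_w}) = {f_value:.3f}")
```

A large F-value means the variation between attribute levels is large relative to the variation within each level; the significance column in an analysis of variance table is the corresponding p-value.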
               
               The same correlation analysis was performed on the model's output for the learners’
                  attributes, and the results were then clustered. An eigenvalue
                  transformation of the clustering results yielded the radar plot of the
                  recommendation preferences of the improved ID3 model for the various classes of students,
                  as shown in Fig. 7.
               
               
                     Fig. 7. Recommended preference radar chart.
 
               As shown in Fig. 7, the algorithm places a higher requirement on the difficulty of the resources recommended for category
                  A students. This indicates that these students have better reading ability and have their own
                  goals and initiative. The recommended resources for these students are more in line
                  with their requirements. Category B students prefer moderately difficult topics; subject
                  matter and question type can also influence their accuracy rates. The overall situation
                  of category C is similar to that of category A, with a higher requirement for
                  difficulty. In contrast, question type and topic have only a moderate effect on
                  them.
               
               The running times of the proposed ID3 algorithm and a traditional decision
                  tree algorithm were compared to assess the difference between the proposed method and previous
                  methods. The data volume ranged from 100 to 500. Table 4 compares the running time of the two algorithms for constructing decision trees.
                  The ID3 algorithm proposed in this paper constructed decision
                  trees more efficiently than the traditional decision tree algorithm. The time difference between the
                  two algorithms increased as the amount of data increased. When the data volume was
                  500, the running time of the proposed ID3 algorithm was 118.08 ms, which was
                  19.64% shorter than that of the traditional decision tree algorithm (146.94 ms). When
                  the data volume was 100, the running time of the ID3 algorithm was 24.64 ms, which
                  was 16.57% shorter than that of the traditional decision tree algorithm (29.54 ms).
                  These data indicate the advantage of the proposed ID3 algorithm as the dataset grows.
                  The running time of the traditional decision tree algorithm increased faster
                  as the amount of data increased, while the proposed ID3 algorithm maintained
                  an efficient computational speed. Therefore, the ID3 algorithm proposed in this article
                  has practical value, especially when dealing with larger datasets.
               
               
                     Table 4. Comparison of the runtime between two algorithms for constructing decision trees.
                  
                        
                           
                              | Data volume | Traditional decision tree algorithm run time/ms | ID3 algorithm run time/ms | 
                        
                              | 100 | 29.54 | 24.64 | 
                        
                              | 200 | 61.56 | 50.17 | 
                        
                              | 300 | 88.15 | 68.91 | 
                        
                              | 400 | 118.68 | 95.34 | 
                        
                              | 500 | 146.94 | 118.08 | 
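The absolute and relative time savings can be recomputed from the values in Table 4 with a short script such as the one below. The run times are copied from the table; the function and variable names are illustrative. Because the table entries are rounded to two decimals, recomputed percentages may differ in the last digit from figures quoted in the text.

```python
# Run times in ms from Table 4: data volume -> (traditional, ID3)
RUN_TIMES_MS = {
    100: (29.54, 24.64),
    200: (61.56, 50.17),
    300: (88.15, 68.91),
    400: (118.68, 95.34),
    500: (146.94, 118.08),
}


def relative_reduction(traditional_ms: float, id3_ms: float) -> float:
    """Percentage by which the ID3 run time is shorter than the traditional one."""
    return (traditional_ms - id3_ms) / traditional_ms * 100.0


for volume, (traditional, id3) in RUN_TIMES_MS.items():
    saved = traditional - id3
    pct = relative_reduction(traditional, id3)
    print(f"{volume:>4}: saved {saved:6.2f} ms ({pct:.2f}% shorter)")
```

The absolute saving in milliseconds grows with every increase in data volume, which is the trend described in the text.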
                     
                  
                
             
            
                  5. Conclusion
               Reading is one of the elements that improve English learners’ abilities, and this
                  learning module, which integrates vocabulary, linguistics, and grammar, is also an important
                  element in assessing students’ English proficiency. Improving English reading proficiency
                  should take account of differences among students, in keeping with the principles of education, and recommending
                  appropriate learning content tailored to them helps improve their accuracy. The decision
                  tree established in this study builds an adaptive recommendation model by adjusting
                  the algorithm to the students’ characteristics. It also optimizes
                  the information gain formula by introducing correlation coefficients to prevent the
                  model from falling into a local optimum. The algorithm was tested before and after the improvement.
                  The improved algorithm fitted the reference values better, and the
                  four performance indicators showed that the accuracy
                  of the improved algorithm was above 95%. The target achievement rate reported by students using
                  the improved recommendation model was 91.22%. Although the accuracy of the standard ID3 model
                  did not reach 90%, the improvement path was still effective. Applying the algorithm
                  to students in an institution showed that the learner classification module in the
                  model can classify students into three categories. The adaptive recommendations for
                  the three types of students showed that different types of students have different
                  requirements for resources. Students with strong learning abilities require more challenging
                  recommended reading, while for students with average or poor ability the recommendation
                  model needs to focus on topics and question types.
               
             
          
         
            
                  
                     REFERENCES
                  
                     
                        
                        Abbas A R and Farooq A O. (2019). ‘Skin Detection using Improved ID3 Algorithm’, Iraqi
                           Journal of Science, Vol. 60, No. 2, pp. 402-410.

 
                     
                        
                        An Y and Zhou H. (2022). ‘Short term effect evaluation model of rural energy construction
                           revitalization based on ID3 decision tree algorithm’, Energy Reports, No. 8, pp. 1004-1012.

 
                     
                        
                        Andrew, Russ, Gayle et al. (2018). ‘Decision tree for pretreatments for winter maintenance’,
                           Transportation Research Record, Vol. 2055, No. 1, pp. 106-115.

 
                     
                        
                        Audina Y, Zega N and Simarmata A et al. (2020). ‘An analysis of teacher’s strategies
                           in teaching reading comprehension’, Lectura Jurnal Pendidikan, Vol. 11, No. 1, pp.
                           94-105.

 
                     
                        
                        Chakraborty S B and Chowdhury. (2021). ‘Teaching academic reading in English to the
                           undergraduate students at a government college of Bangladesh -Challenges and solutions’,
                           IOSR Journal of Research & Method in Education (IOSRJRME), Vol. 11, No. 2, pp. 49-63.

 
                     
                        
                        Duan X. (2021). ‘The application of activity-based method in English reading teaching
                           in senior high school’, Region - Educational Research and Reviews, Vol. 3, No. 2,
                           pp. 60-64.

 
                     
                        
                        Hong H, Liu J and Bui D et al. (2018). ‘Landslide susceptibility mapping using J48
                           Decision Tree with AdaBoost, Bagging and Rotation Forest ensembles in the Guangchang
                           area (China)’, Catena, No. 163, pp. 399-413.

 
                     
                        
                        Karthi M, Priscilla R and Benila E. (2018). ‘The patrons for anticipating the veracity
                           of rail mishaps using text mining and ID3 algorithm’, International Journal of Pure
                           and Applied Mathematics, Vol. 119, No. 15, pp. 1753-1759.

 
                     
                        
                        Li S, Laima S and Li H. (2018). ‘Data-driven modeling of vortex-induced vibration of
                           a long-span suspension bridge using decision tree learning and support vector regression’,
                           Journal of Wind Engineering and Industrial Aerodynamics, No. 172, pp. 196-211.

 
                     
                        
                        Ma Y. (2021). ‘The application of schema theory in the teaching of English reading
                           in senior high schools’, Region - Educational Research and Reviews, Vol. 3, No. 3,
                           pp. 17-20.

 
                     
                        

 
                     
                        
                        Maingi N N, Lukandu I A and Mwau M. (2019). ‘Inter-county comparative analysis of
                           ID3 decision tree algorithms for disease symptom burden classification and diagnosis’,
                           International Journal of Science and Research (IJSR), Vol. 8, No. 5, pp. 83-89.

 
                     
                        
                        Park K, Bell M G, Kaparias I and Belzner H. (2008). ‘Soft discretization in a classification
                           model for modeling adaptive route choice with a fuzzy id3 algorithm’, Transportation
                           Research Record, Vol. 2076, No. 1, pp. 20-28.

 
                     
                        
                        Pratama Y and Saragi H S. (2018). ‘Cassava quality classification for tapioca flour
                           ingredients by using ID3 algorithm’, Indonesian Journal of Electrical Engineering
                           and Computer Science, Vol. 9, No. 3, pp. 799-805.

 
                     
                        
                        Tulloch A, Nancy A and Stephanie A G et al. (2018). ‘A decision tree for assessing
                           the risks and benefits of publishing biodiversity data’, Nature Ecology & Evolution,
                           Vol. 2, No. 8, pp. 1209-1217.

 
                     
                        
                        Wu J. (2021). ‘The research on the English reading teaching mode aiming at the improvement
                           of thinking quality’, Region - Educational Research and Reviews, Vol. 3, No. 2, pp.
                           40-43.

 
                     
                        
                        Yi H. (2020). ‘Teaching strategies of cultivating humanistic literacy in reading teaching’,
                           Education Study, Vol. 2, No. 3, pp. 174-183.

 
                     
                        
                        Yu Q and Maele J V. (2018). ‘Fostering intercultural awareness in a Chinese English
                           reading class’, Chinese Journal of Applied Linguistics, Vol. 41, No. 3, pp. 357-375.

 
                     
                        
                        Zaiter W A. (2020). ‘Reading and writing skills: The challenges of teaching at college
                           level’, Addaiyan Journal of Arts Humanities and Social Sciences, Vol. 1, No. 10, pp.
                           41-51.

 
                     
                        
                        Zhou Q. (2021). ‘The application of TBLT to English reading teaching in junior high
                           school’, Region - Educational Research and Reviews, Vol. 3, No. 2, pp. 52-55.

 
                   
                
             
            Author
            
            
               Ruixue Zhang obtained her Master’s Degree in English Language and Literature (2009)
               from Southwest University, China. She is currently a professor in
               the Department of College English, Zhejiang Yuexiu University, Shaoxing. She has published
               articles in more than 10 national or international journals and conference proceedings.
               Her areas of interest include English Teaching and Educational Management.