The study utilizes the gating mechanisms of LSTM to update learners' knowledge states, and an attention mechanism is introduced to capture the learner's current knowledge state. A knowledge tracking model based on LSTM is built. Meanwhile, model-based CF is utilized to build the LRR model.
2.1. Construction of the Knowledge Tracking Model Based on LSTM
Before conducting LRR, students' mastery of different knowledge points is obtained through knowledge tracking. Learners' learning behavior is typically time-series data. LSTM was proposed to address the long-term dependency problem of the Recurrent Neural Network (RNN). It can analyze inputs as time series and has been proven to perform well in processing time-series data [14,15]. Therefore, the study adopts LSTM to build the knowledge tracking model. LSTM includes three gating mechanisms. The input gate controls whether input information is updated into the cell unit, as represented by Eq. (1).
In Eq. (1), $i_t$ stands for the input gate state, $W_i$ means its weight matrix, and tanh stands for an activation function. $\bar{c}_t$ refers to the candidate cell unit, $W_c$ stands for the candidate cell unit's weight matrix, and $c_t$ refers to the memory cell node's latest state. The forget gate state is represented by Eq. (2) [16].
In Eq. (2), $\sigma$ stands for the sigmoid activation function, $W_f$ stands for the forget gate's weight matrix, $h_{t-1}$ stands for the input from the previous moment, $V$ stands for the weight matrix of $h_{t-1}$, $x_t$ refers to the input value at the current time, and $b$ stands for the bias term. The output gate is represented by Eq. (3).
In Eq. (3), $o_t$ refers to the output gate state, $W_o$ refers to its weight matrix, and $h_t$ refers to the output at the current time.
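For illustration, a minimal NumPy sketch of the three gating computations in Eqs. (1)-(3) is given below. The split of the parameters into separate weight matrices and the helper name `lstm_step` are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step with input, forget, and output gates (sketch of Eqs. 1-3)."""
    W_i, V_i, b_i = params["i"]   # input gate parameters
    W_f, V_f, b_f = params["f"]   # forget gate parameters
    W_o, V_o, b_o = params["o"]   # output gate parameters
    W_c, V_c, b_c = params["c"]   # candidate cell parameters

    i_t = sigmoid(W_i @ x_t + V_i @ h_prev + b_i)     # input gate state
    f_t = sigmoid(W_f @ x_t + V_f @ h_prev + b_f)     # forget gate state
    o_t = sigmoid(W_o @ x_t + V_o @ h_prev + b_o)     # output gate state
    c_bar = np.tanh(W_c @ x_t + V_c @ h_prev + b_c)   # candidate cell unit
    c_t = f_t * c_prev + i_t * c_bar                  # latest memory cell state
    h_t = o_t * np.tanh(c_t)                          # output at the current time
    return h_t, c_t
```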
In knowledge tracking, a concept matrix $M$ is constructed for all knowledge, with each column representing a knowledge vector $M(i)$. The knowledge point $k_t$ and problem $q_t$ at time $t$ are transformed into the knowledge point embedding $e_t$ and the problem embedding $z_t$. $z_t$ is merged with the problem response $r_t$, as represented by Eq. (4).
In Eq. (4), $\oplus$ represents the concatenation operation. The inner product of the knowledge point embedding and each column of knowledge vectors in the concept matrix is calculated, and the results are input into the softmax function, as represented by Eq. (5) [17]. In Eq. (5), $\beta_t$ represents the correlation between the problem and the various knowledge points.
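A hedged sketch of this step, under the assumption that each column of $M$ has the same dimension as $e_t$, is:

```python
import numpy as np

def knowledge_weights(e_t, M):
    """Sketch of Eq. (5): softmax over inner products of the knowledge point
    embedding e_t with each column M(i) of the concept matrix."""
    scores = M.T @ e_t                      # one score per knowledge vector
    scores -= scores.max()                  # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()    # beta_t: correlation weights

# Example with illustrative sizes: embedding dimension 8, 4 knowledge concepts
rng = np.random.default_rng(0)
M = rng.normal(size=(8, 4))
e_t = rng.normal(size=8)
beta_t = knowledge_weights(e_t, M)          # sums to 1 across concepts
```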
Most existing knowledge tracking models judge learners' knowledge status based on their performance on problems, which cannot reflect their actual knowledge level [18]. Therefore, the study utilizes the gating mechanism of LSTM and designs two gating mechanisms, learning and forgetting, to update learners' knowledge states. The learner's learning benefit is represented by Eq. (6).
In Eq. (6), $w_1^T$ and $b_1$ refer to the parameters that need to be optimized, $at_t$ refers to the embedding of the time spent answering question $q_t$, and $y_t$ refers to the embedding of the problem together with its response. A learning gate is designed to control learners' absorption of knowledge, as represented by Eq. (7).
Learners' knowledge gain $\tilde{L}_t$ after one learning interaction $(q_t, r_t)$ is represented by Eq. (8). The forget gate is represented by Eq. (9).
In Eq. (9), $it_t$ refers to the embedding of the time interval between the learner's current interaction and the previous interaction on the same problem. Therefore, the update of the learner's knowledge state after a learning interaction is represented by Eq. (10).
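The following sketch illustrates how the learning gate, knowledge gain, and forget gate of Eqs. (6)-(10) could be composed; the concatenation order, the gate parameterization, and the final update rule are assumptions based only on the definitions given in the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_knowledge_state(H_prev, y_t, at_t, it_t, params):
    """Sketch of one knowledge state update after an interaction (q_t, r_t).

    H_prev : previous knowledge state vector
    y_t    : problem-with-response embedding
    at_t   : answer-time embedding
    it_t   : time-interval embedding for the same problem
    """
    x = np.concatenate([H_prev, y_t, at_t])
    lg = np.tanh(params["w1"] @ x + params["b1"])       # learning benefit, Eq. (6)
    gate_l = sigmoid(params["wl"] @ x + params["bl"])   # learning gate, Eq. (7)
    L_t = gate_l * lg                                   # knowledge gain, Eq. (8)

    xf = np.concatenate([H_prev, y_t, it_t])
    gate_f = sigmoid(params["wf"] @ xf + params["bf"])  # forget gate, Eq. (9)
    return gate_f * H_prev + L_t                        # updated state, Eq. (10)
```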
The difficulty of a problem affects the learner's response to it. Therefore, an attention mechanism is introduced to capture how different knowledge concepts and problem difficulty affect knowledge tracking for learners in their current knowledge state. When processing input data, the attention mechanism assigns different weights so that the model focuses on the key parts of the input, thereby improving processing accuracy and efficiency [19,20]. The attention mechanism is represented by Eq. (11).
In Eq. (11), $Q$ means the query, $K$ means the key, $V$ means the value, and $d_k$ means the dimension of $Q$ and $K$.
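Eq. (11) corresponds to the standard scaled dot-product attention; a minimal sketch (matrix shapes are illustrative assumptions) is:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Sketch of Eq. (11): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # attention weights
    return weights @ V
```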
Learner data are collected and preprocessed. Each knowledge concept's difficulty $d(k_t)$ and the problem's difficulty $d(q_t)$ are calculated, as represented by Eq. (12).
In Eq. (12), $count(R_{q_i}=1)$ represents the number of times that question $q_i$ has been correctly answered by learners, $count(q_i)$ means the number of times that question $q_i$ has been answered by learners, $count(R_{k_i}=1)$ means the number of times that knowledge concept $k_i$ has been correctly grasped, and $count(k_i)$ means the number of times that knowledge concept $k_i$ has been answered by learners.
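These counts amount to correct-answer statistics over the response log. A hedged sketch, assuming the records are simple (question, concept, correct) tuples and returning the correct-answer rates defined by the counts in Eq. (12), is:

```python
from collections import defaultdict

def difficulties(records):
    """Per-question and per-concept correct-answer rates from the counts in Eq. (12).
    `records` is assumed to be an iterable of (q_i, k_i, correct) tuples; whether the
    rate or its complement is reported as difficulty is not visible in the text."""
    q_total, q_correct = defaultdict(int), defaultdict(int)
    k_total, k_correct = defaultdict(int), defaultdict(int)
    for q_i, k_i, correct in records:
        q_total[q_i] += 1
        k_total[k_i] += 1
        if correct:                       # count(R_{q_i}=1) and count(R_{k_i}=1)
            q_correct[q_i] += 1
            k_correct[k_i] += 1
    d_q = {q: q_correct[q] / q_total[q] for q in q_total}
    d_k = {k: k_correct[k] / k_total[k] for k in k_total}
    return d_q, d_k
```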
Fig. 1. Framework diagram of knowledge tracking model.
The attention network’s output is represented by Eq. (13).
Subsequently, the study increases the model's nonlinearity through a position-wise Feedforward Neural Network (FNN). An FNN consists of multiple layers of nodes that transmit information in one direction without feedback loops, which makes it highly effective in handling static input data and classification tasks [21]. The FNN output is represented by Eq. (14).
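A position-wise feedforward network is typically two linear transformations with a nonlinearity in between; a minimal sketch of Eq. (14) under that assumption (the ReLU nonlinearity and layer sizes are not specified in the text) is:

```python
import numpy as np

def position_wise_ffn(x, W1, b1, W2, b2):
    """Sketch of Eq. (14): the same two-layer transformation applied at each position."""
    hidden = np.maximum(0.0, x @ W1 + b1)   # ReLU nonlinearity (assumed)
    return hidden @ W2 + b2
```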
The study utilizes the sigmoid function to activate the fully connected layer and minimizes the loss between the prediction layer's output and the problem response $r_t$ through a binary cross-entropy loss function, as shown in Eq. (15).
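A hedged sketch of the sigmoid-activated prediction and the binary cross-entropy loss of Eq. (15) is:

```python
import numpy as np

def bce_loss(logits, r_t, eps=1e-12):
    """Sketch of Eq. (15): binary cross-entropy between the sigmoid-activated
    prediction and the observed problem response r_t (0 or 1)."""
    p = 1.0 / (1.0 + np.exp(-logits))       # sigmoid activation of the FC output
    p = np.clip(p, eps, 1.0 - eps)          # avoid log(0)
    return -np.mean(r_t * np.log(p) + (1.0 - r_t) * np.log(1.0 - p))
```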
In summary, the knowledge tracking model constructed in this study mainly includes
three modules: knowledge weight calculation, learning and forgetting, and attention.
Fig. 1 shows the specific LSTM-based knowledge tracking model.
2.2. Construction of the Collaborative Filtering-Based Learning Resource Recommendation Model
To further improve learning outcomes, this study conducts personalized LRR based on the knowledge mastery of different learners. CF mainly includes two types: heuristic-based and model-based [22]. Compared with traditional user-based CF, model-based CF has higher flexibility. Therefore, the study adopts model-based CF to build the LRR system. Matrix Factorization (MF) is a method for processing high-dimensional data, with the main objective of decomposing a high-dimensional matrix into the product of low-dimensional matrices [23]. Neural Matrix Factorization (NMF) combines Generalized Matrix Factorization (GMF) and Multi-Layer Perceptron (MLP) to better capture the complicated relationships between users and items [24,25]. Fig. 2 shows the specific structure of NMF.
Fig. 2. Structure diagram of neural matrix factorization model.
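As a hedged sketch of the structure in Fig. 2, the GMF branch forms the element-wise product of user and item embeddings, the MLP branch processes their concatenation, and the two outputs are merged for the final prediction; the embedding sizes, layer widths, and function names below are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nmf_predict(p_u, q_i, p_u_mlp, q_i_mlp, mlp_weights, h):
    """Sketch of an NMF forward pass (GMF branch + MLP branch).

    p_u, q_i         : GMF user/item embeddings
    p_u_mlp, q_i_mlp : MLP user/item embeddings
    mlp_weights      : list of (W, b) pairs for the MLP tower
    h                : weights of the final prediction layer
    """
    gmf_out = p_u * q_i                        # GMF: element-wise product
    x = np.concatenate([p_u_mlp, q_i_mlp])     # MLP: concatenated embeddings
    for W, b in mlp_weights:
        x = np.maximum(0.0, W @ x + b)         # ReLU hidden layers (assumed)
    fused = np.concatenate([gmf_out, x])       # merge the two branches
    return sigmoid(h @ fused)                  # predicted interaction score
```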
NMF's input layer is a one-hot encoding of users and items. The one-hot representation is embedded into dense vectors in the feature space, and the vectors are merged to obtain the final prediction result. However, the one-hot representation cannot reflect any semantic information, which makes it difficult to capture correlations between features. Therefore, the study adopts an embedding layer in the recommendation model and utilizes the one-hot input as the lookup index of an embedding table. To handle attribute features, an attention mechanism is introduced to reduce the irrationality of MF. The model's attention level $\lambda_j$ for each attribute is represented by Eq. (16).
In Eq. (16), $w_1$ and $w_2$ represent weight matrices, $V_j$ means the concatenation of the user's vector $q_i$ in the feature space and the user's attribute $a_j$, $b_1$ and $b_2$ mean bias terms, $v_j$ means the K-dimensional vector mapped from $V_j$, and $I_u$ means the set of user attributes. The user's final latent feature vector is represented by Eq. (17).
In Eq. (17), $\odot$ represents the element-wise product of two vectors. To improve the quality of interaction between vectors, pairwise pooling is utilized to encode feature correlations in the latent feature space. Pairwise pooling takes into account the interactions within the set of user attribute features and can further improve the interaction quality of user vectors by multiplying them element by element. Pairwise pooling is represented by Eq. (18).
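A hedged sketch of Eqs. (16)-(18) follows: attention weights $\lambda_j$ are scored from the concatenation of the user vector and each attribute embedding, the attributes are aggregated with those weights, and pairwise pooling sums the element-wise products over attribute pairs. The two-layer scoring network, the treatment of $w_2$ as a vector producing a scalar score, and the exact combination in Eq. (17) are assumptions.

```python
import numpy as np

def attribute_attention(q_i, attrs, w1, b1, w2, b2):
    """Sketch of Eq. (16): attention weight lambda_j for each user attribute a_j."""
    scores = []
    for a_j in attrs:
        V_j = np.concatenate([q_i, a_j])           # user vector || attribute
        v_j = np.tanh(w1 @ V_j + b1)               # K-dimensional mapping
        scores.append(w2 @ v_j + b2)               # w2 assumed to be a vector here
    scores = np.array(scores)
    scores -= scores.max()
    return np.exp(scores) / np.exp(scores).sum()   # softmax over I_u

def user_latent_vector(q_i, attrs, lam):
    """Sketch of Eq. (17): attention-weighted attribute aggregation, combined
    element-wise with the user vector (the exact combination is assumed)."""
    agg = sum(l * a for l, a in zip(lam, attrs))
    return q_i * agg

def pairwise_pooling(vectors):
    """Sketch of Eq. (18): sum of element-wise products over all attribute pairs,
    via the identity sum_{i<j} v_i*v_j = ((sum v)^2 - sum v^2) / 2."""
    s = np.sum(vectors, axis=0)
    sq = np.sum(np.square(vectors), axis=0)
    return 0.5 * (s * s - sq)
```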
To make the model highly nonlinear, fully connected layers are also added. After the linear combination through the fully connected layers, the final prediction result is represented by Eq. (19).
In Eq. (19), $L$ represents the number of fully connected layers, $p_u'$ means the user's final features, and $\sigma$ is the ReLU activation function. To obtain learners' rating predictions for learning resources, the output layer's loss function is the Mean Squared Error (MSE). Finally, to accelerate the convergence of the model, Adam, which has high computational efficiency, is utilized as the optimizer to minimize the loss function. Adam's calculation is represented by Eq. (20).
In Eq. (20), $m_t$ represents the first-order momentum of the current gradient, $\beta_1$ means the exponential decay rate of the first-order moment estimate, $g_t$ means the current gradient value, $v_t$ means the second-order momentum of the current gradient, and $\beta_2$ means the exponential decay rate of the second-order moment estimate. Since $m_t$ and $v_t$ are both zero vectors at the beginning, bias correction is required, as represented by Eq. (21).
In Eq. (21), $\hat{m}_t$ means the bias-corrected weighted average of the gradient (the first-order moment), and $\hat{v}_t$ means the bias-corrected second-order moment. The optimized parameter $\theta_t$ is represented by Eq. (22). In Eq. (22), $\theta_0$ represents the initial parameter and $\alpha$ means the learning rate.
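A minimal sketch of the Adam update described by Eqs. (20)-(22), using the standard default hyperparameters as assumptions, is:

```python
import numpy as np

def adam_step(theta, g_t, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update step (sketch of Eqs. 20-22).

    m, v : first- and second-order momentum from the previous step
    t    : current step index, starting at 1
    """
    m = beta1 * m + (1.0 - beta1) * g_t             # first-order momentum, Eq. (20)
    v = beta2 * v + (1.0 - beta2) * g_t ** 2        # second-order momentum, Eq. (20)
    m_hat = m / (1.0 - beta1 ** t)                  # bias-corrected first moment, Eq. (21)
    v_hat = v / (1.0 - beta2 ** t)                  # bias-corrected second moment, Eq. (21)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)   # parameter update, Eq. (22)
    return theta, m, v
```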