Wei Hu 1* and Yipeng Luo 2
1 Foreign Language School, Hunan University of Science and Engineering, Yongzhou, Hunan 425199, China (huw1983@outlook.com)
2 School of Information Science and Engineering, Hunan Women's University, Changsha, Hunan 410004, China
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Semantic feature, Machine translation, English, Text vectorization
1. Introduction
For non-native English speakers, learning and using English skillfully is difficult [1]. Machine translation algorithms, enabled by advances in computer technology, can provide effective assistance to non-native speakers, both in learning English and in communicating with native speakers [2]. The essence of a machine translation algorithm is to convert one sequence into another. Traditional machine translation algorithms often translate texts through one-to-one correspondence between words [3], which leads to inaccurate semantic understanding, unnatural expression, and other problems. To improve machine translation algorithms, semantic features are incorporated so that contextual information can be exploited. Zhang et al.
[4] designed a novel word embedding training method using part-of-speech features and
verified its effectiveness through experiments. Qiu et al. [5] proposed a short text semantic similarity calculation approach combining bidirectional
encoder representation from transformers and a time-warping distance algorithm. They
verified its performance advantage through experiments. Guan et al. [6] proposed a method combining word and word dependencies to calculate sentence similarity
and found that combining word and word dependencies can improve the capability to
extract matching features between two sentences. Lin et al. [7] proposed an improvement to neural machine translation using a novel beam search evaluation
function and discovered that the approach effectively enhanced the quality of English-to-Chinese
translation. Lee et al. [8] utilized a character-level convolutional network as an encoder for machine translation.
It performed significantly better than the subword-level encoder in multilingual experiments.
The studies above all investigated word-level features and attempted to represent text with more unified features so that computers could process the language more efficiently. The main issue addressed in this article is how to optimize English machine translation. Specifically, the article focuses on improving the representation of text vector features so that the underlying connections between words are better reflected and more accurate correspondences are achieved during the machine translation process.
This paper briefly introduces the Word2vec model for text vectorization and the convolutional
neural network (CNN) extractor for extracting semantic features. Then, the CNN extractor
was applied to machine translation, and the semantic feature vector was combined with
the intermediate vector of the encoder to improve the performance of the translation
algorithm. Simulation experiments were also conducted. The innovation of this article
lies in using Word2vec to vectorize the text, utilizing a CNN to extract semantic
features of words, and integrating the two, thereby enriching the regularity information contained in the word features and improving the performance of machine translation.
2. Language Vector Feature Extraction Algorithm
The essence of a machine translation algorithm is to convert one text sequence into
another text sequence, and the computer itself does not comprehend the meaning of
the text sequence during the conversion process [9]. Moreover, both English and Chinese are natural languages, which computers cannot process directly because of their underlying logic; thus, natural language must first be converted into vector text that computers can process.
2.1 Word2vec Model
Word2vec uses context words to predict the current word and thereby obtains a distributed word vector [10]. After the Word2vec model has been trained by the steps below, it is used in the same way when formally applied: centered on the current word, a context word window collects the one-hot encodings of the surrounding words, which are fed into the model for forward calculation; the $1\times N$ word vector obtained in the mapping layer is the Word2vec text vector of the current word.
Taking Fig. 1 as an example, to obtain the text vector of the current word $W(t)$ [11], a context word window with a length of 4 is used to obtain the one-hot encodings of the four context words $W(t-2)$, $W(t-1)$, $W(t+1)$, and $W(t+2)$; these form a one-hot encoding matrix with a size of $4\times V$ ($V$: vocabulary size) as the input data of the input layer [12].
Fig. 1. Basic structure of the Word2vec model.
The input data is multiplied by weight matrix $W$ with a size of $V\times N$ ($N$: the dimension of a word vector, set according to demand) in the mapping layer to obtain the word vectors. The resulting $C\times N$ matrix ($C$: the number of context words) is averaged along each dimension to obtain a $1\times N$ word vector. This vector is then multiplied by another weight matrix $W'$ in the output layer. After normalization [13], a word probability distribution with a size of $1\times V$ is obtained, and the word with the largest probability is the current word obtained through prediction. Finally, the predicted current word is compared with the actual current word $W(t)$ in the corpus, and the error between them is calculated. If the error converges to within the threshold, training is terminated; otherwise, the weights in the weight matrices are adjusted backward according to the error, and the forward calculation is performed again.
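As a brief illustration of how such distributed word vectors can be obtained in practice, the sketch below assumes the gensim library; the toy corpus and the vector dimension of 150 (chosen to match the CNN input size used later) are illustrative and not the paper's exact configuration.

```python
from gensim.models import Word2Vec

# Illustrative pre-tokenized corpus; in the experiments the sentences of the
# United Nations Parallel Corpus would be tokenized in the same way.
sentences = [["the", "novel", "is", "very", "interesting"],
             ["did", "you", "finish", "yesterday", "homework"]]

# CBOW (sg=0) predicts the current word from its context window, which is the
# scheme of Fig. 1; window=2 gives the four context words W(t-2)...W(t+2).
model = Word2Vec(sentences, vector_size=150, window=2, sg=0, min_count=1)

vector = model.wv["novel"]   # the 1 x 150 Word2vec text vector of "novel"
print(vector.shape)          # (150,)
```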
2.2 Semantic Feature Extraction
The Word2vec model only transforms natural language into vector language that computers
can process. Although the vector language contains certain semantic features due to
the utilization of the context of the corpus, it is not enough to provide semantic
information during machine translation; once synonyms are encountered, translation accuracy can easily suffer. To optimize the performance of a machine translation
algorithm, this paper introduces semantic features and fuses semantic feature vectors
with Word2vec encoding vectors. In this paper, a CNN is employed to extract semantic
features [14].
When using the CNN extractor, the Word2vec model is also used to obtain the encoding
vectors of the text, and then they are combined according to the text sequence to
form a two-dimensional matrix. The matrix is regarded as an image, and the elements
in the matrix are considered as pixels. Then, the convolution kernel in the CNN is
employed to extract the semantic features. The convolution formula is

$x_{j}^{l} =f\left(\sum _{i=1}^{M}x_{i}^{l-1} \ast W_{ij}^{l} +b_{j}^{l} \right),$

where $x_{j}^{l} $ is the convolution output feature map, $x_{i}^{l-1} $ is the feature output of the $i$-th convolution kernel after pooling in the previous convolution layer, $W_{ij}^{l} $ is the weight parameter between the $i$-th and $j$-th convolution kernels, $b_{j}^{l} $ is the bias of the $j$-th convolution kernel in layer $l$, $M$ is the number of convolution kernels, $\ast $ denotes the convolution operation, and $f(\cdot )$ is the activation function. Then, in order to reduce the amount of calculation,
the pooling layer compresses the convolutional features produced by the convolution kernels. During compression, a pooling box slides over the convolutional feature map, taking either the average or the maximum value within the box. After convolution and pooling, the combined convolutional features are the required semantic
features [15]. In the subsequent machine translation algorithm, the extracted semantic features
are introduced to enhance the translation quality.
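A minimal sketch of such an extractor, assuming PyTorch and the layer settings later listed in Table 1 (the class name, the use of LazyLinear for the output layer, and the random input are illustrative):

```python
import torch
import torch.nn as nn

# CNN semantic feature extractor sketch with the Table 1 settings. The
# 100 x 150 input is the matrix of Word2vec vectors of a (padded) sentence.
class SemanticCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(1, 2), stride=2), nn.Sigmoid(),
            nn.AvgPool2d(kernel_size=(1, 3), stride=2),      # mean pooling
            nn.Conv2d(32, 64, kernel_size=(1, 2), stride=2), nn.Sigmoid(),
            nn.AvgPool2d(kernel_size=(1, 3), stride=2),
        )
        self.flatten = nn.Flatten()
        # LazyLinear infers its input size on the first forward pass.
        self.output = nn.Sequential(nn.LazyLinear(12), nn.Sigmoid())

    def forward(self, x):
        h = self.flatten(self.features(x))    # semantic feature vector
        return self.output(h), h              # class scores and features

extractor = SemanticCNN()
word_matrix = torch.randn(1, 1, 100, 150)     # one sentence matrix
scores, semantic_features = extractor(word_matrix)
```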
3. English Machine Translation Algorithm
The encoder-decoder structure is a typical structure used in machine translation algorithms.
This kind of algorithm uses an encoder to transform the source text into a fixed-length
intermediate vector sequence and then uses a decoder to convert the intermediate vector
sequence into the translation text. By utilizing an intermediate vector between the
encoder and decoder, the problem of different lengths between the source text and
the translation can be effectively dealt with.
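A compact sketch of this structure, assuming PyTorch (the dimensions follow Table 2; the module names, variable names, and vocabulary size are illustrative):

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder-decoder sketch: the encoder compresses the vectorized source
    text into a fixed-length intermediate vector, and the decoder unrolls it
    into a probability distribution over target-language tokens."""
    def __init__(self, src_dim=150, tgt_dim=100, hidden=256, vocab=10000):
        super().__init__()
        self.encoder = nn.LSTM(src_dim, hidden, num_layers=2, batch_first=True)
        self.decoder = nn.LSTM(tgt_dim, hidden, num_layers=2, batch_first=True)
        self.project = nn.Linear(hidden, vocab)

    def forward(self, src_vectors, tgt_vectors):
        _, state = self.encoder(src_vectors)        # intermediate vector (h, c)
        out, _ = self.decoder(tgt_vectors, state)   # decoder conditioned on it
        return self.project(out)                    # scores per target token
```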
When the machine translation algorithm processes the source text, it likewise cannot work on natural language directly. The Word2vec model is therefore employed to vectorize the source text, and the encoder then transforms it into an intermediate vector. The LSTM algorithm, which is well suited to processing sequential data, is used in the encoder and also in the decoder. When the decoder outputs results, the beam search algorithm decodes the probability distribution of characters into a translation sequence. The above translation process does not make full use of semantic features. Therefore, the CNN semantic feature extractor is introduced into the translation, and the extracted semantic features are combined with the Word2vec encoding vector. The precise steps are as follows.
① The source text is input, and the source text is preprocessed, including removing
special characters, word segmentation, etc. Then, the Word2vec model is used to vectorize
the source text.
② The CNN semantic feature extractor performs forward calculation on the vectorized
source text to derive its semantic features.
③ The vectorized source text is input into the encoder for forward calculation by
LSTM to obtain the intermediate vector sequence.
④ The gating mechanism [16] is used to adjust the weights of the semantic feature vector and the intermediate vector sequence, which are then combined (a code sketch of this fusion is given after the list). The combination formulas are

$\alpha =\mathrm{sigmoid}\left(\omega _{Ns} h_{Ns} +\omega _{sr} h_{sr} \right),$

$h_{mix} =\alpha h_{Ns} +\left(1-\alpha \right)h_{sr} ,$

where $h_{Ns} $ and $h_{sr} $ are the encoding vectors given by the encoder and the semantic feature extractor, respectively, $\omega _{Ns} $ and $\omega _{sr} $ are the gating parameters, $\alpha $ is the encoder weight, and $h_{mix} $ is the vector encoding after fusing semantic features.
⑤ The encoding vector fused with semantic features is input into the decoder, and
the LSTM algorithm carries out the forward computation to obtain the distribution
probability of the translated characters. After that, the translation sequence with
the largest probability is searched using the beam search algorithm [17].
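A minimal sketch of the fusion in step ④ and its hand-off to the decoder in step ⑤, assuming PyTorch (the vector dimension and the projection of the CNN features to the same dimension are illustrative assumptions):

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Gated fusion of the encoder vector h_Ns and the CNN semantic feature
    vector h_sr, following the combination formulas in step 4."""
    def __init__(self, dim=256):
        super().__init__()
        self.w_ns = nn.Linear(dim, 1, bias=False)   # gating parameter for h_Ns
        self.w_sr = nn.Linear(dim, 1, bias=False)   # gating parameter for h_sr

    def forward(self, h_ns, h_sr):
        alpha = torch.sigmoid(self.w_ns(h_ns) + self.w_sr(h_sr))  # encoder weight
        return alpha * h_ns + (1.0 - alpha) * h_sr                 # h_mix

fusion = GatedFusion(dim=256)
h_ns = torch.randn(1, 256)   # encoder intermediate vector
h_sr = torch.randn(1, 256)   # CNN semantic features, projected to 256 dims
h_mix = fusion(h_ns, h_sr)   # fed to the LSTM decoder in step 5
```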
4. Simulation Experiments
4.1 Experimental Data
The experimental data were from the United Nations Parallel Corpus, which contains
various language versions of United Nations documents and can be used to compare the
expressions and translations of different languages. This corpus contains the original
text and the corresponding machine translation version. In simple terms, this corpus
is a bilingual parallel corpus. From this corpus, 8,000 sentences were selected as the training set and 4,000 sentences as the test set.
4.2 Experimental Setup
The machine translation algorithm retained the encoder-decoder structure as its main body, but a CNN was used to extract the semantic features of the source text, these features were integrated with the intermediate vector given by the encoder, and the fused vector was then decoded by the decoder. The basic parameter settings of the CNN extractor and the encoder-decoder are shown in Tables 1 and 2. The tests first compared the influence of the one-hot encoding method and the Word2vec method on the machine translation algorithm proposed in this paper.
In addition to evaluating the effectiveness of the proposed translation algorithm,
two additional translation algorithms were also tested. One utilized RNN as both the
encoder and decoder, while the other excluded semantic feature fusion.
Table 1. Basic parameters of the CNN for extracting semantic features.

| Parameter | Setting | Parameter | Setting |
| Input layer | A specification of $100\times 150$ | Convolutional layer 1 | 32 convolutional kernels ($1\times 2$) with a moving step length of 2 |
| Pooling layer 1 | A pooling box with a specification of $1\times 3$, mean pooling, and a step size of 2 | Convolutional layer 2 | 64 convolutional kernels ($1\times 2$) with a moving step length of 2 |
| Pooling layer 2 | A pooling box with a specification of $1\times 3$, mean pooling, and a step size of 2 | Output layer | 12 nodes |
| Activation function | Sigmoid | Learning rate | 0.02 |
Table 2. Relevant parameters of the encoder and decoder.

|  | Encoder | Decoder |
| Input layer | 150 nodes | 100 nodes |
| Hidden layer | 2 hidden layers with 256 nodes per layer | 2 hidden layers with 256 nodes per layer |
| Output layer | 100 nodes | The beam search algorithm is used, and the beam window size is set to 10 |
| Activation function | Sigmoid | Sigmoid |
| Learning rate | 0.01 | 0.01 |
4.3 Evaluation Criteria
Word error rate and bilingual evaluation understudy (BLEU) were used to measure the
machine translation algorithm. The former is used to measure the accuracy of the word
translation, and the latter is used to measure the accuracy and fluency of the translation
from the overall perspective. BLEU is calculated as

$BLEU=B\cdot \exp \left(\sum _{n=1}^{N}\omega _{n} \log p_{n} \right),$

where $N$ is the maximum order of the $n$-gram, $\omega _{n} $ is the weight of the $n$-gram, $p_{n} $ is the proportion of matched $n$-gram phrases, and $B$ is the penalty factor.
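A minimal sketch of this computation in Python (the corpus-level statistics are simplified to a single sentence pair, and the $n$-gram precision uses plain clipped counts; both are illustrative simplifications):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified BLEU following the formula above: B * exp(sum w_n * log p_n)."""
    weights = [1.0 / max_n] * max_n                     # uniform weights w_n
    log_sum = 0.0
    for n, w in enumerate(weights, start=1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        matched = sum(min(c, ref[g]) for g, c in cand.items())
        p_n = matched / max(len(candidate) - n + 1, 1)  # n-gram precision p_n
        log_sum += w * math.log(p_n) if p_n > 0 else float("-inf")
    # Brevity penalty B punishes candidates shorter than the reference.
    B = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return B * math.exp(log_sum)

print(bleu("这 本 小说 很 有趣".split(), "这 本 小说 很 有趣".split()))  # 1.0
```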
4.4 Test Results
Firstly, the effect of two text vectorization methods, the one-hot encoding method
and the Word2vec method, on the translation performance of the algorithm was tested
using BLEU as the measurement indicator, and the outcomes are presented in Table 3. It can be observed that under the same number of translated words, the translation
performance of the algorithm using Word2vec as the text vectorization method was better.
In addition, as the number of translated words increased, the translation performance
of the algorithm using the one-hot coding method decreased, and the performance of
the algorithm using Word2vec also decreased, but the reduction was not obvious.
Table 4 displays some translation results of the algorithms and the semantic part-of-speech
tagging. The annotation of the semantic part of speech was given by the CNN semantic
feature extractor in the combined algorithm. When an RNN was used as the encoder and decoder, the algorithm translated the words expressing the main idea but ignored their order. When the traditional LSTM algorithm was used, the words expressing the main meaning were also translated and somewhat polished, but a certain degree of disfluency remained. The LSTM algorithm combined with semantic features gave a translation result closer to the reference translation.
Table 3. Translation performance of the proposed algorithm under different text vectorization methods.

| Number of translated words | 5 | 10 | 15 | 20 |
| One-hot coding method | 32.5% | 29.4% | 27.1% | 24.2% |
| Word2vec method | 45.7% | 44.8% | 44.2% | 43.7% |
Table 4. Partial translation results of three algorithms and semantic part-of-speech tagging.

| Source text | This novel is very interesting. | Did you finish yesterday's homework? |
| Reference translation | 这本小说很有趣。 | 昨天的作业完成了吗? |
| RNN | 这小说是有趣。 | 你完成昨天的作业? |
| Traditional LSTM | 这本小说是非常有趣的。 | 你完成昨天的作业了吗? |
| LSTM fused with semantic features | 这本小说很有趣。 | 昨天的作业完成了吗? |
| Part of speech | Pron./n./v./adv./adj. | V./pron./v./adv./n. |
In practical use, the convolutional feature vector from the hidden layer in the middle of the CNN extractor is used as the semantic feature, so the extractor's performance cannot be measured directly from this feature. Instead, it was measured through the recognition of semantic parts of speech (Fig. 2). As the length of the sentence processed by the CNN extractor increased, the recognition accuracy for the semantic part of speech decreased, but it remained above 98.4%, which is sufficiently high. This result verified the effectiveness of the convolutional feature vector from the extractor's hidden layer.
Fig. 2. Performance of the CNN semantic feature extractor.
The translation performance of the algorithms for sentences of different lengths is
presented in Fig. 3. In this paper, word error rate and BLEU were used to measure the performance of
algorithms. The performance of all three algorithms decreased as the length of the translated sentence increased, but to different degrees. The word error rate and BLEU of the semantic feature-fused LSTM algorithm changed little, while the word error rate of the RNN algorithm increased significantly and its BLEU decreased significantly as the sentence length grew.
Under the same sentence length, the LSTM algorithm fused with semantic features had
the lowest word error rate and the highest BLEU. The RNN algorithm had the highest
word error rate and the lowest BLEU. The traditional LSTM algorithm was in the middle.
Fig. 3. Performance of the three algorithms.
5. Discussion
The development of globalization has made international exchanges increasingly frequent; however, language is the major barrier during these interactions. Although English is one of the most commonly used universal languages, achieving seamless communication and comprehension in it remains difficult. To enhance communication efficiency, machine translation came into existence. The principle of machine translation can be summarized as a computer following specific rules to convert one text sequence into another. These mapping rules are the key to machine translation.
This article used an encoder-decoder structure to achieve English machine translation.
When translating English using the encoder-decoder structure, it first converted the
English source text into an intermediate vector sequence using the encoder and then
converted the intermediate vector sequence into a translated text using the decoder.
This translation method utilized a fixed-length intermediate vector sequence to avoid
difficulties in one-to-one correspondence between source and translation. The semantic
feature vectors extracted by CNN were integrated with the original Word2vec text vectors
to enhance the information contained in the text vectors, aiming to optimize machine
translation algorithms. Subsequently, simulation experiments were conducted. The machine translation algorithm using the Word2vec text vectorization method was superior to the one using one-hot encoding, because the text vectors obtained by one-hot encoding not only have a large dimension but also carry sparse effective information, which greatly reduces the computational efficiency of machine translation algorithms. Compared with the RNN and traditional LSTM algorithms, the LSTM combined with semantic features achieved better translation performance and was less sensitive to sentence length. The reason is that the RNN algorithm is prone to gradient vanishing or explosion when processing long sequences, and the traditional LSTM algorithm mitigates these gradient problems through its gating (forget) mechanism; the LSTM combined with semantic features goes further, using a CNN to extract semantic features and combining them with the Word2vec vector features, which enriches the effective information in the text feature vectors so that the LSTM can learn more regularities from them and achieve better translation performance.
6. Conclusion
This paper briefly introduces the Word2vec model for text vectorization and the CNN
semantic feature extractor. Then, the CNN extractor was applied to machine translation.
The semantic feature vector was combined with the intermediate vector of the encoder
to optimize the translation algorithm. The proposed algorithm was compared with the
RNN and traditional LSTM algorithms using simulation experiments. Compared with the
one-hot encoding method, the machine translation algorithm using Word2vec performed
better and was less affected by the number of translated words. Regarding translation
results, the LSTM algorithm fused with semantic features was closer to the reference
translation, while the other two algorithms only expressed the main meaning, and the
translation was not smooth enough. With the increase of the sentence length to be
processed by the CNN extractor, the accuracy of the semantic extractor in identifying
the semantic part of speech was reduced, but it was still above 98.4%, which was high.
The increase in the length of the sentence affected the performance of the algorithm,
but the LSTM algorithm fused with semantic features was less affected. Under the same
sentence length, the LSTM algorithm fused with semantic features had the highest BLEU and the lowest word error rate.
The limitation of this study lies in the fact that only a portion of the corpus was
used for training and testing, so future research directions would involve expanding
the range of corpora to enhance the universality of machine translation algorithms.
The contribution of this paper lies in utilizing a CNN to extract semantic features
from text and integrating them with Word2vec vector features, providing an effective
reference for enhancing the performance of machine translation algorithms.
REFERENCES
L. Bei, ``Study on the intelligent selection model of fuzzy semantic optimal solution
in the process of translation using English corpus,'' Wireless Communications and
Mobile Computing, vol. 2020, no. 5, pp. 1-7, 2020.

Q. Lu and Y. Wang, ``Latent semantic text classification method research based on
support vector machine,'' International Journal of Information and Communication Technology,
vol. 15, pp. 243-255, 2019.

K. X. Han, S. F. Yuan, W. Chien, and C. F. Yang, ``Emotional feature extraction from
texts by support vector machine with local multiple kernel learning,'' Sensors and
Materials: An International Journal on Sensor Technology, vol. 34, no. 6, pp. 2263-2280,
2022.

J. Zhang, J. Liu, and X. Lin, ``Improve neural machine translation by building word
vector with part of speech,'' Journal of Artificial Intelligence, vol. 2, no. 2, pp.
79-88, 2020.

S. Qiu, Y. Niu, J. Li, and X. Li, ``Research on semantic similarity of short text
based on bert and time warping distance,'' Journal of Web Engineering, vol. 20, no.
8, pp. 2521-2543, 2021.

X. Guan, J. Han, Z. Liu, and M. Zhang, ``Sentence similarity algorithm based on fused
bi-channel dependency matching feature,'' International Journal of Pattern Recognition
and Artificial Intelligence, vol. 34, no. 7, 2050019, 2019.

X. Lin, J. Liu, J. Zhang, and S. J. Lim, ``A novel beam search to improve neural
machine translation for English-Chinese,'' Computers, Materials & Continua, vol.
65, no. 1, pp. 387-404, 2020.

J. Lee, K. Cho, and T. Hofmann, ``Fully character-level neural machine translation
without explicit segmentation,'' Transactions of the Association for Computational
Linguistics, vol. 5, pp. 365-378, 2017.

K. Wu, X. Wang, and A. T. Aw, ``Bilingual word embedding with sentence similarity
constraint for machine translation,'' Proc. of International Conference on Asian Language
Processing, pp. 119-122, 2017.

Q. Yang, L. Yu, S. Tian, and J. Song, ``Collaborative semantic representation network
for metaphor detection,'' Applied Soft Computing, vol. 113, no. 1, 107911, 2021.

X. Yang, T. Zhang, and C. Xu, ``Semantic feature mining for video event understanding,''
ACM Transactions on Multimedia Computing Communications & Applications, vol. 12, no.
4, 55, 2016.

H. Xi, ``The design of complex semantic machine translation model for foreign linguistics,''
Boletin Tecnico/Technical Bulletin, vol. 55, no. 15, pp. 473-481, 2017.

M. Liu, L. Zhang, H. Hu, L. Nie, and J. Dai, ``A classification model for semantic
entailment recognition with feature combination,'' Neurocomputing, vol. 208, no. 5,
pp. 127-135, 2016.

Z. Peng, ``The approaches of internet public opinion research,'' Library Journal,
vol. 35, no. 12, pp. 63-68, 2016.

U. Germann, ``Sampling phrase tables for the moses statistical machine translation
system,'' Prague Bulletin of Mathematical Linguistics, vol. 104, no. 1, pp. 39-50,
2015.

T. Yoshioka, S. Karita, and T. Nakatani, ``Far-field speech recognition using CNN-DNN-HMM
with convolution in time,'' Proc. of IEEE International Conference on Acoustics, Speech
and Signal Processing, pp. 4360-4364, 2015.

R. K. Chakrawarti, H. Mishra, and P. Bansal, ``Review of machine translation techniques
for idea of Hindi to English idiom translation,'' International Journal of Computational
Intelligence Research, vol. 13, no. 5, pp. 1059-1071, 2017.

Author
Wei Hu was born in November 1983. She received her master's degree from Xiangtan University. She works at Hunan University of Science and Engineering as an associate professor. Her research interests include English translation and culture.
Yipeng Luo was born in December 1982. He received his doctoral degree from Central South University. He works at Hunan Women's University as a lecturer. His research interests include English language teaching.