Wei Hu 1* and Yipeng Luo 2
1 Foreign Language School, Hunan University of Science and Engineering, Yongzhou, Hunan 425199, China (huw1983@outlook.com)
2 School of Information Science and Engineering, Hunan Women's University, Changsha, Hunan 410004, China
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Semantic feature, Machine translation, English, Text vectorization
1. Introduction
For non-native English speakers, learning and using English skillfully is difficult [1]. Machine translation algorithms, enabled by advances in computer technology, can provide effective assistance to non-native speakers, both in learning English and in communicating with native speakers [2]. The essence of a machine translation algorithm is to convert one sequence into another. Traditional machine translation algorithms often translate texts through one-to-one correspondence between words [3], which leads to inaccurate semantic understanding, unnatural expression, and other problems. To improve machine translation algorithms, semantic features are incorporated so that contextual information can be exploited. Zhang et al.
[4] designed a novel word embedding training method using part-of-speech features and
verified its effectiveness through experiments. Qiu et al. [5] proposed a short text semantic similarity calculation approach combining bidirectional
encoder representation from transformers and a time-warping distance algorithm. They
verified its performance advantage through experiments. Guan et al. [6] proposed a method combining word and word dependencies to calculate sentence similarity
and found that combining word and word dependencies can improve the capability to
extract matching features between two sentences. Lin et al. [7] proposed an improvement to neural machine translation using a novel beam search evaluation
function and discovered that the approach effectively enhanced the quality of English-to-Chinese
translation. Lee et al. [8] utilized a character-level convolutional network as an encoder for machine translation.
It performed significantly better than the subword-level encoder in multilingual experiments.
The studies above all investigated word-level features and attempted to represent text with more unified features so that computers could process the language more efficiently. The main issue addressed in this article is how to optimize English machine translation. Specifically, the article focuses on improving the representation of text vector features so that the underlying connections between words are better reflected and more accurate correspondences are achieved during the machine translation process.
This paper briefly introduces the Word2vec model for text vectorization and the convolutional
neural network (CNN) extractor for extracting semantic features. Then, the CNN extractor
was applied to machine translation, and the semantic feature vector was combined with
the intermediate vector of the encoder to improve the performance of the translation
algorithm. Simulation experiments were also conducted. The innovation of this article
lies in using Word2vec to vectorize the text, utilizing a CNN to extract semantic
features of words, and integrating the two, thereby enriching the regularity information contained in the word features and improving the performance of machine translation.
2. Language Vector Feature Extraction Algorithm
The essence of a machine translation algorithm is to convert one text sequence into
another text sequence, and the computer itself does not comprehend the meaning of
the text sequence during the conversion process [9]. Moreover, both English and Chinese are natural languages, which computers cannot process directly because of their underlying logic; thus, natural language must first be converted into vector text that computers can process.
2.1 Word2vec Model
Word2vec uses context words to predict the current word and thereby obtains a distributed word vector [10]. After the Word2vec model has been trained by the steps below, it is used in the same way when formally applied: centered on the current word, a context word window collects the one-hot encodings of the surrounding words, which are fed into the model for forward calculation; the $1\times N$ word vector obtained in the mapping layer is the Word2vec text vector of the current word.
Taking Fig. 1 as an example, to obtain the text vector of the current word $W(t)$ [11], a context word window with a length of 4 is used to obtain the one-hot encodings of the four context words $W(t-2)$, $W(t-1)$, $W(t+1)$, and $W(t+2)$; these form a one-hot encoding matrix with a size of $4\times V$ ($V$: vocabulary size) as the input data of the input layer [12].
Fig. 1. Basic structure of the Word2vec model.
The input data is multiplied by weight matrix $W$ with a size of $V\times N$ ($N$: the dimension of a word vector, set according to demand) in the mapping layer to obtain the word vectors. The resulting $C\times N$ matrix ($C$: the number of context words) is averaged along each dimension to obtain a $1\times N$ word vector. This vector is then multiplied by another weight matrix $W'$ in the output layer. After normalization [13], a word probability distribution with a size of $1\times V$ is obtained, and the word with the largest probability is the current word obtained through prediction. Finally, the predicted current word is compared with the actual current word $W(t)$ in the corpus, and the error between them is calculated. If the error converges to within the threshold, training is terminated; otherwise, the weights in the weight matrices are adjusted backward according to the error, and the forward calculation is performed again.
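As a brief illustration of how such distributed word vectors can be obtained in practice, the sketch below assumes the gensim library; the toy corpus and the vector dimension of 150 (chosen to match the CNN input size used later) are illustrative and not the paper's exact configuration.

```python
from gensim.models import Word2Vec

# Illustrative pre-tokenized corpus; in the experiments the sentences of the
# United Nations Parallel Corpus would be tokenized in the same way.
sentences = [["the", "novel", "is", "very", "interesting"],
             ["did", "you", "finish", "yesterday", "homework"]]

# CBOW (sg=0) predicts the current word from its context window, which is the
# scheme of Fig. 1; window=2 gives the four context words W(t-2)...W(t+2).
model = Word2Vec(sentences, vector_size=150, window=2, sg=0, min_count=1)

vector = model.wv["novel"]   # the 1 x 150 Word2vec text vector of "novel"
print(vector.shape)          # (150,)
```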
2.2 Semantic Feature Extraction
The Word2vec model only transforms natural language into vector language that computers
can process. Although the vector language contains certain semantic features due to
the utilization of the context of the corpus, it is not enough to provide semantic
information during machine translation; once synonyms are encountered, translation accuracy can easily suffer. To optimize the performance of a machine translation
algorithm, this paper introduces semantic features and fuses semantic feature vectors
with Word2vec encoding vectors. In this paper, a CNN is employed to extract semantic
features [14].
When using the CNN extractor, the Word2vec model is also used to obtain the encoding
vectors of the text, and then they are combined according to the text sequence to
form a two-dimensional matrix. The matrix is regarded as an image, and the elements
in the matrix are considered as pixels. Then, the convolution kernel in the CNN is
employed to extract the semantic features. The convolution formula is

$x_{j}^{l} =f\left(\sum _{i=1}^{M}x_{i}^{l-1} \ast W_{ij}^{l} +b_{j}^{l} \right),$

where $x_{j}^{l} $ is the convolution output feature map, $x_{i}^{l-1} $ is the feature output of the $i$-th convolution kernel after pooling in the previous convolution layer, $W_{ij}^{l} $ is the weight parameter between the $i$-th and $j$-th convolution kernels, $b_{j}^{l} $ is the bias of the $j$-th convolution kernel in layer $l$, $M$ is the number of convolution kernels, $\ast $ denotes the convolution operation, and $f(\cdot )$ is the activation function. Then, in order to reduce the amount of calculation,
the pooling layer compresses the convolutional features produced by the convolution kernels. During compression, a pooling box slides over the convolutional feature map, taking either the average or the maximum value within the box. After convolution and pooling, the combined convolutional features are the required semantic
features [15]. In the subsequent machine translation algorithm, the extracted semantic features
are introduced to enhance the translation quality.
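A minimal sketch of such an extractor, assuming PyTorch and the layer settings later listed in Table 1 (the class name, the use of LazyLinear for the output layer, and the random input are illustrative):

```python
import torch
import torch.nn as nn

# CNN semantic feature extractor sketch with the Table 1 settings. The
# 100 x 150 input is the matrix of Word2vec vectors of a (padded) sentence.
class SemanticCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(1, 2), stride=2), nn.Sigmoid(),
            nn.AvgPool2d(kernel_size=(1, 3), stride=2),      # mean pooling
            nn.Conv2d(32, 64, kernel_size=(1, 2), stride=2), nn.Sigmoid(),
            nn.AvgPool2d(kernel_size=(1, 3), stride=2),
        )
        self.flatten = nn.Flatten()
        # LazyLinear infers its input size on the first forward pass.
        self.output = nn.Sequential(nn.LazyLinear(12), nn.Sigmoid())

    def forward(self, x):
        h = self.flatten(self.features(x))    # semantic feature vector
        return self.output(h), h              # class scores and features

extractor = SemanticCNN()
word_matrix = torch.randn(1, 1, 100, 150)     # one sentence matrix
scores, semantic_features = extractor(word_matrix)
```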
3. English Machine Translation Algorithm
The encoder-decoder structure is a typical structure used in machine translation algorithms.
This kind of algorithm uses an encoder to transform the source text into a fixed-length
intermediate vector sequence and then uses a decoder to convert the intermediate vector
sequence into the translation text. By utilizing an intermediate vector between the
encoder and decoder, the problem of different lengths between the source text and
the translation can be effectively dealt with.
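A compact sketch of this structure, assuming PyTorch (the dimensions follow Table 2; the module names, variable names, and vocabulary size are illustrative):

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder-decoder sketch: the encoder compresses the vectorized source
    text into a fixed-length intermediate vector, and the decoder unrolls it
    into a probability distribution over target-language tokens."""
    def __init__(self, src_dim=150, tgt_dim=100, hidden=256, vocab=10000):
        super().__init__()
        self.encoder = nn.LSTM(src_dim, hidden, num_layers=2, batch_first=True)
        self.decoder = nn.LSTM(tgt_dim, hidden, num_layers=2, batch_first=True)
        self.project = nn.Linear(hidden, vocab)

    def forward(self, src_vectors, tgt_vectors):
        _, state = self.encoder(src_vectors)        # intermediate vector (h, c)
        out, _ = self.decoder(tgt_vectors, state)   # decoder conditioned on it
        return self.project(out)                    # scores per target token
```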
When the machine translation algorithm processes the source text, it likewise cannot work on natural language directly. The Word2vec model is therefore employed to vectorize the source text, and the encoder then transforms it into an intermediate vector. The LSTM algorithm, which is well suited to processing sequential data, is used in the encoder and also in the decoder. When the decoder outputs results, the beam search algorithm decodes the probability distribution of characters into a translation sequence. The above translation process does not make full use of semantic features. Therefore, the CNN semantic feature extractor is introduced into the translation, and the extracted semantic features are combined with the Word2vec encoding vector. The precise steps are as follows.
① The source text is input, and the source text is preprocessed, including removing
special characters, word segmentation, etc. Then, the Word2vec model is used to vectorize
the source text.
② The CNN semantic feature extractor performs forward calculation on the vectorized
source text to derive its semantic features.
③ The vectorized source text is input into the encoder for forward calculation by
LSTM to obtain the intermediate vector sequence.
④ The gating mechanism [16] is used to adjust the weights of the semantic feature vector and the intermediate vector sequence, which are then combined (a code sketch of this fusion is given after the list). The combination formulas are

$\alpha =\mathrm{sigmoid}\left(\omega _{Ns} h_{Ns} +\omega _{sr} h_{sr} \right),$

$h_{mix} =\alpha h_{Ns} +\left(1-\alpha \right)h_{sr} ,$

where $h_{Ns} $ and $h_{sr} $ are the encoding vectors given by the encoder and the semantic feature extractor, respectively, $\omega _{Ns} $ and $\omega _{sr} $ are the gating parameters, $\alpha $ is the encoder weight, and $h_{mix} $ is the vector encoding after fusing semantic features.
⑤ The encoding vector fused with semantic features is input into the decoder, and
the LSTM algorithm carries out the forward computation to obtain the distribution
probability of the translated characters. After that, the translation sequence with
the largest probability is searched using the beam search algorithm [17].
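A minimal sketch of the fusion in step ④ and its hand-off to the decoder in step ⑤, assuming PyTorch (the vector dimension and the projection of the CNN features to the same dimension are illustrative assumptions):

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Gated fusion of the encoder vector h_Ns and the CNN semantic feature
    vector h_sr, following the combination formulas in step 4."""
    def __init__(self, dim=256):
        super().__init__()
        self.w_ns = nn.Linear(dim, 1, bias=False)   # gating parameter for h_Ns
        self.w_sr = nn.Linear(dim, 1, bias=False)   # gating parameter for h_sr

    def forward(self, h_ns, h_sr):
        alpha = torch.sigmoid(self.w_ns(h_ns) + self.w_sr(h_sr))  # encoder weight
        return alpha * h_ns + (1.0 - alpha) * h_sr                 # h_mix

fusion = GatedFusion(dim=256)
h_ns = torch.randn(1, 256)   # encoder intermediate vector
h_sr = torch.randn(1, 256)   # CNN semantic features, projected to 256 dims
h_mix = fusion(h_ns, h_sr)   # fed to the LSTM decoder in step 5
```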
4. Simulation Experiments
4.1 Experimental Data
The experimental data were from the United Nations Parallel Corpus, which contains
various language versions of United Nations documents and can be used to compare the
expressions and translations of different languages. This corpus contains the original
text and the corresponding machine translation version. In simple terms, this corpus
is a bilingual parallel corpus. From this corpus, 8,000 sentences were selected as the training set and 4,000 sentences as the test set.
4.2 Experimental Setup
The machine translation algorithm retained the encoder-decoder structure as its main body, but a CNN was used to extract the semantic features of the source text, these features were integrated with the intermediate vector given by the encoder, and the fused vector was then decoded by the decoder. The basic parameter settings of the CNN extractor and the encoder-decoder are shown in Tables 1 and 2. The tests first compared the influence of the one-hot encoding method and the Word2vec method on the machine translation algorithm proposed in this paper.
In addition to evaluating the effectiveness of the proposed translation algorithm,
two additional translation algorithms were also tested. One utilized RNN as both the
encoder and decoder, while the other excluded semantic feature fusion.
Table 1. Basic parameters of the CNN for extracting semantic features.

| Parameter | Setting | Parameter | Setting |
| Input layer | A specification of $100\times 150$ | Convolutional layer 1 | 32 convolutional kernels ($1\times 2$) with a moving step length of 2 |
| Pooling layer 1 | A pooling box with a specification of $1\times 3$, mean pooling, and a step size of 2 | Convolutional layer 2 | 64 convolutional kernels ($1\times 2$) with a moving step length of 2 |
| Pooling layer 2 | A pooling box with a specification of $1\times 3$, mean pooling, and a step size of 2 | Output layer | 12 nodes |
| Activation function | Sigmoid | Learning rate | 0.02 |
Table 2. Relevant parameters of the encoder and decoder.

|  | Encoder | Decoder |
| Input layer | 150 nodes | 100 nodes |
| Hidden layer | 2 hidden layers with 256 nodes per layer | 2 hidden layers with 256 nodes per layer |
| Output layer | 100 nodes | The beam search algorithm is used, and the beam window size is set to 10 |
| Activation function | Sigmoid | Sigmoid |
| Learning rate | 0.01 | 0.01 |
4.3 Evaluation Criteria
Word error rate and bilingual evaluation understudy (BLEU) were used to measure the
machine translation algorithm. The former is used to measure the accuracy of the word
translation, and the latter is used to measure the accuracy and fluency of the translation
from the overall perspective. BLEU is calculated as

$BLEU=B\cdot \exp \left(\sum _{n=1}^{N}\omega _{n} \log p_{n} \right),$

where $N$ is the maximum order of the $n$-gram, $\omega _{n} $ is the weight of the $n$-gram, $p_{n} $ is the proportion of matched $n$-gram phrases, and $B$ is the penalty factor.
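A minimal sketch of this computation in Python (the corpus-level statistics are simplified to a single sentence pair, and the $n$-gram precision uses plain clipped counts; both are illustrative simplifications):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified BLEU following the formula above: B * exp(sum w_n * log p_n)."""
    weights = [1.0 / max_n] * max_n                     # uniform weights w_n
    log_sum = 0.0
    for n, w in enumerate(weights, start=1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        matched = sum(min(c, ref[g]) for g, c in cand.items())
        p_n = matched / max(len(candidate) - n + 1, 1)  # n-gram precision p_n
        log_sum += w * math.log(p_n) if p_n > 0 else float("-inf")
    # Brevity penalty B punishes candidates shorter than the reference.
    B = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return B * math.exp(log_sum)

print(bleu("这 本 小说 很 有趣".split(), "这 本 小说 很 有趣".split()))  # 1.0
```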
4.4 Test Results
Firstly, the effect of two text vectorization methods, the one-hot encoding method
and the Word2vec method, on the translation performance of the algorithm was tested
using BLEU as the measurement indicator, and the outcomes are presented in Table 3. It can be observed that under the same number of translated words, the translation
performance of the algorithm using Word2vec as the text vectorization method was better.
In addition, as the number of translated words increased, the translation performance
of the algorithm using the one-hot coding method decreased, and the performance of
the algorithm using Word2vec also decreased, but the reduction was not obvious.
Table 4 displays some translation results of the algorithms and the semantic part-of-speech
tagging. The annotation of the semantic part of speech was given by the CNN semantic
feature extractor in the combined algorithm. When an RNN was used as the encoder and decoder, the algorithm translated the words expressing the main idea but ignored their order. When the traditional LSTM algorithm was used, the words expressing the main meaning were also translated and somewhat polished, but a certain degree of disfluency remained. The LSTM algorithm combined with semantic features gave a translation result closer to the reference translation.
Table 3. Translation performance of the proposed algorithm under different text vectorization methods.

| Number of translated words | 5 | 10 | 15 | 20 |
| One-hot coding method | 32.5% | 29.4% | 27.1% | 24.2% |
| Word2vec method | 45.7% | 44.8% | 44.2% | 43.7% |
Table 4. Partial translation results of three algorithms and semantic part-of-speech tagging.

| Source text | This novel is very interesting. | Did you finish yesterday's homework? |
| Reference translation | 这本小说很有趣。 | 昨天的作业完成了吗? |
| RNN | 这小说是有趣。 | 你完成昨天的作业? |
| Traditional LSTM | 这本小说是非常有趣的。 | 你完成昨天的作业了吗? |
| LSTM fused with semantic features | 这本小说很有趣。 | 昨天的作业完成了吗? |
| Part of speech | Pron./n./v./adv./adj. | V./pron./v./adv./n. |
In practical use, the convolutional feature vector from the hidden layer in the middle of the CNN extractor is used as the semantic feature, so the extractor's performance cannot be measured directly from this feature. Instead, it was measured through the recognition of semantic parts of speech (Fig. 2). As the length of the sentence processed by the CNN extractor increased, the recognition accuracy for the semantic part of speech decreased, but it remained above 98.4%, which is sufficiently high. This result verified the effectiveness of the convolutional feature vector from the extractor's hidden layer.
Fig. 2. Performance of the CNN semantic feature extractor.
The translation performance of the algorithms for sentences of different lengths is
presented in Fig. 3. In this paper, word error rate and BLEU were used to measure the performance of
algorithms. The performance of all three algorithms decreased as the length of the translated sentence increased, but to different degrees. The word error rate and BLEU of the semantic feature-fused LSTM algorithm changed little, while the word error rate of the RNN algorithm increased significantly and its BLEU decreased significantly as the sentence length grew.
Under the same sentence length, the LSTM algorithm fused with semantic features had
the lowest word error rate and the highest BLEU. The RNN algorithm had the highest
word error rate and the lowest BLEU. The traditional LSTM algorithm was in the middle.
Fig. 3. Performance of the three algorithms.
5. Discussion
The development of globalization has made international exchanges increasingly frequent; however, language is the major barrier during these interactions. Although English is one of the most commonly used universal languages, achieving seamless communication and comprehension in it remains difficult. To enhance communication efficiency, machine translation came into existence. The principle of machine translation can be summarized as a computer following specific rules to convert one text sequence into another. These mapping rules are the key to machine translation.
This article used an encoder-decoder structure to achieve English machine translation.
When translating English using the encoder-decoder structure, it first converted the
English source text into an intermediate vector sequence using the encoder and then
converted the intermediate vector sequence into a translated text using the decoder.
This translation method utilized a fixed-length intermediate vector sequence to avoid
difficulties in one-to-one correspondence between source and translation. The semantic
feature vectors extracted by CNN were integrated with the original Word2vec text vectors
to enhance the information contained in the text vectors, aiming to optimize machine
translation algorithms. Subsequently, simulation experiments were conducted. The machine translation algorithm using the Word2vec text vectorization method was superior to the one using one-hot encoding, because the text vectors obtained by one-hot encoding not only have a large dimension but also carry sparse effective information, which greatly reduces the computational efficiency of machine translation algorithms. Compared with the RNN and traditional LSTM algorithms, the LSTM combined with semantic features achieved better translation performance and was less sensitive to sentence length. The reason is that the RNN algorithm is prone to gradient vanishing or explosion when processing long sequences, and the traditional LSTM algorithm mitigates these gradient problems through its gating (forget) mechanism; the LSTM combined with semantic features goes further, using a CNN to extract semantic features and combining them with the Word2vec vector features, which enriches the effective information in the text feature vectors so that the LSTM can learn more regularities from them and achieve better translation performance.
6. Conclusion
This paper briefly introduces the Word2vec model for text vectorization and the CNN
semantic feature extractor. Then, the CNN extractor was applied to machine translation.
The semantic feature vector was combined with the intermediate vector of the encoder
to optimize the translation algorithm. The proposed algorithm was compared with the
RNN and traditional LSTM algorithms using simulation experiments. Compared with the
one-hot encoding method, the machine translation algorithm using Word2vec performed
better and was less affected by the number of translated words. Regarding translation
results, the LSTM algorithm fused with semantic features was closer to the reference
translation, while the other two algorithms only expressed the main meaning, and the
translation was not smooth enough. With the increase of the sentence length to be
processed by the CNN extractor, the accuracy of the semantic extractor in identifying
the semantic part of speech was reduced, but it was still above 98.4%, which was high.
The increase in the length of the sentence affected the performance of the algorithm,
but the LSTM algorithm fused with semantic features was less affected. Under the same
sentence length, the LSTM algorithm fused with semantic features had the highest BLEU and the lowest word error rate.
The limitation of this study lies in the fact that only a portion of the corpus was
used for training and testing, so future research directions would involve expanding
the range of corpora to enhance the universality of machine translation algorithms.
The contribution of this paper lies in utilizing a CNN to extract semantic features
from text and integrating them with Word2vec vector features, providing an effective
reference for enhancing the performance of machine translation algorithms.
REFERENCES
L. Bei, ``Study on the intelligent selection model of fuzzy semantic optimal solution
in the process of translation using English corpus,'' Wireless Communications and
Mobile Computing, vol. 2020, no. 5, pp. 1-7, 2020.

Q. Lu and Y. Wang, ``Latent semantic text classification method research based on
support vector machine,'' International Journal of Information and Communication Technology,
vol. 15, pp. 243-255, 2019.

K. X. Han, S. F. Yuan, W. Chien, and C. F. Yang, ``Emotional feature extraction from
texts by support vector machine with local multiple kernel learning,'' Sensors and
Materials: An International Journal on Sensor Technology, vol. 34, no. 6, pp. 2263-2280,
2022.

J. Zhang, J. Liu, and X. Lin, ``Improve neural machine translation by building word
vector with part of speech,'' Journal of Artificial Intelligence, vol. 2, no. 2, pp.
79-88, 2020.

S. Qiu, Y. Niu, J. Li, and X. Li, ``Research on semantic similarity of short text
based on bert and time warping distance,'' Journal of Web Engineering, vol. 20, no.
8, pp. 2521-2543, 2021.

X. Guan, J. Han, Z. Liu, and M. Zhang, ``Sentence similarity algorithm based on fused
bi-channel dependency matching feature,'' International Journal of Pattern Recognition
and Artificial Intelligence, vol. 34, no. 7, 2050019, 2019.

X. Lin, J. Liu, J. Zhang, and S. J. Lim, ``A novel beam search to improve neural
machine translation for English-Chinese,'' Computers, Materials & Continua, vol.
65, no. 1, pp. 387-404, 2020.

J. Lee, K. Cho, and T. Hofmann, ``Fully character-level neural machine translation
without explicit segmentation,'' Transactions of the Association for Computational
Linguistics, vol. 5, pp. 365-378, 2017.

K. Wu, X. Wang, and A. T. Aw, ``Bilingual word embedding with sentence similarity
constraint for machine translation,'' Proc. of International Conference on Asian Language
Processing, pp. 119-122, 2017.

Q. Yang, L. Yu, S. Tian, and J. Song, ``Collaborative semantic representation network
for metaphor detection,'' Applied Soft Computing, vol. 113, no. 1, 107911, 2021.

X. Yang, T. Zhang, and C. Xu, ``Semantic feature mining for video event understanding,''
ACM Transactions on Multimedia Computing Communications & Applications, vol. 12, no.
4, 55, 2016.

H. Xi, ``The design of complex semantic machine translation model for foreign linguistics,''
Boletin Tecnico/Technical Bulletin, vol. 55, no. 15, pp. 473-481, 2017.

M. Liu, L. Zhang, H. Hu, L. Nie, and J. Dai, ``A classification model for semantic
entailment recognition with feature combination,'' Neurocomputing, vol. 208, no. 5,
pp. 127-135, 2016.

Z. Peng, ``The approaches of internet public opinion research,'' Library Journal,
vol. 35, no. 12, pp. 63-68, 2016.

U. Germann, ``Sampling phrase tables for the moses statistical machine translation
system,'' Prague Bulletin of Mathematical Linguistics, vol. 104, no. 1, pp. 39-50,
2015.

T. Yoshioka, S. Karita, and T. Nakatani, ``Far-field speech recognition using CNN-DNN-HMM
with convolution in time,'' Proc. of IEEE International Conference on Acoustics, Speech
and Signal Processing, pp. 4360-4364, 2015.

R. K. Chakrawarti, H. Mishra, and P. Bansal, ``Review of machine translation techniques
for idea of Hindi to English idiom translation,'' International Journal of Computational
Intelligence Research, vol. 13, no. 5, pp. 1059-1071, 2017.

Author
Wei Hu was born in November 1983. She received her master's degree from Xiangtan University. She works at Hunan University of Science and Engineering as an associate professor. Her research interests include English translation and culture.
Yipeng Luo was born in December 1982. He received his doctoral degree from Central South University. He works at Hunan Women's University as a lecturer. His research interests include English language teaching.