Online evaluation data about the image of tourist attractions help tourists form an objective and fair perception of a destination. This study built a tourist perception model based on online evaluation data about the image of Xi’an tourist attractions. The model first uses the TF-IDF algorithm to analyze the cognitive image of tourists, then uses the NB method to analyze the emotional image of tourists, and finally uses an LDA topic model to analyze the overall image of the scenic spot and explore tourist perception. The range of TF-IDF values is 0.0245 to 0.2316, with the maximum and minimum values corresponding to service attitude and category, respectively. The NB model has a long running time under different data scales, with corresponding maximum values of 8.1 s and 7.9 s. At the same data size, NN has the shortest running time, followed by SVM and KNN. When the number of topics is 4, the perplexity of both positive and negative emotional texts is lowest, so 4 is the best number of topics. The method can extract tourists’ satisfaction and dissatisfaction with a scenic spot from online evaluation data, helping tourists avoid unpleasant experiences during travel, and the management efficiency of the scenic spot can also be improved accordingly.


## 1. Introduction

In the context of the rapid development of Internet technology, more and more tourists
in China have begun to share their travel routes and experiences in real time through
the Internet. The original content of tourism users is an important part of tourism
big data. These data not only reveal the needs of tourists in depth, but also
reflect users’ preferences for the destination, scenic spot demand, tourism motivation,
tourist attributes, etc. ^{[1,2]}. Tourist perception refers to the emotion and cognition of tourists toward a destination
in the process of tourism, which has a direct impact on tourism satisfaction and tourism
quality and also has an important impact on the sustainable development of the destination.
Therefore, in the field of tourism, it is of great significance to explore users’
perception of an image of destination scenic spots ^{[3,4]}.

Big data technology can effectively promote the intelligent development of destinations and the design of more targeted tourism products and marketing strategies to improve the management efficiency and tourism service quality of tourist attractions. Research in China on big data technology for visitor perception is mainly reflected in data mining, vector regression model, etc. However, there are some problems in the current research, such as the low prediction accuracy of the model. This study combines a variety of big data technologies to analyze the tourist perception of an image of tourist attractions to lay a foundation for the sustainable development of tourist destinations.

## 2. Related Work

An open, interoperable service-oriented architecture was adopted to replace proprietary transit trip-planning systems, which require the arduous task of regularly updating transit information, and to address the difficulty of using geospatial data amid the rapid development of network technology. This framework was used in a transportation route-planning system to re-examine modular resources. It integrates online geospatial services, open-source geospatial database technology, and pathfinding algorithms in a loosely coupled manner. It was found that this method makes the system more stable and can effectively use network technology ^{[5]}.

Arif et al. used SPSS software to analyze questionnaires, online search logs, and the chat records of 18 pairs of participants before and after a search, in order to study how tourists search for online travel information and to inform the design of a collaborative travel-planning search system. The analysis found that collaborative query formulation, division of search tasks, chat, and result sharing were important means of collaborative search for tourists ^{[6]}. Choi et al. investigated the number and type of source-related visual cues presented by online travel media. Their model examines the relationship between online tourism information sources in terms of specialization, endorsement, and the star ratings of other users, linking visual cues, cue-induced perception, information credibility, and destination images. The experimental results show that tourists’ perceptions are closely related to source-related data ^{[7]}.

In order to compare the comments of different online travelers, Hou et al. used semantic association analysis to extract keywords from the comments on the three major online travel agencies in China and build a semantic association network. The experimental analysis found significant differences in platform attributes, topic distribution, and community relationships among these structures. This study provided new insights for the development of new hotels, tourism, tourism companies, and online travel agencies ^{[8]}. Lin et al. analyzed the difference between the perceived value of first-time tourists and that of revisiting tourists. Tourism quality was the best measure of first-time tourists’ purchase intention, while for revisiting tourists, perceived value was the best measure of revisit intention ^{[9]}.

From the perspective of tourists’ perception, Souza and other scholars used a single-factor evaluation model of tourism destination service quality to quantitatively analyze domestic tourists’ perception of tourism services when traveling to Xi’an. The research results show that the proposed evaluation model has high feasibility and reliability ^{[10]}. Moon and other researchers built a multidimensional measurement index system for consumers’ experience-perception satisfaction with travel agencies. The results show that consumers’ experience perception was good, and the model can be applied to the experience-perception evaluation of other tourist attractions ^{[11]}.

Suhartanto and other scholars adopted data mining and data visualization methods using data from Tuniu to study the differentiation of tourism products and product promotion strategies ^{[12]}. Samara and other researchers analyzed the average daily Baidu network attention data for rural tourism from 2011 to 2013 and obtained the characteristics of rural tourism attention across yearly, weekly, monthly, seasonal, golden-week, and other time periods ^{[13]}. Alaei constructed a tourist sentiment analysis model using artificial emotion discrimination rules and used it to analyze the sentiment of Chinese tourists posting comments about Australia on domestic tourism websites, comparing the differences between Chinese and other international tourists in Australia ^{[14]}.

Ardito et al. used 70,859 user check-in records from 58 tourist attractions in Zhengzhou on Sina Weibo to build a scenic spot evaluation index system based on the number of check-ins. From the user’s gender reflected in the check-in data, the region where the user checked in, and the check-in time, the preferences of each gender for check-in places can be obtained ^{[15]}.

According to a large number of research results, tourist perception technology for images of tourist attractions has made certain achievements in Chinese and international research, with breakthroughs in data preprocessing technology, emotion analysis methods, recommendation algorithms, and the evaluation of recommendation systems. However, in the field of tourist perception of an image of tourist attractions, few studies involve big data technology. This study conducted an in-depth analysis of this topic to find problems in tourist attractions and promotion strategies for a tourist image.

## 3. Tourist Perception of an Image of Xi’an Tourist Attractions Applying Big Data Technology

### 3.1 Data Source and Preprocessing of Xi’an Tourist Attractions

Taking Xi’an tourist attractions as an example, a tourist perception model of the destination image was constructed based on users’ review data from tourism websites. The source material was online evaluation data of Xi’an tourist attractions on Dianping.com and Ctrip, two websites with large user bases and high comment rates. The selected user data are based on the real experience of users during travel, so the data are highly valid. After preprocessing the original review data, 2548 effective evaluation items remained. On this basis, a tourist perception model of the Xi’an scenic spot image was built using big data technology. The model first uses the term frequency–inverse document frequency (TF-IDF) algorithm to analyze the cognitive image of tourists, then uses the naive Bayes (NB) method to analyze the emotional image of tourists, and finally uses the latent Dirichlet allocation (LDA) topic model to analyze the overall image of the scenic spots. Before the tourist perception model is constructed, the original data need to be cleaned with preprocessing techniques.

The online comment data were collected with code written in Python 3. The data include the user number, comment content, comment time, and score. Reviews spanning too long a period cannot truly reflect the current image of the attractions, so the collection window was set to 2020 to 2022. A total of 5463 original tourism evaluation items were obtained. Table 1 shows some evaluation data of Xi’an tourist attractions.

##### Table 1. Partial evaluation data of Xi’an tourist attractions.

In the online review data, because different tourists have different opinions on the scenic spots, the evaluation content can be summarized as holistic and multi-dimensional. At the same time, there are great differences in the content and format of the evaluations, and these differences affect the whole research process and its results. Therefore, before formally starting the analysis, the data need to be cleaned to eliminate repeated comments, garbled text, and comments that are too short, so as to ensure the quality of the evaluation data.

Fig. 1 shows the specific preprocessing methods. The first is text de-duplication, which removes duplicated passages in the evaluation data and near-identical comments from the same user. Identical evaluation content can be cleared with the pandas df.drop_duplicates and df.duplicated functions. The second method is compressing words and phrases. After text de-duplication alone, the quality of the comment data still cannot meet the requirements of modeling and analysis, because de-duplication acts only on whole comments, not on the phrases and words within them. This study therefore compresses repeated words and phrases.

The third method is to delete numbers, symbolic expressions, English characters, and short sentences ^{[16-18]}. The number of characters in tourists’ comments varies widely; comments may contain hundreds of characters, emoticons, different formats, etc. Although some short comments of two or three words describe an experience, no in-depth information can be obtained from them, so they need to be deleted. For the deletion of short sentences, the minimum comment length was set to six characters: a comment below this threshold is deleted, and one at or above it is retained for analysis.

The fourth method is to eliminate stop words. The stop words used in the study come from a list compiled from the relevant scenic-spot comments, the stop-word list of the Harbin Institute of Technology, the stop-word list of the Machine Intelligence Laboratory of Sichuan University, and the Baidu stop-word list, and removal was implemented with the remove_stopwords routine in Python’s Gensim package. Fifth, Chinese word segmentation and part-of-speech analysis were performed. In order to count the frequency of each word and obtain the subject words and feature words in a comment, the comment content is divided into valid words using Chinese word segmentation. In view of the particularity of online reviews of scenic spots, this study added a customized scenic-spot dictionary based on the Tsinghua University dictionary and the HowNet dictionary. The part of speech of each segmented valid word was then analyzed. The tool used for word segmentation was the Python jieba package.
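
The preprocessing steps above can be sketched in Python. This is a minimal illustration, not the authors' code: the comment structure is an assumption, and the tokenizer is passed in as a parameter so that jieba.lcut (with the customized scenic-spot dictionary loaded via jieba.load_userdict) can be plugged in where a real segmenter is needed.

```python
import re

def preprocess(comments, stopwords, tokenize, min_len=6):
    """Clean raw review texts following the steps described above.

    `tokenize` is the word segmenter; in the actual pipeline this would be
    jieba.lcut after loading a customized scenic-spot dictionary. Here it
    is a parameter so the sketch stays self-contained.
    """
    seen, cleaned = set(), []
    for text in comments:
        # Text de-duplication (df.drop_duplicates does this on a DataFrame).
        if text in seen:
            continue
        seen.add(text)
        # Delete numbers, symbols, emoticons, and English characters
        # by keeping only CJK characters.
        text = re.sub(r"[^\u4e00-\u9fff]", "", text)
        # Delete short comments (fewer than min_len characters).
        if len(text) < min_len:
            continue
        # Segment into words and drop stop words.
        tokens = [w for w in tokenize(text) if w not in stopwords]
        if tokens:
            cleaned.append(tokens)
    return cleaned
```

The phrase-compression step is omitted here because its exact rules are not specified in the text.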

### 3.2 Construction of Visitor Perception Model Applying Big Data Technology

After preprocessing the raw data, a visitor perception model that analyzes the valid comment set with the TF-IDF algorithm, the NB network, and the LDA topic model was constructed. The model computes the TF-IDF value of each word and ranks the top 50 feature words by TF-IDF value to obtain the key topics frequently mentioned in tourist evaluations. These key feature words help to construct the dimensions of tourists' cognitive image analysis and help researchers understand the tourism-related matters that tourists care about.

The model then uses the NB network to classify the tourists' evaluation text emotionally,
obtaining the main emotions of the tourists about various tourism matters. Finally,
the model uses the LDA theme model based on the results of the TF-IDF algorithm and
NB network analysis to construct the relationship between emotional evaluation and
tourism matters and conduct a thematic clustering analysis for an overall evaluation
of tourist attractions. The TF-IDF algorithm is a numerical statistical method that
is used as a weighting factor in the search process of user modeling, text mining,
and information retrieval. The value of this factor is proportional to the number
of words in comments ^{[19,20]}. TF has many expressions, including logarithmic scale types, Boolean types, primitive types, etc., which can be expressed by $f(t,d)$. IDF refers to a measure of the information provided by a word, which can be referred to as $idf(t,D)$. Expression (1) refers to the text set:

##### (1)

$ D=\left\{d_{1},d_{2},\cdots ,d_{N}\right\} $

The total number of texts is $N$. $D$ refers to a random variable over the text collection, $d$ is an element of $D$, and $d_{i}$ is the $i$-th text. The word set of the text collection is:

##### (2)

$ W=\left\{w_{1},w_{2},\cdots ,w_{M}\right\} $

$M$ refers to the total number of words, and $W$ refers to a random variable over the word set. Assuming that the probability $P(d_{i})$ of all elements in $D$ is equal, the corresponding value is:

##### (3)

$ P\left(d_{i}\right)=\frac{1}{N} $

The amount of information calculated for each document is $-\lg \left(\frac{1}{N}\right)$, and the entropy of random variable $D$ is:

##### (4)

$ H\left(\Delta \right)=-\sum _{i=1}^{N}\frac{1}{N}\lg \frac{1}{N}=\lg N $

We set the number of documents containing $w_{i}$ to $N_{i}$. If the probability of obtaining each of these documents is the same, the amount of information is $-\lg \left(\frac{1}{N_{i}}\right)$, and the conditional entropy of random variable $D$ given $w_{i}$ is:

##### (5)

$ H\left(\Delta \left| w_{i}\right.\right)=-\sum _{d_{j}\in D}P\left(d_{j}\left| w_{i}\right.\right)\lg P\left(d_{j}\left| w_{i}\right.\right)=\lg N_{i} $

The probability of documents without $w_{i}$ in the selected subset is 0, so the $N-N_{i}$ remaining documents do not appear in formula (5). If a word $w_{i}$ is drawn arbitrarily from the text, the frequency of $w_{i}$ in $d_{j}$ is $f_{ij}$, the frequency of $w_{i}$ in the whole text collection is $f_{w_{i}}$, and the total number of words in the collection is $F$; then the following holds:

##### (6)

$ f_{w_{i}}=\sum _{d_{j}\in D}f_{ij},\qquad P\left(w_{i}\right)=\frac{f_{w_{i}}}{F} $

The mutual information value $M\left(\Delta ,\Omega \right)$ is:

##### (7)

$ \begin{array}{l} M\left(\Delta ,\Omega \right)=H\left(\Delta \right)-H\left(\Delta \left| \Omega \right.\right)\\ =\sum _{w_{i}}P\left(w_{i}\right)\left(H\left(\Delta \right)-H\left(\Delta \left| w_{i}\right.\right)\right)\\ =\sum _{w_{i}}P\left(w_{i}\right)\cdot idf\left(w_{i}\right) \end{array} $

The calculation expression in the form of $f_{ij}$ can be obtained as follows:

##### (8)

$ \begin{array}{l} M\left(\Delta ,\Omega \right)=H\left(\Delta \right)-H\left(\Delta \left| \Omega \right.\right)\\ =\sum _{w_{i}}P\left(w_{i}\right)\left(H\left(\Delta \right)-H\left(\Delta \left| w_{i}\right.\right)\right)\\ =\sum _{{w_{i}}\in W}\sum _{{d_{j}}\in D}\frac{f_{ij}}{F}\lg \frac{N}{N_{i}} \end{array} $

The IDF factor reflects the change in information after observing a specific word, and the TF factor is a probability estimate of actually observing a word. Eqs. (7) and (8) describe two different aspects: when TF refers to $f_{{w_{i}}}$, TF-IDF measures word selection, and when TF refers to $f_{ij}$, TF-IDF measures word weight ^{[21-23]}.
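
As a concrete check on Eq. (8), the TF-IDF weight of word $w_{i}$ in document $d_{j}$ can be computed directly as $(f_{ij}/F)\lg (N/N_{i})$. The sketch below illustrates this formula only; common library implementations (e.g., scikit-learn's TfidfVectorizer) normalize TF per document and smooth IDF, so their values differ.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF per Eq. (8): weight of word w_i in document d_j is
    (f_ij / F) * lg(N / N_i), with N the number of documents, N_i the
    number of documents containing w_i, and F the total word count."""
    N = len(docs)
    F = sum(len(d) for d in docs)
    doc_freq = Counter(w for d in docs for w in set(d))  # N_i for each word
    scores = []
    for d in docs:
        f = Counter(d)  # f_ij for each word in this document
        scores.append({w: (f[w] / F) * math.log10(N / doc_freq[w]) for w in f})
    return scores
```

Ranking all words by these scores and keeping the top 50 then yields the feature-word list used for the cognitive image analysis.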

An NB network is a probability distribution over a group of random variables and can be divided into static and dynamic NB networks; the difference is that a dynamic NB network considers the impact of time on the results. An NB network can be denoted by $G=\left(I,L\right)$, where $I$ refers to the collection of all nodes in the network structure and $L$ refers to the collection of directed segments connecting the nodes. An NB network thus consists of two parts: variable nodes and the directed segments between them. Each directed segment carries a conditional probability value. If two nodes are not connected, the corresponding random variables can be considered independent of each other, and the conditional probability value is 0.

We set the directed acyclic network diagram as $S$, and the joint probability distribution of variable $X=\left\{x_{1},x_{2},\cdots ,x_{n}\right\}$ as $P\left(x_{1},x_{2},\cdots x_{n}\right)$:

##### (9)

$ P\left(x_{1},x_{2},\cdots x_{n}\right)=\prod _{i=1}^{n}P\left(x_{i}\left| P_{ai}\right.\right) $

In Eq. (9), $P_{ai}$ refers to the parent node of the variable. The calculation expression of the joint probability coding of variable $X=\left\{x_{1},x_{2},\cdots ,x_{n}\right\}$ is Eq. (10).

##### (10)

$ P\left(x\left| \theta _{s},S^{b}\right.\right)=\prod _{i=1}^{n}p\left(x_{i}\left| p_{ai},\theta _{i},S^{b}\right.\right) $

In Eq. (10), $\theta _{i}$ refers to the parameter variable. The vector formed by the parameter set is referred to by $\theta _{s}$. The joint probability distribution obtained from the decomposition of $S$ is $S^{b}$. The calculation expression of the local distribution function is Eq. (11).

##### (11)

$ P\left(x\left| \theta _{s},S^{h}\right.\right)=\prod _{i=1}^{n}p\left(x_{i}\left| p_{ai},\theta _{i},S^{h}\right.\right) $

Eq. (11) can be understood as a regression function over continuous and discrete variables. The NB network model is constructed as follows. We determine the properties of the node variables, set the value range, and determine the conditional probability of each directed segment between the nodes. From the perspective of the reasoning direction of the NB network structure diagram, conditional probability can be divided into a prior probability and a posterior probability. A prior probability is obtained from background knowledge and historical data; the posterior probability is calculated on the basis of the prior probability, and the two have the same form. We set $w_{1},\cdots ,w_{i},\cdots ,w_{n}$ as the set of all categories, and the NB network equation is:

##### (12)

$ P\left(w_{i}\left| x\right.\right)=P\left(x\left| w_{i}\right.\right)\ast \frac{P\left(w_{i}\right)}{P\left(x\right)} $

In Eq. (12), $P\left(x\left| w_{i}\right.\right)$ refers to the likelihood function of category $w_{i}$ with respect to feature vector $x$, i.e., the probability of feature vector $x$ occurring in category $w_{i}$. $P\left(w_{i}\right)$ refers to the prior probability of each category. $P\left(w_{i}\left| x\right.\right)$ is the posterior probability, and $P\left(x\right)$ is the total probability of the evidence.

A flow chart of NB network modeling is shown in Fig. 2. The key step is to determine the conditional probabilities and causal relationships based on the database and expert knowledge. The model determines the relationship between the variables by learning the NB network structure ^{[24,25]}.
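
For text classification, Eq. (12) reduces to choosing the category $w_{i}$ that maximizes $P\left(w_{i}\right)\prod _{k}P\left(x_{k}\left| w_{i}\right.\right)$, since $P\left(x\right)$ is the same for every category. A minimal multinomial NB classifier along these lines is sketched below; it is an illustration only, with Laplace smoothing added (a standard fix the text does not discuss) to avoid zero probabilities for unseen words.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial NB sentiment classifier following Eq. (12):
    P(w_i | x) is proportional to P(x | w_i) * P(w_i)."""

    def fit(self, docs, labels):
        self.prior = Counter(labels)             # class counts for P(w_i)
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc)
            self.vocab.update(doc)
        self.total = {y: sum(c.values()) for y, c in self.word_counts.items()}
        self.n = len(labels)
        return self

    def predict(self, doc):
        V = len(self.vocab)
        best, best_lp = None, -math.inf
        for y in self.prior:
            # log P(w_i) + sum of log P(token | w_i), Laplace-smoothed
            lp = math.log(self.prior[y] / self.n)
            for tok in doc:
                lp += math.log((self.word_counts[y][tok] + 1) / (self.total[y] + V))
            if lp > best_lp:
                best, best_lp = y, lp
        return best
```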

The LDA topic model is a three-layer Bayesian probability model with a text, topic, and word layer. The graph model is shown in Fig. 3. White circles and orange circles refer to hidden variables and observed variables, respectively. $\alpha $ and ${\beta}$ refer to the hyperparameters of the topic distribution and the term distribution, respectively. $\overset{\rightarrow }{\theta }_{d}$ and $\overset{\rightarrow }{\varphi }_{k}$ refer to the topic distribution of text $d$ and the word distribution under topic $k$, respectively. $z_{d,n}$ and $w_{d,n}$ refer to the topic of the $n$-th word item in text $d$ and the word item itself. The number of topics is $K$, and the total number of words in text $d$ is $N_{d}$. Eq. (13) refers to the topic distribution of each text based on probability:

##### (13)

$ \overset{\rightarrow }{\theta }_{d}\sim Dirichlet\left(\overset{\rightarrow }{\alpha }\right) $

Eq. (14) refers to the term distribution of each topic $z\in \left\{1,2,\cdots ,K\right\}$ based on probability.

##### (14)

$ \overset{\rightarrow }{\varphi }_{z}\sim Dirichlet\left(\overset{\rightarrow }{\beta }\right) $

The joint probability of the implicit variable and the observed variable under the given parameters is:

##### (15)

$ p\left(\overset{\rightarrow }{w}_{d},\overset{\rightarrow }{z}_{d},\overset{\rightarrow }{\theta }_{d},\Phi \left| \overset{\rightarrow }{\alpha },\overset{\rightarrow }{\beta }\right.\right)=p\left(\Phi \left| \overset{\rightarrow }{\beta }\right.\right)p\left(\overset{\rightarrow }{\theta }_{d}\left| \overset{\rightarrow }{\alpha }\right.\right)\prod _{n=1}^{N_{d}}p\left(z_{d,n}\left| \overset{\rightarrow }{\theta }_{d}\right.\right)p\left(w_{d,n}\left| \overset{\rightarrow }{\varphi }_{z_{d,n}}\right.\right) $

In Eq. (15), $\Phi $ refers to the set of word distributions $\overset{\rightarrow }{\varphi }_{k}$ over all $K$ topics.

LDA is used to identify the topic information implied in a large text set or a large corpus. For all documents in this corpus, LDA has the following generation process. First, it extracts a topic from the topics distributed in the document. It extracts another word from the corresponding word distribution in the selected topic. It then repeats the process in a loop until it traverses all the words in the document. The LDA topic model can automatically identify the topic of the document.

The Gibbs sampling algorithm is easier to understand, and its implementation is not very complex. Especially when the subject is extracted from a large number of samples, the extraction effect is relatively significant. Therefore, the Gibbs sampling algorithm can be used to estimate the parameters of LDA subject model. Using the LDA topic model, we can calculate the topic probability of positive emotional text and negative emotional text. At the same time, the distribution probability topic vector of words contained in this topic is obtained, and finally, the clustering result of this topic is obtained. The LDA thematic clustering results are refined, and the overall image perception of Xi’an tourist attractions is summarized.

When determining the number of topics for a document set, the choice greatly affects the topic modeling results, so the optimal number of topics must be determined before formally building the LDA topic model. This study selected perplexity (the degree of confusion) as the indicator: the lower the perplexity, the better the number of topics. The Gibbs sampling method was used to calculate the perplexity for topic numbers between 2 and 40, and the relationship between perplexity and the number of topics was plotted.
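
The topic-number search can be sketched as follows. Perplexity is the exponential of the negative per-token log-likelihood, so the optimal number of topics is the candidate in 2 to 40 with the lowest perplexity. The corpus_loglik function is a placeholder of this sketch, not part of the original text; in practice it would return the held-out log-likelihood reported by a Gibbs-sampled LDA implementation.

```python
import math

def perplexity(log_likelihood, n_tokens):
    """Perplexity = exp(-log-likelihood / number of tokens); lower is better."""
    return math.exp(-log_likelihood / n_tokens)

def best_topic_count(corpus_loglik, n_tokens, candidates=range(2, 41)):
    """Return the candidate topic number with the lowest perplexity.

    corpus_loglik(k) must return the (held-out) log-likelihood of an LDA
    model trained with k topics -- a stand-in for the Gibbs-sampling
    estimate used in the study.
    """
    return min(candidates, key=lambda k: perplexity(corpus_loglik(k), n_tokens))
```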

## 4. The Tourist Perception Results of an Image of Xi’an Tourist Attractions under the Big Data Technology

In an experiment, the TF-IDF value of tourism evaluation was calculated and ranked,
the performance of NB network was compared, and the emotional image of tourists was
analyzed. Finally, the LDA theme model was used to analyze the overall image of a
scenic spot. The algorithms compared with the NB network were a text classification
model (IDL) of an integrated deep learning framework ^{[26]}, a recurrent neural network with gated recurrent units (RNN-GRU) ^{[27]}, and a word-embedding CNN model (WE-CNN) ^{[28]}. The experimental environment was Windows 10 with Python, and the model was implemented on the TensorFlow platform.

### 4.1 Analysis of Travel Feature Values based on the TF-IDF Algorithm

The TF-IDF algorithm was used to obtain the TF-IDF value of each word in the document. At the same time, the top 24 feature words were obtained according to the size of the value, and the specific results are shown in Table 2. The range of the TF-IDF value is 0.0245-0.2316, and the maximum value and minimum value correspond to service attitude and category, respectively.

##### Table 2. TF-IDF value of some words in a document.

According to the effective dataset of the review, cognitive images can be divided into four categories: tourism attractions, service measures, tourism management, and tourism services. Fig. 4 shows some characteristic words of tourist attractions. Among the top 20 feature words of tourist attractions, the TF-IDF value of feature words such as scenic spots is higher. Therefore, tourists have a higher degree of cognition and perception of the city wall, the Terra Cotta Warriors, the Big Wild Goose Pagoda, Huashan Mountain, and other scenic spots. At the same time, the TF-IDF values of such characteristic words as spectacular, good, shocking, and worthy are also high, which indicates that tourists have a positive perception of the tourist attractions of the scenic spot. Fig. 4(b) shows some characteristic values of the cognitive image of tourism service facilities.

The TF-IDF values of tourist vehicles, escalators, hotels, and other facilities are high. Fig. 4(c) shows some characteristic values of the tourism management cognitive image. The TF-IDF values of management characteristic words such as queue, price, ticket price, and charge are high, which shows that tourists pay close attention to ticket charging and order management. Given the particularly high TF-IDF value of "ticket," whether the ticket price of a scenic spot is reasonable affects tourists' perception of the scenic spot to a large extent.

Fig. 4(d) shows some characteristic values of a tourism service cognitive image. The TF-IDF values of service characteristics such as service attitude, convenience, and tour guide are higher. The TF-IDF value of service attitude is as high as 0.2316, from which it can be seen that service attitude has a great impact on the tourism experience in the minds of tourists. Based on the analysis, compared with the hardware conditions such as scenic spots and service measures of tourist attractions, tourists pay more attention to the service attitude and level of tour guide service and believe that the service quality affects the tourism experience to a greater extent.

### 4.2 NB Model Performance Test and Tourism Emotional Image Perception Results

Fig. 5 shows the emotional image of tourism attractions analyzed by the NB model. The emotional image perception of Xi’an tourist attractions is mainly positive. The proportions of positive evaluation and negative evaluation were 98.56% and 1.44%, respectively. Fig. 5(b) shows the emotional image of tourism service facilities. It is consistent with the perception results of the emotional image of tourist attractions, with the proportions of positive and negative evaluations being 92.56% and 7.44%, respectively. Figs. 5(c) and (d) show the emotional image of tourism management and the emotional image of service, respectively, and the perception is mainly positive.

The performance comparison of the NB model used is shown in Fig. 6. The maximum number of iterations was set to 2000, and performance was evaluated by running time, loss value, recall rate, and accuracy rate. Fig. 6 shows the performance of the NB model under different numbers of training iterations. As the number of iterations increases, the running time of the model increases gradually; when the maximum number of iterations is reached, the running time also reaches its maximum of 225.78 s. The figure shows that when the number of iterations is about 140, the recall rate and accuracy rate on the test set are ideal, with corresponding values of 0.78, 0.79, and 0.85, and the loss value of the model gradually converges to 0.05. This shows that the NB model is reliable and effective.

The performance of the NB model was studied by comparing the text classification model (IDL) ^{[26]}, the recurrent neural network with gated recurrent units (RNN-GRU) ^{[27]}, and the word-embedding CNN model (WE-CNN) ^{[28]} to the NB model. Fig. 7 shows the accuracy of different algorithms in negative evaluation and positive evaluation.
It can be seen from the figure that the NB model has high accuracy for both negative
and positive evaluations under different data scales. The corresponding maximum values
are 0.83 and 0.85, respectively. The accuracy of other algorithms ranges from 0.60
to 0.80.

Fig. 8 shows the accuracy of different algorithms in negative evaluation and positive evaluation. It can be seen from the figure that the NB model has high accuracy for both negative and positive evaluations under different data scales. The corresponding maximum values are 0.82 and 0.85, respectively. The accuracy of other algorithms ranges from 0.60 to 0.80.

Fig. 9 shows the running time of different algorithms in negative evaluation and positive evaluation. For negative evaluation and positive evaluation, the NB model has a long running time under different data scales. The corresponding maximum values are 8.1 s and 7.9 s, respectively. The algorithm with the shortest running time under the same data size is NN, followed by IDL and RNN-GRU.

### 4.3 Results of the Overall Image Analysis of Xi'an Scenic Spot based on the LDA Theme Model

Fig. 10 shows the correlation between the degree of confusion of positive and negative emotional texts and the number of topics. When the number of topics is 4, the confusion degrees of positive emotional text and negative emotional text are the lowest. Therefore, this value was selected as the best number of topics. For both positive and negative emotional texts, the high-frequency words in theme 4 mainly focus on service facilities, such as sightseeing buses, escalators, hotels, etc. Based on the analysis, the main problems of Xi’an tourist attractions are high ticket prices, weak humanization of infrastructure, and low service quality.

## 5. Conclusion

In view of the problem that potential customers do not have a comprehensive understanding of tourist attractions, this study used big data technology to realize tourist perception of an image of Xi’an tourist attractions, covering both cognitive image and emotional image perception. For the four cognitive images of tourism attractions, service measures, tourism management, and tourism services, the TF-IDF values of tourist vehicles, escalators, hotels, and other facilities were high, as were those of management characteristic words such as queue, price, ticket price, and charge, and of service characteristics such as service attitude, convenience, and tour guides. For the four emotional images, perception was mainly positive. When the number of iterations was about 140, the recall rate and accuracy rate of the test set were ideal, with corresponding values of 0.78, 0.79, and 0.85, and the loss value of the model gradually converged to 0.05.

For negative and positive evaluation, the NB model has high accuracy under different data scales, with corresponding maximum values of 0.83 and 0.85, respectively. The perception model can identify problems such as disorderly management and high ticket prices in the development of tourist attractions and can also give relevant optimization suggestions. However, the study still has deficiencies. The selected online evaluation data type was relatively simple: only the text evaluations of tourists were analyzed, while the pictures, emoticons, and other information in the evaluations were not. The attributes of tourists were not comprehensively analyzed, nor were tourists divided into first-time and repeat visitors for analysis. Future work will carry out a more comprehensive study and will try to analyze the images and expressions posted by tourists with different attributes.

### REFERENCES

## Author

Haiying Qi graduated from the Department of Journalism, School of Literature, Northeast Normal University, majoring in Communications in 2009. Currently, she works at Jilin Economic Management Cadre College in Changchun, Jilin Province, in Northeast China, serving as Deputy Minister of the Publicity & United Front Work Department and Associate Professor. She has guided students in provincial vocational skills competitions many times and won the first prize. She has served as a judge for the oral examination of the national tour guide qualification certificate, published more than 20 provincial papers, edited three national 12th and 13th Five-Year Plan textbooks, and participated in the editing of many other textbooks.