
  1. (Department of Intelligent Equipment, Changzhou College of Information Technology, Changzhou 213164, China; yingying_xia2022@163.com)



Transfer learning, Convolutional neural networks, Fault diagnosis, Boundary self balancing adversarial generation network, Denoising autoencoder

1. Introduction

With the rapid development of industrial automation and intelligence, health monitoring and fault diagnosis (FD) of mechanical systems have become particularly important. In the prediction and maintenance of key components such as bearings, accurate and timely FD is crucial for ensuring stable equipment operation and avoiding major accidents [1,2]. However, because industrial environments are complex and data collection is difficult, obtaining high-quality, accurately labeled fault data remains a major challenge. These factors drive the demand for more efficient and intelligent FD technologies [3]. Traditional FD methods, such as physical model-based analysis, statistical methods, and traditional machine learning techniques, are often limited in practice by a lack of annotated data and an inability to handle complex nonlinear relationships [4]. Recently, the rise of deep learning and transfer learning (TL) has provided new solutions to this problem. These methods enhance FD by learning feature representations from large amounts of data. Although deep learning performs well in FD, effectively handling the differences between datasets and achieving accurate fault detection on limited or unlabeled data remain important research topics. In addition, the generalization ability and robustness of models in complex environments are key challenges for current research. In view of these challenges, this study develops an FD model based on convolutional neural networks and improves it with a TL mechanism to raise the accuracy and efficiency of FD across different datasets and in complex environments. By simulating the distribution of real system signals, the method uses TL to transfer the model from one dataset to another while maintaining high accuracy and robustness of fault detection.

2. Literature Review

Faced with increasingly complex industrial environments and ever-changing equipment conditions, traditional FD methods are often inadequate. With the advance of artificial intelligence technology, scholars at home and abroad have proposed many fault detection methods. Luo et al. applied conditional deep convolutional generative adversarial networks (GAN) to mechanical FD and evaluated them. Based on a dataset measured in an electronic engineering laboratory, they presented three evaluation indicators for conditional deep convolutional GANs, which are used to distinguish generated samples from real samples, test for mode collapse, and detect overfitting. Experiments verified the applicability of conditional deep convolutional GANs to mechanical diagnosis [5]. Huang et al. presented a simplified dynamic model for motors under any type of mechanical fault and derived a formula relating the amplitude of rotor radial vibration to that of the fault-related components in the stator line current. The amplitude of radial vibration is related to the severity of mechanical failures. Using this formula, motor current signature analysis (MCSA) can quickly evaluate the severity of mechanical faults based on the amplitude of characteristic components in the collected stator current. Simulation results verified the accuracy of the simplified model and formula, and condition-monitoring experiments on real induction motors verified its effectiveness [6]. Shang et al. proposed a deep feature fusion strategy based on information entropy to obtain low-dimensional features, and then used a deep belief network probability model as the fault classifier for fault recognition. Experiments showed that, relative to traditional and existing intelligent FD methods, the proposed method extracts representative information and features from raw data and achieves higher classification accuracy [7]. Marmouch et al. introduced statistical neural networks as an effective classifier that recognizes operational patterns. In this context, radial basis function neural networks and probabilistic neural networks were studied for fault detection in motor current analysis. For feature selection, principal component analysis was applied to transform the original database into a new uncorrelated feature space [8].

TL reduces dependence on large amounts of annotated data and has significant advantages in data processing. Briceno-Mena et al. combined knowledge-based modeling with data-driven modeling through a few-shot learning method. A knowledge-based model originally developed for polymer electrolyte membrane fuel cells was used to generate simulation data for pre-training neural network source models tuned by genetic algorithm-based AutoML. Six target models were then trained by few-shot learning, and in all cases the models achieved high accuracy on unseen data [9]. Kaya and Nal proposed a new automated classification method that extracts both hemorrhagic and ischemic lesions from non-enhanced brain CT images of stroke patients during treatment. It uses U-Net models to segment stroke lesions automatically and with high precision. In experiments on real datasets, the classification model reached an accuracy of 95.06%. For segmentation, the IoU coefficient was 92.01% for hemorrhagic lesions and 82.22% for ischemic lesions [10]. Adams et al. extracted plant trait data from soybean images taken at the Lincoln Greenhouse Innovation Center at the University of Nebraska. Applying TL, they used the VGG16 model and its convolutional-layer parameters as part of the model and trained convolutional neural networks to predict trait measurements. The results indicate that, with TL, the studied CNN extracts feature measurements from images efficiently and accurately with relatively little training data [11]. Debicha et al. designed an efficient adversarial detector based on TL and evaluated the effectiveness of multiple strategically placed adversarial detectors in intrusion detection systems (IDS) relative to a single detector. In the experiment, state-of-the-art intrusion detection models were implemented and then attacked with a selected set of evasion attacks. To detect these adversarial attacks, multiple TL-based adversarial detectors were designed and implemented, each receiving a subset of the information passing through the IDS. Combining their decisions showed that, in a parallel IDS design, multiple detectors improve the detectability of adversarial traffic relative to a single detector [12].

In summary, many methods exist for intelligent detection of mechanical faults, and they have achieved certain results. However, obtaining a large amount of accurately labeled fault data remains a challenge. Mechanical systems operate in complex environments, and their failure modes are influenced by factors such as load fluctuations and temperature changes, which makes it difficult for fault detection models to cope with changing conditions. TL can use data and knowledge from other tasks or fields to address data scarcity and labeling difficulties in mechanical fault detection. Through TL, a model can adapt quickly to new fault types and operating conditions without sacrificing accuracy, thereby improving the accuracy and efficiency of FD. Therefore, this study proposes a TL-based convolutional neural network for mechanical FD.

Fig. 1. Basic structure of adversarial generative network.

../../Resources/ieie/IEIESPC.2025.14.3.339/image1.png

3. An FD Model Based on a TL-Improved Convolutional Neural Network

Because signals collected from mechanical systems are difficult to label accurately, extracting fault information directly from them is challenging. In the study of fault mechanisms, multiple models are available for generating fault simulation signals. Notably, an adversarial generative network, composed of a generator and a recognizer (discriminator), can generate highly realistic simulation data through the adversarial game between its two parts. Therefore, starting from the physical model, the study uses convolutional neural networks as the generator and recognizer of a GAN, and feeds in annotated simulation signals to produce generated signals that match the real system signal distribution. The generated signals are then used to train classifiers and conduct FD experiments. This method aims to achieve efficient FD using low-cost unlabeled data.

3.1 Convolutional Network Architecture Based on GAN

Adversarial generative networks, as a method of fitting distributions, have been used extensively in image processing. Convolutional neural networks (CNN) excel at extracting features from images, while GANs excel at generating realistic images. Using CNNs as part of a GAN helps the GAN capture and simulate the complex structure of image data and thereby generate higher-quality images. Such a network can produce generated signals that fit the true signal distribution and can also be used for signal migration between different distribution domains [13]. As an unsupervised method, an adversarial generative network can learn a data distribution and sample from it. In the convolutional network structure of a GAN, the two main parts are the generator and the recognizer [14]. The purpose of the generator is to generate realistic data samples from random noise. In convolutional GAN structures, the generator typically includes a series of convolutional layers, which may be traditional convolutional layers or transposed convolutional layers (also called deconvolution layers), the latter often used to generate high-resolution images from random noise. The convolutional layers in the recognizer gradually extract low-level and high-level features from the image, and one or several fully connected layers then output a scalar value representing the probability that the input sample is real data. The specific structure is shown in Fig. 1.

In Fig. 1, the generator aims to generate data with a distribution similar to that of the training data. The two networks compete during training, which results in a generator that produces high-quality data samples. In the mechanical FD model, the main function of the GAN is to generate mechanical fault data, providing a rich and diverse fault dataset for training and verifying the FD model. The objective function of the GAN is shown in formula (1).

(1)
$ V(G,D) =\int P_{GD} (x)\cdot (-\log (1-D(x)))dx\\ \quad +\int P_{RD} (x)\cdot (-\log (D(x))) dx \\ =-\int \left\{P_{GD} (x)\cdot \log (1-D(x)) +P_{RD} (x)\cdot \log (D(x))\right\}dx. $

In formula (1), $x$ represents the collected signal, $P_{GD} $ is the distribution of generated data, $P_{RD} $ is the distribution of real data, $G$ represents the generator function, and $D$ represents the recognizer function. Solving formula (1) for the optimal recognizer gives the recognizer function shown in formula (2).

(2)
$ D(x)=\frac{P_{RD} (x)}{P_{RD} (x)+P_{GD} (x)} . $
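Formula (2) can be checked numerically on a small discrete example. The sketch below replaces the integral in formula (1) with a sum over bins; the three-bin distributions and the helper names are illustrative assumptions, not values or code from the study.

```python
import math

def loss_v(p_rd, p_gd, d):
    """Discrete form of formula (1): the integral becomes a sum over bins."""
    return -sum(g * math.log(1.0 - dx) + r * math.log(dx)
                for r, g, dx in zip(p_rd, p_gd, d))

def optimal_recognizer(p_rd, p_gd):
    """Formula (2): D(x) = P_RD(x) / (P_RD(x) + P_GD(x))."""
    return [r / (r + g) for r, g in zip(p_rd, p_gd)]

# Hypothetical three-bin distributions (illustrative values only).
p_rd = [0.5, 0.3, 0.2]  # real data
p_gd = [0.2, 0.3, 0.5]  # generated data

d_star = optimal_recognizer(p_rd, p_gd)
v_star = loss_v(p_rd, p_gd, d_star)

# Moving the recognizer away from formula (2) in any bin never lowers the loss.
for i in range(len(d_star)):
    for eps in (-0.05, 0.05):
        d_perturbed = list(d_star)
        d_perturbed[i] += eps
        assert loss_v(p_rd, p_gd, d_perturbed) >= v_star

# If the generated and real distributions coincide, D(x) = 0.5 in every bin
# and the loss equals log 4.
same = [0.5, 0.3, 0.2]
v_same = loss_v(same, same, optimal_recognizer(same, same))
print(d_star, v_star, v_same, math.log(4))
```

The perturbation loop only probes a few nearby points, so it illustrates the optimum rather than proving it.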

By combining formula (2) with formula (1), the expression of formula (1) is rewritten as formula (3).

(3)
$ V(G,D) =-\int \left\{P_{GD} (x) \cdot \log \left(1-\frac{P_{RD} (x)}{P_{RD} (x)+P_{GD} (x)} \right)\right.\\ \quad +\left.P_{RD} (x) \cdot \log \left(\frac{P_{RD} (x)}{P_{RD} (x)+P_{GD} (x)} \right)\right\}dx \\ =-\zeta (P_{GD} \left\| P_{1} \right. )-\zeta (P_{RD} \left\| P_{1} \right. )+\log 4. $

In formula (3), $\zeta $ represents relative entropy and $P_{1} =(P_{RD} +P_{GD} )/2$ is the average of the two distributions. Because relative entropy is non-negative, the value of formula (3) cannot exceed $\log 4$, and it reaches $\log 4$ only when $P_{GD} =P_{RD} $. Therefore, a well-trained generator can achieve the goal of generating data with the same distribution as the real data. However, GANs still carry a risk of mode collapse, so the study adopts the Boundary Equilibrium Generative Adversarial Network (BEGAN). This network fits quickly and stably, and its structure is shown in Fig. 2.

Fig. 2. Network structure diagram of BEGAN.

../../Resources/ieie/IEIESPC.2025.14.3.339/image2.png

In Fig. 2, the adversarial generative network has a generator that maps latent variables to two-dimensional images and a recognizer that maps two-dimensional images back to latent codes. BEGAN uses the Wasserstein distance (W-distance) to measure the distance between the source domain (SDO) and the generated domain, as shown in formula (4) [15].

(4)
$ W(\mu _{1} ,\mu _{2} )=\inf E\left|x_{1} -x_{2} \right| . $

In formula (4), $W$ represents the W-distance, $\mu _{1} $ represents the measure of the SDO, $\mu _{2} $ represents the measure of the generated domain, $\inf $ denotes the infimum, and $E$ denotes the expected value of a random variable. The W-distance has a lower bound, called the Lower Bound of Wasserstein Distance (LWD), which is expressed in formula (5).

(5)
$ \inf E\left|x_{1} -x_{2} \right|\ge \inf \left|E(x_{1} -x_{2} )\right|=\left|\mu _{1} -\mu _{2} \right| . $
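The inequality in formula (5) can be illustrated with two sets of one-dimensional samples. The sketch below is an assumption-laden example: the Gaussian samples are hypothetical stand-ins for the two domains, and it relies on the known fact that, for equal-size one-dimensional samples, pairing sorted values gives the W-distance between the empirical distributions.

```python
import random

random.seed(0)

# Hypothetical samples standing in for the source domain and generated domain.
x1 = [random.gauss(1.0, 1.0) for _ in range(2000)]
x2 = [random.gauss(0.2, 1.5) for _ in range(2000)]

def mean(values):
    return sum(values) / len(values)

# An arbitrary coupling: pair the samples in the order they were drawn.
cost_arbitrary = mean([abs(a - b) for a, b in zip(x1, x2)])

# The optimal coupling in one dimension: pair the sorted samples.
w_distance = mean([abs(a - b) for a, b in zip(sorted(x1), sorted(x2))])

# Right-hand side of formula (5): the absolute difference of the means.
lower_bound = abs(mean(x1) - mean(x2))

print(lower_bound, w_distance, cost_arbitrary)
```

The three printed values should appear in non-decreasing order: the difference of means bounds the W-distance from below, and any particular coupling bounds it from above.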

Rather than optimizing the LWD in formula (5) directly on the signals, BEGAN works with the autoencoder loss. A well-trained autoencoder in the SDO should correctly encode and decode data sampled from the SDO data distribution. Therefore, BEGAN uses the distribution of the encoding loss to measure the difference between signals from different domains, and the loss functions of the generator and recognizer are shown in formula (6).

(6)
$ \left\{ \begin{aligned} L_D(n, x, \theta_D) &= \mathrm{Mean}(L_A(x)) - \mathrm{Mean}(L_A(G(n))), \\ L_G(n, \theta_G) &= \mathrm{Mean}(L_A(G(n))). \end{aligned} \right. $

In formula (6), $n$ represents the input of the generator, $x$ represents the real signal, $\theta _{D} $ represents the parameters of the recognizer, $\theta _{G} $ represents the parameters of the generator, $L_{A} $ represents the encoding loss of the autoencoder, and $L_{D} $ and $L_{G} $ represent the loss functions of the recognizer and the generator. The generator's optimization goal is to reduce the encoding loss of the generated signal in the recognizer, while the recognizer's goal is to balance the encoding loss of the real signal and the generated signal. To balance generation quality and diversity, BEGAN introduces a hyperparameter, whose expression is shown in formula (7).

(7)
$ \gamma =\frac{E(L_{A} (G(n)))}{E(L_{A} (x))} . $

When the value of formula (7) is much less than 1, the model focuses on encoding the collected signal. The goal of BEGAN follows from the above. The GAN structure constructed above can generate fault signals close to the true values in FD, which helps expand the training set and improves the diversity of the training data. The CNN structure is good at extracting signal features with time and frequency characteristics, so combining GAN and CNN addresses problems such as the scarcity of annotated data, the diversity of failure modes, and the need to simulate the data distribution under real operating conditions.
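Formulas (6) and (7) reduce to simple averages once the per-sample encoding losses are known. The following is a minimal sketch under that assumption; the loss values and function names are hypothetical, and the autoencoder that would produce the losses is not modeled.

```python
def mean(values):
    return sum(values) / len(values)

def began_losses(loss_real, loss_generated):
    """Formula (6), given per-sample encoding losses L_A(x) and L_A(G(n))."""
    l_d = mean(loss_real) - mean(loss_generated)  # recognizer loss L_D
    l_g = mean(loss_generated)                    # generator loss L_G
    return l_d, l_g

def balance_ratio(loss_real, loss_generated):
    """Formula (7): ratio of expected generated loss to expected real loss."""
    return mean(loss_generated) / mean(loss_real)

# Hypothetical per-sample encoding losses (illustrative values only).
loss_real = [0.40, 0.50, 0.60]
loss_generated = [0.20, 0.25, 0.30]

l_d, l_g = began_losses(loss_real, loss_generated)
gamma = balance_ratio(loss_real, loss_generated)
print(l_d, l_g, gamma)
```

With these numbers the ratio is below 1, the regime in which the text says the model concentrates on encoding the collected signal.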

3.2 FD Based on the BEGAN Convolutional Network and TL

The study applies the BEGAN-based TL algorithm to FD, jointly training the adversarial generative network with annotated simulation signals generated by dynamic models and real signals collected on test benches. The process is shown in Fig. 3.

Fig. 3. TL model process based on BEGAN.

../../Resources/ieie/IEIESPC.2025.14.3.339/image3.png

Fig. 4. Signal migration problem.

../../Resources/ieie/IEIESPC.2025.14.3.339/image4.png

In Fig. 3, the labeled signal generated by the dynamic model is fed into the generator to obtain the generated signal, which is then fed into the recognizer for training together with the unlabeled signal actually collected. The recognizer is trained to determine whether an input signal is a generated signal or a real collected signal. The recognizer is in turn used to train the generator, so that the generator's output follows the same distribution as the real sampled signal, which adds real features to the simulated signal [16]. Finally, the classifier is trained on annotated generated signals that share the distribution of the real signal, so that it can classify the real signal. Although BEGAN is easy to train, it cannot guarantee that the input and output signals belong to the same category, so the study analyzes this transfer problem. The issues that BEGAN may encounter during migration are shown in Fig. 4.

In Fig. 4, the surface represents the probability distribution of high-level features, and projecting it onto different planes gives the corresponding position distributions. Different feature dimension reduction methods yield similar distributions, but their correspondences are completely opposite. This situation is referred to as a mapping error. To address it, the study designs a boundary self-balancing network with constraints between the generated signal and the target domain signal. Adding pixel-level regularization terms to constrain the changes between $\mu _{1} $ and $\mu _{2} $ would reduce the domain adaptation ability of BEGAN. To achieve better domain adaptability while reducing mapping errors, the study uses the mean value defined in formula (8).

(8)
$ \mathrm{Mean}(x)=\frac{\sum _{x\in X}x }{N} . $

In formula (8), $N$ represents the number of signals. According to formula (8), the overall goal of the generator is shown in formula (9).

(9)
$ L_{G1} (n,\theta _{G} )=\mathrm{Mean}(L_{A} (G(n))+L_{G} (n)) . $

Using the balance mechanism of BEGAN and proportional-integral control to maintain the weights of the two losses yields the overall loss function. Introducing a denoising autoencoder into the BEGAN structure gives the Denoising Autoencoder Boundary Equilibrium Generative Adversarial Network (DABEGAN), whose structure is shown in Fig. 5.
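The text does not write out the balance mechanism. In the original BEGAN formulation, a weight $k$ on the generated-signal loss is adjusted each step in proportion to the gap between $\gamma L_{A}(x)$ and $L_{A}(G(n))$. The sketch below assumes that update rule; the step size `lambda_k`, the default `gamma`, and the clipping to [0, 1] are illustrative assumptions, and how the study applies the control to its own two losses is not stated here.

```python
def update_balance_weight(k, loss_real, loss_generated, gamma=0.5, lambda_k=0.001):
    """One proportional-control step on the balance weight k.

    Assumed rule (original BEGAN): k <- k + lambda_k * (gamma * L(x) - L(G(n))).
    The result is clipped to [0, 1], a common implementation choice.
    """
    k_next = k + lambda_k * (gamma * loss_real - loss_generated)
    return min(max(k_next, 0.0), 1.0)

# Hypothetical mean encoding losses for two training situations.
k_up = update_balance_weight(0.0, loss_real=1.0, loss_generated=0.2)    # generated loss is low
k_down = update_balance_weight(0.0, loss_real=1.0, loss_generated=0.9)  # generated loss is high
print(k_up, k_down)
```

Because each step adds to the previous value of `k`, the weight accumulates the error over time, which is one way to read the control description above.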

Fig. 5. Structure of generation and recognition autoencoder.

../../Resources/ieie/IEIESPC.2025.14.3.339/image5.png

In Fig. 5, the generator and recognizer of DABEGAN include 3 layers of convolution and 3 layers of deconvolution. Each Conv2D (2D convolutional) layer has a set number of convolution kernels, with a kernel size of $3 \times 3$ and a stride of $2 \times 2$. After each convolutional layer, a batch normalization layer standardizes the output features, and the activation function is the Leaky ReLU function, as shown in formula (10).

(10)
$ y=\max (0,x)+\psi \cdot \min (0,x) . $
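As a small illustration of formula (10) and of the layer settings described for Fig. 5, the sketch below evaluates the activation and tracks the spatial size through three stride-2 convolutions. The slope value 0.2, the padding of 1, and the input size 64 are assumptions chosen for the example; the study does not state them here.

```python
def leaky_relu(x, psi=0.2):
    """Formula (10): y = max(0, x) + psi * min(0, x). psi=0.2 is an assumed value."""
    return max(0.0, x) + psi * min(0.0, x)

def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial output size of one convolution; padding=1 is an assumption."""
    return (size + 2 * padding - kernel) // stride + 1

# Positive inputs pass through unchanged; negative inputs are scaled by psi.
print(leaky_relu(3.0), leaky_relu(-2.0))

# Three 3x3, stride-2 convolutions applied to a hypothetical 64x64 input.
sizes = [64]
for _ in range(3):
    sizes.append(conv_output_size(sizes[-1]))
print(sizes)
```

Under these assumptions each convolution halves the spatial size, so three deconvolution layers with the same stride would restore the original resolution.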

In formula (10), $\psi $ represents a small positive constant. Because the proposed DABEGAN has strong domain adaptation ability, it can convert the SDO signal into a generated signal whose distribution is similar to that of the target domain signal. With the structure and optimization of DABEGAN and the classifier determined by the above steps, the study establishes a bearing vibration dynamics model and analyzes the bearing's health status. For the bearing system, the shaft rotation speed, failure frequency of the outer raceway, and shaft rotation frequency are denoted as $\omega _{s} $, $\omega _{bpo} $, and $\omega _{c} $, respectively, all in rad/s. They are calculated as shown in formula (11).

(11)
$ \left\{ \begin{aligned} &\omega_{s} = \omega_{bpi}, \\ &\omega_{bpo} = 0, \\ &\omega_{c} = \frac{\omega_{s}}{2} \left(1 - \frac{D_{r}}{D_{s}}\right). \end{aligned} \right. $

In formula (11), $\omega _{bpi} $ represents the fault frequency of the inner raceway, in rad/s, and $D$ represents the bearing pitch. Based on formula (11), the contact deformation of the rolling element under external forces is shown in formula (12).

(12)
$ \delta _{j} =(x_{i} -x_{o} )\cos \theta _{j} +(y_{i} -y_{o} )\sin \theta _{j} -\varepsilon . $

In formula (12), $x_{i} $ and $x_{o} $ represent the lateral displacements of the inner ring and outer ring, $y_{i} $ and $y_{o} $ represent the longitudinal displacements of the inner ring and outer ring, $\delta $ represents the contact deformation, and $\varepsilon $ represents the radial clearance inside the bearing, all in millimeters. $\theta $ represents the rotation angle of the rolling element, in rad. According to Hertz contact mechanics, the total contact force between the inner and outer rings can be calculated using formula (13) [17,18].

(13)
$ \left\{ \begin{aligned} f_{x} &= K \sum_{j=1}^{N} \xi_{j} \delta_{j}^{1.5} \cos \theta_{j}, \\ f_{y} &= K \sum_{j=1}^{N} \xi_{j} \delta_{j}^{1.5} \sin \theta_{j}. \end{aligned} \right. $

In formula (13), $f$ represents the non-equilibrium force, in N. $K$ represents the Hertz contact elasticity coefficient, in N/m${}^{3/2}$. $\xi $ represents the load zone coefficient of the rolling element, which flags whether the rolling element has entered the load zone. When the rolling element rolls into the fault zone, the radial clearance increases suddenly and the radial deformation in that direction decreases rapidly. According to the Hertz formula, the contact force depends directly on the radial deformation, so the reduced deformation lowers the Hertz force and changes the acceleration. The radial deformation in this case is shown in formula (14).

(14)
$ \delta _{j} =(x_{i} -x_{o} )\cos \theta _{j} +(y_{i} -y_{o} )\sin \theta _{j} -\varepsilon -\vartheta _{j} \Delta _{j} . $

In formula (14), $\vartheta $ indicates whether the rolling element has reached the fault groove, and $\Delta $ represents the additional gap when the rolling element reaches the fault groove, in millimeters. The rolling element entering the fault zone is shown in Fig. 6.

Fig. 6. Changes in rolling element entering fault zone.

../../Resources/ieie/IEIESPC.2025.14.3.339/image6.png

Once the additional gap created when the rolling element enters the fault groove is determined, the radial deformation follows from formula (14) and the Hertz force from formula (13). Solving the model then yields a signal spectrum that can be used to analyze the bearing state.
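The chain from formulas (12) and (14) to formula (13) can be sketched for a single time instant. Everything numeric below is hypothetical: the stiffness, displacement, clearance, fault gap, and the count of 8 rolling elements are illustrative, lengths are given in metres so that they match the unit of $K$, and the rule that the load zone coefficient is 1 only when the deformation is positive is an assumption.

```python
import math

def radial_deformation(xi, yi, xo, yo, theta, clearance, in_fault=False, fault_gap=0.0):
    """Formula (14); with in_fault=False it reduces to formula (12)."""
    flag = 1.0 if in_fault else 0.0
    return ((xi - xo) * math.cos(theta) + (yi - yo) * math.sin(theta)
            - clearance - flag * fault_gap)

def contact_forces(deformations, thetas, stiffness):
    """Formula (13), assuming the load zone coefficient is 1 only if delta > 0."""
    fx = fy = 0.0
    for delta, theta in zip(deformations, thetas):
        if delta > 0.0:  # element is compressed, so it carries load
            fx += stiffness * delta ** 1.5 * math.cos(theta)
            fy += stiffness * delta ** 1.5 * math.sin(theta)
    return fx, fy

# Hypothetical bearing state: inner ring shifted 20 micrometres along x.
n_elements = 8
thetas = [2.0 * math.pi * j / n_elements for j in range(n_elements)]
stiffness = 1.0e10   # N/m^1.5, illustrative
clearance = 2.0e-6   # m, illustrative
fault_gap = 5.0e-6   # m, illustrative

healthy = [radial_deformation(20e-6, 0.0, 0.0, 0.0, t, clearance) for t in thetas]
# Same state, but the element at theta = 0 sits in the fault groove.
faulty = [radial_deformation(20e-6, 0.0, 0.0, 0.0, t, clearance,
                             in_fault=(j == 0), fault_gap=fault_gap)
          for j, t in enumerate(thetas)]

fx_healthy, fy_healthy = contact_forces(healthy, thetas, stiffness)
fx_faulty, fy_faulty = contact_forces(faulty, thetas, stiffness)
print(fx_healthy, fx_faulty)
```

The force along x drops when the loaded element enters the groove, which is the sudden change in Hertz force described above. A full simulation would repeat this calculation inside a time integration of the equations of motion, which is outside the scope of this sketch.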

4. Analysis of Intelligent Mechanical FD Using TL-Based CNNs

This section analyzes the intelligent mechanical FD performance of the proposed model, which is trained and evaluated on the Case Western Reserve bearing dataset, a self-collected bearing dataset, and the IMS bearing dataset. Because the self-collected dataset contains no rolling element faults, a migration algorithm was used to evaluate the health status of the bearings. The experiment first compares the simulated signal obtained by the model with the actual signal, then trains the model repeatedly and visualizes its classification. Finally, comparative experiments verify the application effect of the model.

4.1 DABEGAN Model Experiment and Training Results

The hardware settings used in the experiment are as follows: the graphics card is an RTX 2060, the CPU is an i5-13400F, and the operating system is Windows 10. The experiment was implemented in Python, with TensorFlow used to build the proposed model. The training batch size was set to 32, the maximum number of iterations was 500, the initial learning rate was 0.001, and the Adam optimizer was used for parameter updating. The generator and discriminator in the model are each convolutional neural networks with a depth of 4 layers. The Case Western Reserve bearing dataset, the self-collected bearing dataset, and the IMS bearing dataset were used to train and test the model. The Case Western Reserve bearing dataset covers a variety of bearing failure modes under various load conditions, including inner ring, outer ring, and rolling element failures. Fault sizes in the dataset range from 0.007 inches to 0.040 inches, loads range from 0 to 3 HP, and rotation speeds vary from 1797 RPM to 1730 RPM. The IMS bearing dataset was obtained on a bearing test bench with four bearings, each monitored in real time by an accelerometer.
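The training settings listed above can be collected in one configuration object. This is only a sketch of how such settings might be organized: the class name, field names, and the helper function are hypothetical, while the numeric values come from the text.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Training settings reported in the text; the field names are hypothetical."""
    batch_size: int = 32
    max_iterations: int = 500
    initial_learning_rate: float = 0.001
    optimizer: str = "adam"
    network_depth: int = 4  # depth of the generator and of the discriminator

def batches_per_pass(num_samples, config):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / config.batch_size)

config = TrainingConfig()
# 1000 is a hypothetical sample count, not a figure from the study.
print(config, batches_per_pass(1000, config))
```

Freezing the dataclass keeps the reported settings from being changed by accident between repeated training runs.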

Fig. 7. Spectral comparison results between simulated and actual signals.

../../Resources/ieie/IEIESPC.2025.14.3.339/image7.png

The study analyzed the mechanical state signals with the constructed DABEGAN model, and the results are shown in Fig. 7. In Fig. 7, the vibration response amplitudes of the simulated signal and the real signal are similar in the outer ring fault, inner ring fault, and healthy modes. However, because of noise, the measured signal may be distorted relative to the true signal. Therefore, when a significant difference exists between the simulated signal and the real signal, training the fault classification algorithm directly on the simulated signal is not the best choice. This supports the approach of the proposed method, which fuses the fault features generated by the model with the real features in the collected signal for FD.
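A spectral comparison of the kind shown in Fig. 7 can be sketched with a plain discrete Fourier transform. The signals below are synthetic: the 30 Hz and 60 Hz tones, the 256 Hz sampling rate, and the noise level are hypothetical and are not bearing fault frequencies from the study.

```python
import cmath
import math
import random

def amplitude_spectrum(signal):
    """One-sided amplitude spectrum from a direct DFT (adequate for short signals)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        scale = 2.0 / n if 0 < k < n / 2 else 1.0 / n
        spectrum.append(abs(coeff) * scale)
    return spectrum

random.seed(1)
n_samples = 256  # one second at a hypothetical 256 Hz, so bin k corresponds to k Hz
simulated = [math.sin(2 * math.pi * 30 * t / n_samples)
             + 0.3 * math.sin(2 * math.pi * 60 * t / n_samples)
             for t in range(n_samples)]
# A stand-in "measured" signal: the same tones plus additive Gaussian noise.
measured = [s + random.gauss(0.0, 0.2) for s in simulated]

spec_sim = amplitude_spectrum(simulated)
spec_meas = amplitude_spectrum(measured)
peak_sim = max(range(len(spec_sim)), key=spec_sim.__getitem__)
peak_meas = max(range(len(spec_meas)), key=spec_meas.__getitem__)
print(peak_sim, peak_meas)
```

Both spectra peak at the same frequency bin, while the noise raises the floor of the measured spectrum, which mirrors the observation that amplitudes are similar but the measured signal is distorted.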

Fig. 8. Accuracy results of repeated model training.

../../Resources/ieie/IEIESPC.2025.14.3.339/image8.png

Fig. 9. Accuracy results of BEGAN.

../../Resources/ieie/IEIESPC.2025.14.3.339/image9.png

The training of adversarial generative neural networks is often affected by issues such as mode collapse and fitting failure, so the model must be trained multiple times to verify its reproducibility. The accuracy curves of repeated training are shown in Fig. 8. Fig. 8 shows some fluctuation during the adversarial training of the generator and recognizer. However, all training runs ultimately achieved high accuracy, with the lowest accuracy being 90.5% and the highest around 94.5%. The fluctuation of testing accuracy in early training arises because the proportional-integral control in the loss design had not yet stabilized, which caused large changes in the coefficients and thus in overall accuracy. The results show that the model has a certain degree of repeatability and stability, and that the network design is effective in training optimization and can overcome the difficulties of the training process. To further verify that the proposed network reduces mapping errors, a comparative experiment was conducted between the unmodified BEGAN and DABEGAN; the accuracy results of BEGAN are shown in Fig. 9.

The results in Fig. 9 show that the training accuracy of BEGAN fluctuates throughout the entire training process, with a highest accuracy of 64.9% and a lowest of 24.3%. Compared with these results, the DABEGAN model in Fig. 8 achieves higher accuracy and greater stability during training, which suggests that its design and implementation are better suited to the task. The large fluctuations and relatively low accuracy of BEGAN may be due, on the one hand, to the poor fit of the BEGAN model to the task and, on the other hand, to parameter adjustment that did not reach the optimal state during training. Finally, the study visualized the features of the different images generated by the network, and the results are shown in Fig. 10.

Fig. 10. Feature visualization results.

../../Resources/ieie/IEIESPC.2025.14.3.339/image10.png

The visualization results show changes in the distance between the healthy state, outer ring faults, and inner ring faults. Figs. 10(a)-10(e) show the generated domain, recognition generated domain, target domain, recognition target domain, and SDO, respectively. Fig. 10(c) shows the diversity of target domain samples and the difficulty of classification: the features of the healthy state are split on either side of the inner and outer ring faults. In Fig. 10(d), the coverage of the different types is significantly reduced and the distribution is more concentrated, because the generator and recognizer help the signals from the SDO migrate to the target domain. Although the features in Fig. 10(e) are completely different from the distributions in Figs. 10(c) and 10(d), the distributions in the generated domain and recognition generated domain are similar to those in the target domain and recognition target domain, respectively. The visualization shows that the features of the healthy state, outer ring fault, and inner ring fault cluster in specific regions, which indicates that the model distinguishes the different fault modes well.

4.2 FD Analysis on Bearing Datasets

This study compares the proposed method with several machine learning methods and domain adaptation methods: CNN, Support Vector Machine (SVM), Domain Discriminant Component (DCC), Domain Adversarial Neural Network (DANN), and Deep Convolutional Transformation Learning Network (DCTLN) [19,20]. CNN is a deep learning model that extracts useful features from raw sensor signals in FD and is used for classification or regression tasks. SVM is a supervised learning model from traditional machine learning that can classify fault types in FD. DCC seeks the feature transformation that maximizes the difference between two different domains, so that the transformed features are more discriminative and cross-domain classification accuracy improves. DANN reduces the difference between the source and target domains by introducing a domain classifier. DCTLN maps the data of the source and target domains through a transformation layer inside the network, so that the model extracts more general feature representations and carries out effective TL on that basis. The Case Western Reserve bearing dataset, the self-collected bearing dataset, and the IMS bearing dataset are denoted as A, B, and C, respectively. The prediction results are shown in Table 1.

Table 1. Accuracy comparison results under different situations.

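| Algorithm | A→B/% | B→A/% | A→C/% | C→A/% | B→C/% | C→B/% | Average value/% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN | 76.06 | 67.89 | 52.94 | 73.12 | 70.45 | 70.46 | 68.09 |
| SVM | 60.83 | 73.63 | 61.46 | 66.86 | 73.76 | 77.24 | 68.96 |
| DCC | 74.67 | 72.82 | 75.56 | 73.98 | 70.02 | 58.72 | 70.96 |
| DANN | 78.91 | 85.38 | 81.91 | 78.87 | 73.83 | 64.81 | 77.29 |
| DCTLN | 88.09 | 85.15 | 90.01 | 89.81 | 82.47 | 80.71 | 86.04 |
| DABEGAN | 95.00 | 93.91 | 98.07 | 88.17 | 88.75 | 99.07 | 93.83 |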

In Table 1, A $\to$ B means that the model is trained using labeled data from dataset A and unlabeled data from dataset B, and tested on dataset B. DABEGAN reached a prediction accuracy of 98.07% in the A $\to$ C scenario and 99.07% in the C $\to$ B scenario, indicating that the proposed method is extremely effective when migrating between these datasets. Comparing results for the same source dataset across different target datasets shows how well each algorithm adapts to domain changes, while the average value summarizes the performance of each algorithm across all test scenarios. DABEGAN has the highest average accuracy, 93.83%, indicating the most stable and efficient performance in cross-dataset migration. To further verify the domain adaptation ability of the algorithm on different devices, a ball screw dataset was used for validation. The screw support forms were divided into two types: fixed-floating support, recorded as dataset D, and fixed-suspended support, recorded as dataset E. The accuracy results are shown in Fig. 11.
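The average column of Table 1 is the arithmetic mean of the six transfer tasks. The sketch below recomputes it for two rows; the accuracy values are copied from Table 1, while the dictionary layout and function name are illustrative choices.

```python
TRANSFER_TASKS = ["A->B", "B->A", "A->C", "C->A", "B->C", "C->B"]

# Accuracies in percent, copied from Table 1 for two of the methods.
ACCURACY = {
    "DCTLN": [88.09, 85.15, 90.01, 89.81, 82.47, 80.71],
    "DABEGAN": [95.00, 93.91, 98.07, 88.17, 88.75, 99.07],
}

def average_accuracy(values):
    """Arithmetic mean over the transfer tasks, rounded to two decimals."""
    return round(sum(values) / len(values), 2)

averages = {name: average_accuracy(values) for name, values in ACCURACY.items()}

# Transfer task on which the proposed method scores highest.
best_accuracy, best_task = max(zip(ACCURACY["DABEGAN"], TRANSFER_TASKS))
print(averages, best_task, best_accuracy)
```

The same function can be applied to the remaining rows to recompute their reported averages.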

Fig. 11. Accuracy comparison of FD for ball screw data migration.


In Fig. 11, the proposed method reaches a highest accuracy of 89.13% and an average accuracy of 84.84%. By contrast, CNN and SVM trained only on the SDO achieved accuracies of 50.36% and 49.19%, respectively, which indicates a significant difference between the source and target domains. Although existing TL methods such as DCC, DANN, and DCTLN bring some improvement, DABEGAN achieves the highest accuracy among all algorithms. Its accuracy remains high, meaning that the model also performs well when data are migrated across different devices.

5. Conclusion

The FD of mechanical systems is crucial for ensuring their normal operation and production safety. However, because real working environments are complex and labeled data are scarce, developing diagnostic methods that accurately reflect the actual fault condition remains a challenge. This study presents an efficient FD model that uses TL and convolutional network techniques to address the difficulty of labeling and extracting mechanical fault signals. The model builds CNNs on the basis of BEGAN and DABEGAN. By fusing simulated data with real unlabeled data, TL techniques are used for feature transformation to improve generalization between different datasets. Across repeated training and validation runs, the DABEGAN model reached an accuracy of 98.07% in the A $\to$ C task and 99.07% in the C $\to$ B scenario, highlighting the advantages of the model in practical applications. Compared with the other TL methods, DABEGAN has the highest overall average accuracy, 93.83%, demonstrating better performance and robustness. In the FD accuracy test on the ball screw dataset, DABEGAN also performed well, with an average accuracy of 84.84%. These results show that the model can integrate fault features from simulated and collected signals and achieve effective feature transfer between different datasets. The current research has not fully explored model performance under extreme environmental noise or with more complex fault types. Future work will focus on improving robustness in noisy environments, extending the model to more industrial scenarios and fault types, and further improving the efficiency and accuracy of fault detection and diagnosis.

REFERENCES

[1] H. Li, H. Wang, Z. Xie, and M. He, "Fault diagnosis of railway freight car wheelset based on deep belief network and cuckoo search algorithm," Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit, vol. 236, no. 5, pp. 501-510, July 2022.
[2] X. Zhong, Q. Mei, X. Gao, and T. Haung, "Fault diagnosis of rolling bearings based on improved direct fast iterative filtering and spectral amplitude modulation," Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, vol. 236, no. 9, pp. 5111-5123, January 2022.
[3] A. Zhang, C. Shen, Q. He, F. Hu, F. Liu, and F. Kong, "Doppler distortion removal based on Dopplerlet transform and re-sampling for wayside fault diagnosis of train bearings," Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, vol. 235, no. 17, pp. 3407-3423, October 2021.
[4] Z. Mo, H. Zhang, J. Wang, H. Fu, and Q. Miao, "Adaptive Meyer wavelet filters for machinery fault diagnosis based on harmonic infinite-taxicab norm and grasshopper optimization algorithm," Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, vol. 235, no. 19, pp. 4458-4474, November 2021.
[5] J. Luo, J. Huang, J. Ma, and H. Li, "An evaluation method of conditional deep convolutional generative adversarial networks for mechanical fault diagnosis," Journal of Vibration and Control, vol. 28, no. 11/12, pp. 1379-1389, February 2022.
[6] J. Huang, Y. Liu, and Z. Liang, "Rapid evaluation of the mechanical fault severity in induction motors using the model-based diagnosis technique," IET Electric Power Applications, vol. 15, no. 3, pp. 145-158, January 2021.
[7] Z. Shang, W. Li, M. Gao, X. Liu, and Y. Yu, "An intelligent fault diagnosis method of multi-scale deep feature fusion based on information entropy," Chinese Journal of Mechanical Engineering, vol. 34, no. 1, pp. 1-16, December 2021.
[8] S. Marmouch, T. Aroui, and Y. Koubaa, "Statistical neural networks for induction machine fault diagnosis and features processing based on principal component analysis," IEEJ Transactions on Electrical and Electronic Engineering, vol. 16, no. 2, pp. 307-314, January 2021.
[9] L. A. Briceno-Mena, J. A. Romagnoli, and C. G. Arges, "PemNet: A transfer learning-based modeling approach of high-temperature polymer electrolyte membrane electrochemical systems," Industrial & Engineering Chemistry Research, vol. 61, no. 9, pp. 3350-3357, March 2022.
[10] B. Kaya and M. Nal, "A CNN transfer learning-based approach for segmentation and classification of brain stroke from noncontrast CT images," International Journal of Imaging Systems and Technology, vol. 33, no. 1, pp. 1335-1352, February 2023.
[11] J. Adams, Y. Qiu, L. Posadas, K. Eskridge, and G. Graef, "Phenotypic trait extraction of soybean plants using deep convolutional neural networks with transfer learning," Big Data & Information Analytics, vol. 6, no. 2, pp. 26-40, March 2021.
[12] I. Debicha, R. Bauwens, T. Debatty, T. Kenaza, and W. Mees, "TAD: Transfer learning-based multi-adversarial detection of evasion attacks against network intrusion detection systems," Future Generations Computer Systems: FGCS, vol. 138, no. 2, pp. 185-197, October 2023.
[13] W. Zhu, J. Zhang, and J. Romagnoli, "General feature extraction for process monitoring using transfer learning approaches," Industrial & Engineering Chemistry Research, vol. 61, no. 15, pp. 5202-5214, April 2022.
[14] A. N. Chy, U. A. Siddiqua, and M. Aono, "Exploiting transfer learning and hand-crafted features in a unified neural model for identifying actionable informative tweets," Journal of Information Processing, vol. 29, no. 1, pp. 16-29, January 2021.
[15] P. M. Rajasree, A. Jatti, and D. Santosh, "An improved transfer learning approach towards breast cancer classification on deep residual network," Indian Journal of Computer Science and Engineering, vol. 12, no. 4, pp. 1136-1148, August 2021.
[16] M. Hasanvand, M. Nooshyar, E. Moharamkhani, and A. Selyari, "Machine learning methodology for identifying vehicles using image processing," Artificial Intelligence and Applications, vol. 1, no. 3, pp. 170-178, April 2023.
[17] H. Tang and L. Notash, "Neural network-based transfer learning of manipulator inverse displacement analysis," Journal of Mechanisms and Robotics, vol. 13, no. 3, pp. 1-22, June 2021.
[18] R. Rani and H. Singh, "Fingerprint presentation attack detection using transfer learning approach," International Journal of Intelligent Information Technologies (IJIIT), vol. 17, no. 1, pp. 53-67, March 2021.
[19] X. Yuan, E. Pang, K. Lin, and J. Hu, "Deep protein subcellular localization predictor enhanced with transfer learning of GO annotation," IEEJ Transactions on Electrical and Electronic Engineering, vol. 16, no. 4, pp. 559-567, February 2021.
[20] F. G. Waldamichael, T. G. Debelee, and Y. M. Ayano, "Coffee disease detection using a robust HSV color-based segmentation and transfer learning for use on smartphones," International Journal of Intelligent Systems, vol. 37, no. 8, pp. 4967-4993, November 2021.

Author

Ying Xia

Ying Xia obtained her master's degree in control theory and control engineering from Nanjing University of Aeronautics and Astronautics in 2007. She is currently an associate professor teaching automation in the Intelligent Equipment School of Changzhou College of Information Technology. She has published 5 papers in core journals and has been a main participant in a Jiangsu Provincial Natural Science Foundation (general program) project. Her main research directions are neural networks, pattern recognition, and satellite communication.