Based on research into traditional object detection algorithms and PCNN, a new detection
algorithm is proposed. It focuses on resolving the complexity and difficulty of
PCNN parameter selection. By introducing the optimal family genetic algorithm, the
study aims to optimize the parameters of PCNN and achieve accurate recognition of
moving targets.
3.1 STM32 Embedded Motion Target Tracking and Detection Method
3.1.1 STM32 architecture
An embedded system is application-centric and built on computer technology; its
software and hardware configuration can be tailored to the application. It is equipped
with dedicated microprocessors, and its development relies on customized software,
decoders, and emulators designed for conventional computers [13]. Its hardware includes high-performance microprocessors and various interface circuits,
while its software covers real-time operating systems and application software. STM32
is a microcontroller based on the ARM Cortex-M3 core launched by STMicroelectronics, which has
the characteristics of high performance, low cost, and low power consumption [14]. The study adopts a hardware control platform based on an STM32 microprocessor, combined
with a motion target detection and tracking algorithm for interactive software design.
Real-time control of the camera is achieved through the rotating platform of the mechanical
transmission system to complete MTTD. Fig. 1(a) shows the functional modules of the system, and Fig. 1(b) is a schematic diagram of the mechanical rotation part studied.
Fig. 1. Functional module diagram of the system and schematic diagram of mechanical
rotation part.
Fig. 2. JTAG interface schematic diagram.
In Fig. 1(a), the main working modules of the control platform hardware system include the power supply,
LCD screen, OV7725 camera, JTAG interface, and motor drive control. A 3.2-inch LCD
screen with a resolution of 320×240 is selected as the display, and data are
stored using the Flexible Static Memory Controller (FSMC) of the STM32 [15]. The LCD controller converts the collected signal data into RGB format through a series
of transformations and writes it into the Graphics Random Access Memory (GRAM). In
Fig. 1(b), the entire control module is lightweight and compact, and the mechanical rotation
system operates stably. Due to the weight and installation requirements of the camera,
only one flange-type deep groove ball bearing is used for fixation in each of the horizontal
and vertical directions. Fig. 2 shows the JTAG interface.
In the JTAG interface schematic in Fig. 2, control and communication are achieved through four control lines (Test Mode Select:
TMS, Test Clock: TCK, Test Data In: TDI, Test Data Out: TDO). TCK receives the test
clock input, TDI is used for data input, TDO outputs data from the JTAG port, and
TMS sets the JTAG port to a specific test mode.
3.1.2 Introduction to existing PCNN
Unlike traditional pulse neural networks and artificial neural networks, PCNN simulates information transmission between
neurons through pulses [16,17]. Fig. 3 shows the structure of a PCNN neuron.
Fig. 3. Structure of PCNN neurons.
In Fig. 3, the pulse-coupled neuron model consists of three main parts: the receiving domain, modulation,
and pulse generation. The receiving domain consists of a linking channel and a
feedback channel, and it receives input and feedback information from
adjacent neurons. The feedback channel not only receives information from adjacent
neurons, but also receives external stimuli, as represented by Eq. (1).
In Eq. (1), the ignition information is represented by $Y_{kl} $, the time decay constant is
$a_{F} $, the external input signal is represented by $I_{ij} $, and $M$ is the connection
weight matrix. The feedback input of the $(i,j)$-th neuron is $F_{ij} [n]$. Eq. (2) gives the operational relationship of synaptic input, describing how
input is received between adjacent neurons.
In Eq. (2), $V_{L} $ represents the connection constant. The linear input of the connected neurons
of the $(i,j)$-th neuron is $L_{ij} [n]$. The internal activity is represented
by Eq. (3).
In Eq. (3), $\beta $ represents the modulation constant between neurons. Whether neurons emit
pulses depends on the relationship between internal activity $U_{ij} [n]$ and threshold
$E_{ij} [n]$, represented by Eq. (4).
In Eq. (4), the ignition output $Y_{ij} $ takes the value 1 when the neuron fires and
0 when it does not. The mathematical expression of
$E_{ij} [n]$ is given by Eq. (5).
In Eq. (5), $V_{E} $ represents the threshold constant, $a_{E} $ is the time decay constant, and $E_{ij}
$ is the dynamic threshold.
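The equation bodies are not reproduced above, but Eqs. (1)-(5) follow the standard PCNN formulation. As an illustrative sketch in Python (assuming the conventional exponential-decay form; the kernel and parameter values below are examples, not taken from this paper), one PCNN iteration over an image can be written as:

```python
import numpy as np

def neighbor_sum(Y, M):
    """Weighted sum of neighbouring pulse outputs, i.e. sum_kl M_ijkl * Y_kl.
    Periodic boundaries (np.roll) are used purely for brevity."""
    out = np.zeros_like(Y)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            w = M[di + 1, dj + 1]
            if w:
                out += w * np.roll(np.roll(Y, di, axis=0), dj, axis=1)
    return out

def pcnn_step(I, F, L, U, E, Y, a_F=0.1, a_L=0.2, a_E=0.5,
              V_F=0.5, V_L=0.2, V_E=20.0, beta=0.1):
    """One iteration of a standard PCNN; parameter values are examples only."""
    # 3x3 connection weight matrix M (reused here as the linking kernel)
    M = np.array([[0.5, 1.0, 0.5],
                  [1.0, 0.0, 1.0],
                  [0.5, 1.0, 0.5]])
    coupling = neighbor_sum(Y, M)
    F = np.exp(-a_F) * F + V_F * coupling + I   # Eq. (1): feedback channel
    L = np.exp(-a_L) * L + V_L * coupling       # Eq. (2): linking channel
    U = F * (1.0 + beta * L)                    # Eq. (3): internal activity
    Y = (U > E).astype(float)                   # Eq. (4): pulse (ignition) output
    E = np.exp(-a_E) * E + V_E * Y              # Eq. (5): dynamic threshold
    return F, L, U, E, Y

# run a few iterations on a toy image
img = np.random.rand(8, 8)
F = np.zeros_like(img); L = np.zeros_like(img)
U = np.zeros_like(img); Y = np.zeros_like(img)
E = np.ones_like(img)
for _ in range(5):
    F, L, U, E, Y = pcnn_step(img, F, L, U, E, Y)
```

Neurons whose stimulus and neighbour coupling push $U_{ij}$ above the decaying threshold $E_{ij}$ fire, and firing raises their threshold again, producing the characteristic pulse waves.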
3.1.3 OFGA-based PCNN parameter optimization
To address the slow modeling and poor detection performance of traditional algorithms,
an optimized PCNN background differential motion target detection algorithm is proposed.
It adopts the dual threshold idea to simplify PCNN and improve it, while introducing
the Optimal Family Genetic Algorithm (OFGA). Fig. 4 is a simplified flowchart of the PCNN parameter optimization algorithm based on OFGA.
Fig. 4. Flowchart of PCNN parameter optimization algorithm based on OFGA.
In Fig. 4, the PCNN parameter optimization algorithm based on OFGA first receives the image input,
then initializes the population through the OFGA system and uses the PCNN system to calculate
fitness. Traditional PCNN relies on experience for its initial parameter settings, and poor
parameter selection can affect convergence speed and performance. OFGA-PCNN
utilizes genetic algorithms to automatically optimize initial parameters, avoiding
the uncertainty of manually set parameters and quickly finding the global optimal
parameter combination, thereby accelerating convergence. Genetic algorithms have strong
global search capabilities and can find optimal solutions in complex search spaces.
Through genetic algorithms, OFGA-PCNN can avoid falling into local optima, thereby
improving the global optimal performance of the network. In addition, genetic algorithms
can efficiently explore the search space and accelerate the training process of the
network through selection, crossover, and mutation operations. The algorithm then
checks whether the termination condition is met: if so, the
image is output; otherwise, genetic operations are performed and a new population is
generated. During this process, the key parameters of PCNN are decoded to achieve
optimization. After simplification, the mathematical expression of $F_{ij} [n]$ is
represented by Eq. (6).
In Eq. (6), $I_{ij} $ represents the external input stimulus signal. The linear input $L_{ij}
[n]$ of the connected neurons of the $(i,j)$-th neuron is expressed mathematically
using Eq. (7).
In Eq. (7), $Y$ indicates whether the neuron ignites, and $V_{L} $ is the connection constant. The mathematical expression of $Y_{ij} [n]$ is
represented by Eq. (8).
In Eq. (8), $U_{ij} [n]$ represents the internal activity of the $(i,j)$-th neuron, and $E_{ij}
[n-1]$ is the threshold function, whose value is crucial. Usually, traditional
algorithms use a single fixed threshold and set $E_{ij} [n-1]$ as a constant. However,
a threshold that is too small can lead to significant noise, while a threshold that
is too large may result in incomplete motion detection results. Therefore, the dual
threshold idea is introduced into each iteration of PCNN, and an improved PCNN motion
detection algorithm is proposed. The dual threshold idea uses upper and lower thresholds
in binary threshold segmentation [18].
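As a hedged sketch of the pipeline in Fig. 4, the simplified dual-threshold PCNN and an elitist ("optimal family") genetic search over its key parameters might look as follows. The exact dual-threshold rule, the fitness function (here, pixel agreement with a reference mask, used only to make the sketch runnable), the parameter ranges, and the genetic-operator settings are all illustrative assumptions, not the paper's actual choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def neighbor_sum(Y):
    """8-neighbour sum of the pulse output (periodic boundary for brevity)."""
    out = np.zeros_like(Y)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di or dj:
                out += np.roll(np.roll(Y, di, axis=0), dj, axis=1)
    return out

def simplified_pcnn(diff, beta, E_low, E_high, V_L=0.2, n_iter=5):
    """Dual-threshold simplified PCNN on a background-difference image.
    Assumes a hysteresis-style rule: pixels above E_high fire outright,
    pixels between the thresholds fire only if coupled to firing neighbours."""
    Y = np.zeros_like(diff)
    for _ in range(n_iter):
        F = diff                          # Eq. (6): feedback = external stimulus
        L = V_L * neighbor_sum(Y)         # Eq. (7): linking from fired neighbours
        U = F * (1.0 + beta * L)          # internal activity
        support = neighbor_sum(Y) > 0
        Y = np.where(U > E_high, 1.0,
                     np.where((U > E_low) & support, 1.0, 0.0))  # Eq. (8)
    return Y

def fitness(params, diff, reference):
    """Assumed fitness: pixel agreement with a reference mask."""
    beta, E_low, E_high = params
    if E_low >= E_high:
        return 0.0
    return float(np.mean(simplified_pcnn(diff, beta, E_low, E_high) == reference))

def ofga_optimize(diff, reference, pop_size=12, generations=15):
    """Elitist genetic search over the chromosome (beta, E_low, E_high)."""
    pop = rng.uniform([0.0, 0.05, 0.3], [1.0, 0.3, 0.9], size=(pop_size, 3))
    for _ in range(generations):
        scores = np.array([fitness(p, diff, reference) for p in pop])
        new = [pop[np.argmax(scores)].copy()]        # keep the best chromosome
        while len(new) < pop_size:
            i, j = rng.integers(pop_size, size=2)    # tournament selection
            pa = pop[i] if scores[i] >= scores[j] else pop[j]
            i, j = rng.integers(pop_size, size=2)
            pb = pop[i] if scores[i] >= scores[j] else pop[j]
            child = np.where(rng.random(3) < 0.5, pa, pb)   # uniform crossover
            child = child + rng.normal(0.0, 0.02, size=3)   # Gaussian mutation
            new.append(np.clip(child, 0.01, 1.0))
        pop = np.array(new)
    scores = np.array([fitness(p, diff, reference) for p in pop])
    return pop[np.argmax(scores)]

# toy example: a bright moving region on a static background
diff = np.zeros((16, 16))
diff[5:11, 5:11] = 0.9
reference = (diff > 0.5).astype(float)
best = ofga_optimize(diff, reference)
```

The elitist copy of the best chromosome into each new generation plays the role of the "optimal family" retention described above; in the real algorithm the fitness would be computed from the segmentation itself rather than from a known reference.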
3.2 Motion Target Tracking and Detection Method Based on the Pulse-Coupled
Neural Network
The research on target tracking algorithms is committed to achieving real-time tracking
of detected moving targets. By confirming the shape, contour, or position of the moving
object in each frame of the image, target tracking calculates the direction and trajectory
of the object's motion, and provides timely feedback on the object's future motion
status. Mean Shift is a common target tracking algorithm [19]. Fig. 5 shows the Mean Shift vector.
Fig. 5. The mean shift vector.
In Fig. 5, the red dot is $x$, and the surrounding dark blue points are the sample points $x_{i} $.
Each arrow represents the offset vector of $x_{i} $ relative to $x$. The mean
offset points toward the direction in which the sample points are densest, which is the gradient
direction. Mean Shift includes two parts: target representation and target localization.
The user selects the target area and candidate window, and establishes a feature histogram
to represent the target model and candidate model. In motion target tracking, the
target model is determined, the similarity with the candidate model is calculated,
and the transfer vector is obtained. By iteratively calculating the Mean Shift vector,
the target converges to its true position, completing the update of the target position
and achieving target tracking [20]. In the feature space, the center of the target region is the origin, and the center
of the candidate region in subsequent frames is another position. Therefore, the feature
vectors of the target model are represented by Eq. (9).
In Eq. (9), $m$ represents the dimension of the feature vector. For this study, a large number
of motion target models need to be established for training. Eq. (10) shows the library of input samples.
In Eq. (10), $g_{c,n} $ represents the $n$-th action segment of class $c$, and the segment contains $m$ movements
and postures. The representation of action segment $g_{c,n} $ at the $m$-th posture is given
by Eq. (11).
After a complete action segment is projected to the output space, a ``trajectory''
will be formed in the output space and a set of index numbers containing timing information
will be obtained. Eq. (12) is the specific calculation.
In Eq. (12), $O_{c,n} $ is used to identify the index sequence of targets for each category.
According to the histogram statistics rule, a histogram of an action sequence containing
$n$ poses is represented by Eq. (13).
In Eq. (13), $f$ represents the frequency of occurrence of the $u$-th output node in the
action, and $N$ is the number of postures included in the action. The class of a new input
action is determined by matching it against known action templates using Euclidean
distance. The difference between two actions is calculated from the normalized
inner product of the feature vectors of two poses, represented by Eq. (14).
In Eq. (14), $f_{i,k} $ and $f_{j,k} $ are the $k$-th eigenvector values of postures $p_{i}
$ and $p_{j} $, respectively. $f_{k} (\max )$ is the maximum value of the $k$-th feature,
$f_{k} (\min )$ is its minimum value, and $w_{k} $ is its weight.
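Since the body of Eq. (14) is not reproduced here, the following sketch assumes a min-max-normalised, weighted inner-product form consistent with the symbols defined above ($f_{i,k}$, $f_{j,k}$, $f_k(\max)$, $f_k(\min)$, $w_k$); the exact expression in the paper may differ:

```python
import numpy as np

def pose_difference(f_i, f_j, f_min, f_max, w):
    """Weighted difference between two pose feature vectors.
    Assumes min-max normalised features combined by a weighted,
    normalised inner product; 0 means identical poses."""
    f_i_n = (f_i - f_min) / (f_max - f_min)   # normalise by f_k(min), f_k(max)
    f_j_n = (f_j - f_min) / (f_max - f_min)
    num = np.sum(w * f_i_n * f_j_n)
    den = np.sqrt(np.sum(w * f_i_n ** 2) * np.sum(w * f_j_n ** 2))
    similarity = num / den if den > 0 else 0.0
    return 1.0 - similarity

# identical poses should give (near-)zero difference
f = np.array([0.5, 0.8, 0.3])
d = pose_difference(f, f, np.zeros(3), np.ones(3), np.ones(3))
```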
The common frame is the foundation of the algorithm, used to ensure that consecutive
actions have similar poses so that smooth transitions can be achieved. For any two
pose data, Eq. (15) can be used to measure the distance between their centers of gravity.
In Eq. (15), $(x,y,z)$ denotes the coordinates of the human body's center of gravity. Real-time
tracking experiments are conducted based on the traditional Mean Shift tracking algorithm and the
optimal-family-optimized PCNN motion target detection algorithm. Fig. 6 shows the entire algorithm process.
Fig. 6. Flow chart of detection algorithm.
In Fig. 6, Mean Shift is first used to represent and locate the initial target, and feature
histograms of the target model and candidate model are established. Subsequently,
the similarity between the target and candidate models is calculated using the optimal-family-optimized
PCNN, and the transition vector is obtained. By iteratively updating the
Mean Shift vector, the target quickly and accurately converges to its true position,
thereby achieving real-time tracking and detection of moving targets.
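The Mean Shift localisation step at the core of this loop can be sketched as follows. This is a minimal grey-level version: the histogram model (the analogue of Eq. (9)), the $\sqrt{q_u/p_u}$ pixel weights classically used to maximise the Bhattacharyya coefficient, and all window sizes are illustrative assumptions, and the PCNN-based similarity stage is omitted:

```python
import numpy as np

def grey_histogram(patch, bins=8):
    """Normalised grey-level histogram as the feature model."""
    h, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    return h / max(h.sum(), 1)

def mean_shift_track(frame, q, center, half=8, bins=8, n_iter=10):
    """Locate the target model q in `frame`, starting from `center` (y, x).
    Assumes the search window stays inside the frame."""
    cy, cx = center
    for _ in range(n_iter):
        y0, y1, x0, x1 = cy - half, cy + half, cx - half, cx + half
        patch = frame[y0:y1, x0:x1]
        p = grey_histogram(patch, bins)        # candidate model
        # classic Mean Shift weight sqrt(q_u / p_u) for each pixel's bin u
        idx = np.minimum((patch * bins).astype(int), bins - 1)
        w = np.sqrt(q[idx] / np.maximum(p[idx], 1e-9))
        if w.sum() == 0:
            break
        ys, xs = np.mgrid[y0:y1, x0:x1]
        new_c = (int(round((w * ys).sum() / w.sum())),   # weighted mean = shift
                 int(round((w * xs).sum() / w.sum())))
        if new_c == (cy, cx):                  # converged to the densest position
            break
        cy, cx = new_c
    return cy, cx

# toy frame with a bright square target; the tracker should move toward it
frame = np.zeros((64, 64))
frame[36:48, 36:48] = 0.9
q = grey_histogram(frame[36:48, 36:48])        # target model
cy, cx = mean_shift_track(frame, q, (32, 32))  # start from a nearby guess
```

Each iteration moves the window centre along the mean shift vector toward the region whose histogram best matches the target model, which is the convergence behaviour described above.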