CN115471910A - Model training method and device for motion activity recognition model based on FPGA - Google Patents

Model training method and device for motion activity recognition model based on FPGA

Info

Publication number
CN115471910A
CN115471910A CN202211085380.4A
Authority
CN
China
Prior art keywords
neural network
activity recognition
convolutional neural
fpga
human activity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211085380.4A
Other languages
Chinese (zh)
Inventor
颜延
任旭超
陈宇骞
王磊
熊璟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202211085380.4A priority Critical patent/CN115471910A/en
Publication of CN115471910A publication Critical patent/CN115471910A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a model training method for an FPGA-based motion activity recognition model, a terminal device, and a computer-readable storage medium. The method comprises the following steps: acquiring a human activity recognition data set and dividing it into a human activity recognition training set and a human activity recognition test set; inputting the training set into a convolutional neural network for training, where the convolutional neural network comprises multiple groups of convolution IP cores and a fully-connected IP core; and testing the trained convolutional neural network with the test set to obtain performance evaluation information for the trained network. The method takes the FPGA as the hardware platform and, given the feasibility of FPGAs for deep-learning acceleration, studies the FPGA implementation of a human behavior recognition model; on the basis of human motion recognition with the UCI-HAR data set, a convolutional neural network with a BN-fusion structure is built for hardware implementation, which reduces memory consumption and yields very good classification results.

Description

Model training method and device for motion activity recognition model based on FPGA
Technical Field
The present application relates to the field of neural network technologies, and in particular, to a model training method for a motion activity recognition model implemented based on an FPGA, a terminal device, and a computer-readable storage medium.
Background
Daily human motion behaviors are closely related to the body's health indicators and energy balance; for example, human energy expenditure can be estimated by monitoring motion behaviors such as running and walking, which has positive significance for healthy exercise and bodily energy balance. In addition, people in dangerous situations can be rescued promptly and effectively by recognizing abnormal motion behaviors such as falls. However, the premise of this work is developing a user-oriented, real-time, portable, and miniaturized device that supports the technical implementation of algorithms such as neural-network feature extraction and classifiers.
The increasing popularity and wide acceptance of smartphones, together with the large number of embedded sensors they carry, have opened the way to using mobile phones as a data-acquisition means. By collecting the signals of mobile sensors such as the accelerometer and gyroscope, human motion analysis can be performed from the body's motion acceleration and angular rotation speed. Beyond data acquisition, recent advances in edge intelligence (EdgeAI) have introduced another interesting perspective for developing self-contained artificial-intelligence devices. EdgeAI provides on-demand, real-time prediction with low latency using a pre-trained model on a smartphone, rather than relying on cloud deployment of the trained model. Such advances provide an attractive ecosystem for modeling Human Activity Recognition (HAR), quickly and accurately personalizing an individual's activity patterns over time; HAR can then be integrated into the development pipeline of systems for video surveillance, patient rehabilitation, entertainment, and smart homes.
As a semi-custom circuit and a form of edge computing, the FPGA has been widely used in signal processing. It overcomes the drawback that custom circuits in the application-specific integrated circuit (ASIC) field are not programmable, and its large number of logic cells makes programmable devices more flexible and more widely applicable. Based on these characteristics, many researchers at home and abroad have explored in depth how to exploit the high efficiency and low power consumption of FPGAs to improve the real-time performance and energy efficiency of motion-signal-processing algorithms.
Human Activity Recognition (HAR) has several important applications, including medical monitoring, security and surveillance, assisted living, smart homes, and video search and indexing. Despite recent advances in this area, significant challenges remain: practical applications, from geriatric care to microsurgical devices, demand very high accuracy. Deep learning models can achieve the highest accuracy, but they are not easily deployed in handheld or wearable devices, where resources are very limited.
At present, the field of human motion behavior recognition has produced many advanced learning-algorithm models with excellent performance, such as CNNs and GNNs, but these models have many layers and high model complexity, and their computation load is large; the volume of outputs and weight parameters generated by the intermediate layers grows dramatically, bringing the problems of high hardware requirements, difficult real-time processing, and poor portability. In addition, the prior art uses Vivado HLS to complete FPGA implementations of neural network models, but such work is limited to hardware-acceleration research and is not concretely designed around a practical application background.
Disclosure of Invention
The application provides a model training method of a motion activity recognition model based on FPGA, terminal equipment and a computer readable storage medium.
In order to solve the technical problem, the present application provides a model training method for a motion activity recognition model based on an FPGA, the model training method comprising:
acquiring a human activity recognition data set, and dividing a human activity recognition training set and a human activity recognition testing set from the human activity recognition data set;
inputting the human activity recognition training set into a convolutional neural network for training, wherein the convolutional neural network comprises a plurality of groups of convolutional IP cores and a fully-connected IP core;
and testing the trained convolutional neural network by using the human activity recognition test set to obtain the performance evaluation information of the trained convolutional neural network.
Wherein the convolution IP core comprises a convolution layer, a normalization layer and an activation function.
The detailed operation process of the convolution IP core is as follows:

$$y_i^{l,j} = f\left(\sum_{k=0}^{m-1} w_k^{l,j}\, x_{i+k}^{l-1} + b^{l,j}\right)$$

where $w^{l,j}$ is the $j$-th convolution kernel (of size $m$) in the $l$-th convolutional layer and $b^{l,j}$ its bias; the size of each output matrix is $N - m + 1$ for an input of size $N$. Here $l$ denotes the $l$-th convolutional layer, $i$ indexes a value of the convolution output matrix, $j$ denotes the index of the corresponding output matrix, numbered in order from left to right over the number of convolution output matrices, and $f$ denotes the nonlinear activation function.
The detailed operation process of the normalization layer is as follows:

$$y_i = \gamma \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

where $\mu$ is the mean within a batch, $\sigma^2$ is the variance within a batch, $\epsilon$ is a preset constant, and $\gamma$ and $\beta$ are learnable parameters that, like the parameters of the convolution kernels, are learned by gradient descent during training.
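The training-mode computation of the normalization layer described above can be sketched as follows; this is a minimal NumPy illustration, where the value of `eps` and the toy batch are assumptions for demonstration only:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization over the batch axis (axis 0)."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize to ~zero mean, unit variance
    return gamma * x_hat + beta             # scale and shift with learnable parameters

# Toy batch: 4 samples, 3 features; gamma = 1, beta = 0 leaves the normalized values
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0],
              [3.0, 4.0, 5.0],
              [4.0, 5.0, 6.0]])
y = batch_norm_train(x, gamma=np.ones(3), beta=np.zeros(3))
```

With the identity scale and shift, each output column has mean 0 and standard deviation close to 1 (slightly below 1 because of `eps`).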
The fully-connected IP core is arranged at the output end of the convolutional neural network and is used for realizing the final classified output of the convolutional neural network.
After the human activity recognition training set is input into a convolutional neural network for training, the model training method further comprises:
and storing the weight parameters and the bias parameters of the trained convolutional neural network into a preset file format, storing the preset file format into an initialized memory card, and inputting the preset file format into a test platform for testing.
Wherein, the testing the trained convolutional neural network by using the human activity recognition test set to obtain the performance evaluation information of the trained convolutional neural network comprises:
transmitting the human activity recognition test set to the test platform so that the test platform tests a convolutional neural network consisting of weight parameters and bias parameters in the memory card according to the human activity recognition test set;
obtaining a classification result returned by the convolutional neural network on the test platform;
calculating performance evaluation information of the convolutional neural network based on the classification result.
Wherein the type of the performance evaluation information comprises one or more index types of average accuracy, recall ratio, precision ratio and F1 value.
In order to solve the technical problem, the present application provides a terminal device, where the terminal device includes a processor and a memory connected to the processor, where the memory stores program instructions;
the processor is configured to execute the program instructions stored in the memory to implement the model training method for the motion activity recognition model implemented based on the FPGA as described above.
In order to solve the technical problem, the present application provides a computer-readable storage medium, where the storage medium stores program instructions, and the program instructions, when executed, implement the above method for model training of a motion activity recognition model implemented based on an FPGA.
Compared with the prior art, the beneficial effects of this application are as follows: the terminal device acquires a human activity recognition data set and divides it into a human activity recognition training set and a human activity recognition test set; inputs the training set into a convolutional neural network for training, where the convolutional neural network comprises multiple groups of convolution IP cores and a fully-connected IP core; and tests the trained convolutional neural network with the test set to obtain performance evaluation information for the trained network. The method takes the FPGA as the hardware platform and, given the feasibility of FPGAs for deep-learning acceleration, studies the FPGA implementation of a human behavior recognition model; on the basis of human motion recognition with the UCI-HAR data set, a convolutional neural network with a BN-fusion structure is built for hardware implementation, reducing memory consumption and obtaining very good classification results.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort. Wherein:
FIG. 1 is a schematic flowchart of an embodiment of a model training method for an FPGA-based motion activity recognition model provided in the present application;
FIG. 2 is a general framework diagram of a design scheme of software and hardware co-design provided in the present application;
FIG. 3 is a schematic overall flow chart of a model training method for an FPGA-based motion activity recognition model provided in the present application;
FIG. 4 is a block diagram of an embodiment of a convolutional neural network provided herein;
FIG. 5 is a schematic diagram of a Vivado HLS design flow provided herein;
FIG. 6 is a block diagram illustrating an embodiment of a convolutional layer IP core provided herein;
FIG. 7 is a block diagram illustrating an embodiment of a fully connected layer IP core provided herein;
FIG. 8 is a schematic diagram of the PC-side training test results and the test results returned through the serial port, as provided in the present application;
FIG. 9 is a block diagram of an embodiment of a terminal device provided herein;
FIG. 10 is a schematic structural diagram of an embodiment of a computer storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to herein as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
To address the problems in the prior art, a convolutional neural network structure with strong generalization is proposed. Hardware acceleration is achieved through optimization directives such as parallelization and array partitioning, with design tools such as Vivado High-Level Synthesis (HLS); the FPGA is used as the hardware platform, and, given the feasibility of FPGAs for deep-learning acceleration, the FPGA implementation of a human motion behavior recognition algorithm model is studied.
In the application, an FPGA (Field Programmable Gate Array) is used as a hardware platform, and in view of the feasibility of the FPGA in the aspect of deep learning acceleration, a convolutional neural network model is provided for researching the FPGA implementation of human motion behavior recognition. The system adopts a software and hardware collaborative design method, and carries out classification verification on six types of actions of walking, going upstairs, going downstairs, sitting still, standing and lying through a UCI-HAR data set, and can be used for judging the activity state of a subject. The technical category mainly comprises signal processing, deep learning and FPGA design, and is a machine learning classification and hardware acceleration problem based on signals.
Specifically, referring to fig. 1 to 3, fig. 1 is a schematic flowchart of an embodiment of a model training method for a motion activity recognition model implemented based on an FPGA according to the present application, fig. 2 is a schematic diagram of a general framework of a design scheme of software and hardware collaborative design according to the present application, and fig. 3 is a schematic diagram of an overall flowchart of the model training method for the motion activity recognition model implemented based on the FPGA according to the present application.
As shown in fig. 1, the model training method for an exercise activity recognition model implemented based on an FPGA of this embodiment specifically includes the following steps:
step S11: a human activity recognition data set is acquired and a human activity recognition training set and a human activity recognition test set are partitioned from the human activity recognition data set.
In the embodiment of the application, the terminal device, i.e. the PC side in FIG. 2, acquires the UCI-HAR human activity recognition data set as the source data for model training. The UCI-HAR data set is an activity recognition data set based on sensor data collected with smartphones; it was created in 2012 by an experimental team from the University of Genoa, Italy.

The UCI-HAR human activity recognition data set was collected from 30 volunteers aged between 19 and 48, who strapped a smartphone to the waist, performed one of six standard activities, and recorded the motion data through purpose-built phone software. Each volunteer was simultaneously videotaped while performing the activities, and the motion categories were labeled manually afterwards from the videos and the sensor data, much as sound and picture are synchronized when editing video. Using the phone's embedded accelerometer and gyroscope, 3-axis linear acceleration and 3-axis angular velocity were captured at a constant rate of 50 Hz. The six activities performed are: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, and Lying.
The terminal device may also pre-process data in the raw UCI-HAR human activity recognition dataset.
In one specific embodiment, the sensor signals in the data set are pre-processed with a noise filter and then sampled in fixed-width sliding windows of 2.56 seconds (128 readings per window) with 50% overlap; a 9 x 128 feature map is constructed from each window by sampling the data in nine dimensions. The data are randomly divided into two groups: 70% are selected to generate training data and 30% are used as test data.
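The windowing step above can be sketched as follows; this is a minimal NumPy illustration assuming a (channels, samples) signal layout, with the toy 10-second recording an assumption for demonstration:

```python
import numpy as np

def sliding_windows(signal, window=128, overlap=0.5):
    """Segment a (channels, samples) signal into fixed-width windows.

    At a 50 Hz sampling rate, 128 samples span 2.56 s; 50% overlap means
    each window starts 64 samples after the previous one.
    """
    step = int(window * (1 - overlap))
    n_channels, n_samples = signal.shape
    windows = [signal[:, start:start + window]
               for start in range(0, n_samples - window + 1, step)]
    return np.stack(windows)  # shape: (n_windows, channels, window)

# Hypothetical 9-channel recording of 10 seconds at 50 Hz (500 samples)
sig = np.random.randn(9, 500)
feats = sliding_windows(sig)
print(feats.shape)  # (6, 9, 128): windows start at samples 0, 64, ..., 320
```

Each resulting 9 x 128 slice corresponds to one feature map fed to the network.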
Step S12: and inputting the human activity recognition training set into a convolutional neural network for training, wherein the convolutional neural network comprises a plurality of groups of convolutional IP kernels and a fully-connected IP kernel.
In the embodiment of the present application, a training set is put into a convolutional neural network structure for training, the convolutional neural network structure adopted in the present invention is shown in fig. 4, and fig. 4 is a schematic diagram of a framework of an embodiment of the convolutional neural network provided in the present application.
As shown in fig. 4, the convolutional neural network of the present application includes a plurality of sets of convolutional IP kernels and fully-connected IP kernels, wherein each set of convolutional IP kernels includes a convolutional layer, a normalization layer and an activation function, and the fully-connected IP kernels are disposed at an output end of the convolutional neural network and are used for realizing final classification output of the convolutional neural network.
Specifically, batch normalization (BN) is widely used in convolutional neural networks because it accelerates neural network training, makes training more stable, and has a certain regularization effect. During training, the BN layer computes:

$$y_i = \gamma \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

where $\mu$ is the mean within a batch, $\sigma^2$ is the variance within a batch, $\epsilon$ is a preset constant, e.g., 0.001, and $\gamma$ and $\beta$ are learnable parameters learned by gradient descent, like the parameters of the other convolution kernels.

During Python training, a BN layer is added directly after each convolutional layer. After training, however, at the hardware design stage the BN layer can generally be fully fused into the preceding convolutional layer without affecting performance at all. In the embodiment of the application, the BN layers are simply removed from the network; the original weights and biases of the convolutional layer and the four BN parameters (mean $\mu$, variance $\sigma^2$, $\gamma$, $\beta$) are read, and the two layers are fused into a single Conv + BN + ReLU block (designed as the Conv2d user IP core with Vivado HLS).
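The BN fusion described above can be sketched as follows. This is a minimal NumPy illustration, not the patent's Vivado HLS code; it assumes per-output-channel BN statistics and a weight tensor whose first axis is the output channel:

```python
import numpy as np

def fuse_bn_into_conv(w, b, mu, var, gamma, beta, eps=1e-3):
    """Fold a trained BN layer into the preceding convolution.

    w: conv weights with shape (out_channels, ...); b: bias (out_channels,)
    mu, var, gamma, beta: per-output-channel BN statistics/parameters.
    Returns (w', b') such that conv(x, w') + b' == BN(conv(x, w) + b).
    """
    scale = gamma / np.sqrt(var + eps)                  # per-channel multiplier
    w_fused = w * scale.reshape(-1, *([1] * (w.ndim - 1)))  # scale each filter
    b_fused = (b - mu) * scale + beta                   # absorb mean shift into bias
    return w_fused, b_fused
```

Because BN at inference time is an affine per-channel transform, folding it into the convolution removes a whole layer from the hardware datapath, which is exactly why it reduces memory consumption on the FPGA.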
Finally, the weight parameters (w) and bias parameters (b) of the convolutional layers in the convolutional neural network model structure obtained by training in the Python environment are saved to a bin file, and the bin file is stored on the initialized SD card.
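A hedged sketch of this export step follows. The raw little-endian float32 layout, the file name, and the layer shapes are assumptions for illustration, not the patent's actual bin format:

```python
import numpy as np

def export_params_to_bin(params, path):
    """Write layer parameters as raw float32 so the FPGA side can read
    them back as a flat memory image (assumed little-endian layout)."""
    with open(path, "wb") as f:
        for name, tensor in params:
            arr = np.asarray(tensor, dtype=np.float32)
            f.write(arr.tobytes())  # flattened, in declaration order

# Hypothetical parameter list: (name, array) pairs for one conv layer
params = [
    ("conv1.w", np.random.randn(16, 9, 5).astype(np.float32)),  # 16 filters, 9 ch, size 5
    ("conv1.b", np.zeros(16, dtype=np.float32)),
]
export_params_to_bin(params, "weights.bin")
```

The consuming side must of course agree on the same layer order and shapes, since the file carries no metadata.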
In a specific embodiment, the terminal device may use Vivado HLS to complete the above-described algorithm IP core design.
As shown in fig. 5, fig. 5 is a schematic diagram of the Vivado HLS design flow provided by the present application. In fig. 5, the Vivado HLS design flow is: first, compile, execute (simulate), and debug the C algorithm; then synthesize the C algorithm into an RTL (Register Transfer Level) implementation, optionally using user optimization directives; generate a synthesis report and analyze the design; verify the RTL implementation by co-simulation; and finally package the RTL implementation into multiple IP formats.
The design of the two algorithm IP cores to be designed in the present application is described below:
1) conv2d convolution IP core
The user IP core finally generated by the algorithm design of the convolutional layer is shown in fig. 6, where fig. 6 is a schematic diagram of a framework of an embodiment of the convolutional layer IP core provided in the present application.
As the most important layer in a CNN, the convolutional layer extracts hidden feature information from the image through the convolution operation; compared with traditional neural networks, the CNN uses the two core ideas of sparse interaction and weight sharing. The convolutional neural network performs the convolution with local filters: a local submatrix of the input is taken and its inner product with the local filter is computed, and the result is used as the value of the corresponding element of the convolution output matrix. The detailed operation is shown in the following formula:

$$y_i^{l,j} = f\left(\sum_{k=0}^{m-1} w_k^{l,j}\, x_{i+k}^{l-1} + b^{l,j}\right)$$

where $w^{l,j}$ is the $j$-th convolution kernel (of size $m$) in the $l$-th convolutional layer and $b^{l,j}$ its bias; the size of each output matrix is $N - m + 1$ for an input of size $N$. Here $l$ denotes the $l$-th convolutional layer, $i$ indexes a value of the convolution output matrix, $j$ denotes the index of the corresponding output matrix, numbered in order from left to right over the number of convolution output matrices, and $f$ denotes the nonlinear activation function. In the embodiment of the present application, the ReLU function may be adopted as the nonlinear activation function.
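The local-filter inner-product operation described above can be sketched in a minimal NumPy example (1D case with ReLU activation; the toy signal and kernel are assumptions for demonstration):

```python
import numpy as np

def conv1d_layer(x, w, b, f=lambda v: np.maximum(v, 0.0)):
    """Valid 1D convolution followed by an activation f (ReLU by default).

    x: input of length N; w: kernel of length m; b: scalar bias.
    The output length is N - m + 1.
    """
    N, m = len(x), len(w)
    out = np.empty(N - m + 1)
    for i in range(N - m + 1):
        out[i] = np.dot(x[i:i + m], w) + b  # inner product with a local window
    return f(out)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])
y = conv1d_layer(x, w, b=0.0)
print(y)  # [0. 0. 0.]: each window gives x[i] - x[i+2] = -2, clipped by ReLU
```

The same sliding-window structure, extended to two dimensions and multiple channels, is what the Conv2d IP core implements in hardware.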
2) FC full-connection IP core
The user IP core finally generated by the algorithm design of the full connection layer is shown in fig. 7, where fig. 7 is a schematic diagram of a framework of an embodiment of the full connection layer IP core provided in the present application.
The fully-connected operation is a special convolution operation: each node of the fully-connected layer is connected to all nodes of the previous layer, integrating the extracted features. Because of this full connectivity, fully-connected layers generally carry the most parameters; to reduce the parameter count, the hardware implementation of the neural network model keeps as few fully-connected operations as possible, and the fully-connected layer is used only for the final classification output.
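A minimal NumPy sketch of using the fully-connected layer only for the final classification output follows; the flattened feature size (64) and the softmax readout are assumptions for illustration, since the source only specifies six output classes:

```python
import numpy as np

def fully_connected(x, w, b):
    """Dense layer: every output node connects to every input node."""
    return w @ x + b

def softmax(z):
    """Convert class scores into probabilities for the final classification."""
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

# Hypothetical sizes: 64 flattened features -> 6 activity classes
rng = np.random.default_rng(0)
x = rng.standard_normal(64)          # flattened conv features
w = rng.standard_normal((6, 64))     # 6 x 64 weight matrix: the bulk of FC parameters
b = np.zeros(6)
probs = softmax(fully_connected(x, w, b))
```

The 6 x 64 weight matrix illustrates why dense layers dominate the parameter budget, which motivates restricting them to the classifier head on the FPGA.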
Step S13: and testing the trained convolutional neural network by using the human activity recognition test set to obtain the performance evaluation information of the trained convolutional neural network.
In the embodiment of the application, after model training is finished, the classification performance of the model is verified with the data in the test set. The experimental results show that the average accuracy of the convolutional neural network model is 99.92% on the training set and 96.27% on the test set.
The above describes a concrete hardware implementation of a motion behavior recognition system. The system is based on the ARM + FPGA hardware structure of the ZYNQ-7020 platform and adopts a software-and-hardware co-simulation design method to realize a six-class classification algorithm for common human behavior actions. On the hardware side, the convolutional-layer and fully-connected-layer IP cores of the convolutional neural network are first implemented with the Vivado HLS 2019.2 tool; a Block Design is then created in the Vivado 2019.2 tool to configure and connect the IP cores and build the hardware platform; the motion behavior recognition system is realized on the FPGA development board through software and hardware co-design; and finally the FPGA design and implementation of the algorithm are tested and analyzed.
The system is divided overall into a PC side and an FPGA platform (the Zynq chip of the Zynq-7020 core board is the XC7Z020CLG400-2, with 85K PL logic cells and 4.9 Mbit of BRAM storage resources). The PC side, on the one hand, performs serial-port data transmission with the FPGA platform; on the other hand, the convolutional neural network weight parameters trained on the PC side must be imported into the FPGA platform. The PS part of the FPGA platform is mainly responsible for controlling and driving each interface, while the PL part is responsible for accelerating the convolutional neural network computation. Finally, the computation result is returned to the PC side through the serial port for display. The general block diagram of the design scheme is shown in FIG. 2.
The ARM + FPGA hardware structure not only exploits the advantage of FPGA logic in processing large amounts of data at high speed but also combines the flexibility of ARM software programming; the main flow chart is shown in FIG. 3.
Specifically, the PC side transmits the human activity recognition test set to the FPGA platform through a UART serial port, and the SD card storing the weight parameters and bias parameters of the trained convolutional neural network is inserted into the FPGA platform. The FPGA platform downloads the program, loads the convolutional neural network, and tests the human activity recognition test set to obtain the test result, i.e., the classification result of the convolutional neural network. The FPGA platform then returns the test result to the PC side through the serial port, and the PC side evaluates the performance of the convolutional neural network.
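The "preset file format" for the weight and bias parameters is not specified in the text. As an illustrative sketch only (the length-prefixed little-endian float32 layout and all names below are assumptions, not the patent's format), the trained parameters could be flattened and written to a binary file for the SD card, then read back on the PS side:

```python
import struct

def save_params(path, params):
    """Write parameter arrays as little-endian float32 (assumed layout):
    a count of arrays, then each array as a length prefix plus its values."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(params)))
        for arr in params:
            f.write(struct.pack("<I", len(arr)))
            f.write(struct.pack(f"<{len(arr)}f", *arr))

def load_params(path):
    """Read back the arrays written by save_params."""
    with open(path, "rb") as f:
        n_arrays = struct.unpack("<I", f.read(4))[0]
        params = []
        for _ in range(n_arrays):
            length = struct.unpack("<I", f.read(4))[0]
            params.append(list(struct.unpack(f"<{length}f", f.read(4 * length))))
        return params
```

A fixed, self-describing binary layout like this keeps the loader on the embedded side trivial: it reads the array count and then each length-prefixed array, with no parser dependencies.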
In the embodiment of the application, in addition to the average accuracy, three indices, Recall, Precision, and the F1 value (F1-score), are selected for performance evaluation of the algorithm. For each activity category in the dataset, the predictions of the model are compared with the ground-truth labels to count the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The overall accuracy ACC equals:
ACC = (TP + TN) / (TP + TN + FP + FN)
and the Precision and Recall for a given category can be calculated by the following equations:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)
The F1-score is a balanced combination of precision and recall, calculated as:

F1 = 2 × Precision × Recall / (Precision + Recall)
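The four formulas above can be sketched directly from the confusion counts (the function and variable names below are illustrative, not from the original):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from the four confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, precision, recall, f1
```

For example, applying the F1 formula to the Walking row of the table below (Precision 98.61, Recall 100.00) gives 2 × 98.61 × 100 / (98.61 + 100) ≈ 99.30, matching the tabulated F1-score.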
In a specific embodiment, the performance evaluation of the convolutional neural network obtained by training with the model training method provided by the present application is shown in the following table:
Table: performance evaluation of the motion behavior recognition classification model

Motion behavior type    Precision [%]   Recall [%]   F1-score [%]
Walking                     98.61         100.00         99.30
Walking Upstairs            95.83         100.00         97.87
Walking Downstairs         100.00          96.67         98.31
Sitting                     95.00          86.36         90.48
Standing                    88.16          94.37         91.16
Laying                     100.00         100.00        100.00
In the embodiment of the application, the terminal device trains a motion behavior recognition classification model based on a convolutional neural network. The nine channels of filtered acceleration and gyroscope angular velocity signals in the UCI-HAR action recognition database are arranged into feature matrices of size 9 x 128, the proposed convolutional neural network model is trained on this motion-signal feature matrix data, and the model is finally verified on the test set, where the average classification accuracy is 96.27%. While ensuring low latency, resource utilization is optimized to fit the given FPGA. In addition, the present application optimizes communication between hardware and software components by selecting the most suitable AXI4 protocol. As a result, the performance of the system is significantly improved compared with a pure ARM software implementation, and the real-time constraints are met while limited hardware resources are used efficiently. A motion behavior recognition system based on the FPGA is thus realized: using the generated high-performance algorithm-module IP cores and a software and hardware co-simulation design, a motion behavior recognition system with high classification accuracy, high operation speed, and low power consumption is finally realized on the Xilinx Zynq-7020 development platform.
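The 9 x 128 feature matrices come from sliding fixed-length windows over the nine filtered signal channels. A minimal pure-Python sketch of this preprocessing (the function name is illustrative; the 50% overlap via step = 64 matches how the UCI-HAR windows were published, though the patent does not state the step):

```python
def make_feature_matrices(signals, window=128, step=64):
    """Slice multi-channel motion signals into (channels x window) matrices.

    `signals` is a list of equal-length per-channel sample lists (nine in the
    UCI-HAR case); windows advance by `step` samples, and step = window // 2
    reproduces a 50% overlap between consecutive windows.
    """
    n_samples = len(signals[0])
    assert all(len(ch) == n_samples for ch in signals), "ragged channels"
    return [[ch[start:start + window] for ch in signals]
            for start in range(0, n_samples - window + 1, step)]
```

Each returned matrix is one 9 x 128 training or test sample for the convolutional neural network.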
The performance of small embedded processors is limited by cost, power consumption, and size, and hardware acceleration typically improves the performance of these systems-on-chip (SoC); therefore, today's SoCs (e.g., Xilinx Zynq) combine an ARM processor with an FPGA on one chip. This combines the advantages of both domains: certain parts of the system (e.g., the operating system) are better implemented in software, while tasks with high parallelism are, for performance reasons, better implemented in hardware. Hardware can be synthesized from C and C++ code with design tools such as Vivado High-Level Synthesis (HLS); although this allows hardware acceleration to be realized quickly, design optimization must still be performed to meet resource and performance constraints. Moreover, developing with Vivado HLS avoids the long development cycle, high cost, high difficulty, and hard debugging that come with development in a hardware description language such as Verilog or VHDL. The main beneficial effects are as follows:
1. On the basis of human motion recognition with UCI-HAR, a convolutional neural network with a BN-fusion structure is built for the hardware implementation, which reduces memory consumption while achieving a very good classification result.
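BN fusion folds the batch-normalization parameters of each output channel into the preceding convolution's weights and bias, so the FPGA stores and executes a single fused layer instead of a separate normalization stage. A minimal per-channel sketch under the standard fusion identity (names are illustrative; the patent does not give this code):

```python
import math

def fuse_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold one channel's BN parameters into its conv weights and bias:
    w' = gamma * w / sqrt(var + eps)
    b' = gamma * (b - mean) / sqrt(var + eps) + beta
    """
    scale = gamma / math.sqrt(var + eps)
    fused_w = [w * scale for w in weights]
    fused_b = (bias - mean) * scale + beta
    return fused_w, fused_b
```

Because the fused layer's output equals convolution followed by BN exactly (up to floating-point rounding), the normalization costs nothing extra at inference time, which is the memory and computation saving referred to above.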
2. The FPGA is used as the hardware platform; in view of the feasibility of FPGAs for deep learning acceleration, the FPGA implementation of the human behavior recognition model is studied.
3. While ensuring low latency, resource utilization is optimized to fit the given FPGA. In addition, the present application optimizes communication between hardware and software components by selecting the most suitable protocol. As a result, the performance of the system is greatly improved compared with a pure ARM software implementation, and the real-time constraints are met while limited hardware resources are used efficiently. Finally, the hardware IP cores are integrated into a complex embedded system.
The method uses the UCI-HAR dataset to verify the algorithm. The UCI repository is a commonly used collection of standard machine learning benchmark datasets, maintained as a database for machine learning by the University of California, Irvine. UCI datasets are widely adopted for testing machine learning algorithms precisely because they are standard: a newly written machine learning program can be tested on a UCI dataset, and the comparison with similar machine learning algorithms is then more convincing.
The specific PC-side training results and the test results returned through the serial port are shown in FIG. 8. Although some percentage points of accuracy are lost due to the fixed-point quantization on the hardware, the overall system performance is not affected.
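The accuracy loss from fixed-point quantization can be illustrated with a simple rounding model (the 16-bit width and 8 fractional bits below are assumptions; the patent does not state the hardware's fixed-point format):

```python
def quantize(x, frac_bits=8, total_bits=16):
    """Round x to signed fixed point with `frac_bits` fractional bits,
    saturating to the range representable in `total_bits` bits."""
    scale = 1 << frac_bits
    q = round(x * scale)
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q)) / scale
```

Each weight or activation then deviates from its floating-point value by at most half a least significant bit (here 1/512), and it is these small per-value errors that accumulate into the slight accuracy drop observed on hardware.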
In order to implement the model training method for the motion activity recognition model implemented based on an FPGA in the foregoing embodiments, the present application further provides another terminal device 300. Referring specifically to FIG. 9, the terminal device 300 in this embodiment includes a processor 31, a memory 32, an input/output device 33, and a bus 34.
The processor 31, the memory 32, and the input/output device 33 are each connected to the bus 34; the memory 32 stores program data, and the processor 31 is configured to execute the program data to implement the model training method for a motion activity recognition model implemented based on an FPGA according to the above embodiments.
In the embodiment of the present application, the processor 31 may also be referred to as a CPU (Central Processing Unit). The processor 31 may be an integrated circuit chip with signal processing capability. The processor 31 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The general-purpose processor may be a microprocessor, or the processor 31 may be any conventional processor or the like.
Referring to FIG. 10, FIG. 10 is a schematic structural diagram of an embodiment of a computer storage medium provided in the present application. Program data 41 is stored in the computer storage medium 400, and when the program data 41 is executed by a processor, the model training method for a motion activity recognition model implemented based on an FPGA described above is implemented.
Embodiments of the present application may be implemented as software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above description is only of embodiments of the present application and is not intended to limit the scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present application.

Claims (10)

1. A model training method for a motion activity recognition model implemented based on an FPGA, characterized by comprising the following steps:
acquiring a human activity recognition data set, and dividing a human activity recognition training set and a human activity recognition testing set from the human activity recognition data set;
inputting the human activity recognition training set into a convolutional neural network for training, wherein the convolutional neural network comprises a plurality of groups of convolutional IP cores and a fully-connected IP core;
and testing the trained convolutional neural network by using the human activity recognition test set to obtain the performance evaluation information of the trained convolutional neural network.
2. The model training method according to claim 1, wherein the convolutional IP core comprises a convolutional layer, a normalization layer, and an activation function.
3. The model training method according to claim 2, wherein the detailed operation of the convolutional IP core is as follows:

a_{i,j}^l = f( Σ_{k=0}^{m−1} w_{j,k}^l · a_{i+k}^{l−1} + b_j^l )

where the size of each output matrix is N − m + 1, with N the input length and m the convolution kernel size; l denotes the l-th convolutional layer, i indexes the values within a convolution output matrix, j denotes the index of the corresponding output matrix, numbered from left to right over the convolution output matrices, and f denotes the nonlinear activation function.
4. The model training method according to claim 2, wherein the detailed operation of the normalization layer is as follows:

y_i = γ · (x_i − μ) / √(σ² + ε) + β

where μ is the mean over one batch, σ² is the variance over the batch, ε is a preset constant, and γ and β are both learnable parameters which, like the other convolution kernel parameters, are learned through gradient descent during training.
5. The model training method according to claim 1, wherein the fully-connected IP core is arranged at the output end of the convolutional neural network and is used to produce the final classification output of the convolutional neural network.
6. The model training method according to claim 1, wherein, after the human activity recognition training set is input into the convolutional neural network for training, the model training method further comprises:
storing the weight parameters and the bias parameters of the trained convolutional neural network in a preset file format, saving the resulting file to an initialized memory card, and inputting it into a test platform for testing.
7. The model training method according to claim 6, wherein the testing of the trained convolutional neural network with the human activity recognition test set to obtain the performance evaluation information of the trained convolutional neural network comprises the following steps:
transmitting the human activity recognition test set to the test platform so that the test platform tests a convolutional neural network consisting of weight parameters and bias parameters in the memory card according to the human activity recognition test set;
obtaining a classification result returned by the convolutional neural network on the test platform;
calculating performance evaluation information of the convolutional neural network based on the classification result.
8. The model training method according to claim 7, wherein the types of the performance evaluation information comprise one or more of the following index types: average accuracy, recall, precision, and F1 value.
9. A terminal device, characterized in that the terminal device comprises a processor, a memory connected to the processor, wherein,
the memory stores program instructions;
the processor is configured to execute the program instructions stored in the memory to implement the model training method for a motion activity recognition model implemented based on an FPGA according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the storage medium stores program instructions that, when executed, implement the model training method for a motion activity recognition model implemented based on an FPGA according to any one of claims 1 to 8.
CN202211085380.4A 2022-09-06 2022-09-06 Model training method and device for motion activity recognition model based on FPGA Pending CN115471910A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211085380.4A CN115471910A (en) 2022-09-06 2022-09-06 Model training method and device for motion activity recognition model based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211085380.4A CN115471910A (en) 2022-09-06 2022-09-06 Model training method and device for motion activity recognition model based on FPGA

Publications (1)

Publication Number Publication Date
CN115471910A true CN115471910A (en) 2022-12-13

Family

ID=84368972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211085380.4A Pending CN115471910A (en) 2022-09-06 2022-09-06 Model training method and device for motion activity recognition model based on FPGA

Country Status (1)

Country Link
CN (1) CN115471910A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117077586A (en) * 2023-10-16 2023-11-17 北京汤谷软件技术有限公司 Register transmission level resource prediction method, device and equipment for circuit design

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180357542A1 (en) * 2018-06-08 2018-12-13 University Of Electronic Science And Technology Of China 1D-CNN-Based Distributed Optical Fiber Sensing Signal Feature Learning and Classification Method
CN114943324A (en) * 2022-05-26 2022-08-26 中国科学院深圳先进技术研究院 Neural network training method, human motion recognition method and device, and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180357542A1 (en) * 2018-06-08 2018-12-13 University Of Electronic Science And Technology Of China 1D-CNN-Based Distributed Optical Fiber Sensing Signal Feature Learning and Classification Method
CN114943324A (en) * 2022-05-26 2022-08-26 中国科学院深圳先进技术研究院 Neural network training method, human motion recognition method and device, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang Jiachen: "FPGA Implementation of a Deep-Learning-Based High-Definition Image Object Detection Algorithm", China Masters' Theses Full-text Database, Information Science and Technology Series, no. 03, page 27.
Miao Zhuhuan: "Design and Implementation of the VGG-16 Neural Network Algorithm Based on FPGA", China Masters' Theses Full-text Database, Information Science and Technology Series, no. 4, pages 9-14.

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117077586A (en) * 2023-10-16 2023-11-17 北京汤谷软件技术有限公司 Register transmission level resource prediction method, device and equipment for circuit design
CN117077586B (en) * 2023-10-16 2024-01-19 北京汤谷软件技术有限公司 Register transmission level resource prediction method, device and equipment for circuit design

Similar Documents

Publication Publication Date Title
Nguyen et al. Trends in human activity recognition with focus on machine learning and power requirements
Tang et al. Layer-wise training convolutional neural networks with smaller filters for human activity recognition using wearable sensors
Xu et al. Human activity recognition and embedded application based on convolutional neural network
Torti et al. Embedded real-time fall detection with deep learning on wearable devices
Liu et al. Deepfood: Deep learning-based food image recognition for computer-aided dietary assessment
Mekruksavanich et al. Sport-Related Activity Recognition from Wearable Sensors Using Bidirectional GRU Network.
Li et al. Deep learning of smartphone sensor data for personal health assistance
WO2021139337A1 (en) Deep learning model-based gait recognition method and apparatus, and computer device
Coelho et al. A lightweight framework for human activity recognition on wearable devices
Kumar et al. MobiHisNet: A lightweight CNN in mobile edge computing for histopathological image classification
CN111954250B (en) Lightweight Wi-Fi behavior sensing method and system
Zheng et al. Meta-learning meets the Internet of Things: Graph prototypical models for sensor-based human activity recognition
US20230081715A1 (en) Neuromorphic Analog Signal Processor for Predictive Maintenance of Machines
CN111178288A (en) Human body posture recognition method and device based on local error layer-by-layer training
CN115471910A (en) Model training method and device for motion activity recognition model based on FPGA
Ding et al. Cascaded convolutional neural network with attention mechanism for mobile EEG-based driver drowsiness detection system
Marculescu et al. Edge AI: Systems design and ML for IoT data analytics
Wang et al. Real-time block-based embedded CNN for gesture classification on an FPGA
Mohammadi et al. Static hand gesture recognition for American sign language using neuromorphic hardware
Abreu et al. A framework for designing power-efficient inference accelerators in tree-based learning applications
Contoli et al. A study on the application of tensorflow compression techniques to human activity recognition
Krishnan et al. Small-world-based structural pruning for efficient FPGA inference of deep neural networks
CN115954019B (en) Method and system for identifying environmental noise by fusing self-attention and convolution operation
Wainwright et al. Human activity recognition making use of long short-term memory techniques
WO2024049998A1 (en) Neuromorphic analog signal processor for predictive maintenance of machines

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination