CN115471910A - Model training method and device for motion activity recognition model based on FPGA - Google Patents

Model training method and device for motion activity recognition model based on FPGA

Info

Publication number
CN115471910A
CN115471910A CN202211085380.4A
Authority
CN
China
Prior art keywords
neural network
activity recognition
convolutional neural
fpga
human activity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211085380.4A
Other languages
Chinese (zh)
Inventor
颜延
任旭超
陈宇骞
王磊
熊璟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202211085380.4A priority Critical patent/CN115471910A/en
Publication of CN115471910A publication Critical patent/CN115471910A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a model training method for an FPGA-based motion activity recognition model, a terminal device, and a computer-readable storage medium. The method comprises the following steps: acquiring a human activity recognition data set and dividing it into a human activity recognition training set and a human activity recognition test set; inputting the training set into a convolutional neural network for training, where the convolutional neural network comprises multiple groups of convolution IP cores and a fully-connected IP core; and testing the trained convolutional neural network with the test set to obtain performance evaluation information for the trained network. The method takes the FPGA as the hardware platform and, given the feasibility of FPGAs for deep-learning acceleration, studies the FPGA implementation of a human behavior recognition model; on the basis of human motion recognition with the UCI-HAR data set, a convolutional neural network with a BN-fusion structure is built for hardware implementation, which reduces memory consumption and yields very good classification results.

Description

Model training method and device for motion activity recognition model based on FPGA
Technical Field
The present application relates to the field of neural network technologies, and in particular, to a model training method for a motion activity recognition model implemented based on an FPGA, a terminal device, and a computer-readable storage medium.
Background
Daily human motion behaviors are closely related to the body's health indicators and energy balance; for example, human energy expenditure can be estimated by monitoring motion behaviors such as running and walking, which has positive significance for healthy exercise and bodily energy balance. In addition, people in dangerous situations can be rescued promptly and effectively by recognizing abnormal motion behaviors such as falls. However, the premise of this work is developing a user-oriented, real-time, portable, and miniaturized device that supports the technical implementation of algorithms such as neural-network feature extraction and classifiers.
The increasing popularity and wide acceptance of smartphones, together with the large number of embedded sensors they carry, have opened the way to using mobile phones as a data-acquisition means. By collecting the signals of mobile sensors such as the accelerometer and gyroscope, human motion analysis can be performed from the body's motion acceleration and angular rotation speed. Beyond data acquisition, recent advances in edge intelligence (EdgeAI) have introduced another interesting perspective for developing self-contained artificial-intelligence devices. EdgeAI provides on-demand, real-time prediction with low latency using a pre-trained model on a smartphone, rather than relying on cloud deployment of the trained model. Such advances provide an attractive ecosystem for modeling Human Activity Recognition (HAR), quickly and accurately personalizing an individual's activity patterns over time; HAR can then be integrated into the development pipeline of systems for video surveillance, patient rehabilitation, entertainment, and smart homes.
As a semi-custom circuit and a form of edge computing, the FPGA has been widely used in signal processing. It overcomes the drawback that custom circuits in the application-specific integrated circuit (ASIC) field are not programmable, and its large number of logic cells makes programmable devices more flexible and more widely applicable. Based on these characteristics, many researchers at home and abroad have explored in depth how to exploit the high efficiency and low power consumption of FPGAs to improve the real-time performance and energy efficiency of motion-signal-processing algorithms.
Human Activity Recognition (HAR) has several important applications, including medical monitoring, security and surveillance, assisted living, smart homes, and video search and indexing. Despite recent advances in this area, significant challenges remain: practical applications, from geriatric care to microsurgical devices, demand very high accuracy. Deep learning models can achieve the highest accuracy, but they are not easily deployed in handheld or wearable devices, where resources are very limited.
At present, the field of human motion behavior recognition has produced many advanced learning-algorithm models with excellent performance, such as CNNs and GNNs, but these models have many layers and high model complexity, and their computation load is large; the volume of outputs and weight parameters generated by the intermediate layers grows dramatically, bringing the problems of high hardware requirements, difficult real-time processing, and poor portability. In addition, the prior art uses Vivado HLS to complete FPGA implementations of neural network models, but such work is limited to hardware-acceleration research and is not concretely designed around a practical application background.
Disclosure of Invention
The application provides a model training method of a motion activity recognition model based on FPGA, terminal equipment and a computer readable storage medium.
In order to solve the technical problem, the present application provides a model training method for a motion activity recognition model based on an FPGA, the model training method comprising:
acquiring a human activity recognition data set, and dividing a human activity recognition training set and a human activity recognition testing set from the human activity recognition data set;
inputting the human activity recognition training set into a convolutional neural network for training, wherein the convolutional neural network comprises a plurality of groups of convolutional IP cores and a fully-connected IP core;
and testing the trained convolutional neural network by using the human activity recognition test set to obtain the performance evaluation information of the trained convolutional neural network.
Wherein the convolution IP core comprises a convolution layer, a normalization layer and an activation function.
The detailed operation process of the convolution IP core is as follows:

$$y_i^{l,j} = f\left(\sum_{k=0}^{m-1} w_k^{l,j}\, x_{i+k}^{l-1} + b^{l,j}\right)$$

where $w^{l,j}$ is the $j$-th convolution kernel (of size $m$) in the $l$-th convolutional layer and $b^{l,j}$ its bias; the size of each output matrix is $N - m + 1$ for an input of size $N$. Here $l$ denotes the $l$-th convolutional layer, $i$ indexes a value of the convolution output matrix, $j$ denotes the index of the corresponding output matrix, numbered in order from left to right over the number of convolution output matrices, and $f$ denotes the nonlinear activation function.
The detailed operation process of the normalization layer is as follows:

$$y_i = \gamma \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

where $\mu$ is the mean within a batch, $\sigma^2$ is the variance within a batch, $\epsilon$ is a preset constant, and $\gamma$ and $\beta$ are learnable parameters that, like the parameters of the convolution kernels, are learned by gradient descent during training.
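The training-mode computation of the normalization layer described above can be sketched as follows; this is a minimal NumPy illustration, where the value of `eps` and the toy batch are assumptions for demonstration only:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization over the batch axis (axis 0)."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize to ~zero mean, unit variance
    return gamma * x_hat + beta             # scale and shift with learnable parameters

# Toy batch: 4 samples, 3 features; gamma = 1, beta = 0 leaves the normalized values
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0],
              [3.0, 4.0, 5.0],
              [4.0, 5.0, 6.0]])
y = batch_norm_train(x, gamma=np.ones(3), beta=np.zeros(3))
```

With the identity scale and shift, each output column has mean 0 and standard deviation close to 1 (slightly below 1 because of `eps`).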
The fully-connected IP core is arranged at the output end of the convolutional neural network and is used for realizing the final classified output of the convolutional neural network.
After the human activity recognition training set is input into a convolutional neural network for training, the model training method further comprises:
and storing the weight parameters and the bias parameters of the trained convolutional neural network into a preset file format, storing the preset file format into an initialized memory card, and inputting the preset file format into a test platform for testing.
Wherein, the testing the trained convolutional neural network by using the human activity recognition test set to obtain the performance evaluation information of the trained convolutional neural network comprises:
transmitting the human activity recognition test set to the test platform so that the test platform tests a convolutional neural network consisting of weight parameters and bias parameters in the memory card according to the human activity recognition test set;
obtaining a classification result returned by the convolutional neural network on the test platform;
calculating performance evaluation information of the convolutional neural network based on the classification result.
Wherein the type of the performance evaluation information comprises one or more index types of average accuracy, recall ratio, precision ratio and F1 value.
In order to solve the technical problem, the present application provides a terminal device, where the terminal device includes a processor and a memory connected to the processor, where the memory stores program instructions;
the processor is configured to execute the program instructions stored in the memory to implement the model training method for the motion activity recognition model implemented based on the FPGA as described above.
In order to solve the technical problem, the present application provides a computer-readable storage medium, where the storage medium stores program instructions, and the program instructions, when executed, implement the above method for model training of a motion activity recognition model implemented based on an FPGA.
Compared with the prior art, the beneficial effects of this application are as follows: the terminal device acquires a human activity recognition data set and divides it into a human activity recognition training set and a human activity recognition test set; inputs the training set into a convolutional neural network for training, where the convolutional neural network comprises multiple groups of convolution IP cores and a fully-connected IP core; and tests the trained convolutional neural network with the test set to obtain performance evaluation information for the trained network. The method takes the FPGA as the hardware platform and, given the feasibility of FPGAs for deep-learning acceleration, studies the FPGA implementation of a human behavior recognition model; on the basis of human motion recognition with the UCI-HAR data set, a convolutional neural network with a BN-fusion structure is built for hardware implementation, reducing memory consumption and obtaining very good classification results.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort. Wherein:
FIG. 1 is a schematic flowchart of an embodiment of a model training method for an FPGA-based motion activity recognition model provided in the present application;
FIG. 2 is a general framework diagram of a design scheme of software and hardware co-design provided in the present application;
FIG. 3 is a schematic overall flow chart of a model training method for an FPGA-based motion activity recognition model provided in the present application;
FIG. 4 is a block diagram of an embodiment of a convolutional neural network provided herein;
FIG. 5 is a schematic diagram of a Vivado HLS design flow provided herein;
FIG. 6 is a block diagram illustrating an embodiment of a convolutional layer IP core provided herein;
FIG. 7 is a block diagram illustrating an embodiment of a fully connected layer IP core provided herein;
FIG. 8 is a schematic diagram of the PC-side training test results and the test results returned through the serial port, as provided in the present application;
FIG. 9 is a block diagram of an embodiment of a terminal device provided herein;
FIG. 10 is a schematic structural diagram of an embodiment of a computer storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to herein as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
To address the problems in the prior art, a convolutional neural network structure with strong generalization is proposed. Hardware acceleration is achieved through optimization directives such as parallelization and array partitioning, with design tools such as Vivado High-Level Synthesis (HLS); the FPGA is used as the hardware platform, and, given the feasibility of FPGAs for deep-learning acceleration, the FPGA implementation of a human motion behavior recognition algorithm model is studied.
In the application, an FPGA (Field Programmable Gate Array) is used as a hardware platform, and in view of the feasibility of the FPGA in the aspect of deep learning acceleration, a convolutional neural network model is provided for researching the FPGA implementation of human motion behavior recognition. The system adopts a software and hardware collaborative design method, and carries out classification verification on six types of actions of walking, going upstairs, going downstairs, sitting still, standing and lying through a UCI-HAR data set, and can be used for judging the activity state of a subject. The technical category mainly comprises signal processing, deep learning and FPGA design, and is a machine learning classification and hardware acceleration problem based on signals.
Specifically, referring to fig. 1 to 3, fig. 1 is a schematic flowchart of an embodiment of a model training method for a motion activity recognition model implemented based on an FPGA according to the present application, fig. 2 is a schematic diagram of a general framework of a design scheme of software and hardware collaborative design according to the present application, and fig. 3 is a schematic diagram of an overall flowchart of the model training method for the motion activity recognition model implemented based on the FPGA according to the present application.
As shown in fig. 1, the model training method for an exercise activity recognition model implemented based on an FPGA of this embodiment specifically includes the following steps:
step S11: a human activity recognition data set is acquired and a human activity recognition training set and a human activity recognition test set are partitioned from the human activity recognition data set.
In the embodiment of the application, the terminal device, i.e. the PC side in FIG. 2, acquires the UCI-HAR human activity recognition data set as the source data for model training. The UCI-HAR data set is an activity recognition data set based on sensor data collected with smartphones; it was created in 2012 by an experimental team from the University of Genoa, Italy.

The UCI-HAR human activity recognition data set was collected from 30 volunteers aged between 19 and 48, who strapped a smartphone to the waist, performed one of six standard activities, and recorded the motion data through purpose-built phone software. Each volunteer was simultaneously videotaped while performing the activities, and the motion categories were labeled manually afterwards from the videos and the sensor data, much as sound and picture are synchronized when editing video. Using the phone's embedded accelerometer and gyroscope, 3-axis linear acceleration and 3-axis angular velocity were captured at a constant rate of 50 Hz. The six activities performed are: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, and Lying.
The terminal device may also pre-process data in the raw UCI-HAR human activity recognition dataset.
In one specific embodiment, the sensor signals in the data set are pre-processed with a noise filter and then sampled in fixed-width sliding windows of 2.56 seconds (128 readings per window) with 50% overlap; a 9 x 128 feature map is constructed from each window by sampling the data in nine dimensions. The data are randomly divided into two groups: 70% are selected to generate training data and 30% are used as test data.
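The windowing step above can be sketched as follows; this is a minimal NumPy illustration assuming a (channels, samples) signal layout, with the toy 10-second recording an assumption for demonstration:

```python
import numpy as np

def sliding_windows(signal, window=128, overlap=0.5):
    """Segment a (channels, samples) signal into fixed-width windows.

    At a 50 Hz sampling rate, 128 samples span 2.56 s; 50% overlap means
    each window starts 64 samples after the previous one.
    """
    step = int(window * (1 - overlap))
    n_channels, n_samples = signal.shape
    windows = [signal[:, start:start + window]
               for start in range(0, n_samples - window + 1, step)]
    return np.stack(windows)  # shape: (n_windows, channels, window)

# Hypothetical 9-channel recording of 10 seconds at 50 Hz (500 samples)
sig = np.random.randn(9, 500)
feats = sliding_windows(sig)
print(feats.shape)  # (6, 9, 128): windows start at samples 0, 64, ..., 320
```

Each resulting 9 x 128 slice corresponds to one feature map fed to the network.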
Step S12: and inputting the human activity recognition training set into a convolutional neural network for training, wherein the convolutional neural network comprises a plurality of groups of convolutional IP kernels and a fully-connected IP kernel.
In the embodiment of the present application, a training set is put into a convolutional neural network structure for training, the convolutional neural network structure adopted in the present invention is shown in fig. 4, and fig. 4 is a schematic diagram of a framework of an embodiment of the convolutional neural network provided in the present application.
As shown in fig. 4, the convolutional neural network of the present application includes a plurality of sets of convolutional IP kernels and fully-connected IP kernels, wherein each set of convolutional IP kernels includes a convolutional layer, a normalization layer and an activation function, and the fully-connected IP kernels are disposed at an output end of the convolutional neural network and are used for realizing final classification output of the convolutional neural network.
Specifically, batch normalization (BN) is widely used in convolutional neural networks because it accelerates neural network training, makes training more stable, and has a certain regularization effect. During training, the BN layer computes:

$$y_i = \gamma \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

where $\mu$ is the mean within a batch, $\sigma^2$ is the variance within a batch, $\epsilon$ is a preset constant, e.g., 0.001, and $\gamma$ and $\beta$ are learnable parameters learned by gradient descent, like the parameters of the other convolution kernels.

During Python training, a BN layer is added directly after each convolutional layer. After training, however, at the hardware design stage the BN layer can generally be fully fused into the preceding convolutional layer without affecting performance at all. In the embodiment of the application, the BN layers are simply removed from the network; the original weights and biases of the convolutional layer and the four BN parameters (mean $\mu$, variance $\sigma^2$, $\gamma$, $\beta$) are read, and the two layers are fused into a single Conv + BN + ReLU block (designed as the Conv2d user IP core with Vivado HLS).
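The BN fusion described above can be sketched as follows. This is a minimal NumPy illustration, not the patent's Vivado HLS code; it assumes per-output-channel BN statistics and a weight tensor whose first axis is the output channel:

```python
import numpy as np

def fuse_bn_into_conv(w, b, mu, var, gamma, beta, eps=1e-3):
    """Fold a trained BN layer into the preceding convolution.

    w: conv weights with shape (out_channels, ...); b: bias (out_channels,)
    mu, var, gamma, beta: per-output-channel BN statistics/parameters.
    Returns (w', b') such that conv(x, w') + b' == BN(conv(x, w) + b).
    """
    scale = gamma / np.sqrt(var + eps)                  # per-channel multiplier
    w_fused = w * scale.reshape(-1, *([1] * (w.ndim - 1)))  # scale each filter
    b_fused = (b - mu) * scale + beta                   # absorb mean shift into bias
    return w_fused, b_fused
```

Because BN at inference time is an affine per-channel transform, folding it into the convolution removes a whole layer from the hardware datapath, which is exactly why it reduces memory consumption on the FPGA.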
Finally, the weight parameters (w) and bias parameters (b) of the convolutional layers in the convolutional neural network model structure obtained by training in the Python environment are saved to a bin file, and the bin file is stored on the initialized SD card.
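A hedged sketch of this export step follows. The raw little-endian float32 layout, the file name, and the layer shapes are assumptions for illustration, not the patent's actual bin format:

```python
import numpy as np

def export_params_to_bin(params, path):
    """Write layer parameters as raw float32 so the FPGA side can read
    them back as a flat memory image (assumed little-endian layout)."""
    with open(path, "wb") as f:
        for name, tensor in params:
            arr = np.asarray(tensor, dtype=np.float32)
            f.write(arr.tobytes())  # flattened, in declaration order

# Hypothetical parameter list: (name, array) pairs for one conv layer
params = [
    ("conv1.w", np.random.randn(16, 9, 5).astype(np.float32)),  # 16 filters, 9 ch, size 5
    ("conv1.b", np.zeros(16, dtype=np.float32)),
]
export_params_to_bin(params, "weights.bin")
```

The consuming side must of course agree on the same layer order and shapes, since the file carries no metadata.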
In a specific embodiment, the terminal device may use Vivado HLS to complete the above-described algorithm IP core design.
As shown in fig. 5, fig. 5 is a schematic diagram of the Vivado HLS design flow provided by the present application. In fig. 5, the Vivado HLS design flow is: first, compile, execute (simulate), and debug the C algorithm; then synthesize the C algorithm into an RTL (Register Transfer Level) implementation, optionally using user optimization directives; generate a synthesis report and analyze the design; verify the RTL implementation by co-simulation; and finally package the RTL implementation into multiple IP formats.
The design of the two algorithm IP cores to be designed in the present application is described below:
1) conv2d convolution IP core
The user IP core finally generated by the algorithm design of the convolutional layer is shown in fig. 6, where fig. 6 is a schematic diagram of a framework of an embodiment of the convolutional layer IP core provided in the present application.
As the most important layer in a CNN, the convolutional layer extracts hidden feature information from the image through the convolution operation; compared with traditional neural networks, the CNN uses the two core ideas of sparse interaction and weight sharing. The convolutional neural network performs the convolution with local filters: a local submatrix of the input is taken and its inner product with the local filter is computed, and the result is used as the value of the corresponding element of the convolution output matrix. The detailed operation is shown in the following formula:

$$y_i^{l,j} = f\left(\sum_{k=0}^{m-1} w_k^{l,j}\, x_{i+k}^{l-1} + b^{l,j}\right)$$

where $w^{l,j}$ is the $j$-th convolution kernel (of size $m$) in the $l$-th convolutional layer and $b^{l,j}$ its bias; the size of each output matrix is $N - m + 1$ for an input of size $N$. Here $l$ denotes the $l$-th convolutional layer, $i$ indexes a value of the convolution output matrix, $j$ denotes the index of the corresponding output matrix, numbered in order from left to right over the number of convolution output matrices, and $f$ denotes the nonlinear activation function. In the embodiment of the present application, the ReLU function may be adopted as the nonlinear activation function.
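The local-filter inner-product operation described above can be sketched in a minimal NumPy example (1D case with ReLU activation; the toy signal and kernel are assumptions for demonstration):

```python
import numpy as np

def conv1d_layer(x, w, b, f=lambda v: np.maximum(v, 0.0)):
    """Valid 1D convolution followed by an activation f (ReLU by default).

    x: input of length N; w: kernel of length m; b: scalar bias.
    The output length is N - m + 1.
    """
    N, m = len(x), len(w)
    out = np.empty(N - m + 1)
    for i in range(N - m + 1):
        out[i] = np.dot(x[i:i + m], w) + b  # inner product with a local window
    return f(out)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])
y = conv1d_layer(x, w, b=0.0)
print(y)  # [0. 0. 0.]: each window gives x[i] - x[i+2] = -2, clipped by ReLU
```

The same sliding-window structure, extended to two dimensions and multiple channels, is what the Conv2d IP core implements in hardware.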
2) FC full-connection IP core
The user IP core finally generated by the algorithm design of the full connection layer is shown in fig. 7, where fig. 7 is a schematic diagram of a framework of an embodiment of the full connection layer IP core provided in the present application.
The fully-connected operation is a special convolution operation: each node of the fully-connected layer is connected to all nodes of the previous layer, integrating the extracted features. Because of this full connectivity, fully-connected layers generally carry the most parameters; to reduce the parameter count, the hardware implementation of the neural network model keeps as few fully-connected operations as possible, and the fully-connected layer is used only for the final classification output.
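A minimal NumPy sketch of using the fully-connected layer only for the final classification output follows; the flattened feature size (64) and the softmax readout are assumptions for illustration, since the source only specifies six output classes:

```python
import numpy as np

def fully_connected(x, w, b):
    """Dense layer: every output node connects to every input node."""
    return w @ x + b

def softmax(z):
    """Convert class scores into probabilities for the final classification."""
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

# Hypothetical sizes: 64 flattened features -> 6 activity classes
rng = np.random.default_rng(0)
x = rng.standard_normal(64)          # flattened conv features
w = rng.standard_normal((6, 64))     # 6 x 64 weight matrix: the bulk of FC parameters
b = np.zeros(6)
probs = softmax(fully_connected(x, w, b))
```

The 6 x 64 weight matrix illustrates why dense layers dominate the parameter budget, which motivates restricting them to the classifier head on the FPGA.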
Step S13: and testing the trained convolutional neural network by using the human activity recognition test set to obtain the performance evaluation information of the trained convolutional neural network.
In the embodiment of the application, after model training is finished, the classification performance of the model is verified with the data in the test set. The experimental results show that the average accuracy of the convolutional neural network model is 99.92% on the training set and 96.27% on the test set.
The above describes a concrete hardware implementation of a motion behavior recognition system. The system is based on the ARM + FPGA hardware structure of the ZYNQ-7020 platform and adopts a software-and-hardware co-simulation design method to realize a six-class classification algorithm for common human behavior actions. On the hardware side, the convolutional-layer and fully-connected-layer IP cores of the convolutional neural network are first implemented with the Vivado HLS 2019.2 tool; a Block Design is then created in the Vivado 2019.2 tool to configure and connect the IP cores and build the hardware platform; the motion behavior recognition system is realized on the FPGA development board through software and hardware co-design; and finally the FPGA design and implementation of the algorithm are tested and analyzed.
The system is divided overall into a PC side and an FPGA platform (the Zynq chip of the Zynq-7020 core board is the XC7Z020CLG400-2, with 85K PL logic cells and 4.9 Mbit of BRAM storage resources). The PC side, on the one hand, performs serial-port data transmission with the FPGA platform; on the other hand, the convolutional neural network weight parameters trained on the PC side must be imported into the FPGA platform. The PS part of the FPGA platform is mainly responsible for controlling and driving each interface, while the PL part is responsible for accelerating the convolutional neural network computation. Finally, the computation result is returned to the PC side through the serial port for display. The general block diagram of the design scheme is shown in FIG. 2.
The ARM + FPGA hardware structure not only exploits the advantage of FPGA logic in processing large amounts of data at high speed but also combines the flexibility of ARM software programming; the main flow chart is shown in FIG. 3.
Specifically, the PC side transmits the human activity recognition test set to the FPGA platform through a UART serial port, and the SD card storing the weight parameters and bias parameters of the trained convolutional neural network is inserted into the FPGA platform. The FPGA platform downloads the program, loads the convolutional neural network, and tests the human activity recognition test set to obtain the test result, i.e., the classification result of the convolutional neural network. The FPGA platform then returns the test result to the PC side through the serial port, and the PC side evaluates the performance of the convolutional neural network.
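The "preset file format" for the weight and bias parameters is not specified in the text. As an illustrative sketch only (the length-prefixed little-endian float32 layout and all names below are assumptions, not the patent's format), the trained parameters could be flattened and written to a binary file for the SD card, then read back on the PS side:

```python
import struct

def save_params(path, params):
    """Write parameter arrays as little-endian float32 (assumed layout):
    a count of arrays, then each array as a length prefix plus its values."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(params)))
        for arr in params:
            f.write(struct.pack("<I", len(arr)))
            f.write(struct.pack(f"<{len(arr)}f", *arr))

def load_params(path):
    """Read back the arrays written by save_params."""
    with open(path, "rb") as f:
        n_arrays = struct.unpack("<I", f.read(4))[0]
        params = []
        for _ in range(n_arrays):
            length = struct.unpack("<I", f.read(4))[0]
            params.append(list(struct.unpack(f"<{length}f", f.read(4 * length))))
        return params
```

A fixed, self-describing binary layout like this keeps the loader on the embedded side trivial: it reads the array count and then each length-prefixed array, with no parser dependencies.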
In the embodiment of the application, in addition to the average accuracy, three indices, Recall, Precision, and the F1 value (F1-score), are selected for performance evaluation of the algorithm. For each activity category in the dataset, the predictions of the model are compared with the ground-truth labels to count the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The overall accuracy ACC equals:
ACC = (TP + TN) / (TP + TN + FP + FN)
and the Precision and Recall for a given category can be calculated by the following equations:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)
The F1-score is a balanced combination of precision and recall, calculated as:

F1 = 2 × Precision × Recall / (Precision + Recall)
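The four formulas above can be sketched directly from the confusion counts (the function and variable names below are illustrative, not from the original):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from the four confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, precision, recall, f1
```

For example, applying the F1 formula to the Walking row of the table below (Precision 98.61, Recall 100.00) gives 2 × 98.61 × 100 / (98.61 + 100) ≈ 99.30, matching the tabulated F1-score.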
In a specific embodiment, the performance evaluation of the convolutional neural network obtained by training with the model training method provided by the present application is shown in the following table:
Table: performance evaluation of the motion behavior recognition classification model

Motion behavior type    Precision [%]   Recall [%]   F1-score [%]
Walking                     98.61         100.00         99.30
Walking Upstairs            95.83         100.00         97.87
Walking Downstairs         100.00          96.67         98.31
Sitting                     95.00          86.36         90.48
Standing                    88.16          94.37         91.16
Laying                     100.00         100.00        100.00
In the embodiment of the application, the terminal device trains a motion behavior recognition classification model based on a convolutional neural network. The nine channels of filtered acceleration and gyroscope angular velocity signals in the UCI-HAR action recognition database are arranged into feature matrices of size 9 x 128, the proposed convolutional neural network model is trained on this motion-signal feature matrix data, and the model is finally verified on the test set, where the average classification accuracy is 96.27%. While ensuring low latency, resource utilization is optimized to fit the given FPGA. In addition, the present application optimizes communication between hardware and software components by selecting the most suitable AXI4 protocol. As a result, the performance of the system is significantly improved compared with a pure ARM software implementation, and the real-time constraints are met while limited hardware resources are used efficiently. A motion behavior recognition system based on the FPGA is thus realized: using the generated high-performance algorithm-module IP cores and a software and hardware co-simulation design, a motion behavior recognition system with high classification accuracy, high operation speed, and low power consumption is finally realized on the Xilinx Zynq-7020 development platform.
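The 9 x 128 feature matrices come from sliding fixed-length windows over the nine filtered signal channels. A minimal pure-Python sketch of this preprocessing (the function name is illustrative; the 50% overlap via step = 64 matches how the UCI-HAR windows were published, though the patent does not state the step):

```python
def make_feature_matrices(signals, window=128, step=64):
    """Slice multi-channel motion signals into (channels x window) matrices.

    `signals` is a list of equal-length per-channel sample lists (nine in the
    UCI-HAR case); windows advance by `step` samples, and step = window // 2
    reproduces a 50% overlap between consecutive windows.
    """
    n_samples = len(signals[0])
    assert all(len(ch) == n_samples for ch in signals), "ragged channels"
    return [[ch[start:start + window] for ch in signals]
            for start in range(0, n_samples - window + 1, step)]
```

Each returned matrix is one 9 x 128 training or test sample for the convolutional neural network.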
The performance of small embedded processors is limited by cost, power consumption, and size, and hardware acceleration typically improves the performance of these systems-on-chip (SoC); therefore, today's SoCs (e.g., Xilinx Zynq) combine an ARM processor with an FPGA on one chip. This combines the advantages of both domains: certain parts of the system (e.g., the operating system) are better implemented in software, while tasks with high parallelism are, for performance reasons, better implemented in hardware. Hardware can be synthesized from C and C++ code with design tools such as Vivado High-Level Synthesis (HLS); although this allows hardware acceleration to be realized quickly, design optimization must still be performed to meet resource and performance constraints. Moreover, developing with Vivado HLS avoids the long development cycle, high cost, high difficulty, and hard debugging that come with development in a hardware description language such as Verilog or VHDL. The main beneficial effects are as follows:
1. On the basis of human motion recognition with UCI-HAR, a convolutional neural network with a BN-fusion structure is built for the hardware implementation, which reduces memory consumption while achieving a very good classification result.
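BN fusion folds the batch-normalization parameters of each output channel into the preceding convolution's weights and bias, so the FPGA stores and executes a single fused layer instead of a separate normalization stage. A minimal per-channel sketch under the standard fusion identity (names are illustrative; the patent does not give this code):

```python
import math

def fuse_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold one channel's BN parameters into its conv weights and bias:
    w' = gamma * w / sqrt(var + eps)
    b' = gamma * (b - mean) / sqrt(var + eps) + beta
    """
    scale = gamma / math.sqrt(var + eps)
    fused_w = [w * scale for w in weights]
    fused_b = (bias - mean) * scale + beta
    return fused_w, fused_b
```

Because the fused layer's output equals convolution followed by BN exactly (up to floating-point rounding), the normalization costs nothing extra at inference time, which is the memory and computation saving referred to above.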
2. The FPGA is used as the hardware platform; in view of the feasibility of FPGAs for deep learning acceleration, the FPGA implementation of the human behavior recognition model is studied.
3. While ensuring low latency, resource utilization is optimized to fit the given FPGA. In addition, the present application optimizes communication between hardware and software components by selecting the most suitable protocol. As a result, the performance of the system is greatly improved compared with a pure ARM software implementation, and the real-time constraints are met while limited hardware resources are used efficiently. Finally, the hardware IP cores are integrated into a complex embedded system.
The method uses the UCI-HAR dataset to verify the algorithm. The UCI repository is a commonly used collection of standard machine learning benchmark datasets, maintained as a database for machine learning by the University of California, Irvine. UCI datasets are widely adopted for testing machine learning algorithms precisely because they are standard: a newly written machine learning program can be tested on a UCI dataset, and the comparison with similar machine learning algorithms is then more convincing.
The specific PC-side training results and the test results returned through the serial port are shown in FIG. 8. Although some percentage points of accuracy are lost due to the fixed-point quantization on the hardware, the overall system performance is not affected.
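The accuracy loss from fixed-point quantization can be illustrated with a simple rounding model (the 16-bit width and 8 fractional bits below are assumptions; the patent does not state the hardware's fixed-point format):

```python
def quantize(x, frac_bits=8, total_bits=16):
    """Round x to signed fixed point with `frac_bits` fractional bits,
    saturating to the range representable in `total_bits` bits."""
    scale = 1 << frac_bits
    q = round(x * scale)
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q)) / scale
```

Each weight or activation then deviates from its floating-point value by at most half a least significant bit (here 1/512), and it is these small per-value errors that accumulate into the slight accuracy drop observed on hardware.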
In order to implement the model training method for the motion activity recognition model implemented based on an FPGA in the foregoing embodiments, the present application further provides another terminal device 300. Referring specifically to FIG. 9, the terminal device 300 in this embodiment includes a processor 31, a memory 32, an input/output device 33, and a bus 34.
The processor 31, the memory 32, and the input/output device 33 are each connected to the bus 34; the memory 32 stores program data, and the processor 31 is configured to execute the program data to implement the model training method for a motion activity recognition model implemented based on an FPGA according to the above embodiments.
In the embodiment of the present application, the processor 31 may also be referred to as a CPU (Central Processing Unit). The processor 31 may be an integrated circuit chip with signal processing capability. The processor 31 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The general-purpose processor may be a microprocessor, or the processor 31 may be any conventional processor or the like.
Referring to FIG. 10, FIG. 10 is a schematic structural diagram of an embodiment of a computer storage medium provided in the present application. Program data 41 is stored in the computer storage medium 400, and when the program data 41 is executed by a processor, the model training method for a motion activity recognition model implemented based on an FPGA described above is implemented.
Embodiments of the present application may be implemented as software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above description is only of embodiments of the present application and is not intended to limit the scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present application.

Claims (10)

1. A model training method for a motion activity recognition model implemented based on an FPGA, characterized by comprising the following steps:
acquiring a human activity recognition data set, and dividing a human activity recognition training set and a human activity recognition testing set from the human activity recognition data set;
inputting the human activity recognition training set into a convolutional neural network for training, wherein the convolutional neural network comprises a plurality of groups of convolutional IP cores and a fully-connected IP core;
and testing the trained convolutional neural network by using the human activity recognition test set to obtain the performance evaluation information of the trained convolutional neural network.
2. The model training method according to claim 1, wherein the convolutional IP core comprises a convolutional layer, a normalization layer, and an activation function.
3. The model training method according to claim 2, wherein the detailed operation of the convolutional IP core is as follows:

a_{i,j}^l = f( Σ_{k=0}^{m−1} w_{j,k}^l · a_{i+k}^{l−1} + b_j^l )

where the size of each output matrix is N − m + 1, with N the input length and m the convolution kernel size; l denotes the l-th convolutional layer, i indexes the values within a convolution output matrix, j denotes the index of the corresponding output matrix, numbered from left to right over the convolution output matrices, and f denotes the nonlinear activation function.
4. The model training method according to claim 2, wherein the detailed operation of the normalization layer is as follows:

y_i = γ · (x_i − μ) / √(σ² + ε) + β

where μ is the mean over one batch, σ² is the variance over the batch, ε is a preset constant, and γ and β are both learnable parameters which, like the other convolution kernel parameters, are learned through gradient descent during training.
5. The model training method according to claim 1, wherein the fully-connected IP core is arranged at the output end of the convolutional neural network and is used to produce the final classification output of the convolutional neural network.
6. The model training method according to claim 1, wherein, after the human activity recognition training set is input into the convolutional neural network for training, the model training method further comprises:
storing the weight parameters and the bias parameters of the trained convolutional neural network in a preset file format, saving the resulting file to an initialized memory card, and inputting it into a test platform for testing.
7. The model training method according to claim 6, wherein the testing of the trained convolutional neural network with the human activity recognition test set to obtain the performance evaluation information of the trained convolutional neural network comprises the following steps:
transmitting the human activity recognition test set to the test platform so that the test platform tests a convolutional neural network consisting of weight parameters and bias parameters in the memory card according to the human activity recognition test set;
obtaining a classification result returned by the convolutional neural network on the test platform;
calculating performance evaluation information of the convolutional neural network based on the classification result.
8. The model training method according to claim 7, wherein the types of the performance evaluation information comprise one or more of the following index types: average accuracy, recall, precision, and F1 value.
9. A terminal device, characterized in that the terminal device comprises a processor, a memory connected to the processor, wherein,
the memory stores program instructions;
the processor is configured to execute the program instructions stored in the memory to implement the model training method for a motion activity recognition model implemented based on an FPGA according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the storage medium stores program instructions that, when executed, implement the model training method for a motion activity recognition model implemented based on an FPGA according to any one of claims 1 to 8.
CN202211085380.4A 2022-09-06 2022-09-06 Model training method and device for motion activity recognition model based on FPGA Pending CN115471910A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211085380.4A CN115471910A (en) 2022-09-06 2022-09-06 Model training method and device for motion activity recognition model based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211085380.4A CN115471910A (en) 2022-09-06 2022-09-06 Model training method and device for motion activity recognition model based on FPGA

Publications (1)

Publication Number Publication Date
CN115471910A true CN115471910A (en) 2022-12-13

Family

ID=84368972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211085380.4A Pending CN115471910A (en) 2022-09-06 2022-09-06 Model training method and device for motion activity recognition model based on FPGA

Country Status (1)

Country Link
CN (1) CN115471910A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117077586A (en) * 2023-10-16 2023-11-17 北京汤谷软件技术有限公司 Register transmission level resource prediction method, device and equipment for circuit design

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180357542A1 (en) * 2018-06-08 2018-12-13 University Of Electronic Science And Technology Of China 1D-CNN-Based Distributed Optical Fiber Sensing Signal Feature Learning and Classification Method
CN114943324A (en) * 2022-05-26 2022-08-26 中国科学院深圳先进技术研究院 Neural network training method, human motion recognition method and device, and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180357542A1 (en) * 2018-06-08 2018-12-13 University Of Electronic Science And Technology Of China 1D-CNN-Based Distributed Optical Fiber Sensing Signal Feature Learning and Classification Method
CN114943324A (en) * 2022-05-26 2022-08-26 中国科学院深圳先进技术研究院 Neural network training method, human motion recognition method and device, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang Jiachen: "FPGA Implementation of a Deep-Learning-Based High-Definition Image Object Detection Algorithm", China Masters' Theses Full-text Database, Information Science and Technology Series, no. 03, page 27.
Miao Zhuhuan: "Design and Implementation of the VGG-16 Neural Network Algorithm Based on FPGA", China Masters' Theses Full-text Database, Information Science and Technology Series, no. 4, pages 9-14.

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117077586A (en) * 2023-10-16 2023-11-17 北京汤谷软件技术有限公司 Register transmission level resource prediction method, device and equipment for circuit design
CN117077586B (en) * 2023-10-16 2024-01-19 北京汤谷软件技术有限公司 Register transmission level resource prediction method, device and equipment for circuit design

Similar Documents

Publication Publication Date Title
Nguyen et al. Trends in human activity recognition with focus on machine learning and power requirements
Tang et al. Layer-wise training convolutional neural networks with smaller filters for human activity recognition using wearable sensors
Xu et al. Human activity recognition and embedded application based on convolutional neural network
Torti et al. Embedded real-time fall detection with deep learning on wearable devices
Liu et al. Deepfood: Deep learning-based food image recognition for computer-aided dietary assessment
Mekruksavanich et al. Sport-Related Activity Recognition from Wearable Sensors Using Bidirectional GRU Network.
Li et al. Deep learning of smartphone sensor data for personal health assistance
WO2021139337A1 (en) Deep learning model-based gait recognition method and apparatus, and computer device
Coelho et al. A lightweight framework for human activity recognition on wearable devices
Kumar et al. MobiHisNet: A lightweight CNN in mobile edge computing for histopathological image classification
CN111954250B (en) Lightweight Wi-Fi behavior sensing method and system
Zheng et al. Meta-learning meets the Internet of Things: Graph prototypical models for sensor-based human activity recognition
US20230081715A1 (en) Neuromorphic Analog Signal Processor for Predictive Maintenance of Machines
CN111178288A (en) Human body posture recognition method and device based on local error layer-by-layer training
CN115471910A (en) Model training method and device for motion activity recognition model based on FPGA
Ding et al. Cascaded convolutional neural network with attention mechanism for mobile EEG-based driver drowsiness detection system
Marculescu et al. Edge AI: Systems design and ML for IoT data analytics
Wang et al. Real-time block-based embedded CNN for gesture classification on an FPGA
Mohammadi et al. Static hand gesture recognition for American sign language using neuromorphic hardware
Abreu et al. A framework for designing power-efficient inference accelerators in tree-based learning applications
Contoli et al. A study on the application of tensorflow compression techniques to human activity recognition
Krishnan et al. Small-world-based structural pruning for efficient FPGA inference of deep neural networks
CN115954019B (en) Method and system for identifying environmental noise by fusing self-attention and convolution operation
Wainwright et al. Human activity recognition making use of long short-term memory techniques
WO2024049998A1 (en) Neuromorphic analog signal processor for predictive maintenance of machines

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination