CN113420848A - Neural network model training method and device and gesture recognition method and device


Info

Publication number
CN113420848A
Authority
CN
China
Prior art keywords
gesture
neural network
network model
loss value
predicted
Prior art date
Legal status
Pending
Application number
CN202110974865.8A
Other languages
Chinese (zh)
Inventor
钱程浩
黄雪峰
熊海飞
Current Assignee
Shenzhen Xinrun Fulian Digital Technology Co Ltd
Original Assignee
Shenzhen Xinrun Fulian Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Xinrun Fulian Digital Technology Co Ltd filed Critical Shenzhen Xinrun Fulian Digital Technology Co Ltd
Priority to CN202110974865.8A
Publication of CN113420848A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides a neural network model training method and apparatus and a gesture recognition method and apparatus. The training method of the neural network model comprises the following steps: inputting a sample image and the corresponding thermal image into a neural network model for feature extraction to obtain feature data, wherein the sample image includes a gesture and the feature data includes at least one of: a predicted gesture category, a predicted gesture calibration box, predicted gesture key points, and a predicted gesture thermodynamic diagram; obtaining a loss value between the feature data and the corresponding original data; and updating the neural network model based on the loss value and continuing to train the updated model until the loss value is smaller than a preset threshold. The method and apparatus address the problem of low gesture recognition accuracy in the prior art.

Description

Neural network model training method and device and gesture recognition method and device
Technical Field
The present application relates to the technical field of neural network models, and in particular, to a training method and apparatus for a neural network model, and a gesture recognition method and apparatus.
Background
Gestures are a form of non-verbal communication used in many areas, such as communication for the hearing- and speech-impaired, robot control, Human-Computer Interaction (HCI), home automation, and medical applications. Gesture recognition has taken many different forms, mainly including:
1) Template matching: the feature parameters of the gesture to be recognized are matched against pre-stored template feature parameters, and recognition is completed by measuring the similarity between the two. For example, the edge images of the gesture to be recognized and of the template gesture are transformed into Euclidean distance space, the (possibly corrected) Hausdorff distance between them is calculated, the similarity between the gesture and the template is represented by this distance value, and the template gesture with the minimum distance is taken as the recognition result.
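For illustration only (this describes the prior art, not the method claimed here), a minimal sketch of template matching by symmetric Hausdorff distance over edge-point sets; the template dictionary and edge extraction are assumed to exist:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(a, b):
    # Symmetric Hausdorff distance between two (N, 2) edge-point sets.
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

def match_template(query_edges, templates):
    # templates: dict mapping gesture label -> (N, 2) edge-point array.
    # The template with the minimum distance is taken as the recognition result.
    return min(templates, key=lambda label: hausdorff(query_edges, templates[label]))
```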
2) Statistical analysis: a classification method based on probability and statistics that builds a classifier from sample feature vectors. Fingertip and centroid features are extracted from each image, and the distance and included angle between them are computed; the distributions of these two quantities are gathered separately for each gesture, and the thresholds that separate the distances and angles of different gestures are obtained by Bayesian decision based on the minimum error rate. Once the classifier is obtained, captured gesture images can be classified and recognized.
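Again for illustration only, a minimal sketch of this statistical approach, assuming the (distance, angle) features have already been extracted and modelling each gesture class with a Gaussian, so the minimum-error-rate Bayesian decision reduces to picking the class with the highest prior-weighted likelihood:

```python
import numpy as np
from scipy.stats import multivariate_normal

class BayesGestureClassifier:
    def fit(self, feats_by_class):
        # feats_by_class: dict mapping label -> (N, 2) array of (distance, angle) features.
        n = sum(len(f) for f in feats_by_class.values())
        self.models = {
            label: (len(f) / n,  # class prior
                    multivariate_normal(f.mean(axis=0), np.cov(f.T)))
            for label, f in feats_by_class.items()
        }
        return self

    def predict(self, x):
        # Minimum-error-rate decision: maximize prior * likelihood (i.e. the posterior).
        return max(self.models, key=lambda c: self.models[c][0] * self.models[c][1].pdf(x))
```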
The above approaches to gesture recognition have the following problems: 1) template matching requires a large amount of hand-designed feature engineering, and under different environments and backgrounds the features to consider are varied, so the engineering effort is large, the system is complex to implement, and the gesture recognition rate is low; 2) statistical analysis, although it allows feature sets characterizing the different gesture classes to be defined, estimates only a locally optimal linear discriminator and recognizes gesture classes from a large number of features extracted from the gesture image; its learning efficiency is not high, and as the sample size grows the recognition rate improves only marginally, so the gesture recognition rate is likewise low.
Disclosure of Invention
An embodiment of the application aims to provide a neural network model training method and apparatus and a gesture recognition method and apparatus, so as to solve the problem of low gesture recognition accuracy in the prior art. The specific technical solutions are as follows:
In a first aspect of the present application, there is provided a method for training a neural network model, including: inputting a sample image and the corresponding thermal image into a neural network model for feature extraction to obtain feature data; wherein the sample image includes a gesture, and the feature data includes at least one of: a predicted gesture category, a predicted gesture calibration box, predicted gesture key points, and a predicted gesture thermodynamic diagram; obtaining a loss value between the feature data and the corresponding original data; and updating the neural network model based on the loss value and continuing to train the updated model until the loss value is smaller than a preset threshold.
In a second aspect of the present application, there is provided a method for performing gesture recognition based on the neural network model in the training method in the first aspect, including: acquiring image data to be identified; wherein the image data comprises a gesture; inputting the image data to be identified into the neural network model to obtain an output result; wherein the output result is used for representing the recognition result of the gesture.
In a third aspect of the present application, there is provided a training apparatus for a neural network model, including: a first processing module, configured to input the sample image and the corresponding thermal image into the neural network model for feature extraction to obtain feature data, wherein the sample image includes a gesture and the feature data includes at least one of: a predicted gesture category, a predicted gesture calibration box, predicted gesture key points, and a predicted gesture thermodynamic diagram; a first acquisition module, configured to acquire a loss value between the feature data and the corresponding original data; and a training module, configured to update the neural network model based on the loss value and continue training the updated model until the loss value is smaller than a preset threshold.
In a fourth aspect of the present application, there is provided an apparatus for performing gesture recognition based on the neural network model in the training apparatus in the third aspect, including: the second acquisition module is used for acquiring image data to be identified; wherein the image data comprises a gesture; the second processing module is used for inputting the image data to be identified into the neural network model to obtain an output result; wherein the output result is used for representing the recognition result of the gesture.
In a fifth aspect of the present application, there is provided a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of any one of the first aspects above or the method of any one of the second aspects above.
In a sixth aspect of the embodiments of the present application, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of any one of the first aspects above or the method of any one of the second aspects above.
In the embodiments of the present application, because the feature data includes at least one of a predicted gesture category, a predicted gesture calibration box, predicted gesture key points, and a predicted gesture thermodynamic diagram, and the neural network model is updated with the loss value between the feature data and the corresponding original data, the model pays more attention to the hand during training, which reduces cases where similar objects such as human faces are mistakenly recognized as gestures.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a method for training a neural network model in an embodiment of the present application;
FIG. 2 is a schematic diagram of training a neural network model according to an embodiment of the present application;
FIG. 3 is a flow chart of a gesture recognition method in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an apparatus for training a neural network model according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a gesture recognition apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The embodiment of the application provides a training method of a neural network model, as shown in fig. 1, the method includes the following steps:
Step 102, inputting a sample image and the corresponding thermal image into a neural network model for feature extraction to obtain feature data; wherein the sample image includes a gesture, and the feature data includes at least one of: a predicted gesture category, a predicted gesture calibration box, predicted gesture key points, and a predicted gesture thermodynamic diagram;
It should be noted that the gesture calibration box refers to the position area of the gesture in the image, and the gesture key points generally include 21 points on the hand, such as points on the finger joints and fingertips; in other application scenarios of the embodiments of the present application there may be more or fewer than 21 points, as determined by the actual situation.
In addition, the gesture category refers to the hand pose, such as an "ok" gesture, a "yeah" (V-sign) gesture, an "eight" gesture, and the like; the gesture calibration box and the gesture key points in the embodiments of the present application are used to determine the gesture category. A gesture thermodynamic diagram (heatmap) is a graphical representation that marks the gesture area of the image with special highlighting.
Step 104, acquiring a loss value between the feature data and the corresponding original data;
Step 106, updating the neural network model based on the loss value, and continuing to train the updated model until the loss value is smaller than a preset threshold.
It should be noted that the preset threshold in the embodiment of the present application may be set correspondingly according to an actual situation.
Through steps 102 to 106 above, because the feature data includes at least one of a predicted gesture category, a predicted gesture calibration box, predicted gesture key points, and a predicted gesture thermodynamic diagram, and the neural network model is updated with the loss value between the feature data and the corresponding original data, the model pays more attention to the hand during training, which reduces cases where similar objects such as human faces are mistakenly recognized as gestures.
In an optional implementation of the embodiments of the present application, the manner of obtaining the loss value between the feature data and the corresponding original data in step 104 includes at least one of the following:
1) Acquiring a first loss value between the predicted gesture thermodynamic diagram and the thermodynamic diagram corresponding to the sample image;
In one example, the first loss value is denoted Loss heat, i.e., the difference between the predicted thermodynamic diagram and the original thermodynamic diagram corresponding to the sample image. For example, if both the original and the predicted thermodynamic diagrams are 128x128 pixels, each of the 128x128 pixel positions has a value in both diagrams; subtracting the 128x128 values of the predicted diagram from those of the original diagram and squaring the differences yields Loss heat.
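A minimal PyTorch sketch of this computation (tensor contents are placeholders; averaging the squared per-pixel differences gives the MSE form used in the embodiment below):

```python
import torch
import torch.nn.functional as F

pred_heat = torch.rand(1, 1, 128, 128)  # predicted gesture thermodynamic diagram (placeholder)
orig_heat = torch.rand(1, 1, 128, 128)  # original thermodynamic diagram of the sample image
loss_heat = F.mse_loss(pred_heat, orig_heat)  # squared per-pixel differences, averaged
```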
2) Acquiring a second loss value between the predicted coordinates of the gesture key points and the coordinates of the gesture key points in the sample image;
In one example, the second loss value is denoted Loss point, the difference between the predicted gesture key points and the original key points in the sample image. For example, there are 21 original key points, i.e., 21 (x, y) coordinate pairs, and likewise 21 predicted key points; subtracting the corresponding coordinate pairs and squaring the differences yields the key-point Loss point.
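A corresponding sketch for the key-point loss, assuming the 21 (x, y) pairs of the example:

```python
import torch
import torch.nn.functional as F

pred_points = torch.rand(1, 21, 2)  # 21 predicted (x, y) key points (placeholder)
orig_points = torch.rand(1, 21, 2)  # 21 annotated key points of the sample image
loss_point = F.mse_loss(pred_points, orig_points)  # squared coordinate differences, averaged
```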
3) Acquiring a third loss value between the predicted gesture calibration box and the gesture calibration box in the sample image;
In one example, the third loss value is denoted Loss box, the difference between the calibration box of the predicted gesture location and the calibration box of the original gesture in the sample image. It should be noted that a calibration box may be represented as (x, y, w, h), where x, y are the coordinates of the center point of the gesture and w, h are the width and height of the box.
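And likewise for the calibration-box loss, with boxes encoded as (x, y, w, h) as described (the values are placeholders):

```python
import torch
import torch.nn.functional as F

pred_box = torch.tensor([[0.52, 0.48, 0.30, 0.41]])  # predicted center x, y and width, height
orig_box = torch.tensor([[0.50, 0.50, 0.32, 0.40]])  # annotated box of the sample image
loss_box = F.mse_loss(pred_box, orig_box)
```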
4) Acquiring a fourth loss value between the predicted gesture category and the gesture category in the sample image.
In one example, the fourth loss value is denoted Loss class, which may be computed with a cross-entropy loss function.
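A sketch of the class loss using PyTorch's multi-class cross entropy (the number of gesture categories is a placeholder):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 5)  # predicted scores over 5 gesture categories (placeholder)
target = torch.tensor([2])  # index of the annotated gesture category
loss_class = F.cross_entropy(logits, target)
```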
Based on the loss values in 1) to 4), the manner of updating the neural network model based on the loss value in step 106 may further include: updating the neural network model based on a sum of at least one of: the first loss value, the second loss value, the third loss value, and the fourth loss value.
In an example, if the loss value includes the first, second, third, and fourth loss values, then it is the sum of the four. That is, whichever of the loss values are present, their sum is used as the loss value for updating the neural network model.
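Putting these together, a minimal training-step sketch under the assumption that the model returns all four predictions; the model, data loader, dictionary keys, and threshold value are illustrative and not taken from the application:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # 'model' assumed defined elsewhere

def train_step(batch):
    # Forward pass over the sample image and its thermodynamic diagram (assumed signature).
    pred = model(batch["image"], batch["heatmap"])
    loss = (F.mse_loss(pred["heat"], batch["heatmap"])
            + F.mse_loss(pred["points"], batch["points"])
            + F.mse_loss(pred["box"], batch["box"])
            + F.cross_entropy(pred["class"], batch["label"]))
    optimizer.zero_grad()
    loss.backward()  # update the whole model with the sum of the four losses
    optimizer.step()
    return loss.item()

loss = float("inf")
while loss >= 0.01:       # preset threshold, set according to the actual situation
    for batch in loader:  # 'loader' assumed defined elsewhere
        loss = train_step(batch)
```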
In an optional implementation of the embodiments of the present application, the manner of obtaining the loss value between the feature data and the corresponding original data in step 104 may further include:
Step 11, determining the difference between the feature data and the corresponding original data;
Step 12, squaring the difference to obtain the loss value.
Through steps 11 and 12 above, the loss value is obtained as the square of the difference between the feature data and the corresponding original data; in this way the obtained loss value is more accurate, and hence the neural network model updated with it recognizes gestures more accurately.
In the embodiments of the present application, in the case that the feature data is a predicted gesture thermodynamic diagram, inputting the sample image and the corresponding thermal image into the neural network model for feature extraction to obtain the feature data may further include:
Step 21, inputting the thermodynamic diagram corresponding to the sample image into the neural network model, and reducing its size through the convolutional layers of the model;
Step 22, up-sampling the size-reduced thermodynamic diagram to obtain the predicted gesture thermodynamic diagram.
In the embodiments of the present application, training on the thermodynamic diagram lets the neural network model focus more attention on the hand, which reduces cases where similar objects such as human faces are mistakenly recognized as gestures and improves the accuracy of gesture recognition.
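A minimal sketch of such a thermodynamic-diagram branch, assuming strided convolutions for the size reduction and bilinear up-sampling back to the input resolution (layer widths are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeatmapBranch(nn.Module):
    def __init__(self):
        super().__init__()
        # Strided convolutions reduce the spatial size of the input thermodynamic diagram.
        self.down = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, heatmap):
        x = self.down(heatmap)  # e.g. 128x128 -> 32x32
        # Up-sample back to the original size to obtain the predicted gesture thermodynamic diagram.
        return F.interpolate(x, size=heatmap.shape[-2:], mode="bilinear", align_corners=False)

pred_heat = HeatmapBranch()(torch.rand(1, 1, 128, 128))  # -> shape (1, 1, 128, 128)
```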
The following illustrates the present application with reference to a specific embodiment. The embodiment provides a gesture recognition method; fig. 2 is a schematic diagram of neural network model training in an embodiment of the present application, and based on fig. 2 the method includes the following steps:
Step 201, the original image (sample image) containing the gesture and the generated thermodynamic diagram are fed into a convolutional neural network to obtain feature layers.
Step 202, the resolution of the thermodynamic diagram is reduced as it passes through the convolutional layers, and it is then restored to the original resolution by up-sampling to obtain the predicted gesture thermodynamic diagram.
Step 203, during training, the thermodynamic diagram generated from the original gesture is subtracted from the predicted thermodynamic diagram and the result is squared to obtain the MSE (mean square error) loss, which serves as the Loss heat of the thermodynamic-diagram prediction task. Predicting the gesture key points and the coordinates of the gesture location yields Loss point and Loss box. Predicting the gesture class with a common multi-class cross-entropy loss function yields Loss class. Finally, the whole neural network model is updated with the sum of the losses of the four tasks:
Total Loss = Loss heat + Loss point + Loss box + Loss class
Here Loss heat refers to the difference between the predicted and original thermodynamic diagrams, Loss point to the difference between the predicted and original key points, Loss box to the difference between the calibration box of the predicted gesture location and that of the original gesture, and Loss class to the difference in the predicted gesture classification.
In another embodiment of the present application, there is further provided a method for performing gesture recognition based on the neural network model in the training method in fig. 1, as shown in fig. 3, the method includes the steps of:
Step 302, acquiring image data to be identified; wherein the image data includes a gesture;
Step 304, inputting the image data to be identified into the neural network model to obtain an output result; wherein the output result is used to represent the recognition result of the gesture.
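A hedged sketch of these two steps; the saved-model path, input shape, and inference-time signature are assumptions (the thermodynamic branch is supervised only during training):

```python
import torch

model = torch.load("gesture_model.pt")  # illustrative path to the trained model
model.eval()

with torch.no_grad():
    image = torch.rand(1, 3, 256, 256)  # image data to be identified (placeholder shape)
    output = model(image)               # output represents the gesture recognition result
```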
It can be seen that the loss value used to update the neural network model is obtained from differences including at least one of: the difference in gesture category, the difference in gesture calibration box, the difference in gesture key points, and the difference in gesture thermodynamic diagram. Moreover, since the gesture key points describe the outline of the hand and the gesture calibration box locates its extent, gestures can be recognized more accurately; that is, the trained neural network model improves the accuracy of gesture recognition.
Corresponding to fig. 1, the present application also provides a training apparatus for a neural network model, as shown in fig. 4, the apparatus includes:
the first processing module 42 is configured to input the sample image and the corresponding thermal image into the neural network model for feature extraction, so as to obtain feature data; wherein, the sample image comprises gestures, and the characteristic data comprises at least one of the following items: a predicted gesture category, a predicted gesture scaling box, a predicted gesture key point, a predicted gesture thermodynamic diagram;
a first obtaining module 44, configured to obtain a loss value between the feature data and the corresponding original data;
and the training module 46 is configured to update the neural network model based on the loss value, and continue training the updated neural network model until the loss value is smaller than the preset threshold value.
With the apparatus of the embodiments of the present application, because the feature data includes at least one of a predicted gesture category, a predicted gesture calibration box, predicted gesture key points, and a predicted gesture thermodynamic diagram, and the neural network model is updated with the loss value between the feature data and the corresponding original data, the model pays more attention to the hand during training, which reduces cases where similar objects such as human faces are mistakenly recognized as gestures.
Optionally, the first obtaining module 44 in the embodiments of the present application includes at least one of: a first obtaining unit, configured to obtain a first loss value between the predicted gesture thermodynamic diagram and the thermodynamic diagram corresponding to the sample image; a second obtaining unit, configured to obtain a second loss value between the coordinates of the predicted gesture key points and the coordinates of the gesture key points in the sample image; a third obtaining unit, configured to obtain a third loss value between the predicted gesture calibration box and the gesture calibration box in the sample image; and a fourth obtaining unit, configured to obtain a fourth loss value between the predicted gesture category and the gesture category in the sample image.
Optionally, the training module 46 in the embodiment of the present application further includes: an updating unit for updating the neural network model based on a sum of at least one of: a first loss value, a second loss value, a third loss value, and a fourth loss value.
Optionally, the first obtaining module in this embodiment of the present application includes: a determining unit for determining a difference between the feature data and the corresponding original data; and the first processing unit is used for squaring the difference value to obtain a loss value.
Optionally, in the case that the feature data is a predicted gesture thermodynamic diagram, the first processing module 42 in the embodiment of the present application further includes: the second processing unit is used for inputting the thermodynamic diagrams corresponding to the sample images into the neural network model and reducing the thermodynamic diagram size corresponding to the sample images through the convolution layer in the neural network model; and the third processing unit is used for up-sampling the thermodynamic diagrams corresponding to the sample images with the reduced sizes to obtain the predicted gesture thermodynamic diagrams.
Based on the foregoing fig. 4, an embodiment of the present application further provides an apparatus for performing gesture recognition based on the neural network model in the training apparatus of fig. 4; as shown in fig. 5, the apparatus includes:
a second obtaining module 52, configured to obtain image data to be identified; wherein the image data comprises gestures;
the second processing module 54 is configured to input image data to be identified into the neural network model, so as to obtain an output result; and the output result is used for representing the recognition result of the gesture.
It can be seen that the loss value used to update the neural network model is obtained from differences including at least one of: the difference in gesture category, the difference in gesture calibration box, the difference in gesture key points, and the difference in gesture thermodynamic diagram. Moreover, since the gesture key points describe the outline of the hand and the gesture calibration box locates its extent, gestures can be recognized more accurately; that is, the trained neural network model improves the accuracy of gesture recognition.
An embodiment of the present application further provides an electronic device, as shown in fig. 6, comprising a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 communicate with each other through the communication bus 604;
the memory 603 is configured to store a computer program;
the processor 601 is configured to implement the method steps of fig. 1 or fig. 3 when executing the program stored in the memory 603.
The functions performed when implementing the method steps of fig. 1 or fig. 3 are similar to those described above and are not repeated here.
The communication bus mentioned in the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the terminal and other equipment.
The memory may include a Random Access Memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, which has instructions stored therein, and when the instructions are executed on a computer, the instructions cause the computer to perform any one of the above-mentioned neural network model training methods or any one of the above-mentioned gesture recognition methods.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the above described methods of training a neural network model or any of the above described methods of gesture recognition.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (10)

1. A training method of a neural network model is characterized by comprising the following steps:
inputting a sample image and the corresponding thermal image into a neural network model for feature extraction to obtain feature data; wherein the sample image includes a gesture, and the feature data includes at least one of: a predicted gesture category, a predicted gesture calibration box, predicted gesture key points, and a predicted gesture thermodynamic diagram;
obtaining a loss value between the feature data and the corresponding original data;
and updating the neural network model based on the loss value, and continuing training the updated neural network model until the loss value is smaller than a preset threshold value.
2. The method of claim 1, wherein the obtaining the loss value between the feature data and the corresponding original data comprises at least one of:
obtaining a first loss value between the predicted gesture thermodynamic diagram and a thermodynamic diagram corresponding to the sample image;
acquiring a second loss value between the predicted coordinates of the gesture key points and the coordinates of the gesture key points in the sample image;
obtaining a third loss value between the predicted gesture calibration box and the gesture calibration box in the sample image;
obtaining a fourth loss value between the predicted gesture category and the gesture category in the sample image.
3. The method of claim 2, wherein said updating the neural network model based on the loss value comprises:
updating the neural network model based on a sum of at least one of: the first loss value, the second loss value, the third loss value, and the fourth loss value.
4. The method of claim 1, wherein the obtaining the loss value between the feature data and the corresponding original data comprises:
determining the difference between the feature data and the corresponding original data;
and squaring the difference to obtain the loss value.
5. The method of claim 1, wherein in the case that the feature data is a predicted gesture thermodynamic diagram, inputting the sample image and the corresponding thermodynamic image into a neural network model for feature extraction, and obtaining the feature data comprises:
inputting the thermodynamic diagram corresponding to the sample image into the neural network model, and reducing the size of the thermodynamic diagram through convolutional layers in the neural network model;
and upsampling the thermodynamic diagram corresponding to the sample image with the reduced size to obtain the predicted gesture thermodynamic diagram.
6. A method for gesture recognition based on the neural network model in the training method of any one of claims 1 to 5, comprising:
acquiring image data to be identified; wherein the image data comprises a gesture;
inputting the image data to be identified into the neural network model to obtain an output result; wherein the output result is used for representing the recognition result of the gesture.
7. An apparatus for training a neural network model, comprising:
the first processing module is used for inputting the sample image and the corresponding thermal image into the neural network model for feature extraction to obtain feature data; wherein the sample image includes a gesture, and the feature data includes at least one of: a predicted gesture category, a predicted gesture calibration box, predicted gesture key points, and a predicted gesture thermodynamic diagram;
the first acquisition module is used for acquiring a loss value between the feature data and the corresponding original data;
and the training module is used for updating the neural network model based on the loss value and continuously training the updated neural network model until the loss value is smaller than a preset threshold value.
8. An apparatus for performing gesture recognition based on the neural network model in the training apparatus of claim 7, comprising:
the second acquisition module is used for acquiring image data to be identified; wherein the image data comprises a gesture;
the second processing module is used for inputting the image data to be identified into the neural network model to obtain an output result; wherein the output result is used for representing the recognition result of the gesture.
9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1 to 5 or the method steps of claim 6 when executing a program stored in the memory.
10. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the method of any one of claims 1 to 5 or the method steps of claim 6.
CN202110974865.8A 2021-08-24 2021-08-24 Neural network model training method and device and gesture recognition method and device Pending CN113420848A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110974865.8A CN113420848A (en) 2021-08-24 2021-08-24 Neural network model training method and device and gesture recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110974865.8A CN113420848A (en) 2021-08-24 2021-08-24 Neural network model training method and device and gesture recognition method and device

Publications (1)

Publication Number Publication Date
CN113420848A (en) 2021-09-21

Family

ID=77719218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110974865.8A Pending CN113420848A (en) 2021-08-24 2021-08-24 Neural network model training method and device and gesture recognition method and device

Country Status (1)

Country Link
CN (1) CN113420848A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464904A (en) * 2020-12-15 2021-03-09 北京乐学帮网络技术有限公司 Classroom behavior analysis method and device, electronic equipment and storage medium
CN112699889A (en) * 2021-01-07 2021-04-23 浙江科技学院 Unmanned real-time road scene semantic segmentation method based on multi-task supervision
CN112699837A (en) * 2021-01-13 2021-04-23 新大陆数字技术股份有限公司 Gesture recognition method and device based on deep learning

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780478A (en) * 2021-10-26 2021-12-10 平安科技(深圳)有限公司 Activity classification model training method, classification method, apparatus, device and medium
CN113780478B (en) * 2021-10-26 2024-05-28 平安科技(深圳)有限公司 Activity classification model training method, classification method, device, equipment and medium
CN114035687A (en) * 2021-11-12 2022-02-11 郑州大学 Gesture recognition method and system based on virtual reality
CN114035687B (en) * 2021-11-12 2023-07-25 郑州大学 Gesture recognition method and system based on virtual reality
WO2024007938A1 (en) * 2022-07-04 2024-01-11 北京字跳网络技术有限公司 Multi-task prediction method and apparatus, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
CN113420848A (en) Neural network model training method and device and gesture recognition method and device
WO2017152794A1 (en) Method and device for target tracking
US8542912B2 (en) Determining the uniqueness of a model for machine vision
US9971954B2 (en) Apparatus and method for producing image processing filter
CN111797078A (en) Data cleaning method, model training method, device, storage medium and equipment
JP6352512B1 (en) Signal processing apparatus, signal processing method, signal processing program, and data structure
CN107272899B (en) VR (virtual reality) interaction method and device based on dynamic gestures and electronic equipment
CN113792853B (en) Training method of character generation model, character generation method, device and equipment
CN114973300B (en) Component type identification method and device, electronic equipment and storage medium
US8542905B2 (en) Determining the uniqueness of a model for machine vision
JP2019086979A (en) Information processing device, information processing method, and program
CN114595352A (en) Image identification method and device, electronic equipment and readable storage medium
CN113312969A (en) Part identification and positioning method and system based on three-dimensional vision
CN111353514A (en) Model training method, image recognition method, device and terminal equipment
WO2024093665A1 (en) Identity recognition image processing method and apparatus, computer device, and storage medium
CN114139630A (en) Gesture recognition method and device, storage medium and electronic equipment
CN112465805A (en) Neural network training method for quality detection of steel bar stamping and bending
JP2019194788A (en) Learning device, recognition device, learning method and computer program
CN111583159A (en) Image completion method and device and electronic equipment
CN115909356A (en) Method and device for determining paragraph of digital document, electronic equipment and storage medium
US20230401809A1 (en) Image data augmentation device and method
CN112396057A (en) Character recognition method and device and electronic equipment
CN113033542B (en) Method and device for generating text recognition model
CN108776972A (en) A kind of method for tracing object and device
CN111640076B (en) Image complement method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210921