CN114333068A - Training method and training device - Google Patents

Training method and training device

Info

Publication number
CN114333068A
Authority
CN
China
Prior art keywords
tracking
training
result
determining
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111680833.3A
Other languages
Chinese (zh)
Inventor
王瑶
张珏
程和平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Jingruikang Molecular Medicine Technology Co ltd
Original Assignee
Nanjing Jingruikang Molecular Medicine Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Jingruikang Molecular Medicine Technology Co ltd
Priority to CN202111680833.3A
Publication of CN114333068A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a training method and a training device. The method comprises the following steps: acquiring a training sample, wherein the training sample is an image sequence recording the movement of a rodent; inputting the training sample into a neural network model to obtain a recognition result of the rodent's posture, wherein the recognition result comprises key points in the rodent's posture; and training the neural network model with a loss function by using a gradient descent method according to the recognition result of the rodent's posture. The loss function includes a temporal constraint term for constraining the positions of key points in the rodent's pose between adjacent image frames in the image sequence. When a neural network model trained according to this method is used to recognize the postures of rodents such as mice, jitter of the recognition result in the time domain can be effectively avoided.

Description

Training method and training device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a training method and a training device.
Background
Gesture recognition refers to recognition and/or extraction of actions and/or key points of a living being in an image or video using a neural network model.
In the prior art, when identifying key points in an image sequence of rodent motion, the key points (for example, the joints of the rodent) are usually extracted from a single image or a single image frame. The recognition result obtained in this way is prone to jitter, so the motion trajectory of a key point is not smooth enough.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a training method and a training apparatus to improve accuracy and recognition efficiency of a neural network model in rodent posture recognition, and ensure continuity of a recognition result in a time domain.
In a first aspect, a training method is provided, the method including: acquiring a training sample, wherein the training sample is an image sequence recording the movement of a rodent; inputting the training sample into a neural network model to obtain a recognition result of the rodent's posture, wherein the recognition result comprises key points in the rodent's posture; and training the neural network model with a loss function by using a gradient descent method according to the recognition result of the rodent's posture. The loss function includes a temporal constraint term for constraining the positions of key points in the rodent's pose between adjacent image frames in the image sequence.
Optionally, the neural network model comprises a HRNet network.
Optionally, before the training of the neural network model, the training method further includes: determining the time constraint term according to the error between the position of a key point acquired by using a tracking method and the position of the key point in the recognition result.
Optionally, the tracking method comprises a Lucas-Kanade optical flow method.
Optionally, the determining the time constraint term according to an error between the position of the key point obtained by using a tracking method and the position of the key point in the recognition result includes: selecting m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2; taking the first frame P_{1,1}^ω in the recognition result P_{1,i}^ω (i = 1, 2, …, m; ω = 1, 2, …, h) of the tracking samples as the initial frame and performing forward tracking to obtain a forward tracking result T_f^ω; determining the difference between the forward tracking result and the recognition result of the m-th image frame in the tracking samples as:

F_1 = Σ_{ω=1}^{h} || T_f^ω - P_{1,m}^ω ||,

wherein ω indexes the h key points on each image frame; taking the last frame P_{1,m}^ω in the recognition result of the tracking samples as the termination frame and performing backward tracking to obtain a backward tracking result T_b^ω; determining the difference between the backward tracking result and the recognition result of the 1st image frame in the tracking samples as:

F_2 = Σ_{ω=1}^{h} || T_b^ω - P_{1,1}^ω ||;

taking the first frame P_{1,1}^ω in the recognition result of the tracking samples as the initial frame, performing forward tracking to obtain the forward tracking result T_f^ω, then taking the forward tracking result T_f^ω as the starting point and performing backward tracking to obtain a second backward tracking result T_fb^ω, and recording the difference between the second backward tracking result and the initial frame P_{1,1}^ω as:

F_3 = Σ_{ω=1}^{h} || T_fb^ω - P_{1,1}^ω ||;

and determining the time constraint term as:

L_temporal = 0 if F_1 ≤ E_1 and F_2 ≤ E_1; L_temporal = F_3 otherwise,

wherein E_1 is a threshold.
Optionally, the loss function further includes a spatial constraint term for defining the location of keypoints in the pose of the rodent in the same frame of image.
Optionally, before the training of the neural network model, the training method further includes: determining the spatial constraint term according to the difference between the positions of the plurality of key points in the recognition result.
Optionally, the determining the spatial constraint term according to a difference between positions of a plurality of key points in the recognition result includes: selecting p samples from the training samples, wherein p is a positive integer greater than or equal to 2; determining the distance d_j^ω between two key points in the recognition results P_{2,j}^ω (j = 1, 2, …, p; ω = 1, 2, …, h) of the p samples; fitting the distances d_j^ω with a Gaussian distribution and determining their mean μ and variance σ², wherein ω indexes the h key points on each image frame; and determining the spatial constraint term from the deviation of the distances d_j^ω from the fitted Gaussian distribution (mean μ, variance σ²).
optionally, the loss function further comprises an error constraint term for constraining errors in the recognition and annotation results for keypoints in the pose of the rodent.
Optionally, the error loss term is a mean square error loss term.
In a second aspect, a training device is provided, comprising: an acquisition module for acquiring a training sample, wherein the training sample is an image sequence recording the movement of a rodent; an input module for inputting the training sample into a neural network model to obtain a recognition result of the rodent's posture, wherein the recognition result comprises key points in the rodent's posture; and a training module for training the neural network model with a loss function by using a gradient descent method according to the recognition result of the rodent's posture. The loss function includes a temporal constraint term for constraining the positions of key points in the rodent's pose between adjacent image frames in the image sequence.
Optionally, the neural network model comprises a HRNet network.
Optionally, before the training of the neural network model, the training apparatus further includes: a first determining module for determining the time constraint term according to the error between the position of a key point acquired by using the tracking method and the position of the key point in the recognition result.
Optionally, the tracking method comprises a Lucas-Kanade optical flow method.
Optionally, the first determining module is configured to: select m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2; take the first frame P_{1,1}^ω in the recognition result P_{1,i}^ω (i = 1, 2, …, m; ω = 1, 2, …, h) of the tracking samples as the initial frame and perform forward tracking to obtain a forward tracking result T_f^ω; determine the difference between the forward tracking result and the recognition result of the m-th image frame in the tracking samples as:

F_1 = Σ_{ω=1}^{h} || T_f^ω - P_{1,m}^ω ||,

wherein ω indexes the h key points on each image frame; take the last frame P_{1,m}^ω in the recognition result of the tracking samples as the termination frame and perform backward tracking to obtain a backward tracking result T_b^ω; determine the difference between the backward tracking result and the recognition result of the 1st image frame in the tracking samples as:

F_2 = Σ_{ω=1}^{h} || T_b^ω - P_{1,1}^ω ||;

take the first frame P_{1,1}^ω in the recognition result of the tracking samples as the initial frame, perform forward tracking to obtain the forward tracking result T_f^ω, then take the forward tracking result T_f^ω as the starting point and perform backward tracking to obtain a second backward tracking result T_fb^ω, and record the difference between the second backward tracking result and the initial frame P_{1,1}^ω as:

F_3 = Σ_{ω=1}^{h} || T_fb^ω - P_{1,1}^ω ||;

and determine the time constraint term as:

L_temporal = 0 if F_1 ≤ E_1 and F_2 ≤ E_1; L_temporal = F_3 otherwise,

wherein E_1 is a threshold.
Optionally, the loss function further includes a spatial constraint term for defining the location of keypoints in the pose of the rodent in the same frame of image.
Optionally, the training device further comprises: a second determining module for determining the spatial constraint term according to the difference between the positions of the plurality of key points in the recognition result.
Optionally, the second determining module is configured to: select p samples from the training samples, wherein p is a positive integer greater than or equal to 2; determine the distance d_j^ω between two key points in the recognition results P_{2,j}^ω (j = 1, 2, …, p; ω = 1, 2, …, h) of the p samples; fit the distances d_j^ω with a Gaussian distribution and determine their mean μ and variance σ², wherein ω indexes the h key points on each image frame; and determine the spatial constraint term from the deviation of the distances d_j^ω from the fitted Gaussian distribution (mean μ, variance σ²).
optionally, the loss function further comprises an error constraint term for constraining errors in the recognition and annotation results for keypoints in the pose of the rodent.
Optionally, the error loss term is a mean square error loss term.
According to the method and the device, a time constraint is introduced into the training process of the neural network model, so that the neural network model achieves higher accuracy when processing occluded or blurred images, while jitter of its recognition result in the time domain is effectively suppressed.
Drawings
Fig. 1 is a schematic flowchart of a training method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of a method for determining a time constraint term according to an embodiment of the present application.
Fig. 3 is a schematic flowchart of a method for determining a spatial constraint term according to an embodiment of the present application.
Fig. 4 is a schematic flow chart of a method for determining an error constraint term according to an embodiment of the present application.
Fig. 5 is a schematic block diagram of a training apparatus according to an embodiment of the present application.
Fig. 6 is a schematic block diagram of a training device according to another embodiment of the present application.
Fig. 7 is a schematic block diagram of an application scenario according to an embodiment of the present application.
Detailed Description
The method and the device in the embodiment of the application can be applied to various scenes based on the gesture recognition of the rodent in the image sequence. The image sequence may be a plurality of image frames in a video. The plurality of image frames may be a plurality of image frames in succession in the video. The image sequence may also be a plurality of images of the animal captured by an image capturing device such as a camera. The rodent may be, for example, a mouse.
To facilitate an understanding of the embodiments of the present application, the background of the present application is first illustrated in detail.
The behavior of biological neurons is closely related to the activity of animals, and changes in the posture of animals usually cause corresponding changes in the neurons. Therefore, the exploration of the connection and interaction pattern of complex networks of neurons under specific behaviors is very important for the fields of neuroscience and medicine. In the field, a quantitative analysis method is generally adopted, namely, the corresponding relation of the posture information of the animal and the behavior of the neuron is determined by acquiring the posture information of the animal and the behavior of the neuron.
The behavior of the animal neurons can be acquired by means of ray scanning, a miniaturized multi-photon microscope and the like.
There are various methods for obtaining posture information of an animal. For example, pose information of an animal can be obtained by manually labeling key points in the image sequence. However, for massive data, the efficiency of manual processing is low, errors are prone to occur, and the accuracy of the obtained posture information cannot be guaranteed.
For another example, a marker (e.g., a displacement or acceleration sensor) may be placed at a key point of the animal body, and the posture change of the animal may be determined from the change in information such as the position of the marker. However, in rodents, due to their small size, the placement of markers interferes with their natural behavior, resulting in less accurate data being collected.
As another example, an animal in space may be positioned with a depth camera to obtain pose information thereof. However, this method is sensitive to imaging conditions and scene changes and is not suitable for all situations.
With the development of the field of artificial intelligence, neural-network-based animal posture recognition methods are gradually replacing the traditional techniques. However, current neural network models are usually trained without considering how the key points of rodents in an image sequence move over time. These neural network models have the following problems in the posture recognition process:
in recognizing animal poses in a sequence of images, neural network models typically perform pose recognition based on each frame of image itself. For example, the image sequence to be recognized includes a first frame image and a second frame image in chronological order. And the neural network model identifies the animal posture in the first frame image according to the image of the first frame to obtain a first posture identification result corresponding to the first frame image. And recognizing the animal posture in the second frame image according to the image of the second frame to obtain a second posture recognition result corresponding to the second frame image. By adopting the method for directly identifying the animal posture by using the current frame image, the accuracy of the obtained identification result is low and the identification result is not smooth enough in time. In addition, when the image frames in the acquired image sequence have the condition of blurring or being blocked, for example, when the tail of a rodent is curled or blocked, the accuracy of the position information of the key point output by the neural network model is low.
In addition, existing neural network models are usually trained with a back-propagation algorithm using a loss function constructed from the error between the recognition result and the manual labeling result. Because such training does not consider the continuous change of key points in the time domain, the accuracy of rodent posture recognition can be low. Moreover, constructing the loss function only from the error between the recognition result and the manual labeling result usually makes the initial stage of training slow.
In view of the foregoing problems, embodiments of the present application provide a training method and a training apparatus. According to the method provided by the embodiment of the application, the time constraint is introduced in the training process of the neural network model, so that the jitter phenomenon of the recognition result of the neural network model on the time domain is effectively inhibited.
The training method provided by the embodiment of the present application is described in detail below with reference to fig. 1 to 4. Fig. 1 is a schematic flow chart of a training method provided in an embodiment of the present application. The training method shown in FIG. 1 may include steps S11-S13.
In step S11, a training sample is obtained.
In one embodiment of the present application, the training sample may include a sequence of images recording rodent movements and the results of the tagging. It is understood that the marking result may include position information of a preset number of key points of the rodent body. For example, the key points may be various joints and key parts of the body, such as joints on the limbs of the mouse, tail, eyes, nose, ears, and the like. The location information may be coordinate information of the key point.
The embodiment of the present application does not limit the manner of obtaining the pre-labeled result. For example, manual labeling may be used to label image frames in an image sequence on a frame-by-frame basis. Other methods with higher confidence may also be used for annotation as possible implementations.
The way of obtaining the training sample may be many, and the embodiment of the present application is not limited to this. For example, as one implementation, the image sequence may be directly acquired by, for example, an image acquisition device (e.g., a camera, a medical imaging device, a lidar, etc.), and may include a plurality of images of rodents arranged in a time sequence. As another example, the training samples may be obtained from a server (e.g., a local server or a cloud server, etc.). Alternatively, training samples may also be obtained on the network or other content platforms, for example, open-source training data sets such as MSCOCO data sets, MPII data sets, and posetrack data sets may be used; alternatively, it may be a locally pre-stored image sequence.
And step S12, inputting the training sample obtained in the step S11 into a neural network model, and obtaining a recognition result of the posture of the rodent.
The embodiment of the present application does not specifically limit the neural network model, and any neural network model capable of realizing the gesture recognition described in the present application may be used. For example, the neural network model can be a 2D convolutional neural network such as VGG, ResNet, HRNet, etc.
Optionally, HRNet (High-Resolution Network) maintains a high-resolution representation throughout feature extraction and fuses features of different resolutions during the extraction process. It is particularly suitable for scenarios such as semantic segmentation, human pose estimation, image classification, facial landmark detection, and general object recognition.
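As an illustration only, the following sketch shows how an HRNet backbone with a simple keypoint-heatmap head might be assembled. The use of the open-source timm library, the "hrnet_w18" variant, the head design, and the number of key points are assumptions for illustration and are not prescribed by this application.

```python
import torch
import torch.nn as nn
import timm  # assumption: the timm implementation of HRNet is used


class HRNetPoseNet(nn.Module):
    """HRNet backbone plus a 1x1 convolution head that predicts one heatmap per key point."""

    def __init__(self, num_keypoints: int = 16):
        super().__init__()
        # features_only=True returns multi-resolution feature maps; HRNet keeps a
        # high-resolution branch throughout, which is the property used here.
        self.backbone = timm.create_model("hrnet_w18", pretrained=False, features_only=True)
        high_res_channels = self.backbone.feature_info.channels()[0]
        self.head = nn.Conv2d(high_res_channels, num_keypoints, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)      # list of feature maps, highest resolution first
        return self.head(feats[0])    # (N, num_keypoints, H', W') heatmaps


model = HRNetPoseNet(num_keypoints=16)
heatmaps = model(torch.randn(1, 3, 256, 256))
```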
The recognition result may include location information (also referred to as recognition location) of a preset number of rodent body key points recognized by the neural network model.
And step S13, training the neural network model by using a gradient descent method according to the recognition result in the step S12 and by using a loss function.
The loss function may include a time constraint term L_temporal.

The method of determining the time constraint term L_temporal is described in detail below with reference to FIG. 2, which illustrates a method for determining the time constraint term.

The time constraint term L_temporal may be used to constrain the positions of key points in the rodent's pose between adjacent image frames in the image sequence.

In some embodiments, the time constraint term L_temporal may be determined according to the error between the position information of a key point acquired by a tracking method and the position information of the same key point in the recognition result.
In the training method provided in the embodiment of the present application, the tracking method may be an unsupervised tracking method, for example, a Lucas-Kanade optical flow method.
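As a minimal sketch of such unsupervised tracking, the following uses OpenCV's pyramidal Lucas-Kanade implementation (cv2.calcOpticalFlowPyrLK) to propagate key points through a frame sequence; the window size and pyramid depth are illustrative assumptions, not values prescribed by this application. Backward tracking can be performed the same way by reversing the frame order.

```python
import cv2
import numpy as np


def track_keypoints(frames, init_keypoints):
    """Propagate key points from the first frame through a grayscale frame sequence.

    frames:         list of (H, W) uint8 grayscale images
    init_keypoints: (h, 2) array of key point positions in frames[0]
    returns:        (len(frames), h, 2) array of tracked positions
    """
    lk_params = dict(winSize=(21, 21), maxLevel=3)  # illustrative parameter values
    pts = init_keypoints.reshape(-1, 1, 2).astype(np.float32)
    tracks = [pts.reshape(-1, 2).copy()]
    for prev, nxt in zip(frames[:-1], frames[1:]):
        # Lucas-Kanade optical flow estimates where each point moved between frames.
        pts, status, _err = cv2.calcOpticalFlowPyrLK(prev, nxt, pts, None, **lk_params)
        tracks.append(pts.reshape(-1, 2).copy())
    return np.stack(tracks)
```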
The method shown in FIG. 2 may include steps S1311-S1315.
In step S1311, m images are selected from the training samples as tracking samples.
The m images are any m images in the training sample. The m images may be consecutive m images in the training sample. It is understood that the m images may also be all images in the training sample.
Step S1312, using a first image frame of the m images in the training sample as an initial frame, performing forward tracking using the identification result of the initial frame to obtain a first forward tracking result, and determining a first difference between the first forward tracking result and the identification result of the mth image frame, where the first forward tracking result includes a tracking position of a keypoint in the mth image frame. Wherein m is a positive integer greater than or equal to 2. In other words, the first difference value may be a difference value between the tracking position and the recognition position of the same keypoint in the mth image frame.
For convenience of description, the set of m images is hereinafter denoted I_{1,i} (i = 1, 2, …, m), and the recognition result of I_{1,i} is denoted P_{1,i}^ω (i = 1, 2, …, m; ω = 1, 2, …, h), where ω indexes the h key points in each image frame.

Taking the first frame of the m images as the initial frame, forward tracking is performed using the recognition result P_{1,1}^ω of the initial frame, giving a first forward tracking result T_f^ω. The first difference F_1 between the first forward tracking result and the recognition result P_{1,m}^ω of the m-th frame in the set I_{1,i} is:

F_1 = Σ_{ω=1}^{h} || T_f^ω - P_{1,m}^ω ||
step S1313, taking the mth image frame of the m images as a termination frame, and performing backward tracking by using the identification result of the termination frame to obtain a first backward tracking result, where the first backward tracking result includes a tracking position of a keypoint in the first image frame. It is to be understood that the mth image may also be referred to as the last image frame of the m images. A second difference between the first back tracking result and the recognition result of the first image frame is determined. In other words, the second difference may be a difference between the tracking position and the identified position of the same keypoint in the first image frame.
Taking the last frame of the m images as the termination frame, backward tracking is performed using the recognition result P_{1,m}^ω of the termination frame, giving a first backward tracking result T_b^ω. The second difference F_2 between the first backward tracking result T_b^ω and the recognition result P_{1,1}^ω of the first frame in the set I_{1,i} is:

F_2 = Σ_{ω=1}^{h} || T_b^ω - P_{1,1}^ω ||
In step S1314, the first frame of the m images is taken as the initial frame, forward tracking is performed using its recognition result P_{1,1}^ω to obtain the first forward tracking result T_f^ω, and the first forward tracking result T_f^ω is then taken as the termination frame and tracked backward, giving a second backward tracking result T_fb^ω. The difference F_3 between the second backward tracking result T_fb^ω and the initial frame P_{1,1}^ω is:

F_3 = Σ_{ω=1}^{h} || T_fb^ω - P_{1,1}^ω ||
in step S1315, time constraint terms are determined.
When the first difference and the second difference are both smaller than or equal to a preset threshold, determining that the time constraint term is 0; when the first difference value and/or the second difference value is/are larger than the preset threshold value, determining that the time constraint item is a second back tracking result
Figure BDA00034464881200000918
And initial frame
Figure BDA00034464881200000919
Difference F of3. I.e. the time constraint term is
Figure BDA0003446488120000101
Wherein E1Is a preset threshold value which is related to the movement characteristic of the living being. It should be noted that, compared with the prediction result of the neural network model, the tracking result obtained by using the tracking method can ensure that the tracking position of the same key point smoothly changes in the time domain. Therefore, when the difference (e.g., the first difference or the second difference) is smaller than the preset threshold, it indicates that the recognition result is close to the tracking result, and the recognition result of the neural network model is relatively smooth in the time domain, and the time constraint term may not be set at this time. And when the difference value is larger than the preset threshold value, the difference between the identification result and the tracking result is larger. That is, the recognition result is jittered in the time domain. At this time, the neural network model can be trained by setting a time constraint item, so that the recognition result output by the neural network model is smoother.
The method for determining the time constraint item is not particularly limited in the embodiments of the present application. For example, the first difference may be used as a time constraint term. For another example, the second difference may be used as a time constraint term. For another example, the first forward tracking result may be tracked backward to obtain a second backward tracking result; and determining a time constraint term according to the difference value between the second back tracking result and the identification result of the first image frame. For another example, the first backward tracking result may be subjected to forward tracking to obtain a second forward tracking result; a time constraint term is determined based on a difference between the second forward tracking result and the recognition result of the first image frame.
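The piecewise rule of steps S1311-S1315 can be sketched as follows; the Euclidean per-keypoint distance and the track_fn helper are assumptions introduced for illustration, since the original equations are given only as figures.

```python
import numpy as np


def temporal_constraint(pred, track_fn, threshold):
    """pred: (m, h, 2) key points recognised by the network for m consecutive frames.

    track_fn(start_pts, forward) is a hypothetical helper that propagates the (h, 2)
    start positions through the whole sequence (e.g. with Lucas-Kanade tracking) and
    returns the (h, 2) positions at the other end of the sequence.
    """

    def diff(a, b):
        # Sum of per-keypoint position differences (Euclidean distance assumed).
        return np.linalg.norm(a - b, axis=-1).sum()

    fwd = track_fn(pred[0], forward=True)        # frame 1 tracked forward to frame m
    f1 = diff(fwd, pred[-1])                     # vs. recognition result of frame m

    bwd = track_fn(pred[-1], forward=False)      # frame m tracked backward to frame 1
    f2 = diff(bwd, pred[0])                      # vs. recognition result of frame 1

    fwd_then_bwd = track_fn(fwd, forward=False)  # forward result tracked back to frame 1
    f3 = diff(fwd_then_bwd, pred[0])             # vs. the initial frame

    # Piecewise rule: no penalty when both differences stay within the threshold E1.
    return 0.0 if (f1 <= threshold and f2 <= threshold) else f3
```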
In some embodiments, the loss function further includes a spatial constraint term for defining the location of keypoints in the rodent pose in the same frame image.
Referring to fig. 3, fig. 3 illustrates a method of determining a spatial constraint term.
The spatial constraint term L_spatial can be used to constrain the positions of key points in the pose of the living being within the same image frame. In some embodiments, the spatial constraint term L_spatial can be determined according to the differences between the positions of multiple key points in the recognition result.

Determining the spatial constraint term L_spatial as provided in an embodiment of the present application may include steps S1321-S1323.
Step S1321, selecting p samples from the training samples, wherein p is a positive integer greater than or equal to 2;
the p images are any p images in the training sample. The p images may be consecutive p images in the training sample. It is understood that the p images may also be all images in the training sample. Wherein p is a positive integer greater than or equal to 2.
For convenience of description, the set of p images is hereinafter denoted I_{2,j} (j = 1, 2, …, p), and the recognition result of I_{2,j} is denoted P_{2,j}^ω (j = 1, 2, …, p; ω = 1, 2, …, h), where ω indexes the h key points in each image frame.

Step S1322 is to determine the distance between two key points in the same image in the training sample: for the recognition results P_{2,j}^ω (j = 1, 2, …, p; ω = 1, 2, …, h) of the p samples, the distance between two key points is denoted d_j^ω (j = 1, 2, …, p; ω = 1, 2, …, h).
In step S1323, the spatial constraint term is determined.

The distances d_j^ω follow a Gaussian distribution, and their mean μ and variance σ² are determined, where ω indexes the h key points on each image frame. The spatial constraint term L_spatial is then determined from the deviation of the distances d_j^ω from this Gaussian distribution (mean μ, variance σ²).
it should be noted that the definite space constraint term L provided in the above steps S1321-S1323spaticalThe method of (d) is merely an example and may be determined in other ways. For example, the spatial constraint term may also be determined based on an error between a distance between every two key points in the recognition result and a distance between every two key points in the corresponding labeling result, which is not limited in the present application.
In some embodiments, the loss function may further include an error constraint term L_MSE. In some embodiments, the error constraint term L_MSE can be determined according to the error between the position information of the same key point in the recognition result and in the labeling result of the training sample. Referring to FIG. 4, and taking the mean square error as an example, determining the error constraint term may include steps S1331-S1333.
Step S1331, selecting n images from the training samples obtained in step S11 to form a sample set I_{3,k} (k = 1, 2, …, n), where n is a positive integer greater than or equal to 1.
The n images are any n images in the training sample. The n images may be consecutive n images in the training sample. It is understood that the n images may also be all images in the training sample.
Step S1332, determining the recognition result P_{3,k}^ω (k = 1, 2, …, n; ω = 1, 2, …, h) and the labeling result Y_{3,k}^ω (k = 1, 2, …, n; ω = 1, 2, …, h) of the sample set I_{3,k}.

Step S1333, calculating the error between the recognition result P_{3,k}^ω and the labeling result Y_{3,k}^ω, and determining the error loss term as the mean square error:

L_MSE = (1 / (n·h)) Σ_{k=1}^{n} Σ_{ω=1}^{h} || P_{3,k}^ω - Y_{3,k}^ω ||²
for the error loss term, in addition to the mean square error loss, cross entropy loss, 0-1 loss, absolute value loss, etc. commonly used in the art may be employed. The methods shown in steps S1331-S1333 are only examples and do not limit the scope of the present application.
In some embodiments, the loss function may be determined as a weighted sum of the aforementioned error constraint term L_MSE, time constraint term L_temporal, and spatial constraint term L_spatial. That is, the loss function is L = L_MSE + a·L_temporal + b·L_spatial, where a and b are hyper-parameters whose values are greater than or equal to 0.
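A sketch of this weighted combination, assuming a PyTorch training setup and a mean-square-error term over labelled key points, might look as follows; the hyper-parameter values shown are placeholders.

```python
import torch


def total_loss(pred_kpts, gt_kpts, l_temporal, l_spatial, a=0.5, b=0.5):
    """pred_kpts, gt_kpts: (n, h, 2) predicted and annotated key point positions."""
    l_mse = torch.mean((pred_kpts - gt_kpts) ** 2)  # error constraint term L_MSE
    return l_mse + a * l_temporal + b * l_spatial   # L = L_MSE + a*L_temporal + b*L_spatial


# Typical use inside one gradient-descent step (cf. step S13):
#   loss = total_loss(pred, labels, l_t, l_s)
#   loss.backward()
#   optimizer.step()
```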
An embodiment of the training device provided by the present application is described in detail below in conjunction with fig. 5. It is to be understood that the apparatus embodiments correspond to the description of the method embodiments described above. Therefore, reference is made to the preceding method embodiments for parts not described in detail.
Fig. 5 is a schematic block diagram of a training device 50 provided in one embodiment of the present application. It should be understood that the apparatus 50 shown in fig. 5 is merely an example, and the apparatus 50 of an embodiment of the present invention may also include other modules or units.
It should be understood that the apparatus 50 is capable of performing various steps in the methods of fig. 1-4, and will not be described here again to avoid repetition.
As a possible implementation, the apparatus includes:
and an obtaining module 51, configured to obtain a training sample.
The training samples and the obtaining method thereof may be the same as step S11 of the foregoing method, and are not described herein again.
And the input module 52 is configured to input the training sample into a neural network model, so as to obtain a recognition result of the posture of the rodent, where the recognition result includes a key point in the posture of the rodent.
And the training module 53 is configured to train the neural network model by using a gradient descent method according to the recognition result of the posture of the rodent and using a loss function.
Optionally, the neural network model comprises a HRNet network.
Optionally, before the training of the neural network model, the training apparatus further includes: a first determining module for determining the time constraint term according to the error between the position of a key point acquired by using the tracking method and the position of the key point in the recognition result.
Optionally, the tracking method comprises a Lucas-Kanade optical flow method.
Optionally, the first determining module is configured to: select m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2; take the first frame P_{1,1}^ω in the recognition result P_{1,i}^ω (i = 1, 2, …, m; ω = 1, 2, …, h) of the tracking samples as the initial frame and perform forward tracking to obtain a forward tracking result T_f^ω; determine the difference between the forward tracking result and the recognition result of the m-th image frame in the tracking samples as:

F_1 = Σ_{ω=1}^{h} || T_f^ω - P_{1,m}^ω ||,

wherein ω indexes the h key points on each image frame; take the last frame P_{1,m}^ω in the recognition result of the tracking samples as the termination frame and perform backward tracking to obtain a backward tracking result T_b^ω; determine the difference between the backward tracking result and the recognition result of the 1st image frame in the tracking samples as:

F_2 = Σ_{ω=1}^{h} || T_b^ω - P_{1,1}^ω ||;

take the first frame P_{1,1}^ω in the recognition result of the tracking samples as the initial frame, perform forward tracking to obtain the forward tracking result T_f^ω, then take the forward tracking result T_f^ω as the starting point and perform backward tracking to obtain a second backward tracking result T_fb^ω, and record the difference between the second backward tracking result and the initial frame P_{1,1}^ω as:

F_3 = Σ_{ω=1}^{h} || T_fb^ω - P_{1,1}^ω ||;

and determine the time constraint term as:

L_temporal = 0 if F_1 ≤ E_1 and F_2 ≤ E_1; L_temporal = F_3 otherwise,

wherein E_1 is a threshold.
Optionally, the loss function further includes a spatial constraint term for defining the location of keypoints in the pose of the rodent in the same frame of image.
Optionally, the training device further comprises: a second determining module for determining the spatial constraint term according to the difference between the positions of the plurality of key points in the recognition result.
Optionally, the second determining module is configured to: select p samples from the training samples, wherein p is a positive integer greater than or equal to 2; determine the distance d_j^ω between two key points in the recognition results P_{2,j}^ω (j = 1, 2, …, p; ω = 1, 2, …, h) of the p samples; fit the distances d_j^ω with a Gaussian distribution and determine their mean μ and variance σ², wherein ω indexes the h key points on each image frame; and determine the spatial constraint term from the deviation of the distances d_j^ω from the fitted Gaussian distribution (mean μ, variance σ²).
optionally, the loss function further comprises an error constraint term for constraining errors in the recognition results and the annotation results for the keypoints in the pose of the living being.
Optionally, the error loss term is a mean square error loss term.
It should be appreciated that the apparatus 50 for training a neural network model herein is embodied in the form of a functional module. The term "module" herein may be implemented in software and/or hardware, and is not particularly limited thereto. For example, a "module" may be a software program, a hardware circuit, or a combination of both that implements the functionality described above. The hardware circuitry may include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (e.g., a shared processor, a dedicated processor, or a group of processors) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality.
As an example, the apparatus 50 for training a neural network model provided in the embodiment of the present invention may be a processor or a chip, so as to perform the method described in the embodiment of the present invention.
Fig. 6 is a schematic block diagram of a training device 60 provided in another embodiment of the present application. The apparatus 60 shown in fig. 6 comprises a memory 61, a processor 62, a communication interface 63 and a bus 64. The memory 61, the processor 62 and the communication interface 63 are connected to each other through a bus 64.
The memory 61 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 61 may store a program, and when the program stored in the memory 61 is executed by the processor 62, the processor 62 is configured to perform the steps of the training method provided by the embodiment of the present invention, for example, the steps of the embodiments shown in fig. 1 to 4 may be performed.
The processor 62 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the training method of the embodiment of the present invention.
The processor 62 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the training method provided by the embodiment of the present invention may be implemented by integrated logic circuits of hardware in the processor 62 or instructions in the form of software.
The processor 62 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor.
The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or registers. The storage medium is located in the memory 61, and the processor 62 reads the information in the memory 61 and, in combination with its hardware, performs the functions required of the units included in the apparatus according to the embodiment of the present invention, or performs the training method of the method embodiments of the present invention. For example, the steps/functions of the embodiments shown in fig. 1-4 may be performed.
Communication interface 63 may enable communication between apparatus 60 and other devices or communication networks using, but not limited to, transceiver devices.
Bus 64 may include a path that conveys information between various components of apparatus 60 (e.g., memory 61, processor 62, communication interface 63).
It should be understood that the apparatus 60 shown in the embodiments of the present invention may be a processor or a chip for performing the methods described in the embodiments of the present invention.
It should be understood that the processor in the embodiments of the present invention may be a Central Processing Unit (CPU), and the processor may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Specific applications of the embodiment of the present application are described below with reference to the application scenario of fig. 7. It should be noted that the following description about fig. 7 is only an example and is not limited thereto, and the method in the embodiment of the present application is not limited thereto, and may also be applied to other scenarios of gesture recognition.
The application scenario in fig. 7 may include an image acquisition device 71 and an image processing device 72.
Wherein the image acquisition device 71 can be used to acquire a sequence of images of a rodent. The image processing apparatus 72 may be integrated into an electronic device, which may be a server or a terminal, and the present embodiment is not limited thereto. For example, the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, cloud computing, cloud storage, cloud communication, big data and artificial intelligence platforms. The terminal can be a smart phone, a tablet computer, a computer, an intelligent Internet of things device and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
The image processing device 72 may be deployed with a neural network model, and may use the neural network model to identify the images in the image sequence acquired by the image acquisition device 71, so as to obtain position information of key points in the images to be processed. The position information of the key points may include, for example, the position coordinates of the joints, trunk, or facial features of the rodent body.
The electronic device may further acquire a training sample by using the image acquisition device 71, and train the neural network model by using a loss function according to an identification result of the training sample and a result of artificial labeling. The image processing device 72 may also recognize the image to be processed through the trained neural network model, so as to achieve the purpose of accurately recognizing the image.
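As an illustrative sketch of this inference step (assuming a heatmap-based model such as the hypothetical HRNetPoseNet above), key point coordinates can be read off the predicted heatmaps by taking per-channel argmax positions:

```python
import torch


@torch.no_grad()
def recognise_sequence(model, frames):
    """frames: iterable of preprocessed (3, H, W) image tensors; returns per-frame key points."""
    model.eval()
    results = []
    for frame in frames:
        heatmaps = model(frame.unsqueeze(0))[0]        # (h, H', W') heatmaps
        h, _, width = heatmaps.shape
        flat_idx = heatmaps.view(h, -1).argmax(dim=1)  # peak location per key point
        ys = torch.div(flat_idx, width, rounding_mode="floor")
        xs = flat_idx % width
        results.append(torch.stack([xs, ys], dim=1))   # (h, 2) heatmap-grid coordinates
    return results
```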
The embodiments described above are only a part of the embodiments of the present application, and not all of the embodiments. The order in which the above-described embodiments are described is not intended to be a limitation on the preferred order of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be understood that in the embodiment of the present application, "B corresponding to a" means that B is associated with a, from which B can be determined. It should also be understood that determining B from a does not mean determining B from a alone, but may be determined from a and/or other information.
It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be read by a computer or a data storage device including one or more available media integrated servers, data centers, and the like. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Versatile Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A method of training, comprising:
acquiring a training sample, wherein the training sample is an image sequence for recording the movement of the rodent;
inputting the training sample into a neural network model to obtain a recognition result of the rodent posture, wherein the recognition result comprises key points in the rodent posture;
training the neural network model with a loss function by using a gradient descent method according to the recognition result of the posture of the rodent;
wherein the loss function includes a temporal constraint term for constraining the position of keypoints in the rodent's pose between adjacent image frames in the sequence of images.
2. The method of claim 1, wherein the neural network model comprises a HRNet network.
3. The training method of claim 2, wherein prior to said training the neural network model, the training method further comprises:
and determining the time constraint item according to the error between the position of the key point acquired by using a tracking method and the position of the key point in the identification result.
4. The training method of claim 3, wherein the tracking method comprises the Lucas-Kanade optical flow method.
5. The training method according to claim 3, wherein the determining the time constraint term according to the error between the positions of the key points obtained by the tracking method and the positions of the key points in the recognition result comprises:
selecting m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2;

taking the first frame P_{1,1}^ω in the recognition result P_{1,i}^ω (i = 1, 2, …, m; ω = 1, 2, …, h) of the tracking samples as the initial frame and performing forward tracking to obtain a forward tracking result T_f^ω; determining the difference between the forward tracking result and the recognition result of the m-th image frame in the tracking samples as:

F_1 = Σ_{ω=1}^{h} || T_f^ω - P_{1,m}^ω ||,

wherein ω indexes the h key points on each image frame;

taking the last frame P_{1,m}^ω in the recognition result of the tracking samples as the termination frame and performing backward tracking to obtain a backward tracking result T_b^ω; determining the difference between the backward tracking result and the recognition result of the 1st image frame in the tracking samples as:

F_2 = Σ_{ω=1}^{h} || T_b^ω - P_{1,1}^ω ||;

taking the first frame P_{1,1}^ω in the recognition result of the tracking samples as the initial frame, performing forward tracking to obtain the forward tracking result T_f^ω, then taking the forward tracking result T_f^ω as the starting point and performing backward tracking to obtain a second backward tracking result T_fb^ω, and recording the difference between the second backward tracking result and the initial frame P_{1,1}^ω as:

F_3 = Σ_{ω=1}^{h} || T_fb^ω - P_{1,1}^ω ||;

and determining the time constraint term as:

L_temporal = 0 if F_1 ≤ E_1 and F_2 ≤ E_1; L_temporal = F_3 otherwise,

wherein E_1 is a threshold.
6. Training method according to claim 1,
the loss function also includes a spatial constraint term that defines the location of keypoints in the rodent pose in the same frame of image.
7. The training method of claim 6, wherein prior to said training the neural network model, the training method further comprises:
and determining the space constraint term according to the difference value between the positions of the plurality of key points in the identification result.
8. The training method according to claim 7, wherein the determining the spatial constraint term according to the difference between the positions of the plurality of key points in the recognition result comprises:
selecting p samples from the training samples, wherein p is a positive integer greater than or equal to 2;

determining the distance d_j^ω between two key points in the recognition results P_{2,j}^ω (j = 1, 2, …, p; ω = 1, 2, …, h) of the p samples;

fitting the distances d_j^ω with a Gaussian distribution and determining their mean μ and variance σ², wherein ω indexes the h key points on each image frame;

and determining the spatial constraint term from the deviation of the distances d_j^ω from the fitted Gaussian distribution (mean μ, variance σ²).
9. the training method of claim 1, wherein the loss function further comprises an error constraint term for constraining errors in the recognition and annotation results for keypoints in the rodent pose.
10. The training method of claim 9, wherein the error loss term is a mean square error loss term.
11. A training device, comprising:
the acquisition module is used for acquiring a training sample, wherein the training sample is an image sequence for recording the movement of the rodent;
the input module is used for inputting the training sample into a neural network model to obtain a recognition result of the posture of the rodent, wherein the recognition result comprises key points in the posture of the rodent;
the training module is used for training the neural network model with a loss function by using a gradient descent method according to the recognition result of the posture of the rodent;
wherein the loss function includes a temporal constraint term for constraining the position of keypoints in the rodent's pose between adjacent image frames in the sequence of images.
12. The training device of claim 11, wherein the neural network model comprises a HRNet network.
13. The training apparatus of claim 12, wherein prior to said training the neural network model, the training apparatus further comprises:
a first determining module for determining the time constraint term according to the error between the position of a key point acquired by using the tracking method and the position of the key point in the recognition result.
14. The training apparatus of claim 13, wherein the tracking method comprises Lucas-Kanade optical flow.
15. The training apparatus of claim 14, wherein the first determining module is configured to:
selecting m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2;

taking the first frame P_{1,1}^ω in the recognition result P_{1,i}^ω (i = 1, 2, …, m; ω = 1, 2, …, h) of the tracking samples as the initial frame and performing forward tracking to obtain a forward tracking result T_f^ω; determining the difference between the forward tracking result and the recognition result of the m-th image frame in the tracking samples as:

F_1 = Σ_{ω=1}^{h} || T_f^ω - P_{1,m}^ω ||,

wherein ω indexes the h key points on each image frame;

taking the last frame P_{1,m}^ω in the recognition result of the tracking samples as the termination frame and performing backward tracking to obtain a backward tracking result T_b^ω; determining the difference between the backward tracking result and the recognition result of the 1st image frame in the tracking samples as:

F_2 = Σ_{ω=1}^{h} || T_b^ω - P_{1,1}^ω ||;

taking the first frame P_{1,1}^ω in the recognition result of the tracking samples as the initial frame, performing forward tracking to obtain the forward tracking result T_f^ω, then taking the forward tracking result T_f^ω as the starting point and performing backward tracking to obtain a second backward tracking result T_fb^ω, and recording the difference between the second backward tracking result and the initial frame P_{1,1}^ω as:

F_3 = Σ_{ω=1}^{h} || T_fb^ω - P_{1,1}^ω ||;

and determining the time constraint term as:

L_temporal = 0 if F_1 ≤ E_1 and F_2 ≤ E_1; L_temporal = F_3 otherwise,

wherein E_1 is a threshold.
16. The training device of claim 11, wherein the loss function further comprises a spatial constraint term for defining the location of keypoints in the rodent pose in the same frame of image.
17. The training device of claim 16, further comprising:
a second determining module for determining the spatial constraint term according to the difference between the positions of the plurality of key points in the recognition result.
18. The training apparatus of claim 17, wherein the second determining module is configured to:
selecting p samples from the training samples, wherein p is a positive integer greater than or equal to 2;

determining the distance d_j^ω between two key points in the recognition results P_{2,j}^ω (j = 1, 2, …, p; ω = 1, 2, …, h) of the p samples;

fitting the distances d_j^ω with a Gaussian distribution and determining their mean μ and variance σ², wherein ω indexes the h key points on each image frame;

and determining the spatial constraint term from the deviation of the distances d_j^ω from the fitted Gaussian distribution (mean μ, variance σ²).
19. the training device of claim 11, wherein the loss function further comprises an error constraint term for constraining errors in the recognition and annotation results for keypoints in the rodent pose.
20. Training apparatus according to claim 19 wherein the error penalty term is a mean square error penalty term.
CN202111680833.3A 2021-12-30 2021-12-30 Training method and training device Pending CN114333068A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111680833.3A CN114333068A (en) 2021-12-30 2021-12-30 Training method and training device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111680833.3A CN114333068A (en) 2021-12-30 2021-12-30 Training method and training device

Publications (1)

Publication Number Publication Date
CN114333068A 2022-04-12

Family

ID=81022304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111680833.3A Pending CN114333068A (en) 2021-12-30 2021-12-30 Training method and training device

Country Status (1)

Country Link
CN (1) CN114333068A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination