CN114333068A - Training method and training device
- Publication number: CN114333068A
- Application number: CN202111680833.3A
- Authority: CN (China)
- Prior art keywords: tracking, training, result, determining, frame
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Landscapes: Image Analysis (AREA)
Abstract
The application provides a training method and a training device. The method includes: acquiring a training sample, wherein the training sample is an image sequence recording the movement of a rodent; inputting the training sample into a neural network model to obtain a recognition result of the rodent's posture, wherein the recognition result includes key points in the rodent's posture; and training the neural network model with a loss function by a gradient descent method according to the recognition result of the rodent's posture. The loss function includes a temporal constraint term for constraining the positions of key points in the rodent's pose between adjacent image frames in the image sequence. When a neural network model trained according to this method is used to recognize the postures of rodents such as mice, jitter of the recognition result in the time domain can be effectively avoided.
Description
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a training method and a training device.
Background
Posture recognition refers to the recognition and/or extraction of the actions and/or key points of a living being in an image or video using a neural network model.
In the prior art, when key points are identified in an image sequence of rodent motion, they are usually extracted from a single image or a single image frame; the key points may be the rodent's joints and similar body parts. Recognition results obtained in this way are prone to jitter, so the motion trajectories of the key points are not smooth enough.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a training method and a training apparatus to improve accuracy and recognition efficiency of a neural network model in rodent posture recognition, and ensure continuity of a recognition result in a time domain.
In a first aspect, a training method is provided, the method including: acquiring a training sample, wherein the training sample is an image sequence recording the movement of the rodent; inputting the training sample into a neural network model to obtain a recognition result of the rodent's posture, wherein the recognition result includes key points in the rodent's posture; and training the neural network model with a loss function by a gradient descent method according to the recognition result of the rodent's posture. The loss function includes a temporal constraint term for constraining the positions of key points in the rodent's pose between adjacent image frames in the image sequence.
Optionally, the neural network model comprises a HRNet network.
Optionally, before the training of the neural network model, the training method further includes: and determining the time constraint item according to the error between the position of the key point acquired by using a tracking method and the position of the key point in the identification result.
Optionally, the tracking method comprises a Lucas-Kanade optical flow method.
Optionally, the determining the time constraint term according to the error between the position of the key point obtained by the tracking method and the position of the key point in the recognition result includes: selecting m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2; taking the first frame in the recognition result of the tracking samples as an initial frame and performing forward tracking to obtain a forward tracking result, and determining the difference F1 between the forward tracking result and the recognition result of the m-th image frame in the tracking samples, wherein ω is the number of key points on each image frame; taking the last frame in the recognition result of the tracking samples as the termination frame and performing backward tracking to obtain a backward tracking result, and determining the difference F2 between the backward tracking result and the recognition result of the 1st image frame in the tracking samples; taking the first frame in the recognition result of the tracking samples as an initial frame and performing forward tracking to obtain a forward tracking result, then taking the forward tracking result as an initial frame and performing backward tracking to obtain a second backward tracking result, and recording the difference between the second backward tracking result and the initial frame as F3; and determining the time constraint term to be 0 when both F1 and F2 are less than or equal to a threshold E1, and to be F3 otherwise.
Optionally, the loss function further includes a spatial constraint term for defining the location of keypoints in the pose of the rodent in the same frame of image.
Optionally, before the training of the neural network model, the training method further includes: and determining the space constraint term according to the difference value between the positions of the plurality of key points in the identification result.
Optionally, the determining the spatial constraint term according to a difference between the positions of a plurality of key points in the recognition result includes: selecting p samples from the training samples, wherein p is a positive integer greater than or equal to 2; determining the distance between two key points in the recognition results of the p samples; fitting the distances with a Gaussian distribution to determine their mean μ and variance σ², wherein ω is the number of key points on each image frame; and determining the spatial constraint term from the distances and the fitted mean μ and variance σ².
Optionally, the loss function further includes an error constraint term for constraining the error between the recognition result and the annotation result for key points in the rodent's pose.

Optionally, the error constraint term is a mean square error loss term.
In a second aspect, a training apparatus is provided, including: an acquisition module, configured to acquire a training sample, wherein the training sample is an image sequence recording the movement of the rodent; an input module, configured to input the training sample into a neural network model to obtain a recognition result of the rodent's posture, wherein the recognition result includes key points in the rodent's posture; and a training module, configured to train the neural network model with a loss function by a gradient descent method according to the recognition result of the rodent's posture. The loss function includes a temporal constraint term for constraining the positions of key points in the rodent's pose between adjacent image frames in the image sequence.
Optionally, the neural network model comprises a HRNet network.
Optionally, the training apparatus further includes: a first determining module, configured to determine, before the training of the neural network model, the time constraint term according to the error between the position of the key point obtained by the tracking method and the position of the key point in the recognition result.
Optionally, the tracking method comprises a Lucas-Kanade optical flow method.
Optionally, the first determining module is configured to: select m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2; take the first frame in the recognition result of the tracking samples as an initial frame and perform forward tracking to obtain a forward tracking result, and determine the difference F1 between the forward tracking result and the recognition result of the m-th image frame in the tracking samples, wherein ω is the number of key points on each image frame; take the last frame in the recognition result of the tracking samples as the termination frame and perform backward tracking to obtain a backward tracking result, and determine the difference F2 between the backward tracking result and the recognition result of the 1st image frame in the tracking samples; take the first frame in the recognition result of the tracking samples as an initial frame and perform forward tracking to obtain a forward tracking result, then take the forward tracking result as an initial frame and perform backward tracking to obtain a second backward tracking result, and record the difference between the second backward tracking result and the initial frame as F3; and determine the time constraint term to be 0 when both F1 and F2 are less than or equal to a threshold E1, and to be F3 otherwise.
Optionally, the loss function further includes a spatial constraint term for defining the location of keypoints in the pose of the rodent in the same frame of image.
Optionally, the training device further comprises: a second determining module, configured to determine the spatial constraint term according to the difference between the positions of the plurality of key points in the recognition result.
Optionally, the second determining module is configured to: select p samples from the training samples, wherein p is a positive integer greater than or equal to 2; determine the distance between two key points in the recognition results of the p samples; fit the distances with a Gaussian distribution to determine their mean μ and variance σ², wherein ω is the number of key points on each image frame; and determine the spatial constraint term from the distances and the fitted mean μ and variance σ².
Optionally, the loss function further includes an error constraint term for constraining the error between the recognition result and the annotation result for key points in the rodent's pose.

Optionally, the error constraint term is a mean square error loss term.
According to the method and the apparatus of the present application, a time constraint is introduced in the training process of the neural network model, so that the neural network model achieves higher accuracy when processing occluded or blurred images, and at the same time jitter of the recognition result of the neural network model in the time domain can be effectively suppressed.
Drawings
Fig. 1 is a schematic flowchart of a training method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of a method for determining a time constraint term according to an embodiment of the present application.
Fig. 3 is a schematic flowchart of a method for determining a spatial constraint term according to an embodiment of the present application.
Fig. 4 is a schematic flow chart of a method for determining an error constraint term according to an embodiment of the present application.
Fig. 5 is a schematic block diagram of a training apparatus according to an embodiment of the present application.
Fig. 6 is a schematic block diagram of a training device according to another embodiment of the present application.
Fig. 7 is a schematic block diagram of an application scenario according to an embodiment of the present application.
Detailed Description
The method and the apparatus in the embodiments of the present application can be applied to various scenarios based on recognition of rodent posture in an image sequence. The image sequence may be a plurality of image frames in a video; the plurality of image frames may be consecutive frames in the video. The image sequence may also be a plurality of images of the animal captured by an image acquisition device such as a camera. The rodent may be, for example, a mouse.
To facilitate an understanding of the embodiments of the present application, the background of the present application is first illustrated in detail.
The behavior of biological neurons is closely related to the activity of animals, and changes in the posture of animals usually cause corresponding changes in the neurons. Therefore, the exploration of the connection and interaction pattern of complex networks of neurons under specific behaviors is very important for the fields of neuroscience and medicine. In the field, a quantitative analysis method is generally adopted, namely, the corresponding relation of the posture information of the animal and the behavior of the neuron is determined by acquiring the posture information of the animal and the behavior of the neuron.
The behavior of the animal neurons can be acquired by means of ray scanning, a miniaturized multi-photon microscope and the like.
There are various methods for obtaining posture information of an animal. For example, pose information of an animal can be obtained by manually labeling key points in the image sequence. However, for massive data, the efficiency of manual processing is low, errors are prone to occur, and the accuracy of the obtained posture information cannot be guaranteed.
For another example, a marker (e.g., a displacement or acceleration sensor) may be placed at a key point of the animal body, and the posture change of the animal may be determined from the change in information such as the position of the marker. However, in rodents, due to their small size, the placement of markers interferes with their natural behavior, resulting in less accurate data being collected.
As another example, an animal in space may be positioned with a depth camera to obtain pose information thereof. However, this method is sensitive to imaging conditions and scene changes and is not suitable for all situations.
With the development of the field of artificial intelligence, animal posture recognition methods based on neural networks are gradually replacing the traditional techniques. However, the training of current neural network models usually does not consider how the key points of a rodent in an image sequence move over time. These neural network models have the following problems during posture recognition:
When recognizing animal poses in a sequence of images, neural network models typically perform pose recognition on each frame independently. For example, suppose the image sequence to be recognized includes a first frame image and a second frame image in chronological order. The neural network model recognizes the animal posture in the first frame image from the first frame alone, obtaining a first posture recognition result corresponding to the first frame image, and recognizes the animal posture in the second frame image from the second frame alone, obtaining a second posture recognition result corresponding to the second frame image. With this method of recognizing the animal posture directly from the current frame image, the obtained recognition result has low accuracy and is not smooth enough in time. In addition, when image frames in the acquired image sequence are blurred or occluded, for example when the tail of a rodent is curled or blocked, the accuracy of the key-point position information output by the neural network model is low.
In addition, existing neural network models are usually trained with a back-propagation algorithm using a loss function constructed from the error between the recognition result and the manual labeling result. Because the continuous change of the key points in the time domain is not considered during training, such models can suffer from low accuracy when performing rodent posture recognition. On the other hand, training the neural network model with a loss function constructed only from this error usually makes the initial stage of training slow.
In view of the foregoing problems, embodiments of the present application provide a training method and a training apparatus. According to the method provided by the embodiment of the application, the time constraint is introduced in the training process of the neural network model, so that the jitter phenomenon of the recognition result of the neural network model on the time domain is effectively inhibited.
The training method provided by the embodiment of the present application is described in detail below with reference to fig. 1 to 4. Fig. 1 is a schematic flow chart of a training method provided in an embodiment of the present application. The training method shown in FIG. 1 may include steps S11-S13.
In step S11, a training sample is obtained.
In one embodiment of the present application, the training sample may include an image sequence recording rodent movements and the corresponding labeling results. It is understood that the labeling result may include position information of a preset number of key points of the rodent body. For example, the key points may be various joints and key parts of the body, such as the joints on a mouse's limbs, the tail, eyes, nose, and ears. The position information may be the coordinate information of the key points.
The embodiment of the present application does not limit the manner of obtaining the pre-labeled result. For example, manual labeling may be used to label image frames in an image sequence on a frame-by-frame basis. Other methods with higher confidence may also be used for annotation as possible implementations.
There may be many ways of obtaining the training sample, and the embodiment of the present application is not limited in this respect. For example, as one implementation, the image sequence may be directly acquired by an image acquisition device (e.g., a camera, a medical imaging device, a lidar, etc.) and may include a plurality of images of rodents arranged in time order. As another example, the training samples may be obtained from a server (e.g., a local server or a cloud server). Alternatively, training samples may also be obtained from the network or other content platforms, for example open-source training datasets such as the MSCOCO, MPII, and PoseTrack datasets; alternatively, a locally pre-stored image sequence may be used.
And step S12, inputting the training sample obtained in the step S11 into a neural network model, and obtaining a recognition result of the posture of the rodent.
The embodiment of the present application does not specifically limit the neural network model, and any neural network model capable of realizing the gesture recognition described in the present application may be used. For example, the neural network model can be a 2D convolutional neural network such as VGG, ResNet, HRNet, etc.
Optionally, HRNet (High-Resolution Network) maintains a high-resolution representation throughout feature extraction and fuses features of different resolutions during the feature extraction process. It is particularly suitable for scenarios such as semantic segmentation, human pose estimation, image classification, facial landmark detection, and general object recognition.
The recognition result may include location information (also referred to as recognition location) of a preset number of rodent body key points recognized by the neural network model.
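As an illustration of the interface assumed in the rest of this description, the following toy module maps an image sequence to an array of recognized key-point coordinates of shape (m, h, 2). It is only a stand-in to fix tensor shapes; the network architecture, layer sizes, and the class name are placeholders and do not come from the application, which would use a network such as HRNet instead.

```python
# Illustrative stand-in for a key-point recognition model: maps an image
# sequence (m, 3, H, W) to key-point coordinates (m, h, 2). Architecture and
# sizes are placeholders, not the network described in the application.
import torch
import torch.nn as nn

class ToyKeypointNet(nn.Module):
    def __init__(self, num_keypoints=16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(16, num_keypoints * 2)  # (x, y) per key point

    def forward(self, images):                        # images: (m, 3, H, W)
        feats = self.backbone(images).flatten(1)      # (m, 16)
        coords = self.head(feats)                     # (m, h * 2)
        return coords.view(images.shape[0], -1, 2)    # (m, h, 2)
```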
And step S13, training the neural network model with a loss function by a gradient descent method according to the recognition result obtained in step S12.
The loss function may include a time constraint term L_temporal.

The method for determining the time constraint term L_temporal is described in detail below in conjunction with Fig. 2, which illustrates a method for determining the time constraint term.

The time constraint term L_temporal may be used to constrain the positions of key points in the rodent's pose between adjacent image frames in the image sequence.

In some embodiments, the time constraint term L_temporal may be determined according to the error between the position information of the key points acquired by the tracking method and the position information of the key points in the recognition result.
In the training method provided in the embodiment of the present application, the tracking method may be an unsupervised tracking method, for example, a Lucas-Kanade optical flow method.
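As an illustration of how such unsupervised tracking can be performed, the sketch below propagates key points between two frames with OpenCV's Lucas-Kanade implementation. It is only an example under assumed data layouts (grayscale frames, an (h, 2) array of key-point coordinates); the function name and the window/pyramid parameters are illustrative choices, not values from the application.

```python
# Minimal sketch: propagate recognized key points from one frame to the next
# with the Lucas-Kanade optical flow method (OpenCV). Window size and pyramid
# depth are illustrative defaults, not values from the application.
import cv2
import numpy as np

def track_keypoints_lk(prev_gray, next_gray, keypoints):
    """prev_gray, next_gray: uint8 grayscale frames (H, W);
    keypoints: float array of shape (h, 2) holding (x, y) positions.
    Returns tracked positions of shape (h, 2) and a boolean status mask."""
    pts = np.asarray(keypoints, dtype=np.float32).reshape(-1, 1, 2)
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None, winSize=(21, 21), maxLevel=3)
    return next_pts.reshape(-1, 2), status.reshape(-1).astype(bool)
```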
The method shown in FIG. 2 may include steps S1311-S1315.
In step S1311, m images are selected from the training samples as tracking samples.
The m images are any m images in the training sample; they may be m consecutive images in the training sample. It is understood that the m images may also be all the images in the training sample.
Step S1312, using a first image frame of the m images in the training sample as an initial frame, performing forward tracking using the identification result of the initial frame to obtain a first forward tracking result, and determining a first difference between the first forward tracking result and the identification result of the mth image frame, where the first forward tracking result includes a tracking position of a keypoint in the mth image frame. Wherein m is a positive integer greater than or equal to 2. In other words, the first difference value may be a difference value between the tracking position and the recognition position of the same keypoint in the mth image frame.
For convenience of description, the set of m images is hereinafter denoted I_{1,i} (i = 1, 2, …, m), and the recognition result of I_{1,i} is denoted Y_{1,i}^ω (i = 1, 2, …, m; ω = 1, 2, …, h), where ω indexes the key points in each image frame and h is their number.
Taking the first frame of the m images as the initial frame, forward tracking is performed using the recognition result Y_{1,1}^ω of the initial frame to obtain a first forward tracking result, which gives a tracked position for each key point in the m-th frame. The difference F1 between the first forward tracking result and the recognition result Y_{1,m}^ω of the m-th frame in the set I_{1,i} is then determined, accumulated over the ω = 1, 2, …, h key points.
step S1313, taking the mth image frame of the m images as a termination frame, and performing backward tracking by using the identification result of the termination frame to obtain a first backward tracking result, where the first backward tracking result includes a tracking position of a keypoint in the first image frame. It is to be understood that the mth image may also be referred to as the last image frame of the m images. A second difference between the first back tracking result and the recognition result of the first image frame is determined. In other words, the second difference may be a difference between the tracking position and the identified position of the same keypoint in the first image frame.
Taking the last frame of the m images as the termination frame, backward tracking is performed using the recognition result Y_{1,m}^ω of the termination frame to obtain a first backward tracking result, which gives a tracked position for each key point in the first frame. The difference F2 between the first backward tracking result and the recognition result Y_{1,1}^ω of the first frame in the set I_{1,i} is then determined in the same way.
In step S1314, the first frame of the m images is taken as the initial frame, and forward tracking is performed using its recognition result to obtain the first forward tracking result. The first forward tracking result is then taken as the termination frame, backward tracking is performed, and a second backward tracking result is obtained. The difference F3 between the second backward tracking result and the recognition result of the initial frame is determined.
In step S1315, the time constraint term is determined.

When the first difference F1 and the second difference F2 are both less than or equal to a preset threshold E1, the time constraint term is determined to be 0; when the first difference and/or the second difference is larger than the preset threshold, the time constraint term is determined to be the difference F3 between the second backward tracking result and the initial frame. That is, L_temporal = 0 if F1 ≤ E1 and F2 ≤ E1, and L_temporal = F3 otherwise.

Here E1 is a preset threshold related to the movement characteristics of the living being. It should be noted that, compared with the prediction result of the neural network model, the tracking result obtained with the tracking method changes smoothly in the time domain for the same key point. Therefore, when the differences (e.g., the first difference and the second difference) are smaller than the preset threshold, the recognition result is close to the tracking result and is relatively smooth in the time domain, and the time constraint term need not be applied. When a difference exceeds the preset threshold, the recognition result deviates significantly from the tracking result; that is, the recognition result jitters in the time domain. In this case, training the neural network model with the time constraint term makes the recognition result output by the neural network model smoother.
The method for determining the time constraint item is not particularly limited in the embodiments of the present application. For example, the first difference may be used as a time constraint term. For another example, the second difference may be used as a time constraint term. For another example, the first forward tracking result may be tracked backward to obtain a second backward tracking result; and determining a time constraint term according to the difference value between the second back tracking result and the identification result of the first image frame. For another example, the first backward tracking result may be subjected to forward tracking to obtain a second forward tracking result; a time constraint term is determined based on a difference between the second forward tracking result and the recognition result of the first image frame.
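The forward-backward procedure of steps S1311-S1315 can be sketched as follows. This is only an illustration: the per-frame differences are summed over key points with the Euclidean norm, and the helper track_sequence() chains the Lucas-Kanade step shown above frame by frame; neither of these details is fixed by the application.

```python
# Illustrative computation of the time constraint term L_temporal from a
# forward-backward tracking consistency check. The Euclidean norm and the
# summation over key points are assumptions; the application does not fix them.
import numpy as np

def track_sequence(frames, start_pts, reverse=False):
    """Chain track_keypoints_lk over a list of grayscale frames and return the
    key-point positions reached at the far end of the sequence."""
    seq = frames[::-1] if reverse else frames
    pts = start_pts
    for prev, nxt in zip(seq[:-1], seq[1:]):
        pts, _status = track_keypoints_lk(prev, nxt, pts)
    return pts

def temporal_constraint(frames, pred_pts, threshold):
    """frames: list of m grayscale images; pred_pts: (m, h, 2) recognized
    key points; threshold: the preset value E1."""
    # F1: forward-track frame 1's recognized key points to frame m and compare
    # with the recognition result of frame m.
    fwd = track_sequence(frames, pred_pts[0])
    f1 = np.linalg.norm(fwd - pred_pts[-1], axis=-1).sum()
    # F2: backward-track frame m's recognized key points to frame 1 and compare
    # with the recognition result of frame 1.
    bwd = track_sequence(frames, pred_pts[-1], reverse=True)
    f2 = np.linalg.norm(bwd - pred_pts[0], axis=-1).sum()
    # F3: track frame 1 forward to frame m, then back to frame 1, and compare
    # with the starting recognition result (forward-backward consistency error).
    back_again = track_sequence(frames, fwd, reverse=True)
    f3 = np.linalg.norm(back_again - pred_pts[0], axis=-1).sum()
    # L_temporal is 0 when both F1 and F2 stay within E1, otherwise it is F3.
    return 0.0 if (f1 <= threshold and f2 <= threshold) else f3
```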
In some embodiments, the loss function further includes a spatial constraint term for defining the location of keypoints in the rodent pose in the same frame image.
Referring to fig. 3, fig. 3 illustrates a method of determining a spatial constraint term.
The spatial constraint term L_spatial can be used to define the locations of key points in the pose of the living being in the same image frame. In some embodiments, the spatial constraint term L_spatial can be determined based on the difference between the positions of multiple key points in the recognition result.
The method for determining the spatial constraint term L_spatial provided in an embodiment of the present application may include steps S1321-S1323.
Step S1321, selecting p samples from the training samples, wherein p is a positive integer greater than or equal to 2;
the p images are any p images in the training sample. The p images may be consecutive p images in the training sample. It is understood that the p images may also be all images in the training sample. Wherein p is a positive integer greater than or equal to 2.
For convenience of description, the set of p images is hereinafter denoted I_{2,j} (j = 1, 2, …, p), and the recognition result of I_{2,j} is denoted Y_{2,j}^ω (j = 1, 2, …, p; ω = 1, 2, …, h), where ω indexes the key points in each image frame.

Step S1322 is to determine the distance between two key points in the same image in the training sample.

The distance d_j^ω between two key points in the recognition result Y_{2,j}^ω of each of the p samples is determined (j = 1, 2, …, p; ω = 1, 2, …, h).

In step S1323, the spatial constraint term is determined.

The distances d_j^ω are fitted with a Gaussian distribution, and their mean μ and variance σ² are determined. The spatial constraint term L_spatial is then determined from the distances and the fitted mean μ and variance σ².
it should be noted that the definite space constraint term L provided in the above steps S1321-S1323spaticalThe method of (d) is merely an example and may be determined in other ways. For example, the spatial constraint term may also be determined based on an error between a distance between every two key points in the recognition result and a distance between every two key points in the corresponding labeling result, which is not limited in the present application.
In some embodiments, the loss function may further include an error constraint term L_MSE. The error constraint term L_MSE may be determined from the error between the position information of the same key point in the recognition result and in the labeling result of the training sample. Referring to Fig. 4 and taking the mean square error as an example, determining the error constraint term may include steps S1331-S1333.
Step S1331, n images are selected from the training samples obtained in step S11 to form a sample set I_{3,k} (k = 1, 2, …, n), where n is a positive integer greater than or equal to 1.
The n images are any n images in the training sample; they may be n consecutive images in the training sample. It is understood that the n images may also be all the images in the training sample.
Step S1332, the recognition results Y_{3,k}^ω (k = 1, 2, …, n; ω = 1, 2, …, h) and the labeling results G_{3,k}^ω (k = 1, 2, …, n; ω = 1, 2, …, h) of the sample set I_{3,k} are determined.

Step S1333, the error between the recognition results Y_{3,k}^ω and the labeling results G_{3,k}^ω is calculated, and the error loss term is determined as the mean of the squared errors between the recognized and labeled positions of the key points, i.e. L_MSE = (1/(n·h)) Σ_{k=1}^{n} Σ_{ω=1}^{h} ‖Y_{3,k}^ω − G_{3,k}^ω‖².
for the error loss term, in addition to the mean square error loss, cross entropy loss, 0-1 loss, absolute value loss, etc. commonly used in the art may be employed. The methods shown in steps S1331-S1333 are only examples and do not limit the scope of the present application.
In some embodiments, the loss function may be determined as a weighted sum of the aforementioned error constraint term L_MSE, the time constraint term L_temporal, and the spatial constraint term L_spatial. That is, the loss function is L = L_MSE + a·L_temporal + b·L_spatial, where a and b are hyper-parameters whose values are greater than or equal to 0.
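One gradient-descent training step with this weighted loss could look as follows. The model interface, the way the temporal and spatial terms are supplied (as callables returning scalar tensors), and the default values of a and b are placeholders of this sketch, not details from the application.

```python
# Sketch of one training step with L = L_MSE + a * L_temporal + b * L_spatial.
import torch

def training_step(model, optimizer, images, gt_pts, temporal_fn, spatial_fn,
                  a=1.0, b=1.0):
    """images: (m, C, H, W) image sequence; gt_pts: (m, h, 2) annotated key
    points; temporal_fn / spatial_fn map predicted key points to the time and
    spatial constraint terms (assumed to return scalar tensors)."""
    optimizer.zero_grad()
    pred_pts = model(images)                       # (m, h, 2) recognized key points
    l_mse = torch.mean((pred_pts - gt_pts) ** 2)   # error constraint term
    loss = l_mse + a * temporal_fn(pred_pts) + b * spatial_fn(pred_pts)
    loss.backward()                                # back-propagate the combined loss
    optimizer.step()                               # gradient descent update
    return loss.item()
```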
An embodiment of the exercise device provided by the present application is described in detail below in conjunction with fig. 5. It is to be understood that the apparatus embodiments correspond to the description of the method embodiments described above. Therefore, reference is made to the preceding method embodiments for parts not described in detail.
Fig. 5 is a schematic block diagram of a training device 50 provided in one embodiment of the present application. It should be understood that the apparatus 50 shown in fig. 5 is merely an example, and the apparatus 50 of an embodiment of the present invention may also include other modules or units.
It should be understood that the apparatus 50 is capable of performing various steps in the methods of fig. 1-4, and will not be described here again to avoid repetition.
As a possible implementation, the apparatus includes:
and an obtaining module 51, configured to obtain a training sample.
The training samples and the obtaining method thereof may be the same as step S11 of the foregoing method, and are not described herein again.
An input module 52, configured to input the training sample into a neural network model to obtain a recognition result of the rodent's posture, wherein the recognition result includes key points in the rodent's posture.

A training module 53, configured to train the neural network model with a loss function by a gradient descent method according to the recognition result of the rodent's posture.
Optionally, the neural network model comprises a HRNet network.
Optionally, the training apparatus further includes: a first determining module, configured to determine, before the training of the neural network model, the time constraint term according to the error between the position of the key point obtained by the tracking method and the position of the key point in the recognition result.
Optionally, the tracking method comprises a Lucas-Kanade optical flow method.
Optionally, the first determining module is configured to: select m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2; take the first frame in the recognition result of the tracking samples as an initial frame and perform forward tracking to obtain a forward tracking result, and determine the difference F1 between the forward tracking result and the recognition result of the m-th image frame in the tracking samples, wherein ω is the number of key points on each image frame; take the last frame in the recognition result of the tracking samples as the termination frame and perform backward tracking to obtain a backward tracking result, and determine the difference F2 between the backward tracking result and the recognition result of the 1st image frame in the tracking samples; take the first frame in the recognition result of the tracking samples as an initial frame and perform forward tracking to obtain a forward tracking result, then take the forward tracking result as an initial frame and perform backward tracking to obtain a second backward tracking result, and record the difference between the second backward tracking result and the initial frame as F3; and determine the time constraint term to be 0 when both F1 and F2 are less than or equal to a threshold E1, and to be F3 otherwise.
Optionally, the loss function further includes a spatial constraint term for defining the location of keypoints in the pose of the rodent in the same frame of image.
Optionally, the training device further comprises: a second determining module, configured to determine the spatial constraint term according to the difference between the positions of the plurality of key points in the recognition result.
Optionally, the second determining module is configured to: select p samples from the training samples, wherein p is a positive integer greater than or equal to 2; determine the distance between two key points in the recognition results of the p samples; fit the distances with a Gaussian distribution to determine their mean μ and variance σ², wherein ω is the number of key points on each image frame; and determine the spatial constraint term from the distances and the fitted mean μ and variance σ².
Optionally, the loss function further includes an error constraint term for constraining the error between the recognition result and the annotation result for key points in the pose of the living being.
Optionally, the error loss term is a mean square error loss term.
It should be appreciated that the apparatus 50 for training a neural network model herein is embodied in the form of a functional module. The term "module" herein may be implemented in software and/or hardware, and is not particularly limited thereto. For example, a "module" may be a software program, a hardware circuit, or a combination of both that implements the functionality described above. The hardware circuitry may include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (e.g., a shared processor, a dedicated processor, or a group of processors) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality.
As an example, the apparatus 50 for training a neural network model provided in the embodiment of the present invention may be a processor or a chip, so as to perform the method described in the embodiment of the present invention.
Fig. 6 is a schematic block diagram of a training device 60 provided in another embodiment of the present application. The apparatus 60 shown in fig. 6 comprises a memory 61, a processor 62, a communication interface 63 and a bus 64. The memory 61, the processor 62 and the communication interface 63 are connected to each other through a bus 64.
The memory 61 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 61 may store a program, and when the program stored in the memory 61 is executed by the processor 62, the processor 62 is configured to perform the steps of the training method provided by the embodiment of the present invention, for example, the steps of the embodiments shown in fig. 1 to 4 may be performed.
The processor 62 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the training method of the embodiment of the present invention.
The processor 62 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the training method provided by the embodiment of the present invention may be implemented by integrated logic circuits of hardware in the processor 62 or instructions in the form of software.
The processor 62 may also be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or registers. The storage medium is located in the memory 61, and the processor 62 reads the information in the memory 61 and, in combination with its hardware, performs the functions required of the units included in the apparatus according to the embodiment of the present invention, or performs the training method of the method embodiment of the present invention. For example, the steps/functions of the embodiments shown in Figs. 1-4 may be performed.
Communication interface 63 may enable communication between apparatus 60 and other devices or communication networks using, but not limited to, transceiver devices.
Bus 64 may include a path that conveys information between various components of apparatus 60 (e.g., memory 61, processor 62, communication interface 63).
It should be understood that the apparatus 60 shown in the embodiments of the present invention may be a processor or a chip for performing the methods described in the embodiments of the present invention.
It should be understood that the processor in the embodiments of the present invention may be a Central Processing Unit (CPU), and the processor may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Specific applications of the embodiment of the present application are described below with reference to the application scenario of fig. 7. It should be noted that the following description about fig. 7 is only an example and is not limited thereto, and the method in the embodiment of the present application is not limited thereto, and may also be applied to other scenarios of gesture recognition.
The application scenario in fig. 7 may include an image acquisition device 71 and an image processing device 72.
Wherein the image acquisition device 71 can be used to acquire a sequence of images of a rodent. The image processing apparatus 72 may be integrated into an electronic device, which may be a server or a terminal, and the present embodiment is not limited thereto. For example, the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, cloud computing, cloud storage, cloud communication, big data and artificial intelligence platforms. The terminal can be a smart phone, a tablet computer, a computer, an intelligent Internet of things device and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
The image processing device 72 may be deployed with a neural network model and may be configured to recognize the images in the image sequence acquired by the image acquisition device 71 using the neural network model, so as to obtain the position information of key points in the image to be processed. The position information of the key points may include, for example, the coordinate information of the joints, trunk, or five sense organs of the rodent's body.
The electronic device may further acquire a training sample by using the image acquisition device 71, and train the neural network model by using a loss function according to an identification result of the training sample and a result of artificial labeling. The image processing device 72 may also recognize the image to be processed through the trained neural network model, so as to achieve the purpose of accurately recognizing the image.
The embodiments described above are only a part of the embodiments of the present application, and not all of the embodiments. The order in which the above-described embodiments are described is not intended to be a limitation on the preferred order of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be understood that in the embodiment of the present application, "B corresponding to a" means that B is associated with a, from which B can be determined. It should also be understood that determining B from a does not mean determining B from a alone, but may be determined from a and/or other information.
It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be read by a computer or a data storage device including one or more available media integrated servers, data centers, and the like. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Versatile Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (20)
1. A method of training, comprising:
acquiring a training sample, wherein the training sample is an image sequence for recording the movement of the rodent;
inputting the training sample into a neural network model to obtain a recognition result of the rodent posture, wherein the recognition result comprises key points in the rodent posture;
training the neural network model with a loss function by a gradient descent method according to the recognition result of the posture of the rodent;
wherein the loss function includes a temporal constraint term for constraining the position of keypoints in the rodent's pose between adjacent image frames in the sequence of images.
2. The method of claim 1, wherein the neural network model comprises a HRNet network.
3. The training method of claim 2, wherein prior to said training the neural network model, the training method further comprises:
and determining the time constraint item according to the error between the position of the key point acquired by using a tracking method and the position of the key point in the identification result.
4. The training method of claim 3, wherein the tracking method comprises the Lucas-Kanade optical flow method.
5. The training method according to claim 3, wherein the determining the time constraint term according to the error between the positions of the key points obtained by the tracking method and the positions of the key points in the recognition result comprises:
selecting m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2;
taking the first frame in the recognition result of the tracking samples as an initial frame and performing forward tracking to obtain a forward tracking result, and determining the difference F1 between the forward tracking result and the recognition result of the m-th image frame in the tracking samples, wherein ω is the number of key points on each image frame;
taking the last frame in the recognition result of the tracking samples as the termination frame and performing backward tracking to obtain a backward tracking result, and determining the difference F2 between the backward tracking result and the recognition result of the 1st image frame in the tracking samples;
taking the first frame in the recognition result of the tracking samples as an initial frame and performing forward tracking to obtain a forward tracking result;
taking the forward tracking result as an initial frame and performing backward tracking to obtain a second backward tracking result, recording the difference between the second backward tracking result and the initial frame as F3, and determining the time constraint term to be 0 when both F1 and F2 are less than or equal to a threshold E1, and to be F3 otherwise.
6. The training method according to claim 1, wherein
the loss function further comprises a spatial constraint term for defining the locations of key points in the rodent's pose in the same image frame.
7. The training method of claim 6, wherein prior to said training the neural network model, the training method further comprises:
and determining the space constraint term according to the difference value between the positions of the plurality of key points in the identification result.
8. The training method according to claim 7, wherein the determining the spatial constraint term according to the difference between the positions of the plurality of key points in the recognition result comprises:
selecting p samples from the training samples, wherein p is a positive integer greater than or equal to 2;
determining the distance between two key points in the recognition results of the p samples; fitting the distances with a Gaussian distribution to determine their mean μ and variance σ², wherein ω is the number of key points on each image frame; and determining the spatial constraint term from the distances and the fitted mean μ and variance σ².
9. The training method of claim 1, wherein the loss function further comprises an error constraint term for constraining the error between the recognition result and the annotation result for key points in the rodent's pose.
10. The training method of claim 9, wherein the error loss term is a mean square error loss term.
11. A training apparatus, comprising:
the acquisition module is used for acquiring a training sample, wherein the training sample is an image sequence for recording the movement of the rodent;
the input module is used for inputting the training sample into a neural network model to obtain a recognition result of the posture of the rodent, wherein the recognition result comprises key points in the posture of the rodent;
the training module is used for training the neural network model with a loss function by a gradient descent method according to the recognition result of the posture of the rodent;
wherein the loss function includes a temporal constraint term for constraining the position of keypoints in the rodent's pose between adjacent image frames in the sequence of images.
12. The training device of claim 11, wherein the neural network model comprises a HRNet network.
13. The training apparatus of claim 12, wherein prior to said training the neural network model, the training apparatus further comprises:
and the first determining module is used for determining the time constraint item according to the error between the position of the key point acquired by using the tracking method and the position of the key point in the identification result.
14. The training apparatus of claim 13, wherein the tracking method comprises Lucas-Kanade optical flow.
15. The training apparatus of claim 14, wherein the first determining module is configured to:
selecting m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2;
taking the first frame in the recognition result of the tracking samples as an initial frame and performing forward tracking to obtain a forward tracking result, and determining the difference F1 between the forward tracking result and the recognition result of the m-th image frame in the tracking samples, wherein ω is the number of key points on each image frame;
taking the last frame in the recognition result of the tracking samples as the termination frame and performing backward tracking to obtain a backward tracking result, and determining the difference F2 between the backward tracking result and the recognition result of the 1st image frame in the tracking samples;
taking the first frame in the recognition result of the tracking samples as an initial frame and performing forward tracking to obtain a forward tracking result;
taking the forward tracking result as an initial frame and performing backward tracking to obtain a second backward tracking result, recording the difference between the second backward tracking result and the initial frame as F3, and determining the time constraint term to be 0 when both F1 and F2 are less than or equal to a threshold E1, and to be F3 otherwise.
16. The training device of claim 11, wherein the loss function further comprises a spatial constraint term for defining the location of keypoints in the rodent pose in the same frame of image.
17. The training apparatus of claim 16, further comprising:
and the second determining module is used for determining the space constraint item according to the difference value between the positions of the plurality of key points in the identification result.
18. The training apparatus of claim 17, wherein the second determining module is configured to:
selecting p samples from the training samples, wherein p is a positive integer greater than or equal to 2;
determining the distance between two key points in the recognition results of the p samples; fitting the distances with a Gaussian distribution to determine their mean μ and variance σ², wherein ω is the number of key points on each image frame; and determining the spatial constraint term from the distances and the fitted mean μ and variance σ².
19. The training device of claim 11, wherein the loss function further comprises an error constraint term for constraining the error between the recognition result and the annotation result for key points in the rodent's pose.
20. Training apparatus according to claim 19 wherein the error penalty term is a mean square error penalty term.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111680833.3A CN114333068A (en) | 2021-12-30 | 2021-12-30 | Training method and training device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111680833.3A CN114333068A (en) | 2021-12-30 | 2021-12-30 | Training method and training device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114333068A true CN114333068A (en) | 2022-04-12 |
Family
ID=81022304
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111680833.3A Pending CN114333068A (en) | 2021-12-30 | 2021-12-30 | Training method and training device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114333068A (en) |
- 2021-12-30: Application CN202111680833.3A filed in China (CN); published as CN114333068A, status active/pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |