CN114333068A - Training method and training device - Google Patents


Publication number
CN114333068A
CN114333068A (application CN202111680833.3A)
Authority
CN
China
Prior art keywords: tracking, training, result, frame, recognition result
Legal status
Granted
Application number
CN202111680833.3A
Other languages
Chinese (zh)
Other versions
CN114333068B (en)
Inventor
王瑶
张珏
程和平
Current Assignee
Nanjing Jingruikang Molecular Medicine Technology Co ltd
Original Assignee
Nanjing Jingruikang Molecular Medicine Technology Co ltd
Priority date
Application filed by Nanjing Jingruikang Molecular Medicine Technology Co ltd
Priority to CN202111680833.3A
Publication of CN114333068A
Application granted
Publication of CN114333068B
Legal status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a training method and a training device. The method includes: acquiring a training sample, the training sample being an image sequence that records the movement of a rodent; inputting the training sample into a neural network model to obtain a recognition result of the rodent's pose, the recognition result including key points of the rodent's pose; and training the neural network model with a loss function by gradient descent according to the recognition result of the rodent's pose. The loss function includes a temporal constraint term that constrains the positions of the key points of the rodent's pose between adjacent image frames of the image sequence. When a neural network model trained by this method is used to recognize the poses of rodents such as mice, jitter of the recognition result in the time domain can be effectively avoided.

Description

Training method and training device

Technical Field

The present application relates to the field of artificial intelligence, and in particular to a training method and a training device.

Background

Pose recognition refers to using a neural network model to identify and/or extract the actions and/or key points of living subjects in images or videos.

In the prior art, when key points are identified in an image sequence recording rodent movement, they are usually extracted from a single image or a single image frame; such key points may be, for example, the joints of the rodent. Recognition results obtained in this way are prone to jitter, so the motion trajectories of the key points are not smooth enough.

Summary of the Invention

In view of this, embodiments of the present application provide a training method and a training device, so as to improve the accuracy and efficiency of a neural network model in rodent pose recognition and to ensure the temporal continuity of the recognition results.

In a first aspect, a training method is provided. The method includes: acquiring a training sample, the training sample being an image sequence that records rodent movement; inputting the training sample into a neural network model to obtain a recognition result of the rodent's pose, the recognition result including key points of the rodent's pose; and training the neural network model with a loss function by gradient descent according to the recognition result of the rodent's pose. The loss function includes a temporal constraint term that constrains the positions of the key points of the rodent's pose between adjacent image frames of the image sequence.
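As a rough illustration of such a composite objective, the sketch below combines an annotation-error term with a temporal term over a sequence of keypoint predictions. This is an assumption for illustration only, not the patent's exact formulation: the patent's temporal term is tracking-based, while this sketch uses a simple adjacent-frame smoothness penalty.

```python
import numpy as np

def total_loss(pred, labels, lambda_t=0.1):
    # pred, labels: (T, K, 2) keypoint coordinates over T frames, K keypoints.
    # Error term: mean squared error against the annotations.
    mse = np.mean((pred - labels) ** 2)
    # Simplified temporal term: penalise keypoint jumps between adjacent
    # frames (a smoothness stand-in for the tracking-based constraint).
    temporal = np.mean(np.sum((pred[1:] - pred[:-1]) ** 2, axis=-1))
    return mse + lambda_t * temporal

labels = np.zeros((4, 1, 2))
smooth = np.full((4, 1, 2), 0.1)  # constant small offset from the labels
jitter = 0.1 * np.array([1.0, -1.0, 1.0, -1.0]).reshape(4, 1, 1) * np.ones((4, 1, 2))
```

Both candidate predictions have the same per-frame error against the annotations, but the jittery one also pays the temporal penalty, so gradient descent is steered toward temporally smooth keypoint trajectories.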

Optionally, the neural network model includes an HRNet network.

Optionally, before the neural network model is trained, the training method further includes: determining the temporal constraint term according to the error between the key point positions obtained by a tracking method and the key point positions in the recognition result.

Optionally, the tracking method includes the Lucas-Kanade optical flow method.

Optionally, determining the temporal constraint term according to the error between the key point positions obtained by the tracking method and the key point positions in the recognition result includes:

selecting m samples from the training samples as tracking samples, where m is a positive integer greater than or equal to 2;

taking the first frame $\hat{Y}^{\omega}_{1,1}$ of the recognition results $\hat{Y}^{\omega}_{1,i}$ (i = 1, 2, …, m) of the tracking samples as the initial frame and performing forward tracking to obtain a forward tracking result $\tilde{Y}^{f,\omega}_{1,m}$, and determining the difference between the forward tracking result and the recognition result of the m-th image frame as

$F_1 = \sum_{\omega=1}^{h} \left\| \tilde{Y}^{f,\omega}_{1,m} - \hat{Y}^{\omega}_{1,m} \right\|,$

where ω = 1, 2, …, h indexes the key points of each image frame;

taking the last frame $\hat{Y}^{\omega}_{1,m}$ of the recognition results of the tracking samples as the termination frame and performing backward tracking to obtain a backward tracking result $\tilde{Y}^{b,\omega}_{1,1}$, and determining the difference between the backward tracking result and the recognition result of the first image frame as

$F_2 = \sum_{\omega=1}^{h} \left\| \tilde{Y}^{b,\omega}_{1,1} - \hat{Y}^{\omega}_{1,1} \right\|;$

taking the first frame $\hat{Y}^{\omega}_{1,1}$ of the recognition results as the initial frame, performing forward tracking to obtain the forward tracking result $\tilde{Y}^{f,\omega}_{1,m}$, then performing backward tracking with the forward tracking result as the starting frame to obtain a second backward tracking result $\tilde{Y}^{fb,\omega}_{1,1}$, whose difference from the initial frame is

$F_3 = \sum_{\omega=1}^{h} \left\| \tilde{Y}^{fb,\omega}_{1,1} - \hat{Y}^{\omega}_{1,1} \right\|;$

and determining the temporal constraint term as

$L_{temporal} = \begin{cases} 0, & F_1 \le E_1 \text{ and } F_2 \le E_1 \\ F_3, & \text{otherwise} \end{cases}$

where $E_1$ is a threshold.

Optionally, the loss function further includes a spatial constraint term, which constrains the positions of the key points of the rodent's pose within the same image frame.

Optionally, before the neural network model is trained, the training method further includes: determining the spatial constraint term according to the differences between the positions of multiple key points in the recognition result.

Optionally, determining the spatial constraint term according to the differences between the positions of multiple key points in the recognition result includes: selecting p samples from the training samples, where p is a positive integer greater than or equal to 2; determining the pairwise distances between the key points in the recognition results $\hat{Y}^{\omega}$ of the p samples, where ω = 1, 2, …, h indexes the key points of each image frame; fitting the distances with a Gaussian distribution to determine their mean μ and variance σ²; and determining the spatial constraint term $L_{spatial}$ from the fitted mean and variance.
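The pairwise-distance statistics can be sketched as follows. The closing formula for the spatial term is not reproduced in the text, so the squared z-score penalty used here is an illustrative assumption, not the patent's exact expression.

```python
import numpy as np

def fit_pairwise_stats(samples):
    # samples: (p, K, 2) keypoint predictions for p frames.
    # Distance between every pair of keypoints in every frame, then the
    # per-pair mean and variance across the p frames (the Gaussian fit).
    d = np.linalg.norm(samples[:, :, None, :] - samples[:, None, :, :], axis=-1)
    return d.mean(axis=0), d.var(axis=0)

def spatial_term(pred, mu, var, eps=1e-8):
    # Penalise pairwise distances in a single frame that deviate from the
    # fitted statistics: a squared z-score summed over keypoint pairs.
    d = np.linalg.norm(pred[:, None, :] - pred[None, :, :], axis=-1)
    iu = np.triu_indices(len(pred), k=1)
    return np.sum((d[iu] - mu[iu]) ** 2 / (var[iu] + eps))

base = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # 3 keypoints
mu, var = fit_pairwise_stats(np.tile(base, (4, 1, 1)))  # 4 identical frames
```

A frame whose keypoint layout matches the fitted statistics incurs no penalty, while a frame with stretched or collapsed limb distances is penalised.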

Optionally, the loss function further includes an error constraint term, which constrains the error between the recognition result and the annotation result for the key points of the rodent's pose.

Optionally, the error constraint term is a mean squared error loss term.
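A minimal sketch of such a mean squared error term over keypoint coordinates:

```python
import numpy as np

def mse_term(pred, annot):
    # Mean squared error between predicted and annotated keypoint coordinates.
    return np.mean((pred - annot) ** 2)
```

For example, a single keypoint predicted at the annotated position contributes zero, and the term grows quadratically with the displacement.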

In a second aspect, a training device is provided, including: an acquisition module configured to acquire a training sample, the training sample being an image sequence that records rodent movement; an input module configured to input the training sample into a neural network model to obtain a recognition result of the rodent's pose, the recognition result including key points of the rodent's pose; and a training module configured to train the neural network model with a loss function by gradient descent according to the recognition result of the rodent's pose. The loss function includes a temporal constraint term that constrains the positions of the key points of the rodent's pose between adjacent image frames of the image sequence.

Optionally, the neural network model includes an HRNet network.

Optionally, the training device further includes a first determination module configured to determine, before the neural network model is trained, the temporal constraint term according to the error between the key point positions obtained by a tracking method and the key point positions in the recognition result.

Optionally, the tracking method includes the Lucas-Kanade optical flow method.

Optionally, the first determination module is configured to: select m samples from the training samples as tracking samples, where m is a positive integer greater than or equal to 2; take the first frame $\hat{Y}^{\omega}_{1,1}$ of the recognition results $\hat{Y}^{\omega}_{1,i}$ (i = 1, 2, …, m) of the tracking samples as the initial frame and perform forward tracking to obtain a forward tracking result $\tilde{Y}^{f,\omega}_{1,m}$, and determine the difference between the forward tracking result and the recognition result of the m-th image frame as

$F_1 = \sum_{\omega=1}^{h} \left\| \tilde{Y}^{f,\omega}_{1,m} - \hat{Y}^{\omega}_{1,m} \right\|,$

where ω = 1, 2, …, h indexes the key points of each image frame; take the last frame $\hat{Y}^{\omega}_{1,m}$ of the recognition results of the tracking samples as the termination frame and perform backward tracking to obtain a backward tracking result $\tilde{Y}^{b,\omega}_{1,1}$, and determine the difference between the backward tracking result and the recognition result of the first image frame as

$F_2 = \sum_{\omega=1}^{h} \left\| \tilde{Y}^{b,\omega}_{1,1} - \hat{Y}^{\omega}_{1,1} \right\|;$

take the first frame $\hat{Y}^{\omega}_{1,1}$ of the recognition results as the initial frame, perform forward tracking to obtain the forward tracking result $\tilde{Y}^{f,\omega}_{1,m}$, then perform backward tracking with the forward tracking result as the starting frame to obtain a second backward tracking result $\tilde{Y}^{fb,\omega}_{1,1}$, whose difference from the initial frame is

$F_3 = \sum_{\omega=1}^{h} \left\| \tilde{Y}^{fb,\omega}_{1,1} - \hat{Y}^{\omega}_{1,1} \right\|;$

and determine the temporal constraint term as

$L_{temporal} = \begin{cases} 0, & F_1 \le E_1 \text{ and } F_2 \le E_1 \\ F_3, & \text{otherwise} \end{cases}$

where $E_1$ is a threshold.

Optionally, the loss function further includes a spatial constraint term, which constrains the positions of the key points of the rodent's pose within the same image frame.

Optionally, the training device further includes a second determination module configured to determine the spatial constraint term according to the differences between the positions of multiple key points in the recognition result.

Optionally, the second determination module is configured to: select p samples from the training samples, where p is a positive integer greater than or equal to 2; determine the pairwise distances between the key points in the recognition results $\hat{Y}^{\omega}$ of the p samples, where ω = 1, 2, …, h indexes the key points of each image frame; fit the distances with a Gaussian distribution to determine their mean μ and variance σ²; and determine the spatial constraint term $L_{spatial}$ from the fitted mean and variance.

Optionally, the loss function further includes an error constraint term, which constrains the error between the recognition result and the annotation result for the key points of the rodent's pose.

Optionally, the error constraint term is a mean squared error loss term.

By introducing a temporal constraint into the training process of the neural network model, the present application enables the model to achieve higher accuracy when processing occluded and blurred images, while effectively suppressing jitter of the model's recognition results in the time domain.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a training method provided by an embodiment of the present application.

FIG. 2 is a schematic flowchart of a method for determining a temporal constraint term provided by an embodiment of the present application.

FIG. 3 is a schematic flowchart of a method for determining a spatial constraint term provided by an embodiment of the present application.

FIG. 4 is a schematic flowchart of a method for determining an error constraint term provided by an embodiment of the present application.

FIG. 5 is a schematic block diagram of a training device provided by an embodiment of the present application.

FIG. 6 is a schematic block diagram of a training device provided by another embodiment of the present application.

FIG. 7 is a schematic block diagram of an application scenario of an embodiment of the present application.

Detailed Description

The methods and devices in the embodiments of the present application can be applied to various scenarios of rodent pose recognition based on image sequences. The image sequence may be multiple image frames of a video, for example multiple consecutive frames. The image sequence may also be multiple images of an animal captured by an image acquisition device such as a camera. The rodent may be, for example, a mouse.

To facilitate understanding of the embodiments of the present application, the background of the present application is first described in detail with examples.

The behavior of biological neurons is closely related to animal activity; changes in an animal's posture usually cause corresponding changes in its neurons. Exploring how the complex networks formed by neurons connect and interact during specific behaviors is therefore of great importance to neuroscience and medicine. Quantitative analysis is generally used in this field: the animal's pose information and the behavior of its neurons are acquired, and the correspondence between them is determined.

The behavior of an animal's neurons can be acquired by methods such as ray scanning and miniaturized multiphoton microscopy.

There are many ways to obtain an animal's pose information. For example, pose information can be obtained by manually annotating the key points in an image sequence. However, for massive amounts of data, manual processing is inefficient and error-prone, and the accuracy of the resulting pose information cannot be guaranteed.

As another example, markers (such as displacement or acceleration sensors) can be attached at key points on the animal's body, and changes in the animal's posture can be determined from changes in the markers' positions and other information. For rodents, however, their small size means that attached markers interfere with their natural behavior, reducing the accuracy of the collected data.

As yet another example, a depth camera can be used to locate an animal in space and obtain its pose information. However, this method is sensitive to imaging conditions and scene changes and is not suitable for all occasions.

With the development of artificial intelligence, animal pose recognition methods based on neural networks are gradually replacing traditional techniques. Current neural network models are usually trained without considering how the rodent's key points in an image sequence move over time. Such models have the following problems in pose recognition:

When recognizing animal poses in an image sequence, a neural network model usually performs pose recognition on each frame independently. For example, suppose the image sequence to be recognized includes a first frame and a second frame in temporal order. The model recognizes the animal pose in the first frame from the first frame alone, obtaining a first pose recognition result, and recognizes the pose in the second frame from the second frame alone, obtaining a second pose recognition result. Recognizing the pose directly from the current frame in this way yields results of lower accuracy that are not smooth over time. In addition, when image frames in the acquired sequence are blurred or occluded, for example when a rodent's tail curls or is occluded, the key point positions output by the model are less accurate.

In addition, existing neural network models usually construct the loss function from the error between the recognition result and the manual annotation and are trained by back-propagation. Such models do not account for the continuous variation of key points in the time domain during training, which leads to lower accuracy in rodent pose recognition. Moreover, training with a loss function built only from the error between the recognition result and the manual annotation usually makes the initial phase of training slow.

In view of the above problems, embodiments of the present application provide a training method and a training device. By introducing a temporal constraint into the training process of the neural network model, the method provided by the embodiments of the present application effectively suppresses jitter of the model's recognition results in the time domain.

The training method provided by the embodiments of the present application is described in detail below with reference to FIG. 1 to FIG. 4. FIG. 1 is a schematic flowchart of a training method provided by an embodiment of the present application. The training method shown in FIG. 1 may include steps S11 to S13.

Step S11: acquire a training sample.

In an embodiment of the present application, the training sample may include an image sequence recording rodent movement together with annotation results. It will be understood that the annotation results may include position information for a preset number of key points on the rodent's body. The key points may be joints and key body parts, for example the joints of a mouse's limbs as well as its tail, eyes, nose, and ears. The position information may be the coordinates of the key points.

The embodiments of the present application do not limit how the pre-annotated results are obtained. For example, the image frames of the image sequence may be annotated frame by frame manually. As a possible implementation, other annotation methods with high confidence may also be used.

Training samples may be obtained in many ways, which the embodiments of the present application likewise do not limit. As one implementation, the image sequence may be acquired directly by an image acquisition device (such as a video camera, a webcam, medical imaging equipment, or a lidar) and may include multiple chronologically ordered images of the rodent. As another example, training samples may be obtained from a server (such as a local server or a cloud server). Alternatively, training samples may be obtained from the Internet or other content platforms, for example open-source training datasets such as the MSCOCO, MPII, and PoseTrack datasets, or they may be image sequences stored locally in advance.

Step S12: input the training sample acquired in step S11 into the neural network model to obtain a recognition result of the rodent's pose.

The embodiments of the present application do not specifically limit the neural network model; any neural network model capable of performing the pose recognition described in the present application may be used. For example, the neural network model may be a 2D convolutional neural network such as VGG, ResNet, or HRNet.

Optionally, HRNet (High-Resolution Network) maintains a high-resolution representation throughout feature extraction and fuses features of different resolutions during the process. It is especially suitable for scenarios such as semantic segmentation, human pose estimation, image classification, facial landmark detection, and general object recognition.

The recognition result may include position information (which may also be referred to simply as recognized positions) for a preset number of key points on the rodent's body, as identified by the neural network model.

Step S13: train the neural network model by gradient descent using the loss function, according to the recognition result of step S12.

The loss function may include a temporal constraint term $L_{temporal}$.

The method for determining the temporal constraint term $L_{temporal}$ is described in detail below with reference to FIG. 2, which shows a method for determining a temporal constraint term.

The temporal constraint term $L_{temporal}$ can be used to constrain the positions of the key points of the rodent's pose between adjacent image frames of the image sequence.

In some embodiments, the temporal constraint term $L_{temporal}$ may be determined from the error between the key point positions obtained by a tracking method and the key point positions in the recognition result.

In the training method provided by the embodiments of the present application, the tracking method may be an unsupervised tracking method, for example the Lucas-Kanade optical flow method.
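For illustration, the core of the Lucas-Kanade method solves a small least-squares system built from image gradients. Below is a minimal single-step sketch in NumPy, aggregated over the whole image; a production tracker such as OpenCV's pyramidal implementation adds per-point windowing, image pyramids, and iterative refinement.

```python
import numpy as np

def lucas_kanade_step(prev, curr):
    # Solve the Lucas-Kanade least-squares system Ix*u + Iy*v = -It,
    # aggregated over the whole image, yielding one global flow vector.
    Iy, Ix = np.gradient(prev)   # np.gradient returns d/drow, d/dcol
    It = curr - prev
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow                  # (u, v): column/row displacement in pixels

# Synthetic check: a smooth Gaussian blob translated 0.5 px to the right.
y, x = np.mgrid[0:64, 0:64]
def blob(cx, cy):
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * 5.0 ** 2))

u, v = lucas_kanade_step(blob(32.0, 32.0), blob(32.5, 32.0))
```

On this synthetic pair the estimated flow closely recovers the imposed sub-pixel shift, which is exactly the property the temporal constraint relies on when propagating keypoints between frames.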

The method shown in FIG. 2 may include steps S1311 to S1315.

Step S1311: select m images from the training sample as tracking samples.

The m images may be any m images in the training sample, for example m consecutive images. It will be understood that the m images may also be all of the images in the training sample.

Step S1312: take the first image frame of the m images as the initial frame, perform forward tracking using the recognition result of the initial frame to obtain a first forward tracking result, and determine a first difference between the first forward tracking result and the recognition result of the m-th image frame. The first forward tracking result includes the tracked positions of the key points in the m-th image frame, and m is a positive integer greater than or equal to 2. In other words, the first difference may be the difference between the tracked position and the recognized position of the same key point in the m-th image frame.

For convenience, the set of m images is denoted below as $I_{1,i}$ (i = 1, 2, …, m), and the recognition results of this set as $\hat{Y}^{\omega}_{1,i}$ (i = 1, 2, …, m; ω = 1, 2, …, h), where ω indexes the key points of each image frame and h is their number.

The first of the m frames is taken as the initial frame, and forward tracking is performed using its recognition result $\hat{Y}^{\omega}_{1,1}$ to obtain the first forward tracking result $\tilde{Y}^{f,\omega}_{1,m}$. The difference $F_1$ between the first forward tracking result and the recognition result $\hat{Y}^{\omega}_{1,m}$ of the m-th frame in $I_{1,i}$ is determined as:

$F_1 = \sum_{\omega=1}^{h} \left\| \tilde{Y}^{f,\omega}_{1,m} - \hat{Y}^{\omega}_{1,m} \right\|$

步骤S1313,将所述m个图像中的第m个图像帧作为终止帧,利用所述终止帧的识别结果进行后向跟踪,得到第一后向跟踪结果,所述第一后向跟踪结果包括第一个图像帧中的关键点的跟踪位置。可以理解的是,第m个图像也可以称为m个图像中的最后一个图像帧。确定所述第一后向跟踪结果与所述第一个图像帧的识别结果之间的第二差值。换句话说,该第二差值可以为第一个图像帧中的同一个关键点的跟踪位置与识别位置之间的差值。Step S1313, taking the m-th image frame in the m images as a termination frame, and using the identification result of the termination frame to perform backward tracking to obtain a first backward tracking result, where the first backward tracking result includes: Tracked locations of keypoints in the first image frame. It can be understood that the m th image may also be referred to as the last image frame in the m images. A second difference between the first backward tracking result and the identification result of the first image frame is determined. In other words, the second difference value may be the difference value between the tracked position and the recognized position of the same key point in the first image frame.

Take the last of the m frames as the termination frame and perform backward tracking from its recognition result $P_{1,m}^{\omega}$ to obtain the first backward tracking result $\tilde{P}_{1,1}^{\omega}$. The second difference $F_2$ between the first backward tracking result $\tilde{P}_{1,1}^{\omega}$ and the recognition result $P_{1,1}^{\omega}$ of the first frame in the set $I_{1,i}$ may, for example, be computed as:

$$F_2 = \sum_{\omega=1}^{h} \left\| \tilde{P}_{1,1}^{\omega} - P_{1,1}^{\omega} \right\|_2$$

Step S1314: take the first of the m frames as the initial frame and perform forward tracking from its recognition result $P_{1,1}^{\omega}$ to obtain the first forward tracking result $\tilde{P}_{1,m}^{\omega}$; then, taking this first forward tracking result as the termination frame, perform backward tracking to obtain a second backward tracking result $\bar{P}_{1,1}^{\omega}$. The difference $F_3$ between the second backward tracking result $\bar{P}_{1,1}^{\omega}$ and the recognition result $P_{1,1}^{\omega}$ of the initial frame may, for example, be computed as:

$$F_3 = \sum_{\omega=1}^{h} \left\| \bar{P}_{1,1}^{\omega} - P_{1,1}^{\omega} \right\|_2$$

Step S1315: determine the temporal constraint term.

When both the first difference and the second difference are less than or equal to a preset threshold, the temporal constraint term is determined to be 0; when the first difference and/or the second difference exceeds the preset threshold, the temporal constraint term is determined to be the difference $F_3$ between the second backward tracking result $\bar{P}_{1,1}^{\omega}$ and the recognition result of the initial frame. That is,

$$L_{temporal} = \begin{cases} 0, & F_1 \le E_1 \ \text{and} \ F_2 \le E_1 \\ F_3, & \text{otherwise} \end{cases}$$

Here $E_1$ is a preset threshold related to the motion characteristics of the animal. It should be noted that, compared with the predictions of the neural network model, the tracking results obtained with a tracking method are guaranteed to vary smoothly in the time domain for the same key point. Therefore, when a difference (such as the first or second difference) is below the preset threshold, the recognition result is close to the tracking result, meaning the neural network model's output is already temporally smooth, and the temporal constraint term may be omitted. When a difference exceeds the preset threshold, the recognition result deviates substantially from the tracking result; that is, the recognition result jitters in the time domain. In that case the neural network model can be trained with the temporal constraint term so that its recognition results become smoother.

The embodiments of the present application do not specifically limit how the temporal constraint term is determined. For example, the first difference may be used as the temporal constraint term; alternatively, the second difference may be used. As another example, backward tracking may be performed on the first forward tracking result to obtain a second backward tracking result, and the temporal constraint term determined from the difference between the second backward tracking result and the recognition result of the first image frame. As yet another example, forward tracking may be performed on the first backward tracking result to obtain a second forward tracking result, and the temporal constraint term determined from the difference between the second forward tracking result and the recognition result of the m-th image frame.
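As an illustrative sketch of the threshold logic of steps S1312-S1315, the fragment below assumes the recognized and tracked key points are already available as lists of (x, y) coordinates (e.g., produced by the model and by an optical-flow tracker). All function and variable names are invented for illustration, and computing each difference as the summed Euclidean distance over key points is one plausible reading of the differences F1, F2, and F3; it is not taken verbatim from the patent's formula images:

```python
import math

def point_diff(a, b):
    """Sum of Euclidean distances between matching key points of two
    frames, each frame given as a list of (x, y) tuples."""
    return sum(math.dist(p, q) for p, q in zip(a, b))

def temporal_constraint(recog_first, recog_last, fwd_track, bwd_track, cycle_track, e1):
    """Temporal constraint term, following steps S1312-S1315.

    recog_first / recog_last : recognized key points of frames 1 and m
    fwd_track   : key points tracked forward from frame 1 into frame m
    bwd_track   : key points tracked backward from frame m into frame 1
    cycle_track : key points tracked forward then backward (frame 1 -> m -> 1)
    e1          : preset threshold related to the animal's motion
    """
    f1 = point_diff(fwd_track, recog_last)     # forward tracking vs. recognition of frame m
    f2 = point_diff(bwd_track, recog_first)    # backward tracking vs. recognition of frame 1
    f3 = point_diff(cycle_track, recog_first)  # cycle-consistency error
    if f1 <= e1 and f2 <= e1:
        return 0.0  # recognition already smooth: no temporal penalty
    return f3       # otherwise penalize the cycle error

# Toy example: the recognition of frame m deviates from the tracked positions.
recog_1 = [(0.0, 0.0), (1.0, 1.0)]
recog_m = [(5.0, 0.0), (6.0, 1.0)]
tracked_m = [(2.0, 0.0), (3.0, 1.0)]  # forward tracking result
tracked_1 = [(0.0, 0.0), (1.0, 1.0)]  # backward tracking result
cycled_1 = [(0.5, 0.0), (1.5, 1.0)]   # forward-then-backward result

print(temporal_constraint(recog_1, recog_m, tracked_m, tracked_1, cycled_1, e1=1.0))  # prints 1.0
```

Because the forward tracking result deviates from the recognition of frame m by more than the threshold, the penalty falls back to the cycle error F3; with agreeing results, the term is 0 and the temporal constraint contributes nothing to the loss.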

In some embodiments, the loss function further includes a spatial constraint term, which constrains the positions of the key points of the rodent's pose within the same image frame.

Referring to Fig. 3, which shows a method for determining the spatial constraint term.

The spatial constraint term $L_{spatial}$ can be used to constrain the positions of the key points of the animal's pose within the same image frame. In some embodiments, $L_{spatial}$ may be determined from the differences between the positions of multiple key points in the recognition result.

The method for determining the spatial constraint term $L_{spatial}$ provided by an embodiment of the present application may include steps S1321-S1323.

Step S1321: select p images from the training samples, where p is a positive integer greater than or equal to 2.

The p images may be any p images in the training samples, for example p consecutive images; they may even be all the images in the training samples.

For convenience of description, the set of the p images is denoted below as $I_{2,j}$ ($j=1,2,\dots,p$), and the recognition result of the set $I_{2,j}$ is denoted as $P_{2,j}^{\omega}$ ($j=1,2,\dots,p$; $\omega=1,2,\dots,h$), where $\omega$ indexes the key points and $h$ is the number of key points in each image frame.

Step S1322: determine the distance between each pair of key points within the same image of the training samples.

Determine the pairwise key-point distances $d_j^{\omega}$ ($j=1,2,\dots,p$; $\omega=1,2,\dots,h$) in the recognition results $P_{2,j}^{\omega}$ ($j=1,2,\dots,p$; $\omega=1,2,\dots,h$) of the p images.

Step S1323: determine the spatial constraint term.

The above distances $d_j^{\omega}$ are assumed to follow a Gaussian distribution; determine their mean $\mu$ and variance $\sigma^2$, where $h$ is the number of key points in each image frame. The spatial constraint term may then, for example, be determined as the squared deviation of each distance from the mean, normalized by the variance:

$$L_{spatial} = \sum_{j=1}^{p} \sum_{\omega=1}^{h} \frac{\left( d_j^{\omega} - \mu \right)^2}{\sigma^2}$$

It should be noted that the method for determining the spatial constraint term $L_{spatial}$ provided in steps S1321-S1323 is only an example; the term may also be determined in other ways. For example, the spatial constraint term may be determined from the error between the pairwise key-point distances in the recognition result and the pairwise key-point distances in the corresponding annotation, which is not limited in this application.
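The Gaussian-based spatial penalty of steps S1321-S1323 can be sketched in plain Python as below. The exact formula images are not reproduced in this text, so the normalized-squared-deviation form and all names are assumptions for illustration; in practice the mean and variance would typically be fitted on trusted (e.g., annotated) frames and the penalty applied to recognition results:

```python
import math
from itertools import combinations

def pairwise_distances(frame):
    """Distances between every pair of key points in one frame,
    where a frame is a list of (x, y) tuples."""
    return [math.dist(a, b) for a, b in combinations(frame, 2)]

def fit_gaussian(frames):
    """Mean and variance of the pairwise key-point distances over p frames,
    assuming the distances follow a Gaussian distribution."""
    d = [x for f in frames for x in pairwise_distances(f)]
    mu = sum(d) / len(d)
    var = sum((x - mu) ** 2 for x in d) / len(d)
    return mu, var

def spatial_constraint(frame, mu, var):
    """Spatial penalty for one frame: squared deviation of each pairwise
    distance from the Gaussian mean, normalized by the variance."""
    return sum((x - mu) ** 2 / var for x in pairwise_distances(frame))

# Fit the distance statistics on two reference frames, then score a frame
# whose key points are implausibly far apart.
mu, var = fit_gaussian([[(0.0, 0.0), (3.0, 0.0)], [(0.0, 0.0), (5.0, 0.0)]])
print(mu, var)                                                  # prints 4.0 1.0
print(spatial_constraint([(0.0, 0.0), (10.0, 0.0)], mu, var))   # prints 36.0
```

A frame whose key-point spacing matches the fitted mean incurs no penalty, while anatomically implausible configurations are penalized quadratically.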

In some embodiments, the loss function may further include an error constraint term $L_{MSE}$. In some embodiments, $L_{MSE}$ may be determined from the error between the position of the same key point in the recognition result and in the annotation of a training sample. Referring to Fig. 4, and taking the mean square error as an example, determining the error constraint term may include steps S1331-S1333.

Step S1331: select n images from the training samples acquired in step S11 to form a sample set $I_{3,k}$ ($k=1,2,\dots,n$), where n is a positive integer greater than or equal to 1.

The n images may be any n images in the training samples, for example n consecutive images; they may even be all the images in the training samples.

Step S1332: determine the recognition result $P_{3,k}^{\omega}$ ($k=1,2,\dots,n$; $\omega=1,2,\dots,h$) of the sample set $I_{3,k}$ and the corresponding annotation result $Y_{3,k}^{\omega}$ ($k=1,2,\dots,n$; $\omega=1,2,\dots,h$).

Step S1333: compute the mean square error between the recognition results $P_{3,k}^{\omega}$ and the annotation results $Y_{3,k}^{\omega}$, and determine the error loss term, for example, as:

$$L_{MSE} = \frac{1}{n h} \sum_{k=1}^{n} \sum_{\omega=1}^{h} \left\| P_{3,k}^{\omega} - Y_{3,k}^{\omega} \right\|_2^2$$

For the error loss term, besides the mean square error loss, losses commonly used in the art such as the cross-entropy loss, 0-1 loss, or absolute value loss may also be used. The method shown in steps S1331-S1333 is only an example and does not limit the protection scope of the present application.
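A minimal sketch of the mean-square-error term of steps S1331-S1333 follows. Averaging over the n frames and h key points is an assumption here, since the original formula image is not reproduced, and the function name is illustrative:

```python
def mse_error_term(preds, labels):
    """Error loss term L_MSE: mean squared error between recognized and
    annotated key-point positions over n frames with h key points each,
    where every frame is a list of (x, y) tuples."""
    n, h = len(preds), len(preds[0])
    total = 0.0
    for pred_frame, label_frame in zip(preds, labels):
        for (px, py), (lx, ly) in zip(pred_frame, label_frame):
            total += (px - lx) ** 2 + (py - ly) ** 2  # squared position error
    return total / (n * h)

# One frame, two key points: errors of 1 and 4 in y average to 2.5.
preds = [[(1.0, 1.0), (2.0, 2.0)]]
labels = [[(1.0, 0.0), (2.0, 4.0)]]
print(mse_error_term(preds, labels))  # prints 2.5
```

Swapping in an absolute-value or other loss only changes the per-key-point term inside the inner loop.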

In some embodiments, the loss function may also be determined as a weighted sum of the aforementioned error constraint term $L_{MSE}$, temporal constraint term $L_{temporal}$, and spatial constraint term $L_{spatial}$. That is, the loss function is $L = L_{MSE} + a L_{temporal} + b L_{spatial}$, where a and b are hyperparameters whose values are greater than or equal to 0.
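The weighted sum above can be sketched directly; a and b are the non-negative hyperparameters, and setting a = b = 0 reduces training to the plain MSE objective:

```python
def total_loss(l_mse, l_temporal, l_spatial, a=1.0, b=1.0):
    """Overall loss L = L_MSE + a * L_temporal + b * L_spatial,
    with hyperparameter weights a and b greater than or equal to 0."""
    if a < 0 or b < 0:
        raise ValueError("hyperparameters a and b must be non-negative")
    return l_mse + a * l_temporal + b * l_spatial

print(total_loss(1.0, 2.0, 3.0))                # prints 6.0
print(total_loss(2.5, 9.9, 7.7, a=0.0, b=0.0))  # prints 2.5 (plain MSE training)
```

In practice a and b would be tuned so the smoothness penalties neither dominate nor vanish relative to the position error.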

An embodiment of the training apparatus provided by the present application is described in detail below with reference to Fig. 5. It should be understood that the apparatus embodiments correspond to the foregoing method embodiments; for parts not described in detail, refer to the foregoing method embodiments.

Fig. 5 is a schematic block diagram of a training apparatus 50 provided by an embodiment of the present application. It should be understood that the apparatus 50 shown in Fig. 5 is only an example; the apparatus 50 in this embodiment of the present invention may further include other modules or units.

It should be understood that the apparatus 50 can perform the steps of the methods of Figs. 1-4; to avoid repetition, they are not described again here.

As a possible implementation, the apparatus includes:

an acquisition module 51, configured to acquire training samples.

The training samples and the manner of acquiring them may be the same as in step S11 of the foregoing method, and are not described again here.

an input module 52, configured to input the training samples into a neural network model to obtain a recognition result of the rodent's posture, the recognition result including key points of the rodent's posture.

a training module 53, configured to train the neural network model with a loss function using gradient descent, according to the recognition result of the rodent's posture.

Optionally, the neural network model includes an HRNet network.

Optionally, before the neural network model is trained, the training apparatus further includes: a first determination module, configured to determine the temporal constraint term from the error between the key-point positions obtained by a tracking method and the key-point positions in the recognition result.

Optionally, the tracking method includes the Lucas-Kanade optical flow method.

Optionally, the first determination module is configured to: select m images from the training samples as tracking samples, where m is a positive integer greater than or equal to 2; take the first frame $P_{1,1}^{\omega}$ of the recognition results $P_{1,i}^{\omega}$ ($i=1,2,\dots,m$; $\omega=1,2,\dots,h$) of the tracking samples as the initial frame and perform forward tracking to obtain a forward tracking result $\tilde{P}_{1,m}^{\omega}$; determine the difference between the forward tracking result and the recognition result of the m-th image frame in the tracking samples as $F_1 = \sum_{\omega=1}^{h} \| \tilde{P}_{1,m}^{\omega} - P_{1,m}^{\omega} \|_2$, where $h$ is the number of key points in each image frame; take the last frame $P_{1,m}^{\omega}$ of the recognition results of the tracking samples as the termination frame and perform backward tracking to obtain a backward tracking result $\tilde{P}_{1,1}^{\omega}$; determine the difference between the backward tracking result and the recognition result of the first image frame in the tracking samples as $F_2 = \sum_{\omega=1}^{h} \| \tilde{P}_{1,1}^{\omega} - P_{1,1}^{\omega} \|_2$; take the first frame $P_{1,1}^{\omega}$ of the recognition results as the initial frame and perform forward tracking to obtain the forward tracking result $\tilde{P}_{1,m}^{\omega}$; taking this forward tracking result as the termination frame, perform backward tracking to obtain a second backward tracking result $\bar{P}_{1,1}^{\omega}$, whose difference from the initial frame is $F_3 = \sum_{\omega=1}^{h} \| \bar{P}_{1,1}^{\omega} - P_{1,1}^{\omega} \|_2$; and determine the temporal constraint term as $L_{temporal} = 0$ when $F_1 \le E_1$ and $F_2 \le E_1$, and $L_{temporal} = F_3$ otherwise, where $E_1$ is a threshold.

Optionally, the loss function further includes a spatial constraint term, which constrains the positions of the key points of the rodent's pose within the same image frame.

Optionally, the training apparatus further includes: a second determination module, configured to determine the spatial constraint term from the differences between the positions of multiple key points in the recognition result.

Optionally, the second determination module is configured to: select p images from the training samples, where p is a positive integer greater than or equal to 2; determine the pairwise key-point distances $d_j^{\omega}$ ($j=1,2,\dots,p$; $\omega=1,2,\dots,h$) in the recognition results $P_{2,j}^{\omega}$ of the p images; assume the distances $d_j^{\omega}$ follow a Gaussian distribution and determine their mean $\mu$ and variance $\sigma^2$, where $h$ is the number of key points in each image frame; and determine the spatial constraint term as:

$$L_{spatial} = \sum_{j=1}^{p} \sum_{\omega=1}^{h} \frac{\left( d_j^{\omega} - \mu \right)^2}{\sigma^2}$$

Optionally, the loss function further includes an error constraint term, which constrains the error of the key points of the animal's posture between the recognition result and the annotation result.

Optionally, the error loss term is a mean square error loss term.

It should be understood that the apparatus 50 for training a neural network model here is embodied in the form of functional modules. The term "module" may be implemented in software and/or hardware, which is not specifically limited. For example, a "module" may be a software program, a hardware circuit, or a combination of the two that implements the above-mentioned functions. The hardware circuit may include an application-specific integrated circuit (ASIC), an electronic circuit, a processor for executing one or more software or firmware programs (e.g., a shared processor, a dedicated processor, or a group processor) and memory, merged logic circuits, and/or other suitable components supporting the described functions.

As an example, the apparatus 50 for training a neural network model provided by the embodiment of the present invention may be a processor or a chip for executing the method described in the embodiment of the present invention.

Fig. 6 is a schematic block diagram of a training apparatus 60 provided by another embodiment of the present application. The apparatus 60 shown in Fig. 6 includes a memory 61, a processor 62, a communication interface 63, and a bus 64; the memory 61, the processor 62, and the communication interface 63 are communicatively connected to each other through the bus 64.

The memory 61 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 61 may store a program; when the program stored in the memory 61 is executed by the processor 62, the processor 62 is configured to execute the steps of the training method provided by the embodiments of the present invention, for example the steps of the embodiments shown in Figs. 1-4.

The processor 62 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for executing relevant programs to implement the training method of the method embodiments of the present invention.

The processor 62 may also be an integrated circuit chip with signal processing capability. In implementation, the steps of the training method provided by the embodiments of the present invention may be completed by integrated logic circuits of hardware in the processor 62 or by instructions in the form of software.

The above processor 62 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or any conventional processor.

The steps of the methods disclosed in the embodiments of the present invention may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium mature in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register. The storage medium is located in the memory 61; the processor 62 reads the information in the memory 61 and, in combination with its hardware, completes the functions to be performed by the units included in the posture recognition apparatus of the embodiments of the present invention, or executes the training method of the method embodiments of the present invention, for example the steps/functions of the embodiments shown in Figs. 1-4.

The communication interface 63 may use, but is not limited to, a transceiver-type apparatus to implement communication between the apparatus 60 and other devices or communication networks.

The bus 64 may include a pathway for transferring information between the components of the apparatus 60 (e.g., the memory 61, the processor 62, and the communication interface 63).

It should be understood that the apparatus 60 shown in the embodiment of the present invention may be a processor or a chip for executing the method described in the embodiment of the present invention.

It should be understood that the processor in the embodiments of the present invention may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.

The specific application of the embodiments of the present application is introduced below with reference to the application scenario of Fig. 7. It should be noted that the following description of Fig. 7 is only an example and not a limitation; the method in the embodiments of the present application is not limited thereto and may also be applied to other posture recognition scenarios.

The application scenario in Fig. 7 may include an image acquisition device 71 and an image processing device 72.

The image acquisition device 71 may be used to acquire image sequences of rodents. The image processing device 72 may be integrated in an electronic device, which may be a server, a terminal, or the like; this is not limited in the embodiments of the present application. For example, the server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud computing, cloud storage, cloud communication, and big data and artificial intelligence platforms. The terminal may be a smartphone, a tablet computer, a computer, a smart Internet-of-Things device, or the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.

A neural network model may be deployed in the image processing device 72 and used, after the image sequence is acquired by the image acquisition device 71, to recognize the images and obtain the position information of the key points in the images to be processed. The position information of the key points may include, for example, the position coordinates of the rodent's body joints, torso, or facial features.

The above electronic device may also use the image acquisition device 71 to acquire training samples and train the neural network model with the loss function according to the recognition results of the training samples and the manually annotated results. The image processing device 72 may then recognize images to be processed with the trained neural network model, thereby achieving accurate image recognition.

The embodiments described above are only some of the embodiments of the present application, not all of them. The order in which the embodiments are described does not limit any preferred order of the embodiments. Based on the embodiments of the present application, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present application.

It should be understood that, in the embodiments of the present application, "B corresponding to A" means that B is associated with A and that B can be determined according to A. It should also be understood that determining B according to A does not mean determining B only according to A; B may also be determined according to A and/or other information.

It should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital versatile discs (DVDs)), or semiconductor media (e.g., solid state drives (SSDs)).

The foregoing descriptions are merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A training method, comprising:

acquiring a training sample, the training sample being an image sequence recording rodent movement;

inputting the training sample into a neural network model to obtain a recognition result of the rodent's posture, the recognition result comprising key points in the rodent's posture; and

training the neural network model with a gradient descent method, using a loss function, according to the recognition result of the rodent's posture;

wherein the loss function comprises a time constraint term, the time constraint term being used to constrain the positions of the key points in the rodent's posture between adjacent image frames in the image sequence.

2. The method according to claim 1, wherein the neural network model comprises an HRNet network.

3. The training method according to claim 2, wherein before the training of the neural network model, the training method further comprises:

determining the time constraint term according to the error between the positions of key points obtained by a tracking method and the positions of the key points in the recognition result.

4. The training method according to claim 3, wherein the tracking method comprises the Lucas-Kanade optical flow method.

5. The training method according to claim 3, wherein the determining the time constraint term according to the error between the positions of key points obtained by the tracking method and the positions of the key points in the recognition result comprises:

selecting m samples from the training samples as tracking samples, where m is a positive integer greater than or equal to 2;

taking the first frame [formula] of the recognition results [formula] of the tracking samples as an initial frame and performing forward tracking to obtain a forward tracking result [formula], and determining the difference between the forward tracking result and the recognition result of the m-th image frame as [formula], where ω is the number of key points in each image frame;

taking the last frame [formula] of the recognition results of the tracking samples as a termination frame and performing backward tracking to obtain a backward tracking result [formula], and determining the difference between the backward tracking result and the recognition result of the 1st image frame as [formula];

taking the first frame [formula] of the recognition results of the tracking samples as an initial frame [formula] and performing forward tracking to obtain a forward tracking result [formula];

taking the forward tracking result [formula] as an initial frame and performing backward tracking to obtain a second backward tracking result [formula], the difference between the second backward tracking result and the initial frame [formula] being [formula]; and

determining the time constraint term as [formula], where E1 is a threshold.
6. The training method according to claim 1, wherein the loss function further comprises a space constraint term, the space constraint term being used to constrain the positions of the key points in the rodent's posture within the same image frame.

7. The training method according to claim 6, wherein before the training of the neural network model, the training method further comprises:

determining the space constraint term according to the differences between the positions of a plurality of key points in the recognition result.

8. The training method according to claim 7, wherein the determining the space constraint term according to the differences between the positions of a plurality of key points in the recognition result comprises:

selecting p samples from the training samples, where p is a positive integer greater than or equal to 2;

determining the pairwise distances [formula] between key points in the recognition results [formula] of the p samples, the distances [formula] following a Gaussian distribution, and determining the mean μ and variance σ² of the distances [formula], where ω is the number of key points in each image frame; and

determining the space constraint term as [formula].

9. The training method according to claim 1, wherein the loss function further comprises an error constraint term, the error constraint term being used to constrain the error of the key points in the rodent's posture between the recognition result and the annotation result.

10. The training method according to claim 9, wherein the error loss term is a mean square error loss term.

11. A training device, comprising:

an acquisition module, configured to acquire a training sample, the training sample being an image sequence recording rodent movement;

an input module, configured to input the training sample into a neural network model to obtain a recognition result of the rodent's posture, the recognition result comprising key points in the rodent's posture; and

a training module, configured to train the neural network model with a gradient descent method, using a loss function, according to the recognition result of the rodent's posture;

wherein the loss function comprises a time constraint term, the time constraint term being used to constrain the positions of the key points in the rodent's posture between adjacent image frames in the image sequence.

12. The training device according to claim 11, wherein the neural network model comprises an HRNet network.

13. The training device according to claim 12, wherein before the training of the neural network model, the training device further comprises:

a first determination module, configured to determine the time constraint term according to the error between the positions of key points obtained by a tracking method and the positions of the key points in the recognition result.

14. The training device according to claim 13, wherein the tracking method comprises the Lucas-Kanade optical flow method.

15. The training device according to claim 14, wherein the first determination module is configured to:

select m samples from the training samples as tracking samples, where m is a positive integer greater than or equal to 2;

take the first frame [formula] of the recognition results [formula] of the tracking samples as an initial frame and perform forward tracking to obtain a forward tracking result [formula], and determine the difference between the forward tracking result and the recognition result of the m-th image frame as [formula], where ω is the number of key points in each image frame;

take the last frame [formula] of the recognition results of the tracking samples as a termination frame and perform backward tracking to obtain a backward tracking result [formula], and determine the difference between the backward tracking result and the recognition result of the 1st image frame as [formula];

take the first frame [formula] of the recognition results of the tracking samples as an initial frame [formula] and perform forward tracking to obtain a forward tracking result [formula];

take the forward tracking result [formula] as an initial frame and perform backward tracking to obtain a second backward tracking result [formula], the difference between the second backward tracking result and the initial frame [formula] being [formula]; and

determine the time constraint term as [formula], where E1 is a threshold.

16. The training device according to claim 11, wherein the loss function further comprises a space constraint term, the space constraint term being used to constrain the positions of the key points in the rodent's posture within the same image frame.

17. The training device according to claim 16, wherein the training device further comprises:

a second determination module, configured to determine the space constraint term according to the differences between the positions of a plurality of key points in the recognition result.

18. The training device according to claim 17, wherein the second determination module is configured to:

select p samples from the training samples, where p is a positive integer greater than or equal to 2;

determine the pairwise distances [formula] between key points in the recognition results [formula] of the p samples, the distances [formula] following a Gaussian distribution, and determine the mean μ and variance σ² of the distances [formula], where ω is the number of key points in each image frame; and

determine the space constraint term as [formula].

19. The training device according to claim 11, wherein the loss function further comprises an error constraint term, the error constraint term being used to constrain the error of the key points in the rodent's posture between the recognition result and the annotation result.

20. The training device according to claim 19, wherein the error loss term is a mean square error loss term.
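The claims describe the two regularizers only in words; the formula images are not reproduced on this page. The NumPy sketch below is therefore only an illustration of the idea: it assumes a hinged form for the time constraint term (penalize the combined forward, backward, and cycle tracking error only when it exceeds the threshold E1) and a normalized squared deviation from the fitted Gaussian N(μ, σ²) for the space constraint term. Both functional forms are inferred from the claim wording, not the patent's exact formulas; in practice the forward/backward tracking itself could be performed with, for example, OpenCV's Lucas-Kanade optical flow (`cv2.calcOpticalFlowPyrLK`).

```python
import numpy as np


def tracking_error(tracked, predicted):
    """Sum of Euclidean distances over the omega key points of one frame."""
    tracked = np.asarray(tracked, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.linalg.norm(tracked - predicted, axis=-1).sum())


def temporal_term(err_forward, err_backward, err_cycle, e1):
    """Hinged time constraint (assumed form): no penalty while the combined
    forward / backward / forward-then-backward tracking error stays below E1."""
    total = err_forward + err_backward + err_cycle
    return max(total - e1, 0.0)


def spatial_term(keypoints, mu, sigma2):
    """Space constraint (assumed form): penalize each pairwise key-point
    distance by its squared deviation from the fitted Gaussian N(mu, sigma2),
    normalized by the variance."""
    pts = np.asarray(keypoints, dtype=float)
    term = 0.0
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            d = np.linalg.norm(pts[i] - pts[j])
            term += (d - mu) ** 2 / sigma2
    return term
```

In a training loop these two terms would be added to the mean-square-error loss of claim 10 before the gradient descent step; the hinge keeps the time constraint inactive for sequences whose forward-backward tracking already agrees with the recognition result.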
CN202111680833.3A 2021-12-30 2021-12-30 Training method and training device Active CN114333068B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111680833.3A CN114333068B (en) 2021-12-30 2021-12-30 Training method and training device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111680833.3A CN114333068B (en) 2021-12-30 2021-12-30 Training method and training device

Publications (2)

Publication Number Publication Date
CN114333068A true CN114333068A (en) 2022-04-12
CN114333068B CN114333068B (en) 2025-02-25

Family

ID=81022304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111680833.3A Active CN114333068B (en) 2021-12-30 2021-12-30 Training method and training device

Country Status (1)

Country Link
CN (1) CN114333068B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119648571A (en) * 2025-02-18 2025-03-18 西湖大学 Method, device, system and medium for processing image sequences for two-photon microscopy

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019037498A1 (en) * 2017-08-25 2019-02-28 腾讯科技(深圳)有限公司 Active tracking method, device and system
CN111950723A (en) * 2019-05-16 2020-11-17 武汉Tcl集团工业研究院有限公司 Neural network model training method, image processing method, device and terminal equipment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Boyan; ZHONG Yong: "A single-target tracking algorithm based on diverse positive instances", Journal of Harbin Institute of Technology, no. 10, 25 September 2020 (2020-09-25) *


Also Published As

Publication number Publication date
CN114333068B (en) 2025-02-25

Similar Documents

Publication Publication Date Title
CN108470332B (en) Multi-target tracking method and device
CN108764133B (en) Image recognition method, device and system
CN107066938B (en) Video analysis apparatus, method and computer program product
US20200410669A1 (en) Animal Detection Based on Detection and Association of Parts
Shi et al. Multiscale multitask deep NetVLAD for crowd counting
Kirac et al. Hierarchically constrained 3D hand pose estimation using regression forests from single frame depth data
CN111126346A (en) Face recognition method, training method and device of classification model and storage medium
WO2023010758A1 (en) Action detection method and apparatus, and terminal device and storage medium
CN114746898A (en) Method and system for generating trisection images of image matting
US10679362B1 (en) Multi-camera homogeneous object trajectory alignment
CN114093022A (en) Activity detection device, activity detection system, and activity detection method
Mar et al. Cow detection and tracking system utilizing multi-feature tracking algorithm
CN114612987A (en) A method and device for facial expression recognition
CN111753618B (en) Image recognition method, device, computer equipment and computer readable storage medium
CN108229375B (en) Method and device for detecting face image
CN110516731B (en) Visual odometer feature point detection method and system based on deep learning
Naik Bukht et al. A Novel Human Interaction Framework Using Quadratic Discriminant Analysis with HMM.
WO2021169642A1 (en) Video-based eyeball turning determination method and system
Tong et al. A real-time detector of chicken healthy status based on modified YOLO
Guler et al. Human joint angle estimation and gesture recognition for assistive robotic vision
Barbed et al. Tracking adaptation to improve superpoint for 3d reconstruction in endoscopy
CN114333068B (en) Training method and training device
CN112070022A (en) Face image recognition method and device, electronic equipment and computer readable medium
CN112053382A (en) Access & exit monitoring method, equipment and computer readable storage medium
CN116129523A (en) Action recognition method, device, terminal and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant