CN114333068A - Training method and training device
- Publication number: CN114333068A
- Application number: CN202111680833.3A
- Authority: CN (China)
- Prior art keywords: tracking, training, result, determining, frame
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Landscapes: Image Analysis (AREA)
Abstract
The application provides a training method and a training device. The method includes: acquiring a training sample, wherein the training sample is an image sequence recording the movement of a rodent; inputting the training sample into a neural network model to obtain a recognition result of the rodent's posture, wherein the recognition result includes key points in the rodent's posture; and training the neural network model with a loss function by a gradient descent method according to the recognition result of the rodent's posture. The loss function includes a temporal constraint term for constraining the positions of key points in the rodent's pose between adjacent image frames in the image sequence. When a neural network model trained according to this method is used to recognize the postures of rodents such as mice, jitter of the recognition result in the time domain can be effectively avoided.
Description
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a training method and a training device.
Background
Posture recognition refers to the recognition and/or extraction of the actions and/or key points of a living being in an image or video using a neural network model.
In the prior art, when key points are identified in an image sequence of rodent motion, they are usually extracted from a single image or a single image frame; the key points may be the rodent's joints and similar body parts. Recognition results obtained in this way are prone to jitter, so the motion trajectories of the key points are not smooth enough.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a training method and a training apparatus to improve accuracy and recognition efficiency of a neural network model in rodent posture recognition, and ensure continuity of a recognition result in a time domain.
In a first aspect, a training method is provided, the method including: acquiring a training sample, wherein the training sample is an image sequence recording the movement of the rodent; inputting the training sample into a neural network model to obtain a recognition result of the rodent's posture, wherein the recognition result includes key points in the rodent's posture; and training the neural network model with a loss function by a gradient descent method according to the recognition result of the rodent's posture. The loss function includes a temporal constraint term for constraining the positions of key points in the rodent's pose between adjacent image frames in the image sequence.
Optionally, the neural network model comprises a HRNet network.
Optionally, before the training of the neural network model, the training method further includes: and determining the time constraint item according to the error between the position of the key point acquired by using a tracking method and the position of the key point in the identification result.
Optionally, the tracking method comprises a Lucas-Kanade optical flow method.
Optionally, the determining the time constraint term according to the error between the position of the key point obtained by the tracking method and the position of the key point in the recognition result includes: selecting m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2; taking the first frame in the recognition result of the tracking samples as an initial frame and performing forward tracking to obtain a forward tracking result, and determining the difference F1 between the forward tracking result and the recognition result of the m-th image frame in the tracking samples, wherein ω is the number of key points on each image frame; taking the last frame in the recognition result of the tracking samples as the termination frame and performing backward tracking to obtain a backward tracking result, and determining the difference F2 between the backward tracking result and the recognition result of the 1st image frame in the tracking samples; taking the first frame in the recognition result of the tracking samples as an initial frame and performing forward tracking to obtain a forward tracking result, then taking the forward tracking result as an initial frame and performing backward tracking to obtain a second backward tracking result, and recording the difference between the second backward tracking result and the initial frame as F3; and determining the time constraint term to be 0 when both F1 and F2 are less than or equal to a threshold E1, and to be F3 otherwise.
Optionally, the loss function further includes a spatial constraint term for defining the location of keypoints in the pose of the rodent in the same frame of image.
Optionally, before the training of the neural network model, the training method further includes: and determining the space constraint term according to the difference value between the positions of the plurality of key points in the identification result.
Optionally, the determining the spatial constraint term according to a difference between the positions of a plurality of key points in the recognition result includes: selecting p samples from the training samples, wherein p is a positive integer greater than or equal to 2; determining the distance between two key points in the recognition results of the p samples; fitting the distances with a Gaussian distribution to determine their mean μ and variance σ², wherein ω is the number of key points on each image frame; and determining the spatial constraint term from the distances and the fitted mean μ and variance σ².
Optionally, the loss function further includes an error constraint term for constraining the error between the recognition result and the annotation result for key points in the rodent's pose.

Optionally, the error constraint term is a mean square error loss term.
In a second aspect, a training apparatus is provided, including: an acquisition module, configured to acquire a training sample, wherein the training sample is an image sequence recording the movement of the rodent; an input module, configured to input the training sample into a neural network model to obtain a recognition result of the rodent's posture, wherein the recognition result includes key points in the rodent's posture; and a training module, configured to train the neural network model with a loss function by a gradient descent method according to the recognition result of the rodent's posture. The loss function includes a temporal constraint term for constraining the positions of key points in the rodent's pose between adjacent image frames in the image sequence.
Optionally, the neural network model comprises a HRNet network.
Optionally, the training apparatus further includes: a first determining module, configured to determine, before the training of the neural network model, the time constraint term according to the error between the position of the key point obtained by the tracking method and the position of the key point in the recognition result.
Optionally, the tracking method comprises a Lucas-Kanade optical flow method.
Optionally, the first determining module is configured to: select m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2; take the first frame in the recognition result of the tracking samples as an initial frame and perform forward tracking to obtain a forward tracking result, and determine the difference F1 between the forward tracking result and the recognition result of the m-th image frame in the tracking samples, wherein ω is the number of key points on each image frame; take the last frame in the recognition result of the tracking samples as the termination frame and perform backward tracking to obtain a backward tracking result, and determine the difference F2 between the backward tracking result and the recognition result of the 1st image frame in the tracking samples; take the first frame in the recognition result of the tracking samples as an initial frame and perform forward tracking to obtain a forward tracking result, then take the forward tracking result as an initial frame and perform backward tracking to obtain a second backward tracking result, and record the difference between the second backward tracking result and the initial frame as F3; and determine the time constraint term to be 0 when both F1 and F2 are less than or equal to a threshold E1, and to be F3 otherwise.
Optionally, the loss function further includes a spatial constraint term for defining the location of keypoints in the pose of the rodent in the same frame of image.
Optionally, the training device further comprises: a second determining module, configured to determine the spatial constraint term according to the difference between the positions of the plurality of key points in the recognition result.
Optionally, the second determining module is configured to: select p samples from the training samples, wherein p is a positive integer greater than or equal to 2; determine the distance between two key points in the recognition results of the p samples; fit the distances with a Gaussian distribution to determine their mean μ and variance σ², wherein ω is the number of key points on each image frame; and determine the spatial constraint term from the distances and the fitted mean μ and variance σ².
Optionally, the loss function further includes an error constraint term for constraining the error between the recognition result and the annotation result for key points in the rodent's pose.

Optionally, the error constraint term is a mean square error loss term.
According to the method and the apparatus of the present application, a time constraint is introduced in the training process of the neural network model, so that the neural network model achieves higher accuracy when processing occluded or blurred images, and at the same time jitter of the recognition result of the neural network model in the time domain can be effectively suppressed.
Drawings
Fig. 1 is a schematic flowchart of a training method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of a method for determining a time constraint term according to an embodiment of the present application.
Fig. 3 is a schematic flowchart of a method for determining a spatial constraint term according to an embodiment of the present application.
Fig. 4 is a schematic flow chart of a method for determining an error constraint term according to an embodiment of the present application.
Fig. 5 is a schematic block diagram of a training apparatus according to an embodiment of the present application.
Fig. 6 is a schematic block diagram of a training device according to another embodiment of the present application.
Fig. 7 is a schematic block diagram of an application scenario according to an embodiment of the present application.
Detailed Description
The method and the apparatus in the embodiments of the present application can be applied to various scenarios based on recognition of rodent posture in an image sequence. The image sequence may be a plurality of image frames in a video; the plurality of image frames may be consecutive frames in the video. The image sequence may also be a plurality of images of the animal captured by an image acquisition device such as a camera. The rodent may be, for example, a mouse.
To facilitate an understanding of the embodiments of the present application, the background of the present application is first illustrated in detail.
The behavior of biological neurons is closely related to the activity of animals, and changes in the posture of animals usually cause corresponding changes in the neurons. Therefore, the exploration of the connection and interaction pattern of complex networks of neurons under specific behaviors is very important for the fields of neuroscience and medicine. In the field, a quantitative analysis method is generally adopted, namely, the corresponding relation of the posture information of the animal and the behavior of the neuron is determined by acquiring the posture information of the animal and the behavior of the neuron.
The behavior of the animal neurons can be acquired by means of ray scanning, a miniaturized multi-photon microscope and the like.
There are various methods for obtaining posture information of an animal. For example, pose information of an animal can be obtained by manually labeling key points in the image sequence. However, for massive data, the efficiency of manual processing is low, errors are prone to occur, and the accuracy of the obtained posture information cannot be guaranteed.
For another example, a marker (e.g., a displacement or acceleration sensor) may be placed at a key point of the animal body, and the posture change of the animal may be determined from the change in information such as the position of the marker. However, in rodents, due to their small size, the placement of markers interferes with their natural behavior, resulting in less accurate data being collected.
As another example, an animal in space may be positioned with a depth camera to obtain pose information thereof. However, this method is sensitive to imaging conditions and scene changes and is not suitable for all situations.
With the development of the field of artificial intelligence, animal posture recognition methods based on neural networks are gradually replacing the traditional techniques. However, the training of current neural network models usually does not consider how the key points of a rodent in an image sequence move over time. These neural network models have the following problems during posture recognition:
When recognizing animal poses in a sequence of images, neural network models typically perform pose recognition on each frame independently. For example, suppose the image sequence to be recognized includes a first frame image and a second frame image in chronological order. The neural network model recognizes the animal posture in the first frame image from the first frame alone, obtaining a first posture recognition result corresponding to the first frame image, and recognizes the animal posture in the second frame image from the second frame alone, obtaining a second posture recognition result corresponding to the second frame image. With this method of recognizing the animal posture directly from the current frame image, the obtained recognition result has low accuracy and is not smooth enough in time. In addition, when image frames in the acquired image sequence are blurred or occluded, for example when the tail of a rodent is curled or blocked, the accuracy of the key-point position information output by the neural network model is low.
In addition, existing neural network models are usually trained with a back-propagation algorithm using a loss function constructed from the error between the recognition result and the manual labeling result. Because the continuous change of the key points in the time domain is not considered during training, such models can suffer from low accuracy when performing rodent posture recognition. On the other hand, training the neural network model with a loss function constructed only from this error usually makes the initial stage of training slow.
In view of the foregoing problems, embodiments of the present application provide a training method and a training apparatus. According to the method provided by the embodiment of the application, the time constraint is introduced in the training process of the neural network model, so that the jitter phenomenon of the recognition result of the neural network model on the time domain is effectively inhibited.
The training method provided by the embodiment of the present application is described in detail below with reference to fig. 1 to 4. Fig. 1 is a schematic flow chart of a training method provided in an embodiment of the present application. The training method shown in FIG. 1 may include steps S11-S13.
In step S11, a training sample is obtained.
In one embodiment of the present application, the training sample may include an image sequence recording rodent movements and the corresponding labeling results. It is understood that the labeling result may include position information of a preset number of key points of the rodent body. For example, the key points may be various joints and key parts of the body, such as the joints on a mouse's limbs, the tail, eyes, nose, and ears. The position information may be the coordinate information of the key points.
The embodiment of the present application does not limit the manner of obtaining the pre-labeled result. For example, manual labeling may be used to label image frames in an image sequence on a frame-by-frame basis. Other methods with higher confidence may also be used for annotation as possible implementations.
There may be many ways of obtaining the training sample, and the embodiment of the present application is not limited in this respect. For example, as one implementation, the image sequence may be directly acquired by an image acquisition device (e.g., a camera, a medical imaging device, a lidar, etc.) and may include a plurality of images of rodents arranged in time order. As another example, the training samples may be obtained from a server (e.g., a local server or a cloud server). Alternatively, training samples may also be obtained from the network or other content platforms, for example open-source training datasets such as the MSCOCO, MPII, and PoseTrack datasets; alternatively, a locally pre-stored image sequence may be used.
And step S12, inputting the training sample obtained in the step S11 into a neural network model, and obtaining a recognition result of the posture of the rodent.
The embodiment of the present application does not specifically limit the neural network model, and any neural network model capable of realizing the gesture recognition described in the present application may be used. For example, the neural network model can be a 2D convolutional neural network such as VGG, ResNet, HRNet, etc.
Optionally, HRNet (High-Resolution Network) maintains a high-resolution representation throughout feature extraction and fuses features of different resolutions during the feature extraction process. It is particularly suitable for scenarios such as semantic segmentation, human pose estimation, image classification, facial landmark detection, and general object recognition.
The recognition result may include location information (also referred to as recognition location) of a preset number of rodent body key points recognized by the neural network model.
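As an illustration of the interface assumed in the rest of this description, the following toy module maps an image sequence to an array of recognized key-point coordinates of shape (m, h, 2). It is only a stand-in to fix tensor shapes; the network architecture, layer sizes, and the class name are placeholders and do not come from the application, which would use a network such as HRNet instead.

```python
# Illustrative stand-in for a key-point recognition model: maps an image
# sequence (m, 3, H, W) to key-point coordinates (m, h, 2). Architecture and
# sizes are placeholders, not the network described in the application.
import torch
import torch.nn as nn

class ToyKeypointNet(nn.Module):
    def __init__(self, num_keypoints=16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(16, num_keypoints * 2)  # (x, y) per key point

    def forward(self, images):                        # images: (m, 3, H, W)
        feats = self.backbone(images).flatten(1)      # (m, 16)
        coords = self.head(feats)                     # (m, h * 2)
        return coords.view(images.shape[0], -1, 2)    # (m, h, 2)
```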
And step S13, training the neural network model with a loss function by a gradient descent method according to the recognition result obtained in step S12.
The loss function may include a time constraint term L_temporal.

The method for determining the time constraint term L_temporal is described in detail below in conjunction with Fig. 2, which illustrates a method for determining the time constraint term.

The time constraint term L_temporal may be used to constrain the positions of key points in the rodent's pose between adjacent image frames in the image sequence.

In some embodiments, the time constraint term L_temporal may be determined according to the error between the position information of the key points acquired by the tracking method and the position information of the key points in the recognition result.
In the training method provided in the embodiment of the present application, the tracking method may be an unsupervised tracking method, for example, a Lucas-Kanade optical flow method.
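As an illustration of how such unsupervised tracking can be performed, the sketch below propagates key points between two frames with OpenCV's Lucas-Kanade implementation. It is only an example under assumed data layouts (grayscale frames, an (h, 2) array of key-point coordinates); the function name and the window/pyramid parameters are illustrative choices, not values from the application.

```python
# Minimal sketch: propagate recognized key points from one frame to the next
# with the Lucas-Kanade optical flow method (OpenCV). Window size and pyramid
# depth are illustrative defaults, not values from the application.
import cv2
import numpy as np

def track_keypoints_lk(prev_gray, next_gray, keypoints):
    """prev_gray, next_gray: uint8 grayscale frames (H, W);
    keypoints: float array of shape (h, 2) holding (x, y) positions.
    Returns tracked positions of shape (h, 2) and a boolean status mask."""
    pts = np.asarray(keypoints, dtype=np.float32).reshape(-1, 1, 2)
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None, winSize=(21, 21), maxLevel=3)
    return next_pts.reshape(-1, 2), status.reshape(-1).astype(bool)
```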
The method shown in FIG. 2 may include steps S1311-S1315.
In step S1311, m images are selected from the training samples as tracking samples.
The m images are any m images in the training sample; they may be m consecutive images in the training sample. It is understood that the m images may also be all the images in the training sample.
Step S1312, using a first image frame of the m images in the training sample as an initial frame, performing forward tracking using the identification result of the initial frame to obtain a first forward tracking result, and determining a first difference between the first forward tracking result and the identification result of the mth image frame, where the first forward tracking result includes a tracking position of a keypoint in the mth image frame. Wherein m is a positive integer greater than or equal to 2. In other words, the first difference value may be a difference value between the tracking position and the recognition position of the same keypoint in the mth image frame.
For convenience of description, the set of m images is hereinafter denoted I_{1,i} (i = 1, 2, …, m), and the recognition result of I_{1,i} is denoted Y_{1,i}^ω (i = 1, 2, …, m; ω = 1, 2, …, h), where ω indexes the key points in each image frame and h is their number.
Taking the first frame of the m images as the initial frame, forward tracking is performed using the recognition result Y_{1,1}^ω of the initial frame to obtain a first forward tracking result, which gives a tracked position for each key point in the m-th frame. The difference F1 between the first forward tracking result and the recognition result Y_{1,m}^ω of the m-th frame in the set I_{1,i} is then determined, accumulated over the ω = 1, 2, …, h key points.
step S1313, taking the mth image frame of the m images as a termination frame, and performing backward tracking by using the identification result of the termination frame to obtain a first backward tracking result, where the first backward tracking result includes a tracking position of a keypoint in the first image frame. It is to be understood that the mth image may also be referred to as the last image frame of the m images. A second difference between the first back tracking result and the recognition result of the first image frame is determined. In other words, the second difference may be a difference between the tracking position and the identified position of the same keypoint in the first image frame.
Taking the last frame of the m images as the termination frame, backward tracking is performed using the recognition result Y_{1,m}^ω of the termination frame to obtain a first backward tracking result, which gives a tracked position for each key point in the first frame. The difference F2 between the first backward tracking result and the recognition result Y_{1,1}^ω of the first frame in the set I_{1,i} is then determined in the same way.
In step S1314, the first frame of the m images is taken as the initial frame, and forward tracking is performed using its recognition result to obtain the first forward tracking result. The first forward tracking result is then taken as the termination frame, backward tracking is performed, and a second backward tracking result is obtained. The difference F3 between the second backward tracking result and the recognition result of the initial frame is determined.
In step S1315, the time constraint term is determined.

When the first difference F1 and the second difference F2 are both less than or equal to a preset threshold E1, the time constraint term is determined to be 0; when the first difference and/or the second difference is larger than the preset threshold, the time constraint term is determined to be the difference F3 between the second backward tracking result and the initial frame. That is, L_temporal = 0 if F1 ≤ E1 and F2 ≤ E1, and L_temporal = F3 otherwise.

Here E1 is a preset threshold related to the movement characteristics of the living being. It should be noted that, compared with the prediction result of the neural network model, the tracking result obtained with the tracking method changes smoothly in the time domain for the same key point. Therefore, when the differences (e.g., the first difference and the second difference) are smaller than the preset threshold, the recognition result is close to the tracking result and is relatively smooth in the time domain, and the time constraint term need not be applied. When a difference exceeds the preset threshold, the recognition result deviates significantly from the tracking result; that is, the recognition result jitters in the time domain. In this case, training the neural network model with the time constraint term makes the recognition result output by the neural network model smoother.
The method for determining the time constraint item is not particularly limited in the embodiments of the present application. For example, the first difference may be used as a time constraint term. For another example, the second difference may be used as a time constraint term. For another example, the first forward tracking result may be tracked backward to obtain a second backward tracking result; and determining a time constraint term according to the difference value between the second back tracking result and the identification result of the first image frame. For another example, the first backward tracking result may be subjected to forward tracking to obtain a second forward tracking result; a time constraint term is determined based on a difference between the second forward tracking result and the recognition result of the first image frame.
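The forward-backward procedure of steps S1311-S1315 can be sketched as follows. This is only an illustration: the per-frame differences are summed over key points with the Euclidean norm, and the helper track_sequence() chains the Lucas-Kanade step shown above frame by frame; neither of these details is fixed by the application.

```python
# Illustrative computation of the time constraint term L_temporal from a
# forward-backward tracking consistency check. The Euclidean norm and the
# summation over key points are assumptions; the application does not fix them.
import numpy as np

def track_sequence(frames, start_pts, reverse=False):
    """Chain track_keypoints_lk over a list of grayscale frames and return the
    key-point positions reached at the far end of the sequence."""
    seq = frames[::-1] if reverse else frames
    pts = start_pts
    for prev, nxt in zip(seq[:-1], seq[1:]):
        pts, _status = track_keypoints_lk(prev, nxt, pts)
    return pts

def temporal_constraint(frames, pred_pts, threshold):
    """frames: list of m grayscale images; pred_pts: (m, h, 2) recognized
    key points; threshold: the preset value E1."""
    # F1: forward-track frame 1's recognized key points to frame m and compare
    # with the recognition result of frame m.
    fwd = track_sequence(frames, pred_pts[0])
    f1 = np.linalg.norm(fwd - pred_pts[-1], axis=-1).sum()
    # F2: backward-track frame m's recognized key points to frame 1 and compare
    # with the recognition result of frame 1.
    bwd = track_sequence(frames, pred_pts[-1], reverse=True)
    f2 = np.linalg.norm(bwd - pred_pts[0], axis=-1).sum()
    # F3: track frame 1 forward to frame m, then back to frame 1, and compare
    # with the starting recognition result (forward-backward consistency error).
    back_again = track_sequence(frames, fwd, reverse=True)
    f3 = np.linalg.norm(back_again - pred_pts[0], axis=-1).sum()
    # L_temporal is 0 when both F1 and F2 stay within E1, otherwise it is F3.
    return 0.0 if (f1 <= threshold and f2 <= threshold) else f3
```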
In some embodiments, the loss function further includes a spatial constraint term for defining the location of keypoints in the rodent pose in the same frame image.
Referring to fig. 3, fig. 3 illustrates a method of determining a spatial constraint term.
The spatial constraint term L_spatial can be used to define the locations of key points in the pose of the living being in the same image frame. In some embodiments, the spatial constraint term L_spatial can be determined based on the difference between the positions of multiple key points in the recognition result.
The method for determining the spatial constraint term L_spatial provided in an embodiment of the present application may include steps S1321-S1323.
Step S1321, selecting p samples from the training samples, wherein p is a positive integer greater than or equal to 2;
the p images are any p images in the training sample. The p images may be consecutive p images in the training sample. It is understood that the p images may also be all images in the training sample. Wherein p is a positive integer greater than or equal to 2.
For convenience of description, the set of p images is hereinafter denoted I_{2,j} (j = 1, 2, …, p), and the recognition result of I_{2,j} is denoted Y_{2,j}^ω (j = 1, 2, …, p; ω = 1, 2, …, h), where ω indexes the key points in each image frame.

Step S1322 is to determine the distance between two key points in the same image in the training sample.

The distance d_j^ω between two key points in the recognition result Y_{2,j}^ω of each of the p samples is determined (j = 1, 2, …, p; ω = 1, 2, …, h).

In step S1323, the spatial constraint term is determined.

The distances d_j^ω are fitted with a Gaussian distribution, and their mean μ and variance σ² are determined. The spatial constraint term L_spatial is then determined from the distances and the fitted mean μ and variance σ².
it should be noted that the definite space constraint term L provided in the above steps S1321-S1323spaticalThe method of (d) is merely an example and may be determined in other ways. For example, the spatial constraint term may also be determined based on an error between a distance between every two key points in the recognition result and a distance between every two key points in the corresponding labeling result, which is not limited in the present application.
In some embodiments, the loss function may further include an error constraint term L_MSE. The error constraint term L_MSE may be determined from the error between the position information of the same key point in the recognition result and in the labeling result of the training sample. Referring to Fig. 4 and taking the mean square error as an example, determining the error constraint term may include steps S1331-S1333.
Step S1331, n images are selected from the training samples obtained in step S11 to form a sample set I_{3,k} (k = 1, 2, …, n), where n is a positive integer greater than or equal to 1.
The n images are any n images in the training sample; they may be n consecutive images in the training sample. It is understood that the n images may also be all the images in the training sample.
Step S1332, the recognition results Y_{3,k}^ω (k = 1, 2, …, n; ω = 1, 2, …, h) and the labeling results G_{3,k}^ω (k = 1, 2, …, n; ω = 1, 2, …, h) of the sample set I_{3,k} are determined.

Step S1333, the error between the recognition results Y_{3,k}^ω and the labeling results G_{3,k}^ω is calculated, and the error loss term is determined as the mean of the squared errors between the recognized and labeled positions of the key points, i.e. L_MSE = (1/(n·h)) Σ_{k=1}^{n} Σ_{ω=1}^{h} ‖Y_{3,k}^ω − G_{3,k}^ω‖².
for the error loss term, in addition to the mean square error loss, cross entropy loss, 0-1 loss, absolute value loss, etc. commonly used in the art may be employed. The methods shown in steps S1331-S1333 are only examples and do not limit the scope of the present application.
In some embodiments, the loss function may be determined as a weighted sum of the aforementioned error constraint term L_MSE, the time constraint term L_temporal, and the spatial constraint term L_spatial. That is, the loss function is L = L_MSE + a·L_temporal + b·L_spatial, where a and b are hyper-parameters whose values are greater than or equal to 0.
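One gradient-descent training step with this weighted loss could look as follows. The model interface, the way the temporal and spatial terms are supplied (as callables returning scalar tensors), and the default values of a and b are placeholders of this sketch, not details from the application.

```python
# Sketch of one training step with L = L_MSE + a * L_temporal + b * L_spatial.
import torch

def training_step(model, optimizer, images, gt_pts, temporal_fn, spatial_fn,
                  a=1.0, b=1.0):
    """images: (m, C, H, W) image sequence; gt_pts: (m, h, 2) annotated key
    points; temporal_fn / spatial_fn map predicted key points to the time and
    spatial constraint terms (assumed to return scalar tensors)."""
    optimizer.zero_grad()
    pred_pts = model(images)                       # (m, h, 2) recognized key points
    l_mse = torch.mean((pred_pts - gt_pts) ** 2)   # error constraint term
    loss = l_mse + a * temporal_fn(pred_pts) + b * spatial_fn(pred_pts)
    loss.backward()                                # back-propagate the combined loss
    optimizer.step()                               # gradient descent update
    return loss.item()
```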
An embodiment of the exercise device provided by the present application is described in detail below in conjunction with fig. 5. It is to be understood that the apparatus embodiments correspond to the description of the method embodiments described above. Therefore, reference is made to the preceding method embodiments for parts not described in detail.
Fig. 5 is a schematic block diagram of a training device 50 provided in one embodiment of the present application. It should be understood that the apparatus 50 shown in fig. 5 is merely an example, and the apparatus 50 of an embodiment of the present invention may also include other modules or units.
It should be understood that the apparatus 50 is capable of performing various steps in the methods of fig. 1-4, and will not be described here again to avoid repetition.
As a possible implementation, the apparatus includes:
and an obtaining module 51, configured to obtain a training sample.
The training samples and the obtaining method thereof may be the same as step S11 of the foregoing method, and are not described herein again.
An input module 52, configured to input the training sample into a neural network model to obtain a recognition result of the rodent's posture, wherein the recognition result includes key points in the rodent's posture.

A training module 53, configured to train the neural network model with a loss function by a gradient descent method according to the recognition result of the rodent's posture.
Optionally, the neural network model comprises a HRNet network.
Optionally, the training apparatus further includes: a first determining module, configured to determine, before the training of the neural network model, the time constraint term according to the error between the position of the key point obtained by the tracking method and the position of the key point in the recognition result.
Optionally, the tracking method comprises a Lucas-Kanade optical flow method.
Optionally, the first determining module is configured to: select m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2; take the first frame in the recognition result of the tracking samples as an initial frame and perform forward tracking to obtain a forward tracking result, and determine the difference F1 between the forward tracking result and the recognition result of the m-th image frame in the tracking samples, wherein ω is the number of key points on each image frame; take the last frame in the recognition result of the tracking samples as the termination frame and perform backward tracking to obtain a backward tracking result, and determine the difference F2 between the backward tracking result and the recognition result of the 1st image frame in the tracking samples; take the first frame in the recognition result of the tracking samples as an initial frame and perform forward tracking to obtain a forward tracking result, then take the forward tracking result as an initial frame and perform backward tracking to obtain a second backward tracking result, and record the difference between the second backward tracking result and the initial frame as F3; and determine the time constraint term to be 0 when both F1 and F2 are less than or equal to a threshold E1, and to be F3 otherwise.
Optionally, the loss function further includes a spatial constraint term for defining the location of keypoints in the pose of the rodent in the same frame of image.
Optionally, the training device further comprises: a second determining module, configured to determine the spatial constraint term according to the difference between the positions of the plurality of key points in the recognition result.
Optionally, the second determining module is configured to: select p samples from the training samples, wherein p is a positive integer greater than or equal to 2; determine the distance between two key points in the recognition results of the p samples; fit the distances with a Gaussian distribution to determine their mean μ and variance σ², wherein ω is the number of key points on each image frame; and determine the spatial constraint term from the distances and the fitted mean μ and variance σ².
Optionally, the loss function further includes an error constraint term for constraining the error between the recognition result and the annotation result for key points in the pose of the living being.
Optionally, the error loss term is a mean square error loss term.
It should be appreciated that the apparatus 50 for training a neural network model herein is embodied in the form of a functional module. The term "module" herein may be implemented in software and/or hardware, and is not particularly limited thereto. For example, a "module" may be a software program, a hardware circuit, or a combination of both that implements the functionality described above. The hardware circuitry may include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (e.g., a shared processor, a dedicated processor, or a group of processors) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality.
As an example, the apparatus 50 for training a neural network model provided in the embodiment of the present invention may be a processor or a chip, so as to perform the method described in the embodiment of the present invention.
Fig. 6 is a schematic block diagram of a training device 60 provided in another embodiment of the present application. The apparatus 60 shown in fig. 6 comprises a memory 61, a processor 62, a communication interface 63 and a bus 64. The memory 61, the processor 62 and the communication interface 63 are connected to each other through a bus 64.
The memory 61 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 61 may store a program, and when the program stored in the memory 61 is executed by the processor 62, the processor 62 is configured to perform the steps of the training method provided by the embodiment of the present invention, for example, the steps of the embodiments shown in fig. 1 to 4 may be performed.
The processor 62 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the training method of the embodiment of the present invention.
The processor 62 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the training method provided by the embodiment of the present invention may be implemented by integrated logic circuits of hardware in the processor 62 or instructions in the form of software.
The processor 62 may also be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or registers. The storage medium is located in the memory 61, and the processor 62 reads the information in the memory 61 and, in combination with its hardware, performs the functions required of the units included in the apparatus according to the embodiment of the present invention, or performs the training method of the method embodiment of the present invention. For example, the steps/functions of the embodiments shown in Figs. 1-4 may be performed.
Communication interface 63 may enable communication between apparatus 60 and other devices or communication networks using, but not limited to, transceiver devices.
Bus 64 may include a path that conveys information between various components of apparatus 60 (e.g., memory 61, processor 62, communication interface 63).
It should be understood that the apparatus 60 shown in the embodiments of the present invention may be a processor or a chip for performing the methods described in the embodiments of the present invention.
It should be understood that the processor in the embodiments of the present invention may be a Central Processing Unit (CPU), and the processor may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Specific applications of the embodiment of the present application are described below with reference to the application scenario of fig. 7. It should be noted that the following description about fig. 7 is only an example and is not limited thereto, and the method in the embodiment of the present application is not limited thereto, and may also be applied to other scenarios of gesture recognition.
The application scenario in fig. 7 may include an image acquisition device 71 and an image processing device 72.
Wherein the image acquisition device 71 can be used to acquire a sequence of images of a rodent. The image processing apparatus 72 may be integrated into an electronic device, which may be a server or a terminal, and the present embodiment is not limited thereto. For example, the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, cloud computing, cloud storage, cloud communication, big data and artificial intelligence platforms. The terminal can be a smart phone, a tablet computer, a computer, an intelligent Internet of things device and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
The image processing device 72 may be deployed with a neural network model and may be configured to recognize the images in the image sequence acquired by the image acquisition device 71 using the neural network model, so as to obtain the position information of key points in the image to be processed. The position information of the key points may include, for example, the coordinate information of the joints, trunk, or five sense organs of the rodent's body.
The electronic device may further acquire a training sample by using the image acquisition device 71, and train the neural network model by using a loss function according to an identification result of the training sample and a result of artificial labeling. The image processing device 72 may also recognize the image to be processed through the trained neural network model, so as to achieve the purpose of accurately recognizing the image.
The embodiments described above are only a part of the embodiments of the present application, and not all of the embodiments. The order in which the above-described embodiments are described is not intended to be a limitation on the preferred order of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be understood that in the embodiment of the present application, "B corresponding to a" means that B is associated with a, from which B can be determined. It should also be understood that determining B from a does not mean determining B from a alone, but may be determined from a and/or other information.
It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be read by a computer or a data storage device including one or more available media integrated servers, data centers, and the like. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Versatile Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (20)
1. A method of training, comprising:
acquiring a training sample, wherein the training sample is an image sequence for recording the movement of the rodent;
inputting the training sample into a neural network model to obtain a recognition result of the rodent posture, wherein the recognition result comprises key points in the rodent posture;
training the neural network model with a loss function by a gradient descent method according to the recognition result of the posture of the rodent;
wherein the loss function includes a temporal constraint term for constraining the position of keypoints in the rodent's pose between adjacent image frames in the sequence of images.
2. The method of claim 1, wherein the neural network model comprises a HRNet network.
3. The training method of claim 2, wherein prior to said training the neural network model, the training method further comprises:
and determining the time constraint item according to the error between the position of the key point acquired by using a tracking method and the position of the key point in the identification result.
4. The training method of claim 3, wherein the tracking method comprises the Lucas-Kanade optical flow method.
5. The training method according to claim 3, wherein the determining the time constraint term according to the error between the positions of the key points obtained by the tracking method and the positions of the key points in the recognition result comprises:
selecting m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2;
taking the first frame in the recognition result of the tracking samples as an initial frame and performing forward tracking to obtain a forward tracking result, and determining the difference F1 between the forward tracking result and the recognition result of the m-th image frame in the tracking samples, wherein ω is the number of key points on each image frame;
taking the last frame in the recognition result of the tracking samples as the termination frame and performing backward tracking to obtain a backward tracking result, and determining the difference F2 between the backward tracking result and the recognition result of the 1st image frame in the tracking samples;
taking the first frame in the recognition result of the tracking samples as an initial frame and performing forward tracking to obtain a forward tracking result;
taking the forward tracking result as an initial frame and performing backward tracking to obtain a second backward tracking result, recording the difference between the second backward tracking result and the initial frame as F3, and determining the time constraint term to be 0 when both F1 and F2 are less than or equal to a threshold E1, and to be F3 otherwise.
6. The training method according to claim 1, wherein
the loss function further comprises a spatial constraint term for defining the locations of key points in the rodent's pose in the same image frame.
7. The training method of claim 6, wherein prior to said training the neural network model, the training method further comprises:
and determining the space constraint term according to the difference value between the positions of the plurality of key points in the identification result.
8. The training method according to claim 7, wherein the determining the spatial constraint term according to the difference between the positions of the plurality of key points in the recognition result comprises:
selecting p samples from the training samples, wherein p is a positive integer greater than or equal to 2;
determining the distance between two key points in the recognition results of the p samples; fitting the distances with a Gaussian distribution to determine their mean μ and variance σ², wherein ω is the number of key points on each image frame; and determining the spatial constraint term from the distances and the fitted mean μ and variance σ².
9. The training method of claim 1, wherein the loss function further comprises an error constraint term for constraining the error between the recognition result and the annotation result for key points in the rodent's pose.
10. The training method of claim 9, wherein the error loss term is a mean square error loss term.
11. A training apparatus, comprising:
the acquisition module is used for acquiring a training sample, wherein the training sample is an image sequence for recording the movement of the rodent;
the input module is used for inputting the training sample into a neural network model to obtain a recognition result of the posture of the rodent, wherein the recognition result comprises key points in the posture of the rodent;
the training module is used for training the neural network model with a loss function by a gradient descent method according to the recognition result of the posture of the rodent;
wherein the loss function includes a temporal constraint term for constraining the position of keypoints in the rodent's pose between adjacent image frames in the sequence of images.
12. The training device of claim 11, wherein the neural network model comprises a HRNet network.
13. The training apparatus of claim 12, wherein prior to said training the neural network model, the training apparatus further comprises:
and the first determining module is used for determining the time constraint item according to the error between the position of the key point acquired by using the tracking method and the position of the key point in the identification result.
14. The training apparatus of claim 13, wherein the tracking method comprises Lucas-Kanade optical flow.
15. The training apparatus of claim 14, wherein the first determining module is configured to:
selecting m samples from the training samples as tracking samples, wherein m is a positive integer greater than or equal to 2;
taking the first frame in the recognition result of the tracking samples as an initial frame and performing forward tracking to obtain a forward tracking result, and determining the difference F1 between the forward tracking result and the recognition result of the m-th image frame in the tracking samples, wherein ω is the number of key points on each image frame;
taking the last frame in the recognition result of the tracking samples as the termination frame and performing backward tracking to obtain a backward tracking result, and determining the difference F2 between the backward tracking result and the recognition result of the 1st image frame in the tracking samples;
taking the first frame in the recognition result of the tracking samples as an initial frame and performing forward tracking to obtain a forward tracking result;
taking the forward tracking result as an initial frame and performing backward tracking to obtain a second backward tracking result, recording the difference between the second backward tracking result and the initial frame as F3, and determining the time constraint term to be 0 when both F1 and F2 are less than or equal to a threshold E1, and to be F3 otherwise.
16. The training device of claim 11, wherein the loss function further comprises a spatial constraint term for defining the location of keypoints in the rodent pose in the same frame of image.
17. The training apparatus of claim 16, further comprising:
and the second determining module is used for determining the space constraint item according to the difference value between the positions of the plurality of key points in the identification result.
18. The training apparatus of claim 17, wherein the second determining module is configured to:
selecting p samples from the training samples, wherein p is a positive integer greater than or equal to 2;
determining the distance between two key points in the recognition results of the p samples; fitting the distances with a Gaussian distribution to determine their mean μ and variance σ², wherein ω is the number of key points on each image frame; and determining the spatial constraint term from the distances and the fitted mean μ and variance σ².
19. The training device of claim 11, wherein the loss function further comprises an error constraint term for constraining the error between the recognition result and the annotation result for key points in the rodent's pose.
20. Training apparatus according to claim 19 wherein the error penalty term is a mean square error penalty term.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111680833.3A CN114333068A (en) | 2021-12-30 | 2021-12-30 | Training method and training device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111680833.3A CN114333068A (en) | 2021-12-30 | 2021-12-30 | Training method and training device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114333068A true CN114333068A (en) | 2022-04-12 |
Family
ID=81022304
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111680833.3A Pending CN114333068A (en) | 2021-12-30 | 2021-12-30 | Training method and training device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114333068A (en) |
- 2021-12-30: Application CN202111680833.3A filed in China (CN); published as CN114333068A, status active/pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |