CN113869353A - Model training method, tiger key point detection method and related device - Google Patents
- Publication number: CN113869353A
- Application number: CN202110936493.XA
- Authority
- CN
- China
- Prior art keywords
- feature extraction
- tiger
- training
- key point
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/253—Fusion techniques of extracted features
- G06N3/045—Combinations of networks
- G06N3/048—Activation functions
- G06N3/08—Learning methods
Abstract
The application provides a model training method, a tiger key point detection method and a related device. The model training method comprises the following steps: acquiring a plurality of training data, wherein each training data comprises a training image and annotation data corresponding to the training image; inputting each training image into a fusion feature extraction network to obtain a fusion feature extraction result corresponding to each training image; and training the fusion feature extraction network based on the annotation data and the fusion feature extraction result corresponding to each training image to obtain the tiger key point detection model. In the application, the fused features are extracted by the feature extraction networks corresponding to the key points, and the feature extraction network corresponding to each key point is the one with the highest average confidence for that key point, so the accuracy of key point detection is improved.
Description
Technical Field
The application relates to the technical field of image processing and computer vision, in particular to a model training method, a tiger key point detection method and a related device.
Background
Protection of wildlife is crucial to maintaining a healthy and balanced ecosystem and to ensuring the continued biodiversity of our world. Computer vision techniques can collect large amounts of image data from camera traps and even drones, use these images to construct edge-to-cloud systems, and be applied to intelligent imaging sensors that capture images/video of wildlife and monitor it.
Chinese patent CN208282866U discloses a health detection robot for animal husbandry. A moving mechanism travels linearly on a track in a farm; a route planning device lets the detection route of the moving mechanism be set manually; a radio frequency positioning device records the real-time position of the detection robot; and an ear tag reading device reads the individual animal identification information stored in an animal's ear tag and transmits it to an embedded computing device. The inspection point marks, ear tag data, body temperature, images, animal activity, ambient temperature and humidity, gas concentration and odor obtained by detection are aggregated and transmitted through a wireless network interface to an external control host for display. By comprehensively applying automation and robot technology to automatically detect animal health in the field of animal husbandry, the technical aims of improving working efficiency and reducing the labor intensity of workers are fulfilled.
Chinese patent publication No. CN111310596A discloses a system and a method for monitoring the diseased state of animals. The system comprises an image acquisition module, a gateway and an image analysis module. The image acquisition module acquires images of the feeding area; the gateway is communicatively connected with the image acquisition module and controls it to acquire the feeding area images; the image analysis module is communicatively connected with the gateway, receives the feeding area images sent by the gateway, determines the illness information of the animals in the corresponding feeding area from those images, and sends the illness information to the gateway. The system in that embodiment can monitor animal state without human attendance and, compared with traditional methods of monitoring animal illness, effectively improves monitoring efficiency; meanwhile, the obtained illness information speeds up detection of abnormal states and safeguards the economic benefit of breeding enterprises.
However, of the above two prior arts, one detects only information about a single part of the animal body, and the other works by collecting and analyzing images of the animal feeding area; both are clearly low in accuracy.
In recent years, alongside animal monitoring and protection inventions such as those mentioned above, this invention designs a more stable, higher-accuracy detection method based on deep learning object detection, in order to monitor the geographic spatial distribution trend of the northeast (Amur) tiger and track its population; it has high accuracy and good application prospects.
Disclosure of Invention
The application aims to provide a model training method, a tiger key point detection method and a related device, and solves the problem of low accuracy in the prior art.
The purpose of the application is realized by adopting the following technical scheme:
in a first aspect, the present application provides a model training method, including: acquiring a plurality of training data, wherein each training data comprises a training image and marking data corresponding to the training image, and the marking data corresponding to the training image is used for indicating the positions of a plurality of key points of the tiger in the training image; inputting each training image into a fusion feature extraction network to obtain a fusion feature extraction result corresponding to each training image, wherein the fusion feature extraction result corresponding to each training image comprises detection results of a plurality of key points of the tiger in each training image by the fusion feature extraction network, and the detection result of each key point is used for indicating the position and the confidence coefficient of the key point; training the fusion feature extraction network based on the labeling data corresponding to each training image and the fusion feature extraction result corresponding to each training image to obtain a tiger key point detection model, wherein the tiger key point detection model is used for detecting the positions of a plurality of key points of the tiger in the tiger image; the method for acquiring the fusion feature extraction network comprises the following steps: respectively inputting each training image into an ith feature extraction network to obtain an ith feature extraction result corresponding to each training image, wherein the ith feature extraction result corresponding to each training image comprises the detection result of the ith feature extraction network on a plurality of key points of the tiger in each training image, the value of i is each positive integer not greater than N, N is the number of feature extraction networks and N is an integer greater than 1; for each key point, acquiring the average confidence of the ith feature extraction network for the key point based on the ith feature extraction result corresponding to each training image; acquiring a feature extraction network with the highest average confidence coefficient aiming at the key points in all the feature extraction networks as a feature extraction network corresponding to the key points; and acquiring a fused feature extraction network based on the feature extraction network corresponding to each key point. 
The method has the advantage that images of the tiger are acquired and annotated to obtain the positions of a plurality of tiger key points; the training images are input into a fusion feature extraction network to obtain fusion feature extraction results corresponding to the training images, which contain the network's detection results for the plurality of tiger key points; the fusion feature extraction network is then trained on the annotation data and the fusion feature extraction results corresponding to the training images to obtain a tiger key point detection model, which is used for detecting the positions of a plurality of tiger key points in a tiger image. The fused features are extracted by the feature extraction networks corresponding to the key points, and the feature extraction network corresponding to each key point is the one with the highest average confidence for that key point, so the accuracy of key point detection is improved.
In some optional embodiments, the obtaining a plurality of training data comprises: acquiring a plurality of tiger images to be marked, and taking each tiger image to be marked as one training image; and carrying out data annotation on each training image to obtain annotation data corresponding to each training image. The technical scheme has the advantages that a plurality of tiger images to be marked are obtained to serve as training images, and data marking is carried out on each training image to obtain the positions of key points of the tiger in the training images.
In some alternative embodiments, N = 3; the first feature extraction network takes HRNetW48 as a backbone network and adds a Spatial Group-wise Enhance Module attention module; the second feature extraction network takes HRNetW48 as a backbone network and adds a Spatial Attention module; the third feature extraction network uses Unet Plus + ResNet152 as a backbone network. The technical scheme has the advantages that HRNetW48 is used as a backbone network and keeps high resolution by connecting high-to-low resolution subnets in parallel, so that the predicted heat map is more accurate in space and the key point detection results are more accurate; Unet is used as the basic structure and improved into Unet Plus + ResNet152 as a backbone network, borrowing HRNet's structure of maintaining each resolution layer by layer with information exchange between different resolutions, fusing multi-scale information, and replacing Unet's original Skip Connection structure with one in which part of the deep information flows upward; meanwhile, the output of each layer is upsampled to the same size before the Concat operation, which makes the Concat operation more meaningful and the key point detection results more accurate; a Spatial Group-wise Enhance Module attention module and a Spatial Attention module are added to the first and second feature extraction networks respectively, so that when the model detects key points, computing resources can be allocated to the more important tasks, improving the efficiency of key point detection.
In some optional embodiments, the training the fused feature extraction network based on the labeling data corresponding to each of the training images and the fused feature extraction result corresponding to each of the training images to obtain the tiger keypoint detection model includes: based on the labeling data corresponding to each training image and the fusion feature extraction result corresponding to each training image, scoring the fusion feature extraction network by using mAP and OKS to obtain a scoring result corresponding to the fusion feature extraction network; determining the square error loss corresponding to each training image by using a square error loss function based on the annotation data corresponding to each training image and the fusion feature extraction result corresponding to each training image; and iterating the fusion feature extraction network based on the scoring result corresponding to the fusion feature extraction network and the square error loss corresponding to each training image until the fusion feature extraction network meets the condition of ending iteration, and taking the trained fusion feature extraction network as the tiger key point detection model. The technical scheme has the beneficial effects that the fusion feature extraction network meeting the iteration condition is used as the tiger key point detection model, so that the detection model has higher stability and precision.
In some optional embodiments, the method further comprises: obtaining tiger key point data; and setting weights for the plurality of key points of the tiger respectively based on the tiger key point data. The technical scheme has the beneficial effects that the weighted value is set according to the posture and body characteristics of the tiger, so that the key points with better identification effect can be highlighted, and the distribution area of the tiger can be accurately monitored and the tiger population can be tracked.
In a second aspect, the present application provides a tiger key point detection method, including: acquiring a tiger image to be detected; inputting the tiger image to be detected into a tiger key point detection model to obtain the positions of a plurality of tiger key points in the tiger image to be detected; the tiger key point detection model is obtained by training with any one of the above model training methods. The benefit of this technical scheme is that using the key point detection model to detect the positions of the multiple tiger key points in the tiger image is more accurate and more convenient than manually labeling tiger key points, with a higher degree of automation; it improves the efficiency of tiger key point detection, enables rapid processing of large numbers of tiger images to be detected, and better supports the protection of the tiger species.
In a third aspect, the present application provides a model training apparatus, the apparatus comprising: the training data module is used for acquiring a plurality of training data, each training data comprises a training image and marking data corresponding to the training image, and the marking data corresponding to the training image is used for indicating the positions of a plurality of key points of the tiger in the training image; the feature extraction module is used for inputting each training image into a fusion feature extraction network to obtain fusion feature extraction results corresponding to each training image, the fusion feature extraction results corresponding to each training image comprise detection results of a plurality of key points of the tiger in each training image by the fusion feature extraction network, and the detection result of each key point is used for indicating the position and the confidence coefficient of the key point; the model training module is used for training the fusion feature extraction network based on the labeling data corresponding to each training image and the fusion feature extraction result corresponding to each training image to obtain a tiger key point detection model, and the tiger key point detection model is used for detecting the positions of a plurality of key points of the tiger in the tiger image; the converged feature extraction network is obtained by utilizing a network convergence module to perform convergence, and the network convergence module comprises: the extraction unit is used for respectively inputting the training images into an ith feature extraction network to obtain ith feature extraction results corresponding to the training images, the ith feature extraction results corresponding to the training images comprise detection results of the ith feature extraction network on a plurality of key points of the tiger in the training images, the value of i is each positive integer not greater than N, N is the number of the feature extraction networks, and N is an integer greater than 1; a confidence coefficient obtaining unit, configured to obtain, for each key point, an average confidence coefficient of the ith feature extraction network for the key point based on an ith feature extraction result corresponding to each training image; a key point corresponding unit, configured to acquire a feature extraction network with the highest average confidence for the key point in all feature extraction networks as a feature extraction network corresponding to the key point; and the fusion network unit is used for acquiring a fusion feature extraction network based on the feature extraction network corresponding to each key point.
In some optional embodiments, the training data module comprises: the to-be-labeled image unit is used for acquiring a plurality of to-be-labeled tiger images and taking each to-be-labeled tiger image as one training image; and the data labeling unit is used for performing data labeling on each training image to obtain labeled data corresponding to each training image.
In some optional embodiments, the model training module comprises: a scoring result unit, configured to score the fused feature extraction network by using the mAP and the OKS based on the labeled data corresponding to each training image and the fused feature extraction result corresponding to each training image, so as to obtain a scoring result corresponding to the fused feature extraction network; an error loss unit, configured to determine a square error loss corresponding to each training image by using a square error loss function based on the annotation data corresponding to each training image and the fusion feature extraction result corresponding to each training image; and the detection model unit is used for iterating the fusion feature extraction network based on the scoring result corresponding to the fusion feature extraction network and the square error loss corresponding to each training image until the fusion feature extraction network meets the iteration ending condition, and taking the trained fusion feature extraction network as the tiger key point detection model.
In some optional embodiments, the model training apparatus further comprises: the key point data module is used for acquiring tiger key point data; and the weight setting module is used for respectively setting weights for the plurality of key points of the tiger based on the tiger key point data.
In a fourth aspect, the present application provides a tiger key point detection device, including: the to-be-detected image module is used for acquiring a to-be-detected tiger image; the key point detection module is used for inputting the tiger image to be detected into the tiger key point detection model to obtain the positions of a plurality of key points of the tiger in the tiger image to be detected; wherein, the tiger key point detection model is obtained by the training method.
In a fifth aspect, the present application provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the steps of any of the above-mentioned model training methods or the steps of any of the above-mentioned tiger key point detection methods.
In some optional embodiments, the electronic device is further provided with a camera.
In a sixth aspect, the present application provides a computer readable storage medium storing a computer program or a tiger keypoint detection model;
the computer program, when executed by a processor, implements the steps of any of the above-described model training methods or the steps of any of the above-described tiger keypoint detection methods;
the tiger key point detection model is obtained by training by using any one of the model training methods.
Drawings
The present application is further described below with reference to the drawings and examples.
FIG. 1 is a schematic flow chart diagram illustrating a model training method according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a method for acquiring a converged feature extraction network according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an HRNet according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of Unet Plus + ResNet152 provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a Spatial Group-wise Enhance Module according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a Spatial Attention module according to an embodiment of the present disclosure;
FIG. 7 is a schematic flow chart illustrating a process for obtaining a plurality of training data according to an embodiment of the present application;
FIG. 8 is a schematic flow chart illustrating a method for obtaining a tiger key point detection model according to an embodiment of the present application;
FIG. 9 is a partial schematic flow chart diagram of another model training method provided in the embodiments of the present application;
FIG. 10 is a schematic illustration of a tiger key point provided in an embodiment of the present application;
FIG. 11 is a schematic flowchart of a tiger key point detection method according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;
fig. 13 is a schematic structural diagram of a network convergence module according to an embodiment of the present application;
FIG. 14 is a schematic partial structural diagram of another model training apparatus provided in the embodiments of the present application;
FIG. 15 is a schematic structural diagram of another model training apparatus provided in the embodiments of the present application;
FIG. 16 is a schematic structural diagram of another model training apparatus provided in the embodiments of the present application;
FIG. 17 is a schematic structural diagram of a tiger key point detection apparatus according to an embodiment of the present application;
fig. 18 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 19 is a schematic structural diagram of a program product for implementing a model training method according to an embodiment of the present application;
fig. 20 is a flowchart illustrating another model training method according to an embodiment of the present application.
Detailed Description
The present application is further described with reference to the accompanying drawings and the detailed description, and it should be noted that, in the present application, the embodiments or technical features described below may be arbitrarily combined to form a new embodiment without conflict.
Referring to fig. 1, an embodiment of the present application provides a model training method, which includes steps S101 to S103.
Step S101: the method comprises the steps of obtaining a plurality of training data, wherein each training data comprises a training image and marking data corresponding to the training image, and the marking data corresponding to the training image is used for indicating the positions of a plurality of key points of the tiger in the training image. The number of the tiger key points is not limited in the embodiment of the application, and the number of the tiger key points can be 12, 15 or 18.
Step S102: inputting each training image into a fusion feature extraction network to obtain a fusion feature extraction result corresponding to each training image, wherein the fusion feature extraction result corresponding to each training image comprises detection results of a plurality of key points of the tiger in each training image by the fusion feature extraction network, and the detection result of each key point is used for indicating the position and the confidence coefficient of the key point.
Step S103: training the fusion feature extraction network based on the labeling data corresponding to each training image and the fusion feature extraction result corresponding to each training image to obtain a tiger key point detection model, wherein the tiger key point detection model is used for detecting the positions of a plurality of key points of the tiger in the tiger image.
Referring to fig. 2, the method for acquiring the fusion feature extraction network may include steps S201 to S204.
Step S201: and respectively inputting the training images into an ith feature extraction network to obtain ith feature extraction results corresponding to the training images, wherein the ith feature extraction results corresponding to the training images comprise detection results of the ith feature extraction network on a plurality of key points of the tiger in the training images, the value of i is each positive integer not greater than N, N is the number of the feature extraction networks, and N is an integer greater than 1.
Step S202: and aiming at each key point, acquiring the average confidence of the ith feature extraction network aiming at the key point based on the ith feature extraction result corresponding to each training image.
Step S203: and acquiring the feature extraction network with the highest average confidence coefficient aiming at the key points in all the feature extraction networks as the feature extraction network corresponding to the key points.
Step S204: and acquiring a fused feature extraction network based on the feature extraction network corresponding to each key point.
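As an illustrative sketch only (the function and variable names below are hypothetical, not from this application), steps S201 to S204 could be implemented along these lines in Python:

```python
import numpy as np

def build_fusion_map(confidences):
    """confidences[i] is a (num_images, num_keypoints) array holding the
    confidence that the i-th feature extraction network assigns to each
    key point on each training image (steps S201 and S202)."""
    avg = np.stack([c.mean(axis=0) for c in confidences])  # (N, num_keypoints)
    return avg.argmax(axis=0)  # step S203: index of the best network per key point

def fused_detect(networks, image, kpt_to_net):
    """Step S204: answer each key point with the network selected for it.
    Each network is assumed to return one (x, y, confidence) per key point."""
    results = [net(image) for net in networks]
    return [results[kpt_to_net[k]][k] for k in range(len(kpt_to_net))]
```

The fused network thus answers each key point with whichever member network is, on average, most confident about it.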
In some embodiments, N = 3; the first feature extraction network takes HRNetW48 as a backbone network and adds a Spatial Group-wise Enhance Module attention module; the second feature extraction network takes HRNetW48 as a backbone network and adds a Spatial Attention module; the third feature extraction network uses Unet Plus + ResNet152 as a backbone network.
In some embodiments, the HRNet structure can be as shown in fig. 3. Its main body consists of network stages composed of four parallel subnets, whose resolution is halved from one subnet to the next while the number of channels is doubled. The first stage includes 4 residual module units, each composed of a bottleneck of width 64 as in ResNet-50; the feature map then enters the second stage after one downsampling. The second stage has two parallel subnets and enters the third stage through 4 residual module units, and so on; finally, the outputs of the second, third and fourth subnets are upsampled to the same resolution as the first subnet and feature fusion is performed. The channel counts of the four parallel subnets in HRNetW48 are 48, 96, 192 and 384, respectively.
Since the data set is small and the task is wildlife pose estimation, an overly complex model would likely overfit or fail to converge, so HRNetW48 is used as the baseline. HRNet connects its high-to-low resolution subnets in parallel rather than in series; this approach maintains high resolution instead of recovering it through a low-to-high process, so the predicted heat map is more spatially accurate.
In some embodiments, the Unet Plus + ResNet152 structure may be as shown in fig. 4. The key point detection pipeline is partly similar to semantic segmentation, so Unet is used as the basic structure; like HRNet, it maintains each resolution layer by layer with information exchange between different resolutions, fusing multi-scale information, and Unet's original Skip Connection structure is replaced with one in which part of the deep information flows upward. X0,0, X1,0, X2,0, X3,0 and X4,0 correspond to the 5 stages of ResNet152, increasing the depth of the network so that the model learns more easily and deep layers can also obtain shallow network features; finally, the output of each layer is upsampled to the same size and then the Concat operation is applied.
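For illustration, the final "upsample each layer's output to the same size, then Concat" step might look as follows in PyTorch (a sketch; multi_scale_concat is a hypothetical name and NCHW layout is assumed):

```python
import torch
import torch.nn.functional as F

def multi_scale_concat(layer_outputs, out_size):
    """Upsample every decoder output to a common spatial size and
    concatenate along the channel dimension (the Concat operation)."""
    ups = [F.interpolate(o, size=out_size, mode="bilinear", align_corners=False)
           for o in layer_outputs]
    return torch.cat(ups, dim=1)
```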
In some embodiments, the structure of the Spatial Group-wise Enhance Module attention module (SGE for short) may be as shown in fig. 5. By generating an attention factor in each group, the importance of each sub-feature can be obtained, and each group can also learn to suppress noise in a targeted manner. The attention factor is determined only by the similarity between global and local features within each group; SGE significantly improves the spatial distribution of different semantic sub-features within a group, generates larger statistical variance, enhances feature learning in semantic regions, and compresses noise and interference, while remaining very lightweight.
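A minimal PyTorch sketch of an SGE block following the published design (the number of groups is an assumed hyperparameter, and the channel count must be divisible by it):

```python
import torch
import torch.nn as nn

class SpatialGroupEnhance(nn.Module):
    def __init__(self, groups=8):
        super().__init__()
        self.groups = groups
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.weight = nn.Parameter(torch.zeros(1, groups, 1, 1))
        self.bias = nn.Parameter(torch.ones(1, groups, 1, 1))
        self.sig = nn.Sigmoid()

    def forward(self, x):
        b, c, h, w = x.size()
        x = x.view(b * self.groups, -1, h, w)
        # similarity between each position and the group's global descriptor
        xn = (x * self.avg_pool(x)).sum(dim=1, keepdim=True)
        t = xn.view(b * self.groups, -1)
        t = (t - t.mean(dim=1, keepdim=True)) / (t.std(dim=1, keepdim=True) + 1e-5)
        t = t.view(b, self.groups, h, w) * self.weight + self.bias
        t = t.view(b * self.groups, 1, h, w)
        x = x * self.sig(t)  # the attention factor gates each spatial position
        return x.view(b, c, h, w)
```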
In some embodiments, the Spatial Attention module may, as shown in fig. 6 and similar to channel attention, take an H × W × C feature F, first apply average pooling and max pooling along the channel dimension to obtain two H × W × 1 channel descriptions, and concatenate the two descriptions along the channel axis. A 7 × 7 convolutional layer with a Sigmoid activation function then produces the weight coefficient Ms. Finally, the weight coefficient is multiplied by the feature F to obtain a new, rescaled feature.
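As a sketch, this Spatial Attention module can be written in PyTorch as follows (channel-first layout, so the H × W × 1 descriptions become 1 × H × W maps):

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f):
        avg = torch.mean(f, dim=1, keepdim=True)   # average pooling over channels
        mx, _ = torch.max(f, dim=1, keepdim=True)  # max pooling over channels
        ms = self.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # weight Ms
        return f * ms                              # rescaled feature
```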
Therefore, HRNetW48 is used as a backbone network and keeps high resolution by connecting high-to-low resolution subnets in parallel, so that the predicted heat map is more accurate in space and the key point detection results are more accurate. Unet is used as the basic structure and improved into Unet Plus + ResNet152 as a backbone network, borrowing HRNet's structure of maintaining each resolution layer by layer with information exchange between different resolutions, fusing multi-scale information, and replacing Unet's original Skip Connection structure with one in which part of the deep information flows upward; meanwhile, the output of each layer is upsampled to the same size before the Concat operation, which makes the Concat operation more meaningful and the key point detection results more accurate. A Spatial Group-wise Enhance module attention module and a Spatial Attention module are added to the first and second feature extraction networks respectively, so that when the model detects key points, computing resources can be allocated to the more important tasks, improving the efficiency of key point detection.
Images of the tiger are acquired and annotated to obtain the positions of a plurality of tiger key points; the training images are input into the fusion feature extraction network to obtain the fusion feature extraction results corresponding to the training images, which contain the detection results for the plurality of tiger key points; the fusion feature extraction network is trained on the annotation data and the fusion feature extraction results corresponding to the training images to obtain a tiger key point detection model, which is used for detecting the positions of a plurality of tiger key points in a tiger image. The fused features are extracted by the feature extraction networks corresponding to the key points, and the feature extraction network corresponding to each key point is the one with the highest average confidence for that key point, so the accuracy of key point detection is improved.
Referring to fig. 7, in a specific implementation, the step S101 may include steps S301 to S302.
Step S301: and acquiring a plurality of tiger images to be marked, and taking each tiger image to be marked as one training image.
Step S302: and carrying out data annotation on each training image to obtain annotation data corresponding to each training image.
Therefore, a plurality of images of the tiger to be marked are obtained and used as training images, and data marking is carried out on each training image so as to obtain the positions of key points of the tiger in the training images.
Referring to fig. 8, the step S103 may include steps S401 to S403.
Step S401: and based on the labeling data corresponding to each training image and the fusion feature extraction result corresponding to each training image, scoring the fusion feature extraction network by using mAP and OKS to obtain the scoring result corresponding to the fusion feature extraction network.
Step S402: and determining the square error loss corresponding to each training image by using a square error loss function based on the annotation data corresponding to each training image and the fusion feature extraction result corresponding to each training image.
Step S403: and iterating the fusion feature extraction network based on the scoring result corresponding to the fusion feature extraction network and the square error loss corresponding to each training image until the fusion feature extraction network meets the condition of ending iteration, and taking the trained fusion feature extraction network as the tiger key point detection model.
The results are scored using mAP and OKS (Object Keypoint Similarity). In the key point scoring task, the quality of the key points produced by the network is judged not by the simple Euclidean distance between two points alone, but with an added scale term. The OKS formula is as follows:

OKS_p = \frac{\sum_i \exp\left(-d_{pi}^2 / (2 S_p^2 \sigma_i^2)\right) \delta(v_{pi}=1)}{\sum_i \delta(v_{pi}=1)}

where p denotes the id of a tiger in the ground truth; i denotes the id of a key point; d_{pi} denotes the Euclidean distance between each ground-truth key point of a tiger and the corresponding predicted key point; v_{pi} = 1 indicates that the visibility of this key point is 1 (i.e. visible in the picture); S_p denotes the square root of the area occupied by the tiger, calculated from the tiger box in the ground truth; σ_i is the normalization factor, obtained as the standard deviation over all ground truth in the existing data set, and reflects how much the current skeleton point influences the whole: a larger value means the point is labeled less consistently across the data set, and a smaller value means it is labeled better; δ is the function that selects the visible points for the calculation.
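As an illustrative sketch (NumPy; the argument names are assumptions), the OKS of one tiger instance can be computed as:

```python
import numpy as np

def oks(gt_kpts, pred_kpts, vis, area, sigmas):
    """OKS for one tiger p. gt_kpts / pred_kpts: (K, 2) coordinates;
    vis: (K,) visibility flags; area: S_p squared (tiger box area);
    sigmas: (K,) per-key-point normalization factors."""
    d2 = np.sum((gt_kpts - pred_kpts) ** 2, axis=1)   # d_pi squared
    e = np.exp(-d2 / (2.0 * area * sigmas ** 2))
    visible = vis == 1                                # the delta function
    return e[visible].sum() / max(visible.sum(), 1)
```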
For the loss function, the common key point detection loss is chosen: the squared-error loss. The network is trained for a number of epochs to obtain the final detection result.
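A minimal training loop consistent with this step might look as follows (a sketch: the optimizer choice, learning rate and epoch count are assumptions not specified by this application):

```python
import torch

def train(fusion_net, train_loader, num_epochs=50, lr=1e-3):
    criterion = torch.nn.MSELoss()                 # squared-error loss
    optimizer = torch.optim.Adam(fusion_net.parameters(), lr=lr)  # assumed optimizer
    for _ in range(num_epochs):                    # train a number of epochs
        for images, gt_heatmaps in train_loader:
            loss = criterion(fusion_net(images), gt_heatmaps)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return fusion_net
```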
Therefore, the fusion feature extraction network meeting the iteration condition is used as the tiger key point detection model, so that the detection model has high stability and precision.
Referring to fig. 9, in an implementation, the model training method may further include steps S104 to S105.
Step S104: and obtaining the data of the tiger key points.
Step S105: and setting weights for the plurality of key points of the tiger respectively based on the tiger key point data.
Existing research on human pose detection is plentiful, but tiger pose detection has received little attention; unlike human pose detection, the tiger's upper and lower body serve as pose key points whose arrangement differs from the human case. Referring to fig. 10, in some embodiments there are fifteen tiger key points in the data set, respectively: left_ear, right_ear, nose, front_right_elbow, front_right_wrist, front_left_elbow, front_left_wrist, back_right_hip, back_right_knee, back_left_hip, back_left_knee, tail and nic. Observing the characteristics of the tiger key points and comparing them with human key points, the weights of the fifteen key points are set respectively as: [1, 1, 1, 1.2, 1.2, 1.2, 1.2, 1.5, 1, 1.2, 1.5, 1, 1.2, 1.5, 1.5]. The set tiger key point weights can be used in the training process of the tiger key point detection model and/or in the acquisition process of the fusion feature extraction network, for example as sketched below.
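One plausible way to apply these weights during training (an assumption, since the application does not fix the mechanism) is a per-key-point weighted squared-error loss over the heatmap channels:

```python
import torch

KEYPOINT_WEIGHTS = torch.tensor(
    [1, 1, 1, 1.2, 1.2, 1.2, 1.2, 1.5, 1, 1.2, 1.5, 1, 1.2, 1.5, 1.5])

def weighted_mse(pred_heatmaps, gt_heatmaps):
    """pred/gt heatmaps: (batch, 15, H, W); weight each key point channel."""
    per_kpt = ((pred_heatmaps - gt_heatmaps) ** 2).mean(dim=(0, 2, 3))  # (15,)
    return (per_kpt * KEYPOINT_WEIGHTS.to(per_kpt.device)).mean()
```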
Therefore, the weight values are set according to the posture characteristics of the tigers, so that key points with good identification effect can be highlighted, and the distribution area of the tigers and the tiger population can be accurately monitored and tracked.
Referring to fig. 11, an embodiment of the present application further provides a tiger key point detection method, which may include steps S501 to S502.
Step S501: and acquiring the tiger image to be detected.
Step S502: inputting the image of the tiger to be detected into a tiger key point detection model to obtain the positions of a plurality of key points of the tiger in the image of the tiger to be detected; the tiger key point detection model is obtained by training by using any one of the model training methods.
Thus, using the key point detection model to detect the positions of the multiple tiger key points in the tiger image is more accurate and more convenient than manually labeling tiger key points, with a higher degree of automation; it improves the efficiency of tiger key point detection, enables rapid processing of large numbers of tiger images to be detected, and better supports the protection of the tiger species.
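For illustration, steps S501 to S502 could be sketched as follows (the heatmap decoding logic is an assumption; model is any tiger key point detection model trained as above):

```python
import torch

def detect_tiger_keypoints(model, image_tensor):
    """image_tensor: a preprocessed (1, 3, H, W) tiger image to be detected."""
    model.eval()
    with torch.no_grad():
        heatmaps = model(image_tensor)              # (1, K, h, w)
    k, h, w = heatmaps.shape[1:]
    flat = heatmaps.view(k, -1)
    idx = flat.argmax(dim=1)                        # peak of each channel
    conf = flat.max(dim=1).values
    return [(int(i % w), int(i // w), float(c)) for i, c in zip(idx, conf)]
```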
Referring to fig. 12, an embodiment of the present application further provides a model training apparatus, and a specific implementation manner of the model training apparatus is consistent with the implementation manner and the achieved technical effect described in the embodiment of the model training method, and details of a part of the implementation manner and the achieved technical effect are not repeated.
The model training apparatus includes: the training data module 101 is configured to obtain a plurality of training data, where each training data includes a training image and label data corresponding to the training image, and the label data corresponding to the training image is used to indicate positions of a plurality of key points of a tiger in the training image; the feature extraction module 102 is configured to input each training image into a fused feature extraction network to obtain a fused feature extraction result corresponding to each training image, where the fused feature extraction result corresponding to each training image includes detection results of multiple key points of tiger in each training image by the fused feature extraction network, and the detection result of each key point is used to indicate a position and a confidence of the key point; the model training module 103 is configured to train the fused feature extraction network based on the labeling data corresponding to each training image and the fused feature extraction result corresponding to each training image to obtain a tiger key point detection model, where the tiger key point detection model is configured to detect positions of multiple key points of a tiger in the tiger image;
referring to fig. 13, the fusion feature extraction network is obtained by fusing a network fusion module, where the network fusion module includes:
a respective extraction unit 201, configured to input each of the training images into an ith feature extraction network, respectively, to obtain an ith feature extraction result corresponding to each of the training images, where the ith feature extraction result corresponding to each of the training images includes a detection result of the ith feature extraction network on a plurality of key points of the tiger in each of the training images, a value of i is each positive integer not greater than N, N is the number of feature extraction networks, and N is an integer greater than 1; a confidence coefficient obtaining unit 202, configured to obtain, for each key point, an average confidence coefficient of the ith feature extraction network for the key point based on an ith feature extraction result corresponding to each training image; a key point corresponding unit 203, configured to obtain a feature extraction network with the highest average confidence for the key point in all feature extraction networks as a feature extraction network corresponding to the key point; and the converged network unit 204 is configured to obtain a converged feature extraction network based on the feature extraction network corresponding to each of the key points.
Referring to fig. 14, in a specific embodiment, the training data module may include: the to-be-labeled image unit 301, configured to acquire a plurality of tiger images to be labeled and use each tiger image to be labeled as one training image; and the data labeling unit 302, configured to perform data labeling on each training image to obtain the labeled data corresponding to each training image.
Referring to fig. 15, in a specific embodiment, the model training module may include: a scoring result unit 401, configured to score the fused feature extraction network by using the maps and the OKS based on the labeled data corresponding to each training image and the fused feature extraction result corresponding to each training image, so as to obtain a scoring result corresponding to the fused feature extraction network; an error loss unit 402, configured to determine a square error loss corresponding to each training image by using a square error loss function based on the annotation data corresponding to each training image and the fusion feature extraction result corresponding to each training image; a detection model unit 403, configured to iterate the fused feature extraction network based on a scoring result corresponding to the fused feature extraction network and a square error loss corresponding to each training image, until the fused feature extraction network meets an iteration termination condition, and use the trained fused feature extraction network as the tiger keypoint detection model.
Referring to fig. 16, in some optional embodiments, the model training apparatus may further include: a key point data module 104, configured to obtain tiger key point data; and the weight setting module 105 is configured to set weights for the plurality of key points of the tiger respectively based on the tiger key point data.
Referring to fig. 17, in a specific implementation, an embodiment of the present application further provides a tiger key point detection device, and a specific implementation manner of the detection device is consistent with technical effects achieved by the implementation manners described in the embodiments of the model training method, and details of some details are not repeated.
The tiger key point detection device includes: the to-be-detected image module 601, used for acquiring a tiger image to be detected; and the key point detection module 602, used for inputting the tiger image to be detected into the tiger key point detection model to obtain the positions of a plurality of tiger key points in the tiger image to be detected; the tiger key point detection model is obtained by training with the above model training device.
Referring to fig. 18, an embodiment of the present application further provides an electronic device 200, where the electronic device 200 includes at least one memory 210, at least one processor 220, and a bus 230 connecting different platform systems.
The memory 210 may include readable media in the form of volatile memory, such as random access memory (RAM) 211 and/or cache memory 212, and may further include read-only memory (ROM) 213.
The memory 210 further stores a computer program, and the computer program can be executed by the processor 220, so that the processor 220 executes the steps of the model training method or the steps of the tiger key point detection method in the embodiment of the present application, and a specific implementation manner of the method is consistent with the implementation manner and the achieved technical effect described in the embodiment of the model training method, and details of some of the contents are not repeated.
Accordingly, the processor 220 may execute the computer programs described above, and may execute the utility 214.
The electronic device 200 may also communicate with one or more external devices 240, such as a keyboard, pointing device, bluetooth device, etc., and may also communicate with one or more devices capable of interacting with the electronic device 200, and/or with any devices (e.g., routers, modems, etc.) that enable the electronic device 200 to communicate with one or more other computing devices. Such communication may be through input-output interface 250. Also, the electronic device 200 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 260. The network adapter 260 may communicate with other modules of the electronic device 200 via the bus 230. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 200, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms, to name a few.
In a specific embodiment, the electronic device 200 may also be provided with a camera.
The embodiment of the present application further provides a computer-readable storage medium, and a specific implementation manner of the computer-readable storage medium is consistent with technical effects achieved by the implementation manners described in the embodiments of the model training method or the tiger key point detection method, and some contents are not described again.
The computer readable storage medium is used for storing a computer program or a tiger key point detection model, wherein the computer program is executed to implement the steps of the model training method or the steps of the tiger key point detection method in the embodiment of the present application, and the tiger key point detection model is obtained by training with the model training method in the embodiment of the present application.
Fig. 19 shows a program product 300 for implementing the model training method provided in this embodiment, which may employ a portable compact disc read only memory (CD-ROM) and include program codes, and may be executed on a terminal device, such as a personal computer. However, the program product 300 of the present invention is not so limited, and in this application, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. Program product 300 may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable storage medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable storage medium may also be any readable medium that can communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the C language or similar programming languages. The program code may execute entirely on the user's computing device, partly on an associated device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
By adopting this model training method, the tiger key point detection method becomes more accurate, more stable and more efficient, so that the geographic spatial distribution trend of the tiger can be monitored and its population tracked; the method has good application prospects. FIG. 20 is a schematic flow chart of yet another model training method of the present application.
While the present application is described in terms of various aspects, including exemplary embodiments, the principles of the invention should not be limited to the disclosed embodiments, but are also intended to cover various modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. A model training method, characterized in that the model training method comprises:
acquiring a plurality of training data, wherein each training data comprises a training image and annotation data corresponding to the training image, and the annotation data corresponding to the training image is used for indicating the positions of a plurality of key points of the tiger in the training image;
inputting each training image into a fusion feature extraction network to obtain a fusion feature extraction result corresponding to each training image, wherein the fusion feature extraction result corresponding to each training image comprises the detection results of the fusion feature extraction network for a plurality of key points of the tiger in each training image, and the detection result of each key point is used for indicating the position and the confidence of the key point;
and training the fusion feature extraction network based on the annotation data corresponding to each training image and the fusion feature extraction result corresponding to each training image to obtain a tiger key point detection model, wherein the tiger key point detection model is used for detecting the positions of a plurality of key points of the tiger in a tiger image;
the method for acquiring the fusion feature extraction network comprises the following steps:
respectively inputting each training image into an ith feature extraction network to obtain an ith feature extraction result corresponding to each training image, wherein the ith feature extraction result corresponding to each training image comprises the detection results of the ith feature extraction network for a plurality of key points of the tiger in each training image, i takes each positive integer value not greater than N, N is the number of feature extraction networks, and N is an integer greater than 1;
for each key point, acquiring the average confidence of the ith feature extraction network for the key point based on the ith feature extraction result corresponding to each training image;
acquiring, from all the feature extraction networks, the feature extraction network with the highest average confidence for the key point as the feature extraction network corresponding to the key point;
and acquiring the fusion feature extraction network based on the feature extraction network corresponding to each key point (a sketch of this per-key-point selection follows this claim).
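To make the per-key-point network selection of claim 1 concrete, the following is a minimal sketch, assuming the per-image, per-key-point confidences of each candidate network have already been collected into an array; the array names, shapes, 15-key-point count, and the use of NumPy are illustrative assumptions, not part of the claimed method.

```python
import numpy as np

# Assumed shapes: confidences[i, m, k] is the confidence of feature
# extraction network i for key point k on training image m.
num_networks, num_images, num_keypoints = 3, 1000, 15
confidences = np.random.rand(num_networks, num_images, num_keypoints)

# Average confidence of each network for each key point over all images.
avg_confidence = confidences.mean(axis=1)            # shape (N, K)

# For each key point, select the network with the highest average confidence.
best_network = avg_confidence.argmax(axis=0)         # shape (K,)

def fused_detection(per_network_results):
    """per_network_results[i][k] -> (x, y, confidence) from network i.
    The fused network answers each key point with its selected sub-network."""
    return [per_network_results[i][k] for k, i in enumerate(best_network)]
```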
2. The model training method of claim 1, wherein the obtaining a plurality of training data comprises:
acquiring a plurality of tiger images to be annotated, and taking each tiger image to be annotated as one training image;
and carrying out data annotation on each training image to obtain annotation data corresponding to each training image.
3. The model training method of claim 1, wherein N = 3;
the first feature extraction network takes HRNetW48 as a backbone network and adds a Spatial Group-wise Enhance (SGE) attention module (sketched after this claim);
the second feature extraction network takes HRNetW48 as a backbone network and adds a Spatial Attention module;
the third feature extraction network takes UNet++ with ResNet152 as a backbone network.
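As a point of reference for the first network in claim 3, the Spatial Group-wise Enhance module (Li et al., 2019) can be sketched as below in PyTorch; the group count and the points at which the module is inserted into HRNetW48 are not specified by the claim, so everything beyond the module's published structure is an assumption.

```python
import torch
import torch.nn as nn

class SpatialGroupEnhance(nn.Module):
    """Sketch of the SGE attention module; channel count must be
    divisible by `groups` (the value 8 here is an assumption)."""
    def __init__(self, groups: int = 8):
        super().__init__()
        self.groups = groups
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.weight = nn.Parameter(torch.zeros(1, groups, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, groups, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.size()
        x = x.view(b * self.groups, c // self.groups, h, w)
        # Similarity between each position and the group's global descriptor.
        attn = (x * self.avg_pool(x)).sum(dim=1, keepdim=True)  # (b*g, 1, h, w)
        # Normalize the attention map within each group sample.
        attn = attn.view(b * self.groups, -1)
        attn = (attn - attn.mean(dim=1, keepdim=True)) / (attn.std(dim=1, keepdim=True) + 1e-5)
        # Learned per-group scale and shift, then sigmoid gating.
        attn = attn.view(b, self.groups, h, w) * self.weight + self.bias
        attn = torch.sigmoid(attn).view(b * self.groups, 1, h, w)
        return (x * attn).view(b, c, h, w)
```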
4. The model training method according to claim 1, wherein the training of the fusion feature extraction network based on the annotation data corresponding to each of the training images and the fusion feature extraction result corresponding to each of the training images to obtain the tiger key point detection model comprises:
scoring the fusion feature extraction network by using mAP (mean average precision) and OKS (object keypoint similarity), based on the annotation data corresponding to each training image and the fusion feature extraction result corresponding to each training image, to obtain a scoring result corresponding to the fusion feature extraction network;
determining the squared error loss corresponding to each training image by using a squared error loss function, based on the annotation data corresponding to each training image and the fusion feature extraction result corresponding to each training image;
and iterating the fusion feature extraction network based on the scoring result corresponding to the fusion feature extraction network and the squared error loss corresponding to each training image until the fusion feature extraction network satisfies the iteration-stopping condition, and taking the trained fusion feature extraction network as the tiger key point detection model (an OKS sketch follows this claim).
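OKS in claim 4 is the standard object keypoint similarity used for key point mAP; a minimal sketch follows, in which the per-key-point constants `sigmas` and the object `scale` are illustrative assumptions that would have to be calibrated for tigers rather than values taken from the patent.

```python
import numpy as np

def oks(pred, gt, visibility, sigmas, scale):
    """Object Keypoint Similarity between predicted and ground-truth key
    points; only labeled (visible) key points contribute to the score."""
    d2 = np.sum((pred - gt) ** 2, axis=1)          # squared distances, (K,)
    e = d2 / (2.0 * (scale ** 2) * (sigmas ** 2))
    mask = visibility > 0
    return float(np.sum(np.exp(-e[mask])) / max(mask.sum(), 1))

# Illustrative usage with K = 15 assumed tiger key points:
K = 15
gt = np.random.rand(K, 2) * 100
pred = gt + np.random.randn(K, 2)
print(oks(pred, gt, np.ones(K), np.full(K, 0.05), scale=100.0))
```

Predictions whose OKS against the annotation exceeds a threshold count as true positives, and mAP averages precision over a range of such thresholds.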
5. The model training method of claim 1, further comprising:
obtaining tiger key point data;
and setting weights for the plurality of key points of the tiger respectively based on the tiger key point data (a weighted-loss sketch follows this claim).
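The per-key-point weights of claim 5 plug naturally into the squared error loss of claim 4. A minimal sketch follows, assuming heatmap-style outputs; the weight values and the 15-key-point count are illustrative assumptions.

```python
import numpy as np

# Assumed per-key-point weights (values are illustrative only, e.g. derived
# from how reliably each tiger key point can be located in practice).
keypoint_weights = np.array([1.2, 1.0, 1.0, 0.8, 0.8, 1.0, 1.0, 1.0,
                             0.9, 0.9, 1.0, 1.0, 1.1, 1.1, 1.0])

def weighted_squared_error(pred_heatmaps, gt_heatmaps):
    """Mean squared error per key point heatmap, scaled by its weight.
    Assumed shapes: (K, H, W) for both inputs."""
    per_kp = ((pred_heatmaps - gt_heatmaps) ** 2).mean(axis=(1, 2))  # (K,)
    return float((keypoint_weights * per_kp).sum())
```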
6. A tiger key point detection method is characterized by comprising the following steps:
acquiring a tiger image to be detected;
inputting the image of the tiger to be detected into a tiger key point detection model to obtain the positions of a plurality of key points of the tiger in the image of the tiger to be detected;
wherein the tiger key point detection model is obtained by training with the model training method of any one of claims 1 to 5 (an inference sketch follows this claim).
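A minimal inference sketch for claim 6 follows; the checkpoint file name, input resolution, and heatmap decoding are all assumptions, since the claim fixes none of them.

```python
import torch
from PIL import Image
import torchvision.transforms as T

# Hypothetical trained model checkpoint and assumed input size.
model = torch.load("tiger_keypoint_model.pt", map_location="cpu")
model.eval()

preprocess = T.Compose([T.Resize((384, 288)), T.ToTensor()])
image = preprocess(Image.open("tiger.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    heatmaps = model(image)[0]       # assumed output shape: (K, H, W)

# Decode each heatmap's peak into (x, y) and use the peak value as confidence.
K, H, W = heatmaps.shape
flat = heatmaps.view(K, -1)
conf, idx = flat.max(dim=1)
keypoints = torch.stack((idx % W, idx // W), dim=1)  # (K, 2), heatmap coords
```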
7. A model training apparatus, the model training apparatus comprising:
the training data module is used for acquiring a plurality of training data, wherein each training data comprises a training image and annotation data corresponding to the training image, and the annotation data corresponding to the training image is used for indicating the positions of a plurality of key points of the tiger in the training image;
the feature extraction module is used for inputting each training image into a fusion feature extraction network to obtain a fusion feature extraction result corresponding to each training image, wherein the fusion feature extraction result corresponding to each training image comprises the detection results of the fusion feature extraction network for a plurality of key points of the tiger in each training image, and the detection result of each key point is used for indicating the position and the confidence of the key point;
and the model training module is used for training the fusion feature extraction network based on the annotation data corresponding to each training image and the fusion feature extraction result corresponding to each training image to obtain a tiger key point detection model, wherein the tiger key point detection model is used for detecting the positions of a plurality of key points of the tiger in a tiger image;
wherein the fusion feature extraction network is obtained by using a network fusion module, and the network fusion module comprises:
the extraction unit is used for respectively inputting each training image into an ith feature extraction network to obtain an ith feature extraction result corresponding to each training image, wherein the ith feature extraction result corresponding to each training image comprises the detection results of the ith feature extraction network for a plurality of key points of the tiger in each training image, i takes each positive integer value not greater than N, N is the number of feature extraction networks, and N is an integer greater than 1;
the confidence acquisition unit is used for acquiring, for each key point, the average confidence of the ith feature extraction network for the key point based on the ith feature extraction result corresponding to each training image;
the key point correspondence unit is used for acquiring, from all the feature extraction networks, the feature extraction network with the highest average confidence for the key point as the feature extraction network corresponding to the key point;
and the fusion network unit is used for acquiring the fusion feature extraction network based on the feature extraction network corresponding to each key point.
8. A tiger key point detection device, characterized in that said tiger key point detection device comprises:
the to-be-detected image module is used for acquiring a to-be-detected tiger image;
the key point detection module is used for inputting the tiger image to be detected into the tiger key point detection model to obtain the positions of a plurality of key points of the tiger in the tiger image to be detected;
wherein the tiger key point detection model is obtained by training with the model training method of any one of claims 1 to 5.
9. An electronic device, characterized in that the electronic device comprises a memory storing a computer program and a processor, wherein the processor, when executing the computer program, implements the steps of the model training method according to any one of claims 1 to 5 or the steps of the tiger key point detection method according to claim 6.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program or a tiger key point detection model;
the computer program, when executed by a processor, implements the steps of the model training method of any one of claims 1 to 5 or the steps of the tiger key point detection method of claim 6;
and the tiger key point detection model is obtained by training with the model training method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110936493.XA CN113869353A (en) | 2021-08-16 | 2021-08-16 | Model training method, tiger key point detection method and related device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110936493.XA CN113869353A (en) | 2021-08-16 | 2021-08-16 | Model training method, tiger key point detection method and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113869353A true CN113869353A (en) | 2021-12-31 |
Family
ID=78990323
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110936493.XA Pending CN113869353A (en) | 2021-08-16 | 2021-08-16 | Model training method, tiger key point detection method and related device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113869353A (en) |
2021-08-16: CN202110936493.XA filed as CN113869353A, status Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019105218A1 (en) * | 2017-11-30 | 2019-06-06 | 腾讯科技(深圳)有限公司 | Recognition method and device for image feature, storage medium and electronic device |
WO2021057810A1 (en) * | 2019-09-29 | 2021-04-01 | 深圳数字生命研究院 | Data processing method, data training method, data identifying method and device, and storage medium |
CN111242222A (en) * | 2020-01-14 | 2020-06-05 | 北京迈格威科技有限公司 | Training method of classification model, image processing method and device |
CN111860259A (en) * | 2020-07-10 | 2020-10-30 | 东莞正扬电子机械有限公司 | Training and using method, device, equipment and medium of driving detection model |
CN112801212A (en) * | 2021-03-02 | 2021-05-14 | 东南大学 | White blood cell classification counting method based on small sample semi-supervised learning |
Non-Patent Citations (3)
Title |
---|
周彧聪; 刘轶; 王锐: "Complementary Learning: A Deep Neural Network Training Method for Image Applications with Noisy Annotations", Journal of Computer Research and Development, vol. 54, no. 12, 15 December 2017 (2017-12-15) *
戴凤智; 魏宝昌; 欧阳育星; 金霞: "A Survey of Research Progress in Video Tracking Based on Deep Learning", Computer Engineering and Applications, no. 10, 7 March 2019 (2019-03-07) *
陈海波; 胡月明; 胡均万; 陈联诚: "Research on a Neural-Network-Based Prediction Model of Irrigation Water Infiltration Depth", Journal of South China Agricultural University, no. 03, 15 July 2009 (2009-07-15) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115221976A (en) * | 2022-08-18 | 2022-10-21 | 抖音视界有限公司 | Model training method and device based on graph neural network |
CN115221976B (en) * | 2022-08-18 | 2024-05-24 | 抖音视界有限公司 | Model training method and device based on graph neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109697435B (en) | People flow monitoring method and device, storage medium and equipment | |
CN111052146B (en) | System and method for active learning | |
Bjerge et al. | Accurate detection and identification of insects from camera trap images with deep learning | |
Wang et al. | Development and validation of an ensemble classifier for real-time recognition of cow behavior patterns from accelerometer data and location data | |
US11803974B2 (en) | Automated system to measure multi-animal body part dynamics | |
CN110069129B (en) | Determination system and determination method | |
CN111105495A (en) | Laser radar mapping method and system fusing visual semantic information | |
WO2020147500A1 (en) | Ultrasonic array-based obstacle detection result processing method and system | |
CN110175528B (en) | Human body tracking method and device, computer equipment and readable medium | |
CN113052295B (en) | Training method of neural network, object detection method, device and equipment | |
CN111124863B (en) | Intelligent device performance testing method and device and intelligent device | |
Zamansky et al. | Analysis of dogs’ sleep patterns using convolutional neural networks | |
CN113012200B (en) | Method and device for positioning moving object, electronic equipment and storage medium | |
CN111242922A (en) | Protein image classification method, device, equipment and medium | |
Mücher et al. | Detection, identification and posture recognition of cattle with satellites, aerial photography and UAVs using deep learning techniques | |
CN115082857A (en) | Target object detection method, device, equipment and storage medium | |
CN115035367A (en) | Picture identification method and device and electronic equipment | |
CN113869353A (en) | Model training method, tiger key point detection method and related device | |
CN113537148A (en) | Human body action recognition method and device, readable storage medium and electronic equipment | |
Wang et al. | Multiscale Maize Tassel Identification Based on Improved RetinaNet Model and UAV Images | |
CN115527083B (en) | Image annotation method and device and electronic equipment | |
Fraccaro et al. | Development and preliminary evaluation of a method for passive, privacy-aware home care monitoring based on 2D LiDAR data | |
Lu et al. | Farmland boundary extraction based on the AttMobile-DeeplabV3+ network and least squares fitting of straight lines | |
Harders et al. | UAV-based real-time weed detection in horticulture using edge processing | |
Heck et al. | Defining Quality Requirements for a Trustworthy AI Wildflower Monitoring Platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||