CN110991443A - Key point detection method, image processing method, key point detection device, image processing device, electronic equipment and storage medium - Google Patents

Key point detection method, image processing method, key point detection device, image processing device, electronic equipment and storage medium

Info

Publication number
CN110991443A
CN110991443A
Authority
CN
China
Prior art keywords
target object
image
key point
detected
sample image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911040317.7A
Other languages
Chinese (zh)
Inventor
降小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Haiyi Tongzhan Information Technology Co Ltd
Original Assignee
Beijing Haiyi Tongzhan Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Haiyi Tongzhan Information Technology Co Ltd filed Critical Beijing Haiyi Tongzhan Information Technology Co Ltd
Priority to CN201911040317.7A priority Critical patent/CN110991443A/en
Publication of CN110991443A publication Critical patent/CN110991443A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a key point detection method, an image processing method, a key point detection device, an image processing device, an electronic device and a storage medium, wherein the method comprises: acquiring a first image to be detected, wherein the first image to be detected comprises at least one target object; inputting the first image to be detected into a pre-trained position detection model to obtain position information corresponding to the target object in the first image to be detected; generating a second image to be detected corresponding to the target object according to the position information; and inputting the second image to be detected into a pre-trained key point detection model to obtain the key points of the target object. In this technical solution, key point detection is performed on the target object by the key point detection model, which ensures the accuracy of key point detection while reducing the amount of computation. It also lays a foundation for deployment on mobile terminals and for subsequent intelligent livestock management.

Description

Key point detection method, image processing method, key point detection device, image processing device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method and an apparatus for detecting a key point, a method and an apparatus for processing an image, an electronic device, and a storage medium.
Background
With the development of artificial intelligence, intelligent breeding is becoming increasingly popular, but many problems remain before truly intelligent breeding can be realized. At present, existing algorithms all assume an ideal camera angle and identical illumination conditions; when transplanted to a new scene, the camera angle and illumination conditions can change greatly, and the algorithms do not generalize.
For example, when the target object is livestock, factors such as the changeable illumination of an actual scene, dirt on the animal's body, and bright objects on the ground can cause contour-extraction errors, which in turn affect the detection precision of the key points on the animal's body.
Disclosure of Invention
In order to solve the above technical problems, or at least partially solve them, the application provides a key point detection method, an image processing method, a key point detection device, an image processing device, an electronic device and a storage medium.
In a first aspect, the present application provides a method for detecting a key point, including:
acquiring a first image to be detected, wherein the first image to be detected comprises at least one target object;
inputting the first image to be detected into a position detection model trained in advance to obtain position information corresponding to the target object in the first image to be detected;
generating a second image to be detected corresponding to the target object according to the position information;
and inputting the second image to be detected into a pre-trained key point detection model to obtain the key points of the target object.
In one possible embodiment, the keypoint detection model comprises: a convolution sub-network and a deconvolution sub-network;
inputting the second image to be detected into a pre-trained key point detection model to obtain the key points of the target object, wherein the method comprises the following steps:
inputting the second image to be detected into a convolution sub-network to obtain a first convolution result;
determining a preset number of local coordinates corresponding to the target object according to the first convolution result;
inputting the first convolution result into the deconvolution sub-network to obtain a second convolution result;
segmenting the second convolution result according to the preset number of local coordinates to obtain the preset number of local images;
and determining key points of the target object according to the preset number of local images.
In a possible embodiment, the determining the key point of the target object according to the preset number of local images includes:
acquiring two target points meeting preset conditions from each local image according to a preset sequence from a preset starting point;
and when the pixel difference between the two target points is smaller than a preset pixel difference, taking the target point with the largest pixel value as the key point.
In one possible embodiment, the method further comprises:
and when the pixel difference between the two target points is larger than the preset pixel difference, acquiring a midpoint between the two target points, and taking the midpoint as the key point.
In one possible embodiment, the method further comprises:
and when the pixel values of the two target points are equal, taking the target point which is the closest to the preset starting point in the two target points as the key point.
In one possible embodiment, the method further comprises:
acquiring a first sample image, wherein the first sample image comprises the target object;
acquiring first labeling information of the first sample image, wherein the first labeling information comprises: key point information of the target object in the first sample image;
and training the first sample image and the key point information of the target object according to a first preset convolutional neural network model to obtain the key point detection model.
In one possible embodiment, the method further comprises:
acquiring a second sample image, wherein the second sample image at least comprises one target object;
acquiring second labeling information in the second sample image, wherein the second labeling information comprises target object labeling information corresponding to the target object in the second sample image;
and training the second sample image and the target object labeling information according to a second preset convolutional neural network model to obtain the position detection model.
In a second aspect, the present application provides an image processing method, comprising:
acquiring a target object sample image;
acquiring marking information of the target object sample image, wherein the marking information comprises: first key point information of a target object in the target object sample image;
training the target object sample image and the first key point information by adopting a preset convolutional neural network model to obtain a key point detection model;
the preset convolution neural network model comprises a convolution sub-network and a deconvolution sub-network, wherein the convolution sub-network is used for performing convolution processing on the target object sample image to obtain a first convolution result, the deconvolution sub-network is used for performing deconvolution processing on the first convolution result to obtain a second convolution result, and the second convolution result is used for determining the key point of the target object.
In a possible implementation manner, the training the target object sample image and the first key point information by using a preset convolutional neural network model to obtain a key point detection model includes:
inputting the target object sample image into the preset convolutional neural network model to obtain second key point information;
calculating the adversarial loss between the second key point information and the first key point information;
and performing iterative training on the preset convolutional neural network model according to the adversarial loss to obtain the key point detection model.
In a third aspect, the present application provides a keypoint detection apparatus, comprising:
an acquisition module, used for acquiring a first image to be detected, wherein the first image to be detected comprises at least one target object;
the first input module is used for inputting the first image to be detected into a position detection model trained in advance to obtain position information corresponding to the target object in the first image to be detected;
the generating module is used for generating a second image to be detected corresponding to the target object according to the position information;
and the second input module is used for inputting the second image to be detected into a pre-trained key point detection model to obtain the key points of the target object.
In a fourth aspect, the present application provides an image processing apparatus comprising:
the first acquisition module is used for acquiring a target object sample image;
a second obtaining module, configured to obtain annotation information of the target object sample image, where the annotation information includes: first key point information of a target object in the target object sample image;
and a training module, used for training the target object sample image and the first key point information by adopting a preset convolutional neural network model to obtain a key point detection model.
In a fifth aspect, the present application provides an electronic device, comprising: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the above method steps when executing the computer program.
In a sixth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the above-mentioned method steps.
Compared with the prior art, the technical solution provided by the embodiments of the application has the following advantages: key point detection is performed on the target object by the key point detection model, which ensures the accuracy of key point detection while reducing the amount of computation. It also lays a foundation for deployment on mobile terminals and for subsequent intelligent livestock management.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below; it will be apparent to those skilled in the art that other drawings can be derived from these drawings without inventive effort.
Fig. 1 is a flowchart of a key point detection method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of an image obtained by inputting a first image to be detected into a position detection model according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a second image to be detected generated according to position information according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the model structure of a key point detection model according to an embodiment of the present application;
Fig. 5 is a flowchart of an image processing method according to another embodiment of the present application;
Fig. 6 is a block diagram of a key point detection device according to an embodiment of the present application;
Fig. 7 is a block diagram of an image processing device according to another embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The method provided by the embodiments of the present invention can be applied to any electronic device that requires it, such as a server or a terminal; this is not specifically limited here, and for convenience of description such devices are hereinafter simply referred to as electronic devices. First, the key point detection method provided by an embodiment of the present invention is described below.
Fig. 1 is a flowchart of a key point detection method according to an embodiment of the present application; as shown in fig. 1, the method includes the following steps:
step S11, a first image to be detected is acquired, where the first image to be detected includes at least one target object.
Step S12, inputting the first image to be detected into a position detection model trained in advance, and obtaining position information corresponding to the target object in the first image to be detected.
Step S13, a second image to be detected corresponding to the target object is generated based on the position information.
And step S14, inputting the second image to be detected into a pre-trained key point detection model to obtain the key points of the target object.
The target object in the embodiments of the application can be livestock, such as cattle, sheep, horses or pigs, or poultry, such as chickens, ducks or geese. In this embodiment, key point detection is performed on the target object by the key point detection model, which ensures the accuracy of key point detection while reducing the amount of computation. It also lays a foundation for deployment on mobile terminals and for subsequent intelligent livestock management.
A pig house is photographed to acquire a pig house image containing at least one pig. Mean subtraction and normalization are then performed on the pig house image to obtain the first image to be detected, which is input into the pre-trained position detection model. As shown in fig. 2, an image obtained by inputting the first image to be detected into the position detection model in an embodiment of the application, the position information corresponding to each pig in the first image to be detected is thus obtained; in this embodiment, the position information is the coordinates of the region enclosing the pig. The second image to be detected corresponding to each pig is then generated according to the obtained enclosing-region coordinates, as shown in fig. 3, a schematic diagram of the second image to be detected generated from the position information according to an embodiment of the present application.
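A minimal Python sketch of this preprocessing and cropping stage; the mean values, channel layout, and the (x1, y1, x2, y2) bounding-box format are illustrative assumptions rather than values fixed by the embodiment:

```python
import numpy as np

def preprocess(image: np.ndarray, mean=(104.0, 117.0, 123.0)) -> np.ndarray:
    # Mean subtraction followed by normalization, yielding the first
    # image to be detected from the raw pig house image.
    return (image.astype(np.float32) - np.asarray(mean, dtype=np.float32)) / 255.0

def crop_target_objects(image: np.ndarray, boxes) -> list:
    # One second image to be detected per pig, cut out along the
    # enclosing-region coordinates returned by the position detection model.
    return [image[int(y1):int(y2), int(x1):int(x2)]
            for (x1, y1, x2, y2) in boxes]
```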
After the second image to be detected is obtained, it is input into the key point detection model to obtain the key points of the pig. Fig. 4 is a schematic diagram of the model structure of the key point detection model provided in an embodiment of the present application; as shown in fig. 4, the key point detection model includes a convolution sub-network and a deconvolution sub-network.
In this embodiment, the key point detection model is specifically processed as follows:
the second image to be detected is input into the convolution sub-network, which performs a first convolution computation on the pig in the image to obtain the first convolution result, and the local coordinates of the pig in the second image to be detected are obtained from the first convolution result; the first convolution result is then input into the deconvolution sub-network to obtain the second convolution result, which is segmented according to the local coordinates into local images of five channels, and the key points of the pig are determined from the five local images thus obtained.
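A toy PyTorch sketch of this two-part structure; the two strided convolutions merely stand in for the MobileNetV1 backbone described below, the layer widths are placeholders, and one output channel is emitted per key point, so the segmentation by local coordinates is simplified away:

```python
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    """Convolution sub-network followed by a deconvolution sub-network."""

    def __init__(self, num_keypoints: int = 5):
        super().__init__()
        # Convolution sub-network: stand-in for the MobileNetV1 backbone.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Deconvolution sub-network: upsamples back to input resolution,
        # one heat-map channel per key point.
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_keypoints, 4, stride=2, padding=1),
        )

    def forward(self, x):
        first = self.conv(x)          # "first convolution result"
        second = self.deconv(first)   # "second convolution result", N x 5 x H x W
        return first, second
```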
After the key points of the pig are obtained, the first coordinates of each key point in the second image to be detected are determined and converted into second coordinates in the first image to be detected, thereby obtaining the key points of each pig in the first image to be detected.
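A hypothetical helper for this coordinate conversion, assuming the second image to be detected is an axis-aligned crop whose top-left corner in the first image is known from the position information:

```python
def to_first_image_coords(crop_xy, crop_origin):
    # A key point (x, y) in the second image to be detected, offset by
    # the crop's top-left corner, gives its second coordinates in the
    # first image to be detected.
    x, y = crop_xy
    ox, oy = crop_origin
    return (x + ox, y + oy)
```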
Optionally, the key points of the pig are determined from the five local images as follows: starting from the upper left corner of each local image (i.e., the preset starting point) and scanning in order from the upper left corner to the lower right corner, the two pixels with the largest pixel values are acquired from each local image. Since the acquired local image contains the pixel value of every pixel in it, the preset condition is: the two pixels with the maximum pixel values in the local image. These two pixels serve as the target points.
In this embodiment, the preset pixel difference is 10 pixels. When the pixel difference between the two target points is smaller than 10 pixels, the target point with the larger pixel value is taken as the key point; when the pixel values of the two target points are equal, the target point closest to the upper left corner (i.e., the preset starting point) is taken as the key point.
In an actual scene, pigs may occlude and overlap one another, and the target point with the maximum pixel value may in fact be a key point of another pig; that is, in this case the target point with the maximum pixel value is not necessarily the most accurate point. Therefore, when the pixel difference between the two target points is greater than 10 pixels, the coordinates of the two target points are obtained, their average coordinate is calculated, and the pixel corresponding to the average coordinate, i.e., the midpoint between the two target points, is taken as the key point. Taking the midpoint of the two target points as the key point reduces the error in this case.
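The selection rule can be sketched as follows, assuming each local image is a single-channel heat map of pixel values; a stable descending sort over the row-major scan makes ties resolve to the point nearest the preset starting point:

```python
import numpy as np

def select_keypoint(local_image: np.ndarray, preset_diff: float = 10.0):
    # Scan in row-major order from the upper left corner and take the
    # two highest-valued pixels as the target points; a stable sort
    # keeps scan order for equal values, so the tie-break favours the
    # point closest to the preset starting point.
    flat = local_image.astype(np.float32).reshape(-1)
    order = np.argsort(-flat, kind="stable")
    i1, i2 = order[0], order[1]
    p1 = np.unravel_index(i1, local_image.shape)   # (row, col) of the maximum
    p2 = np.unravel_index(i2, local_image.shape)
    if flat[i1] - flat[i2] > preset_diff:
        # Large gap: the maximum may belong to an occluding pig, so use
        # the midpoint of the two target points to reduce the error.
        return ((p1[0] + p2[0]) // 2, (p1[1] + p2[1]) // 2)
    # Small gap or tie: take the target point with the largest value.
    return p1
```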
The convolution sub-network in the key point detection model of this embodiment uses MobileNetV1. MobileNetV1 is a lightweight deep neural network model proposed for mobile and embedded vision applications; its core is the depthwise separable convolution, which consists of a depthwise convolution and a pointwise convolution. The depthwise convolution filters the input channels without increasing their number, while the pointwise convolution connects the different channels produced by the depthwise convolution and can increase the number of channels. This reduces the amount of computation.
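One depthwise separable block in the MobileNetV1 style can be sketched as below; the BatchNorm/ReLU placement follows the commonly published recipe and is an assumption here:

```python
import torch.nn as nn

def depthwise_separable(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    return nn.Sequential(
        # Depthwise: groups=in_ch filters each input channel on its own,
        # leaving the channel count unchanged.
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                  groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        # Pointwise: the 1x1 convolution connects the channels and can
        # increase their number, at a fraction of a full 3x3 conv's cost.
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )
```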
In addition, adding a deconvolution sub-network after the convolution sub-network makes it possible to fuse deep feature information with shallow feature information. The deep features are those output by the deconvolution sub-network and have strong feature probability and classification capability; the shallow features are those output by the convolution sub-network, which retain more image semantic information while preserving small-target information well. By fusing the shallow and deep features, the model gains better feature-expression capability and a better effect; compared with directly segmenting the convolution result output by the convolution sub-network, this effectively improves the precision of image processing.
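The embodiment does not pin down the fusion operator; one minimal realization is element-wise addition after resizing the deep features to the shallow map's resolution, sketched here under the assumption that the two maps have matching channel counts:

```python
import torch
import torch.nn.functional as F

def fuse_features(shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
    # Resize the deconvolved deep features to the shallow feature map's
    # spatial size, then fuse the two levels by element-wise addition.
    deep = F.interpolate(deep, size=shallow.shape[-2:],
                         mode="bilinear", align_corners=False)
    return shallow + deep
```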
Optionally, the key point detection model is obtained as follows: a first sample image, which is an image of a single pig, is acquired, together with the first annotation information of the first sample image, which comprises the key point information of the pig in the first sample image; the key point information can be the coordinates of the key points. The first sample image and the key point information of the pig are then trained with the first preset convolutional neural network model to obtain the key point detection model.
Optionally, the position detection model is obtained as follows: a second sample image, which may be a pigsty image containing at least one pig, is acquired, and then the second annotation information of the second sample image is acquired, the second annotation information comprising the pig annotation information corresponding to the pigs in the second sample image. In this embodiment, the pig annotation information may be the characteristic information of the pig and the real position information of the pig, and the position information can be the coordinates of a bounding box.
After the second sample image and the pig annotation information are obtained, iterative training is performed on them with the second preset convolutional neural network model to obtain the position detection model; optionally, the second preset convolutional neural network is a MobileNetV2-YOLOv3 model.
The specific training process of the position detection model is as follows: the second sample image is input into the second preset convolutional neural network to obtain the position information output by the model, the adversarial loss between the output position information and the real position information is calculated, and the second preset convolutional network is iteratively trained according to the adversarial loss to obtain the position detection model.
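An illustrative training loop under simplifying assumptions: `model` and `loader` are hypothetical stand-ins, and a smooth-L1 regression loss between the predicted and real positions stands in for the adversarial loss named above:

```python
import torch

def train_position_model(model, loader, epochs: int = 10, lr: float = 1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.SmoothL1Loss()   # stand-in for the adversarial loss
    for _ in range(epochs):
        for images, real_boxes in loader:
            pred_boxes = model(images)          # position information output
            loss = loss_fn(pred_boxes, real_boxes)
            opt.zero_grad()
            loss.backward()                     # one iterative training step
            opt.step()
    return model
```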
Fig. 5 is a flowchart of an image processing method according to another embodiment of the present application. As shown in fig. 5, the method comprises the following steps:
in step S51, a target sample image is acquired.
Step S52, obtaining label information of the target sample image, the label information including: first keypoint information of the target object in the target object sample image.
And step S53, training the target sample image and the first key point information by adopting a preset convolutional neural network model to obtain a key point detection model.
In this embodiment, the target object sample image is a sample image of a single pig, and the image is annotated manually to obtain the annotation information of the target object sample image; the annotation information includes the first key point information of the target object, namely five key points of the pig.
A preset convolutional neural network model is used to train on the target object sample image and the five key points of the pig. The preset convolutional neural network model comprises a convolution sub-network and a deconvolution sub-network: the convolution sub-network performs convolution processing on the target object sample image to obtain a first convolution result, the deconvolution sub-network performs deconvolution processing on the first convolution result to obtain a second convolution result, and the second convolution result is used to determine the key points of the target object.
The training steps are as follows: the target object sample image is input into the preset convolutional neural network model to obtain second key point information, and the adversarial loss between the second key point information and the first key point information is calculated.
Optionally, in this embodiment, the target object sample image is Gaussian-blurred with four different Gaussian kernels, namely 7 × 7, 9 × 9, 11 × 11, and 15 × 15, to obtain four different pieces of second key point information; the average of the four pieces of second key point information is calculated and used as the adversarial loss, and the preset convolutional neural network model is then iteratively trained according to this adversarial loss to obtain the key point detection model.
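A sketch of this multi-kernel blurring step, reading the averaging as an average over the four per-kernel losses; the `(first, second)` output convention follows the earlier model sketch, and the MSE criterion is an assumption:

```python
import torch
import torchvision.transforms.functional as TF

KERNEL_SIZES = [7, 9, 11, 15]   # the four Gaussian kernels named above

def multi_blur_loss(model, image, gt_heatmaps):
    # Gaussian-blur the sample image with each kernel, predict key point
    # heat maps for every blurred copy, and average the four losses.
    losses = []
    for k in KERNEL_SIZES:
        blurred = TF.gaussian_blur(image, kernel_size=k)
        _, pred = model(blurred)    # second convolution result (heat maps)
        losses.append(torch.nn.functional.mse_loss(pred, gt_heatmaps))
    return torch.stack(losses).mean()
```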
Fig. 6 is a block diagram of a key point detecting apparatus provided in an embodiment of the present application, which may be implemented as part of or all of an electronic device through software, hardware, or a combination of the two. As shown in fig. 6, the key point detecting apparatus includes:
an obtaining module 601, configured to obtain a first image to be detected, where the first image to be detected includes at least one target object;
a first input module 602, configured to input a first image to be detected into a pre-trained position detection model, so as to obtain position information corresponding to a target object in the first image to be detected;
a generating module 603, configured to generate a second image to be detected corresponding to the target object according to the position information;
the second input module 604 is configured to input the second image to be detected into the pre-trained keypoint detection model to obtain keypoints of the target object.
Fig. 7 is a block diagram of an image processing apparatus provided in an embodiment of the present application, which may be implemented as part or all of an electronic device through software, hardware, or a combination of the two. As shown in fig. 7, the image processing apparatus includes:
a first obtaining module 701, configured to obtain a target object sample image;
a second obtaining module 702, configured to obtain annotation information of the target object sample image, where the annotation information includes: first key point information of a target object in the target object sample image;
a training module 703, configured to train the target object sample image and the first key point information by using a preset convolutional neural network model to obtain a key point detection model.
An embodiment of the present application further provides an electronic device, as shown in fig. 8, the electronic device may include: the system comprises a processor 1501, a communication interface 1502, a memory 1503 and a communication bus 1504, wherein the processor 1501, the communication interface 1502 and the memory 1503 complete communication with each other through the communication bus 1504.
A memory 1503 for storing a computer program;
the processor 1501 is configured to implement the steps of the above embodiments when executing the computer program stored in the memory 1503.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring a first image to be detected, wherein the first image to be detected comprises at least one target object;
inputting a first image to be detected into a position detection model trained in advance to obtain position information corresponding to a target object in the first image to be detected;
generating a second image to be detected corresponding to the target object according to the position information;
and inputting the second image to be detected into a pre-trained key point detection model to obtain the key points of the target object.
Optionally, the computer program, when executed by the processor, further implements the steps of:
the key point detection model comprises: a convolution sub-network and a deconvolution sub-network;
inputting the second image to be detected into a pre-trained key point detection model to obtain the key points of the target object, and the method comprises the following steps:
inputting the second image to be detected into a convolution sub-network to obtain a first convolution result;
determining local coordinates of a preset number corresponding to the target object according to the first convolution result;
inputting the first convolution result into a deconvolution sub-network to obtain a second convolution result;
segmenting the second convolution result according to the local coordinates of the preset number to obtain the local images of the preset number;
and determining key points of the target object according to the preset number of local images.
Optionally, the computer program, when executed by the processor, further implements the steps of:
determining the key points of the target object according to the preset number of local images includes:
acquiring two target points meeting preset conditions from each local image according to a preset sequence from a preset starting point;
and when the pixel difference between the two target points is smaller than the preset pixel difference, taking the target point with the largest pixel value as the key point.
Optionally, the computer program, when executed by the processor, further implements the steps of:
and when the pixel difference between the two target points is larger than the preset pixel difference, acquiring a midpoint between the two target points, and taking the midpoint as the key point.
Optionally, the computer program, when executed by the processor, further implements the steps of:
and when the pixel values of the two target points are equal, taking the target point which is closest to the preset starting point in the two target points as the key point.
Optionally, the computer program, when executed by the processor, further implements the steps of:
acquiring a first sample image, wherein the first sample image comprises a target object;
acquiring first annotation information of a first sample image, wherein the first annotation information comprises: key point information of a target object in the first sample image;
and training the first sample image and the key point information of the target object according to the first preset convolutional neural network model to obtain a key point detection model.
Optionally, the computer program, when executed by the processor, further implements the steps of:
acquiring a second sample image, wherein the second sample image at least comprises one target object;
acquiring second labeling information in the second sample image, wherein the second labeling information comprises target object labeling information corresponding to a target object in the second sample image;
and training the second sample image and the target object labeling information according to a second preset convolutional neural network model to obtain a position detection model.
The computer program when executed by the processor further implements the steps of:
acquiring a target object sample image;
acquiring marking information of a target object sample image, wherein the marking information comprises: first key point information of a target object in the target object sample image;
training the target object sample image and the first key point information by adopting a preset convolutional neural network model to obtain a key point detection model;
the preset convolution neural network model comprises a convolution sub-network and a deconvolution sub-network, the convolution sub-network is used for performing convolution processing on a target object sample image to obtain a first convolution result, the deconvolution sub-network is used for performing deconvolution processing on the first convolution result to obtain a second convolution result, and the second convolution result is used for determining key points of the target object.
The computer program when executed by a processor implements the steps of:
training the target object sample image and the first key point information by adopting a preset convolutional neural network model to obtain a key point detection model includes the following steps:
inputting the target object sample image into a preset convolutional neural network model to obtain second key point information;
calculating the adversarial loss between the second key point information and the first key point information;
and performing iterative training on the preset convolutional neural network model according to the adversarial loss to obtain a key point detection model.
It should be noted that, for the above-mentioned apparatus, electronic device and computer-readable storage medium embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
It is further noted that, herein, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. A method for detecting a keypoint, comprising:
acquiring a first image to be detected, wherein the first image to be detected comprises at least one target object;
inputting the first image to be detected into a position detection model trained in advance to obtain position information corresponding to the target object in the first image to be detected;
generating a second image to be detected corresponding to the target object according to the position information;
and inputting the second image to be detected into a pre-trained key point detection model to obtain the key points of the target object.
2. The method of claim 1, wherein the keypoint detection model comprises: a convolution sub-network and a deconvolution sub-network;
inputting the second image to be detected into a pre-trained key point detection model to obtain the key points of the target object, wherein the method comprises the following steps:
inputting the second image to be detected into a convolution sub-network to obtain a first convolution result;
determining a preset number of local coordinates corresponding to the target object according to the first convolution result;
inputting the first convolution result into the deconvolution sub-network to obtain a second convolution result;
segmenting the second convolution result according to the preset number of local coordinates to obtain the preset number of local images;
and determining key points of the target object according to the preset number of local images.
3. The method of claim 2, wherein determining the keypoints of the target object from the preset number of local images comprises:
acquiring two target points meeting preset conditions from each local image according to a preset sequence from a preset starting point;
and when the pixel difference between the two target points is smaller than a preset pixel difference, taking the target point with the largest pixel value as the key point.
4. The method of claim 3, further comprising:
and when the pixel difference between the two target points is larger than the preset pixel difference, acquiring a midpoint between the two target points, and taking the midpoint as the key point.
5. The method of claim 3, further comprising:
and when the pixel values of the two target points are equal, taking the target point which is the closest to the preset starting point in the two target points as the key point.
6. The method of claim 1, further comprising:
acquiring a first sample image, wherein the first sample image comprises the target object;
acquiring first labeling information of the first sample image, wherein the first labeling information comprises: key point information of the target object in the first sample image;
and training the first sample image and the key point information of the target object according to a first preset convolutional neural network model to obtain the key point detection model.
7. The method of claim 1, further comprising:
acquiring a second sample image, wherein the second sample image at least comprises one target object;
acquiring second labeling information in the second sample image, wherein the second labeling information comprises target object labeling information corresponding to the target object in the second sample image;
and training the second sample image and the target object labeling information according to a second preset convolutional neural network model to obtain the position detection model.
8. An image processing method, comprising:
acquiring a target object sample image;
acquiring annotation information of the target object sample image, wherein the annotation information comprises: first key point information of a target object in the target object sample image;
training the target object sample image and the first key point information by adopting a preset convolutional neural network model to obtain a key point detection model;
the preset convolution neural network model comprises a convolution sub-network and a deconvolution sub-network, wherein the convolution sub-network is used for performing convolution processing on the target object sample image to obtain a first convolution result, the deconvolution sub-network is used for performing deconvolution processing on the first convolution result to obtain a second convolution result, and the second convolution result is used for determining the key point of the target object.
9. The method of claim 8, wherein the training the target object sample image and the first key point information by using a preset convolutional neural network model to obtain a key point detection model comprises:
inputting the target object sample image into the preset convolutional neural network model to obtain second key point information;
calculating the adversarial loss between the second key point information and the first key point information;
and performing iterative training on the preset convolutional neural network model according to the adversarial loss to obtain the key point detection model.
10. A keypoint detection device, comprising:
an acquisition module, used for acquiring a first image to be detected, wherein the first image to be detected comprises at least one target object;
the first input module is used for inputting the first image to be detected into a position detection model trained in advance to obtain position information corresponding to the target object in the first image to be detected;
the generating module is used for generating a second image to be detected corresponding to the target object according to the position information;
and the second input module is used for inputting the second image to be detected into a pre-trained key point detection model to obtain the key points of the target object.
11. An image processing apparatus characterized by comprising:
the first acquisition module is used for acquiring a target object sample image;
a second obtaining module, configured to obtain annotation information of the target object sample image, wherein the annotation information includes: first key point information of the target object in the target object sample image;
and a training module, used for training the target object sample image and the first key point information by adopting a preset convolutional neural network model to obtain a key point detection model.
12. An electronic device, comprising: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;
the memory is used for storing a computer program;
the processor, when executing the computer program, implementing the method steps of any of claims 1-9.
13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method steps of any one of claims 1 to 9.
CN201911040317.7A 2019-10-29 2019-10-29 Key point detection method, image processing method, key point detection device, image processing device, electronic equipment and storage medium Pending CN110991443A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911040317.7A CN110991443A (en) 2019-10-29 2019-10-29 Key point detection method, image processing method, key point detection device, image processing device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911040317.7A CN110991443A (en) 2019-10-29 2019-10-29 Key point detection method, image processing method, key point detection device, image processing device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110991443A 2020-04-10

Family

ID=70082553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911040317.7A Pending CN110991443A (en) 2019-10-29 2019-10-29 Key point detection method, image processing method, key point detection device, image processing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110991443A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539937A (en) * 2020-04-24 2020-08-14 北京海益同展信息科技有限公司 Object index detection method and livestock weight detection method and device
CN111582206A (en) * 2020-05-13 2020-08-25 北京字节跳动网络技术有限公司 Method and device for generating organism posture key point information
CN111862189A (en) * 2020-07-07 2020-10-30 北京海益同展信息科技有限公司 Body size information determination method, body size information determination device, electronic equipment and computer readable medium
CN112464753A (en) * 2020-11-13 2021-03-09 深圳市优必选科技股份有限公司 Method and device for detecting key points in image and terminal equipment
CN113284120A (en) * 2021-05-31 2021-08-20 北京经纬恒润科技股份有限公司 Height limiting height measuring method and device
CN113591967A (en) * 2021-07-27 2021-11-02 南京旭锐软件科技有限公司 Image processing method, device and equipment and computer storage medium
WO2022116720A1 (en) * 2020-12-02 2022-06-09 歌尔股份有限公司 Target detection method and apparatus, and electronic device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107918780A (en) * 2017-09-01 2018-04-17 中山大学 A kind of clothes species and attributive classification method based on critical point detection
CN109558864A (en) * 2019-01-16 2019-04-02 苏州科达科技股份有限公司 Face critical point detection method, apparatus and storage medium
CN110009600A (en) * 2019-02-14 2019-07-12 腾讯科技(深圳)有限公司 A kind of medical image area filter method, apparatus and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107918780A (en) * 2017-09-01 2018-04-17 中山大学 A kind of clothes species and attributive classification method based on critical point detection
CN109558864A (en) * 2019-01-16 2019-04-02 苏州科达科技股份有限公司 Face critical point detection method, apparatus and storage medium
CN110009600A (en) * 2019-02-14 2019-07-12 腾讯科技(深圳)有限公司 A kind of medical image area filter method, apparatus and storage medium

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539937A (en) * 2020-04-24 2020-08-14 北京海益同展信息科技有限公司 Object index detection method and livestock weight detection method and device
CN111582206A (en) * 2020-05-13 2020-08-25 北京字节跳动网络技术有限公司 Method and device for generating organism posture key point information
CN111582206B (en) * 2020-05-13 2023-08-22 抖音视界有限公司 Method and device for generating organism posture key point information
CN111862189A (en) * 2020-07-07 2020-10-30 北京海益同展信息科技有限公司 Body size information determination method, body size information determination device, electronic equipment and computer readable medium
CN111862189B (en) * 2020-07-07 2023-12-05 京东科技信息技术有限公司 Body size information determining method, body size information determining device, electronic equipment and computer readable medium
CN112464753A (en) * 2020-11-13 2021-03-09 深圳市优必选科技股份有限公司 Method and device for detecting key points in image and terminal equipment
CN112464753B (en) * 2020-11-13 2024-05-24 深圳市优必选科技股份有限公司 Method and device for detecting key points in image and terminal equipment
WO2022116720A1 (en) * 2020-12-02 2022-06-09 歌尔股份有限公司 Target detection method and apparatus, and electronic device
CN113284120A (en) * 2021-05-31 2021-08-20 北京经纬恒润科技股份有限公司 Height limiting height measuring method and device
CN113284120B (en) * 2021-05-31 2024-03-08 北京经纬恒润科技股份有限公司 Height-limiting measuring method and device
CN113591967A (en) * 2021-07-27 2021-11-02 南京旭锐软件科技有限公司 Image processing method, device and equipment and computer storage medium
CN113591967B (en) * 2021-07-27 2024-06-11 南京旭锐软件科技有限公司 Image processing method, device, equipment and computer storage medium

Similar Documents

Publication Publication Date Title
CN110991443A (en) Key point detection method, image processing method, key point detection device, image processing device, electronic equipment and storage medium
US20220277543A1 (en) Using a probabilistic model for detecting an object in visual data
CN113592991A (en) Image rendering method and device based on nerve radiation field and electronic equipment
CN111161265A (en) Animal counting and image processing method and device
CN112101386B (en) Text detection method, device, computer equipment and storage medium
CN110991220B (en) Egg detection and image processing method and device, electronic equipment and storage medium
CN110941989A (en) Image verification method, image verification device, video verification method, video verification device, equipment and storage medium
CN110706312A (en) Method and device for determining file of expression package and electronic equipment
CN110909663A (en) Human body key point identification method and device and electronic equipment
CN112836625A (en) Face living body detection method and device and electronic equipment
CN111563439B (en) Aquatic organism disease detection method, device and equipment
CN111353325A (en) Key point detection model training method and device
CN113837257A (en) Target detection method and device
CN111046944A (en) Method and device for determining object class, electronic equipment and storage medium
US20230106178A1 (en) Method and apparatus for marking object outline in target image, and storage medium and electronic apparatus
CN112132892A (en) Target position marking method, device and equipment
CN110472092B (en) Geographical positioning method and system of street view picture
CN110969657B (en) Gun ball coordinate association method and device, electronic equipment and storage medium
CN111079617A (en) Poultry identification method and device, readable storage medium and electronic equipment
CN116258906A (en) Object recognition method, training method and device of feature extraction model
CN116258873A (en) Position information determining method, training method and device of object recognition model
CN110751163A (en) Target positioning method and device, computer readable storage medium and electronic equipment
CN110991235B (en) State monitoring method and device, electronic equipment and storage medium
CN111144408A (en) Image recognition method, image recognition device, electronic equipment and storage medium
CN112560621A (en) Identification method, device, terminal and medium based on animal image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Information Technology Co.,Ltd.

Address before: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Shuke Haiyi Information Technology Co.,Ltd.

Address after: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Shuke Haiyi Information Technology Co.,Ltd.

Address before: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Beijing Economic and Technological Development Zone, Beijing 100176

Applicant before: BEIJING HAIYI TONGZHAN INFORMATION TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200410