WO2021068589A1 - Method and apparatus for determining object and key points thereof in image - Google Patents


Info

Publication number
WO2021068589A1
Authority
WO
WIPO (PCT)
Prior art keywords
target object
image
detection
position information
loss
Prior art date
Application number
PCT/CN2020/102518
Other languages
French (fr)
Chinese (zh)
Inventor
周婷
吕晋
周伟杰
Original Assignee
东软睿驰汽车技术(沈阳)有限公司
Priority date
Filing date
Publication date
Application filed by 东软睿驰汽车技术(沈阳)有限公司
Publication of WO2021068589A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 9, 2019, with application number CN201910954556.7 and titled "A method and device for determining an object and its key points in an image", the entire content of which is incorporated herein by reference.
  • This application relates to the field of image processing technology, and in particular to a method and device for determining objects and their key points in an image.
  • A vehicle's fatigue driving detection system is usually used to determine whether the driver is in a fatigued driving state, so that corresponding reminder measures can be taken when it is determined that the driver is driving while fatigued.
  • The mental state of the driver is determined from the detection result.
  • The present application provides a method and device for determining an object and its key points in an image, which can effectively determine the positions of the object and its key points in the image, so as to accurately detect the objects and their key points in the image.
  • the embodiment of the present application provides a method for determining an object and its key points in an image, including:
  • According to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information of the target object are determined; wherein the key points of the target object are used to characterize the structural features of the target object.
  • Determining the position information of the target object on the image to be detected and the key point position information of the target object according to the image features corresponding to the image to be detected specifically includes:
  • determining the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and determining the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
  • According to the image features corresponding to the image to be detected, a pre-built first detection model is used to detect the target object and its key points, and the position information of the target object on the image to be detected and the key point position information of the target object are determined.
  • The first detection model includes a first object position detection network layer and a first key point position detection network layer.
  • The output result of the first object position detection network layer is part of the input data of the first key point position detection network layer.
  • The first object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected.
  • The first key point position detection network layer is used to determine the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
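As a non-authoritative illustration of this architecture (not the application's actual network), the data flow of the first detection model can be sketched as follows; the backbone, both heads, and the number of key points are placeholder assumptions:

```python
import numpy as np

def extract_features(image):
    # Placeholder backbone: collapse the channel axis into one feature map.
    return image.mean(axis=-1)

def object_position_head(features):
    # Placeholder for the first object position detection network layer:
    # predicts a box (x, y, w, h) from the shared image features.
    h, w = features.shape
    return (w / 2.0, h / 2.0, w / 4.0, h / 4.0)

def keypoint_head(features, box):
    # Placeholder for the first key point position detection network layer.
    # Note it receives BOTH the shared features and the detected box, so no
    # second feature extraction is needed.
    x, y, bw, bh = box
    return [(x + f * bw, y) for f in (-0.5, -0.25, 0.0, 0.25, 0.5)]

image = np.zeros((64, 64, 3))
features = extract_features(image)        # the only feature extraction
box = object_position_head(features)      # position of the target object
keypoints = keypoint_head(features, box)  # key points of the target object
```

The key property sketched here is that both heads share one set of image features, matching the single-extraction pipeline described in this application.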
  • The training process of the first detection model specifically includes:
  • according to the image features corresponding to the training image, using the first detection model to detect the target object and its key points, and determining the predicted position information of the target object on the training image and the predicted key point position information of the target object;
  • judging whether the first detection model meets a first preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object, and the predicted key point position information of the target object;
  • if the first detection model does not meet the first preset condition, updating the first detection model, and continuing to execute "using the first detection model to detect the target object and its key points according to the image features corresponding to the training image, and determining the predicted position information of the target object and the predicted key point position information of the target object".
  • The first preset condition is: the rate of change of the detection loss of the first detection model is lower than a first preset loss threshold, and/or the number of training rounds of the first detection model reaches a first preset number-of-rounds threshold.
  • When the first preset condition includes that the rate of change of the detection loss of the first detection model is lower than the first preset loss threshold,
  • judging whether the first detection model meets the first preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object, and the predicted key point position information of the target object specifically includes:
  • determining the detection loss of the target object, where the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
  • determining the detection loss change rate of the first detection model according to the detection loss of the target object and the historical detection loss of the first detection model, where the historical detection loss of the first detection model is the detection loss of the first detection model determined in the historical training process;
  • if the detection loss change rate of the first detection model is lower than the first preset loss threshold, determining that the first detection model meets the first preset condition;
  • if the detection loss change rate of the first detection model is not lower than the first preset loss threshold, determining that the first detection model does not meet the first preset condition.
  • When the first preset condition includes both the loss threshold and the number-of-rounds threshold, judging whether the first detection model meets the first preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object, and the predicted key point position information of the target object specifically includes:
  • determining the detection loss of the target object, where the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
  • determining the detection loss change rate of the first detection model, where the historical detection loss of the first detection model is the detection loss of the first detection model determined in the historical training process;
  • if the detection loss change rate of the first detection model is lower than the first preset loss threshold, or the number of training rounds of the first detection model reaches the first preset number-of-rounds threshold, determining that the first detection model meets the first preset condition;
  • if the detection loss change rate of the first detection model is not lower than the first preset loss threshold, and the number of training rounds of the first detection model does not reach the first preset number-of-rounds threshold, determining that the first detection model does not meet the first preset condition.
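The stopping criteria above can be sketched in code; the threshold values and the relative definition of the loss change rate used here are assumptions for illustration, not fixed by the application:

```python
def detection_loss_change_rate(current_loss, historical_loss):
    # Change of the detection loss relative to the historical detection loss
    # (the loss determined in the previous training round).
    return abs(current_loss - historical_loss) / historical_loss

def meets_first_preset_condition(current_loss, historical_loss, rounds,
                                 loss_threshold=0.01, max_rounds=100):
    # "and/or" variant: the model meets the condition when the loss change
    # rate falls below the threshold OR enough training rounds have run.
    rate = detection_loss_change_rate(current_loss, historical_loss)
    return rate < loss_threshold or rounds >= max_rounds
```

With a loss that has nearly plateaued (0.1005 → 0.1000), the change rate is about 0.005, so training stops even before the round limit is reached.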
  • Determining the position information of the target object on the image to be detected and the key point position information of the target object according to the image features corresponding to the image to be detected specifically includes:
  • determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information in each image sub-region on the image to be detected, and determining the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image sub-region on the image to be detected; the image sub-region is the region of interest corresponding to an anchor.
  • Determining the position information of the target object on the image to be detected and the key point position information in each image sub-region on the image to be detected according to the image features corresponding to the image to be detected, and determining the key point position information of the target object based on the position information of the target object on the image to be detected and the key point position information in each image sub-region, specifically includes:
  • According to the image features corresponding to the image to be detected, a pre-built second detection model is used to detect the target object and its key points, and the position information of the target object on the image to be detected and the key point position information of the target object are determined.
  • The second detection model includes a second object position detection network layer, a second key point position detection network layer, and an object key point position determination network layer. The output result of the second object position detection network layer and the output result of the second key point position detection network layer are the input data of the object key point position determination network layer. The second object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; the second key point position detection network layer is used to determine the key point position information in each image sub-region on the image to be detected according to the image features corresponding to the image to be detected; and the object key point position determination network layer is used to determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image sub-region on the image to be detected.
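As a rough, non-authoritative sketch of what the object key point position determination network layer could compute, the following picks the key points of the image sub-region (anchor roi) that best overlaps the detected object box; this overlap-based selection rule is an assumption for illustration, not the application's stated method:

```python
def iou(a, b):
    # Boxes given as (x, y, w, h) with (x, y) the centre-point coordinates.
    ax0, ay0, ax1, ay1 = a[0] - a[2]/2, a[1] - a[3]/2, a[0] + a[2]/2, a[1] + a[3]/2
    bx0, by0, bx1, by1 = b[0] - b[2]/2, b[1] - b[3]/2, b[0] + b[2]/2, b[1] + b[3]/2
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def determine_object_keypoints(object_box, subregion_rois, subregion_keypoints):
    # Combine the outputs of the two detection layers: keep the key points
    # predicted for the sub-region that best overlaps the object box.
    best = max(range(len(subregion_rois)),
               key=lambda i: iou(object_box, subregion_rois[i]))
    return subregion_keypoints[best]

rois = [(10, 10, 10, 10), (30, 30, 10, 10)]   # one roi per anchor
keypoints_per_roi = [[(9, 9)], [(29, 29)]]    # key points in each sub-region
face_keypoints = determine_object_keypoints((31, 31, 12, 12),
                                            rois, keypoints_per_roi)
```

Here the detected box overlaps only the second anchor's roi, so that sub-region's key points are kept as the object's key points.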
  • The training process of the second detection model specifically includes:
  • according to the image features corresponding to the training image, using the second detection model to detect the target object and its key points, and determining the predicted position information of the target object on the training image and the predicted key point position information of the target object;
  • judging whether the second detection model meets a second preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object, and the predicted key point position information of the target object;
  • if the second detection model does not meet the second preset condition, updating the second detection model, and continuing to execute "using the second detection model to detect the target object and its key points according to the image features corresponding to the training image, and determining the predicted position information of the target object and the predicted key point position information of the target object".
  • The second preset condition is: the rate of change of the detection loss of the second detection model is lower than a second preset loss threshold, and/or the number of training rounds of the second detection model reaches a second preset number-of-rounds threshold.
  • When the second preset condition includes that the rate of change of the detection loss of the second detection model is lower than the second preset loss threshold,
  • judging whether the second detection model meets the second preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object, and the predicted key point position information of the target object specifically includes:
  • determining the detection loss of the target object, where the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
  • determining the detection loss change rate of the second detection model according to the detection loss of the target object and the historical detection loss of the second detection model, where the historical detection loss of the second detection model is the detection loss of the second detection model determined in the historical training process;
  • if the detection loss change rate of the second detection model is lower than the second preset loss threshold, determining that the second detection model meets the second preset condition;
  • if the detection loss change rate of the second detection model is not lower than the second preset loss threshold, determining that the second detection model does not meet the second preset condition.
  • When the second preset condition includes both the loss threshold and the number-of-rounds threshold, judging whether the second detection model meets the second preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object, and the predicted key point position information of the target object specifically includes:
  • determining the detection loss of the target object, where the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
  • determining the detection loss change rate of the second detection model, where the historical detection loss of the second detection model is the detection loss of the second detection model determined in the historical training process;
  • if the detection loss change rate of the second detection model is lower than the second preset loss threshold, or the number of training rounds of the second detection model reaches the second preset number-of-rounds threshold, determining that the second detection model meets the second preset condition;
  • if the detection loss change rate of the second detection model is not lower than the second preset loss threshold, and the number of training rounds of the second detection model does not reach the second preset number-of-rounds threshold, determining that the second detection model does not meet the second preset condition.
  • the embodiment of the present application also provides an apparatus for determining an object and its key points in an image, including:
  • an extraction unit, configured to perform feature extraction on the image to be detected to obtain image features corresponding to the image to be detected;
  • a determining unit, configured to determine the position information of the target object on the image to be detected and the key point position information of the target object based on the image features corresponding to the image to be detected, where the key points of the target object are used to characterize the structural features of the target object.
  • An embodiment of the present application also provides a device, which includes a processor and a memory:
  • the memory is used to store a computer program
  • the processor is configured to execute any embodiment of the method for determining the object and its key points in the image provided above according to the computer program.
  • The embodiments of the present application also provide a computer-readable storage medium for storing a computer program, where the computer program is used to execute any of the methods for determining an object and its key points in an image provided above.
  • After the image features corresponding to the image to be detected are extracted, the position information of the target object on the image to be detected and the key point position information of the target object are determined directly according to the image features corresponding to the image to be detected.
  • the position information of the target object on the image to be detected can accurately represent the position of the target object in the image to be detected
  • the key point position information of the target object can accurately represent the key point position of the target object in the image to be detected
  • Therefore, the method for determining an object and its key points in an image provided by the embodiments of this application can effectively determine the position information of the target object and the position information of its key points from the image to be detected, so as to accurately detect the object and its key points in the image.
  • FIG. 1 is a schematic diagram of using two cascaded models to determine objects and their key points in an image according to an embodiment of the application;
  • FIG. 2 is a flowchart of a method for determining an object and its key points in an image provided by Method Embodiment 1 of this application;
  • FIG. 3 is a flowchart of a method for determining objects and their key points in an image provided by the second embodiment of the method of this application;
  • FIG. 4 is a schematic diagram when the first detection model provided by an embodiment of the application is applied to the detection of the position of a human face and its key points;
  • FIG. 5 is a flowchart of the training process of the first detection model provided by an embodiment of the application.
  • FIG. 6 is a flowchart of an implementation manner of step S53 according to an embodiment of this application.
  • FIG. 7 is a flowchart of another implementation manner of step S53 according to an embodiment of the application.
  • FIG. 8 is a flowchart of a method for determining an object and its key points in an image provided by the fourth embodiment of the method of this application;
  • FIG. 9 is a schematic diagram when the second detection model provided by an embodiment of the application is applied to the detection of the position of a human face and its key points;
  • FIG. 10 is a flowchart of the training process of the second detection model provided by an embodiment of the application.
  • FIG. 11 is a flowchart of an implementation manner of step S103 according to an embodiment of this application.
  • FIG. 12 is a flowchart of another implementation manner of step S103 according to an embodiment of this application.
  • FIG. 13 is a schematic structural diagram of an apparatus for determining an object and its key points in an image provided by an embodiment of the application;
  • FIG. 14 is a schematic diagram of the device structure provided by an embodiment of the application.
  • In related technology, cascaded object detection models (for example, face detection models) and object key point detection models (for example, face key point detection models) are used to sequentially determine the position information of the target object (for example, a human face) on the image to be detected and the key point position information of the target object.
  • The object detection model includes an image feature extraction network and an object position detection network, and the object key point detection model includes an image feature extraction network and an object key point detection network.
  • When the cascaded object detection model and object key point detection model are used to determine the target object and its key points, the following steps are performed in sequence: image feature extraction on the image to be detected → detection of the position information of the target object → image feature extraction on the image of the target object → detection of the key point position information of the target object.
  • For example, the cascaded face detection model and face key point detection model are used to determine a human face and its key points as follows: first, feature extraction is performed on the image to be detected to obtain a first image feature; second, the face detection network detects the first image feature to obtain a face image (that is, the face detection result); then, feature extraction is performed on the face image to obtain a second image feature; finally, the face key point detection network detects the second image feature to obtain an image carrying the face key point information (that is, the face key point detection result).
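The prior-art cascade described above can be sketched schematically; the networks here are trivial stand-ins, and the only point illustrated is that feature extraction must run twice:

```python
calls = {"extract": 0}

def extract_features(img):
    calls["extract"] += 1          # count feature extraction passes
    return img                     # stand-in: features == image

def face_detection_net(feats):
    return feats                   # stand-in: "crop" of the detected face

def face_keypoint_net(feats):
    return [(0, 0)]                # stand-in: one face key point

def cascaded_face_keypoints(image):
    feats1 = extract_features(image)         # 1st feature extraction
    face_image = face_detection_net(feats1)  # face detection result
    feats2 = extract_features(face_image)    # 2nd feature extraction
    return face_keypoint_net(feats2)         # face key point detection result

keypoints = cascaded_face_keypoints("image-to-be-detected")
```

Counting the calls confirms the duplicated extraction step that the single-model method of this application avoids.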
  • the inventor has conducted further research and found that when using the cascaded object detection model and the key point detection model of the object to determine the object and its key points, two image feature extractions are required. This increases the complexity of the process of determining objects and their key points in the image, reduces the efficiency of determining objects and their key points in the image, and also increases the memory consumption when determining objects and their key points in the image.
  • In addition, both the object detection model and the object key point detection model need to be implemented by deep neural networks, and each deep neural network consumes a large amount of memory and time during use and training. Therefore, the use and training of the cascaded object detection model and object key point detection model require a large amount of memory and time, which increases the cost of obtaining the object and its key points in the image and limits the scope of application of detection methods based on the cascaded object detection model and object key point detection model.
  • an embodiment of the present application provides a method for determining objects and their key points in an image.
  • the method specifically includes: After the image feature corresponding to the image to be detected is extracted from the image to be detected, the location information of the target object on the image to be detected and the key point location information of the target object are determined directly according to the image feature corresponding to the image to be detected.
  • the method for determining the object and its key points in the image provided by the embodiments of the present application, since the position information of the target object on the image to be detected can accurately represent the position of the target object in the image to be detected, and the key point position of the target object The information can accurately characterize the position of the key points of the target object in the image to be detected. Therefore, the method for determining the object and its key points in the image provided by the embodiments of the present application can effectively determine the position information and the position of the target object from the image to be detected. The location information of its key points can accurately detect the objects and their key points in the image.
  • the location information of the target object on the image to be detected and the key point location information of the target object can be determined directly based on the image feature corresponding to the image to be detected.
  • This avoids restoring the position information of the target object on the image to be detected into an object detection image and performing feature extraction on that image, so that only one feature extraction is required in the process of determining the position information of the target object and the key point position information of the target object. This simplifies the process of determining the object and its key points in the image, improves its efficiency, and reduces the memory consumption involved.
  • FIG. 2 is a flowchart of the method for determining an object and its key points in an image provided by Method Embodiment 1 of this application.
  • the method for determining objects and their key points in an image provided by the embodiments of the present application specifically includes steps S21-S22:
  • S21 Perform feature extraction on the image to be detected to obtain image features corresponding to the image to be detected.
  • the image to be detected refers to the image that needs to be detected.
  • the image feature corresponding to the image to be detected is used to characterize the feature information of the image to be detected.
  • the embodiment of the present application does not limit the image feature extraction method, and any existing or future image feature extraction method capable of extracting features from the image to be detected can be used.
  • the image feature extraction method may be a convolution algorithm.
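For instance, a minimal "valid" convolution over a single-channel image (implemented as cross-correlation, as most deep-learning frameworks do) could serve as the feature extraction step; the kernel and image below are toy values chosen for illustration:

```python
def conv2d(image, kernel):
    # Minimal 'valid' 2-D convolution: slide the kernel over the image and
    # sum the element-wise products at each position.
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A horizontal-difference kernel highlights vertical edges in the image.
features = conv2d([[1, 2, 3]], [[1, -1]])
```

In practice the extraction network stacks many such convolutions with learned kernels; this sketch only shows the basic operation.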
  • S22 Determine the position information of the target object on the image to be detected and the key point position information of the target object according to the image features corresponding to the image to be detected; wherein the key points of the target object are used to characterize the target The structural characteristics of the object.
  • the target object refers to the object that needs to be recognized on the image; moreover, the embodiment of the present application does not limit the target object, and the target object may be a human face, a vehicle, a tea cup, and other objects.
  • The target object can be set in advance according to the actual application scenario.
  • The position information of the target object on the image to be detected is used to record position-related information of the area where the target object is located in the image to be detected. The embodiment of this application does not limit the representation form of this position information; it can be represented by at least one of words, numbers, symbols, etc.
  • For example, the position information of the target object on the image to be detected can be described by the region of interest (roi) corresponding to an anchor and the offset parameters of the area where the target object is located relative to the roi corresponding to that anchor.
  • The roi corresponding to each anchor can be described by the regional parameters (x, y, w, h), where x and y represent the pixel coordinates of the center point of the roi corresponding to the anchor, w represents the width of that roi, and h represents its height.
  • The offset parameters can be expressed as (Δx, Δy, Δw, Δh), where Δx and Δy indicate the offset of the pixel coordinates of the center point of the target object in the image to be detected relative to the center point of the roi corresponding to the anchor, and Δw and Δh indicate the offsets of the width and height of the area where the target object is located relative to the width and height of that roi.
  • the target object is a human face
  • the image to be detected includes the first anchor to the Nth anchor
  • the area parameter of the roi corresponding to the first anchor is (x 1 , y 1 , w 1 , h 1 )
  • the area parameter of the roi corresponding to the second anchor is (x 2 , y 2 , w 2 , h 2 ),...
  • the area parameter of the roi corresponding to the Nth anchor is (x N , y N , w N , h N );
  • the roi corresponding to the T-th anchor carries a human face; and the offset parameter of the area of the face in the image to be detected relative to the roi corresponding to the T-th anchor is ( ⁇ x, ⁇ y, ⁇ w , ⁇ h), and 1 ⁇ T ⁇ N.
  • At this time, the position information of the face on the image to be detected can be expressed by the regional parameters of the area where the face is located in the image to be detected: (x T +Δx, y T +Δy, w T +Δw, h T +Δh).
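Following this description, decoding the face position from the T-th anchor's roi parameters and the predicted offset parameters is an element-wise sum; the numbers below are made up for illustration:

```python
def decode_position(anchor_roi, offsets):
    # anchor_roi = (x_T, y_T, w_T, h_T): roi of the anchor carrying the face
    # offsets    = (dx, dy, dw, dh): predicted offset parameters
    # Returns the regional parameters of the area where the face is located,
    # i.e. (x_T + dx, y_T + dy, w_T + dw, h_T + dh).
    return tuple(a + d for a, d in zip(anchor_roi, offsets))

face_box = decode_position((100, 80, 40, 40), (3, -2, 5, 4))
```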
  • the key points of the target object are used to characterize the structural characteristics of the target object.
  • the key points of the target object may include related points that can characterize the position of the facial features of the human face and the position of the contour of the face.
  • It should be noted that, in the embodiment of the present application, the position information of the target object on the image to be detected and the key point position information of the target object can be determined directly according to the image features corresponding to the image to be detected.
  • The embodiments of the present application further provide two specific implementations of step S22, which are described respectively as step S32 of Method Embodiment 2 and step S82 of Method Embodiment 4 below; see below for the technical details.
  • the above is the specific implementation of the method for determining the object and its key points in the image provided by method embodiment 1.
  • After the image features corresponding to the image to be detected are extracted from the image to be detected, the position information of the target object on the image to be detected and the key point position information of the target object are determined directly based on those image features.
  • the position information of the target object on the image to be detected can accurately represent the position of the target object in the image to be detected, and the key point position information of the target object can accurately represent the key point position of the target object in the image to be detected
  • The method for determining an object and its key points in an image provided by the embodiments of this application can effectively determine the position information of the target object and the position information of its key points from the image to be detected, so as to accurately detect the object and its key points in the image.
  • the location information of the target object on the image to be detected and the key point location information of the target object can be determined directly based on the image feature corresponding to the image to be detected.
• based on this, the embodiments of the present application further provide another embodiment of the method for determining an object and its key points in an image, which is explained and illustrated below in conjunction with the drawings.
  • the second method embodiment is an improvement made on the basis of the first method embodiment.
  • the parts in the second method embodiment with the same content as in the first method embodiment will not be repeated here.
• FIG. 3 is a flowchart of a method for determining an object and its key points in an image provided by the second method embodiment of the present application.
  • the method for determining the object and its key points in the image provided by the embodiment of the present application specifically includes steps S31-S32:
  • S31 Perform feature extraction on the image to be detected to obtain image features corresponding to the image to be detected.
  • step S31 is the same as the content of step S21 in the first embodiment of the above method, and for the sake of brevity, it will not be repeated here.
• S32: Determine the position information of the target object on the image to be detected according to the image feature corresponding to the image to be detected, and then determine the key point position information of the target object according to the image feature corresponding to the image to be detected and the position information of the target object on the image to be detected; wherein, the key points of the target object are used to characterize the structural features of the target object.
• in the second method embodiment, the image feature corresponding to the image to be detected is first used to determine the position information of the target object on the image to be detected, and then the position information of the target object on the image to be detected and the image feature corresponding to the image to be detected are used together to determine the key point position information of the target object.
• as an example, step S32 may specifically be: according to the image feature corresponding to the image to be detected, detecting the target object and its key points by using a pre-built first detection model, and determining the position information of the target object on the image to be detected and the key point position information of the target object.
  • the first detection model is used to detect the position information of the target object on the image to be detected and the key point position information of the target object according to the image features corresponding to the image to be detected.
• the first detection model includes a first object position detection network layer and a first key point position detection network layer, where the output result of the first object position detection network layer is the input data of the first key point position detection network layer. The first object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image feature corresponding to the image to be detected, and the first key point position detection network layer is used to determine the key point position information of the target object according to the image feature corresponding to the image to be detected and the position information of the target object on the image to be detected.
• that is, the input data of the first detection model, namely the image feature corresponding to the image to be detected, is the input data of the first object position detection network layer, and the output data of the first object position detection network layer is the input data of the first key point position detection network layer.
• for example, when applied to the detection of a face position and its key points, the input data of the first detection model (the image features corresponding to the image to be detected) is the input data of the face detection network layer; here, "face detection network layer" is used to indicate the first object position detection network layer when applied to face position and key point detection, and "face key point detection network layer" is used to indicate the first key point position detection network layer when applied to the detection of the face position and its key points.
  • the embodiment of the present application does not limit the way in which the first key point position detection network layer in the first detection model uses the output data of the first object position detection network layer.
• for example, the first key point position detection network layer may directly use the output data of the first object position detection network layer; alternatively, image area related information including the target object may first be determined according to the output data of the first object position detection network layer, and the first key point position detection network layer may then use the image area related information including the target object. This is not specifically limited in the embodiments of the present application.
  • the first detection model is a deep neural network
  • the deep neural network is composed of a first object position detection network layer and a first key point position detection network layer.
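• as an illustrative aid (not part of the patent disclosure), the two-layer structure described above can be sketched as follows; the feature dimension, box parameterization, number of key points, and random weights are all assumptions made purely for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

FEATURE_DIM = 128   # assumed size of the extracted image feature
BOX_DIM = 4         # assumed (x, y, w, h) position of the target object
NUM_KEYPOINTS = 5   # assumed key point count (e.g. 5 facial key points)

# First object position detection network layer: image feature -> box.
W_box = rng.standard_normal((FEATURE_DIM, BOX_DIM)) * 0.01

# First key point position detection network layer: its input combines the
# image feature with the box output by the previous layer, as described above.
W_kp = rng.standard_normal((FEATURE_DIM + BOX_DIM, NUM_KEYPOINTS * 2)) * 0.01

def first_detection_model(image_feature):
    """Return (box, keypoints) for one image feature vector."""
    box = image_feature @ W_box                      # object position info
    kp_input = np.concatenate([image_feature, box])  # feature + box
    keypoints = (kp_input @ W_kp).reshape(NUM_KEYPOINTS, 2)
    return box, keypoints

feature = rng.standard_normal(FEATURE_DIM)
box, keypoints = first_detection_model(feature)
print(box.shape, keypoints.shape)  # (4,) (5, 2)
```

• note that real implementations would use trained nonlinear layers; the sketch only shows the data flow in which the object position output feeds the key point head.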
• in addition, an embodiment of the present application further provides a process of training the first detection model, and the training process will be described in detail in the third method embodiment; for technical details, refer to the third method embodiment.
  • the above is the specific implementation of the method for determining the object and its key points in the image provided by method embodiment 2.
• in this method, after the image feature corresponding to the image to be detected is extracted from the image to be detected, the position information of the target object on the image to be detected is determined according to that image feature, and then the key point position information of the target object is directly determined based on the position information of the target object on the image to be detected and the image feature corresponding to the image to be detected.
• since the position information of the target object on the image to be detected can accurately represent the position of the target object in the image to be detected, and the key point position information of the target object can accurately represent the key point positions of the target object in the image to be detected, the method for determining an object and its key points in an image provided by the embodiments of the present application can effectively determine the position information of the target object and the position information of its key points from the image to be detected, so as to accurately detect the object and its key points in the image.
• in addition, since the position information of the target object on the image to be detected is obtained first, it can be directly used as the basis for determining the key point position information of the target object, so that only one image feature extraction process is required. This simplifies the process of determining objects and their key points in the image, improves the efficiency of determining objects and their key points in the image, and reduces the memory loss when determining objects and their key points in the image.
• moreover, the first detection model can be used to detect the target object and its key points, so as to determine the position information of the target object on the image to be detected and the key point position information of the target object.
• because the first detection model is a deep neural network that includes the first object position detection network layer and the first key point position detection network layer, only one deep neural network needs to be trained when training the first detection model, which reduces the memory and time consumption when training the first detection model; moreover, when using the first detection model to detect objects and their key points, only one deep neural network needs to be run, which reduces the memory and time loss when determining objects and their key points.
• in order to improve the detection effect of the first detection model, the first detection model can be trained. Based on this, the embodiments of the present application further provide a training process of the first detection model, which is explained and described below with reference to the accompanying drawings.
  • FIG. 5 is a flowchart of the training process of the first detection model provided in an embodiment of the application.
  • the training process of the first detection model provided in the embodiment of the application includes steps S51-S55:
  • S51 Obtain image features corresponding to the training image, actual position information of the target object on the training image, and actual key point position information of the target object.
  • the training image is used to represent the image used when training the first detection model; moreover, this application does not limit the type of training image.
• for example, the training image may be an image collected by an image acquisition device (for example, a camera), an image that carries the position information of the target object, or an image that carries the key point position information of the target object.
• the embodiments of the present application do not limit the number of training images; the number of training images can be preset, in particular according to the application scenario.
  • the actual position information of the target object on the training image is used to record the relevant information about the actual position of the target object in the training image.
  • the actual key point position information of the target object is used to record the actual position related information of the key point of the target object in the training image.
• the embodiments of the present application do not limit the method for acquiring the image features corresponding to the training image, and any method that can acquire the image features corresponding to the training image can be used; they do not limit the method for acquiring the actual position information of the target object on the training image, and any method that can acquire the actual position information of the target object on the training image can be used; nor do they limit the method for acquiring the actual key point position information of the target object, and any method that can acquire the actual key point position information of the target object can be used.
  • S52 Use the first detection model to detect the target object and its key points according to the image features corresponding to the training image, and determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object.
  • the predicted position information of the target object on the image to be detected is used to record the position related information of the target object on the image to be detected obtained by the first detection model.
  • the predicted key point position information of the target object is used to record the information about the predicted key point position of the target object in the image to be detected obtained by using the first detection model.
• during specific implementation, the image feature corresponding to the training image is input into the first detection model, so that the first detection model detects the target object and its key points according to the image feature corresponding to the training image, determines the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object, and outputs the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object.
• S53: According to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, determine whether the first detection model meets the first preset condition; if yes, execute step S55; if not, execute step S54.
  • the first preset condition is used to record the training cut-off condition information of the training process of the first detection model; moreover, the first preset condition may be preset or set according to the application scenario.
  • the first preset condition may specifically include: the rate of change of the detection loss of the first detection model is lower than the first preset loss threshold, and/or the number of training rounds of the first detection model reaches the first preset number of rounds threshold .
• during specific implementation, the predicted position information of the object and its key points detected by the first detection model can be compared with the actual position information of the object and its key points, so as to determine the prediction accuracy of the first detection model according to the comparison result. Specifically: according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, it is determined whether the first detection model meets the first preset condition.
• if the first detection model meets the first preset condition, it indicates that the predicted position information of the object and its key points detected by the first detection model is very close to the actual position information of the object and its key points, which indicates that the prediction accuracy of the first detection model is high and there is no need to continue training the first detection model; at this time, the training process of the first detection model can be ended. If the first detection model does not meet the first preset condition, it means that the difference between the predicted position information of the object and its key points detected by the first detection model and the actual position information of the object and its key points is large, which indicates that the prediction accuracy of the first detection model is low and further training of the first detection model is needed; at this time, the first detection model needs to be updated to improve the prediction accuracy of the updated first detection model.
• as an example, when the first preset condition includes that the change rate of the detection loss of the first detection model is lower than the first preset loss threshold, step S53 may specifically include steps S53A1-S53A7:
  • S53A1 Determine the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected.
• the detection loss of the target object is used to record the loss caused by the first detection model in the process of acquiring the position information of the target object on the image to be detected; moreover, the detection loss of the target object includes the recognition loss of the target object (L-softmax) and the position detection loss of the target object (L-regression).
  • the recognition loss of the target object is used to record the loss caused by the first detection model when recognizing the target object in the image. That is, the recognition loss of the target object is used to record whether the first detection model can recognize the target object from the image.
• for example, the recognition loss of the target object may include the loss caused by not recognizing the target object in the image, the loss caused by recognizing other objects in the image as the target object, and the loss caused by recognizing the target object in the image to be detected as another object.
• the position detection loss of the target object is used to record the loss caused by the first detection model in the process of determining the position information of the target object on the image to be detected; moreover, the position detection loss of the target object may include the loss caused by inaccurate predicted position information of the target object on the image to be detected.
• the position detection loss of the target object can usually be calculated using an L2 loss function (l2-loss), that is, by calculating the distance difference between the predicted position coordinates and the actual position coordinates.
• during specific implementation, after acquiring the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, the actual position information can be compared with the predicted position information to determine the detection loss of the target object of the first detection model; at the same time, the distance difference between the actual position information and the predicted position information can be calculated to determine the position detection loss of the target object of the first detection model.
• it should be noted that the embodiments of the present application do not limit the calculation method of the recognition loss of the target object, and any existing or future calculation method that can measure the recognition effect of the target object in the image can be used to determine the recognition loss of the target object.
  • the embodiment of the application does not limit the calculation method of the position detection loss of the target object, and any existing or future calculation method that can measure the position detection effect of the target object can be used to determine the position detection loss of the target object. .
  • S53A2 Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object.
• the key point position detection loss of the target object (loss-dpoint) is used to record the loss caused by the first detection model in the process of obtaining the key point position information of the target object; moreover, the key point position detection loss of the target object includes the loss caused by inaccurate predicted key point position information of the target object.
• the key point position detection loss of the target object can usually be calculated using an L2 loss function (l2-loss), that is, by calculating the distance difference between the predicted position coordinates and the actual position coordinates.
• it should be noted that the embodiments of the present application do not limit the calculation method of the key point position detection loss of the target object, and any existing or future calculation method that can measure the position detection effect of the key points of the target object can be used to determine the key point position detection loss of the target object.
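• as a minimal sketch of the L2-loss distance computation described above (summing the per-key-point distances is an assumption made for this example; the text does not specify how distances are aggregated):

```python
import math

def l2_loss(predicted, actual):
    """Euclidean (L2) distance between a predicted and an actual coordinate."""
    return math.dist(predicted, actual)

def keypoint_loss(predicted_kps, actual_kps):
    """Sum of per-key-point L2 distances (aggregation choice is an assumption)."""
    return sum(l2_loss(p, a) for p, a in zip(predicted_kps, actual_kps))

# Position detection loss: distance between predicted and actual box center.
print(l2_loss((3.0, 4.0), (0.0, 0.0)))                     # 5.0
# Key point position detection loss over two key points.
print(keypoint_loss([(1, 1), (2, 2)], [(1, 1), (2, 5)]))   # 3.0
```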
  • S53A3 Determine the detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object.
• the detection loss of the first detection model (loss-detect) is used to record the loss caused by the first detection model in the process of acquiring the position information of the target object on the image to be detected and the key point position information of the target object.
• during specific implementation, a preset loss function can be used to determine the detection loss of the first detection model, and the loss function is:
• loss-detect = α × L-softmax + β × L-regression + F × γ × loss-dpoint
• wherein, loss-detect represents the detection loss of the first detection model; α represents the influence weight of the recognition loss of the target object; L-softmax represents the recognition loss of the target object; β represents the influence weight of the position detection loss of the target object; L-regression represents the position detection loss of the target object; F represents the existence flag of the key point position detection loss of the target object: if the training image includes the target object (that is, a positive sample training image), the F value corresponding to the training image is 1, and if the training image does not include the target object (that is, a negative sample training image), the F value corresponding to the training image is 0; γ represents the influence weight of the key point position detection loss of the target object; and loss-dpoint represents the key point position detection loss of the target object.
• in the above loss function, the value of F is determined according to whether the target object is included in the training image. Specifically: if a target object is included in a training image, the key point related information of the target object needs to be considered, so when measuring the detection loss of the first detection model on that training image, the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object all need to be considered; if no target object is included in a training image, there is no need to consider the key point related information of the target object, so when measuring the detection loss of the first detection model on that training image, only the recognition loss of the target object and the position detection loss of the target object need to be considered.
• it can be seen that the above loss function jointly supervises the detection effect of the first detection model by using three loss indicators: the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object. Therefore, when the above loss function is used to evaluate the detection effect of the first detection model on the object and its key points, it can effectively determine whether the current first detection model can accurately detect the object and its key points in the image, thus improving the detection accuracy of the trained first detection model.
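• the joint detection loss described above can be sketched as follows; the weight symbols (α, β, γ) and their default values of 1.0 are assumptions made for illustration, not values given in the source:

```python
def detection_loss(l_softmax, l_regression, loss_dpoint, has_target,
                   alpha=1.0, beta=1.0, gamma=1.0):
    """Joint loss: weighted recognition + position losses, plus the key point
    loss masked by the existence flag F (1 for positive samples, 0 otherwise)."""
    f = 1 if has_target else 0
    return alpha * l_softmax + beta * l_regression + f * gamma * loss_dpoint

# Positive sample training image: all three terms contribute.
print(detection_loss(0.25, 0.5, 0.25, has_target=True))   # 1.0
# Negative sample training image: the key point term is masked out by F = 0.
print(detection_loss(0.25, 0.5, 0.25, has_target=False))  # 0.75
```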
  • S53A4 Determine the detection loss change rate of the first detection model according to the detection loss of the first detection model and the historical detection loss of the first detection model.
• the historical detection loss of the first detection model is the detection loss of the first detection model determined during the historical training process. For example, suppose the first detection model is trained sequentially from the first round to the Y-th round, where the first round of training is earlier than the second round, the second round is earlier than the third round, ..., the (Y-1)-th round is earlier than the Y-th round, and the Y-th round is the training process currently being executed. Then the first round of training through the (Y-1)-th round of training are all historical training processes of the Y-th round of training, and the detection losses of the first detection model determined in the first round of training through the (Y-1)-th round of training are all historical detection losses for the Y-th round of training.
• the detection loss change rate of the first detection model is used to characterize the change in the detection effect of the first detection model during multiple rounds of training; moreover, the smaller the detection loss change rate of the first detection model, the better the detection effect of the first detection model, and the larger the detection loss change rate of the first detection model, the worse the detection effect of the first detection model.
  • S53A5 Determine whether the detection loss change rate of the first detection model is lower than the first preset loss threshold, if yes, execute step S53A6; if not, execute step S53A7.
• the first preset loss threshold can be set in advance, in particular according to the application scenario.
• in the embodiments of the present application, the detection loss of the first detection model is determined based on three supervision factors: the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object; the detection loss change rate of the first detection model is then determined based on the detection loss of the first detection model. At this time, if the detection loss change rate of the first detection model is lower than the first preset loss threshold, it means that the change in the detection loss of the first detection model is small and the model is close to a stable optimal model, thereby indicating that the accuracy of the objects and their key points in the image obtained by the first detection model is relatively high; it is therefore determined that the first detection model meets the first preset condition, and it can then be determined that the first detection model no longer needs to be trained. If the detection loss change rate is not lower than the first preset loss threshold, it means that the change in the detection loss of the first detection model is large and the model is far from a stable optimal model, thereby indicating that the accuracy of the objects and their key points in the image detected by the first detection model is low; it is therefore determined that the first detection model does not meet the first preset condition, and it can then be determined that the first detection model still needs to be trained again.
  • S53A6 Determine that the first detection model meets the first preset condition.
  • S53A7 Determine that the first detection model does not meet the first preset condition.
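• steps S53A4-S53A7 can be sketched as follows; the source does not give the exact formula for the detection loss change rate, so the relative change against the previous round's detection loss is assumed here:

```python
def detection_loss_change_rate(loss_history):
    """S53A4: change rate from the historical and current detection losses.
    loss_history: detection losses from round 1 up to the current round."""
    previous, current = loss_history[-2], loss_history[-1]
    return abs(current - previous) / previous

def meets_first_preset_condition(loss_history, loss_threshold):
    """S53A5-S53A7: the model meets the first preset condition when the
    detection loss change rate is lower than the first preset loss threshold."""
    return detection_loss_change_rate(loss_history) < loss_threshold

history = [4.0, 2.0, 1.0, 0.875]            # losses from successive rounds
print(detection_loss_change_rate(history))  # 0.125
print(meets_first_preset_condition(history, loss_threshold=0.15))  # True
```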
• it should be noted that the embodiments of the present application do not limit the execution order of step S53A1 and step S53A2: step S53A1 may be executed before step S53A2, step S53A2 may be executed before step S53A1, or step S53A1 and step S53A2 may be executed simultaneously.
• the above is one implementation manner of step S53; in this implementation manner, the fact that the detection loss change rate of the first detection model is lower than the first preset loss threshold is used as the training cut-off condition of the first detection model.
• in addition, in order to improve the evaluation accuracy of the detection effect of the first detection model, the embodiments of the present application also provide another specific implementation manner of step S53, as shown in FIG. When the first preset condition includes that the detection loss change rate of the first detection model is lower than the first preset loss threshold and/or that the number of training rounds of the first detection model reaches the first preset round number threshold, step S53 may specifically include steps S53B1-S53B7:
  • S53B1 Determine the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected.
  • the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object.
• the content of step S53B1 is the same as the content of step S53A1 described above; for the sake of brevity, it will not be repeated here, and reference may be made to the content of step S53A1 described above.
  • S53B2 Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object.
• the content of step S53B2 is the same as the content of step S53A2 described above; for the sake of brevity, it will not be repeated here, and reference may be made to the content of step S53A2 described above.
  • S53B3 Determine the detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object.
• the content of step S53B3 is the same as the content of step S53A3 described above; for the sake of brevity, it will not be repeated here, and reference may be made to the content of step S53A3 described above.
  • S53B4 Determine the detection loss change rate of the first detection model according to the detection loss of the first detection model and the historical detection loss of the first detection model.
  • the historical detection loss of the first detection model is the detection loss of the first detection model determined during the historical training process.
• the content of step S53B4 is the same as the content of step S53A4 described above; for the sake of brevity, it will not be repeated here, and reference may be made to the content of step S53A4 described above.
• S53B5: Determine whether at least one of the following two conditions is met: the detection loss change rate of the first detection model is lower than the first preset loss threshold, and the number of training rounds of the first detection model reaches the first preset round number threshold; if yes, execute step S53B6; if not, execute step S53B7.
• it should be noted that as long as it is determined that the detection loss change rate of the first detection model is lower than the first preset loss threshold, step S53B6 can be executed directly, regardless of whether it is also judged that the number of training rounds of the first detection model reaches the first preset round number threshold; likewise, as long as it is determined that the number of training rounds of the first detection model reaches the first preset round number threshold, step S53B6 can be executed directly, regardless of whether it is also judged that the detection loss change rate of the first detection model is lower than the first preset loss threshold. Step S53B7 is executed only when it is determined both that the detection loss change rate of the first detection model is not lower than the first preset loss threshold and that the number of training rounds of the first detection model does not reach the first preset round number threshold. In addition, the embodiments of the present application do not limit the execution order of the judgment of whether the detection loss change rate of the first detection model is lower than the first preset loss threshold and the judgment of whether the number of training rounds of the first detection model reaches the first preset round number threshold: the two judgments may be executed in either order or simultaneously.
• S53B6: Determine that the first detection model meets the first preset condition.
  • S53B7 Determine that the first detection model does not meet the first preset condition.
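• the judgment of step S53B5 can be sketched as follows; the threshold values used below are placeholders, not values from the source:

```python
def meets_first_preset_condition_b(change_rate, rounds,
                                   loss_threshold, round_threshold):
    """S53B5: the cut-off is met when EITHER the detection loss change rate is
    below the first preset loss threshold OR the number of training rounds
    reaches the first preset round number threshold."""
    return change_rate < loss_threshold or rounds >= round_threshold

# Change rate already small enough -> stop (S53B6), regardless of round count.
print(meets_first_preset_condition_b(0.01, 3, 0.05, 100))    # True
# Round budget exhausted -> stop (S53B6), regardless of change rate.
print(meets_first_preset_condition_b(0.5, 100, 0.05, 100))   # True
# Neither condition met -> keep training (S53B7).
print(meets_first_preset_condition_b(0.5, 3, 0.05, 100))     # False
```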
• it should be noted that the embodiments of the present application do not limit the execution order of step S53B1 and step S53B2: step S53B1 may be executed before step S53B2, step S53B2 may be executed before step S53B1, or step S53B1 and step S53B2 may be executed simultaneously.
• the above is another implementation manner of step S53; in this implementation manner, the fact that the detection loss change rate of the first detection model is lower than the first preset loss threshold and/or that the number of training rounds of the first detection model reaches the first preset round number threshold is used as the training cut-off condition of the first detection model. This concludes the relevant content of step S53.
• in step S54, the gap between the predicted position information of the target object on the image to be detected, as detected by the first detection model, and the actual position information of the target object on the training image, together with the gap between the predicted key point position information of the target object and the actual key point position information of the target object, may be used to update the first detection model, so that the predicted position information of the target object on the image to be detected obtained by the updated first detection model is closer to the actual position information of the target object on the training image, and the predicted key point position information of the target object is closer to the actual key point position information of the target object, thereby enabling the updated first detection model to more accurately detect objects and their key points in the image.
  • First, the image features corresponding to the training image, the actual position information of the target object on the training image, and the actual key point position information of the target object are acquired.
  • Then, the first detection model is used to detect the target object and its key points according to the image features corresponding to the training image, and the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object are determined; then, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, it is determined whether the first detection model meets the first preset condition, so as to determine whether to continue training the first detection model according to the judgment result.
  • In this way, the embodiment of this application evaluates the detection effect of the first detection model based on four pieces of information: the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object. This makes the evaluation result more reasonable and effective, so that the detection effect of the first detection model that satisfies the first preset condition is better.
  • In addition, the embodiment of the present application jointly uses three loss factors, namely the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object, to evaluate the detection effect of the first detection model, which also makes the evaluation result more reasonable and effective, so that the detection effect of the first detection model that meets the first preset condition is better.
  • Based on the above method embodiments, an embodiment of the present application also provides another embodiment of the method for determining objects and their key points in an image, which will be explained and described below with reference to the accompanying drawings.
  • Method embodiment 4 is an improvement based on method embodiments 1 to 3. For the sake of brevity, the parts of method embodiment 4 that are the same as those in method embodiments 1 to 3 will not be repeated here.
  • FIG. 8 is a flowchart of a method for determining an object and its key points in an image provided in the fourth embodiment of the method of this application.
  • the method for determining an object and its key points in an image specifically includes steps S81-S82:
  • S81 Perform feature extraction on the image to be detected to obtain image features corresponding to the image to be detected.
  • step S81 is the same as the content of step S21 in the first embodiment of the method, and for the sake of brevity, the details are not repeated here.
  • S82 Determine the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected according to the image features corresponding to the image to be detected, and determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected.
  • the image sub-region is the region of interest corresponding to the anchor point. It should be noted that, for the technical details of the "region of interest corresponding to the anchor point", please refer to the related content in step S22 of the first method embodiment.
  • the image to be detected usually includes at least one image sub-region.
  • In the embodiment of the present application, the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected are first determined according to the image features corresponding to the image to be detected; the key point position information of the target object is then determined according to the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected.
  • the image to be detected includes the first image sub-region to the P-th image sub-region, and the target object is located in the W-th image sub-region.
  • In this case, step S82 specifically includes: determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information in the first image subregion to the P-th image subregion of the image to be detected; and then determining the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected, that is, determining the key point position information in the W-th image subregion as the key point position information of the target object. In other words, the key point position information in the image subregion where the target object is located is taken as the key point position information of the target object.
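  • The subregion selection described above can be sketched as follows; this is a minimal illustration rather than the claimed implementation, and the box layout (x1, y1, x2, y2), the dictionary keys, and the center-containment test are assumptions introduced only for the example:

```python
def box_center(box):
    """Return the (x, y) center of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def select_target_keypoints(target_box, subregions):
    """Take the key points of the image subregion that contains the target object.

    Each entry of `subregions` carries its own "box" and the key points
    predicted inside it; the subregion whose box contains the center of the
    target box plays the role of the W-th image subregion in the text.
    """
    cx, cy = box_center(target_box)
    for region in subregions:
        x1, y1, x2, y2 = region["box"]
        if x1 <= cx <= x2 and y1 <= cy <= y2:
            return region["keypoints"]
    return None  # no subregion contains the target object


subregions = [
    {"box": (0, 0, 50, 50), "keypoints": [(10, 10), (20, 20)]},
    {"box": (50, 0, 100, 50), "keypoints": [(60, 10), (80, 30)]},
]
# The target box lies in the second subregion, so its key points are chosen.
print(select_target_keypoints((55, 5, 95, 45), subregions))  # [(60, 10), (80, 30)]
```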
  • In addition, in a possible implementation, step S82 may specifically be: according to the image features corresponding to the image to be detected, using a pre-built second detection model to detect the target object and its key points, and determining the position information of the target object on the image to be detected and the key point position information of the target object.
  • the second detection model is used to detect the position information of the target object on the image to be detected and the key point position information of the target object according to the image features corresponding to the image to be detected.
  • The second detection model includes a second object position detection network layer, a second key point position detection network layer, and an object key point position determination network layer; the output result of the second object position detection network layer and the output result of the second key point position detection network layer are the input data of the object key point position determination network layer.
  • Among them, the second object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; the second key point position detection network layer is used to determine the key point position information in each image subregion of the image to be detected according to the image features corresponding to the image to be detected; and the object key point position determination network layer is used to determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected.
  • the input data "image features corresponding to the image to be detected" of the second detection model are the input data of the second object position detection network layer and the input data of the second key point position detection network layer.
  • the output data of the second object position detection network layer and the output data of the second key point position detection network layer are the input data of the object key point position determination network layer.
  • In a face detection scenario, the input data of the second detection model, namely the image features corresponding to the image to be detected, is the input data of the face detection network layer and the input data of the face key point detection network layer; the output data of the face detection network layer and the output data of the face key point detection network layer are the input data of the face key point determination network layer.
  • Among them, the face detection network layer is used to indicate the "second object position detection network layer" when it is applied to the detection of face positions and their key points; the face key point detection network layer is used to indicate the "second key point position detection network layer" when it is applied to face position and key point detection; and the face key point determination network layer is used to indicate the "object key point position determination network layer" when it is applied to face position and key point detection.
  • In a possible implementation, the second detection model is a deep neural network, and the deep neural network is composed of the second object position detection network layer and the second key point position detection network layer.
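  • The two-branch structure can be sketched as follows; this is a toy sketch in which randomly initialized single-layer heads stand in for the trained network layers (all dimensions and weights are assumptions introduced for illustration, not part of the claimed model):

```python
import numpy as np

rng = np.random.default_rng(0)


def dense_relu(x, w, b):
    """One fully connected layer with ReLU, standing in for a deeper head."""
    return np.maximum(0.0, x @ w + b)


# Shared image features (the output of the feature-extraction step) feed both
# the object position head and the key point position head, mirroring the
# structure in which both network layers receive the same input data.
feature_dim, num_keypoints = 16, 5
features = rng.normal(size=(1, feature_dim))

w_obj = rng.normal(size=(feature_dim, 4))                  # box head: (x1, y1, x2, y2)
w_kpt = rng.normal(size=(feature_dim, num_keypoints * 2))  # key point head: (x, y) pairs

object_position = dense_relu(features, w_obj, np.zeros(4))
keypoint_positions = dense_relu(features, w_kpt, np.zeros(num_keypoints * 2))

# Both outputs then become the input of the object key point position
# determination network layer.
print(object_position.shape, keypoint_positions.shape)  # (1, 4) (1, 10)
```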
  • the above is the specific implementation of the method for determining the object and its key points in the image provided by method embodiment 4.
  • In the method for determining an object and its key points in an image provided by this embodiment, the image features corresponding to the image to be detected are first extracted from the image to be detected; then, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected are determined; finally, the key point position information of the target object is determined according to the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected.
  • Since the position information of the target object on the image to be detected can accurately represent the position of the target object in the image to be detected, and the key point position information of the target object can accurately represent the key point positions of the target object in the image to be detected, the method for determining an object and its key points in an image provided by the embodiments of this application can effectively determine the position information of the target object and the position information of its key points from the image to be detected, thereby accurately detecting the object and its key points in the image.
  • In addition, in the embodiment of this application, the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected are determined directly according to the image features corresponding to the image to be detected, and the key point position information of the target object can be determined on that basis, without restoring the position information of the target object on the image to be detected to an object detection image and performing feature extraction on the object detection image. Therefore, only one image feature extraction is required in the process of determining the position information of the target object on the image to be detected and the key point position information of the target object, which simplifies the process of determining the object and its key points in the image, improves the efficiency of determining the object and its key points in the image, and reduces the memory loss when determining the object and its key points in the image.
  • In addition, the second detection model can be used to detect the target object and its key points, so as to determine the position information of the target object on the image to be detected and the key point position information of the target object.
  • Because the second detection model is a deep neural network that includes the second object position detection network layer and the second key point position detection network layer, only one deep neural network needs to be trained when training the second detection model, which reduces the memory and time consumption when training the second detection model; moreover, when using the second detection model to detect objects and their key points, only one deep neural network needs to be run, which reduces the memory and time loss when determining objects and their key points.
  • In addition, the second detection model can be trained to improve the detection effect of the second detection model.
  • the embodiment of the present application also provides a training process of a second detection model, which will be explained and described below in conjunction with the accompanying drawings.
  • FIG. 10 is a flowchart of the training process of the second detection model provided in an embodiment of the application.
  • the training process of the second detection model provided in the embodiment of the present application includes steps S101-S105:
  • S101 Obtain image features corresponding to the training image, actual position information of the target object on the training image, and actual key point position information of the target object.
  • It should be noted that the content of step S101 is the same as that of step S51 in the third method embodiment; for the sake of brevity, the details are not repeated here. For the technical details of step S101, please refer to step S51 in the third method embodiment.
  • S102 Use the second detection model to detect the target object and its key points according to the image features corresponding to the training image, and determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object.
  • Among them, the predicted position information of the target object on the image to be detected is used to record information about the predicted position of the target object on the image to be detected obtained by using the second detection model.
  • the predicted key point position information of the target object is used to record the information about the predicted key point position of the target object in the image to be detected obtained by using the second detection model.
  • In the embodiment of this application, the image features corresponding to the training image are input into the second detection model, so that the second detection model detects the target object and its key points according to the image features corresponding to the training image, determines the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object, and outputs the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object.
  • S103 According to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, determine whether the second detection model meets the second preset condition; if so, execute step S105; if not, execute step S104.
  • the second preset condition is used to record the training cut-off condition information of the training process of the second detection model; moreover, the second preset condition may be preset or set according to the application scenario.
  • the second preset condition may specifically include: the rate of change of the detection loss of the second detection model is lower than the second preset loss threshold, and/or the number of training rounds of the second detection model reaches the second preset number of rounds threshold .
  • In the embodiment of this application, the predicted position information of the object and its key points detected by the second detection model can be compared with the actual position information of the object and its key points, so as to determine, according to the comparison result, whether the prediction of the second detection model is accurate. Specifically, whether the second detection model meets the second preset condition is determined according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object.
  • If the second detection model meets the second preset condition, it indicates that the predicted position information of the object and its key points detected by the second detection model is very close to the actual position information of the object and its key points, which indicates that the prediction accuracy of the second detection model is high and there is no need to continue training the second detection model; at this time, the training process of the second detection model can be ended.
  • If the second detection model does not meet the second preset condition, it means that the difference between the predicted position information of the object and its key points detected by the second detection model and the actual position information of the object and its key points is large, which indicates that the prediction accuracy of the second detection model is low and the second detection model needs further training; at this time, the second detection model needs to be updated to improve the prediction accuracy of the updated second detection model.
  • In a possible implementation, when the second preset condition includes that the rate of change of the detection loss of the second detection model is lower than the second preset loss threshold, step S103 may specifically include steps S103A1-S103A7:
  • S103A1 Determine the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected.
  • the detection loss of the target object is used to record the loss caused by the second detection model in the process of acquiring the position information of the target object on the image to be detected; moreover, the detection loss of the target object includes the recognition loss of the target object (L- softmax) and the position detection loss of the target object (L-regression).
  • the recognition loss of the target object is used to record the loss caused by the second detection model when recognizing the target object in the image. That is, the recognition loss of the target object is used to record whether the second detection model can recognize the target object from the image.
  • The recognition loss of the target object may include the loss caused by failing to recognize the target object in the image, the loss caused by recognizing other objects in the image as the target object, the loss caused by recognizing the target object in the image to be detected as another object, and so on.
  • The position detection loss of the target object is used to record the loss caused by the second detection model in the process of determining the position information of the target object on the image to be detected; moreover, the position detection loss of the target object may include the loss caused by inaccurate predicted position information of the target object on the image to be detected.
  • It should be noted that the position detection loss of the target object can usually be calculated with the L2 loss function (l2-loss), that is, by calculating the distance difference between the predicted position coordinates and the actual position coordinates. In the embodiment of this application, the actual position information can be compared with the predicted position information, that is, the distance difference between the actual position information and the predicted position information can be calculated, so as to determine the position detection loss of the target object for the second detection model.
  • It should also be noted that the embodiment of this application does not limit the calculation method of the recognition loss of the target object; any existing or future calculation method that can measure the recognition effect on the target object in an image can be used to determine the recognition loss of the target object.
  • Similarly, the embodiment of this application does not limit the calculation method of the position detection loss of the target object; any existing or future calculation method that can measure the position detection effect on the target object can be used to determine the position detection loss of the target object.
  • S103A2 Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object.
  • The key point position detection loss (loss-dpoint) of the target object is used to record the loss caused by the second detection model in the process of obtaining the key point position information of the target object; moreover, the key point position detection loss of the target object includes the loss caused by inaccurate predicted key point position information of the target object. It should be noted that the key point position detection loss of the target object can usually also be calculated with the L2 loss function (l2-loss), that is, by calculating the distance difference between the predicted position coordinates and the actual position coordinates.
  • the embodiment of the application does not limit the calculation method of the key point position detection loss of the target object, and any existing or future calculation method that can measure the position detection effect of the key point of the target object can be used. To determine the detection loss of the key point position of the target object.
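  • As a concrete illustration of the l2-loss calculation described above, the following sketch assumes a summed-squared-difference form averaged over key points; the text itself does not fix the exact formula, so these function shapes are assumptions for the example:

```python
def l2_loss(predicted, actual):
    """Squared coordinate distance between a predicted and an actual point."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def keypoint_l2_loss(pred_points, true_points):
    """Average l2-loss over a set of predicted/actual (x, y) key points."""
    per_point = [l2_loss(p, t) for p, t in zip(pred_points, true_points)]
    return sum(per_point) / len(per_point)


print(l2_loss((3.0, 4.0), (0.0, 0.0)))             # 25.0 (squared 3-4-5 offset)
print(keypoint_l2_loss([(0.0, 0.0), (3.0, 4.0)],
                       [(0.0, 0.0), (0.0, 0.0)]))  # 12.5
```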
  • S103A3 Determine the detection loss of the second detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object.
  • the loss-detect of the second detection model is used to record the loss caused by the second detection model in the process of acquiring the position information of the target object on the image to be detected and the key point position information of the target object.
  • In the embodiment of this application, a preset loss function can be used to determine the detection loss of the second detection model, and the loss function is:
  • loss-detect = α × L-softmax + β × L-regression + F × γ × loss-dpoint
  • where loss-detect represents the detection loss of the second detection model; α represents the influence weight of the recognition loss of the target object; L-softmax represents the recognition loss of the target object; β represents the influence weight of the position detection loss of the target object; L-regression represents the position detection loss of the target object; F represents the existence identification of the key point position detection loss of the target object, where if the training image includes the target object (that is, a positive sample training image), the F value corresponding to the training image is 1, and if the training image does not include the target object (that is, a negative sample training image), the F value corresponding to the training image is 0; γ represents the influence weight of the key point position detection loss of the target object; and loss-dpoint represents the key point position detection loss of the target object.
  • It should be noted that the value of F is determined according to whether the target object is included in the training image. Specifically, if a training image includes the target object, the key point related information of the target object needs to be considered, so when measuring the detection loss of the second detection model on that training image, the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object all need to be considered; if a training image does not include the target object, there is no need to consider the key point related information of the target object, so when measuring the detection loss of the second detection model on that training image, only the recognition loss of the target object and the position detection loss of the target object need to be considered.
  • The loss function mentioned above jointly supervises the detection effect of the second detection model by using three loss indicators: the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object. Therefore, when the above loss function is used to evaluate the detection effect of the second detection model on objects and their key points, it can effectively be determined whether the current second detection model can accurately detect the object and its key points in the image, thereby improving the detection accuracy of the trained second detection model.
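  • The loss function above can be sketched in code as follows; only the formula structure comes from the text, while the weight values and argument names are placeholders introduced for the example:

```python
def detection_loss(l_softmax, l_regression, loss_dpoint, has_target,
                   alpha=1.0, beta=1.0, gamma=1.0):
    """loss-detect = alpha * L-softmax + beta * L-regression + F * gamma * loss-dpoint.

    F is 1 for a positive sample training image (it contains the target
    object) and 0 for a negative sample, so the key point term only
    contributes for positive samples.
    """
    f = 1 if has_target else 0
    return alpha * l_softmax + beta * l_regression + f * gamma * loss_dpoint


# Positive sample: all three loss indicators contribute.
print(detection_loss(1.0, 2.0, 3.0, has_target=True))   # 6.0
# Negative sample: the key point term is masked out by F = 0.
print(detection_loss(1.0, 2.0, 3.0, has_target=False))  # 3.0
```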
  • S103A4 Determine the detection loss change rate of the second detection model according to the detection loss of the second detection model and the historical detection loss of the second detection model.
  • Among them, the historical detection loss of the second detection model is the detection loss of the second detection model determined during the historical training process. For example, suppose that the second detection model is trained sequentially from the first round of training to the Y-th round of training, where the first round of training is earlier than the second round of training, the second round of training is earlier than the third round of training, ..., the (Y-1)-th round of training is earlier than the Y-th round of training, and the Y-th round of training is the current training process being executed. In this case, the first round of training to the (Y-1)-th round of training are all historical training processes relative to the Y-th round of training; accordingly, the detection losses of the second detection model determined in the first round of training to the (Y-1)-th round of training are all historical detection losses for the Y-th round of training.
  • The detection loss change rate of the second detection model is used to characterize the change in the detection effect of the second detection model over multiple rounds of training; moreover, the smaller the detection loss change rate of the second detection model is, the better the detection effect of the second detection model is, and the larger the detection loss change rate of the second detection model is, the worse the detection effect of the second detection model is.
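  • The text does not fix an exact formula for the detection loss change rate; one plausible definition, the relative change between the latest two rounds, can be sketched as follows (the function name and the infinity fallback are assumptions for the example):

```python
def detection_loss_change_rate(loss_history):
    """Relative change between the current and the previous detection loss.

    `loss_history` lists one detection loss per training round; the last
    entry is the current round and the earlier entries are the historical
    detection losses.
    """
    if len(loss_history) < 2:
        return float("inf")  # not enough history to measure a change yet
    previous, current = loss_history[-2], loss_history[-1]
    return abs(current - previous) / previous


print(detection_loss_change_rate([10.0, 5.0]))       # 0.5
print(detection_loss_change_rate([10.0, 5.0, 4.9]))  # ~0.02: near a stable model
```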
  • S103A5 Determine whether the detection loss change rate of the second detection model is lower than the second preset loss threshold, if yes, execute step S103A6; if not, execute step S103A7.
  • Among them, the second preset loss threshold can be set in advance, and in particular can be set according to the application scenario.
  • In the embodiment of this application, the detection loss change rate of the second detection model is determined based on the detection loss of the second detection model. At this time, if the detection loss change rate of the second detection model is lower than the second preset loss threshold, it means that the detection loss of the second detection model changes little and the model is close to a stable optimal model, thereby indicating that the accuracy of the objects and their key points in the image obtained by detection with the second detection model is relatively high; it is therefore determined that the second detection model has reached the second preset condition, and it can then be determined that the second detection model does not need to be trained again. However, if the detection loss change rate of the second detection model is not lower than the second preset loss threshold, it means that the detection loss of the second detection model changes greatly and the model is still far from a stable optimal model, thereby indicating that the accuracy of the objects and their key points in the image detected by the second detection model is low; it is therefore determined that the second detection model does not meet the second preset condition, and it can be determined that the second detection model still needs to be trained again.
  • S103A6 Determine that the second detection model meets a second preset condition.
  • S103A7 Determine that the second detection model does not meet the second preset condition.
  • step S103A1 and step S103A2 can be executed in sequence, step S103A2 and step S103A1 can be executed in sequence, or step S103A1 and step S103A2 can be executed simultaneously.
  • The above is an embodiment of step S103 that uses the detection loss change rate of the second detection model being lower than the second preset loss threshold as the training cut-off condition of the second detection model.
  • In another possible implementation, step S103 may specifically include steps S103B1-S103B7:
  • S103B1 Determine the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected.
  • the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object.
  • It should be noted that step S103B1 is the same as step S103A1 described above; for the sake of brevity, it will not be repeated here. For the technical details of step S103B1, please refer to the content of step S103A1 described above.
  • S103B2 Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object.
  • step S103B2 is the same as the content of step S103A2 described above. For the sake of brevity, details are not repeated here. For technical details of step S103B2, please refer to the content of step S103A2 described above.
  • S103B3 Determine the detection loss of the second detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object.
  • It should be noted that step S103B3 is the same as step S103A3 described above; for the sake of brevity, it will not be repeated here. For the technical details of step S103B3, please refer to the content of step S103A3 described above.
  • S103B4 Determine the detection loss change rate of the second detection model according to the detection loss of the second detection model and the historical detection loss of the second detection model.
  • the historical detection loss of the second detection model is the detection loss of the second detection model determined during the historical training process.
• The content of step S103B4 is the same as that of step S103A4 described above; for the sake of brevity, it is not repeated here. For technical details of step S103B4, please refer to step S103A4 described above.
• S103B5: Determine whether at least one of the following two conditions is met: the detection loss change rate of the second detection model is lower than the second preset loss threshold, or the number of training rounds of the second detection model reaches the second preset round number threshold. If yes, execute step S103B6; if not, execute step S103B7.
• It should be noted that the embodiment of this application does not limit the execution order of the action "judging whether the detection loss change rate of the second detection model is lower than the second preset loss threshold" and the action "judging whether the number of training rounds of the second detection model reaches the second preset round number threshold". As long as it is determined that the detection loss change rate of the second detection model is lower than the second preset loss threshold, step S103B6 is executed directly, regardless of whether the round-number judgment has been performed; likewise, as long as it is determined that the number of training rounds of the second detection model reaches the second preset round number threshold, step S103B6 is executed directly, regardless of whether the loss-change-rate judgment has been performed. Step S103B7 is executed only when it is determined both that the detection loss change rate of the second detection model is not lower than the second preset loss threshold and that the number of training rounds of the second detection model does not reach the second preset round number threshold.
  • S103B6 Determine that the second detection model meets a second preset condition.
• It should be noted that step S103B1 and step S103B2 may be executed in either order or simultaneously: step S103B1 may be executed before step S103B2, step S103B2 may be executed before step S103B1, or the two steps may be executed at the same time.
• The above is another implementation manner of step S103.
• This implementation uses the detection loss change rate of the second detection model being lower than the second preset loss threshold, or the number of training rounds of the second detection model reaching the second preset round number threshold, as the training cut-off condition of the second detection model.
• The above is the relevant content of step S103.
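The cut-off judgment of steps S103B4 to S103B6 can be sketched as follows. This is only an illustrative sketch: the patent does not fix how the change rate is computed, and the helper names and threshold values below are assumptions.

```python
def loss_change_rate(current_loss, historical_loss):
    """Relative change between the current detection loss of the second
    detection model and its historical detection loss (the detection loss
    recorded in a previous training round)."""
    if historical_loss == 0:
        return float("inf")
    return abs(current_loss - historical_loss) / historical_loss


def training_cut_off(current_loss, historical_loss, rounds,
                     loss_threshold=1e-3, round_threshold=100):
    """S103B5: stop training when EITHER the detection-loss change rate is
    lower than the preset loss threshold OR the number of training rounds
    reaches the preset round-number threshold."""
    return (loss_change_rate(current_loss, historical_loss) < loss_threshold
            or rounds >= round_threshold)
```

Because the two judgments are combined with a logical OR, their evaluation order does not matter, which matches the note above that the embodiment does not limit the execution order of the two actions.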
• In the embodiments of the present application, the difference between the predicted position information of the target object on the image to be detected, as detected by the second detection model, and the actual position information of the target object on the training image, together with the difference between the predicted key point position information of the target object and the actual key point position information of the target object, may be used to update the second detection model. In this way, the predicted position information of the target object on the image to be detected output by the updated second detection model is closer to the actual position information of the target object on the training image, and the predicted key point position information of the target object is closer to the actual key point position information of the target object, so that the updated second detection model can more accurately detect objects and their key points in the image.
• In the embodiments of the present application, after the image features corresponding to the training image, the actual position information of the target object on the training image, and the actual key point position information of the target object are acquired, the second detection model is first used to detect the target object and its key points according to the image features corresponding to the training image, determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object. Then, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, it is judged whether the second detection model meets the second preset condition, so as to determine, according to the judgment result, whether to continue training the second detection model.
• It can be seen that the embodiment of this application evaluates the detection effect of the second detection model based on four pieces of information: the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object. This makes the evaluation result more reasonable and effective, so that the second detection model that satisfies the second preset condition has a better detection effect.
• In addition, the embodiments of the present application also evaluate the detection effect of the second detection model using three loss factors: the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object. This likewise makes the evaluation result more reasonable and effective, so that the second detection model that meets the second preset condition has a better detection effect.
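As a concrete illustration of combining the three loss factors, the detection loss of the model could be formed as a weighted sum. The weighted-sum form and the weight values are assumptions for illustration; the text only states that all three factors enter the evaluation.

```python
def model_detection_loss(recognition_loss, position_loss, keypoint_loss,
                         weights=(1.0, 1.0, 1.0)):
    """Combine the recognition loss of the target object, the position
    detection loss of the target object, and the key point position
    detection loss of the target object into a single detection loss for
    the model (illustrative weighted sum)."""
    w_rec, w_pos, w_kpt = weights
    return (w_rec * recognition_loss
            + w_pos * position_loss
            + w_kpt * keypoint_loss)
```

In practice the weights would be tuned to balance the three loss terms against one another.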
• It should be noted that the execution subject of the "method for determining objects and their key points in an image" and the execution subject of the "training process of the detection model" may be the same execution subject or different execution subjects; the embodiment of this application does not specifically limit this.
  • the "method for determining objects in images and their key points” can be executed by terminals, servers, vehicles, and other devices that can execute “methods for determining objects in images and their key points”.
  • the execution subject of the "training process of the detection model” can be terminals, servers, vehicles, and other devices capable of executing the "training process of the detection model”.
• Based on the method for determining an object and its key points in an image provided by the foregoing method embodiments, an embodiment of the present application also provides an apparatus for determining objects and their key points in an image, which is explained and described below with reference to the accompanying drawings.
  • FIG. 13 is a schematic structural diagram of an apparatus for determining an object and its key points in an image provided by an embodiment of the application.
  • the device 130 for determining an object and its key points in an image includes:
  • the extraction unit 131 is configured to perform feature extraction on the image to be detected to obtain image features corresponding to the image to be detected;
• the determining unit 132 is configured to determine the position information of the target object on the image to be detected and the key point position information of the target object according to the image features corresponding to the image to be detected; wherein the key points of the target object are used to characterize the structural features of the target object.
• In a possible implementation, the determining unit 132 is specifically configured to: determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and determine the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
  • the determining unit 132 is specifically configured to:
• According to the image features corresponding to the image to be detected, use a pre-built first detection model to detect the target object and its key points, and determine the position information of the target object on the image to be detected and the key point position information of the target object;
• wherein the first detection model includes a first object position detection network layer and a first key point position detection network layer; the output result of the first object position detection network layer is the input data of the first key point position detection network layer; the first object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; and the first key point position detection network layer is used to determine the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
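The cascaded data flow of the first detection model (the object-position layer's output feeding the key-point layer together with the image features) can be sketched as below. The layer internals here are placeholders invented for illustration; the patent specifies only their inputs and outputs.

```python
def first_object_position_layer(image_features):
    # Placeholder: pretend each feature value yields one bounding box.
    return [{"box": (f, f, f + 10, f + 10)} for f in image_features]


def first_keypoint_layer(image_features, object_positions):
    # Placeholder: derive one key point per detected object, using both
    # the image features and the object position (cascaded input).
    return [(obj["box"][0] + 5, obj["box"][1] + 5) for obj in object_positions]


def first_detection_model(image_features):
    # Cascade: the output result of the object-position layer is the
    # input data of the key-point layer, alongside the image features.
    positions = first_object_position_layer(image_features)
    keypoints = first_keypoint_layer(image_features, positions)
    return positions, keypoints
```

The point of the sketch is the dependency direction: key points are computed only after, and conditioned on, the detected object positions.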
  • the training process of the first detection model specifically includes:
• According to the image features corresponding to the training image, use the first detection model to detect the target object and its key points, and determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object;
• According to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, judge whether the first detection model meets the first preset condition;
• If the first detection model does not meet the first preset condition, update the first detection model, and continue to execute "using the first detection model to detect the target object and its key points according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object".
• In a possible implementation, the first preset condition is: the detection loss change rate of the first detection model is lower than the first preset loss threshold, and/or the number of training rounds of the first detection model reaches the first preset round number threshold.
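The iterative training procedure above (detect, judge the preset condition, update, repeat) can be sketched with a toy model. `SimpleModel`, the squared-error loss, and the bias-nudging update rule are all illustrative assumptions, not the patented training method.

```python
class SimpleModel:
    """Toy one-parameter 'detector': predicts position = feature + bias."""
    def __init__(self, bias=0.0):
        self.bias = bias

    def detect(self, features):
        return [f + self.bias for f in features]


def train(model, features, actual, tol=1e-3, max_rounds=1000):
    # Detect, judge the preset condition, update, and repeat --
    # mirroring the training loop described above.
    for _ in range(max_rounds):
        predicted = model.detect(features)
        loss = sum((p - a) ** 2 for p, a in zip(predicted, actual))
        if loss < tol:  # preset condition met: stop training
            break
        # Naive update: nudge the bias toward reducing the mean error.
        err = sum(p - a for p, a in zip(predicted, actual)) / len(features)
        model.bias -= 0.5 * err
    return model
```

A real implementation would update millions of network weights by gradient descent, but the control flow (predict, evaluate the condition, update, loop) is the same.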
• In a possible implementation, when the first preset condition includes that the detection loss change rate of the first detection model is lower than the first preset loss threshold, the judging whether the first detection model meets the first preset condition specifically includes:
• According to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, the detection loss of the target object is determined; wherein the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
• According to the detection loss of the first detection model and the historical detection loss of the first detection model, the detection loss change rate of the first detection model is determined; wherein the historical detection loss of the first detection model is the detection loss of the first detection model determined during the historical training process;
• If the detection loss change rate of the first detection model is lower than the first preset loss threshold, it is determined that the first detection model meets the first preset condition;
• If the detection loss change rate of the first detection model is not lower than the first preset loss threshold, it is determined that the first detection model does not meet the first preset condition.
• In a possible implementation, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the first detection model meets the first preset condition specifically includes:
• According to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, the detection loss of the target object is determined; wherein the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
• According to the detection loss of the first detection model and the historical detection loss of the first detection model, the detection loss change rate of the first detection model is determined; wherein the historical detection loss of the first detection model is the detection loss of the first detection model determined during the historical training process;
• If the detection loss change rate of the first detection model is lower than the first preset loss threshold, or the number of training rounds of the first detection model reaches the first preset round number threshold, it is determined that the first detection model meets the first preset condition;
• If the detection loss change rate of the first detection model is not lower than the first preset loss threshold, and the number of training rounds of the first detection model does not reach the first preset round number threshold, it is determined that the first detection model does not meet the first preset condition.
  • the determining unit 132 is specifically configured to:
• determine, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information in each image subregion on the image to be detected; and determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image subregion on the image to be detected; wherein the image subregion is the region of interest corresponding to an anchor point.
  • the determining unit 132 is specifically configured to:
• According to the image features corresponding to the image to be detected, use a pre-built second detection model to detect the target object and its key points, and determine the position information of the target object on the image to be detected and the key point position information of the target object;
• wherein the second detection model includes a second object position detection network layer, a second key point position detection network layer, and an object key point position determination network layer; the output results of the second object position detection network layer and the second key point position detection network layer are the input data of the object key point position determination network layer; the second object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; the second key point position detection network layer is used to determine the key point position information in each image subregion on the image to be detected according to the image features corresponding to the image to be detected; and the object key point position determination network layer is used to determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image subregion on the image to be detected.
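The two-branch structure of the second detection model, with the object key point position determination network layer fusing both branch outputs, can be sketched as below. The geometry-based fusion rule (keeping candidate key points that fall inside a detected position box) and all layer internals are illustrative assumptions.

```python
def second_object_position_layer(image_features):
    # Placeholder: one axis-aligned box per feature value.
    return [(f, f, f + 10, f + 10) for f in image_features]


def second_keypoint_layer(image_features):
    # Placeholder: candidate key points per image subregion (the region
    # of interest corresponding to each anchor point), plus one stray
    # candidate that belongs to no object.
    return [(f + 2, f + 2) for f in image_features] + [(100, 100)]


def object_keypoint_determination_layer(positions, subregion_keypoints):
    # Fuse both branch outputs: for each detected object, keep only the
    # candidate key points that fall inside its position box.
    result = []
    for (x0, y0, x1, y1) in positions:
        result.append([(x, y) for (x, y) in subregion_keypoints
                       if x0 <= x <= x1 and y0 <= y <= y1])
    return result


def second_detection_model(image_features):
    # The two detection layers run on the same image features; their
    # outputs are the input data of the determination layer.
    positions = second_object_position_layer(image_features)
    candidates = second_keypoint_layer(image_features)
    return positions, object_keypoint_determination_layer(positions, candidates)
```

Unlike the first detection model's cascade, the two branches here are independent, and only the final determination layer relates key point candidates to detected objects.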
  • the training process of the second detection model specifically includes:
• According to the image features corresponding to the training image, use the second detection model to detect the target object and its key points, and determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object;
• According to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, judge whether the second detection model meets the second preset condition;
• If the second detection model does not meet the second preset condition, update the second detection model, and continue to execute "using the second detection model to detect the target object and its key points according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object".
• In a possible implementation, the second preset condition is: the detection loss change rate of the second detection model is lower than the second preset loss threshold, and/or the number of training rounds of the second detection model reaches the second preset round number threshold.
• In a possible implementation, when the second preset condition includes that the detection loss change rate of the second detection model is lower than the second preset loss threshold, the judging whether the second detection model meets the second preset condition specifically includes:
• According to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, the detection loss of the target object is determined; wherein the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
• According to the detection loss of the second detection model and the historical detection loss of the second detection model, the detection loss change rate of the second detection model is determined; wherein the historical detection loss of the second detection model is the detection loss of the second detection model determined during the historical training process;
• If the detection loss change rate of the second detection model is lower than the second preset loss threshold, it is determined that the second detection model meets the second preset condition;
• If the detection loss change rate of the second detection model is not lower than the second preset loss threshold, it is determined that the second detection model does not meet the second preset condition.
• In a possible implementation, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the second detection model meets the second preset condition specifically includes:
• According to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, the detection loss of the target object is determined; wherein the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
• According to the detection loss of the second detection model and the historical detection loss of the second detection model, the detection loss change rate of the second detection model is determined; wherein the historical detection loss of the second detection model is the detection loss of the second detection model determined during the historical training process;
• If the detection loss change rate of the second detection model is lower than the second preset loss threshold, or the number of training rounds of the second detection model reaches the second preset round number threshold, it is determined that the second detection model meets the second preset condition;
• If the detection loss change rate of the second detection model is not lower than the second preset loss threshold, and the number of training rounds of the second detection model does not reach the second preset round number threshold, it is determined that the second detection model does not meet the second preset condition.
  • the above is the specific implementation of the device 130 for determining the object and its key points in the image provided by the device embodiment.
• In the embodiments of the present application, after the image features corresponding to the image to be detected are extracted from the image to be detected, the position information of the target object on the image to be detected and the key point position information of the target object are determined directly based on the image features corresponding to the image to be detected.
• Because the position information of the target object on the image to be detected can accurately represent the position of the target object in the image to be detected, and the key point position information of the target object can accurately represent the key point positions of the target object in the image to be detected, the method for determining an object and its key points in an image provided by the embodiments of the application can effectively determine the position information of the target object and the position information of its key points from the image to be detected, so as to accurately detect the object in the image and its key points.
  • the location information of the target object on the image to be detected and the key point location information of the target object can be determined directly based on the image feature corresponding to the image to be detected.
  • an embodiment of the present application also provides a device, which will be explained and described below with reference to the accompanying drawings.
  • FIG. 14 is a schematic diagram of the device structure provided by an embodiment of the application.
  • the device 140 provided in the embodiment of the present application includes a processor 141 and a memory 142:
  • the memory 142 is used to store computer programs
  • the processor 141 is configured to execute any implementation manner of the method for determining an object and its key points in an image provided by the foregoing method embodiment according to the computer program. In other words, the processor 141 is configured to perform the following steps:
• According to the image features corresponding to the image to be detected, determine the position information of the target object on the image to be detected and the key point position information of the target object; wherein the key points of the target object are used to characterize the structural features of the target object.
  • the determining the position information of the target object on the image to be detected and the key point position information of the target object according to the image feature corresponding to the image to be detected is specifically:
• The position information of the target object on the image to be detected is determined according to the image features corresponding to the image to be detected, and the key point position information of the target object is determined according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected. In a possible implementation, this specifically includes:
• According to the image features corresponding to the image to be detected, use a pre-built first detection model to detect the target object and its key points, and determine the position information of the target object on the image to be detected and the key point position information of the target object;
• wherein the first detection model includes a first object position detection network layer and a first key point position detection network layer; the output result of the first object position detection network layer is the input data of the first key point position detection network layer; the first object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; and the first key point position detection network layer is used to determine the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
  • the training process of the first detection model specifically includes:
• According to the image features corresponding to the training image, use the first detection model to detect the target object and its key points, and determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object;
• According to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, judge whether the first detection model meets the first preset condition;
• If the first detection model does not meet the first preset condition, update the first detection model, and continue to execute "using the first detection model to detect the target object and its key points according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object".
  • the first preset condition is: the rate of change of the detection loss of the first detection model is lower than a first preset loss threshold, and/or the number of training rounds of the first detection model reaches the first preset round Number threshold.
• In a possible implementation, when the first preset condition includes that the detection loss change rate of the first detection model is lower than the first preset loss threshold, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the first detection model meets the first preset condition specifically includes:
• According to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, the detection loss of the target object is determined; wherein the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
• According to the detection loss of the first detection model and the historical detection loss of the first detection model, the detection loss change rate of the first detection model is determined; wherein the historical detection loss of the first detection model is the detection loss of the first detection model determined during the historical training process;
• If the detection loss change rate of the first detection model is lower than the first preset loss threshold, it is determined that the first detection model meets the first preset condition;
• If the detection loss change rate of the first detection model is not lower than the first preset loss threshold, it is determined that the first detection model does not meet the first preset condition.
• In a possible implementation, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the first detection model meets the first preset condition specifically includes:
• According to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, the detection loss of the target object is determined; wherein the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
• According to the detection loss of the first detection model and the historical detection loss of the first detection model, the detection loss change rate of the first detection model is determined; wherein the historical detection loss of the first detection model is the detection loss of the first detection model determined during the historical training process;
• If the detection loss change rate of the first detection model is lower than the first preset loss threshold, or the number of training rounds of the first detection model reaches the first preset round number threshold, it is determined that the first detection model meets the first preset condition;
• If the detection loss change rate of the first detection model is not lower than the first preset loss threshold, and the number of training rounds of the first detection model does not reach the first preset round number threshold, it is determined that the first detection model does not meet the first preset condition.
  • the determining the position information of the target object on the image to be detected and the key point position information of the target object according to the image feature corresponding to the image to be detected is specifically:
• determine, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information in each image subregion on the image to be detected; and determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image subregion on the image to be detected; wherein the image subregion is the region of interest corresponding to an anchor point.
• In a possible implementation, the determining, according to the image features corresponding to the image to be detected, of the position information of the target object on the image to be detected and the key point position information in each image subregion on the image to be detected, and the determining of the key point position information of the target object based on the position information of the target object on the image to be detected and the key point position information in each image subregion on the image to be detected, is specifically:
• According to the image features corresponding to the image to be detected, use a pre-built second detection model to detect the target object and its key points, and determine the position information of the target object on the image to be detected and the key point position information of the target object;
• wherein the second detection model includes a second object position detection network layer, a second key point position detection network layer, and an object key point position determination network layer; the output results of the second object position detection network layer and the second key point position detection network layer are the input data of the object key point position determination network layer; the second object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; the second key point position detection network layer is used to determine the key point position information in each image subregion on the image to be detected according to the image features corresponding to the image to be detected; and the object key point position determination network layer is used to determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image subregion on the image to be detected.
  • the training process of the second detection model specifically includes:
• According to the image features corresponding to the training image, use the second detection model to detect the target object and its key points, and determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object;
• According to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, judge whether the second detection model meets the second preset condition;
• If the second detection model does not meet the second preset condition, update the second detection model, and continue to execute "using the second detection model to detect the target object and its key points according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object".
  • the second preset condition is: the rate of change of the detection loss of the second detection model is lower than a second preset loss threshold, and/or the number of training rounds of the second detection model reaches the second preset round Number threshold.
• In a possible implementation, when the second preset condition includes that the detection loss change rate of the second detection model is lower than the second preset loss threshold, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the second detection model meets the second preset condition specifically includes:
  • the detection loss of the target object is determined according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
  • the key point position detection loss of the target object is determined according to the actual key point position information of the target object and the predicted key point position information of the target object;
  • the detection loss of the second detection model is determined according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
  • the detection loss change rate of the second detection model is determined according to the detection loss of the second detection model and the historical detection loss of the second detection model; the historical detection loss of the second detection model is the detection loss of the second detection model determined in the historical training process;
  • if the detection loss change rate of the second detection model is lower than the second preset loss threshold, it is determined that the second detection model meets the second preset condition;
  • if the detection loss change rate of the second detection model is not lower than the second preset loss threshold, it is determined that the second detection model does not meet the second preset condition.
  • the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the second detection model meets the second preset condition specifically includes:
  • the detection loss of the target object is determined according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
  • the key point position detection loss of the target object is determined according to the actual key point position information of the target object and the predicted key point position information of the target object;
  • the detection loss of the second detection model is determined according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
  • the detection loss change rate of the second detection model is determined according to the detection loss of the second detection model and the historical detection loss of the second detection model; the historical detection loss of the second detection model is the detection loss of the second detection model determined in the historical training process;
  • if the detection loss change rate of the second detection model is lower than the second preset loss threshold, or the number of training rounds of the second detection model reaches the second preset round-number threshold, it is determined that the second detection model meets the second preset condition;
  • if the detection loss change rate of the second detection model is not lower than the second preset loss threshold, and the number of training rounds of the second detection model does not reach the second preset round-number threshold, it is determined that the second detection model does not meet the second preset condition.
  • an embodiment of the present application also provides a computer-readable storage medium.
  • the embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium is used to store a computer program, and the computer program is used to execute any embodiment of the method for determining an object and its key points in an image provided by the foregoing method embodiments. That is, the computer program is used to perform the following steps:
  • according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information of the target object are determined; wherein the key points of the target object are used to characterize the structural features of the target object.
  • the determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information of the target object is specifically:
  • the position information of the target object on the image to be detected is determined according to the image features corresponding to the image to be detected, and the key point position information of the target object is determined according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected, which specifically includes:
  • according to the image features corresponding to the image to be detected, a pre-built first detection model is used to detect the target object and its key points, and the position information of the target object on the image to be detected and the key point position information of the target object are determined;
  • the first detection model includes a first object position detection network layer and a first key point position detection network layer;
  • the output result of the first object position detection network layer is the input data of the first key point position detection network layer;
  • the first object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected;
  • the first key point position detection network layer is used to determine the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
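The cascade the items above describe — the object position layer's output feeding the key point layer along with the shared image features — can be sketched as follows. This is a minimal illustration, not the patented implementation: each "layer" here is a placeholder function standing in for a real network layer, and the (x, y, w, h) box format and the fixed key point offsets are assumptions introduced for the example.

```python
# Minimal sketch of the cascaded first detection model. Each "layer" is a
# placeholder function; a real model would use learned network layers.

def first_object_position_layer(features):
    # Placeholder: predict a bounding box (x, y, w, h) from the image features.
    cx, cy = features["center"]  # assumed precomputed feature for this sketch
    return (cx - 10, cy - 10, 20, 20)

def first_keypoint_layer(features, box):
    # Placeholder: predict key points conditioned on the detected object
    # position, as the cascade above requires.
    x, y, w, h = box
    offsets = [(0.25, 0.5), (0.75, 0.5), (0.5, 0.75)]  # assumed structural points
    return [(x + w * fx, y + h * fy) for fx, fy in offsets]

def first_detection_model(features):
    box = first_object_position_layer(features)   # object position on the image
    keypoints = first_keypoint_layer(features, box)  # input includes that position
    return box, keypoints

box, kps = first_detection_model({"center": (50, 40)})
```

The point of the structure is the data flow: the key point layer receives the object position as input rather than predicting key points from the raw features alone.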
  • the training process of the first detection model specifically includes:
  • according to the image features corresponding to the training image, the first detection model is used to detect the target object and its key points, and the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object are determined;
  • according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, it is judged whether the first detection model meets a first preset condition;
  • if it is determined that the first detection model does not meet the first preset condition, the first detection model is updated, and the step of "using the first detection model to detect the target object and its key points according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object" continues to be executed.
  • the first preset condition is: the rate of change of the detection loss of the first detection model is lower than a first preset loss threshold, and/or the number of training rounds of the first detection model reaches a first preset round-number threshold.
  • when the first preset condition includes that the rate of change of the detection loss of the first detection model is lower than the first preset loss threshold, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the first detection model meets the first preset condition specifically includes:
  • the detection loss of the target object is determined according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
  • the key point position detection loss of the target object is determined according to the actual key point position information of the target object and the predicted key point position information of the target object;
  • the detection loss of the first detection model is determined according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
  • the detection loss change rate of the first detection model is determined according to the detection loss of the first detection model and the historical detection loss of the first detection model; the historical detection loss of the first detection model is the detection loss of the first detection model determined in the historical training process;
  • if the detection loss change rate of the first detection model is lower than the first preset loss threshold, it is determined that the first detection model meets the first preset condition;
  • if the detection loss change rate of the first detection model is not lower than the first preset loss threshold, it is determined that the first detection model does not meet the first preset condition.
  • the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the first detection model meets the first preset condition specifically includes:
  • the detection loss of the target object is determined according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
  • the key point position detection loss of the target object is determined according to the actual key point position information of the target object and the predicted key point position information of the target object;
  • the detection loss of the first detection model is determined according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
  • the detection loss change rate of the first detection model is determined according to the detection loss of the first detection model and the historical detection loss of the first detection model; the historical detection loss of the first detection model is the detection loss of the first detection model determined in the historical training process;
  • if the detection loss change rate of the first detection model is lower than the first preset loss threshold, or the number of training rounds of the first detection model reaches the first preset round-number threshold, it is determined that the first detection model meets the first preset condition;
  • if the detection loss change rate of the first detection model is not lower than the first preset loss threshold, and the number of training rounds of the first detection model does not reach the first preset round-number threshold, it is determined that the first detection model does not meet the first preset condition.
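The stopping rule described in the items above (the detection-loss change rate falls below a preset threshold, or the number of training rounds reaches a preset threshold) can be sketched as a small helper. The relative change-rate formula used here is an assumption — the text does not define how the rate is computed from the current and historical detection losses.

```python
def meets_preset_condition(loss_history, rounds, loss_threshold, round_threshold):
    # Condition 1: the number of training rounds reaches the preset threshold.
    if rounds >= round_threshold:
        return True
    # Condition 2: the detection-loss change rate — assumed here to be the
    # relative change between the current loss and the previous (historical)
    # loss — falls below the preset loss threshold.
    if len(loss_history) >= 2 and loss_history[-2] != 0:
        change_rate = abs(loss_history[-1] - loss_history[-2]) / abs(loss_history[-2])
        return change_rate < loss_threshold
    return False
```

Training would repeat the detect-and-compare step, updating the model each round, until this helper returns True.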
  • the determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information of the target object is specifically:
  • according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information in each image subregion on the image to be detected are determined, and the key point position information of the target object is determined according to the position information of the target object on the image to be detected and the key point position information in each image subregion on the image to be detected; each image subregion is a region of interest corresponding to an anchor point.
  • the determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information in each image subregion on the image to be detected, and determining the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image subregion on the image to be detected is specifically:
  • according to the image features corresponding to the image to be detected, a pre-built second detection model is used to detect the target object and its key points, and the position information of the target object on the image to be detected and the key point position information of the target object are determined;
  • the second detection model includes a second object position detection network layer, a second key point position detection network layer, and an object key point position determination network layer; the output results of the second object position detection network layer and the second key point position detection network layer are the input data of the object key point position determination network layer; the second object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; the second key point position detection network layer is used to determine the position information of the key points in each image subregion on the image to be detected according to the image features corresponding to the image to be detected; and the object key point position determination network layer is used to determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected.
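The parallel structure just described — an object position branch and a per-subregion key point branch feeding a determination layer — can be sketched as below. The fusion rule shown (keep only the key points that fall inside the detected object box) is an illustrative assumption; the source does not commit to a specific determination rule, and the branch functions are placeholders rather than real network layers.

```python
def second_object_position_layer(features):
    # Placeholder branch: predicts the target object's box (x, y, w, h).
    return features["assumed_box"]

def second_keypoint_layer(features):
    # Placeholder branch: predicts key points per image subregion, i.e. per
    # region of interest corresponding to an anchor point.
    return features["assumed_roi_keypoints"]  # {roi_id: [(x, y), ...]}

def object_keypoint_determination_layer(box, roi_keypoints):
    # Fuse the two branches: keep the key points lying inside the detected box.
    x, y, w, h = box
    return [(px, py)
            for kps in roi_keypoints.values()
            for px, py in kps
            if x <= px <= x + w and y <= py <= y + h]

features = {
    "assumed_box": (10, 10, 20, 20),
    "assumed_roi_keypoints": {0: [(12, 14), (50, 50)], 1: [(25, 28)]},
}
box = second_object_position_layer(features)
kps = object_keypoint_determination_layer(box, second_keypoint_layer(features))
```

Unlike the cascaded first model, both branches here consume the image features independently, and the object position only enters at the final determination step.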
  • the training process of the second detection model specifically includes:
  • according to the image features corresponding to the training image, the second detection model is used to detect the target object and its key points, and the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object are determined;
  • according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, it is judged whether the second detection model meets a second preset condition;
  • if it is determined that the second detection model does not meet the second preset condition, the second detection model is updated, and the step of "using the second detection model to detect the target object and its key points according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object" continues to be executed.
  • the second preset condition is: the rate of change of the detection loss of the second detection model is lower than a second preset loss threshold, and/or the number of training rounds of the second detection model reaches a second preset round-number threshold.
  • when the second preset condition includes that the rate of change of the detection loss of the second detection model is lower than the second preset loss threshold, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the second detection model meets the second preset condition specifically includes:
  • the detection loss of the target object is determined according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
  • the key point position detection loss of the target object is determined according to the actual key point position information of the target object and the predicted key point position information of the target object;
  • the detection loss of the second detection model is determined according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
  • the detection loss change rate of the second detection model is determined according to the detection loss of the second detection model and the historical detection loss of the second detection model; the historical detection loss of the second detection model is the detection loss of the second detection model determined in the historical training process;
  • if the detection loss change rate of the second detection model is lower than the second preset loss threshold, it is determined that the second detection model meets the second preset condition;
  • if the detection loss change rate of the second detection model is not lower than the second preset loss threshold, it is determined that the second detection model does not meet the second preset condition.
  • the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the second detection model meets the second preset condition specifically includes:
  • the detection loss of the target object is determined according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
  • the key point position detection loss of the target object is determined according to the actual key point position information of the target object and the predicted key point position information of the target object;
  • the detection loss of the second detection model is determined according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
  • the detection loss change rate of the second detection model is determined according to the detection loss of the second detection model and the historical detection loss of the second detection model; the historical detection loss of the second detection model is the detection loss of the second detection model determined in the historical training process;
  • if the detection loss change rate of the second detection model is lower than the second preset loss threshold, or the number of training rounds of the second detection model reaches the second preset round-number threshold, it is determined that the second detection model meets the second preset condition;
  • if the detection loss change rate of the second detection model is not lower than the second preset loss threshold, and the number of training rounds of the second detection model does not reach the second preset round-number threshold, it is determined that the second detection model does not meet the second preset condition.
  • "At least one (item)" refers to one or more, and "multiple" refers to two or more.
  • "And/or" is used to describe the association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
  • "at least one of the following items" or a similar expression refers to any combination of these items, including any combination of a single item or multiple items.
  • "at least one of a, b, or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.


Abstract

A method and apparatus for determining an object and key points thereof in an image. After image features corresponding to an image to be detected are extracted from said image, position information of a target object on said image and position information of key points of the target object are directly determined according to the image features corresponding to said image. The position information of the target object on said image can accurately represent the position of the target object in said image, and the position information of the key points of the target object can accurately represent the positions of the key points of the target object in said image. Therefore, according to the method for determining an object and key points thereof in an image provided by the present invention, the position information of the target object and the position information of the key points thereof can be effectively determined from the image to be detected, and the object and the key points thereof in the image can thus be accurately detected.

Description

Method and device for determining objects and their key points in images
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 9, 2019, with application number CN201910954556.7 and entitled "Method and device for determining an object and its key points in an image", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the field of image processing technology, and in particular to a method and device for determining objects and their key points in an image.
Background
With the popularization of vehicles, driving safety has become increasingly important. Driver fatigue, in particular, is an important factor affecting driving safety.
At present, a vehicle's fatigue-driving detection system is usually used to determine whether the driver is in a fatigued driving state, so that corresponding reminder measures can be taken for the driver when it is determined that the driver is fatigued. In order to effectively determine whether the driver is in a fatigued driving state, face detection and face key point detection need to be performed on an image that includes the driver's face, so that the face detection result and the face key point detection result can subsequently be used to determine the driver's mental state.
The above analysis shows that, in order to improve driving safety, the face and its key points in an image need to be detected accurately, so that the driver's mental state can be accurately judged based on the detected face and key points. However, how to accurately detect an object (for example, a face) and its key points in an image is a technical problem that urgently needs to be solved.
Summary
In order to solve the above technical problems in the prior art, this application provides a method and device for determining an object and its key points in an image, which can effectively determine the position of an object and the positions of its key points in the image, so that the object and its key points can be detected accurately.
To achieve the foregoing objectives, the technical solutions provided by the embodiments of this application are as follows:
An embodiment of this application provides a method for determining an object and its key points in an image, including:
performing feature extraction on an image to be detected to obtain image features corresponding to the image to be detected;
determining, according to the image features corresponding to the image to be detected, the position information of a target object on the image to be detected and the key point position information of the target object; wherein the key points of the target object are used to characterize the structural features of the target object.
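As a toy stand-in for the feature extraction step above, the sketch below applies a 3×3 mean filter to a grayscale image given as a list of lists. An actual implementation would use a learned convolutional backbone, so both the filter choice and the image format are assumptions made only for illustration.

```python
def extract_features(image):
    # Toy feature extraction: a 3x3 mean filter over a 2-D grayscale image.
    # Stands in for the convolutional backbone an actual system would use.
    h, w = len(image), len(image[0])
    features = [[0.0] * (w - 2) for _ in range(h - 2)]
    for i in range(h - 2):
        for j in range(w - 2):
            window = [image[i + di][j + dj] for di in range(3) for dj in range(3)]
            features[i][j] = sum(window) / 9.0
    return features

feats = extract_features([[9, 9, 9], [9, 0, 9], [9, 9, 9]])
```

The resulting feature map is what the subsequent detection steps consume in place of the raw image.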
Optionally, the determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information of the target object is specifically:
determining the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and determining the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
Optionally, the determining the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and determining the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected specifically includes:
detecting the target object and its key points by using a pre-built first detection model according to the image features corresponding to the image to be detected, and determining the position information of the target object on the image to be detected and the key point position information of the target object;
wherein the first detection model includes a first object position detection network layer and a first key point position detection network layer; the output result of the first object position detection network layer is the input data of the first key point position detection network layer; the first object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; and the first key point position detection network layer is used to determine the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
Optionally, the training process of the first detection model specifically includes:
obtaining the image features corresponding to a training image, the actual position information of the target object on the training image, and the actual key point position information of the target object;
detecting the target object and its key points by using the first detection model according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object;
judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the first detection model meets a first preset condition;
if it is determined that the first detection model does not meet the first preset condition, updating the first detection model, and continuing to execute "detecting the target object and its key points by using the first detection model according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object".
Optionally, the first preset condition is: the rate of change of the detection loss of the first detection model is lower than a first preset loss threshold, and/or the number of training rounds of the first detection model reaches a first preset round-number threshold.
Optionally, when the first preset condition includes that the rate of change of the detection loss of the first detection model is lower than the first preset loss threshold, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the first detection model meets the first preset condition specifically includes:
determining the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; wherein the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
determining the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
determining the detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
determining the detection loss change rate of the first detection model according to the detection loss of the first detection model and the historical detection loss of the first detection model; the historical detection loss of the first detection model is the detection loss of the first detection model determined in the historical training process;
if the detection loss change rate of the first detection model is lower than the first preset loss threshold, determining that the first detection model meets the first preset condition;
if the detection loss change rate of the first detection model is not lower than the first preset loss threshold, determining that the first detection model does not meet the first preset condition.
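The loss composition described above — the model's detection loss built from the recognition loss, the position detection loss, and the key point position detection loss — can be sketched as a weighted sum. The equal default weights and the squared-error form of the key point loss are illustrative assumptions; the text only states which components enter the total, not how they are combined.

```python
def keypoint_position_loss(actual_kps, predicted_kps):
    # Assumed squared-error loss between actual and predicted key points.
    return sum((ax - px) ** 2 + (ay - py) ** 2
               for (ax, ay), (px, py) in zip(actual_kps, predicted_kps))

def first_model_detection_loss(recognition_loss, position_loss, kp_loss,
                               w_cls=1.0, w_pos=1.0, w_kp=1.0):
    # Weighted sum of the three components; the weights are assumptions.
    return w_cls * recognition_loss + w_pos * position_loss + w_kp * kp_loss

kp_loss = keypoint_position_loss([(0, 0), (2, 2)], [(1, 0), (2, 4)])
total = first_model_detection_loss(0.5, 1.5, kp_loss)
```

The change rate checked by the stopping condition would then be computed from successive values of this total loss across training rounds.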
Optionally, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the first detection model meets the first preset condition specifically includes:
determining the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; wherein the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
determining the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
determining the detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
determining the detection loss change rate of the first detection model according to the detection loss of the first detection model and the historical detection loss of the first detection model; the historical detection loss of the first detection model is the detection loss of the first detection model determined in the historical training process;
if the detection loss change rate of the first detection model is lower than the first preset loss threshold, or the number of training rounds of the first detection model reaches the first preset round-number threshold, determining that the first detection model meets the first preset condition;
if the detection loss change rate of the first detection model is not lower than the first preset loss threshold, and the number of training rounds of the first detection model does not reach the first preset round-number threshold, determining that the first detection model does not meet the first preset condition.
可选的,所述根据所述待检测图像对应的图像特征,确定目标物体在待检测图像上的位置信息以及所述目标物体的关键点位置信息,具体为:Optionally, the determining the position information of the target object on the image to be detected and the key point position information of the target object according to the image feature corresponding to the image to be detected is specifically:
根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息和待检测图像上各个图像子区域内的关键点位置信息，并根据所述目标物体在待检测图像上的位置信息和所述待检测图像上各个图像子区域内的关键点位置信息，确定所述目标物体的关键点位置信息；所述图像子区域是锚点对应的感兴趣区域。According to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected are determined, and the key point position information of the target object is determined according to the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected; the image sub-region is the region of interest corresponding to an anchor point.
可选的，所述根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息和待检测图像上各个图像子区域内的关键点位置信息，并根据所述目标物体在待检测图像上的位置信息和所述待检测图像上各个图像子区域内的关键点位置信息，确定所述目标物体的关键点位置信息，具体为：Optionally, the determining, according to the image features corresponding to the image to be detected, of the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected, and the determining of the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected, is specifically:
根据所述待检测图像对应的图像特征,利用预先构建的第二检测模型对目标物体及其关键点进行检测,确定目标物体在待检测图像上的位置信息以及目标物体的关键点位置信息;According to the image features corresponding to the image to be detected, use a pre-built second detection model to detect the target object and its key points, and determine the position information of the target object on the image to be detected and the key point position information of the target object;
其中，所述第二检测模型包括第二物体位置检测网络层、第二关键点位置检测网络层和物体关键点位置确定网络层，且所述第二物体位置检测网络层的输出结果和所述第二关键点位置检测网络层的输出结果是所述物体关键点位置确定网络层的输入数据，且所述第二物体位置检测网络层用于根据所述待检测图像对应的图像特征确定目标物体在待检测图像上的位置信息；且所述第二关键点位置检测网络层用于根据所述待检测图像对应的图像特征确定待检测图像上各个图像子区域内的关键点位置信息；且所述物体关键点位置确定网络层用于根据所述目标物体在待检测图像上的位置信息和所述待检测图像上各个图像子区域内的关键点位置信息，确定所述目标物体的关键点位置信息。Wherein, the second detection model includes a second object position detection network layer, a second key point position detection network layer, and an object key point position determination network layer; the output of the second object position detection network layer and the output of the second key point position detection network layer are the input data of the object key point position determination network layer; the second object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; the second key point position detection network layer is used to determine the key point position information in each image sub-region of the image to be detected according to the image features corresponding to the image to be detected; and the object key point position determination network layer is used to determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected.
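The data flow through the third layer described above can be illustrated with a minimal sketch: given the detected object box and the keypoints predicted inside each anchor's region of interest, select the ROI belonging to the object and express its keypoints in image coordinates. The nearest-center matching rule and the ROI-local coordinate convention below are assumptions for illustration; the patent does not specify how the object key point position determination network layer combines its two inputs.

```python
def determine_object_keypoints(object_box, roi_keypoints):
    """Sketch of the object key point position determination layer.

    object_box     -- (x, y, w, h) of the detected target object (center form).
    roi_keypoints  -- list of ((x, y, w, h), [(px, py), ...]) pairs: each
                      anchor's ROI and the keypoints predicted inside it,
                      given relative to the ROI's top-left corner.
    """
    ox, oy, _ow, _oh = object_box

    def center_distance(item):
        (rx, ry, _w, _h), _pts = item
        return (rx - ox) ** 2 + (ry - oy) ** 2

    # Assumed matching rule: take the ROI whose center is nearest the
    # detected object's center.
    (rx, ry, rw, rh), local_pts = min(roi_keypoints, key=center_distance)
    # Map ROI-local keypoints into image coordinates.
    left, top = rx - rw / 2, ry - rh / 2
    return [(left + px, top + py) for px, py in local_pts]
```

The sketch shows why the layer needs both inputs: the object box selects which sub-region's keypoints belong to the target, and the ROI geometry anchors those keypoints on the full image.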
可选的,所述第二检测模型的训练过程,具体包括:Optionally, the training process of the second detection model specifically includes:
获取训练图像对应的图像特征、目标物体在训练图像上的实际位置信息以及目标物体的实际关键点位置信息;Obtain the image features corresponding to the training image, the actual position information of the target object on the training image, and the actual key point position information of the target object;
根据所述训练图像对应的图像特征,利用第二检测模型对目标物体及其关键点进行检测,确定目标物体在待检测图像上的预测位置信息以及目标物体的预测关键点位置信息;According to the image features corresponding to the training image, use the second detection model to detect the target object and its key points, and determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object;
根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息,判断所述第二检测模型是否达到第二预设条件;According to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, Judging whether the second detection model meets a second preset condition;
若确定所述第二检测模型未达到第二预设条件，则更新第二检测模型，继续执行"根据所述训练图像对应的图像特征，利用第二检测模型对目标物体及其关键点进行检测，确定目标物体在待检测图像上的预测位置信息以及目标物体的预测关键点位置信息"。If it is determined that the second detection model does not meet the second preset condition, updating the second detection model and continuing to execute "according to the image features corresponding to the training image, using the second detection model to detect the target object and its key points, and determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object".
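The iterative training procedure above — detect, compare against the ground truth, update the model, and repeat until the preset condition is met — can be sketched with a toy one-parameter model. All names, the squared-error stand-in loss, and the additive update are illustrative, not from the patent.

```python
def train_until_condition(init_param, step, target,
                          loss_change_threshold=1e-4, max_rounds=1000):
    """Toy training loop: stop when the loss change rate falls below the
    preset threshold or the preset number of rounds is reached."""
    param = init_param
    previous_loss = None
    for round_no in range(1, max_rounds + 1):
        prediction = param                       # "detect" with current model
        loss = (prediction - target) ** 2        # stand-in detection loss
        if previous_loss is not None:
            change = abs(previous_loss - loss) / max(previous_loss, 1e-12)
            if change < loss_change_threshold:   # preset condition met
                return param, round_no
        previous_loss = loss
        param += step * (target - prediction)    # update the model
    return param, max_rounds
```

The structure mirrors the claim: the loss is recomputed each round, the stop test compares it with the historical loss, and only when the condition fails is the model updated and the detection step re-executed.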
可选的，所述第二预设条件为：第二检测模型的检测损失的变化率低于第二预设损失阈值，和/或，第二检测模型的训练轮数达到第二预设轮数阈值。Optionally, the second preset condition is: the change rate of the detection loss of the second detection model is lower than a second preset loss threshold, and/or, the number of training rounds of the second detection model reaches a second preset number-of-rounds threshold.
可选的，当所述第二预设条件包括第二检测模型的检测损失的变化率低于第二预设损失阈值时，所述根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第二检测模型是否达到第二预设条件，具体包括：Optionally, when the second preset condition includes that the change rate of the detection loss of the second detection model is lower than the second preset loss threshold, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the second detection model meets the second preset condition specifically includes:
根据所述目标物体在训练图像上的实际位置信息以及目标物体在待检测图像上的预测位置信息，确定目标物体的检测损失；其中，所述目标物体的检测损失包括目标物体的识别损失和目标物体的位置检测损失；According to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, the detection loss of the target object is determined; wherein, the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
根据所述目标物体的实际关键点位置信息以及所述目标物体的预测关键点位置信息,确定目标物体的关键点位置检测损失;Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
根据所述目标物体的识别损失、所述目标物体的位置检测损失和所述目标物体的关键点位置检测损失,确定第二检测模型的检测损失;Determine the detection loss of the second detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
根据所述第二检测模型的检测损失以及第二检测模型的历史检测损失，确定第二检测模型的检测损失变化率；所述第二检测模型的历史检测损失是在历史训练过程中确定的第二检测模型的检测损失；According to the detection loss of the second detection model and the historical detection loss of the second detection model, the detection loss change rate of the second detection model is determined; the historical detection loss of the second detection model is the detection loss of the second detection model determined during a historical training process;
若所述第二检测模型的检测损失变化率低于第二预设损失阈值,则确定所述第二检测模型达到第二预设条件;If the detection loss change rate of the second detection model is lower than a second preset loss threshold, determining that the second detection model meets the second preset condition;
若所述第二检测模型的检测损失变化率不低于第二预设损失阈值,则确定所述第二检测模型未达到第二预设条件。If the detection loss change rate of the second detection model is not lower than the second preset loss threshold, it is determined that the second detection model does not meet the second preset condition.
可选的，所述根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第二检测模型是否达到第二预设条件，具体包括：Optionally, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the second detection model meets the second preset condition specifically includes:
根据所述目标物体在训练图像上的实际位置信息以及目标物体在待检测图像上的预测位置信息，确定目标物体的检测损失；其中，所述目标物体的检测损失包括目标物体的识别损失和目标物体的位置检测损失；According to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, the detection loss of the target object is determined; wherein, the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
根据所述目标物体的实际关键点位置信息以及所述目标物体的预测关键点位置信息,确定目标物体的关键点位置检测损失;Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
根据所述目标物体的识别损失、所述目标物体的位置检测损失和所述目标物体的关键点位置检测损失,确定第二检测模型的检测损失;Determine the detection loss of the second detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
根据所述第二检测模型的检测损失以及第二检测模型的历史检测损失，确定第二检测模型的检测损失变化率；所述第二检测模型的历史检测损失是在历史训练过程中确定的第二检测模型的检测损失；According to the detection loss of the second detection model and the historical detection loss of the second detection model, the detection loss change rate of the second detection model is determined; the historical detection loss of the second detection model is the detection loss of the second detection model determined during a historical training process;
若所述第二检测模型的检测损失变化率低于第二预设损失阈值，或第二检测模型的训练轮数达到第二预设轮数阈值，则确定所述第二检测模型达到第二预设条件；If the detection loss change rate of the second detection model is lower than the second preset loss threshold, or the number of training rounds of the second detection model reaches the second preset number-of-rounds threshold, it is determined that the second detection model meets the second preset condition;
若所述第二检测模型的检测损失变化率不低于第二预设损失阈值，且第二检测模型的训练轮数未达到第二预设轮数阈值，则确定所述第二检测模型未达到第二预设条件。If the detection loss change rate of the second detection model is not lower than the second preset loss threshold, and the number of training rounds of the second detection model does not reach the second preset number-of-rounds threshold, it is determined that the second detection model does not meet the second preset condition.
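The model's detection loss is determined from the three component losses named above (recognition loss, position detection loss, and key point position detection loss). A weighted sum is one common way to combine them; this is an assumption for illustration, since the patent does not fix the combination rule, and the weight names are hypothetical.

```python
def combined_detection_loss(recognition_loss, position_loss, keypoint_loss,
                            w_cls=1.0, w_box=1.0, w_kpt=1.0):
    """Assumed weighted-sum combination of the three component losses
    into the detection model's overall detection loss."""
    return (w_cls * recognition_loss
            + w_box * position_loss
            + w_kpt * keypoint_loss)
```

In practice the weights let training balance classification accuracy against box and keypoint localization accuracy.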
本申请实施例还提供了一种图像中物体及其关键点的确定装置,包括:The embodiment of the present application also provides an apparatus for determining an object and its key points in an image, including:
提取单元,用于对待检测图像进行特征提取,得到所述待检测图像对应的图像特征;An extraction unit, configured to perform feature extraction on the image to be detected to obtain image features corresponding to the image to be detected;
确定单元，用于根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息以及所述目标物体的关键点位置信息；其中，所述目标物体的关键点用于表征所述目标物体的结构特征。The determining unit is configured to determine, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information of the target object; wherein the key points of the target object are used to characterize the structural features of the target object.
本申请实施例还提供了一种设备,所述设备包括处理器以及存储器:An embodiment of the present application also provides a device, which includes a processor and a memory:
所述存储器用于存储计算机程序;The memory is used to store a computer program;
所述处理器用于根据所述计算机程序执行上述提供的图像中物体及其关键点的确定方法的任一实施方式。The processor is configured to execute any embodiment of the method for determining the object and its key points in the image provided above according to the computer program.
本申请实施例还提供了一种计算机可读存储介质，所述计算机可读存储介质用于存储计算机程序，所述计算机程序用于执行上述提供的图像中物体及其关键点的确定方法的任一实施方式。The embodiments of the present application also provide a computer-readable storage medium, the computer-readable storage medium is used to store a computer program, and the computer program is used to execute any implementation of the method for determining an object and its key points in an image provided above.
与现有技术相比,本申请实施例至少具有以下优点:Compared with the prior art, the embodiments of the present application have at least the following advantages:
本申请实施例提供的图像中物体及其关键点的确定方法中，在从待检测图像中提取到该待检测图像对应的图像特征之后，直接根据该待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息以及该目标物体的关键点位置信息。其中，由于目标物体在待检测图像上的位置信息能够准确地表征待检测图像中目标物体的位置，且目标物体的关键点位置信息能够准确地表征待检测图像中目标物体的关键点位置，因而，本申请实施例提供的图像中物体及其关键点的确定方法能够有效地从待检测图像中确定出目标物体的位置信息及其关键点的位置信息，从而能够准确地检测出图像中物体及其关键点。In the method for determining an object and its key points in an image provided by the embodiments of the present application, after the image features corresponding to the image to be detected are extracted from the image to be detected, the position information of the target object on the image to be detected and the key point position information of the target object are determined directly according to the image features corresponding to the image to be detected. Since the position information of the target object on the image to be detected can accurately represent the position of the target object in the image to be detected, and the key point position information of the target object can accurately represent the key point positions of the target object in the image to be detected, the method for determining an object and its key points in an image provided by the embodiments of the present application can effectively determine the position information of the target object and the position information of its key points from the image to be detected, and can therefore accurately detect the object and its key points in the image.
附图说明Description of the drawings
为了更清楚地说明本申请实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本申请中记载的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其它的附图。In order to describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in this application, and for those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.
图1为本申请实施例提供的利用级联的两个模型确定图像中物体及其关键点的示意图;FIG. 1 is a schematic diagram of using two cascaded models to determine objects and their key points in an image according to an embodiment of the application;
图2为本申请方法实施例一提供的图像中物体及其关键点的确定方法流程图;2 is a flowchart of a method for determining an object and its key points in an image provided by Embodiment 1 of the method of this application;
图3为本申请方法实施例二提供的图像中物体及其关键点的确定方法流程图;FIG. 3 is a flowchart of a method for determining objects and their key points in an image provided by the second embodiment of the method of this application;
图4为本申请实施例提供的第一检测模型应用于人脸位置及其关键点检测时的示意图;FIG. 4 is a schematic diagram when the first detection model provided by an embodiment of the application is applied to the detection of the position of a human face and its key points;
图5为本申请实施例提供的第一检测模型的训练过程流程图;FIG. 5 is a flowchart of the training process of the first detection model provided by an embodiment of the application;
图6为本申请实施例提供的步骤S53的一种实施方式的流程图;FIG. 6 is a flowchart of an implementation manner of step S53 according to an embodiment of this application;
图7为本申请实施例提供的步骤S53的另一种实施方式的流程图;FIG. 7 is a flowchart of another implementation manner of step S53 according to an embodiment of the application;
图8为本申请方法实施例四提供的图像中物体及其关键点的确定方法流程图;FIG. 8 is a flowchart of a method for determining an object and its key points in an image provided by the fourth embodiment of the method of this application;
图9为本申请实施例提供的第二检测模型应用于人脸位置及其关键点检测时的示意图;FIG. 9 is a schematic diagram when the second detection model provided by an embodiment of the application is applied to the detection of the position of a human face and its key points;
图10为本申请实施例提供的第二检测模型的训练过程流程图;FIG. 10 is a flowchart of the training process of the second detection model provided by an embodiment of the application;
图11为本申请实施例提供的步骤S103的一种实施方式的流程图;FIG. 11 is a flowchart of an implementation manner of step S103 according to an embodiment of this application;
图12为本申请实施例提供的步骤S103的另一种实施方式的流程图;FIG. 12 is a flowchart of another implementation manner of step S103 according to an embodiment of this application;
图13为本申请实施例提供的图像中物体及其关键点的确定装置的结构示意图;FIG. 13 is a schematic structural diagram of an apparatus for determining an object and its key points in an image provided by an embodiment of the application;
图14为本申请实施例提供的设备结构示意图。FIG. 14 is a schematic diagram of the device structure provided by an embodiment of the application.
具体实施方式Detailed Description of the Embodiments
为了解决背景技术部分的技术问题，发明人经过研究发现：可以利用级联的物体检测模型（例如，人脸检测模型）以及物体的关键点检测模型（例如，人脸关键点检测模型），来依次确定目标物体（例如，人脸）在待检测图像上的位置信息和目标物体的关键点位置信息。其中，由于物体检测模型包括图像特征提取网络和物体位置检测网络，而且物体的关键点检测模型包括图像特征提取网络和物体关键点检测网络，因而，在利用级联的物体检测模型以及物体的关键点检测模型确定目标物体及其关键点时，需要依次经过以下步骤：对待检测图像进行图像特征提取→检测目标物体的位置信息→对目标物体的图像进行图像特征提取→检测目标物体的关键点的位置信息。In order to solve the technical problems in the background section, the inventor found through research that a cascaded object detection model (for example, a face detection model) and an object key point detection model (for example, a face key point detection model) can be used to sequentially determine the position information of the target object (for example, a human face) on the image to be detected and the key point position information of the target object. Since the object detection model includes an image feature extraction network and an object position detection network, and the object key point detection model includes an image feature extraction network and an object key point detection network, determining the target object and its key points with the cascaded object detection model and object key point detection model requires the following steps in sequence: extracting image features from the image to be detected → detecting the position information of the target object → extracting image features from the image of the target object → detecting the position information of the key points of the target object.
为了便于理解和解释上述过程,下面结合人脸及其关键点的检测过程进行说明。In order to facilitate the understanding and explanation of the above process, the following describes the detection process of the human face and its key points.
作为示例，如图1所示，当目标物体为人脸时，则利用级联的人脸检测模型以及人脸关键点检测模型确定人脸及其关键点的过程，具体为：首先，对待检测图像进行特征提取，获取第一图像特征；其次，利用人脸检测网络对第一图像特征进行检测，得到人脸图像（也就是，人脸检测结果）；然后，对人脸图像进行特征提取，获取第二图像特征；最后，利用人脸关键点检测网络对第二图像特征进行检测，得到携带有人脸关键点信息的图像（也就是，人脸关键点检测结果）。As an example, as shown in FIG. 1, when the target object is a human face, the process of determining the face and its key points by using the cascaded face detection model and face key point detection model is specifically as follows: first, feature extraction is performed on the image to be detected to obtain first image features; second, the first image features are detected by the face detection network to obtain a face image (that is, the face detection result); then, feature extraction is performed on the face image to obtain second image features; finally, the second image features are detected by the face key point detection network to obtain an image carrying face key point information (that is, the face key point detection result).
基于背景技术部分的技术问题以及上述技术方案，发明人进行了进一步研究发现：在利用级联的物体检测模型以及物体的关键点检测模型确定物体及其关键点时，需要进行两次图像特征提取，增加了图像中物体及其关键点的确定过程的复杂性，降低了图像中物体及其关键点的确定效率，还增加了确定图像中物体及其关键点时的内存消耗。另外，由于物体检测模型和物体的关键点检测模型均需要利用深度神经网络进行实现，而且每个深度神经网络在使用以及训练时均会消耗较大的内存以及时间，因而，级联的物体检测模型以及物体的关键点检测模型的使用过程以及训练过程均需要消耗较大的内存以及时间，从而增加了图像中物体及其关键点的获取成本，限制了基于级联的物体检测模型以及物体的关键点检测模型的图像中物体及其关键点的检测方法的应用范围。Based on the technical problems in the background section and the above technical solution, the inventor conducted further research and found that when the cascaded object detection model and object key point detection model are used to determine an object and its key points, image feature extraction needs to be performed twice, which increases the complexity of the process of determining the object and its key points in the image, reduces the efficiency of determining the object and its key points in the image, and also increases the memory consumption when determining the object and its key points in the image. In addition, since both the object detection model and the object key point detection model need to be implemented with deep neural networks, and each deep neural network consumes considerable memory and time during use and training, both the use process and the training process of the cascaded models consume considerable memory and time, which increases the cost of obtaining the object and its key points in the image and limits the scope of application of the detection method based on the cascaded object detection model and object key point detection model.
为了解决背景技术部分的技术问题以及克服上述级联的两个模型的技术方案所存在的缺陷,本申请实施例提供了一种图像中物体及其关键点的确定方法,该方法具体包括:在从待检测图像中提取到该待检测图像对应的图像特征之后,直接根据该待检测图像对应的图像特征,确定目标物体在待检测图像上的位置信息以及该目标物体的关键点位置信息。In order to solve the technical problems of the background technology and overcome the defects of the technical solutions of the two cascaded models described above, an embodiment of the present application provides a method for determining objects and their key points in an image. The method specifically includes: After the image feature corresponding to the image to be detected is extracted from the image to be detected, the location information of the target object on the image to be detected and the key point location information of the target object are determined directly according to the image feature corresponding to the image to be detected.
在本申请实施例提供的图像中物体及其关键点的确定方法中，由于目标物体在待检测图像上的位置信息能够准确地表征待检测图像中目标物体的位置，且目标物体的关键点位置信息能够准确地表征待检测图像中目标物体的关键点位置，因而，本申请实施例提供的图像中物体及其关键点的确定方法能够有效地从待检测图像中确定出目标物体的位置信息及其关键点的位置信息，从而能够准确地检测出图像中物体及其关键点。另外，由于在获取到待检测图像对应的图像特征之后，直接基于该待检测图像对应的图像特征确定目标物体在待检测图像上的位置信息以及所述目标物体的关键点位置信息即可，无需进行将目标物体在待检测图像上的位置信息还原为物体检测图像以及对该物体检测图像进行特征提取的过程，使得在确定目标物体在待检测图像上的位置信息以及目标物体的关键点位置信息过程中只需进行一次图像特征提取过程，如此简化了图像中物体及其关键点的确定过程，提高了图像中物体及其关键点的确定效率，并降低了在确定图像中物体及其关键点时的内存损耗。In the method for determining an object and its key points in an image provided by the embodiments of the present application, since the position information of the target object on the image to be detected can accurately represent the position of the target object in the image to be detected, and the key point position information of the target object can accurately represent the key point positions of the target object in the image to be detected, the method can effectively determine the position information of the target object and the position information of its key points from the image to be detected, and can therefore accurately detect the object and its key points in the image. In addition, since after the image features corresponding to the image to be detected are obtained, the position information of the target object on the image to be detected and the key point position information of the target object can be determined directly based on those image features, there is no need to restore the position information of the target object on the image to be detected into an object detection image and perform feature extraction on that object detection image, so that only one image feature extraction process is needed in the process of determining the position information of the target object on the image to be detected and the key point position information of the target object. This simplifies the process of determining the object and its key points in the image, improves the efficiency of determining the object and its key points in the image, and reduces the memory consumption when determining the object and its key points in the image.
为了使本技术领域的人员更好地理解本发明方案，下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
方法实施例一Method embodiment one
参见图2，该图为本申请方法实施例一提供的图像中物体及其关键点的确定方法流程图。Refer to FIG. 2, which is a flowchart of the method for determining an object and its key points in an image provided by Method Embodiment 1 of this application.
本申请实施例提供的图像中物体及其关键点的确定方法,具体包括步骤S21-S22:The method for determining objects and their key points in an image provided by the embodiments of the present application specifically includes steps S21-S22:
S21:对待检测图像进行特征提取,得到所述待检测图像对应的图像特征。S21: Perform feature extraction on the image to be detected to obtain image features corresponding to the image to be detected.
待检测图像是指需要进行物体检测的图像。The image to be detected refers to the image that needs to be detected.
待检测图像对应的图像特征用于表征该待检测图像自身所具有的特征信息。The image feature corresponding to the image to be detected is used to characterize the feature information of the image to be detected.
本申请实施例不限定图像特征提取方法,可以采用任一种现有或未来出现的能够从待检测图像中进行特征提取的图像特征提取方法。例如,图像特征提取方法可以是卷积算法。The embodiment of the present application does not limit the image feature extraction method, and any existing or future image feature extraction method capable of extracting features from the image to be detected can be used. For example, the image feature extraction method may be a convolution algorithm.
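As a non-limiting illustration of the convolution operation mentioned above, a minimal 'valid'-mode 2-D convolution (strictly, cross-correlation, as commonly used in convolutional feature extractors) can be written as follows. Real feature extraction networks stack many such operations with learned kernels; this sketch only shows the elementary computation.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image ('valid' mode, no padding) and
    return the resulting feature map as a list of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Sum of elementwise products over the kernel window.
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out
```

A 3×3 image of ones convolved with a 2×2 kernel of ones, for instance, yields a 2×2 feature map whose entries are all 4.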
S22:根据所述待检测图像对应的图像特征,确定目标物体在待检测图像上的位置信息以及所述目标物体的关键点位置信息;其中,所述目标物体的关键点用于表征所述目标物体的结构特征。S22: Determine the position information of the target object on the image to be detected and the key point position information of the target object according to the image features corresponding to the image to be detected; wherein the key points of the target object are used to characterize the target The structural characteristics of the object.
目标物体是指需要在图像上进行识别的对象;而且,本申请实施例不限定目标物体,目标物体可以是人脸、车辆、茶杯、以及其他物体。另外,目标物体可以提前设定,尤其可以根据应用场景设定。The target object refers to the object that needs to be recognized on the image; moreover, the embodiment of the present application does not limit the target object, and the target object may be a human face, a vehicle, a tea cup, and other objects. In addition, the target object can be set in advance, especially according to the application scenario.
目标物体在待检测图像上的位置信息用于记录在待检测图像中，目标物体所处区域的位置相关信息；而且，本申请实施例不限定目标物体在待检测图像上的位置信息的表示形式，可以采用文字、数字、符号等中的至少一个进行表示。例如，目标物体在待检测图像上的位置信息可以利用锚点（anchor）对应的感兴趣区域（region of interest，roi）以及待检测图像中目标物体所处区域相对于该anchor对应的roi的偏移参数来进行描述。其中，每个anchor对应的roi可以利用区域参数(x,y,w,h)表示，且x和y用于表示该anchor对应的roi的中心点像素坐标；w用于表示该anchor对应的roi的宽度；h用于表示该anchor对应的roi的高度。偏移参数可以利用区域偏移参数(△x,△y,△w,△h)表示，且△x和△y用于表示待检测图像中目标物体所处区域的中心点像素坐标相对于该anchor对应的roi的中心点像素坐标的偏移量；△w用于表示待检测图像中目标物体所处区域的宽度相对于该anchor对应的roi的宽度偏移量；△h用于表示待检测图像中目标物体所处区域的高度相对于该anchor对应的roi的高度偏移量。The position information of the target object on the image to be detected is used to record the position-related information of the area where the target object is located in the image to be detected. Moreover, the embodiments of the present application do not limit the representation form of the position information of the target object on the image to be detected, which can be represented by at least one of words, numbers, symbols, and so on. For example, the position information of the target object on the image to be detected can be described by the region of interest (roi) corresponding to an anchor point (anchor) and the offset parameters of the area where the target object is located in the image to be detected relative to the roi corresponding to that anchor. The roi corresponding to each anchor can be represented by the region parameters (x,y,w,h), where x and y represent the pixel coordinates of the center point of the roi corresponding to the anchor, w represents the width of the roi corresponding to the anchor, and h represents the height of the roi corresponding to the anchor. The offset parameters can be represented by the region offset parameters (△x,△y,△w,△h), where △x and △y represent the offset of the pixel coordinates of the center point of the area where the target object is located in the image to be detected relative to the pixel coordinates of the center point of the roi corresponding to the anchor, △w represents the offset of the width of that area relative to the width of the roi corresponding to the anchor, and △h represents the offset of the height of that area relative to the height of the roi corresponding to the anchor.
为了便于理解和解释目标物体在待检测图像上的位置信息,下面结合示例进行说明。In order to facilitate the understanding and explanation of the position information of the target object on the image to be detected, the following description is combined with examples.
作为示例，假设，目标物体为人脸；且待检测图像包括第1个anchor至第N个anchor；且第1个anchor对应的roi的区域参数为(x_1,y_1,w_1,h_1)，第2个anchor对应的roi的区域参数为(x_2,y_2,w_2,h_2)，……，第N个anchor对应的roi的区域参数为(x_N,y_N,w_N,h_N)；且第T个anchor对应的roi中携带有人脸；且待检测图像中人脸所处区域相对于第T个anchor对应的roi的偏移参数为(△x,△y,△w,△h)，且1≤T≤N。基于上述假设可知，人脸在待检测图像上的位置信息可以利用待检测图像中人脸所处区域的区域参数(x_T+△x,y_T+△y,w_T+△w,h_T+△h)表示。As an example, suppose the target object is a human face; the image to be detected includes the 1st anchor to the Nth anchor; the region parameters of the roi corresponding to the 1st anchor are (x_1,y_1,w_1,h_1), the region parameters of the roi corresponding to the 2nd anchor are (x_2,y_2,w_2,h_2), ..., and the region parameters of the roi corresponding to the Nth anchor are (x_N,y_N,w_N,h_N); the roi corresponding to the Tth anchor carries the face; and the offset parameters of the area where the face is located in the image to be detected relative to the roi corresponding to the Tth anchor are (△x,△y,△w,△h), with 1≤T≤N. Based on the above assumptions, the position information of the face on the image to be detected can be represented by the region parameters (x_T+△x,y_T+△y,w_T+△w,h_T+△h) of the area where the face is located in the image to be detected.
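The additive offset scheme in the example above can be sketched as follows. Purely additive offsets are assumed here to match the example; practical detectors often use normalized or logarithmic box encodings instead.

```python
def decode_anchor(anchor_roi, offsets):
    """Apply the region offset parameters (dx, dy, dw, dh) to an anchor's
    region of interest (x, y, w, h) to recover the region parameters of
    the area where the target object is located."""
    x, y, w, h = anchor_roi
    dx, dy, dw, dh = offsets
    return (x + dx, y + dy, w + dw, h + dh)
```

For an anchor roi (50, 40, 20, 30) and offsets (2, -3, 4, 1), the decoded target region is (52, 37, 24, 31), directly mirroring the (x_T+△x, y_T+△y, w_T+△w, h_T+△h) form in the example.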
目标物体的关键点用于表征目标物体的结构特征。例如,当目标物体为人脸时,则目标物体的关键点可以包括能够表征人脸五官位置以及脸部轮廓位置的相关点。The key points of the target object are used to characterize the structural characteristics of the target object. For example, when the target object is a human face, the key points of the target object may include related points that can characterize the position of the facial features of the human face and the position of the contour of the face.
In the embodiments of the present application, after the image features corresponding to the image to be detected are acquired, the position information of the target object on the image to be detected and the key point position information of the target object can be determined directly according to the image features corresponding to the image to be detected.
In addition, the embodiments of the present application also provide two specific implementations of step S22, which are introduced in method embodiment two and method embodiment four below; that is, the two specific implementations of step S22 are step S32 in method embodiment two and step S82 in method embodiment four. See below for technical details.
The above is the specific implementation of the method for determining an object and its key points in an image provided by method embodiment one. In this implementation, after the image features corresponding to the image to be detected are extracted from the image to be detected, the position information of the target object on the image to be detected and the key point position information of the target object are determined directly according to those image features. Because the position information of the target object on the image to be detected can accurately characterize the position of the target object in the image to be detected, and the key point position information of the target object can accurately characterize the positions of the target object's key points, the method for determining an object and its key points in an image provided by the embodiments of the present application can effectively determine the position information of the target object and of its key points from the image to be detected, and can therefore accurately detect the object and its key points in the image.
In addition, after the image features corresponding to the image to be detected are acquired, the position information of the target object on the image to be detected and the key point position information of the target object are determined directly on the basis of those image features; there is no need to restore the position information of the target object into an object detection image and then perform feature extraction on that object detection image. As a result, only one image feature extraction process is required when determining the position information of the target object and the key point position information of the target object, which simplifies the process of determining the object and its key points in the image, improves the efficiency of that determination, and reduces the memory consumption involved.
To improve the accuracy of the position detection of the object and its key points in the image, based on the method for determining an object and its key points in an image provided by method embodiment one, the embodiments of the present application further provide another implementation of the method, which is explained and illustrated below in conjunction with the drawings.
Method embodiment two
Method embodiment two is an improvement made on the basis of method embodiment one. For brevity, the parts of method embodiment two that are identical to method embodiment one are not repeated here.
Refer to FIG. 3, which is a flowchart of the method for determining an object and its key points in an image provided by method embodiment two of the present application.
The method for determining an object and its key points in an image provided by the embodiments of the present application specifically includes steps S31-S32:
S31: Perform feature extraction on the image to be detected to obtain the image features corresponding to the image to be detected.
It should be noted that the specific content of step S31 is the same as that of step S21 in method embodiment one above, and for brevity it is not repeated here.
S32: Determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and determine the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected; the key points of the target object are used to characterize the structural features of the target object.
In the embodiments of the present application, after the image features corresponding to the image to be detected are acquired, those image features are first used to determine the position information of the target object on the image to be detected, and then the position information of the target object on the image to be detected and the image features corresponding to the image to be detected are used together to determine the key point position information of the target object.
In addition, to improve the efficiency of determining the object and its key points in the image, a first detection model formed by fusing an object position detection network and an object key point detection network can be used to determine the object and its key points. On this basis, the embodiments of the present application further provide an implementation of step S32, in which step S32 may specifically be: according to the image features corresponding to the image to be detected, use a pre-built first detection model to detect the target object and its key points, and determine the position information of the target object on the image to be detected and the key point position information of the target object.
The first detection model is used to detect the position information of the target object on the image to be detected and the key point position information of the target object according to the image features corresponding to the image to be detected.
The first detection model includes a first object position detection network layer and a first key point position detection network layer, where the output of the first object position detection network layer is input data of the first key point position detection network layer. The first object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and the first key point position detection network layer is used to determine the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected. That is, in the first detection model, the input data of the first detection model (the image features corresponding to the image to be detected) is the input data of the first object position detection network layer, and the output data of the first object position detection network layer is input data of the first key point position detection network layer.
For example, when the target object is a human face, as shown in FIG. 4, the input data of the first detection model (the image features corresponding to the image to be detected) is the input data of the face detection network layer, and the output data of the face detection network layer is input data of the face key point detection network layer. It should be noted that in FIG. 4, "face detection network layer" denotes the "first object position detection network layer" as applied to the detection of the face position and its key points, and "face key point detection network layer" denotes the "first key point position detection network layer" as applied to the detection of the face position and its key points.
It should be noted that the embodiments of the present application do not limit the way in which the first key point position detection network layer in the first detection model uses the output data of the first object position detection network layer. For example, the first key point position detection network layer may use the output data of the first object position detection network layer directly; alternatively, information about the image region that includes the target object may first be determined according to the output data of the first object position detection network layer, and that region information may then be used by the first key point position detection network layer. The embodiments of the present application impose no specific limitation on this.
It should be noted that, in the embodiments of the present application, the first detection model is a single deep neural network composed of the first object position detection network layer and the first key point position detection network layer. In this way, when the first detection model is used to detect an object and its key points, only one deep neural network needs to be run, which saves running memory and running time.
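The data flow of the first detection model can be sketched as follows. This is an assumed, minimal illustration only: the three functions are trivial stand-ins for the real backbone and network layers (which the patent does not specify), chosen to show that the image features are extracted once and shared by both detection layers.

```python
# Stand-in for a CNN backbone: reduce each image row to its mean value.
def extract_features(image):
    return [sum(row) / len(row) for row in image]

# Stand-in for the first object position detection network layer:
# maps the shared features to a dummy region (x, y, w, h).
def object_position_layer(features):
    return (features[0], features[1], 10.0, 10.0)

# Stand-in for the first key point position detection network layer:
# consumes BOTH the shared image features and the detected position.
def keypoint_layer(features, position):
    x, y, w, h = position
    return [(x + w / 2, y + h / 2)]  # e.g. one key point at the region center

def first_detection_model(image):
    features = extract_features(image)          # feature extraction runs once
    position = object_position_layer(features)  # output feeds the next layer
    keypoints = keypoint_layer(features, position)
    return position, keypoints

position, keypoints = first_detection_model([[1.0, 3.0], [2.0, 4.0]])
```

The point of the structure is that `extract_features` is called only once per image; neither detection layer triggers a second feature extraction pass.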
It should also be noted that the embodiments of the present application further provide a training process for the first detection model, which will be described in detail in method embodiment three; see method embodiment three for technical details.
The above is the specific implementation of the method for determining an object and its key points in an image provided by method embodiment two. In this implementation, after the image features corresponding to the image to be detected are extracted from the image to be detected, the position information of the target object on the image to be detected is first determined according to those image features, and the key point position information of the target object is then determined directly according to the position information of the target object on the image to be detected and the image features corresponding to the image to be detected. Because the position information of the target object on the image to be detected can accurately characterize the position of the target object in the image to be detected, and the key point position information of the target object can accurately characterize the positions of the target object's key points, the method provided by the embodiments of the present application can effectively determine the position information of the target object and of its key points from the image to be detected, and can therefore accurately detect the object and its key points in the image.
In addition, after the position information of the target object on the image to be detected is acquired, it can be used directly as the basis for determining the key point position information of the target object; there is no need to restore the position information of the target object into an object detection image and then perform feature extraction on that object detection image. As a result, only one image feature extraction process is required when determining the position information of the target object and the key point position information of the target object, which simplifies the process of determining the object and its key points in the image, improves the efficiency of that determination, and reduces the memory consumption involved.
In addition, to further reduce the memory and time consumed when determining the object and its key points in the image, the first detection model can be used to detect the target object and its key points and to determine the position information of the target object on the image to be detected and the key point position information of the target object. Because the first detection model is a single deep neural network that includes the first object position detection network layer and the first key point position detection network layer, only one deep neural network needs to be trained when training the first detection model, which reduces the memory and time consumed during training; moreover, when the first detection model is used to detect an object and its key points, only one deep neural network needs to be run, which reduces the memory and time consumed when determining the object and its key points.
Based on the methods for determining an object and its key points in an image provided by method embodiment one and method embodiment two above, to further improve the accuracy of the determined object and its detection points, the first detection model can be trained to improve its detection effect. On this basis, the embodiments of the present application further provide a training process for the first detection model, which is explained and illustrated below in conjunction with the drawings.
Method embodiment three
Refer to FIG. 5, which is a flowchart of the training process of the first detection model provided by the embodiments of the present application.
The training process of the first detection model provided by the embodiments of the present application includes steps S51-S55:
S51: Obtain the image features corresponding to the training image, the actual position information of the target object on the training image, and the actual key point position information of the target object.
The training image is the image used when training the first detection model. The present application does not limit the type of training image; for example, the training image may be an image collected by an image acquisition device (for example, a camera), an image carrying position information of the target object, or an image carrying key point position information of the target object. In addition, the embodiments of the present application do not limit the number of training images, which can be preset, in particular according to the application scenario.
The actual position information of the target object on the training image records information about the actual position of the target object in the training image.
The actual key point position information of the target object records information about the actual positions of the key points of the target object in the training image.
It should be noted that the embodiments of the present application do not limit the method for acquiring the image features corresponding to the training image; any method capable of acquiring them can be used. Likewise, the embodiments of the present application do not limit the method for acquiring the actual position information of the target object on the training image, nor the method for acquiring the actual key point position information of the target object; any method capable of acquiring each can be used.
S52: According to the image features corresponding to the training image, use the first detection model to detect the target object and its key points, and determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object.
The predicted position information of the target object on the image to be detected records information about the position of the target object on the image to be detected obtained by detection with the first detection model.
The predicted key point position information of the target object records information about the predicted positions of the key points of the target object in the image to be detected obtained by detection with the first detection model.
In the embodiments of the present application, after the image features corresponding to the training image are acquired, they are input into the first detection model, so that the first detection model detects the target object and its key points according to those image features, determines the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object, and outputs both.
S53: According to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, determine whether the first detection model meets a first preset condition; if so, execute step S55; if not, execute step S54.
The first preset condition records the training cut-off condition of the training process of the first detection model; it may be preset or set according to the application scenario. For example, the first preset condition may specifically include: the rate of change of the detection loss of the first detection model is lower than a first preset loss threshold, and/or the number of training rounds of the first detection model reaches a first preset round-number threshold.
In the embodiments of the present application, the predicted position information of the object and its key points detected by the first detection model can be compared with the actual position information of the object and its key points, so that the prediction accuracy of the first detection model is determined according to the comparison result. Specifically: according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, determine whether the first detection model meets the first preset condition. If the first detection model meets the first preset condition, the predicted position information of the object and its key points detected by the first detection model is very close to the actual position information of the object and its key points, which indicates that the prediction accuracy of the first detection model is high and no further training is needed; the training process of the first detection model can then be ended. If the first detection model does not meet the first preset condition, there is a large gap between the predicted position information of the object and its key points detected by the first detection model and the actual position information of the object and its key points, which indicates that the prediction accuracy of the first detection model is low and further training is needed; the first detection model then needs to be updated to improve the prediction accuracy of the updated first detection model.
In addition, to improve the accuracy of the evaluation of the detection effect of the first detection model, the embodiments of the present application further provide a specific implementation of step S53, as shown in FIG. 6. In this implementation, when the first preset condition includes that the rate of change of the detection loss of the first detection model is lower than the first preset loss threshold, step S53 may specifically include steps S53A1-S53A7:
S53A1: Determine the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected.
The detection loss of the target object records the loss incurred by the first detection model in the process of acquiring the position information of the target object on the image to be detected; the detection loss of the target object includes the recognition loss of the target object (L-softmax) and the position detection loss of the target object (L-regression).
The recognition loss of the target object records the loss incurred by the first detection model when recognizing the target object in the image; that is, it records whether the first detection model can recognize the target object from the image. The recognition loss of the target object may include the loss caused by failing to recognize the target object in the image, the loss caused by recognizing another object in the image as the target object, and the loss caused by recognizing the target object in the image to be detected as another object.
The position detection loss of the target object records the loss incurred by the first detection model in the process of determining the position information of the target object on the image to be detected; it may include the loss caused by inaccurate predicted position information of the target object on the image to be detected. It should be noted that the position detection loss of the target object can usually be computed with an l2 loss function (l2-loss), that is, by computing the distance between the predicted position coordinates and the actual position coordinates.
Based on the above, after the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected are acquired, the actual position information can be compared with the predicted position information to determine the recognition loss of the target object for the first detection model; at the same time, the distance between the actual position information and the predicted position information can be computed to determine the position detection loss of the target object for the first detection model.
It should be noted that the embodiments of the present application do not limit the method for computing the recognition loss of the target object; any existing or future computation method capable of measuring the recognition effect for the target object in an image can be used. Likewise, the embodiments of the present application do not limit the method for computing the position detection loss of the target object; any existing or future computation method capable of measuring the position detection effect for the target object can be used.
S53A2: Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object.
The key point position detection loss of the target object (loss-dpoint) records the loss incurred by the first detection model in the process of acquiring the key point position information of the target object; it includes the loss caused by inaccurate predicted key point position information of the target object. It should be noted that the key point position detection loss of the target object can usually be computed with an l2 loss function (l2-loss), that is, by computing the distance between the predicted position coordinates and the actual position coordinates.
It should be noted that the embodiments of the present application do not limit the method for computing the key point position detection loss of the target object; any existing or future computation method capable of measuring the position detection effect for the key points of the target object can be used.
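Both losses above are described as l2-style distances between predicted and actual coordinates. A minimal sketch, assuming a squared-distance form (the patent does not fix the exact formulation, so the function and values below are illustrative):

```python
# l2-style loss: squared distance between predicted and actual coordinates.
def l2_loss(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))

# Position detection loss (L-regression) for one region (x, y, w, h):
loss_regression = l2_loss((102.0, 79.0, 43.0, 42.0),
                          (100.0, 80.0, 40.0, 40.0))

# Key point position detection loss (loss-dpoint) summed over (x, y) pairs:
predicted_pts = [(10.0, 12.0), (20.0, 22.0)]
actual_pts = [(11.0, 12.0), (20.0, 21.0)]
loss_dpoint = sum(l2_loss(p, a) for p, a in zip(predicted_pts, actual_pts))
```

With these illustrative values, `loss_regression` is 18.0 and `loss_dpoint` is 2.0; both decrease toward 0 as the predictions approach the ground truth.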
S53A3: Determine the detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object.
第一检测模型的检测损失(loss-detect)用于记录第一检测模型在获取目标物体在待检测图像上的位置信息以及目标物体的关键点位置信息的过程中所造成的损失。The detection loss of the first detection model (loss-detect) is used to record the loss caused by the first detection model in the process of obtaining the position information of the target object on the image to be detected and the key point position information of the target object.
在本申请实施例中,为了提高对模型检测效果的评价准确性,可以利用预先设定的损失函数来确定第一检测模型的检测损失,且该损失函数为:In this embodiment of the application, in order to improve the accuracy of the evaluation of the model detection effect, a preset loss function can be used to determine the detection loss of the first detection model, and the loss function is:
loss-detect=α×L-softmax+β×L-regression+F×σ×loss-dpoint,loss-detect=α×L-softmax+β×L-regression+F×σ×loss-dpoint,
式中,loss-detect表示第一检测模型的检测损失;α表示目标物体的识别损失的影响权重;L-softmax表示目标物体的识别损失;β表示目标物体的位置检测损失的影响权重;L-regression表示目标物体的位置检测损失;F表示目标物体的关键点位置检测损失的存在标识,若训练图像中包括目标物体(也就是,正样本训练图像),则该训练图像对应的F值为1,若训练图像中不包括目标物体(也就是,负样本训练图像),则该训练图像对应的F值为0;σ表示目标物体的关键点位置检测损失的影响权重;loss-dpoint表示目标物体的关键点位置检测损失。In the formula, loss-detect represents the detection loss of the first detection model; α represents the influence weight of the recognition loss of the target object; L-softmax represents the recognition loss of the target object; β represents the influence weight of the position detection loss of the target object; L-regression represents the position detection loss of the target object; F represents the existence flag of the key point position detection loss of the target object: if the training image includes the target object (that is, a positive-sample training image), the F value corresponding to the training image is 1, and if the training image does not include the target object (that is, a negative-sample training image), the F value corresponding to the training image is 0; σ represents the influence weight of the key point position detection loss of the target object; and loss-dpoint represents the key point position detection loss of the target object.
需要说明的是,F的值是根据训练图像中是否包括目标物体确定的,具体为:若一个训练图像中包括目标物体,则需要考虑目标物体的关键点相关信息,此时在衡量第一检测模型对该训练图像的检测损失时需要考虑目标物体的识别损失、目标物体的位置检测损失和目标物体的关键点位置检测损失;若一个训练图像中不包括目标物体,则无需考虑目标物体的关键点相关信息,此时在衡量第一检测模型对该训练图像的检测损失时只需要考虑目标物体的识别损失和目标物体的位置检测损失即可。It should be noted that the value of F is determined according to whether the training image includes the target object. Specifically, if a training image includes the target object, the key point related information of the target object needs to be considered; in this case, when measuring the detection loss of the first detection model on this training image, the recognition loss of the target object, the position detection loss of the target object and the key point position detection loss of the target object all need to be considered. If a training image does not include the target object, the key point related information of the target object does not need to be considered; in this case, when measuring the detection loss of the first detection model on this training image, only the recognition loss of the target object and the position detection loss of the target object need to be considered.
在本申请实施例中,由于上述损失函数是通过利用目标物体的识别损失、目标物体的位置检测损失和目标物体的关键点位置检测损失这三个损失指标来共同监督第一检测模型的检测效果的,因而在利用上述损失函数评价第一检测模型的对物体及其关键点的检测效果时,能够有效地确定出当前第一检测模型是否能够准确地检测出图像中的物体及其关键点,从而提高了训练好的第一检测模型的检测准确性。In the embodiments of the present application, since the above loss function jointly supervises the detection effect of the first detection model through three loss indicators, namely the recognition loss of the target object, the position detection loss of the target object and the key point position detection loss of the target object, when the above loss function is used to evaluate the detection effect of the first detection model on objects and their key points, it can effectively determine whether the current first detection model can accurately detect the objects and their key points in an image, thereby improving the detection accuracy of the trained first detection model.
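The loss function above can be sketched in Python as follows. The default weight values are illustrative assumptions only — the specification leaves α, β and σ as unspecified influence weights.

```python
def detection_loss(l_softmax, l_regression, loss_dpoint,
                   has_target, alpha=1.0, beta=1.0, sigma=1.0):
    """loss-detect = alpha*L-softmax + beta*L-regression + F*sigma*loss-dpoint,
    where F is 1 for a positive-sample training image (the image contains
    the target object) and 0 for a negative-sample training image."""
    f = 1 if has_target else 0
    return alpha * l_softmax + beta * l_regression + f * sigma * loss_dpoint

# Positive sample: all three loss terms contribute.
pos = detection_loss(0.5, 0.3, 0.2, has_target=True)
# Negative sample: the key point term is switched off by F = 0.
neg = detection_loss(0.5, 0.3, 0.2, has_target=False)
```

The existence flag F thus implements exactly the case split described above: for negative-sample training images, only the recognition loss and the position detection loss remain in the total.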
S53A4:根据所述第一检测模型的检测损失以及第一检测模型的历史检测损失,确定第一检测模型的检测损失变化率。S53A4: Determine the detection loss change rate of the first detection model according to the detection loss of the first detection model and the historical detection loss of the first detection model.
第一检测模型的历史检测损失是在历史训练过程中确定的第一检测模型的检测损失。例如,假设对第一检测模型依次进行了第1轮训练至第Y轮训练,且第1轮训练早于第2轮训练、第2轮训练早于第3轮训练、……、第Y-1轮训练早于第Y轮训练,且第Y轮训练是当前正在执行的训练过程,此时,对于第Y轮训练来说,第1轮训练至第Y-1轮训练均是第Y轮训练的历史训练过程,那么,在第1轮训练中确定的第一检测模型的检测损失至在第Y-1轮训练中确定的第一检测模型的检测损失均是第Y轮训练的历史检测损失。The historical detection loss of the first detection model is the detection loss of the first detection model determined in a historical training process. For example, suppose the first detection model has been trained in round 1 through round Y in sequence, where round 1 is earlier than round 2, round 2 is earlier than round 3, ..., round Y-1 is earlier than round Y, and round Y is the training process currently being executed. In this case, for round Y, round 1 through round Y-1 are all historical training processes of round Y; accordingly, the detection losses of the first detection model determined in round 1 through round Y-1 are all historical detection losses for round Y.
第一检测模型的检测损失变化率用于表征第一检测模型在多轮训练过程中检测效果的变化信息;而且,第一检测模型的检测损失变化率越小表示第一检测模型的检测效果越好,且第一检测模型的检测损失变化率越大表示第一检测模型的检测效果越差。The detection loss change rate of the first detection model is used to characterize how the detection effect of the first detection model changes over multiple rounds of training; moreover, a smaller detection loss change rate of the first detection model indicates a better detection effect of the first detection model, and a larger detection loss change rate indicates a worse detection effect.
S53A5:判断所述第一检测模型的检测损失变化率是否低于第一预设损失阈值,若是,则执行步骤S53A6;若否,则执行步骤S53A7。S53A5: Determine whether the detection loss change rate of the first detection model is lower than the first preset loss threshold, if yes, execute step S53A6; if not, execute step S53A7.
第一预设损失阈值可以预先设定,尤其可以根据应用场景设定。The first preset loss threshold can be set in advance, and in particular can be set according to the application scenario.
在本申请实施例中,在根据目标物体的识别损失、目标物体的位置检测损失和目标物体的关键点位置检测损失这三个监督因素确定出第一检测模型的检测损失后,再基于第一检测模型的检测损失确定第一检测模型的检测损失变化率。此时,如果第一检测模型的检测损失变化率低于第一预设损失阈值,则表示该第一检测模型检测损失变化较小且接近于稳定的最优模型,从而表示利用该第一检测模型检测所得的图像中物体及其关键点准确性较高,从而确定该第一检测模型已达到第一预设条件,进而可以确定该第一检测模型无需再进行训练;如果第一检测模型的检测损失变化率不低于第一预设损失阈值,则表示该第一检测模型检测损失变化较大且距离稳定的最优模型较远,从而表示利用该第一检测模型检测所得的图像中物体及其关键点准确性较低,从而确定该第一检测模型未达到第一预设条件,进而可以确定该第一检测模型仍需再次进行训练。In the embodiments of the present application, after the detection loss of the first detection model is determined according to the three supervision factors, namely the recognition loss of the target object, the position detection loss of the target object and the key point position detection loss of the target object, the detection loss change rate of the first detection model is determined based on the detection loss of the first detection model. At this time, if the detection loss change rate of the first detection model is lower than the first preset loss threshold, it indicates that the detection loss of the first detection model changes little and the model is close to a stable optimal model, which means that the objects and their key points detected in images by the first detection model are highly accurate; it is therefore determined that the first detection model has reached the first preset condition, and consequently that the first detection model no longer needs to be trained. If the detection loss change rate of the first detection model is not lower than the first preset loss threshold, it indicates that the detection loss of the first detection model changes substantially and the model is far from a stable optimal model, which means that the objects and their key points detected in images by the first detection model have low accuracy; it is therefore determined that the first detection model has not reached the first preset condition, and consequently that the first detection model still needs to be trained again.
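A minimal sketch of steps S53A4-S53A5, under the assumption that the change rate is measured as the relative difference between the two most recent detection losses; the specification does not fix the exact formula, so this choice (and the function names) is illustrative.

```python
def loss_change_rate(history):
    """Detection loss change rate, computed here as the relative
    difference between the two most recent detection losses.
    `history` holds the per-round detection losses, oldest first."""
    if len(history) < 2:
        return float("inf")  # not enough rounds to judge stability
    prev, curr = history[-2], history[-1]
    return abs(curr - prev) / max(prev, 1e-12)

def reaches_first_condition(history, threshold):
    """S53A5: the model meets the first preset condition when the
    change rate drops below the first preset loss threshold."""
    return loss_change_rate(history) < threshold

# Loss has plateaued -> change rate is tiny -> stop training.
stable = reaches_first_condition([0.90, 0.52, 0.515, 0.514], 0.01)
# Loss is still dropping fast -> keep training.
unstable = reaches_first_condition([0.90, 0.52, 0.40], 0.01)
```

Using the relative rather than absolute difference makes the threshold comparable across models whose losses live on different scales, which is one common way to operationalize "close to a stable optimal model".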
S53A6:确定所述第一检测模型达到第一预设条件。S53A6: Determine that the first detection model meets the first preset condition.
S53A7:确定所述第一检测模型未达到第一预设条件。S53A7: Determine that the first detection model does not meet the first preset condition.
需要说明的是,步骤S53A1和步骤S53A2之间没有固定的执行顺序,可以依次执行步骤S53A1和步骤S53A2,也可以依次执行步骤S53A2和步骤S53A1,还可以同时执行步骤S53A1和步骤S53A2。It should be noted that there is no fixed execution order between step S53A1 and step S53A2. Step S53A1 and step S53A2 can be executed in sequence, step S53A2 and step S53A1 can be executed in sequence, or step S53A1 and step S53A2 can be executed simultaneously.
以上为步骤S53的一种实施方式,该实施方式将第一检测模型的检测损失变化率低于第一预设损失阈值作为了第一检测模型的训练截止条件。The foregoing is one implementation of step S53. This implementation takes the detection loss change rate of the first detection model being lower than the first preset loss threshold as the training cut-off condition of the first detection model.
另外,为了提高对第一检测模型的检测效果的评价准确性,本申请实施例还提供了步骤S53的另一种具体实施方式,如图7所示,在该实施方式中,当第一预设条件包括第一检测模型的检测损失的变化率低于第一预设损失阈值时,步骤S53具体可以包括步骤S53B1-S53B7:In addition, in order to improve the evaluation accuracy of the detection effect of the first detection model, the embodiments of the present application further provide another specific implementation of step S53, as shown in FIG. 7. In this implementation, when the first preset condition includes that the change rate of the detection loss of the first detection model is lower than the first preset loss threshold, step S53 may specifically include steps S53B1-S53B7:
S53B1:根据所述目标物体在训练图像上的实际位置信息以及目标物体在待检测图像上的预测位置信息,确定目标物体的检测损失。S53B1: Determine the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected.
其中,所述目标物体的检测损失包括目标物体的识别损失和目标物体的位置检测损失。Wherein, the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object.
需要说明的是,步骤S53B1的内容与上述步骤S53A1的内容相同,为了简要起见,在此不再赘述,步骤S53B1的技术详情请参照上述步骤S53A1的内容。It should be noted that the content of step S53B1 is the same as the content of step S53A1 described above. For the sake of brevity, it will not be repeated here. For technical details of step S53B1, please refer to the content of step S53A1 described above.
S53B2:根据所述目标物体的实际关键点位置信息以及所述目标物体的预测关键点位置信息,确定目标物体的关键点位置检测损失。S53B2: Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object.
需要说明的是,步骤S53B2的内容与上述步骤S53A2的内容相同,为了简要起见,在此不再赘述,步骤S53B2的技术详情请参照上述步骤S53A2的内容。It should be noted that the content of step S53B2 is the same as the content of step S53A2 described above. For the sake of brevity, it will not be repeated here. For technical details of step S53B2, please refer to the content of step S53A2 described above.
S53B3:根据所述目标物体的识别损失、所述目标物体的位置检测损失和所述目标物体的关键点位置检测损失,确定第一检测模型的检测损失。S53B3: Determine the detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object.
需要说明的是,步骤S53B3的内容与上述步骤S53A3的内容相同,为了简要起见,在此不再赘述,步骤S53B3的技术详情请参照上述步骤S53A3的内容。It should be noted that the content of step S53B3 is the same as the content of step S53A3 described above. For the sake of brevity, it will not be repeated here. For technical details of step S53B3, please refer to the content of step S53A3 described above.
S53B4:根据所述第一检测模型的检测损失以及第一检测模型的历史检测损失,确定第一检测模型的检测损失变化率。S53B4: Determine the detection loss change rate of the first detection model according to the detection loss of the first detection model and the historical detection loss of the first detection model.
第一检测模型的历史检测损失是在历史训练过程中确定的第一检测模型的检测损失。The historical detection loss of the first detection model is the detection loss of the first detection model determined during the historical training process.
需要说明的是,步骤S53B4的内容与上述步骤S53A4的内容相同,为了简要起见,在此不再赘述,步骤S53B4的技术详情请参照上述步骤S53A4的内容。It should be noted that the content of step S53B4 is the same as the content of step S53A4. For the sake of brevity, it will not be repeated here. For technical details of step S53B4, please refer to the content of step S53A4.
S53B5:判断是否满足以下两个条件中的至少一个条件,且该两个条件为:第一检测模型的检测损失变化率低于第一预设损失阈值和第一检测模型的训练轮数达到第一预设轮数阈值;若是,则执行步骤S53B6;若否,则执行步骤S53B7。S53B5: Determine whether at least one of the following two conditions is met, the two conditions being: the detection loss change rate of the first detection model is lower than the first preset loss threshold, and the number of training rounds of the first detection model reaches the first preset round-number threshold; if yes, execute step S53B6; if not, execute step S53B7.
在本申请实施例中,在获取到第一检测模型的检测损失变化率以及第一检测模型的训练轮数之后,需要分别执行动作“判断第一检测模型的检测损失变化率是否低于第一预设损失阈值”以及动作“判断第一检测模型的训练轮数是否达到第一预设轮数阈值”。其中,只要确定了第一检测模型的检测损失变化率低于第一预设损失阈值,无论有没有执行动作“判断第一检测模型的训练轮数是否达到第一预设轮数阈值”,均直接执行步骤S53B6;而且,只要确定了第一检测模型的训练轮数达到第一预设轮数阈值,无论有没有执行动作“判断第一检测模型的检测损失变化率是否低于第一预设损失阈值”,均直接执行步骤S53B6;而且,只有确定了第一检测模型的检测损失变化率不低于第一预设损失阈值,且第一检测模型的训练轮数未达到第一预设轮数阈值时,才执行步骤S53B7。In the embodiments of the present application, after the detection loss change rate of the first detection model and the number of training rounds of the first detection model are obtained, the action of "determining whether the detection loss change rate of the first detection model is lower than the first preset loss threshold" and the action of "determining whether the number of training rounds of the first detection model reaches the first preset round-number threshold" need to be performed respectively. As long as it is determined that the detection loss change rate of the first detection model is lower than the first preset loss threshold, step S53B6 is executed directly, regardless of whether the action of "determining whether the number of training rounds of the first detection model reaches the first preset round-number threshold" has been performed; likewise, as long as it is determined that the number of training rounds of the first detection model reaches the first preset round-number threshold, step S53B6 is executed directly, regardless of whether the action of "determining whether the detection loss change rate of the first detection model is lower than the first preset loss threshold" has been performed. Only when it is determined both that the detection loss change rate of the first detection model is not lower than the first preset loss threshold and that the number of training rounds of the first detection model has not reached the first preset round-number threshold is step S53B7 executed.
需要说明的是,本申请实施例不限定动作“判断第一检测模型的检测损失变化率是否低于第一预设损失阈值”和动作“判断第一检测模型的训练轮数是否达到第一预设轮数阈值”的执行顺序,可以先执行前者再执行后者,也可以先执行后者再执行前者,还可以同时执行该两个动作。It should be noted that the embodiments of the present application do not limit the execution order of the action "determining whether the detection loss change rate of the first detection model is lower than the first preset loss threshold" and the action "determining whether the number of training rounds of the first detection model reaches the first preset round-number threshold": the former may be performed first and the latter second, the latter may be performed first and the former second, or the two actions may be performed simultaneously.
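The combined cut-off check of step S53B5 — stop as soon as either condition holds — can be sketched as follows; the function and parameter names are illustrative.

```python
def reaches_first_preset_condition(change_rate, rounds_done,
                                   loss_threshold, round_threshold):
    """S53B5: the first detection model meets the first preset condition
    when the detection loss change rate is below the first preset loss
    threshold OR the number of training rounds reaches the first preset
    round-number threshold (an inclusive-or, matching the text above)."""
    return change_rate < loss_threshold or rounds_done >= round_threshold

# Stops because the loss has stabilized, well before the round cap.
stop_by_loss = reaches_first_preset_condition(0.005, 10, 0.01, 100)
# Stops because the round cap is hit, even though the loss still moves.
stop_by_rounds = reaches_first_preset_condition(0.20, 100, 0.01, 100)
# Neither condition holds -> training continues (step S53B7).
keep_training = reaches_first_preset_condition(0.20, 10, 0.01, 100)
```

Because `or` short-circuits, evaluating either condition first gives the same outcome, mirroring the statement that the two judgment actions have no fixed execution order.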
S53B6:确定所述第一检测模型达到第一预设条件。S53B6: Determine that the first detection model reaches a first preset condition.
S53B7:确定所述第一检测模型未达到第一预设条件。S53B7: Determine that the first detection model does not meet the first preset condition.
需要说明的是,步骤S53B1和步骤S53B2之间没有固定的执行顺序,可以依次执行步骤S53B1和步骤S53B2,也可以依次执行步骤S53B2和步骤S53B1,还可以同时执行步骤S53B1和步骤S53B2。It should be noted that there is no fixed execution order between step S53B1 and step S53B2. Step S53B1 and step S53B2 can be executed in sequence, step S53B2 and step S53B1 can be executed in sequence, or step S53B1 and step S53B2 can be executed simultaneously.
以上为步骤S53的另一种实施方式,该实施方式将第一检测模型的检测损失变化率低于第一预设损失阈值以及第一检测模型的训练轮数达到第一预设轮数阈值共同作为了第一检测模型的训练截止条件。The above is another implementation of step S53. This implementation jointly takes the detection loss change rate of the first detection model being lower than the first preset loss threshold and the number of training rounds of the first detection model reaching the first preset round-number threshold as the training cut-off conditions of the first detection model.
以上为步骤S53的相关内容。The above is the relevant content of step S53.
S54:更新第一检测模型,继续执行步骤S52。S54: Update the first detection model, and continue to execute step S52.
在本申请实施例中,在确定第一检测模型未达到第一预设条件时,可以根据由第一检测模型检测所得的目标物体在待检测图像上的预测位置信息与目标物体在训练图像上的实际位置信息之间的差距,以及目标物体的预测关键点位置信息与目标物体的实际关键点位置信息之间的差距,来对第一检测模型进行更新,使得更新后的第一检测模型检测所得的目标物体在待检测图像上的预测位置信息更接近于目标物体在训练图像上的实际位置信息,并使得目标物体的预测关键点位置信息更接近于目标物体的实际关键点位置信息,从而使得更新后的第一检测模型能够更准确地检测出图像中物体及其关键点。In the embodiments of the present application, when it is determined that the first detection model has not reached the first preset condition, the first detection model may be updated according to the gap between the predicted position information of the target object on the image to be detected obtained by the first detection model and the actual position information of the target object on the training image, as well as the gap between the predicted key point position information of the target object and the actual key point position information of the target object, so that the predicted position information of the target object on the image to be detected obtained by the updated first detection model is closer to the actual position information of the target object on the training image, and the predicted key point position information of the target object is closer to the actual key point position information of the target object, thereby enabling the updated first detection model to detect the objects and their key points in an image more accurately.
S55:结束对第一检测模型的训练过程。S55: End the training process for the first detection model.
上述为方法实施例三提供的第一检测模型的训练过程的具体实施方式,在该实施方式中,在获取到训练图像对应的图像特征、目标物体在训练图像上的实际位置信息以及目标物体的实际关键点位置信息之后,先根据所述训练图像对应的图像特征,利用第一检测模型对目标物体及其关键点进行检测,确定目标物体在待检测图像上的预测位置信息以及目标物体的预测关键点位置信息;再根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息,判断所述第一检测模型是否达到第一预设条件,以便根据该判断结果确定是否继续对该第一检测模型进行训练。其中,因本申请实施例是根据目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息这四个信息来评价第一检测模型的检测效果的,使得该评价结果更合理有效,从而使得满足第一预设条件的第一检测模型的检测效果更好。The foregoing is a specific implementation of the training process of the first detection model provided in Method Embodiment 3. In this implementation, after the image features corresponding to the training image, the actual position information of the target object on the training image and the actual key point position information of the target object are obtained, the first detection model is first used, according to the image features corresponding to the training image, to detect the target object and its key points and to determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object; then, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object, it is determined whether the first detection model has reached the first preset condition, so as to decide according to this judgment result whether to continue training the first detection model. Since the embodiments of the present application evaluate the detection effect of the first detection model based on these four pieces of information, namely the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object, the evaluation result is more reasonable and effective, so that the first detection model satisfying the first preset condition has a better detection effect.
另外,为了进一步提高对第一检测模型的检测效果的评价准确性,本申请实施例还利用目标物体的识别损失、目标物体的位置检测损失和目标物体的关键点位置检测损失这三个损失因素来共同监督第一检测模型的检测效果,使得该评价结果更合理有效,从而使得满足第一预设条件的第一检测模型的检测效果更好。In addition, in order to further improve the evaluation accuracy of the detection effect of the first detection model, the embodiments of the present application also use three loss factors, namely the recognition loss of the target object, the position detection loss of the target object and the key point position detection loss of the target object, to jointly supervise the detection effect of the first detection model, which makes the evaluation result more reasonable and effective, so that the first detection model satisfying the first preset condition has a better detection effect.
为了提高图像中物体及其关键点的位置检测准确性,基于上述方法实施例提供的图像中物体及其关键点的确定方法,本申请实施例还提供了图像中物体及其关键点的确定方法的又一种实施方式,下面结合附图进行解释和说明。In order to improve the position detection accuracy of objects and their key points in an image, based on the methods for determining an object and its key points in an image provided by the foregoing method embodiments, the embodiments of the present application further provide yet another implementation of the method for determining an object and its key points in an image, which is explained and described below with reference to the accompanying drawings.
方法实施例四Method embodiment four
方法实施例四是在方法实施例一至方法实施例三的基础上进行的改进,为了简要起见,方法实施例四中与方法实施例一至方法实施例三中内容相同的部分,在此不再赘述。Method Embodiment 4 is an improvement on the basis of Method Embodiment 1 to Method Embodiment 3. For the sake of brevity, the parts of Method Embodiment 4 that are the same as those in Method Embodiment 1 to Method Embodiment 3 will not be repeated here.
参见图8,该图为本申请方法实施例四提供的图像中物体及其关键点的确定方法流程图。Refer to FIG. 8, which is a flowchart of a method for determining an object and its key points in an image provided in the fourth embodiment of the method of this application.
本申请实施例提供的图像中物体及其关键点的确定方法,具体包括步骤S81-S82:The method for determining an object and its key points in an image provided by the embodiment of the present application specifically includes steps S81-S82:
S81:对待检测图像进行特征提取,得到所述待检测图像对应的图像特征。S81: Perform feature extraction on the image to be detected to obtain image features corresponding to the image to be detected.
需要说明的是,步骤S81的具体内容与上述方法实施例一中步骤S21的内容相同,为了简要起见,在此不再赘述。It should be noted that the specific content of step S81 is the same as the content of step S21 in the first embodiment of the method, and for the sake of brevity, the details are not repeated here.
S82:根据所述待检测图像对应的图像特征,确定目标物体在待检测图像上的位置信息和待检测图像上各个图像子区域内的关键点位置信息,并根据所述目标物体在待检测图像上的位置信息和所述待检测图像上各个图像子区域内的关键点位置信息,确定所述目标物体的关键点位置信息。S82: Determine, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected, and determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected.
其中,图像子区域是锚点对应的感兴趣区域。需要说明的是,“锚点对应的感兴趣区域”的技术详情请参照方法实施例一的步骤S22中相关内容。另外,在待检测图像中通常包括至少一个图像子区域。Among them, the image sub-region is the region of interest corresponding to the anchor point. It should be noted that, for the technical details of the "region of interest corresponding to the anchor point", please refer to the related content in step S22 of the first method embodiment. In addition, the image to be detected usually includes at least one image sub-region.
在本申请实施例中,在获取到待检测图像对应的图像特征之后,先根据该待检测图像对应的图像特征分别确定目标物体在待检测图像上的位置信息和待检测图像上各个图像子区域内的关键点位置信息,再根据目标物体在待检测图像上的位置信息和所述待检测图像上各个图像子区域内的关键点位置信息,确定所述目标物体的关键点位置信息。作为示例,假设待检测图像包括第1个图像子区域至第P个图像子区域,且目标物体位于第W个图像子区域内。基于该假设,步骤S82具体为:根据待检测图像对应的图像特征分别确定目标物体在待检测图像上的位置信息和待检测图像上第1个图像子区域内的关键点位置信息至第P个图像子区域内的关键点位置信息,此时,由于目标物体位于第W个图像子区域内,因而,根据目标物体在待检测图像上的位置信息和待检测图像上各个图像子区域内的关键点位置信息,确定目标物体的关键点位置信息,也就是,确定目标物体在第W个图像子区域内的关键点位置信息。In the embodiments of the present application, after the image features corresponding to the image to be detected are obtained, the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected are first determined according to the image features corresponding to the image to be detected; then the key point position information of the target object is determined according to the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected. As an example, suppose the image to be detected includes the 1st image sub-region through the P-th image sub-region, and the target object is located in the W-th image sub-region. Based on this assumption, step S82 is specifically as follows: the position information of the target object on the image to be detected and the key point position information in the 1st image sub-region through the P-th image sub-region of the image to be detected are determined respectively according to the image features corresponding to the image to be detected; at this time, since the target object is located in the W-th image sub-region, the key point position information of the target object is determined according to the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected, that is, the key point position information of the target object in the W-th image sub-region is determined.
需要说明的是,在本申请实施例中,在获取到目标物体在待检测图像上的位置信息和待检测图像上各个图像子区域内的关键点位置信息之后,需要将目标物体所在图像子区域内的关键点位置作为该目标物体的关键点位置信息。It should be noted that, in the embodiments of the present application, after the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected are obtained, the key point positions in the image sub-region where the target object is located need to be taken as the key point position information of the target object.
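The selection step above — take the key points of the sub-region that contains the target object — can be sketched as follows. Boxes are assumed to be (x1, y1, x2, y2) tuples, and containment is judged by the box centre for simplicity; the specification does not fix how the target is matched to a sub-region, so this matching rule is an illustrative assumption.

```python
def keypoints_for_target(target_box, subregion_boxes, subregion_keypoints):
    """Return the key points of the image sub-region that contains the
    target object (step S82). Each box is (x1, y1, x2, y2); the target is
    matched to a sub-region by checking which box contains its centre."""
    cx = (target_box[0] + target_box[2]) / 2
    cy = (target_box[1] + target_box[3]) / 2
    for box, keypoints in zip(subregion_boxes, subregion_keypoints):
        x1, y1, x2, y2 = box
        if x1 <= cx <= x2 and y1 <= cy <= y2:
            return keypoints
    return None  # target centre falls inside no sub-region

# Two sub-regions (anchors' regions of interest), each with its own
# detected key points; the target box lies in the second region.
regions = [(0, 0, 50, 50), (50, 0, 100, 50)]
kps = [[(10, 10)], [(70, 20)]]
result = keypoints_for_target((55, 5, 95, 45), regions, kps)
```

In the W-th-sub-region example of the text, this function would return the key point list of the W-th region while discarding those of the other P-1 regions.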
另外,为了提高图像中物体及其关键点的确定效率,可以利用由物体位置检测网络和物体关键点检测网络融合而成的第二检测模型确定物体及其关键点。基于此,本申请实施例还提供了步骤S82的一种实施方式,在该实施方式中,步骤S82具体可以为:根据所述待检测图像对应的图像特征,利用预先构建的第二检测模型对目标物体及其关键点进行检测,确定目标物体在待检测图像上的位置信息以及目标物体的关键点位置信息。In addition, in order to improve the efficiency of determining objects and their key points in an image, a second detection model formed by fusing an object position detection network and an object key point detection network can be used to determine the object and its key points. Based on this, the embodiments of the present application further provide an implementation of step S82. In this implementation, step S82 may specifically be: according to the image features corresponding to the image to be detected, using a pre-built second detection model to detect the target object and its key points, and determining the position information of the target object on the image to be detected and the key point position information of the target object.
第二检测模型用于根据待检测图像对应的图像特征对目标物体在待检测图像上的位置信息以及目标物体的关键点位置信息进行检测。The second detection model is used to detect the position information of the target object on the image to be detected and the key point position information of the target object according to the image features corresponding to the image to be detected.
第二检测模型包括第二物体位置检测网络层、第二关键点位置检测网络层和物体关键点位置确定网络层,且所述第二物体位置检测网络层的输出结果和所述第二关键点位置检测网络层的输出结果是所述物体关键点位置确定网络层的输入数据,且所述第二物体位置检测网络层用于根据所述待检测图像对应的图像特征确定目标物体在待检测图像上的位置信息;且所述第二关键点位置检测网络层用于根据所述待检测图像对应的图像特征确定待检测图像上各个图像子区域内的关键点位置信息;且所述物体关键点位置确定网络层用于根据所述目标物体在待检测图像上的位置信息和所述待检测图像上各个图像子区域内的关键点位置信息,确定所述目标物体的关键点位置信息。也就是,在第二检测模型中,第二检测模型的输入数据“待检测图像对应的图像特征”分别是第二物体位置检测网络层的输入数据和第二关键点位置检测网络层的输入数据,而且第二物体位置检测网络层的输出数据和第二关键点位置检测网络层的输出数据是物体关键点位置确定网络层的输入数据。例如,当目标物体为人脸时,如图9所示,第二检测模型的输入数据“待检测图像对应的图像特征”就是人脸检测网络层的输入数据和人脸关键点检测网络层的输入数据,而且人脸检测网络层的输出数据和人脸关键点检测网络层的输出数据是人脸关键点确定网络层的输入数据。需要说明的是:在图9中,“人脸检测网络层”用于表示应用于人脸位置及其关键点检测时的“第二物体位置检测网络层”;“人脸关键点检测网络层”用于表示应用于人脸位置及其关键点检测时的“第二关键点位置检测网络层”;“人脸关键点确定网络层”用于表示应用于人脸位置及其关键点检测时的“物体关键点位置确定网络层”。The second detection model includes a second object position detection network layer, a second key point position detection network layer and an object key point position determination network layer, where the output results of the second object position detection network layer and of the second key point position detection network layer are the input data of the object key point position determination network layer; the second object position detection network layer is used to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; the second key point position detection network layer is used to determine the key point position information in each image sub-region of the image to be detected according to the image features corresponding to the image to be detected; and the object key point position determination network layer is used to determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected. That is, in the second detection model, the input data of the second detection model, namely "the image features corresponding to the image to be detected", serve as the input data of the second object position detection network layer and of the second key point position detection network layer respectively, while the output data of the second object position detection network layer and of the second key point position detection network layer serve as the input data of the object key point position determination network layer. For example, when the target object is a human face, as shown in FIG. 9, the input data of the second detection model, "the image features corresponding to the image to be detected", serve as the input data of the face detection network layer and of the face key point detection network layer, while the output data of the face detection network layer and of the face key point detection network layer serve as the input data of the face key point determination network layer. It should be noted that in FIG. 9, "face detection network layer" denotes the "second object position detection network layer" as applied to detecting the face position and its key points; "face key point detection network layer" denotes the "second key point position detection network layer" as applied to detecting the face position and its key points; and "face key point determination network layer" denotes the "object key point position determination network layer" as applied to detecting the face position and its key points.
It should be noted that, in the embodiments of the present application, the second detection model is a single deep neural network composed of the second object position detection network layer and the second key point position detection network layer. As a result, when the second detection model is used to detect an object and its key points, only one deep neural network needs to be run, which saves running memory and running time.
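As a concrete illustration of the data flow described above, the following is a minimal sketch. All three functions are hypothetical stand-ins (the patent does not disclose the network internals), and the (x, y, w, h) box layout, the feature slicing, and the inside-the-box rule for assigning sub-region key points to the object are illustrative assumptions.

```python
import numpy as np

def object_position_branch(features):
    # Stand-in for the second object position detection network layer:
    # predict one bounding box (x, y, w, h) from the shared features.
    return features[:4]

def keypoint_position_branch(features, num_subregions=4):
    # Stand-in for the second key point position detection network layer:
    # one key point (px, py) per image sub-region.
    return features[4:4 + num_subregions * 2].reshape(num_subregions, 2)

def determine_object_keypoints(box, subregion_keypoints):
    # Stand-in for the object key point position determination network layer:
    # keep the sub-region key points that fall inside the predicted box.
    x, y, w, h = box
    return [(px, py) for px, py in subregion_keypoints
            if x <= px <= x + w and y <= py <= y + h]

features = np.array([10.0, 10.0, 20.0, 20.0,   # slice read by the box branch
                     15.0, 15.0, 50.0, 50.0,   # slice read by the key point
                     12.0, 28.0, 70.0, 5.0])   # branch (4 sub-region points)
box = object_position_branch(features)         # both branches consume the
kps = keypoint_position_branch(features)       # same shared image features
object_kps = determine_object_keypoints(box, kps)
```

Both branches read the same shared features, and only their outputs meet in the final determination step, which mirrors the single-feature-extraction design described above.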
It should also be noted that the embodiments of the present application further provide a training process for the second detection model; this training process is described in detail in method embodiment five, and for technical details, refer to method embodiment five.
The above is the specific implementation of the method for determining an object and its key points in an image provided by method embodiment four. In this implementation, after the image features corresponding to the image to be detected are extracted from that image, the position information of the target object on the image to be detected and the key point position information within each image sub-region of the image to be detected are first determined from those image features, and the key point position information of the target object is then determined from the position information of the target object on the image to be detected and the key point position information within each image sub-region.
Because the position information of the target object on the image to be detected accurately characterizes the position of the target object in that image, and the key point position information of the target object accurately characterizes the key point positions of the target object in that image, the method for determining an object and its key points in an image provided by the embodiments of the present application can effectively determine, from the image to be detected, the position information of the target object and the position information of its key points, and can therefore accurately detect the object and its key points in the image. In addition, after the image features corresponding to the image to be detected are obtained, the position information of the target object on the image to be detected and the key point position information within each image sub-region are determined directly from those features, and the key point position information of the target object is then determined from those two results. There is no need to restore the position information of the target object on the image to be detected into an object detection image and then extract features from that object detection image, so only one image feature extraction is required in the process of determining the position information of the target object on the image to be detected and the key point position information of the target object. This simplifies the process of determining an object and its key points in an image, improves the efficiency of that determination, and reduces the memory consumption involved.
In addition, to further reduce the memory and time consumption of determining an object and its key points in an image, the second detection model can be used to detect the target object and its key points, determining the position information of the target object on the image to be detected and the key point position information of the target object. Because the second detection model is a single deep neural network that includes the second object position detection network layer and the second key point position detection network layer, training it requires training only one deep neural network, which reduces the memory and time consumed by training; likewise, using it to detect an object and its key points requires running only one deep neural network, which reduces the memory and time consumed by detection.
Based on the methods for determining an object and its key points in an image provided by method embodiments one to four, the second detection model can be trained to further improve the accuracy of the determined objects and their key points and thereby improve the detection effect of the second detection model. Accordingly, the embodiments of the present application also provide a training process for the second detection model, which is explained and described below with reference to the accompanying drawings.
Method embodiment five
Refer to FIG. 10, which is a flowchart of the training process of the second detection model provided by an embodiment of the present application.
The training process of the second detection model provided by the embodiments of the present application includes steps S101 to S105:
S101: Obtain the image features corresponding to a training image, the actual position information of the target object on the training image, and the actual key point position information of the target object.
It should be noted that the content of step S101 is the same as that of step S51 in method embodiment three above; for technical details, refer to step S51 of method embodiment three.
S102: Based on the image features corresponding to the training image, detect the target object and its key points with the second detection model, and determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object.
The predicted position information of the target object on the image to be detected records the position-related information of the target object on the image to be detected as obtained through detection by the second detection model.
The predicted key point position information of the target object records the predicted key-point-position-related information of the target object in the image to be detected as obtained through detection by the second detection model.
In the embodiments of the present application, after the image features corresponding to the training image are obtained, those image features are input into the second detection model, so that the second detection model detects the target object and its key points from them, determines the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object, and outputs both.
S103: Based on the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, judge whether the second detection model meets a second preset condition; if so, execute step S105; if not, execute step S104.
The second preset condition records the training cut-off condition of the training process of the second detection model; it may be preset or set according to the application scenario. For example, the second preset condition may specifically include: the detection loss change rate of the second detection model is lower than a second preset loss threshold, and/or the number of training rounds of the second detection model reaches a second preset round-number threshold.
In the embodiments of the present application, the predicted position information of the object and its key points obtained through detection by the second detection model can be compared with the actual position information of the object and its key points, and the prediction accuracy of the second detection model can be determined from the comparison. Specifically, based on the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, it is judged whether the second detection model meets the second preset condition. If the second detection model meets the second preset condition, the predicted position information of the object and its key points is very close to the actual position information of the object and its key points, indicating that the prediction accuracy of the second detection model is high and no further training is needed, so the training process can end at this point. If the second detection model does not meet the second preset condition, the gap between the predicted and actual position information of the object and its key points is large, indicating that the prediction accuracy of the second detection model is low and further training is needed; in that case the second detection model is updated so that the updated second detection model has higher prediction accuracy.
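The control flow of steps S101 to S105 can be sketched as a training loop. Everything below is a toy stand-in: an elementwise linear "model" with a squared-error loss replaces the patent's network and its actual detection loss, purely to show the predict / evaluate / stop-or-update branching.

```python
def predict(weights, features):
    # S102 stand-in: a toy "second detection model" (elementwise linear).
    return [w * f for w, f in zip(weights, features)]

def toy_loss(actual, predicted):
    # Stand-in for the detection loss: squared error, actual vs. predicted.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def train(weights, features, actual, max_rounds=50, lr=0.1, threshold=1e-3):
    prev_loss = None
    for _ in range(max_rounds):
        predicted = predict(weights, features)             # S102
        loss = toy_loss(actual, predicted)                 # S103: evaluate
        if prev_loss is not None:
            change_rate = abs(prev_loss - loss) / max(prev_loss, 1e-12)
            if change_rate < threshold:
                return weights                             # S105: stop
        # S104: update the model (a gradient step on each toy weight)
        weights = [w - lr * 2 * (p - a) * f
                   for w, f, a, p in zip(weights, features, actual, predicted)]
        prev_loss = loss
    return weights

trained = train([0.0, 0.0], features=[1.0, 1.0], actual=[2.0, 3.0])
```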
In addition, to improve the accuracy of evaluating the detection effect of the second detection model, the embodiments of the present application also provide a specific implementation of step S103, as shown in FIG. 11. In this implementation, when the second preset condition includes that the detection loss change rate of the second detection model is lower than the second preset loss threshold, step S103 may specifically include steps S103A1 to S103A7:
S103A1: Determine the detection loss of the target object based on the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected.
The detection loss of the target object records the loss incurred by the second detection model in the process of obtaining the position information of the target object on the image to be detected; it includes the recognition loss of the target object (L-softmax) and the position detection loss of the target object (L-regression).
The recognition loss of the target object records the loss incurred by the second detection model when recognizing the target object in an image; that is, it reflects whether the second detection model can recognize the target object from the image. It may include losses caused by failing to recognize the target object in the image, by recognizing another object in the image as the target object, or by recognizing the target object in the image to be detected as another object.
The position detection loss of the target object records the loss incurred by the second detection model in the process of determining the position information of the target object on the image to be detected; it may include losses caused by inaccurate predicted position information of the target object on the image to be detected. It should be noted that the position detection loss of the target object can usually be computed with an l2 loss function (l2-loss), that is, by computing the distance between the predicted position coordinates and the actual position coordinates.
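As a small illustration of the l2-loss computation just described, with an assumed (x, y, w, h) box layout:

```python
def position_detection_loss(actual_box, predicted_box):
    # Squared-error (l2-style) distance between the actual and predicted
    # box coordinates; the (x, y, w, h) layout is an illustrative choice.
    return sum((p - a) ** 2 for a, p in zip(actual_box, predicted_box))

loss = position_detection_loss([10, 10, 20, 20], [12, 9, 20, 22])
```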
Based on the above, after the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected are obtained, the actual and predicted position information can be compared to determine the recognition loss of the target object for the second detection model; meanwhile, the distance between the actual and predicted position information can be computed to determine the position detection loss of the target object for the second detection model.
It should be noted that the embodiments of the present application do not limit the calculation method of the recognition loss of the target object: any existing or future calculation method capable of measuring the effect of recognizing the target object in an image may be used. Likewise, the embodiments of the present application do not limit the calculation method of the position detection loss of the target object: any existing or future calculation method capable of measuring the position detection effect for the target object may be used.
S103A2: Determine the key point position detection loss of the target object based on the actual key point position information of the target object and the predicted key point position information of the target object.
The key point position detection loss of the target object (loss-dpoint) records the loss incurred by the second detection model in the process of obtaining the key point position information of the target object; it includes losses caused by inaccurate predicted key point position information of the target object. It should be noted that the key point position detection loss of the target object can also usually be computed with the l2 loss function (l2-loss), that is, by computing the distance between the predicted and actual key point coordinates.
It should be noted that the embodiments of the present application do not limit the calculation method of the key point position detection loss of the target object: any existing or future calculation method capable of measuring the position detection effect for the key points of the target object may be used.
S103A3: Determine the detection loss of the second detection model based on the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object.
The detection loss of the second detection model (loss-detect) records the loss incurred by the second detection model in the process of obtaining the position information of the target object on the image to be detected and the key point position information of the target object.
In the embodiments of the present application, to improve the accuracy of evaluating the model's detection effect, a preset loss function can be used to determine the detection loss of the second detection model, and the loss function is:
loss-detect = α × L-softmax + β × L-regression + F × σ × loss-dpoint
In this formula, loss-detect is the detection loss of the second detection model; α is the weight of the recognition loss of the target object; L-softmax is the recognition loss of the target object; β is the weight of the position detection loss of the target object; L-regression is the position detection loss of the target object; F is the presence flag of the key point position detection loss of the target object (if the training image includes the target object, that is, it is a positive-sample training image, its F value is 1; if the training image does not include the target object, that is, it is a negative-sample training image, its F value is 0); σ is the weight of the key point position detection loss of the target object; and loss-dpoint is the key point position detection loss of the target object.
It should be noted that the value of F is determined by whether the training image includes the target object. Specifically, if a training image includes the target object, the key-point-related information of the target object needs to be considered, so the detection loss of the second detection model on that training image must account for the recognition loss, the position detection loss, and the key point position detection loss of the target object; if a training image does not include the target object, the key-point-related information need not be considered, so the detection loss on that training image only needs to account for the recognition loss and the position detection loss of the target object.
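The loss function above translates directly into code. The weight values for α, β, and σ below are illustrative assumptions; the patent does not specify them.

```python
def detection_loss(l_softmax, l_regression, loss_dpoint, has_target,
                   alpha=1.0, beta=1.0, sigma=0.5):
    # F is 1 for a positive-sample training image and 0 for a negative
    # sample, so the key point term contributes only when a target exists.
    f = 1.0 if has_target else 0.0
    return alpha * l_softmax + beta * l_regression + f * sigma * loss_dpoint

positive = detection_loss(0.2, 0.4, 0.6, has_target=True)   # all three terms
negative = detection_loss(0.2, 0.4, 0.6, has_target=False)  # key point term off
```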
In the embodiments of the present application, because the above loss function jointly supervises the detection effect of the second detection model through three loss indicators, namely the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object, evaluating the model's detection of objects and their key points with this loss function makes it possible to effectively determine whether the current second detection model can accurately detect the objects and key points in an image, which improves the detection accuracy of the trained second detection model.
S103A4: Determine the detection loss change rate of the second detection model based on the detection loss of the second detection model and the historical detection losses of the second detection model.
The historical detection losses of the second detection model are the detection losses of the second detection model determined during previous training rounds. For example, suppose the second detection model has been trained in rounds 1 through Y in order, where round 1 precedes round 2, round 2 precedes round 3, and so on, with round Y-1 preceding round Y, and round Y is the training round currently being executed. Then, for round Y, rounds 1 through Y-1 are all historical training rounds, and the detection losses of the second detection model determined in rounds 1 through Y-1 are all historical detection losses for round Y.
The detection loss change rate of the second detection model characterizes how the detection effect of the second detection model changes over multiple training rounds: the smaller the change rate, the better the detection effect of the second detection model, and the larger the change rate, the worse its detection effect.
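The patent does not give an explicit formula for the detection loss change rate; one plausible reading, used in this sketch, is the relative change between the two most recent detection losses.

```python
def loss_change_rate(loss_history):
    # Relative change between the two most recent detection losses; an
    # assumed formula, since the patent leaves the exact definition open.
    if len(loss_history) < 2:
        return float("inf")  # not enough rounds to measure a change yet
    previous, current = loss_history[-2], loss_history[-1]
    return abs(current - previous) / previous

rate = loss_change_rate([1.0, 0.5, 0.45])  # uses only the last two losses
```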
S103A5: Judge whether the detection loss change rate of the second detection model is lower than the second preset loss threshold; if so, execute step S103A6; if not, execute step S103A7.
The second preset loss threshold may be preset, and in particular may be set according to the application scenario.
In the embodiments of the present application, after the detection loss of the second detection model is determined from the three supervision factors, namely the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object, the detection loss change rate of the second detection model is determined from that detection loss. At this point, if the detection loss change rate of the second detection model is lower than the second preset loss threshold, the detection loss of the second detection model varies little and the model is close to a stable optimum, meaning that the objects and key points it detects in images are highly accurate; it is therefore determined that the second detection model has reached the second preset condition and needs no further training. If the detection loss change rate of the second detection model is not lower than the second preset loss threshold, the detection loss of the second detection model varies greatly and the model is far from a stable optimum, meaning that the objects and key points it detects in images are less accurate; it is therefore determined that the second detection model has not reached the second preset condition and still needs to be trained again.
S103A6: Determine that the second detection model meets the second preset condition.
S103A7: Determine that the second detection model does not meet the second preset condition.
It should be noted that there is no fixed execution order between steps S103A1 and S103A2: step S103A1 may be executed before step S103A2, step S103A2 may be executed before step S103A1, or the two steps may be executed simultaneously.
The above is one implementation of step S103; in this implementation, the detection loss change rate of the second detection model falling below the second preset loss threshold serves as the training cut-off condition of the second detection model.
In addition, to improve the accuracy of evaluating the detection effect of the second detection model, the embodiments of the present application also provide another specific implementation of step S103, as shown in FIG. 12. In this implementation, when the second preset condition includes that the detection loss change rate of the second detection model is lower than the second preset loss threshold and/or that the number of training rounds of the second detection model reaches the second preset round-number threshold, step S103 may specifically include steps S103B1 to S103B7:
S103B1: Determine the detection loss of the target object based on the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected.
The detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object.
It should be noted that the content of step S103B1 is the same as that of step S103A1 above and is not repeated here for brevity; for technical details, refer to step S103A1.
S103B2: Determine the key point position detection loss of the target object based on the actual key point position information of the target object and the predicted key point position information of the target object.
It should be noted that the content of step S103B2 is the same as that of step S103A2 above and is not repeated here for brevity; for technical details, refer to step S103A2.
S103B3: Determine the detection loss of the second detection model based on the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object.
It should be noted that the content of step S103B3 is the same as that of step S103A3 above and is not repeated here for brevity; for technical details, refer to step S103A3.
S103B4: Determine the detection loss change rate of the second detection model based on the detection loss of the second detection model and the historical detection losses of the second detection model.
The historical detection losses of the second detection model are the detection losses of the second detection model determined during previous training rounds.
It should be noted that the content of step S103B4 is the same as that of step S103A4 above and is not repeated here for brevity; for technical details, refer to step S103A4.
S103B5:判断是否满足以下两个条件中的至少一个条件，且该两个条件为:第二检测模型的检测损失变化率低于第二预设损失阈值和第二检测模型的训练轮数达到第二预设轮数阈值;若是，则执行步骤S103B6;若否，则执行步骤S103B7。S103B5: Determine whether at least one of the following two conditions is met, the two conditions being: the detection loss change rate of the second detection model is lower than the second preset loss threshold, and the number of training rounds of the second detection model reaches the second preset round number threshold; if yes, execute step S103B6; if not, execute step S103B7.
在本申请实施例中，在获取到第二检测模型的检测损失变化率以及第二检测模型的训练轮数之后，需要分别执行动作“判断第二检测模型的检测损失变化率是否低于第二预设损失阈值”以及动作“判断第二检测模型的训练轮数是否达到第二预设轮数阈值”。其中，只要确定了第二检测模型的检测损失变化率低于第二预设损失阈值，无论有没有执行动作“判断第二检测模型的训练轮数是否达到第二预设轮数阈值”，均直接执行步骤S103B6；而且，只要确定了第二检测模型的训练轮数达到第二预设轮数阈值，无论有没有执行动作“判断第二检测模型的检测损失变化率是否低于第二预设损失阈值”，均直接执行步骤S103B6；只有确定了第二检测模型的检测损失变化率不低于第二预设损失阈值，且第二检测模型的训练轮数未达到第二预设轮数阈值时，才执行步骤S103B7。In the embodiment of the present application, after the detection loss change rate of the second detection model and the number of training rounds of the second detection model are obtained, two actions need to be executed respectively: "judging whether the detection loss change rate of the second detection model is lower than the second preset loss threshold" and "judging whether the number of training rounds of the second detection model reaches the second preset round number threshold". As long as it is determined that the detection loss change rate of the second detection model is lower than the second preset loss threshold, step S103B6 is executed directly, regardless of whether the round-number judgment has been executed; likewise, as long as it is determined that the number of training rounds of the second detection model reaches the second preset round number threshold, step S103B6 is executed directly, regardless of whether the loss-change-rate judgment has been executed; step S103B7 is executed only when it is determined that the detection loss change rate of the second detection model is not lower than the second preset loss threshold and the number of training rounds of the second detection model has not reached the second preset round number threshold.
需要说明的是，本申请实施例不限定动作“判断第二检测模型的检测损失变化率是否低于第二预设损失阈值”和动作“判断第二检测模型的训练轮数是否达到第二预设轮数阈值”的执行顺序，可以依次执行动作“判断第二检测模型的检测损失变化率是否低于第二预设损失阈值”和动作“判断第二检测模型的训练轮数是否达到第二预设轮数阈值”，也可以依次执行动作“判断第二检测模型的训练轮数是否达到第二预设轮数阈值”和动作“判断第二检测模型的检测损失变化率是否低于第二预设损失阈值”，还可以同时执行该两个动作。It should be noted that the embodiment of the present application does not limit the execution order of the action "judging whether the detection loss change rate of the second detection model is lower than the second preset loss threshold" and the action "judging whether the number of training rounds of the second detection model reaches the second preset round number threshold": the loss-change-rate judgment may be executed first and the round-number judgment second, the round-number judgment may be executed first and the loss-change-rate judgment second, or the two actions may be executed simultaneously.
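The stopping check described in steps S103B4 and S103B5 can be sketched as a small Python function. The relative-change formula for the loss change rate and the concrete threshold values are illustrative assumptions; the specification does not fix them:

```python
def should_stop_training(current_loss: float,
                         history_loss: float,
                         num_rounds: int,
                         loss_rate_threshold: float = 1e-3,
                         max_rounds: int = 100) -> bool:
    """Return True when either training cut-off condition of step S103B5 holds.

    The loss change rate is computed here as the relative change between the
    current detection loss and the historical (previous-round) detection loss;
    the exact formula is an assumption made for illustration.
    """
    loss_change_rate = abs(current_loss - history_loss) / max(history_loss, 1e-12)
    # Condition 1: the detection loss change rate is below the preset loss threshold.
    if loss_change_rate < loss_rate_threshold:
        return True
    # Condition 2: the number of training rounds reaches the preset round threshold.
    if num_rounds >= max_rounds:
        return True
    return False
```

As the text notes, the two conditions are checked independently and either one alone suffices to end training, which the early returns above mirror.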
S103B6:确定所述第二检测模型达到第二预设条件。S103B6: Determine that the second detection model meets a second preset condition.
S103B7:确定所述第二检测模型未达到第二预设条件。S103B7: Determine that the second detection model does not meet the second preset condition.
需要说明的是,步骤S103B1和步骤S103B2之间没有固定的执行顺序,可以依次执行步骤S103B1和步骤S103B2,也可以依次执行步骤S103B2和步骤S103B1,还可以同时执行步骤S103B1和步骤S103B2。It should be noted that there is no fixed execution order between step S103B1 and step S103B2. Step S103B1 and step S103B2 can be executed in sequence, step S103B2 and step S103B1 can be executed in sequence, or step S103B1 and step S103B2 can be executed simultaneously.
以上为步骤S103的另一种实施方式，该实施方式将第二检测模型的检测损失变化率低于第二预设损失阈值以及第二检测模型的训练轮数达到第二预设轮数阈值共同作为了第二检测模型的训练截止条件。The above is another implementation manner of step S103. In this implementation manner, the detection loss change rate of the second detection model being lower than the second preset loss threshold and the number of training rounds of the second detection model reaching the second preset round number threshold jointly serve as the training cut-off conditions of the second detection model.
以上为步骤S103的相关内容。The above is the relevant content of step S103.
S104:更新第二检测模型,继续执行步骤S102。S104: Update the second detection model, and continue to perform step S102.
在本申请实施例中，在确定第二检测模型未达到第二预设条件时，可以根据由第二检测模型检测所得的目标物体在待检测图像上的预测位置信息与目标物体在训练图像上的实际位置信息之间的差距，以及目标物体的预测关键点位置信息与目标物体的实际关键点位置信息之间的差距，来对第二检测模型进行更新，使得更新后的第二检测模型检测所得的目标物体在待检测图像上的预测位置信息更接近于目标物体在训练图像上的实际位置信息，并使得目标物体的预测关键点位置信息更接近于目标物体的实际关键点位置信息，从而使得更新后的第二检测模型能够更准确地检测出图像中物体及其关键点。In the embodiment of the present application, when it is determined that the second detection model does not meet the second preset condition, the second detection model may be updated according to the gap between the predicted position information of the target object on the image to be detected, obtained by the second detection model, and the actual position information of the target object on the training image, as well as the gap between the predicted key point position information of the target object and the actual key point position information of the target object, so that the predicted position information obtained by the updated second detection model is closer to the actual position information of the target object on the training image, and the predicted key point position information is closer to the actual key point position information of the target object, thereby enabling the updated second detection model to detect objects and their key points in an image more accurately.
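The update of step S104 corresponds to an ordinary gradient-based parameter adjustment driven by the gap between prediction and ground truth. A minimal pure-Python sketch on a single scalar parameter illustrates the idea; the squared-error loss, learning rate, and identity "model" are all illustrative assumptions, not the specification's actual network update:

```python
def training_round(param: float, predict, actual: float, lr: float = 0.1) -> float:
    """One S104-style update: move the parameter so the prediction
    gets closer to the ground truth, under a squared-error loss."""
    pred = predict(param)
    # Gradient of (pred - actual)^2 w.r.t. param, assuming predict(p) = p here.
    grad = 2.0 * (pred - actual)
    return param - lr * grad

# Toy model: the "prediction" is just the parameter itself.
param = 0.0
for _ in range(50):                      # repeat S102..S104 until converged
    param = training_round(param, lambda p: p, actual=3.0)
print(round(param, 3))                   # converges toward the target value 3.0
```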
S105:结束对第二检测模型的训练过程。S105: End the training process for the second detection model.
上述为方法实施例五提供的第二检测模型的训练过程的具体实施方式，在该实施方式中，在获取到训练图像对应的图像特征、目标物体在训练图像上的实际位置信息以及目标物体的实际关键点位置信息之后，先根据所述训练图像对应的图像特征，利用第二检测模型对目标物体及其关键点进行检测，确定目标物体在待检测图像上的预测位置信息以及目标物体的预测关键点位置信息；再根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第二检测模型是否达到第二预设条件，以便根据该判断结果确定是否继续对该第二检测模型进行训练。其中，因本申请实施例是根据目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息这四个信息来评价第二检测模型的检测效果的，使得该评价结果更合理有效，从而使得满足第二预设条件的第二检测模型的检测效果更好。The foregoing is a specific implementation of the training process of the second detection model provided by method embodiment five. In this implementation, after the image features corresponding to the training image, the actual position information of the target object on the training image, and the actual key point position information of the target object are obtained, the second detection model is first used to detect the target object and its key points according to the image features corresponding to the training image, determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object; then, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, it is judged whether the second detection model meets the second preset condition, so as to determine, according to the judgment result, whether to continue training the second detection model. Since the embodiment of the present application evaluates the detection effect of the second detection model based on four pieces of information, namely the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, the evaluation result is more reasonable and effective, so that the detection effect of a second detection model that satisfies the second preset condition is better.
另外，为了进一步提高对第二检测模型的检测效果的评价准确性，本申请实施例还利用目标物体的识别损失、目标物体的位置检测损失和目标物体的关键点位置检测损失这三个损失因素来共同监督第二检测模型的检测效果，使得该评价结果更合理有效，从而使得满足第二预设条件的第二检测模型的检测效果更好。In addition, in order to further improve the accuracy of evaluating the detection effect of the second detection model, the embodiment of the present application also uses three loss factors, namely the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object, to jointly supervise the detection effect of the second detection model, making the evaluation result more reasonable and effective, so that the detection effect of a second detection model that satisfies the second preset condition is better.
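The three loss factors named above are typically combined into a single scalar detection loss before it is used in the stopping check. A weighted sum is one plausible concrete choice; the weights below are illustrative assumptions, since the specification does not fix a combination rule:

```python
def detection_loss(recognition_loss: float,
                   position_loss: float,
                   keypoint_loss: float,
                   w_rec: float = 1.0,
                   w_pos: float = 1.0,
                   w_kpt: float = 1.0) -> float:
    """Combine the three supervision signals (recognition loss, position
    detection loss, key point position detection loss) into one detection
    loss via a weighted sum. The weights are illustrative assumptions."""
    return w_rec * recognition_loss + w_pos * position_loss + w_kpt * keypoint_loss
```

For example, `detection_loss(1.0, 2.0, 3.0)` with the default unit weights simply adds the three terms.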
需要说明的是，在本申请实施例中“图像中物体及其关键点的确定方法”的执行主体与“检测模型的训练过程”的执行主体可以是同一个执行主体，也可以是不同的执行主体，本申请实施例对此不做具体限定。其中，“图像中物体及其关键点的确定方法”的执行主体可以是终端、服务器、车辆以及其他能够执行“图像中物体及其关键点的确定方法”的设备。“检测模型的训练过程”的执行主体可以是终端、服务器、车辆以及其他能够执行“检测模型的训练过程”的设备。It should be noted that, in the embodiment of the present application, the execution subject of the "method for determining an object and its key points in an image" and the execution subject of the "training process of the detection model" may be the same execution subject or different execution subjects, which is not specifically limited in the embodiment of the present application. The execution subject of the "method for determining an object and its key points in an image" may be a terminal, a server, a vehicle, or another device capable of executing that method; the execution subject of the "training process of the detection model" may likewise be a terminal, a server, a vehicle, or another device capable of executing that training process.
基于上述方法实施例提供的图像中物体及其关键点的确定方法的具体实施方式，本申请实施例还提供了一种图像中物体及其关键点的确定装置，下面结合附图进行解释和说明。Based on the specific implementation of the method for determining an object and its key points in an image provided by the foregoing method embodiments, an embodiment of the present application further provides an apparatus for determining an object and its key points in an image, which is explained and described below with reference to the accompanying drawings.
装置实施例Device embodiment
装置实施例提供的图像中物体及其关键点的确定装置的技术详情,请参照上述方法实施例。For technical details of the device for determining the object and its key points in the image provided by the device embodiment, please refer to the above method embodiment.
参见图13,该图为本申请实施例提供的图像中物体及其关键点的确定装置的结构示意图。Refer to FIG. 13, which is a schematic structural diagram of an apparatus for determining an object and its key points in an image provided by an embodiment of the application.
本申请实施例提供的图像中物体及其关键点的确定装置130,包括:The device 130 for determining an object and its key points in an image provided by the embodiment of the present application includes:
提取单元131,用于对待检测图像进行特征提取,得到所述待检测图像对应的图像特征;The extraction unit 131 is configured to perform feature extraction on the image to be detected to obtain image features corresponding to the image to be detected;
确定单元132，用于根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息以及所述目标物体的关键点位置信息；其中，所述目标物体的关键点用于表征所述目标物体的结构特征。The determining unit 132 is configured to determine the position information of the target object on the image to be detected and the key point position information of the target object according to the image features corresponding to the image to be detected; wherein the key points of the target object are used to characterize the structural features of the target object.
作为一种实施方式，为了提高图像中物体及其关键点的确定准确性，所述确定单元132，具体用于：根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息，并根据所述待检测图像对应的图像特征和所述目标物体在待检测图像上的位置信息，确定目标物体的关键点位置信息。As an implementation manner, in order to improve the accuracy of determining an object and its key points in an image, the determining unit 132 is specifically configured to: determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and determine the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
作为一种实施方式,为了提高图像中物体及其关键点的确定准确性,所述确定单元132,具体用于:As an implementation manner, in order to improve the accuracy of determining objects and their key points in the image, the determining unit 132 is specifically configured to:
根据所述待检测图像对应的图像特征,利用预先构建的第一检测模型对目标物体及其关键点进行检测,确定目标物体在待检测图像上的位置信息以及目标物体的关键点位置信息;According to the image features corresponding to the image to be detected, use a pre-built first detection model to detect the target object and its key points, and determine the position information of the target object on the image to be detected and the key point position information of the target object;
其中，所述第一检测模型包括第一物体位置检测网络层和第一关键点位置检测网络层，且所述第一物体位置检测网络层的输出结果是所述第一关键点位置检测网络层的输入数据，且所述第一物体位置检测网络层用于根据所述待检测图像对应的图像特征确定目标物体在待检测图像上的位置信息，所述第一关键点位置检测网络层用于根据所述待检测图像对应的图像特征和所述目标物体在待检测图像上的位置信息，确定目标物体的关键点位置信息。Wherein, the first detection model includes a first object position detection network layer and a first key point position detection network layer; the output result of the first object position detection network layer is the input data of the first key point position detection network layer; the first object position detection network layer is configured to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; and the first key point position detection network layer is configured to determine the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
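The cascade just described (the object-position head's output feeding the key-point head together with the image features) can be sketched as two plain functions. The box and key-point rules below are placeholder assumptions standing in for the real network layers:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]        # (x1, y1, x2, y2)
Point = Tuple[float, float]

def object_position_head(features: List[float]) -> List[Box]:
    """First object position detection network layer (placeholder):
    maps image features to object boxes on the image to be detected."""
    # Dummy rule for illustration: one fixed box regardless of the features.
    return [(0.0, 0.0, 10.0, 10.0)]

def keypoint_head(features: List[float], boxes: List[Box]) -> List[List[Point]]:
    """First key point position detection network layer (placeholder):
    consumes BOTH the image features and the boxes produced by the
    object-position head, mirroring the cascade described in the text."""
    # Dummy rule: report each box centre as the single key point.
    return [[((x1 + x2) / 2, (y1 + y2) / 2)] for (x1, y1, x2, y2) in boxes]

features = [0.1, 0.2, 0.3]
boxes = object_position_head(features)
keypoints = keypoint_head(features, boxes)
```

The point of the sketch is only the data flow: a single feature extraction feeds both heads, and the second head additionally receives the first head's output.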
作为一种实施方式,为了提高图像中物体及其关键点的确定准确性,所述第一检测模型的训练过程,具体包括:As an implementation manner, in order to improve the accuracy of determining the objects and their key points in the image, the training process of the first detection model specifically includes:
获取训练图像对应的图像特征、目标物体在训练图像上的实际位置信息以及目标物体的实际关键点位置信息;Obtain the image features corresponding to the training image, the actual position information of the target object on the training image, and the actual key point position information of the target object;
根据所述训练图像对应的图像特征,利用第一检测模型对目标物体及其关键点进行检测,确定目标物体在待检测图像上的预测位置信息以及目标物体的预测关键点位置信息;According to the image features corresponding to the training image, use the first detection model to detect the target object and its key points, and determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object;
根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第一检测模型是否达到第一预设条件；According to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, judge whether the first detection model meets the first preset condition;
若确定所述第一检测模型未达到第一预设条件，则更新第一检测模型，继续执行“根据所述训练图像对应的图像特征，利用第一检测模型对目标物体及其关键点进行检测，确定目标物体在待检测图像上的预测位置信息以及目标物体的预测关键点位置信息”。If it is determined that the first detection model does not meet the first preset condition, update the first detection model, and continue to execute "detecting the target object and its key points by using the first detection model according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object".
作为一种实施方式，为了提高图像中物体及其关键点的确定准确性，所述第一预设条件为：第一检测模型的检测损失的变化率低于第一预设损失阈值，和/或，第一检测模型的训练轮数达到第一预设轮数阈值。As an implementation manner, in order to improve the accuracy of determining an object and its key points in an image, the first preset condition is: the change rate of the detection loss of the first detection model is lower than the first preset loss threshold, and/or the number of training rounds of the first detection model reaches the first preset round number threshold.
作为一种实施方式，为了提高图像中物体及其关键点的确定准确性，当所述第一预设条件包括第一检测模型的检测损失的变化率低于第一预设损失阈值时，所述根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第一检测模型是否达到第一预设条件，具体包括：As an implementation manner, in order to improve the accuracy of determining an object and its key points in an image, when the first preset condition includes that the change rate of the detection loss of the first detection model is lower than the first preset loss threshold, the judging whether the first detection model meets the first preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object specifically includes:
根据所述目标物体在训练图像上的实际位置信息以及目标物体在待检测图像上的预测位置信息，确定目标物体的检测损失；其中，所述目标物体的检测损失包括目标物体的识别损失和目标物体的位置检测损失；Determine the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; wherein, the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
根据所述目标物体的实际关键点位置信息以及所述目标物体的预测关键点位置信息,确定目标物体的关键点位置检测损失;Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
根据所述目标物体的识别损失、所述目标物体的位置检测损失和所述目标物体的关键点位置检测损失,确定第一检测模型的检测损失;Determine the detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
根据所述第一检测模型的检测损失以及第一检测模型的历史检测损失，确定第一检测模型的检测损失变化率；所述第一检测模型的历史检测损失是在历史训练过程中确定的第一检测模型的检测损失；Determine the detection loss change rate of the first detection model according to the detection loss of the first detection model and the historical detection loss of the first detection model; the historical detection loss of the first detection model is the detection loss of the first detection model determined during the historical training process;
若所述第一检测模型的检测损失变化率低于第一预设损失阈值,则确定所述第一检测模型达到第一预设条件;If the detection loss change rate of the first detection model is lower than the first preset loss threshold, determining that the first detection model meets the first preset condition;
若所述第一检测模型的检测损失变化率不低于第一预设损失阈值,则确定所述第一检测模型未达到第一预设条件。If the detection loss change rate of the first detection model is not lower than the first preset loss threshold, it is determined that the first detection model does not meet the first preset condition.
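The key point position detection loss used in the steps above compares the actual and predicted key point coordinates. A mean-squared-distance formulation is one plausible concrete choice; it is an assumption for illustration, since the specification does not name the distance function:

```python
from typing import List, Tuple

Point = Tuple[float, float]

def keypoint_position_loss(actual: List[Point], predicted: List[Point]) -> float:
    """Mean squared distance between matched actual/predicted key points.
    Assumes the two lists are already matched pairwise in the same order."""
    assert len(actual) == len(predicted), "key points must be matched pairwise"
    total = 0.0
    for (ax, ay), (px, py) in zip(actual, predicted):
        total += (ax - px) ** 2 + (ay - py) ** 2
    return total / len(actual)
```

A perfect prediction yields a loss of 0.0, and the loss grows with the squared pixel offset, which is what lets it supervise the key-point head alongside the recognition and position losses.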
作为一种实施方式，为了提高图像中物体及其关键点的确定准确性，所述根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第一检测模型是否达到第一预设条件，具体包括：As an implementation manner, in order to improve the accuracy of determining an object and its key points in an image, the judging whether the first detection model meets the first preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object specifically includes:
根据所述目标物体在训练图像上的实际位置信息以及目标物体在待检测图像上的预测位置信息，确定目标物体的检测损失；其中，所述目标物体的检测损失包括目标物体的识别损失和目标物体的位置检测损失；Determine the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; wherein, the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
根据所述目标物体的实际关键点位置信息以及所述目标物体的预测关键点位置信息,确定目标物体的关键点位置检测损失;Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
根据所述目标物体的识别损失、所述目标物体的位置检测损失和所述目标物体的关键点位置检测损失,确定第一检测模型的检测损失;Determine the detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
根据所述第一检测模型的检测损失以及第一检测模型的历史检测损失，确定第一检测模型的检测损失变化率；所述第一检测模型的历史检测损失是在历史训练过程中确定的第一检测模型的检测损失；Determine the detection loss change rate of the first detection model according to the detection loss of the first detection model and the historical detection loss of the first detection model; the historical detection loss of the first detection model is the detection loss of the first detection model determined during the historical training process;
若所述第一检测模型的检测损失变化率低于第一预设损失阈值，或第一检测模型的训练轮数达到第一预设轮数阈值，则确定所述第一检测模型达到第一预设条件；If the detection loss change rate of the first detection model is lower than the first preset loss threshold, or the number of training rounds of the first detection model reaches the first preset round number threshold, determine that the first detection model meets the first preset condition;
若所述第一检测模型的检测损失变化率不低于第一预设损失阈值，且第一检测模型的训练轮数未达到第一预设轮数阈值，则确定所述第一检测模型未达到第一预设条件。If the detection loss change rate of the first detection model is not lower than the first preset loss threshold, and the number of training rounds of the first detection model does not reach the first preset round number threshold, determine that the first detection model does not meet the first preset condition.
作为一种实施方式,为了提高图像中物体及其关键点的确定准确性,所述确定单元132,具体用于:As an implementation manner, in order to improve the accuracy of determining objects and their key points in the image, the determining unit 132 is specifically configured to:
根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息和待检测图像上各个图像子区域内的关键点位置信息，并根据所述目标物体在待检测图像上的位置信息和所述待检测图像上各个图像子区域内的关键点位置信息，确定所述目标物体的关键点位置信息；所述图像子区域是锚点对应的感兴趣区域。According to the image features corresponding to the image to be detected, determine the position information of the target object on the image to be detected and the key point position information in each image sub-region on the image to be detected, and determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image sub-region on the image to be detected; the image sub-region is a region of interest corresponding to an anchor point.
作为一种实施方式,为了提高图像中物体及其关键点的确定准确性,所述确定单元132,具体用于:As an implementation manner, in order to improve the accuracy of determining objects and their key points in the image, the determining unit 132 is specifically configured to:
根据所述待检测图像对应的图像特征,利用预先构建的第二检测模型对目标物体及其关键点进行检测,确定目标物体在待检测图像上的位置信息以及目标物体的关键点位置信息;According to the image features corresponding to the image to be detected, use a pre-built second detection model to detect the target object and its key points, and determine the position information of the target object on the image to be detected and the key point position information of the target object;
其中，所述第二检测模型包括第二物体位置检测网络层、第二关键点位置检测网络层和物体关键点位置确定网络层，且所述第二物体位置检测网络层的输出结果和所述第二关键点位置检测网络层的输出结果是所述物体关键点位置确定网络层的输入数据，且所述第二物体位置检测网络层用于根据所述待检测图像对应的图像特征确定目标物体在待检测图像上的位置信息；且所述第二关键点位置检测网络层用于根据所述待检测图像对应的图像特征确定待检测图像上各个图像子区域内的关键点位置信息；且所述物体关键点位置确定网络层用于根据所述目标物体在待检测图像上的位置信息和所述待检测图像上各个图像子区域内的关键点位置信息，确定所述目标物体的关键点位置信息。Wherein, the second detection model includes a second object position detection network layer, a second key point position detection network layer, and an object key point position determination network layer; the output results of the second object position detection network layer and of the second key point position detection network layer are the input data of the object key point position determination network layer; the second object position detection network layer is configured to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; the second key point position detection network layer is configured to determine the key point position information in each image sub-region on the image to be detected according to the image features corresponding to the image to be detected; and the object key point position determination network layer is configured to determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image sub-region on the image to be detected.
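The assembly step performed by the object key point position determination network layer (selecting, for each detected object, the key points predicted in the anchor sub-regions that belong to it) can be sketched as a simple geometric filter. Treating "falls inside the object's box" as the selection rule is an illustrative assumption:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]        # (x1, y1, x2, y2)
Point = Tuple[float, float]

def keypoints_for_object(box: Box, region_keypoints: List[Point]) -> List[Point]:
    """Object key point position determination layer (sketch): keep the
    key points, detected independently per anchor sub-region, that fall
    inside the object's box on the image to be detected."""
    x1, y1, x2, y2 = box
    return [(x, y) for (x, y) in region_keypoints
            if x1 <= x <= x2 and y1 <= y <= y2]

# Key points predicted in each region of interest across the image:
candidates = [(2.0, 3.0), (15.0, 4.0), (9.0, 9.0)]
result = keypoints_for_object((0.0, 0.0, 10.0, 10.0), candidates)
# keeps only the candidates lying inside the box
```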
作为一种实施方式,为了提高图像中物体及其关键点的确定准确性,所述第二检测模型的训练过程,具体包括:As an implementation manner, in order to improve the accuracy of determining the objects and their key points in the image, the training process of the second detection model specifically includes:
获取训练图像对应的图像特征、目标物体在训练图像上的实际位置信息以及目标物体的实际关键点位置信息;Obtain the image features corresponding to the training image, the actual position information of the target object on the training image, and the actual key point position information of the target object;
根据所述训练图像对应的图像特征,利用第二检测模型对目标物体及其关键点进行检测,确定目标物体在待检测图像上的预测位置信息以及目标物体的预测关键点位置信息;According to the image features corresponding to the training image, use the second detection model to detect the target object and its key points, and determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object;
根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第二检测模型是否达到第二预设条件；According to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, judge whether the second detection model meets the second preset condition;
若确定所述第二检测模型未达到第二预设条件，则更新第二检测模型，继续执行“根据所述训练图像对应的图像特征，利用第二检测模型对目标物体及其关键点进行检测，确定目标物体在待检测图像上的预测位置信息以及目标物体的预测关键点位置信息”。If it is determined that the second detection model does not meet the second preset condition, update the second detection model, and continue to execute "detecting the target object and its key points by using the second detection model according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object".
作为一种实施方式，为了提高图像中物体及其关键点的确定准确性，所述第二预设条件为：第二检测模型的检测损失的变化率低于第二预设损失阈值，和/或，第二检测模型的训练轮数达到第二预设轮数阈值。As an implementation manner, in order to improve the accuracy of determining an object and its key points in an image, the second preset condition is: the change rate of the detection loss of the second detection model is lower than the second preset loss threshold, and/or the number of training rounds of the second detection model reaches the second preset round number threshold.
作为一种实施方式，为了提高图像中物体及其关键点的确定准确性，当所述第二预设条件包括第二检测模型的检测损失的变化率低于第二预设损失阈值时，所述根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第二检测模型是否达到第二预设条件，具体包括：As an implementation manner, in order to improve the accuracy of determining an object and its key points in an image, when the second preset condition includes that the change rate of the detection loss of the second detection model is lower than the second preset loss threshold, the judging whether the second detection model meets the second preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object specifically includes:
根据所述目标物体在训练图像上的实际位置信息以及目标物体在待检测图像上的预测位置信息，确定目标物体的检测损失；其中，所述目标物体的检测损失包括目标物体的识别损失和目标物体的位置检测损失；Determine the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; wherein, the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
根据所述目标物体的实际关键点位置信息以及所述目标物体的预测关键点位置信息,确定目标物体的关键点位置检测损失;Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
根据所述目标物体的识别损失、所述目标物体的位置检测损失和所述目标物体的关键点位置检测损失,确定第二检测模型的检测损失;Determine the detection loss of the second detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
根据所述第二检测模型的检测损失以及第二检测模型的历史检测损失，确定第二检测模型的检测损失变化率；所述第二检测模型的历史检测损失是在历史训练过程中确定的第二检测模型的检测损失；Determine the detection loss change rate of the second detection model according to the detection loss of the second detection model and the historical detection loss of the second detection model; the historical detection loss of the second detection model is the detection loss of the second detection model determined during the historical training process;
若所述第二检测模型的检测损失变化率低于第二预设损失阈值,则确定所述第二检测模型达到第二预设条件;If the detection loss change rate of the second detection model is lower than a second preset loss threshold, determining that the second detection model meets the second preset condition;
若所述第二检测模型的检测损失变化率不低于第二预设损失阈值,则确定所述第二检测模型未达到第二预设条件。If the detection loss change rate of the second detection model is not lower than the second preset loss threshold, it is determined that the second detection model does not meet the second preset condition.
作为一种实施方式，为了提高图像中物体及其关键点的确定准确性，所述根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第二检测模型是否达到第二预设条件，具体包括：As an implementation manner, in order to improve the accuracy of determining an object and its key points in an image, the judging whether the second detection model meets the second preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object specifically includes:
根据所述目标物体在训练图像上的实际位置信息以及目标物体在待检测图像上的预测位置信息，确定目标物体的检测损失；其中，所述目标物体的检测损失包括目标物体的识别损失和目标物体的位置检测损失；Determine the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; wherein, the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
根据所述目标物体的实际关键点位置信息以及所述目标物体的预测关键点位置信息,确定目标物体的关键点位置检测损失;Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
根据所述目标物体的识别损失、所述目标物体的位置检测损失和所述目标物体的关键点位置检测损失,确定第二检测模型的检测损失;Determine the detection loss of the second detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
根据所述第二检测模型的检测损失以及第二检测模型的历史检测损失，确定第二检测模型的检测损失变化率；所述第二检测模型的历史检测损失是在历史训练过程中确定的第二检测模型的检测损失；Determining the detection loss change rate of the second detection model according to the detection loss of the second detection model and the historical detection loss of the second detection model; the historical detection loss of the second detection model is the detection loss of the second detection model determined in a historical training process;
若所述第二检测模型的检测损失变化率低于第二预设损失阈值，或第二检测模型的训练轮数达到第二预设轮数阈值，则确定所述第二检测模型达到第二预设条件；If the detection loss change rate of the second detection model is lower than the second preset loss threshold, or the number of training rounds of the second detection model reaches the second preset round-number threshold, it is determined that the second detection model meets the second preset condition;
若所述第二检测模型的检测损失变化率不低于第二预设损失阈值，且第二检测模型的训练轮数未达到第二预设轮数阈值，则确定所述第二检测模型未达到第二预设条件。If the detection loss change rate of the second detection model is not lower than the second preset loss threshold, and the number of training rounds of the second detection model has not reached the second preset round-number threshold, it is determined that the second detection model does not meet the second preset condition.
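The preset-condition check above (loss change rate below a threshold, or the round count reaching its threshold) can be sketched as follows. This is a hypothetical minimal example: the function and parameter names, the use of a relative change rate, and the epsilon guard are assumptions for illustration, not the specific implementation of this application.

```python
def loss_change_rate(current_loss, history_loss):
    # Relative change between the current detection loss and the historical
    # detection loss from an earlier training round (an assumed definition
    # of "change rate"); the epsilon guards against division by zero.
    return abs(current_loss - history_loss) / max(abs(history_loss), 1e-12)

def meets_preset_condition(current_loss, history_loss, loss_threshold,
                           rounds=None, rounds_threshold=None):
    # Condition 1: the detection loss change rate is below the preset
    # loss threshold (training has converged).
    if loss_change_rate(current_loss, history_loss) < loss_threshold:
        return True
    # Condition 2 (optional): the number of training rounds has reached
    # the preset round-number threshold.
    if rounds is not None and rounds_threshold is not None:
        return rounds >= rounds_threshold
    return False
```

The same check covers both variants in the text: passing only the loss arguments implements the loss-only condition, while also passing the round count implements the "or the training rounds reach the threshold" condition.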
以上为装置实施例提供的图像中物体及其关键点的确定装置130的具体实施方式，在该实施方式中，在从待检测图像中提取到该待检测图像对应的图像特征之后，直接根据该待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息以及该目标物体的关键点位置信息。其中，由于目标物体在待检测图像上的位置信息能够准确地表征待检测图像中目标物体的位置，且目标物体的关键点位置信息能够准确地表征待检测图像中目标物体的关键点位置，因而，本申请实施例提供的图像中物体及其关键点的确定方法能够有效地从待检测图像中确定出目标物体的位置信息及其关键点的位置信息，从而能够准确地检测出图像中物体及其关键点。另外，由于在获取到待检测图像对应的图像特征之后，直接基于该待检测图像对应的图像特征确定目标物体在待检测图像上的位置信息以及所述目标物体的关键点位置信息即可，无需进行将目标物体在待检测图像上的位置信息还原为物体检测图像以及对该物体检测图像进行特征提取的过程，使得在确定目标物体在待检测图像上的位置信息以及目标物体的关键点位置信息过程中只需进行一次图像特征提取过程，如此简化了图像中物体及其关键点的确定过程，提高了图像中物体及其关键点的确定效率，并降低了在确定图像中物体及其关键点时的内存损耗。The above is the specific implementation of the apparatus 130 for determining an object and its key points in an image provided by the apparatus embodiment. In this implementation, after the image features corresponding to the image to be detected are extracted from the image to be detected, the position information of the target object on the image to be detected and the key point position information of the target object are determined directly according to the image features corresponding to the image to be detected. Since the position information of the target object on the image to be detected can accurately represent the position of the target object in the image to be detected, and the key point position information of the target object can accurately represent the positions of the key points of the target object in the image to be detected, the method for determining an object and its key points in an image provided by the embodiments of the present application can effectively determine the position information of the target object and the position information of its key points from the image to be detected, and thus can accurately detect the object and its key points in the image.
In addition, after the image features corresponding to the image to be detected are obtained, the position information of the target object on the image to be detected and the key point position information of the target object can be determined directly on the basis of those image features, without restoring the position information of the target object on the image to be detected into an object detection image and performing feature extraction on that object detection image. As a result, only one image feature extraction process is needed when determining the position information of the target object on the image to be detected and the key point position information of the target object, which simplifies the process of determining the object and its key points in the image, improves the efficiency of that determination, and reduces memory consumption during the determination.
基于上述方法实施例提供的图像中物体及其关键点的确定方法的具体实施方式,本申请实施例还提供了一种设备,下面结合附图进行解释和说明。Based on the specific implementation of the method for determining the object and its key points in the image provided by the foregoing method embodiment, an embodiment of the present application also provides a device, which will be explained and described below with reference to the accompanying drawings.
设备实施例Device embodiment
设备实施例提供的设备技术详情,请参照上述方法实施例。For details of the device technology provided by the device embodiment, please refer to the above method embodiment.
参见图14,该图为本申请实施例提供的设备结构示意图。Refer to FIG. 14, which is a schematic diagram of the device structure provided by an embodiment of the application.
本申请实施例提供的设备140,包括处理器141以及存储器142:The device 140 provided in the embodiment of the present application includes a processor 141 and a memory 142:
所述存储器142用于存储计算机程序;The memory 142 is used to store computer programs;
所述处理器141用于根据所述计算机程序执行上述方法实施例提供的图像中物体及其关键点的确定方法的任一实施方式。也就是说,处理器141用于执行以下步骤:The processor 141 is configured to execute any implementation manner of the method for determining an object and its key points in an image provided by the foregoing method embodiment according to the computer program. In other words, the processor 141 is configured to perform the following steps:
对待检测图像进行特征提取,得到所述待检测图像对应的图像特征;Performing feature extraction on the image to be detected to obtain image features corresponding to the image to be detected;
根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息以及所述目标物体的关键点位置信息；其中，所述目标物体的关键点用于表征所述目标物体的结构特征。Determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information of the target object; wherein the key points of the target object are used to characterize the structural features of the target object.
可选的,所述根据所述待检测图像对应的图像特征,确定目标物体在待检测图像上的位置信息以及所述目标物体的关键点位置信息,具体为:Optionally, the determining the position information of the target object on the image to be detected and the key point position information of the target object according to the image feature corresponding to the image to be detected is specifically:
根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息，并根据所述待检测图像对应的图像特征和所述目标物体在待检测图像上的位置信息，确定目标物体的关键点位置信息。Determining the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and determining the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
可选的，所述根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息，并根据所述待检测图像对应的图像特征和所述目标物体在待检测图像上的位置信息，确定目标物体的关键点位置信息，具体包括：Optionally, the determining the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and determining the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected specifically includes:
根据所述待检测图像对应的图像特征,利用预先构建的第一检测模型对目标物体及其关键点进行检测,确定目标物体在待检测图像上的位置信息以及目标物体的关键点位置信息;According to the image features corresponding to the image to be detected, use a pre-built first detection model to detect the target object and its key points, and determine the position information of the target object on the image to be detected and the key point position information of the target object;
其中，所述第一检测模型包括第一物体位置检测网络层和第一关键点位置检测网络层，且所述第一物体位置检测网络层的输出结果是所述第一关键点位置检测网络层的输入数据，且所述第一物体位置检测网络层用于根据所述待检测图像对应的图像特征确定目标物体在待检测图像上的位置信息，所述第一关键点位置检测网络层用于根据所述待检测图像对应的图像特征和所述目标物体在待检测图像上的位置信息，确定目标物体的关键点位置信息。Wherein, the first detection model includes a first object position detection network layer and a first key point position detection network layer, the output result of the first object position detection network layer is the input data of the first key point position detection network layer, the first object position detection network layer is configured to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and the first key point position detection network layer is configured to determine the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
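The cascade described above, in which the output of the first object position detection network layer becomes part of the input of the first key point position detection network layer, can be sketched in pure Python. The layer callables and data shapes below are placeholders invented for illustration; the application does not prescribe concrete layer implementations.

```python
def first_detection_model(features, object_position_layer, keypoint_position_layer):
    # Stage 1: the first object position detection network layer maps the
    # image features to the position information of the target object.
    boxes = object_position_layer(features)
    # Stage 2: the first key point position detection network layer takes
    # both the image features and the stage-1 output, so the image features
    # are extracted only once for both tasks.
    keypoints = keypoint_position_layer(features, boxes)
    return boxes, keypoints
```

For example, with dummy layers that return one box and its key points, the function simply threads the box through to the key point stage; in a real network both callables would be learned layers operating on the same feature map.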
可选的,所述第一检测模型的训练过程,具体包括:Optionally, the training process of the first detection model specifically includes:
获取训练图像对应的图像特征、目标物体在训练图像上的实际位置信息以及目标物体的实际关键点位置信息;Obtain the image features corresponding to the training image, the actual position information of the target object on the training image, and the actual key point position information of the target object;
根据所述训练图像对应的图像特征,利用第一检测模型对目标物体及其关键点进行检测,确定目标物体在待检测图像上的预测位置信息以及目标物体的预测关键点位置信息;According to the image features corresponding to the training image, use the first detection model to detect the target object and its key points, and determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object;
根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第一检测模型是否达到第一预设条件；Judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the first detection model meets a first preset condition;
若确定所述第一检测模型未达到第一预设条件，则更新第一检测模型，继续执行“根据所述训练图像对应的图像特征，利用第一检测模型对目标物体及其关键点进行检测，确定目标物体在待检测图像上的预测位置信息以及目标物体的预测关键点位置信息”。If it is determined that the first detection model does not meet the first preset condition, updating the first detection model, and continuing to perform the step of "detecting the target object and its key points by using the first detection model according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object".
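The training steps listed above form a loop: detect with the current model, judge the preset condition, and update the model when the condition is not met. A hypothetical driver for that loop might look like the following; `predict`, `meets_condition`, and `update` are illustrative callables, and the `max_rounds` safety cap is an added assumption not stated in the text.

```python
def train_until_condition(predict, meets_condition, update, max_rounds=1000):
    # Repeat: detect the target object and its key points on the training
    # images, judge the preset condition, and update the model until the
    # condition is met (or the safety cap is hit).
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        prediction = predict()
        if meets_condition(prediction, rounds):
            return rounds
        update()
    return rounds
```

A toy "model" whose single parameter is incremented each round illustrates the control flow: the loop stops in the first round whose prediction satisfies the condition.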
可选的，所述第一预设条件为：第一检测模型的检测损失的变化率低于第一预设损失阈值，和/或，第一检测模型的训练轮数达到第一预设轮数阈值。Optionally, the first preset condition is: the change rate of the detection loss of the first detection model is lower than a first preset loss threshold, and/or the number of training rounds of the first detection model reaches a first preset round-number threshold.
可选的，当所述第一预设条件包括第一检测模型的检测损失的变化率低于第一预设损失阈值时，所述根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第一检测模型是否达到第一预设条件，具体包括：Optionally, when the first preset condition includes that the change rate of the detection loss of the first detection model is lower than the first preset loss threshold, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the first detection model meets the first preset condition specifically includes:
根据所述目标物体在训练图像上的实际位置信息以及目标物体在待检测图像上的预测位置信息，确定目标物体的检测损失；其中，所述目标物体的检测损失包括目标物体的识别损失和目标物体的位置检测损失；Determining the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; wherein the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
根据所述目标物体的实际关键点位置信息以及所述目标物体的预测关键点位置信息,确定目标物体的关键点位置检测损失;Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
根据所述目标物体的识别损失、所述目标物体的位置检测损失和所述目标物体的关键点位置检测损失,确定第一检测模型的检测损失;Determine the detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
根据所述第一检测模型的检测损失以及第一检测模型的历史检测损失，确定第一检测模型的检测损失变化率；所述第一检测模型的历史检测损失是在历史训练过程中确定的第一检测模型的检测损失；Determining the detection loss change rate of the first detection model according to the detection loss of the first detection model and the historical detection loss of the first detection model; the historical detection loss of the first detection model is the detection loss of the first detection model determined in a historical training process;
若所述第一检测模型的检测损失变化率低于第一预设损失阈值,则确定所述第一检测模型达到第一预设条件;If the detection loss change rate of the first detection model is lower than the first preset loss threshold, determining that the first detection model meets the first preset condition;
若所述第一检测模型的检测损失变化率不低于第一预设损失阈值,则确定所述第一检测模型未达到第一预设条件。If the detection loss change rate of the first detection model is not lower than the first preset loss threshold, it is determined that the first detection model does not meet the first preset condition.
可选的，所述根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第一检测模型是否达到第一预设条件，具体包括：Optionally, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the first detection model meets the first preset condition specifically includes:
根据所述目标物体在训练图像上的实际位置信息以及目标物体在待检测图像上的预测位置信息，确定目标物体的检测损失；其中，所述目标物体的检测损失包括目标物体的识别损失和目标物体的位置检测损失；Determining the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; wherein the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
根据所述目标物体的实际关键点位置信息以及所述目标物体的预测关键点位置信息,确定目标物体的关键点位置检测损失;Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
根据所述目标物体的识别损失、所述目标物体的位置检测损失和所述目标物体的关键点位置检测损失,确定第一检测模型的检测损失;Determine the detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
根据所述第一检测模型的检测损失以及第一检测模型的历史检测损失，确定第一检测模型的检测损失变化率；所述第一检测模型的历史检测损失是在历史训练过程中确定的第一检测模型的检测损失；Determining the detection loss change rate of the first detection model according to the detection loss of the first detection model and the historical detection loss of the first detection model; the historical detection loss of the first detection model is the detection loss of the first detection model determined in a historical training process;
若所述第一检测模型的检测损失变化率低于第一预设损失阈值，或第一检测模型的训练轮数达到第一预设轮数阈值，则确定所述第一检测模型达到第一预设条件；If the detection loss change rate of the first detection model is lower than the first preset loss threshold, or the number of training rounds of the first detection model reaches the first preset round-number threshold, it is determined that the first detection model meets the first preset condition;
若所述第一检测模型的检测损失变化率不低于第一预设损失阈值，且第一检测模型的训练轮数未达到第一预设轮数阈值，则确定所述第一检测模型未达到第一预设条件。If the detection loss change rate of the first detection model is not lower than the first preset loss threshold, and the number of training rounds of the first detection model has not reached the first preset round-number threshold, it is determined that the first detection model does not meet the first preset condition.
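The text repeatedly determines the detection loss of the model from three components: the recognition loss, the position detection loss, and the key point position detection loss of the target object. A common way to combine such components is a weighted sum; the weighted-sum form and the default weights below are assumptions for illustration, since the application only states that the total loss is determined from the three components.

```python
def detection_model_loss(recognition_loss, position_loss, keypoint_loss,
                         weights=(1.0, 1.0, 1.0)):
    # Total detection loss of the detection model, combining the recognition
    # loss, the position detection loss, and the key point position detection
    # loss of the target object (assumed weighted sum).
    w_cls, w_pos, w_kp = weights
    return w_cls * recognition_loss + w_pos * position_loss + w_kp * keypoint_loss
```

The change-rate stopping criterion in the surrounding passages would then be evaluated on successive values of this total loss across training rounds.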
可选的,所述根据所述待检测图像对应的图像特征,确定目标物体在待检测图像上的位置信息以及所述目标物体的关键点位置信息,具体为:Optionally, the determining the position information of the target object on the image to be detected and the key point position information of the target object according to the image feature corresponding to the image to be detected is specifically:
根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息和待检测图像上各个图像子区域内的关键点位置信息，并根据所述目标物体在待检测图像上的位置信息和所述待检测图像上各个图像子区域内的关键点位置信息，确定所述目标物体的关键点位置信息；所述图像子区域是锚点对应的感兴趣区域。Determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected, and determining the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected; the image sub-regions are regions of interest corresponding to anchor points.
可选的，所述根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息和待检测图像上各个图像子区域内的关键点位置信息，并根据所述目标物体在待检测图像上的位置信息和所述待检测图像上各个图像子区域内的关键点位置信息，确定所述目标物体的关键点位置信息，具体为：Optionally, the determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected, and determining the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected is specifically:
根据所述待检测图像对应的图像特征,利用预先构建的第二检测模型对目标物体及其关键点进行检测,确定目标物体在待检测图像上的位置信息以及目标物体的关键点位置信息;According to the image features corresponding to the image to be detected, use a pre-built second detection model to detect the target object and its key points, and determine the position information of the target object on the image to be detected and the key point position information of the target object;
其中，所述第二检测模型包括第二物体位置检测网络层、第二关键点位置检测网络层和物体关键点位置确定网络层，且所述第二物体位置检测网络层的输出结果和所述第二关键点位置检测网络层的输出结果是所述物体关键点位置确定网络层的输入数据，且所述第二物体位置检测网络层用于根据所述待检测图像对应的图像特征确定目标物体在待检测图像上的位置信息；且所述第二关键点位置检测网络层用于根据所述待检测图像对应的图像特征确定待检测图像上各个图像子区域内的关键点位置信息；且所述物体关键点位置确定网络层用于根据所述目标物体在待检测图像上的位置信息和所述待检测图像上各个图像子区域内的关键点位置信息，确定所述目标物体的关键点位置信息。Wherein, the second detection model includes a second object position detection network layer, a second key point position detection network layer, and an object key point position determination network layer; the output results of the second object position detection network layer and of the second key point position detection network layer are the input data of the object key point position determination network layer; the second object position detection network layer is configured to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; the second key point position detection network layer is configured to determine the key point position information in each image sub-region of the image to be detected according to the image features corresponding to the image to be detected; and the object key point position determination network layer is configured to determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image sub-region of the image to be detected.
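In the second detection model described above, the object position branch and the per-sub-region key point branch run on the same image features, and the object key point position determination network layer combines their outputs. One plausible (assumed) combination rule is to keep, for each detected object, the sub-region key points that fall inside that object's box; the geometry helpers and names below are illustrative only, not the application's prescribed implementation.

```python
def point_in_box(point, box):
    # Box given as (x0, y0, x1, y1); point as (x, y).
    x, y = point
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

def determine_object_keypoints(object_boxes, subregion_keypoints):
    # Assumed combination rule for the object key point position
    # determination network layer: for each detected object position,
    # keep the key points (predicted in the anchor sub-regions) that
    # fall inside that object's box.
    return [
        [kp for kp in subregion_keypoints if point_in_box(kp, box)]
        for box in object_boxes
    ]
```

With one detected box `(0, 0, 5, 5)` and candidate key points `(1, 1)` and `(6, 6)`, only `(1, 1)` is assigned to the object, matching the idea of filtering per-sub-region key points by the object's position.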
可选的,所述第二检测模型的训练过程,具体包括:Optionally, the training process of the second detection model specifically includes:
获取训练图像对应的图像特征、目标物体在训练图像上的实际位置信息以及目标物体的实际关键点位置信息;Obtain the image features corresponding to the training image, the actual position information of the target object on the training image, and the actual key point position information of the target object;
根据所述训练图像对应的图像特征,利用第二检测模型对目标物体及其关键点进行检测,确定目标物体在待检测图像上的预测位置信息以及目标物体的预测关键点位置信息;According to the image features corresponding to the training image, use the second detection model to detect the target object and its key points, and determine the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object;
根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第二检测模型是否达到第二预设条件；Judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the second detection model meets a second preset condition;
若确定所述第二检测模型未达到第二预设条件，则更新第二检测模型，继续执行“根据所述训练图像对应的图像特征，利用第二检测模型对目标物体及其关键点进行检测，确定目标物体在待检测图像上的预测位置信息以及目标物体的预测关键点位置信息”。If it is determined that the second detection model does not meet the second preset condition, updating the second detection model, and continuing to perform the step of "detecting the target object and its key points by using the second detection model according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key point position information of the target object".
可选的，所述第二预设条件为：第二检测模型的检测损失的变化率低于第二预设损失阈值，和/或，第二检测模型的训练轮数达到第二预设轮数阈值。Optionally, the second preset condition is: the change rate of the detection loss of the second detection model is lower than a second preset loss threshold, and/or the number of training rounds of the second detection model reaches a second preset round-number threshold.
可选的，当所述第二预设条件包括第二检测模型的检测损失的变化率低于第二预设损失阈值时，所述根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第二检测模型是否达到第二预设条件，具体包括：Optionally, when the second preset condition includes that the change rate of the detection loss of the second detection model is lower than the second preset loss threshold, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the second detection model meets the second preset condition specifically includes:
根据所述目标物体在训练图像上的实际位置信息以及目标物体在待检测图像上的预测位置信息，确定目标物体的检测损失；其中，所述目标物体的检测损失包括目标物体的识别损失和目标物体的位置检测损失；Determining the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; wherein the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
根据所述目标物体的实际关键点位置信息以及所述目标物体的预测关键点位置信息,确定目标物体的关键点位置检测损失;Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
根据所述目标物体的识别损失、所述目标物体的位置检测损失和所述目标物体的关键点位置检测损失,确定第二检测模型的检测损失;Determine the detection loss of the second detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
根据所述第二检测模型的检测损失以及第二检测模型的历史检测损失，确定第二检测模型的检测损失变化率；所述第二检测模型的历史检测损失是在历史训练过程中确定的第二检测模型的检测损失；Determining the detection loss change rate of the second detection model according to the detection loss of the second detection model and the historical detection loss of the second detection model; the historical detection loss of the second detection model is the detection loss of the second detection model determined in a historical training process;
若所述第二检测模型的检测损失变化率低于第二预设损失阈值,则确定所述第二检测模型达到第二预设条件;If the detection loss change rate of the second detection model is lower than a second preset loss threshold, determining that the second detection model meets the second preset condition;
若所述第二检测模型的检测损失变化率不低于第二预设损失阈值,则确定所述第二检测模型未达到第二预设条件。If the detection loss change rate of the second detection model is not lower than the second preset loss threshold, it is determined that the second detection model does not meet the second preset condition.
可选的，所述根据所述目标物体在训练图像上的实际位置信息、所述目标物体的实际关键点位置信息、所述目标物体在待检测图像上的预测位置信息以及所述目标物体的预测关键点位置信息，判断所述第二检测模型是否达到第二预设条件，具体包括：Optionally, the judging, according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object, whether the second detection model meets the second preset condition specifically includes:
根据所述目标物体在训练图像上的实际位置信息以及目标物体在待检测图像上的预测位置信息，确定目标物体的检测损失；其中，所述目标物体的检测损失包括目标物体的识别损失和目标物体的位置检测损失；Determining the detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected; wherein the detection loss of the target object includes the recognition loss of the target object and the position detection loss of the target object;
根据所述目标物体的实际关键点位置信息以及所述目标物体的预测关键点位置信息,确定目标物体的关键点位置检测损失;Determine the key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
根据所述目标物体的识别损失、所述目标物体的位置检测损失和所述目标物体的关键点位置检测损失,确定第二检测模型的检测损失;Determine the detection loss of the second detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
根据所述第二检测模型的检测损失以及第二检测模型的历史检测损失，确定第二检测模型的检测损失变化率；所述第二检测模型的历史检测损失是在历史训练过程中确定的第二检测模型的检测损失；Determining the detection loss change rate of the second detection model according to the detection loss of the second detection model and the historical detection loss of the second detection model; the historical detection loss of the second detection model is the detection loss of the second detection model determined in a historical training process;
若所述第二检测模型的检测损失变化率低于第二预设损失阈值，或第二检测模型的训练轮数达到第二预设轮数阈值，则确定所述第二检测模型达到第二预设条件；If the detection loss change rate of the second detection model is lower than the second preset loss threshold, or the number of training rounds of the second detection model reaches the second preset round-number threshold, it is determined that the second detection model meets the second preset condition;
若所述第二检测模型的检测损失变化率不低于第二预设损失阈值，且第二检测模型的训练轮数未达到第二预设轮数阈值，则确定所述第二检测模型未达到第二预设条件。If the detection loss change rate of the second detection model is not lower than the second preset loss threshold, and the number of training rounds of the second detection model has not reached the second preset round-number threshold, it is determined that the second detection model does not meet the second preset condition.
以上为本申请实施例提供的设备140的相关内容。The above is the relevant content of the device 140 provided in the embodiment of the application.
基于上述方法实施例提供的图像中物体及其关键点的确定方法,本申请实施例还提供了一种计算机可读存储介质。Based on the method for determining objects and their key points in an image provided by the foregoing method embodiments, an embodiment of the present application also provides a computer-readable storage medium.
介质实施例Media Examples
介质实施例提供的计算机可读存储介质的技术详情,请参照上述方法实施例。For technical details of the computer-readable storage medium provided by the media embodiment, please refer to the foregoing method embodiment.
本申请实施例提供了一种计算机可读存储介质，所述计算机可读存储介质用于存储计算机程序，所述计算机程序用于执行上述方法实施例提供的图像中物体及其关键点的确定方法的任一实施方式。也就是说，该计算机程序用于执行以下步骤：An embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and the computer program is used to execute any implementation of the method for determining an object and its key points in an image provided by the foregoing method embodiments. That is, the computer program is used to perform the following steps:
对待检测图像进行特征提取,得到所述待检测图像对应的图像特征;Performing feature extraction on the image to be detected to obtain image features corresponding to the image to be detected;
根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息以及所述目标物体的关键点位置信息；其中，所述目标物体的关键点用于表征所述目标物体的结构特征。Determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information of the target object; wherein the key points of the target object are used to characterize the structural features of the target object.
可选的,所述根据所述待检测图像对应的图像特征,确定目标物体在待检测图像上的位置信息以及所述目标物体的关键点位置信息,具体为:Optionally, the determining the position information of the target object on the image to be detected and the key point position information of the target object according to the image feature corresponding to the image to be detected is specifically:
根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息，并根据所述待检测图像对应的图像特征和所述目标物体在待检测图像上的位置信息，确定目标物体的关键点位置信息。Determining the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and determining the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
可选的，所述根据所述待检测图像对应的图像特征，确定目标物体在待检测图像上的位置信息，并根据所述待检测图像对应的图像特征和所述目标物体在待检测图像上的位置信息，确定目标物体的关键点位置信息，具体包括：Optionally, the determining the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and determining the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected specifically includes:
根据所述待检测图像对应的图像特征,利用预先构建的第一检测模型对目标物体及其关键点进行检测,确定目标物体在待检测图像上的位置信息以及目标物体的关键点位置信息;According to the image features corresponding to the image to be detected, use a pre-built first detection model to detect the target object and its key points, and determine the position information of the target object on the image to be detected and the key point position information of the target object;
其中，所述第一检测模型包括第一物体位置检测网络层和第一关键点位置检测网络层，且所述第一物体位置检测网络层的输出结果是所述第一关键点位置检测网络层的输入数据，且所述第一物体位置检测网络层用于根据所述待检测图像对应的图像特征确定目标物体在待检测图像上的位置信息，所述第一关键点位置检测网络层用于根据所述待检测图像对应的图像特征和所述目标物体在待检测图像上的位置信息，确定目标物体的关键点位置信息。Wherein, the first detection model includes a first object position detection network layer and a first key point position detection network layer, the output result of the first object position detection network layer is the input data of the first key point position detection network layer, the first object position detection network layer is configured to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and the first key point position detection network layer is configured to determine the key point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
Optionally, the training process of the first detection model specifically includes:
obtaining the image features corresponding to a training image, the actual position information of the target object on the training image, and the actual key-point position information of the target object;
detecting, according to the image features corresponding to the training image, the target object and its key points by using the first detection model, and determining the predicted position information of the target object on the image to be detected and the predicted key-point position information of the target object;
judging whether the first detection model meets a first preset condition according to the actual position information of the target object on the training image, the actual key-point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key-point position information of the target object;
if it is determined that the first detection model does not meet the first preset condition, updating the first detection model and returning to the step of "detecting the target object and its key points by using the first detection model according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key-point position information of the target object".
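The iterative training loop just described (detect, judge the preset condition, update if not met, repeat) can be sketched as below. The stopping rule shown is the loss-change-rate criterion introduced in the following paragraphs; `detect_loss` and `update` are placeholders for the model's loss evaluation and parameter update, which the document leaves unspecified.

```python
def train_first_model(model, data, detect_loss, update,
                      rate_threshold=1e-3, max_rounds=1000):
    """Detect, check the first preset condition, update, repeat."""
    history = []
    for _ in range(max_rounds):
        loss = detect_loss(model, data)   # compare predicted vs. actual positions
        if history:
            prev = history[-1]            # historical detection loss
            rate = abs(prev - loss) / max(abs(prev), 1e-12)
            if rate < rate_threshold:     # loss change rate low enough: stop
                history.append(loss)
                return model, history
        history.append(loss)
        model = update(model, loss)       # condition not met: update and continue
    return model, history

# Toy run: the "model" is a scalar whose loss halves each round, so the
# change rate stays at 0.5 and training stops once 0.5 < rate_threshold.
model, hist = train_first_model(
    model=1.0, data=None,
    detect_loss=lambda m, d: m,
    update=lambda m, loss: m * 0.5,
    rate_threshold=0.6,
)
print(len(hist))
```

In the toy run the loop evaluates the loss twice (1.0, then 0.5), finds the change rate 0.5 below the threshold 0.6, and stops.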
Optionally, the first preset condition is: the rate of change of the detection loss of the first detection model is lower than a first preset loss threshold, and/or the number of training rounds of the first detection model reaches a first preset round-number threshold.
Optionally, when the first preset condition includes that the rate of change of the detection loss of the first detection model is lower than the first preset loss threshold, judging whether the first detection model meets the first preset condition according to the actual position information of the target object on the training image, the actual key-point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key-point position information of the target object specifically includes:
determining a detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, wherein the detection loss of the target object includes a recognition loss of the target object and a position detection loss of the target object;
determining a key-point position detection loss of the target object according to the actual key-point position information of the target object and the predicted key-point position information of the target object;
determining a detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key-point position detection loss of the target object;
determining a rate of change of the detection loss of the first detection model according to the detection loss of the first detection model and a historical detection loss of the first detection model, wherein the historical detection loss of the first detection model is a detection loss of the first detection model determined in a historical training process;
if the rate of change of the detection loss of the first detection model is lower than the first preset loss threshold, determining that the first detection model meets the first preset condition;
if the rate of change of the detection loss of the first detection model is not lower than the first preset loss threshold, determining that the first detection model does not meet the first preset condition.
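The composite loss and change-rate computation above can be sketched as follows. The individual loss forms (binary cross-entropy for recognition, squared error for positions) and the equal weighting are assumptions; the document only states that the recognition loss, position detection loss, and key-point position detection loss are combined into the model's detection loss.

```python
import numpy as np

def first_model_detection_loss(pred_cls, true_cls, pred_box, true_box,
                               pred_kp, true_kp):
    """Detection loss = recognition loss + object position loss
    + key-point position loss (assumed forms and equal weights)."""
    eps = 1e-12
    recognition = -np.mean(true_cls * np.log(pred_cls + eps)
                           + (1 - true_cls) * np.log(1 - pred_cls + eps))
    position = np.mean((pred_box - true_box) ** 2)   # box regression loss
    keypoint = np.mean((pred_kp - true_kp) ** 2)     # key-point regression loss
    return recognition + position + keypoint

def loss_change_rate(current, historical):
    """Change rate of the detection loss relative to the loss recorded
    in the historical training process."""
    return abs(historical - current) / max(abs(historical), 1e-12)

# A perfect prediction yields an (almost) zero composite loss.
perfect = first_model_detection_loss(
    pred_cls=np.array([1.0]), true_cls=np.array([1.0]),
    pred_box=np.zeros(4), true_box=np.zeros(4),
    pred_kp=np.zeros(8), true_kp=np.zeros(8),
)
print(abs(perfect) < 1e-9, loss_change_rate(0.5, 1.0))
```

The change rate is then compared against the first preset loss threshold to decide whether training may stop.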
Optionally, judging whether the first detection model meets the first preset condition according to the actual position information of the target object on the training image, the actual key-point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key-point position information of the target object specifically includes:
determining a detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, wherein the detection loss of the target object includes a recognition loss of the target object and a position detection loss of the target object;
determining a key-point position detection loss of the target object according to the actual key-point position information of the target object and the predicted key-point position information of the target object;
determining a detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key-point position detection loss of the target object;
determining a rate of change of the detection loss of the first detection model according to the detection loss of the first detection model and a historical detection loss of the first detection model, wherein the historical detection loss of the first detection model is a detection loss of the first detection model determined in a historical training process;
if the rate of change of the detection loss of the first detection model is lower than the first preset loss threshold, or the number of training rounds of the first detection model reaches the first preset round-number threshold, determining that the first detection model meets the first preset condition;
if the rate of change of the detection loss of the first detection model is not lower than the first preset loss threshold, and the number of training rounds of the first detection model does not reach the first preset round-number threshold, determining that the first detection model does not meet the first preset condition.
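The combined stopping criterion just stated, a disjunction of the two sub-conditions, reduces to a single boolean test. The sketch below encodes it directly; only the function and parameter names are invented.

```python
def first_preset_condition_met(loss_change_rate, training_rounds,
                               rate_threshold, round_threshold):
    """First preset condition: the detection-loss change rate is below the
    preset loss threshold OR the training rounds reach the preset round
    threshold. Training continues only when neither sub-condition holds."""
    return (loss_change_rate < rate_threshold
            or training_rounds >= round_threshold)

# Neither sub-condition holds: keep training.
print(first_preset_condition_met(0.05, 3, rate_threshold=0.01, round_threshold=10))
# Round threshold reached: stop even though the loss is still changing.
print(first_preset_condition_met(0.05, 10, rate_threshold=0.01, round_threshold=10))
```

The round-number branch guarantees termination even when the loss never plateaus, which is why the two criteria are combined with "or".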
Optionally, determining the position information of the target object on the image to be detected and the key-point position information of the target object according to the image features corresponding to the image to be detected is specifically:
determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and key-point position information in each image sub-region on the image to be detected, and determining the key-point position information of the target object according to the position information of the target object on the image to be detected and the key-point position information in each image sub-region on the image to be detected, wherein each image sub-region is a region of interest corresponding to an anchor point.
Optionally, determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key-point position information in each image sub-region on the image to be detected, and determining the key-point position information of the target object according to the position information of the target object on the image to be detected and the key-point position information in each image sub-region on the image to be detected is specifically:
detecting, according to the image features corresponding to the image to be detected, the target object and its key points by using a pre-built second detection model, and determining the position information of the target object on the image to be detected and the key-point position information of the target object;
wherein the second detection model includes a second object-position detection network layer, a second key-point position detection network layer, and an object key-point position determination network layer; the outputs of the second object-position detection network layer and of the second key-point position detection network layer are input data of the object key-point position determination network layer; the second object-position detection network layer is configured to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; the second key-point position detection network layer is configured to determine the key-point position information in each image sub-region on the image to be detected according to the image features corresponding to the image to be detected; and the object key-point position determination network layer is configured to determine the key-point position information of the target object according to the position information of the target object on the image to be detected and the key-point position information in each image sub-region on the image to be detected.
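A minimal sketch of the object key-point position determination layer in the second detection model follows. It receives the object's box from the object-position layer and per-sub-region key-point candidates from the key-point layer; here it keeps the candidates whose sub-region (anchor region of interest) centre lies inside the box. The containment rule is purely an assumption for illustration; the document states only that both outputs feed this layer, not how they are fused.

```python
def determine_object_keypoints(object_box, subregion_keypoints):
    """Fuse the object's position with per-anchor-ROI key-point candidates
    by keeping candidates whose sub-region centre falls inside the box
    (assumed selection rule)."""
    x0, y0, x1, y1 = object_box
    selected = []
    for (cx, cy), kps in subregion_keypoints:
        if x0 <= cx <= x1 and y0 <= cy <= y1:  # sub-region belongs to the object
            selected.extend(kps)
    return selected

box = (0.0, 0.0, 10.0, 10.0)                   # target object's position
candidates = [
    ((4.0, 5.0), [(3.0, 4.0), (6.0, 7.0)]),    # ROI centre inside the box
    ((25.0, 30.0), [(24.0, 29.0)]),            # ROI centre outside the box
]
print(determine_object_keypoints(box, candidates))
```

Only the key points detected in sub-regions associated with the object's box are reported as the target object's key points.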
Optionally, the training process of the second detection model specifically includes:
obtaining the image features corresponding to a training image, the actual position information of the target object on the training image, and the actual key-point position information of the target object;
detecting, according to the image features corresponding to the training image, the target object and its key points by using the second detection model, and determining the predicted position information of the target object on the image to be detected and the predicted key-point position information of the target object;
judging whether the second detection model meets a second preset condition according to the actual position information of the target object on the training image, the actual key-point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key-point position information of the target object;
if it is determined that the second detection model does not meet the second preset condition, updating the second detection model and returning to the step of "detecting the target object and its key points by using the second detection model according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key-point position information of the target object".
Optionally, the second preset condition is: the rate of change of the detection loss of the second detection model is lower than a second preset loss threshold, and/or the number of training rounds of the second detection model reaches a second preset round-number threshold.
Optionally, when the second preset condition includes that the rate of change of the detection loss of the second detection model is lower than the second preset loss threshold, judging whether the second detection model meets the second preset condition according to the actual position information of the target object on the training image, the actual key-point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key-point position information of the target object specifically includes:
determining a detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, wherein the detection loss of the target object includes a recognition loss of the target object and a position detection loss of the target object;
determining a key-point position detection loss of the target object according to the actual key-point position information of the target object and the predicted key-point position information of the target object;
determining a detection loss of the second detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key-point position detection loss of the target object;
determining a rate of change of the detection loss of the second detection model according to the detection loss of the second detection model and a historical detection loss of the second detection model, wherein the historical detection loss of the second detection model is a detection loss of the second detection model determined in a historical training process;
if the rate of change of the detection loss of the second detection model is lower than the second preset loss threshold, determining that the second detection model meets the second preset condition;
if the rate of change of the detection loss of the second detection model is not lower than the second preset loss threshold, determining that the second detection model does not meet the second preset condition.
Optionally, judging whether the second detection model meets the second preset condition according to the actual position information of the target object on the training image, the actual key-point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key-point position information of the target object specifically includes:
determining a detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, wherein the detection loss of the target object includes a recognition loss of the target object and a position detection loss of the target object;
determining a key-point position detection loss of the target object according to the actual key-point position information of the target object and the predicted key-point position information of the target object;
determining a detection loss of the second detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key-point position detection loss of the target object;
determining a rate of change of the detection loss of the second detection model according to the detection loss of the second detection model and a historical detection loss of the second detection model, wherein the historical detection loss of the second detection model is a detection loss of the second detection model determined in a historical training process;
if the rate of change of the detection loss of the second detection model is lower than the second preset loss threshold, or the number of training rounds of the second detection model reaches the second preset round-number threshold, determining that the second detection model meets the second preset condition;
if the rate of change of the detection loss of the second detection model is not lower than the second preset loss threshold, and the number of training rounds of the second detection model does not reach the second preset round-number threshold, determining that the second detection model does not meet the second preset condition.
The above is the relevant content of the computer-readable storage medium provided by the embodiments of the present application.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality of" means two or more. The term "and/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A exists, only B exists, or both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of single or plural items. For example, "at least one of a, b, or c" may indicate a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c may be single or multiple.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention in any way. Although the present invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the methods and technical content disclosed above to make many possible variations and modifications to the technical solution of the present invention, or modify it into equivalent embodiments. Therefore, any simple modifications, equivalent changes, and refinements made to the above embodiments in accordance with the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still fall within the protection scope of the technical solution of the present invention.

Claims (16)

  1. A method for determining an object and key points thereof in an image, characterized in that the method comprises:
    performing feature extraction on an image to be detected to obtain image features corresponding to the image to be detected; and
    determining, according to the image features corresponding to the image to be detected, position information of a target object on the image to be detected and key-point position information of the target object, wherein the key points of the target object are used to characterize structural features of the target object.
  2. The method according to claim 1, characterized in that determining the position information of the target object on the image to be detected and the key-point position information of the target object according to the image features corresponding to the image to be detected is specifically:
    determining the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and determining the key-point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
  3. The method according to claim 2, characterized in that determining the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected, and determining the key-point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected specifically includes:
    detecting, according to the image features corresponding to the image to be detected, the target object and its key points by using a pre-built first detection model, and determining the position information of the target object on the image to be detected and the key-point position information of the target object;
    wherein the first detection model includes a first object-position detection network layer and a first key-point position detection network layer; the output of the first object-position detection network layer is input data of the first key-point position detection network layer; the first object-position detection network layer is configured to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; and the first key-point position detection network layer is configured to determine the key-point position information of the target object according to the image features corresponding to the image to be detected and the position information of the target object on the image to be detected.
  4. The method according to claim 3, characterized in that the training process of the first detection model specifically includes:
    obtaining the image features corresponding to a training image, the actual position information of the target object on the training image, and the actual key-point position information of the target object;
    detecting, according to the image features corresponding to the training image, the target object and its key points by using the first detection model, and determining the predicted position information of the target object on the image to be detected and the predicted key-point position information of the target object;
    judging whether the first detection model meets a first preset condition according to the actual position information of the target object on the training image, the actual key-point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key-point position information of the target object;
    if it is determined that the first detection model does not meet the first preset condition, updating the first detection model and returning to the step of "detecting the target object and its key points by using the first detection model according to the image features corresponding to the training image, and determining the predicted position information of the target object on the image to be detected and the predicted key-point position information of the target object".
  5. The method according to claim 4, characterized in that the first preset condition is: the rate of change of the detection loss of the first detection model is lower than a first preset loss threshold, and/or the number of training rounds of the first detection model reaches a first preset round-number threshold.
  6. The method according to claim 5, characterized in that, when the first preset condition includes that the rate of change of the detection loss of the first detection model is lower than the first preset loss threshold, judging whether the first detection model meets the first preset condition according to the actual position information of the target object on the training image, the actual key-point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key-point position information of the target object specifically includes:
    determining a detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, wherein the detection loss of the target object includes a recognition loss of the target object and a position detection loss of the target object;
    determining a key-point position detection loss of the target object according to the actual key-point position information of the target object and the predicted key-point position information of the target object;
    determining a detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key-point position detection loss of the target object;
    determining a rate of change of the detection loss of the first detection model according to the detection loss of the first detection model and a historical detection loss of the first detection model, wherein the historical detection loss of the first detection model is a detection loss of the first detection model determined in a historical training process;
    if the rate of change of the detection loss of the first detection model is lower than the first preset loss threshold, determining that the first detection model meets the first preset condition;
    if the rate of change of the detection loss of the first detection model is not lower than the first preset loss threshold, determining that the first detection model does not meet the first preset condition.
  7. The method according to claim 4, wherein judging whether the first detection model meets the first preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object specifically comprises:
    determining a detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, wherein the detection loss of the target object comprises a recognition loss of the target object and a position detection loss of the target object;
    determining a key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
    determining a detection loss of the first detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
    determining a detection loss change rate of the first detection model according to the detection loss of the first detection model and a historical detection loss of the first detection model, wherein the historical detection loss of the first detection model is a detection loss of the first detection model determined during the historical training process;
    determining that the first detection model meets the first preset condition if the detection loss change rate of the first detection model is lower than a first preset loss threshold or the number of training rounds of the first detection model reaches a first preset round number threshold; and
    determining that the first detection model does not meet the first preset condition if the detection loss change rate of the first detection model is not lower than the first preset loss threshold and the number of training rounds of the first detection model has not reached the first preset round number threshold.
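Outside the claim language, the stopping test recited in claim 7 can be sketched in a few lines of Python. Everything below is illustrative rather than the patented implementation: the claim fixes only the decision logic, so the way the three losses are combined (a plain sum is assumed here), the change-rate formula, and the threshold values are all assumptions.

```python
def detection_loss(recognition_loss, position_loss, keypoint_loss):
    """Combine the per-object losses into the model's detection loss.

    The claim does not fix the combination rule; a plain sum is assumed.
    """
    return recognition_loss + position_loss + keypoint_loss


def meets_preset_condition(current_loss, history_loss, rounds,
                           loss_rate_threshold=0.01, max_rounds=100):
    """Return True when either branch of the first preset condition holds.

    The change rate is computed relative to the loss recorded in a previous
    training round (the 'historical detection loss' of the claim).
    """
    change_rate = abs(current_loss - history_loss) / max(history_loss, 1e-12)
    return change_rate < loss_rate_threshold or rounds >= max_rounds
```

Both branches of the claim map onto the single boolean expression: training stops once the loss has flattened out, or once the round budget is exhausted, whichever comes first.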
  8. The method according to claim 1, wherein determining the position information of the target object on the image to be detected and the key point position information of the target object according to the image features corresponding to the image to be detected specifically comprises:
    determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and key point position information in each image subregion of the image to be detected, and determining the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected, wherein each image subregion is a region of interest corresponding to an anchor point.
  9. The method according to claim 8, wherein determining, according to the image features corresponding to the image to be detected, the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected, and determining the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected specifically comprises:
    detecting the target object and its key points by using a pre-built second detection model according to the image features corresponding to the image to be detected, to determine the position information of the target object on the image to be detected and the key point position information of the target object;
    wherein the second detection model comprises a second object position detection network layer, a second key point position detection network layer, and an object key point position determination network layer; the output results of the second object position detection network layer and the second key point position detection network layer are the input data of the object key point position determination network layer; the second object position detection network layer is configured to determine the position information of the target object on the image to be detected according to the image features corresponding to the image to be detected; the second key point position detection network layer is configured to determine the key point position information in each image subregion of the image to be detected according to the image features corresponding to the image to be detected; and the object key point position determination network layer is configured to determine the key point position information of the target object according to the position information of the target object on the image to be detected and the key point position information in each image subregion of the image to be detected.
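The data flow recited in claim 9 — two heads feeding a third layer that combines their outputs — can be illustrated with a toy combination rule. This is only a sketch under assumptions: the claim does not specify how the determination layer associates subregion keypoints with object boxes, so the box representation and the containment test below are invented for illustration.

```python
def determine_object_keypoints(object_boxes, subregion_keypoints):
    """Assign anchor-subregion keypoints to the objects whose boxes contain them.

    object_boxes: output of the object position detection head, as a list of
        (x1, y1, x2, y2) boxes.
    subregion_keypoints: output of the keypoint detection head, as a list of
        (x, y) keypoint lists, one list per region of interest.
    Returns one keypoint list per detected object.
    """
    def inside(pt, box):
        x, y = pt
        x1, y1, x2, y2 = box
        return x1 <= x <= x2 and y1 <= y <= y2

    # Flatten all subregion keypoints, then keep those falling in each box.
    return [[pt for pts in subregion_keypoints for pt in pts if inside(pt, box)]
            for box in object_boxes]
```

The point of the sketch is structural: the determination layer consumes both heads' outputs, so keypoints are reported per object rather than per anchor subregion.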
  10. The method according to claim 9, wherein the training process of the second detection model specifically comprises:
    obtaining image features corresponding to a training image, the actual position information of the target object on the training image, and the actual key point position information of the target object;
    detecting the target object and its key points by using the second detection model according to the image features corresponding to the training image, to determine the predicted position information of the target object and the predicted key point position information of the target object;
    judging whether the second detection model meets a second preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object, and the predicted key point position information of the target object; and
    if it is determined that the second detection model does not meet the second preset condition, updating the second detection model and returning to the step of "detecting the target object and its key points by using the second detection model according to the image features corresponding to the training image, to determine the predicted position information of the target object and the predicted key point position information of the target object".
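The predict-check-update cycle of claim 10 is a standard iterative training loop. The skeleton below is a minimal sketch of that control flow only; `predict`, `update`, and `condition_met` are hypothetical callables standing in for the unspecified forward pass, parameter update, and preset-condition test.

```python
def train_until_condition(model, features, labels, predict, update,
                          condition_met, max_iters=1000):
    """Iterate the detect -> judge -> update cycle described in claim 10.

    Returns the model once the preset condition is met (or after max_iters,
    a safety bound added here that is not part of the claim).
    """
    for _ in range(max_iters):
        predictions = predict(model, features)       # detection step
        if condition_met(predictions, labels):       # preset-condition test
            break
        model = update(model, predictions, labels)   # model update step
    return model
```

A toy usage with scalar "models" shows the loop terminating as soon as the condition holds, mirroring the "continue executing the detection step" wording of the claim.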
  11. The method according to claim 10, wherein the second preset condition is that the change rate of the detection loss of the second detection model is lower than a second preset loss threshold, and/or the number of training rounds of the second detection model reaches a second preset round number threshold.
  12. The method according to claim 11, wherein, when the second preset condition comprises the change rate of the detection loss of the second detection model being lower than the second preset loss threshold, judging whether the second detection model meets the second preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object specifically comprises:
    determining a detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, wherein the detection loss of the target object comprises a recognition loss of the target object and a position detection loss of the target object;
    determining a key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
    determining a detection loss of the second detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
    determining a detection loss change rate of the second detection model according to the detection loss of the second detection model and a historical detection loss of the second detection model, wherein the historical detection loss of the second detection model is a detection loss of the second detection model determined during the historical training process;
    determining that the second detection model meets the second preset condition if the detection loss change rate of the second detection model is lower than the second preset loss threshold; and
    determining that the second detection model does not meet the second preset condition if the detection loss change rate of the second detection model is not lower than the second preset loss threshold.
  13. The method according to claim 10, wherein judging whether the second detection model meets the second preset condition according to the actual position information of the target object on the training image, the actual key point position information of the target object, the predicted position information of the target object on the image to be detected, and the predicted key point position information of the target object specifically comprises:
    determining a detection loss of the target object according to the actual position information of the target object on the training image and the predicted position information of the target object on the image to be detected, wherein the detection loss of the target object comprises a recognition loss of the target object and a position detection loss of the target object;
    determining a key point position detection loss of the target object according to the actual key point position information of the target object and the predicted key point position information of the target object;
    determining a detection loss of the second detection model according to the recognition loss of the target object, the position detection loss of the target object, and the key point position detection loss of the target object;
    determining a detection loss change rate of the second detection model according to the detection loss of the second detection model and a historical detection loss of the second detection model, wherein the historical detection loss of the second detection model is a detection loss of the second detection model determined during the historical training process;
    determining that the second detection model meets the second preset condition if the detection loss change rate of the second detection model is lower than the second preset loss threshold or the number of training rounds of the second detection model reaches the second preset round number threshold; and
    determining that the second detection model does not meet the second preset condition if the detection loss change rate of the second detection model is not lower than the second preset loss threshold and the number of training rounds of the second detection model has not reached the second preset round number threshold.
  14. An apparatus for determining an object and its key points in an image, comprising:
    an extraction unit, configured to perform feature extraction on an image to be detected to obtain image features corresponding to the image to be detected; and
    a determination unit, configured to determine, according to the image features corresponding to the image to be detected, position information of a target object on the image to be detected and key point position information of the target object, wherein the key points of the target object are used to characterize structural features of the target object.
  15. A device, comprising a processor and a memory, wherein:
    the memory is configured to store a computer program; and
    the processor is configured to execute, according to the computer program, the method according to any one of claims 1-13.
  16. A computer-readable storage medium, wherein the computer-readable storage medium is configured to store a computer program, and the computer program is used to execute the method according to any one of claims 1-13.
PCT/CN2020/102518 2019-10-09 2020-07-17 Method and apparatus for determining object and key points thereof in image WO2021068589A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910954556.7 2019-10-09
CN201910954556.7A CN110765898B (en) 2019-10-09 2019-10-09 Method and device for determining object and key point thereof in image

Publications (1)

Publication Number Publication Date
WO2021068589A1 true WO2021068589A1 (en) 2021-04-15

Family

ID=69331015

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/102518 WO2021068589A1 (en) 2019-10-09 2020-07-17 Method and apparatus for determining object and key points thereof in image

Country Status (2)

Country Link
CN (1) CN110765898B (en)
WO (1) WO2021068589A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765898B (en) * 2019-10-09 2022-11-22 东软睿驰汽车技术(沈阳)有限公司 Method and device for determining object and key point thereof in image

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038469A (en) * 2017-12-27 2018-05-15 百度在线网络技术(北京)有限公司 Method and apparatus for detecting human body
CN108205655A (en) * 2017-11-07 2018-06-26 北京市商汤科技开发有限公司 A kind of key point Forecasting Methodology, device, electronic equipment and storage medium
CN108256431A (en) * 2017-12-20 2018-07-06 中车工业研究院有限公司 A kind of hand position identification method and device
CN108985259A (en) * 2018-08-03 2018-12-11 百度在线网络技术(北京)有限公司 Human motion recognition method and device
CN109598234A (en) * 2018-12-04 2019-04-09 深圳美图创新科技有限公司 Critical point detection method and apparatus
CN110765898A (en) * 2019-10-09 2020-02-07 东软睿驰汽车技术(沈阳)有限公司 Method and device for determining object and key point thereof in image

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590482A (en) * 2017-09-29 2018-01-16 百度在线网络技术(北京)有限公司 information generating method and device
CN108121951A (en) * 2017-12-11 2018-06-05 北京小米移动软件有限公司 Characteristic point positioning method and device
CN108121952B (en) * 2017-12-12 2022-03-08 北京小米移动软件有限公司 Face key point positioning method, device, equipment and storage medium
EP3547211B1 (en) * 2018-03-30 2021-11-17 Naver Corporation Methods for training a cnn and classifying an action performed by a subject in an inputted video using said cnn
CN109460704B (en) * 2018-09-18 2020-09-15 厦门瑞为信息技术有限公司 Fatigue detection method and system based on deep learning and computer equipment
CN109617909B (en) * 2019-01-07 2021-04-27 福州大学 Malicious domain name detection method based on SMOTE and BI-LSTM network
CN109934129B (en) * 2019-02-27 2023-05-30 嘉兴学院 Face feature point positioning method, device, computer equipment and storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083020A (en) * 2022-07-22 2022-09-20 海易科技(北京)有限公司 Information generation method and device, electronic equipment and computer readable medium
CN115083020B (en) * 2022-07-22 2022-11-01 海易科技(北京)有限公司 Information generation method and device, electronic equipment and computer readable medium

Also Published As

Publication number Publication date
CN110765898B (en) 2022-11-22
CN110765898A (en) 2020-02-07


Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 20874210; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: PCT application non-entry in European phase
    Ref document number: 20874210; Country of ref document: EP; Kind code of ref document: A1