CN111243011A - Key point detection method and device, electronic equipment and storage medium


Info

Publication number
CN111243011A
CN111243011A (application CN201811446154.8A)
Authority
CN
China
Prior art keywords
image
detected
target
target area
pixel point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811446154.8A
Other languages
Chinese (zh)
Inventor
李磊
刘庭皓
王权
钱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN201811446154.8A
Publication of CN111243011A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Abstract

The present disclosure relates to a key point detection method and apparatus, an electronic device, and a storage medium. The method includes: acquiring, according to the position of a target area in an image to be detected, a first image of a preset size together with the position relationship between each pixel point of the image to be detected and each pixel point of the first image; inputting the first image into a detection network for processing to obtain an object detection result; and determining a second position of a key point of the target object in the image to be detected according to the first position of the key point in the first image and the position relationship. With the key point detection method of the embodiments of the present disclosure, a first image of the target area of the image to be processed can be obtained, the first position of the target object in the first image can be determined, and the second position of the target object in the image to be processed can be derived from it. Since the first position obtained within the first image does not depend on the localization of the target area, the accuracy of the first position, and hence of the second position determined from it, can be improved.

Description

Key point detection method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for detecting a keypoint, an electronic device, and a storage medium.
Background
In the related art, a target object may be searched for in an image and its key points located. However, when the target object belongs to a specific area of the image, locating the target object depends on locating that area: if the localization of the target area deviates, the resulting deviation in the location of the target object is large.
Disclosure of Invention
The disclosure provides a key point detection method and device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a keypoint detection method, including:
acquiring, according to the position of a target area in an image to be detected, a first image of a preset size and the position relationship between each pixel point of the image to be detected and each pixel point of the first image, wherein the first image includes the target area;
inputting the first image into a detection network for processing to obtain an object detection result, wherein, in the case that the target area includes a target object, the object detection result indicates a first position of a key point of the target object in the first image; and
determining a second position of the key point of the target object in the image to be detected according to the first position of the key point of the target object in the first image and the position relationship.
According to the key point detection method of the embodiments of the present disclosure, a first image of the target area of the image to be processed can be acquired, the first position of the target object in the first image can be determined, and the second position of the target object in the image to be processed can be determined from it. Since the first position acquired within the first image does not depend on the localization of the target area, the accuracy of the first position can be improved, and the second position determined from the first position is accordingly more accurate.
In a possible implementation manner, obtaining a first image with a preset size and a position relationship between each pixel point of the image to be detected and each pixel point of the first image according to a position of a target region in the image to be detected includes:
intercepting the target area in the image to be detected according to the position of the target area in the image to be detected to obtain a second image;
and carrying out scaling processing on the second image to obtain the first image with the preset size.
In this way, the first image including the target area can be obtained, and the detection accuracy can be improved in the detection process of the first image.
In a possible implementation manner, obtaining a first image with a preset size and a position relationship between each pixel point of the image to be detected and each pixel point of the first image according to a position of a target region in the image to be detected includes:
determining the position relationship between each pixel point of the image to be detected and each pixel point of the first image according to the position of the target area in the image to be detected and the first image.
In this way, the first position in the first image can be converted, through the position relationship, into the second position in the image to be detected, and the accuracy of detection can be improved.
In a possible implementation manner, the position relationship includes a position transformation matrix between each pixel point of the image to be detected and each pixel point of the first image,
determining a second position of the key point of the target object in the image to be detected according to the first position of the key point of the target object in the first image and the position relationship, wherein the determining comprises:
determining the second position according to the first position and the position transformation matrix.
In one possible implementation, in a case that the target object does not exist in the target area, the object detection result indicates that the target object does not exist.
In one possible implementation, the method further includes:
performing target area detection processing on the image to be detected to obtain the position of the target area.
In one possible implementation, the method further includes:
training the detection network through a data set comprising a plurality of sample images, wherein one or more sample target areas are included in the sample images, and sample target objects in the sample target areas have keypoint labeling information.
In one possible implementation, training the detection network through a dataset including a plurality of sample images includes:
obtaining a third image with a preset size according to the position of a sample target area in the sample image;
preprocessing the third image to obtain a fourth image;
and training the detection network through at least one of the fourth image and the third image to obtain the trained detection network.
In one possible implementation, the target area is a human face area, and the target object is glasses.
In one possible implementation, the key points of the target object include a spectacle frame midpoint, a spectacle leg inflection point, an intersection of a spectacle frame and a lens, and a key point of a lens edge.
According to another aspect of the present disclosure, there is provided a keypoint detection apparatus, comprising:
the image detection device comprises an acquisition module, a detection module and a processing module, wherein the acquisition module is used for acquiring a first image with a preset size and the position relation between each pixel point of the image to be detected and each pixel point of the first image according to the position of a target area in the image to be detected, and the first image comprises the target area;
the processing module is used for inputting the first image into a detection network for processing to obtain an object detection result, wherein the object detection result indicates a first position of a key point of a target object in the first image under the condition that the target area comprises the target object;
and the determining module is used for determining the second position of the key point of the target object in the image to be detected according to the first position of the key point of the target object in the first image and the position relation.
In one possible implementation, the obtaining module is further configured to:
intercepting the target area in the image to be detected according to the position of the target area in the image to be detected to obtain a second image;
and carrying out scaling processing on the second image to obtain the first image with the preset size.
In one possible implementation, the obtaining module is further configured to:
determining the position relationship between each pixel point of the image to be detected and each pixel point of the first image according to the position of the target area in the image to be detected and the first image.
In a possible implementation manner, the position relationship includes a position transformation matrix between each pixel point of the image to be detected and each pixel point of the first image,
wherein the determination module is further configured to:
determining the second position according to the first position and the position transformation matrix.
In one possible implementation, in a case that the target object does not exist in the target area, the object detection result indicates that the target object does not exist.
In one possible implementation, the apparatus further includes:
the detection module is used for carrying out target area detection processing on the image to be detected to obtain the position of the target area.
In one possible implementation, the apparatus further includes:
a training module, configured to train the detection network through a data set including a plurality of sample images, where the sample images include one or more sample target regions, and sample target objects in the sample target regions have keypoint annotation information.
In one possible implementation, the training module is further configured to:
obtaining a third image with a preset size according to the position of a sample target area in the sample image;
preprocessing the third image to obtain a fourth image;
and training the detection network through at least one of the fourth image and the third image to obtain the trained detection network.
In one possible implementation, the target area is a human face area, and the target object is glasses.
In one possible implementation, the key points of the target object include a spectacle frame midpoint, a spectacle leg inflection point, an intersection of a spectacle frame and a lens, and a key point of a lens edge.
According to another aspect of the present disclosure, there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the above-described keypoint detection method.
According to another aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described keypoint detection method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a flow diagram of a keypoint detection method according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a keypoint detection method according to an embodiment of the disclosure;
FIG. 3 shows a schematic diagram of an intercept process according to an embodiment of the disclosure;
FIG. 4 shows a schematic diagram of key points of eyewear in accordance with an embodiment of the present disclosure;
FIG. 5 shows a flow diagram of a keypoint detection method according to an embodiment of the disclosure;
FIG. 6 shows a schematic diagram of an application of a keypoint detection method according to an embodiment of the present disclosure;
FIG. 7 shows a block diagram of a keypoint detection apparatus according to an embodiment of the disclosure;
FIG. 8 shows a block diagram of a keypoint detection apparatus according to an embodiment of the disclosure;
FIG. 9 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure;
FIG. 10 shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a flow chart of a keypoint detection method according to an embodiment of the present disclosure. As shown in fig. 1, the method includes:
in step S11, according to a position of a target region in an image to be detected, obtaining a first image of a preset size and a position relationship between each pixel point of the image to be detected and each pixel point of the first image, where the first image includes the target region;
in step S12, inputting the first image into a detection network for processing, and obtaining an object detection result, wherein the object detection result indicates a first position of a key point of a target object in a first image when the target object is included in the target area;
in step S13, determining a second position of the keypoint of the target object in the image to be detected according to the first position of the keypoint of the target object in the first image and the position relationship.
According to the key point detection method of the embodiment of the disclosure, the first image of the target area of the image to be processed can be acquired, the first position of the target object in the first image is determined, the second position of the target object in the image to be processed is determined, the first position acquired in the first image does not depend on the positioning of the target area, the acquisition precision of the first position can be improved, and therefore the precision of the second position determined according to the first position is higher.
In one possible implementation, the key point detection method may be performed by a terminal device such as user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device, and the method may be implemented by a processor calling computer-readable instructions stored in a memory. Alternatively, the method may be performed by a server: a terminal device or an image capture device (e.g., a camera) acquires the image to be processed and transmits it to the server.
In a possible implementation manner, the image to be detected may be an image including one or more target areas, where a target area is an area to be detected in which a target object may be found. For example, the target area may be a human face area and the target object glasses; the target object may also be a facial feature, an ornament, or the like.
In a possible implementation manner, the position of the target region may be marked in the image to be detected, or the position of the target region in the image to be detected may be determined through the detection process.
Fig. 2 shows a flow chart of a keypoint detection method according to an embodiment of the present disclosure. As shown in fig. 2, the method further comprises:
in step S14, a target area detection process is performed on the image to be detected, and the position of the target area is obtained.
In one possible implementation, the position of the target area in the image to be detected may be obtained by a target area detection process. In an example, the target area is a face area, and the image to be detected may include one or more face areas. The faces in the image to be detected may be determined based on features such as geometric features or depth features, and the positions of the face areas determined accordingly; the face detection processing may also be performed with a neural network such as a convolutional neural network to obtain the positions of the face areas. The position of a target area may be its coordinates in the image to be detected; for example, in an image to be detected with a resolution of 1024 × 768, the target area may be determined to be a rectangular area with vertex coordinates (100, 100), (200, 100), (100, 200), and (200, 200). The present disclosure does not limit the method of detecting the target area.
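By way of illustration only, the detection step could be realized with OpenCV's stock Haar-cascade face detector; the patent does not prescribe a particular detector, and every name below is an assumption:

```python
import cv2

def detect_target_regions(image):
    """Return (x, y, w, h) boxes for the face areas in the image to be detected."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Each box is expressed in the coordinates of the image to be detected,
    # e.g. a 100x100 region whose top-left corner is at (100, 100).
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```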
In a possible implementation manner, the position of the target region in the image to be detected may also be determined by using a manual labeling method or the like, and the specific manner of determining the position of the target region is not limited in the present disclosure.
In a possible implementation manner, in step S11, a first image including the target region and a position relationship between each pixel point of the image to be detected and each pixel point of the first image may be acquired.
In a possible implementation manner, obtaining a first image with a preset size and a position relationship between each pixel point of the image to be detected and each pixel point of the first image according to a position of a target region in the image to be detected may include: intercepting the target area in the image to be detected according to the position of the target area in the image to be detected to obtain a second image; and carrying out scaling processing on the second image to obtain the first image with the preset size.
In a possible implementation manner, the target area may be intercepted according to its position to obtain the second image. The second image is a portion of the image to be detected. In an example, the image to be detected includes a plurality of target areas whose sizes may differ, so the sizes of the plurality of second images obtained by interception may also differ. For example, the image to be detected may include two target areas: a first target area that is a rectangular area with vertex coordinates (100, 100), (200, 100), (100, 200), and (200, 200) and a size of 100 × 100, and a second target area that is a rectangular area with vertex coordinates (300, 300), (300, 350), (350, 300), and (350, 350) and a size of 50 × 50.
In an example, the target area is a face area, and when determining the target area, the proportion of the face in the target area may be preset, for example, the proportion of the face in the target area is preset to be 80%, so that in the second image, the proportion of the face in the target area is 80%. The area ratio may be determined by the ratio of the number of pixels in the face contour to the total number of pixels in the target region.
FIG. 3 shows a schematic diagram of the interception processing according to an embodiment of the disclosure. As shown in fig. 3, the target area in the image to be detected is a face area, the face outline contains 8000 pixels in total, and the preset face area ratio is 80%; 10000 pixels covering the face and its surroundings can therefore be selected as the target area (the dotted area in the image to be detected). Intercepting this target area yields a second image that contains 10000 pixels, of which 8000 lie within the face outline. If the image to be detected includes a plurality of faces, a target area containing each face can be intercepted in the same way to obtain a plurality of second images, and the area ratio of the face in each second image is consistent, e.g., 80%.
In one possible implementation, the second image may be scaled to obtain the first image with the preset size. For example, if the detection network accepts images of size 100 × 100, each second image obtained by interception may be scaled to a 100 × 100 first image, and each first image of the preset size may be used as an input image of the detection network.
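A minimal sketch of the interception and scaling just described, assuming the 100 × 100 preset size from the example (function and variable names are hypothetical):

```python
import cv2

PRESET_SIZE = 100  # assumed input size accepted by the detection network

def crop_and_scale(image, region):
    """Intercept the target area (the second image), then scale it to the
    preset size (the first image)."""
    x, y, w, h = region
    second_image = image[y:y + h, x:x + w]  # second image: the raw crop
    first_image = cv2.resize(second_image, (PRESET_SIZE, PRESET_SIZE))
    return first_image
```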
In a possible implementation manner, a positional relationship between each pixel point of the image to be detected and each pixel point of the first image may be determined. Wherein, according to the position of the target region in the image to be detected, obtain the first image of the preset size and the position relation between each pixel point of the image to be detected and each pixel point of the first image, can include: and determining the position relation between each pixel point of the image to be detected and each pixel point of the first image according to the position of the target area in the image to be detected and the first image.
In an example, the first image is obtained by intercepting and scaling the target area in the image to be detected; for example, the pixel point at coordinates (1, 1) in the first image corresponds to (100, 100) in the image to be detected, the pixel point at (2, 1) corresponds to (102, 100), and so on. In an example, the position relationship includes a position transformation matrix between each pixel point of the image to be detected and each pixel point of the first image, and the position transformation matrix may be determined according to parameters such as the positions of the pixel points of the target area in the image to be detected, the positions of the pixel points in the first image, and the scaling ratio.
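One way to realize such a position transformation matrix is a 3 × 3 homogeneous matrix built from the crop origin and the scaling ratios; the patent does not fix a parameterization, so this is a sketch under that assumption:

```python
import numpy as np

def position_transform_matrix(region, preset_size=100):
    """Matrix M such that M @ [x, y, 1] maps a pixel (x, y) of the image to
    be detected to the corresponding (0-indexed) pixel of the first image."""
    x0, y0, w, h = region
    sx, sy = preset_size / w, preset_size / h  # scaling ratios of the crop
    return np.array([[sx, 0.0, -x0 * sx],
                     [0.0, sy, -y0 * sy],
                     [0.0, 0.0, 1.0]])
```

The inverse of this matrix then maps first-image coordinates back into the image to be detected, which is exactly the conversion used in step S13 below.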
In this way, the first image including the target area can be obtained, and the detection accuracy can be improved in the detection process of the first image.
In one possible implementation manner, in step S12, the first image may be input to the detection network for processing, and the object detection result is obtained. If the target area has the target object therein, the object detection result may indicate a first position of a key point of the target object in the first image; if the target object does not exist in the target area, the object detection result indicates that the target object does not exist.
In an example, the target area is a human face area and the target object is glasses. The detection network may process the first image to obtain an object detection result. If the face in the target area wears glasses, the detection network may output the first positions, in the first image, of the key points of the target object (i.e., the glasses), including the frame midpoint, the spectacle leg inflection points, the intersections of the frame and the lenses, and key points of the lens edges. If the face does not wear glasses, the object detection result indicates that no glasses are present.
In one possible implementation, the detection network may be a neural network such as a BP neural network, a recurrent neural network, or a convolutional neural network; the disclosure does not limit the type of the detection network. In an example, the detection network is a deep learning neural network, e.g., one with a multi-level structure (i.e., with a plurality of hidden layers), and the neurons of its input layer, hidden layers, and output layer may be connected in any suitable manner, e.g., fully connected or not fully connected. The input layer of the detection network receives the first image, and after processing by the hidden layers, the output layer outputs the object detection result.
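For concreteness, a toy convolutional detection network in this spirit is sketched below in PyTorch; the architecture, the layer sizes, and the 21-key-point count are illustrative assumptions, not the patent's design (see the key-point layout sketched after the description of fig. 4 below):

```python
import torch.nn as nn

NUM_KEYPOINTS = 21  # assumed count; see the key-point layout below

class GlassesKeypointNet(nn.Module):
    """Toy detection network: one head predicts whether the target object
    (glasses) is present, the other regresses the first positions (u, v),
    in first-image pixels, of its key points."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.presence = nn.Linear(64, 1)                   # glasses / no glasses
        self.keypoints = nn.Linear(64, 2 * NUM_KEYPOINTS)  # (u, v) per key point

    def forward(self, first_image):
        feat = self.features(first_image)
        return self.presence(feat), self.keypoints(feat).view(-1, NUM_KEYPOINTS, 2)
```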
Fig. 4 shows a schematic diagram of the key points of glasses according to an embodiment of the present disclosure. As shown in fig. 4, the key points of the glasses may include: the frame midpoint, the intersection of the frame and the left lens, the intersection of the frame and the right lens, the left spectacle leg inflection point, the right spectacle leg inflection point, and 8 points on the edge of each lens (e.g., the upper, lower, left, and right extremes of the lens edge and the points between them), and the key points of the glasses may be symmetrical about the frame midpoint.
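Under that reading of fig. 4 (1 frame midpoint, 2 frame-lens intersections, 2 spectacle-leg inflection points, and 8 edge points per lens), a hypothetical index layout for the 21 key points would be:

```python
# Hypothetical naming of the 21 key points read off fig. 4.
KEYPOINT_NAMES = (
    ["frame_midpoint",
     "frame_left_lens_intersection", "frame_right_lens_intersection",
     "left_leg_inflection", "right_leg_inflection"]
    + [f"left_lens_edge_{i}" for i in range(8)]
    + [f"right_lens_edge_{i}" for i in range(8)]
)
assert len(KEYPOINT_NAMES) == 21
```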
In this way, a plurality of key points of the glasses can be determined, the position and the shape of the glasses can be accurately determined through less consumption of computing resources, and the processing efficiency is improved.
In a possible implementation manner, in step S13, the second position of the key point of the target object in the image to be detected may be determined according to the first position of the key point in the first image and the position relationship between each pixel point of the image to be detected and each pixel point of the first image. In an example, the second position can be determined through the correspondence between the pixel points in the target area of the image to be detected and the pixel points in the first image; for example, the pixel point at coordinates (1, 1) in the first image corresponds to (100, 100) in the image to be detected, the pixel point at (2, 1) corresponds to (102, 100), and so on, and the position coordinates of each key point in the image to be detected (i.e., the second position) can be determined according to this correspondence.
In a possible implementation manner, the position relationship includes a position transformation matrix between each pixel point of the image to be detected and each pixel point of the first image, and step S13 includes: determining the second position according to the first position and the position transformation matrix. In an example, the position transformation matrix represents the position relationship between the pixel points of the image to be detected and those of the first image: multiplying the coordinates of a pixel point in the image to be detected by the position transformation matrix yields the coordinates of the corresponding pixel point in the first image; conversely, multiplying the coordinates of a pixel point in the first image by the inverse of the position transformation matrix yields the coordinates of the corresponding pixel point in the image to be detected. Therefore, the coordinates of the key points can be multiplied by the inverse of the position transformation matrix to obtain the second positions (i.e., the coordinates) of the key points in the image to be detected.
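With the homogeneous matrix sketched after step S11, this matrix multiplication takes a few lines (names carried over from the earlier sketches):

```python
import numpy as np

def first_to_detected(first_positions, M):
    """Map an (N, 2) array of first positions to second positions in the
    image to be detected via the inverse of the transformation matrix."""
    inv = np.linalg.inv(M)
    homogeneous = np.column_stack([first_positions,
                                   np.ones(len(first_positions))])
    mapped = homogeneous @ inv.T  # one inv @ [u, v, 1] product per row
    return mapped[:, :2]          # drop the homogeneous coordinate
```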
In one possible implementation, the detection network may be trained prior to performing detection processing on the first image using the detection network.
Fig. 5 shows a flow chart of a keypoint detection method according to an embodiment of the present disclosure. As shown in fig. 5, the method further comprises:
in step S15, the detection network is trained by a dataset comprising a plurality of sample images, wherein the sample images comprise one or more sample target areas and sample target objects in the sample target areas have keypoint labeling information.
In one possible implementation, the detection network may be trained using a data set composed of a plurality of sample images, one or more sample target regions may be included in the sample images, sample target objects may be located in the sample target regions, and key points of the sample target objects may be manually labeled.
In an example, the sample image may include one or more face areas (the sample target areas); among these, some faces may wear glasses (i.e., the target object) and some may not. If a face wears glasses, the key points of the glasses are labeled, e.g., the frame midpoint, the spectacle leg inflection points, the intersections of the frame and the lenses, and the key points of the lens edges. If part of the glasses worn by a face is not captured within the sample image, either only the key points inside the sample image are labeled, or the coordinates of the key points of the missing part are determined from the symmetry of the glasses. For example, in a sample image with a resolution of 1024 × 768, part of the glasses worn by a face near the image boundary may lie outside the image, such as the intersection of the right lens and the right spectacle leg; from symmetry, that intersection may be determined to be (1050, 500), a point outside the sample image. In an example, if the portion of the glasses not captured within the sample image is larger than half a lens, the face may be labeled as not wearing glasses. A face without glasses is likewise labeled as not wearing glasses.
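Because the key points are described as symmetrical about the frame midpoint, completing an off-image label from its visible counterpart reduces to a point reflection; a sketch with assumed names:

```python
def mirror_about_midpoint(visible_point, frame_midpoint):
    """Reflect a visible key point about the frame midpoint to recover its
    off-image counterpart, e.g. the (1050, 500) point in the example above."""
    vx, vy = visible_point
    mx, my = frame_midpoint
    return (2 * mx - vx, 2 * my - vy)
```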
In one possible implementation, step S15 includes: obtaining a third image with a preset size according to the position of a sample target area in the sample image; preprocessing the third image to obtain a fourth image; and training the detection network through at least one of the fourth image and the third image to obtain the trained detection network.
In a possible implementation manner, the sample target area in the sample image may be intercepted and scaled to obtain a third image of the preset size. For example, the face area may be intercepted with the proportion of the face in the sample target area preset, and the intercepted image may then be scaled to the preset size to obtain the third image.
In a possible implementation manner, the third image may be preprocessed, e.g., by at least one of translation, rotation, and changing RGB or gray values, to obtain one or more fourth images. Training the detection network with fourth images obtained through such preprocessing can improve the recognition capability of the detection network under such perturbations and improve its robustness.
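One illustrative preprocessing recipe is sketched below; the patent names only the operation types, so the parameter ranges are assumptions, and the geometric transforms would also have to be applied to the key-point labels:

```python
import cv2
import numpy as np

def preprocess(third_image):
    """Derive fourth images by translation, rotation and gray-value change."""
    h, w = third_image.shape[:2]
    fourth_images = []
    # small random translation
    tx, ty = np.random.randint(-5, 6, size=2)
    shift = np.float32([[1, 0, tx], [0, 1, ty]])
    fourth_images.append(cv2.warpAffine(third_image, shift, (w, h)))
    # small random rotation about the image centre
    angle = float(np.random.uniform(-10, 10))
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    fourth_images.append(cv2.warpAffine(third_image, rot, (w, h)))
    # brightness / gray-value perturbation
    fourth_images.append(cv2.convertScaleAbs(third_image, alpha=1.0, beta=20))
    return fourth_images
```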
In one possible implementation, the detection network may be trained using at least one of the third image and the fourth image. Either may be input into the detection network to obtain a sample object detection result: if the detection network determines that the target object exists in the input image, the sample object detection result is the first positions of the key points of the target object; if it determines that the target object does not exist, the sample object detection result indicates that the target object does not exist. The sample object detection result is the output of the detection network and may contain errors, e.g., errors in the positions of the identified key points, or misrecognition of the target object.
In a possible implementation manner, the network loss of the detection network may be determined according to the labeling information and the output result. In an example, the labeling information and the output result can be compared, and the difference obtained by the comparison taken as the network loss of the detection network. As another example, a cross-entropy loss function of the detection network may be determined according to the labeling information and the output result; the present disclosure does not limit the manner of determining the loss function. In an example, a regularized loss function can be used as the network loss, which helps prevent the network parameters of the detection network from overfitting during iterative training.
In one possible implementation, the network parameters of the detection network may be adjusted according to the network loss. In an example, the parameters may be adjusted in the direction that minimizes the network loss, so that the adjusted detection network fits well while avoiding overfitting. In an example, gradient descent may be used to back-propagate the network loss and adjust the network parameters; for example, stochastic gradient descent may be used, which reduces the complexity of the adjustment process, improves its efficiency, and helps avoid overfitting of the adjusted parameters.
In a possible implementation manner, training may be stopped when the detection network satisfies a training condition, yielding the trained detection network. The training condition may involve the number of adjustments, the magnitude of the network loss, or its convergence. For example, a preset number of images may be input and the network parameters adjusted a preset number of times, with the condition satisfied once that count is reached. Alternatively, the number of adjustments is not limited, and the adjustment stops when the network loss decreases to a certain extent or converges within a certain threshold, yielding the adjusted detection network. The trained detection network may then be used to determine the object detection result for the first image.
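Putting the loss terms, back-propagation, and stopping rule together, a training loop consistent with this description could look like the following; the loss choices, weight decay (as the regularization term), and fixed epoch count are assumptions:

```python
import torch
import torch.nn as nn

def train(net, loader, epochs=10, lr=1e-3, weight_decay=1e-4):
    """Train the toy detection network with SGD; weight decay plays the role
    of the regularized loss, and the preset epoch count is the stop rule."""
    opt = torch.optim.SGD(net.parameters(), lr=lr, weight_decay=weight_decay)
    presence_loss = nn.BCEWithLogitsLoss()
    keypoint_loss = nn.MSELoss()
    for _ in range(epochs):
        for images, has_glasses, keypoints in loader:
            logits, preds = net(images)
            loss = presence_loss(logits.squeeze(1), has_glasses.float())
            mask = has_glasses.bool()
            if mask.any():  # key-point loss only where glasses are labelled
                loss = loss + keypoint_loss(preds[mask], keypoints[mask])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return net
```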
Fig. 6 shows an application diagram of the keypoint detection method according to an embodiment of the present disclosure. As shown in fig. 6, the target areas in the image to be detected are two face areas: the face on the upper left does not wear glasses, and the face on the lower right wears glasses.
In a possible implementation manner, the two target areas can be intercepted and the intercepted second images scaled to obtain two first images, together with the position transformation matrix between the pixel points of the image to be detected and the pixel points of each first image.
In a possible implementation manner, the two first images may be input into the detection network respectively to obtain the object detection results: the face in the left first image does not wear glasses, and in the right first image the first positions of the key points of the glasses are (A, B), (C, D), etc.
In one possible implementation, each first position (i.e., the coordinates) of a glasses key point in the first image may be multiplied by the inverse of the position transformation matrix to obtain the second positions of the glasses key points in the image to be detected, e.g., (Q, W), (E, R), etc.
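Chaining the hypothetical helpers from the sketches above gives an end-to-end run over the fig. 6 scenario:

```python
import cv2
import torch

image = cv2.imread("to_be_detected.jpg")        # assumed input path
net = GlassesKeypointNet().eval()               # trained as sketched above
for region in detect_target_regions(image):
    first_image = crop_and_scale(image, region)
    M = position_transform_matrix(region, PRESET_SIZE)
    inp = torch.from_numpy(first_image).permute(2, 0, 1).float().unsqueeze(0) / 255
    with torch.no_grad():
        logit, keypoints = net(inp)
    if torch.sigmoid(logit).item() > 0.5:       # glasses present in this face
        first_positions = keypoints[0].numpy()  # first positions in the first image
        second_positions = first_to_detected(first_positions, M)  # second positions
```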
According to the key point detection method of the embodiments of the present disclosure, a first image of the target area of the image to be processed can be obtained, and detection accuracy can be improved when detection is performed on the first image. A detection network trained with preprocessed images has strong recognition capability and robustness. Furthermore, the second position of the target object in the image to be processed can be determined through the position relationship between each pixel point of the image to be detected and each pixel point of the first image; the first position obtained in the first image does not depend on the localization of the target area, so the accuracy of the first position can be improved, and the second position determined from the first position is accordingly more accurate.
Fig. 7 shows a block diagram of a keypoint detection apparatus according to an embodiment of the present disclosure. As shown in fig. 7, the apparatus includes:
the acquiring module 11 is configured to acquire a first image with a preset size and a position relationship between each pixel point of the image to be detected and each pixel point of the first image according to a position of a target region in the image to be detected, where the first image includes the target region;
a processing module 12, configured to input the first image into a detection network for processing, and obtain an object detection result, where the object detection result indicates a first position of a key point of a target object in a first image when the target area includes the target object;
and the determining module 13 is configured to determine, according to the first position of the key point of the target object in the first image and the position relationship, a second position of the key point of the target object in the image to be detected.
In one possible implementation, the obtaining module 11 is further configured to:
intercepting the target area in the image to be detected according to the position of the target area in the image to be detected to obtain a second image;
and carrying out scaling processing on the second image to obtain the first image with the preset size.
In one possible implementation, the obtaining module 11 is further configured to:
determining the position relationship between each pixel point of the image to be detected and each pixel point of the first image according to the position of the target area in the image to be detected and the first image.
In a possible implementation manner, the position relationship includes a position transformation matrix between each pixel point of the image to be detected and each pixel point of the first image,
wherein the determining module 13 is further configured to:
determining the second position according to the first position and the position transformation matrix.
In one possible implementation, in a case that the target object does not exist in the target area, the object detection result indicates that the target object does not exist.
Fig. 8 shows a block diagram of a keypoint detection apparatus according to an embodiment of the present disclosure. As shown in fig. 8, the apparatus further includes:
the detection module 14 is configured to perform target area detection processing on an image to be detected, so as to obtain a position of the target area.
In one possible implementation, the apparatus further includes:
a training module 15, configured to train the detection network through a data set including a plurality of sample images, where the sample images include one or more sample target areas, and sample target objects in the sample target areas have keypoint annotation information.
In one possible implementation, the training module 15 is further configured to:
obtaining a third image with a preset size according to the position of a sample target area in the sample image;
preprocessing the third image to obtain a fourth image;
and training the detection network through at least one of the fourth image and the third image to obtain the trained detection network.
In one possible implementation, the target area is a human face area, and the target object is glasses.
In one possible implementation, the key points of the target object include a spectacle frame midpoint, a spectacle leg inflection point, an intersection of a spectacle frame and a lens, and a key point of a lens edge.
It is to be understood that the above method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from the underlying principles and logic; due to space limitations, the details are not repeated in this disclosure.
In addition, the present disclosure also provides a key point detection apparatus, an electronic device, a computer-readable storage medium, and a program, each of which can be used to implement any of the key point detection methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the methods section, and details are not repeated.
It will be understood by those skilled in the art that, in the above methods, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the method embodiments above; for specific implementations, reference may be made to the descriptions of the method embodiments, and for brevity, details are not repeated here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 9 is a block diagram illustrating an electronic device 800 according to an example embodiment. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to fig. 9, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect the open/closed state of the electronic device 800 and the relative positioning of components such as its display and keypad; the sensor assembly 814 may also detect a change in the position of the electronic device 800 or one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in its temperature. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 10 is a block diagram illustrating an electronic device 1900 according to an example embodiment. For example, the electronic device 1900 may be provided as a server. Referring to fig. 10, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A key point detection method, the method comprising:
acquiring, according to a position of a target area in an image to be detected, a first image with a preset size and a position relationship between each pixel point of the image to be detected and each pixel point of the first image, wherein the first image comprises the target area;
inputting the first image into a detection network for processing to obtain an object detection result, wherein, in a case that the target area comprises a target object, the object detection result indicates a first position of a key point of the target object in the first image; and
determining a second position of the key point of the target object in the image to be detected according to the first position of the key point of the target object in the first image and the position relationship.
2. The method of claim 1, wherein acquiring a first image with a preset size and a position relationship between each pixel point of the image to be detected and each pixel point of the first image according to the position of the target area in the image to be detected comprises:
cropping the target area from the image to be detected according to the position of the target area in the image to be detected, to obtain a second image; and
scaling the second image to obtain the first image with the preset size.
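By way of illustration only, the cropping and scaling of claim 2 can be sketched in a few lines of Python with OpenCV; the (x, y, w, h) box layout and the 256x256 preset size are assumptions of this sketch, not part of the claim:

    import cv2

    def make_first_image(image, box, preset_size=(256, 256)):
        # box = (x, y, w, h): the position of the target area in the
        # image to be detected (layout assumed for this sketch).
        x, y, w, h = box
        # Crop the target area to obtain the second image.
        second_image = image[y:y + h, x:x + w]
        # Scale the second image to the preset size to obtain the first image.
        first_image = cv2.resize(second_image, preset_size)
        return first_image, second_image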
3. The method according to claim 1 or 2, wherein acquiring a first image with a preset size and a position relationship between each pixel point of the image to be detected and each pixel point of the first image according to the position of the target area in the image to be detected comprises:
determining the position relationship between each pixel point of the image to be detected and each pixel point of the first image according to the position of the target area in the image to be detected and the first image.
4. The method according to any one of claims 1-3, wherein the position relationship comprises a position transformation matrix between each pixel point of the image to be detected and each pixel point of the first image, and
wherein determining the second position of the key point of the target object in the image to be detected according to the first position of the key point of the target object in the first image and the position relationship comprises:
determining the second position according to the first position and the position transformation matrix.
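Under the same assumptions as the sketch above (an axis-aligned crop followed by uniform scaling), the position transformation matrix of claim 4 is a plain affine matrix, and the second position follows from its inverse. The 3x3 homogeneous form below is an illustrative choice, not something the claim mandates:

    import numpy as np

    def second_position(first_pos, box, preset_size=(256, 256)):
        # T maps a pixel of the image to be detected to the corresponding
        # pixel of the first image: translate by the crop origin, then scale.
        x, y, w, h = box
        sx, sy = preset_size[0] / w, preset_size[1] / h
        T = np.array([[sx, 0.0, -x * sx],
                      [0.0, sy, -y * sy],
                      [0.0, 0.0, 1.0]])
        u, v = first_pos
        # The second position is the inverse matrix applied to the
        # first position of the key point.
        px, py, _ = np.linalg.inv(T) @ np.array([u, v, 1.0])
        return px, py

For example, with a box (100, 50, 128, 128) scaled to 256x256, the first-image point (40, 60) maps back to (100 + 40/2, 50 + 60/2) = (120, 80) in the image to be detected.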
5. The method according to any one of claims 1-4, wherein, in a case that no target object is present in the target area, the object detection result indicates that no target object is present.
6. The method according to any one of claims 1-5, further comprising:
performing target area detection processing on the image to be detected to obtain the position of the target area.
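Claim 6 does not prescribe a particular detector. As a minimal, hedged sketch, any off-the-shelf detector returning a bounding box would serve; here OpenCV's Haar cascade face detector stands in, on the assumption (purely for this example) that the target object is a face:

    import cv2

    def target_area_position(image):
        # Illustrative stand-in only; the claim covers any target area
        # detection processing that yields the position of the target area.
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        return boxes  # each row is (x, y, w, h): a target area position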
7. The method according to any one of claims 1-6, further comprising:
training the detection network through a data set comprising a plurality of sample images, wherein the sample images include one or more sample target areas, and sample target objects in the sample target areas have key point labeling information.
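A minimal training sketch for claim 7, assuming a PyTorch network that regresses key point coordinates and a data loader yielding first images cropped from the sample target areas together with their key point labels; the Smooth L1 loss and Adam optimizer are assumptions of this sketch, not choices stated in the claim:

    import torch
    import torch.nn as nn

    def train(net, loader, epochs=10, lr=1e-4):
        # loader yields (images, keypoints): sample first images and the
        # key point labeling information of the sample target objects.
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        loss_fn = nn.SmoothL1Loss()  # assumed loss; the claim fixes none
        for _ in range(epochs):
            for images, keypoints in loader:
                pred = net(images)              # predicted first positions
                loss = loss_fn(pred, keypoints)
                opt.zero_grad()
                loss.backward()
                opt.step()
        return net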
8. A key point detection device, the device comprising:
an acquisition module configured to acquire, according to a position of a target area in an image to be detected, a first image with a preset size and a position relationship between each pixel point of the image to be detected and each pixel point of the first image, wherein the first image comprises the target area;
a processing module configured to input the first image into a detection network for processing to obtain an object detection result, wherein, in a case that the target area comprises a target object, the object detection result indicates a first position of a key point of the target object in the first image; and
a determining module configured to determine a second position of the key point of the target object in the image to be detected according to the first position of the key point of the target object in the first image and the position relationship.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of claims 1 to 7.
10. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 7.
CN201811446154.8A 2018-11-29 2018-11-29 Key point detection method and device, electronic equipment and storage medium Pending CN111243011A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811446154.8A CN111243011A (en) 2018-11-29 2018-11-29 Key point detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811446154.8A CN111243011A (en) 2018-11-29 2018-11-29 Key point detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111243011A true CN111243011A (en) 2020-06-05

Family

ID=70872227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811446154.8A Pending CN111243011A (en) 2018-11-29 2018-11-29 Key point detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111243011A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845377A (en) * 2017-01-10 2017-06-13 北京小米移动软件有限公司 Face key independent positioning method and device
CN108229279A (en) * 2017-04-14 2018-06-29 深圳市商汤科技有限公司 Face image processing process, device and electronic equipment
CN107622252A (en) * 2017-09-29 2018-01-23 百度在线网络技术(北京)有限公司 information generating method and device
CN107944367A (en) * 2017-11-16 2018-04-20 北京小米移动软件有限公司 Face critical point detection method and device
CN107766851A (en) * 2017-12-06 2018-03-06 北京搜狐新媒体信息技术有限公司 A kind of face key independent positioning method and positioner
CN107992835A (en) * 2017-12-11 2018-05-04 浙江大学 A kind of glasses image-recognizing method
CN108121952A (en) * 2017-12-12 2018-06-05 北京小米移动软件有限公司 Face key independent positioning method, device, equipment and storage medium
CN108062526A (en) * 2017-12-15 2018-05-22 厦门美图之家科技有限公司 A kind of estimation method of human posture and mobile terminal

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768485A (en) * 2020-06-28 2020-10-13 北京百度网讯科技有限公司 Three-dimensional image key point marking method and device, electronic equipment and storage medium
CN111768485B (en) * 2020-06-28 2024-01-12 北京百度网讯科技有限公司 Method and device for marking key points of three-dimensional image, electronic equipment and storage medium
WO2022028247A1 (en) * 2020-08-06 2022-02-10 杭州睿琪软件有限公司 Object edge identification and processing method and system, and computer-readable storage medium
TWI761948B (en) * 2020-09-14 2022-04-21 倍利科技股份有限公司 A positioning method for obtaining contours from detected images
CN112102300A (en) * 2020-09-18 2020-12-18 青岛商汤科技有限公司 Counting method and device, electronic equipment and storage medium
CN112132019A (en) * 2020-09-22 2020-12-25 深兰科技(上海)有限公司 Object vertical judgment method and device
CN112906611A (en) * 2021-03-05 2021-06-04 新疆爱华盈通信息技术有限公司 Well lid detection method and device, electronic equipment and storage medium
CN112906611B (en) * 2021-03-05 2024-04-26 新疆爱华盈通信息技术有限公司 Well lid detection method and device, electronic equipment and storage medium
CN112862916A (en) * 2021-03-11 2021-05-28 首都医科大学附属北京天坛医院 CT perfusion function map quantitative parameter processing equipment and method
CN112862916B (en) * 2021-03-11 2021-09-10 首都医科大学附属北京天坛医院 CT perfusion function map quantitative parameter processing equipment and method
WO2022193466A1 (en) * 2021-03-19 2022-09-22 北京市商汤科技开发有限公司 Image processing method and apparatus, and electronic device and storage medium
CN113709545A (en) * 2021-04-13 2021-11-26 腾讯科技(深圳)有限公司 Video processing method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110688951B (en) Image processing method and device, electronic equipment and storage medium
CN110674719B (en) Target object matching method and device, electronic equipment and storage medium
CN109784255B (en) Neural network training method and device and recognition method and device
CN111243011A (en) Key point detection method and device, electronic equipment and storage medium
CN111310616B (en) Image processing method and device, electronic equipment and storage medium
CN109816764B (en) Image generation method and device, electronic equipment and storage medium
CN107692997B (en) Heart rate detection method and device
CN111553864B (en) Image restoration method and device, electronic equipment and storage medium
CN109658401B (en) Image processing method and device, electronic equipment and storage medium
US10007841B2 (en) Human face recognition method, apparatus and terminal
CN109344832B (en) Image processing method and device, electronic equipment and storage medium
CN109934275B (en) Image processing method and device, electronic equipment and storage medium
CN111241887B (en) Target object key point identification method and device, electronic equipment and storage medium
CN110633755A (en) Network training method, image processing method and device and electronic equipment
CN110458218B (en) Image classification method and device and classification network training method and device
CN107944367B (en) Face key point detection method and device
CN109325908B (en) Image processing method and device, electronic equipment and storage medium
CN109472738B (en) Image illumination correction method and device, electronic equipment and storage medium
CN110933488A (en) Video editing method and device
CN111091610B (en) Image processing method and device, electronic equipment and storage medium
CN112184787A (en) Image registration method and device, electronic equipment and storage medium
CN111242303A (en) Network training method and device, and image processing method and device
CN109903252B (en) Image processing method and device, electronic equipment and storage medium
CN111652107A (en) Object counting method and device, electronic equipment and storage medium
CN113538310A (en) Image processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination