CN114332977A - Key point detection method and device, electronic equipment and storage medium

Info

Publication number: CN114332977A
Application number: CN202111199996.XA
Authority: CN
Inventors: 龚琛婷; 谭啸; 孙昊
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-10-14
Filing date: 2021-10-14
Publication date: 2022-04-12

Abstract

The disclosure provides a key point detection method and device, electronic equipment and a storage medium. It relates to the field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, and can be applied in smart city and intelligent traffic scenarios. The specific implementation scheme is as follows: acquiring an image to be detected, wherein the image to be detected comprises at least one target to be detected; acquiring a key point thermodynamic diagram corresponding to the image to be detected; acquiring candidate key points of the target to be detected based on the key point thermodynamic diagram; and performing coordinate regression on the candidate key points to obtain target key points of the target to be detected. By performing coordinate regression on the coordinate information output by the thermodynamic diagram, the accuracy of vehicle key point detection is further improved, so that the output vehicle key point information has good spatial generalization capability and high positioning accuracy.

Description

Key point detection method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, and can be applied in smart city and intelligent traffic scenarios.

Background

In an image recognition scene, the state information of the target object can be further recognized by acquiring the key points of the target object in the image, and information support is provided for various applications. However, in the related art, the detection of the key points is basically performed through a neural network, and the detection precision is not high enough, so that accurate key point information cannot be provided.

Disclosure of Invention

The disclosure provides a key point detection method, a key point detection device, an electronic device, a storage medium and a computer program product.

According to an aspect of the present disclosure, there is provided a keypoint detection method, including:

acquiring an image to be detected, wherein the image to be detected comprises at least one target to be detected;

acquiring a key point thermodynamic diagram corresponding to an image to be detected;

acquiring candidate key points of a target to be detected based on the key point thermodynamic diagram;

and performing coordinate regression on the candidate key points to obtain target key points of the target to be detected.

According to the embodiments of the present disclosure, the accuracy of vehicle key point detection is further improved by performing coordinate regression on the coordinate information output by the thermodynamic diagram, so that the output vehicle key point information has good spatial generalization capability and high positioning accuracy.

According to another aspect of the present disclosure, there is provided a keypoint detection apparatus comprising:

the image acquisition module is used for acquiring an image to be detected, wherein the image to be detected comprises at least one target to be detected;

the thermodynamic diagram module is used for acquiring a key point thermodynamic diagram corresponding to an image to be detected;

the candidate key point module is used for acquiring candidate key points of the target to be detected based on the key point thermodynamic diagram;

and the coordinate regression module is used for carrying out coordinate regression on the candidate key points to obtain target key points of the target to be detected.

According to another aspect of the present disclosure, there is provided an electronic device comprising at least one processor, and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of keypoint detection of an embodiment of the first aspect of the disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the keypoint detection method of the first aspect of the present disclosure.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the keypoint detection method of an embodiment of the first aspect of the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a flow diagram of a keypoint detection method according to one embodiment of the present disclosure;

FIG. 2 is a flow diagram of a keypoint detection method according to one embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a keypoint detection model;

FIG. 4 is a flow diagram of a training method for a keypoint detection model according to one embodiment of the present disclosure;

FIG. 5 is a block diagram of a keypoint detection apparatus according to one embodiment of the present disclosure;

FIG. 6 is a block diagram of an electronic device for implementing the keypoint detection method of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In order to facilitate understanding of the present disclosure, the technical field to which the present disclosure relates will first be briefly explained below.

Artificial intelligence is the discipline that studies how to make computers simulate certain human mental processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning, deep learning, big data processing, knowledge graph technology, and the like.

Computer vision is the science of making machines "see": cameras and computers are used in place of human eyes to identify, track and measure targets, and the resulting images are further processed so that they are better suited for human observation or for transmission to instruments for inspection. As a scientific discipline, computer vision studies the theories and techniques for building artificial intelligence systems that can acquire "information" from images or multidimensional data. The information referred to here is information in Shannon's sense, which can be used to help make a "decision". Because perception can be viewed as extracting information from sensory signals, computer vision can also be viewed as the science of how to make an artificial system "perceive" from images or multidimensional data.

Deep learning is a new research direction in the field of machine learning, introduced to bring machine learning closer to its original goal, artificial intelligence. Deep learning learns the intrinsic patterns and hierarchical representations of sample data, and the information obtained during learning greatly helps in interpreting data such as text, images and sounds. Its ultimate aim is to give machines a human-like ability to analyze and learn, and to recognize data such as text, images and sounds. Deep learning is a complex machine learning approach whose results in speech and image recognition far exceed those of earlier related techniques.

The smart city is a new theory and a new mode for promoting the intellectualization of city planning, construction, management and service by applying new generation information integration technologies such as internet of things, cloud computing, big data, space geographic information integration and the like.

The intelligent transportation is a comprehensive transportation management technology which is established by effectively integrating and applying advanced information technology, data communication transmission technology, electronic sensing technology, control technology, computer technology and the like to the whole ground transportation management system, plays a role in a large range in all directions, and is real-time, accurate and efficient.

Fig. 1 is a flowchart of a keypoint detection method according to one embodiment of the present disclosure. As shown in fig. 1, the method comprises the following steps:

S101, obtaining an image to be detected, wherein the image to be detected comprises at least one target to be detected.

In the embodiment of the disclosure, the image to be detected may be a pre-acquired image or a real-time acquired image, and the image includes at least one object to be detected. Alternatively, the image to be detected may be a color image or a grayscale image.

Optionally, the target to be detected may be a vehicle, a gesture, a human face, or another object of interest.

In some implementations, vehicle images are collected by a camera on a road, and key point detection is performed on the vehicles on the road to determine whether the vehicles commit traffic violations and to collect information on the violating vehicles.

In other implementations, a face image is collected by a camera on a mobile phone, and key point detection is performed on the face to determine whether the face in the image belongs to the user himself or herself.

In other implementations, gesture images are collected based on a camera on the mobile phone, and key point detection is performed on the gestures to identify operation information represented by the gestures, so that the mobile phone is operated.

S102, acquiring a key point thermodynamic diagram corresponding to the image to be detected.

After the image to be detected is obtained, feature extraction can be performed on it to obtain a feature map of the image to be detected. Key point detection can then be performed on the basis of the feature map, and the key point thermodynamic diagram corresponding to the image to be detected is obtained from the detected key points.

Optionally, the feature map mainly includes color features, texture features, shape features and spatial relationship features of the image. Optionally, feature extraction is performed on the image to be detected by a feature extraction network to generate the feature map of the image to be detected. For example, EfficientNet-B1 or SENet may be used as the feature extraction network, which is not limited in this disclosure.

The key point thermodynamic diagram, that is, a heatmap, reflects the positions of the feature points that best represent the target to be detected. For example, the feature points of a vehicle are the ground contact points of its four tires and the center point of the vehicle, and the feature points of a human face are the position points of the facial features. In the thermodynamic diagram, positions closer to a feature point are shown in a more intense, brighter color.

S103, acquiring candidate key points of the target to be detected based on the key point thermodynamic diagram.

Each pixel point on the key point thermodynamic diagram has a heat value; some pixel points have larger heat values and some have smaller ones. To screen candidate key points out of the many key points and obtain more accurate key point coordinates, non-maximum suppression is performed on the key points: local maximum heat values are identified from the key point thermodynamic diagram, and the key point corresponding to each local maximum heat value is determined as a candidate key point.

The size of the local neighborhood can be determined and adjusted according to the coordinate accuracy of the finally obtained key points.
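The screening step above can be illustrated with a minimal sketch. It assumes the key point thermodynamic diagram (heatmap) is a 2-D NumPy array of heat values; the function name `find_candidate_keypoints`, the square `window` neighborhood and the `threshold` cut-off are hypothetical choices for illustration and are not specified by the disclosure.

```python
import numpy as np

def find_candidate_keypoints(heatmap, window=3, threshold=0.1):
    """Return (row, col, heat) for pixels that are the maximum of their local window.

    `window` is the side length of the (assumed square) local neighborhood and
    `threshold` discards weak responses; both are illustrative parameters.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape
    pad = window // 2
    # Pad with -inf so that border pixels are only compared with real neighbors.
    padded = np.pad(heatmap, pad, mode="constant", constant_values=-np.inf)
    # Element-wise maximum over all shifted views is a window x window max filter.
    local_max = np.full((h, w), -np.inf)
    for dy in range(window):
        for dx in range(window):
            local_max = np.maximum(local_max, padded[dy:dy + h, dx:dx + w])
    # Non-maximum suppression: keep a pixel only if it equals its local maximum.
    keep = (heatmap == local_max) & (heatmap >= threshold)
    rows, cols = np.nonzero(keep)
    return [(int(r), int(c), float(heatmap[r, c])) for r, c in zip(rows, cols)]
```

A larger `window` suppresses more neighboring responses, which corresponds to adjusting the size of the local range; pixels that tie for the maximum within a window are all kept in this sketch.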

S104, performing coordinate regression on the candidate key points to obtain target key points of the target to be detected.

In order to improve the identification accuracy of the key points, after the candidate key points are obtained, coordinate regression needs to be performed on the candidate key points. The offset of the candidate key point can be obtained through coordinate regression, the offset can reflect the error between the candidate key point and the corresponding real key point, and the target key point of the target to be detected can be obtained based on the offset.

In the embodiment of the disclosure, an image to be detected is obtained, wherein the image to be detected includes at least one target to be detected; a key point thermodynamic diagram corresponding to the image to be detected is obtained; candidate key points of the target to be detected are obtained based on the key point thermodynamic diagram; and coordinate regression is performed on the candidate key points to obtain target key points of the target to be detected. By performing coordinate regression on the coordinate information output by the thermodynamic diagram, the accuracy of vehicle key point detection is further improved, so that the output vehicle key point information has good spatial generalization capability and high positioning accuracy.

Fig. 2 is a flowchart of a keypoint detection method according to an embodiment of the present disclosure, and on the basis of the above embodiment, with further reference to fig. 2, a process of performing coordinate regression on candidate keypoints to obtain target keypoints of an object to be detected is explained, including the following steps:

S201, performing coordinate regression on the candidate key points, and determining the offset of the candidate key points.

The candidate key points are input into a coordinate regression network. The coordinate regression network comprises a fully connected layer, each node of which is connected to all nodes of the previous layer and serves to integrate the extracted features. The coordinate regression network performs the fully connected operation and outputs the offsets corresponding to the candidate key points.

S202, adding the candidate key points and the offset to obtain target key points.

The offset is an error between the candidate key point coordinates and the real coordinates, and the coordinates of the candidate key points and the offset are added, so that the coordinates of the candidate key points can be corrected, and more accurate target key points can be obtained.
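Steps S201 and S202 can be sketched as follows. This is only an illustration under stated assumptions: the coordinate regression network is reduced to a single fully connected layer acting on the flattened candidate coordinates, and the names `regress_offsets`, `refine_keypoints`, `weight` and `bias` are hypothetical. In practice the weights would be learned during training.

```python
import numpy as np

def regress_offsets(candidates, weight, bias):
    """One fully connected layer: flattened candidate coordinates -> offsets.

    candidates: array of shape (K, 2) holding (x, y) per candidate key point.
    weight: array of shape (2K, 2K); bias: array of shape (2K,). Both assumed learned.
    """
    x = np.asarray(candidates, dtype=float).reshape(-1)    # shape (2K,)
    return (np.asarray(weight) @ x + np.asarray(bias)).reshape(-1, 2)

def refine_keypoints(candidates, weight, bias):
    """S201 + S202: predict the offsets, then add them to the candidates."""
    candidates = np.asarray(candidates, dtype=float)
    return candidates + regress_offsets(candidates, weight, bias)
```

For instance, with two candidates `[[10, 20], [30, 40]]`, an all-zero `weight` and a `bias` of `[0.25, -0.5, 0.1, 0.0]`, the refined key points are `[[10.25, 19.5], [30.1, 40.0]]`.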

In the embodiment of the disclosure, coordinate regression is performed on the candidate key points, the offset of the candidate key points is determined, and the candidate key points and the offset are added to obtain the target key points. In the embodiment of the disclosure, the offset between the candidate key point and the true value is obtained through coordinate regression, and the precision of key point detection is further improved.

On the basis of the above embodiment, the process of obtaining the key point thermodynamic diagram corresponding to the image to be detected includes: inputting the image to be detected into a thermodynamic diagram generation network, extracting a feature map of the image to be detected by the thermodynamic diagram generation network, and predicting the key point thermodynamic diagram based on the feature map. Optionally, the thermodynamic diagram generation network may be a convolutional neural network that includes a feature extraction layer, and the feature extraction layer, which may include a plurality of convolutional layers, extracts the feature map of the image to be detected. After the feature map is obtained, convolution processing continues in order to generate the key point thermodynamic diagram of the image to be detected.

On the basis of the above embodiment, the key point detection model may include a thermodynamic diagram generation network and a coordinate regression network, as shown in fig. 3. The thermodynamic diagram generation network is a convolutional neural network and may comprise a feature extraction layer and an activation layer, and the coordinate regression network is a fully connected layer.

The thermodynamic diagram generation network identifies the targets to be detected and their positions in the image according to the shape features and the spatial relationship features in the feature map, and further identifies the detection frames of the targets to be detected.

A detection frame completely encloses a target to be detected. Each target to be detected corresponds to one detection frame, and the size of the detection frame varies with the size of the target, so that the frame does not contain too much content that does not belong to the target. Typically, the detection frame is rectangular or square.

Because the whole method operates on a single target, a detection frame containing only one target to be detected must be obtained before the thermodynamic diagram is generated, and the thermodynamic diagram is then predicted within that detection frame. If the image contains a plurality of targets to be detected, optionally, the target whose key points are desired can be selected from the image; if a plurality of targets are selected, then after the target key points of one target are obtained, the detection frame of another target is obtained and its thermodynamic diagram is generated, so as to produce that target's key points.
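The one-target-at-a-time processing described above can be sketched as a loop over detection frames. The sketch makes several assumptions not stated in the disclosure: frames are given as integer `(x0, y0, x1, y1)` pixel boxes, the single-target detector is passed in as a hypothetical callable `detect_in_crop` that returns `(x, y)` key points in crop coordinates, and the results are shifted back into full-image coordinates.

```python
import numpy as np

def detect_per_target(image, boxes, detect_in_crop):
    """Run a single-target key point detector once per detection frame.

    image: array of shape (H, W) or (H, W, C).
    boxes: iterable of (x0, y0, x1, y1) integer pixel boxes, one per target.
    detect_in_crop: hypothetical callable, crop -> list of (x, y) in crop coordinates.
    Returns one (K, 2) array of key points per box, in full-image coordinates.
    """
    results = []
    for (x0, y0, x1, y1) in boxes:
        crop = image[y0:y1, x0:x1]                      # only one target inside
        keypoints = np.asarray(detect_in_crop(crop), dtype=float).reshape(-1, 2)
        # Shift from crop coordinates back to image coordinates (assumed convention).
        results.append(keypoints + np.array([x0, y0], dtype=float))
    return results
```

Passing the detector as a callable keeps the loop independent of how the thermodynamic diagram and the coordinate regression are implemented.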

Pixel points with larger response values are shown in brighter colors; as the response value decreases, the color shifts from warm to cold along the color spectrum. In regions where the response value is particularly weak, the color block may be omitted.

Optionally, the magnitude of the response value may be determined according to the distance between the pixel point in the response region and the center point of the detection frame, where the pixel point close to the center point of the detection frame has a large response value and the pixel point far from the center point of the detection frame has a small response value.

Optionally, the magnitude of the response value may be determined according to the feature point of the target to be detected in an actual situation, where the response value of the pixel point close to the feature point is large, and the response value of the pixel point far from the feature point is small. The target to be detected has at least one characteristic point, and the characteristic point can be determined according to the type of the target to be detected. For example, the characteristic point of the vehicle may be a ground contact point of four tires of the vehicle and a center point of the vehicle, and the characteristic point of the hand may be a joint point of the hand.
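The relationship between distance and response value can be illustrated with a small sketch. The disclosure only states that pixels near a feature point respond strongly and distant pixels respond weakly; the Gaussian fall-off, the `sigma` parameter and the function name `render_heatmap` below are assumptions chosen for illustration.

```python
import numpy as np

def render_heatmap(height, width, feature_points, sigma=2.0):
    """Build a response map whose value decays with distance to each feature point.

    feature_points: iterable of (x, y) pixel coordinates.
    sigma: spread of the assumed Gaussian fall-off, in pixels.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=float)
    for (px, py) in feature_points:
        dist_sq = (xs - px) ** 2 + (ys - py) ** 2
        response = np.exp(-dist_sq / (2.0 * sigma ** 2))   # 1.0 at the point itself
        # Where several feature points overlap, keep the strongest response.
        heatmap = np.maximum(heatmap, response)
    return heatmap
```

Such a map could also serve as a rendering of labeled key points for visual inspection, although the disclosure does not say how its thermodynamic diagrams are produced beyond the network prediction.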

When the vehicle key point detection is carried out, the image to be detected comprises the vehicle, and the output key points comprise the grounding points of four tires of the vehicle and the central point of the vehicle.

When face key point detection is carried out, the image to be detected comprises a face, and the output key points comprise the position points of the facial features and the contour points of the face.

When the gesture key point detection is carried out, the image to be detected comprises a gesture, and the output key points comprise joint points of a hand and a central point of a palm.

In the embodiment of the disclosure, a key point detection model is constructed, so that the key point detection process is more convenient and streamlined.

FIG. 4 is a flow chart of a method of training a keypoint detection model according to the present disclosure. On the basis of the above embodiment, further referring to fig. 4, the training process of the keypoint detection model is explained, which includes the following steps:

S401, obtaining a sample image and labeled key points of the sample image.

A sample image is acquired, and the real coordinates of the key points of the target to be detected in the sample image are taken as the labeled key points.

S402, inputting the sample image into a thermodynamic diagram generation network, and extracting a sample characteristic diagram of the sample image.

S403, predicting a sample prediction thermodynamic diagram of the sample image based on the sample feature map.

S404, acquiring sample candidate key points of the sample image through the sample prediction thermodynamic diagram.

S405, inputting the candidate key points of the sample into a coordinate regression network for coordinate regression, and obtaining the predicted key points of the sample image.

Specific implementation of steps S402 to S405 may refer to related descriptions in the embodiments of the present disclosure, and details are not described herein.

S406, determining a loss function based on the predicted key points and the labeled key points of the sample image.

After the predicted key point is obtained, the root mean square error between the predicted key point and the corresponding labeled key point coordinate can be obtained and used as a loss function of the key point detection model.

The root mean square error is calculated by first squaring the errors, then averaging them, and then taking the square root; its magnitude indicates the degree of dispersion between the data. For the root mean square error between the predicted key points and the corresponding labeled key point coordinates, a larger error indicates a greater mismatch between the predicted key points and the labeled key points.
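As a minimal sketch of this loss, the root mean square error squares the coordinate differences, averages them, and takes the square root. Averaging over every individual coordinate, rather than per key point, is an assumption here, as is the function name `rmse`.

```python
import numpy as np

def rmse(predicted, labeled):
    """Root mean square error between predicted and labeled key point coordinates.

    Both inputs are arrays of shape (K, 2). The mean is taken over all 2K
    coordinate values, which is an assumed convention.
    """
    predicted = np.asarray(predicted, dtype=float)
    labeled = np.asarray(labeled, dtype=float)
    squared = (predicted - labeled) ** 2      # square
    return float(np.sqrt(np.mean(squared)))   # average, then square root
```

For predictions `[[0, 0], [3, 4]]` against labels `[[0, 0], [0, 0]]`, the squared errors are 0, 0, 9 and 16, their mean is 6.25, and the result is 2.5.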

S407, adjusting the key point detection model based on the loss function, and returning to continue training until the training ending condition is met to finish training so as to generate a final key point detection model.

The parameters of the constructed key point detection model are adjusted and optimized based on the loss function, and training of the adjusted model then continues with the next sample image and its labeled key points until the training end condition is met.

Optionally, when the error becomes stable or converges, the key point detection model may be considered to have reached the set number of training iterations, and the training end condition is satisfied. Optionally, when the training duration reaches a set value, the key point detection model may likewise be considered to have reached the set number of training iterations, and the training end condition is satisfied.
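The two optional end conditions can be sketched as a single check. The disclosure does not define what counts as a stable error, so the `patience` and `tolerance` parameters below, like the function name `should_stop` and the `max_seconds` limit, are hypothetical.

```python
import time

def should_stop(loss_history, start_time, patience=5, tolerance=1e-4,
                max_seconds=3600.0, now=None):
    """Return True when training should end.

    Ends when the elapsed time reaches `max_seconds`, or when the last
    `patience + 1` recorded losses vary by less than `tolerance` (treated
    here as "stable or converged"). `now` can be injected for testing.
    """
    now = time.monotonic() if now is None else now
    if now - start_time >= max_seconds:
        return True
    if len(loss_history) > patience:
        recent = loss_history[-(patience + 1):]
        if max(recent) - min(recent) < tolerance:
            return True
    return False
```

A training loop would append the loss after each sample image and call `should_stop` before fetching the next one.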

The lower the value of the loss function, the better the performance of the constructed key point detection model. Training ends when the output is consistent with the labeled key point values, and the final key point detection model is generated.

In the embodiment of the disclosure, a sample image and labeled key points of the sample image are obtained, the sample image is input into a feature extraction network, a sample feature map of the sample image is extracted, the sample feature map is input into a thermodynamic map generation network, a sample prediction thermodynamic map of the sample image is predicted, sample candidate key points of the sample image are obtained through the sample prediction thermodynamic map, the sample candidate key points are input into a coordinate regression network for coordinate regression, the predicted key points of the sample image are obtained, the predicted key points and the labeled key points of the sample image are obtained, a loss function is determined, a key point detection model is adjusted based on the loss function, training is returned and continued until a training end condition is met, and training is ended, so that a final key point detection model is generated. In the embodiment of the disclosure, the key point detection model is trained, so that the error of the key point detection model is reduced, the best model is obtained, and the accuracy of the key point coordinates is improved.

Fig. 5 is a block diagram of a key point detecting apparatus according to an embodiment of the present disclosure, and as shown in fig. 5, the key point detecting apparatus 500 includes:

an image obtaining module 510, configured to obtain an image to be detected, where the image to be detected includes at least one target to be detected;

the thermodynamic diagram module 520 is used for acquiring a key point thermodynamic diagram corresponding to an image to be detected;

a candidate key point module 530, configured to obtain candidate key points of the target to be detected based on the key point thermodynamic diagram;

and the coordinate regression module 540 is configured to perform coordinate regression on the candidate key points to obtain target key points of the target to be detected.

It should be noted that the foregoing explanation of the embodiment of the keypoint detection method is also applicable to the keypoint detection apparatus of the embodiment, and is not repeated herein.

Further, in a possible implementation manner of the embodiment of the present disclosure, the candidate keypoint module 530 is further configured to: identify a local maximum heat value from the key point thermodynamic diagram, and determine the key point corresponding to the local maximum heat value as a candidate key point.

Further, in a possible implementation manner of the embodiment of the present disclosure, the coordinate regression module 540 is further configured to: performing coordinate regression on the candidate key points, and determining the offset of the candidate key points; and adding the candidate key points and the offset to obtain target key points.

Further, in a possible implementation manner of the embodiment of the present disclosure, the thermodynamic diagram module 520 is further configured to: inputting the image to be detected into a thermodynamic diagram generation network, extracting a feature diagram of the image to be detected by the thermodynamic diagram generation network, and predicting a key point thermodynamic diagram based on the feature diagram.

Further, in a possible implementation manner of the embodiment of the present disclosure, the coordinate regression module 540 is further configured to: and inputting the candidate key points into a coordinate regression network, and carrying out full connection operation by the coordinate regression network so as to output the offset corresponding to the candidate key points.

Further, in a possible implementation manner of the embodiment of the present disclosure, the target to be detected included in the image to be detected is a vehicle to be detected.

In the technical solution of the present disclosure, the acquisition, storage, application and the like of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the keypoint detection method. For example, in some embodiments, the keypoint detection method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the keypoint detection method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the keypoint detection method in any other suitable way (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SoCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on a machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs that run on the respective computers and have a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

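1. A keypoint detection method, comprising:

acquiring an image to be detected, wherein the image to be detected comprises at least one target to be detected;

acquiring a key point thermodynamic diagram corresponding to the image to be detected;

acquiring candidate key points of the target to be detected based on the key point thermodynamic diagram; and

performing coordinate regression on the candidate key points to obtain target key points of the target to be detected.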

2. The method of claim 1, wherein the acquiring candidate key points of the target to be detected based on the key point thermodynamic diagram comprises:

identifying a local maximum heat value from the key point thermodynamic diagram, and determining the key point corresponding to the local maximum heat value as the candidate key point.
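As a non-limiting illustration of selecting candidate key points at local maxima of a key point thermodynamic diagram (heatmap), the following minimal sketch scans a small two-dimensional grid. The function name `local_maxima`, the 8-neighbour comparison, and the `threshold` parameter are assumptions made for this example and are not specified by the disclosure.

```python
from typing import List, Tuple

Heatmap = List[List[float]]


def local_maxima(heatmap: Heatmap, threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (row, col, value) for each cell that is at least `threshold`
    and strictly greater than every one of its 8-connected neighbours.
    Both the neighbourhood and the threshold are illustrative assumptions."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            # Collect the neighbours that exist inside the grid bounds.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks


# Hypothetical 4x4 heatmap with two distinct peaks.
heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.6],
    [0.0, 0.0, 0.3, 0.2],
]
candidates = local_maxima(heatmap)
print(candidates)  # [(1, 1, 0.9), (2, 3, 0.6)]
```

In practice the same selection is often performed on a tensor with a max-pooling comparison; the explicit loop is used here only to keep the example dependency-free.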

3. The method according to claim 1 or 2, wherein the performing coordinate regression on the candidate key points to obtain the target key points of the target to be detected comprises:

performing coordinate regression on the candidate key points, and determining the offset of the candidate key points;

and adding the candidate key points and the offset to obtain the target key points.
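The addition of the offset to the candidate key points can be sketched as follows. The helper name `refine_keypoints` and the sample coordinates and offsets are hypothetical values chosen for illustration; in the described method the offsets would come from the coordinate regression.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def refine_keypoints(candidates: List[Point], offsets: List[Point]) -> List[Point]:
    """Add each regressed (dx, dy) offset to its candidate (x, y) coordinate."""
    if len(candidates) != len(offsets):
        raise ValueError("each candidate key point needs exactly one offset")
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(candidates, offsets)]


# Hypothetical integer-grid candidates and sub-pixel offsets.
candidates = [(12.0, 40.0), (57.0, 8.0)]
offsets = [(0.25, -0.5), (-0.75, 0.125)]
targets = refine_keypoints(candidates, offsets)
print(targets)  # [(12.25, 39.5), (56.25, 8.125)]
```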

4. The method according to claim 1 or 2, wherein the acquiring of the keypoint thermodynamic diagram corresponding to the image to be detected comprises:

inputting the image to be detected into a thermodynamic diagram generation network, wherein the thermodynamic diagram generation network extracts a feature map of the image to be detected and predicts the key point thermodynamic diagram based on the feature map.
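By way of a toy, non-limiting sketch, the two stages recited above (feature extraction, then prediction of the key point thermodynamic diagram from the features) might be arranged as below. The single 3x3 filter, the per-pixel sigmoid head, and all numeric weights are assumptions for illustration only; the disclosure does not specify the network architecture, and a real network would be trained and far deeper.

```python
import math
from typing import List

Grid = List[List[float]]


def conv3x3(image: Grid, kernel: Grid) -> Grid:
    """Same-size 3x3 cross-correlation with zero padding.
    Stands in for the feature-extraction stage (an assumption)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += image[rr][cc] * kernel[dr + 1][dc + 1]
            out[r][c] = acc
    return out


def predict_heatmap(features: Grid, weight: float, bias: float) -> Grid:
    """Per-pixel linear map followed by a sigmoid, so every value lies in (0, 1)."""
    return [
        [1.0 / (1.0 + math.exp(-(weight * f + bias))) for f in row]
        for row in features
    ]


# Hypothetical 4x4 single-channel image with a bright 2x2 patch in the middle.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
box_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]  # untrained, hand-picked weights

features = conv3x3(image, box_kernel)
heatmap = predict_heatmap(features, weight=4.0, bias=-1.0)
```

With these hand-picked weights the response is higher over the bright patch than at the image corners, which is the qualitative behaviour expected of a key point heatmap.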

5. The method according to claim 4, wherein the performing coordinate regression on the candidate key points to obtain the target key points of the target to be detected comprises:

inputting the candidate key points into a coordinate regression network, wherein the coordinate regression network performs a fully connected operation to output the offset corresponding to the candidate key points.
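A fully connected operation that maps a candidate key point to an offset can be illustrated with the following minimal sketch. The two-layer shape, the ReLU activation, the normalized input coordinates, and every weight value are assumptions made for the example; the disclosure states only that the coordinate regression network performs a fully connected operation and outputs the offset.

```python
from typing import Dict, List, Sequence


def fully_connected(x: Sequence[float],
                    weights: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> List[float]:
    """Compute y = W x + b for one input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    """Element-wise max(0, v)."""
    return [max(0.0, e) for e in v]


def regress_offset(candidate: Sequence[float], params: Dict[str, list]) -> List[float]:
    """Two fully connected layers mapping an (x, y) candidate to a (dx, dy) offset."""
    hidden = relu(fully_connected(candidate, params["w1"], params["b1"]))
    return fully_connected(hidden, params["w2"], params["b2"])


# Hypothetical, untrained parameters chosen so the arithmetic is easy to follow.
params = {
    "w1": [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    "b1": [0.0, 0.0, 0.0],
    "w2": [[0.5, 0.0, 0.0], [0.0, -1.0, 0.0]],
    "b2": [0.0, 0.0],
}

candidate = (0.5, 0.25)  # candidate key point in normalized image coordinates
offset = regress_offset(candidate, params)
print(offset)  # [0.25, -0.25]
```

In a trained system the parameters would be learned so that the candidate plus the predicted offset approaches the annotated key point position.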

6. The method according to claim 1, wherein the target to be detected included in the image to be detected is a vehicle to be detected.

7. A keypoint detection device, comprising:

a thermodynamic diagram module, configured to acquire a key point thermodynamic diagram corresponding to the image to be detected;

and a coordinate regression module, configured to perform coordinate regression on the candidate key points to obtain the target key points of the target to be detected.

8. The apparatus of claim 7, wherein the candidate keypoint module is further configured to:

9. The apparatus of claim 7 or 8, wherein the coordinate regression module is further configured to:

10. The apparatus of claim 7 or 8, wherein the thermodynamic diagram module is further configured to:

11. The apparatus of claim 10, wherein the coordinate regression module is further configured to:

12. The apparatus according to claim 7, wherein the target to be detected included in the image to be detected is a vehicle to be detected.

13. An electronic device, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.

14. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.

15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.