CN114581730A

CN114581730A - Training method of detection model, target detection method, device, equipment and medium

Info

Publication number: CN114581730A
Application number: CN202210205615.2A
Authority: CN
Inventors: 杨黔生
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-03-03
Filing date: 2022-03-03
Publication date: 2022-06-03

Abstract

The present disclosure provides a training method of a detection model. It relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, image recognition, deep learning and augmented reality, and can be applied to smart city and intelligent traffic scenarios. The specific implementation scheme is as follows: inputting a sample image into a detection model to obtain an output result, wherein the output result comprises a relationship graph of the sample image, and edges in the relationship graph represent relationship information among a plurality of keypoints in the sample image; obtaining a difference value according to a label of the sample image and the output result; and training the detection model according to the difference value. The disclosure also provides a target detection method, a target detection apparatus, an electronic device and a storage medium.

Description

Training method of detection model, target detection method, device, equipment and medium

Technical Field

The present disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, image recognition, deep learning and augmented reality, and can be applied to smart city and intelligent traffic scenarios. More specifically, the present disclosure provides a training method of a detection model, a target detection method, an apparatus, an electronic device, and a storage medium.

Background

With the development of artificial intelligence technology, deep learning models are widely applied to image recognition or object detection in scenarios such as smart cities or intelligent transportation. When a target is detected with a deep learning model, keypoints in the image can be detected; for example, each joint of the target can be detected as a keypoint. As another example, a heat map of the image may be acquired to detect keypoints in the image. Based on the detected keypoints, pose estimation can be carried out in scenarios such as smart cities or intelligent transportation.

Disclosure of Invention

The disclosure provides a training method of a detection model, a target detection method, a device, equipment and a storage medium.

According to an aspect of the present disclosure, there is provided a training method of a detection model, the method including: inputting a sample image into a detection model to obtain an output result, wherein the output result comprises a relationship graph of the sample image, and edges in the relationship graph represent relationship information among a plurality of keypoints in the sample image; obtaining a difference value according to the label of the sample image and the output result; and training the detection model according to the difference value.

According to another aspect of the present disclosure, there is provided a target detection method, including: performing target detection on a target image to obtain a target detection result, wherein the target detection result comprises a relationship graph of the target image, and edges in the relationship graph represent relationship information among a plurality of keypoints in the target image; and determining target information of the plurality of keypoints in the target image according to the target detection result.

According to another aspect of the present disclosure, there is provided a training apparatus for a detection model, the apparatus including a first obtaining module, a second obtaining module and a training module, wherein the first obtaining module is configured to input a sample image into a detection model to obtain an output result, the output result comprising a relationship graph of the sample image, and edges in the relationship graph representing relationship information among a plurality of keypoints in the sample image; the second obtaining module is configured to obtain a difference value according to the label of the sample image and the output result; and the training module is configured to train the detection model according to the difference value.

According to another aspect of the present disclosure, there is provided a target detection apparatus including a target detection module and a determining module, wherein the target detection module is configured to perform target detection on a target image to obtain a target detection result, the target detection result comprising a relationship graph of the target image, and edges in the relationship graph representing relationship information among a plurality of keypoints in the target image; and the determining module is configured to determine target information of the plurality of keypoints in the target image according to the target detection result.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided according to the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method provided according to the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

FIG. 1 is a flow diagram of a method of training a detection model according to one embodiment of the present disclosure;

FIG. 2 is a flow diagram of a method of training a detection model according to another embodiment of the present disclosure;

FIG. 3A is a schematic diagram of a plurality of keypoints in a sample image, according to one embodiment of the present disclosure;

FIGS. 3B to 3D are schematic diagrams of relationship information between a plurality of keypoints according to one embodiment of the present disclosure;

FIG. 3E is a schematic diagram of a local relationship graph of a sample image according to one embodiment of the present disclosure;

FIG. 4A is a schematic diagram of a heat map sub-label according to another embodiment of the present disclosure;

FIG. 4B is a schematic diagram of a depth map sub-label according to another embodiment of the present disclosure;

FIG. 4C is a schematic diagram of a relationship graph sub-label according to another embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a training method of a detection model according to an embodiment of the present disclosure;

FIG. 6 is a flow diagram of a target detection method according to one embodiment of the present disclosure;

FIG. 7 is a block diagram of a training apparatus for detection models, according to one embodiment of the present disclosure;

FIG. 8 is a block diagram of an object detection device according to one embodiment of the present disclosure;

FIG. 9 is a block diagram of an electronic device to which a training method of a detection model and/or an object detection method may be applied, according to one embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

A heat-map-based target detection method may crop the region where the object is located from an image to serve as the input image; extract features from the input image as input features of a CNN (Convolutional Neural Network) model; process the input features with the CNN model to obtain at least one heat map; take one keypoint in the heat map as the root node and build a Star-Structure or Tree-Structure relationship graph; and determine target information for pose estimation according to the relationship graph and the heat map.

However, when some keypoints of an object in the image are occluded, it is difficult to estimate keypoints far from the root node with a Star-Structure relationship graph, and difficult to estimate the occluded keypoints with a Tree-Structure relationship graph.

FIG. 1 is a flow diagram of a method of training a detection model according to one embodiment of the present disclosure.

As shown in fig. 1, the method 100 may include operations S110 to S130.

In operation S110, the sample image is input to the detection model, and an output result is obtained.

For example, the output result includes a relationship graph of the sample image, and edges in the relationship graph are used for representing relationship information among a plurality of key points in the sample image.

For example, for each keypoint, the relationship graph includes directed edges connecting that keypoint to each of the plurality of keypoints, and each directed edge may represent relationship information between two keypoints. As another example, the directed edges may point from each keypoint to the plurality of keypoints.
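As a minimal sketch of this structure, assuming keypoints are indexed 0 to M-1 and that every keypoint also carries a self-edge (as in the examples of FIGS. 3B to 3D), the directed edges can be enumerated as ordered (root, target) pairs. The function name is hypothetical.

```python
def build_directed_edges(num_keypoints: int) -> list[tuple[int, int]]:
    """Return every (root, target) pair for keypoints 0..num_keypoints-1.

    Each keypoint gets one outgoing directed edge to every keypoint,
    including a self-edge where root == target.
    """
    return [(root, target)
            for root in range(num_keypoints)
            for target in range(num_keypoints)]


edges = build_directed_edges(17)
print(len(edges))   # 289: 17 outgoing edges for each of the 17 keypoints
print(edges[:3])    # [(0, 0), (0, 1), (0, 2)]
```

A dense graph like this grows quadratically with the number of keypoints, which stays small (289 edges) for a 17-keypoint skeleton.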

For example, the detection model may be an Hourglass model.

In operation S120, a difference value is obtained according to the label of the sample image and the output result.

For example, the label of the sample image may include a relationship graph sub-label. In one example, the label of the sample image includes ground-truth information of a plurality of keypoints in the sample image. Based on this ground-truth information, for each keypoint, directed edges connecting that keypoint to the plurality of keypoints can be established to obtain the ground-truth relationship information of that keypoint. Repeating this for the other keypoints yields the relationship graph sub-label of the sample image. A loss value may be determined from the relationship graph sub-label and the relationship graph described above, and the difference value between the label and the output result may be determined based on this loss value.
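The description does not fix a numeric encoding for the ground-truth relationship information. A minimal sketch, assuming (consistent with the offsets that the detection stage later reads from the directed edges) that the edge from keypoint i to keypoint j is labeled with the 3D offset p_j - p_i; `relation_sub_label` and the coordinates are illustrative only:

```python
Point3 = tuple[float, float, float]


def relation_sub_label(gt_points: list[Point3]) -> list[list[Point3]]:
    """Hypothetical relationship-graph sub-label.

    Entry [i][j] is the offset from keypoint i (root node) to keypoint j
    (target node), computed from ground-truth 3D keypoint coordinates.
    The diagonal holds the self-edges, whose offset is zero.
    """
    return [[tuple(pj[d] - pi[d] for d in range(3)) for pj in gt_points]
            for pi in gt_points]


gt = [(0.0, 0.0, 1.0), (2.0, 1.0, 1.5), (1.0, 3.0, 0.5)]
label = relation_sub_label(gt)
print(label[0][1])  # (2.0, 1.0, 0.5)
print(label[1][1])  # (0.0, 0.0, 0.0)
```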

In operation S130, a detection model is trained according to the difference value.

For example, parameters of the detection model may be adjusted according to the difference value to train the detection model.

Through the embodiments of the present disclosure, the trained detection model can efficiently generate the relationship graph of an image, so as to obtain more information related to the keypoints.

In some embodiments, the relationship information among the plurality of keypoints may comprise relationship information among at least two of the plurality of keypoints. For example, at least two keypoints may be selected from the plurality of keypoints in the relationship graph. For each of the selected keypoints, the relationship graph includes directed edges connecting that keypoint to each of the selected keypoints, and each directed edge can represent relationship information between two keypoints. As another example, the directed edges may point from each selected keypoint to the at least two keypoints.

In some embodiments, the output result further includes a heat map of the sample image and a depth map of the sample image, and the label includes a heat map sub-label, a depth map sub-label and a relationship graph sub-label.

FIG. 2 is a flow diagram of a method of training a detection model according to another embodiment of the present disclosure.

As shown in FIG. 2, the method 220 obtains the difference value according to the label of the sample image and the output result, and is described below with reference to operations S221 to S224.

In operation S221, a first loss value is obtained according to the heat map and the heat map sub-label.

For example, the heat map may include position information of 17 keypoints.

For example, an LSE (Least Square Error) loss value between the heat map and the heat map sub-label may be calculated as the first loss value L_1. In one example, the LSE loss is calculated based on an L2-norm loss function.
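A minimal sketch of the first loss value L_1, assuming heat maps are nested lists of shape [num_keypoints][height][width] and that the loss is the plain sum of squared differences. Whether the original sums or averages is not stated, so normalization remains an open choice; the function name is hypothetical.

```python
def lse_loss(pred_maps: list[list[list[float]]],
             label_maps: list[list[list[float]]]) -> float:
    """Least-squares (L2) loss summed over all keypoint channels and pixels.

    Both inputs have shape [num_keypoints][height][width].
    """
    total = 0.0
    for pred_map, label_map in zip(pred_maps, label_maps):
        for pred_row, label_row in zip(pred_map, label_map):
            for p, t in zip(pred_row, label_row):
                total += (p - t) ** 2
    return total


pred = [[[0.0, 0.5], [1.0, 0.0]]]    # one keypoint channel, 2 x 2 heat map
label = [[[0.0, 1.0], [1.0, 0.0]]]
print(lse_loss(pred, label))  # 0.25
```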

In operation S222, a second loss value is obtained according to the depth map and the depth map sub-label.

In an embodiment of the present disclosure, for at least one key point in the depth map, at least one second sub-loss value is obtained according to the local depth map and the local depth map sub-label corresponding to each key point in the at least one key point.

For example, a local depth map sub-label is determined from each keypoint and the depth map sub-label.

For example, depth information for 17 key points may be included in the depth map.

For example, 17 local depth maps may be extracted from the depth map, each corresponding to one keypoint and containing that keypoint. Correspondingly, 17 local depth map sub-labels are extracted from the depth map sub-label, each corresponding to one keypoint.

For another example, an LAD (Least Absolute Deviation) loss value between each local depth map and its local depth map sub-label may be calculated as the corresponding second sub-loss value. In one example, the LAD loss value is calculated based on an L1-norm loss function.

In an embodiment of the present disclosure, the second loss value is obtained according to the at least one second sub-loss value.

For example, the second loss value L_2 may be obtained from the second sub-loss values corresponding to the 17 keypoints. In one example, the sum of the 17 second sub-loss values may be calculated as the second loss value.

In operation S223, a third loss value is obtained according to the relationship graph and the relationship graph sub-label.

In the embodiment of the present disclosure, at least one third sub-loss value is obtained for at least one key point in the relationship graph according to the local relationship graph and the local relationship graph sub-label corresponding to each key point in the at least one key point.

For example, a local relationship graph sub-label is determined from each keypoint and the relationship graph sub-label.

For example, the relationship graph may include relationship information of 17 keypoints. In one example, each keypoint corresponds to 17 pieces of relationship information, one of which is the relationship information between the keypoint and itself. As described above, in a relationship graph, the relationship information between one keypoint and another can be represented by a directed edge.

For example, 17 local relationship graphs can be extracted from the relationship graph, each corresponding to one keypoint. The local relationship graphs may be of the same size, and each may include a keypoint and at least one keypoint adjacent to it. Correspondingly, 17 local relationship graph sub-labels can be extracted from the relationship graph sub-label, each corresponding to one keypoint.

For another example, the LAD loss between each local relationship graph and its local relationship graph sub-label may be calculated as the corresponding third sub-loss value.

In an embodiment of the disclosure, the third loss value is obtained according to the at least one third sub-loss value.

For example, the third loss value L_3 may be obtained from the third sub-loss values corresponding to the 17 keypoints. In one example, the sum of the 17 third sub-loss values may be calculated as the third loss value.
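The description leaves the exact shape of a local relationship graph open. A minimal sketch, assuming the relationship graph is stored as an M × M table of 3D offsets and that the local relationship graph of keypoint i is its row of outgoing directed edges (mirroring FIGS. 3B to 3D); names are hypothetical.

```python
Offsets = list[list[list[float]]]  # [M][M][3]: row i = edges rooted at keypoint i


def third_loss(rel_pred: Offsets, rel_label: Offsets, keypoint_ids=None):
    """Return (L_3, per-keypoint third sub-loss values).

    The local relationship graph of keypoint i is taken to be row i,
    i.e. the directed edges with keypoint i as the root node.
    """
    if keypoint_ids is None:
        keypoint_ids = range(len(rel_pred))
    sub_losses = []
    for i in keypoint_ids:
        # L1 (LAD) distance over all outgoing edges of keypoint i.
        sub = sum(abs(p - t)
                  for edge_p, edge_t in zip(rel_pred[i], rel_label[i])
                  for p, t in zip(edge_p, edge_t))
        sub_losses.append(sub)
    return sum(sub_losses), sub_losses


pred = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
        [[-1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
label = [[[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]],
         [[-1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
print(third_loss(pred, label))  # (1.0, [1.0, 0.0])
```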

In operation S224, a difference value is obtained according to the first loss value, the second loss value, and the third loss value.

In this embodiment of the present disclosure, a weighted summation may be performed according to the first loss value, the second loss value, and the third loss value, so as to obtain a difference value.

For example, the weighted sum may be performed by the following formula:

$\mathrm{Diff} = w_1 L_1 + w_2 L_2 + w_3 L_3$ (Formula 1)

where w_1 is the weight of the first loss value, w_2 is the weight of the second loss value, and w_3 is the weight of the third loss value. In one example, w_1 = 0.8, w_2 = 0.1 and w_3 = 0.1.
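Formula 1 is a plain weighted sum. A sketch with the example weights as defaults (the function name is hypothetical); with these weights the heat-map term L_1 gets eight times the weight of each of the other two terms.

```python
def difference_value(l1: float, l2: float, l3: float,
                     weights: tuple[float, float, float] = (0.8, 0.1, 0.1)) -> float:
    """Formula 1: Diff = w_1 * L_1 + w_2 * L_2 + w_3 * L_3."""
    w1, w2, w3 = weights
    return w1 * l1 + w2 * l2 + w3 * l3


diff = difference_value(0.5, 2.0, 3.0)  # 0.4 + 0.2 + 0.3, about 0.9 up to float rounding
```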

In some embodiments, unlike method 220, the at least one keypoint in the depth map may be only a portion of all keypoints in the depth map. For example, as described above, the depth map includes depth information of 17 keypoints. Any 15 of the 17 keypoints can be selected, 15 second sub-loss values can be obtained from the local depth maps and local depth map sub-labels corresponding to these 15 keypoints, and the second loss value can be obtained from the 15 second sub-loss values.

In some embodiments, unlike method 220, the at least one keypoint in the relationship graph may be only a portion of all keypoints in the relationship graph. For example, as described above, the relationship graph includes relationship information of 17 keypoints. Any 15 of the 17 keypoints can be selected, 15 third sub-loss values can be obtained from the local relationship graphs and local relationship graph sub-labels corresponding to these 15 keypoints, and the third loss value can be obtained from the 15 third sub-loss values.
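A minimal sketch of the subset variant, assuming the subset is drawn uniformly at random each time the loss is computed (the description only says the 15 keypoints may be selected arbitrarily). `sub_loss` stands in for either per-keypoint sub-loss, so the same helper serves L_2 and L_3; names are hypothetical.

```python
import random
from typing import Callable, Optional


def subset_loss(sub_loss: Callable[[int], float], num_keypoints: int = 17,
                num_selected: int = 15, rng: Optional[random.Random] = None):
    """Sum the per-keypoint sub-losses over a random subset of keypoints.

    Returns (loss, sorted indices of the selected keypoints).
    """
    rng = rng or random.Random()
    selected = sorted(rng.sample(range(num_keypoints), num_selected))
    return sum(sub_loss(i) for i in selected), selected


loss, selected = subset_loss(lambda i: 1.0, rng=random.Random(0))
print(loss, len(selected))  # 15.0 15
```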

FIG. 3A is a schematic diagram of a plurality of keypoints in a sample image, according to one embodiment of the present disclosure.

As shown in FIG. 3A, in this embodiment, the sample image 301 includes 15 keypoints. For simplicity, the relationship graph of the sample image is described in detail below with reference to FIGS. 3B to 3E, taking 5 keypoints as an example: keypoint 310, keypoint 320, keypoint 330, keypoint 340 and keypoint 350. In one example, keypoint 340 may correspond to an ankle joint of the subject.

FIGS. 3B to 3D are schematic diagrams of relationship information between a plurality of keypoints according to one embodiment of the present disclosure.

As shown in FIG. 3B, for keypoint 310, the relationship graph includes four directed edges with keypoint 310 as the root node and keypoint 320, keypoint 330, keypoint 340 and keypoint 350, respectively, as the target node. In one example, the relationship graph further includes a directed edge with keypoint 310 as both the root node and the target node.

As shown in FIG. 3C, for keypoint 320, the relationship graph includes four directed edges with keypoint 320 as the root node and keypoint 310, keypoint 330, keypoint 340 and keypoint 350, respectively, as the target node. In one example, the relationship graph further includes a directed edge with keypoint 320 as both the root node and the target node.

As shown in FIG. 3D, for keypoint 330, the relationship graph includes four directed edges with keypoint 330 as the root node and keypoint 320, keypoint 310, keypoint 340 and keypoint 350, respectively, as the target node. In one example, the relationship graph further includes a directed edge with keypoint 330 as both the root node and the target node.

For keypoint 340 and keypoint 350, the relationship graph includes similar directed edges to the 5 keypoints, in the same way as for keypoint 310 described above; the details are omitted here.

As shown in FIG. 3E, for the 5 keypoints above, the relationship graph of the sample image includes the directed edges among the 5 keypoints. The directed edges corresponding to keypoint 310 are shown in FIG. 3B, those corresponding to keypoint 320 in FIG. 3C, and those corresponding to keypoint 330 in FIG. 3D. In one example, the local relationship graph shown in FIG. 3E corresponds to keypoint 330.

FIG. 4A is a schematic diagram of a heat map sub-label according to another embodiment of the present disclosure.

As shown in FIG. 4A, the heat map sub-label includes 17 keypoints.

FIG. 4B is a schematic diagram of a depth map sub-label according to another embodiment of the present disclosure.

As shown in FIG. 4B, the depth map sub-label includes depth information of 17 keypoints.

FIG. 4C is a schematic diagram of a relationship graph sub-label according to another embodiment of the present disclosure.

As shown in FIG. 4C, the edges in the relationship graph sub-label can represent relationship information among multiple keypoints.

FIG. 5 is a schematic diagram of a training method of a detection model according to an embodiment of the present disclosure.

As shown in FIG. 5, the detection model 500 may include N cascaded processing stages, such as processing stage 5001, …, processing stage 500N. In one example, N = 2.

Each processing stage may produce an output result. For example, after the sample image 501 is input into the detection model 500, processing stage 5001 may output an output result 502, and processing stage 500N may output an output result 503.

A heat map 504, a depth map 505 and a relationship graph 506 of the sample image may be determined from the output result 503. A first loss value L_1 510 may be obtained from the heat map 504 and the heat map sub-label 507; a second loss value L_2 511 from the depth map 505 and the depth map sub-label 508; and a third loss value L_3 512 from the relationship graph 506 and the relationship graph sub-label 509.

A difference value Diff 513 can be obtained from the first loss value L_1 510, the second loss value L_2 511 and the third loss value L_3 512, and the detection model 500 may be trained based on the difference value Diff 513.
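The data flow of FIG. 5 can be sketched with toy callables standing in for real processing stages. That each stage consumes the previous stage's output is an assumption; the text only states that the stages are cascaded. The loss values are then derived from the last stage's output, as in the figure. Names are hypothetical.

```python
from typing import Any, Callable, Sequence


def run_cascade(image: Any, stages: Sequence[Callable[[Any], Any]]) -> list[Any]:
    """Run N cascaded processing stages and keep every stage's output.

    Assumption: each stage consumes the previous stage's output.
    """
    outputs = []
    x = image
    for stage in stages:
        x = stage(x)
        outputs.append(x)
    return outputs


# Toy stand-ins for processing stages 5001 and 500N (N = 2).
outputs = run_cascade(1.0, [lambda v: v * 2, lambda v: v + 3])
final_output = outputs[-1]  # the three loss values would be computed from this
print(outputs)  # [2.0, 5.0]
```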

FIG. 6 is a flow diagram of a target detection method according to one embodiment of the present disclosure.

As shown in fig. 6, the method 600 may include operations S610 through S620.

In operation S610, target detection is performed on the target image to obtain a target detection result.

In the embodiment of the present disclosure, the target detection result includes a relationship graph of the target image, and an edge in the relationship graph is used to represent relationship information between a plurality of key points in the target image.

In operation S620, target information of a plurality of key points in the target image is determined according to the target detection result.

For example, the target detection result includes a relationship graph of the target image. From the relationship graph, the relationship information between each keypoint and the plurality of keypoints described above can be acquired, and target information of the plurality of keypoints may be determined based on this relationship information.

According to the embodiments of the present disclosure, richer information related to the keypoints can be acquired.

In some embodiments, performing target detection on the target image to obtain the target detection result includes: performing target detection on the target image with the detection model to obtain the target detection result. For example, the detection model is trained with the training method of the detection model provided by the present disclosure.

In some embodiments, the target detection result further includes a heat map of the target image and a depth map of the target image.

In some embodiments, determining target information of the plurality of keypoints in the target image according to the target detection result includes: fusing position information of K keypoints among the plurality of keypoints with depth information of the K keypoints to obtain first fusion information; and fusing the first fusion information with offset information of the K keypoints to obtain the target information.

For example, the plurality of keypoints are M keypoints, where M is an integer greater than or equal to 1 and K is an integer less than or equal to M. In one example, the K keypoints may be selected from the M keypoints according to their confidences.
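The description only says that the K keypoints may be chosen according to their confidences. A minimal sketch offering two common rules, top-K and a confidence threshold (both are assumptions; `select_confident` is a hypothetical name):

```python
from typing import Optional


def select_confident(confidences: list[float], k: Optional[int] = None,
                     threshold: Optional[float] = None) -> list[int]:
    """Indices of the keypoints kept, ordered by descending confidence.

    Keeps keypoints whose confidence is at least `threshold` (if given),
    then truncates to the top `k` (if given).
    """
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    if threshold is not None:
        order = [i for i in order if confidences[i] >= threshold]
    return order if k is None else order[:k]


conf = [0.9, 0.2, 0.75, 0.6]                    # M = 4 keypoints
print(select_confident(conf, k=2))              # [0, 2]
print(select_confident(conf, threshold=0.5))    # [0, 2, 3]
```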

For example, the position information of the K keypoints is determined from the heat map.

For example, the depth information of the K keypoints is determined from the depth map, K being an integer greater than or equal to 1.

For example, the offset information of the K keypoints is determined from the relationship graph.

For example, the offset information of each keypoint may represent the offsets between that keypoint and each of the plurality of keypoints. In one example, the relationship graph includes a plurality of directed edges, and the offset between two keypoints may be determined from the directed edge connecting them.

For example, the position information Local_xy of the K keypoints is a K × 2-dimensional vector.

For example, the first fusion information Local_xyz may be obtained by fusing the depth information of the K keypoints with the position information Local_xy of the K keypoints. The first fusion information Local_xyz may be a K × 3-dimensional vector.
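A minimal sketch of the first fusion, assuming fusion means concatenating each keypoint's (x, y) position with its depth sampled from the depth map at that integer pixel; the sampling rule and the function name are assumptions.

```python
def fuse_xyz(local_xy: list[list[int]],
             depth_map: list[list[float]]) -> list[list[float]]:
    """Build Local_xyz (K x 3) from Local_xy (K x 2) and a depth map.

    Assumption: the depth of each keypoint is read from the depth map at
    its integer pixel position (x = column, y = row), then appended as z.
    """
    return [[float(x), float(y), depth_map[y][x]] for x, y in local_xy]


depth = [[1.0, 1.1, 1.2],
         [2.0, 2.1, 2.2]]
local_xy = [[0, 0], [2, 1]]
print(fuse_xyz(local_xy, depth))  # [[0.0, 0.0, 1.0], [2.0, 1.0, 2.2]]
```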

For another example, the offset information Offset of the K keypoints may be a K × M × 3-dimensional vector.

For another example, the offset information Offset of the K keypoints is fused with the first fusion information Local_xyz to obtain the target information, as described in detail below.

In some embodiments, fusing the first fusion information with the offset information of the K keypoints to obtain the target information includes: performing dimension-raising processing on the first fusion information to obtain raised first fusion information; fusing the raised first fusion information with the offset information of the K keypoints to obtain second fusion information; and performing dimension-reduction processing on the second fusion information based on the confidence of each of the K keypoints to obtain the target information.

For example, as described above, the offset information Offset of the K keypoints is a K × M × 3-dimensional vector, and the first fusion information Local_xyz is a K × 3-dimensional vector.

The first fusion information Local_xyz may be copied M times and stacked into a K × M × 3-dimensional vector to obtain the raised first fusion information K_Local_xyz.

Next, the raised first fusion information K_Local_xyz may be added to the offset information Offset of the K keypoints to obtain second fusion information K_Local3D, which may also be a K × M × 3-dimensional vector.

Next, based on the confidence of each of the K keypoints, the second fusion information K_Local3D is reduced in dimension by averaging over the K keypoints to obtain the target information Local3D. The target information Local3D may be an M × 3-dimensional vector.
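The three steps can be sketched as follows, assuming `offset[a][j]` is the offset from the a-th selected keypoint to the j-th of the M keypoints and that the averaging is a confidence-weighted mean over the K selected keypoints (the exact averaging rule is not given). Under these shapes the result holds one 3D estimate per keypoint. The function name is hypothetical.

```python
def fuse_with_offsets(local_xyz, offset, confidences):
    """Combine anchors and offsets into one 3D estimate per keypoint.

    local_xyz:   K x 3 first fusion information (Local_xyz)
    offset:      K x M x 3 offsets; offset[a][j] points from anchor a to keypoint j
    confidences: K non-negative weights, one per anchor keypoint
    Returns an M x 3 list (Local3D).
    """
    k, m = len(local_xyz), len(offset[0])
    # Dimension raising: repeat each anchor M times -> K x M x 3 (K_Local_xyz).
    k_local_xyz = [[list(local_xyz[a]) for _ in range(m)] for a in range(k)]
    # Fusion: element-wise addition with the offsets -> K x M x 3 (K_Local3D).
    k_local3d = [[[k_local_xyz[a][j][d] + offset[a][j][d] for d in range(3)]
                  for j in range(m)] for a in range(k)]
    # Dimension reduction: confidence-weighted mean over the K anchors -> M x 3.
    total = sum(confidences)
    return [[sum(confidences[a] * k_local3d[a][j][d] for a in range(k)) / total
             for d in range(3)] for j in range(m)]


anchors = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]          # K = 2
offsets = [[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],        # from anchor 0
           [[-2.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]       # from anchor 1
print(fuse_with_offsets(anchors, offsets, [3.0, 1.0]))
# [[0.0, 0.0, 0.0], [2.25, 0.0, 0.0]]
```

The two anchors disagree about keypoint 1 (x = 2.0 versus x = 3.0); the higher-confidence anchor pulls the fused estimate toward its own vote.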

FIG. 7 is a block diagram of a training apparatus for a detection model according to one embodiment of the present disclosure.

As shown in FIG. 7, the apparatus 700 may include a first obtaining module 710, a second obtaining module 720, and a training module 730.

The first obtaining module 710 is configured to input a sample image into the detection model to obtain an output result. For example, the output result includes a relationship graph of the sample image, and edges in the relationship graph represent relationship information among a plurality of keypoints in the sample image.

The second obtaining module 720 is configured to obtain a difference value according to the label of the sample image and the output result.

The training module 730 is configured to train the detection model according to the difference value.

In some embodiments, the output result further includes a heat map of the sample image and a depth map of the sample image, the label includes a heat map sub-label, a depth map sub-label and a relationship graph sub-label, and the second obtaining module includes: a first obtaining sub-module configured to obtain a first loss value according to the heat map and the heat map sub-label; a second obtaining sub-module configured to obtain a second loss value according to the depth map and the depth map sub-label; a third obtaining sub-module configured to obtain a third loss value according to the relationship graph and the relationship graph sub-label; and a fourth obtaining sub-module configured to obtain the difference value according to the first loss value, the second loss value and the third loss value.

In some embodiments, the second obtaining sub-module comprises: a first obtaining unit, configured to obtain, for at least one keypoint in the depth map, at least one second sub-loss value according to a local depth map and a local depth map sub-label corresponding to each keypoint in the at least one keypoint, where the local depth map sub-label is determined according to each keypoint and the depth map sub-label; and a second obtaining unit, configured to obtain the second loss value according to the at least one second sub-loss value.

In some embodiments, the third obtaining sub-module comprises: a third obtaining unit, configured to obtain, for at least one key point in the relationship graph, at least one third sub-loss value according to a local relationship graph and a local relationship graph sub-label that correspond to each key point in the at least one key point, where the local relationship graph sub-label is determined according to each key point and the relationship graph sub-label; and a fourth obtaining unit, configured to obtain the third loss value according to the at least one third sub-loss value.

In some embodiments, the fourth obtaining sub-module includes a fifth obtaining unit, configured to perform a weighted summation of the first loss value, the second loss value, and the third loss value to obtain the difference value.
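The weighted summation itself is simple arithmetic; the weights are hyperparameters that the disclosure does not state, so the defaults below are placeholders and `difference_value` is a hypothetical name.

```python
def difference_value(first_loss, second_loss, third_loss, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the heat map, depth map, and relationship graph losses.

    The default weights are hypothetical placeholders.
    """
    w1, w2, w3 = weights
    return w1 * first_loss + w2 * second_loss + w3 * third_loss
```

In an actual training loop the three losses would be tensors in an automatic-differentiation framework so that the difference value can be back-propagated through the detection model; plain floats are used here only to show the arithmetic.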

Fig. 8 is a block diagram of an object detection apparatus according to another embodiment of the present disclosure.

As shown in Fig. 8, the apparatus 800 may include a target detection module 810 and a determining module 820.

The target detection module 810 is configured to perform target detection on a target image to obtain a target detection result. For example, the target detection result includes a relationship graph of the target image, and edges in the relationship graph represent relationship information among a plurality of key points in the target image.
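As an illustration of how edges can carry relationship information, the sketch below builds a dense relationship graph whose edge [i, j] stores the offset from key point i to key point j. The offset encoding is an assumption chosen to match the offset information used by the determining module, not a format given in the disclosure; `build_relation_graph` is a hypothetical name.

```python
import numpy as np


def build_relation_graph(keypoints):
    """Dense relationship graph over K key points.

    keypoints: K x D coordinates (e.g. D=3 for x, y, depth).
    Returns a K x K x D array whose edge [i, j] holds the offset
    keypoints[j] - keypoints[i] (assumed encoding).
    """
    kp = np.asarray(keypoints, dtype=float)
    # Broadcasting: (1, K, D) - (K, 1, D) -> (K, K, D)
    return kp[None, :, :] - kp[:, None, :]
```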

A determining module 820, configured to determine target information of a plurality of key points in the target image according to the target detection result.

In some embodiments, the target detection result includes a heat map of the target image and a depth map of the target image, and the determining module includes: a first fusion sub-module, configured to fuse position information of K key points among the plurality of key points with depth information of the K key points to obtain first fusion information, where the position information of the K key points is determined according to the heat map, the depth information of the K key points is determined according to the depth map, and K is an integer greater than or equal to 1; and a second fusion sub-module, configured to fuse the first fusion information with offset information of the K key points to obtain the target information, where the offset information of the K key points is determined according to the relationship graph, and the offset information of each key point represents the offsets between that key point and the plurality of key points.
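The first fusion step can be sketched as follows, assuming one heat-map channel and one depth-map channel per key point, the position taken as the heat-map argmax, and the depth sampled from the depth map at that position. The channel layout, the argmax decoding, and the use of the peak value as the key point's confidence are all assumptions; `first_fusion` is a hypothetical helper.

```python
import numpy as np


def first_fusion(heatmaps, depth_maps):
    """Fuse each key point's heat-map position with its depth.

    heatmaps, depth_maps: K x H x W arrays, one channel per key point
    (hypothetical layout). Returns (fused, confidence): fused is K x 3
    with rows (x, y, depth); confidence is the heat-map peak value.
    """
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    depth = depth_maps[np.arange(k), ys, xs]      # depth sampled at the peak
    confidence = heatmaps[np.arange(k), ys, xs]   # peak response per key point
    fused = np.stack([xs, ys, depth], axis=1).astype(float)
    return fused, confidence
```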

In some embodiments, the second fusion sub-module includes: a dimension-increasing unit, configured to perform dimension-increasing processing on the first fusion information to obtain dimension-increased first fusion information; a fusion unit, configured to fuse the dimension-increased first fusion information with the offset information of the K key points to obtain second fusion information; and a dimension-reduction unit, configured to perform dimension-reduction processing on the second fusion information based on the confidence of each of the K key points to obtain the target information.
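A minimal sketch of the dimension-increasing, fusion, and dimension-reduction steps under one possible interpretation: the K x D first fusion information is lifted to K x K x D by repetition, each key point i then votes for every key point j by adding its offset [i, j], and the K votes for each point are collapsed by a confidence-weighted average. This interpretation and the offset convention (point j minus point i) are assumptions, not the disclosure's stated method; `second_fusion` is a hypothetical name.

```python
import numpy as np


def second_fusion(first_fused, relation_graph, confidence):
    """Refine K key points using the relationship graph.

    first_fused: K x D (e.g. x, y, depth) from the first fusion step.
    relation_graph: K x K x D offsets, entry [i, j] = point j minus point i
        (hypothetical encoding).
    confidence: length-K non-negative weights, one per key point.
    """
    k, d = first_fused.shape
    # Dimension increase: repeat each key point along a new axis -> K x K x D.
    lifted = np.broadcast_to(first_fused[:, None, :], (k, k, d))
    # Fusion: point i plus its offset to j gives a vote for point j.
    votes = lifted + relation_graph
    # Dimension reduction: confidence-weighted average over the voter axis i.
    weights = np.asarray(confidence, dtype=float)
    weights = weights / weights.sum()
    return np.einsum("i,ijd->jd", weights, votes)
```

With exact offsets, the output reproduces the input points; a key point whose own position is wrong but whose confidence is zero is corrected by the votes of the other key points.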

In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with relevant laws and regulations and do not violate public order and good morals.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure. This will be described in detail below with reference to fig. 9.

FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.

As shown in Fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 902 or a computer program loaded from a storage unit 908 into a random access memory (RAM) 903. The RAM 903 can also store various programs and data required for the operation of the device 900. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 901 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the various methods and processes described above, such as the training method of the detection model and/or the target detection method. For example, in some embodiments, the training method of the detection model and/or the target detection method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method of the detection model and/or the target detection method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the training method of the detection model and/or the target detection method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A training method of a detection model, comprising:

inputting a sample image into a detection model to obtain an output result, wherein the output result comprises a relation graph of the sample image, and edges in the relation graph are used for representing relation information among a plurality of key points in the sample image;

obtaining a difference value according to the label of the sample image and the output result; and

training the detection model according to the difference value.

2. The method of claim 1, wherein the output result further comprises a heat map of the sample image and a depth map of the sample image, and the label comprises a heat map sub-label, a depth map sub-label, and a relationship graph sub-label,

the obtaining a difference value according to the label of the sample image and the output result includes:

obtaining a first loss value according to the heat map and the heat map sub-label;

obtaining a second loss value according to the depth map and the depth map sub-label;

obtaining a third loss value according to the relationship graph and the relationship graph sub-label; and

obtaining the difference value according to the first loss value, the second loss value, and the third loss value.

3. The method of claim 2, wherein obtaining the second loss value according to the depth map and the depth map sub-label comprises:

for at least one key point in the depth map, obtaining at least one second sub-loss value according to a local depth map and a local depth map sub-label corresponding to each key point in the at least one key point, wherein the local depth map sub-label is determined according to each key point and the depth map sub-label; and

obtaining the second loss value according to the at least one second sub-loss value.

4. The method of claim 2, wherein obtaining the third loss value according to the relationship graph and the relationship graph sub-label comprises:

for at least one key point in the relationship graph, obtaining at least one third sub-loss value according to a local relationship graph and a local relationship graph sub-label corresponding to each key point in the at least one key point, wherein the local relationship graph sub-label is determined according to each key point and the relationship graph sub-label; and

obtaining the third loss value according to the at least one third sub-loss value.

5. The method of claim 2, wherein obtaining the difference value according to the first loss value, the second loss value, and the third loss value comprises:

performing a weighted summation of the first loss value, the second loss value, and the third loss value to obtain the difference value.

6. A method of target detection, comprising:

performing target detection on a target image to obtain a target detection result, wherein the target detection result comprises a relation graph of the target image, and edges in the relation graph are used for representing relation information among a plurality of key points in the target image; and

determining target information of the plurality of key points in the target image according to the target detection result.

7. The method of claim 6, wherein the target detection result further comprises a heat map of the target image and a depth map of the target image,

the determining the target information of the plurality of key points in the target image according to the target detection result comprises:

fusing position information of K key points in the plurality of key points and depth information of the K key points to obtain first fusion information, wherein the position information of the K key points is determined according to the heat map, the depth information of the K key points is determined according to the depth map, and K is an integer greater than or equal to 1; and

fusing the first fusion information and offset information of the K key points to obtain the target information, wherein the offset information of the K key points is determined according to the relationship graph, and the offset information of each key point is used for representing the offsets between that key point and the plurality of key points.

8. The method of claim 7, wherein fusing the first fusion information and the offset information of the K key points to obtain the target information comprises:

performing dimension increasing processing on the first fusion information to obtain dimension-increased first fusion information;

fusing the dimension-increased first fusion information and the offset information of the K key points to obtain second fusion information; and

performing dimension-reduction processing on the second fusion information based on the confidence of each of the K key points to obtain the target information.

9. A training apparatus for a detection model, comprising:

a first obtaining module, configured to input a sample image into a detection model to obtain an output result, wherein the output result comprises a relationship graph of the sample image, and edges in the relationship graph are used for representing relationship information among a plurality of key points in the sample image;

a second obtaining module, configured to obtain a difference value according to a label of the sample image and the output result; and

a training module, configured to train the detection model according to the difference value.

10. The apparatus of claim 9, wherein the output result further comprises a heat map of the sample image and a depth map of the sample image, and the label comprises a heat map sub-label, a depth map sub-label, and a relationship graph sub-label,

the second obtaining module includes:

a first obtaining sub-module, configured to obtain a first loss value according to the heat map and the heat map sub-label;

a second obtaining sub-module, configured to obtain a second loss value according to the depth map and the depth map sub-label;

a third obtaining sub-module, configured to obtain a third loss value according to the relationship graph and the relationship graph sub-label; and

a fourth obtaining sub-module, configured to obtain the difference value according to the first loss value, the second loss value, and the third loss value.

11. The apparatus of claim 10, wherein the second obtaining submodule comprises:

a first obtaining unit, configured to obtain, for at least one keypoint in the depth map, at least one second sub-loss value according to a local depth map and a local depth map sub-label corresponding to each keypoint in the at least one keypoint, where the local depth map sub-label is determined according to each keypoint and the depth map sub-label; and

a second obtaining unit, configured to obtain the second loss value according to the at least one second sub-loss value.

12. The apparatus of claim 10, wherein the third obtaining submodule comprises:

a third obtaining unit, configured to obtain, for at least one key point in the relationship graph, at least one third sub-loss value according to a local relationship graph and a local relationship graph sub-label corresponding to each key point in the at least one key point, where the local relationship graph sub-label is determined according to each key point and the relationship graph sub-label; and

a fourth obtaining unit, configured to obtain the third loss value according to the at least one third sub-loss value.

13. The apparatus of claim 10, wherein the fourth obtaining submodule comprises:

a fifth obtaining unit, configured to perform a weighted summation of the first loss value, the second loss value, and the third loss value to obtain the difference value.

14. An object detection device comprising:

a target detection module, configured to perform target detection on a target image to obtain a target detection result, wherein the target detection result comprises a relationship graph of the target image, and edges in the relationship graph are used for representing relationship information among a plurality of key points in the target image; and

a determining module, configured to determine target information of the plurality of key points in the target image according to the target detection result.

15. The apparatus of claim 14, wherein the target detection result further comprises a heat map of the target image and a depth map of the target image,

the determining module comprises:

a first fusion sub-module, configured to fuse location information of K key points in the plurality of key points and depth information of the K key points to obtain first fusion information, where the location information of the K key points is determined according to the heat map, the depth information of the K key points is determined according to the depth map, and K is an integer greater than or equal to 1; and

a second fusion sub-module, configured to fuse the first fusion information and offset information of the K key points to obtain the target information, where the offset information of the K key points is determined according to the relationship graph, and the offset information of each key point is used to represent the offsets between that key point and the plurality of key points.

16. The apparatus of claim 15, wherein the second fusion submodule comprises:

a dimension-increasing unit, configured to perform dimension-increasing processing on the first fusion information to obtain dimension-increased first fusion information;

a fusion unit, configured to fuse the dimension-increased first fusion information and the offset information of the K key points to obtain second fusion information; and

a dimension-reduction unit, configured to perform dimension-reduction processing on the second fusion information based on the confidence of each of the K key points to obtain the target information.

17. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 8.

18. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 8.

19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.