WO2020215974A1 - Method and device for human body detection - Google Patents
Method and device for human body detection
- Publication number
- WO2020215974A1 (PCT/CN2020/081314, CN2020081314W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- human body
- candidate
- body image
- candidate human
- key points
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/255—Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/757—Matching configurations of points or features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/759—Region-based matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20076—Probabilistic image processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Definitions
- the embodiments of the present application relate to the field of computer technology, and in particular to methods and devices for human detection.
- human body detection is widely used in fields such as national defense and military affairs, public transportation, social security, and commercial applications.
- human body detection refers to detecting and locating human bodies in an image and returning the coordinates of their bounding rectangles.
- human body detection is the basis of human body posture analysis, human body behavior analysis, and the like.
- the existing human body detection methods are mainly performed through human body detection models.
- the embodiments of the present application propose methods and devices for human detection.
- some embodiments of the present application provide a method for human body detection.
- the method includes: obtaining a set of candidate human body image regions in a target image based on a human body detection model;
- for each candidate human body image region in the set: obtaining, based on a single-person human body key point detection model, the position information and confidence of the candidate human body key points in the candidate human body image region;
- determining, according to the human body contour information in the candidate human body image region and the acquired position information, the candidate human body key points that fall within the human body contour;
- determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour;
- determining the human body image region from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set.
- the method further includes: determining the candidate human body key points in the determined human body image region as the human body key points.
- obtaining a set of candidate human body image regions in the target image based on the human body detection model includes: obtaining the set of candidate human body image regions in the target image based on the human body detection model, together with the confidence that each candidate human body image region in the set is a human body image region;
- determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour includes: determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region.
- determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region includes: performing a weighted summation, according to preset weights, over the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate human body image region is a human body image region, and the sum of the confidences of the candidate human body key points outside the human body contour, to obtain the confidence score of the candidate human body image region, where the weight set for the confidence sum of the candidate human body key points within the human body contour is greater than the weight set for the confidence sum of the candidate human body key points outside the human body contour.
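As an illustrative sketch (not the patent's implementation), the weighted summation described above could look like the following. The function name and the concrete weight values are assumptions; the only constraint taken from the text is that the within-contour weight exceeds the outside-contour weight:

```python
def region_confidence_score(inside_confs, outside_confs, region_conf,
                            w_inside=1.0, w_region=0.5, w_outside=0.2):
    """Weighted confidence score for a candidate human body image region.

    inside_confs:  confidences of candidate key points inside the body contour
    outside_confs: confidences of candidate key points outside the contour
    region_conf:   confidence that the region is a human body image region

    The weights here are illustrative; the patent only requires that the
    weight on the inside sum be greater than the weight on the outside sum.
    """
    assert w_inside > w_outside
    return (w_inside * sum(inside_confs)
            + w_region * region_conf
            + w_outside * sum(outside_confs))
```

With two confident in-contour key points and one stray outside point, the in-contour sum dominates the score, as intended.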
- acquiring the position information of the candidate human body key points in the candidate human body image region based on the single-person human body key point detection model includes: acquiring the position information of the candidate human body key points in the candidate human body image region based on a convolutional neural network model.
- a cascaded network structure is used to determine the position information of the candidate human body key points in the candidate human body image region, combining the global information and local information in the candidate human body image region.
- the cascaded network structure is obtained by sequentially cascading multiple identical network models, and a fully convolutional layer is connected to the end of the last network model in the cascaded network structure to output a heat map corresponding to each candidate human body key point, where the heat map represents the probability that the candidate human body key point exists at each pixel in the candidate human body image region, and one heat map corresponds to one candidate human body key point.
- obtaining the position information of the candidate human body key points in the candidate human body image region includes: for each heat map, determining the position of the pixel with the greatest probability in the candidate human body image region based on the heat map, and taking that position as the position of the candidate human body key point corresponding to the heat map.
- the confidence of the candidate human body key point corresponding to the heat map is the probability corresponding to the pixel with the greatest probability determined based on that heat map.
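A minimal sketch of reading a key point off a heat map, assuming the heat map is a 2-D array of per-pixel probabilities (pure Python here; a real implementation would run an argmax over a tensor):

```python
def keypoint_from_heatmap(heatmap):
    """Given a heat map (2-D list of probabilities, one value per pixel of
    the candidate human body image region), return ((row, col), confidence):
    the position of the pixel with the greatest probability, and that
    probability as the key point's confidence."""
    best_pos, best_prob = (0, 0), heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, p in enumerate(row):
            if p > best_prob:
                best_pos, best_prob = (r, c), p
    return best_pos, best_prob
```

One such call per heat map yields both the position information and the confidence used in the scoring step.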
- determining the human body image region from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set includes: determining the candidate human body image regions in the set whose confidence score exceeds a preset score threshold, or which rank among the top preset number when sorted by confidence score from high to low; searching the set of candidate human body image regions for regions that are redundant with the determined candidate human body image regions; and removing the found redundant candidate human body image regions.
- searching the set of candidate human body image regions for regions that are redundant with the determined candidate human body image region includes: for each candidate human body image region in the set: determining a contour center distance according to the human body contour information in the candidate human body image region and the human body contour information in the determined candidate human body image region; determining a similarity according to the distances between the candidate human body key points included in the candidate human body image region and those included in the determined candidate human body image region; and, in response to the determined contour center distance being less than a preset distance threshold and the determined similarity being greater than a preset similarity threshold, determining that the candidate human body image region is a redundant candidate human body image region.
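The redundancy test above can be sketched as follows. The patent does not fix a similarity formula, so the inverse-mean-distance form below, the threshold values, and the dictionary layout are all illustrative assumptions:

```python
import math

def contour_center(points):
    """Centroid of a list of (x, y) contour points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def keypoint_similarity(kps_a, kps_b):
    """Similarity derived from distances between corresponding key points.
    This inverse-mean-distance form is an assumption for illustration."""
    d = sum(math.dist(a, b) for a, b in zip(kps_a, kps_b)) / len(kps_a)
    return 1.0 / (1.0 + d)

def is_redundant(region, kept, dist_thresh=10.0, sim_thresh=0.5):
    """A region is redundant w.r.t. an already-kept region when the contour
    centers are close AND the key point configurations are similar."""
    dc = math.dist(contour_center(region["contour"]),
                   contour_center(kept["contour"]))
    sim = keypoint_similarity(region["keypoints"], kept["keypoints"])
    return dc < dist_thresh and sim > sim_thresh
```

Unlike plain box-overlap suppression, both conditions must hold, so two nearby but differently posed bodies are not merged.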
- some embodiments of the present application provide a device for human body detection, the device comprising: an acquiring unit configured to acquire a set of candidate human body image regions in a target image based on a human body detection model; a first determining unit configured to, for each candidate human body image region in the set: obtain the position information and confidence of the candidate human body key points in the candidate human body image region based on a single-person human body key point detection model; determine the candidate human body key points within the human body contour according to the human body contour information in the candidate human body image region and the acquired position information; and determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour; and a second determining unit configured to determine the human body image region from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set.
- the device further includes: a third determining unit configured to determine the candidate human body key points in the determined human body image area as the human body key points.
- the acquiring unit is further configured to: acquire the set of candidate human body image regions in the target image based on the human body detection model, together with the confidence that each candidate human body image region in the set is a human body image region; and the first determining unit is further configured to: determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region.
- the first determining unit is further configured to: perform a weighted summation, according to preset weights, over the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate human body image region is a human body image region, and the sum of the confidences of the candidate human body key points outside the human body contour, to obtain the confidence score of the candidate human body image region, where the weight set for the confidence sum of the key points within the human body contour is greater than the weight set for the confidence sum of the key points outside the human body contour.
- the first determining unit is further configured to obtain the position information of the candidate human body key points in the candidate human body image region based on the convolutional neural network model.
- the first determining unit is further configured to: adopt a cascaded network structure to determine the position information of the candidate human body key points in the candidate human body image region, combining the global information and local information in the candidate human body image region.
- the cascaded network structure is obtained by sequentially cascading multiple identical network models, and a fully convolutional layer is connected to the end of the last network model in the cascaded network structure to output a heat map corresponding to each candidate human body key point, where the heat map represents the probability that the candidate human body key point exists at each pixel in the candidate human body image region, and one heat map corresponds to one candidate human body key point.
- obtaining the position information of the candidate human body key points in the candidate human body image region includes: for each heat map, determining the position of the pixel with the greatest probability in the candidate human body image region based on the heat map, and taking that position as the position of the candidate human body key point corresponding to the heat map.
- the confidence of the candidate human body key point corresponding to the heat map is the probability corresponding to the pixel with the greatest probability determined based on that heat map.
- the second determining unit includes: a determining subunit configured to determine the candidate human body image regions in the set whose confidence score exceeds a preset score threshold, or which rank among the top preset number when sorted by confidence score from high to low; a searching subunit configured to search the set of candidate human body image regions for regions that are redundant with the determined candidate human body image regions; and a removing subunit configured to remove the found redundant candidate human body image regions.
- the searching subunit is further configured to: for each candidate human body image region in the set: determine a contour center distance according to the human body contour information in the candidate human body image region and the human body contour information in the determined candidate human body image region; determine a similarity according to the distances between the candidate human body key points included in the candidate human body image region and those included in the determined candidate human body image region; and, in response to the determined contour center distance being less than a preset distance threshold and the determined similarity being greater than a preset similarity threshold, determine that the candidate human body image region is a redundant candidate human body image region.
- some embodiments of the present application provide a device, including: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect.
- some embodiments of the present application provide a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the method as described in the first aspect is implemented.
- the method and device for human body detection provided by the embodiments of the application obtain a set of candidate human body image regions in a target image, and then, for each candidate human body image region in the set: obtain the position information and confidence of the candidate human body key points in the candidate human body image region based on a single-person human body key point detection model; determine the candidate human body key points within the human body contour according to the human body contour information in the candidate human body image region and the acquired position information; and determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour. Finally, the human body image region is determined from the set according to the confidence scores of the candidate human body image regions in the set. This provides a human body detection mechanism based on human body contour information and human body key points, and improves the accuracy of human body detection.
- Figure 1 is a diagram of some exemplary system architectures applicable to this application.
- Fig. 2 is a flowchart of an embodiment of the method for human detection according to the present application
- Figure 3A is a schematic diagram of the external structure of a single hourglass (hourglass) model
- Figure 3B is a schematic diagram of the internal structure of a single hourglass model
- Figure 3C is a schematic diagram of the cascaded network structure obtained after multiple hourglass models are sequentially cascaded
- Fig. 4 is a schematic diagram of an application scenario of the method for human detection according to the present application.
- Fig. 5 is a flowchart of another embodiment of the method for human detection according to the present application.
- Fig. 6 is a schematic structural diagram of an embodiment of a device for human body detection according to the present application.
- Fig. 7 is a schematic structural diagram of a computer system suitable for implementing a server or a terminal in some embodiments of the present application.
- FIG. 1 shows an exemplary system architecture 100 to which an embodiment of the method for human body detection or the apparatus for human body detection of the present application can be applied.
- the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
- the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
- the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables.
- the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
- Various client applications may be installed on the terminal devices 101, 102, 103, such as image collection applications, image processing applications, e-commerce applications, search applications, and so on.
- the terminal devices 101, 102, and 103 may be hardware or software.
- the terminal devices 101, 102, and 103 may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and so on.
- if the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above, and can be implemented as multiple software or software modules, or as a single software or software module. There is no specific limitation here.
- the server 105 may be a server that provides various services, such as a back-end server that provides support for applications installed on the terminal devices 101, 102, and 103.
- for example, the server 105 may obtain a set of candidate human body image regions in the target image based on a human body detection model; for each candidate human body image region in the set: obtain, based on the single-person human body key point detection model, the position information and confidence of the candidate human body key points in the candidate human body image region; determine the candidate human body key points within the human body contour according to the human body contour information in the candidate human body image region and the acquired position information; determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour; and determine the human body image region from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set.
- the method for human body detection can be executed by the server 105, and can also be executed by the terminal devices 101, 102, 103. Accordingly, the device for human body detection can be set on the server 105. It can also be installed in the terminal devices 101, 102, 103.
- the server can be hardware or software.
- the server can be implemented as a distributed server cluster composed of multiple servers, or as a single server.
- if the server is software, it can be implemented as multiple software or software modules (for example, to provide distributed services), or as a single software or software module. There is no specific limitation here.
- terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks and servers.
- the method for human detection includes the following steps:
- Step 201: based on the human body detection model, obtain a set of candidate human body image regions in the target image.
- the execution subject of the method for human detection may first obtain a set of candidate human image regions in the target image based on the human detection model.
- the target image can include any image for which human body detection is to be performed.
- the target image can be directly input to the human detection model, or the target image can be preprocessed first, and the preprocessed target image can be input to the human detection model.
- the human body detection model can be constructed by using target detection algorithms such as SSD, Faster R-CNN, YOLO, R-FCN, etc.
- the aforementioned Faster R-CNN, R-FCN, SSD, and YOLO are well-known technologies that are currently widely researched and applied, and will not be described in detail here.
- since the human body detection model must guarantee the recall rate, an algorithm with high human body detection accuracy, such as Faster R-CNN, may be chosen.
- for the target image, assume that the target image contains N people.
- Step 202: for each candidate human body image region in the set of candidate human body image regions: based on the single-person human body key point detection model, obtain the position information and confidence of the candidate human body key points in the candidate human body image region; determine the candidate human body key points within the human body contour according to the human body contour information in the candidate human body image region and the acquired position information; and determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour.
- the above-mentioned execution subject may, for each candidate human body image region in the set acquired in step 201: obtain the position information and confidence of the candidate human body key points in the candidate human body image region based on the single-person human body key point detection model; determine the candidate human body key points within the human body contour according to the human body contour information in the candidate human body image region and the obtained position information; and determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour. Since there is only one person in each candidate human body image region obtained by the human body detection model, the single-person human body key point detection model is used to obtain the candidate human body key points.
- the key point estimation is a regression problem.
- a model such as convolutional neural network (CNN) can be used to perform regression analysis to determine the position information of the candidate human body key points in the candidate human body image area.
- the position information may be, for example, coordinate information. Because the key points of the human body differ in scale, for joints such as the head, neck, and shoulders, which are relatively distinct and unlikely to perform complex movements, a fairly accurate estimate can be obtained directly with a CNN model. For key points that are easily occluded or invisible, such as hips, wrists, and ankles, it is necessary to use local information and increase the receptive field to obtain their accurate positions.
- the front-and-back cascaded network structure may be obtained by sequentially cascading multiple identical network models; for example, it may be multiple hourglass models cascaded in sequence.
- the following is an example of a cascaded network structure formed by cascading multiple hourglass models.
- Figure 3A is a schematic diagram of the external structure of a single hourglass model
- Figure 3B is a schematic diagram of the internal structure of a single hourglass model.
- the hourglass model includes several residual network modules. The entire structure is symmetric. Low-resolution features are obtained through downsampling, high-resolution features are obtained through upsampling, and feature maps are added element by element.
- at the end of the hourglass model, two 1*1 fully convolutional layers can be connected to output a heat map for each joint (that is, each candidate human body key point); one heat map corresponds to one candidate human body key point. The difference between the heat map output by the hourglass module and the ground-truth heat map of the corresponding key point is computed to obtain the loss value for that candidate human body key point.
- multiple hourglass models can be cascaded to obtain a stacked hourglass model, as shown in Figure 3C. Since the heat map represents the probability that a candidate human body key point exists at each pixel in the candidate human body image region, the position of the pixel with the largest probability on each heat map output by the last hourglass model is the position coordinate of the corresponding candidate human body key point, and that maximum probability value is the confidence of the candidate human body key point.
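The symmetric down/up-sampling with element-wise addition can be sketched without any learned weights. Real hourglass modules apply residual convolution blocks at every scale, so the NumPy toy below only illustrates the data flow of a single hourglass and of stacking identical modules in sequence (feature sizes must be divisible by 2**depth):

```python
import numpy as np

def hourglass(feat, depth=2):
    """Toy single hourglass: symmetric down/up sampling with element-wise
    skip additions. No learned weights; purely a data-flow illustration."""
    if depth == 0:
        return feat
    skip = feat                                   # feature kept at this resolution
    down = feat[::2, ::2]                         # downsample (stride-2 subsampling)
    inner = hourglass(down, depth - 1)            # recurse to lower resolution
    up = np.repeat(np.repeat(inner, 2, 0), 2, 1)  # nearest-neighbor upsample
    return skip + up                              # element-wise addition of feature maps

def stacked_hourglass(feat, n_stacks=2):
    """Cascade identical hourglass modules front to back."""
    for _ in range(n_stacks):
        feat = hourglass(feat)
    return feat
```

The output keeps the input resolution, which is what lets each stacked module emit full-resolution heat maps for intermediate supervision.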
- the human body contour information may be information used to distinguish the human body from the background, for example, a binary image that distinguishes the human body from the background.
- contour detection may be performed on the candidate human body image regions in the set of candidate human body image regions, and is independent of the determination of the key points.
- Contour detection can be performed using existing semantic segmentation technology, or it can use an Encoder-Decoder structure similar to the hourglass structure.
- the output end of the network is a 1*1 fully convolutional layer, followed by a normalized exponential function loss (softmax loss) layer.
- Contour detection can provide weak supervision information for key point estimation, so even rough contour detection can meet the demand.
- using an Encoder-Decoder structure similar to the hourglass structure can reduce the requirements on the quality of labeled data and on network complexity.
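Given a binary contour mask of the kind described above (human body vs. background), splitting the candidate key points into within-contour and outside-contour sets reduces to indexing the mask. This is a sketch under that assumption; the function name and the out-of-bounds handling are illustrative choices:

```python
def split_keypoints_by_contour(mask, keypoints):
    """mask: 2-D binary list (1 = human body, 0 = background).
    keypoints: list of (row, col) candidate key point positions.
    Returns (inside, outside) lists; out-of-bounds points count as outside."""
    inside, outside = [], []
    for r, c in keypoints:
        if 0 <= r < len(mask) and 0 <= c < len(mask[0]) and mask[r][c] == 1:
            inside.append((r, c))
        else:
            outside.append((r, c))
    return inside, outside
```

The two returned lists feed directly into the weighted confidence-score computation, which is why even a rough contour suffices.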
- the correspondence relationship between the sum of the confidence levels of the candidate human body key points within the outline of the human body and the confidence score of the candidate human body image region may be preset.
- the set correspondence relationship may indicate that the greater the sum of the confidences of the candidate human body key points within the contour, the higher the confidence score of the candidate human body image region.
- the confidence, output by the human body detection model, that the candidate human body image region is a human body image region may also be considered. Since some human body key points may fall outside the human body contour, the sum of the confidences of the candidate human body key points outside the human body contour can also be taken into account, to further improve the accuracy of the determined confidence score.
- obtaining a set of candidate human body image regions in the target image based on the human body detection model includes: obtaining, based on the human body detection model, the set of candidate human body image regions in the target image, and the confidence that each candidate human body image region in the set is a human body image region; and determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour includes: determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region.
- the confidence score of the candidate human body image region may be determined as a weighted sum, with preset weights, of the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region. The specific weights can be set according to actual needs, or obtained through training using machine learning methods.
- determining the confidence score of the candidate human body image region based on the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region includes: performing, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate human body image region is a human body image region, and the sum of the confidences of the candidate human body key points outside the human body contour, to obtain the confidence score of the candidate human body image region.
- the weight set for the sum of the confidences of the candidate human body key points within the human body contour is greater than the weight set for the sum of the confidences of the candidate human body key points outside the human body contour.
- the specific weight can be set according to actual needs, or it can be obtained through training using a machine learning method.
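- The weighted summation described above can be sketched as follows. The weight values are purely illustrative (the patent leaves them to actual needs or to training); only the constraint that the in-contour weight exceeds the out-of-contour weight comes from the text:

```python
def confidence_score(conf_inside, region_conf, conf_outside,
                     w_inside=0.6, w_region=0.3, w_outside=0.1):
    """Weighted sum giving the confidence score of a candidate region.

    conf_inside / conf_outside: confidences of the candidate key points
    inside / outside the human body contour.
    region_conf: confidence (from the detection model) that the region
    is a human body image region.
    """
    # The text requires the in-contour weight to exceed the out-of-contour weight.
    assert w_inside > w_outside
    return (w_inside * sum(conf_inside)
            + w_region * region_conf
            + w_outside * sum(conf_outside))
```
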
- Step 203 Determine the human body image area from the candidate human body image area set according to the confidence score of the candidate human body image area in the candidate human body image area set.
- the above-mentioned execution subject may determine the human body image region from the candidate body image region set according to the confidence score of the candidate body image region in the candidate body image region set determined in step 202.
- the confidence scores of the candidate human body image regions can be used to determine the human body image region from the set of candidate human body image regions.
- the method further includes: determining the candidate human body key points in the determined human body image region as the human body key points.
- the parameters of the human body detection model in step 201, the parameters of the single-person human body key point detection model in step 202, the parameters involved in determining the human body contour information, and the parameters involved in determining the confidence score can be set manually, or can be obtained through training using machine learning methods.
- the sample data used may include sample pictures and annotation information, and the annotation information may include annotated human body image regions or human body key points.
- a sample picture can be used as input, and the coordinates of key points of the human body can be labeled as output, and one or more of the above parameters can be obtained through training.
- Preprocessing can include data cleaning and data augmentation.
- Data cleaning refers to the removal of erroneous and incomplete annotation data in the training data.
- Multi-person human body key point annotation data usually has human body image area annotation errors and key point annotation errors, including missing coordinates and incorrect coordinates.
- Data augmentation can expand the training data through rotation, resizing, cropping, flipping, and brightness changes applied to the original training data, making the model generalize better.
- the picture can be cropped without changing its aspect ratio, zero-filled at the edges, and then adjusted to a size of 256*256. When data augmentation is performed on an image, corresponding operations such as rotation, scale change, and flipping must also be performed on the labeled data.
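- The pad-to-square-then-resize step, with the matching transform applied to the labels, can be sketched as below. Nearest-neighbour scaling keeps the sketch dependency-free; the function name and the (x, y) label layout are assumptions:

```python
import numpy as np

def pad_and_resize(image, keypoints, size=256):
    """Zero-pad an image to a square without changing its aspect ratio,
    then scale it to size*size, applying the same transform to the key
    point coordinates (as the text requires for labeled data).

    image: (H, W, C) array; keypoints: (K, 2) float coordinates.
    Returns the padded/scaled image and the transformed key points.
    """
    h, w = image.shape[:2]
    side = max(h, w)
    padded = np.zeros((side, side) + image.shape[2:], dtype=image.dtype)
    padded[:h, :w] = image                                   # zero-fill the short edge
    scale = size / side
    idx = (np.arange(size) / scale).astype(int).clip(max=side - 1)
    resized = padded[idx][:, idx]                            # nearest-neighbour resize
    return resized, keypoints * scale                        # labels follow the same scale
```
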
- the method provided in the foregoing embodiments of the present application provides a human body detection mechanism based on human body contour information and human body key points, which improves the accuracy of human body detection.
- FIG. 4 is a schematic diagram of an application scenario of the method for human detection according to this embodiment.
- the server 301 inputs the target image 302 into the human body detection model 303 to obtain the set of candidate human body image regions in the target image 302. Then, for each candidate human body image region in the set, the server inputs it into the single-person human body key point detection model 304 to obtain the position information and confidences of the candidate human body key points in that region, determines the candidate human body key points within the human body contour according to the human body contour information in the region and the acquired position information, and determines the confidence score of the region according to the sum of the confidences of the candidate human body key points within the human body contour. Finally, the human body image region is determined from the set according to the confidence scores of the candidate human body image regions in the set.
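- The end-to-end flow in this scenario can be sketched as follows, with the detector, key point model, contour model, scorer, and selector left as stand-in callables; all names here are assumptions for illustration:

```python
def detect_human_bodies(image, detector, keypoint_model, contour_model,
                        scorer, selector):
    """Pipeline sketch: candidate regions -> per-region key points and
    contour -> confidence score -> final selection."""
    scored = []
    for region in detector(image):                     # candidate human body image regions
        positions, confidences = keypoint_model(region)
        mask = contour_model(region)                   # binary human-contour mask
        inside = [c for p, c in zip(positions, confidences) if mask[p]]
        scored.append((region, scorer(sum(inside))))   # score from in-contour confidence sum
    return selector(scored)
```
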
- FIG. 5 shows a flow 400 of still another embodiment of a method for human detection.
- the process 400 of the method for human detection includes the following steps:
- Step 401 Obtain a set of candidate human body image regions in the target image based on the human body detection model.
- the execution subject of the method for human detection may first obtain a set of candidate human image regions in the target image based on the human detection model.
- Step 402 For each candidate human body image region in the set of candidate human body image regions: obtain, based on the single-person human body key point detection model, the position information and confidences of the candidate human body key points in the candidate human body image region; determine the candidate human body key points within the human body contour according to the human body contour information in the candidate human body image region and the acquired position information; and determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour.
- the above-mentioned execution subject may, for the candidate human body image region in the candidate human body image region set acquired in step 401: obtain the candidate human body in the candidate human body image region based on the single-person body key point detection model The position information and confidence of the key points; determine the candidate human body key points in the human body contour according to the human body contour information in the candidate human body image area and the obtained position information; according to the candidate human body key points in the human body contour The sum of confidences determines the confidence score of the candidate body image region.
- Step 403 Determine the candidate human body image regions in the set whose confidence scores exceed a preset score threshold, or the top preset number of candidate human body image regions when sorted by confidence score from high to low.
- the above-mentioned execution subject may determine the candidate human body image regions in the set whose confidence scores exceed a preset score threshold, or the top preset number of candidate human body image regions sorted by confidence score from high to low.
- the score threshold and the preset number can be set according to actual needs. For example, the preset number can be 1, and the candidate human body image region with the highest confidence score is determined.
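- The two selection modes (score threshold, or top preset number) can be sketched as one helper; the function and parameter names are illustrative:

```python
def select_regions(scored_regions, score_threshold=None, top_k=None):
    """Select candidate regions either by a confidence-score threshold
    or by taking the top-k when sorted by score from high to low.

    scored_regions: list of (region, confidence_score) pairs.
    Exactly one of score_threshold / top_k should be given.
    """
    ranked = sorted(scored_regions, key=lambda rs: rs[1], reverse=True)
    if score_threshold is not None:
        return [r for r, s in ranked if s > score_threshold]
    return [r for r, _ in ranked[:top_k]]
```

With `top_k=1`, this reduces to the example in the text: only the candidate human body image region with the highest confidence score is kept.
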
- Step 404 Search the set of candidate human body image regions for candidate human body image regions that are redundant with the determined candidate human body image regions.
- the above-mentioned execution subject may search for redundant candidate human body image regions in the candidate human body image region set and the candidate human body image region determined in step 403.
- redundant detection frames inevitably appear.
- the redundancy can be determined by the object keypoint similarity (OKS) and/or the contour center distance.
- searching the set of candidate human body image regions for candidate human body image regions redundant with the determined candidate human body image regions includes: for a candidate human body image region in the set: determining a contour center distance according to the human body contour information in the candidate human body image region and the human body contour information in the determined candidate human body image region; determining a similarity according to the distances between the candidate human body key points included in the candidate human body image region and the candidate human body key points included in the determined candidate human body image region; and, in response to the determined contour center distance being less than a preset distance threshold and the determined similarity being greater than a preset similarity threshold, determining that the candidate human body image region is a redundant candidate human body image region.
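- The two-part redundancy test can be sketched as below. The OKS-style similarity here is a simplified stand-in (a Gaussian of the mean inter-key-point distance); the patent does not fix an exact formula, and all thresholds and names are illustrative:

```python
import math

def contour_center(pixels):
    """Centroid (row, col) of a human contour given as a list of
    (row, col) foreground pixels."""
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)

def is_redundant(kps_a, kps_b, center_a, center_b,
                 dist_threshold=20.0, sim_threshold=0.5, sigma=10.0):
    """A region is redundant w.r.t. a kept region when the contour
    centers are close AND the key point sets are similar."""
    center_dist = math.dist(center_a, center_b)
    mean_kp_dist = sum(math.dist(p, q) for p, q in zip(kps_a, kps_b)) / len(kps_a)
    similarity = math.exp(-(mean_kp_dist ** 2) / (2 * sigma ** 2))  # OKS-like score in (0, 1]
    return center_dist < dist_threshold and similarity > sim_threshold
```
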
- Step 405 Remove the found candidate body image area.
- the above-mentioned execution subject may remove the candidate human body image region found in step 404.
- the estimates for two persons standing close together will not be mistakenly eliminated, which improves the accuracy of human body detection.
- the accuracy of human body key point estimation over the whole image can be further improved.
- step 401 and step 402 are basically the same as the operations of step 201 and step 202, and will not be repeated here.
- the candidate human body image regions with higher confidence scores are determined from the set.
- this application provides an embodiment of a device for human detection.
- the device embodiment corresponds to the method embodiment shown in FIG.
- the device can be specifically applied to various electronic devices.
- the apparatus 500 for human body detection in this embodiment includes: an acquisition unit 501, a first determination unit 502, and a second determination unit 503.
- the acquiring unit is configured to acquire a set of candidate human body image regions in the target image based on the human body detection model
- the first determining unit is configured to, for each candidate human body image region in the set of candidate human body image regions: obtain, based on the single-person human body key point detection model, the position information and confidences of the candidate human body key points in the candidate human body image region; determine the candidate human body key points within the human body contour according to the human body contour information in the candidate human body image region and the acquired position information; and determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour.
- the second determining unit is configured to determine the human body image region from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set.
- the specific processing of the acquiring unit 501, the first determining unit 502, and the second determining unit 503 of the apparatus 500 for human body detection may refer to step 201, step 202, and step 203 in the embodiment corresponding to FIG. 2.
- the apparatus further includes: a third determining unit configured to determine the candidate human body key points in the determined human body image area as the human body key points.
- the acquiring unit is further configured to acquire, based on the human body detection model, the set of candidate human body image regions in the target image and the confidence that each candidate human body image region in the set is a human body image region.
- the first determining unit is further configured to determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region.
- the first determining unit is further configured to perform, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate human body image region is a human body image region, and the sum of the confidences of the candidate human body key points outside the human body contour, to obtain the confidence score of the candidate human body image region.
- the weight set for the sum of the confidences of the candidate human body key points within the human body contour is greater than the weight set for the sum of the confidences of the candidate human body key points outside the human body contour.
- the first determining unit is further configured to obtain the position information of the candidate human body key points in the candidate human body image region based on the convolutional neural network model.
- the first determining unit is further configured to: adopt a cascaded network structure, and determine the position information of the candidate human body key points in the candidate human body image region by combining global information and local information in the candidate human body image region.
- the cascaded network structure is obtained by cascading multiple identical network models in sequence, and the end of the last network model in the cascaded network structure is connected with a fully convolutional layer to output a heat map corresponding to each candidate human body key point.
- the heat map represents the probability that the candidate human body key point exists at each pixel in the candidate human body image region, where one heat map corresponds to one candidate human body key point.
- obtaining the position information of the candidate human body key points in the candidate human body image region includes: for each heat map: determining, based on the heat map, the position of the pixel with the greatest probability in the candidate human body image region, and determining that position as the position of the candidate human body key point corresponding to the heat map.
- the confidence of the candidate human body key points corresponding to the heat map is the probability corresponding to the pixel point with the greatest probability determined based on the heat map.
- the second determining unit includes: a determining sub-unit configured to determine the candidate human body image regions in the set whose confidence scores exceed a preset score threshold, or the top preset number of candidate human body image regions sorted by confidence score from high to low; a search sub-unit configured to search the set of candidate human body image regions for candidate human body image regions redundant with the determined candidate human body image regions; and a removal sub-unit configured to remove the found candidate human body image regions.
- the search sub-unit is further configured to: for a candidate human body image region in the set of candidate human body image regions: determine a contour center distance according to the human body contour information in the candidate human body image region and the human body contour information in the determined candidate human body image region; determine a similarity according to the distances between the candidate human body key points included in the candidate human body image region and the candidate human body key points included in the determined candidate human body image region; and, in response to the determined contour center distance being less than the preset distance threshold and the determined similarity being greater than the preset similarity threshold, determine that the candidate human body image region is a redundant candidate human body image region.
- the device provided by the foregoing embodiment of the present application obtains a set of candidate human body image regions in the target image based on a human body detection model; for each candidate human body image region in the set, it obtains, based on the single-person human body key point detection model, the position information and confidences of the candidate human body key points in the candidate human body image region, determines the candidate human body key points within the human body contour according to the human body contour information in the candidate human body image region and the acquired position information, and determines the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour; it then determines the human body image region from the set according to the confidence scores of the candidate human body image regions. This provides a human body detection mechanism based on human body contour information and human body key points, and improves the accuracy of human body detection.
- FIG. 7 shows a schematic structural diagram of a computer system 600 suitable for implementing a server or a terminal in the embodiments of the present application.
- the server or terminal shown in FIG. 6 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present application.
- the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage part 608 into a random access memory (RAM) 603.
- the RAM 603 also stores various programs and data required for the operation of the system 600.
- the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
- An input/output (I/O) interface 605 is also connected to the bus 604.
- the following components can be connected to the I/O interface 605: an input part 606 including a keyboard, a mouse, etc.; an output part 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker; a storage part 608 including a hard disk, etc.; and a communication part 609 including a network interface card such as a LAN card or a modem.
- the communication section 609 performs communication processing via a network such as the Internet.
- the drive 610 is also connected to the I/O interface 605 as needed.
- a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 610 as needed, so that the computer program read from it is installed into the storage part 608 as needed.
- the process described above with reference to the flowchart can be implemented as a computer software program.
- the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
- the computer program may be downloaded and installed from the network through the communication part 609, and/or installed from the removable medium 611.
- when the computer program is executed by the central processing unit (CPU) 601, the above-mentioned functions defined in the method of the present application are executed.
- the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
- the computer-readable medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- the computer-readable medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein.
- This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
- the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
- the computer program code used to perform the operations of this application can be written in one or more programming languages or a combination thereof.
- the programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the C language or similar programming languages.
- the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server.
- the remote computer can be connected to the user’s computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, using an Internet service provider to pass Internet connection).
- each block in the flowchart or block diagram may represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function.
- the functions marked in the block may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
- each block in the block diagram and/or flowchart, and combinations of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
- the units involved in the embodiments described in the present application can be implemented in software or hardware.
- the described unit may also be provided in the processor.
- for example, a processor may be described as including an acquiring unit, a first determining unit, and a second determining unit.
- the names of these units do not constitute a limitation on the unit itself under certain circumstances.
- the acquiring unit can also be described as "a unit configured to acquire a set of candidate human body image regions in a target image based on a human body detection model".
- the present application also provides a computer-readable medium, which may be included in the device described in the above-mentioned embodiments; or it may exist alone without being assembled into the device.
- the above-mentioned computer-readable medium carries one or more programs.
- when the one or more programs are executed by the device, the device is caused to: obtain a set of candidate human body image regions in the target image based on the human body detection model;
- for each candidate human body image region in the set: obtain, based on the single-person human body key point detection model, the position information and confidences of the candidate human body key points in the candidate human body image region;
- determine the candidate human body key points within the human body contour according to the human body contour information in the candidate human body image region and the acquired position information; determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour;
- determine the human body image region from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (24)
- A method for human body detection, the method comprising: acquiring a set of candidate human body image regions in a target image based on a human body detection model; for a candidate human body image region in the set of candidate human body image regions: acquiring, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate human body image region; determining candidate human body key points within a human body contour according to human body contour information in the candidate human body image region and the acquired position information of the candidate human body key points; and determining a confidence score of the candidate human body image region according to a sum of the confidences of the candidate human body key points within the human body contour; and determining a human body image region from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set.
- The method according to claim 1, wherein after the determining a human body image region from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set, the method further comprises: determining the candidate human body key points in the determined human body image region as human body key points.
- The method according to claim 1, wherein the acquiring a set of candidate human body image regions in a target image based on a human body detection model comprises: acquiring, based on the human body detection model, the set of candidate human body image regions in the target image, and a confidence that a candidate human body image region in the set is a human body image region; and the determining a confidence score of the candidate human body image region according to a sum of the confidences of the candidate human body key points within the human body contour comprises: determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region.
- The method according to claim 3, wherein the determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region comprises: performing, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate human body image region is a human body image region, and a sum of confidences of candidate human body key points outside the human body contour, to obtain the confidence score of the candidate human body image region, wherein a weight set for the sum of the confidences of the candidate human body key points within the human body contour is greater than a weight set for the sum of the confidences of the candidate human body key points outside the human body contour.
- The method according to claim 1, wherein the acquiring, based on a single-person human body key point detection model, position information of candidate human body key points in the candidate human body image region comprises: acquiring the position information of the candidate human body key points in the candidate human body image region based on a convolutional neural network model.
- The method according to claim 5, wherein a cascaded network structure is adopted, and the position information of the candidate human body key points in the candidate human body image region is determined by combining global information and local information in the candidate human body image region.
- The method according to claim 6, wherein the cascaded network structure is obtained by cascading a plurality of identical network models in sequence, an end of a last network model in the cascaded network structure is connected with a fully convolutional layer to output a heat map corresponding to each candidate human body key point, and the heat map represents a probability that the candidate human body key point exists at each pixel in the candidate human body image region, wherein one heat map corresponds to one candidate human body key point.
- The method according to claim 7, wherein acquiring the position information of the candidate human body key points in the candidate human body image region comprises: for each heat map: determining, based on the heat map, a position of a pixel with a greatest probability in the candidate human body image region, and determining the position as the position of the candidate human body key point corresponding to the heat map.
- The method according to claim 8, wherein, for each heat map, the confidence of the candidate human body key point corresponding to the heat map is the probability corresponding to the pixel with the greatest probability determined based on the heat map.
- The method according to any one of claims 1-9, wherein the determining a human body image region from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set comprises: determining candidate human body image regions in the set whose confidence scores exceed a preset score threshold, or a top preset number of candidate human body image regions sorted by confidence score from high to low; searching the set of candidate human body image regions for candidate human body image regions redundant with the determined candidate human body image regions; and removing the found candidate human body image regions.
- The method according to claim 10, wherein the searching the set of candidate human body image regions for candidate human body image regions redundant with the determined candidate human body image regions comprises: for a candidate human body image region in the set of candidate human body image regions: determining a contour center distance according to the human body contour information in the candidate human body image region and the human body contour information in the determined candidate human body image region; determining a similarity according to distances between the candidate human body key points included in the candidate human body image region and the candidate human body key points included in the determined candidate human body image region; and in response to the determined contour center distance being less than a preset distance threshold and the determined similarity being greater than a preset similarity threshold, determining that the candidate human body image region is a redundant candidate human body image region.
- An apparatus for human body detection, the apparatus comprising: an acquiring unit configured to acquire a set of candidate human body image regions in a target image based on a human body detection model; a first determining unit configured to, for a candidate human body image region in the set of candidate human body image regions: acquire, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate human body image region; determine candidate human body key points within a human body contour according to human body contour information in the candidate human body image region and the acquired position information of the candidate human body key points; and determine a confidence score of the candidate human body image region according to a sum of the confidences of the candidate human body key points within the human body contour; and a second determining unit configured to determine a human body image region from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set.
- The apparatus according to claim 12, wherein the apparatus further comprises: a third determining unit configured to determine the candidate human body key points in the determined human body image region as human body key points.
- The apparatus according to claim 12, wherein the acquiring unit is further configured to: acquire, based on the human body detection model, the set of candidate human body image regions in the target image, and a confidence that a candidate human body image region in the set is a human body image region; and the first determining unit is further configured to: determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region.
- The apparatus according to claim 14, wherein the first determining unit is further configured to: perform, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate human body image region is a human body image region, and a sum of confidences of candidate human body key points outside the human body contour, to obtain the confidence score of the candidate human body image region, wherein a weight set for the sum of the confidences of the candidate human body key points within the human body contour is greater than a weight set for the sum of the confidences of the candidate human body key points outside the human body contour.
- The apparatus according to claim 12, wherein the first determining unit is further configured to: acquire the position information of the candidate human body key points in the candidate human body image region based on a convolutional neural network model.
- The apparatus according to claim 16, wherein the first determining unit is further configured to: adopt a cascaded network structure, and determine the position information of the candidate human body key points in the candidate human body image region by combining global information and local information in the candidate human body image region.
- The apparatus according to claim 17, wherein the cascaded network structure is obtained by cascading a plurality of identical network models in sequence, an end of a last network model in the cascaded network structure is connected with a fully convolutional layer to output a heat map corresponding to each candidate human body key point, and the heat map represents a probability that the candidate human body key point exists at each pixel in the candidate human body image region, wherein one heat map corresponds to one candidate human body key point.
- The apparatus according to claim 18, wherein acquiring the position information of the candidate human body key points in the candidate human body image region comprises: for each heat map: determining, based on the heat map, a position of a pixel with a greatest probability in the candidate human body image region, and determining the position as the position of the candidate human body key point corresponding to the heat map.
- The apparatus according to claim 19, wherein, for each heat map, the confidence of the candidate human body key point corresponding to the heat map is the probability corresponding to the pixel with the greatest probability determined based on the heat map.
- The apparatus according to any one of claims 12-20, wherein the second determining unit comprises: a determining sub-unit configured to determine candidate human body image regions in the set whose confidence scores exceed a preset score threshold, or a top preset number of candidate human body image regions sorted by confidence score from high to low; a search sub-unit configured to search the set of candidate human body image regions for candidate human body image regions redundant with the determined candidate human body image regions; and a removal sub-unit configured to remove the found candidate human body image regions.
- The apparatus according to claim 21, wherein the search sub-unit is further configured to: for a candidate human body image region in the set of candidate human body image regions: determine a contour center distance according to the human body contour information in the candidate human body image region and the human body contour information in the determined candidate human body image region; determine a similarity according to distances between the candidate human body key points included in the candidate human body image region and the candidate human body key points included in the determined candidate human body image region; and in response to the determined contour center distance being less than a preset distance threshold and the determined similarity being greater than a preset similarity threshold, determine that the candidate human body image region is a redundant candidate human body image region.
- An electronic device, comprising: one or more processors; and a storage apparatus on which one or more programs are stored, wherein when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-11.
- A computer-readable medium on which a computer program is stored, wherein when the program is executed by a processor, the method according to any one of claims 1-11 is implemented.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/602,969 US20220198816A1 (en) | 2019-04-24 | 2020-03-26 | Method and apparatus for detecting body |
JP2021559665A JP7265034B2 (ja) | 2019-04-24 | 2020-03-26 | Method and apparatus for human body detection |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910331939.9A CN110046600B (zh) | 2019-04-24 | 2019-04-24 | Method and apparatus for human body detection |
CN201910331939.9 | 2019-04-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020215974A1 true WO2020215974A1 (zh) | 2020-10-29 |
Family
ID=67278895
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/081314 WO2020215974A1 (zh) | 2019-04-24 | 2020-03-26 | 用于人体检测的方法和装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20220198816A1 (zh) |
JP (1) | JP7265034B2 (zh) |
CN (1) | CN110046600B (zh) |
WO (1) | WO2020215974A1 (zh) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110046600B (zh) * | 2019-04-24 | 2021-02-26 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Method and apparatus for human body detection |
CN110490140A (zh) * | 2019-08-21 | 2019-11-22 | Shanghai Eye Control Technology Co., Ltd. | Screen display state discrimination method and apparatus, computer device and storage medium |
CN110705365A (zh) * | 2019-09-06 | 2020-01-17 | Beijing Dajia Internet Information Technology Co., Ltd. | Human body key point detection method and apparatus, electronic device and storage medium |
CN112699706A (zh) * | 2019-10-22 | 2021-04-23 | Guangzhou Hongdu Information Technology Co., Ltd. | Fall detection method, system and storage medium |
CN110889376A (zh) * | 2019-11-28 | 2020-03-17 | AInnovation (Nanjing) Technology Co., Ltd. | Deep learning-based safety helmet wearing detection system and method |
CN111027495A (zh) * | 2019-12-12 | 2020-04-17 | JD Digital Technology Holdings Co., Ltd. | Method and apparatus for detecting human body key points |
CN111079695B (zh) * | 2019-12-30 | 2021-06-01 | Beijing Huayu Information Technology Co., Ltd. | Human body key point detection and self-learning method and apparatus |
CN111507806B (zh) * | 2020-04-23 | 2023-08-29 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Virtual shoe try-on method, apparatus, device and storage medium |
CN111861998A (zh) * | 2020-06-24 | 2020-10-30 | Zhejiang Dahua Technology Co., Ltd. | Human body image quality evaluation method, apparatus, system and computer device |
WO2021204037A1 (zh) * | 2020-11-12 | 2021-10-14 | Ping An Technology (Shenzhen) Co., Ltd. | Face key point detection method and apparatus, storage medium and electronic device |
CN112528850B (zh) * | 2020-12-11 | 2024-06-04 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Human body recognition method, apparatus, device and storage medium |
CN112613382B (zh) * | 2020-12-17 | 2024-04-30 | Zhejiang Dahua Technology Co., Ltd. | Object integrity determination method and apparatus, storage medium and electronic apparatus |
CN113011242A (zh) * | 2020-12-31 | 2021-06-22 | Hangzhou Tuoshen Technology Co., Ltd. | Sit-up counting method, apparatus, electronic apparatus and storage medium |
CN113947635A (zh) * | 2021-10-15 | 2022-01-18 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Image positioning method, apparatus, electronic device and storage medium |
CN117290537B (zh) * | 2023-09-28 | 2024-06-07 | Tencent Technology (Shenzhen) Co., Ltd. | Image search method, apparatus, device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106355188A (zh) * | 2015-07-13 | 2017-01-25 | Alibaba Group Holding Limited | Image detection method and apparatus |
CN108009466A (zh) * | 2016-10-28 | 2018-05-08 | Beijing Megvii Technology Co., Ltd. | Pedestrian detection method and apparatus |
CN108122247A (zh) * | 2017-12-25 | 2018-06-05 | Beihang University | Video object detection method based on image saliency and a feature prior model |
US10083352B1 (en) * | 2017-05-22 | 2018-09-25 | Amazon Technologies, Inc. | Presence detection and detection localization |
CN108710868A (zh) * | 2018-06-05 | 2018-10-26 | China University of Petroleum (East China) | Human body key point detection system and method for complex scenes |
CN110046600A (zh) * | 2019-04-24 | 2019-07-23 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Method and apparatus for human body detection |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8229164B2 (en) * | 2006-07-10 | 2012-07-24 | Synthesis Corporation | Pedestrian tracking method and pedestrian tracking device |
KR101591779B1 (ko) * | 2009-03-17 | 2016-02-05 | Samsung Electronics Co., Ltd. | Apparatus and method for generating a skeleton model using motion data and image data |
JP5152231B2 (ja) * | 2010-03-12 | 2013-02-27 | Omron Corporation | Image processing method and image processing apparatus |
WO2017044550A1 (en) * | 2015-09-11 | 2017-03-16 | Intel Corporation | A real-time multiple vehicle detection and tracking |
US10096132B2 (en) * | 2016-01-27 | 2018-10-09 | Samsung Electronics Co., Ltd. | Method and apparatus for positioning feature point |
CN106295567B (zh) * | 2016-08-10 | 2019-04-12 | Tencent Technology (Shenzhen) Co., Ltd. | Key point positioning method and terminal |
US10204284B2 (en) * | 2016-12-06 | 2019-02-12 | Datalogic Ip Tech S.R.L. | Object recognition utilizing feature alignment |
CN109101859A (zh) * | 2017-06-21 | 2018-12-28 | Peking University Shenzhen Graduate School | Method for detecting pedestrians in images using a Gaussian penalty |
CN107609536A (zh) * | 2017-09-29 | 2018-01-19 | Baidu Online Network Technology (Beijing) Co., Ltd. | Information generation method and apparatus |
CN108205655B (zh) * | 2017-11-07 | 2020-08-11 | Beijing SenseTime Technology Development Co., Ltd. | Key point prediction method, apparatus, electronic device and storage medium |
CN108121952B (zh) * | 2017-12-12 | 2022-03-08 | Beijing Xiaomi Mobile Software Co., Ltd. | Face key point positioning method, apparatus, device and storage medium |
CN108038469B (zh) * | 2017-12-27 | 2019-10-25 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for detecting human body |
CN108898087B (zh) * | 2018-06-22 | 2020-10-16 | Tencent Technology (Shenzhen) Co., Ltd. | Training method, apparatus, device and storage medium for a face key point positioning model |
US11301718B2 (en) * | 2018-12-28 | 2022-04-12 | Vizit Labs, Inc. | Systems, methods, and storage media for training a machine learning model |
- 2019
  - 2019-04-24 CN CN201910331939.9A patent/CN110046600B/zh active Active
- 2020
  - 2020-03-26 US US17/602,969 patent/US20220198816A1/en active Pending
  - 2020-03-26 WO PCT/CN2020/081314 patent/WO2020215974A1/zh active Application Filing
  - 2020-03-26 JP JP2021559665A patent/JP7265034B2/ja active Active
Also Published As
Publication number | Publication date |
---|---|
JP7265034B2 (ja) | 2023-04-25 |
JP2022528176A (ja) | 2022-06-08 |
CN110046600B (zh) | 2021-02-26 |
CN110046600A (zh) | 2019-07-23 |
US20220198816A1 (en) | 2022-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020215974A1 (zh) | Method and apparatus for human body detection | |
US11093560B2 (en) | Stacked cross-modal matching | |
TWI773189B (zh) | Artificial intelligence-based object detection method, apparatus, device and storage medium | |
US20220129731A1 (en) | Method and apparatus for training image recognition model, and method and apparatus for recognizing image | |
Nayef et al. | Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt | |
WO2020199931A1 (zh) | Face key point detection method and apparatus, storage medium and electronic device | |
WO2021179205A1 (zh) | Medical image segmentation method, medical image segmentation apparatus and terminal device | |
CN108205655B (zh) | Key point prediction method, apparatus, electronic device and storage medium | |
WO2021227726A1 (zh) | Face detection and image detection neural network training method, apparatus and device | |
US11775574B2 (en) | Method and apparatus for visual question answering, computer device and medium | |
CN111488826B (zh) | Text recognition method, apparatus, electronic device and storage medium | |
WO2019119505A1 (zh) | Face recognition method and apparatus, computer apparatus and storage medium | |
WO2020224405A1 (zh) | Image processing method and apparatus, computer-readable medium and electronic device | |
CN111898696A (zh) | Generation method, apparatus, medium and device for pseudo labels and label prediction models | |
Wang et al. | FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection | |
US11768876B2 (en) | Method and device for visual question answering, computer apparatus and medium | |
WO2022012179A1 (zh) | Method, apparatus, device and computer-readable medium for generating a feature extraction network | |
CN114612759B (zh) | Video processing method, video query method, and model training method and apparatus | |
JP2023527615A (ja) | Training method for a target object detection model, target object detection method, device, electronic device, storage medium and computer program | |
US11195024B1 (en) | Context-aware action recognition by dual attention networks | |
CN110781413A (zh) | Point of interest determination method and apparatus, storage medium, electronic device | |
EP3690673A1 (en) | Method, apparatus, electronic device, and storage medium for image-based data processing | |
US20220391425A1 (en) | Method and apparatus for processing information | |
Wang et al. | Thermal images-aware guided early fusion network for cross-illumination RGB-T salient object detection | |
CN110796108A (zh) | Face quality detection method, apparatus, device and storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20796284; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2021559665; Country of ref document: JP; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 32PN | EP: public notification in the EP bulletin as the address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.03.2022) |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 20796284; Country of ref document: EP; Kind code of ref document: A1 |