WO2020215974A1 - Method and apparatus for human body detection - Google Patents

Method and apparatus for human body detection

Info

Publication number
WO2020215974A1
Authority
WO
WIPO (PCT)
Prior art keywords
human body
candidate
body image
candidate human
key points
Prior art date
Application number
PCT/CN2020/081314
Other languages
English (en)
French (fr)
Inventor
鲍慊
刘武
梅涛
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东尚科信息技术有限公司, 北京京东世纪贸易有限公司
Priority to US17/602,969 (US20220198816A1)
Priority to JP2021559665A (JP7265034B2)
Publication of WO2020215974A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/759 Region-based matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20076 Probabilistic image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • the embodiments of the present application relate to the field of computer technology, and in particular to methods and devices for human detection.
  • human detection is widely used in various fields such as national defense and military, public transportation, social security and commercial applications.
  • Human body detection refers to detecting and locating human bodies in an image and returning the coordinates of the rectangular bounding box of each human body.
  • Human body detection is the basis of human body posture analysis, human body behavior analysis, etc.
  • the existing human body detection methods are mainly performed through human body detection models.
  • the embodiments of the present application propose methods and devices for human detection.
  • some embodiments of the present application provide a method for human body detection.
  • The method includes: obtaining a set of candidate human body image regions in a target image based on a human body detection model;
  • for each candidate human body image region in the set: obtaining, based on a single-person human body key point detection model, the position information and confidence of the candidate human body key points in the candidate region;
  • determining, according to the human body contour information in the candidate region and the obtained position information, the candidate human body key points that lie within the human body contour;
  • determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour; and
  • determining the human body image region from the set of candidate human body image regions according to the confidence scores of the candidate regions in the set.
  • The method further includes: determining the candidate human body key points in the determined human body image region as human body key points.
  • Obtaining a set of candidate human body image regions in the target image based on the human body detection model includes: obtaining, based on the human body detection model, the set of candidate human body image regions in the target image and, for each candidate region in the set, the confidence that the region is a human body image region;
  • and determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour includes: determining the confidence score of the candidate region according to the sum of the confidences of the candidate key points within the human body contour and the confidence that the candidate region is a human body image region.
  • Determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate region is a human body image region includes: performing, according to preset weights, a weighted summation of (i) the sum of the confidences of the candidate key points within the human body contour, (ii) the confidence that the candidate region is a human body image region, and (iii) the sum of the confidences of the candidate key points outside the human body contour, to obtain the confidence score of the candidate region, where the weight set for the sum of confidences within the human body contour is greater than the weight set for the sum of confidences outside the human body contour.
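  • The weighted summation can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name, parameter names, and default weight values are hypothetical, and the only constraint taken from the text is that the in-contour weight exceeds the out-of-contour weight.

```python
from typing import Sequence


def region_confidence_score(
    inside_conf: Sequence[float],   # confidences of key points inside the contour
    outside_conf: Sequence[float],  # confidences of key points outside the contour
    detection_conf: float,          # detector's confidence that the region is a body
    w_inside: float = 1.0,          # hypothetical preset weight
    w_detection: float = 1.0,       # hypothetical preset weight
    w_outside: float = 0.3,         # hypothetical preset weight, must be < w_inside
) -> float:
    """Weighted sum of the three confidence terms for one candidate region."""
    if w_inside <= w_outside:
        raise ValueError("w_inside must be greater than w_outside")
    return (
        w_inside * sum(inside_conf)
        + w_detection * detection_conf
        + w_outside * sum(outside_conf)
    )


# Two key points inside the contour, one outside, detector confidence 0.7.
score = region_confidence_score([0.9, 0.8], [0.5], detection_conf=0.7)
```

  • With the assumed weights, key points inside the contour contribute fully to the score, while key points outside it are discounted rather than ignored.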
  • Acquiring the position information of the candidate human body key points in the candidate human body image region based on the single-person human body key point detection model includes: acquiring the position information of the candidate human body key points in the candidate region based on a convolutional neural network model.
  • A cascaded network structure is used to determine the position information of the candidate human body key points in the candidate human body image region by combining global information and local information in that region.
  • The cascaded network structure is obtained by cascading multiple identical network models in sequence, and a fully convolutional layer is connected to the end of the last network model in the cascaded network structure to output a heat map corresponding to each candidate human body key point. Each heat map gives, for each pixel in the candidate human body image region, the probability that the candidate key point is located at that pixel, and one heat map corresponds to one candidate human body key point.
  • Obtaining the position information of the candidate human body key points in the candidate human body image region includes, for each heat map: determining the position of the pixel with the greatest probability in the candidate region, according to the heat map, as the position of the candidate human body key point corresponding to that heat map.
  • The confidence of the candidate human body key point corresponding to a heat map is the probability of the pixel with the greatest probability determined from that heat map.
  • Determining the human body image region from the set of candidate human body image regions according to the confidence scores of the candidate regions in the set includes: determining the candidate regions in the set whose confidence score exceeds a preset score threshold, or, after sorting by confidence score from high to low, the top preset number of candidate regions; searching the set for candidate regions that are redundant with respect to the determined candidate regions; and removing the redundant candidate regions found.
  • Searching the set for candidate regions that are redundant with respect to a determined candidate region includes, for each candidate human body image region in the set: determining a contour center distance according to the human body contour information in the candidate region and the human body contour information in the determined candidate region; determining a similarity according to the distances between the candidate human body key points included in the candidate region and the candidate human body key points included in the determined candidate region; and, in response to the contour center distance being less than a preset distance threshold and the similarity being greater than a preset similarity threshold, determining that the candidate region is a redundant candidate human body image region.
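  • One possible reading of the selection and redundancy removal steps is a greedy pass over the candidates in descending score order, sketched below. The text does not specify how the key point similarity is computed, so the `exp(-mean distance / scale)` form, the `scale` value, the `Candidate` class, and the assumption that contour centers are already available are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Candidate:
    score: float             # confidence score of the candidate region
    contour_center: Point    # center of the body contour (assumed precomputed)
    keypoints: List[Point]   # same key point order for every candidate


def keypoint_similarity(a: Candidate, b: Candidate, scale: float = 10.0) -> float:
    # Assumed similarity: decays with the mean distance between matching key points.
    mean_d = sum(math.dist(p, q) for p, q in zip(a.keypoints, b.keypoints)) / len(a.keypoints)
    return math.exp(-mean_d / scale)


def select_regions(cands: List[Candidate], score_thr: float,
                   dist_thr: float, sim_thr: float) -> List[Candidate]:
    kept: List[Candidate] = []
    for c in sorted(cands, key=lambda c: c.score, reverse=True):
        if c.score <= score_thr:
            continue  # does not exceed the preset score threshold
        redundant = any(
            math.dist(c.contour_center, k.contour_center) < dist_thr
            and keypoint_similarity(c, k) > sim_thr
            for k in kept
        )
        if not redundant:
            kept.append(c)
    return kept


a = Candidate(2.5, (50, 80), [(50, 20), (40, 60), (60, 60)])
b = Candidate(2.1, (52, 81), [(51, 21), (41, 61), (61, 61)])    # near-duplicate of a
c = Candidate(2.3, (200, 90), [(200, 30), (190, 70), (210, 70)])
d = Candidate(0.4, (300, 90), [(300, 30), (290, 70), (310, 70)])  # low score
kept = select_regions([a, b, c, d], score_thr=1.0, dist_thr=10.0, sim_thr=0.5)
```

  • In this example `b` is dropped because both conditions hold against `a` (close contour centers and similar key points), and `d` is dropped by the score threshold.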
  • Some embodiments of the present application provide a device for human body detection, the device comprising: an acquiring unit configured to acquire a set of candidate human body image regions in a target image based on a human body detection model; a first determining unit configured to, for each candidate human body image region in the set: obtain the position information and confidence of the candidate human body key points in the candidate region based on a single-person human body key point detection model; determine the candidate human body key points within the human body contour according to the human body contour information in the candidate region and the acquired position information; and determine the confidence score of the candidate region according to the sum of the confidences of the candidate key points within the human body contour; and a second determining unit configured to determine the human body image region from the set of candidate human body image regions according to the confidence scores of the candidate regions in the set.
  • The device further includes: a third determining unit configured to determine the candidate human body key points in the determined human body image region as human body key points.
  • The acquiring unit is further configured to: acquire, based on the human body detection model, the set of candidate human body image regions in the target image and, for each candidate region in the set, the confidence that the region is a human body image region; and the first determining unit is further configured to: determine the confidence score of the candidate region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate region is a human body image region.
  • The first determining unit is further configured to: perform, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate region is a human body image region, and the sum of the confidences of the candidate human body key points outside the human body contour, to obtain the confidence score of the candidate region, where the weight set for the sum of confidences within the human body contour is greater than the weight set for the sum of confidences outside the human body contour.
  • the first determining unit is further configured to obtain the position information of the candidate human body key points in the candidate human body image region based on the convolutional neural network model.
  • The first determining unit is further configured to: adopt a cascaded network structure to determine the position information of the candidate human body key points in the candidate human body image region by combining global information and local information in that region.
  • The cascaded network structure is obtained by cascading multiple identical network models in sequence, and a fully convolutional layer is connected to the end of the last network model in the cascaded network structure to output a heat map corresponding to each candidate human body key point. Each heat map gives, for each pixel in the candidate human body image region, the probability that the candidate key point is located at that pixel, and one heat map corresponds to one candidate human body key point.
  • Obtaining the position information of the candidate human body key points in the candidate human body image region includes, for each heat map: determining the position of the pixel with the greatest probability in the candidate region, according to the heat map, as the position of the candidate human body key point corresponding to that heat map.
  • The confidence of the candidate human body key point corresponding to a heat map is the probability of the pixel with the greatest probability determined from that heat map.
  • The second determining unit includes: a determining subunit configured to determine the candidate regions in the set whose confidence score exceeds a preset score threshold, or, after sorting by confidence score from high to low, the top preset number of candidate regions; a searching subunit configured to search the set for candidate regions that are redundant with respect to the determined candidate regions; and a removing subunit configured to remove the redundant candidate regions found.
  • The searching subunit is further configured to, for each candidate human body image region in the set: determine a contour center distance according to the human body contour information in the candidate region and the human body contour information in the determined candidate region; determine a similarity according to the distances between the candidate human body key points included in the candidate region and the candidate human body key points included in the determined candidate region; and, in response to the contour center distance being less than a preset distance threshold and the similarity being greater than a preset similarity threshold, determine that the candidate region is a redundant candidate human body image region.
  • Some embodiments of the present application provide a device, including: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect.
  • some embodiments of the present application provide a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the method as described in the first aspect is implemented.
  • The method and device for human body detection provided by the embodiments of the present application obtain a set of candidate human body image regions in a target image and then, for each candidate region in the set: obtain the position information and confidence of the candidate human body key points in the candidate region based on a single-person human body key point detection model; determine the candidate human body key points within the human body contour according to the human body contour information in the candidate region and the acquired position information; and determine the confidence score of the candidate region according to the sum of the confidences of the candidate key points within the human body contour.
  • Finally, the human body image region is determined from the set of candidate human body image regions according to the confidence scores of the candidate regions in the set. This provides a human body detection mechanism based on human body contour information and human body key points, and improves the accuracy of human body detection.
  • Figure 1 is a diagram of some exemplary system architectures applicable to this application.
  • Fig. 2 is a flowchart of an embodiment of the method for human detection according to the present application
  • Figure 3A is a schematic diagram of the external structure of a single hourglass model
  • Figure 3B is a schematic diagram of the internal structure of a single hourglass model
  • Figure 3C is a schematic diagram of the cascaded network structure obtained after multiple hourglass models are sequentially cascaded
  • Fig. 4 is a schematic diagram of an application scenario of the method for human detection according to the present application.
  • Fig. 5 is a flowchart of another embodiment of the method for human detection according to the present application.
  • Fig. 6 is a schematic structural diagram of an embodiment of a device for human body detection according to the present application.
  • Fig. 7 is a schematic structural diagram of a computer system suitable for implementing a server or a terminal in some embodiments of the present application.
  • FIG. 1 shows an exemplary system architecture 100 to which an embodiment of the method for human body detection or the apparatus for human body detection of the present application can be applied.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
  • Various client applications may be installed on the terminal devices 101, 102, 103, such as image collection applications, image processing applications, e-commerce applications, search applications, and so on.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • When the terminal devices 101, 102, and 103 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and so on.
  • When the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above and implemented as multiple pieces of software or software modules, or as a single piece of software or software module. There is no specific limitation here.
  • the server 105 may be a server that provides various services, such as a back-end server that provides support for applications installed on the terminal devices 101, 102, and 103.
  • The server 105 may obtain a set of candidate human body image regions in the target image based on a human body detection model; for each candidate human body image region in the set: obtain the position information and confidence of the candidate human body key points in the candidate region based on the single-person human body key point detection model; determine the candidate human body key points within the human body contour according to the human body contour information in the candidate region and the acquired position information; and determine the confidence score of the candidate region according to the sum of the confidences of the candidate key points within the human body contour; and then determine the human body image region from the set according to the confidence scores of the candidate regions in the set.
  • The method for human body detection can be executed by the server 105 or by the terminal devices 101, 102, 103. Accordingly, the device for human body detection can be provided in the server 105 or in the terminal devices 101, 102, 103.
  • the server can be hardware or software.
  • When the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or as a single server.
  • When the server is software, it can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. There is no specific limitation here.
  • The numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • the method for human detection includes the following steps:
  • Step 201: based on the human body detection model, obtain a set of candidate human body image regions in the target image.
  • the execution subject of the method for human detection may first obtain a set of candidate human image regions in the target image based on the human detection model.
  • the target image can include any image for which human body detection is to be performed.
  • the target image can be directly input to the human detection model, or the target image can be preprocessed first, and the preprocessed target image can be input to the human detection model.
  • the human body detection model can be constructed by using target detection algorithms such as SSD, Faster R-CNN, YOLO, R-FCN, etc.
  • The aforementioned Faster R-CNN, R-FCN, SSD, and YOLO are well-known technologies that are currently widely researched and applied, and are not described in detail here.
  • Since the human body detection model must guarantee the recall rate, an algorithm with high human body detection accuracy, such as Faster R-CNN, may be chosen.
  • Assume that the target image contains N people.
  • Step 202: for each candidate human body image region in the set of candidate human body image regions: based on the single-person human body key point detection model, obtain the position information and confidence of the candidate human body key points in the candidate region; according to the human body contour information in the candidate region and the acquired position information, determine the candidate human body key points within the human body contour; and determine the confidence score of the candidate region according to the sum of the confidences of the candidate key points within the human body contour.
  • The above-mentioned execution subject may, for each candidate human body image region in the set acquired in step 201: obtain the position information and confidence of the candidate human body key points in the candidate region based on the single-person human body key point detection model; determine the candidate human body key points within the human body contour according to the human body contour information in the candidate region and the obtained position information; and determine the confidence score of the candidate region according to the sum of the confidences of the candidate key points within the human body contour. Since there is only one person in each candidate human body image region obtained by the human body detection model, the single-person human body key point detection model is used to obtain the candidate human body key points.
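  • The per-region scoring in step 202 can be sketched as a small pipeline. This is an illustrative skeleton only: the three models are passed in as callables, and the names `detect`, `estimate_keypoints`, `estimate_contour`, the region tuple layout, and the representation of the contour as a binary mask in region coordinates are assumptions made for the example.

```python
from typing import Any, Callable, List, Sequence, Tuple

Region = Tuple[int, int, int, int]   # (x, y, width, height), hypothetical layout
Keypoint = Tuple[int, int, float]    # (x, y, confidence) in region coordinates
Mask = Sequence[Sequence[int]]       # mask[y][x] == 1 inside the body contour


def in_contour_score(keypoints: Sequence[Keypoint], mask: Mask) -> float:
    """Sum the confidences of the key points that fall inside the contour mask."""
    total = 0.0
    for x, y, conf in keypoints:
        inside = 0 <= y < len(mask) and 0 <= x < len(mask[y]) and mask[y][x] == 1
        if inside:
            total += conf
    return total


def score_candidates(
    image: Any,
    detect: Callable[[Any], List[Region]],
    estimate_keypoints: Callable[[Any, Region], Sequence[Keypoint]],
    estimate_contour: Callable[[Any, Region], Mask],
) -> List[Tuple[Region, float]]:
    """Score every candidate region returned by the detector."""
    scores = []
    for region in detect(image):
        keypoints = estimate_keypoints(image, region)
        mask = estimate_contour(image, region)
        scores.append((region, in_contour_score(keypoints, mask)))
    return scores


# Stub models standing in for the trained networks.
mask = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
kps = [(1, 0, 0.9), (2, 2, 0.8), (3, 1, 0.6)]  # the last point lies outside the contour
result = score_candidates(
    image=None,
    detect=lambda img: [(0, 0, 4, 3)],
    estimate_keypoints=lambda img, r: kps,
    estimate_contour=lambda img, r: mask,
)
```

  • Here the score is the plain sum of in-contour confidences; any preset mapping from that sum to a score would replace the last step.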
  • the key point estimation is a regression problem.
  • a model such as convolutional neural network (CNN) can be used to perform regression analysis to determine the position information of the candidate human body key points in the candidate human body image area.
  • The position information may be, for example, coordinate information. Because the main key points of the human body differ in scale, joints such as the head, neck, and shoulders, which are distinct and unlikely to make complex movements, can be estimated fairly accurately by directly using a CNN model. For key points that are easily occluded or invisible, such as the hips, wrists, and ankles, it is necessary to use local information and enlarge the receptive field to further obtain their accurate positions.
  • The cascaded network structure may be obtained by sequentially cascading multiple identical network models, for example, multiple hourglass models cascaded one after another.
  • the following is an example of a cascaded network structure formed by cascading multiple hourglass models.
  • Figure 3A is a schematic diagram of the external structure of a single hourglass model
  • Figure 3B is a schematic diagram of the internal structure of a single hourglass model.
  • the hourglass model includes several residual network modules. The entire structure is symmetric. Low-resolution features are obtained through downsampling, high-resolution features are obtained through upsampling, and feature maps are added element by element.
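  • The symmetric downsample, upsample, and element-wise addition pattern can be illustrated with a toy recursion on a plain 2D grid. This sketch deliberately omits the residual network modules and all learned convolutions, and the choice of 2x2 average pooling and nearest-neighbour upsampling is an assumption, so it shows only the data flow, not a usable model.

```python
from typing import List

Grid = List[List[float]]


def downsample(x: Grid) -> Grid:
    """2x2 average pooling (assumes even height and width)."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def upsample(x: Grid) -> Grid:
    """Nearest-neighbour upsampling by a factor of 2."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-by-element addition of two feature maps of equal size."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def hourglass(x: Grid, depth: int) -> Grid:
    if depth == 0:
        return x
    skip = x                          # high-resolution branch (identity here)
    low = downsample(x)               # low-resolution features
    low = hourglass(low, depth - 1)   # recurse at the lower resolution
    return add(skip, upsample(low))   # merge the two resolutions


out = hourglass([[1.0] * 4 for _ in range(4)], depth=2)
```

  • In an actual hourglass model each branch would pass through residual modules before the addition; the identity branches here only make the symmetry visible.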
  • Two 1*1 fully convolutional layers can be connected to output a heat map for each joint (that is, each candidate human body key point), with one heat map corresponding to one candidate human body key point. The difference between the heat map output by the hourglass module and the real heat map of the corresponding key point is calculated to obtain the loss function value of the candidate human body key point.
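  • The text does not say how the difference between the output heat map and the real heat map is measured. A common choice is the mean squared error over pixels, sketched below purely as an assumption; the function name `heatmap_mse` is hypothetical.

```python
from typing import List

Heatmap = List[List[float]]


def heatmap_mse(pred: Heatmap, target: Heatmap) -> float:
    """Mean squared difference between a predicted and a ground-truth heat map."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("heat maps must have the same shape")
    n = sum(len(row) for row in pred)
    squared = sum(
        (p - t) ** 2
        for row_p, row_t in zip(pred, target)
        for p, t in zip(row_p, row_t)
    )
    return squared / n


# The prediction puts only half of the expected probability on the true pixel.
loss = heatmap_mse([[0.0, 0.5], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]])
```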
  • Multiple hourglass models can be cascaded to obtain a stacked hourglass model, as shown in Figure 3C. Since a heat map gives, for each pixel in the candidate human body image region, the probability that the candidate human body key point is located at that pixel, the position of the pixel with the largest probability is found on each heat map output by the last hourglass model. That position gives the position coordinates of the corresponding candidate human body key point, and the maximum probability value is the confidence of that key point.
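  • Reading a key point off a heat map amounts to an argmax over pixels. The sketch below assumes each heat map is a list of rows indexed as `heatmap[y][x]`; the key point names and values are made up for illustration.

```python
from typing import Dict, List, Tuple

Heatmap = List[List[float]]  # heatmap[y][x] = probability at pixel (x, y)


def keypoint_from_heatmap(heatmap: Heatmap) -> Tuple[Tuple[int, int], float]:
    """Return ((x, y), confidence) for the pixel with the greatest probability."""
    best_xy, best_p = (0, 0), float("-inf")
    for y, row in enumerate(heatmap):
        for x, p in enumerate(row):
            if p > best_p:
                best_xy, best_p = (x, y), p
    return best_xy, best_p


# One heat map per candidate key point.
heatmaps: Dict[str, Heatmap] = {
    "head": [[0.1, 0.2, 0.1], [0.1, 0.9, 0.3]],
    "left_wrist": [[0.05, 0.4, 0.1], [0.2, 0.1, 0.1]],
}
keypoints = {name: keypoint_from_heatmap(h) for name, h in heatmaps.items()}
```

  • The returned probability is used directly as the confidence of the key point, as the description states.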
  • the human body contour information may be information used to distinguish the human body from the background, for example, a binary image that distinguishes the human body from the background.
  • contour detection may be performed on the candidate human body image regions in the candidate human body image region set, and is independent of the determination of the key points.
  • Contour detection can be performed using existing semantic segmentation technology, or it can use an Encoder-Decoder structure similar to the hourglass structure.
  • the network output ends with a 1*1 fully convolutional layer, followed by a normalized exponential function loss (softmax loss) layer.
  • Contour detection can provide weak supervision information for key point estimation, so even rough contour detection can meet the demand.
  • the use of an Encoder-Decoder structure similar to the hourglass structure can reduce the requirements on annotation data quality and network complexity.
  • the correspondence relationship between the sum of the confidence levels of the candidate human body key points within the outline of the human body and the confidence score of the candidate human body image region may be preset.
  • the preset correspondence may specify that the greater the sum of the confidences of the candidate human body key points within the human body contour, the higher the confidence score of the candidate human body image region.
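To make the "key points within the human body contour" step concrete, the sketch below assumes the contour information is a binary mask (as the text suggests, a binary image separating body from background) in the same pixel coordinates as the key points. The data layout and function names are assumptions for illustration.

```python
def split_by_contour(keypoints, mask):
    """Split key points into those inside and outside the body contour.

    keypoints: list of ((x, y), confidence) in region pixel coordinates.
    mask: 2D list where mask[y][x] == 1 marks body pixels, 0 background.
    """
    inside, outside = [], []
    height = len(mask)
    width = len(mask[0]) if height else 0
    for (x, y), conf in keypoints:
        in_bounds = 0 <= x < width and 0 <= y < height
        if in_bounds and mask[y][x] == 1:
            inside.append(((x, y), conf))
        else:
            outside.append(((x, y), conf))
    return inside, outside


def confidence_sum(keypoints):
    """Sum of confidences over a list of ((x, y), confidence) entries."""
    return sum(conf for _, conf in keypoints)
```

With the simplest correspondence, `confidence_sum(inside)` could be used directly as the region's confidence score, since a larger sum then gives a higher score.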
  • the confidence that the candidate human image area output by the human detection model is the human image area may also be considered. Since some key points of the human body may fall outside the contour of the human body, the sum of the confidences of the candidate key points of the human body outside the contour of the human body can also be comprehensively considered to further improve the accuracy of the determined confidence score.
  • obtaining a set of candidate human body image regions in the target image based on the human body detection model includes: obtaining, based on the human body detection model, the set of candidate human body image regions in the target image and, for each candidate human body image region in the set, the confidence that the region is a human body image region; and determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour includes: determining the confidence score of the candidate human body image region according to that sum and the confidence that the candidate human body image region is a human body image region.
  • the confidence score of the candidate human body image region may be determined as a weighted sum, using preset weights, of the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region. The specific weights can be set according to actual needs, or obtained through training with a machine learning method.
  • determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region includes: performing, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate human body image region is a human body image region, and the sum of the confidences of the candidate human body key points outside the human body contour, to obtain the confidence score of the candidate human body image region, where the weight set for the sum of the confidences of the key points within the contour is greater than the weight set for the sum of the confidences of the key points outside the contour.
  • the specific weight can be set according to actual needs, or it can be obtained through training using a machine learning method.
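The weighted summation can be sketched as a single function. The default weight values below are placeholders chosen for illustration only; the source says just that the weights are preset or learned and that the inside-contour weight exceeds the outside-contour weight.

```python
def region_confidence_score(inside_sum, detection_conf, outside_sum,
                            w_inside=1.0, w_detection=1.0, w_outside=0.25):
    """Weighted sum of the three confidence terms for one candidate region.

    inside_sum: sum of key point confidences inside the body contour.
    detection_conf: detector's confidence that the region is a human body.
    outside_sum: sum of key point confidences outside the body contour.
    The weight defaults are illustrative assumptions, not source values.
    """
    if w_inside <= w_outside:
        raise ValueError("w_inside must be greater than w_outside")
    return (w_inside * inside_sum
            + w_detection * detection_conf
            + w_outside * outside_sum)
```

Setting `w_outside=0.0` reduces this to the two-term variant that combines only the inside-contour sum and the detection confidence.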
  • Step 203 Determine the human body image area from the candidate human body image area set according to the confidence score of the candidate human body image area in the candidate human body image area set.
  • the above-mentioned execution subject may determine the human body image region from the candidate body image region set according to the confidence score of the candidate body image region in the candidate body image region set determined in step 202.
  • the confidence scores of the candidate human body image regions can be used to determine the human body image region from the set of candidate human body image regions.
  • the method further includes: determining the candidate human body key points in the determined human body image region as the human body key points.
  • the parameters of the human body detection model in step 201, the parameters of the single-person human body key point detection model in step 202, the parameters involved in determining the human body contour information, and the parameters involved in determining the confidence score can be set manually, or obtained through training with a machine learning method.
  • the sample data used may include sample pictures and annotation information, and the annotation information may include annotated human body image regions or human body key points.
  • with a sample picture as input and the annotated coordinates of the human body key points as the expected output, one or more of the above parameters can be obtained through training.
  • Preprocessing can include data cleaning and data augmentation.
  • Data cleaning refers to the removal of erroneous and incomplete annotation data from the training data.
  • Multi-person human body key point annotation data usually contains human body image region annotation errors and key point annotation errors, including missing coordinates and incorrect coordinates.
  • Data augmentation can expand the training data by rotating, resizing, cropping, flipping, and changing the brightness of the original training data, so that the model generalizes better.
  • the picture can be cropped without changing its aspect ratio, and resized to 256*256 after its edges are zero-padded. Whenever augmentation is applied to the image, the corresponding operations such as rotation, scale change, and flipping must also be applied to the annotation data.
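The requirement that annotations follow the image through every transform can be illustrated with key point coordinates alone. The sketch below assumes an aspect-preserving resize with centered zero padding to a square target (one plausible reading of the 256*256 step) and a horizontal flip; the function names are hypothetical.

```python
def letterbox_params(width, height, target=256):
    """Scale and padding that fit an image into target x target pixels
    without changing its aspect ratio (padding is split evenly)."""
    scale = target / max(width, height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return scale, pad_x, pad_y


def transform_keypoints(keypoints, scale, pad_x, pad_y):
    """Apply the same scale and padding to annotated (x, y) key points."""
    return [(x * scale + pad_x, y * scale + pad_y) for x, y in keypoints]


def flip_keypoints_horizontal(keypoints, width):
    """Mirror (x, y) key points for a horizontally flipped image."""
    return [(width - 1 - x, y) for x, y in keypoints]
```

After a horizontal flip, left and right joint labels (for example left wrist and right wrist) generally need to be swapped as well; that step is omitted here because the source does not specify the key point ordering.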
  • the method provided in the foregoing embodiments of the present application provides a human body detection mechanism based on human body contour information and human body key points, which improves the accuracy of human body detection.
  • FIG. 4 is a schematic diagram of an application scenario of the method for human detection according to this embodiment.
  • the server 301 inputs the target image 302 into the human body detection model 303 to obtain the set of candidate human body image regions in the target image 302. For each candidate human body image region in the set, the server inputs the region into the single-person human body key point detection model 304 to obtain the position information and confidences of the candidate human body key points in the region, determines the candidate human body key points within the human body contour according to the human body contour information in the region and the acquired position information, and determines the confidence score of the region according to the sum of the confidences of the candidate human body key points within the contour. Finally, the server determines the human body image region from the set according to the confidence scores of the candidate human body image regions in the set.
  • FIG. 5 shows a flow 400 of still another embodiment of a method for human detection.
  • the process 400 of the method for human detection includes the following steps:
  • Step 401 Obtain a set of candidate human body image regions in the target image based on the human body detection model.
  • the execution subject of the method for human detection may first obtain a set of candidate human image regions in the target image based on the human detection model.
  • Step 402 For each candidate human body image region in the set of candidate human body image regions: obtain, based on the single-person human body key point detection model, the position information and confidences of the candidate human body key points in the region; determine the candidate human body key points within the human body contour according to the human body contour information in the region and the acquired position information; and determine the confidence score of the region according to the sum of the confidences of the candidate human body key points within the contour.
  • the above-mentioned execution subject may, for the candidate human body image region in the candidate human body image region set acquired in step 401: obtain the candidate human body in the candidate human body image region based on the single-person body key point detection model The position information and confidence of the key points; determine the candidate human body key points in the human body contour according to the human body contour information in the candidate human body image area and the obtained position information; according to the candidate human body key points in the human body contour The sum of confidences determines the confidence score of the candidate body image region.
  • Step 403 Determine the candidate human body image regions in the set whose confidence scores exceed a preset score threshold, or a preset number of top-ranked candidate human body image regions when the regions are sorted by confidence score from high to low.
  • the above-mentioned execution subject may determine the candidate human body image regions in the set whose confidence scores exceed a preset score threshold, or the preset number of top-ranked candidate human body image regions when the regions are sorted by confidence score from high to low.
  • the score threshold and the preset number can be set according to actual needs. For example, the preset number can be 1, and the candidate human body image region with the highest confidence score is determined.
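The two selection rules in step 403 (score threshold or top-ranked preset number) can be sketched as one helper that returns the indices of the selected regions. The function and parameter names are illustrative assumptions.

```python
def select_candidates(scores, score_threshold=None, top_k=None):
    """Return indices of selected candidate regions.

    scores: list of confidence scores, one per candidate region.
    Exactly one of score_threshold or top_k must be given.
    """
    if (score_threshold is None) == (top_k is None):
        raise ValueError("specify exactly one of score_threshold or top_k")
    if score_threshold is not None:
        return [i for i, s in enumerate(scores) if s > score_threshold]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]
```

With `top_k=1`, this returns only the region with the highest confidence score, matching the example in the text.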
  • Step 404 Search the set of candidate human body image regions for candidate human body image regions that are redundant with the determined candidate human body image regions.
  • the above-mentioned execution subject may search the set of candidate human body image regions for candidate human body image regions that are redundant with the candidate human body image regions determined in step 403.
  • redundant detection frames inevitably appear.
  • the redundancy can be determined by the object keypoint similarity (OKS) and/or the contour center distance.
  • searching the set of candidate human body image regions for regions that are redundant with the determined candidate human body image region includes: for each candidate human body image region in the set: determining the contour center distance according to the human body contour information in that region and the human body contour information in the determined candidate human body image region; determining the similarity according to the distances between the candidate human body key points included in that region and the candidate human body key points included in the determined candidate human body image region; and, in response to the contour center distance being less than a preset distance threshold and the similarity being greater than a preset similarity threshold, determining that the region is a redundant candidate human body image region.
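The redundancy test combines two conditions. The sketch below assumes the contour is a binary mask whose centroid serves as the contour center, and uses a simplified OKS-style similarity with a single falloff constant `kappa` for all key points (the standard OKS uses per-key-point constants). The names, `kappa`, and the `scale` argument are illustrative assumptions.

```python
import math


def contour_center(mask, origin=(0, 0)):
    """Centroid (x, y) of body pixels in a binary mask, shifted by the
    region's top-left corner so centers are comparable across regions."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        raise ValueError("mask contains no body pixels")
    return origin[0] + sum_x / count, origin[1] + sum_y / count


def keypoint_similarity(kps_a, kps_b, scale, kappa=0.1):
    """Simplified OKS: mean of exp(-d^2 / (2 * scale^2 * kappa^2))."""
    terms = [
        math.exp(-((xa - xb) ** 2 + (ya - yb) ** 2)
                 / (2.0 * scale ** 2 * kappa ** 2))
        for (xa, ya), (xb, yb) in zip(kps_a, kps_b)
    ]
    return sum(terms) / len(terms)


def is_redundant(center_a, center_b, kps_a, kps_b, scale,
                 distance_threshold, similarity_threshold):
    """Redundant only if the centers are close AND the poses are similar."""
    distance = math.hypot(center_a[0] - center_b[0], center_a[1] - center_b[1])
    similarity = keypoint_similarity(kps_a, kps_b, scale)
    return distance < distance_threshold and similarity > similarity_threshold
```

Requiring both conditions means that two different people standing close together, whose contour centers are near but whose key points differ, are not treated as duplicates.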
  • Step 405 Remove the found candidate body image area.
  • the above-mentioned execution subject may remove the candidate human body image region found in step 404.
  • the estimation of two persons who are close together will not be eliminated, which improves the accuracy of human body detection.
  • the accuracy of human body key point estimation over the whole image can be further improved.
  • step 401 and step 402 are basically the same as the operations of step 201 and step 202, and will not be repeated here.
  • the candidate body image region set with a higher confidence score is determined.
  • this application provides an embodiment of a device for human detection.
  • the device embodiment corresponds to the method embodiment shown in FIG.
  • the device can be specifically applied to various electronic devices.
  • the apparatus 500 for human body detection in this embodiment includes: an acquisition unit 501, a first determination unit 502, and a second determination unit 503.
  • the acquiring unit is configured to acquire a set of candidate human body image regions in the target image based on the human body detection model; the first determining unit is configured to, for each candidate human body image region in the set: obtain, based on the single-person human body key point detection model, the position information and confidences of the candidate human body key points in the region; determine the candidate human body key points within the human body contour according to the human body contour information in the region and the acquired position information; and determine the confidence score of the region according to the sum of the confidences of the candidate human body key points within the contour; the second determining unit is configured to determine the human body image region from the set according to the confidence scores of the candidate human body image regions in the set.
  • the specific processing of the acquiring unit 501, the first determining unit 502, and the second determining unit 503 of the apparatus 500 for human body detection may refer to step 201, step 202, and step 203 in the embodiment corresponding to FIG. 2.
  • the apparatus further includes: a third determining unit configured to determine the candidate human body key points in the determined human body image area as the human body key points.
  • the acquiring unit is further configured to acquire, based on the human body detection model, the set of candidate human body image regions in the target image and, for each candidate human body image region in the set, the confidence that the region is a human body image region; and the first determining unit is further configured to determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region.
  • the first determining unit is further configured to: perform, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate human body image region is a human body image region, and the sum of the confidences of the candidate human body key points outside the human body contour, to obtain the confidence score of the candidate human body image region, where the weight set for the sum of the confidences of the key points within the contour is greater than the weight set for the sum of the confidences of the key points outside the contour.
  • the first determining unit is further configured to obtain the position information of the candidate human body key points in the candidate human body image region based on the convolutional neural network model.
  • the first determining unit is further configured to: adopt a cascaded network structure, and determine the position information of the candidate human body key points in the candidate human body image region by combining global information and local information in the region.
  • the cascaded network structure is obtained by cascading multiple identical network models in sequence, and a fully convolutional layer is connected to the end of the last network model in the cascaded network structure to output the heat map corresponding to each candidate human body key point. The heat map represents the probability that the candidate human body key point is present at each pixel in the candidate human body image region, and one heat map corresponds to one candidate human body key point.
  • obtaining the position information of the candidate human body key points in the candidate human body image region includes: for each heat map: determining, based on the heat map, the position of the pixel with the greatest probability in the candidate human body image region, and taking that position as the position of the candidate human body key point corresponding to the heat map.
  • the confidence of the candidate human body key points corresponding to the heat map is the probability corresponding to the pixel point with the greatest probability determined based on the heat map.
  • the second determining unit includes: a determining subunit configured to determine the candidate human body image regions in the set whose confidence scores exceed a preset score threshold, or a preset number of top-ranked candidate human body image regions when sorted by confidence score from high to low; a search subunit configured to search the set of candidate human body image regions for candidate human body image regions that are redundant with the determined candidate human body image regions; and a removal subunit configured to remove the candidate human body image regions found.
  • the search subunit is further configured to: for each candidate human body image region in the set of candidate human body image regions: determine the contour center distance according to the human body contour information in that region and the human body contour information in the determined candidate human body image region; determine the similarity according to the distances between the candidate human body key points included in that region and the candidate human body key points included in the determined candidate human body image region; and, in response to the contour center distance being less than a preset distance threshold and the similarity being greater than a preset similarity threshold, determine that the region is a redundant candidate human body image region.
  • the device provided by the foregoing embodiment of the present application obtains a set of candidate human body image regions in the target image based on a human body detection model. For each candidate human body image region in the set, it obtains the position information and confidences of the candidate human body key points based on the single-person human body key point detection model, determines the candidate human body key points within the human body contour according to the human body contour information in the region and the obtained position information, and determines the confidence score of the region according to the sum of the confidences of the candidate human body key points within the contour. It then determines the human body image region from the set according to the confidence scores of the candidate human body image regions. This provides a human body detection mechanism based on human body contour information and human body key points, and improves the accuracy of human body detection.
  • FIG. 7 shows a schematic structural diagram of a computer system 600 suitable for implementing a server or a terminal in the embodiments of the present application.
  • the server or terminal shown in FIG. 7 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present application.
  • the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage part 608 into a random access memory (RAM) 603.
  • the RAM 603 also stores various programs and data required for the operation of the system 600.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • the following components can be connected to the I/O interface 605: an input part 606 including a keyboard, a mouse, etc.; an output part 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker; a storage part 608 including a hard disk, etc.; and a communication part 609 including a network interface card such as a LAN card, a modem, etc.
  • the communication part 609 performs communication processing via a network such as the Internet.
  • the drive 610 is also connected to the I/O interface 605 as needed.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 610 as needed, so that the computer program read from it is installed into the storage part 608 as needed.
  • the process described above with reference to the flowchart can be implemented as a computer software program.
  • the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication part 609, and/or installed from the removable medium 611.
  • when the computer program is executed by the central processing unit (CPU) 601, the above-mentioned functions defined in the method of the present application are executed.
  • the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • the computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein.
  • This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
  • the computer program code used to perform the operations of this application can be written in one or more programming languages or a combination thereof.
  • the programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the C language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram may represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the block may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and each combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present application can be implemented in software or hardware.
  • the described unit may also be provided in the processor.
  • a processor includes an acquiring unit, a first determining unit, and a second determining unit.
  • the names of these units do not constitute a limitation on the unit itself under certain circumstances.
  • the acquisition unit can also be described as "a unit configured to acquire a set of candidate human body image regions in the target image based on a human body detection model".
  • the present application also provides a computer-readable medium, which may be included in the device described in the above-mentioned embodiments; or it may exist alone without being assembled into the device.
  • the above-mentioned computer-readable medium carries one or more programs.
  • when the one or more programs are executed by the device, the device: obtains a set of candidate human body image regions in the target image based on the human body detection model; for each candidate human body image region in the set: obtains, based on the single-person human body key point detection model, the position information and confidences of the candidate human body key points in the region; determines the candidate human body key points within the human body contour according to the human body contour information in the region and the acquired position information; and determines the confidence score of the region according to the sum of the confidences of the candidate human body key points within the contour; and determines the human body image region from the set according to the confidence scores of the candidate human body image regions in the set.

Abstract

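The embodiments of the present application disclose a method and apparatus for human body detection. A specific implementation of the method includes: acquiring a set of candidate human body image regions in a target image; for each candidate human body image region in the set: acquiring position information and confidences of candidate human body key points in the candidate region; determining the candidate human body key points within the human body contour according to the human body contour information in the candidate region and the acquired position information; determining a confidence score of the candidate region according to the sum of the confidences of the candidate human body key points within the human body contour; and determining a human body image region from the set according to the confidence scores of the candidate regions in the set. This implementation provides a human body detection mechanism based on human body contour information and human body key points, and improves the accuracy of human body detection.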

Description

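Method and apparatus for human body detection
This patent application claims priority to Chinese patent application No. 201910331939.9, filed on April 24, 2019 and entitled "Method and apparatus for human body detection", the entire content of which is incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a method and apparatus for human body detection.
Background
With the rapid development of computer technology, digital image processing has advanced ever faster and now reaches into every aspect of daily life. Human body detection, one of the important research topics in digital image processing, is widely used in national defense and military, public transportation, public security, commercial applications, and other fields. Human body detection means detecting and locating the human bodies in a picture and returning the coordinates of their bounding rectangles. It is the basis of human pose analysis, human behavior analysis, and the like. Existing human body detection methods mainly rely on a human body detection model.
Summary
The embodiments of the present application propose a method and apparatus for human body detection.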
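In a first aspect, some embodiments of the present application provide a method for human body detection, the method including: acquiring, based on a human body detection model, a set of candidate human body image regions in a target image; for each candidate human body image region in the set: acquiring, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate region; determining the candidate human body key points within the human body contour according to the human body contour information in the candidate region and the acquired position information; determining a confidence score of the candidate region according to the sum of the confidences of the candidate human body key points within the human body contour; and determining a human body image region from the set according to the confidence scores of the candidate regions in the set.
In some embodiments, after the human body image region is determined from the set of candidate human body image regions according to the confidence scores of the candidate regions in the set, the method further includes: determining the candidate human body key points in the determined human body image region as human body key points.
In some embodiments, acquiring, based on the human body detection model, the set of candidate human body image regions in the target image includes: acquiring, based on the human body detection model, the set of candidate human body image regions in the target image and, for each candidate region in the set, a confidence that the candidate region is a human body image region; and determining the confidence score of the candidate region according to the sum of the confidences of the candidate human body key points within the human body contour includes: determining the confidence score of the candidate region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate region is a human body image region.
In some embodiments, determining the confidence score of the candidate region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate region is a human body image region includes: performing, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate region is a human body image region, and the sum of the confidences of the candidate human body key points outside the human body contour, to obtain the confidence score of the candidate region, where the weight set for the sum of the confidences of the key points within the contour is greater than the weight set for the sum of the confidences of the key points outside the contour.
In some embodiments, acquiring, based on the single-person human body key point detection model, the position information of the candidate human body key points in the candidate region includes: acquiring the position information of the candidate human body key points in the candidate region based on a convolutional neural network model.
In some embodiments, a cascaded network structure is adopted, and the position information of the candidate human body key points in the candidate region is determined by combining global information and local information in the candidate region.
In some embodiments, the cascaded network structure is obtained by cascading multiple identical network models in sequence, and a fully convolutional layer is connected to the end of the last network model in the cascaded network structure to output a heat map for each candidate human body key point, the heat map representing the probability that the candidate human body key point is present at each pixel of the candidate region, where one heat map corresponds to one candidate human body key point.
In some embodiments, acquiring the position information of the candidate human body key points in the candidate region includes: for each heat map: determining, based on the heat map, the position of the pixel with the highest probability in the candidate region, and taking that position as the position of the candidate human body key point corresponding to the heat map.
In some embodiments, for each heat map, the confidence of the candidate human body key point corresponding to the heat map is the probability of the highest-probability pixel determined from the heat map.
In some embodiments, determining the human body image region from the set of candidate human body image regions according to the confidence scores of the candidate regions in the set includes: determining the candidate regions in the set whose confidence scores exceed a preset score threshold, or a preset number of top-ranked candidate regions when sorted by confidence score from high to low; searching the set for candidate regions that are redundant with the determined candidate regions; and removing the candidate regions found.
In some embodiments, searching the set for candidate regions that are redundant with the determined candidate regions includes: for each candidate region in the set: determining a contour center distance according to the human body contour information in the candidate region and the human body contour information in the determined candidate region; determining a similarity according to the distances between the candidate human body key points included in the candidate region and the candidate human body key points included in the determined candidate region; and, in response to the contour center distance being less than a preset distance threshold and the similarity being greater than a preset similarity threshold, determining that the candidate region is a redundant candidate human body image region.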
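In a second aspect, some embodiments of the present application provide an apparatus for human body detection, the apparatus including: an acquiring unit configured to acquire, based on a human body detection model, a set of candidate human body image regions in a target image; a first determining unit configured to, for each candidate human body image region in the set: acquire, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate region; determine the candidate human body key points within the human body contour according to the human body contour information in the candidate region and the acquired position information; and determine a confidence score of the candidate region according to the sum of the confidences of the candidate human body key points within the human body contour; and a second determining unit configured to determine a human body image region from the set according to the confidence scores of the candidate regions in the set.
In some embodiments, the apparatus further includes: a third determining unit configured to determine the candidate human body key points in the determined human body image region as human body key points.
In some embodiments, the acquiring unit is further configured to: acquire, based on the human body detection model, the set of candidate human body image regions in the target image and, for each candidate region in the set, a confidence that the candidate region is a human body image region; and the first determining unit is further configured to: determine the confidence score of the candidate region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate region is a human body image region.
In some embodiments, the first determining unit is further configured to: perform, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate region is a human body image region, and the sum of the confidences of the candidate human body key points outside the human body contour, to obtain the confidence score of the candidate region, where the weight set for the sum of the confidences of the key points within the contour is greater than the weight set for the sum of the confidences of the key points outside the contour.
In some embodiments, the first determining unit is further configured to: acquire the position information of the candidate human body key points in the candidate region based on a convolutional neural network model.
In some embodiments, the first determining unit is further configured to: adopt a cascaded network structure, and determine the position information of the candidate human body key points in the candidate region by combining global information and local information in the candidate region.
In some embodiments, the cascaded network structure is obtained by cascading multiple identical network models in sequence, and a fully convolutional layer is connected to the end of the last network model in the cascaded network structure to output a heat map for each candidate human body key point, the heat map representing the probability that the candidate human body key point is present at each pixel of the candidate region, where one heat map corresponds to one candidate human body key point.
In some embodiments, acquiring the position information of the candidate human body key points in the candidate region includes: for each heat map: determining, based on the heat map, the position of the pixel with the highest probability in the candidate region, and taking that position as the position of the candidate human body key point corresponding to the heat map.
In some embodiments, for each heat map, the confidence of the candidate human body key point corresponding to the heat map is the probability of the highest-probability pixel determined from the heat map.
In some embodiments, the second determining unit includes: a determining subunit configured to determine the candidate regions in the set whose confidence scores exceed a preset score threshold, or a preset number of top-ranked candidate regions when sorted by confidence score from high to low; a search subunit configured to search the set for candidate regions that are redundant with the determined candidate regions; and a removal subunit configured to remove the candidate regions found.
In some embodiments, the search subunit is further configured to: for each candidate region in the set: determine a contour center distance according to the human body contour information in the candidate region and the human body contour information in the determined candidate region; determine a similarity according to the distances between the candidate human body key points included in the candidate region and the candidate human body key points included in the determined candidate region; and, in response to the contour center distance being less than a preset distance threshold and the similarity being greater than a preset similarity threshold, determine that the candidate region is a redundant candidate human body image region.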
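In a third aspect, some embodiments of the present application provide a device, including: one or more processors; and a storage apparatus storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in the first aspect.
In a fourth aspect, some embodiments of the present application provide a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in the first aspect.
The method and apparatus for human body detection provided by the embodiments of the present application acquire a set of candidate human body image regions in a target image and then, for each candidate human body image region in the set: acquire, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate human body image region; determine the candidate human body key points located within a human body contour according to human body contour information in the candidate human body image region and the acquired position information of the candidate human body key points; and determine a confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour. Finally, human body image regions are determined from the set according to the confidence scores of the candidate human body image regions in the set. This provides a human body detection mechanism based on human body contour information and human body key points, and improves the accuracy of human body detection.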
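Brief Description of the Drawings
Other features, objects and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which some embodiments of the present application may be applied;
Fig. 2 is a flowchart of one embodiment of the method for human body detection according to the present application;
Fig. 3A is a schematic diagram of the external structure of a single hourglass model;
Fig. 3B is a schematic diagram of the internal structure of a single hourglass model;
Fig. 3C is a schematic diagram of the cascaded network structure obtained by cascading a plurality of hourglass models in sequence;
Fig. 4 is a schematic diagram of an application scenario of the method for human body detection according to the present application;
Fig. 5 is a flowchart of another embodiment of the method for human body detection according to the present application;
Fig. 6 is a schematic structural diagram of one embodiment of the apparatus for human body detection according to the present application;
Fig. 7 is a schematic structural diagram of a computer system of a server or terminal suitable for implementing some embodiments of the present application.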
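Detailed Description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that the embodiments of the present application, and the features of those embodiments, may be combined with one another provided they do not conflict. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for human body detection or the apparatus for human body detection of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102 and 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or optical fiber cables.
A user may use the terminal devices 101, 102 and 103 to interact with the server 105 over the network 104, for example to receive or send messages. Various client applications may be installed on the terminal devices 101, 102 and 103, such as image acquisition applications, image processing applications, e-commerce applications and search applications.
The terminal devices 101, 102 and 103 may be hardware or software. When they are hardware, they may be various electronic devices with a display screen, including but not limited to smartphones, tablet computers, laptop computers and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as a plurality of pieces of software or software modules, or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, for example a back-end server that supports the applications installed on the terminal devices 101, 102 and 103. The server 105 may acquire, based on a human body detection model, a set of candidate human body image regions in a target image; for each candidate human body image region in the set: acquire, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate human body image region; determine the candidate human body key points located within a human body contour according to human body contour information in the candidate human body image region and the acquired position information; determine a confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour; and determine human body image regions from the set according to the confidence scores of the candidate human body image regions in the set.
It should be noted that the method for human body detection provided by the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102 and 103. Accordingly, the apparatus for human body detection may be arranged in the server 105 or in the terminal devices 101, 102 and 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server is software, it may be implemented as a plurality of pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks and servers, as the implementation requires.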
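With continued reference to Fig. 2, a flow 200 of one embodiment of the method for human body detection according to the present application is shown. The method for human body detection includes the following steps:
Step 201: acquire, based on a human body detection model, a set of candidate human body image regions in a target image.
In this embodiment, the execution body of the method for human body detection (for example, the server or terminal shown in Fig. 1) may first acquire, based on a human body detection model, a set of candidate human body image regions in the target image. The target image may be any image on which human body detection is to be performed. The target image may be input into the human body detection model directly, or it may first be preprocessed and the preprocessed image input into the model. The human body detection model may be built with an object detection algorithm such as SSD, Faster R-CNN, YOLO or R-FCN; these are well-known techniques that are widely studied and applied at present, and are not described further here. Because the human body detection model must guarantee recall, an algorithm with high human body detection accuracy, such as Faster R-CNN, may be chosen. Assuming the target image contains N persons, the set of candidate human body image regions produced by the human body detection model may include M candidate human body image regions, with M >= N provided that recall is guaranteed.
Step 202: for each candidate human body image region in the set: acquire, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate human body image region; determine the candidate human body key points located within a human body contour according to human body contour information in the candidate human body image region and the acquired position information; and determine a confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour.
In this embodiment, the execution body may, for each candidate human body image region in the set acquired in step 201: acquire, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate human body image region; determine the candidate human body key points located within a human body contour according to human body contour information in the candidate human body image region and the acquired position information; and determine a confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour. Because each candidate human body image region produced by the human body detection model contains only one person, a single-person human body key point detection model is used to obtain the candidate human body key points.
Key point estimation is a regression problem. A model such as a convolutional neural network (CNN) may be used for regression analysis to determine the position information of the candidate human body key points in the candidate human body image region, where the position information may be, for example, coordinates. The main key points of the human body differ in scale. For joints such as the head, neck and shoulders, which are relatively distinct and unlikely to perform complex movements, a single CNN model can directly produce fairly accurate estimates. For key points such as the hips, wrists and ankles, which are easily occluded or invisible, local information and an enlarged receptive field are needed to obtain accurate positions. An hourglass model, or a front-to-back cascaded network structure, may be used to combine global and local information and so improve the accuracy of key point estimation. The cascaded network structure may be obtained by cascading a plurality of identical network models in sequence, for example a plurality of hourglass models cascaded one after another.
The following description takes as an example a cascaded network structure formed by cascading a plurality of hourglass models one after another.
Fig. 3A is a schematic diagram of the external structure of a single hourglass model, and Fig. 3B is a schematic diagram of its internal structure. The hourglass model contains several residual network modules and is symmetric as a whole: it obtains low-resolution features by downsampling, obtains high-resolution features by upsampling, and adds feature maps element-wise. Two 1×1 fully convolutional layers may be connected to the end of the hourglass model to output a heat map for each joint (that is, for each candidate human body key point), with one heat map corresponding to one candidate human body key point. The loss function value for a candidate human body key point is obtained by computing the difference between the heat map output by the hourglass model and the corresponding ground-truth heat map of that key point. To improve the accuracy of key point estimation, a plurality of hourglass models may be cascaded in sequence to obtain a stacked hourglass model, as shown in Fig. 3C. Because a heat map represents the probability that a candidate human body key point is present at each pixel of the candidate human body image region, the position of the maximum-probability pixel in each heat map output by the last hourglass model gives the position coordinates of the corresponding candidate human body key point, and that maximum probability is the confidence of the candidate human body key point.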
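The heat map decoding described above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: it assumes the heat maps are available as a NumPy array of shape (K, H, W), and the function name `decode_heatmaps` is hypothetical.

```python
import numpy as np


def decode_heatmaps(heatmaps):
    """Decode K heat maps of shape (K, H, W) into key point positions and confidences.

    positions[k] is the (x, y) pixel with the largest value in heat map k,
    and confidences[k] is that largest value.
    """
    heatmaps = np.asarray(heatmaps, dtype=float)
    num_keypoints, height, width = heatmaps.shape
    flat = heatmaps.reshape(num_keypoints, -1)
    flat_idx = flat.argmax(axis=1)                        # index of the peak in each map
    ys, xs = np.unravel_index(flat_idx, (height, width))  # back to row/column
    positions = np.stack([xs, ys], axis=1)
    confidences = flat[np.arange(num_keypoints), flat_idx]
    return positions, confidences


# Toy input (assumed values): two 4x5 heat maps with one clear peak each.
toy = np.zeros((2, 4, 5))
toy[0, 1, 3] = 0.9
toy[1, 2, 0] = 0.6
positions, confidences = decode_heatmaps(toy)
```

Each row of `positions` is an (x, y) coordinate inside the candidate region, so an offset would still have to be added to express it in full-image coordinates.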
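In this embodiment, the human body contour information may be information that distinguishes the human body from the background, for example a binarized image separating the human body from the background. Contour detection may be performed on each candidate human body image region in the set, independently of key point determination. Contour detection may use existing semantic segmentation techniques, or an encoder-decoder structure similar to the hourglass structure, in which the end of the network output is a 1×1 fully convolutional layer followed by a normalized exponential function loss (softmax loss) layer. Contour detection provides weak supervision information for key point estimation, so even fairly coarse contour detection is sufficient. Compared with semantic segmentation, using an encoder-decoder structure similar to the hourglass structure lowers the requirements on annotation quality and network complexity.
In this embodiment, a correspondence between the sum of the confidences of the candidate human body key points within the human body contour and the confidence score of the candidate human body image region may be set in advance. For example, the correspondence may indicate that the larger the sum of the confidences of the candidate human body key points within the human body contour, the higher the confidence score of the candidate human body image region. In addition, when determining the confidence score of a candidate human body image region, the confidence output by the human body detection model that the candidate human body image region is a human body image region may also be taken into account. Because some human body key points may fall outside the human body contour, the sum of the confidences of the candidate human body key points outside the human body contour may also be considered, to further improve the accuracy of the determined confidence score.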
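As a rough sketch of how key points might be split by the contour, the example below assumes the contour information is a binary mask aligned with the candidate region and that key point positions are integer (x, y) pixel coordinates. The function name `split_by_contour` and the toy values are assumptions, not part of the application.

```python
import numpy as np


def split_by_contour(mask, positions, confidences):
    """Return (inside_sum, outside_sum) of key point confidences.

    mask is a binary (H, W) array, nonzero on the human body.
    Key points that fall outside the mask array are counted as outside the contour.
    """
    mask = np.asarray(mask).astype(bool)
    positions = np.asarray(positions, dtype=int)
    confidences = np.asarray(confidences, dtype=float)
    height, width = mask.shape
    xs, ys = positions[:, 0], positions[:, 1]
    in_bounds = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
    inside = np.zeros(len(positions), dtype=bool)
    inside[in_bounds] = mask[ys[in_bounds], xs[in_bounds]]
    return float(confidences[inside].sum()), float(confidences[~inside].sum())


# Toy input (assumed values): a 4x4 mask whose central 2x2 block is the body.
toy_mask = np.zeros((4, 4), dtype=np.uint8)
toy_mask[1:3, 1:3] = 1
inside_sum, outside_sum = split_by_contour(
    toy_mask, [(1, 1), (2, 2), (0, 3)], [0.9, 0.8, 0.4]
)
```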
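In some optional implementations of this embodiment, acquiring the set of candidate human body image regions in the target image based on the human body detection model includes: acquiring, based on the human body detection model, the set of candidate human body image regions in the target image, together with the confidence that each candidate human body image region in the set is a human body image region; and determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour includes: determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region.
In this implementation, determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region may consist of performing, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region. The specific weights may be set as actually required, or obtained by training with a machine learning method.
In some optional implementations of this embodiment, determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region includes: performing, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate human body image region is a human body image region, and the sum of the confidences of the candidate human body key points outside the human body contour, to obtain the confidence score of the candidate human body image region, where the weight set for the sum of the confidences of the candidate human body key points within the human body contour is greater than the weight set for the sum of the confidences of the candidate human body key points outside the human body contour. In this implementation, the specific weights may be set as actually required, or obtained by training with a machine learning method.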
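The weighted summation can be written out in a few lines. The sketch below is illustrative only: the weight values are arbitrary assumptions (the text leaves them to be set by hand or learned), and the function name `region_score` is hypothetical. The one constraint taken from the text is that the in-contour weight must exceed the out-of-contour weight.

```python
def region_score(inside_sum, outside_sum, detection_conf,
                 w_inside=1.0, w_outside=0.3, w_detection=1.0):
    """Weighted sum of the three terms that make up a region's confidence score.

    inside_sum / outside_sum: summed key point confidences inside / outside the contour.
    detection_conf: detector's confidence that the region is a human body image region.
    The default weights are placeholders, not values from the application.
    """
    if w_inside <= w_outside:
        raise ValueError("the in-contour weight must exceed the out-of-contour weight")
    return (w_inside * inside_sum
            + w_detection * detection_conf
            + w_outside * outside_sum)


# Toy values (assumed): 1.0 * 1.7 + 1.0 * 0.95 + 0.3 * 0.4
score = region_score(inside_sum=1.7, outside_sum=0.4, detection_conf=0.95)
```

Setting `w_outside` to zero reduces this to the two-term variant in which only the in-contour sum and the detector confidence are combined.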
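Step 203: determine human body image regions from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set.
In this embodiment, the execution body may determine human body image regions from the set of candidate human body image regions according to the confidence scores determined in step 202. Because the human body detection model must guarantee recall, redundant human body image regions inevitably appear in the set, so the confidence scores of the candidate human body image regions can be used to determine the human body image regions from the set.
In some optional implementations of this embodiment, after determining human body image regions from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set, the method further includes: determining the candidate human body key points in the determined human body image regions as human body key points.
In this embodiment, the parameters of the human body detection model in step 201, the parameters of the single-person human body key point detection model in step 202, the parameters involved in determining the human body contour information, and the parameters involved in determining the confidence score may be set manually or obtained by training with a machine learning method. The sample data used for training may include sample pictures and annotation information, and the annotation information may include annotated human body image regions or human body key points. As an example, the sample pictures may be used as input and the coordinates of the annotated human body key points as output, to train one or more of the above parameters.
Before training, the sample data may be preprocessed; preprocessing may include data cleaning and data augmentation. Data cleaning means removing erroneous or incomplete annotations from the training data. Multi-person human body key point annotation data commonly contain human body image region annotation errors and key point annotation errors, including missing and incorrect coordinates. Data augmentation may expand the original training data by rotation, resizing, cropping, flipping, changing the lighting brightness and similar means, which also gives the model better generalization. In the crop step of data augmentation, the picture may be cropped with its aspect ratio preserved, zero-padded at the edges, and then resized to 256×256. Whenever a picture is augmented, its annotations must undergo the corresponding rotation, scale change, flip and other operations.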
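The point that annotations must follow the image through augmentation can be made concrete with a small sketch. It assumes images are NumPy arrays and key points are (x, y) pixel coordinates; it zero-pads to a square to preserve the aspect ratio and then resizes with a simple nearest-neighbour lookup, which is a stand-in for whatever interpolation a real pipeline would use. The function names are hypothetical.

```python
import numpy as np


def pad_and_resize(image, keypoints, size=256):
    """Zero-pad an (H, W[, C]) image to a square, resize it to size x size,
    and apply the same shift and scale to the (x, y) key points."""
    image = np.asarray(image)
    keypoints = np.asarray(keypoints, dtype=float)
    height, width = image.shape[:2]
    side = max(height, width)
    top, left = (side - height) // 2, (side - width) // 2
    padded = np.zeros((side, side) + image.shape[2:], dtype=image.dtype)
    padded[top:top + height, left:left + width] = image
    src = np.arange(size) * side // size        # nearest-neighbour source indices
    resized = padded[src][:, src]
    scale = size / side
    return resized, (keypoints + [left, top]) * scale


def flip_horizontal(image, keypoints):
    """Mirror the image left-right and mirror the x coordinates to match.
    A real pipeline would also swap left/right joint labels; that is omitted here."""
    image = np.asarray(image)
    keypoints = np.asarray(keypoints, dtype=float).copy()
    keypoints[:, 0] = (image.shape[1] - 1) - keypoints[:, 0]
    return image[:, ::-1], keypoints


# Toy input (assumed values): a 100x50 RGB crop with one annotated key point.
crop = np.ones((100, 50, 3), dtype=np.uint8)
resized, new_kps = pad_and_resize(crop, [[10, 20]])
flipped, flipped_kps = flip_horizontal(resized, new_kps)
```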
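The method provided by the above embodiment of the present application provides a human body detection mechanism based on human body contour information and human body key points, and improves the accuracy of human body detection.
With continued reference to Fig. 4, Fig. 4 is a schematic diagram of an application scenario of the method for human body detection according to this embodiment. In the application scenario of Fig. 4, a server 301 inputs a target image 302 into a human body detection model 303 and acquires a set of candidate human body image regions in the target image 302. Then, for each candidate human body image region in the set, the server inputs the region into a single-person human body key point detection model 304, acquires position information and confidences of the candidate human body key points in the candidate human body image region, and determines the candidate human body key points located within a human body contour according to human body contour information in the candidate human body image region and the acquired position information. It then determines a confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour, and finally determines human body image regions from the set according to the confidence scores of the candidate human body image regions in the set.
With further reference to Fig. 5, a flow 400 of another embodiment of the method for human body detection is shown. The flow 400 of the method for human body detection includes the following steps:
Step 401: acquire, based on a human body detection model, a set of candidate human body image regions in a target image.
In this embodiment, the execution body of the method for human body detection (for example, the server or terminal shown in Fig. 1) may first acquire, based on a human body detection model, a set of candidate human body image regions in the target image.
Step 402: for each candidate human body image region in the set: acquire, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate human body image region; determine the candidate human body key points located within a human body contour according to human body contour information in the candidate human body image region and the acquired position information; and determine a confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour.
In this embodiment, the execution body may, for each candidate human body image region in the set acquired in step 401: acquire, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate human body image region; determine the candidate human body key points located within a human body contour according to human body contour information in the candidate human body image region and the acquired position information; and determine a confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour.
Step 403: determine the candidate human body image regions in the set whose confidence scores exceed a preset score threshold, or which rank within a preset number of top positions when sorted by confidence score from high to low.
In this embodiment, the execution body may determine the candidate human body image regions in the set whose confidence scores exceed a preset score threshold, or which rank within a preset number of top positions when sorted by confidence score from high to low. The score threshold and the preset number may be set as actually required. For example, the preset number may be 1, in which case the candidate human body image region with the highest confidence score is determined.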
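Step 403 offers two selection rules, a score threshold or a top-k cut. A minimal sketch, assuming scores are held in a plain list and regions are referred to by index (the function name `select_candidates` is hypothetical):

```python
def select_candidates(scores, score_threshold=None, top_k=None):
    """Return indices of the selected candidate regions, highest score first.

    Exactly one of score_threshold / top_k is expected; the threshold is
    checked first if both are given.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if score_threshold is not None:
        return [i for i in order if scores[i] > score_threshold]
    if top_k is not None:
        return order[:top_k]
    raise ValueError("provide score_threshold or top_k")


# Toy scores (assumed values) for four candidate regions.
toy_scores = [0.2, 2.7, 1.9, 0.8]
by_threshold = select_candidates(toy_scores, score_threshold=1.0)
best_only = select_candidates(toy_scores, top_k=1)
```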
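Step 404: search the set of candidate human body image regions for candidate human body image regions that are redundant with respect to the determined candidate human body image regions.
In this embodiment, the execution body may search the set of candidate human body image regions for candidate human body image regions that are redundant with respect to the candidate human body image regions determined in step 403. Because the human body detection model must guarantee recall, redundant detection boxes inevitably appear. Redundancy may be determined from the object keypoint similarity (OKS) and/or the contour center distance.
In some optional implementations of this embodiment, searching the set of candidate human body image regions for candidate human body image regions that are redundant with respect to the determined candidate human body image regions includes: for each candidate human body image region in the set: determining a contour center distance according to the human body contour information in the candidate human body image region and the human body contour information in the determined candidate human body image region; determining a similarity according to the distances between the candidate human body key points included in the candidate human body image region and the candidate human body key points included in the determined candidate human body image region; and, in response to the determined contour center distance being less than a preset distance threshold and the determined similarity being greater than a preset similarity threshold, determining that the candidate human body image region is a redundant candidate human body image region.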
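One way to express this redundancy test is sketched below. Several choices here are assumptions rather than details from the application: the contour center is taken as the centroid of a binary mask, the similarity is a simplified OKS-style score with a single falloff constant `kappa` for all key points (the usual OKS uses per-key-point constants), and all function names and thresholds are hypothetical.

```python
import numpy as np


def contour_center(mask, offset=(0, 0)):
    """Centroid (x, y) of the nonzero mask pixels, shifted by the region's
    top-left offset so that centers from different regions are comparable.
    The mask is assumed to be non-empty."""
    ys, xs = np.nonzero(mask)
    return np.array([xs.mean() + offset[0], ys.mean() + offset[1]])


def keypoint_similarity(kps_a, kps_b, scale, kappa=0.1):
    """Simplified OKS-style similarity in (0, 1]: 1 when key points coincide."""
    diff = np.asarray(kps_a, dtype=float) - np.asarray(kps_b, dtype=float)
    d2 = (diff ** 2).sum(axis=1)
    return float(np.exp(-d2 / (2.0 * (scale * kappa) ** 2)).mean())


def is_redundant(center_a, center_b, kps_a, kps_b, scale,
                 max_center_dist, min_similarity):
    """Redundant only if the centers are close AND the poses are similar."""
    center_dist = float(np.linalg.norm(np.asarray(center_a) - np.asarray(center_b)))
    similarity = keypoint_similarity(kps_a, kps_b, scale)
    return center_dist < max_center_dist and similarity > min_similarity


# Toy data (assumed values): a near-duplicate pose and a clearly different one.
base = np.array([[10.0, 10.0], [20.0, 30.0]])
near, far = base + 1.0, base + 50.0
dup = is_redundant(base.mean(0), near.mean(0), base, near, 100.0, 5.0, 0.5)
other = is_redundant(base.mean(0), far.mean(0), base, far, 100.0, 5.0, 0.5)
```

Requiring both conditions is what allows two people standing close together to survive: their contour centers may be near each other, but their key points will not match.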
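Step 405: remove the candidate human body image regions found.
In this embodiment, the execution body may remove the candidate human body image regions found in step 404. Removing redundant estimates of the same person, without eliminating the estimates of two persons who are close to each other, improves the accuracy of human body detection. When the detection result is subsequently used for human body key point estimation, the accuracy of multi-person key point estimation over the whole image can also be further improved.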
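Steps 403 to 405 can be combined into a single greedy pass, in the spirit of non-maximum suppression. This is one possible reading, not the procedure as claimed: regions are visited from highest to lowest score, and a region is dropped if it is redundant with any region already kept. The pairwise test is passed in as a function, and the names and toy data are hypothetical.

```python
def suppress_redundant(scores, is_redundant_pair):
    """Greedy redundancy removal.

    scores: list of confidence scores, one per candidate region.
    is_redundant_pair(i, j): returns True if regions i and j describe the same person.
    Returns the indices of the regions that are kept, highest score first.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = []
    for idx in order:
        if not any(is_redundant_pair(idx, k) for k in kept):
            kept.append(idx)
    return kept


# Toy data (assumed values): 1-D "centers" standing in for full regions;
# two regions count as redundant when their centers are less than 1 apart.
toy_centers = [0.0, 0.5, 10.0]
toy_scores = [0.9, 0.8, 0.7]
kept = suppress_redundant(
    toy_scores, lambda i, j: abs(toy_centers[i] - toy_centers[j]) < 1.0
)
```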
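In this embodiment, the operations of steps 401 and 402 are substantially the same as those of steps 201 and 202, and are not described again here.
As can be seen from Fig. 5, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for human body detection in this embodiment determines the candidate human body image regions with higher confidence scores in the set and then removes the regions that are redundant with respect to them, which further improves the accuracy of the determined human body image regions.
With further reference to Fig. 6, as an implementation of the methods shown in the above figures, the present application provides one embodiment of an apparatus for human body detection. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may be applied in various electronic devices.
As shown in Fig. 6, the apparatus 500 for human body detection of this embodiment includes an acquisition unit 501, a first determination unit 502 and a second determination unit 503. The acquisition unit is configured to acquire, based on a human body detection model, a set of candidate human body image regions in a target image. The first determination unit is configured to, for each candidate human body image region in the set: acquire, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate human body image region; determine the candidate human body key points located within a human body contour according to human body contour information in the candidate human body image region and the acquired position information; and determine a confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour. The second determination unit is configured to determine human body image regions from the set according to the confidence scores of the candidate human body image regions in the set.
In this embodiment, for the specific processing of the acquisition unit 501, the first determination unit 502 and the second determination unit 503 of the apparatus 500 for human body detection, reference may be made to steps 201, 202 and 203 in the embodiment corresponding to Fig. 2.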
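In some optional implementations of this embodiment, the apparatus further includes: a third determination unit configured to determine the candidate human body key points in the determined human body image regions as human body key points.
In some optional implementations of this embodiment, the acquisition unit is further configured to: acquire, based on the human body detection model, the set of candidate human body image regions in the target image, together with the confidence that each candidate human body image region in the set is a human body image region; and the first determination unit is further configured to: determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region.
In some optional implementations of this embodiment, the first determination unit is further configured to: perform, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate human body image region is a human body image region, and the sum of the confidences of the candidate human body key points outside the human body contour, to obtain the confidence score of the candidate human body image region, where the weight set for the sum of the confidences of the candidate human body key points within the human body contour is greater than the weight set for the sum of the confidences of the candidate human body key points outside the human body contour.
In some optional implementations of this embodiment, the first determination unit is further configured to: acquire the position information of the candidate human body key points in the candidate human body image region based on a convolutional neural network model.
In some optional implementations of this embodiment, the first determination unit is further configured to: use a cascaded network structure to determine the position information of the candidate human body key points in the candidate human body image region by combining global information and local information in that region.
In some optional implementations of this embodiment, the cascaded network structure is obtained by cascading a plurality of identical network models in sequence. A fully convolutional layer is connected to the end of the last network model in the cascaded network structure to output a heat map for each candidate human body key point. The heat map represents the probability that the candidate human body key point is present at each pixel of the candidate human body image region, and one heat map corresponds to one candidate human body key point.
In some optional implementations of this embodiment, acquiring the position information of the candidate human body key points in the candidate human body image region includes: for each heat map, determining, based on the heat map, the position of the pixel with the maximum probability in the candidate human body image region, and taking that position as the position of the candidate human body key point corresponding to the heat map.
In some optional implementations of this embodiment, for each heat map, the confidence of the candidate human body key point corresponding to the heat map is the probability of the maximum-probability pixel determined from the heat map.
In some optional implementations of this embodiment, the second determination unit includes: a determination subunit configured to determine the candidate human body image regions in the set whose confidence scores exceed a preset score threshold, or which rank within a preset number of top positions when sorted by confidence score from high to low; a search subunit configured to search the set for candidate human body image regions that are redundant with respect to the determined candidate human body image regions; and a removal subunit configured to remove the candidate human body image regions found.
In some optional implementations of this embodiment, the search subunit is further configured to: for each candidate human body image region in the set: determine a contour center distance according to the human body contour information in the candidate human body image region and the human body contour information in the determined candidate human body image region; determine a similarity according to the distances between the candidate human body key points included in the candidate human body image region and the candidate human body key points included in the determined candidate human body image region; and, in response to the determined contour center distance being less than a preset distance threshold and the determined similarity being greater than a preset similarity threshold, determine that the candidate human body image region is a redundant candidate human body image region.
The apparatus provided by the above embodiment of the present application acquires, based on a human body detection model, a set of candidate human body image regions in a target image; for each candidate human body image region in the set: acquires, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate human body image region; determines the candidate human body key points located within a human body contour according to human body contour information in the candidate human body image region and the acquired position information; determines a confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour; and determines human body image regions from the set according to the confidence scores of the candidate human body image regions in the set. This provides a human body detection mechanism based on human body contour information and human body key points, and improves the accuracy of human body detection.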
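Reference is now made to Fig. 7, which shows a schematic structural diagram of a computer system 600 of a server or terminal suitable for implementing embodiments of the present application. The server or terminal shown in Fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 7, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores the various programs and data required for the operation of the system 600. The CPU 601, the ROM 602 and the RAM 603 are connected to one another by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components may be connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse and the like; an output portion 607 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.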
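In particular, according to embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example (but is not limited to), an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in combination with an instruction execution system, apparatus or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wire, optical cable, RF and the like, or any suitable combination of the above.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the C language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.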
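The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be arranged in a processor; for example, a processor may be described as including an acquisition unit, a first determination unit and a second determination unit. The names of these units do not, in some cases, limit the units themselves. For example, the acquisition unit may also be described as "a unit configured to acquire, based on a human body detection model, a set of candidate human body image regions in a target image".
In another aspect, the present application further provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire, based on a human body detection model, a set of candidate human body image regions in a target image; for each candidate human body image region in the set: acquire, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate human body image region; determine the candidate human body key points located within a human body contour according to human body contour information in the candidate human body image region and the acquired position information; determine a confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour; and determine human body image regions from the set according to the confidence scores of the candidate human body image regions in the set.
The above description covers only the preferred embodiments of the present application and the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features. It also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present application.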

Claims (24)

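  1. A method for human body detection, the method comprising:
    acquiring, based on a human body detection model, a set of candidate human body image regions in a target image;
    for each candidate human body image region in the set of candidate human body image regions: acquiring, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate human body image region; determining the candidate human body key points located within a human body contour according to human body contour information in the candidate human body image region and the acquired position information of the candidate human body key points; and determining a confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour;
    determining human body image regions from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set.
  2. The method according to claim 1, wherein, after determining human body image regions from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set, the method further comprises:
    determining the candidate human body key points in the determined human body image regions as human body key points.
  3. The method according to claim 1, wherein acquiring, based on the human body detection model, the set of candidate human body image regions in the target image comprises:
    acquiring, based on the human body detection model, the set of candidate human body image regions in the target image, together with the confidence that each candidate human body image region in the set is a human body image region; and
    determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour comprises:
    determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region.
  4. The method according to claim 3, wherein determining the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region comprises:
    performing, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate human body image region is a human body image region, and the sum of the confidences of the candidate human body key points outside the human body contour, to obtain the confidence score of the candidate human body image region, wherein the weight set for the sum of the confidences of the candidate human body key points within the human body contour is greater than the weight set for the sum of the confidences of the candidate human body key points outside the human body contour.
  5. The method according to claim 1, wherein acquiring, based on the single-person human body key point detection model, the position information of the candidate human body key points in the candidate human body image region comprises:
    acquiring the position information of the candidate human body key points in the candidate human body image region based on a convolutional neural network model.
  6. The method according to claim 5, wherein a cascaded network structure is used to determine the position information of the candidate human body key points in the candidate human body image region by combining global information and local information in the candidate human body image region.
  7. The method according to claim 6, wherein the cascaded network structure is obtained by cascading a plurality of identical network models in sequence, a fully convolutional layer is connected to the end of the last network model in the cascaded network structure to output a heat map for each candidate human body key point, the heat map represents the probability that the candidate human body key point is present at each pixel of the candidate human body image region, and one heat map corresponds to one candidate human body key point.
  8. The method according to claim 7, wherein acquiring the position information of the candidate human body key points in the candidate human body image region comprises:
    for each heat map: determining, based on the heat map, the position of the pixel with the maximum probability in the candidate human body image region, and taking that position as the position of the candidate human body key point corresponding to the heat map.
  9. The method according to claim 8, wherein, for each heat map, the confidence of the candidate human body key point corresponding to the heat map is the probability of the maximum-probability pixel determined from the heat map.
  10. The method according to any one of claims 1-9, wherein determining human body image regions from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set comprises:
    determining the candidate human body image regions in the set whose confidence scores exceed a preset score threshold, or which rank within a preset number of top positions when sorted by confidence score from high to low;
    searching the set of candidate human body image regions for candidate human body image regions that are redundant with respect to the determined candidate human body image regions;
    removing the candidate human body image regions found.
  11. The method according to claim 10, wherein searching the set of candidate human body image regions for candidate human body image regions that are redundant with respect to the determined candidate human body image regions comprises:
    for each candidate human body image region in the set: determining a contour center distance according to the human body contour information in the candidate human body image region and the human body contour information in the determined candidate human body image region; determining a similarity according to the distances between the candidate human body key points included in the candidate human body image region and the candidate human body key points included in the determined candidate human body image region; and, in response to the determined contour center distance being less than a preset distance threshold and the determined similarity being greater than a preset similarity threshold, determining that the candidate human body image region is a redundant candidate human body image region.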
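  12. An apparatus for human body detection, the apparatus comprising:
    an acquisition unit configured to acquire, based on a human body detection model, a set of candidate human body image regions in a target image;
    a first determination unit configured to, for each candidate human body image region in the set of candidate human body image regions: acquire, based on a single-person human body key point detection model, position information and confidences of candidate human body key points in the candidate human body image region; determine the candidate human body key points located within a human body contour according to human body contour information in the candidate human body image region and the acquired position information of the candidate human body key points; and determine a confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour;
    a second determination unit configured to determine human body image regions from the set of candidate human body image regions according to the confidence scores of the candidate human body image regions in the set.
  13. The apparatus according to claim 12, wherein the apparatus further comprises:
    a third determination unit configured to determine the candidate human body key points in the determined human body image regions as human body key points.
  14. The apparatus according to claim 12, wherein the acquisition unit is further configured to:
    acquire, based on the human body detection model, the set of candidate human body image regions in the target image, together with the confidence that each candidate human body image region in the set is a human body image region; and
    the first determination unit is further configured to:
    determine the confidence score of the candidate human body image region according to the sum of the confidences of the candidate human body key points within the human body contour and the confidence that the candidate human body image region is a human body image region.
  15. The apparatus according to claim 14, wherein the first determination unit is further configured to:
    perform, according to preset weights, a weighted summation of the sum of the confidences of the candidate human body key points within the human body contour, the confidence that the candidate human body image region is a human body image region, and the sum of the confidences of the candidate human body key points outside the human body contour, to obtain the confidence score of the candidate human body image region, wherein the weight set for the sum of the confidences of the candidate human body key points within the human body contour is greater than the weight set for the sum of the confidences of the candidate human body key points outside the human body contour.
  16. The apparatus according to claim 12, wherein the first determination unit is further configured to:
    acquire the position information of the candidate human body key points in the candidate human body image region based on a convolutional neural network model.
  17. The apparatus according to claim 16, wherein the first determination unit is further configured to:
    use a cascaded network structure to determine the position information of the candidate human body key points in the candidate human body image region by combining global information and local information in the candidate human body image region.
  18. The apparatus according to claim 17, wherein the cascaded network structure is obtained by cascading a plurality of identical network models in sequence, a fully convolutional layer is connected to the end of the last network model in the cascaded network structure to output a heat map for each candidate human body key point, the heat map represents the probability that the candidate human body key point is present at each pixel of the candidate human body image region, and one heat map corresponds to one candidate human body key point.
  19. The apparatus according to claim 18, wherein acquiring the position information of the candidate human body key points in the candidate human body image region comprises:
    for each heat map: determining, based on the heat map, the position of the pixel with the maximum probability in the candidate human body image region, and taking that position as the position of the candidate human body key point corresponding to the heat map.
  20. The apparatus according to claim 19, wherein, for each heat map, the confidence of the candidate human body key point corresponding to the heat map is the probability of the maximum-probability pixel determined from the heat map.
  21. The apparatus according to any one of claims 12-20, wherein the second determination unit comprises:
    a determination subunit configured to determine the candidate human body image regions in the set whose confidence scores exceed a preset score threshold, or which rank within a preset number of top positions when sorted by confidence score from high to low;
    a search subunit configured to search the set of candidate human body image regions for candidate human body image regions that are redundant with respect to the determined candidate human body image regions;
    a removal subunit configured to remove the candidate human body image regions found.
  22. The apparatus according to claim 21, wherein the search subunit is further configured to:
    for each candidate human body image region in the set: determine a contour center distance according to the human body contour information in the candidate human body image region and the human body contour information in the determined candidate human body image region; determine a similarity according to the distances between the candidate human body key points included in the candidate human body image region and the candidate human body key points included in the determined candidate human body image region; and, in response to the determined contour center distance being less than a preset distance threshold and the determined similarity being greater than a preset similarity threshold, determine that the candidate human body image region is a redundant candidate human body image region.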
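  23. An electronic device, comprising:
    one or more processors;
    a storage apparatus storing one or more programs;
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-11.
  24. A computer-readable medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1-11.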
PCT/CN2020/081314 2019-04-24 2020-03-26 用于人体检测的方法和装置 WO2020215974A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/602,969 US20220198816A1 (en) 2019-04-24 2020-03-26 Method and apparatus for detecting body
JP2021559665A JP7265034B2 (ja) 2019-04-24 2020-03-26 人体検出用の方法及び装置

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910331939.9A CN110046600B (zh) 2019-04-24 2019-04-24 用于人体检测的方法和装置
CN201910331939.9 2019-04-24

Publications (1)

Publication Number Publication Date
WO2020215974A1 true WO2020215974A1 (zh) 2020-10-29

Family

ID=67278895

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/081314 WO2020215974A1 (zh) 2019-04-24 2020-03-26 用于人体检测的方法和装置

Country Status (4)

Country Link
US (1) US20220198816A1 (zh)
JP (1) JP7265034B2 (zh)
CN (1) CN110046600B (zh)
WO (1) WO2020215974A1 (zh)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046600B (zh) * 2019-04-24 2021-02-26 北京京东尚科信息技术有限公司 用于人体检测的方法和装置
CN110490140A (zh) * 2019-08-21 2019-11-22 上海眼控科技股份有限公司 屏幕显示状态判别方法、装置、计算机设备和存储介质
CN110705365A (zh) * 2019-09-06 2020-01-17 北京达佳互联信息技术有限公司 一种人体关键点检测方法、装置、电子设备及存储介质
CN112699706A (zh) * 2019-10-22 2021-04-23 广州弘度信息科技有限公司 跌倒检测方法、系统和存储介质
CN110889376A (zh) * 2019-11-28 2020-03-17 创新奇智(南京)科技有限公司 一种基于深度学习的安全帽佩戴检测系统及方法
CN111027495A (zh) * 2019-12-12 2020-04-17 京东数字科技控股有限公司 用于检测人体关键点的方法和装置
CN111079695B (zh) * 2019-12-30 2021-06-01 北京华宇信息技术有限公司 一种人体关键点检测与自学习方法及装置
CN111507806B (zh) * 2020-04-23 2023-08-29 北京百度网讯科技有限公司 虚拟试鞋方法、装置、设备以及存储介质
CN111861998A (zh) * 2020-06-24 2020-10-30 浙江大华技术股份有限公司 一种人体图像质量评估方法、装置、系统和计算机设备
WO2021204037A1 (zh) * 2020-11-12 2021-10-14 平安科技(深圳)有限公司 人脸关键点的检测方法、装置、存储介质及电子设备
CN112528850B (zh) * 2020-12-11 2024-06-04 北京百度网讯科技有限公司 人体识别方法、装置、设备和存储介质
CN112613382B (zh) * 2020-12-17 2024-04-30 浙江大华技术股份有限公司 对象完整性的确定方法及装置、存储介质、电子装置
CN113011242A (zh) * 2020-12-31 2021-06-22 杭州拓深科技有限公司 一种仰卧起坐计数方法、装置、电子装置和存储介质
CN113947635A (zh) * 2021-10-15 2022-01-18 北京百度网讯科技有限公司 图像定位方法、装置、电子设备以及存储介质
CN117290537B (zh) * 2023-09-28 2024-06-07 腾讯科技(深圳)有限公司 图像搜索方法、装置、设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355188A (zh) * 2015-07-13 2017-01-25 阿里巴巴集团控股有限公司 图像检测方法及装置
CN108009466A (zh) * 2016-10-28 2018-05-08 北京旷视科技有限公司 行人检测方法和装置
CN108122247A (zh) * 2017-12-25 2018-06-05 北京航空航天大学 一种基于图像显著性和特征先验模型的视频目标检测方法
US10083352B1 (en) * 2017-05-22 2018-09-25 Amazon Technologies, Inc. Presence detection and detection localization
CN108710868A (zh) * 2018-06-05 2018-10-26 中国石油大学(华东) 一种基于复杂场景下的人体关键点检测系统及方法
CN110046600A (zh) * 2019-04-24 2019-07-23 北京京东尚科信息技术有限公司 用于人体检测的方法和装置

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8229164B2 (en) * 2006-07-10 2012-07-24 Synthesis Corporation Pedestrian tracking method and pedestrian tracking device
KR101591779B1 (ko) * 2009-03-17 2016-02-05 삼성전자주식회사 모션 데이터 및 영상 데이터를 이용한 골격 모델 생성 장치및 방법
JP5152231B2 (ja) * 2010-03-12 2013-02-27 オムロン株式会社 画像処理方法および画像処理装置
WO2017044550A1 (en) * 2015-09-11 2017-03-16 Intel Corporation A real-time multiple vehicle detection and tracking
US10096132B2 (en) * 2016-01-27 2018-10-09 Samsung Electronics Co., Ltd. Method and apparatus for positioning feature point
CN106295567B (zh) * 2016-08-10 2019-04-12 腾讯科技(深圳)有限公司 一种关键点的定位方法及终端
US10204284B2 (en) * 2016-12-06 2019-02-12 Datalogic Ip Tech S.R.L. Object recognition utilizing feature alignment
CN109101859A (zh) * 2017-06-21 2018-12-28 北京大学深圳研究生院 使用高斯惩罚检测图像中行人的方法
CN107609536A (zh) * 2017-09-29 2018-01-19 百度在线网络技术(北京)有限公司 信息生成方法和装置
CN108205655B (zh) * 2017-11-07 2020-08-11 北京市商汤科技开发有限公司 一种关键点预测方法、装置、电子设备及存储介质
CN108121952B (zh) * 2017-12-12 2022-03-08 北京小米移动软件有限公司 人脸关键点定位方法、装置、设备及存储介质
CN108038469B (zh) * 2017-12-27 2019-10-25 百度在线网络技术(北京)有限公司 用于检测人体的方法和装置
CN108898087B (zh) * 2018-06-22 2020-10-16 腾讯科技(深圳)有限公司 人脸关键点定位模型的训练方法、装置、设备及存储介质
US11301718B2 (en) * 2018-12-28 2022-04-12 Vizit Labs, Inc. Systems, methods, and storage media for training a machine learning model

Also Published As

Publication number Publication date
JP7265034B2 (ja) 2023-04-25
JP2022528176A (ja) 2022-06-08
CN110046600B (zh) 2021-02-26
CN110046600A (zh) 2019-07-23
US20220198816A1 (en) 2022-06-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20796284

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021559665

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.03.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20796284

Country of ref document: EP

Kind code of ref document: A1