WO2022095907A1

WO2022095907A1 - Associating method for detecting blocks of multiple parts of human body, apparatus, electronic device, and storage medium

Info

Publication number: WO2022095907A1
Application number: PCT/CN2021/128487
Authority: WO
Inventors: Jiang Zhang; Jingsong HAO
Original assignee: Zhejiang Dahua Technology Co., Ltd.
Priority date: 2020-11-03
Filing date: 2021-11-03
Publication date: 2022-05-12
Also published as: CN112507786B; CN112507786A

Abstract

Disclosed are an associating method for detecting blocks of multiple parts of a human body, an apparatus, an electronic device, and a storage medium. The method includes: obtaining a video stream, wherein the video stream includes a plurality of video images; performing a target detecting process on a plurality of video images to obtain a face detecting block, a head-and-shoulder detecting block, and a human body detecting block; determining a first associating relationship between the face detecting block and the head-and-shoulder detecting block; determining a second associating relationship between the head-and-shoulder detecting block and the human body detecting block; and obtaining the face detecting block, the head-and-shoulder detecting block, and the human body detecting block corresponding to the same person in the plurality of video images according to the first associating relationship and the second associating relationship.

Description

ASSOCIATING METHOD FOR DETECTING BLOCKS OF MULTIPLE PARTS OF HUMAN BODY, APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM

CROSS REFERENCE

The present application claims foreign priority of China Patent Application No. 202011208177.2 filed on November 03, 2020, in the China National Intellectual Property Administration, the entire contents of which are hereby incorporated by reference.

TECHNICAL FIELD

The present disclosure relates to the field of computer vision technologies, and in particular to an associating method for detecting blocks of multiple parts of a human body, an apparatus, an electronic device, and a storage medium.

BACKGROUND

Tracking technology has long been one of the hot spots in computer vision research. Its core tasks include predicting target trajectories and matching the historical detection locations of multiple targets. However, current target block matching techniques almost always match the same detection position belonging to the same person across two consecutive video images, and rarely involve associated matching of multiple detection positions corresponding to the same person.

In related techniques, an intersection over union between a face detecting block and a human body detecting block is calculated to determine which face detecting block and human body detecting block correspond to the same person, and the face detecting block is then associated with its corresponding human body detecting block. However, detecting blocks corresponding to multiple people in a video image may overlap with each other, so that the detecting blocks corresponding to the face, the head and shoulders, and the human body intersect, resulting in incorrect association of multiple detecting blocks. Moreover, the sizes of the detecting blocks corresponding to multiple targets vary greatly, so the intersection over union is small even for exact matches and is therefore neither representative nor distinguishing. In addition, there may be cases where detecting blocks are missing.

No effective solution has been proposed for the problem that the detecting blocks corresponding to multiple parts of the human body cannot be accurately associated in the related technology.

SUMMARY OF THE DISCLOSURE

The present disclosure provides an associating method for detecting blocks of multiple parts of a human body, an apparatus, an electronic device, and a storage medium, in order to solve the problem of being unable to accurately associate detecting blocks of multiple parts of a human body in the prior art.

In a first aspect, the present disclosure provides an associating method for detecting blocks of multiple parts of a human body, including: obtaining a video stream, wherein the video stream includes a plurality of video images; performing a target detecting process on a plurality of video images to obtain a face detecting block, a head-and-shoulder detecting block, and a human body detecting block in the plurality of video images; determining a first associating relationship between the face detecting block and the head-and-shoulder detecting block according to a preset expansion ratio, the face detecting block, and the head-and-shoulder detecting block in the plurality of video images; wherein the preset expansion ratio represents a ratio between a distance from the face detecting block flared to the head-and-shoulder detecting block and a size of the face detecting block; determining a second associating relationship between the head-and-shoulder detecting block and the human body detecting block according to a preset relative ratio, the head-and-shoulder detecting block and the human body detecting block in the plurality of video images; wherein the preset relative ratio represents a relative ratio between the head-and-shoulder detecting block and the human body detecting block; and obtaining the face detecting block, the head-and-shoulder detecting block, and the human body detecting block corresponding to the same person in the plurality of video images according to the first associating relationship and the second associating relationship.

In some embodiments, the determining the first associating relationship between the face detecting block and the head-and-shoulder detecting block according to the preset expansion ratio, the face detecting block, and the head-and-shoulder detecting block in the plurality of video images includes: in condition of the head-and-shoulder detecting block including a plurality of head-and-shoulder detecting blocks and the face detecting block including a plurality of face detecting blocks, for each head-and-shoulder detecting block: obtaining a first intersection over union between the head-and-shoulder detecting block and each face detecting block; determining a candidate face detecting block set corresponding to the head-and-shoulder detecting block according to a preset relative position condition, the first intersection over union, and a preset intersection over union threshold; wherein the candidate face detecting block set includes a plurality of candidate face detecting blocks; determining an expanded face block corresponding to each candidate face detecting block in the candidate face detecting block set according to the preset expansion ratio; obtaining a second intersection over union between each expanded face block and the head-and-shoulder detecting block to obtain a plurality of second intersection over unions, and sorting the plurality of second intersection over unions to obtain a first sorting result; and determining a target face detecting block corresponding to the head-and-shoulder detecting block from the candidate face detecting block set according to the first sorting result, and constructing the first associating relationship between the head-and-shoulder detecting block and the target face detecting block.
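The ranking step in the preceding paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: boxes are assumed to be (x1, y1, x2, y2) tuples with y growing downward, horizontal expansion is assumed to be scaled by the face width and vertical expansion by the face height, and the candidate with the largest second intersection over union is assumed to be chosen as the target. The names iou, expand_face, and pick_target_face are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows downward (assumed layout)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def expand_face(face: Box, up: float, down: float, left: float, right: float) -> Box:
    """Grow a face box outward by the four preset expansion ratios (assumed scaling)."""
    w = face[2] - face[0]
    h = face[3] - face[1]
    return (face[0] - left * w, face[1] - up * h, face[2] + right * w, face[3] + down * h)


def pick_target_face(hs: Box, candidates: List[Box],
                     ratios: Tuple[float, float, float, float]) -> int:
    """Return the index of the candidate whose expanded box best overlaps hs."""
    scores = [iou(expand_face(f, *ratios), hs) for f in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Illustrative values only: ratios are (up, down, left, right).
head_shoulder = (0.0, 0.0, 100.0, 100.0)
faces = [(60.0, 50.0, 90.0, 80.0), (30.0, 20.0, 70.0, 60.0)]
best = pick_target_face(head_shoulder, faces, (0.5, 1.0, 0.75, 0.75))
```

With these made-up numbers the second face expands exactly onto the head-and-shoulder box, so it is ranked first.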

In some embodiments, before the determining the first associating relationship between the face detecting block and the head-and-shoulder detecting block according to the preset expansion ratio, the face detecting block, and the head-and-shoulder detecting block in the plurality of video images, the method further includes: obtaining a plurality of first historical video images that meet a first preset restriction condition; wherein the plurality of first historical video images include a face detecting block and a head-and-shoulder detecting block that are associated to each other; the first preset restriction condition includes a condition that there is only one face detecting block inside each corresponding head-and-shoulder detecting block in the plurality of first historical video images; obtaining expansion ratio data corresponding to the face detecting block and the head-and-shoulder detecting block according to the face detecting block and the head-and-shoulder detecting block associated with each other in the plurality of first historical video images; wherein the expansion ratio data includes upward expansion ratio data, downward expansion ratio data, leftward expansion ratio data, and rightward expansion ratio data; and performing statistical analysis on the expansion ratio data to obtain a preset expansion ratio; wherein the preset expansion ratio includes a preset upward expansion ratio, a preset downward expansion ratio, a preset leftward expansion ratio, and a preset rightward expansion ratio.
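One way to read the expansion ratio data described above is sketched below. The normalization is an assumption: vertical distances are divided by the face height and horizontal distances by the face width, with boxes given as (x1, y1, x2, y2) and y growing downward. The function names are hypothetical, and the mean shown at the end is only an illustrative summary statistic.

```python
from statistics import mean
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows downward (assumed layout)


def expansion_ratios(face: Box, hs: Box) -> Dict[str, float]:
    """Distance from each face edge to the matching head-and-shoulder edge,
    divided by the face size along that axis (assumed normalization)."""
    w = face[2] - face[0]
    h = face[3] - face[1]
    return {
        "up": (face[1] - hs[1]) / h,
        "down": (hs[3] - face[3]) / h,
        "left": (face[0] - hs[0]) / w,
        "right": (hs[2] - face[2]) / w,
    }


def collect_ratio_data(pairs: List[Tuple[Box, Box]]) -> Dict[str, List[float]]:
    """Gather per-direction ratios over associated (face, head-and-shoulder) pairs."""
    data: Dict[str, List[float]] = {"up": [], "down": [], "left": [], "right": []}
    for face, hs in pairs:
        for direction, value in expansion_ratios(face, hs).items():
            data[direction].append(value)
    return data


# Illustrative pairs, not real measurements.
pairs = [
    ((30.0, 20.0, 70.0, 60.0), (0.0, 0.0, 100.0, 100.0)),
    ((40.0, 20.0, 60.0, 40.0), (20.0, 10.0, 80.0, 70.0)),
]
data = collect_ratio_data(pairs)
down_summary = mean(data["down"])  # illustrative summary of the downward ratios
```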

In some embodiments, the performing statistical analysis on the expansion ratio data to obtain the preset expansion ratio includes: taking a mean value of a distribution corresponding to the downward expansion ratio data as a preset downward expansion ratio, taking a minimum variance value of a distribution corresponding to the upward expansion ratio data as a preset upward expansion ratio, taking a minimum variance value of a distribution corresponding to the leftward expansion ratio data as a preset leftward expansion ratio, and taking a minimum variance value of a distribution corresponding to the rightward expansion ratio data as a preset rightward expansion ratio.

In some embodiments, the determining the second associating relationship between the head-and-shoulder detecting block and the human body detecting block according to the preset relative ratio, the head-and-shoulder detecting block and the human body detecting block in the plurality of video images includes: in condition of the human body detecting block including a plurality of human body detecting blocks, for each human body detecting block: obtaining a first intersection area ratio between a human body detecting block and each head-and-shoulder detecting block; wherein the first intersection area ratio indicates a ratio of an intersection area of the human body detecting block and the head-and-shoulder detecting block to an area of the head-and-shoulder detecting block; determining a candidate head-and-shoulder detecting block set corresponding to the human body detecting block according to the first intersection area ratio and a preset area ratio threshold; wherein the candidate head-and-shoulder detecting block set includes a plurality of candidate head-and-shoulder detecting blocks; removing at least one candidate head-and-shoulder detecting block that does not meet the preset relative ratio from the candidate head-and-shoulder detecting block set, and obtaining a removal-processed candidate head-and-shoulder detecting block set; obtaining a plurality of first intersection area ratios according to the first intersection area ratio corresponding to each candidate head-and-shoulder detecting block in the removal-processed candidate head-and-shoulder detecting block set, and sorting the plurality of first intersection area ratios to obtain a second sorting result; and determining a target head-and-shoulder detecting block corresponding to the human body detecting block from the removal-processed candidate head-and-shoulder detecting block set according to the second sorting result, and constructing the second associating relationship between the human body detecting block and the target head-and-shoulder detecting block.
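The selection of a target head-and-shoulder detecting block described above might look like the sketch below. Several points are assumptions rather than statements of the disclosure: boxes are (x1, y1, x2, y2) tuples, a candidate is kept when its ratio is at least the threshold, the preset relative ratio check is passed in as a hypothetical predicate passes_relative_ratio, and the candidate with the largest ratio is taken as the target.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) (assumed layout)


def intersection_area_ratio(body: Box, hs: Box) -> float:
    """Intersection area of body and hs divided by the area of hs."""
    iw = min(body[2], hs[2]) - max(body[0], hs[0])
    ih = min(body[3], hs[3]) - max(body[1], hs[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    return (iw * ih) / ((hs[2] - hs[0]) * (hs[3] - hs[1]))


def pick_target_head_shoulder(
    body: Box,
    hs_list: List[Box],
    area_ratio_threshold: float,
    passes_relative_ratio: Callable[[Box, Box], bool],
) -> Optional[int]:
    """Filter by area ratio, drop candidates failing the relative ratio check,
    then return the index with the largest ratio (None if nothing survives)."""
    kept = []
    for i, hs in enumerate(hs_list):
        ratio = intersection_area_ratio(body, hs)
        if ratio >= area_ratio_threshold and passes_relative_ratio(hs, body):
            kept.append((ratio, i))
    if not kept:
        return None
    return max(kept)[1]


# Illustrative values only; the predicate is a stand-in for the preset relative ratio.
body = (0.0, 0.0, 100.0, 300.0)
head_shoulders = [(10.0, 0.0, 90.0, 80.0), (80.0, 0.0, 160.0, 80.0)]
narrower_than_body = lambda hs, b: (hs[2] - hs[0]) <= (b[2] - b[0])
target = pick_target_head_shoulder(body, head_shoulders, 0.5, narrower_than_body)
```

Dividing by the head-and-shoulder area rather than the union keeps the score close to 1 for a correct match even though the body box is much larger.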

In some embodiments, before the determining the second associating relationship between the head-and-shoulder detecting block and the human body detecting block according to the preset relative ratio, the head-and-shoulder detecting block and the human body detecting block in the plurality of video images, the method further includes: obtaining a plurality of second historical video images that meet a second preset restriction condition; wherein the second historical video image includes a head-and-shoulder detecting block and a human body detecting block that are associated to each other; the second preset restriction condition includes a condition that there is only one head-and-shoulder detecting block inside each corresponding human body detecting block in the plurality of second historical video images; obtaining a second intersection area ratio corresponding to the head-and-shoulder detecting block and the human body detecting block in each of the second historical video images, and obtaining intersection area ratio data corresponding to the plurality of second historical video images; and performing statistical analysis on the intersection area ratio data to obtain the preset area ratio threshold.

In some embodiments, after the obtaining the plurality of second historical video images that meet the second preset restriction condition, the method further includes: obtaining relative ratio data corresponding to the head-and-shoulder detecting block and the human body detecting block according to a plurality of sets of associated head-and-shoulder detecting blocks and human body detecting blocks in the plurality of second historical video images; and performing statistical analysis on the relative ratio data to obtain the preset relative ratio; wherein the preset relative ratio includes a preset upper edge relative ratio, a preset width relative ratio, and a preset height relative ratio.

In some embodiments, before the obtaining the face detecting block, the head-and-shoulder detecting block, and the human body detecting block corresponding to the same person in the plurality of video images according to the first associating relationship and the second associating relationship, the method further includes: performing a mis-associating investigating process on the first associating relationship according to a preset first mis-associating processing rule to obtain a first mis-associating detecting block, and re-determining a target detecting block corresponding to the first mis-associating detecting block; wherein the first mis-associating detecting block includes a face detecting block that is repeatedly associated and a plurality of head-and-shoulder detecting blocks that are correspondingly associated with the face detecting block; and performing a mis-associating investigating process on the second associating relationship according to a preset second mis-associating processing rule to obtain a second mis-associating detecting block, and re-determining a target detecting block corresponding to the second mis-associating detecting block; wherein the second mis-associating detecting block includes a head-and-shoulder detecting block that is repeatedly associated and a plurality of human body detecting blocks that are correspondingly associated with the head-and-shoulder detecting block.

In some embodiments, the performing the mis-associating investigating process on the first associating relationship according to the preset first mis-associating processing rule to obtain the first mis-associating detecting block, and re-determining the target detecting block corresponding to the first mis-associating detecting block includes: for each video image, traversing the plurality of face detecting blocks and the plurality of head-and-shoulder detecting blocks that are in the first associating relationship in the video image, and obtaining a duplicate association face detecting block that has an associating relationship with a plurality of head-and-shoulder detecting blocks; in condition of the duplicate association face detecting block including a plurality of duplicate association face detecting blocks, for each duplicate association face detecting block, obtaining a third intersection over union between the duplicate association face detecting block and each head-and-shoulder detecting block associated with the duplicate association face detecting block, and obtaining a plurality of third intersection over unions; and sorting the plurality of third intersection over unions to obtain a third sorting result; and according to the third sorting result, taking the head-and-shoulder detecting block corresponding to a largest third intersection over union as the target head-and-shoulder detecting block corresponding to the duplicate association face detecting block; and for each of head-and-shoulder detecting blocks corresponding to other third intersection over unions, re-determining the target face detecting block according to other candidate face detecting blocks in the candidate face detecting block set corresponding to a head-and-shoulder detecting block.
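The duplicate-association check described above can be sketched as follows. It assumes the first associating relationship is held as a mapping from head-and-shoulder index to face index and that boxes are (x1, y1, x2, y2) tuples; the names are hypothetical. Head-and-shoulder blocks that lose the comparison are returned so the caller can re-select a face from their remaining candidates, which is not shown here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) (assumed layout)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def resolve_duplicate_faces(
    assoc: Dict[int, int], faces: List[Box], head_shoulders: List[Box]
) -> Tuple[Dict[int, int], List[int]]:
    """Keep, for each repeatedly associated face, only the head-and-shoulder
    block with the largest IoU; report the others for re-determination."""
    by_face: Dict[int, List[int]] = defaultdict(list)
    for hs_idx, face_idx in assoc.items():
        by_face[face_idx].append(hs_idx)

    resolved = dict(assoc)
    to_redo: List[int] = []
    for face_idx, hs_idxs in by_face.items():
        if len(hs_idxs) < 2:
            continue  # this face is associated only once
        best = max(hs_idxs, key=lambda h: iou(faces[face_idx], head_shoulders[h]))
        for h in hs_idxs:
            if h != best:
                del resolved[h]
                to_redo.append(h)
    return resolved, sorted(to_redo)


# Illustrative values: one face claimed by two head-and-shoulder blocks.
faces = [(30.0, 20.0, 70.0, 60.0)]
head_shoulders = [(0.0, 0.0, 100.0, 100.0), (20.0, 10.0, 80.0, 90.0)]
resolved, to_redo = resolve_duplicate_faces({0: 0, 1: 0}, faces, head_shoulders)
```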

In some embodiments, the performing the mis-associating investigating process on the second associating relationship according to the preset second mis-associating processing rule to obtain the second mis-associating detecting block, and re-determining a target detecting block corresponding to the second mis-associating detecting block includes: for each video image, traversing the plurality of human body detecting blocks and the plurality of head-and-shoulder detecting blocks that are in the second associating relationship in each video image, and obtaining a duplicate association head-and-shoulder detecting block that has an associating relationship with a plurality of human body detecting blocks; in condition of the duplicate association head-and-shoulder detecting block including a plurality of duplicate association head-and-shoulder detecting blocks, for each duplicate association head-and-shoulder detecting block, obtaining a fourth intersection over union between a duplicate association head-and-shoulder detecting block and each human body detecting block associated with the duplicate association head-and-shoulder detecting block, and obtaining a plurality of fourth intersection over unions; and sorting the plurality of fourth intersection over unions to obtain a fourth sorting result; and according to the fourth sorting result, taking the human body detecting block corresponding to a largest fourth intersection over union as the target human body detecting block corresponding to the duplicate association head-and-shoulder detecting block; and for each of the human body detecting blocks corresponding to other fourth intersection over unions, re-determining the target head-and-shoulder detecting block according to other candidate head-and-shoulder detecting blocks in the candidate head-and-shoulder detecting block set corresponding to a human body detecting block.

In a second aspect, the present disclosure provides an associating apparatus for detecting blocks of multiple parts of a human body, including: a data obtaining module, configured to obtain a video stream, wherein the video stream includes a plurality of video images; a target detecting module, configured to perform a target detecting process on a plurality of video images to obtain a face detecting block, a head-and-shoulder detecting block, and a human body detecting block in the plurality of video images; a first associating module, configured to determine a first associating relationship between the face detecting block and the head-and-shoulder detecting block according to a preset expansion ratio, the face detecting block, and the head-and-shoulder detecting block in the plurality of video images; wherein the preset expansion ratio represents a ratio between a distance from the face detecting block flared to the head-and-shoulder detecting block and a size of the face detecting block; a second associating module, configured to determine a second associating relationship between the head-and-shoulder detecting block and the human body detecting block according to a preset relative ratio, the head-and-shoulder detecting block and the human body detecting block in the plurality of video images; wherein the preset relative ratio represents a relative ratio between the head-and-shoulder detecting block and the human body detecting block; and an association determining module, configured to obtain the face detecting block, the head-and-shoulder detecting block, and the human body detecting block corresponding to the same person in the plurality of video images according to the first associating relationship and the second associating relationship.

In a third aspect, the present disclosure provides an electronic device, including a memory and a processor; wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the associating method for detecting blocks of multiple parts of a human body as described above.

In a fourth aspect, the present disclosure provides a storage medium, storing a computer program; wherein the computer program is configured to perform the associating method for detecting blocks of multiple parts of a human body as described above when executed.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings described here are intended to provide a further understanding of the present disclosure and constitute a part of the present disclosure. The illustrative embodiments and descriptions of the present disclosure are intended to explain the present disclosure, and do not constitute an improper limitation of the present disclosure.

FIG. 1 is a flowchart of an associating method for detecting blocks of multiple parts of a human body according to an embodiment of the present disclosure.

FIG. 2 is a flowchart of determining a first associating relationship according to an embodiment of the present disclosure.

FIG. 3 is a schematic view of determining a target face detecting block corresponding to a head-and-shoulder detecting block according to an embodiment of the present disclosure.

FIG. 4 is a flowchart of obtaining a preset expansion ratio according to an embodiment of the present disclosure.

FIGS. 5a-5d are schematic views of statistical results of expansion ratio data according to embodiments of the present disclosure.

FIG. 6 is a flowchart of determining a second associating relationship according to an embodiment of the present disclosure.

FIG. 7 is a schematic view of determining a target face detecting block corresponding to a head-and-shoulder detecting block according to another embodiment of the present disclosure.

FIG. 8 is a flowchart of obtaining a preset area ratio threshold according to an embodiment of the present disclosure.

FIG. 9 is a flowchart of performing mis-associating processing on a first associating relationship and a second associating relationship according to an embodiment of the present disclosure.

FIG. 10 is a flowchart of performing mis-associating processing on a first associating relationship according to a first mis-associating processing rule according to an embodiment of the present disclosure.

FIG. 11 is a flowchart of performing mis-associating processing on a second associating relationship according to a second mis-associating processing rule according to an embodiment of the present disclosure.

FIG. 12 is a block view of a hardware structure of a terminal for an associating method for detecting blocks of multiple parts of a human body according to an embodiment of the present disclosure.

FIG. 13 is a block view of an associating apparatus for detecting blocks of multiple parts of a human body according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

To make the purpose, technical solutions and advantages of the present disclosure clearer, the following describes and illustrates the present disclosure with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present disclosure, and are not to limit the present disclosure. Based on the embodiments provided in the present disclosure, all other embodiments obtained by those skilled in the art without creative work shall fall within the scope of the present disclosure. In addition, it can also be understood that although the efforts made in this development process may be complicated and lengthy, some changes in design, manufacture or production based on the technical content disclosed in the present disclosure are only conventional technical means to those skilled in the art related to the content disclosed in the present disclosure and should not be construed as inadequate for the content disclosed in the present disclosure.

Reference to “embodiments” in the present disclosure means that a specific feature, structure, or characteristic described in conjunction with the embodiments may be included in at least one embodiment of the present disclosure. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor is it an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described in the present disclosure can be combined with other embodiments without conflict.

Unless otherwise defined, the technical terms or scientific terms involved in the present disclosure shall have the usual meanings understood by those skilled in the technical field to which the present disclosure belongs. Similar words such as “a”, “one”, “a kind of”, “the” and the like referred to in the present disclosure do not mean a quantitative limit, and may mean a singular or plural number. The terms “include”, “comprise”, “have” and any of their variations involved in the present disclosure are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or modules (units) is not limited to the listed steps or units, but may also include steps or units that are not listed, or may also include other steps or units that are inherent to those processes, methods, products, or devices. Similar words such as “connected”, “linked”, “coupled” and the like referred to in the present disclosure are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. As used in the present disclosure, “multiple” refers to greater than or equal to two. “And/or” describes the associating relationship of the associated objects, indicating that there can be three types of relationships. For example, “A and/or B” can mean that: A alone exists, A and B exist at the same time, or B exists alone. The terms “first”, “second”, “third”, etc. referred to in the present disclosure merely distinguish similar objects, and do not represent a specific order of objects.

The various technologies described in the present disclosure can, but are not limited to, be applied to fields such as intelligent video surveillance, human-computer interaction, robotic visual navigation, virtual reality, and medical diagnosis.

FIG. 1 is a flowchart of an associating method for detecting blocks of multiple parts of a human body according to an embodiment of the present disclosure. As shown in FIG. 1, the process may include the following operations.

At block S110: obtaining a video stream, wherein the video stream includes a plurality of video images.

At block S120: performing a target detecting process on the plurality of video images to obtain a face detecting block, a head-and-shoulder detecting block, and a human body detecting block in the plurality of video images.

At block S130: determining a first associating relationship between the face detecting block and the head-and-shoulder detecting block according to a preset expansion ratio, the face detecting block, and the head-and-shoulder detecting block in the plurality of video images; wherein the preset expansion ratio represents a ratio between a distance from the face detecting block flared to the head-and-shoulder detecting block and a size of the face detecting block.

It should be noted that a target face detecting block corresponding to each head-and-shoulder detecting block may be screened from multiple face detecting blocks in the video image according to the preset expansion ratio. Also, a target head-and-shoulder detecting block corresponding to each face detecting block may be screened from multiple head-and-shoulder detecting blocks in the video image according to the preset expansion ratio. This embodiment does not limit a subject to be screened, as long as the first associating relationship between the face detecting block and the head-and-shoulder detecting block in the video image can be determined.

At block S140: determining a second associating relationship between the head-and-shoulder detecting block and the human body detecting block according to a preset relative ratio, the head-and-shoulder detecting block and the human body detecting block in the plurality of video images; wherein the preset relative ratio represents a relative ratio between the head-and-shoulder detecting block and the human body detecting block.

It should be noted that a target head-and-shoulder detecting block corresponding to each human body detecting block may be screened from multiple head-and-shoulder detecting blocks in the video image according to the preset relative ratio. Also, a target human body detecting block corresponding to each head-and-shoulder detecting block may be screened from multiple human body detecting blocks in the video image according to the preset relative ratio. This embodiment does not limit a subject to be screened, as long as the second associating relationship between the human body detecting block and the head-and-shoulder detecting block can be determined.

At block S150: obtaining the face detecting block, the head-and-shoulder detecting block, and the human body detecting block corresponding to the same person in the plurality of video images according to the first associating relationship and the second associating relationship.

Through the above steps S110 to S150, a video stream is obtained, wherein the video stream includes a plurality of video images; the target detecting process is performed on a plurality of video images to obtain a face detecting block, a head-and-shoulder detecting block, and a human body detecting block in the plurality of video images; a first associating relationship between the face detecting block and the head-and-shoulder detecting block is determined according to a preset expansion ratio, the face detecting block, and the head-and-shoulder detecting block in the plurality of video images, wherein the preset expansion ratio represents a ratio between a distance of the face detecting block flared to the head-and-shoulder detecting block and a size of the face detecting block; a second associating relationship between the head-and-shoulder detecting block and the human body detecting block is determined according to a preset relative ratio, the head-and-shoulder detecting block and the human body detecting block in the plurality of video images; wherein the preset relative ratio represents a relative ratio between the head-and-shoulder detecting block and the human body detecting block; the face detecting block, the head-and-shoulder detecting block, and the human body detecting block corresponding to the same person in the plurality of video images are obtained according to the first associating relationship and the second associating relationship. 
In this embodiment, the preset expansion ratio and the preset relative ratio are incorporated as a priori knowledge into the determining conditions of the associating scheme, which improves the accuracy and reliability of the associating relationship. This avoids the situation in which multiple detecting blocks are incorrectly associated because the detecting blocks corresponding to the face, the head and shoulders, and the human body overlap one another, and solves the problem in the related technology that the detecting blocks corresponding to multiple parts of the human body cannot be accurately associated.
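The final step S150 can be pictured as joining the two relationships on the shared head-and-shoulder detecting block. The sketch below is only one possible data layout and is an assumption: the first relationship is a mapping from head-and-shoulder index to face index, the second is a mapping from human body index to head-and-shoulder index, and a missing part is reported as None. The function name merge_associations is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (face index, head-and-shoulder index, human body index); None marks a missing block
Person = Tuple[Optional[int], int, Optional[int]]


def merge_associations(
    first: Dict[int, int], second: Dict[int, int], num_head_shoulders: int
) -> List[Person]:
    """Join face<->head-and-shoulder and body<->head-and-shoulder links per person.

    first:  head-and-shoulder index -> face index
    second: human body index -> head-and-shoulder index
    """
    body_of_hs = {hs: body for body, hs in second.items()}
    people: List[Person] = []
    for hs in range(num_head_shoulders):
        people.append((first.get(hs), hs, body_of_hs.get(hs)))
    return people


# Illustrative indices for one video image with three head-and-shoulder blocks.
people = merge_associations({0: 1, 1: 0}, {0: 1, 2: 0}, 3)
```

Because the join is keyed on the head-and-shoulder block, a person whose head-and-shoulder block was not detected would not appear in this simplified output.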

In some embodiments, FIG. 2 is a flowchart of determining a first associating relationship according to an embodiment of the present disclosure. As shown in FIG. 2, the process includes operations at blocks as follows.

At block S210: for each head-and-shoulder detecting block: obtaining a first intersection over union between a head-and-shoulder detecting block and each face detecting block.

For each head-and-shoulder detecting block, the first intersection over union between the head-and-shoulder detecting block and each face detecting block is calculated. The intersection over union (IOU) is calculated as follows.

$$\mathrm{IOU} = \frac{\operatorname{area}(ROI_T \cap ROI_G)}{\operatorname{area}(ROI_T \cup ROI_G)} = \frac{\operatorname{area}(ROI_T \cap ROI_G)}{S_{XY} + S_{MN} - \operatorname{area}(ROI_T \cap ROI_G)}$$

where ROI_T represents an image sub-region at which a face detecting block is located, ROI_G represents an image sub-region at which a head-and-shoulder detecting block is located, S_XY represents an area of the image sub-region at which the face detecting block is located, and S_MN represents an area of the image sub-region at which the head-and-shoulder detecting block is located.
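The intersection over union described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of the disclosure: the box layout `(x1, y1, x2, y2)` with the origin at the top-left corner, and the helper names `area`, `intersection_area`, and `iou`, are assumptions introduced here.

```python
from typing import Tuple

# Assumed box layout: (x1, y1, x2, y2), origin at top-left, y grows downward.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (zero if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union: overlap area divided by union area."""
    inter = intersection_area(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0
```

Because a face detecting block normally lies inside its head-and-shoulder detecting block, this value reduces in that case to the ratio of the two areas.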

At block S220: determining a candidate face detecting block set corresponding to the head-and-shoulder detecting block according to a preset relative position condition, the first intersection over union and a preset intersection over union threshold; wherein the candidate face detecting block set includes a plurality of candidate face detecting blocks.

The preset relative position condition indicates that the face detecting block is inside the head-and-shoulder detecting block. The preset intersection over union threshold is an intersection over union threshold obtained from statistical analysis of a large amount of historical video data.

Specifically, when a face detecting block is inside the head-and-shoulder detecting block, and the first intersection over union corresponding to the face detecting block is greater than the preset intersection over union threshold, then the face detecting block is regarded as the candidate face detecting block corresponding to the head-and-shoulder detecting block.
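The candidate selection at block S220 may be sketched as follows. The box layout `(x1, y1, x2, y2)`, the function names, and the threshold value used in the usage note are assumptions for illustration; the disclosure only states that the threshold is obtained statistically from historical video data.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_inside(inner: Box, outer: Box) -> bool:
    """Relative position condition: inner lies entirely within outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def candidate_faces(head_shoulder: Box, faces: Sequence[Box],
                    iou_threshold: float) -> List[int]:
    """Indices of faces that are inside the head-and-shoulder block and whose
    first IOU with it exceeds the threshold."""
    return [i for i, face in enumerate(faces)
            if is_inside(face, head_shoulder)
            and iou(face, head_shoulder) > iou_threshold]
```

For example, with a hypothetical threshold of 0.05, `candidate_faces((0, 0, 10, 10), [(3, 1, 7, 5), (8, 8, 12, 12), (4, 4, 5, 5)], 0.05)` keeps only the first face: the second is not inside the head-and-shoulder block, and the third is too small relative to it.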

At block S230: determining an expanded face block corresponding to each candidate face detecting block in the candidate face detecting block set according to the preset expansion ratio.

At block S240: obtaining a second intersection over union between each expanded face block and the head-and-shoulder detecting block to obtain a plurality of second intersection over unions, and sorting the plurality of second intersection over unions to obtain a first sorting result.

At block S250: determining a target face detecting block corresponding to the head-and-shoulder detecting block from the candidate face detecting block set according to the first sorting result, and constructing the first associating relationship between the head-and-shoulder detecting block and the target face detecting block.

Further, according to the first sorting result, a face detecting block corresponding to a largest second intersection over union may be taken as the target face detecting block corresponding to the head-and-shoulder detecting block. Also, multiple face detecting blocks corresponding to top-ranked second intersection over unions may be taken as target face detecting blocks corresponding to the head-and-shoulder detecting block, and a priority is set according to the second intersection over unions corresponding to the multiple target face detecting blocks. The greater the second intersection over union, the higher the priority. A target face detecting block corresponding to a high priority is prioritized. When the target face detecting block corresponding to the high priority does not meet a condition, another target face detecting block corresponding to a lower priority may be selected.
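Blocks S230 to S250 can be illustrated with the following sketch, which expands each candidate face block, computes the second intersection over union against the head-and-shoulder block, and sorts the results in descending order. The box layout, the function names, and the ratio ordering `(up, down, left, right)` are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]       # assumed (x1, y1, x2, y2), y grows downward
Ratios = Tuple[float, float, float, float]    # assumed order: (up, down, left, right)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def expand_face(face: Box, ratios: Ratios) -> Box:
    """Expand a face block: up/down scale with face height, left/right with width."""
    up, down, left, right = ratios
    x1, y1, x2, y2 = face
    w, h = x2 - x1, y2 - y1
    return (x1 - left * w, y1 - up * h, x2 + right * w, y2 + down * h)


def rank_candidates(head_shoulder: Box, candidates: Sequence[Box],
                    ratios: Ratios) -> List[Tuple[float, int]]:
    """Return (second IOU, candidate index) pairs, largest IOU first."""
    scored = [(iou(expand_face(face, ratios), head_shoulder), i)
              for i, face in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

The first entry of the returned list corresponds to the target face detecting block; the remaining entries give the lower-priority fallbacks mentioned above. A face sitting at the inner edge of the head-and-shoulder block expands partly outside it, which lowers its second intersection over union.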

FIG. 3 is a schematic view of determining a target face detecting block corresponding to a head-and-shoulder detecting block according to an embodiment of the present disclosure. As shown in FIG. 3, an interfering face detecting block in the candidate face detecting block set may be excluded according to the second intersection over union between the head-and-shoulder detecting block and the expanded face block corresponding to the candidate face detecting block, so as to determine the target face detecting block corresponding to the head-and-shoulder detecting block.

In the above steps S210 to S250, a candidate face detecting block set corresponding to the head-and-shoulder detecting block is determined according to a preset relative position condition, the first intersection over union and a preset intersection over union threshold; an expanded face block corresponding to each candidate face detecting block in the candidate face detecting block set is determined according to the preset expansion ratio; and a target face detecting block corresponding to the head-and-shoulder detecting block is determined from the candidate face detecting block set according to a second intersection over union between each expanded face block and the head-and-shoulder detecting block. In this embodiment, according to the second intersection over union between the expanded face block corresponding to the candidate face detecting block and the head-and-shoulder detecting block, interference from an interfering face detecting block located at an inner edge of the head-and-shoulder detecting block can be excluded, thereby accurately determining the target face detecting block corresponding to the head-and-shoulder detecting block, and further improving the association accuracy of the detecting blocks corresponding to multiple parts of the human body.

Furthermore, for a person for whom a face detecting block is detected but a corresponding head-and-shoulder detecting block is not detected, the face detecting block may be expanded with the preset expansion ratio, thereby constructing an artificial head-and-shoulder detecting block. The artificially constructed head-and-shoulder detecting block can then be used for the association detection of the head-and-shoulder detecting block and the human body detecting block.

In some embodiments, FIG. 4 is a flowchart of obtaining a preset expansion ratio according to an embodiment of the present disclosure. As shown in FIG. 4, the process includes operations at blocks as follows.

At block S410: obtaining a plurality of first historical video images that meet a first preset restriction condition; wherein the plurality of first historical video images include a face detecting block and a head-and-shoulder detecting block that are associated to each other; the first preset restriction condition includes a condition that there is only one face detecting block inside each corresponding head-and-shoulder detecting block in the plurality of first historical video images.

It should be noted that the associated face detecting block and head-and-shoulder detecting block in the plurality of first historical video images belong to the same person, and the intersection over union between the associated face detecting block and head-and-shoulder detecting block is greater than the preset intersection over union threshold.

At block S420: obtaining expansion ratio data corresponding to the face detecting block and the head-and-shoulder detecting block according to the face detecting block and the head-and-shoulder detecting block associated with each other in the plurality of first historical video images; wherein the expansion ratio data includes upward expansion ratio data, downward expansion ratio data, leftward expansion ratio data, and rightward expansion ratio data.

At block S430: performing statistical analysis on the expansion ratio data to obtain a preset expansion ratio; wherein the preset expansion ratio represents a ratio between a distance from the face detecting block flared to the head-and-shoulder detecting block and a size of the face detecting block.

The preset expansion ratio includes a preset upward expansion ratio, a preset downward expansion ratio, a preset leftward expansion ratio, and a preset rightward expansion ratio. The preset upward expansion ratio represents a ratio between an upward expansion distance of the face detecting block and a height of the face detecting block. The preset downward expansion ratio represents a ratio between a downward expansion distance of the face detecting block and the height of the face detecting block. The preset leftward expansion ratio represents a ratio between a leftward expansion distance of the face detecting block and a width of the face detecting block. The preset rightward expansion ratio represents a ratio between a rightward expansion distance of the face detecting block and the width of the face detecting block.
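The four ratios defined above can be measured for one associated pair as in the sketch below, which would yield one sample of expansion ratio data per pair. The box layout `(x1, y1, x2, y2)` with y growing downward and the function name `expansion_ratios` are assumptions for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2), y grows downward


def expansion_ratios(face: Box, head_shoulder: Box) -> Tuple[float, float, float, float]:
    """Return (up, down, left, right) expansion ratios for one associated pair.

    Vertical distances are divided by the face height and horizontal
    distances by the face width, following the definitions in the text.
    """
    fx1, fy1, fx2, fy2 = face
    hx1, hy1, hx2, hy2 = head_shoulder
    face_w, face_h = fx2 - fx1, fy2 - fy1
    if face_w <= 0 or face_h <= 0:
        raise ValueError("face block must have positive width and height")
    up = (fy1 - hy1) / face_h
    down = (hy2 - fy2) / face_h
    left = (fx1 - hx1) / face_w
    right = (hx2 - fx2) / face_w
    return (up, down, left, right)
```

Collecting these tuples over the first historical video images gives the four data series on which the statistical analysis of block S430 would operate.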

Through the above steps S410 to S430, a plurality of first historical video images that meet a first preset restriction condition are obtained; wherein the plurality of first historical video images include a face detecting block and a head-and-shoulder detecting block that are associated to each other; expansion ratio data corresponding to the face detecting block and the head-and-shoulder detecting block is obtained according to the face detecting block and the head-and-shoulder detecting block associated with each other in the plurality of first historical video images; and statistical analysis is performed on the expansion ratio data to obtain a preset expansion ratio. In this embodiment, by obtaining the expansion ratio data corresponding to the associated face detecting block and head-and-shoulder detecting block in the plurality of first historical video images, and performing statistical analysis on the expansion ratio data, a relatively accurate preset expansion ratio can be obtained. This further improves the accuracy of the association detection between the face detecting block and the head-and-shoulder detecting block in the video image.

In some embodiments, step S430 includes: taking a mean value of a distribution corresponding to the downward expansion ratio data as the preset downward expansion ratio, taking a minimum variance value of a distribution corresponding to the upward expansion ratio data as the preset upward expansion ratio, taking a minimum variance value of a distribution corresponding to the leftward expansion ratio data as the preset leftward expansion ratio, and taking a minimum variance value of a distribution corresponding to the rightward expansion ratio data as the preset rightward expansion ratio.

FIG. 5b is a schematic view of a statistical result of the downward expansion ratio data. As shown in FIG. 5b, the distribution corresponding to the downward expansion ratio data approximately follows a Gaussian distribution. Therefore, the mean value of the distribution corresponding to the downward expansion ratio data may be taken as the preset downward expansion ratio.

FIG. 5a, FIG. 5c, and FIG. 5d are schematic views of statistical results of the upward expansion ratio data, the leftward expansion ratio data, and the rightward expansion ratio data, respectively. As shown in FIG. 5a, FIG. 5c and FIG. 5d, the distributions corresponding to the upward expansion ratio data, the leftward expansion ratio data, and the rightward expansion ratio data do not conform to any standard distribution. Therefore, the minimum variance value of the distribution corresponding to the upward expansion ratio data may be taken as the preset upward expansion ratio, the minimum variance value of the distribution corresponding to the leftward expansion ratio data may be taken as the preset leftward expansion ratio, and the minimum variance value of the distribution corresponding to the rightward expansion ratio data may be taken as the preset rightward expansion ratio.

Through this embodiment, the mean value of the distribution corresponding to the downward expansion ratio data is taken as the preset downward expansion ratio, and the minimum variance values of the distributions corresponding to the upward, leftward, and rightward expansion ratio data are taken as the preset upward, leftward, and rightward expansion ratios, respectively. By taking into account the characteristics of the distribution of each kind of expansion ratio data, this embodiment determines each preset expansion ratio more accurately, such that the accuracy of the association detection between the face detecting block and the head-and-shoulder detecting block in the video image can be further improved.

Further, the preset expansion ratio may be [0.91, 1.13, 1.37, 1.38], where the preset upward expansion ratio is 0.91, the preset downward expansion ratio is 1.13, the preset leftward expansion ratio is 1.37, and the preset rightward expansion ratio is 1.38.
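As a worked example of these values, the sketch below applies the example ratios to a single face block, which is also how an artificial head-and-shoulder block could be constructed when none is detected. The box layout `(x1, y1, x2, y2)` and the names `PRESET_EXPANSION` and `expand_with_preset` are assumptions; the result is not clipped to the image boundary, which a real system would likely need to handle.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2), y grows downward

# Example values from the text, in the order (up, down, left, right).
PRESET_EXPANSION = (0.91, 1.13, 1.37, 1.38)


def expand_with_preset(face: Box,
                       ratios: Tuple[float, float, float, float] = PRESET_EXPANSION) -> Box:
    """Expand a face block by the preset ratios (no clipping to image bounds)."""
    up, down, left, right = ratios
    x1, y1, x2, y2 = face
    w, h = x2 - x1, y2 - y1
    return (x1 - left * w, y1 - up * h, x2 + right * w, y2 + down * h)


# A 40 x 50 face block grows by 0.91 and 1.13 face heights vertically
# and by 1.37 and 1.38 face widths horizontally.
expanded = expand_with_preset((100.0, 100.0, 140.0, 150.0))
```

With these ratios the expanded block is roughly 3.75 face widths wide and 3.04 face heights tall.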

In some embodiments, FIG. 6 is a flowchart of determining a second associating relationship according to an embodiment of the present disclosure. As shown in FIG. 6, the process includes operations at blocks as follows.

At block S610: for each human body detecting block: obtaining a first intersection area ratio between the human body detecting block and each head-and-shoulder detecting block; wherein the first intersection area ratio indicates a ratio of an intersection area of the human body detecting block and the head-and-shoulder detecting block to an area of the head-and-shoulder detecting block.

At block S620: determining a candidate head-and-shoulder detecting block set corresponding to the human body detecting block according to the first intersection area ratio and a preset area ratio threshold; wherein the candidate head-and-shoulder detecting block set includes a plurality of candidate head-and-shoulder detecting blocks.

Specifically, when the first intersection area ratio corresponding to a head-and-shoulder detecting block is greater than the preset area ratio threshold, the head-and-shoulder detecting block is taken as the candidate head-and-shoulder detecting block corresponding to the human body detecting block.

At block S630: removing at least one candidate head-and-shoulder detecting block that does not meet the preset relative ratio from the candidate head-and-shoulder detecting block set, and obtaining a removal-processed candidate head-and-shoulder detecting block set.

At block S640: obtaining a plurality of first intersection area ratios according to the first intersection area ratio corresponding to each candidate head-and-shoulder detecting block in the removal-processed candidate head-and-shoulder detecting block set, and sorting the plurality of first intersection area ratios to obtain a second sorting result.

At block S650: determining a target head-and-shoulder detecting block corresponding to the human body detecting block from the removal-processed candidate head-and-shoulder detecting block set according to the second sorting result, and constructing the second associating relationship between the human body detecting block and the target head-and-shoulder detecting block.

Further, according to the second sorting result, a head-and-shoulder detecting block corresponding to a largest first intersection area ratio may be taken as the target head-and-shoulder detecting block corresponding to the human body detecting block. Also, multiple head-and-shoulder detecting blocks corresponding to top-ranked first intersection area ratios may be taken as target head-and-shoulder detecting blocks corresponding to the human body detecting block, and a priority is set according to the first intersection area ratios corresponding to the multiple target head-and-shoulder detecting blocks. The higher the first intersection area ratio, the higher the priority. A target head-and-shoulder detecting block corresponding to a high priority is prioritized. When the target head-and-shoulder detecting block corresponding to the high priority does not meet a condition, another target head-and-shoulder detecting block corresponding to a lower priority may be selected.
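Blocks S610 to S650 can be sketched as a single ranking function. The box layout, the function names, and the `meets_relative_ratio` callback are assumptions introduced here; the callback stands in for the preset relative ratio check, and the threshold value in the usage note is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def intersection_area_ratio(body: Box, head_shoulder: Box) -> float:
    """Intersection area divided by the head-and-shoulder block's own area."""
    iw = max(0.0, min(body[2], head_shoulder[2]) - max(body[0], head_shoulder[0]))
    ih = max(0.0, min(body[3], head_shoulder[3]) - max(body[1], head_shoulder[1]))
    hs_area = (head_shoulder[2] - head_shoulder[0]) * (head_shoulder[3] - head_shoulder[1])
    return (iw * ih) / hs_area if hs_area > 0 else 0.0


def rank_head_shoulders(body: Box, head_shoulders: Sequence[Box],
                        area_ratio_threshold: float,
                        meets_relative_ratio: Callable[[Box, Box], bool]) -> List[int]:
    """Indices of head-and-shoulder blocks for one body block, best first.

    S620: keep blocks whose ratio exceeds the threshold.
    S630: drop blocks that fail the relative ratio check.
    S640: sort the survivors by ratio in descending order.
    """
    scored = []
    for i, hs in enumerate(head_shoulders):
        ratio = intersection_area_ratio(body, hs)
        if ratio > area_ratio_threshold and meets_relative_ratio(hs, body):
            scored.append((ratio, i))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored]
```

The first index returned is the target head-and-shoulder detecting block of block S650, and later indices serve as the lower-priority alternatives. For instance, `rank_head_shoulders((0, 0, 50, 150), boxes, 0.8, lambda hs, body: True)` applies a hypothetical threshold of 0.8 with the relative ratio check disabled.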

FIG. 7 is a schematic view of determining a target head-and-shoulder detecting block corresponding to a human body detecting block according to another embodiment of the present disclosure. As shown in FIG. 7, according to the first intersection area ratio corresponding to the candidate head-and-shoulder detecting block in the removal-processed candidate head-and-shoulder detecting block set, an interfering head-and-shoulder detecting block in the candidate head-and-shoulder detecting block set may be excluded, thereby determining the target head-and-shoulder detecting block corresponding to the human body detecting block.

In this embodiment, at least one candidate head-and-shoulder detecting block that does not meet the preset relative ratio is removed from the candidate head-and-shoulder detecting block set, and the target head-and-shoulder detecting block corresponding to the human body detecting block is determined according to the second sorting result of the first intersection area ratios corresponding to multiple candidate head-and-shoulder detecting blocks in the removal-processed candidate head-and-shoulder detecting block set. By combining the prior data with a detection method that gradually narrows the detection range, the target head-and-shoulder detecting block corresponding to the human body detecting block can be quickly and accurately determined. In addition, according to the first intersection area ratio corresponding to the candidate head-and-shoulder detecting block in the removal-processed candidate head-and-shoulder detecting block set, the interfering head-and-shoulder detecting block in the candidate head-and-shoulder detecting block set is excluded, and the target head-and-shoulder detecting block corresponding to the human body detecting block can be determined more accurately, thereby further improving the association accuracy of the detecting blocks corresponding to multiple parts of the human body.

In some embodiments, FIG. 8 is a flowchart of obtaining a preset area ratio threshold according to an embodiment of the present disclosure. As shown in FIG. 8, the process includes operations at blocks as follows.

At block S810: obtaining a plurality of second historical video images that meet a second preset restriction condition; wherein the plurality of second historical video images includes a head-and-shoulder detecting block and a human body detecting block that are associated to each other; the second preset restriction condition includes a condition that there is only one head-and-shoulder detecting block inside a corresponding human body detecting block in the plurality of second historical video images.

At block S820: obtaining a second intersection area ratio corresponding to the head-and-shoulder detecting block and the human body detecting block in each of the second historical video images, and obtaining intersection area ratio data corresponding to the plurality of second historical video images.

At block S830: performing statistical analysis on the intersection area ratio data to obtain the preset area ratio threshold.

Through the above steps S810 to S830, the second intersection area ratio corresponding to the head-and-shoulder detecting block and the human body detecting block in the plurality of second historical video images is obtained, and intersection area ratio data corresponding to the plurality of second historical video images is obtained; statistical analysis is performed on the intersection area ratio data to obtain the preset area ratio threshold. In this embodiment, by obtaining the intersection area ratio data corresponding to the head-and-shoulder detecting block and the human body detecting block in the plurality of second historical video images, and performing statistical analysis on the intersection area ratio data, the relatively accurate preset area ratio threshold can be obtained. Furthermore, the accuracy of the association detection between the human body detecting block and the head-and-shoulder detecting block in the video image is further improved.

In some embodiments, after step S810, the associating method for detecting blocks of multiple parts of the human body further includes: obtaining relative ratio data corresponding to the head-and-shoulder detecting block and the human body detecting block according to a plurality of sets of associated head-and-shoulder detecting blocks and human body detecting blocks in the second historical video images; and performing statistical analysis on the relative ratio data to obtain the preset relative ratio; wherein the preset relative ratio includes a preset upper edge relative ratio, a preset width relative ratio, and a preset height relative ratio.

The preset upper edge relative ratio represents a ratio of a distance from an upper edge of the head-and-shoulder detecting block to an upper edge of the human body detecting block to a height of the head-and-shoulder detecting block. The preset width relative ratio indicates a ratio of a width of the head-and-shoulder detecting block to a width of the human body detecting block. The preset height relative ratio indicates a ratio of a height of the head-and-shoulder detecting block to a height of the human body detecting block.

Further, the preset upper edge relative ratio may be less than or equal to 0.1. The preset width relative ratio may be in a range of [0.3, 1], and the preset height relative ratio may be in a range of [0.2, 0.6].
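A check against these three relative ratios might look like the sketch below. The box layout `(x1, y1, x2, y2)`, the function name, and the use of an absolute distance between the two upper edges are assumptions; the numeric limits are the example values given in the text.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2), y grows downward


def meets_relative_ratio(head_shoulder: Box, body: Box,
                         max_upper_edge: float = 0.1,
                         width_range: Tuple[float, float] = (0.3, 1.0),
                         height_range: Tuple[float, float] = (0.2, 0.6)) -> bool:
    """Return True if the pair satisfies all three preset relative ratios."""
    hs_w = head_shoulder[2] - head_shoulder[0]
    hs_h = head_shoulder[3] - head_shoulder[1]
    body_w = body[2] - body[0]
    body_h = body[3] - body[1]
    if min(hs_w, hs_h, body_w, body_h) <= 0:
        return False
    # Distance between the two upper edges, relative to head-and-shoulder height.
    upper_edge = abs(head_shoulder[1] - body[1]) / hs_h
    width_ratio = hs_w / body_w
    height_ratio = hs_h / body_h
    return (upper_edge <= max_upper_edge
            and width_range[0] <= width_ratio <= width_range[1]
            and height_range[0] <= height_ratio <= height_range[1])
```

A head-and-shoulder block that sits far below the top of the human body block, or that is much narrower or taller than expected, fails the check and would be removed from the candidate set.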

Through the above embodiment, relative ratio data corresponding to the head-and-shoulder detecting block and the human body detecting block is obtained according to a plurality of sets of associated head-and-shoulder detecting blocks and human body detecting blocks in the second historical video images; and statistical analysis is performed on the relative ratio data to obtain the preset relative ratio. In this embodiment, by obtaining the relative ratio data corresponding to the head-and-shoulder detecting block and the human body detecting block in the second historical video images, and performing statistical analysis on the relative ratio data, a relatively accurate preset relative ratio can be obtained, thereby further improving the accuracy of the association detection between the human body detecting block and the head-and-shoulder detecting block in the video image.

In some embodiments, FIG. 9 is a flowchart of performing mis-associating processing on a first associating relationship and a second associating relationship according to an embodiment of the present disclosure. As shown in FIG. 9, the process includes operations at blocks as follows.

At block S910: performing a mis-associating investigating process on the first associating relationship according to a preset first mis-associating processing rule to obtain a first mis-associating detecting block, and re-determining a target detecting block corresponding to the first mis-associating detecting block; wherein the first mis-associating detecting block includes a face detecting block that is repeatedly associated and a plurality of head-and-shoulder detecting blocks that are correspondingly associated with the face detecting block.

The first mis-associating processing rule includes a preset duplicate associating detecting rule and a preset duplicate associating rule.

Specifically, the first associating relationship is checked for mis-associating according to the preset duplicate associating detecting rule to obtain the first mis-associating detecting block, and the target detecting block corresponding to the first mis-associating detecting block is re-determined according to the preset duplicate associating rule.

At block S920: performing a mis-associating investigating process on the second associating relationship according to a preset second mis-associating processing rule to obtain a second mis-associating detecting block, and re-determining a target detecting block corresponding to the second mis-associating detecting block; wherein the second mis-associating detecting block includes a head-and-shoulder detecting block that is repeatedly associated and a plurality of human body detecting blocks that are correspondingly associated with the head-and-shoulder detecting block.

The second mis-associating processing rule includes a preset duplicate associating detecting rule and a preset duplicate associating rule.

Specifically, the second associating relationship is checked for mis-associating according to the preset duplicate associating detecting rule to obtain the second mis-associating detecting block, and the target detecting block corresponding to the second mis-associating detecting block is re-determined according to the preset duplicate associating rule.

Through the above steps S910 to S920, the mis-associating investigating process is performed on the first associating relationship according to the preset first mis-associating processing rule to obtain the first mis-associating detecting block, and the target detecting block corresponding to the first mis-associating detecting block is re-determined; the mis-associating investigating process is performed on the second associating relationship according to the preset second mis-associating processing rule to obtain the second mis-associating detecting block, and the target detecting block corresponding to the second mis-associating detecting block is re-determined. By combining the mis-associating processing rules with a two-stage associating scheme which can be an association of face detecting block and head-and-shoulder detecting block and an association of head-and-shoulder detecting block and human body detecting block, this embodiment can effectively reduce the possibility of various cross errors, as well as solve the mis-associating problem of mismatched detecting block size and multiple block intersection, thereby effectively improving the accuracy and reliability of determining the associating relationship.

In some embodiments, FIG. 10 is a flowchart of performing mis-associating processing on a first associating relationship according to a first mis-associating processing rule according to an embodiment of the present disclosure. As shown in FIG. 10, the process includes operations at blocks as follows.

At block S1010: for each video image, traversing the face detecting blocks and the head-and-shoulder detecting blocks that are in the first associating relationship in the video image, and obtaining a duplicate association face detecting block that has an associating relationship with a plurality of head-and-shoulder detecting blocks.

At block S1020: for each duplicate association face detecting block, obtaining a third intersection over union between the duplicate association face detecting block and each head-and-shoulder detecting block associated with the duplicate association face detecting block, and obtaining a plurality of third intersection over unions; and sorting the plurality of third intersection over unions to obtain a third sorting result.

At block S1030: according to the third sorting result, taking the head-and-shoulder detecting block corresponding to a largest third intersection over union as the target head-and-shoulder detecting block corresponding to the duplicate association face detecting block; and for each of the head-and-shoulder detecting blocks corresponding to other third intersection over unions, re-determining the target face detecting block according to other candidate face detecting blocks in the candidate face detecting block set corresponding to a head-and-shoulder detecting block.

Specifically, according to the third sorting result, the head-and-shoulder detecting block corresponding to the largest third intersection over union is taken as the target head-and-shoulder detecting block corresponding to the duplicate association face detecting block. For each of the head-and-shoulder detecting blocks corresponding to other third intersection over unions, the candidate face detecting block corresponding to a second-ranked second intersection over union in the candidate face detecting block set corresponding to the head-and-shoulder detecting block may be taken as the target face detecting block. When there are multiple target face detecting blocks corresponding to the head-and-shoulder detecting block, the corresponding duplicate association face detecting block may be excluded from the target face detecting blocks, and a final matching target face detecting block may be determined from the remaining target face detecting blocks according to the priority.
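The duplicate handling of blocks S1010 to S1030 can be sketched as follows. This is one possible reading, under stated assumptions: boxes use the layout `(x1, y1, x2, y2)`, `ranked` is a hypothetical mapping from each head-and-shoulder index to its candidate face indices in descending order of second intersection over union, and the resolution is done in a single pass, so a fallback choice that itself creates a new duplicate is not re-checked.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def resolve_duplicate_faces(head_shoulders: Sequence[Box], faces: Sequence[Box],
                            ranked: Dict[int, List[int]]) -> Dict[int, Optional[int]]:
    """Map each head-and-shoulder index to a face index (or None)."""
    # Initial association: the top-ranked candidate of each head-and-shoulder block.
    target: Dict[int, Optional[int]] = {h: (c[0] if c else None) for h, c in ranked.items()}

    # S1010: find faces claimed by more than one head-and-shoulder block.
    claims: Dict[int, List[int]] = defaultdict(list)
    for h, f in target.items():
        if f is not None:
            claims[f].append(h)

    for f, owners in claims.items():
        if len(owners) < 2:
            continue
        # S1020/S1030: the block with the largest third IOU keeps the face.
        best = max(owners, key=lambda h: iou(faces[f], head_shoulders[h]))
        for h in owners:
            if h != best:
                # The others fall back to their next-ranked candidate, if any.
                fallback = [c for c in ranked[h] if c != f]
                target[h] = fallback[0] if fallback else None
    return target
```

The same pattern would apply to the second associating relationship, with head-and-shoulder blocks and human body blocks taking the roles of faces and head-and-shoulder blocks.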

In some embodiments, FIG. 11 is a flowchart of performing mis-associating processing on a second associating relationship according to a second mis-associating processing rule according to an embodiment of the present disclosure. As shown in FIG. 11, the process includes operations at blocks as follows.

At block S1110: for each video image, traversing the human body detecting blocks and the head-and-shoulder detecting blocks that are in the second associating relationship in the video image, and obtaining a duplicate association head-and-shoulder detecting block that has an associating relationship with a plurality of human body detecting blocks.

At block S1120: for each duplicate association head-and-shoulder detecting block, obtaining a fourth intersection over union between a duplicate association head-and-shoulder detecting block and each human body detecting block associated with the duplicate association head-and-shoulder detecting block, and obtaining a plurality of fourth intersection over unions; and sorting the plurality of fourth intersection over unions to obtain a fourth sorting result.

At block S1130: according to the fourth sorting result, taking the human body detecting block corresponding to a largest fourth intersection over union as the target human body detecting block corresponding to the duplicate association head-and-shoulder detecting block; and for each of the human body detecting blocks corresponding to other fourth intersection over unions, re-determining the target head-and-shoulder detecting block according to other candidate head-and-shoulder detecting blocks in the candidate head-and-shoulder detecting block set corresponding to a human body detecting block.

Specifically, according to the fourth sorting result, the human body detecting block corresponding to the largest fourth intersection over union is taken as the target human body detecting block corresponding to the duplicate association head-and-shoulder detecting block. For each of the human body detecting blocks corresponding to other fourth intersection over unions, the candidate head-and-shoulder detecting block corresponding to a second-ranked first intersection area ratio in the candidate head-and-shoulder detecting block set corresponding to the human body detecting block may be taken as the target head-and-shoulder detecting block. When there are multiple target head-and-shoulder detecting blocks corresponding to the human body detecting block, the corresponding duplicate association head-and-shoulder detecting block may be excluded from the target head-and-shoulder detecting blocks, and a final matching target head-and-shoulder detecting block may be determined from the remaining target head-and-shoulder detecting blocks according to the priority.

Further, all association information in the original video image is saved in the form of a dictionary. The file name of the original video image serves as the key, and the value is a list of the sets of detecting blocks corresponding to each person in the video image, where each set includes a face detecting block, a head-and-shoulder detecting block, and a human body detecting block. When there is no face detecting block, the corresponding data is replaced with “None”. A person for whom no head-and-shoulder detecting block is detected is discarded and is not written into the dictionary.

For example, {“1.jpg”: [[face detecting block I, head-and-shoulder detecting block I, human body detecting block I], [None, head-and-shoulder detecting block II, human body detecting block II], ...]} represents the detecting blocks corresponding to all persons in a first video image. Among them, the set of detecting blocks corresponding to a first person includes the face detecting block I, the head-and-shoulder detecting block I, and the human body detecting block I, and the set of detecting blocks corresponding to a second person includes None, the head-and-shoulder detecting block II, and the human body detecting block II, where None means that the face detecting block corresponding to the second person is not detected in the first video image.

To facilitate subsequent analysis and processing, when there is no face detecting block, the corresponding data is replaced with “None”; for a person for whom no head-and-shoulder detecting block is detected, the corresponding data is discarded and is not written into the dictionary.
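As an illustration only, the dictionary described above could be built as in the following sketch. The function name, the tuple-based box representation, and the sample coordinates are hypothetical; only the key/value layout, the use of None for a missing face detecting block, and the discarding of persons without a head-and-shoulder detecting block come from the description.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x1, y1, x2, y2)
Person = List[Optional[Box]]     # [face, head-and-shoulder, human body]


def build_association_record(
    file_name: str, persons: Sequence[Sequence[Optional[Box]]]
) -> Dict[str, List[Person]]:
    """Return {file name: [[face, head-and-shoulder, body], ...]} for one image."""
    kept: List[Person] = []
    for face, head_shoulder, body in persons:
        if head_shoulder is None:
            # A person without a head-and-shoulder block is not written.
            continue
        # A missing face block stays in place as None.
        kept.append([face, head_shoulder, body])
    return {file_name: kept}


record = build_association_record(
    "1.jpg",
    [
        [(40, 20, 60, 40), (20, 10, 80, 70), (10, 10, 90, 200)],  # full set
        [None, (120, 10, 180, 70), (110, 10, 190, 200)],          # no face
        [(300, 20, 320, 40), None, (280, 10, 360, 200)],          # discarded
    ],
)
```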

It should be noted that the steps shown in the above process or in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical sequence is illustrated in the flowcharts, in some cases the steps shown or described may be executed in a different order than that shown herein. For example, in conjunction with FIG. 1, the order of execution of steps S130 and S140 may be interchanged, i.e., step S130 may be executed first and then step S140, or step S140 may be executed first and then step S130. For another example, in conjunction with FIG. 9, the order of steps S910 and S920 may also be interchanged.

The method embodiments provided herein may be executed in a terminal, a computer, or a similar computing device. As an example of running on a terminal, FIG. 12 is a block view of a hardware structure of a terminal for an associating method for detecting blocks of multiple parts of a human body according to an embodiment of the present disclosure. As shown in FIG. 12, the terminal may include one or more (only one is shown in FIG. 12) processors 102 (the processors 102 may include, but are not limited to, processing devices such as microprocessor (MCU) or programmable logic devices (FPGA) ) and a memory 104 for storing data. In some embodiments, the terminal may also include a transmission device 106 for communication functions and an input-output device 108. Those skilled in the art can understand that the structure shown in FIG. 12 is only schematic, and it does not limit the structure of the above terminal. For example, the terminal may also include more or fewer components than shown in FIG. 12, or have a different configuration than shown in FIG. 12.

The memory 104 may be configured to store computer programs, such as software programs and modules of application software, for example the computer program corresponding to the associating method for detecting blocks of multiple parts of a human body in the embodiments. The processor 102 performs various functional applications as well as data processing, i.e., implements the method described above, by running the computer program stored in the memory 104. The memory 104 may include a high-speed random access memory, and may also include a non-volatile memory such as one or more magnetic storage devices, a flash memory, or other non-volatile solid state memories. In some examples, the memory 104 may further include memories that are remotely located relative to the processor 102, and these remote memories may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is configured to receive or send data over a network. Specific examples of the network described above may include a wireless network provided by the terminal’s communications provider. In an example, the transmission device 106 includes a network interface controller (NIC) that can be connected to other network devices via a base station such that it can communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module configured to communicate with the Internet wirelessly.

The present disclosure also provides an associating apparatus for detecting blocks of multiple parts of a human body, which is configured to implement the above-mentioned embodiments and preferred implementations; what has already been explained will not be repeated. As used below, the terms “module”, “unit”, “sub-unit”, etc. may refer to a combination of software and/or hardware that implements predetermined functions. Although the apparatuses described in the following embodiments are preferably implemented by software, implementation by hardware or by a combination of software and hardware is also possible and conceived.

FIG. 13 is a block view of an associating apparatus for detecting blocks of multiple parts of a human body according to an embodiment of the present disclosure. As shown in FIG. 13, the associating apparatus 130 for detecting blocks of multiple parts of a human body includes:

a data obtaining module 131, configured to obtain a video stream, wherein the video stream includes a plurality of video images;

a target detecting module 132, configured to perform a target detecting process on a plurality of video images to obtain a face detecting block, a head-and-shoulder detecting block, and a human body detecting block in the plurality of video images;

a first associating module 133, configured to determine a first associating relationship between the face detecting block and the head-and-shoulder detecting block according to a preset expansion ratio, the face detecting block, and the head-and-shoulder detecting block in the plurality of video images; wherein the preset expansion ratio represents a ratio between a distance by which the face detecting block is expanded outward to the head-and-shoulder detecting block and a size of the face detecting block;

a second associating module 134, configured to determine a second associating relationship between the head-and-shoulder detecting block and the human body detecting block according to a preset relative ratio, the head-and-shoulder detecting block and the human body detecting block in the plurality of video images; wherein the preset relative ratio represents a relative ratio between the head-and-shoulder detecting block and the human body detecting block; and

an association determining module 135, configured to obtain the face detecting block, the head-and-shoulder detecting block, and the human body detecting block corresponding to the same person in the video image according to the first associating relationship and the second associating relationship.

In some embodiments, the first associating module 133 includes an intersection over union calculation unit, a candidate set determining unit, an expanded face determining unit, a sorting result obtaining unit, and an associating relationship determining unit.

The intersection over union calculation unit is configured to, for each head-and-shoulder detecting block, obtain a first intersection over union between a head-and-shoulder detecting block and each face detecting block.

The candidate set determining unit is configured to determine a candidate face detecting block set corresponding to the head-and-shoulder detecting block according to a preset relative position condition, the first intersection over union, and a preset intersection over union threshold; wherein the candidate face detecting block set includes a plurality of candidate face detecting blocks.

The expanded face determining unit is configured to determine an expanded face block corresponding to each candidate face detecting block in the candidate face detecting block set according to the preset expansion ratio.

The sorting result obtaining unit is configured to obtain a second intersection over union between each expanded face block and the head-and-shoulder detecting block to obtain a plurality of second intersection over unions, and sort the plurality of second intersection over unions to obtain a first sorting result.

The associating relationship determining unit is configured to determine a target face detecting block corresponding to the head-and-shoulder detecting block from the candidate face detecting block set according to the first sorting result, and construct the first associating relationship between the head-and-shoulder detecting block and the target face detecting block.
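The cooperation of the five units above can be sketched as follows. This is a hedged illustration under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples with the y axis pointing down, the preset relative position condition is replaced by a hypothetical check that the face centre lies inside the head-and-shoulder detecting block, vertical and horizontal expansion are scaled by the face height and width respectively, and all function names are invented for the example.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def centre_inside(face: Box, hs: Box) -> bool:
    """Hypothetical stand-in for the preset relative position condition."""
    cx, cy = (face[0] + face[2]) / 2, (face[1] + face[3]) / 2
    return hs[0] <= cx <= hs[2] and hs[1] <= cy <= hs[3]


def expand_face(face: Box, ratios: Tuple[float, float, float, float]) -> Box:
    """Expand a face box by (up, down, left, right) ratios of its own size."""
    up, down, left, right = ratios
    w, h = face[2] - face[0], face[3] - face[1]
    return (face[0] - left * w, face[1] - up * h, face[2] + right * w, face[3] + down * h)


def match_face(
    hs: Box,
    faces: List[Box],
    ratios: Tuple[float, float, float, float],
    iou_threshold: float,
) -> Optional[int]:
    """Return the index of the target face for one head-and-shoulder box."""
    # Candidate set: position condition plus first IoU above the threshold.
    candidates = [
        i for i, f in enumerate(faces)
        if centre_inside(f, hs) and iou(hs, f) > iou_threshold
    ]
    if not candidates:
        return None
    # First sorting result: candidates ordered by second IoU (expanded face).
    ranked = sorted(
        candidates, key=lambda i: iou(expand_face(faces[i], ratios), hs), reverse=True
    )
    return ranked[0]
```

Keeping the full ranked list rather than only its first element would also allow a later fallback to the next candidate when a face detecting block turns out to be repeatedly associated.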

In some embodiments, the associating apparatus 130 for detecting blocks of multiple parts of the human body further includes a first preset ratio obtaining module, and the first preset ratio obtaining module includes an image obtaining unit, a data obtaining unit, and a statistical analysis unit.

The image obtaining unit is configured to obtain a plurality of first historical video images that meet a first preset restriction condition; wherein the plurality of first historical video images includes a face detecting block and a head-and-shoulder detecting block that are associated to each other; the first preset restriction condition includes a condition that there is only one face detecting block inside each corresponding head-and-shoulder detecting block in the plurality of first historical video images.

The data obtaining unit is configured to obtain expansion ratio data corresponding to the face detecting block and the head-and-shoulder detecting block according to the face detecting block and the head-and-shoulder detecting block associated with each other in the plurality of first historical video images; wherein the expansion ratio data includes upward expansion ratio data, downward expansion ratio data, leftward expansion ratio data, and rightward expansion ratio data.

The statistical analysis unit is configured to perform statistical analysis on the expansion ratio data to obtain a preset expansion ratio.

In some embodiments, the statistical analysis unit is further configured to take a mean value of a distribution corresponding to the downward expansion ratio data as the preset downward expansion ratio, take a minimum variance value of a distribution corresponding to the upward expansion ratio data as the preset upward expansion ratio, take a minimum variance value of a distribution corresponding to the leftward expansion ratio data as the preset leftward expansion ratio, and take a minimum variance value of a distribution corresponding to the rightward expansion ratio data as the preset rightward expansion ratio.
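A possible way of collecting the expansion ratio data from an associated face detecting block and head-and-shoulder detecting block is sketched below. The normalisation is an assumption: vertical distances are divided by the face height and horizontal distances by the face width, with boxes given as `(x1, y1, x2, y2)` and the y axis pointing down. Only the mean-based downward ratio is shown; the statistic used for the other three directions is not reproduced here.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def expansion_ratios(face: Box, hs: Box) -> Dict[str, float]:
    """Distance from each face edge to the matching head-and-shoulder edge,
    divided by the face size along the same axis (assumed definition)."""
    w, h = face[2] - face[0], face[3] - face[1]
    return {
        "up": (face[1] - hs[1]) / h,
        "down": (hs[3] - face[3]) / h,
        "left": (face[0] - hs[0]) / w,
        "right": (hs[2] - face[2]) / w,
    }


def preset_downward_ratio(pairs: Iterable[Tuple[Box, Box]]) -> float:
    """Mean of the downward expansion ratio over associated (face, hs) pairs."""
    return mean(expansion_ratios(face, hs)["down"] for face, hs in pairs)
```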

In some embodiments, the second associating module 134 includes a ratio obtaining unit, a set determining unit, a removing processing unit, a sorting processing unit, and an associating determining unit.

The ratio obtaining unit is configured to, for each human body detecting block, obtain a first intersection area ratio between the human body detecting block and each head-and-shoulder detecting block; wherein the first intersection area ratio indicates a ratio of an intersection area of the human body detecting block and the head-and-shoulder detecting block to an area of the head-and-shoulder detecting block.

The set determining unit is configured to determine a candidate head-and-shoulder detecting block set corresponding to the human body detecting block according to the first intersection area ratio and a preset area ratio threshold; wherein the candidate head-and-shoulder detecting block set includes a plurality of candidate head-and-shoulder detecting blocks.

The removing processing unit is configured to remove at least one candidate head-and-shoulder detecting block that does not meet the preset relative ratio from the candidate head-and-shoulder detecting block set, and obtain a removal-processed candidate head-and-shoulder detecting block set.

The sorting processing unit is configured to obtain a plurality of first intersection area ratios according to the first intersection area ratio corresponding to each candidate head-and-shoulder detecting block in the removal-processed candidate head-and-shoulder detecting block set, and sort the plurality of first intersection area ratios to obtain a second sorting result.

The associating determining unit is configured to determine a target head-and-shoulder detecting block corresponding to the human body detecting block from the removal-processed candidate head-and-shoulder detecting block set according to the second sorting result, and construct the second associating relationship between the human body detecting block and the target head-and-shoulder detecting block.
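The second associating module can be illustrated with the sketch below. The first intersection area ratio follows the definition given above (intersection area divided by the area of the head-and-shoulder detecting block). The relative ratio check is an assumption: `RelativeRatioLimits` and its three fields are hypothetical stand-ins for a preset upper edge, width, and height relative ratio, and the box layout `(x1, y1, x2, y2)` is likewise assumed.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


@dataclass
class RelativeRatioLimits:
    """Hypothetical encoding of the preset relative ratio."""
    max_top_offset: float               # |hs top - body top| / body height
    width_range: Tuple[float, float]    # hs width / body width
    height_range: Tuple[float, float]   # hs height / body height


def intersection_area_ratio(body: Box, hs: Box) -> float:
    """Intersection area divided by the head-and-shoulder box area."""
    inter_w = max(0.0, min(body[2], hs[2]) - max(body[0], hs[0]))
    inter_h = max(0.0, min(body[3], hs[3]) - max(body[1], hs[1]))
    hs_area = (hs[2] - hs[0]) * (hs[3] - hs[1])
    return inter_w * inter_h / hs_area if hs_area > 0 else 0.0


def meets_relative_ratio(body: Box, hs: Box, lim: RelativeRatioLimits) -> bool:
    body_w, body_h = body[2] - body[0], body[3] - body[1]
    top = abs(hs[1] - body[1]) / body_h
    width = (hs[2] - hs[0]) / body_w
    height = (hs[3] - hs[1]) / body_h
    return (
        top <= lim.max_top_offset
        and lim.width_range[0] <= width <= lim.width_range[1]
        and lim.height_range[0] <= height <= lim.height_range[1]
    )


def rank_head_shoulders(
    body: Box, hs_list: List[Box], area_threshold: float, lim: RelativeRatioLimits
) -> List[int]:
    """Indices of surviving candidates, best first; element 0 is the target."""
    scored = [(intersection_area_ratio(body, hs), i) for i, hs in enumerate(hs_list)]
    # Candidate set (area ratio threshold), then removal by relative ratio.
    kept = [
        (ratio, i) for ratio, i in scored
        if ratio >= area_threshold and meets_relative_ratio(body, hs_list[i], lim)
    ]
    # Second sorting result: descending first intersection area ratio.
    kept.sort(key=lambda item: item[0], reverse=True)
    return [i for _, i in kept]
```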

In some embodiments, the associating apparatus 130 for detecting blocks of multiple parts of the human body further includes a second preset ratio obtaining module, and the second preset ratio obtaining module includes a historical image obtaining unit, an area ratio obtaining unit, and a ratio threshold determining unit.

The historical image obtaining unit is configured to obtain a plurality of second historical video images that meet a second preset restriction condition; wherein the plurality of second historical video images include a head-and-shoulder detecting block and a human body detecting block that are associated to each other; the second preset restriction condition includes a condition that there is only one head-and-shoulder detecting block inside each corresponding human body detecting block in the plurality of second historical video images.

The area ratio obtaining unit is configured to obtain a second intersection area ratio corresponding to the head-and-shoulder detecting block and the human body detecting block in each of the second historical video images, and obtain intersection area ratio data corresponding to the plurality of second historical video images.

The ratio threshold determining unit is configured to perform statistical analysis on the intersection area ratio data to obtain the preset area ratio threshold.

In some embodiments, the second preset ratio obtaining module further includes a relative ratio data obtaining unit and a preset relative ratio determining unit.

The relative ratio data obtaining unit is configured to obtain relative ratio data corresponding to the head-and-shoulder detecting block and the human body detecting block according to a plurality of sets of associated head-and-shoulder detecting blocks and human body detecting blocks in the second historical video images.

The preset relative ratio determining unit is configured to perform statistical analysis on the relative ratio data to obtain the preset relative ratio.

In some embodiments, the associating apparatus 130 for detecting blocks of multiple parts of the human body further includes a mis-associating processing module, and the mis-associating processing module includes a first mis-associating processing unit and a second mis-associating processing unit.

The first mis-associating processing unit is configured to perform a mis-associating investigating process on the first associating relationship according to a preset first mis-associating processing rule to obtain a first mis-associating detecting block, and re-determine a target detecting block corresponding to the first mis-associating detecting block; wherein the first mis-associating detecting block includes a face detecting block that is repeatedly associated and a plurality of head-and-shoulder detecting blocks that are correspondingly associated with the face detecting block.

The second mis-associating processing unit is configured to perform a mis-associating investigating process on the second associating relationship according to a preset second mis-associating processing rule to obtain a second mis-associating detecting block, and re-determine a target detecting block corresponding to the second mis-associating detecting block; wherein the second mis-associating detecting block includes a head-and-shoulder detecting block that is repeatedly associated and a plurality of human body detecting blocks that are correspondingly associated with the head-and-shoulder detecting block.

In some embodiments, the first mis-associating processing unit includes a mis-associating investigation unit, a sorting result obtaining unit, and a duplicate association processing unit.

The mis-associating investigation unit is configured to, for each video image, traverse the face detecting blocks and the head-and-shoulder detecting blocks that are in the first associating relationship in the video image, and obtain a duplicate association face detecting block that has an associating relationship with a plurality of head-and-shoulder detecting blocks.

The sorting result obtaining unit is configured to, for each duplicate association face detecting block, obtain a third intersection over union between the duplicate association face detecting block and each head-and-shoulder detecting block associated with the duplicate association face detecting block, and obtain a plurality of third intersection over unions; and sort the plurality of third intersection over unions to obtain a third sorting result.

The duplicate association processing unit is configured to, according to the third sorting result, take the head-and-shoulder detecting block corresponding to a largest third intersection over union as the target head-and-shoulder detecting block corresponding to the duplicate association face detecting block; and for each of the head-and-shoulder detecting blocks corresponding to other third intersection over unions, re-determine the target face detecting block according to other candidate face detecting blocks in the candidate face detecting block set corresponding to a head-and-shoulder detecting block.
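The traversal performed by the mis-associating investigation unit can be sketched as a simple grouping step. The mapping format (head-and-shoulder identifier to face identifier, with `None` for no associated face) and the function name are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def find_duplicate_faces(hs_to_face: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Return each face id that is associated with more than one
    head-and-shoulder id, together with those head-and-shoulder ids."""
    face_to_hs: Dict[str, List[str]] = defaultdict(list)
    for hs_id, face_id in hs_to_face.items():
        if face_id is not None:  # blocks without a face cannot be duplicates
            face_to_hs[face_id].append(hs_id)
    return {face: hs_ids for face, hs_ids in face_to_hs.items() if len(hs_ids) > 1}
```

Each entry of the returned mapping would then be ranked by the third intersection over union so that only the best head-and-shoulder detecting block keeps the face detecting block.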

In some embodiments, the second mis-associating processing unit includes a mis-associating investigation unit, a sorting result obtaining unit, and a duplicate association processing unit.

The mis-associating investigation unit is configured to, for each video image, traverse the human body detecting blocks and the head-and-shoulder detecting blocks that are in the second associating relationship in the video image, and obtain a duplicate association head-and-shoulder detecting block that has an associating relationship with a plurality of human body detecting blocks.

The sorting result obtaining unit is configured to, for each duplicate association head-and-shoulder detecting block, obtain a fourth intersection over union between the duplicate association head-and-shoulder detecting block and each human body detecting block associated with the duplicate association head-and-shoulder detecting block, and obtain a plurality of fourth intersection over unions; and sort the plurality of fourth intersection over unions to obtain a fourth sorting result.

The duplicate association processing unit is configured to, according to the fourth sorting result, take the human body detecting block corresponding to a largest fourth intersection over union as the target human body detecting block corresponding to the duplicate association head-and-shoulder detecting block; and for each of the human body detecting blocks corresponding to other fourth intersection over unions, re-determine the target head-and-shoulder detecting block according to other candidate head-and-shoulder detecting blocks in the candidate head-and-shoulder detecting block set corresponding to a human body detecting block.

It should be noted that the above-mentioned modules can be functional modules or program modules, and can be implemented by software or hardware. For modules implemented by hardware, all of the foregoing modules may be located in the same processor, or the foregoing modules may be located in different processors in any combination.

The present disclosure also provides an electronic device including a memory and a processor. The memory stores a computer program, and the processor is configured to run the computer program to execute the steps in any of the foregoing method embodiments.

In some embodiments, the electronic device may further include a transmission device and an input-output device, wherein the transmission device is connected to the processor, and the input-output device is connected to the processor.

In some embodiments, the processor may be configured to execute the following steps through a computer program:

S1: obtaining a video stream, wherein the video stream includes a plurality of video images.

S2: performing a target detecting process on a plurality of video images to obtain a face detecting block, a head-and-shoulder detecting block, and a human body detecting block in the plurality of video images.

S3: determining a first associating relationship between the face detecting block and the head-and-shoulder detecting block according to a preset expansion ratio, the face detecting block, and the head-and-shoulder detecting block in the plurality of video images; wherein the preset expansion ratio represents a ratio between a distance by which the face detecting block is expanded outward to the head-and-shoulder detecting block and a size of the face detecting block.

S4: determining a second associating relationship between the head-and-shoulder detecting block and the human body detecting block according to a preset relative ratio, the head-and-shoulder detecting block and the human body detecting block in the plurality of video images; wherein the preset relative ratio represents a relative ratio between the head-and-shoulder detecting block and the human body detecting block.

S5: obtaining the face detecting block, the head-and-shoulder detecting block, and the human body detecting block corresponding to the same person in the plurality of video images according to the first associating relationship and the second associating relationship.
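Step S5 can be pictured as joining the two associating relationships on the head-and-shoulder detecting block. The sketch below is only an illustration: the identifier-based mappings and the function name are hypothetical, and it assumes that mis-associations have already been resolved so that each head-and-shoulder detecting block belongs to at most one face detecting block and one human body detecting block.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[Optional[str], str, str]  # (face id or None, hs id, body id)


def merge_associations(
    hs_to_face: Dict[str, Optional[str]],
    body_to_hs: Dict[str, Optional[str]],
) -> List[Triple]:
    """Join the first and second associating relationships per person."""
    persons: List[Triple] = []
    for body_id, hs_id in body_to_hs.items():
        if hs_id is None:
            continue  # no head-and-shoulder block: person is not reported
        # A head-and-shoulder block without a face yields None in the face slot.
        persons.append((hs_to_face.get(hs_id), hs_id, body_id))
    return persons
```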

It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and alternative implementations, which is not repeated herein.

In addition, in combination with the associating method for detecting blocks of multiple parts of a human body in the above embodiments, the present disclosure may provide a storage medium for implementation. The storage medium stores a computer program; when the computer program is executed by the processor, any one of the method embodiments may be implemented.

Those skilled in the art should understand that the various technical features of the above-mentioned embodiments can be combined arbitrarily. For the sake of concise description, not all possible combinations of the technical features in the above-described embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered as falling within the scope of the present disclosure.

The above-described embodiments express only several embodiments of the present disclosure, and their descriptions are relatively specific and detailed, but they should not be construed as a limitation of the scope of the present disclosure. It should be noted that, for those skilled in the art, a number of variations and improvements can be made without departing from the conception of the present disclosure, and these belong to the scope of the present disclosure. Therefore, the scope of the present disclosure shall be subject to the attached claims.

Claims

An associating method for detecting blocks of multiple parts of a human body, comprising:

obtaining a video stream, wherein the video stream comprises a plurality of video images;

performing a target detecting process on a plurality of video images to obtain a face detecting block, a head-and-shoulder detecting block, and a human body detecting block in the plurality of video images;

determining a first associating relationship between the face detecting block and the head-and-shoulder detecting block according to a preset expansion ratio, the face detecting block, and the head-and-shoulder detecting block in the plurality of video images; wherein the preset expansion ratio represents a ratio between a distance from the face detecting block flared to the head-and-shoulder detecting block and a size of the face detecting block;

determining a second associating relationship between the head-and-shoulder detecting block and the human body detecting block according to a preset relative ratio, the head-and-shoulder detecting block and the human body detecting block in the plurality of video images; wherein the preset relative ratio represents a relative ratio between the head-and-shoulder detecting block and the human body detecting block; and

obtaining the face detecting block, the head-and-shoulder detecting block, and the human body detecting block corresponding to the same person in the plurality of video images according to the first associating relationship and the second associating relationship.
The method according to claim 1, wherein the determining the first associating relationship between the face detecting block and the head-and-shoulder detecting block according to the preset expansion ratio, the face detecting block, and the head-and-shoulder detecting block in the plurality of video images comprises:

in condition of the head-and-shoulder detecting block comprising a plurality of head-and-shoulder detecting blocks and the face detecting block comprising a plurality of face detecting blocks, for each head-and-shoulder detecting block: obtaining a first intersection over union between the head-and-shoulder detecting block and each face detecting block;

determining a candidate face detecting block set corresponding to the head-and-shoulder detecting block according to a preset relative position condition, the first intersection over union, and a preset intersection over union threshold; wherein the candidate face detecting block set comprises a plurality of candidate face detecting blocks;

determining an expanded face block corresponding to each candidate face detecting block in the candidate face detecting block set according to the preset expansion ratio;

obtaining a second intersection over union between each expanded face block and the head-and-shoulder detecting block to obtain a plurality of second intersection over unions, and sorting the plurality of second intersection over unions to obtain a first sorting result; and

determining a target face detecting block corresponding to the head-and-shoulder detecting block from the candidate face detecting block set according to the first sorting result, and constructing the first associating relationship between the head-and-shoulder detecting block and the target face detecting block.
The method according to claim 2, before the determining the first associating relationship between the face detecting block and the head-and-shoulder detecting block according to the preset expansion ratio, the face detecting block, and the head-and-shoulder detecting block in the plurality of video images, further comprising:

obtaining a plurality of first historical video images that meet a first preset restriction condition; wherein the plurality of first historical video images comprise a face detecting block and a head-and-shoulder detecting block that are associated to each other; the first preset restriction condition comprises a condition that there is only one face detecting block inside each corresponding head-and-shoulder detecting block in the plurality of first historical video images;

obtaining expansion ratio data corresponding to the face detecting block and the head-and-shoulder detecting block according to the face detecting block and the head-and-shoulder detecting block associated with each other in the plurality of first historical video images; wherein the expansion ratio data comprises upward expansion ratio data, downward expansion ratio data, leftward expansion ratio data, and rightward expansion ratio data; and

performing statistical analysis on the expansion ratio data to obtain a preset expansion ratio; wherein the preset expansion ratio comprises a preset upward expansion ratio, a preset downward expansion ratio, a preset leftward expansion ratio, and a preset rightward expansion ratio.
The method according to claim 3, wherein the performing statistical analysis on the expansion ratio data to obtain the preset expansion ratio comprises:

taking a mean value of a distribution corresponding to the downward expansion ratio data as a preset downward expansion ratio, taking a minimum variance value of a distribution corresponding to the upward expansion ratio data as a preset upward expansion ratio, taking a minimum variance value of a distribution corresponding to the leftward expansion ratio data as a preset leftward expansion ratio, and taking a minimum variance value of a distribution corresponding to the rightward expansion ratio data as a preset rightward expansion ratio.
The method according to claim 2, wherein the determining the second associating relationship between the head-and-shoulder detecting block and the human body detecting block according to the preset relative ratio, the head-and-shoulder detecting block and the human body detecting block in the plurality of video images comprises:

in condition of the human body detecting block comprising a plurality of human body detecting blocks, for each human body detecting block: obtaining a first intersection area ratio between the human body detecting block and each head-and-shoulder detecting block; wherein the first intersection area ratio indicates a ratio of an intersection area of the human body detecting block and the head-and-shoulder detecting block to an area of the head-and-shoulder detecting block;

determining a candidate head-and-shoulder detecting block set corresponding to the human body detecting block according to the first intersection area ratio and a preset area ratio threshold; wherein the candidate head-and-shoulder detecting block set comprises a plurality of candidate head-and-shoulder detecting blocks;

removing at least one candidate head-and-shoulder detecting block that does not meet the preset relative ratio from the candidate head-and-shoulder detecting block set, and obtaining a removal-processed candidate head-and-shoulder detecting block set;

obtaining a plurality of first intersection area ratios according to the first intersection area ratio corresponding to each candidate head-and-shoulder detecting block in the removal-processed candidate head-and-shoulder detecting block set, and sorting the plurality of first intersection area ratios to obtain a second sorting result; and

determining a target head-and-shoulder detecting block corresponding to the human body detecting block from the removal-processed candidate head-and-shoulder detecting block set according to the second sorting result, and constructing the second associating relationship between the human body detecting block and the target head-and-shoulder detecting block.
The method according to claim 5, before the determining the second associating relationship between the head-and-shoulder detecting block and the human body detecting block according to the preset relative ratio, the head-and-shoulder detecting block and the human body detecting block in the plurality of video images, further comprising:

obtaining a plurality of second historical video images that meet a second preset restriction condition; wherein the second historical video image comprises a head-and-shoulder detecting block and a human body detecting block that are associated to each other; the second preset restriction condition comprises a condition that there is only one head-and-shoulder detecting block inside each corresponding human body detecting block in the plurality of second historical video images;

obtaining a second intersection area ratio corresponding to the head-and-shoulder detecting block and the human body detecting block in each of the second historical video images, and obtaining intersection area ratio data corresponding to the plurality of second historical video images; and

performing statistical analysis on the intersection area ratio data to obtain the preset area ratio threshold.
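The claim does not state which statistic is used. As a minimal sketch, assuming the threshold is taken as a low percentile of the intersection area ratios observed on the unambiguous historical images, the analysis might look like the following. The function name `area_ratio_threshold` and the 5th-percentile default are hypothetical.

```python
import statistics
from typing import Sequence


def area_ratio_threshold(ratios: Sequence[float], percentile: int = 5) -> float:
    """Pick a threshold that most historical associated pairs would pass.

    Assumption: a low percentile of the observed ratios is used, so that
    roughly (100 - percentile)% of known-good pairs exceed the threshold.
    """
    if len(ratios) < 2:
        raise ValueError("need at least two historical samples")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be in 1..99")
    # quantiles(n=100) returns the 99 cut points between percentiles.
    cuts = statistics.quantiles(ratios, n=100, method="inclusive")
    return cuts[percentile - 1]
```

A percentile is used here rather than the minimum so that a few noisy detections in the historical data do not drag the threshold down; other statistics would fit the claim wording equally well.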
The method according to claim 6, after the obtaining the plurality of second historical video images that meet the second preset restriction condition, further comprising:

obtaining relative ratio data corresponding to the head-and-shoulder detecting block and the human body detecting block according to a plurality of sets of associated head-and-shoulder detecting blocks and human body detecting blocks in the plurality of second historical video images; and

performing statistical analysis on the relative ratio data to obtain the preset relative ratio; wherein the preset relative ratio comprises a preset upper edge relative ratio, a preset width relative ratio, and a preset height relative ratio.
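As an illustration of how the three components of the preset relative ratio might be derived from historical pairs, the sketch below computes an accepted range per component. The normalisation by the human body block's width and height, the min/max-with-margin statistic, and the names `relative_ratio` and `preset_relative_ratio` are all assumptions made for this example; the claim itself only requires some statistical analysis.

```python
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows downward


def relative_ratio(hs: Box, body: Box) -> Tuple[float, float, float]:
    # Assumed definition: (upper-edge offset, width, height) of the
    # head-and-shoulder block relative to the human body block.
    body_w = body[2] - body[0]
    body_h = body[3] - body[1]
    if body_w <= 0 or body_h <= 0:
        raise ValueError("degenerate human body block")
    return ((hs[1] - body[1]) / body_h,
            (hs[2] - hs[0]) / body_w,
            (hs[3] - hs[1]) / body_h)


def preset_relative_ratio(pairs: Iterable[Tuple[Box, Box]],
                          margin: float = 0.05) -> List[Tuple[float, float]]:
    """Return [(lo, hi)] for upper edge, width, and height, in that order."""
    samples = [relative_ratio(hs, body) for hs, body in pairs]
    if not samples:
        raise ValueError("no historical pairs supplied")
    # zip(*samples) regroups the triples into one tuple per component.
    return [(min(component) - margin, max(component) + margin)
            for component in zip(*samples)]
```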
The method according to claim 5, before the obtaining the face detecting block, the head-and-shoulder detecting block, and the human body detecting block corresponding to the same person in the plurality of video images according to the first associating relationship and the second associating relationship, further comprising:

performing a mis-associating investigating process on the first associating relationship according to a preset first mis-associating processing rule to obtain a first mis-associating detecting block, and re-determining a target detecting block corresponding to the first mis-associating detecting block; wherein the first mis-associating detecting block comprises a face detecting block that is repeatedly associated and a plurality of head-and-shoulder detecting blocks that are correspondingly associated with the face detecting block; and

performing a mis-associating investigating process on the second associating relationship according to a preset second mis-associating processing rule to obtain a second mis-associating detecting block, and re-determining a target detecting block corresponding to the second mis-associating detecting block; wherein the second mis-associating detecting block comprises a head-and-shoulder detecting block that is repeatedly associated and a plurality of human body detecting blocks that are correspondingly associated with the head-and-shoulder detecting block.
The method according to claim 8, wherein the performing the mis-associating investigating process on the first associating relationship according to the preset first mis-associating processing rule to obtain the first mis-associating detecting block, and re-determining the target detecting block corresponding to the first mis-associating detecting block comprises:

for each video image, traversing the plurality of face detecting blocks and the plurality of head-and-shoulder detecting blocks that are in the first associating relationship in the video image, and obtaining a duplicate association face detecting block that has an associating relationship with a plurality of head-and-shoulder detecting blocks;

in a case where the duplicate association face detecting block comprises a plurality of duplicate association face detecting blocks, for each duplicate association face detecting block, obtaining a third intersection over union between the duplicate association face detecting block and each head-and-shoulder detecting block associated with the duplicate association face detecting block, so as to obtain a plurality of third intersections over union; and sorting the plurality of third intersections over union to obtain a third sorting result; and

according to the third sorting result, taking the head-and-shoulder detecting block corresponding to the largest third intersection over union as the target head-and-shoulder detecting block corresponding to the duplicate association face detecting block; and for each of the head-and-shoulder detecting blocks corresponding to the other third intersections over union, re-determining the target face detecting block according to the other candidate face detecting blocks in the candidate face detecting block set corresponding to that head-and-shoulder detecting block.
The method according to claim 8, wherein the performing the mis-associating investigating process on the second associating relationship according to the preset second mis-associating processing rule to obtain the second mis-associating detecting block, and re-determining the target detecting block corresponding to the second mis-associating detecting block comprises:

for each video image, traversing the plurality of human body detecting blocks and the plurality of head-and-shoulder detecting blocks that are in the second associating relationship in the video image, and obtaining a duplicate association head-and-shoulder detecting block that has an associating relationship with a plurality of human body detecting blocks;

in a case where the duplicate association head-and-shoulder detecting block comprises a plurality of duplicate association head-and-shoulder detecting blocks, for each duplicate association head-and-shoulder detecting block, obtaining a fourth intersection over union between the duplicate association head-and-shoulder detecting block and each human body detecting block associated with the duplicate association head-and-shoulder detecting block, so as to obtain a plurality of fourth intersections over union; and sorting the plurality of fourth intersections over union to obtain a fourth sorting result; and

according to the fourth sorting result, taking the human body detecting block corresponding to the largest fourth intersection over union as the target human body detecting block corresponding to the duplicate association head-and-shoulder detecting block; and for each of the human body detecting blocks corresponding to the other fourth intersections over union, re-determining the target head-and-shoulder detecting block according to the other candidate head-and-shoulder detecting blocks in the candidate head-and-shoulder detecting block set corresponding to that human body detecting block.
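The duplicate-resolution steps in the two preceding claims share one shape: a block claimed by several owners stays with the owner of largest intersection over union, and every other owner falls back to another entry of its own candidate set. The sketch below illustrates that shape under assumptions that are not in the claims: each owner's candidate set is an ordered list of target indices, falling back means moving to the next entry, and the check is repeated until no duplicates remain. The names `iou` and `resolve_duplicates` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = w * h if w > 0 and h > 0 else 0.0
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def resolve_duplicates(owners: Dict[int, Box],
                       targets: Dict[int, Box],
                       candidates: Dict[int, List[int]]) -> Dict[int, Optional[int]]:
    """Map each owner id to a distinct target id, or None if it runs out.

    For the first relationship, owners are head-and-shoulder blocks and
    targets are face blocks; for the second, owners are human body blocks
    and targets are head-and-shoulder blocks.
    """
    cursor = {o: 0 for o in owners}  # position in each owner's candidate list

    def current(o: int) -> Optional[int]:
        ranked = candidates.get(o, [])
        return ranked[cursor[o]] if cursor[o] < len(ranked) else None

    while True:
        claimed = defaultdict(list)
        for o in owners:
            t = current(o)
            if t is not None:
                claimed[t].append(o)
        duplicates = {t: os for t, os in claimed.items() if len(os) > 1}
        if not duplicates:
            return {o: current(o) for o in owners}
        for t, os in duplicates.items():
            # Keep the owner with the largest IoU; the rest fall back.
            os.sort(key=lambda o: iou(owners[o], targets[t]), reverse=True)
            for loser in os[1:]:
                cursor[loser] += 1
```

The loop terminates because every pass that finds a duplicate advances at least one cursor, and each cursor can advance only as far as the length of its candidate list.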
An associating apparatus for detecting blocks of multiple parts of a human body, comprising:

a data obtaining module, configured to obtain a video stream, wherein the video stream comprises a plurality of video images;

a target detecting module, configured to perform a target detecting process on the plurality of video images to obtain a face detecting block, a head-and-shoulder detecting block, and a human body detecting block in the plurality of video images;

a first associating module, configured to determine a first associating relationship between the face detecting block and the head-and-shoulder detecting block according to a preset expansion ratio, the face detecting block, and the head-and-shoulder detecting block in the plurality of video images; wherein the preset expansion ratio represents a ratio between a distance by which the face detecting block is expanded outward to reach the head-and-shoulder detecting block and a size of the face detecting block;

a second associating module, configured to determine a second associating relationship between the head-and-shoulder detecting block and the human body detecting block according to a preset relative ratio, the head-and-shoulder detecting block and the human body detecting block in the plurality of video images; wherein the preset relative ratio represents a relative ratio between the head-and-shoulder detecting block and the human body detecting block; and

an association determining module, configured to obtain the face detecting block, the head-and-shoulder detecting block, and the human body detecting block corresponding to the same person in the plurality of video images according to the first associating relationship and the second associating relationship.
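The final step performed by the association determining module amounts to joining the two relationships on the head-and-shoulder detecting block they share. A minimal sketch follows, assuming each relationship is available as a list of id pairs and is one-to-one by this stage; the function name `link_person_blocks` and the use of `None` for a missing part are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[Optional[int], int, Optional[int]]  # (face, head-shoulder, body)


def link_person_blocks(first: Sequence[Tuple[int, int]],
                       second: Sequence[Tuple[int, int]]) -> List[Triple]:
    """Join (face, head-shoulder) pairs with (head-shoulder, body) pairs.

    Assumption: both relationships are one-to-one, so each head-and-shoulder
    id appears at most once in each input.
    """
    face_of = {hs: face for face, hs in first}
    body_of = {hs: body for hs, body in second}
    hs_ids = sorted(set(face_of) | set(body_of))
    # A part that was not detected or not associated is reported as None.
    return [(face_of.get(hs), hs, body_of.get(hs)) for hs in hs_ids]
```

Keeping partially associated persons in the output, instead of dropping them, is a design choice for this sketch: it lets a caller handle people seen from behind, who have no face detecting block.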
An electronic device, comprising a memory and a processor; wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the associating method for detecting blocks of multiple parts of a human body according to any one of claims 1-10.
A storage medium, storing a computer program; wherein the computer program, when executed, performs the associating method for detecting blocks of multiple parts of a human body according to any one of claims 1-10.