CN112101139B

CN112101139B - Human shape detection method, device, equipment and storage medium

Info

Publication number: CN112101139B
Application number: CN202010875179.0A
Authority: CN
Inventors: 肖传利
Original assignee: Pulian International Co ltd
Current assignee: Pulian International Co ltd
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2024-05-03
Anticipated expiration: 2040-08-27
Also published as: CN112101139A

Abstract

The invention discloses a humanoid detection method, device, equipment and storage medium. The method comprises the following steps: acquiring an image to be detected; extracting features of the image to be detected to obtain a feature map of the image to be detected; performing humanoid detection on the feature map of the image to be detected to obtain a target humanoid region of the image to be detected, wherein the target humanoid region comprises a target whole body region, a target upper body region and a target head-shoulder region; and marking the target humanoid region in the image to be detected. With the embodiments of the invention, effective humanoid detection can be achieved even when the humanoid target is incomplete, and humanoid detection accuracy is improved.

Description

Human shape detection method, device, equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a human shape detection method, apparatus, device, and storage medium.

Background

With the development of computing technology, cameras have become deeply integrated into daily life, and humanoid detection is now widely applied in devices such as cameras. Compared with face detection and the like, humanoid detection can raise alerts on a target more effectively across different human postures and lighting conditions.

At present, humanoid detection results are obtained by detecting complete human shapes in the image to be detected. However, the inventor has found that in actual humanoid detection the humanoid target is very easily occluded by other objects or other pedestrians. Because existing humanoid detection methods can only detect a complete human shape, they cannot detect a humanoid target that is partially occluded, and their detection accuracy is therefore low.

Disclosure of Invention

The embodiment of the invention provides a humanoid detection method, a humanoid detection device, humanoid detection equipment and a storage medium, which can detect a target whole body area, a target upper body area and a target head shoulder area simultaneously in the humanoid detection process, and improve humanoid detection accuracy.

An embodiment of the present invention provides a human shape detection method, including:

Acquiring an image to be detected;

Extracting features of the image to be detected to obtain a feature map of the image to be detected;

Performing humanoid detection on the feature map of the image to be detected to obtain a target humanoid region of the image to be detected; the target humanoid region comprises a target whole body region, a target upper body region and a target head-shoulder region;

and marking the target humanoid region in the image to be detected.

As an improvement of the above solution, the performing human shape detection on the feature map of the image to be detected to obtain the target human shape region of the image to be detected specifically includes:

Performing human-shaped whole body region detection on the feature map of the image to be detected through a pre-trained whole body detector to obtain a whole body region detection result;

When the whole body region detection result comprises a plurality of candidate whole body regions, determining a first candidate upper body region and a first candidate head shoulder region in the plurality of candidate whole body regions according to the position and the size of the preset upper body detection region and the head shoulder detection region in the standard whole body template based on the aspect ratio relation between the plurality of candidate whole body regions and the preset standard whole body template;

Judging whether a first candidate upper body region in the plurality of candidate whole body regions contains a human upper body or not through a pre-trained first upper body detector, and obtaining a first judgment result corresponding to the plurality of candidate whole body regions;

Judging whether a first candidate head-shoulder region in the plurality of candidate whole-body regions contains a human head-shoulder or not through a pre-trained first head-shoulder detector, and obtaining a second judgment result corresponding to the plurality of candidate whole-body regions;

Determining a candidate whole body region whose corresponding first judgment result and second judgment result are both yes as a target whole body region of the image to be detected;

Performing human-shaped upper body region detection on the feature map of the image to be detected through a pre-trained second upper body detector and a pre-trained second head-shoulder detector to obtain a target upper body region of the image to be detected;

and performing human-shaped head-shoulder region detection on the feature map of the image to be detected through a pre-trained third head-shoulder detector to obtain a target head-shoulder region of the image to be detected.

As an improvement of the above solution, the detecting the human-shaped upper body area of the feature map of the image to be detected by using the pre-trained second upper body detector and the pre-trained second head-shoulder detector to obtain the target upper body area of the image to be detected specifically includes:

performing human-shaped upper body region detection on the feature map of the image to be detected through a pre-trained second upper body detector to obtain an upper body region detection result;

when the upper body region detection result includes a plurality of second candidate upper body regions, determining second candidate head-shoulder regions in the plurality of second candidate upper body regions according to the position and the size of the head-shoulder detection region within the upper body detection region, based on the aspect ratio relation between the plurality of second candidate upper body regions and the upper body detection region in the standard whole body template;

Judging whether the second candidate head-shoulder areas in the plurality of second candidate upper body areas contain human head-shoulder areas or not through a pre-trained second head-shoulder detector, and obtaining third judging results corresponding to the plurality of second candidate upper body areas;

And determining a second candidate upper body region whose corresponding third judgment result is yes as the target upper body region of the image to be detected.

As an improvement of the above solution, before performing human shape detection on the feature map of the image to be detected to obtain the target human shape region of the image to be detected, the method further includes:

acquiring a pedestrian data training set; the pedestrian data training set comprises a plurality of human-shaped whole body sample images whose size equals a preset standard size, and in each human-shaped whole body sample image the human-shaped upper body region and head-shoulder region are marked with circumscribed rectangular frames;

Determining the center positions and the sizes of the head and shoulder detection windows and the upper body detection windows according to the information of the upper body region and the head and shoulder region marked in the plurality of human-shaped whole body sample images;

Scaling the plurality of human-shaped whole body sample images to three preset sizes respectively to obtain a plurality of human-shaped whole body sample images corresponding to each of the three sizes; wherein the three sizes include a first size, a second size and a third size, the first size being greater than the second size, and the second size being greater than the third size;

based on the proportional relation between the standard size and the three sizes, extracting head-shoulder areas of the plurality of human-shaped whole body sample graphs corresponding to the three sizes respectively according to the central position and the size of the head-shoulder detection window to obtain a plurality of human-shaped head-shoulder sample graphs corresponding to the three sizes respectively;

Based on the proportional relation between the standard size and the third size and the second size respectively, respectively extracting upper body areas of a plurality of human-shaped whole body sample images of the third size and a plurality of human-shaped whole body sample images of the second size according to the central position and the size of the upper body detection window to obtain a plurality of human-shaped upper body sample images of the third size and a plurality of human-shaped upper body sample images of the second size;

Training six preset classifiers acquired in advance according to the plurality of human-shaped whole body sample graphs of the third size, the plurality of human-shaped upper body sample graphs of the second size and the plurality of human-shaped head-shoulder sample graphs respectively corresponding to the three sizes to obtain the whole body detector, the first upper body detector, the second upper body detector, the first head-shoulder detector, the second head-shoulder detector and the third head-shoulder detector.

As an improvement of the above-mentioned scheme, the size of the preset standard whole body template is equal to the preset standard size;

After the pedestrian data training set is acquired and before humanoid detection is performed on the feature map of the image to be detected to obtain the target humanoid region of the image to be detected, the method further comprises the following steps:

determining the central positions of the upper body detection area and the head shoulder detection area in the standard whole body template according to the information of the upper body area and the head shoulder area marked in the plurality of human-shaped whole body sample images;

And determining the sizes of the upper body detection area and the head and shoulder detection area according to the information of the upper body area and the head and shoulder area marked in the plurality of human-shaped whole body sample images.

As an improvement of the above solution, the determining the sizes of the upper body detection area and the head-shoulder detection area according to the information of the upper body area and the head-shoulder area marked in the several human-shaped whole body sample images specifically includes:

Calculating the minimum width of the upper body detection area that satisfies a first preset condition, and taking this minimum as the width of the upper body detection area; the first preset condition is that the ratio of the number of whole body sample images whose upper body region is completely contained by the upper body detection area in the transverse range to the total number of the plurality of human-shaped whole body sample images is greater than a preset ratio threshold;

calculating the minimum height of the upper body detection area that satisfies a second preset condition, and taking this minimum as the height of the upper body detection area; the second preset condition is that the ratio of the number of whole body sample images whose upper body region is completely contained by the upper body detection area in both the transverse range and the longitudinal range to the total number of the plurality of human-shaped whole body sample images is greater than the preset ratio threshold;

calculating the minimum width of the head-shoulder detection area that satisfies a third preset condition, and taking this minimum as the width of the head-shoulder detection area; the third preset condition is that the ratio of the number of whole body sample images whose head-shoulder region is completely contained by the head-shoulder detection area in the transverse range to the total number of the plurality of human-shaped whole body sample images is greater than the preset ratio threshold;

Calculating the minimum height of the head-shoulder detection area that satisfies a fourth preset condition, and taking this minimum as the height of the head-shoulder detection area; the fourth preset condition is that the ratio of the number of whole body sample images whose head-shoulder region is completely contained by the head-shoulder detection area in both the transverse range and the longitudinal range to the total number of the plurality of human-shaped whole body sample images is greater than the preset ratio threshold.
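The sizing rule above can be illustrated with a minimal Python sketch. It assumes that the detection area is centered on an already determined center position, that each annotated region is given as `(x1, y1, x2, y2)` in the coordinates of the standard-size sample image, and that the function names `min_extent` and `fit_detection_area` are hypothetical; none of these details are specified in the text.

```python
import math


def min_extent(required, total, ratio_threshold):
    """Smallest extent e such that count(required <= e) / total > ratio_threshold."""
    # Smallest sample count whose ratio strictly exceeds the threshold.
    need = math.floor(ratio_threshold * total) + 1
    if need > len(required):
        raise ValueError("the ratio threshold cannot be met by these samples")
    return sorted(required)[need - 1]


def fit_detection_area(boxes, center, ratio_threshold):
    """Return (width, height) of a detection area centered at `center`.

    boxes: annotated regions (x1, y1, x2, y2), one per whole body sample image.
    center: (cx, cy) of the detection area (assumed to be known already).
    """
    cx, cy = center
    total = len(boxes)
    # Width each sample needs for its region to be contained horizontally.
    req_w = [2 * max(cx - x1, x2 - cx) for x1, _, x2, _ in boxes]
    width = min_extent(req_w, total, ratio_threshold)
    # Height is fitted only on samples already contained horizontally, because
    # the height condition requires containment in both directions.
    req_h = [
        2 * max(cy - y1, y2 - cy)
        for (_, y1, _, y2), w in zip(boxes, req_w)
        if w <= width
    ]
    height = min_extent(req_h, total, ratio_threshold)
    return width, height
```

The same routine would be called once with the upper body annotations and once with the head-shoulder annotations, each with its own center position.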

As an improvement of the above solution, the marking the target humanoid area in the image to be detected specifically includes:

merging the target humanoid areas in the image to be detected to obtain a merged target humanoid area;

And marking the merged target humanoid region in the image to be detected.

Another embodiment of the present invention provides a humanoid detection device including:

The image acquisition module is used for acquiring an image to be detected;

the feature extraction module is used for extracting features of the image to be detected to obtain a feature map of the image to be detected;

the human shape detection module is used for carrying out human shape detection on the feature images of the image to be detected to obtain a target human shape area of the image to be detected; the target humanoid region comprises a target whole body region, a target upper body region and a target head-shoulder region;

and the humanoid marking module is used for marking the target humanoid region in the image to be detected.

Another embodiment of the present invention also provides humanoid detection equipment, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the humanoid detection method according to any one of the above when executing the computer program.

Another embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium includes a stored computer program, and the computer program, when executed, controls a device in which the computer readable storage medium is located to perform the humanoid detection method according to any one of the above.

Compared with the prior art, the humanoid detection method, device, equipment and storage medium provided by the embodiments of the invention extract features of the acquired image to be detected to obtain a feature map of the image to be detected, perform humanoid detection on the feature map to obtain the target humanoid region of the image to be detected, and then mark the target humanoid region in the image to be detected, thereby realizing humanoid detection. As this analysis shows, in the embodiments of the invention, humanoid detection on the feature map of the image to be detected covers not only the target whole body region but also the target upper body region and the target head-shoulder region. Therefore, when the part of the humanoid target below the head and shoulders is occluded, effective humanoid detection can still be achieved, and humanoid detection accuracy is improved.

Drawings

Fig. 1 is a flow chart of a human shape detection method according to an embodiment of the invention.

Fig. 2 is a schematic structural diagram of a humanoid detection device according to an embodiment of the invention.

Fig. 3 is a schematic structural diagram of a humanoid detection apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.

Referring to fig. 1, a flow chart of a human shape detection method according to an embodiment of the invention is shown.

The humanoid detection method provided by the embodiment of the invention comprises the following steps:

S11, acquiring an image to be detected.

S12, extracting features of the image to be detected to obtain a feature map of the image to be detected.

Various feature extraction methods may be used, for example ACF feature extraction, HOG feature extraction, or feature extraction using a convolutional neural network; the method may be selected according to the actual situation in a specific implementation and is not limited herein.

S13, performing humanoid detection on the feature map of the image to be detected to obtain a target humanoid region of the image to be detected; the target humanoid region comprises a target whole body region, a target upper body region and a target head shoulder region.

Specifically, human-shaped whole body region detection is performed on the feature map of the image to be detected through a pre-trained whole body detector to obtain a target whole body region of the image to be detected; human-shaped upper body region detection is performed on the feature map through a pre-trained upper body detector to obtain a target upper body region of the image to be detected; and human-shaped head-shoulder region detection is performed on the feature map through a pre-trained head-shoulder detector to obtain a target head-shoulder region of the image to be detected. In this way, when the part of a humanoid target below the head and shoulders is occluded, a more accurate target humanoid region can still be obtained by detecting the human-shaped head-shoulder and upper body, missed detection of humanoid targets in the image is avoided, and humanoid detection accuracy is improved. The whole body detector is a classifier trained in advance that takes the feature map of an image as input and detects the human-shaped whole body regions contained in the image; the upper body detector is a classifier trained in advance that takes the feature map of an image as input and detects the human-shaped upper body regions contained in the image; the head-shoulder detector is a classifier trained in advance that takes the feature map of an image as input and detects the human-shaped head-shoulder regions contained in the image. In a specific embodiment, the whole body detector, the upper body detector and the head-shoulder detector may run simultaneously to improve the efficiency of humanoid detection.
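As a rough illustration of running the three detectors simultaneously on one shared feature map, the following Python sketch submits each detector to a thread pool. The detector interface (a callable that takes the feature map and returns a list of regions) and the name `run_detectors` are assumptions made for this example; the text does not prescribe any particular concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def run_detectors(feature_map, detectors):
    """Run every detector on the same feature map and collect the results.

    detectors: dict mapping a name to a callable that takes the feature map
    and returns a list of detected regions (hypothetical interface).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(detectors))) as pool:
        futures = {
            name: pool.submit(detect, feature_map)
            for name, detect in detectors.items()
        }
        # result() blocks until the corresponding detector has finished.
        return {name: future.result() for name, future in futures.items()}
```

Whether threads actually shorten the total detection time depends on how the detectors are implemented; the sketch only shows that the feature map is computed once and shared by all detectors.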

S14, marking the target humanoid region in the image to be detected.

After the target humanoid region of the image to be detected is obtained, the target humanoid region in the image to be detected may be marked by a circumscribed rectangular frame.

Specifically, step S14 includes:

S141, merging the target humanoid regions in the image to be detected to obtain a merged target humanoid region;

S142, marking the merged target humanoid region in the image to be detected.

It should be noted that during humanoid detection a single humanoid target is likely to be marked as several regions. If one target is marked as several regions, subsequent tracking is prone to following the wrong target or losing the target, so merging the target humanoid regions in the image to be detected before marking them effectively improves the efficiency and accuracy of subsequent tracking. Specifically, whether two target humanoid regions need to be merged may be determined by checking whether the distance between their centers is within a threshold range, or by checking whether the two regions intersect; the target humanoid regions determined to need merging are then merged to obtain the merged target humanoid region.
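The two merge criteria described above can be sketched in Python as follows. Regions are assumed to be `(x1, y1, x2, y2)` tuples, and the merged region is assumed to be the smallest rectangle enclosing both inputs; the text does not state how the merged region is formed, so that choice and the function names are illustrative only.

```python
def intersects(a, b):
    """True if rectangles a and b, given as (x1, y1, x2, y2), overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def centers_close(a, b, max_dist):
    """True if the centers of a and b are at most max_dist apart."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return (ax - bx) ** 2 + (ay - by) ** 2 <= max_dist ** 2


def merge_regions(regions, should_merge=intersects):
    """Repeatedly merge any pair of regions for which should_merge is true."""
    merged = [tuple(r) for r in regions]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                a, b = merged[i], merged[j]
                if should_merge(a, b):
                    # Assumed merge rule: enclosing rectangle of both regions.
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged
```

For example, a whole body region, an upper body region and a head-shoulder region detected on the same person all overlap and collapse into one rectangle, while a region belonging to a distant second person is left untouched. Passing `lambda a, b: centers_close(a, b, 15)` as `should_merge` switches to the center-distance criterion.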

In the humanoid detection method, device, equipment and storage medium provided by the embodiments of the invention, features of the acquired image to be detected are extracted to obtain a feature map of the image to be detected, humanoid detection is performed on the feature map through the pre-trained whole body detector, the pre-trained upper body detector and the pre-trained head-shoulder detector to obtain the target humanoid region of the image to be detected, and the target humanoid region in the image to be detected is then marked, thereby realizing humanoid detection. As this analysis shows, humanoid detection on the feature map covers not only the target whole body region but also the target upper body region and the target head-shoulder region. Therefore, when the part of the humanoid target below the head and shoulders is occluded, effective humanoid detection can still be achieved, and humanoid detection accuracy is improved.

In the embodiment of the present invention, the step S13 specifically includes:

S131, performing human-shaped whole body region detection on the feature map of the image to be detected through a pre-trained whole body detector, and obtaining a whole body region detection result.

It can be understood that the following situations exist in the process of performing human-shaped whole-body region detection on the feature map of the image to be detected through the pre-trained whole-body detector: first, if no human whole body region is detected, the output whole body region detection result does not contain the candidate whole body region; secondly, detecting a plurality of candidate whole body regions, wherein the output whole body region detection result comprises information such as the positions, the sizes and the like of a plurality of human-shaped whole body regions.

S132, when the whole body region detection result comprises a plurality of candidate whole body regions, determining a first candidate upper body region and a first candidate head-shoulder region in each of the plurality of candidate whole body regions according to the position and the size of the preset upper body detection region and head-shoulder detection region in the standard whole body template, based on the aspect ratio relation between the plurality of candidate whole body regions and the preset standard whole body template.

Specifically, a standard whole body template marked with an upper body detection area and a head-shoulder detection area is preset. When the whole body region detection result contains a plurality of candidate whole body regions, the position and size of the upper body region to be detected and of the head-shoulder region to be detected in each candidate whole body region are determined from the position and size of the upper body detection area and the head-shoulder detection area in the standard whole body template, based on the aspect ratio relation between each candidate whole body region and the preset standard whole body template. This yields the position and size of the first candidate upper body region and the first candidate head-shoulder region in each candidate whole body region.
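A minimal Python sketch of this mapping is given below. It assumes that regions are `(x, y, w, h)` tuples, that the template areas are expressed in template pixel coordinates, and that the width and height ratios between the candidate and the template are applied independently; the function name and the numeric template values in the usage note are hypothetical.

```python
def map_template_area(candidate, template_size, template_area):
    """Map an area defined in the standard whole body template into a candidate.

    candidate: (x, y, w, h) of the candidate whole body region in the image.
    template_size: (tw, th) of the standard whole body template.
    template_area: (ax, ay, aw, ah) of the upper body or head-shoulder
        detection area inside the template.
    """
    x, y, w, h = candidate
    tw, th = template_size
    ax, ay, aw, ah = template_area
    sx, sy = w / tw, h / th  # width and height ratios candidate : template
    return (x + ax * sx, y + ay * sy, aw * sx, ah * sy)
```

With an illustrative 40x100 template whose head-shoulder detection area is `(8, 0, 24, 30)`, a candidate whole body region `(100, 50, 80, 200)` is twice the template size in both directions, so its first candidate head-shoulder region is `(116, 50, 48, 60)`.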

S133, judging whether first candidate upper body areas in the plurality of candidate whole body areas contain human-shaped upper bodies or not through a pre-trained first upper body detector, and obtaining first judgment results corresponding to the plurality of candidate whole body areas.

Specifically, to reduce the time spent on feature extraction, the feature map of the first candidate upper body region in each candidate whole body region may be obtained by cropping the corresponding region from the feature map of the image to be detected, according to the position and size of that first candidate upper body region, so that features do not need to be extracted again for the first candidate upper body regions. The feature map of each first candidate upper body region is then input into the first upper body detector for analysis to judge whether that region contains a human-shaped upper body, thereby obtaining the first judgment results corresponding to the plurality of candidate whole body regions.
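The idea of reusing the already computed feature map can be sketched as follows. The feature map is modeled here as a plain list of rows, and the `stride` parameter (image pixels per feature-map cell) is an assumption for the example; an array library slice would serve the same purpose.

```python
def crop_feature_map(feature_map, region, stride=1):
    """Return the part of a precomputed feature map covered by `region`.

    feature_map: list of rows, each row a list of per-cell feature values.
    region: (x, y, w, h) in image pixels.
    stride: image pixels per feature-map cell (assumed; 1 = no downsampling).
    """
    x, y, w, h = region
    col0, row0 = x // stride, y // stride
    col1 = -(-(x + w) // stride)  # ceiling division, keeps partial cells
    row1 = -(-(y + h) // stride)
    # Only existing cells are selected; no features are recomputed.
    return [row[col0:col1] for row in feature_map[row0:row1]]
```

The cropped block is what would be passed to the first upper body detector (or, for a head-shoulder region, to the first head-shoulder detector).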

S134, judging whether the first candidate head-shoulder regions in the plurality of candidate whole-body regions contain the human head-shoulder by a pre-trained first head-shoulder detector, and obtaining a second judgment result corresponding to the plurality of candidate whole-body regions.

Specifically, to reduce the time spent on feature extraction, the feature map of the first candidate head-shoulder region in each candidate whole body region may be obtained by cropping the corresponding region from the feature map of the image to be detected, according to the position and size of that first candidate head-shoulder region, so that features do not need to be extracted again for the first candidate head-shoulder regions. The feature map of each first candidate head-shoulder region is then input into the first head-shoulder detector for analysis to judge whether that region contains a human-shaped head-shoulder, thereby obtaining the second judgment results corresponding to the plurality of candidate whole body regions.

S135, determining that the candidate whole body region with the corresponding first judgment result and second judgment result being yes is the target whole body region of the image to be detected.

That is, a candidate whole body region is determined to be a target whole body region of the image to be detected only when both the first judgment result and the second judgment result corresponding to that candidate whole body region are yes.
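Steps S133 to S135 amount to a two-stage confirmation of each candidate, which can be sketched in Python as below. The detectors are modeled as hypothetical callables returning `True` or `False`, and `upper_of` and `head_of` stand for whatever produces the first candidate upper body and head-shoulder inputs for a candidate; these interfaces are assumptions for illustration.

```python
def select_target_whole_bodies(candidates, upper_of, head_of,
                               upper_detector, head_detector):
    """Keep only candidates confirmed by both part detectors.

    upper_of / head_of: callables returning the detector input for the first
        candidate upper body / head-shoulder region of a candidate.
    upper_detector / head_detector: callables returning True when the given
        region contains a human-shaped upper body / head-shoulder.
    """
    targets = []
    for candidate in candidates:
        first_result = bool(upper_detector(upper_of(candidate)))
        second_result = bool(head_detector(head_of(candidate)))
        if first_result and second_result:
            targets.append(candidate)
    return targets
```

A candidate that fails either check is discarded, which is how false alarms of the whole body detector are filtered out.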

S136, performing human-shaped upper body region detection on the feature map of the image to be detected through a pre-trained second upper body detector and a pre-trained second head-shoulder detector to obtain a target upper body region of the image to be detected.

The feature map of the image to be detected is input into the second upper body detector for human-shaped upper body region detection to obtain second candidate upper body regions. A second candidate head-shoulder region is then obtained for each second candidate upper body region according to the positional relation between the upper body region and the head-shoulder, and the second head-shoulder detector judges whether the second candidate head-shoulder region contains a human-shaped head-shoulder, which gives a third judgment result. A second candidate upper body region whose third judgment result is yes is determined to be a target upper body region of the image to be detected, and the target upper body region of the image to be detected is output.

S137, performing human-shaped head-shoulder region detection on the feature map of the image to be detected through a pre-trained third head-shoulder detector to obtain a target head-shoulder region of the image to be detected.

The feature map of the image to be detected is input into the third head-shoulder detector for human-shaped head-shoulder region detection, and the target head-shoulder region of the image to be detected is output.

Illustratively, the whole body detector may be a detector for detecting distant human-shaped whole body targets, the first upper body detector may be a detector for detecting the upper body region in a distant human-shaped whole body target, and the first head-shoulder detector may be a detector for detecting the head-shoulder region in a distant human-shaped whole body target; the second upper body detector may be a detector for detecting closer human-shaped upper body targets, the second head-shoulder detector may be a detector for detecting the head-shoulder region in such an upper body target, and the third head-shoulder detector may be a detector for detecting close-range human-shaped head-shoulder targets.

It can be appreciated that, when a pedestrian target at a far distance is detected, the target is small and its feature dimension is low. A low-resolution image contains less information than a high-resolution image, so the false alarm rate rises and the accuracy of whole body humanoid detection is low. Therefore, to reduce the false alarm rate and improve the accuracy of whole body humanoid detection, this embodiment works from the whole to the part: candidate whole body regions are detected first, and then the head-shoulder features and upper body features within each candidate whole body region are checked to determine whether the candidate whole body region is a target whole body region. When a pedestrian target at a short distance is detected, the image resolution of the target is relatively high and the feature dimension is high, so the false alarm rate of the classifier is relatively low; in this embodiment, multiple classifiers are therefore not used to further reduce false alarms for such targets, which reduces the resource usage and the detection time for short-distance targets. In addition, because the whole body detector, the first upper body detector and the first head-shoulder detector form a cascade structure, the detection time for non-humanoid targets does not rise significantly compared with a single classifier. When a human-shaped whole body target is detected, the relative positions of the upper body, the head-shoulder and the whole body are used, so other irrelevant areas are not examined and a large number of irrelevant windows are filtered out; the classification time therefore does not grow excessively even when a humanoid target is present, and detection efficiency is high.

Further, the step S136 specifically includes:

S1361, performing human-shaped upper body region detection on the feature map of the image to be detected through a pre-trained second upper body detector to obtain an upper body region detection result.

It can be understood that two situations can arise when human-shaped upper body region detection is performed on the feature map of the image to be detected through the pre-trained second upper body detector: first, if no human-shaped upper body region is detected, the output upper body region detection result contains no second candidate upper body region; second, if a plurality of second candidate upper body regions are detected, the output upper body region detection result contains information such as the positions and sizes of those second candidate upper body regions.

S1362, when the upper body region detection result comprises a plurality of second candidate upper body regions, determining the second candidate head-shoulder region in each of the plurality of second candidate upper body regions according to the position and the size of the head-shoulder detection region in the upper body detection region, based on the aspect ratio relation between the plurality of second candidate upper body regions and the upper body detection region in the standard whole body template.

Specifically, a standard whole body template marked with an upper body detection area and a head-shoulder detection area is preset, when the upper body area detection result contains a plurality of second candidate upper body areas, the position and the size of the head-shoulder position to-be-detected area in each second candidate upper body area are determined according to the position and the size of the head-shoulder detection area in the upper body detection area of the standard whole body template based on the aspect ratio relation between each second candidate upper body area and the upper body detection area in the preset standard whole body template, so as to obtain the position and the size of the second candidate head-shoulder area in each second candidate upper body area.
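As a minimal sketch of this mapping, the helper below scales a sub-region defined in the template into a candidate region by the width and height ratios. The name `map_region` and the coordinate convention (positions relative to the lower-left vertex of the enclosing region) are assumptions made for illustration, not details stated in this description.

```python
def map_region(template_size, sub_region, candidate):
    """Scale a sub-region defined in the template into a candidate box.

    template_size: (width, height) of the upper body detection area
                   in the standard whole body template.
    sub_region:    (x, y, width, height) of the head-shoulder detection
                   area, relative to the template area's lower-left vertex.
    candidate:     (x, y, width, height) of a second candidate upper body
                   region in image coordinates.
    """
    tw, th = template_size
    sx, sy, sw, sh = sub_region
    cx, cy, cw, ch = candidate
    kx, ky = cw / tw, ch / th  # width ratio and height ratio
    return (cx + sx * kx, cy + sy * ky, sw * kx, sh * ky)
```

For example, with a 40×60 template area, a head-shoulder area of (10, 30, 20, 30) and a candidate of (100, 200, 80, 120), both ratios are 2 and the mapped region is (120, 260, 40, 60).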

S1363, judging whether the second candidate head and shoulder areas in the plurality of second candidate upper body areas contain the human head and shoulder by a pre-trained second head and shoulder detector, and obtaining a third judgment result corresponding to the plurality of second candidate upper body areas.

Specifically, in order to reduce the time spent on extracting feature maps, the feature map of the second candidate head-shoulder region in each second candidate upper body region may be obtained by cropping the feature map of the image to be detected according to the position and size of that second candidate head-shoulder region, so that the features do not need to be extracted again. The cropped feature map of each second candidate head-shoulder region is then input into the second head-shoulder detector for analysis to determine whether that region contains a human-shaped head-shoulder, thereby obtaining the third judgment results corresponding to the plurality of second candidate upper body regions.
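The reuse of an already computed feature map can be illustrated with a plain slicing helper. This sketch assumes, purely for illustration, that the feature map has one entry per pixel and is stored as a list of rows with row 0 at the top; the name `crop_features` is hypothetical.

```python
def crop_features(feature_map, box):
    """Return the part of a 2-D feature grid covered by box.

    feature_map: list of rows, one feature value (or vector) per pixel.
    box: (left, top, width, height) in array coordinates, row 0 at the top
         (an assumption; the description uses an upward y axis).
    """
    left, top, width, height = box
    # Slicing reuses the stored features instead of recomputing them.
    return [row[left:left + width] for row in feature_map[top:top + height]]
```

Only the slice is handed to the head-shoulder detector, so the feature extraction runs once per image regardless of how many candidate regions are checked.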

S1364, determining each second candidate upper body region whose corresponding third judgment result is yes as a target upper body region of the image to be detected.

And when the third judgment result corresponding to the second candidate upper body area is yes, judging that the second candidate upper body area is the target upper body area of the image to be detected.

It will be appreciated that a distant pedestrian target is small, and a low-resolution image carries less information than a high-resolution one, so the false alarm rate rises and the detection accuracy for the human-shaped upper body is limited. Therefore, in order to reduce the false alarm rate and improve the detection accuracy of the human-shaped upper body, this embodiment first detects the second candidate upper body region and then checks the head-shoulder feature inside the second candidate upper body region to decide whether the second candidate upper body region is a target upper body region.

Still further, before the step S13, the method further includes:

S21, acquiring a pedestrian data training set; the pedestrian data training set comprises a plurality of human-shaped whole body sample graphs with the same size as a preset standard size, and each human-shaped whole body sample graph is marked with a human-shaped upper body area and a head-shoulder area through circumscribed rectangular frames.

The pedestrian data training set can be obtained by cropping complete pedestrian targets from a plurality of pedestrian pictures. The marked circumscribed rectangular frame can retain some edge pixels in proportion to the target size. The preset standard size is (width_body, height_body).

S22, determining the central positions and the sizes of the head and shoulder detection windows and the upper body detection windows according to the information of the upper body region and the head and shoulder region marked in the plurality of human-shaped whole body sample images.

The step S22 specifically includes:

Determining the central position of the upper body detection window according to the left lower corner vertex coordinates and the sizes of the upper body areas marked in the plurality of human-shaped whole body sample graphs; the abscissa of the central position of the upper body detection window is equal to the average, over the plurality of human-shaped whole body sample graphs, of the sum of the abscissa of the left lower corner vertex and half of the width of the marked upper body area; the ordinate of the central position of the upper body detection window is equal to the average, over the plurality of human-shaped whole body sample graphs, of the sum of the ordinate of the left lower corner vertex and half of the height of the marked upper body area;

Determining the central position of the head-shoulder detection window according to the left lower corner vertex coordinates and the sizes of the head-shoulder areas marked in the plurality of human-shaped whole body sample graphs; the abscissa of the central position of the head-shoulder detection window is equal to the average, over the plurality of human-shaped whole body sample graphs, of the sum of the abscissa of the left lower corner vertex and half of the width of the marked head-shoulder area; the ordinate of the central position of the head-shoulder detection window is equal to the average, over the plurality of human-shaped whole body sample graphs, of the sum of the ordinate of the left lower corner vertex and half of the height of the marked head-shoulder area;

Determining the width and the height of the upper body detection window according to the width and the height of the upper body region marked in the plurality of human-shaped whole body sample images respectively; the width of the upper body detection window is equal to the average value of the widths of the upper body areas marked in the plurality of human-shaped whole body sample images, and the height of the upper body detection window is equal to the average value of the heights of the upper body areas marked in the plurality of human-shaped whole body sample images;

determining the width and the height of the head-shoulder detection window according to the width and the height of the head-shoulder areas marked in the plurality of human-shaped whole body sample images respectively; the width of the head-shoulder detection window is equal to the average value of the widths of the head-shoulder regions marked in the plurality of human-shaped whole-body sample graphs, and the height of the head-shoulder detection window is equal to the average value of the heights of the head-shoulder regions marked in the plurality of human-shaped whole-body sample graphs.

Taking the head-shoulder detection window as an example, the head-shoulder position marked in the i-th human-shaped whole-body sample graph (K graphs in total) is denoted as (x_head_i, y_head_i, width_head_i, height_head_i), the horizontal rightward direction is taken as the positive x-axis direction, and the vertical upward direction is taken as the positive y-axis direction. The coordinates (center_x_head, center_y_head) of the center position of the head-shoulder detection window are then calculated as follows:

$$\mathrm{center\_x\_head} = \frac{1}{K}\sum_{i=1}^{K}\left(\mathrm{x\_head}_i + \frac{\mathrm{width\_head}_i}{2}\right), \qquad \mathrm{center\_y\_head} = \frac{1}{K}\sum_{i=1}^{K}\left(\mathrm{y\_head}_i + \frac{\mathrm{height\_head}_i}{2}\right)$$

Wherein center_x_head is the abscissa of the center position of the head-shoulder detection window, K is the number of the plurality of human-shaped whole-body sample graphs, x_head_i is the abscissa of the left lower corner vertex of the head-shoulder region marked in the i-th human-shaped whole-body sample graph, width_head_i is the width of the head-shoulder region marked in the i-th human-shaped whole-body sample graph, center_y_head is the ordinate of the center position of the head-shoulder detection window, y_head_i is the ordinate of the left lower corner vertex of the head-shoulder region marked in the i-th human-shaped whole-body sample graph, and height_head_i is the height of the head-shoulder region marked in the i-th human-shaped whole-body sample graph;

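The standard width and height of the head-shoulder detection window are calculated as follows:

$$\mathrm{width\_head} = \frac{1}{K}\sum_{i=1}^{K}\mathrm{width\_head}_i, \qquad \mathrm{height\_head} = \frac{1}{K}\sum_{i=1}^{K}\mathrm{height\_head}_i$$

wherein width_head is the width of the head-shoulder detection window, and height_head is the height of the head-shoulder detection window.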

Similarly, formulas for calculating the center position (center_x_upperbody, center_y_upperbody) and the size (width_upperbody, height_upperbody) of the upper body detection window can be derived, and are not described herein.
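The averaging in step S22 can be written as a short helper. This is a sketch under the conventions stated above ((x, y) is the left lower corner vertex, y pointing upward); the function name `detection_window` is hypothetical and applies equally to head-shoulder and upper body annotations.

```python
def detection_window(regions):
    """Average K annotated regions into one detection window.

    regions: list of (x, y, width, height), where (x, y) is the
             left lower corner vertex of the annotated region.
    Returns (center_x, center_y, width, height).
    """
    k = len(regions)
    center_x = sum(x + w / 2 for x, _, w, _ in regions) / k
    center_y = sum(y + h / 2 for _, y, _, h in regions) / k
    width = sum(w for _, _, w, _ in regions) / k
    height = sum(h for _, _, _, h in regions) / k
    return center_x, center_y, width, height
```

For two annotations (10, 40, 20, 20) and (12, 44, 24, 16), the individual centers are (20, 50) and (24, 52), so the window is centered at (22, 51) with width 22 and height 18.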

S23, scaling the plurality of human-shaped whole body sample graphs according to three preset sizes respectively to obtain a plurality of human-shaped whole body sample graphs corresponding to the three sizes respectively; wherein the three dimensions include a first dimension, a second dimension, and a third dimension, the first dimension being greater than the second dimension, the second dimension being greater than the third dimension.

Specifically, scaling is performed on the plurality of human whole body sample graphs according to a preset first size, a preset second size and a preset third size respectively, so as to obtain the plurality of human whole body sample graphs of the first size, the plurality of human whole body sample graphs of the second size and the plurality of human whole body sample graphs of the third size.

S24, based on the proportional relation between the standard size and the three sizes, extracting head and shoulder areas of the plurality of human-shaped whole body sample graphs corresponding to the three sizes respectively according to the central position and the size of the head and shoulder detection window to obtain a plurality of human-shaped head and shoulder sample graphs corresponding to the three sizes respectively.

Specifically, based on the proportional relation between the standard size and the first size, determining head-shoulder areas in a plurality of human-shaped whole body sample images of the first size according to the central position and the size of the head-shoulder detection window, and extracting the head-shoulder areas in the plurality of human-shaped whole body sample images of the first size to obtain a plurality of human-shaped head-shoulder sample images of the first size; based on the proportional relation between the standard size and the second size, determining head-shoulder areas in a plurality of human-shaped whole body sample images of the second size according to the central position and the size of the head-shoulder detection window, and extracting the head-shoulder areas in the plurality of human-shaped whole body sample images of the second size to obtain a plurality of human-shaped head-shoulder sample images of the second size; based on the proportional relation between the standard size and the third size, determining head-shoulder areas in the plurality of human-shaped whole-body sample images of the third size according to the central position and the size of the head-shoulder detection window, and extracting the plurality of human-shaped head-shoulder sample images of the third size according to the head-shoulder areas in the plurality of human-shaped whole-body sample images of the third size.

S25, based on the proportional relation between the standard size and the third size and the second size, respectively, according to the central position and the size of the upper body detection window, respectively extracting upper body areas of the plurality of human-shaped whole body sample images of the third size and the plurality of human-shaped whole body sample images of the second size to obtain a plurality of human-shaped upper body sample images of the third size and a plurality of human-shaped upper body sample images of the second size.

Specifically, based on the proportional relation between the standard size and the second size, determining upper body areas in a plurality of human-shaped whole body sample images of the second size according to the central position and the size of an upper body detection window, and extracting the upper body areas in the plurality of human-shaped whole body sample images of the second size to obtain a plurality of human-shaped upper body sample images of the second size; based on the proportional relation between the standard size and the third size, determining upper body areas in the plurality of human-shaped whole body sample images of the third size according to the central position and the size of the upper body detection window, and extracting the plurality of human-shaped upper body sample images of the third size according to the upper body areas in the plurality of human-shaped whole body sample images of the third size.
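Steps S24 and S25 both place a detection window, defined at the standard size, into a sample that has been scaled. A minimal sketch follows, assuming a single uniform scale factor between the standard size and the target size; the name `window_in_scaled_sample` is hypothetical.

```python
def window_in_scaled_sample(center, size, scale):
    """Locate a detection window inside a sample scaled by `scale`.

    center: (center_x, center_y) of the window at the standard size.
    size:   (width, height) of the window at the standard size.
    scale:  ratio between the scaled sample size and the standard size.
    Returns (x, y, width, height) with (x, y) the left lower corner vertex.
    """
    cx, cy = center
    w, h = size
    # The center and the size scale by the same factor as the image.
    sw, sh = w * scale, h * scale
    return (cx * scale - sw / 2, cy * scale - sh / 2, sw, sh)
```

The returned box is the area to crop from each scaled whole body sample to obtain the head-shoulder or upper body sample of that size.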

S26, training six preset classifiers acquired in advance according to the plurality of human-shaped whole body sample graphs of the third size, the plurality of human-shaped upper body sample graphs of the third size and of the second size, and the plurality of human-shaped head-shoulder sample graphs respectively corresponding to the three sizes, to obtain the whole body detector, the first upper body detector, the second upper body detector, the first head-shoulder detector, the second head-shoulder detector and the third head-shoulder detector.

Wherein, a negative sample data set is acquired in advance, with non-pedestrian targets selected as needed. Feature extraction is then performed on the negative samples in the negative sample data set, the plurality of human-shaped whole body sample graphs of the third size, the plurality of human-shaped upper body sample graphs of the third size and of the second size, and the plurality of human-shaped head-shoulder sample graphs respectively corresponding to the three sizes. Each of the six preset classifiers is trained according to the negative sample feature maps in the negative sample data set together with the feature maps of one group of positive samples: the preset first classifier is trained with the human-shaped whole body sample graphs of the third size to obtain the trained whole body detector; the preset second classifier is trained with the human-shaped upper body sample graphs of the third size to obtain the trained first upper body detector; the preset third classifier is trained with the human-shaped upper body sample graphs of the second size to obtain the trained second upper body detector; the preset fourth classifier is trained with the human-shaped head-shoulder sample graphs of the third size to obtain the trained first head-shoulder detector; the preset fifth classifier is trained with the human-shaped head-shoulder sample graphs of the second size to obtain the trained second head-shoulder detector; and the preset sixth classifier is trained with the human-shaped head-shoulder sample graphs of the first size to obtain the trained third head-shoulder detector.
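The pairing of classifiers and training sets in step S26 can be summarized as a small table in code. The dictionary keys, the `train_all` helper and the `fit` callback are hypothetical names used only for illustration; the description does not specify the classifier type or any training interface.

```python
# Detector name -> (body region, sample size) of its positive samples.
TRAINING_PLAN = {
    "whole_body_detector": ("whole_body", "third"),
    "first_upper_body_detector": ("upper_body", "third"),
    "second_upper_body_detector": ("upper_body", "second"),
    "first_head_shoulder_detector": ("head_shoulder", "third"),
    "second_head_shoulder_detector": ("head_shoulder", "second"),
    "third_head_shoulder_detector": ("head_shoulder", "first"),
}

def train_all(plan, positives, negatives, fit):
    """Train one classifier per plan entry.

    positives: dict keyed by (region, size) holding positive feature maps.
    negatives: feature maps of the shared negative sample data set.
    fit:       hypothetical callback, fit(pos, neg) -> trained classifier.
    """
    return {name: fit(positives[key], negatives) for name, key in plan.items()}
```

All six classifiers share the same negative samples and differ only in which positive samples they see.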

It should be noted that, in step S23, the sizes of the human-shaped whole-body sample graphs corresponding to the three sizes are (scale_i×width_body, scale_i×height_body), where scale_i (i = 1, 2, 3) corresponds to the first size, the second size and the third size, respectively. The detection target regions selected here are nested: the human-shaped whole body region contains the human-shaped upper body region, which in turn contains the human-shaped head-shoulder region. Illustratively, scale_i is selected according to the following rules:

For a human whole-body region, a complete human whole-body target of (scale_3×width_body, scale_3×height_body) can appear in the video, but a complete human whole-body target of (scale_2×width_body, scale_2×height_body) cannot appear. Wherein,

For a human upper body region, a complete human upper body region of (scale_2×width_upperbody, scale_2×height_upperbody) can appear in the video, but a complete human upper body region of (scale_1×width_upperbody, scale_1×height_upperbody) cannot appear.

For a human head-shoulder region, a complete human head-shoulder region of (scale_1×width_head, scale_1×height_head) can appear in the video.

Still further, the size of the preset standard whole body template is equal to the preset standard size.

Then, before the step S13, after the step S21, the method further includes:

S31, determining the central positions of the upper body detection area and the head-shoulder detection area in the standard whole body template according to the information of the upper body area and the head-shoulder area marked in the plurality of human-shaped whole body sample graphs.

Specifically, the center positions of the upper body detection region and the head-shoulder detection region in the standard whole body template are the same as the center positions of the upper body detection window and the head-shoulder detection window in the human-shaped whole body sample graphs, respectively; that is, they are calculated in the same manner as the center positions of the upper body detection window and the head-shoulder detection window.

S32, determining the sizes of the upper body detection area and the head shoulder detection area according to the information of the upper body area and the head shoulder area marked in the plurality of human-shaped whole body sample images.

Specifically, the step S32 includes:

It is to be noted that when the abscissa of the lower left corner vertex of one region is larger than the difference between the abscissa of the center position of the other region and half of the width of the other region, and the abscissa of the lower right corner vertex of the region is smaller than the sum of the abscissa of the center position of the other region and half of the width of the other region, it is determined that the region is completely contained by the other region in the lateral range;

When the ordinate of the left lower corner vertex of one region is greater than the difference between the ordinate of the center position of the other region and half the height of the other region, and the ordinate of the left upper corner vertex of the region is less than the sum of the ordinate of the center position of the other region and half the height of the other region, it is determined that the region is completely contained by the other region within the longitudinal range.
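The two containment conditions above translate directly into strict inequalities. In this sketch a region is (x, y, width, height) with (x, y) its left lower corner vertex, so the right edge is x + width and the top edge is y + height; the function names are hypothetical.

```python
def contained_laterally(region, center_x, width):
    """True if region lies strictly inside the lateral range
    [center_x - width/2, center_x + width/2] of the other region."""
    x, _, w, _ = region
    return x > center_x - width / 2 and x + w < center_x + width / 2

def contained_longitudinally(region, center_y, height):
    """True if region lies strictly inside the longitudinal range
    [center_y - height/2, center_y + height/2] of the other region."""
    _, y, _, h = region
    return y > center_y - height / 2 and y + h < center_y + height / 2
```

Because the inequalities are strict, a region whose edge coincides with the edge of the other region does not count as completely contained.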

Illustratively, the standard width and height calculation method of the head and shoulder detection area is as follows:

Let R be the set of head-shoulder region positions. A width value is selected such that the lateral range completely contains a proportion of at least alpha of the head-shoulder regions in R, and the minimum value satisfying this condition is taken as roiWidth_head. After the width value is selected, the head-shoulder regions that are completely contained in the lateral range form a set R'. The height is selected in a similar way: the longitudinal range must completely contain a proportion of at least alpha of the head-shoulder regions in R', and the minimum value satisfying this condition is taken as roiHeight_head.
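The two-stage selection can be sketched as a simple search. This is an illustration under stated assumptions: candidate widths and heights are searched in integer pixel steps up to a caller-supplied maximum, and the names `min_covering_extent` and `roi_size` are hypothetical. The description itself does not say how the minimum is searched.

```python
def min_covering_extent(intervals, center, alpha, max_extent):
    """Smallest integer extent whose range around `center` strictly
    contains at least a fraction `alpha` of the (start, length) intervals.
    Also returns the per-interval containment flags at that extent."""
    for extent in range(1, max_extent + 1):
        lo, hi = center - extent / 2, center + extent / 2
        inside = [s > lo and s + l < hi for s, l in intervals]
        if sum(inside) >= alpha * len(intervals):
            return extent, inside
    raise ValueError("no extent up to max_extent satisfies alpha")

def roi_size(regions, center_x, center_y, alpha, max_w, max_h):
    """regions: list of (x, y, width, height), lower-left vertex origin."""
    roi_w, inside = min_covering_extent(
        [(x, w) for x, _, w, _ in regions], center_x, alpha, max_w)
    # R': only the regions laterally contained at the chosen width.
    kept = [r for r, ok in zip(regions, inside) if ok]
    roi_h, _ = min_covering_extent(
        [(y, h) for _, y, _, h in kept], center_y, alpha, max_h)
    return roi_w, roi_h
```

Choosing the height from R' rather than from R means an outlier that is already excluded laterally does not inflate the height of the detection area.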

The calculation formula of the minimum value of the width of the head-shoulder detection area for enabling the third preset condition to be met is as follows:

Wherein width denotes the width of the head-shoulder detection region; K is the number of the plurality of human-shaped whole body sample graphs; alpha is a preset proportional threshold; for the i-th human-shaped whole body sample graph among the K human-shaped whole body sample graphs, there is

The set R' is obtained after removing the regions with M_i = 0. Assuming that the number of elements in R' is K', the calculation formula of the minimum value of the height of the head-shoulder detection region that satisfies the fourth preset condition is:

wherein the i-th element of the K' elements in R' is

Where height represents the height of the head-shoulder detection area.
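The formulas referred to above are not reproduced in this text. One form consistent with the surrounding definitions is sketched below; it is a reconstruction under assumptions, not the original formula. M_i is taken to be the lateral containment indicator implied by the condition M_i = 0, while the longitudinal indicator N_i and the primed coordinates of the elements of R' are symbols introduced here for illustration only.

```latex
% Width: smallest value for which at least a fraction alpha of the K regions
% is laterally contained (assumed reconstruction).
\mathrm{roiWidth\_head} = \min\left\{ \mathrm{width} \;:\; \frac{1}{K}\sum_{i=1}^{K} M_i \ge \alpha \right\},
\qquad
M_i =
\begin{cases}
1, & \mathrm{x\_head}_i > \mathrm{center\_x\_head} - \dfrac{\mathrm{width}}{2}
     \ \text{and}\ 
     \mathrm{x\_head}_i + \mathrm{width\_head}_i < \mathrm{center\_x\_head} + \dfrac{\mathrm{width}}{2} \\[6pt]
0, & \text{otherwise}
\end{cases}

% Height: same rule applied to the K' regions remaining in R'
% (N_i and the primed symbols are assumed names).
\mathrm{roiHeight\_head} = \min\left\{ \mathrm{height} \;:\; \frac{1}{K'}\sum_{i=1}^{K'} N_i \ge \alpha \right\},
\qquad
N_i =
\begin{cases}
1, & \mathrm{y\_head}'_i > \mathrm{center\_y\_head} - \dfrac{\mathrm{height}}{2}
     \ \text{and}\ 
     \mathrm{y\_head}'_i + \mathrm{height\_head}'_i < \mathrm{center\_y\_head} + \dfrac{\mathrm{height}}{2} \\[6pt]
0, & \text{otherwise}
\end{cases}
```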

Correspondingly, the embodiment of the invention also provides a humanoid detection device which can implement all the processes of the humanoid detection method.

Referring to fig. 2, a schematic structural diagram of a humanoid detection device according to an embodiment of the invention is shown.

The humanoid detection device provided by the embodiment of the invention comprises:

an image acquisition module 21, configured to acquire an image to be detected;

a feature extraction module 22, configured to perform feature extraction on the image to be detected, so as to obtain a feature map of the image to be detected;

a humanoid detection module 23, configured to perform humanoid detection on the feature map of the image to be detected, so as to obtain a target humanoid region of the image to be detected; the target humanoid region comprises a target whole body region, a target upper body region and a target head-shoulder region;

a humanoid marking module 24, configured to mark the target humanoid region in the image to be detected.

The principle of the humanoid detection device for realizing humanoid detection is the same as that of the above method embodiment, and is not described herein again.

According to the humanoid detection device provided by the embodiment of the invention, the feature extraction is carried out on the acquired image to be detected to obtain the feature map of the image to be detected, the humanoid detection is carried out on the feature map of the image to be detected through the pre-trained whole body detector, the pre-trained upper body detector and the pre-trained head shoulder detector to obtain the target humanoid region of the image to be detected, and then the target humanoid region in the image to be detected is marked to realize humanoid detection. In the embodiment of the invention, in the process of carrying out humanoid detection on the feature map of the image to be detected, not only the whole body region of the target but also the upper body region of the target and the head and shoulder region of the target are detected, so that a humanoid detection result is obtained, and therefore, when the part below the head and shoulder of the humanoid target is shielded, effective humanoid detection can still be realized, and the humanoid detection accuracy is improved.

Further, the humanoid detection module specifically includes a whole body region detection unit, an upper body region detection unit, and a head-shoulder region detection unit:

The whole body region detection unit is used for carrying out human-shaped whole body region detection on the feature map of the image to be detected through a pre-trained whole body detector to obtain a whole body region detection result;

The whole body region detection unit is further configured to determine, when the whole body region detection result includes a plurality of candidate whole body regions, a first candidate upper body region and a first candidate head shoulder region in the plurality of candidate whole body regions according to a position and a size of a preset upper body detection region and a head shoulder detection region in a standard whole body template based on an aspect ratio relationship between the plurality of candidate whole body regions and the preset standard whole body template;

The whole body region detection unit is further used for judging whether a first candidate upper body region in the plurality of candidate whole body regions contains a human-shaped upper body or not through a first pre-trained upper body detector, so as to obtain a first judgment result corresponding to the plurality of candidate whole body regions;

The whole body region detection unit is further used for judging whether first candidate head-shoulder regions in the plurality of candidate whole body regions contain human head-shoulder regions or not through a pre-trained first head-shoulder detector, and obtaining second judgment results corresponding to the plurality of candidate whole body regions;

The whole body region detection unit is further configured to determine that the candidate whole body region for which the corresponding first determination result and the second determination result are both yes is the target whole body region of the image to be detected;

The upper body region detection unit is used for detecting the human-shaped upper body region of the feature map of the image to be detected through a pre-trained second upper body detector and a pre-trained second head shoulder detector to obtain a target upper body region of the image to be detected;

and the head-shoulder region detection unit is used for performing human-shaped head-shoulder region detection on the feature map of the image to be detected through a pre-trained third head-shoulder detector to obtain a target head-shoulder region of the image to be detected.

Still further, the upper body region detection unit is specifically configured to:

Still further, the apparatus further comprises a classifier training module, the classifier training module being specifically configured to:

Still further, the size of the preset standard whole body template is equal to the preset standard size;

The device further comprises a detection area acquisition module, wherein the detection area acquisition module is specifically configured to:

Specifically, the humanoid marking module is specifically configured to:

And marking the combined target humanoid region in the image to be detected.

Referring to fig. 3, a schematic diagram of a humanoid detection apparatus according to an embodiment of the present invention is provided.

The embodiment of the invention provides a humanoid detection device, which comprises a processor 31, a memory 32 and a computer program stored in the memory 32 and configured to be executed by the processor 31, wherein the humanoid detection method according to any embodiment is realized when the processor 31 executes the computer program.

The processor 31 when executing the computer program implements the steps of the embodiment of the human detection method described above, for example all the steps of the human detection method shown in fig. 1. Or the processor 31 when executing the computer program implements the functions of the modules/units of the embodiment of the humanoid detection apparatus described above, such as the functions of the modules of the humanoid detection apparatus shown in fig. 2.

Illustratively, the computer program may be split into one or more modules that are stored in the memory 32 and executed by the processor 31 to perform the present invention. The one or more modules may be a series of computer program instruction segments capable of performing a specific function for describing the execution of the computer program in the human detection device. For example, the computer program may be divided into an image acquisition module, a feature extraction module, a human shape detection module, and a human shape marking module, each of which specifically functions as follows: the image acquisition module is used for acquiring an image to be detected; the feature extraction module is used for extracting features of the image to be detected to obtain a feature map of the image to be detected; the human shape detection module is used for carrying out human shape detection on the feature images of the image to be detected to obtain a target human shape area of the image to be detected; the target humanoid region comprises a target whole body region, a target upper body region and a target head-shoulder region; and the humanoid marking module is used for marking the target humanoid region in the image to be detected.
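The four-module split can be pictured as a small pipeline object. This is a structural sketch only: the class name `HumanoidDetectionPipeline` and the four callables are hypothetical placeholders for the image acquisition, feature extraction, humanoid detection and humanoid marking modules.

```python
class HumanoidDetectionPipeline:
    """Chains the four modules in the order given in the description."""

    def __init__(self, acquire, extract, detect, mark):
        self.acquire = acquire  # () -> image to be detected
        self.extract = extract  # image -> feature map
        self.detect = detect    # feature map -> target humanoid regions
        self.mark = mark        # (image, regions) -> marked image

    def run(self):
        image = self.acquire()
        features = self.extract(image)
        regions = self.detect(features)
        return self.mark(image, regions)
```

Keeping each module behind a plain callable mirrors the statement that the modules are instruction segments with a specific function, and lets any one of them be replaced without touching the others.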

The humanoid detection device can be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The humanoid detection apparatus may include, but is not limited to, a processor 31, a memory 32. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of a human detection device and is not meant to be limiting, and that more or fewer components than shown may be included, or that certain components may be combined, or that different components may be included, for example, the human detection device may also include an input output device, a network access device, a bus, etc.

The processor 31 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 31 is the control center of the humanoid detection device and connects the various parts of the entire humanoid detection device through various interfaces and lines.

The memory 32 may be used to store the computer program and/or modules, and the processor 31 implements the various functions of the humanoid detection device by running or executing the computer program and/or modules stored in the memory 32 and invoking data stored in the memory 32. The memory 32 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the humanoid detection device (such as audio data, a phonebook, etc.), and the like. In addition, the memory 32 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.

If the integrated modules/units of the humanoid detection device are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the method embodiments described above by means of a computer program that instructs related hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
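As an illustration only, the flow that such a computer program carries out (acquire an image, extract a feature map, detect the whole body, upper body and head-shoulder regions, then mark them) might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names `extract_features`, `detect_humanoid`, `mark_regions` and `run_detection`, the `(x, y, width, height)` box convention, the gradient-based feature extraction, and the stub detectors are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]          # (x, y, width, height); assumed convention
Grid = List[List[int]]                   # grayscale image or feature map as rows
Detector = Callable[[Grid], List[Box]]   # hypothetical detector interface


@dataclass
class HumanoidRegions:
    """Target humanoid region: whole body, upper body and head-shoulder boxes."""
    whole_body: List[Box]
    upper_body: List[Box]
    head_shoulder: List[Box]


def extract_features(image: Grid) -> Grid:
    """Stand-in feature extraction: absolute horizontal pixel differences."""
    return [[abs(row[i + 1] - row[i]) for i in range(len(row) - 1)]
            for row in image]


def detect_humanoid(feature_map: Grid,
                    detectors: Dict[str, Detector]) -> HumanoidRegions:
    """Run one (hypothetical) detector per region type on the same feature map."""
    return HumanoidRegions(
        whole_body=detectors["whole_body"](feature_map),
        upper_body=detectors["upper_body"](feature_map),
        head_shoulder=detectors["head_shoulder"](feature_map),
    )


def mark_regions(regions: HumanoidRegions) -> List[Tuple[str, Box]]:
    """Return labeled boxes; a real device would draw them on the image."""
    marks: List[Tuple[str, Box]] = []
    for label in ("whole_body", "upper_body", "head_shoulder"):
        for box in getattr(regions, label):
            marks.append((label, box))
    return marks


def run_detection(image: Grid,
                  detectors: Dict[str, Detector]) -> List[Tuple[str, Box]]:
    """Feature extraction, then detection, then marking, in that order."""
    feature_map = extract_features(image)
    regions = detect_humanoid(feature_map, detectors)
    return mark_regions(regions)


if __name__ == "__main__":
    # Stub detectors that return fixed boxes, standing in for trained detectors.
    stub_detectors: Dict[str, Detector] = {
        "whole_body": lambda fm: [],             # e.g. the lower body is occluded
        "upper_body": lambda fm: [(0, 0, 2, 2)],
        "head_shoulder": lambda fm: [(0, 0, 2, 1)],
    }
    print(run_detection([[0, 10, 10], [0, 10, 10]], stub_detectors))
```

In this sketch the three region types are kept as separate lists so that a partially occluded target, for which the whole-body detector returns nothing, can still be marked through its upper body or head-shoulder region.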

It should be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the device embodiments provided by the invention, a connection between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement the invention without creative effort.

While the foregoing describes preferred embodiments of the present invention, those skilled in the art will appreciate that changes and modifications may be made without departing from the principles of the invention, and such changes and modifications are also intended to fall within the scope of the invention.

Claims

1. A human shape detection method, comprising:

Acquiring an image to be detected;

Marking a target humanoid region in the image to be detected;

Performing humanoid detection on the feature map of the image to be detected to obtain a target humanoid region of the image to be detected, wherein the method specifically comprises the following steps:

2. The human shape detection method according to claim 1, wherein the detecting the human shape upper body region of the feature map of the image to be detected by the pre-trained second upper body detector and the pre-trained second head-shoulder detector to obtain the target upper body region of the image to be detected specifically comprises:

3. The human shape detection method according to claim 2, further comprising, before the human shape detection is performed on the feature map of the image to be detected to obtain the target human shape region of the image to be detected:

4. The human shape detection method according to claim 3, wherein the size of the preset standard whole body template is equal to the preset standard size;

then, before the feature map of the image to be detected is subjected to humanoid detection to obtain the target humanoid region of the image to be detected, after the training set of pedestrian data is obtained, the method further comprises:

5. The human shape detection method according to claim 4, wherein the determining the sizes of the upper body detection area and the head-shoulder detection area according to the information of the upper body area and the head-shoulder area marked in the plurality of human shape whole body sample images specifically comprises:

6. The human shape detection method according to claim 1, wherein the marking the target human shape region in the image to be detected specifically includes:

And marking the combined target humanoid region in the image to be detected.

7. A humanoid detection apparatus, characterized by comprising:

The image acquisition module is used for acquiring an image to be detected;

The humanoid marking module is used for marking the target humanoid region in the image to be detected;

8. A human shape detection device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the human shape detection method according to any one of claims 1 to 6 when the computer program is executed.

9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the human shape detection method according to any one of claims 1 to 6.