CN114821677B - Human body detection method and device, storage medium and passenger flow statistics camera - Google Patents

Human body detection method and device, storage medium and passenger flow statistics camera

Info

Publication number
CN114821677B
CN114821677B (application CN202210747317.6A)
Authority
CN
China
Prior art keywords
human body
target
body detection
split
frame image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210747317.6A
Other languages
Chinese (zh)
Other versions
CN114821677A (en)
Inventor
肖兵 (Xiao Bing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Shixi Technology Co Ltd
Original Assignee
Zhuhai Shixi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Shixi Technology Co Ltd filed Critical Zhuhai Shixi Technology Co Ltd
Priority to CN202210747317.6A priority Critical patent/CN114821677B/en
Publication of CN114821677A publication Critical patent/CN114821677A/en
Application granted granted Critical
Publication of CN114821677B publication Critical patent/CN114821677B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Abstract

The application discloses a human body detection method and device, a storage medium, and a passenger flow statistics camera, which are used for improving human body detection performance on depth images. The method comprises the following steps: acquiring a human body detection result output in advance, wherein the human body detection result comprises a target cluster of a human body region and a bounding box of the human body region; performing data association between the target cluster and bounding box of the current frame image and the target cluster and bounding box of the previous frame image; determining one-to-many association items in the human body detection result based on the data association result, wherein an association item comprises a target to be split and a plurality of associated targets associated with the target to be split; creating a classification object mapped with the current frame image; determining a basic region set and a pending region set in the associated targets based on the classification object; performing an adhesion splitting operation on the target to be split according to the basic region set and the pending region set; and outputting the human body detection result again based on the result of the adhesion splitting operation.

Description

Human body detection method and device, storage medium and passenger flow statistics camera
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a human body detection method, an apparatus, a storage medium, and a passenger flow statistics camera.
Background
In fields such as consumer electronics, security, and transportation, passenger flow statistics are often needed to better understand the movement of people. For passenger flow statistics, human body detection is a necessary and crucial link, and its accuracy directly influences the final statistical accuracy.
In the prior art, human body detection technology based on RGB images is mature; in the industry, passenger flow statistics are generally performed on RGB images using a detection scheme based on HOG + SVM or a detection scheme based on deep learning. There are also prior-art methods that acquire a depth image with a depth camera based on time-of-flight (TOF) or structured light and then perform human body detection and passenger flow statistics on the depth image. However, such depth-image-based methods are obtained by directly transferring RGB-image-based human body detection methods. Because the depth image and the RGB image have different acquisition principles and markedly different imaging, the actual detection effect is poor, and it is difficult to obtain an ideal human body detection result from the depth image.
Disclosure of Invention
In order to solve the technical problem, the application provides a human body detection method, a human body detection device, a storage medium and a passenger flow statistics camera.
A first aspect of the present application provides a human body detection method, including:
obtaining a human body detection result which is output in advance, wherein the human body detection result comprises a target cluster of a human body area and a bounding box of the human body area;
performing data association on the target cluster and the bounding box of the current frame image and the target cluster and the bounding box of the previous frame image to obtain a data association result;
determining one-to-many association items in the human body detection result based on the data association result, wherein the association items comprise a target to be split and a plurality of association targets associated with the target to be split;
creating a classification object mapped with the current frame image;
determining a basic region set and a pending region set in the associated target based on the classification object;
performing an adhesion splitting operation on the target to be split based on the basic region set and the pending region set;
and outputting the human body detection result again based on the result of the adhesion splitting operation.
Optionally, the creating a classification object mapped with the current frame image includes:
a mask image of the same size as the current frame image is created.
Optionally, the determining, based on the classification object, a basic region set and a pending region set in the associated target includes:
setting the pixel value of the position corresponding to the target to be split as a valid value in the mask image;
traversing the associated targets in the association item, and creating an initial basic region set and an initial pending region set;
traversing pixels of a target cluster corresponding to the associated target;
if the pixel value of the corresponding position in the mask image is the effective value, incorporating the pixel of the corresponding position in the target cluster into the basic region set, and setting the pixel value of the corresponding position in the mask image to a number value, the number value being a numerical value other than the effective value;
traversing the pixels of the target to be split again;
if the pixel value of the corresponding position in the mask image is the effective value, incorporating the pixel of the corresponding position in the target to be split into the pending region set, thereby obtaining the basic region set and the pending region set.
Optionally, before the setting of the pixel value of the position corresponding to the target to be split in the mask image as the effective value, the method further includes:
and setting the pixel values of all pixels in the mask image to an initialization value, wherein the initialization value is a numerical value that is neither the effective value nor a number value.
Optionally, the performing, based on the basic region set and the pending region set, an adhesion splitting operation on the target to be split includes:
determining the region attribution of each pixel in the pending region set, thereby obtaining two split sub-regions;
the outputting the human body detection result again based on the result of the adhesion splitting operation comprises:
and outputting the human body detection result again based on the two split sub-regions.
Optionally, the determining the region attribution of each pixel in the pending region set, thereby obtaining two split sub-regions, includes:
contracting the pending region set from its boundary to its interior;
and if the neighborhood pixels of a target pixel include a pixel whose pixel value is a number value, incorporating the target pixel into the basic region set corresponding to that number value, thereby obtaining two split sub-regions.
Optionally, the determining the region attribution of each pixel in the pending region set, thereby obtaining two split sub-regions, includes:
expanding the pending region set from its interior outward;
and if the neighborhood pixels of a target pixel include a pixel whose pixel value is a number value, incorporating the target pixel into the basic region set corresponding to that number value, thereby obtaining two split sub-regions.
Optionally, the determining, based on the data association result, one-to-many association items in the human body detection result includes:
verifying the data association result of the bounding box of the current frame image and the bounding box of the previous frame image;
if the bounding box of the current frame image and the bounding box of the previous frame image have a one-to-many condition, verifying the data association result of the target cluster of the current frame image and the target cluster of the previous frame image;
if the target cluster of the current frame image and the target cluster of the previous frame image are in a one-to-many condition, determining that the individual targets are adhered in the human body detection result;
and determining one-to-many association items in the human body detection result based on the association result of the bounding box and the association result of the target cluster.
Optionally, the performing data association between the target cluster and the bounding box of the current frame image and the target cluster and the bounding box of the previous frame image includes:
data association is performed by the following equation:

IoM(A, B) = S_overlap(A, B) / min(S_A, S_B)

wherein S_overlap(A, B) represents the overlapping area of targets A and B in the human body detection result, and S_A and S_B respectively represent the areas of targets A and B.
A second aspect of the present application provides a traffic statistics camera, which includes a processor and a depth camera, wherein the processor executes the method of any one of the first aspect and the options of the first aspect during operation.
A third aspect of the present application provides a human detection apparatus, the apparatus comprising:
an acquisition unit, configured to acquire a human body detection result output in advance, wherein the human body detection result comprises a target cluster of a human body region and a bounding box of the human body region;
the data association unit is used for performing data association on the target cluster and the bounding box of the current frame image and the target cluster and the bounding box of the previous frame image to obtain a data association result;
an association item determining unit, configured to determine, based on the data association result, one-to-many association items in the human body detection result, where the association items include a target to be split and multiple association targets associated with the target to be split;
a creating unit for creating a classification object mapped with the current frame image;
a region set determining unit, configured to determine a basic region set and a pending region set in the associated target based on the classification object;
the adhesion splitting unit is used for performing an adhesion splitting operation on the target to be split based on the basic region set and the pending region set;
and the re-output unit is used for re-outputting the human body detection result based on the result of the adhesion splitting operation.
A fourth aspect of the present application provides a human detection apparatus, the apparatus comprising:
the device comprises a processor, a memory, an input and output unit and a bus;
the processor is connected with the memory, the input and output unit and the bus;
the memory holds a program that the processor calls to perform the method of any one of the first aspect and the options of the first aspect.
A fifth aspect of the present application provides a computer-readable storage medium having a program stored thereon, which, when executed on a computer, performs the method of any one of the first aspect and the options of the first aspect.
According to the technical scheme, the method has the following advantages:
according to the human body detection method, a pre-output human body detection result can be corrected, data association is carried out on the image of the current frame and the image of the previous frame, if one-to-many conditions exist, namely adhesion conditions exist in human body detection results, then the target to be split and a plurality of associated targets are determined based on the data association results, and when correction is carried out, adhesion splitting is carried out on the target to be split through the basic area set and the area set to be split, so that the accurate single target number is obtained, and the accuracy of human body detection can be effectively improved.
The human body detection method provided by the application has a high operation speed and low computing-power requirements, achieves real-time detection even when running on the CPU of a low- or mid-range embedded platform, and is convenient to deploy, widely applicable, and well suited to large-scale popularization and application.
Drawings
In order to more clearly illustrate the technical solutions in the present application, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a human detection method provided in the present application;
FIG. 2 is a schematic diagram of a cluster of targets and bounding boxes in the human detection results of the present application;
FIG. 3 is another schematic diagram of the target cluster and bounding box in the human body detection results of the present application;
FIG. 4 is a flowchart illustrating an embodiment of step S103 in the present application;
FIG. 5 is a schematic view of a mask image in the present application;
FIG. 6 is a schematic representation of an association in the present application;
FIG. 7 is a flowchart illustrating an embodiment of step S105;
FIG. 8 is a schematic illustration of a base area in the present application;
FIG. 9 is a schematic illustration of two targets after splitting;
fig. 10 is a schematic flowchart of an embodiment of a method for obtaining a human body detection result in the present application;
fig. 11 is a schematic flowchart of an embodiment of step S203 in the present application;
FIG. 12 is a comparative schematic of the present application employing a bounding box of the head instead of a bounding box of the body;
FIG. 13 is a schematic structural diagram of a human body detecting device according to a first embodiment of the present application;
fig. 14 is a schematic structural diagram of a human body detection device according to a first embodiment of the present application.
Detailed Description
Based on this, the application provides a human body detection method for improving the human body detection effect aiming at the depth image.
It should be noted that the human body detection method provided by the present application may be applied to a passenger flow camera, a depth camera, or another terminal, and may also be applied to a server; the other terminal may be a smart phone, a computer, a tablet computer, a smart television, a smart watch, a portable computer terminal, a desktop computer, or another intelligent terminal with computing and data-analysis capabilities. For convenience of explanation, a terminal is taken as the execution subject for illustration in the present application.
The human body detection method provided by the embodiment is mainly applied to a passenger flow camera, and realizes human body detection and subsequent passenger flow statistics based on depth images. Compared with RGB images, depth images have several advantages: their quality is not easily affected by illumination changes, so images can be acquired normally in dim light and even at night; they contain no color, texture, or similar information, so people's appearance is not recorded, which relieves privacy concerns; and they contain distance information, which facilitates functions based on distance determination. Therefore, more and more passenger flow cameras are adopting such depth cameras.
However, when a depth camera performs human body detection, multiple targets may adhere to one another; for example, in practice, two human body regions adhere when two people stand one behind the other. Here, adhesion refers to the situation where two or more targets are close to each other and their cluster regions stick together, so that they are regarded as one target in the calculated human body detection result, which causes false detection by the passenger flow camera. For this target-adhesion condition, the present application provides a human body detection method that can deal well with target adhesion in depth images. Specific embodiments of the method are described in detail below:
the embodiment provides a human body detection method which can improve the human body detection effect aiming at the depth image.
Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a human body detection method provided by the present application, and the method includes:
s101, obtaining a human body detection result which is output in advance, wherein the human body detection result comprises a target cluster of a human body area and a bounding box of the human body area;
in this embodiment, a human body detection result obtained by preliminary detection through a certain method is first acquired, where the human body detection result includes a target cluster and a bounding box of a human body region, see fig. 2 and fig. 3. The target cluster is the set of pixels of the human body region, and the bounding box is the rectangular bounding box of the region detected as a human body. The preliminary detection can be carried out by the following embodiment:
the method includes the steps of preprocessing an input depth image to obtain an image to be detected, determining a picture motion area in the image to be detected through background modeling, clustering the picture motion area to determine a human body area set, and calculating a human body detection result according to the human body area set.
S102, performing data association on the target cluster and the bounding box of the current frame image and the target cluster and the bounding box of the previous frame image to obtain a data association result;
in order to determine the adhesion condition of individual targets in the human body detection result, data association is performed between the human body detection result (including target bounding boxes and target clusters) of the current frame image and that of the previous frame image. If no target adhesion occurs in the current scene, the corresponding data association results are all one-to-one relationships; when targets begin to adhere, the data association becomes "one-to-many". Therefore, whether initially separate targets have adhered can be judged by checking whether the data association is in a one-to-many state. If a one-to-many case exists, adhesion of individual targets is indicated.
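The one-to-many check can be sketched as follows; the list-of-pairs data layout and the function name are illustrative assumptions, not part of the patent:

```python
def find_one_to_many(associations):
    """Group association pairs (curr_target_id, prev_target_id) by the
    current-frame target. Any current target associated with two or
    more previous-frame targets forms a "one-to-many" association
    item: a candidate target to be split plus its associated targets."""
    by_curr = {}
    for curr_id, prev_id in associations:
        by_curr.setdefault(curr_id, []).append(prev_id)
    # keep only current targets matched to 2+ previous targets
    return {c: ps for c, ps in by_curr.items() if len(ps) >= 2}
```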
S103, determining one-to-many association items in the human body detection result based on the data association result, wherein the association items comprise a target to be split and a plurality of association targets associated with the target to be split;
referring to fig. 6, a diagram of a "one-to-many" correlation term is shown, in which a dotted line represents a detection result of a previous frame image, a solid line represents a calculation result of a current frame image, and specifically,
Figure DEST_PATH_IMAGE005
Figure 59202DEST_PATH_IMAGE006
respectively representing the associated previous frame object A, B,
Figure 37523DEST_PATH_IMAGE007
and representing the adhesion target detected by the current frame, namely the target to be split.
Figure 758354DEST_PATH_IMAGE008
And
Figure 189335DEST_PATH_IMAGE009
Figure 788069DEST_PATH_IMAGE010
these 2 targets are related, belonging to the "one-to-many" case.
In another embodiment, in order to further improve the determination accuracy of the adhesion condition, the manner of determining that the human body detection result has adhesion may be:
and S1031, when the data association result has a one-to-many condition, verifying the data association result of the bounding box of the current frame image and the bounding box of the previous frame image; if the bounding boxes of the current frame image and the previous frame image have a one-to-many condition, executing step S1032, and if the one-to-many condition does not exist, judging that no adhesion exists.
S1032, verifying data association results of the target cluster of the current frame image and the target cluster of the previous frame image, if the target cluster of the current frame image and the target cluster of the previous frame image have a one-to-many condition, executing step S1033, and if the one-to-many condition does not exist, judging that adhesion does not exist.
And S1033, determining that the adhesion condition of the individual target in the human body detection result is dynamic adhesion.
S1034, determining that no adhesion exists.
In this embodiment, a discrimination scheme for adhesion is provided, which adopts a "two-step verification" strategy: only when both the target clusters and the bounding boxes are in a one-to-many relationship at the same time is the item identified as a one-to-many association. This can effectively improve discrimination accuracy.
in step S1031, performing data association between the bounding box detected from the current frame image and the bounding box of the previous frame, and if there is a "one-to-many" situation, performing the next verification; otherwise, directly judging that no dynamic adhesion exists.
In step S1032, for the targets found to be "one-to-many" in step S1031, the target cluster is used in place of the bounding box for secondary association confirmation; if the "one-to-many" condition still exists, it is determined that dynamic adhesion exists, and the data association result is output to the splitting step; otherwise, it is judged that no dynamic adhesion exists.
Optionally, when performing data association, an IoM (Intersection over Minimum) matching algorithm is used in the data association process to perform association matching, for example:
for targets A and B, IoM is calculated as:

IoM(A, B) = S_overlap(A, B) / min(S_A, S_B)

wherein S_overlap(A, B) represents the overlapping area of targets A and B, and S_A and S_B respectively represent the areas of targets A and B. Specifically, if data association is performed using bounding boxes, the overlap area and the target areas are both calculated from the bounding boxes; if data association is performed using target clusters, the corresponding overlap area and target areas are calculated from the target clusters.
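A minimal sketch of the IoM computation for axis-aligned bounding boxes, assuming an (x0, y0, x1, y1) box convention; for target clusters the same ratio would be computed by counting shared pixels instead:

```python
def iom(box_a, box_b):
    """Intersection over Minimum for two axis-aligned boxes
    (x0, y0, x1, y1): overlap area divided by the smaller box area."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    overlap = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return overlap / min(area_a, area_b)
```

Dividing by the minimum area (rather than the union, as in IoU) keeps the score high when a small previous-frame target is largely contained in a large adhesion target, which suits the one-to-many matching described here.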
In step S102, if no target adhesion occurs in the current scene, the corresponding data association results should all be one-to-one relationships; when targets begin to adhere, a "one-to-many" condition may appear in the data association, so the adhesion condition of individual targets can be determined from the data association result. For example, when the one-to-many condition occurs, the adhesion condition of individual targets in the human body detection result is determined to be dynamic adhesion, where dynamic adhesion indicates that the targets were initially separate and adhered together later.
S104, creating a classification object mapped with the current frame image;
in this embodiment, a classification object mapped with the current frame image is first created to record the pixel values of the pixels in the current frame image; the classification object may be a mask image of the same size as the current frame image, or the values may be recorded in the form of a record table.
For example: to ensure high efficiency, an 8-bit mask image of the same size as the detection image is created in advance; the mask image can be used as a one-dimensional lookup table, i.e., the pixel values in the mask image range over [0, 255]. In this embodiment, the invalid area pixel value (INVALID_VALUE) is defined as 255, the effective area pixel value (VALID_VALUE) is defined as 128, and values below 128 (0 to 127) are used as number values. Taking 128 as VALID_VALUE is reasonable in practice, since it assumes that the number of current detection results does not exceed 128; of course, other values between 128 and 244 may be taken, or 128 taken as VALID_VALUE and 255 as INVALID_VALUE, which is not limited herein.
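Under the value conventions quoted above (which the text notes are adjustable), the mask creation and marking steps might look like this sketch; the function names are illustrative:

```python
import numpy as np

INVALID_VALUE = 255   # background / invalid area
VALID_VALUE = 128     # pixels of the target to be split
# number values 0..127 index the basic region sets

def create_mask(height, width):
    """8-bit mask image used as a per-pixel lookup table,
    initialized to INVALID_VALUE (step S1051)."""
    return np.full((height, width), INVALID_VALUE, dtype=np.uint8)

def mark_target(mask, target_pixels):
    """Step S1052: mark the pixels of the target to be split
    as VALID_VALUE in the mask."""
    for y, x in target_pixels:
        mask[y, x] = VALID_VALUE
    return mask
```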
S105, determining a basic region set and a to-be-determined region set in the associated target based on the classified object;
Based on the classification object created in step S104, all basic regions and pending regions in the associated targets associated with the target to be split are determined, obtaining the basic region sets and the pending region set, for example by the following procedure:
A basic region set and a pending region set in the associated targets are determined based on the mask image. The basic region sets are the regions already divided, and the pending region is the region not yet divided; the subsequent steps need to accurately assign the pending region to the corresponding basic regions to complete the division, thereby realizing the splitting.
Referring to fig. 7, a specific manner may be:
s1051, setting the pixel values of all pixels in the mask image as initialization values;
all pixel values of the mask image are initialized; the initialization value may be set to the invalid value mentioned above, i.e., 255.
S1052 traverses the pixels in the target to be split, and sets the pixel values of the pixels corresponding to the position of the target to be split in the mask image as effective values;
then, the pixels in the target cluster to be split are traversed, and the value at the corresponding position in the mask image is set to the valid value VALID_VALUE (128).
Referring to FIG. 5, at this point the region C is the region to be split; the pixel values of the mask image corresponding to the region C are VALID_VALUE (128), and the pixel values of the mask image corresponding to the other regions are INVALID_VALUE (255).
S1053, traversing the related objects in the related items, and creating an initial basic area set and a to-be-determined area set;
s1054, traversing the pixels of the target cluster corresponding to the associated target;
s1055, if the pixel value of the corresponding position in the mask image is the effective value, incorporating the pixel of the corresponding position in the target cluster into the basic region set, and setting the pixel value of the corresponding position in the mask image to a number value, wherein the number value is any value other than the effective value and the initialization value;
and traversing the targets in the association list of the target to be split: for the i-th associated target, a corresponding basic region set is created, and the pixels in the associated target cluster are traversed; if the pixel value of the corresponding position in the mask image is VALID_VALUE (128), the pixel position is added to the corresponding basic region set, and the value of the mask image at the corresponding position is set to i.
Referring to FIG. 8, the target to be split overlaps two associated targets. For the first associated target, a basic region set A is created (i = 0) and its target-cluster pixels are traversed; if the pixel value at the corresponding position in the mask image is VALID_VALUE (128) (which means the pixel lies in the overlap of the associated target and the target to be split), the pixel position is added to A and the mask value at that position is set to 0. Similarly, a basic region B is obtained, with the corresponding mask value set to 1 (i = 1). The basic regions A and B form the initial split of the target to be split.
S1056, traversing the pixels of the target to be split again;
S1057, if the pixel value at the corresponding position in the mask image is the effective value, incorporating the pixel at the corresponding position in the target to be split into the to-be-determined region set, thereby obtaining the basic region sets and the to-be-determined region set.
The pixels of the target to be split are traversed again; for any pixel whose mask value at the corresponding position is still VALID_VALUE (128), that position is added to the to-be-determined region set. Referring to FIG. 8, the remaining area between the basic regions A and B whose mask value is 128 is the to-be-determined region.
By the above method, the basic region sets and the to-be-determined region set are obtained; the division of regions is then completed simply by merging the to-be-determined regions into the corresponding basic region sets.
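Steps S1053 to S1057 can be sketched as follows (an illustrative Python fragment; the data layout, with clusters as lists of (x, y) pixel coordinates, and all names are assumptions, not the patent's implementation):

```python
VALID_VALUE = 128   # effective value from the text above

def split_regions(mask, assoc_clusters, target_pixels):
    """S1053-S1057: create one basic region set per associated target,
    claim overlap pixels (marking the mask with the number value i),
    then collect the still-unclaimed pixels of the target to be split
    into the to-be-determined set."""
    base_sets = []
    for i, cluster in enumerate(assoc_clusters):
        base = set()
        for (x, y) in cluster:
            if mask[y][x] == VALID_VALUE:   # pixel overlaps the target to split
                base.add((x, y))
                mask[y][x] = i              # number value: neither 128 nor 255
        base_sets.append(base)
    pending = {(x, y) for (x, y) in target_pixels
               if mask[y][x] == VALID_VALUE}
    return base_sets, pending
```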
S106, performing an adhesion splitting operation on the target to be split based on the basic region sets and the to-be-determined region set;
In this embodiment, the region attribution of each to-be-determined region in the to-be-determined region set is determined, i.e., which basic region each to-be-determined region belongs to, thereby achieving a precise division of the regions.
An example of performing the adhesion splitting is provided below: the region attribution of each pixel in the to-be-determined region set is determined, thereby obtaining two split sub-regions.
Specifically, the region attribution of each pixel in the to-be-determined region set may be determined by contracting the to-be-determined region set from its boundary inward: if the neighborhood pixels of a target pixel include a pixel whose value is a number value, the target pixel is incorporated into the basic region set corresponding to that number value, thereby obtaining two split sub-regions.
Alternatively, the to-be-determined region set may be expanded from inside to outside; if the neighborhood pixels of a target pixel include a pixel whose value is a number value, the target pixel is incorporated into the basic region set corresponding to that number value, thereby obtaining two split sub-regions.
The following is illustrated by way of example:
The to-be-determined region is contracted from its boundary, from outside to inside, and each to-be-determined pixel (with value 128) at the boundary is assigned to the adjacent basic region. Referring to FIG. 8, if the neighborhood pixel values of a boundary to-be-determined pixel include 0, it is assigned to the 0th basic region (i.e., A); if they include 1, it is assigned to the 1st basic region (i.e., B).
Alternatively, the basic regions may be diffused outward; that is, the basic regions A and B simultaneously expand outward, gradually incorporating the pixels of the adjacent to-be-determined regions. Finally, referring to FIG. 9, the two regions obtained after splitting show the effect of the split.
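The outward-diffusion variant can be sketched as a multi-source breadth-first search seeded from the basic regions (an illustrative Python fragment; names and data layout are assumptions, not the patent's implementation):

```python
from collections import deque

def diffuse(base_sets, pending):
    """Multi-source BFS: each basic region expands outward, absorbing
    4-neighbour pixels of the to-be-determined set until none remain
    reachable; returns the split sub-regions."""
    owner = {p: i for i, base in enumerate(base_sets) for p in base}
    q = deque(owner)
    pending = set(pending)                 # copy so the caller's set survives
    while q:
        x, y = q.popleft()
        for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if n in pending:
                pending.discard(n)
                owner[n] = owner[(x, y)]   # inherit the absorbing region's id
                q.append(n)
    regions = [set() for _ in base_sets]
    for p, i in owner.items():
        regions[i].add(p)
    return regions
```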
And S107, outputting the human body detection result again based on the result of the adhesion splitting operation.
After the corresponding adhesion splitting operation is performed (for example, the target to be split in step S105 is split into two sub-regions), the bounding boxes and target clusters are computed again so that their information is updated; the corrected human body detection result is thus obtained and output again.
The embodiment provides a human body detection method, which is reliable in detection, and combines a correction mechanism to deal with the condition of target adhesion in the depth image, so that the human body detection effect aiming at the depth image is greatly improved, and the passenger flow statistics precision is ensured.
The human body detection method provided by the application is fast, has low computing-power requirements, is easy to deploy, and achieves real-time detection even on a low- to mid-range embedded CPU; it therefore has wide applicability and good prospects for large-scale adoption.
The embodiments in the present application are all used to correct a pre-output human body detection result, which is obtained through preliminary detection by some method and includes target clusters and bounding boxes of human body regions, where a target cluster is the pixel set of a human body region and a bounding box is the rectangular box enclosing a region detected as a human body.
Specific embodiment modes are provided below for the steps "obtaining a human body detection result output in advance, where the human body detection result includes a target cluster of a human body region and a bounding box of the human body region", which are mentioned in this application, and detailed descriptions are provided below with reference to fig. 10, where the embodiment includes:
S201, preprocessing an input depth image to obtain an image to be detected;
The human body detection method provided by this embodiment is mainly applied to a passenger flow camera, and realizes human body detection and subsequent passenger flow statistics based on depth images. Compared with RGB images, depth images are not easily affected by illumination changes and can be acquired normally in dim light or even at night; they contain no color or texture information, so they do not record a person's appearance and thus avoid privacy concerns; and they contain distance information, which facilitates functions based on distance judgment. For these reasons, more and more passenger flow cameras adopt depth cameras.
However, in practical applications the passenger flow camera needs to cover a relatively large detection range, so a depth lens with a large field angle is often mounted; although the field of view is wider, the distortion and tilt produced when a human body approaches the edge of the picture are also more severe. On the other hand, holes or missing parts easily occur in the depth image at low-reflectivity positions such as the head, body edges and legs, and in severe cases the shape of the head or legs is missing entirely. That is, human bodies in depth images often lack robust features, which makes it difficult for conventional machine learning or deep learning schemes operating on single frames to obtain ideal detection results. This embodiment provides a human body detection method that improves the human body detection effect for depth images.
In the embodiment, the terminal first performs preprocessing on the input depth image, including but not limited to down-sampling and format conversion on the depth image, so as to reduce the amount of calculation and improve the detection speed.
The terminal preprocesses the input depth image; the preprocessing specifically includes down-sampling and/or format conversion, followed by threshold gating of the processed image, after which the image to be detected is obtained.
Specifically, the down-sampling and/or format conversion includes: if the resolution of the depth image is large, the original depth image is down-sampled (i.e., reduced); if the bit depth of the depth image is greater than 8 bits, the depth image is converted into an 8-bit image, which reduces the amount of computation and improves detection speed. The order of down-sampling and format conversion is interchangeable; preferably, the terminal first down-samples the original depth image and then converts it into an 8-bit image.
Furthermore, the terminal performs depth gating on the processed depth image: a depth range [Imin, Imax] is preset, and pixel values outside this range are set to 0. It should be noted that Imin and Imax may be set according to application requirements and the actual scene. Depth gating preliminarily screens out some dark (short-distance) and bright (long-distance) noise, which helps improve the detection effect.
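A minimal sketch of this preprocessing (illustrative Python on nested lists; the stride and the 16-bit depth scale are assumed parameters, not values from the patent):

```python
def preprocess(depth, i_min, i_max, stride=2, max_depth=4000):
    """Down-sample by `stride`, map a raw depth value onto 8 bits,
    then zero out pixels outside the gate [i_min, i_max] (applied to
    the scaled values)."""
    out = []
    for row in depth[::stride]:
        scaled = []
        for v in row[::stride]:
            v8 = min(255, v * 255 // max_depth)      # convert to 8-bit
            scaled.append(v8 if i_min <= v8 <= i_max else 0)
        out.append(scaled)
    return out
```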
S202, determining a picture motion area in an image to be detected through background modeling;
The terminal determines the picture motion area in the image to be detected through background modeling. For a scene with a fixed camera and a slowly changing background, background modeling can be used directly to detect moving targets, or as a preprocessing step to narrow the search range and further reduce computation; its main advantages are a relatively small amount of computation and high speed. In depth images acquired in passenger flow statistics scenes, the human shape is often incomplete and variable: the head or legs may be missing, and human bodies near the edge of a wide-angle picture are severely distorted and tilted, so single-frame detection methods designed for conventional-view cameras cannot achieve effective human body detection. In this embodiment, the picture motion area is determined through background modeling, which fully considers multi-frame information, distinguishes moving objects (pedestrians) from the background, and reduces background interference with the detection result.
In some embodiments, the background modeling may be implemented using the CodeBook algorithm or the LOBSTER algorithm.
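As a stand-in for the CodeBook/LOBSTER algorithms named above (whose implementations are outside this text), a minimal running-average background model illustrates the idea of separating moving pixels from a slowly updated background; all names and thresholds here are assumptions:

```python
def update_background(bg, frame, alpha=0.05, diff_t=10):
    """One update step of a running-average background model:
    pixels differing from the background by more than diff_t are
    flagged as foreground (motion); the background is then blended
    toward the new frame with exponential smoothing."""
    fg = [[abs(v - b) > diff_t for v, b in zip(fr, br)]
          for fr, br in zip(frame, bg)]
    bg = [[(1 - alpha) * b + alpha * v for v, b in zip(fr, br)]
          for fr, br in zip(frame, bg)]
    return fg, bg
```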
S203, clustering the picture motion areas to determine a human body area set, and calculating a human body detection result according to the human body area set;
The terminal clusters the picture motion region obtained in step S202; specifically, the effective pixels in the picture motion region are clustered to obtain a cluster set, i.e., the human body region set of the present application. The clustering process traverses the neighborhood pixels of any pixel A in a cluster: for any effective neighborhood pixel Ni, if the absolute difference between the pixel values of Ni and A is smaller than a preset intra-cluster similarity threshold S, Ni is added to the cluster containing A; otherwise a new cluster is created, Ni is added to it, and clustering continues.
The above-mentioned obtained human body region set usually already contains main human body regions, but each target cluster cannot be directly regarded as a human body region. This is due to:
1) The obtained clusters may include noise regions, moving objects, false detected backgrounds and other non-human body regions;
2) When the human bodies contact or block each other, the resulting cluster areas adhere together, rather than individual human body areas, referred to herein as "adhesion";
in order to solve the above problem, after the human body region set is obtained, the corresponding human body detection result is further calculated according to the human body region set, that is, the terminal traverses the cluster set, and the bounding boxes of all the clusters are obtained and used as the detection frames of the clusters to obtain the human body detection result.
Specifically, the terminal traverses all pixels of the cluster to obtain the minimum value and the maximum value of x and y coordinates of the pixels in the cluster: xmin, ymin, xmax, ymax, and then determining the corresponding bounding box rectangle, wherein the rectangle is the target detection frame. It should be noted that the human body detection result at least includes a human body region set and a human body bounding box set.
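The bounding-box computation described above is straightforward; a sketch (illustrative Python, with a cluster represented as a list of (x, y) pixels):

```python
def aabb(cluster):
    """AABB bounding box of a pixel cluster, as (xmin, ymin, xmax, ymax);
    the resulting rectangle is the target detection frame."""
    xs = [x for x, _ in cluster]
    ys = [y for _, y in cluster]
    return min(xs), min(ys), max(xs), max(ys)
```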
An embodiment of determining a set of body regions is provided below, and with reference to fig. 11, the embodiment includes:
S2031, marking the pixels in the picture motion area as effective pixels through an image mask, and marking the pixels outside the picture motion area as ineffective pixels;
After obtaining the picture motion area, the terminal creates an image mask with the same resolution as the image to be detected to mark whether each pixel is valid. Specifically, the pixels corresponding to the picture motion area obtained in step S202 are marked as valid, and the pixels of the remaining area are marked as invalid.
Furthermore, the terminal can mark the upper, lower, left and right boundary pixels of the image mask as invalid, so that the boundary check of each pixel is avoided during the subsequent clustering, and the efficiency is improved.
S2032, carrying out clustering processing on the effective pixels according to the image mask and the image to be detected to obtain a human body region set;
The terminal clusters all the effective pixels marked in step S2031 according to the image mask and the image to be detected to obtain a cluster set, i.e., the human body region set of the present application. The clustering process traverses the neighborhood pixels of any pixel A in a cluster: for any effective neighborhood pixel Ni, if the absolute difference between the pixel values of Ni and A is smaller than a preset intra-cluster similarity threshold S, Ni is added to the cluster containing A; otherwise a new cluster is created, Ni is added to it, and clustering continues.
Further, since clustering involves searching the image, depth-first search (DFS) or breadth-first search (BFS) may be used in some specific embodiments. Breadth-first search is preferred: it avoids recursion and has low memory consumption, which improves computation speed.
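The clustering rule described above (valid neighbors join a cluster when their value difference is below the threshold S) can be sketched as a breadth-first search, as the preferred embodiment suggests (illustrative Python; names and data layout are assumptions):

```python
from collections import deque

def cluster_pixels(img, valid, s_threshold):
    """BFS clustering: a valid 4-neighbour joins the current cluster when
    its value differs from the current pixel by less than s_threshold."""
    h, w = len(img), len(img[0])
    seen, clusters = set(), []
    for y in range(h):
        for x in range(w):
            if not valid[y][x] or (x, y) in seen:
                continue
            q, cur = deque([(x, y)]), []
            seen.add((x, y))
            while q:
                cx, cy = q.popleft()
                cur.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h and valid[ny][nx]
                            and (nx, ny) not in seen
                            and abs(img[ny][nx] - img[cy][cx]) < s_threshold):
                        seen.add((nx, ny))
                        q.append((nx, ny))
            clusters.append(cur)
    return clusters
```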
S2033, determining a bounding box of the human body region set;
The terminal traverses the human body region set, i.e., the cluster set obtained in step S2032, and obtains the AABB bounding box of each cluster, which is used as the detection frame of that cluster. The AABB bounding box is found as follows: all pixels of the target cluster are traversed to obtain the minimum and maximum x and y coordinates of the pixels in the cluster, namely xmin, ymin, xmax and ymax, which determine the corresponding bounding rectangle, i.e., the target detection frame.
In some specific embodiments, while obtaining the minimum and maximum x and y coordinates of the pixels in each cluster, the terminal may also record the pixel position corresponding to the highest point of each cluster, thereby obtaining a vertex coordinate set for the clusters. The highest point can then be used as a head-top reference to generate a small head frame in place of the human body bounding box, so that in multi-person and adhesion scenes the output avoids dense, overlapping detection frames and the display effect on the application side is improved.
Referring to fig. 12, the upper diagram in fig. 12 is a display before improvement, and the lower diagram in fig. 12 is a display effect after improvement by using a human head frame instead of a human body bounding box, which greatly improves user experience.
S2034, screening the bounding boxes through preset constraints, and determining the screened bounding boxes as first human body detection results;
and after the terminal obtains the bounding boxes of the human body region set through calculation, screening the obtained bounding boxes through preset constraints.
Specifically, the preset constraints include, but are not limited to: the body region area constraint, bounding box aspect ratio constraint, boundary limit constraint, and height constraint, which will be described separately below.
1) Human body region area constraint;
Specifically, a target area threshold range [Amin, Amax] is set, and targets whose area is not within this range are discarded. The target area is the number of pixels in the cluster. The Amin value should consider the minimum limit of a single-person region area, while the Amax value should consider the maximum limit of multiple adhered regions; regions in the adhesion case are retained first and left to the subsequent adhesion splitting step for correction. Optionally, Amin may be set below the single-person minimum to allow for a torn single human body region, retaining such results for a subsequent tear-merging step.
2) Bounding box aspect ratio constraints;
Specifically, a target detection frame aspect ratio range [Rmin, Rmax] is set, and targets whose aspect ratio is not within this range are discarded. The aspect ratio is the ratio of the height to the width of the bounding box. The Rmin value should consider the minimum aspect ratio of a single-person bounding box, and the Rmax value the maximum aspect ratio of adhered multi-person bounding boxes, for the same reason as [Amin, Amax] above.
3) Boundary limit constraint;
Specifically, upper, lower, left and right boundary lines are set according to application requirements and the characteristics of the actual scene, and targets whose center point lies beyond a boundary line are discarded. The center point is either the bounding box center or the centroid of the human body region, preferably the centroid. The aim is to ignore a human body while it is at the image boundary, where its shape is largely missing.
4) Height constraint;
specifically, the height threshold HT is set, and objects having a height lower than the height threshold are discarded. The aim is to screen out some low objects which are misdetected, such as chairs which are regarded as motion areas by the background modeling module due to moving. Two specific height discrimination schemes are provided, and the scheme II is preferably adopted:
according to the first scheme, the actual height of the target in the depth image is estimated through the internal and external parameters of the camera and the pixel coordinates of the target, and then the estimated value is compared with a height threshold value;
In the second scheme, a height calibration approach is adopted: a depth map is captured at the height plane of the threshold HT to obtain a reference depth map; alternatively, the camera height Hc is recorded, the camera is placed at height Hc - HT, and a depth map of the ground is captured as the reference depth map. In actual use, the height relation between the target and the reference plane is determined by comparing the target depth with the reference depth map, thereby judging whether the target is lower than HT.
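Scheme two reduces to a per-pixel depth comparison against the reference depth map; a sketch (illustrative Python, assuming a top-down camera where a larger depth value means a lower height; names are assumptions):

```python
def below_threshold(target_pixels, depth, ref_depth):
    """Return True when every pixel of the target is at least as deep as
    the reference depth map captured at the threshold plane, i.e. the
    whole target lies below the height threshold and can be discarded."""
    return all(depth[y][x] >= ref_depth[y][x] for x, y in target_pixels)
```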
It should be noted that, throughout the screening process, clusters and bounding boxes always correspond to each other; when a cluster is screened out, the corresponding bounding box is deleted synchronously, and vice versa.
In practical applications, step S2034 is likely to be performed multiple times, but the height constraint need not be performed each time; it is executed only once in the overall detection process. Specifically: after the corresponding splitting or merging processing is executed, the bounding boxes of all clusters are computed again and the results are screened by the human body area, bounding box aspect ratio and boundary limit constraints, while screening by the height constraint is placed after these steps as a separate step executed only once.
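The first three constraints can be sketched as one screening pass (illustrative Python; the threshold tuples are deployment-specific assumptions, and the centroid is approximated here by the bounding-box center):

```python
def screen(targets, a_rng, r_rng, x_rng, y_rng):
    """Keep targets passing the area, aspect-ratio and boundary-limit
    constraints.  Each target is (cluster, (xmin, ymin, xmax, ymax))."""
    kept = []
    for cluster, (x0, y0, x1, y1) in targets:
        area = len(cluster)                        # pixel count of the cluster
        ratio = (y1 - y0 + 1) / (x1 - x0 + 1)      # height / width
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2      # box center as "center point"
        if (a_rng[0] <= area <= a_rng[1]
                and r_rng[0] <= ratio <= r_rng[1]
                and x_rng[0] <= cx <= x_rng[1]
                and y_rng[0] <= cy <= y_rng[1]):
            kept.append((cluster, (x0, y0, x1, y1)))
    return kept
```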
The application also provides a passenger flow camera, which comprises a processor and a depth camera, wherein the processor, during operation, executes any one of the human body detection methods described above.
Referring to fig. 13, the present application also provides a human body detecting device, including:
an obtaining unit S301, configured to obtain a human body detection result output in advance, where the human body detection result includes a target cluster of a human body region and a bounding box of the human body region;
a data association unit S302, configured to perform data association between the target cluster and the bounding box of the current frame image and the target cluster and the bounding box of the previous frame image to obtain a data association result;
an association item determining unit S303, configured to determine, based on the data association result, one-to-many association items in the human body detection result, where the association items include an object to be split and multiple association objects associated with the object to be split;
a creating unit S304 for creating a classification object mapped with the current frame image;
a region set determining unit S305, configured to determine a base region set and a to-be-determined region set in the association target based on the classification object;
an adhesion splitting unit S306, configured to perform an adhesion splitting operation on the target to be split based on the base region set and the set of regions to be determined;
and a re-output unit S307 configured to re-output the human body detection result based on the result of the adhesion splitting operation.
Optionally, the creating unit S304 is specifically configured to:
a mask image of the same size as the current frame image is created.
Optionally, the area set determining unit S305 is specifically configured to:
setting the pixel values of all pixels in the mask image as initialization values;
traversing pixels in the target to be split, and setting pixel values of pixels corresponding to the target to be split at positions in the mask image as effective values;
traversing the associated targets in the associated items, and creating an initial basic area set and a to-be-determined area set;
traversing pixels of a target cluster corresponding to the associated target;
if the pixel value of the corresponding position in the mask image is the effective value, the pixel of the corresponding position in the target cluster is brought into the basic region set, and the pixel value of the corresponding position in the mask image is set as a number value, wherein the number value is any value other than the effective value and the initialization value;
traversing the pixels of the target to be split again;
if the pixel value of the corresponding position in the mask image is the effective value, the pixel of the corresponding position in the target to be split is contained in the set of the to-be-determined region, and a basic region set and a set of the to-be-determined region are obtained.
Optionally, the adhesion splitting unit S306 is specifically configured to:
determining the region attribution of each pixel in the to-be-determined region set so as to obtain two split sub-regions;
optionally, the adhesion splitting unit S306 is specifically configured to:
contracting the set of pending regions from a boundary to an interior;
and if the neighborhood pixels of the target pixel include a pixel whose pixel value is the number value, incorporating the target pixel into the basic region set corresponding to the number value, thereby obtaining two split sub-regions.
Optionally, the adhesion splitting unit S306 is specifically configured to:
expanding the set of pending regions from inside to outside;
and if the neighborhood pixels of the target pixel include a pixel whose pixel value is the number value, incorporating the target pixel into the basic region set corresponding to the number value, thereby obtaining two split sub-regions.
Optionally, the association item determining unit S303 is specifically configured to:
determining a data association result for verifying the bounding box of the current frame image and the bounding box of the previous frame image;
if the bounding box of the current frame image and the bounding box of the previous frame image have a one-to-many condition, verifying the data association result of the target cluster of the current frame image and the target cluster of the previous frame image;
if the target cluster of the current frame image and the target cluster of the previous frame image are in a one-to-many condition, determining that the individual targets are adhered in the human body detection result;
and determining one-to-many association items in the human body detection result based on the association result of the bounding box and the association result of the target cluster.
Optionally, the data association unit S302 is specifically configured to:
Data association is performed by the following equation:

r(A, B) = S_AB / (S_A + S_B - S_AB)

wherein S_AB represents the overlapping area of targets A and B in the human body detection result, and S_A and S_B represent the areas of targets A and B, respectively.
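A sketch of the association measure (the original formula image is not reproduced in this text; an IoU-style overlap ratio built from the overlap area and the two target areas, as described, is assumed, with targets represented as pixel sets):

```python
def overlap_ratio(a, b):
    """Overlap ratio of two pixel sets A and B:
    S_AB / (S_A + S_B - S_AB).  Frame-to-frame targets whose ratio
    exceeds a chosen threshold would be treated as associated."""
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter) if (a or b) else 0.0
```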
Referring to fig. 14, the present application also provides a human body detection apparatus including:
a processor S401, a memory S402, an input/output unit S403 and a bus S404;
the processor S401 is connected with the memory S402, the input and output unit S403 and the bus S404;
the memory S402 holds a program that the processor S401 calls to perform any of the above human detection methods.
The present application also relates to a computer-readable storage medium having a program stored thereon, which when run on a computer causes the computer to perform any of the methods described above.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

Claims (13)

1. A method of human detection, the method comprising:
obtaining a human body detection result which is output in advance, wherein the human body detection result comprises a target cluster of a human body area and a bounding box of the human body area;
performing data association on the target cluster and the bounding box of the current frame image and the target cluster and the bounding box of the previous frame image to obtain a data association result;
determining one-to-many association items in the human body detection result based on the data association result, wherein the association items comprise a target to be split and a plurality of association targets associated with the target to be split;
creating a classification object mapped with the current frame image;
determining a basic region set and a region to be determined set in the association target based on the classification object, wherein the basic region set is a set of divided regions, and the region to be determined is a set of regions which are not divided;
performing an adhesion splitting operation on the target to be split based on the basic region set and the to-be-determined region set;
and outputting the human body detection result again based on the result of the adhesion splitting operation.
2. The human body detection method of claim 1, wherein the creating of the classification object mapped to the current frame image comprises:
a mask image of the same size as the current frame image is created.
3. The human body detection method according to claim 2, wherein the determining a set of base regions and a set of pending regions in the associated target based on the classified object comprises:
setting the pixel value of the position corresponding to the target to be split as a valid value in the mask image;
traversing the associated targets in the associated items, and creating an initial basic area set and a to-be-determined area set;
traversing pixels of a target cluster corresponding to the associated target;
if the pixel value of the corresponding position in the mask image is the effective value, the pixel of the corresponding position in the target cluster is brought into the basic region set, and the pixel value of the corresponding position in the mask image is set as a number value which is a numerical value not the effective value;
traversing the pixels of the target to be split again;
if the pixel value of the corresponding position in the mask image is the effective value, the pixel of the corresponding position in the target to be split is contained in the to-be-determined region set, and a basic region set and a to-be-determined region set are obtained.
4. The human body detection method according to claim 3, wherein before setting the pixel value of the position corresponding to the target to be split in the mask image as a valid value, the method further comprises:
setting the pixel values of all pixels in the mask image as initialization values, wherein the initialization values are values which are not the effective value and the number value.
5. The human body detection method according to claim 1, wherein the performing an adhesion splitting operation on the target to be split based on the base region set and the set of regions to be determined comprises:
determining the region attribution of each pixel in the to-be-determined region set, thereby obtaining two split sub-regions;
the outputting the human body detection result again based on the result of the adhesion splitting operation includes:
and outputting the human body detection result again based on the two split sub-regions.
6. The human body detection method according to claim 5, wherein the determining the region attribution of each pixel in the set of regions to be determined so as to obtain two split sub-regions comprises:
contracting the set of pending regions from a boundary to an interior;
and if the neighborhood pixels of the target pixel include a pixel whose pixel value is the number value, incorporating the target pixel into the basic region set corresponding to the number value, thereby obtaining two split sub-regions.
7. The human body detection method according to claim 5, wherein the determining the region attribution of each pixel in the set of regions to be determined so as to obtain two split sub-regions comprises:
expanding the set of regions to be determined from the inside outward;
and if the neighborhood pixels of a target pixel include a pixel whose pixel value is a numbering value, assigning the target pixel to the base region set corresponding to that numbering value, thereby obtaining two split sub-regions.
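The attribution rule shared by claims 6 and 7 can be sketched as iterative label absorption: a pending pixel joins whichever base region already owns one of its neighbors. The sketch below assumes numbering values 1 and 2, 4-connectivity, and a mask built as in claims 3–4; none of these specifics are fixed by the claims.

```python
import numpy as np

def attribute_pending(mask, pending, numbers=(1, 2)):
    """Absorb pending pixels from the boundary inward (claim 6 style):
    a pending pixel whose 4-neighborhood contains a numbering value
    takes that value, until no pixel can be assigned."""
    pending = set(pending)
    h, w = mask.shape
    changed = True
    while pending and changed:
        changed = False
        for r, c in sorted(pending):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] in numbers:
                    mask[r, c] = mask[rr, cc]  # join that base region
                    pending.discard((r, c))
                    changed = True
                    break
    # The two split sub-regions are the pixels carrying each numbering value.
    return [{(r, c) for r, c in zip(*np.where(mask == n))} for n in numbers]
```

Claim 7's inside-out expansion reaches the same partition by iterating from the base regions outward instead of from the pending boundary inward; only the visiting order differs.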
8. The human body detection method according to claim 1, wherein the determining one-to-many association items in the human body detection result based on the data association result comprises:
checking the data association result between the bounding boxes of the current frame image and the bounding boxes of the previous frame image;
if a bounding box of the current frame image and the bounding boxes of the previous frame image are in a one-to-many relationship, checking the data association result between the target clusters of the current frame image and the target clusters of the previous frame image;
if a target cluster of the current frame image and the target clusters of the previous frame image are also in a one-to-many relationship, determining that individual targets are adhered to one another in the human body detection result;
and determining the one-to-many association items in the human body detection result based on the association result of the bounding boxes and the association result of the target clusters.
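Claim 8 is a two-stage test: a current-frame target is flagged as a suspected adhesion only when both its bounding box and its target cluster associate with several previous-frame items. A minimal sketch, assuming the association results are represented as dicts mapping each current-frame target id to its matched previous-frame items (that representation and the names are not the patent's notation):

```python
def one_to_many_items(box_assoc, cluster_assoc):
    """Return current-frame target ids that are one-to-many in BOTH the
    bounding-box association and the target-cluster association, i.e.
    suspected adhesions (targets to be split)."""
    return [
        cur for cur, prev_boxes in box_assoc.items()
        if len(prev_boxes) > 1 and len(cluster_assoc.get(cur, ())) > 1
    ]
```

Requiring both checks to agree avoids splitting a target whose enlarged bounding box merely overlaps two old boxes while its cluster still matches only one.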
9. The human body detection method according to any one of claims 1 to 8, wherein the performing data association on the target cluster and the bounding box of the current frame image with the target cluster and the bounding box of the previous frame image comprises:
performing data association by the following equation:
[equation rendered as an image in the original publication]
wherein the first quantity in the equation represents the overlapping area of targets A and B in the human body detection result, and the other two quantities represent the areas of target A and target B, respectively.
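The equation itself survives only as an image placeholder, but claim 9 names exactly three quantities: the overlap area of targets A and B and the two individual areas. A common association measure built from precisely those quantities is intersection over the smaller area, sketched below as an assumption; the patent's actual formula may differ (e.g. it could be the IoU).

```python
def overlap_ratio(box_a, box_b):
    """Boxes as (x1, y1, x2, y2); returns S(A ∩ B) / min(S(A), S(B)).
    Assumed form: the published equation is an image and may instead be,
    e.g., IoU = S(A ∩ B) / (S(A) + S(B) - S(A ∩ B))."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
    ih = max(0, min(ay2, by2) - max(ay1, by1))  # intersection height
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / min(area_a, area_b) if inter else 0.0
```

Normalizing by the smaller area makes a small box fully inside a large one score 1.0, which suits matching a split fragment against the merged target it came from.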
10. A passenger flow statistics camera, characterized in that it comprises a processor and a depth camera, the processor, when operating, performing the human body detection method according to any one of claims 1 to 9.
11. A human detection device, characterized in that the device comprises:
an acquisition unit, configured to acquire a previously output human body detection result, where the human body detection result comprises a target cluster of a human body region and a bounding box of the human body region;
the data association unit is used for performing data association on the target cluster and the bounding box of the current frame image and the target cluster and the bounding box of the previous frame image to obtain a data association result;
an association determining unit, configured to determine, based on the data association result, one-to-many associations in the human body detection result, where the associations include a target to be split and multiple association targets associated with the target to be split;
a creating unit for creating a classification object mapped with the current frame image;
a region set determining unit, configured to determine, based on the classification object, a base region set and a set of regions to be determined in the association targets, where the base region set is a set of already-divided regions, and the set of regions to be determined is a set of regions that have not yet been divided;
an adhesion splitting unit, configured to perform an adhesion splitting operation on the target to be split based on the base region set and the set of regions to be determined;
and the re-output unit is used for re-outputting the human body detection result based on the result of the adhesion splitting operation.
12. A human detection device, characterized in that the device comprises:
the device comprises a processor, a memory, an input and output unit and a bus;
the processor is connected with the memory, the input and output unit and the bus;
the memory holds a program that the processor calls to execute the human detection method according to any one of claims 1 to 9.
13. A computer-readable storage medium having a program stored thereon, the program, when executed on a computer, performing the human detection method according to any one of claims 1 to 9.
CN202210747317.6A 2022-06-29 2022-06-29 Human body detection method and device, storage medium and passenger flow statistics camera Active CN114821677B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210747317.6A CN114821677B (en) 2022-06-29 2022-06-29 Human body detection method and device, storage medium and passenger flow statistics camera

Publications (2)

Publication Number Publication Date
CN114821677A CN114821677A (en) 2022-07-29
CN114821677B true CN114821677B (en) 2022-10-04

Family

ID=82523403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210747317.6A Active CN114821677B (en) 2022-06-29 2022-06-29 Human body detection method and device, storage medium and passenger flow statistics camera

Country Status (1)

Country Link
CN (1) CN114821677B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751678A (en) * 2018-12-12 2020-02-04 北京嘀嘀无限科技发展有限公司 Moving object detection method and device and electronic equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2757528B1 (en) * 2013-01-22 2015-06-24 Pie Medical Imaging BV Method and apparatus for tracking objects in a target area of a moving organ
CN104573612B (en) * 2013-10-16 2019-10-22 北京三星通信技术研究有限公司 The device and method of the posture for the multiple human objects being overlapped in estimating depth image
CN105488811B (en) * 2015-11-23 2018-06-12 华中科技大学 A kind of method for tracking target and system based on concentration gradient
JP2017117341A (en) * 2015-12-25 2017-06-29 富士通株式会社 Object detection method, device and program
CN110807361B (en) * 2019-09-19 2023-08-08 腾讯科技(深圳)有限公司 Human body identification method, device, computer equipment and storage medium
CN111723721A (en) * 2020-06-15 2020-09-29 中国传媒大学 Three-dimensional target detection method, system and device based on RGB-D
CN112733652B (en) * 2020-12-31 2024-04-19 深圳赛安特技术服务有限公司 Image target recognition method, device, computer equipment and readable storage medium
CN114332683A (en) * 2021-12-11 2022-04-12 南京行者易智能交通科技有限公司 Pedestrian detection method and device based on deep learning automatic selection area

Similar Documents

Publication Publication Date Title
CN109726717B (en) Vehicle comprehensive information detection system
CN104978567B (en) Vehicle checking method based on scene classification
CN111160291B (en) Human eye detection method based on depth information and CNN
CN110119726A (en) A kind of vehicle brand multi-angle recognition methods based on YOLOv3 model
CN109800698A (en) Icon detection method based on depth network
CN110544268B (en) Multi-target tracking method based on structured light and SiamMask network
WO2022062238A1 (en) Football detection method and apparatus, and computer-readable storage medium and robot
WO2020151148A1 (en) Neural network-based black-and-white photograph color restoration method, apparatus, and storage medium
CN110490150A (en) A kind of automatic auditing system of picture violating the regulations and method based on vehicle retrieval
Chen et al. Outdoor shadow estimating using multiclass geometric decomposition based on BLS
Meng et al. Text detection in natural scenes with salient region
US20230394829A1 (en) Methods, systems, and computer-readable storage mediums for detecting a state of a signal light
CN111860509A (en) Coarse-to-fine two-stage non-constrained license plate region accurate extraction method
CN109146768A (en) image conversion method, system and application
CN114332921A (en) Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network
JPH10255057A (en) Mobile object extracting device
CN111695373A (en) Zebra crossing positioning method, system, medium and device
CN114821677B (en) Human body detection method and device, storage medium and passenger flow statistics camera
CN112528994A (en) Free-angle license plate detection method, license plate identification method and identification system
CN115147868B (en) Human body detection method of passenger flow camera, device and storage medium
CN111914606A (en) Smoke detection method based on deep learning of time-space characteristics of transmissivity
CN115661522A (en) Vehicle guiding method, system, equipment and medium based on visual semantic vector
CN114821676B (en) Passenger flow human body detection method and device, storage medium and passenger flow statistical camera
CN114565597A (en) Nighttime road pedestrian detection method based on YOLOv3-tiny-DB and transfer learning
CN107273804A (en) Pedestrian recognition method based on SVMs and depth characteristic

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant