CN113392726B - Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene

Publication number: CN113392726B (granted publication of application CN202110579518.5A; earlier publication CN113392726A)
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Active (granted)
Inventors: 王安, 陆磊, 曹箫洪
Original and current assignee: SHANGHAI FEILO ACOUSTICS CO LTD
Application filed by SHANGHAI FEILO ACOUSTICS CO LTD, priority to CN202110579518.5A


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention provides a method and a system for identifying and detecting human heads in an outdoor monitoring scene, comprising the following steps: acquiring outdoor pedestrian video; preprocessing the acquired pedestrian video to obtain a processed pedestrian video; building a pre-trained human head recognition model and, with it: performing high-resolution video reconstruction on the processed pedestrian video to obtain a reconstructed pedestrian video; partitioning the reconstructed pedestrian video into a plurality of video blocks; extracting human head features from each of the video blocks; processing the extracted head features to obtain the head identification frame of the current video frame; tracking heads using the acquired head identification frames; and counting the tracked heads to obtain the number of heads in the current scene, completing head identification and detection in the outdoor monitoring scene. Based on machine vision, the invention can effectively detect pedestrian heads at low resolution and improves the detection accuracy of the whole system.

Description

Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene
Technical Field
The invention relates to the technical field of outdoor head recognition and monitoring, in particular to a head recognition and detection method, a system, a terminal and a medium based on machine vision in an outdoor monitoring scene.
Background
Human head identification is an important technology for predicting group events in intelligent city management: it can count the current people-flow density in real time, comprehensively grasp its change rule, and give timely early warning, greatly reducing labor cost and effectively improving managers' monitoring and analysis capabilities. Human head recognition and statistics technology involves disciplines such as video analysis, artificial intelligence and electronic information.
The traditional human head recognition statistical algorithm is mainly divided into two parts: human head detection and human head tracking. Under the condition that the monitoring camera is fixed, the human head detection mainly divides the human from the background by detecting the head characteristics of the pedestrian; in order to achieve a better statistical effect, pedestrians detected in the current observation area are tracked, so that repeated statistics of the number of heads is avoided.
However, outdoors the monitoring camera is installed high and the resolution of the captured picture is generally low; when the monitored scene is crowded, the resolution of the heads to be detected is lower still. As shown in fig. 1, the conventional human head detection scheme comprises a camera acquisition module, a foreground extraction module, a candidate head extraction module, a head classification module, a head tracking module, a color gamut adjustment module, a head feature extraction module and a head model module. It can be seen that if a classification algorithm based on a supervised mechanism does not improve the resolution, the detection accuracy is low and cannot meet the test requirements.
To improve the accuracy of pedestrian detection, deep-learning-based methods such as Fast R-CNN have been used for head detection. However, the hardware cost required by deep learning is high, real-time processing is hard to perform locally at the monitoring camera, and frames are generally sent back to a cloud platform for real-time processing, so the manufacturing cost of the whole monitoring system is high and large-scale popularization is difficult.
A search of the prior art finds the following.
One prior scheme, in a background image preprocessing module, performs frame sampling of images mainly through illumination intensity and reasonably set thresholds to obtain reasonable images; by detecting the people flow it achieves reasonable evacuation or opening and closing of security inspection ports, effectively helping airport staff know the flow condition of the security inspection ports in time. Reasonable evacuation and opening and closing of security inspection ports improve security inspection efficiency and save passengers' inspection time. However, in crowded conditions this technology cannot accurately identify heads because the head resolution is lower.
Chinese patent application No. 201710104613.3 discloses a classroom people-number detection method and system based on machine vision and binocular cooperation, which convolves the acquired left and right classroom video gray-level images with a Gaussian filter mask template to smooth the images, suppress noise, weaken background information and enhance the figure contour, and acquires images of different resolutions with a traditional super-resolution reconstruction algorithm to detect human heads. However, this technique is indoor only and is not suitable for outdoor detection under harsh light.
Chinese patent application No. 201611235768.2 discloses a method, device and medium for counting people in static video based on human head detection, in which the head outline of a pedestrian in each frame can be accurately detected using an optical flow method and a three-frame difference method through the color and quasi-elliptic features of the human head, thereby obtaining head information. However, this technology has a slow detection frame rate and is not suited to real-time head detection.
In summary, the prior art, including the above patents, still cannot perform accurate and rapid human head recognition when the head resolution is low and the external environment is complex; no description or report similar to the present invention has been found to date at home or abroad.
Disclosure of Invention
The invention provides a method, a system, a terminal and a medium for identifying and detecting a head of a person in an outdoor monitoring scene based on machine vision aiming at the defects in the prior art.
According to one aspect of the invention, there is provided a head recognition and detection method in an outdoor monitoring scene, including:
acquiring outdoor pedestrian videos;
preprocessing the obtained pedestrian video to obtain a processed pedestrian video;
Constructing a pre-trained human head recognition model, and performing the following steps by using the pre-trained human head recognition model:
-performing a high resolution video reconstruction of the processed pedestrian video, resulting in a reconstructed pedestrian video;
-partitioning the reconstructed pedestrian video to obtain a plurality of video blocks;
-extracting human head features from each of the plurality of video blocks;
-processing the extracted head features to obtain a head identification frame of the current video frame;
performing head tracking by using the acquired head identification frame;
counting the number of the tracked heads to obtain the number of the heads in the current scene, and completing the head identification detection in the outdoor monitoring scene.
Preferably, the acquiring the outdoor pedestrian video includes:
acquiring an outdoor pedestrian video signal through a camera to obtain the video source Image(S).
Preferably, the preprocessing comprises:
performing white balance and color gamut adjustment processing on an acquired video source Image (S) of the pedestrian video;
and carrying out foreground extraction and pedestrian component extraction on the processed video in sequence.
Preferably, the performing white balance and color gamut adjustment processing on the acquired pedestrian video includes:
for the acquired pedestrian video, calculating offline the pixel mean values R_ave, G_ave and B_ave of the three RGB channels, and the gain values K_R, K_G and K_B of the three RGB channels;
multiplying the pixel mean values R_ave, G_ave and B_ave by the corresponding gain values K_R, K_G and K_B to obtain the gain pixel values of the three RGB channels:

R = K_R × R_ave
G = K_G × G_ave
B = K_B × B_ave

at this point, the video stream of the video source Image(S) becomes Image(R, G, B), and the white balance processing is complete;
performing color gamut adjustment on the video stream Image(R, G, B) to obtain ImageRe(R, G, B), completing the color gamut adjustment processing.
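As an illustrative sketch (not from the patent text), the channel-gain white balance above can be realized with gray-world gains, where each gain K_c pulls the channel mean C_ave toward the global gray mean. The gray-world choice of gains is an assumption; the patent only states that channel means are multiplied by channel gains.

```python
import numpy as np

def gray_world_white_balance(frame):
    """White balance an RGB frame (H x W x 3 float array) by per-channel
    gains. The gains K_R, K_G, K_B pull each channel mean R_ave, G_ave,
    B_ave toward the global gray mean (gray-world assumption)."""
    means = frame.reshape(-1, 3).mean(axis=0)   # R_ave, G_ave, B_ave
    gray = means.mean()                         # target gray level
    gains = gray / means                        # K_R, K_G, K_B
    return np.clip(frame * gains, 0.0, 255.0)

# toy red-tinted uniform frame
frame = np.zeros((4, 4, 3)) + np.array([200.0, 100.0, 100.0])
balanced = gray_world_white_balance(frame)
channel_means = balanced.reshape(-1, 3).mean(axis=0)
```

After white balancing, the three channel means coincide at the global gray level, which is the intended effect of the K_c × C_ave correction.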
Preferably, the sequentially performing foreground extraction and pedestrian component extraction on the processed video includes:
obtaining foreground video frames ImageRe(R, G, B, i) of the processed video by a method combining the inter-frame relation and an optical flow algorithm, wherein i is the frame number of the video stream;
preliminarily deleting the useless parts without human heads in the foreground video frames ImageRe(R, G, B, i) by the erosion and dilation method to obtain the pedestrian component ImageSize(R, G, B, i).
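The erosion-and-dilation cleanup can be sketched as a morphological opening on a binary foreground mask; the 3×3 square structuring element and the erode-then-dilate order are assumptions, as the patent does not specify them.

```python
import numpy as np

def _shifted_windows(mask):
    """Yield the nine 3x3-neighborhood shifts of a zero-padded mask."""
    p = np.pad(mask, 1, constant_values=0)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            yield p[dy:dy + h, dx:dx + w]

def erode(mask):
    # pixel stays foreground only if its whole 3x3 neighborhood is foreground
    out = np.ones_like(mask)
    for win in _shifted_windows(mask):
        out &= win
    return out

def dilate(mask):
    # pixel becomes foreground if any pixel in its 3x3 neighborhood is
    out = np.zeros_like(mask)
    for win in _shifted_windows(mask):
        out |= win
    return out

def open_mask(mask):
    """Erosion followed by dilation: removes specks smaller than 3x3."""
    return dilate(erode(mask))

mask = np.zeros((7, 7), dtype=bool)
mask[1:4, 1:4] = True   # a solid 3x3 blob (kept)
mask[5, 5] = True       # single-pixel noise (removed)
cleaned = open_mask(mask)
```

The opening deletes isolated noise pixels while restoring blobs at least as large as the structuring element, which is how the useless head-free parts get pruned.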
Preferably, the building the pre-trained human head recognition model includes:
acquiring human head samples under outdoor monitoring, setting the resolutions of the human head samples to 14×14, 28×28 and 36×36 respectively, and sorting the samples into a positive human head sample set S{K+} = {s_1^+, s_2^+, …, s_{k+}^+} and a negative human head sample set S{K−} = {s_1^−, s_2^−, …, s_{k−}^−}, wherein k+ is the number of human head samples in the positive set and k− is the number of human head samples in the negative set;
aligning the human head samples with resolutions 14×14, 28×28 and 36×36 in the positive sample set S{K+} to obtain the sample set S_r{K+}:

S_r{K+} = {s_r1^+, s_r2^+, …, s_rk+^+}

wherein s_ri^+ is a positive human head sample;
adopting sparse representation based on local penalty coefficients to perform ultra-high resolution reconstruction on the sample set S_r{K+}:

X = argmin_X ( ‖Y − BSX‖² + λ‖X − X_O‖² )

wherein X represents the predicted value, Y represents the optimal value, B represents the fuzzy (blur) matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image;
repeating the reconstruction until the constructed image meets the required visual definition;
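Treating B and S as explicit linear operators on flattened 1-D signals, the regularized objective above admits the normal-equation solution sketched below. The toy sizes, the moving-average blur, the decimation pattern and the closed-form solve (rather than the patent's iterate-until-clear loop) are all assumptions for illustration.

```python
import numpy as np

n_hi, n_lo, lam = 16, 8, 0.1

# fuzzy (blur) matrix B: 1-D moving average over the high-res signal
B = np.zeros((n_hi, n_hi))
for i in range(n_hi):
    B[i, max(i - 1, 0):i + 2] = 1.0
B /= B.sum(axis=1, keepdims=True)

# sampling matrix S: keeps every second sample (the low-res observation grid)
S = np.zeros((n_lo, n_hi))
S[np.arange(n_lo), np.arange(n_lo) * 2] = 1.0

A = S @ B                                   # combined blur-and-sample operator
x_true = np.sin(np.linspace(0.0, 3.0, n_hi))
Y = A @ x_true                              # observed low-res signal
X0 = np.repeat(Y, 2)                        # crude initial high-res guess X_O

# minimize ||Y - A X||^2 + lam * ||X - X0||^2 via its normal equations
X = np.linalg.solve(A.T @ A + lam * np.eye(n_hi), A.T @ Y + lam * X0)

err_init = np.linalg.norm(A @ X0 - Y)       # data error of the initial guess
err_rec = np.linalg.norm(A @ X - Y)         # data error after reconstruction
```

Because X minimizes the penalized objective, its data error can never exceed that of the initial guess, while the λ term keeps the solution near X_O — the robustness role the text assigns to the local penalty.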
for the reconstructed human head sample library, dividing the superdivision-reconstructed sample set S_r{K+} into a plurality of region blocks, and extracting the Centrist LBP operators of the different region blocks by an adaptive sliding window extraction method to obtain the texture-feature-based result:

Fe(LBP) = Σ_{i=1}^{8} δ(g_0, g_i) × 2^(i−1)

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) is the result after encoding based on the Centrist LBP operator, δ(g_0, g_i) is the comparison result between the gray level of the region-block center pixel and a non-center pixel gray level, g_0 is the gray level of the pixel at the center of the region block, g_i is the gray level of a non-center pixel of the region block, and i is the non-center pixel number;
training the samples in the positive human head sample set S{K+} and the negative human head sample set S{K−} with an SVM model using the Centrist LBP operator, extracting coarse-granularity human head features, and constructing a primary human head feature extraction model;
performing Hough feature transformation on the extracted coarse-granularity human head features by the adaptive sliding window extraction method to obtain Hough feature numbers; retraining the primary human head feature extraction model with the obtained Hough feature numbers to obtain the final two-layer human head recognition model.
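A minimal sketch of the Centrist LBP encoding of one 3×3 region block, matching the Fe(LBP) sum above; the bit ordering and the ≥ comparison convention are assumptions, since the exact equation appears only as an image in the original patent.

```python
import numpy as np

def centrist_lbp(block):
    """Encode a 3x3 gray block: compare each of the 8 non-center pixels
    g_i against the center gray level g_0 and pack the delta(g_0, g_i)
    bits into one 8-bit code Fe(LBP)."""
    g0 = block[1, 1]
    neighbors = np.delete(block.flatten(), 4)     # the 8 non-center pixels
    bits = (neighbors >= g0).astype(int)          # delta(g_0, g_i)
    return int(sum(int(b) << i for i, b in enumerate(bits)))

block = np.array([[10, 20, 10],
                  [30, 15,  5],
                  [40, 15,  0]])
code = centrist_lbp(block)   # an integer in [0, 255]
```

Sliding this encoder over every region block yields the texture codes that the SVM layer is then trained on.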
Preferably, the reconstructing the high resolution image of the processed pedestrian video includes:
performing sparse reconstruction based on local punishment on the pedestrian component ImageSize(R, G, B, i):

X = argmin_X ( ‖Y − BSX‖² + λ‖X − X_O‖² )

wherein X represents the predicted value, Y represents the optimal value, B represents the fuzzy (blur) matrix, S represents the up-sampling matrix, X_O represents the initial image value, λ represents the regularization parameter of the local penalty used to improve the robustness of the reconstructed image, and ‖Y − BSX‖² is the minimized modulus of the error between the reconstructed image and the optimal image;
repeating the reconstruction until the constructed image meets the required visual definition.
Preferably, the partitioning the reconstructed pedestrian video includes:
dividing the reconstructed pedestrian video into j sub-video blocks of the same resolution, namely:

ImageSize_r(R, G, B, i, j), j = 1, 2, 3, 4.
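The partition step can be sketched as an equal split of each reconstructed frame; the 2×2 layout is an assumption consistent with j = 1, 2, 3, 4, which the text itself does not pin to a geometry.

```python
import numpy as np

def split_into_blocks(frame, rows=2, cols=2):
    """Split a frame into rows * cols equally sized sub-blocks,
    returned in row-major order (j = 1..rows*cols)."""
    h, w = frame.shape[:2]
    bh, bw = h // rows, w // cols
    return [frame[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(rows) for c in range(cols)]

frame = np.arange(64).reshape(8, 8)   # stand-in for one reconstructed frame
blocks = split_into_blocks(frame)     # four 4x4 sub-video blocks
```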
Preferably, the extracting the head features of the obtained video blocks respectively includes:
for each video block ImageSize_r(R, G, B, i, j), extracting the Centrist LBP operator of the region block, and performing coarse-granularity human head feature detection on the extraction result with the human head recognition model:

Fe(LBP) = Σ_{i=1}^{8} δ(g_0, g_i) × 2^(i−1)

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) is the value of the image after Centrist LBP encoding, from which the human head feature is obtained; δ(g_0, g_i) is the result of comparing the central pixel gray level with a non-central pixel gray level, g_0 is the gray level of the central pixel in the current region block, g_i is the gray level of non-central pixel i, and i is the number of the non-center pixel; the human head features are obtained through this operation.
Preferably, the processing the extracted head feature to obtain a head identification frame of the current video frame includes:
performing Hough feature transformation on the obtained head features to obtain Hough feature numbers, and performing secondary feature classification through the head recognition model to obtain a head recognition frame of the current video frame:
ImageSize_r(R, G, B, x_i, y_j, x_m, y_n)

wherein x_i is the lower-left abscissa of the identification frame, y_j is the lower-left ordinate, x_m is the upper-right abscissa, and y_n is the upper-right ordinate of the identification frame.
Preferably, the head tracking using the obtained head recognition frame includes:
marking the head recognition frames in different video frames by a K-means algorithm combined with a feature-block matching method to realize head tracking.
Preferably, the marking the head recognition frame in different video frames by adopting a K-means algorithm and a feature block matching method includes:
setting an initial human head position point p for the current scene area, and defining an area of radius R centered on p to form an ROI area;
calculating the mean of the offsets from the initial position point p to all head sample points P_i in the ROI area to construct the vector D:

D = (1/n) Σ_{i=1}^{n} (P_i − p)

wherein n is the number of human head sample points;
continuously shifting the coverage area along the vector D until a point P_T is found whose distance to the initial position point p is smaller than the threshold T, i.e., the same head is found in different video frames and given the same mark; otherwise, returning to the previous step;
finding the heads with the same mark between different video frames by the K-means algorithm;
for regions marked as the same head, acquiring the heads between different frames by the feature matching method.
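The marking loop above is essentially a mean-shift update: move from p by the mean offset vector D of the head points inside the ROI until the shift falls below the threshold T. A sketch under assumed parameter values (R = 3, T = 0.5):

```python
import numpy as np

def track_head(points, p, radius=3.0, thresh=0.5, max_iter=50):
    """Shift p by the mean offset D of head sample points inside the
    radius-R ROI until the shift magnitude drops below threshold T."""
    p = np.asarray(p, dtype=float).copy()
    for _ in range(max_iter):
        in_roi = points[np.linalg.norm(points - p, axis=1) <= radius]
        if len(in_roi) == 0:
            break                      # no head samples in the ROI
        D = (in_roi - p).mean(axis=0)  # the vector D from the formula
        p = p + D
        if np.linalg.norm(D) < thresh:
            break                      # converged: same head, same mark
    return p

# detections of one head clustered near (10, 10); start point offset to (9, 9)
rng = np.random.default_rng(1)
points = rng.normal(loc=[10.0, 10.0], scale=0.2, size=(20, 2))
converged = track_head(points, [9.0, 9.0])
```

The update converges to the centroid of the in-ROI detections, so detections of the same head across frames collapse onto one mark.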
Preferably, the counting the number of the tracked heads to obtain the number of the heads in the current scene includes:
counting the number of heads with different marks; the obtained counting result is the number of heads in the current scene.
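Counting then reduces to counting distinct head marks; a trivial sketch with hypothetical track identifiers:

```python
# each entry is the mark (track ID) assigned to a tracked head detection;
# the concrete values are hypothetical
tracked_marks = [3, 1, 2, 1, 3, 3, 4]

# number of heads in the current scene = number of distinct marks
head_count = len(set(tracked_marks))
```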
According to another aspect of the present invention, there is provided a head recognition and detection system in an outdoor monitoring scene, including:
the camera acquisition module is used for acquiring outdoor pedestrian videos;
the foreground processing module is used for preprocessing the acquired pedestrian video to obtain a processed pedestrian video;
the ultrahigh-resolution reconstruction module is used for reconstructing the processed pedestrian video in a high-resolution mode to obtain a reconstructed pedestrian video;
the block extraction module is used for blocking the reconstructed pedestrian video to obtain a plurality of video blocks;
the candidate head extraction module is used for extracting head characteristics of the obtained video blocks respectively;
The head classification module is used for processing the extracted head characteristics to obtain a head identification frame of the current video frame;
the ultrahigh-resolution reconstruction module, the block extraction module, the candidate head extraction module and the head classification module form a head recognition model module;
the pre-training module is used for pre-training the head recognition model module;
the head tracking module is used for tracking the head by using the acquired head identification frame;
and the people head statistics module is used for counting the number of the tracked people heads to obtain the number of the people heads in the current scene, and the people head identification and detection in the outdoor monitoring scene are completed.
According to a third aspect of the present invention there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program being operable to perform the method of any one of the preceding claims or to run the system of the preceding claims.
According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor is operable to perform a method of any of the above, or to run a system as described above.
Due to the adoption of the technical scheme, compared with the prior art, the invention has at least one of the following beneficial effects:
the head recognition and detection method, system, terminal and medium in the outdoor monitoring scene provided by the invention perform super-high-resolution reconstruction based on locally punished sparse representation for scenes with crowded people flow and low head resolution, effectively improving the accuracy of head statistics.
According to the method, the system, the terminal and the medium for identifying and detecting the head of the person in the outdoor monitoring scene, the head of the person can be counted locally by the monitoring camera, and the hardware configuration requirement on the cloud server of the monitoring system is reduced.
According to the head recognition and detection method, system, terminal and medium in the outdoor monitoring scene, high-definition image reconstruction is carried out under the conditions of crowded people flow and fuzzy pixels, and head statistics work is carried out.
According to the method, the system, the terminal and the medium for identifying and detecting the head of the person in the outdoor monitoring scene, which are provided by the invention, the image processing is carried out in a machine learning mode, so that the accuracy of pedestrian detection is improved.
The human head identification and detection method, the system, the terminal and the medium in the outdoor monitoring scene provided by the invention adopt the super-division reconstruction based on the local punishment coefficient to perform image preprocessing, and have the advantage of high resolution.
According to the human head identification and detection method, system, terminal and medium in the outdoor monitoring scene, the characteristic extraction is carried out on the reconstructed image by the Centrist LBP operator, so that the human head characteristics are obtained, and the method and the system have the advantage of high precision.
According to the method, the system, the terminal and the medium for identifying and detecting pedestrian heads in the outdoor monitoring scene provided by the invention, pedestrian heads can be effectively detected at low resolution based on machine vision, and the detection accuracy of the whole system is improved.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
fig. 1 is a schematic diagram of a conventional demographic scheme in the prior art.
Fig. 2 is a flowchart of a method for identifying and detecting a head of a person in an outdoor monitoring scene according to an embodiment of the present invention.
Fig. 3 is a flowchart of a method for identifying and detecting a head of a person in an outdoor monitoring scenario according to a preferred embodiment of the present invention.
FIG. 4 is a flowchart of a method for pre-training a human head recognition model in accordance with a preferred embodiment of the present invention.
Fig. 5 is a schematic diagram of a component module of a head recognition and detection system in an outdoor monitoring scenario according to an embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail: the embodiment is implemented on the premise of the technical scheme of the invention, and detailed implementation modes and specific operation processes are given. It should be noted that variations and modifications can be made by those skilled in the art without departing from the spirit of the invention, which falls within the scope of the invention.
Fig. 2 is a flowchart of a method for identifying and detecting a head of a person in an outdoor monitoring scene according to an embodiment of the present invention.
As shown in fig. 2, the method for identifying and detecting a head of a person in an outdoor monitoring scene provided in this embodiment may include the following steps:
s100, acquiring outdoor pedestrian videos;
s200, preprocessing the acquired pedestrian video to obtain a processed pedestrian video;
s300, constructing a pre-trained human head recognition model, and performing S400-S700 by using the pre-trained human head recognition model:
s400, performing high-resolution video reconstruction on the processed pedestrian video to obtain a reconstructed pedestrian video;
s500, partitioning the reconstructed pedestrian video to obtain a plurality of video blocks;
s600, extracting human head features of the obtained video blocks respectively;
S700, processing the extracted head features to obtain a head identification frame of the current video frame;
s800, performing head tracking by using the acquired head identification frame;
s900, counting the number of the tracked heads to obtain the number of the heads in the current scene, and completing the head identification detection in the outdoor monitoring scene.
In this embodiment S100, as a preferred embodiment, acquiring outdoor pedestrian video may include the steps of:
acquiring an outdoor pedestrian video signal through a camera to obtain the video source Image(S).
In this embodiment S200, as a preferred embodiment, the preprocessing may include the steps of:
s201, performing white balance and color gamut adjustment processing on a video source Image (S) of the acquired pedestrian video;
s202, foreground extraction and pedestrian component extraction are sequentially carried out on the processed video.
In this embodiment S201, as a preferred embodiment, the white balance and color gamut adjustment processing for the acquired pedestrian video may include the steps of:
s2011, for the acquired pedestrian video, calculating offline the pixel mean values R_ave, G_ave and B_ave of the three RGB channels, and the gain values K_R, K_G and K_B of the three RGB channels;
S2012, multiplying the pixel mean values R_ave, G_ave and B_ave of the three RGB channels by the corresponding gain values K_R, K_G and K_B to obtain the gain pixel values of the three RGB channels:

R = K_R × R_ave
G = K_G × G_ave
B = K_B × B_ave

at this point, the video stream of the video source Image(S) becomes Image(R, G, B), and the white balance processing is complete;
s2013, performing color gamut adjustment on the video stream Image(R, G, B) to obtain ImageRe(R, G, B), completing the color gamut adjustment processing.
In this embodiment S202, as a preferred embodiment, the foreground extraction and the pedestrian component extraction are sequentially performed on the processed video, which may include the following steps:
s2021, obtaining foreground video frames ImageRe(R, G, B, i) of the processed video by a method combining the inter-frame relation and an optical flow algorithm, wherein i is the frame number in the video stream;
s2022, preliminarily deleting the useless parts without human heads in the foreground video frames ImageRe(R, G, B, i) by the erosion and dilation method to obtain the pedestrian component ImageSize(R, G, B, i).
In this embodiment S300, as a preferred embodiment, the construction of the pre-trained human head recognition model may include the steps of:
s301, acquiring human head samples under outdoor monitoring, setting the resolutions of the human head samples to 14×14, 28×28 and 36×36 respectively, and sorting the set human head samples into a positive human head sample set S{K+} = {s_1^+, s_2^+, …, s_{k+}^+} and a negative human head sample set S{K−} = {s_1^−, s_2^−, …, s_{k−}^−}, wherein k+ is the number of human head samples in the positive set and k− is the number of human head samples in the negative set;
s302, aligning the human head samples with resolutions 14×14, 28×28 and 36×36 in the positive sample set S{K+} to obtain the sample set S_r{K+}:

S_r{K+} = {s_r1^+, s_r2^+, …, s_rk+^+}

wherein s_ri^+ is a positive human head sample of S{K+};
s303, adopting sparse representation based on local penalty coefficients to perform ultra-high resolution reconstruction on the sample set S_r{K+}:

X = argmin_X ( ‖Y − BSX‖² + λ‖X − X_O‖² )

wherein X represents the predicted value, Y represents the optimal value, B represents the fuzzy (blur) matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image;
repeating this step until the image meets the required visual definition;
s304, dividing the superdivision-reconstructed sample set S_r{K+} into a plurality of region blocks, and extracting the Centrist LBP operators of the different region blocks by the adaptive sliding window extraction method to obtain the texture-feature-based result:

Fe(LBP) = Σ_{i=1}^{8} δ(g_0, g_i) × 2^(i−1)

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) is the result after encoding based on the Centrist LBP operator, δ(g_0, g_i) is the comparison result between the region-block center gray level and a non-center pixel gray level, g_0 is the gray level of the pixel at the center of the region block, g_i is the gray level of a non-center pixel of the region block, and i is the non-center pixel number;
s305, on this basis, training the samples (namely the encoded data) in the positive human head sample set S{K+} and the negative human head sample set S{K−} with an SVM model using the Centrist LBP operator, extracting coarse-granularity human head features, and constructing a primary human head feature extraction model;
s306, performing Hough feature transformation on the extracted coarse-granularity human head features by the adaptive sliding window extraction method to obtain Hough feature numbers; retraining the primary human head feature extraction model with the obtained Hough feature numbers to obtain the final two-layer human head recognition model.
In S301 of this embodiment, the head samples may be acquired by manual labeling.
In S301 of this embodiment, the head samples are classified and sorted, according to whether a head is present in the video frame, into the positive head sample set S{K+} and the negative head sample set S{K−}.
In this embodiment S400, as a preferred embodiment, the high resolution image reconstruction of the processed pedestrian video may include the following steps:
performing sparse reconstruction based on a local penalty on the pedestrian component ImageSize(R, G, B, i):

X = argmin_X ||Y − SBX||² + λ||X − X_O||²

wherein X represents the predicted value, Y represents the optimal value, B represents the blur matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image; the term ||Y − SBX||² is the minimum modulus of the error between the reconstructed image and the optimal image;
this step is repeated until the constructed image meets visual clarity.
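A toy numerical sketch of this locally penalised reconstruction, minimising ||Y − SBX||² + λ||X − X_O||² by plain gradient descent on a 1-D signal. The combined operator A = SB, the step size, λ and the iteration count are illustrative assumptions, not values from the patent:

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def reconstruct(Y, A, X0, lam=0.1, step=0.1, iters=2000):
    """Minimise ||Y - A X||^2 + lam * ||X - X0||^2 by gradient descent,
    where A = S B combines the blur and sampling operators."""
    At = transpose(A)
    X = X0[:]
    for _ in range(iters):
        r = [ax - y for ax, y in zip(matvec(A, X), Y)]    # residual A X - Y
        grad = [2 * g + 2 * lam * (x - x0)
                for g, x, x0 in zip(matvec(At, r), X, X0)]
        X = [x - step * g for x, g in zip(X, grad)]
    return X

# A: toy 2x4 operator that averages pixel pairs and downsamples by 2
A = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
Y = [1.0, 3.0]             # low-resolution observation
X0 = [0.0, 0.0, 0.0, 0.0]  # initial image value X_O
X = reconstruct(Y, A, X0)
```

With λ → 0 the solution fits the observation exactly; the penalty term keeps X near the initial estimate, which is what gives the reconstruction its robustness.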
Preferably, the method for partitioning the reconstructed pedestrian video can comprise the following steps:
in this embodiment S500, as a preferred embodiment, the reconstructed pedestrian video is divided into j sub-video blocks of the same resolution, namely:
ImageSize_r(R, G, B, i, j), j = 1, 2, 3, 4.
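The division into j = 4 equal-resolution sub-blocks can be sketched as a 2×2 grid split; the grid layout is our assumption, since the text only fixes the number of blocks:

```python
def split_into_blocks(frame, rows=2, cols=2):
    """frame: 2-D list (H x W pixels); returns rows*cols equal sub-blocks,
    scanned row-major (j = 1..4 for a 2x2 grid)."""
    h, w = len(frame), len(frame[0])
    bh, bw = h // rows, w // cols
    return [[row[c * bw:(c + 1) * bw] for row in frame[r * bh:(r + 1) * bh]]
            for r in range(rows) for c in range(cols)]

frame = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 test frame
blocks = split_into_blocks(frame)
```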
in this embodiment S600, as a preferred embodiment, the extracting of the head features of the obtained plurality of video blocks includes:
for each video block ImageSize_r(R, G, B, i, j), the Centrist LBP operator of the region block is extracted, and coarse-grained head feature detection is performed on the extraction result with the head recognition model:

Fe(LBP) = Σ_i δ(g_0, g_i) · 2^i

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) is the value of the image after Centrist LBP encoding, giving the head feature; δ(g_0, g_i) is the comparison of the center pixel gray level with a non-center pixel gray level, g_0 is the center pixel gray level in the current region block, g_i is a non-center pixel gray level, and i denotes the non-center pixel index; the head features are obtained by this operation.
In this embodiment S700, as a preferred embodiment, the processing of the extracted head feature to obtain the head identification frame of the current video frame may include the following steps:
performing Hough feature transformation on the obtained head features to obtain Hough feature numbers, and performing secondary feature classification through the head recognition model to obtain the head recognition frame of the current video frame:

ImageSize_r(R, G, B, x_i, y_j, x_m, y_n)

wherein x_i is the abscissa of the lower-left corner of the recognition frame, y_j is the ordinate of the lower-left corner of the recognition frame, x_m is the abscissa of the upper-right corner of the recognition frame, and y_n is the ordinate of the upper-right corner of the recognition frame.
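The text does not define the "Hough feature numbers" precisely; as one common realisation for roughly circular head contours, a fixed-radius Hough vote over edge pixels can be sketched as follows (the angle step, radius and all names are our assumptions):

```python
import math

def hough_circle_votes(edges, radius):
    """edges: 2-D 0/1 list; each edge pixel votes for candidate circle
    centres at the given radius; peaks in the accumulator mark centres."""
    h, w = len(edges), len(edges[0])
    acc = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not edges[y][x]:
                continue
            for t in range(0, 360, 10):
                cy = round(y - radius * math.sin(math.radians(t)))
                cx = round(x - radius * math.cos(math.radians(t)))
                if 0 <= cy < h and 0 <= cx < w:
                    acc[cy][cx] += 1
    return acc

# synthetic edge image: a circle of radius 4 centred at (8, 8)
edges = [[0] * 17 for _ in range(17)]
for t in range(0, 360, 5):
    a = math.radians(t)
    edges[8 + round(4 * math.sin(a))][8 + round(4 * math.cos(a))] = 1
acc = hough_circle_votes(edges, 4)
votes, cy, cx = max((v, y, x) for y, row in enumerate(acc)
                    for x, v in enumerate(row))
```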
In this embodiment S800, as a preferred embodiment, the head tracking using the acquired head recognition frame may include the following steps:
and marking the head recognition frames in different video frames by adopting a K-means algorithm and a characteristic block matching method, so as to realize head tracking.
In this embodiment S800, as a preferred embodiment, the marking of the head recognition frame in different video frames by using the K-means algorithm and the feature block matching method may include the following steps:
s801, setting an initial position point p of a human head for a current scene area, and defining an area with a radius of R by taking the initial position point p as a center to form an ROI area;
S802, the mean of the vectors from the initial position point p to all head sample points P_i in the ROI area is calculated, constructing the vector D:

D = (1/n) Σ_{i=1}^{n} (P_i − p)

wherein n is the number of head sample points;
S803, the coverage area of the vector D is continuously enlarged until a point P_T is found, the point P_T satisfying that its distance from the initial position point p is less than a threshold T, i.e. the heads with the same mark in different video frames are found and marked as the same head; otherwise return to S802;
s804, using a K-means algorithm to find the same marked head among different video frames;
S805, for the marked same-head area (i.e. a smaller area), a feature matching method is adopted to associate the heads between different frames.
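Steps S801 to S803 amount to a mean-shift iteration; a minimal sketch on 2-D head-centre points, where the radius R, threshold T and all names are illustrative assumptions:

```python
def mean_shift(p, points, R=5.0, T=0.01, max_iter=100):
    """Shift p by the mean offset vector D = (1/n) * sum(P_i - p) over the
    points inside the radius-R ROI until the shift is below threshold T."""
    for _ in range(max_iter):
        roi = [q for q in points
               if (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 <= R * R]
        if not roi:
            break
        dx = sum(q[0] - p[0] for q in roi) / len(roi)
        dy = sum(q[1] - p[1] for q in roi) / len(roi)
        p = (p[0] + dx, p[1] + dy)
        if dx * dx + dy * dy <= T * T:      # converged: same head found
            break
    return p

# detections of one head across neighbouring frames
cluster = [(10.0, 10.0), (10.5, 9.5), (9.5, 10.5), (10.2, 10.3)]
p = mean_shift((8.0, 8.0), cluster)
```

p converges to the cluster centre, so detections whose converged positions coincide can be given the same head mark.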
In this embodiment S900, as a preferred embodiment, counting the number of the tracked heads to obtain the number of heads in the current scene may include the following steps:
and counting the number of the heads of different marks, wherein the obtained counting result is the number of the heads in the current scene.
Fig. 3 is a flowchart of a method for identifying and detecting a head of a person in an outdoor monitoring scene according to a preferred embodiment of the present invention.
As shown in fig. 3, the method for identifying and detecting a head of a person in an outdoor monitoring scene provided in the preferred embodiment may include the following steps:
Step 1, pedestrian video acquisition of an outdoor pavement is carried out;
step 2, performing white balance and color gamut adjustment processing on the acquired video, performing foreground extraction by combining the inter-frame relation with an optical flow algorithm, and extracting the pedestrian component by a dilation and erosion algorithm, obtaining the processed pedestrian video;
step 3, constructing a pre-trained human head recognition model, and performing the steps 4 to 7 by using the pre-trained human head recognition model:
step 4, performing 3× super-resolution reconstruction of the processed pedestrian video;
step 5, dividing the reconstructed pedestrian video into 4 video blocks;
step 6, adopting a Centrist LBP and Hough feature extraction algorithm to extract the head features of the 4 video blocks respectively;
step 7, processing the extracted head characteristics to obtain a head identification frame of the current video frame;
step 8, adopting an algorithm based on the combination of feature point matching and K-means to track the head of the head identification frame;
and 9, counting the number of the tracked heads to obtain the number of the heads in the current scene.
In this preferred embodiment, it may further include:
and 10, displaying the number of the acquired heads in the current scene.
In this preferred embodiment, the following two modes of operation may be included:
mode 1: a human head recognition model training mode; mode 2: and a real-time head statistics mode.
As shown in fig. 4, in the mode 1, the model training mode for head recognition may include the following steps:
step 101: extracting human head samples from the ImageNet database, and setting the resolution of the human head samples to 14×14, 28×28 and 36×36, to construct a positive human head sample set
Figure BDA0003085646750000131
And negative human head sample set->
Figure BDA0003085646750000132
k is a positive integer value, +, -represents a positive sample of the human head, and the human head sample set comprises 10 ten thousand human head training samples and 5000 human head test samples.
Step 102: the head training samples of resolutions 14×14, 28×28 and 36×36 in the sample set are each magnified three times in resolution, obtaining the triple-resolution sample set S_r{K+}; in the super-resolution method, for good robustness, sparse representation based on local penalty coefficients is adopted for reconstruction:

X = argmin_X ||Y − SBX||² + λ||X − X_O||²

wherein X represents the predicted value, Y represents the optimal value, B represents the blur matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image; the reconstruction process adopts iterative operations to obtain a high-definition video frame to be processed;
Step 103: adaptive Centrist LBP features are extracted from the super-resolution reconstructed sample set S_r{K+} to obtain texture features, i.e. the Centrist LBP operators of different region blocks are extracted according to the different resolutions:

Fe(LBP) = Σ_i δ(g_0, g_i) · 2^i

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) represents the value of the sample library after Centrist LBP encoding, giving the head feature; δ(g_0, g_i) is the comparison of the center pixel gray level with a non-center pixel gray level, g_0 is the center pixel gray level in the current region block, g_i is a non-center pixel gray level, and i denotes the non-center pixel index.
Step 104: coarse-grained positive and negative head sample features are extracted from the positive and negative samples with the corresponding SVM training model, and a primary head feature extraction model is constructed.
Step 105: performing Hough feature transformation on the extracted positive and negative sample features of the coarse-granularity human head by adopting a self-adaptive sliding window extraction method to obtain Hough feature numbers;
step 106: and retraining the primary human head feature extraction model by using the obtained Hough feature number to obtain a final two-layer human head recognition model.
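The SVM stage of steps 104 to 106 can be illustrated with a toy linear SVM trained by hinge-loss sub-gradient descent; a production system would use an SVM library on the Centrist LBP and Hough features, and the data, learning rate and epoch count below are invented for illustration:

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """X: feature vectors, y: labels in {-1, +1}; returns (w, b)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                          # hinge-loss sub-gradient
                w = [wj - lr * (lam * wj - yi * xj)
                     for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# toy "head" (+1) vs "non-head" (-1) coarse-grained feature vectors
X = [[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
```

The two-layer scheme simply repeats this training once more on feature vectors augmented with the Hough feature numbers.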
In the mode 2, the real-time head statistics mode may include the following steps:
step 201: firstly, video signal acquisition is carried out through a camera, and the obtained video source is Image (S).
Step 202: white balance of the Image(S) video source is performed as follows: the RGB channel means R_ave, G_ave and B_ave of Image(S) are computed; the gains of the three RGB channels are calculated offline to obtain the values K_R, K_G and K_B; the pixel values of the three RGB channels are multiplied by the corresponding channel gains to obtain the gain pixel values of the three channels:

R = K_R × R_ave
G = K_G × G_ave
B = K_B × B_ave

the white-balance algorithm is implemented on an FPGA, i.e. a multiplier architecture based on CORDIC iteration completes the corresponding operations; at this point, the video stream of Image(S) becomes Image(R, G, B);
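In software, the per-channel gain idea of this step can be sketched as a gray-world white balance; using mid-gray 128 as the neutral target and applying the gains per pixel are our assumptions, since the text computes the gains offline:

```python
def white_balance(pixels, target=128.0):
    """pixels: list of (R, G, B) tuples; gains K_c = target / channel mean."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]  # R_ave, G_ave, B_ave
    gains = [target / m if m else 1.0 for m in means]          # K_R, K_G, K_B
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3))
            for p in pixels]

balanced = white_balance([(200, 100, 50), (100, 50, 25)])
```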
after white balance is completed, color gamut adjustment is performed on Image(R, G, B), which becomes ImageRe(R, G, B); ImageRe(R, G, B, i) is then obtained by a method combining the inter-frame relation with an optical flow algorithm, and useless parts of the picture without heads are roughly deleted by erosion and dilation, obtaining ImageSize(R, G, B, i).
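The foreground part of this step (inter-frame differencing followed by morphology) can be sketched as follows; the optical-flow stage is omitted, and the threshold and 3×3 structuring element are illustrative assumptions:

```python
def frame_diff_mask(prev, curr, thresh=20):
    """Binary motion mask from two grayscale frames (2-D lists)."""
    return [[1 if abs(c - p) > thresh else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def erode(mask):
    """3x3 erosion: keep a pixel only if its full neighbourhood is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(mask[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

prev = [[0] * 7 for _ in range(7)]
curr = [[0] * 7 for _ in range(7)]
for y in range(1, 6):
    for x in range(1, 6):
        curr[y][x] = 255          # a moving 5x5 "pedestrian" region
fg = erode(frame_diff_mask(prev, curr))
```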
Step 203: sparse reconstruction based on a local penalty is carried out on ImageSize(R, G, B, i), adopting the reconstruction formula:

X = argmin_X ||Y − SBX||² + λ||X − X_O||²

wherein X represents the predicted value, Y represents the optimal value, B represents the blur matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image; the reconstruction process adopts iterative operations.
Step 204: the reconstructed image is divided into 4 sub-video blocks of the same resolution, namely:
ImageSize_r(R, G, B, i, j), j = 1, 2, 3, 4.
step 205: for the video blocks ImageSize_r(R, G, B, i, j), extraction based on Centrist LBP and Hough features is performed with the adaptive sliding-window extraction method to obtain texture features, i.e. the Centrist LBP operators of different region blocks are extracted according to the different resolutions:

Fe(LBP) = Σ_i δ(g_0, g_i) · 2^i

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) is the value of the sample library after Centrist LBP encoding, giving the head feature; δ(g_0, g_i) is the comparison of the center pixel gray level with a non-center pixel gray level, g_0 is the center pixel gray level in the current region block, g_i is a non-center pixel gray level, and i denotes the non-center pixel index.
Step 206: and obtaining a coarse-granularity human head detection result by the feature extraction result and the corresponding SVM training model.
Step 207: on the basis, hough feature transformation is carried out, and Hough feature numbers are obtained.
Step 208: SVM secondary feature classification is performed to obtain the head recognition frame of the current video frame, ImageSize_r(R, G, B, x_i, y_j, x_m, y_n).
Step 209: the head tracking module marks the head movement of ImageSize_r(R, G, B, x_i, y_j, x_m, y_n) in different video frames by a method of K-means combined with feature-block matching, as follows:
Step1, setting an initial position point p of a human head for a larger image area, and defining an area with a radius R by taking the initial position point p as a center to form an ROI area;
Step2, the mean of the vectors from all sample points in the ROI area to p is calculated, constructing the vector D, namely:

D = (1/n) Σ_{i=1}^{n} (P_i − p)
Step3, the coverage area of the vector D is continuously enlarged; if a point P_T is found such that the distance between P_T and p is less than the threshold T, the same head is found, otherwise jump to Step2;
the same head between different frames is found by using a K-means method;
Step4, for the marked same-head area, a feature matching method is adopted to mark the same head between different frames.
Step 210: and the head statistics module is used for counting the number of heads of different marks and sending the counted result to the display terminal.
Fig. 5 is a schematic diagram of a component module of a head recognition and detection system in an outdoor monitoring scenario according to an embodiment of the present invention.
As shown in fig. 5, the system for identifying and detecting heads in an outdoor monitoring scene provided by this embodiment may include: a camera acquisition module, a foreground processing module, an ultra-high-resolution reconstruction module, a block extraction module, a candidate head extraction module, a head classification module, a pre-training module, a head tracking module and a head statistics module.
wherein :
the camera acquisition module is used for acquiring pedestrian videos of outdoor pavement;
the foreground processing module is used for preprocessing the acquired pedestrian video to obtain a processed pedestrian video;
the ultrahigh-resolution reconstruction module is used for reconstructing the high-resolution image of the processed pedestrian video to obtain a reconstructed pedestrian video;
the block extraction module is used for blocking the reconstructed pedestrian video to obtain a plurality of video blocks;
the candidate head extraction module is used for extracting head characteristics of the obtained video blocks respectively;
the head classification module is used for processing the extracted head characteristics and acquiring a head identification frame of the current video frame;
the ultra-high-resolution reconstruction module, the block extraction module, the candidate head extraction module and the head classification module form a head recognition model module;
the pre-training module is used for pre-training the head recognition model module;
the head tracking module is used for tracking the head by using the acquired head identification frame;
and the people head statistics module is used for counting the number of the tracked people heads to obtain the number of the people heads in the current scene and finishing the identification and detection of the people heads in the outdoor monitoring scene.
An embodiment of the present invention provides a terminal including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, is operative to perform the method of any one of the foregoing embodiments, or to perform the system of any one of the foregoing embodiments.
Optionally, a memory is provided for storing a program. The memory may include volatile memory, such as random-access memory (RAM), e.g. static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory is used to store computer programs (e.g. application programs and functional modules implementing the methods described above), computer instructions, etc., which may be stored in one or more memories in a partitioned manner. The above computer programs, computer instructions, data, etc. may be invoked by a processor.
A processor for executing the computer program stored in the memory to implement the steps in the method according to the above embodiment. Reference may be made in particular to the description of the embodiments of the method described above.
The processor and the memory may be separate structures or may be integrated structures that are integrated together. When the processor and the memory are separate structures, the memory and the processor may be connected by a bus coupling.
An embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is operative to perform the method of any of the above embodiments, or to run the system of the above embodiments.
According to the head recognition detection method, system, terminal and medium under the outdoor monitoring scene provided by the embodiment of the invention, the ultrahigh-resolution reconstruction based on the local punishment sparse representation is mainly performed aiming at the scene with crowded people and lower head resolution, so that the head statistics accuracy is effectively improved; people head statistics can be carried out locally on the monitoring camera, so that the hardware configuration requirement on the monitoring system cloud server is reduced; under the conditions of crowded people flow and fuzzy pixels, reconstructing a high-definition image, and carrying out statistics on the heads of people; the image processing is carried out in a machine learning mode, so that the accuracy of pedestrian detection is improved; the super-division reconstruction based on the local punishment coefficient is adopted to perform image preprocessing, so that the method has the advantage of high resolution; the reconstructed image is subjected to characteristic extraction by a Centrist LBP operator to obtain the human head characteristics, so that the method has the advantage of high precision; based on machine vision, pedestrian head detection can be effectively carried out on pedestrians under low resolution, and detection accuracy of the whole system is improved.
It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules, devices, units, etc. in the system, and those skilled in the art may refer to a technical solution of the method to implement the composition of the system, that is, the embodiment in the method may be understood as a preferred example of constructing the system, which is not described herein.
Those skilled in the art will appreciate that the invention provides a system and its individual devices that can be implemented entirely by logic programming of method steps, in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc., in addition to the system and its individual devices being implemented in pure computer readable program code. Therefore, the system and various devices thereof provided by the present invention may be considered as a hardware component, and the devices included therein for implementing various functions may also be considered as structures within the hardware component; means for achieving the various functions may also be considered as being either a software module that implements the method or a structure within a hardware component.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In summary, the present description should not be construed as limiting the invention.

Claims (8)

1. The method for identifying and detecting the head of a person in an outdoor monitoring scene is characterized by comprising the following steps of:
acquiring outdoor pedestrian videos;
preprocessing the obtained pedestrian video to obtain a processed pedestrian video;
constructing a pre-trained human head recognition model, and performing the following steps by using the pre-trained human head recognition model:
-performing a high resolution video reconstruction of the processed pedestrian video, resulting in a reconstructed pedestrian video;
-partitioning the reconstructed pedestrian video to obtain a plurality of video blocks;
-extracting human head features from each of the plurality of video blocks;
-processing the extracted head features to obtain a head identification frame of the current video frame;
performing head tracking by using the acquired head identification frame;
Counting the number of the tracked heads to obtain the number of the heads in the current scene, and completing the head identification detection in the outdoor monitoring scene;
the building of the pre-trained human head recognition model comprises the following steps:
head samples are acquired, the resolutions of the head samples are set to 14×14, 28×28 and 36×36 respectively, and the samples are classified and sorted into the positive head sample set S{K+} and the negative head sample set S{K−}, wherein k+ is the number of samples in the positive head sample set S{K+} and k− is the number of samples in the negative head sample set S{K−};
the head samples of resolutions 14×14, 28×28 and 36×36 in the positive head sample set S{K+} are each magnified n times in resolution, obtaining the sample set S_r{K+}, wherein S_r{K+} is the super-resolved positive head sample set;
sparse representation based on local penalty coefficients is adopted for super-resolution reconstruction of the sample set S_r{K+}:

X = argmin_X ||Y − SBX||² + λ||X − X_O||²

wherein X represents the predicted value, Y represents the optimal value, B represents the blur matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image;

the reconstruction method is repeated until the requirement of visual clarity is met;
the super-resolution reconstructed sample set S_r{K+} is divided into a plurality of region blocks, and the Centrist LBP operators of the different region blocks are extracted by an adaptive sliding-window extraction method, obtaining a texture-feature-based result:

Fe(LBP) = Σ_i δ(g_0, g_i) · 2^i

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) is the result after encoding based on the Centrist LBP operator, δ(g_0, g_i) is the comparison of the region-block center pixel gray level with a non-center pixel gray level, g_0 is the center pixel gray level of the region block, g_i is a non-center pixel gray level of the region block, and i denotes the non-center pixel index;
the samples of the positive head sample set S{K+} and the negative head sample set S{K−} encoded with the Centrist LBP operator are trained with an SVM model, coarse-grained head features are extracted, and a primary head feature extraction model is constructed;
performing Hough feature transformation on the extracted coarse-granularity human head features by adopting a self-adaptive sliding window extraction method to obtain Hough feature numbers; retraining the primary human head feature extraction model by using the obtained Hough feature numbers to obtain a final two-layer human head recognition model;
the high-resolution image reconstruction of the processed pedestrian video comprises the following steps:
sparse reconstruction based on a local penalty is performed on the pedestrian component ImageSize(R, G, B, i):

X = argmin_X ||Y − SBX||² + λ||X − X_O||²

wherein X represents the predicted value, Y represents the optimal value, B represents the blur matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image; the term ||Y − SBX||² is the minimum modulus of the error between the reconstructed image and the optimal image;

the reconstruction method is repeated until the constructed image meets the required visual clarity;
the head tracking is performed by using the acquired head identification frame, and the head identification frame is marked in different video frames by adopting a K-means algorithm and a feature block matching method, comprising the following steps:
setting an initial position point p of a human head for a current scene area, and defining an area with a radius of R by taking the initial position point p as a center to form an ROI area;
the mean of the vectors from the initial position point p to all head sample points P_i in the ROI area is calculated, constructing the vector D:

D = (1/n) Σ_{i=1}^{n} (P_i − p)
wherein n is the number of human head sample points;
the coverage area of the vector D is continuously expanded until a point P_T is found, the point P_T satisfying that its distance from the initial position point p is less than a threshold T, i.e. the same head in different video frames is found and is marked as the same head; otherwise, return to the previous step;
the same head between different video frames is found by using a K-means algorithm;
and acquiring the heads of the different frames by adopting a feature matching method for the areas marked with the same head.
2. The method for identifying and detecting a pedestrian head in an outdoor monitoring scene according to claim 1, wherein the step of acquiring an outdoor pedestrian video comprises:
and acquiring an outdoor pedestrian video signal through a camera to obtain a video source Image (S).
3. The method for identifying and detecting a person's head in an outdoor monitoring scene according to claim 1, wherein the preprocessing comprises:
performing white balance and color gamut adjustment processing on an acquired video source Image (S) of the pedestrian video;
sequentially extracting the foreground and pedestrian parts of the processed video;
wherein :
the white balance and color gamut adjustment processing for the acquired pedestrian video comprises the following steps:
for the acquired pedestrian video, the pixel means R_ave, G_ave and B_ave of its three RGB channels and the gain values K_R, K_G and K_B of the three RGB channels are respectively calculated offline;

the pixel means R_ave, G_ave and B_ave of the three RGB channels are multiplied by the corresponding gain values K_R, K_G and K_B to obtain the gain pixel values of the three channels:

R = K_R × R_ave
G = K_G × G_ave
B = K_B × B_ave
at this time, the video stream of the video source Image (S) becomes Image (R, G, B), and the white balance processing is completed;
performing color gamut adjustment on the video stream Image (R, G, B) to change the video stream Image into Image Re (R, G, B) to finish color gamut adjustment processing;
The method for extracting the foreground and pedestrian parts of the processed video sequentially comprises the following steps:
a foreground video frame ImageRe(R, G, B, i) of the processed video is obtained by a method combining the inter-frame relation with an optical flow algorithm, wherein i is the frame number of the video stream;

useless parts without heads in the foreground video frame ImageRe(R, G, B, i) are preliminarily deleted by erosion and dilation, obtaining the pedestrian part ImageSize(R, G, B, i).
4. The method for identifying and detecting the head of a person in an outdoor monitoring scene according to claim 1, wherein the step of using the pre-trained head identification model comprises any one or more of the following:
-said partitioning of said reconstructed pedestrian video, comprising:
dividing the reconstructed pedestrian video into j sub-video blocks with the same resolution, namely:
ImageSize_r(R, G, B, i, j), j = 1, 2, 3, 4;
-said extracting of human head features from each of said plurality of video blocks, comprising:
for each video block ImageSize_r(R, G, B, i, j), extracting the Centrist LBP operator of the regional block, and carrying out coarse-granularity human head feature detection on the extraction result by using the human head recognition model:
Fe(LBP) = Σ_{i=1..8} δ(g_0, g_i) × 2^(i−1)
δ(g_0, g_i) = 1 if g_i ≥ g_0, and δ(g_0, g_i) = 0 otherwise
wherein Fe(LBP) is the result of encoding based on the Centrist LBP operator, δ(g_0, g_i) is the result of comparing the center pixel gray level of the regional block with a non-center pixel gray level, g_0 is the gray level of the center pixel of the regional block, g_i is the gray level of a non-center pixel of the regional block, and i represents the non-center pixel number;
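A sketch of the Centrist (census transform) coding for a single 3×3 regional block. The claim defines only the center/non-center comparison δ(g_0, g_i); the bit packing order and the comparison direction (≥) used here are assumptions.

```python
import numpy as np

def centrist_lbp(block):
    """Census-transform (Centrist-style LBP) code for one 3x3 regional block.

    g_0 is the center pixel gray level; each of the 8 non-center pixels g_i
    contributes one bit delta(g_0, g_i), packed as Fe(LBP) = sum of
    delta(g_0, g_i) * 2^(i-1). Neighbor order (clockwise from top-left)
    is an assumption.
    """
    g0 = block[1, 1]
    neighbors = [block[0, 0], block[0, 1], block[0, 2],
                 block[1, 2], block[2, 2], block[2, 1],
                 block[2, 0], block[1, 0]]
    code = 0
    for i, gi in enumerate(neighbors):
        code |= (1 if gi >= g0 else 0) << i   # bit i = delta(g_0, g_i)
    return code
```

Sliding this over a block and histogramming the codes gives the coarse-granularity feature vector that the head recognition model classifies.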
-said processing of the extracted head features, obtaining a head identification box of the current video frame, comprising:
performing Hough feature transformation on the obtained head features to obtain Hough feature numbers, and performing secondary feature classification through the head recognition model to obtain a head recognition frame of the current video frame:
ImageSize_r(R, G, B, x_i, y_j, x_m, y_n)
wherein x_i is the abscissa of the lower-left corner of the identification frame, y_j is the ordinate of the lower-left corner, x_m is the abscissa of the upper-right corner, and y_n is the ordinate of the upper-right corner.
5. The method for identifying and detecting the heads in the outdoor monitoring scene according to claim 1, wherein the counting the number of the tracked heads to obtain the number of the heads in the current scene comprises:
and counting the heads bearing different identification marks, the obtained count being the number of human heads in the current scene.
6. A system for identifying and detecting a head of a person in an outdoor monitoring scene, for implementing the method for identifying and detecting a head of a person in an outdoor monitoring scene as set forth in any one of claims 1 to 5, comprising:
the camera acquisition module is used for acquiring outdoor pedestrian videos;
the foreground processing module is used for preprocessing the acquired pedestrian video to obtain a processed pedestrian video;
the ultrahigh-resolution reconstruction module is used for reconstructing the processed pedestrian video in a high-resolution mode to obtain a reconstructed pedestrian video;
the block extraction module is used for blocking the reconstructed pedestrian video to obtain a plurality of video blocks;
the candidate head extraction module is used for extracting head characteristics of the obtained video blocks respectively;
the head classification module is used for processing the extracted head characteristics to obtain a head identification frame of the current video frame;
the ultrahigh-resolution reconstruction module, the block extraction module, the candidate head extraction module and the head classification module form a human head recognition model module;
the pre-training module is used for pre-training the head recognition model module;
the head tracking module is used for tracking the head by using the acquired head identification frame;
and the people head statistics module is used for counting the number of the tracked people heads to obtain the number of the people heads in the current scene, and the people head identification and detection in the outdoor monitoring scene are completed.
7. A terminal, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, is operable to perform the method of any one of claims 1-5 or to run the system of claim 6.
8. A computer readable storage medium having stored thereon a computer program, which when executed by a processor is operative to perform the method of any one of claims 1-5 or to run the system of claim 6.
CN202110579518.5A 2021-05-26 2021-05-26 Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene Active CN113392726B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110579518.5A CN113392726B (en) 2021-05-26 2021-05-26 Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110579518.5A CN113392726B (en) 2021-05-26 2021-05-26 Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene

Publications (2)

Publication Number Publication Date
CN113392726A CN113392726A (en) 2021-09-14
CN113392726B true CN113392726B (en) 2023-06-02

Family

ID=77619197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110579518.5A Active CN113392726B (en) 2021-05-26 2021-05-26 Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene

Country Status (1)

Country Link
CN (1) CN113392726B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926422B (en) * 2022-05-11 2023-07-04 西南交通大学 Method and system for detecting passenger flow of getting on and off vehicles

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132119A (en) * 2020-11-24 2020-12-25 科大讯飞(苏州)科技有限公司 Passenger flow statistical method and device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799935B (en) * 2012-06-21 2015-03-04 武汉烽火众智数字技术有限责任公司 Human flow counting method based on video analysis technology
CN104751491B (en) * 2015-04-10 2018-01-23 中国科学院宁波材料技术与工程研究所 A kind of crowd's tracking and people flow rate statistical method and device
CN105303193B (en) * 2015-09-21 2018-08-14 重庆邮电大学 A kind of passenger number statistical system based on single-frame images processing
CN106951885A (en) * 2017-04-08 2017-07-14 广西师范大学 A kind of people flow rate statistical method based on video analysis
CN111860390A (en) * 2020-07-27 2020-10-30 西安建筑科技大学 Elevator waiting number detection and statistics method, device, equipment and medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132119A (en) * 2020-11-24 2020-12-25 科大讯飞(苏州)科技有限公司 Passenger flow statistical method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113392726A (en) 2021-09-14

Similar Documents

Publication Publication Date Title
Bautista et al. Convolutional neural network for vehicle detection in low resolution traffic videos
Al-Ghaili et al. Vertical-edge-based car-license-plate detection method
CN109033950B (en) Vehicle illegal parking detection method based on multi-feature fusion cascade depth model
CN106683119B (en) Moving vehicle detection method based on aerial video image
CN111738342B (en) Pantograph foreign matter detection method, storage medium and computer equipment
Asmaa et al. Road traffic density estimation using microscopic and macroscopic parameters
CN111639564B (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
CN101482923A (en) Human body target detection and sexuality recognition method in video monitoring
CN111709416A (en) License plate positioning method, device and system and storage medium
Zhang et al. License plate localization in unconstrained scenes using a two-stage CNN-RNN
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN104978567A (en) Vehicle detection method based on scenario classification
CN106127812A (en) A kind of passenger flow statistical method of non-gate area, passenger station based on video monitoring
CN113255837A (en) Improved CenterNet network-based target detection method in industrial environment
Daramola et al. Automatic vehicle identification system using license plate
CN115131580B (en) Space target small sample identification method based on attention mechanism
CN108648210B (en) Rapid multi-target detection method and device under static complex scene
CN114596316A (en) Road image detail capturing method based on semantic segmentation
CN113205107A (en) Vehicle type recognition method based on improved high-efficiency network
CN113392726B (en) Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene
CN113326846B (en) Rapid bridge apparent disease detection method based on machine vision
CN116934820A (en) Cross-attention-based multi-size window Transformer network cloth image registration method and system
CN114663839B (en) Method and system for re-identifying blocked pedestrians
Pratomo et al. Parking detection system using background subtraction and HSV color segmentation
CN115393802A (en) Railway scene unusual invasion target identification method based on small sample learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant