CN113392726B - Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene

Publication number: CN113392726B (granted publication of application CN202110579518.5A; earlier publication CN113392726A)
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Active (granted)
Inventors: 王安, 陆磊, 曹箫洪
Original and current assignee: SHANGHAI FEILO ACOUSTICS CO LTD
Application filed by SHANGHAI FEILO ACOUSTICS CO LTD, priority to CN202110579518.5A


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention provides a method and a system for identifying and detecting human heads in an outdoor monitoring scene, comprising the following steps: acquiring outdoor pedestrian video; preprocessing the acquired pedestrian video to obtain a processed pedestrian video; building a pre-trained human head recognition model and, with it: performing high-resolution video reconstruction on the processed pedestrian video to obtain a reconstructed pedestrian video; partitioning the reconstructed pedestrian video into a plurality of video blocks; extracting human head features from each of the video blocks; processing the extracted head features to obtain the head identification frame of the current video frame; tracking heads using the acquired head identification frames; and counting the tracked heads to obtain the number of heads in the current scene, completing head identification and detection in the outdoor monitoring scene. Based on machine vision, the invention can effectively detect pedestrian heads at low resolution and improves the detection accuracy of the whole system.

Description

Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene
Technical Field
The invention relates to the technical field of outdoor head recognition and monitoring, in particular to a head recognition and detection method, a system, a terminal and a medium based on machine vision in an outdoor monitoring scene.
Background
Human head identification is an important technology for predicting group events in intelligent city management: it can count the current people-flow density in real time, comprehensively grasp its change rule, and give timely early warning, greatly reducing labor cost and effectively improving managers' monitoring and analysis capabilities. Human head recognition and statistics technology involves disciplines such as video analysis, artificial intelligence and electronic information.
The traditional human head recognition statistical algorithm is mainly divided into two parts: human head detection and human head tracking. Under the condition that the monitoring camera is fixed, the human head detection mainly divides the human from the background by detecting the head characteristics of the pedestrian; in order to achieve a better statistical effect, pedestrians detected in the current observation area are tracked, so that repeated statistics of the number of heads is avoided.
However, outdoors the monitoring camera is installed high and the resolution of the captured picture is generally low; when the monitored scene is crowded, the resolution of the heads to be detected is lower still. As shown in fig. 1, the conventional human head detection scheme comprises a camera acquisition module, a foreground extraction module, a candidate head extraction module, a head classification module, a head tracking module, a color gamut adjustment module, a head feature extraction module and a head model module. It can be seen that if a classification algorithm based on a supervised mechanism does not improve the resolution, the detection accuracy is low and cannot meet the test requirements.
To improve the accuracy of pedestrian detection, deep-learning-based methods such as Fast R-CNN have been used for head detection. However, the hardware cost required by deep learning is high, real-time processing is hard to perform locally at the monitoring camera, and frames are generally sent back to a cloud platform for real-time processing, so the manufacturing cost of the whole monitoring system is high and large-scale popularization is difficult.
A search of the prior art finds the following.
One prior scheme, in a background image preprocessing module, performs frame sampling of images mainly through illumination intensity and reasonably set thresholds to obtain reasonable images; by detecting the people flow it achieves reasonable evacuation or opening and closing of security inspection ports, effectively helping airport staff know the flow condition of the security inspection ports in time. Reasonable evacuation and opening and closing of security inspection ports improve security inspection efficiency and save passengers' inspection time. However, in crowded conditions this technology cannot accurately identify heads because the head resolution is lower.
Chinese patent application No. 201710104613.3 discloses a classroom people-number detection method and system based on machine vision and binocular cooperation, which convolves the acquired left and right classroom video gray-level images with a Gaussian filter mask template to smooth the images, suppress noise, weaken background information and enhance the figure contour, and acquires images of different resolutions with a traditional super-resolution reconstruction algorithm to detect human heads. However, this technique is indoor only and is not suitable for outdoor detection under harsh light.
Chinese patent application No. 201611235768.2 discloses a method, device and medium for counting people in static video based on human head detection, in which the head outline of a pedestrian in each frame can be accurately detected using an optical flow method and a three-frame difference method through the color and quasi-elliptic features of the human head, thereby obtaining head information. However, this technology has a slow detection frame rate and is not suited to real-time head detection.
In summary, the prior art, including the above patents, still cannot perform accurate and rapid human head recognition when the head resolution is low and the external environment is complex; no description or report similar to the present invention has been found to date at home or abroad.
Disclosure of Invention
The invention provides a method, a system, a terminal and a medium for identifying and detecting a head of a person in an outdoor monitoring scene based on machine vision aiming at the defects in the prior art.
According to one aspect of the invention, there is provided a head recognition and detection method in an outdoor monitoring scene, including:
acquiring outdoor pedestrian videos;
preprocessing the obtained pedestrian video to obtain a processed pedestrian video;
Constructing a pre-trained human head recognition model, and performing the following steps by using the pre-trained human head recognition model:
-performing a high resolution video reconstruction of the processed pedestrian video, resulting in a reconstructed pedestrian video;
-partitioning the reconstructed pedestrian video to obtain a plurality of video blocks;
-extracting human head features from each of the plurality of video blocks;
-processing the extracted head features to obtain a head identification frame of the current video frame;
performing head tracking by using the acquired head identification frame;
counting the number of the tracked heads to obtain the number of the heads in the current scene, and completing the head identification detection in the outdoor monitoring scene.
Preferably, the acquiring the outdoor pedestrian video includes:
acquiring an outdoor pedestrian video signal through a camera to obtain the video source Image(S).
Preferably, the preprocessing comprises:
performing white balance and color gamut adjustment processing on an acquired video source Image (S) of the pedestrian video;
and carrying out foreground extraction and pedestrian component extraction on the processed video in sequence.
Preferably, the performing white balance and color gamut adjustment processing on the acquired pedestrian video includes:
for the acquired pedestrian video, calculating offline the pixel mean values R_ave, G_ave and B_ave of the three RGB channels, and the gain values K_R, K_G and K_B of the three RGB channels;
multiplying the pixel mean values R_ave, G_ave and B_ave by the corresponding gain values K_R, K_G and K_B to obtain the gain pixel values of the three RGB channels:

R = K_R × R_ave
G = K_G × G_ave
B = K_B × B_ave

at this point, the video stream of the video source Image(S) becomes Image(R, G, B), and the white balance processing is complete;
performing color gamut adjustment on the video stream Image(R, G, B) to obtain ImageRe(R, G, B), completing the color gamut adjustment processing.
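As an illustrative sketch (not from the patent text), the channel-gain white balance above can be realized with gray-world gains, where each gain K_c pulls the channel mean C_ave toward the global gray mean. The gray-world choice of gains is an assumption; the patent only states that channel means are multiplied by channel gains.

```python
import numpy as np

def gray_world_white_balance(frame):
    """White balance an RGB frame (H x W x 3 float array) by per-channel
    gains. The gains K_R, K_G, K_B pull each channel mean R_ave, G_ave,
    B_ave toward the global gray mean (gray-world assumption)."""
    means = frame.reshape(-1, 3).mean(axis=0)   # R_ave, G_ave, B_ave
    gray = means.mean()                         # target gray level
    gains = gray / means                        # K_R, K_G, K_B
    return np.clip(frame * gains, 0.0, 255.0)

# toy red-tinted uniform frame
frame = np.zeros((4, 4, 3)) + np.array([200.0, 100.0, 100.0])
balanced = gray_world_white_balance(frame)
channel_means = balanced.reshape(-1, 3).mean(axis=0)
```

After white balancing, the three channel means coincide at the global gray level, which is the intended effect of the K_c × C_ave correction.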
Preferably, the sequentially performing foreground extraction and pedestrian component extraction on the processed video includes:
obtaining foreground video frames ImageRe(R, G, B, i) of the processed video by a method combining the inter-frame relation and an optical flow algorithm, wherein i is the frame number of the video stream;
preliminarily deleting the useless parts without human heads in the foreground video frames ImageRe(R, G, B, i) by the erosion and dilation method to obtain the pedestrian component ImageSize(R, G, B, i).
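The erosion-and-dilation cleanup can be sketched as a morphological opening on a binary foreground mask; the 3×3 square structuring element and the erode-then-dilate order are assumptions, as the patent does not specify them.

```python
import numpy as np

def _shifted_windows(mask):
    """Yield the nine 3x3-neighborhood shifts of a zero-padded mask."""
    p = np.pad(mask, 1, constant_values=0)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            yield p[dy:dy + h, dx:dx + w]

def erode(mask):
    # pixel stays foreground only if its whole 3x3 neighborhood is foreground
    out = np.ones_like(mask)
    for win in _shifted_windows(mask):
        out &= win
    return out

def dilate(mask):
    # pixel becomes foreground if any pixel in its 3x3 neighborhood is
    out = np.zeros_like(mask)
    for win in _shifted_windows(mask):
        out |= win
    return out

def open_mask(mask):
    """Erosion followed by dilation: removes specks smaller than 3x3."""
    return dilate(erode(mask))

mask = np.zeros((7, 7), dtype=bool)
mask[1:4, 1:4] = True   # a solid 3x3 blob (kept)
mask[5, 5] = True       # single-pixel noise (removed)
cleaned = open_mask(mask)
```

The opening deletes isolated noise pixels while restoring blobs at least as large as the structuring element, which is how the useless head-free parts get pruned.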
Preferably, the building the pre-trained human head recognition model includes:
acquiring human head samples under outdoor monitoring, setting the resolutions of the human head samples to 14×14, 28×28 and 36×36 respectively, and sorting the samples into a positive human head sample set S{K+} = {s_1^+, s_2^+, …, s_{k+}^+} and a negative human head sample set S{K−} = {s_1^−, s_2^−, …, s_{k−}^−}, wherein k+ is the number of human head samples in the positive set and k− is the number of human head samples in the negative set;
aligning the human head samples with resolutions 14×14, 28×28 and 36×36 in the positive sample set S{K+} to obtain the sample set S_r{K+}:

S_r{K+} = {s_r1^+, s_r2^+, …, s_rk+^+}

wherein s_ri^+ is a positive human head sample;
adopting sparse representation based on local penalty coefficients to perform ultra-high resolution reconstruction on the sample set S_r{K+}:

X = argmin_X ( ‖Y − BSX‖² + λ‖X − X_O‖² )

wherein X represents the predicted value, Y represents the optimal value, B represents the fuzzy (blur) matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image;
repeating the reconstruction until the constructed image meets the required visual definition;
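Treating B and S as explicit linear operators on flattened 1-D signals, the regularized objective above admits the normal-equation solution sketched below. The toy sizes, the moving-average blur, the decimation pattern and the closed-form solve (rather than the patent's iterate-until-clear loop) are all assumptions for illustration.

```python
import numpy as np

n_hi, n_lo, lam = 16, 8, 0.1

# fuzzy (blur) matrix B: 1-D moving average over the high-res signal
B = np.zeros((n_hi, n_hi))
for i in range(n_hi):
    B[i, max(i - 1, 0):i + 2] = 1.0
B /= B.sum(axis=1, keepdims=True)

# sampling matrix S: keeps every second sample (the low-res observation grid)
S = np.zeros((n_lo, n_hi))
S[np.arange(n_lo), np.arange(n_lo) * 2] = 1.0

A = S @ B                                   # combined blur-and-sample operator
x_true = np.sin(np.linspace(0.0, 3.0, n_hi))
Y = A @ x_true                              # observed low-res signal
X0 = np.repeat(Y, 2)                        # crude initial high-res guess X_O

# minimize ||Y - A X||^2 + lam * ||X - X0||^2 via its normal equations
X = np.linalg.solve(A.T @ A + lam * np.eye(n_hi), A.T @ Y + lam * X0)

err_init = np.linalg.norm(A @ X0 - Y)       # data error of the initial guess
err_rec = np.linalg.norm(A @ X - Y)         # data error after reconstruction
```

Because X minimizes the penalized objective, its data error can never exceed that of the initial guess, while the λ term keeps the solution near X_O — the robustness role the text assigns to the local penalty.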
for the reconstructed human head sample library, dividing the superdivision-reconstructed sample set S_r{K+} into a plurality of region blocks, and extracting the Centrist LBP operators of the different region blocks by an adaptive sliding window extraction method to obtain the texture-feature-based result:

Fe(LBP) = Σ_{i=1}^{8} δ(g_0, g_i) × 2^(i−1)

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) is the result after encoding based on the Centrist LBP operator, δ(g_0, g_i) is the comparison result between the gray level of the region-block center pixel and a non-center pixel gray level, g_0 is the gray level of the pixel at the center of the region block, g_i is the gray level of a non-center pixel of the region block, and i is the non-center pixel number;
training the samples in the positive human head sample set S{K+} and the negative human head sample set S{K−} with an SVM model using the Centrist LBP operator, extracting coarse-granularity human head features, and constructing a primary human head feature extraction model;
performing Hough feature transformation on the extracted coarse-granularity human head features by the adaptive sliding window extraction method to obtain Hough feature numbers; retraining the primary human head feature extraction model with the obtained Hough feature numbers to obtain the final two-layer human head recognition model.
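A minimal sketch of the Centrist LBP encoding of one 3×3 region block, matching the Fe(LBP) sum above; the bit ordering and the ≥ comparison convention are assumptions, since the exact equation appears only as an image in the original patent.

```python
import numpy as np

def centrist_lbp(block):
    """Encode a 3x3 gray block: compare each of the 8 non-center pixels
    g_i against the center gray level g_0 and pack the delta(g_0, g_i)
    bits into one 8-bit code Fe(LBP)."""
    g0 = block[1, 1]
    neighbors = np.delete(block.flatten(), 4)     # the 8 non-center pixels
    bits = (neighbors >= g0).astype(int)          # delta(g_0, g_i)
    return int(sum(int(b) << i for i, b in enumerate(bits)))

block = np.array([[10, 20, 10],
                  [30, 15,  5],
                  [40, 15,  0]])
code = centrist_lbp(block)   # an integer in [0, 255]
```

Sliding this encoder over every region block yields the texture codes that the SVM layer is then trained on.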
Preferably, the reconstructing the high resolution image of the processed pedestrian video includes:
performing sparse reconstruction based on local punishment on the pedestrian component ImageSize(R, G, B, i):

X = argmin_X ( ‖Y − BSX‖² + λ‖X − X_O‖² )

wherein X represents the predicted value, Y represents the optimal value, B represents the fuzzy (blur) matrix, S represents the up-sampling matrix, X_O represents the initial image value, λ represents the regularization parameter of the local penalty used to improve the robustness of the reconstructed image, and ‖Y − BSX‖² is the minimized modulus of the error between the reconstructed image and the optimal image;
repeating the reconstruction until the constructed image meets the required visual definition.
Preferably, the partitioning the reconstructed pedestrian video includes:
dividing the reconstructed pedestrian video into j sub-video blocks of the same resolution, namely:

ImageSize_r(R, G, B, i, j), j = 1, 2, 3, 4.
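The partition step can be sketched as an equal split of each reconstructed frame; the 2×2 layout is an assumption consistent with j = 1, 2, 3, 4, which the text itself does not pin to a geometry.

```python
import numpy as np

def split_into_blocks(frame, rows=2, cols=2):
    """Split a frame into rows * cols equally sized sub-blocks,
    returned in row-major order (j = 1..rows*cols)."""
    h, w = frame.shape[:2]
    bh, bw = h // rows, w // cols
    return [frame[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(rows) for c in range(cols)]

frame = np.arange(64).reshape(8, 8)   # stand-in for one reconstructed frame
blocks = split_into_blocks(frame)     # four 4x4 sub-video blocks
```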
Preferably, the extracting the head features of the obtained video blocks respectively includes:
for each video block ImageSize_r(R, G, B, i, j), extracting the Centrist LBP operator of the region block, and performing coarse-granularity human head feature detection on the extraction result with the human head recognition model:

Fe(LBP) = Σ_{i=1}^{8} δ(g_0, g_i) × 2^(i−1)

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) is the value of the image after Centrist LBP encoding, from which the human head feature is obtained; δ(g_0, g_i) is the result of comparing the central pixel gray level with a non-central pixel gray level, g_0 is the gray level of the central pixel in the current region block, g_i is the gray level of non-central pixel i, and i is the number of the non-center pixel; the human head features are obtained through this operation.
Preferably, the processing the extracted head feature to obtain a head identification frame of the current video frame includes:
performing Hough feature transformation on the obtained head features to obtain Hough feature numbers, and performing secondary feature classification through the head recognition model to obtain a head recognition frame of the current video frame:
ImageSize_r(R, G, B, x_i, y_j, x_m, y_n)

wherein x_i is the lower-left abscissa of the identification frame, y_j is the lower-left ordinate, x_m is the upper-right abscissa, and y_n is the upper-right ordinate of the identification frame.
Preferably, the head tracking using the obtained head recognition frame includes:
marking the head recognition frames in different video frames by a K-means algorithm combined with a feature-block matching method to realize head tracking.
Preferably, the marking the head recognition frame in different video frames by adopting a K-means algorithm and a feature block matching method includes:
setting an initial human head position point p for the current scene area, and defining an area of radius R centered on p to form an ROI area;
calculating the mean of the offsets from the initial position point p to all head sample points P_i in the ROI area to construct the vector D:

D = (1/n) Σ_{i=1}^{n} (P_i − p)

wherein n is the number of human head sample points;
continuously shifting the coverage area along the vector D until a point P_T is found whose distance to the initial position point p is smaller than the threshold T, i.e., the same head is found in different video frames and given the same mark; otherwise, returning to the previous step;
finding the heads with the same mark between different video frames by the K-means algorithm;
for regions marked as the same head, acquiring the heads between different frames by the feature matching method.
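The marking loop above is essentially a mean-shift update: move from p by the mean offset vector D of the head points inside the ROI until the shift falls below the threshold T. A sketch under assumed parameter values (R = 3, T = 0.5):

```python
import numpy as np

def track_head(points, p, radius=3.0, thresh=0.5, max_iter=50):
    """Shift p by the mean offset D of head sample points inside the
    radius-R ROI until the shift magnitude drops below threshold T."""
    p = np.asarray(p, dtype=float).copy()
    for _ in range(max_iter):
        in_roi = points[np.linalg.norm(points - p, axis=1) <= radius]
        if len(in_roi) == 0:
            break                      # no head samples in the ROI
        D = (in_roi - p).mean(axis=0)  # the vector D from the formula
        p = p + D
        if np.linalg.norm(D) < thresh:
            break                      # converged: same head, same mark
    return p

# detections of one head clustered near (10, 10); start point offset to (9, 9)
rng = np.random.default_rng(1)
points = rng.normal(loc=[10.0, 10.0], scale=0.2, size=(20, 2))
converged = track_head(points, [9.0, 9.0])
```

The update converges to the centroid of the in-ROI detections, so detections of the same head across frames collapse onto one mark.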
Preferably, the counting the number of the tracked heads to obtain the number of the heads in the current scene includes:
counting the number of heads with different marks; the obtained counting result is the number of heads in the current scene.
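Counting then reduces to counting distinct head marks; a trivial sketch with hypothetical track identifiers:

```python
# each entry is the mark (track ID) assigned to a tracked head detection;
# the concrete values are hypothetical
tracked_marks = [3, 1, 2, 1, 3, 3, 4]

# number of heads in the current scene = number of distinct marks
head_count = len(set(tracked_marks))
```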
According to another aspect of the present invention, there is provided a head recognition and detection system in an outdoor monitoring scene, including:
the camera acquisition module is used for acquiring outdoor pedestrian videos;
the foreground processing module is used for preprocessing the acquired pedestrian video to obtain a processed pedestrian video;
the ultrahigh-resolution reconstruction module is used for reconstructing the processed pedestrian video in a high-resolution mode to obtain a reconstructed pedestrian video;
the block extraction module is used for blocking the reconstructed pedestrian video to obtain a plurality of video blocks;
the candidate head extraction module is used for extracting head characteristics of the obtained video blocks respectively;
The head classification module is used for processing the extracted head characteristics to obtain a head identification frame of the current video frame;
the ultrahigh-resolution reconstruction module, the block extraction module, the candidate head extraction module and the head classification module form a head recognition model module;
the pre-training module is used for pre-training the head recognition model module;
the head tracking module is used for tracking the head by using the acquired head identification frame;
and the people head statistics module is used for counting the number of the tracked people heads to obtain the number of the people heads in the current scene, and the people head identification and detection in the outdoor monitoring scene are completed.
According to a third aspect of the present invention there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program being operable to perform the method of any one of the preceding claims or to run the system of the preceding claims.
According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor is operable to perform a method of any of the above, or to run a system as described above.
Due to the adoption of the technical scheme, compared with the prior art, the invention has at least one of the following beneficial effects:
the head recognition and detection method, system, terminal and medium in the outdoor monitoring scene provided by the invention perform super-high-resolution reconstruction based on locally punished sparse representation for scenes with crowded people flow and low head resolution, effectively improving the accuracy of head statistics.
According to the method, the system, the terminal and the medium for identifying and detecting the head of the person in the outdoor monitoring scene, the head of the person can be counted locally by the monitoring camera, and the hardware configuration requirement on the cloud server of the monitoring system is reduced.
According to the head recognition and detection method, system, terminal and medium in the outdoor monitoring scene, high-definition image reconstruction is carried out under the conditions of crowded people flow and fuzzy pixels, and head statistics work is carried out.
According to the method, the system, the terminal and the medium for identifying and detecting the head of the person in the outdoor monitoring scene, which are provided by the invention, the image processing is carried out in a machine learning mode, so that the accuracy of pedestrian detection is improved.
The human head identification and detection method, the system, the terminal and the medium in the outdoor monitoring scene provided by the invention adopt the super-division reconstruction based on the local punishment coefficient to perform image preprocessing, and have the advantage of high resolution.
According to the human head identification and detection method, system, terminal and medium in the outdoor monitoring scene, the characteristic extraction is carried out on the reconstructed image by the Centrist LBP operator, so that the human head characteristics are obtained, and the method and the system have the advantage of high precision.
According to the method, the system, the terminal and the medium for identifying and detecting pedestrian heads in the outdoor monitoring scene provided by the invention, pedestrian heads can be effectively detected at low resolution based on machine vision, and the detection accuracy of the whole system is improved.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
fig. 1 is a schematic diagram of a conventional demographic scheme in the prior art.
Fig. 2 is a flowchart of a method for identifying and detecting a head of a person in an outdoor monitoring scene according to an embodiment of the present invention.
Fig. 3 is a flowchart of a method for identifying and detecting a head of a person in an outdoor monitoring scenario according to a preferred embodiment of the present invention.
FIG. 4 is a flowchart of a method for pre-training a human head recognition model in accordance with a preferred embodiment of the present invention.
Fig. 5 is a schematic diagram of a component module of a head recognition and detection system in an outdoor monitoring scenario according to an embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail: the embodiment is implemented on the premise of the technical scheme of the invention, and detailed implementation modes and specific operation processes are given. It should be noted that variations and modifications can be made by those skilled in the art without departing from the spirit of the invention, which falls within the scope of the invention.
Fig. 2 is a flowchart of a method for identifying and detecting a head of a person in an outdoor monitoring scene according to an embodiment of the present invention.
As shown in fig. 2, the method for identifying and detecting a head of a person in an outdoor monitoring scene provided in this embodiment may include the following steps:
s100, acquiring outdoor pedestrian videos;
s200, preprocessing the acquired pedestrian video to obtain a processed pedestrian video;
s300, constructing a pre-trained human head recognition model, and performing S400-S700 by using the pre-trained human head recognition model:
s400, performing high-resolution video reconstruction on the processed pedestrian video to obtain a reconstructed pedestrian video;
s500, partitioning the reconstructed pedestrian video to obtain a plurality of video blocks;
s600, extracting human head features of the obtained video blocks respectively;
S700, processing the extracted head features to obtain a head identification frame of the current video frame;
s800, performing head tracking by using the acquired head identification frame;
s900, counting the number of the tracked heads to obtain the number of the heads in the current scene, and completing the head identification detection in the outdoor monitoring scene.
In this embodiment S100, as a preferred embodiment, acquiring outdoor pedestrian video may include the steps of:
acquiring an outdoor pedestrian video signal through a camera to obtain the video source Image(S).
In this embodiment S200, as a preferred embodiment, the preprocessing may include the steps of:
s201, performing white balance and color gamut adjustment processing on a video source Image (S) of the acquired pedestrian video;
s202, foreground extraction and pedestrian component extraction are sequentially carried out on the processed video.
In this embodiment S201, as a preferred embodiment, the white balance and color gamut adjustment processing for the acquired pedestrian video may include the steps of:
s2011, for the acquired pedestrian video, calculating offline the pixel mean values R_ave, G_ave and B_ave of the three RGB channels, and the gain values K_R, K_G and K_B of the three RGB channels;
S2012, multiplying the pixel mean values R_ave, G_ave and B_ave of the three RGB channels by the corresponding gain values K_R, K_G and K_B to obtain the gain pixel values of the three RGB channels:

R = K_R × R_ave
G = K_G × G_ave
B = K_B × B_ave

at this point, the video stream of the video source Image(S) becomes Image(R, G, B), and the white balance processing is complete;
s2013, performing color gamut adjustment on the video stream Image(R, G, B) to obtain ImageRe(R, G, B), completing the color gamut adjustment processing.
In this embodiment S202, as a preferred embodiment, the foreground extraction and the pedestrian component extraction are sequentially performed on the processed video, which may include the following steps:
s2021, obtaining foreground video frames ImageRe(R, G, B, i) of the processed video by a method combining the inter-frame relation and an optical flow algorithm, wherein i is the frame number in the video stream;
s2022, preliminarily deleting the useless parts without human heads in the foreground video frames ImageRe(R, G, B, i) by the erosion and dilation method to obtain the pedestrian component ImageSize(R, G, B, i).
In this embodiment S300, as a preferred embodiment, the construction of the pre-trained human head recognition model may include the steps of:
s301, acquiring human head samples under outdoor monitoring, setting the resolutions of the human head samples to 14×14, 28×28 and 36×36 respectively, and sorting the set human head samples into a positive human head sample set S{K+} = {s_1^+, s_2^+, …, s_{k+}^+} and a negative human head sample set S{K−} = {s_1^−, s_2^−, …, s_{k−}^−}, wherein k+ is the number of human head samples in the positive set and k− is the number of human head samples in the negative set;
s302, aligning the human head samples with resolutions 14×14, 28×28 and 36×36 in the positive sample set S{K+} to obtain the sample set S_r{K+}:

S_r{K+} = {s_r1^+, s_r2^+, …, s_rk+^+}

wherein s_ri^+ is a positive human head sample of S{K+};
s303, adopting sparse representation based on local penalty coefficients to perform ultra-high resolution reconstruction on the sample set S_r{K+}:

X = argmin_X ( ‖Y − BSX‖² + λ‖X − X_O‖² )

wherein X represents the predicted value, Y represents the optimal value, B represents the fuzzy (blur) matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image;
repeating this step until the image meets the required visual definition;
s304, dividing the superdivision-reconstructed sample set S_r{K+} into a plurality of region blocks, and extracting the Centrist LBP operators of the different region blocks by the adaptive sliding window extraction method to obtain the texture-feature-based result:

Fe(LBP) = Σ_{i=1}^{8} δ(g_0, g_i) × 2^(i−1)

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) is the result after encoding based on the Centrist LBP operator, δ(g_0, g_i) is the comparison result between the region-block center gray level and a non-center pixel gray level, g_0 is the gray level of the pixel at the center of the region block, g_i is the gray level of a non-center pixel of the region block, and i is the non-center pixel number;
s305, on this basis, training the samples (namely the encoded data) in the positive human head sample set S{K+} and the negative human head sample set S{K−} with an SVM model using the Centrist LBP operator, extracting coarse-granularity human head features, and constructing a primary human head feature extraction model;
s306, performing Hough feature transformation on the extracted coarse-granularity human head features by the adaptive sliding window extraction method to obtain Hough feature numbers; retraining the primary human head feature extraction model with the obtained Hough feature numbers to obtain the final two-layer human head recognition model.
In S301 of this embodiment, the head samples may be acquired by manual labeling.
In S301 of this embodiment, the head samples are classified and sorted, according to whether a head is present in the video frame, into the positive head sample set S{K+} and the negative head sample set S{K−}.
In this embodiment S400, as a preferred embodiment, the high resolution image reconstruction of the processed pedestrian video may include the following steps:
performing sparse reconstruction based on a local penalty on the pedestrian component ImageSize(R, G, B, i):

X = argmin_X ||Y − SBX||² + λ||X − X_O||²

wherein X represents the predicted value, Y represents the optimal value, B represents the blur matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image; the term ||Y − SBX||² is the minimum modulus of the error between the reconstructed image and the optimal image;
this step is repeated until the constructed image meets visual clarity.
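A toy numerical sketch of this locally penalised reconstruction, minimising ||Y − SBX||² + λ||X − X_O||² by plain gradient descent on a 1-D signal. The combined operator A = SB, the step size, λ and the iteration count are illustrative assumptions, not values from the patent:

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def reconstruct(Y, A, X0, lam=0.1, step=0.1, iters=2000):
    """Minimise ||Y - A X||^2 + lam * ||X - X0||^2 by gradient descent,
    where A = S B combines the blur and sampling operators."""
    At = transpose(A)
    X = X0[:]
    for _ in range(iters):
        r = [ax - y for ax, y in zip(matvec(A, X), Y)]    # residual A X - Y
        grad = [2 * g + 2 * lam * (x - x0)
                for g, x, x0 in zip(matvec(At, r), X, X0)]
        X = [x - step * g for x, g in zip(X, grad)]
    return X

# A: toy 2x4 operator that averages pixel pairs and downsamples by 2
A = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
Y = [1.0, 3.0]             # low-resolution observation
X0 = [0.0, 0.0, 0.0, 0.0]  # initial image value X_O
X = reconstruct(Y, A, X0)
```

With λ → 0 the solution fits the observation exactly; the penalty term keeps X near the initial estimate, which is what gives the reconstruction its robustness.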
Preferably, the method for partitioning the reconstructed pedestrian video can comprise the following steps:
in this embodiment S500, as a preferred embodiment, the reconstructed pedestrian video is divided into j sub-video blocks of the same resolution, namely:
ImageSize_r(R, G, B, i, j), j = 1, 2, 3, 4.
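The division into j = 4 equal-resolution sub-blocks can be sketched as a 2×2 grid split; the grid layout is our assumption, since the text only fixes the number of blocks:

```python
def split_into_blocks(frame, rows=2, cols=2):
    """frame: 2-D list (H x W pixels); returns rows*cols equal sub-blocks,
    scanned row-major (j = 1..4 for a 2x2 grid)."""
    h, w = len(frame), len(frame[0])
    bh, bw = h // rows, w // cols
    return [[row[c * bw:(c + 1) * bw] for row in frame[r * bh:(r + 1) * bh]]
            for r in range(rows) for c in range(cols)]

frame = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 test frame
blocks = split_into_blocks(frame)
```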
in this embodiment S600, as a preferred embodiment, the extracting of the head features of the obtained plurality of video blocks includes:
for each video block ImageSize_r(R, G, B, i, j), the Centrist LBP operator of the region block is extracted, and coarse-grained head feature detection is performed on the extraction result with the head recognition model:

Fe(LBP) = Σ_i δ(g_0, g_i) · 2^i

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) is the value of the image after Centrist LBP encoding, giving the head feature; δ(g_0, g_i) is the comparison of the center pixel gray level with a non-center pixel gray level, g_0 is the center pixel gray level in the current region block, g_i is a non-center pixel gray level, and i denotes the non-center pixel index; the head features are obtained by this operation.
In this embodiment S700, as a preferred embodiment, the processing of the extracted head feature to obtain the head identification frame of the current video frame may include the following steps:
performing Hough feature transformation on the obtained head features to obtain Hough feature numbers, and performing secondary feature classification through the head recognition model to obtain the head recognition frame of the current video frame:

ImageSize_r(R, G, B, x_i, y_j, x_m, y_n)

wherein x_i is the abscissa of the lower-left corner of the recognition frame, y_j is the ordinate of the lower-left corner of the recognition frame, x_m is the abscissa of the upper-right corner of the recognition frame, and y_n is the ordinate of the upper-right corner of the recognition frame.
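The text does not define the "Hough feature numbers" precisely; as one common realisation for roughly circular head contours, a fixed-radius Hough vote over edge pixels can be sketched as follows (the angle step, radius and all names are our assumptions):

```python
import math

def hough_circle_votes(edges, radius):
    """edges: 2-D 0/1 list; each edge pixel votes for candidate circle
    centres at the given radius; peaks in the accumulator mark centres."""
    h, w = len(edges), len(edges[0])
    acc = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not edges[y][x]:
                continue
            for t in range(0, 360, 10):
                cy = round(y - radius * math.sin(math.radians(t)))
                cx = round(x - radius * math.cos(math.radians(t)))
                if 0 <= cy < h and 0 <= cx < w:
                    acc[cy][cx] += 1
    return acc

# synthetic edge image: a circle of radius 4 centred at (8, 8)
edges = [[0] * 17 for _ in range(17)]
for t in range(0, 360, 5):
    a = math.radians(t)
    edges[8 + round(4 * math.sin(a))][8 + round(4 * math.cos(a))] = 1
acc = hough_circle_votes(edges, 4)
votes, cy, cx = max((v, y, x) for y, row in enumerate(acc)
                    for x, v in enumerate(row))
```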
In this embodiment S800, as a preferred embodiment, the head tracking using the acquired head recognition frame may include the following steps:
and marking the head recognition frames in different video frames by adopting a K-means algorithm and a characteristic block matching method, so as to realize head tracking.
In this embodiment S800, as a preferred embodiment, the marking of the head recognition frame in different video frames by using the K-means algorithm and the feature block matching method may include the following steps:
s801, setting an initial position point p of a human head for a current scene area, and defining an area with a radius of R by taking the initial position point p as a center to form an ROI area;
S802, the mean of the vectors from the initial position point p to all head sample points P_i in the ROI area is calculated, constructing the vector D:

D = (1/n) Σ_{i=1}^{n} (P_i − p)

wherein n is the number of head sample points;
S803, the coverage area of the vector D is continuously enlarged until a point P_T is found, the point P_T satisfying that its distance from the initial position point p is less than a threshold T, i.e. the heads with the same mark in different video frames are found and marked as the same head; otherwise return to S802;
s804, using a K-means algorithm to find the same marked head among different video frames;
S805, for the marked same-head area (i.e. a smaller area), a feature matching method is adopted to associate the heads between different frames.
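Steps S801 to S803 amount to a mean-shift iteration; a minimal sketch on 2-D head-centre points, where the radius R, threshold T and all names are illustrative assumptions:

```python
def mean_shift(p, points, R=5.0, T=0.01, max_iter=100):
    """Shift p by the mean offset vector D = (1/n) * sum(P_i - p) over the
    points inside the radius-R ROI until the shift is below threshold T."""
    for _ in range(max_iter):
        roi = [q for q in points
               if (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 <= R * R]
        if not roi:
            break
        dx = sum(q[0] - p[0] for q in roi) / len(roi)
        dy = sum(q[1] - p[1] for q in roi) / len(roi)
        p = (p[0] + dx, p[1] + dy)
        if dx * dx + dy * dy <= T * T:      # converged: same head found
            break
    return p

# detections of one head across neighbouring frames
cluster = [(10.0, 10.0), (10.5, 9.5), (9.5, 10.5), (10.2, 10.3)]
p = mean_shift((8.0, 8.0), cluster)
```

p converges to the cluster centre, so detections whose converged positions coincide can be given the same head mark.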
In this embodiment S900, as a preferred embodiment, counting the number of the tracked heads to obtain the number of heads in the current scene may include the following steps:
and counting the number of the heads of different marks, wherein the obtained counting result is the number of the heads in the current scene.
Fig. 3 is a flowchart of a method for identifying and detecting a head of a person in an outdoor monitoring scene according to a preferred embodiment of the present invention.
As shown in fig. 3, the method for identifying and detecting a head of a person in an outdoor monitoring scene provided in the preferred embodiment may include the following steps:
Step 1, pedestrian video acquisition of an outdoor pavement is carried out;
step 2, performing white balance and color gamut adjustment processing on the acquired video, performing foreground extraction by combining the inter-frame relation with an optical flow algorithm, and extracting the pedestrian component by a dilation and erosion algorithm, obtaining the processed pedestrian video;
step 3, constructing a pre-trained human head recognition model, and performing the steps 4 to 7 by using the pre-trained human head recognition model:
step 4, performing 3× super-resolution reconstruction of the processed pedestrian video;
step 5, dividing the reconstructed pedestrian video into 4 video blocks;
step 6, adopting a Centrist LBP and Hough feature extraction algorithm to extract the head features of the 4 video blocks respectively;
step 7, processing the extracted head characteristics to obtain a head identification frame of the current video frame;
step 8, adopting an algorithm based on the combination of feature point matching and K-means to track the head of the head identification frame;
and 9, counting the number of the tracked heads to obtain the number of the heads in the current scene.
In this preferred embodiment, it may further include:
and 10, displaying the number of the acquired heads in the current scene.
In this preferred embodiment, the following two modes of operation may be included:
mode 1: a human head recognition model training mode; mode 2: and a real-time head statistics mode.
As shown in fig. 4, in the mode 1, the model training mode for head recognition may include the following steps:
step 101: extracting human head samples from the ImageNet database, and setting the resolution of the human head samples to 14×14, 28×28 and 36×36, to construct a positive human head sample set
Figure BDA0003085646750000131
And negative human head sample set->
Figure BDA0003085646750000132
k is a positive integer value, +, -represents a positive sample of the human head, and the human head sample set comprises 10 ten thousand human head training samples and 5000 human head test samples.
Step 102: the head training samples of resolutions 14×14, 28×28 and 36×36 in the sample set are each magnified three times in resolution, obtaining the triple-resolution sample set S_r{K+}; in the super-resolution method, for good robustness, sparse representation based on local penalty coefficients is adopted for reconstruction:

X = argmin_X ||Y − SBX||² + λ||X − X_O||²

wherein X represents the predicted value, Y represents the optimal value, B represents the blur matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image; the reconstruction process adopts iterative operations to obtain a high-definition video frame to be processed;
Step 103: adaptive Centrist LBP features are extracted from the super-resolution reconstructed sample set S_r{K+} to obtain texture features, i.e. the Centrist LBP operators of different region blocks are extracted according to the different resolutions:

Fe(LBP) = Σ_i δ(g_0, g_i) · 2^i

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) represents the value of the sample library after Centrist LBP encoding, giving the head feature; δ(g_0, g_i) is the comparison of the center pixel gray level with a non-center pixel gray level, g_0 is the center pixel gray level in the current region block, g_i is a non-center pixel gray level, and i denotes the non-center pixel index.
Step 104: coarse-grained positive and negative head sample features are extracted from the positive and negative samples with the corresponding SVM training model, and a primary head feature extraction model is constructed.
Step 105: performing Hough feature transformation on the extracted positive and negative sample features of the coarse-granularity human head by adopting a self-adaptive sliding window extraction method to obtain Hough feature numbers;
step 106: and retraining the primary human head feature extraction model by using the obtained Hough feature number to obtain a final two-layer human head recognition model.
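The SVM stage of steps 104 to 106 can be illustrated with a toy linear SVM trained by hinge-loss sub-gradient descent; a production system would use an SVM library on the Centrist LBP and Hough features, and the data, learning rate and epoch count below are invented for illustration:

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """X: feature vectors, y: labels in {-1, +1}; returns (w, b)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                          # hinge-loss sub-gradient
                w = [wj - lr * (lam * wj - yi * xj)
                     for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# toy "head" (+1) vs "non-head" (-1) coarse-grained feature vectors
X = [[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
```

The two-layer scheme simply repeats this training once more on feature vectors augmented with the Hough feature numbers.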
In the mode 2, the real-time head statistics mode may include the following steps:
step 201: firstly, video signal acquisition is carried out through a camera, and the obtained video source is Image (S).
Step 202: white balance of the Image(S) video source is performed as follows: the RGB channel means R_ave, G_ave and B_ave of Image(S) are computed; the gains of the three RGB channels are calculated offline to obtain the values K_R, K_G and K_B; the pixel values of the three RGB channels are multiplied by the corresponding channel gains to obtain the gain pixel values of the three channels:

R = K_R × R_ave
G = K_G × G_ave
B = K_B × B_ave

the white-balance algorithm is implemented on an FPGA, i.e. a multiplier architecture based on CORDIC iteration completes the corresponding operations; at this point, the video stream of Image(S) becomes Image(R, G, B);
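In software, the per-channel gain idea of this step can be sketched as a gray-world white balance; using mid-gray 128 as the neutral target and applying the gains per pixel are our assumptions, since the text computes the gains offline:

```python
def white_balance(pixels, target=128.0):
    """pixels: list of (R, G, B) tuples; gains K_c = target / channel mean."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]  # R_ave, G_ave, B_ave
    gains = [target / m if m else 1.0 for m in means]          # K_R, K_G, K_B
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3))
            for p in pixels]

balanced = white_balance([(200, 100, 50), (100, 50, 25)])
```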
after white balance is completed, color gamut adjustment is performed on Image(R, G, B), which becomes ImageRe(R, G, B); ImageRe(R, G, B, i) is then obtained by a method combining the inter-frame relation with an optical flow algorithm, and useless parts of the picture without heads are roughly deleted by erosion and dilation, obtaining ImageSize(R, G, B, i).
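The foreground part of this step (inter-frame differencing followed by morphology) can be sketched as follows; the optical-flow stage is omitted, and the threshold and 3×3 structuring element are illustrative assumptions:

```python
def frame_diff_mask(prev, curr, thresh=20):
    """Binary motion mask from two grayscale frames (2-D lists)."""
    return [[1 if abs(c - p) > thresh else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def erode(mask):
    """3x3 erosion: keep a pixel only if its full neighbourhood is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(mask[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

prev = [[0] * 7 for _ in range(7)]
curr = [[0] * 7 for _ in range(7)]
for y in range(1, 6):
    for x in range(1, 6):
        curr[y][x] = 255          # a moving 5x5 "pedestrian" region
fg = erode(frame_diff_mask(prev, curr))
```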
Step 203: sparse reconstruction based on a local penalty is carried out on ImageSize(R, G, B, i), adopting the reconstruction formula:

X = argmin_X ||Y − SBX||² + λ||X − X_O||²

wherein X represents the predicted value, Y represents the optimal value, B represents the blur matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image; the reconstruction process adopts iterative operations.
Step 204: the reconstructed image is divided into 4 sub-video blocks of the same resolution, namely:
ImageSize_r(R, G, B, i, j), j = 1, 2, 3, 4.
step 205: for the video blocks ImageSize_r(R, G, B, i, j), extraction based on Centrist LBP and Hough features is performed with the adaptive sliding-window extraction method to obtain texture features, i.e. the Centrist LBP operators of different region blocks are extracted according to the different resolutions:

Fe(LBP) = Σ_i δ(g_0, g_i) · 2^i

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) is the value of the sample library after Centrist LBP encoding, giving the head feature; δ(g_0, g_i) is the comparison of the center pixel gray level with a non-center pixel gray level, g_0 is the center pixel gray level in the current region block, g_i is a non-center pixel gray level, and i denotes the non-center pixel index.
Step 206: and obtaining a coarse-granularity human head detection result by the feature extraction result and the corresponding SVM training model.
Step 207: on the basis, hough feature transformation is carried out, and Hough feature numbers are obtained.
Step 208: SVM secondary feature classification is performed to obtain the head recognition frame of the current video frame, ImageSize_r(R, G, B, x_i, y_j, x_m, y_n).
Step 209: the head tracking module marks the head movement of ImageSize_r(R, G, B, x_i, y_j, x_m, y_n) in different video frames by a method of K-means combined with feature-block matching, as follows:
Step1, setting an initial position point p of a human head for a larger image area, and defining an area with a radius R by taking the initial position point p as a center to form an ROI area;
Step2, the mean of the vectors from all sample points in the ROI area to p is calculated, constructing the vector D, namely:

D = (1/n) Σ_{i=1}^{n} (P_i − p)
Step3, the coverage area of the vector D is continuously enlarged; if a point P_T is found such that the distance between P_T and p is less than the threshold T, the same head is found, otherwise jump to Step2;
the same head between different frames is found by using a K-means method;
Step4, for the marked same-head area, a feature matching method is adopted to mark the same head between different frames.
Step 210: and the head statistics module is used for counting the number of heads of different marks and sending the counted result to the display terminal.
Fig. 5 is a schematic diagram of a component module of a head recognition and detection system in an outdoor monitoring scenario according to an embodiment of the present invention.
As shown in fig. 5, the system for identifying and detecting heads in an outdoor monitoring scene provided by this embodiment may include: a camera acquisition module, a foreground processing module, an ultra-high-resolution reconstruction module, a block extraction module, a candidate head extraction module, a head classification module, a pre-training module, a head tracking module and a head statistics module.
wherein :
the camera acquisition module is used for acquiring pedestrian videos of outdoor pavement;
the foreground processing module is used for preprocessing the acquired pedestrian video to obtain a processed pedestrian video;
the ultrahigh-resolution reconstruction module is used for reconstructing the high-resolution image of the processed pedestrian video to obtain a reconstructed pedestrian video;
the block extraction module is used for blocking the reconstructed pedestrian video to obtain a plurality of video blocks;
the candidate head extraction module is used for extracting head characteristics of the obtained video blocks respectively;
the head classification module is used for processing the extracted head characteristics and acquiring a head identification frame of the current video frame;
the ultra-high-resolution reconstruction module, the block extraction module, the candidate head extraction module and the head classification module form a head recognition model module;
the pre-training module is used for pre-training the head recognition model module;
the head tracking module is used for tracking the head by using the acquired head identification frame;
and the people head statistics module is used for counting the number of the tracked people heads to obtain the number of the people heads in the current scene and finishing the identification and detection of the people heads in the outdoor monitoring scene.
An embodiment of the present invention provides a terminal including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, is operative to perform the method of any one of the foregoing embodiments, or to perform the system of any one of the foregoing embodiments.
Optionally, a memory is provided for storing a program. The memory may include volatile memory, such as random-access memory (RAM), e.g. static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory is used to store computer programs (e.g. application programs and functional modules implementing the methods described above), computer instructions, etc., which may be stored in one or more memories in a partitioned manner. The above computer programs, computer instructions, data, etc. may be invoked by a processor.
A processor for executing the computer program stored in the memory to implement the steps in the method according to the above embodiment. Reference may be made in particular to the description of the embodiments of the method described above.
The processor and the memory may be separate structures or may be integrated structures that are integrated together. When the processor and the memory are separate structures, the memory and the processor may be connected by a bus coupling.
An embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is operative to perform the method of any of the above embodiments, or to run the system of the above embodiments.
According to the head recognition detection method, system, terminal and medium under the outdoor monitoring scene provided by the embodiment of the invention, the ultrahigh-resolution reconstruction based on the local punishment sparse representation is mainly performed aiming at the scene with crowded people and lower head resolution, so that the head statistics accuracy is effectively improved; people head statistics can be carried out locally on the monitoring camera, so that the hardware configuration requirement on the monitoring system cloud server is reduced; under the conditions of crowded people flow and fuzzy pixels, reconstructing a high-definition image, and carrying out statistics on the heads of people; the image processing is carried out in a machine learning mode, so that the accuracy of pedestrian detection is improved; the super-division reconstruction based on the local punishment coefficient is adopted to perform image preprocessing, so that the method has the advantage of high resolution; the reconstructed image is subjected to characteristic extraction by a Centrist LBP operator to obtain the human head characteristics, so that the method has the advantage of high precision; based on machine vision, pedestrian head detection can be effectively carried out on pedestrians under low resolution, and detection accuracy of the whole system is improved.
It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules, devices, units, etc. in the system, and those skilled in the art may refer to a technical solution of the method to implement the composition of the system, that is, the embodiment in the method may be understood as a preferred example of constructing the system, which is not described herein.
Those skilled in the art will appreciate that the invention provides a system and its individual devices that can be implemented entirely by logic programming of method steps, in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc., in addition to the system and its individual devices being implemented in pure computer readable program code. Therefore, the system and various devices thereof provided by the present invention may be considered as a hardware component, and the devices included therein for implementing various functions may also be considered as structures within the hardware component; means for achieving the various functions may also be considered as being either a software module that implements the method or a structure within a hardware component.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In summary, the present description should not be construed as limiting the invention.

Claims (8)

1. The method for identifying and detecting the head of a person in an outdoor monitoring scene is characterized by comprising the following steps of:
acquiring outdoor pedestrian videos;
preprocessing the obtained pedestrian video to obtain a processed pedestrian video;
constructing a pre-trained human head recognition model, and performing the following steps by using the pre-trained human head recognition model:
-performing a high resolution video reconstruction of the processed pedestrian video, resulting in a reconstructed pedestrian video;
-partitioning the reconstructed pedestrian video to obtain a plurality of video blocks;
-extracting human head features from each of the plurality of video blocks;
-processing the extracted head features to obtain a head identification frame of the current video frame;
performing head tracking by using the acquired head identification frame;
Counting the number of the tracked heads to obtain the number of the heads in the current scene, and completing the head identification detection in the outdoor monitoring scene;
the building of the pre-trained human head recognition model comprises the following steps:
head samples are acquired, the resolutions of the head samples are set to 14×14, 28×28 and 36×36 respectively, and the samples are classified and sorted into the positive head sample set S{K+} and the negative head sample set S{K−}, wherein k+ is the number of samples in the positive head sample set S{K+} and k− is the number of samples in the negative head sample set S{K−};
the head samples of resolutions 14×14, 28×28 and 36×36 in the positive head sample set S{K+} are each magnified n times in resolution, obtaining the sample set S_r{K+}, wherein S_r{K+} is the super-resolved positive head sample set;
sparse representation based on local penalty coefficients is adopted for super-resolution reconstruction of the sample set S_r{K+}:

X = argmin_X ||Y − SBX||² + λ||X − X_O||²

wherein X represents the predicted value, Y represents the optimal value, B represents the blur matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image;

the reconstruction method is repeated until the requirement of visual clarity is met;
the super-resolution reconstructed sample set S_r{K+} is divided into a plurality of region blocks, and the Centrist LBP operators of the different region blocks are extracted by an adaptive sliding-window extraction method, obtaining a texture-feature-based result:

Fe(LBP) = Σ_i δ(g_0, g_i) · 2^i

δ(g_0, g_i) = 1 if g_i ≥ g_0, otherwise 0

wherein Fe(LBP) is the result after encoding based on the Centrist LBP operator, δ(g_0, g_i) is the comparison of the region-block center pixel gray level with a non-center pixel gray level, g_0 is the center pixel gray level of the region block, g_i is a non-center pixel gray level of the region block, and i denotes the non-center pixel index;
the samples of the positive head sample set S{K+} and the negative head sample set S{K−} encoded with the Centrist LBP operator are trained with an SVM model, coarse-grained head features are extracted, and a primary head feature extraction model is constructed;
performing Hough feature transformation on the extracted coarse-granularity human head features by adopting a self-adaptive sliding window extraction method to obtain Hough feature numbers; retraining the primary human head feature extraction model by using the obtained Hough feature numbers to obtain a final two-layer human head recognition model;
the high-resolution image reconstruction of the processed pedestrian video comprises the following steps:
sparse reconstruction based on a local penalty is performed on the pedestrian component ImageSize(R, G, B, i):

X = argmin_X ||Y − SBX||² + λ||X − X_O||²

wherein X represents the predicted value, Y represents the optimal value, B represents the blur matrix, S represents the up-sampling matrix, X_O represents the initial image value, and λ represents the locally penalized regularization parameter used to improve the robustness of the reconstructed image; the term ||Y − SBX||² is the minimum modulus of the error between the reconstructed image and the optimal image;

the reconstruction method is repeated until the constructed image meets the required visual clarity;
the head tracking is performed by using the acquired head identification frame, and the head identification frame is marked in different video frames by adopting a K-means algorithm and a feature block matching method, comprising the following steps:
setting an initial position point p of a human head for a current scene area, and defining an area with a radius of R by taking the initial position point p as a center to form an ROI area;
the mean of the vectors from the initial position point p to all head sample points P_i in the ROI area is calculated, constructing the vector D:

D = (1/n) Σ_{i=1}^{n} (P_i − p)
wherein n is the number of human head sample points;
the coverage area of the vector D is continuously expanded until a point P_T is found, the point P_T satisfying that its distance from the initial position point p is less than a threshold T, i.e. the same head in different video frames is found and is marked as the same head; otherwise, return to the previous step;
the same head between different video frames is found by using a K-means algorithm;
and acquiring the heads of the different frames by adopting a feature matching method for the areas marked with the same head.
2. The method for identifying and detecting a pedestrian head in an outdoor monitoring scene according to claim 1, wherein the step of acquiring an outdoor pedestrian video comprises:
and acquiring an outdoor pedestrian video signal through a camera to obtain a video source Image (S).
3. The method for identifying and detecting a person's head in an outdoor monitoring scene according to claim 1, wherein the preprocessing comprises:
performing white balance and color gamut adjustment processing on an acquired video source Image (S) of the pedestrian video;
sequentially extracting the foreground and pedestrian parts of the processed video;
wherein :
the white balance and color gamut adjustment processing for the acquired pedestrian video comprises the following steps:
for the acquired pedestrian video, the pixel means R_ave, G_ave and B_ave of its three RGB channels and the gain values K_R, K_G and K_B of the three RGB channels are respectively calculated offline;

the pixel means R_ave, G_ave and B_ave of the three RGB channels are multiplied by the corresponding gain values K_R, K_G and K_B to obtain the gain pixel values of the three channels:

R = K_R × R_ave
G = K_G × G_ave
B = K_B × B_ave
at this time, the video stream of the video source Image (S) becomes Image (R, G, B), and the white balance processing is completed;
performing color gamut adjustment on the video stream Image (R, G, B) to change the video stream Image into Image Re (R, G, B) to finish color gamut adjustment processing;
The method for extracting the foreground and pedestrian parts of the processed video sequentially comprises the following steps:
a foreground video frame ImageRe(R, G, B, i) of the processed video is obtained by a method combining the inter-frame relation with an optical flow algorithm, wherein i is the frame number of the video stream;

useless parts without heads in the foreground video frame ImageRe(R, G, B, i) are preliminarily deleted by erosion and dilation, obtaining the pedestrian part ImageSize(R, G, B, i).
4. The method for identifying and detecting the head of a person in an outdoor monitoring scene according to claim 1, wherein the step of using the pre-trained head identification model comprises any one or more of the following:
-said partitioning of said reconstructed pedestrian video, comprising:
dividing the reconstructed pedestrian video into j sub-video blocks with the same resolution, namely:
ImageSize_r(R, G, B, i, j), j = 1, 2, 3, 4;
-said extracting of human head features from each of said plurality of video blocks, comprising:
for each video block ImageSize_r(R, G, B, i, j), extracting the Centrist LBP operator of the regional block, and carrying out coarse-granularity human head feature detection on the extraction result by using the human head recognition model:
Fe(LBP) = Σ_{i=1..8} δ(g_0, g_i) × 2^(i−1)
δ(g_0, g_i) = 1 if g_i ≥ g_0, and δ(g_0, g_i) = 0 otherwise
wherein Fe(LBP) is the result of encoding based on the Centrist LBP operator, δ(g_0, g_i) is the result of comparing the center pixel gray level of the regional block with a non-center pixel gray level, g_0 is the gray level of the center pixel of the regional block, g_i is the gray level of a non-center pixel of the regional block, and i represents the non-center pixel number;
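A sketch of the Centrist (census transform) coding for a single 3×3 regional block. The claim defines only the center/non-center comparison δ(g_0, g_i); the bit packing order and the comparison direction (≥) used here are assumptions.

```python
import numpy as np

def centrist_lbp(block):
    """Census-transform (Centrist-style LBP) code for one 3x3 regional block.

    g_0 is the center pixel gray level; each of the 8 non-center pixels g_i
    contributes one bit delta(g_0, g_i), packed as Fe(LBP) = sum of
    delta(g_0, g_i) * 2^(i-1). Neighbor order (clockwise from top-left)
    is an assumption.
    """
    g0 = block[1, 1]
    neighbors = [block[0, 0], block[0, 1], block[0, 2],
                 block[1, 2], block[2, 2], block[2, 1],
                 block[2, 0], block[1, 0]]
    code = 0
    for i, gi in enumerate(neighbors):
        code |= (1 if gi >= g0 else 0) << i   # bit i = delta(g_0, g_i)
    return code
```

Sliding this over a block and histogramming the codes gives the coarse-granularity feature vector that the head recognition model classifies.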
-said processing of the extracted head features, obtaining a head identification box of the current video frame, comprising:
performing Hough feature transformation on the obtained head features to obtain Hough feature numbers, and performing secondary feature classification through the head recognition model to obtain a head recognition frame of the current video frame:
ImageSize_r(R, G, B, x_i, y_j, x_m, y_n)
wherein x_i is the abscissa of the lower-left corner of the identification frame, y_j is the ordinate of the lower-left corner, x_m is the abscissa of the upper-right corner, and y_n is the ordinate of the upper-right corner.
5. The method for identifying and detecting the heads in the outdoor monitoring scene according to claim 1, wherein the counting the number of the tracked heads to obtain the number of the heads in the current scene comprises:
and counting the heads bearing different identification marks, the obtained count being the number of human heads in the current scene.
6. A system for identifying and detecting a head of a person in an outdoor monitoring scene, for implementing the method for identifying and detecting a head of a person in an outdoor monitoring scene as set forth in any one of claims 1 to 5, comprising:
the camera acquisition module is used for acquiring outdoor pedestrian videos;
the foreground processing module is used for preprocessing the acquired pedestrian video to obtain a processed pedestrian video;
the ultrahigh-resolution reconstruction module is used for reconstructing the processed pedestrian video in a high-resolution mode to obtain a reconstructed pedestrian video;
the block extraction module is used for blocking the reconstructed pedestrian video to obtain a plurality of video blocks;
the candidate head extraction module is used for extracting head characteristics of the obtained video blocks respectively;
the head classification module is used for processing the extracted head characteristics to obtain a head identification frame of the current video frame;
the ultrahigh-resolution reconstruction module, the block extraction module, the candidate head extraction module and the head classification module form a human head recognition model module;
the pre-training module is used for pre-training the head recognition model module;
the head tracking module is used for tracking the head by using the acquired head identification frame;
and the people head statistics module is used for counting the number of the tracked people heads to obtain the number of the people heads in the current scene, and the people head identification and detection in the outdoor monitoring scene are completed.
7. A terminal, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, is operable to perform the method of any one of claims 1-5 or to run the system of claim 6.
8. A computer readable storage medium having stored thereon a computer program, which when executed by a processor is operative to perform the method of any one of claims 1-5 or to run the system of claim 6.
CN202110579518.5A 2021-05-26 2021-05-26 Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene Active CN113392726B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110579518.5A CN113392726B (en) 2021-05-26 2021-05-26 Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110579518.5A CN113392726B (en) 2021-05-26 2021-05-26 Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene

Publications (2)

Publication Number Publication Date
CN113392726A CN113392726A (en) 2021-09-14
CN113392726B true CN113392726B (en) 2023-06-02

Family

ID=77619197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110579518.5A Active CN113392726B (en) 2021-05-26 2021-05-26 Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene

Country Status (1)

Country Link
CN (1) CN113392726B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926422B (en) * 2022-05-11 2023-07-04 西南交通大学 Method and system for detecting passenger flow of getting on and off vehicles

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132119A (en) * 2020-11-24 2020-12-25 科大讯飞(苏州)科技有限公司 Passenger flow statistical method and device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799935B (en) * 2012-06-21 2015-03-04 武汉烽火众智数字技术有限责任公司 Human flow counting method based on video analysis technology
CN104751491B (en) * 2015-04-10 2018-01-23 中国科学院宁波材料技术与工程研究所 A kind of crowd's tracking and people flow rate statistical method and device
CN105303193B (en) * 2015-09-21 2018-08-14 重庆邮电大学 A kind of passenger number statistical system based on single-frame images processing
CN106951885A (en) * 2017-04-08 2017-07-14 广西师范大学 A kind of people flow rate statistical method based on video analysis
CN111860390A (en) * 2020-07-27 2020-10-30 西安建筑科技大学 Elevator waiting number detection and statistics method, device, equipment and medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132119A (en) * 2020-11-24 2020-12-25 科大讯飞(苏州)科技有限公司 Passenger flow statistical method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113392726A (en) 2021-09-14

Similar Documents

Publication Publication Date Title
Bautista et al. Convolutional neural network for vehicle detection in low resolution traffic videos
Al-Ghaili et al. Vertical-edge-based car-license-plate detection method
CN109033950B (en) Vehicle illegal parking detection method based on multi-feature fusion cascade depth model
CN106683119B (en) Moving vehicle detection method based on aerial video image
CN111738342B (en) Pantograph foreign matter detection method, storage medium and computer equipment
Asmaa et al. Road traffic density estimation using microscopic and macroscopic parameters
CN111639564B (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
CN101482923A (en) Human body target detection and sexuality recognition method in video monitoring
CN111709416A (en) License plate positioning method, device and system and storage medium
Zhang et al. License plate localization in unconstrained scenes using a two-stage CNN-RNN
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN104978567A (en) Vehicle detection method based on scenario classification
CN106127812A (en) A kind of passenger flow statistical method of non-gate area, passenger station based on video monitoring
CN113255837A (en) Improved CenterNet network-based target detection method in industrial environment
Daramola et al. Automatic vehicle identification system using license plate
CN115131580B (en) Space target small sample identification method based on attention mechanism
CN108648210B (en) Rapid multi-target detection method and device under static complex scene
CN114596316A (en) Road image detail capturing method based on semantic segmentation
CN113205107A (en) Vehicle type recognition method based on improved high-efficiency network
CN113392726B (en) Method, system, terminal and medium for identifying and detecting head of person in outdoor monitoring scene
CN113326846B (en) Rapid bridge apparent disease detection method based on machine vision
CN116934820A (en) Cross-attention-based multi-size window Transformer network cloth image registration method and system
CN114663839B (en) Method and system for re-identifying blocked pedestrians
Pratomo et al. Parking detection system using background subtraction and HSV color segmentation
CN115393802A (en) Railway scene unusual invasion target identification method based on small sample learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant