CN113269065B - Method for counting people flow in front of screen based on target detection algorithm

Info

Publication number
CN113269065B
Authority
CN
China
Prior art keywords: target, frame, frames, historical, matching
Prior art date
Legal status
Active
Application number
CN202110530344.3A
Other languages
Chinese (zh)
Other versions
CN113269065A (en)
Inventor
雷李义
Current Assignee
Shenzhen Image Data Technology Co ltd
Original Assignee
Shenzhen Image Data Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Image Data Technology Co ltd
Priority to CN202110530344.3A
Publication of CN113269065A
Application granted
Publication of CN113269065B
Legal status: Active

Classifications

    • G06V40/161 Human faces: detection; localisation; normalisation
    • G06F18/22 Pattern recognition: matching criteria, e.g. proximity measures
    • G06N3/04 Neural networks: architecture, e.g. interconnection topology
    • G06N3/08 Neural networks: learning methods
    • G06V10/25 Image preprocessing: determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06V40/172 Human faces: classification, e.g. identification
    • G06V2201/07 Indexing scheme: target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a method for counting people flow in front of a screen based on a target detection algorithm. The method comprises: a historical information base in which a plurality of historical head frames are stored; receiving a video, the video being divided into a plurality of single-frame images in time order; performing feature extraction on each single-frame image with a target detection neural network model to obtain a plurality of target frames containing category information and position information; filtering the target frames according to the category information and the position information to obtain a plurality of target face frames and a plurality of target head frames; and comparing each target head frame one by one with the historical head frames and with the target face frames, thereby counting the people-flow number and the screen-attention number.

Description

Method for counting people flow in front of screen based on target detection algorithm
Technical Field
The application relates to the technical field of face recognition, and in particular to a method for counting people flow in front of a screen based on a target detection algorithm.
Background
With the development of artificial intelligence technology, people-flow statistics based on video streams has advanced rapidly and is applied in many public scenes such as scenic spots, residential communities and shopping malls. However, existing applications usually focus on tracking and counting pedestrians; they lack statistics on how many people actually pay attention to a screen and therefore cannot measure how attractive the displayed content is to users, even though such data is very important for a commercial screen.
Summary of the application
(I) Technical problem to be solved
The application provides a method for counting people flow in front of a screen based on a target detection algorithm, which solves the technical problem that the prior art can only count the people flow in front of a screen and cannot count the screen-attention number, i.e., the number of people who pay attention to the screen content.
(II) Technical solution
In order to achieve the above purpose, the present application provides the following technical solutions:
a method for counting the flow of people in front of a screen based on a target detection algorithm comprises the following steps:
step S1: establishing a historical information base in which a plurality of historical head frames are stored;
step S2: receiving a video, wherein the video is divided into a plurality of single-frame images according to the time sequence;
step S3: performing feature extraction on the single-frame image through a target detection neural network model to obtain a plurality of target frames containing category information and position information;
step S4: filtering the target frames according to the category information and the position information to obtain a plurality of target face frames and a plurality of target head frames;
step S5: comparing the target head frame with the plurality of historical head frames one by one, outputting a first matching value for each comparison, and judging whether the first matching value is greater than a first threshold: if so, the target head frame and the historical head frame are judged to belong to the same pedestrian and the people-flow number is not updated; otherwise they are judged to belong to different pedestrians and the people-flow number is incremented by 1; this repeats until all target head frames have been compared;
step S6: comparing the target head frame with the plurality of target face frames one by one, outputting a second matching value after each comparison, and judging whether the second matching value is greater than a second threshold: if so, the target head frame is judged to have paid attention to the screen content and the screen-attention number is incremented by 1; otherwise the target head frame is judged not to have paid attention to the screen content and the screen-attention number is not updated; this repeats until all target head frames have been compared;
step S7: updating the historical information base: when the first matching value is greater than the first threshold, replacing the corresponding historical head frame in the historical information base with the target head frame; when the first matching value is smaller than the first threshold, adding the target head frame to the historical information base and marking it as a historical head frame.
Preferably, the target detection neural network model is established based on an SSD target detection algorithm and is trained on real head images and angle-limited face images, the limited angle range being a horizontal rotation of -45° to +45°.
Preferably, the category information includes head images, face images and background images, and the position information is the relative coordinates [x0, y0, x1, y1] of the target frame within the single-frame image.
Preferably, step S4 comprises:
step S41: filtering distant target frames: calculating the width-height product of each target frame from its relative coordinates, and filtering out target frames whose width-height product is smaller than 0.03;
step S42: filtering static target frames: obtaining the center-coordinate offsets dx, dy and the width-height offsets dw, dh between corresponding target frames from their relative coordinates; a target frame whose dx, dy, dw and dh are all smaller than 0.02 is judged to be static in the single-frame image; the number of such single-frame images is counted, and when the accumulated static time of a target frame exceeds 1 minute, the object corresponding to that target frame is judged to be a static target and is filtered out;
step S43: obtaining the target head frames and target face frames, according to the category information, from the target frames remaining after the filtering of step S41 and step S42.
Preferably, the video is a real-time video or a historical video, and the single-frame image is a video frame captured every 0.2 seconds from the video.
Preferably, the first matching value is the intersection-over-union (IoU) of the target head frame and the historical head frame, its value range is 0 to 1, and the first threshold is 0; when the first matching value is greater than the first threshold 0, the pedestrians corresponding to the target head frame and the historical head frame are judged to be the same pedestrian; otherwise they are different pedestrians, and the people-flow number is incremented by 1.
Preferably, step S5 further comprises a verification of the people-flow number: counting lines are arranged on both sides of the edge of the picture to detect pedestrians entering or leaving the picture and to record the entering count and the leaving count; for a target head frame and a historical head frame judged to be the same pedestrian, the direction of motion of the corresponding pedestrian is judged from the positions of the two frames relative to the two counting lines, and the entering and leaving data are recorded only once for the same tracked pedestrian;
the people-flow number, entering count and leaving count over a period of time are combined to obtain the verified people-flow number: verified people-flow number = [(entering count + leaving count)/2 + people-flow number]/2.
Preferably, the second matching value includes a similarity value and a matching count: the similarity value is the intersection-over-union of the target head frame and the target face frame, with a value range of 0 to 1, and the matching count is the number of times the target head frame and the target face frame have matched successfully; the second threshold includes a similarity threshold of 0.3 and a matching threshold of 15, i.e., when the similarity value is greater than the similarity threshold, the target head frame and the target face frame are judged to belong to the same pedestrian, and when the matching count is greater than the matching threshold, the screen-attention number is incremented by 1.
Preferably, updating the historical information base further comprises counting the number of times a historical head frame fails to match; when this number exceeds 5, the historical head frame is deleted.
(III) Advantageous effects
Compared with the prior art, the beneficial effects of the present application are:
the application provides a method for counting the flow of people before a screen based on a target detection algorithm, wherein a target detection neural network model of image data with heads and image data with limited angles is established to output a target head frame of the pedestrian passing in a period of time before the screen, and the target head frame is tracked and synchronized with historical head frame information to complete the flow of people before the screen in a period of time; and counting the number of pedestrians who pass through the screen and pay attention to the screen content within a period of time, namely the screen attention number, by adopting the mutual matching of the target human face frame and the target human face frame with the angle limit.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application and not to limit the application, in which:
FIG. 1 shows an overall flow diagram of an embodiment of the present application;
FIG. 2 illustrates a target detection neural network model logic diagram of an embodiment of the present application;
FIG. 3 shows a flowchart of step S4 of an embodiment of the present application;
FIG. 4 shows an overall logic diagram of an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1 and fig. 4, the embodiment of the present application discloses a method for counting people flow in front of a screen based on a target detection algorithm. It is mainly intended for advertisement screens in shopping malls and scenic spots with heavy people flow, and counts the people flow and the screen-attention number in front of the screen within a period of time. The method comprises the following steps:
step S1: establishing a historical information base, wherein a plurality of historical head frames are stored in the historical information base;
step S2: receiving a video, wherein the video is divided into a plurality of single-frame images in time order; specifically, the video is a real-time video or a historical video, and each single-frame image is a video frame captured every 0.2 seconds. In this embodiment, the video can be acquired with a camera installed on the screen.
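As an illustrative sketch only (the patent does not name a video-decoding library), sampling one single-frame image every 0.2 seconds could look as follows in Python with OpenCV; the function name and the 25 fps fallback are assumptions:

    import cv2  # assumption: OpenCV for decoding; the patent does not specify a library

    def sample_frames(source, interval_s=0.2):
        """Yield one frame every interval_s seconds from a video file or camera stream."""
        cap = cv2.VideoCapture(source)
        fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back to 25 fps if metadata is missing
        step = max(1, round(fps * interval_s))    # decoded frames per sampled frame
        index = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % step == 0:
                yield frame
            index += 1
        cap.release()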
step S3: performing feature extraction on the single-frame image through the target detection neural network model to obtain a plurality of target frames containing category information and position information;
specifically, the target detection neural network model is established based on an SSD target detection algorithm and is obtained by inputting a real human head image and a human face image with a limited angle for training, wherein the limited angle range is a horizontal rotation angle of-45 degrees to +45 degrees, so that the human face image which looks at the screen in front of the screen is screened subsequently, and the number of people paying attention to the screen is counted; referring to fig. 2, the image firstly extracts the bottom visual features for the small-scale target through 15 layers of directly connected convolutional neural networks, then extracts the middle visual features for the medium-scale target through 6 layers of directly connected convolutional neural networks, and then further extracts the high-level visual features for the large-scale target through 6 layers of directly connected convolutional neural networks. Performing regression on the visual features of the three layers through two independent two-layer convolutional neural networks respectively to obtain category information and position information of the target frame; because the target detection neural network model can output a large number of overlapped target frames, and because the non-maximum suppression algorithm can screen the target frames with high overlapping degree, only the target frames with high confidence coefficient are reserved, and the overlapped target frames with low confidence coefficient are removed, the target frames with category information and position information output by the target neural network model are filtered by the non-maximum suppression algorithm, and finally the target frames with category information and position information are output.
The category information comprises head images, face images and background images, and the position information is the relative coordinates [x0, y0, x1, y1] of the target frame within the single-frame image. The face images are faces within the limited angle of -45° to +45°: a face at 0° corresponds to a pedestrian standing in front of the screen and looking straight at it; a face between 0° and +45° corresponds to a pedestrian turning left toward the screen while walking; and a face between 0° and -45° corresponds to a pedestrian turning right toward the screen while walking.
step S4: filtering the target frames according to the category information and the position information to obtain a plurality of target face frames and a plurality of target head frames;
referring to fig. 3, step S4 includes:
step S41: filtering distant target frames: calculating the width-height product of each target frame from its relative coordinates, and filtering out target frames whose width-height product is smaller than 0.03 (see the code sketch after step S43). Specifically, the people flow in front of the screen mainly concerns pedestrians close enough to see the screen content, so pedestrians far from the screen are filtered out first, and only pedestrians within a certain distance of the screen are counted. The average sizes of head frames and face frames at different distances are obtained by field measurement, and the target frames produced by the target detection neural network model are traversed during counting to remove undersized frames. In this embodiment, testing the screen's camera showed that the width-height product of a target frame about 3 meters from the screen is roughly 0.03, so target frames farther than that are not counted; accordingly, target frames with a width-height product smaller than 0.03 are filtered out.
Step S42: filtering static target frames: obtaining central coordinate offsets dx and dy and width and height offsets dw and dh between different target frames according to the relative coordinates of the target frames, judging that the target frames with the dx, dy, dw and dh smaller than 0.02 are static in the single-frame images, counting the frame number of the different single-frame images, and judging that the object corresponding to the target frame is a static target and filtering when the static accumulated time of the target frame exceeds 1 minute; specifically, because the environment in a market is complex, a billboard with a portrait may exist in the background of the picture, and therefore filtering of the static target frame is added. Comparing the positions and sizes of a target frame in the current single-frame image and a historical human head frame in a historical information base, when the difference between the positions and the sizes is smaller than a certain threshold value, the target of the current frame is considered to be in a static state, namely the central coordinate offsets dx and dy and the width and height offsets dw and dh between different target frames, and setting the target frames with dx, dy, dw and dh smaller than 0.02 to judge that the target frames are static in the single-frame image; and counting the number of frames of the single-frame image with the target frame, considering the target as a static background target when the static time or the number of times of the target frame exceeds a certain threshold, namely the static accumulated time of the target frame exceeds 1 minute or the number of times of the target frame at the position exceeds 300 times, determining that the object corresponding to the target frame at the position is static, and filtering the target frame at the position detected later.
Step S43: and obtaining a target human head frame and a target human face frame for the residual target frames filtered in the step S41 and the step S42 according to the category information.
Step S5: comparing the target person head frame with a plurality of historical person head frames one by one, outputting a first matching value each time of comparison, and judging that the first matching value is greater than a first threshold value? If the first matching value is larger than the first threshold value, the target pedestrian head frame and the historical pedestrian head frame are judged to be the same pedestrian, the pedestrian flow number is not updated, otherwise, the target pedestrian head frame and the historical pedestrian head frame are judged to be different pedestrians, and the pedestrian flow number is added by 1; until the comparison of all target person head frames is completed; specifically, the first matching value is an intersection ratio of the target human head frame and the historical human head frame, the value range of the first matching value is 0 to 1, the first threshold value is 0, when the first matching value is larger than the first threshold value 0, it is determined that the pedestrians corresponding to the target human head frame and the historical human head frame are the same pedestrian, otherwise, the pedestrians are different pedestrians, and the pedestrian flow number is increased by 1. And traversing and comparing the target head frame of the current frame with the historical head frames in the historical record library, and judging whether the two target frames are the same pedestrian or not by calculating the intersection and parallel ratio between the two frames.
If the target head frame of the current frame fails to match any historical head frame in the historical information base, it is judged to belong to a different pedestrian, and the people-flow number is incremented by 1;
If the target head frame of the current frame matches a historical head frame in the historical information base successfully, they are judged to be the same pedestrian; the people-flow number is then further verified, using the position changes over time in the picture of the target frames judged to be the same pedestrian, to obtain the verified people-flow number: counting lines are arranged on both sides of the edge of the picture to detect pedestrians entering or leaving the picture and to record the entering count and the leaving count; for a target head frame and a historical head frame judged to be the same pedestrian, the direction of motion of the corresponding pedestrian is judged from the positions of the two frames relative to the two counting lines, and the entering and leaving data are recorded only once for the same tracked pedestrian;
The people-flow number, entering count and leaving count over a period of time are combined to obtain the verified people-flow number: verified people-flow number = [(entering count + leaving count)/2 + people-flow number]/2. The final verified people-flow number is the number of people passing in front of the screen within the period; the period can be adjusted to the actual use scene and is typically set to 1 day.
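The verification formula is direct arithmetic; a minimal sketch with an illustrative worked example:

    def verified_flow(flow_count, entered, left):
        """Verified people-flow number = [(entered + left)/2 + flow_count] / 2."""
        return ((entered + left) / 2 + flow_count) / 2

    # e.g. flow_count=100, entered=52, left=48 -> ((52 + 48)/2 + 100)/2 = 75.0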
Because the face angle is limited in the target detection neural network model, a pedestrian corresponding to a detected face frame can be considered to be facing the screen and able to see the content it displays. The number of pedestrians paying attention to the screen, i.e., the screen-attention number, is therefore counted by matching the target head frames with the target face frames, as in step S6:
step S6: comparing the target head frame with the plurality of target face frames one by one, outputting a second matching value after each comparison, and judging whether the second matching value is greater than the second threshold: if so, the target head frame is judged to have paid attention to the screen content and the screen-attention number is incremented by 1; otherwise the target head frame is judged not to have paid attention to the screen content and the screen-attention number is not updated; this repeats until all target head frames have been compared;
Specifically, the second matching value includes a similarity value and a matching count: the similarity value is the intersection-over-union of the target head frame and the target face frame, with a value range of 0 to 1, and the matching count is the number of times the target head frame and the target face frame have matched successfully; the second threshold includes a similarity threshold of 0.3 and a matching threshold of 15. The target head frames and target face frames of the current single-frame image are compared by traversal, computing the IoU between each pair, i.e., the similarity value; when the similarity value is greater than the similarity threshold 0.3, the target head frame and the target face frame are considered the same pedestrian, and the matching count of that pair is recorded; when the matching count exceeds the matching threshold of 15, the pedestrian is judged to be paying attention to the screen, and the screen-attention number is incremented by 1.
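A sketch of this attention bookkeeping under the stated thresholds of 0.3 and 15; the per-track plumbing (head_id, match_count, counted) is an assumption, since the patent specifies only the thresholds, and iou() is the overlap function sketched under step S5:

    SIM_THRESHOLD = 0.3    # head/face IoU above this: same pedestrian
    MATCH_THRESHOLD = 15   # matches above this: the pedestrian attended to the screen

    def update_attention(head_id, head_box, face_boxes, match_count, counted, attention_total):
        """Update one head frame's match count against the current frame's face frames."""
        if any(iou(head_box, f) > SIM_THRESHOLD for f in face_boxes):
            match_count[head_id] = match_count.get(head_id, 0) + 1
            if match_count[head_id] > MATCH_THRESHOLD and head_id not in counted:
                counted.add(head_id)       # count each pedestrian at most once
                attention_total += 1
        return attention_total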
Because the camera is mounted on the screen, it can only capture pictures at a horizontal angle, and in crowded settings single-frame images with overlapping pedestrians are hard to avoid. The historical head frames recorded in the historical information base therefore cannot simply be overwritten with the target-frame information of the current single-frame image; instead the historical information base must be updated with the strategy of step S7, so that when a pedestrian is briefly occluded, disappears from the picture and then reappears, the corresponding historical head frame in the historical information base is not lost.
Step S7: and updating the historical information base, replacing the corresponding historical person head frame in the historical information base with the target person head frame when the first matching value is larger than the first threshold value, and adding the target person head frame into the historical information base and marking as the historical person head frame when the first matching value is smaller than the first threshold value. Further, updating the historical information base also comprises counting the times of no matching success of the historical person head frame, and when the times of no matching of the historical person head frame exceeds 5 times, deleting the historical person head frame.
It should be noted that although embodiments of the present application have been shown and described, it would be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the application, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A method for counting people flow in front of a screen based on a target detection algorithm, characterized by comprising the following steps:
step S1: establishing a historical information base storing a plurality of historical head frames;
step S2: receiving a video, wherein the video is divided into a plurality of single-frame images according to the time sequence;
step S3: performing feature extraction on the single-frame image through a target detection neural network model to obtain a plurality of target frames containing category information and position information; the target detection neural network model is established based on an SSD target detection algorithm and is trained on real head images and angle-limited face images, the limited angle range being a horizontal rotation of -45° to +45°;
step S4: filtering the target frames according to the category information and the position information to obtain a plurality of target face frames and a plurality of target head frames;
step S5: comparing the target head frame with the plurality of historical head frames one by one and outputting a first matching value for each comparison, the first matching value being the intersection-over-union of the target head frame and the historical head frame, with a value range of 0 to 1 and a first threshold of 0; judging whether the first matching value is greater than the first threshold 0: if so, the target head frame and the historical head frame are judged to belong to the same pedestrian and the people-flow number is not updated; otherwise they are judged to belong to different pedestrians and the people-flow number is incremented by 1; until all target head frames have been compared;
step S6: comparing the target head frame with the plurality of target face frames one by one, outputting a second matching value after each comparison, and judging whether the second matching value is greater than a second threshold: if so, the target head frame is judged to have paid attention to the screen content and the screen-attention number is incremented by 1; otherwise the target head frame is judged not to have paid attention to the screen content and the screen-attention number is not updated; until all target head frames have been compared; specifically, the second matching value includes a similarity value and a matching count, the similarity value being the intersection-over-union of the target head frame and the target face frame with a value range of 0 to 1, and the matching count being the number of times the target head frame and the target face frame have matched successfully; the second threshold includes a similarity threshold of 0.3 and a matching threshold of 15, i.e., when the similarity value is greater than the similarity threshold, the target head frame and the target face frame are judged to belong to the same pedestrian, and when the matching count is greater than the matching threshold, the screen-attention number is incremented by 1;
step S7: updating the historical information base: when the first matching value is greater than the first threshold, replacing the corresponding historical head frame in the historical information base with the target head frame; when the first matching value is smaller than the first threshold, adding the target head frame to the historical information base and marking it as a historical head frame.
2. The method according to claim 1, wherein the category information comprises head images, face images and background images, and the position information is the relative coordinates [x0, y0, x1, y1] of the target frame within the single-frame image.
3. The method for counting people flow in front of a screen based on a target detection algorithm according to claim 2, wherein step S4 comprises:
step S41: filtering distant target frames: calculating the width-height product of each target frame from its relative coordinates, and filtering out target frames whose width-height product is smaller than 0.03;
step S42: filtering static target frames: obtaining the center-coordinate offsets dx, dy and the width-height offsets dw, dh between corresponding target frames from their relative coordinates; a target frame whose dx, dy, dw and dh are all smaller than 0.02 is judged to be static in the single-frame image; the number of such single-frame images is counted, and when the accumulated static time of a target frame exceeds 1 minute, the object corresponding to that target frame is judged to be a static target and is filtered out;
step S43: obtaining the target head frames and target face frames, according to the category information, from the target frames remaining after the filtering of step S41 and step S42.
4. The method according to claim 1, wherein the video is a real-time video or a historical video, and the single-frame image is a video frame captured every 0.2 seconds from the video.
5. The method for counting people flow in front of a screen based on a target detection algorithm according to claim 4, wherein step S5 further comprises a verification of the people-flow number: counting lines are arranged on both sides of the edge of the picture to detect pedestrians entering or leaving the picture and to record the entering count and the leaving count; for a target head frame and a historical head frame judged to be the same pedestrian, the direction of motion of the corresponding pedestrian is judged from the positions of the two frames relative to the two counting lines, and the entering and leaving data are recorded only once for the same tracked pedestrian;
the people-flow number, entering count and leaving count over a period of time are combined to obtain the verified people-flow number: verified people-flow number = [(entering count + leaving count)/2 + people-flow number]/2.
6. The method of claim 1, wherein updating the historical information base further comprises counting the number of times a historical head frame fails to match; when this number exceeds 5, the historical head frame is deleted.
CN202110530344.3A 2021-05-14 2021-05-14 Method for counting people flow in front of screen based on target detection algorithm Active CN113269065B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110530344.3A CN113269065B (en) 2021-05-14 2021-05-14 Method for counting people flow in front of screen based on target detection algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110530344.3A CN113269065B (en) 2021-05-14 2021-05-14 Method for counting people flow in front of screen based on target detection algorithm

Publications (2)

Publication Number Publication Date
CN113269065A CN113269065A (en) 2021-08-17
CN113269065B (en) 2023-02-28

Family

ID=77230919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110530344.3A Active CN113269065B (en) 2021-05-14 2021-05-14 Method for counting people flow in front of screen based on target detection algorithm

Country Status (1)

Country Link
CN (1) CN113269065B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105844234A (en) * 2016-03-21 2016-08-10 商汤集团有限公司 People counting method and device based on head shoulder detection
CN108647612A (en) * 2018-04-28 2018-10-12 成都睿码科技有限责任公司 Billboard watches flow of the people analysis system
CN108805619A (en) * 2018-06-07 2018-11-13 肇庆高新区徒瓦科技有限公司 A kind of stream of people's statistical system for billboard
CN111353461A (en) * 2020-03-11 2020-06-30 京东数字科技控股有限公司 Method, device and system for detecting attention of advertising screen and storage medium
CN111832465A (en) * 2020-07-08 2020-10-27 星宏集群有限公司 Real-time head classification detection method based on MobileNet V3
CN112036345A (en) * 2020-09-04 2020-12-04 京东方科技集团股份有限公司 Method for detecting number of people in target place, recommendation method, detection system and medium
WO2020252924A1 (en) * 2019-06-19 2020-12-24 平安科技(深圳)有限公司 Method and apparatus for detecting pedestrian in video, and server and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10867391B2 (en) * 2018-09-28 2020-12-15 Adobe Inc. Tracking viewer engagement with non-interactive displays
CN110166830A (en) * 2019-05-27 2019-08-23 航美传媒集团有限公司 The monitoring system of advertisement machine electronic curtain

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Outdoor advertising effect evaluation system based on intelligent video analysis; Li Yao (李尧); China Master's Theses Full-text Database; 2019-02-15 (No. 02); pp. 1-64 *

Also Published As

Publication number Publication date
CN113269065A (en) 2021-08-17

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant