CN116524431A - Method and device for detecting number of people, electronic equipment, storage medium and conference control system - Google Patents
- Publication number
- CN116524431A (application CN202310384854.3A)
- Authority
- CN
- China
- Prior art keywords
- video frames
- target video
- people
- time
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 45
- 238000001514 detection method Methods 0.000 claims abstract description 199
- 238000003384 imaging method Methods 0.000 claims abstract description 13
- 238000013135 deep learning Methods 0.000 claims description 14
- 230000003139 buffering effect Effects 0.000 claims description 4
- 230000008569 process Effects 0.000 description 10
- 238000010586 diagram Methods 0.000 description 7
- 238000005516 engineering process Methods 0.000 description 6
- 230000009471 action Effects 0.000 description 3
- 230000008859 change Effects 0.000 description 3
- 230000003287 optical effect Effects 0.000 description 3
- 238000012217 deletion Methods 0.000 description 2
- 230000037430 deletion Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 238000007726 management method Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
Abstract
The application provides a method, an apparatus, an electronic device, a storage medium and a conference control system for detecting the number of people, wherein the conference control system comprises: a video imaging module, configured to acquire video data of a target area and transmit the acquired video data to a people number detection module; and the people number detection module, configured to receive the video data transmitted by the video imaging module, detect the number of people according to N target video frames taking the current detection time as the end time to obtain a people number detection result, and upload the people number detection result to a conference control platform. The conference control system can effectively protect personal privacy on the premise of ensuring high accuracy of people number detection.
Description
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a method and apparatus for detecting a number of people, an electronic device, a storage medium, and a conference control system.
Background
In order to respond to the environmental protection concepts of carbon peaking and carbon neutrality, an efficient conference control and management system is indispensable. Whether such a system is efficient depends on whether it can accurately identify the usage information of conference rooms, so that conference room resources can be controlled accordingly.
The main means by which current conference control systems acquire people number information are PIR (pyroelectric infrared sensor) technology and video technology.
PIR technology senses whether a temperature change occurs in a designated area to judge whether people are present in the scene, and can thus provide good privacy protection. However, PIR accuracy is low: it cannot detect a target in a completely stationary state, and it has a high missed-detection rate.
Video technology uses deep learning to detect the specific number of people in the picture, so its accuracy is higher; however, conference room scenes have strict privacy requirements, which plain video technology cannot meet.
Disclosure of Invention
In view of this, the present application provides a method, an apparatus, an electronic device, a storage medium, and a conference control system for detecting the number of people, so as to effectively protect personal privacy on the premise of ensuring high accuracy of people number detection.
Specifically, the application is realized by the following technical scheme:
according to a first aspect of embodiments of the present application, there is provided a conference control system, including:
the system comprises a video imaging module, a people number detection module and a conference control platform; wherein:
the video imaging module is used for acquiring video data of the target area and transmitting the acquired video data to the people number detection module;
the people number detection module is used for receiving the video data transmitted by the video imaging module, and detecting the number of people according to N target video frames taking the current detection moment as the end moment to obtain a people number detection result;
the people number detection module is also used for uploading the people number detection result to the conference control platform.
According to a second aspect of embodiments of the present application, there is provided a person number detection method, including:
acquiring video data of a target area in real time;
detecting the number of people according to N target video frames taking the current detection time as the end time, and obtaining a people number detection result; wherein N is greater than or equal to 2; the target video frames are the video frames used for people number detection; the target video frames are not displayed, and video frames other than the N target video frames taking the current detection time as the end time are not stored.
According to a third aspect of the embodiments of the present application, there is provided a person number detection apparatus including:
an acquisition unit for acquiring video data of a target area in real time;
the detection unit is configured to detect the number of people according to N target video frames taking the current detection time as the end time, and obtain a people number detection result; wherein N is greater than or equal to 2; the target video frames are the video frames used for people number detection; the target video frames are not displayed, and video frames other than the N target video frames taking the current detection time as the end time are not stored.
According to a fourth aspect of embodiments of the present application, there is provided an electronic device comprising a processor and a memory storing machine executable instructions executable by the processor, the processor being configured to execute the machine executable instructions to implement the method provided in the second aspect.
According to a fifth aspect of embodiments of the present application, there is provided a storage medium having stored therein machine executable instructions which when executed by a processor implement the method provided in the second aspect.
The technical solutions provided by the present application can bring at least the following beneficial effects:
Video data of the target area is acquired in real time by the video imaging module, and the acquired video data is transmitted to the people number detection module; the people number detection module detects the number of people according to N target video frames taking the current detection time as the end time, and transmits the people number detection result to the conference control platform, so that the accuracy of people number detection is improved by means of video detection. In addition, during people number detection, the video frames used for people number detection, namely the target video frames, are not displayed, and video frames other than those currently used for people number detection are not stored, so that the risk of personal privacy information disclosure is reduced and privacy protection is effectively realized.
Drawings
FIG. 1 is a schematic diagram of a conference control system according to an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of a people detection module shown in an exemplary embodiment of the present application;
fig. 3 is a flow chart illustrating a method for detecting the number of people according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of a buffered frame at time t1 according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of a buffered frame at time t2 according to an exemplary embodiment of the present application;
fig. 6 is a schematic structural view of a person number detection device shown in an exemplary embodiment of the present application;
fig. 7 is a schematic diagram of a hardware structure of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as recited in the appended claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to better understand the technical solutions provided by the embodiments of the present application and make the above objects, features and advantages of the embodiments of the present application more obvious, the technical solutions in the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
It should be noted that the sequence numbers of the steps in the embodiments of the present application do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation process of the embodiments of the present application in any way.
Referring to fig. 1, a schematic structural diagram of a conference control system according to an embodiment of the present application is shown in fig. 1, where the conference control system may include: a video imaging module 110, a person number detection module 120, and a conference control platform 130; wherein:
the video imaging module 110 is configured to acquire video data of a target area, and transmit the acquired video data to the number-of-people detection module 120;
the people number detection module 120 is configured to receive the video data transmitted by the video imaging module 110, detect the number of people according to N target video frames taking the current detection time as the end time, obtain a people number detection result, and transmit the people number detection result to the conference control platform 130; wherein N is greater than or equal to 2; the target video frames are the video frames used for people number detection; the target video frames are not displayed, and video frames other than the N target video frames taking the current detection time as the end time are not stored;
the people number detection module 120 is further configured to upload the people number detection result to the conference control platform.
The conference control platform 130 may be configured to control the designated devices of the target area according to the people number detection result, or to record/count the people number detection results.
By way of example, the designated devices may include, but are not limited to, one or more of an air conditioner, a fan, a lamp, a projector, a display device, etc., of the target area.
By way of example, the target area may be an area where people detection is required, such as a conference room.
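The platform-side control described above can be sketched as follows. This is an illustrative assumption only: the patent does not specify how the conference control platform acts on the designated devices, and the `control_target_area` function and device dictionaries below are hypothetical.

```python
# Hypothetical sketch of the conference control platform acting on a people
# number detection result; device names and the on/off rule are illustrative.
def control_target_area(people_count, devices):
    """Switch designated devices on/off based on the detected head count."""
    occupied = people_count > 0
    for device in devices:       # e.g. air conditioner, fan, lamp, projector
        device["on"] = occupied  # power devices only while the room is in use
    return devices

devices = [{"name": "air_conditioner", "on": True}, {"name": "lamp", "on": True}]
control_target_area(0, devices)      # empty room: switch everything off
print([d["on"] for d in devices])    # -> [False, False]
```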
In the embodiment of the application, in order to reasonably control the designated device of the target area, the video imaging module may, for example, obtain the video data of the target area through a video acquisition device (such as a camera) disposed in the target area, and transmit the obtained video data to the number detection module.
The people number detection module can detect the number of people according to the received video data.
In the embodiment of the application, in order to provide better privacy protection under the condition of ensuring the accuracy of the detection of the number of people in the target area, the target video frames are not displayed in the process of detecting the number of people, and other video frames except for N target video frames taking the current detection time as the end time are not stored, so that the privacy protection is realized to the greatest extent.
For example, taking the current detection time as the time t1 as an example, the people number detection module may perform people number detection on N consecutive video frames with the time t1 as the end time.
Wherein N is greater than or equal to 2, and the specific value of N may be determined according to the requirements of the adopted people number detection algorithm, so as to ensure the accuracy of people number detection.
It should be noted that, in the embodiments of the present application, "not storing" video frames means that, for the purpose of people number detection, the relevant video frames need not be stored; the embodiments of the present application do not preclude storing video frames for other purposes.
The person number detection module can transmit the person number detection result to the conference control platform under the condition that the person number detection module obtains the person number detection result of the target area.
The conference control platform can control the appointed equipment of the target area according to the received detection result of the number of people.
In some embodiments, as shown in fig. 2, the people detection module may include: the cache area and the algorithm identification module; wherein:
the buffer area is used for buffering N target video frames taking the current detection time as the end time;
the algorithm identification module is configured to detect, using a deep learning method, the N target video frames buffered in the buffer area taking the current detection time as the end time, so as to obtain a people number detection result. Specifically, the people number detection result may be obtained through head-shoulder detection, or through other deep learning algorithms.
For example, in order to better realize privacy protection while ensuring the implementation of the people count detection, a buffer area for storing (buffering) the target video frames may be provided, the target video frames currently used for the people count detection may be stored in the buffer area, and other acquired video frames may not need to be stored.
In addition, in order to further improve the privacy protection effect, the head-shoulder detection method may be used for detecting the number of people.
The algorithm identification module in the people number detection module can adopt a deep learning method to detect the head and the shoulder of N target video frames which are cached in the cache area and take the current detection time as the end time, so as to obtain a people number detection result.
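The algorithm identification step can be sketched as below. The detector interface `detect_head_shoulders(frame)` (returning one box per person) and the count-fusion rule (most frequent per-frame count) are illustrative assumptions; the patent only specifies that a deep learning method performs head-shoulder detection on the N buffered target video frames, without fixing the model or the aggregation rule.

```python
# Minimal sketch of counting people from the N buffered target video frames,
# assuming a generic head-shoulder detector callback (a deep-learning model in
# the patent; a stub here). The fusion rule is a hypothetical choice.
def count_people(frames, detect_head_shoulders):
    """Run head-shoulder detection per frame and fuse the per-frame counts."""
    counts = [len(detect_head_shoulders(f)) for f in frames]
    # Taking the most frequent per-frame count is one plausible fusion rule.
    return max(set(counts), key=counts.count)

# Stub detector for illustration: every frame appears to contain 3 people.
print(count_people(["f1", "f2", "f3"], lambda f: [(0, 0, 10, 10)] * 3))  # -> 3
```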
In one example, the buffer may be specifically configured to store the N target video frames taking the current detection time as the end time, so as to overwrite the N target video frames taking the previous detection time as the end time.
For example, the buffer may store the target video frames currently used for people number detection in an overwrite manner.
Correspondingly, for video data acquired in real time, the N target video frames taking the current detection time as the end time, that is, the target video frames currently used for people number detection, may be stored in a preset buffer so as to overwrite the N target video frames taking the previous detection time as the end time.
For example, assuming that time t2 is the current detection time and time t1 is the previous detection time, the target video frames buffered at time t1 are video frames 1 to N, and the N target video frames taking time t2 as the end time are video frames 2 to (N+1); then, at time t2, video frames 2 to (N+1) may be stored in the buffer, overwriting video frames 1 to N.
As an example, the buffer may be specifically configured to delete the first stored target video frame and store the currently acquired target video frame when the number of stored target video frames reaches N frames.
For example, in order to improve the storage efficiency of the target video frame, the buffer may implement the overlay storage of the target video frame in a first-in first-out manner.
For example, for a currently acquired target video frame that needs to be stored in the buffer, it may first be determined whether the number of target video frames stored in the buffer has reached N.
If the number of target video frames stored in the preset buffer has reached N, the earliest-stored target video frame may be deleted, and the currently acquired target video frame may be stored in the preset buffer.
It should be noted that, in the embodiments of the present application, if the number of target video frames stored in the preset buffer has not reached N, no target video frame needs to be deleted, and the currently acquired target video frame may be stored in the preset buffer directly.
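The first-in first-out overwrite behavior of the preset buffer can be sketched with a bounded deque. The value N = 8 and the `on_target_frame` helper are illustrative assumptions; the patent only requires N ≥ 2, with the exact value left to the needs of the detection algorithm.

```python
from collections import deque

# Sketch of the preset buffer: first-in first-out overwrite of N target frames.
N = 8
buffer = deque(maxlen=N)  # appending an (N+1)-th frame drops the oldest one

def on_target_frame(frame):
    """Store a newly acquired target frame; return True once a full window
    of N frames is buffered and a detection round may be triggered."""
    buffer.append(frame)
    return len(buffer) == N

for i in range(1, 11):   # feed 10 frames into an 8-frame buffer
    on_target_frame(i)
print(list(buffer))      # frames 1 and 2 have been overwritten
```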
In some embodiments, the target video frame may include: and acquiring video frames in real time.
For example, in order to improve the accuracy of the people number detection, the target video frame may include video frames acquired in real time, that is, the people number detection may be performed on N consecutive video frames acquired in real time, to obtain the people number detection result.
In other embodiments, the target video frame may include: and extracting the obtained video frames from the video data acquired in real time according to a preset interval.
For example, considering that a person in a target area (e.g., a conference room) does not generally move at a high speed, the target video frame may further include a video frame extracted from video data acquired in real time at a preset interval.
For example, with an extraction interval of 1, 1 target video frame may be extracted for every 2 acquired video frames (that is, every other frame is taken as a target video frame).
In some embodiments, the frequency of people number detection is 1/M of the frame rate of the video data, wherein M is a positive integer and M is greater than or equal to 1.
In one example, m=1.
For example, in order to realize real-time people number detection, the frequency of people number detection may equal the frame rate of the real-time video data; that is, each time 1 new video frame is acquired, 1 round of people number detection may be performed (once the number of acquired video frames reaches N), so that the number of people in the target area can be detected in real time, thereby improving the real-time performance of resource control in the target area.
In another example, M > 1.
For example, considering that in most cases the number of people in the target area does not change greatly over a span of several frames, in order to reduce the workload of people number detection, the frequency of people number detection may be lower than the frame rate of the real-time video data; that is, it is not necessary to perform 1 round of people number detection for every newly acquired video frame.
For example, the frequency of people number detection may be 1/2 of the frame rate of the real-time video data (i.e., M=2); that is, 1 round of people number detection is performed for every 2 newly acquired video frames.
It should be noted that, in the embodiments of the present application, when the target video frames are the video frames acquired in real time, M may be any value greater than or equal to 1; when the target video frames are extracted from the acquired real-time video data at a preset interval (assume the interval is m, with m greater than or equal to 1), M should be greater than or equal to (m+1) and should be an integer multiple of (m+1), so as to avoid performing people number detection on the same target video frames multiple times.
For example, if m=1, that is, 1 target video frame is extracted for every 2 acquired frames, the set of target video frames is updated once every 2 frames; if M=1 (i.e., M &lt; m+1), the same set of target video frames would undergo head-shoulder detection in 2 consecutive detection rounds.
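The constraint above (M a positive integer multiple of m+1) can be expressed as a small check; `valid_detection_period` is a hypothetical helper name introduced only for illustration.

```python
# Sketch of the constraint: with an extraction interval m (one target frame
# kept, then m raw frames skipped), the detection period M in raw frames should
# be a positive integer multiple of (m + 1), so every detection round sees an
# updated window and no window is detected twice.
def valid_detection_period(M, m):
    return M >= 1 and M % (m + 1) == 0

print(valid_detection_period(2, 1))  # M = m + 1: each round sees a new window -> True
print(valid_detection_period(1, 1))  # M < m + 1: same window detected twice -> False
```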
Referring to fig. 3, a flow chart of a people number detection method provided in an embodiment of the present application is shown. The people number detection method may be applied to the people number detection module in the above embodiments and, as shown in fig. 3, may include the following steps:
step S300, video data of the target area are acquired in real time.
Step S310, detecting the number of people according to N target video frames taking the current detection time as the end time, and obtaining a people number detection result; wherein N is greater than or equal to 2, the target video frames are not displayed, and video frames other than the N target video frames taking the current detection time as the end time are not stored.
For example, the person number detection result may be used by the conference control platform to control a designated device in the target area.
In the embodiment of the application, in order to provide better privacy protection under the condition of ensuring the accuracy of people counting in the target area, in the process of detecting the number of people, the target video frames are not displayed, and other video frames except for N target video frames taking the current detection time as the end time are not stored, so that the privacy protection is realized to the greatest extent. For example, taking the current detection time as the time t1 as an example, the number of people may be detected for N consecutive video frames with the time t1 as the end time.
Wherein N is greater than or equal to 2, and the specific value of N may be determined according to the requirements of the adopted people number detection algorithm, so as to ensure the accuracy of people number detection.
It can be seen that, in the method flow shown in fig. 3, the accuracy of people number detection is improved by acquiring the video data of the target area in real time and detecting the number of people according to the N target video frames taking the current detection time as the end time to obtain the people number detection result. In addition, during people number detection, the video frames used for people number detection, namely the target video frames, are not displayed, and video frames other than those currently used for people number detection are not stored, so that the risk of personal privacy information disclosure is reduced and privacy protection is effectively realized; thus, personal privacy is effectively protected on the premise of ensuring high accuracy of people number detection.
In some embodiments, after acquiring the video data of the target area in real time, the method further includes:
storing the N target video frames taking the current detection time as the end time in a preset buffer, so as to overwrite the N target video frames taking the previous detection time as the end time.
For example, in order to better realize privacy protection while ensuring the implementation of the people count detection, a buffer area for storing (buffering) the target video frames may be provided, the target video frames currently used for the people count detection may be buffered in the buffer area, and other acquired video frames may not need to be stored.
Illustratively, the buffer is capable of storing at least N video frames, and stores the target video frames currently used for head-shoulder detection in an overwrite manner.
Correspondingly, for video data acquired in real time, the N target video frames taking the current detection time as the end time, that is, the target video frames currently used for people number detection, may be stored in a preset buffer so as to overwrite the N target video frames taking the previous detection time as the end time.
For example, assuming that time t2 is the current detection time and time t1 is the previous detection time, the target video frames buffered at time t1 are video frames 1 to N, and the N target video frames taking time t2 as the end time are video frames 2 to (N+1); then, at time t2, video frames 2 to (N+1) may be stored in the buffer, overwriting video frames 1 to N.
In an example, the storing the N target video frames with the current detection time as the end time in the preset buffer may include:
for a currently acquired target video frame, when the number of target video frames stored in the preset buffer has reached N frames, deleting the earliest-stored target video frame in the preset buffer, and storing the currently acquired target video frame in the preset buffer.
For example, in order to improve the storage efficiency of the target video frame, the overlay storage of the target video frame may be implemented in a first-in first-out manner.
For example, for a currently acquired target video frame that needs to be stored in the buffer, it may first be determined whether the number of target video frames stored in the buffer has reached N.
If the number of target video frames stored in the preset buffer has reached N, the earliest-stored target video frame may be deleted, and the currently acquired target video frame may be stored in the preset buffer.
It should be noted that, in the embodiments of the present application, if the number of target video frames stored in the preset buffer has not reached N, no target video frame needs to be deleted, and the currently acquired target video frame may be stored in the preset buffer directly.
In addition, in the embodiments of the present application, people number detection may be performed only once the number of target video frames stored in the preset buffer reaches N, so as to ensure the accuracy of people number detection.
Moreover, when people number detection is stopped, all target video frames stored in the preset buffer may be deleted, so as to better realize privacy protection.
In one example, the detecting the number of people according to the N video frames with the current detection time as the end time may include:
and performing head-shoulder detection on the N target video frames cached in the preset cache region by using a deep learning method.
For example, in order to further improve the privacy protection effect, the head-shoulder detection mode may be used for detecting the number of people.
For example, a deep learning method may be adopted to perform head-shoulder detection on N target video frames buffered in the buffer area with the current detection time as the end time, so as to obtain a detection result of the number of people.
In some embodiments, the target video frame may include: and acquiring video frames in real time.
For example, in order to improve the accuracy of the people number detection, the target video frame may include video frames acquired in real time, that is, the people number detection may be performed on N consecutive video frames acquired in real time, to obtain the people number detection result.
In other embodiments, the target video frame may include: and extracting the obtained video frames from the video data acquired in real time according to a preset interval.
For example, considering that a person in a target area (e.g., a conference room) does not generally move at a high speed, the target video frame may further include a video frame extracted from video data acquired in real time at a preset interval.
For example, with an extraction interval of 1, 1 target video frame may be extracted for every 2 acquired video frames (that is, every other frame is taken as a target video frame).
In some embodiments, the frequency of people number detection is 1/M of the frame rate of the video data, wherein M is a positive integer and M is greater than or equal to 1.
In one example, M = 1.
For example, in order to realize real-time people number detection, the frequency of people number detection may equal the frame rate of the real-time video data; that is, each time 1 new video frame is acquired, 1 people number detection may be performed (once the number of acquired video frames reaches N frames), so that the number of people in the target area is detected in real time, which in turn improves the real-time performance of resource control in the target area.
In another example, M > 1.
For example, considering that in most cases the number of people in the target area does not change greatly within a period of several frames, in order to reduce the workload of people number detection, the frequency of people number detection may be smaller than the frame rate of the real-time video data; that is, a people number detection need not be performed every time 1 new video frame is acquired.
For example, the frequency of the people count detection may be 1/2 of the frame rate of the real-time video data (i.e., m=2), i.e., 1 people count detection is performed every 2 video frames are newly acquired.
It should be noted that, in the embodiment of the present application, in a case where the target video frames include video frames acquired in real time, M may be greater than or equal to 1; in a case where the target video frames include video frames extracted from the acquired real-time video data at a preset interval (assuming the interval is m, where m ≥ 1), M is greater than or equal to m + 1 and M is an integer multiple of (m + 1), so that people number detection is prevented from being performed multiple times on the same target video frames.
For example, if m = 1, i.e., 1 target video frame is extracted for every 1 frame skipped, the target video frames are updated every 2 frames; if M = 1 (i.e., M < m + 1), the same set of target video frames would undergo people number detection 2 consecutive times before the target video frames are updated.
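The constraint stated above — M ≥ m + 1 with M an integer multiple of (m + 1) — can be checked mechanically; a small sketch under the same notation:

```python
def is_valid_detection_period(M, m):
    # With sampling interval m (1 target frame kept per m frames skipped),
    # the detection period M must be at least m + 1 and an integer multiple
    # of (m + 1), so the buffered target frames change between detections.
    return M >= m + 1 and M % (m + 1) == 0

print(is_valid_detection_period(2, 1))  # True:  detect every 2 frames
print(is_valid_detection_period(1, 1))  # False: M < m + 1, duplicate detection
print(is_valid_detection_period(3, 1))  # False: not a multiple of m + 1
```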
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the technical solutions provided by the embodiments of the present application are described below with reference to specific examples.
In this embodiment, taking the statistics of the number of people in a conference control (conference control) system as an example, the target area is a conference room.
For example, a video + deep learning mode may be adopted: deep-learning head-shoulder detection is performed on each frame of image in the real-time video stream, and the real-time head-shoulder count is uploaded to the conference control platform as the real-time number of people.
In order to protect user privacy, the real-time video frames are only sent to the deep-learning head-shoulder detection module (i.e., the head-shoulder detection module) and are neither displayed nor stored elsewhere; after the head-shoulder algorithm finishes, the old video frames are overwritten by the new video frames.
For example, a specific overwriting method may be as shown in Figs. 4 and 5: a buffer capable of storing N video frames is maintained in the deep-learning head-shoulder detection module (the size of N depends on the requirements of the algorithm).
At time t1, the algorithm performs head-shoulder detection using buffered frames 1 to N in the buffer area to obtain a head-shoulder detection result, and counts the number of head-shoulders (i.e., the number of people).
At time t2 (t2 − t1 = the inter-frame interval; for example, for 25 fps video, t2 − t1 = 1/25 s = 40 ms, i.e., the head-shoulder detection frequency equals the frame rate of the video data), the buffered frames in the buffer area are updated to buffered frames 2 to N+1, and the algorithm performs head-shoulder detection using the updated buffered frames and counts the updated head-shoulder number.
Accordingly, at each detection time the number of head-shoulders is calculated from all the buffered frames in the buffer, and at the next detection time the newly acquired video frame overwrites the earliest one. In this way, the real-time head-shoulder number is calculated while video frames are discarded by overwriting, thereby realizing privacy protection.
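The overwrite-and-detect loop above (buffered frames 1..N at t1, frames 2..N+1 at t2, and so on) may be sketched with a fixed-size buffer; `count_fn` below is a placeholder for the head-shoulder counting algorithm, and the integer "frames" stand in for real video frames:

```python
from collections import deque

def run_detection(stream, N, count_fn):
    # A buffer holds at most N frames; each newly acquired frame
    # automatically evicts (overwrites) the oldest one, and once the
    # buffer is full a people count is produced at every detection time.
    buf = deque(maxlen=N)
    results = []
    for frame in stream:
        buf.append(frame)
        if len(buf) == N:           # frames t .. t + N - 1 are buffered
            results.append(count_fn(list(buf)))
    return results

# At t1 the buffer holds frames 1..3, at t2 frames 2..4, and so on.
counts = run_detection([1, 2, 3, 4, 5], N=3, count_fn=max)
print(counts)  # [3, 4, 5]
```

Using `deque(maxlen=N)` means no frame ever outlives the buffer: once evicted, a frame has no remaining reference and is reclaimed, which matches the overwrite-based privacy behavior described above.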
The methods provided herein are described above. The apparatus provided in this application is described below:
referring to fig. 6, a schematic structural diagram of a person number detection device according to an embodiment of the present application, as shown in fig. 6, the person number detection device may include:
an acquiring unit 610, configured to acquire video data of a target area in real time;
the detecting unit 620 is configured to detect the number of people according to the N target video frames with the current detecting time as the ending time, so as to obtain a detection result of the number of people; wherein N is more than or equal to 2; the target video frames are video frames for detecting the number of people, the target video frames are not displayed, and other video frames except N target video frames taking the current detection time as the end time are not stored.
In some embodiments, the acquiring unit 610 is further configured to, after acquiring the video data of the target area in real time:
store N target video frames with the current detection time as the end time into a preset buffer area, so as to cover the N target video frames with the previous detection time as the end time.
In some embodiments, the obtaining unit 610 caches N target video frames with the current detection time as the end time into a preset cache area, including:
and deleting the target video frame stored in the preset buffer zone at first under the condition that the number of the target video frames stored in the preset buffer zone reaches N frames for the target video frame which is currently acquired, and storing the target video frame which is currently acquired into the preset buffer zone.
In some embodiments, the detecting unit 620 is configured to perform people count detection according to N target video frames with the current detection time as the end time, and includes:
and performing head-shoulder detection on the N target video frames cached in the preset cache region by using a deep learning method.
In some embodiments, the target video frame comprises: the method comprises the steps of acquiring video frames in real time or extracting the acquired video frames from video data acquired in real time according to preset intervals.
In some embodiments, the frequency of people count detection is 1/M of the frame rate of the video data; m is more than or equal to 1, and M is a positive integer.
An embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores machine executable instructions capable of being executed by the processor, and the processor is configured to execute the machine executable instructions to implement the above-described people detection method.
Fig. 7 is a schematic hardware structure of an electronic device according to an embodiment of the present application. The electronic device may include a processor 701, a memory 702 storing machine-executable instructions. The processor 701 and the memory 702 may communicate via a system bus 703. Also, the processor 701 may perform the person number detection method described above by reading and executing the machine executable instructions corresponding to the person number detection logic in the memory 702.
The memory 702 referred to herein may be any electronic, magnetic, optical, or other physical storage device that may contain or store information, such as executable instructions, data, or the like. For example, a machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disc, a DVD, etc.), or a similar storage medium, or a combination thereof.
In some embodiments, there is also provided a storage medium, such as memory 702 in fig. 7, that is a machine-readable storage medium having stored therein machine-executable instructions that when executed by a processor implement the people detection method described above. For example, the storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description of the preferred embodiments of the present invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention are intended to be included within the scope of the present invention.
Claims (12)
1. A conference control system, comprising: the system comprises a video imaging module, a people number detection module and a conference control platform; wherein:
the video imaging module is used for acquiring video data of the target area and transmitting the acquired video data to the people number detection module;
the people number detection module is used for receiving the video data transmitted by the video imaging module, and detecting the number of people according to N target video frames taking the current detection moment as the end moment to obtain a people number detection result; wherein N is more than or equal to 2; the target video frames are video frames for detecting the number of people, the target video frames are not displayed, and other video frames except N target video frames taking the current detection time as the end time are not stored;
the people number detection module is further used for uploading the people number detection result to the conference control platform.
2. The conference control system of claim 1, wherein said people detection module comprises: the cache area and the algorithm identification module; wherein:
the buffer area is used for buffering N target video frames taking the current detection moment as the end moment;
the algorithm identification module is used for performing head-shoulder detection on N target video frames which are cached in the cache region and take the current detection time as the end time by using a deep learning method to obtain a person number detection result.
3. The system of claim 2, wherein,
the buffer area is specifically configured to store N target video frames with a current detection time as an end time, and cover N target video frames with a previous detection time as an end time.
4. The system of claim 3, wherein,
the buffer area is specifically configured to delete a first stored target video frame and store a currently acquired target video frame when the number of stored target video frames reaches N frames.
5. The conference control system of any of claims 1-4, wherein the target video frame comprises: the method comprises the steps of acquiring video frames in real time or extracting the acquired video frames from video data acquired in real time according to preset intervals.
6. The conference control system according to any one of claims 1 to 4, wherein the frequency of the person number detection is 1/M of the frame rate of the video data; m is more than or equal to 1, and M is a positive integer.
7. A method for detecting the number of people, the method comprising:
acquiring video data of a target area in real time;
detecting the number of people according to N target video frames taking the current detection time as the end time, and obtaining a detection result of the number of people; wherein N is more than or equal to 2; the target video frames are video frames for detecting the number of people, the target video frames are not displayed, and other video frames except N target video frames taking the current detection time as the end time are not stored.
8. The method of claim 7, further comprising, after the acquiring video data of the target area in real time:
Storing N target video frames taking the current detection time as the end time into a preset buffer area to cover the N target video frames taking the previous detection time as the end time;
the caching the N target video frames with the current detection time as the end time to a preset cache area includes:
for the currently acquired target video frames, deleting the target video frames stored in the preset buffer area first under the condition that the number of the target video frames stored in the preset buffer area reaches N frames, and storing the currently acquired target video frames in the preset buffer area;
the detecting the number of people according to the N target video frames taking the current detection time as the end time comprises the following steps:
performing head-shoulder detection on N target video frames cached in the preset cache region by using a deep learning method;
and/or,
the target video frame comprises: the method comprises the steps of acquiring video frames in real time or extracting the acquired video frames from video data acquired in real time according to preset intervals;
and/or,
the frequency of people number detection is 1/M of the frame rate of the video data; m is more than or equal to 1, and M is a positive integer.
9. A person number detection device, comprising:
an acquisition unit for acquiring video data of a target area in real time;
the detection unit is used for detecting the number of the people according to N target video frames taking the current detection time as the end time to obtain a number detection result; wherein N is more than or equal to 2; the target video frames are video frames for detecting the number of people, the target video frames are not displayed, and other video frames except N target video frames taking the current detection time as the end time are not stored.
10. The apparatus according to claim 9, wherein the acquiring unit is further configured to, after acquiring the video data of the target area in real time:
Storing N target video frames taking the current detection time as the end time into a preset buffer area to cover the N target video frames taking the previous detection time as the end time;
the obtaining unit caches N target video frames with the current detection time as the end time to a preset cache region, including:
for the currently acquired target video frames, deleting the target video frames stored in the preset buffer area first under the condition that the number of the target video frames stored in the preset buffer area reaches N frames, and storing the currently acquired target video frames in the preset buffer area;
the detecting unit is configured to detect the number of people according to N target video frames with the current detecting time as an end time, and includes:
performing head-shoulder detection on N target video frames cached in the preset cache region by using a deep learning method;
and/or,
the target video frame comprises: the method comprises the steps of acquiring video frames in real time or extracting the acquired video frames from video data acquired in real time according to preset intervals;
and/or,
the frequency of people number detection is 1/M of the frame rate of the video data; m is more than or equal to 1, and M is a positive integer.
11. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor for executing the machine executable instructions to implement the method of claim 7 or 8.
12. A storage medium having stored therein machine executable instructions which when executed by a processor implement the method of claim 7 or 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310384854.3A CN116524431A (en) | 2023-04-03 | 2023-04-03 | Method and device for detecting number of people, electronic equipment, storage medium and conference control system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310384854.3A CN116524431A (en) | 2023-04-03 | 2023-04-03 | Method and device for detecting number of people, electronic equipment, storage medium and conference control system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116524431A true CN116524431A (en) | 2023-08-01 |
Family
ID=87407437
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310384854.3A Pending CN116524431A (en) | 2023-04-03 | 2023-04-03 | Method and device for detecting number of people, electronic equipment, storage medium and conference control system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116524431A (en) |
- 2023-04-03: CN application CN202310384854.3A filed — patent CN116524431A (en), status: Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020094091A1 (en) | Image capturing method, monitoring camera, and monitoring system | |
US20170262706A1 (en) | Smart tracking video recorder | |
JP2013054739A (en) | Methods and apparatus to count people in images | |
WO2020094088A1 (en) | Image capturing method, monitoring camera, and monitoring system | |
US9521377B2 (en) | Motion detection method and device using the same | |
JPWO2018198373A1 (en) | Video surveillance system | |
WO2021068553A1 (en) | Monitoring method, apparatus and device | |
CN110826496B (en) | Crowd density estimation method, device, equipment and storage medium | |
WO2018024165A1 (en) | Method and device for storing warning image | |
CN110633648A (en) | Face recognition method and system in natural walking state | |
WO2013069565A1 (en) | Imaging/recording device | |
WO2015178234A1 (en) | Image search system | |
JP5758165B2 (en) | Article detection device and stationary person detection device | |
CN108540760A (en) | Video monitoring recognition methods, device and system | |
JP2006093955A (en) | Video processing apparatus | |
WO2017121020A1 (en) | Moving image generating method and device | |
US10783365B2 (en) | Image processing device and image processing system | |
CN116524431A (en) | Method and device for detecting number of people, electronic equipment, storage medium and conference control system | |
CN113393629B (en) | Intrusion behavior detection method and device and multi-channel video monitoring system | |
JP2008228119A (en) | Monitoring camera system, animation retrieval apparatus, face image database updating apparatus and operation control method therefor | |
CN103699893A (en) | Face feature information collection device | |
JP6632632B2 (en) | Monitoring system | |
KR101984070B1 (en) | Stereo image based intelligent vibration monitoring method | |
CN112926542A (en) | Performance detection method and device, electronic equipment and storage medium | |
JP2020136855A (en) | Monitoring system, monitor support device, monitoring method, monitor support method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||