CN110443116B - Video pedestrian detection method, device, server and storage medium - Google Patents

Video pedestrian detection method, device, server and storage medium

Info

Publication number
CN110443116B
Authority
CN
China
Prior art keywords
detection frame
frame
human body
image
human
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910533186.XA
Other languages
Chinese (zh)
Other versions
CN110443116A (en)
Inventor
叶明�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910533186.XA priority Critical patent/CN110443116B/en
Priority to PCT/CN2019/103393 priority patent/WO2020252924A1/en
Publication of CN110443116A publication Critical patent/CN110443116A/en
Application granted granted Critical
Publication of CN110443116B publication Critical patent/CN110443116B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

A video pedestrian detection method comprising: receiving a video stream image acquired by a camera; carrying out human head and human body detection on each frame of image in the received video stream image to obtain a human head detection frame and/or a human body detection frame; after the human head detection frames are paired with the human body detection frames, screening out human head detection frames which cannot be paired; tracking pedestrians in the current frame image according to the human head detection frame in the previous frame image; and supplementing the human body detection frames corresponding to the screened-out unpaired human head detection frames in the current frame image according to the tracking result. The invention also provides a video pedestrian detection device, a server and a storage medium. The invention can greatly improve the detection rate of pedestrians.

Description

Video pedestrian detection method, device, server and storage medium
Technical Field
The invention relates to the technical field of image recognition, in particular to a video pedestrian detection method, a video pedestrian detection device, a server and a storage medium.
Background
There are many advanced pedestrian statistics systems at home and abroad, but they are aimed only at situations where pedestrians are sparse and cannot be used in densely populated areas such as airports and stations. Detecting and tracking pedestrians in high-density crowds is a difficult problem. Common methods, such as detecting moving areas through a background model and performing foreground segmentation, or detecting human bodies through body models and features, cannot solve pedestrian detection in high-density crowds, because in high-density crowd videos most or all of the observed area is in motion and there is serious shielding between pedestrians. In addition, one category of research avoids the problem of segmenting and detecting pedestrians in high-density crowds and instead estimates crowd density through global statistical features, but such research cannot give accurate pedestrian flow and direction. Among the many studies, few are directed at detecting and tracking pedestrians in high-density crowds.
Common pedestrian detection algorithms use an NMS (non-maximum suppression) algorithm, and their miss rate for high-density crowds is high. Although optimized algorithms such as soft-NMS have appeared, the improvement is very limited, and the low detection rate directly affects the accuracy of pedestrian tracking.
Disclosure of Invention
In view of the above, it is necessary to provide a video pedestrian detection method, device, server and storage medium which can improve the detection rate of pedestrians by utilizing the characteristic that the head of a person is not easily blocked in a security scene.
A first aspect of the present invention provides a video pedestrian detection method, the method comprising:
receiving a video stream image acquired by a camera;
carrying out human head and human body detection on each frame of image in the received video stream image to obtain a human head detection frame and/or a human body detection frame;
after the human head detection frames are paired with the human body detection frames, screening out human head detection frames which cannot be paired;
tracking pedestrians in the current frame image according to the human head detection frame in the previous frame image; and
supplementing the human body detection frames corresponding to the screened unpaired human head detection frames in the current frame image according to the tracking result.
Preferably, the pairing of the human head detection frame and the human body detection frame includes:
judging whether the human head detection frame exists in a human body detection frame;
when the human head detection frame exists in the human body detection frame, confirming the positional relationship between the human head detection frame and the human body detection frame;
when the human head detection frame is detected to be located near the top of the middle of the human body detection frame, confirming that the human head detection frame and the human body detection frame are successfully paired; and
when the human head detection frame is detected not to be located near the top of the middle of the human body detection frame, confirming that pairing of the human head detection frame and the human body detection frame has failed.
Preferably, the step of tracking the pedestrian for the current frame image according to the human head detection frame in the previous frame image includes:
pedestrian tracking is carried out on the current frame image through the relation between the position of the human head detection frame in the previous frame image and the position of the human head detection frame in the current frame image:
if the offset between the position of the human head detection frame in the current frame image and the position of the human head detection frame in the previous frame image is smaller than or equal to a preset value, confirming that the human head detection frame in the current frame image and the human head detection frame in the previous frame image are the human head detection frame of the same pedestrian;
and if the offset between the position of the human head detection frame in the current frame image and the position of the human head detection frame in the previous frame image is larger than the preset value, confirming that the human head detection frame in the current frame image and the human head detection frame in the previous frame image are not human head detection frames of the same pedestrian.
Preferably, the method further comprises:
pedestrian tracking is carried out on the current frame image according to the human body detection frame in the previous frame image;
if the offset between the position of the human body detection frame in the current frame image and the position of the human body detection frame in the previous frame image is smaller than or equal to the preset value, confirming that the human body detection frame in the current frame image and the human body detection frame in the previous frame image are the human body detection frame of the same pedestrian;
if the offset between the position of the human body detection frame in the current frame image and the position of the human body detection frame in the previous frame image is larger than the preset value, confirming that the human body detection frame in the current frame image and the human body detection frame in the previous frame image are not human body detection frames of the same pedestrian.
Preferably, when the human head detection frame and the human body detection frame are detected simultaneously in the previous frame image, the human body detection frame is preferentially adopted for pedestrian tracking.
Preferably, after receiving the video stream image acquired by the camera, the method further comprises:
decoding the video stream image to obtain each frame image in the video stream image.
Preferably, the step of obtaining the head detection frame and/or the human body detection frame includes:
and detecting each frame of image in the video stream image through a deep neural detection network to obtain a human head detection frame and/or a human body detection frame.
A second aspect of the present invention provides a video pedestrian detection apparatus, the apparatus comprising:
the receiving module is used for receiving the video stream image acquired by the camera;
the detection module is used for detecting the human head and the human body of each frame of image in the received video stream image to obtain a human head detection frame and/or a human body detection frame;
the screening module is used for screening out the head detection frames which cannot be paired after the head detection frames are paired with the human body detection frames;
the tracking module is used for tracking pedestrians in the current frame image according to the head detection frame in the previous frame image; and
the processing module is used for complementing the human body detection frames corresponding to the screened unpaired human head detection frames in the current frame image according to the tracking result.
A third aspect of the present invention provides a server comprising a processor and a memory, the processor being operable to implement the video pedestrian detection method when executing a computer program stored in the memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the video pedestrian detection method.
The invention discloses a video pedestrian detection method, a device, a server and a storage medium. The invention provides an algorithm based on dual-target detection and tracking, which adds human head detection on the basis of original pedestrian detection. Considering that security cameras mostly shoot from an oblique 45-degree overhead viewing angle, the shielding rate of the human head is far lower than that of the human body. The algorithm can greatly improve the detection rate of pedestrians in areas with serious crowd shielding.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a video pedestrian detection method according to an embodiment of the present invention.
Fig. 2 is a functional block diagram of a video pedestrian detection device according to a second embodiment of the invention.
Fig. 3 is a schematic diagram of a server according to a third embodiment of the present invention.
The invention will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It should be noted that, without conflict, the embodiments of the present invention and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, and the described embodiments are merely some, rather than all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The terms first, second, third and the like in the description and in the claims of the invention and in the above-described figures are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
The video pedestrian detection method of the embodiment of the invention is applied to a hardware environment formed by at least one server and a mobile terminal connected with the server through a network. Networks include, but are not limited to: a wide area network, a metropolitan area network, or a local area network. The video pedestrian detection method of the embodiment of the invention can be executed by a server or a mobile terminal; or may be performed jointly by the server and the mobile terminal.
The server for the video pedestrian detection method can directly integrate the video pedestrian detection function provided by the method, or can be provided with a client for realizing the method. Alternatively, the method provided by the invention can run on a server or other devices in the form of a software development kit (Software Development Kit, SDK): an interface of the video pedestrian detection function is provided in the form of the SDK, and the server or other devices can realize the video pedestrian detection function through the provided interface.
Example 1
Fig. 1 is a flowchart of a video pedestrian detection method according to an embodiment of the present invention. The order of execution in the flow chart may be changed, and certain steps may be omitted, according to different needs.
Step S1, receiving video stream images acquired by a camera.
In this embodiment, the camera is used to capture pedestrian information in a preset scene. The preset scene is a high-density pedestrian scene, such as an airport, a bus stop, a train station, a mall, and the like. The camera may be a fisheye camera. Since the installation position of the camera is at the top position of the monitoring area in the preset scene, pedestrians in the monitoring area are generally shot at a 45-degree overlooking view angle.
For example, the camera is installed at a corner of the top of a bus stop waiting room and generally shoots pedestrians in the waiting room from an oblique 45-degree overhead viewing angle. When there are many pedestrians, the camera can hardly capture the whole body of a pedestrian, which makes it difficult to accurately detect and track pedestrians. However, in this scenario the pedestrian's head is hardly blocked; that is to say, the blocking rate of the pedestrian's head is much lower than the blocking rate of the body.
In this embodiment, the camera is connected to the server through wired or wireless network communication, and sends the acquired video images to be identified to the server through the wired or wireless network. The wired network may be of any type of conventional wired communication, such as the Internet or a local area network. The wireless network may be of any type of conventional wireless communication, such as radio, wireless fidelity (Wireless Fidelity, WIFI), cellular, satellite, broadcast, etc. Wireless communication technologies may include, but are not limited to, Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (W-CDMA), CDMA2000, IMT Single Carrier, Enhanced Data Rates for GSM Evolution (EDGE), Long Term Evolution (LTE), Advanced Long Term Evolution (LTE-Advanced), Time Division LTE (TD-LTE), the fifth generation mobile communication technology (5G), High Performance Radio Local Area Network (HiperLAN), High Performance Radio Wide Area Network (HiperWAN), Local Multipoint Distribution Service (LMDS), Worldwide Interoperability for Microwave Access (WiMAX), ZigBee, Bluetooth, Flash Orthogonal Frequency-Division Multiplexing (Flash-OFDM), High Capacity Spatial Division Multiple Access (HC-SDMA), Universal Mobile Telecommunications System (UMTS), UMTS Time Division Duplexing (UMTS-TDD), Time Division Synchronous Code Division Multiple Access (TD-SCDMA), Evolution-Data Optimized (EV-DO), Digital Enhanced Cordless Telecommunications (DECT), and others.
Preferably, after receiving the video stream image collected by the camera, the video pedestrian detection method further comprises:
and decoding the video stream image to acquire each frame of image in the video stream image.
And S2, detecting the human head and the human body of each frame of image in the received video stream image to obtain a human head detection frame and/or a human body detection frame.
In this embodiment, since the shielding rate of the human head is far lower than that of the human body, detecting the head and the body in a frame of image may yield only the head detection frame of a pedestrian without the corresponding body detection frame. When a human body is shielded, the human head is generally not shielded. In general, 90% of the human heads in each frame of image can be detected and head detection frames obtained, whereas only 50% of the human bodies in each frame of image can be detected. In this scheme, pedestrians for whom no human body detection frame is detected are complemented according to the detected human head detection frames, thereby realizing detection and tracking of pedestrians.
In this embodiment, the human head and the human body are detected for each frame of the video stream image by the deep neural detection network. For example, the method adopts Faster RCNN to detect the head and the human body of each frame of image, and specifically comprises the following steps:
(1) Inputting the whole picture corresponding to each frame of image into a convolutional neural network CNN, and extracting the characteristics;
(2) Generating a preset number of proposal windows by using a region proposal network (RPN), for example, generating 300 proposal windows for each picture;
(3) Mapping the preset proposal window onto a final layer convolution feature map (feature map) of the convolution neural network;
(4) Dividing the mapped regions into sections of a preset, equal size, and processing each section to obtain a corresponding fixed-size feature map;
for example, a fixed-size feature map is generated for each RoI by the RoI pooling layer;
(5) Carrying out joint training on the preset feature maps through classification probability and bounding box regression to obtain a human head detection frame and/or a human body detection frame.
For example, the preset feature maps are jointly trained on classification probability and bounding box regression (Bounding box regression) using Softmax Loss (detection classification probability) and Smooth L1 Loss (detection frame regression) to obtain the human head detection frame and the human body detection frame.
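The following is a hedged sketch of how such a Faster RCNN detector might be invoked on a single frame and its output split into head and body detection frames. It uses torchvision's Faster R-CNN implementation; the three-class setup (background, head, body), the class ids and the score threshold are assumptions for illustration, since a real system would fine-tune the network on head and body annotations as described above.

import torch
import torchvision
from torchvision.transforms.functional import to_tensor

HEAD_LABEL, BODY_LABEL = 1, 2   # assumed class ids of a fine-tuned two-class model

# weights=None builds an untrained network; in practice the fine-tuned
# head/body weights would be loaded here.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None, num_classes=3)
model.eval()

def detect_heads_and_bodies(frame_bgr, score_thresh=0.5):
    """Return (head_boxes, body_boxes) as lists of [x1, y1, x2, y2]."""
    image = to_tensor(frame_bgr[:, :, ::-1].copy())   # OpenCV BGR -> RGB tensor
    with torch.no_grad():
        output = model([image])[0]                    # dict with boxes, labels, scores
    heads, bodies = [], []
    for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
        if score < score_thresh:
            continue
        if label.item() == HEAD_LABEL:
            heads.append(box.tolist())
        elif label.item() == BODY_LABEL:
            bodies.append(box.tolist())
    return heads, bodies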
And S3, after the human head detection frames are paired with the human body detection frames, screening out human head detection frames which cannot be paired.
In this embodiment, the human head detection frame and the human body detection frame are paired according to the relative positional relationship between the body of a normal person and the human head.
The pairing of the human head detection frame and the human body detection frame comprises:
judging whether the human head detection frame exists in a human body detection frame or not;
when the human head detection frame exists in the human body detection frame, confirming the position relationship between the human head detection frame and the human body detection frame;
when the human head detection frame is detected to be located near the top of the middle of the human body detection frame, the human head detection frame and the human body detection frame are confirmed to be successfully paired;
when the human head detection frame is detected to be located in the lower part of the human body detection frame, close to the bottom, it is confirmed that the human head detection frame is not the head of the pedestrian corresponding to the human body detection frame.
In one embodiment, the human head detection frame and the human body detection frame may be paired according to whether the human head detection frame exists in the human body detection frame. And when the human head detection frame exists in the human body detection frame, the human head detection frame and the human body detection frame are successfully paired, and one pedestrian in the video image is confirmed.
In another embodiment, the human head detection frame and the human body detection frame may be paired according to a positional relationship of the human head detection frame and the human body detection frame. In the video image, it is of course also possible that a plurality of human head detection frames exist in one human body detection frame. In this case, the human head detection frame and the human body detection frame are paired according to the position of the human head detection frame in the human body detection frame. For example, the human head detecting frame is normally located at a position near the top in the middle of the human body detecting frame. Therefore, when the human head detection frame is detected to be positioned at the position close to the top in the middle of the human body detection frame, the human head detection frame and the human body detection frame are confirmed to be successfully paired; when detecting that the human head detection frame is positioned below the human body detection frame and close to the bottom, the human head detection frame can be confirmed to be not the head of the pedestrian corresponding to the human body detection frame, and the human head detection frame can be the head of other pedestrians.
It can be understood that a part of human head detection frames do not correspond to human body detection frames after pairing. That is, the case where the human body is blocked occurs in the video image.
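A minimal sketch of the pairing rule described above is given below: a head detection frame is paired with a body detection frame when it lies inside that body frame, roughly centred horizontally and near the top. The numeric tolerances are assumed values for illustration and are not specified in this embodiment.

def inside(head, body):
    """True when the head box lies entirely within the body box."""
    hx1, hy1, hx2, hy2 = head
    bx1, by1, bx2, by2 = body
    return hx1 >= bx1 and hy1 >= by1 and hx2 <= bx2 and hy2 <= by2

def is_top_middle(head, body, centre_tol=0.25, top_ratio=0.35):
    """True when the head box sits near the top of the middle of the body box."""
    hx1, hy1, hx2, hy2 = head
    bx1, by1, bx2, by2 = body
    body_w, body_h = bx2 - bx1, by2 - by1
    centred = abs((hx1 + hx2) / 2 - (bx1 + bx2) / 2) <= centre_tol * body_w
    near_top = (hy2 - by1) <= top_ratio * body_h   # head bottom stays in the upper part
    return centred and near_top

def pair_heads_with_bodies(head_boxes, body_boxes):
    """Return (pairs, unpaired_heads); each pair is (head_index, body_index)."""
    pairs, unpaired = [], []
    for hi, head in enumerate(head_boxes):
        match = next((bi for bi, body in enumerate(body_boxes)
                      if inside(head, body) and is_top_middle(head, body)), None)
        if match is None:
            unpaired.append(hi)          # screened-out head box: its body is occluded
        else:
            pairs.append((hi, match))
    return pairs, unpaired

Head detection frames left in the unpaired list are exactly the ones that are screened out and later complemented with body detection frames.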
And S4, tracking pedestrians in the current frame image according to the head detection frame in the previous frame image.
In the present embodiment, since the body of the pedestrian is easily shielded in the high-density pedestrian scene, the camera is more likely to capture the head of the pedestrian. Therefore, we track the human head detection frame in the video image first, and then complement the corresponding human body detection frame according to the tracked human head detection frame, so as to track the whole pedestrian.
In the present embodiment, pedestrian tracking can be performed on the current frame image by a relationship between the position of the human head detection frame in the previous frame image and the position of the human head detection frame in the current frame image.
Specifically, if the offset between the position of the head detection frame in the current frame image and the position of the head detection frame in the previous frame image is less than or equal to a preset value (for example, two pixels), it may be determined that the head detection frame in the current frame image and the head detection frame in the previous frame image are head detection frames of the same pedestrian.
If the offset between the position of the head detection frame in the current frame image and the position of the head detection frame in the previous frame image is greater than a preset value (for example, two pixel points), it may be confirmed that the head detection frame in the current frame image and the head detection frame in the previous frame image are not head detection frames of the same pedestrian.
It will be appreciated that pedestrian tracking may also be performed based on the human body detection frame in the previous frame of image. If the offset between the position of the human body detection frame in the current frame image and the position of the human body detection frame in the previous frame image is smaller than or equal to a preset value (for example, two pixel points), it can be confirmed that the human body detection frame in the current frame image and the human body detection frame in the previous frame image are human body detection frames of the same pedestrian. If the offset between the position of the human body detection frame in the current frame image and the position of the human body detection frame in the previous frame image is greater than a preset value (for example, two pixel points), it may be confirmed that the human body detection frame in the current frame image and the human body detection frame in the previous frame image are not human body detection frames of the same pedestrian.
Preferably, when the human head detection frame and the human body detection frame are detected simultaneously in the previous frame image, the human body detection frame is preferentially adopted for pedestrian tracking. Because the human body detection frame better represents the characteristics of a pedestrian, it is preferentially adopted for pedestrian tracking when both the human head detection frame and the human body detection frame are detected.
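The offset rule and the preference for body detection frames can be sketched as follows. The per-axis pixel offset, the greedy matching order and the two-pixel default are illustrative assumptions consistent with the example values above, not a prescribed matching algorithm.

def centre(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def offset(box_a, box_b):
    """Largest per-axis displacement between the centres of two boxes, in pixels."""
    (ax, ay), (bx, by) = centre(box_a), centre(box_b)
    return max(abs(ax - bx), abs(ay - by))

def match_boxes(prev_boxes, curr_boxes, preset=2.0):
    """Greedily match previous-frame boxes to current-frame boxes of the same pedestrian."""
    matches, used = {}, set()
    for pi, prev in enumerate(prev_boxes):
        for ci, curr in enumerate(curr_boxes):
            if ci not in used and offset(prev, curr) <= preset:
                matches[pi] = ci          # offset within the preset value: same pedestrian
                used.add(ci)
                break
    return matches

def track_frame(prev_bodies, prev_heads, curr_bodies, curr_heads, preset=2.0):
    """Track with body boxes first (they better represent a pedestrian), heads as fallback."""
    body_matches = match_boxes(prev_bodies, curr_bodies, preset)
    head_matches = match_boxes(prev_heads, curr_heads, preset)
    return body_matches, head_matches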
And S5, supplementing the human body detection frames corresponding to the screened unpaired human head detection frames in the current frame image according to the tracking result.
The human body detection frames corresponding to the screened-out unpaired human head detection frames are supplemented in the current frame image, so that the shielded human body detection frames in the video image are complemented, and tracking of all pedestrians in the video image is realized.
For example, suppose there are a head detection frame A, a head detection frame B, and a head detection frame C in the current frame image, all of which lie within a body detection frame D. When the head detection frame A is located near the top of the middle of the body detection frame D, it is confirmed that the head detection frame A is successfully paired with the body detection frame D; since the head detection frame B and the head detection frame C are located at the lower-left corner and the lower-right corner of the body detection frame D respectively, it is confirmed that the head detection frame B and the head detection frame C cannot be paired with the body detection frame D. Therefore, the screened-out unpaired head detection frame B and head detection frame C need to be complemented in the current frame image, and the corresponding complemented body detection frames are a body detection frame E and a body detection frame F respectively.
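A sketch of the complementing step is shown below. The embodiment only states that the body detection frame is supplemented according to the tracking result; the concrete strategy here, reusing the pedestrian's last known body frame shifted by the head frame's displacement, or falling back to proportions estimated from the head frame, is an assumption for illustration.

def shift_box(box, dx, dy):
    x1, y1, x2, y2 = box
    return [x1 + dx, y1 + dy, x2 + dx, y2 + dy]

def complement_body_box(curr_head, prev_head=None, prev_body=None,
                        width_ratio=2.5, height_ratio=7.0):
    """Supplement a body box for an unpaired head box (assumed strategy)."""
    hx1, hy1, hx2, hy2 = curr_head
    if prev_head is not None and prev_body is not None:
        # Tracked pedestrian: translate the last known body box by the head motion.
        dx = (hx1 + hx2) / 2 - (prev_head[0] + prev_head[2]) / 2
        dy = (hy1 + hy2) / 2 - (prev_head[1] + prev_head[3]) / 2
        return shift_box(prev_body, dx, dy)
    # No earlier body box: estimate one from assumed head-to-body proportions.
    head_w, head_h = hx2 - hx1, hy2 - hy1
    cx = (hx1 + hx2) / 2
    body_w, body_h = head_w * width_ratio, head_h * height_ratio
    return [cx - body_w / 2, hy1, cx + body_w / 2, hy1 + body_h]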
In summary, the method for detecting the video pedestrians provided by the invention comprises the steps of receiving video stream images acquired by a camera; carrying out human head and human body detection on each frame of image in the received video stream image to obtain a human head detection frame and/or a human body detection frame; after the human head detection frames are paired with the human body detection frames, screening out human head detection frames which cannot be paired; pedestrian tracking is carried out on the current frame image according to the human head detection frame in the previous frame image; and supplementing the human body detection frames corresponding to the screened unmatched human head detection frames in the current frame image according to the tracking result.
Because common pedestrian detection algorithms use an NMS (non-maximum suppression) algorithm, their miss rate for high-density crowds is high, and the low detection rate directly affects the accuracy of pedestrian tracking. The algorithm adopted in the invention, which detects and tracks pedestrians based on both the human head detection frame and the human body detection frame, adds human head detection on the basis of original pedestrian detection. Since security cameras mostly shoot at an oblique 45-degree overhead viewing angle, the human head remains detectable in most video images. The algorithm can greatly improve the detection rate of pedestrians in areas with serious crowd shielding.
Example two
FIG. 2 is a functional block diagram of a video pedestrian detection device according to a preferred embodiment of the invention.
In some embodiments, the video pedestrian detection device 20 (hereinafter referred to as "detection device 20") operates in a server. The detection means 20 may comprise a plurality of functional modules consisting of program code segments. Program code for each program segment in the detection device 20 may be stored in a memory and executed by at least one processor to perform (see fig. 1 and its associated description for details) the video pedestrian detection function.
In this embodiment, the detecting device 20 may be divided into a plurality of functional modules according to the functions performed by the detecting device. The functional module may include: the device comprises a receiving module 201, a detecting module 202, a screening module 203, a tracking module 204 and a processing module 205. The module referred to in the present invention refers to a series of computer program segments capable of being executed by at least one processor and of performing a fixed function, stored in a memory. In some embodiments, the function of each module will be described in detail in the following embodiments.
The receiving module 201 is configured to receive a video stream image acquired by a camera.
In this embodiment, the camera is used to capture pedestrian information in a preset scene. The preset scene is a high-density pedestrian scene, such as an airport, a bus stop, a train station, a mall, and the like. The camera may be a fisheye camera. Since the installation position of the camera is at the top position of the monitoring area in the preset scene, pedestrians in the monitoring area are generally shot at a 45-degree overlooking view angle.
For example, the camera is installed at a corner of the top of a bus stop waiting room and generally shoots pedestrians in the waiting room from an oblique 45-degree overhead viewing angle. When there are many pedestrians, the camera can hardly capture the whole body of a pedestrian, which makes it difficult to accurately detect and track pedestrians. However, in this scenario the pedestrian's head is hardly blocked; that is to say, the blocking rate of the pedestrian's head is much lower than the blocking rate of the body.
Preferably, after receiving the video stream image acquired by the camera, the detecting device 20 may further:
and decoding the video stream image to acquire each frame of image in the video stream image.
The detection module 202 is configured to detect a human head and a human body for each frame of image in the received video stream image, so as to obtain a human head detection frame and/or a human body detection frame.
In this embodiment, since the shielding rate of the human head is far lower than that of the human body, detecting the head and the body in a frame of image may yield only the head detection frame of a pedestrian without the corresponding body detection frame. When a human body is shielded, the human head is generally not shielded. In general, 90% of the human heads in each frame of image can be detected and head detection frames obtained, whereas only 50% of the human bodies in each frame of image can be detected. In this scheme, pedestrians for whom no human body detection frame is detected are complemented according to the detected human head detection frames, thereby realizing detection and tracking of pedestrians.
In this embodiment, the human head and the human body are detected for each frame of the video stream image by the deep neural detection network. For example, the method adopts Faster RCNN to detect the head and the human body of each frame of image, and specifically comprises the following steps:
(1) Inputting the whole picture corresponding to each frame of image into a convolutional neural network CNN, and extracting the characteristics;
(2) Generating a preset number of proposal windows by using a region proposal network (RPN), for example, generating 300 proposal windows for each picture;
(3) Mapping the preset proposal window onto a final layer convolution feature map (feature map) of the convolution neural network;
(4) Dividing the mapped regions into sections of a preset, equal size, and processing each section to obtain a corresponding fixed-size feature map;
for example, a fixed-size feature map is generated for each RoI by the RoI pooling layer;
(5) Carrying out joint training on the preset feature maps through classification probability and bounding box regression to obtain a human head detection frame and/or a human body detection frame.
For example, the preset feature maps are jointly trained on classification probability and bounding box regression (Bounding box regression) using Softmax Loss (detection classification probability) and Smooth L1 Loss (detection frame regression) to obtain the human head detection frame and the human body detection frame.
The screening module 203 is configured to pair the human head detection frame and the human body detection frame, and then screen out a human head detection frame that cannot be paired.
In this embodiment, the human head detection frame and the human body detection frame are paired according to the relative positional relationship between the body of a normal person and the human head.
The pairing of the human head detection frame and the human body detection frame comprises:
judging whether the human head detection frame exists in a human body detection frame or not;
when the human head detection frame exists in the human body detection frame, confirming the position relationship between the human head detection frame and the human body detection frame;
when the human head detection frame is detected to be located near the top of the middle of the human body detection frame, the human head detection frame and the human body detection frame are confirmed to be successfully paired;
when the human head detection frame is detected to be located in the lower part of the human body detection frame, close to the bottom, it is confirmed that the human head detection frame is not the head of the pedestrian corresponding to the human body detection frame.
In one embodiment, the human head detection frame and the human body detection frame may be paired according to whether the human head detection frame exists in the human body detection frame. And when the human head detection frame exists in the human body detection frame, the human head detection frame and the human body detection frame are successfully paired, and one pedestrian in the video image is confirmed.
In another embodiment, the human head detection frame and the human body detection frame may be paired according to a positional relationship of the human head detection frame and the human body detection frame. In the video image, it is of course also possible that a plurality of human head detection frames exist in one human body detection frame. In this case, the human head detection frame and the human body detection frame are paired according to the position of the human head detection frame in the human body detection frame. For example, the human head detecting frame is normally located at a position near the top in the middle of the human body detecting frame. Therefore, when the human head detection frame is detected to be positioned at the position close to the top in the middle of the human body detection frame, the human head detection frame and the human body detection frame are confirmed to be successfully paired; when detecting that the human head detection frame is positioned below the human body detection frame and close to the bottom, the human head detection frame can be confirmed to be not the head of the pedestrian corresponding to the human body detection frame, and the human head detection frame can be the head of other pedestrians.
It can be understood that a part of human head detection frames do not correspond to human body detection frames after pairing. That is, the case where the human body is blocked occurs in the video image.
The tracking module 204 is configured to track a pedestrian in the current frame image according to the head detection frame in the previous frame image.
In the present embodiment, since the body of the pedestrian is easily shielded in the high-density pedestrian scene, the camera is more likely to capture the head of the pedestrian. Therefore, we track the human head detection frame in the video image first, and then complement the corresponding human body detection frame according to the tracked human head detection frame, so as to track the whole pedestrian.
In the present embodiment, pedestrian tracking can be performed on the current frame image by a relationship between the position of the human head detection frame in the previous frame image and the position of the human head detection frame in the current frame image.
Specifically, if the offset between the position of the head detection frame in the current frame image and the position of the head detection frame in the previous frame image is less than or equal to a preset value (for example, two pixels), it may be determined that the head detection frame in the current frame image and the head detection frame in the previous frame image are head detection frames of the same pedestrian.
If the offset between the position of the head detection frame in the current frame image and the position of the head detection frame in the previous frame image is greater than a preset value (for example, two pixel points), it may be confirmed that the head detection frame in the current frame image and the head detection frame in the previous frame image are not head detection frames of the same pedestrian.
It will be appreciated that pedestrian tracking may also be performed based on the human body detection frame in the previous frame of image. If the offset between the position of the human body detection frame in the current frame image and the position of the human body detection frame in the previous frame image is smaller than or equal to a preset value (for example, two pixel points), it can be confirmed that the human body detection frame in the current frame image and the human body detection frame in the previous frame image are human body detection frames of the same pedestrian. If the offset between the position of the human body detection frame in the current frame image and the position of the human body detection frame in the previous frame image is greater than a preset value (for example, two pixel points), it may be confirmed that the human body detection frame in the current frame image and the human body detection frame in the previous frame image are not human body detection frames of the same pedestrian.
Preferably, when the human head detection frame and the human body detection frame are detected simultaneously in the previous frame image, the human body detection frame is preferentially adopted for pedestrian tracking. Because the human body detection frame better represents the characteristics of a pedestrian, it is preferentially adopted for pedestrian tracking when both the human head detection frame and the human body detection frame are detected.
The processing module 205 is configured to complement the human body detection frame corresponding to the screened unpaired human head detection frame in the current frame image according to the tracking result.
In this embodiment, the human body detection frames corresponding to the screened unpaired human head detection frames are complemented in the current frame image, so that the human body detection frames blocked in the video image are complemented, and tracking of all pedestrians in the video image is realized.
For example, suppose there are a head detection frame A, a head detection frame B, and a head detection frame C in the current frame image, all of which lie within a body detection frame D. When the head detection frame A is located near the top of the middle of the body detection frame D, it is confirmed that the head detection frame A is successfully paired with the body detection frame D; since the head detection frame B and the head detection frame C are located at the lower-left corner and the lower-right corner of the body detection frame D respectively, it is confirmed that the head detection frame B and the head detection frame C cannot be paired with the body detection frame D. Therefore, the screened-out unpaired head detection frame B and head detection frame C need to be complemented in the current frame image, and the corresponding complemented body detection frames are a body detection frame E and a body detection frame F respectively.
In summary, the video pedestrian detection device 20 provided by the present invention includes a receiving module 201, a detecting module 202, a screening module 203, a tracking module 204 and a processing module 205. The receiving module 201 is configured to receive a video stream image acquired by a camera; the detection module 202 is configured to detect a human head and a human body for each frame of image in the received video stream image, so as to obtain a human head detection frame and/or a human body detection frame; the screening module 203 is configured to pair the human head detection frame and the human body detection frame, and then screen out a human head detection frame that cannot be paired; the tracking module 204 is configured to track pedestrians in the current frame image according to a head detection frame in the previous frame image; and the processing module 205 is configured to complement the human body detection frame corresponding to the screened unpaired human head detection frame in the current frame image according to the tracking result.
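As an illustration of how the five modules could fit together in software, the sketch below composes them into one processing loop. The class and method names are assumptions made for this example; the embodiment defines the modules functionally rather than as a specific API.

class VideoPedestrianDetector:
    """Hypothetical composition of the receiving, detection, screening, tracking
    and processing modules into one pipeline."""

    def __init__(self, receiver, detector, screener, tracker, processor):
        self.receiver = receiver      # receiving module 201
        self.detector = detector      # detection module 202
        self.screener = screener      # screening module 203
        self.tracker = tracker        # tracking module 204
        self.processor = processor    # processing module 205

    def run(self, stream_url):
        previous = None               # head/body boxes of the previous frame
        for frame in self.receiver.frames(stream_url):
            heads, bodies = self.detector.detect(frame)
            pairs, unpaired_heads = self.screener.pair(heads, bodies)
            tracks = self.tracker.track(previous, heads, bodies)
            bodies = self.processor.complement(unpaired_heads, heads, bodies, tracks)
            previous = (heads, bodies)
            yield frame, heads, bodies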
Because common pedestrian detection algorithms use an NMS (non-maximum suppression) algorithm, their miss rate for high-density crowds is high, and the low detection rate directly affects the accuracy of pedestrian tracking. The algorithm adopted in the invention, which detects and tracks pedestrians based on both the human head detection frame and the human body detection frame, adds human head detection on the basis of original pedestrian detection. Since security cameras mostly shoot at an oblique 45-degree overhead viewing angle, the human head remains detectable in most video images. The algorithm can greatly improve the detection rate of pedestrians in areas with serious crowd shielding.
The integrated units implemented in the form of software functional modules described above may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a dual-screen device, or a network device, etc.) or processor (processor) to perform portions of the methods described in the various embodiments of the invention.
Example III
Fig. 3 is a schematic diagram of a server according to a third embodiment of the present invention.
The server 3 includes: a database 31, a memory 32, at least one processor 33, a computer program 34 stored in the memory 32 and executable on the at least one processor 33, and at least one communication bus 35.
The at least one processor 33, when executing the computer program 34, implements the steps of the video pedestrian detection method embodiments described above.
Illustratively, the computer program 34 may be partitioned into one or more modules/units that are stored in the memory 32 and executed by the at least one processor 33 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments are used to describe the execution of the computer program 34 in the server 3.
The server 3 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device, and the like. It will be appreciated by those skilled in the art that the schematic diagram in Fig. 3 is merely an example of the server 3 and does not constitute a limitation of the server 3; the server 3 may include more or fewer components than illustrated, combine certain components, or have different components, e.g. the server 3 may further include input and output devices, network access devices, buses, etc.
The Database (Database) 31 is a repository built on the server 3 that organizes, stores and manages data according to a data structure. Databases are generally classified into three types, hierarchical databases, network databases, and relational databases. In this embodiment, the database 31 is used to store the video stream image information.
The at least one processor 33 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. The processor 33 may be a microprocessor or the processor 33 may be any conventional processor or the like, the processor 33 being a control center of the server 3, the various interfaces and lines being used to connect the various parts of the entire server 3.
The memory 32 may be used to store the computer program 34 and/or modules/units, and the processor 33 may implement the various functions of the server 3 by running or executing the computer program and/or modules/units stored in the memory 32 and invoking data stored in the memory 32. The memory 32 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the server 3 (such as audio data, a phonebook, etc.), and the like. In addition, the memory 32 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The memory 32 has stored therein program code, and the at least one processor 33 can invoke the program code stored in the memory 32 to perform related functions. For example, the modules (the receiving module 201, the detecting module 202, the screening module 203, the tracking module 204 and the processing module 205) described in fig. 2 are program codes stored in the memory 32 and executed by the at least one processor 33, so as to implement the functions of the modules for the purpose of video pedestrian detection.
The receiving module 201 is configured to receive a video stream image collected by a camera;
the detection module 202 is configured to detect a human head and a human body for each frame of image in the received video stream image, so as to obtain a human head detection frame and/or a human body detection frame;
the screening module 203 is configured to pair the human head detection frame and the human body detection frame, and then screen out a human head detection frame that cannot be paired;
the tracking module 204 is configured to track pedestrians in the current frame image according to the head detection frame in the previous frame image; and
the processing module 205 is configured to complement the human body detection frame corresponding to the screened unpaired human head detection frame in the current frame image according to the tracking result.
Preferably, the pairing of the human head detection frame and the human body detection frame includes:
judging whether the human head detection frame exists in a human body detection frame or not;
when the human head detection frame exists in the human body detection frame, confirming the position relationship between the human head detection frame and the human body detection frame;
when the human head detection frame is detected to be located near the top of the middle of the human body detection frame, confirming that the human head detection frame and the human body detection frame are successfully paired; and
when the human head detection frame is detected not to be located near the top of the middle of the human body detection frame, confirming that pairing of the human head detection frame and the human body detection frame has failed.
Preferably, the pedestrian tracking is performed on the current frame image by a relation between a position of the human head detection frame in the previous frame image and a position of the human head detection frame in the current frame image:
if the offset between the position of the human head detection frame in the current frame image and the position of the human head detection frame in the previous frame image is smaller than or equal to a preset value, confirming that the human head detection frame in the current frame image and the human head detection frame in the previous frame image belong to the same pedestrian;
and if the offset between the position of the human head detection frame in the current frame image and the position of the human head detection frame in the previous frame image is larger than the preset value, confirming that the human head detection frame in the current frame image and the human head detection frame in the previous frame image do not belong to the same pedestrian.
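A minimal sketch of this offset test follows; measuring the offset as the Euclidean distance between box centres is an assumption of this sketch, and the preset value is left to the caller, since the document fixes neither choice.

```python
import math

def box_center(box):
    """Centre point of an (x1, y1, x2, y2) detection frame."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def same_pedestrian(curr_box, prev_box, preset_value: float) -> bool:
    """True when the offset between the current-frame and previous-frame
    positions is less than or equal to the preset value, i.e. the two
    detection frames are taken to belong to the same pedestrian."""
    (cx, cy), (px, py) = box_center(curr_box), box_center(prev_box)
    offset = math.hypot(cx - px, cy - py)
    return offset <= preset_value
```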
Preferably, the processor 33 is further configured to:
Pedestrian tracking is carried out on the current frame image according to the human body detection frame in the previous frame image;
if the offset between the position of the human body detection frame in the current frame image and the position of the human body detection frame in the previous frame image is smaller than or equal to the preset value, confirming that the human body detection frame in the current frame image and the human body detection frame in the previous frame image belong to the same pedestrian;
if the offset between the position of the human body detection frame in the current frame image and the position of the human body detection frame in the previous frame image is larger than the preset value, confirming that the human body detection frame in the current frame image and the human body detection frame in the previous frame image do not belong to the same pedestrian.
Preferably, when the human head detection frame and the human body detection frame are detected simultaneously in the previous frame image, the human body detection frame is preferentially adopted for pedestrian tracking.
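This preference can be expressed as a simple selection of the reference box used for tracking, as in the illustrative snippet below; the function name is hypothetical.

```python
from typing import Optional

def tracking_reference(prev_head: Optional[tuple],
                       prev_body: Optional[tuple]) -> Optional[tuple]:
    """Choose the previous-frame box used for pedestrian tracking:
    prefer the body detection frame when both boxes are available."""
    if prev_body is not None:
        return prev_body
    return prev_head
```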
Preferably, after receiving the video stream image acquired by the camera, the processor 33 is further configured to:
decode the video stream image to obtain each frame image in the video stream image.
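As one concrete way to perform this decoding step, the sketch below uses OpenCV to pull individual frames from a stream. The RTSP address is a placeholder, and the document does not prescribe a particular decoding library.

```python
import cv2  # pip install opencv-python

def frames_from_stream(source):
    """Decode a video stream (file path, camera index, or RTSP URL)
    and yield each frame image in turn."""
    capture = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:          # end of stream or decode failure
                break
            yield frame
    finally:
        capture.release()

# Example usage with a placeholder stream address:
# for frame in frames_from_stream("rtsp://camera.example/stream"):
#     handle(frame)
```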
Preferably, the human head detection frame and/or the human body detection frame is obtained by detecting each frame of image in the video stream image through a deep neural detection network.
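The document does not name a specific network. As an illustrative stand-in only, the sketch below runs a torchvision Faster R-CNN and splits its outputs into head and body boxes by class label; the three-class setup (background, body, head), the label ids, and the need to load fine-tuned weights are assumptions of this sketch, since the pretrained COCO model has no head class.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Illustrative stand-in for the deep neural detection network: a Faster R-CNN
# with three classes (background, human body, human head). Label ids and
# fine-tuned weights are assumptions of this sketch, not part of the patent.
BODY_LABEL, HEAD_LABEL = 1, 2

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=3)
model.eval()  # in practice, load fine-tuned head/body weights before use

@torch.no_grad()
def detect_heads_and_bodies(frame_bgr, score_threshold: float = 0.5):
    """Return (head_boxes, body_boxes) for one frame, each box as [x1, y1, x2, y2].

    Assumes frame_bgr is a BGR numpy image as produced by OpenCV decoding.
    """
    image = to_tensor(frame_bgr[:, :, ::-1].copy())  # BGR -> RGB tensor in [0, 1]
    output = model([image])[0]                       # dict with boxes/labels/scores
    keep = output["scores"] >= score_threshold
    boxes = output["boxes"][keep].tolist()
    labels = output["labels"][keep].tolist()
    heads = [b for b, l in zip(boxes, labels) if l == HEAD_LABEL]
    bodies = [b for b, l in zip(boxes, labels) if l == BODY_LABEL]
    return heads, bodies
```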
The modules/units integrated in the server 3 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the present invention may implement all or part of the flow of the methods of the above embodiments, which may also be completed by instructing related hardware through a computer program; the computer program may be stored in a computer readable storage medium, and when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
Although not shown, the server 3 may further include a power supply (such as a battery) for supplying power to the respective components. Preferably, the power supply may be logically connected to the at least one processor 33 through a power management system, thereby implementing functions such as charge management, discharge management, and power consumption management through the power management system. The power supply may also include one or more of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like. The server 3 may further include a Bluetooth module, a Wi-Fi module, and the like, which are not described herein.
It should be understood that the described embodiments are for illustrative purposes only, and the scope of the patent application is not limited to this configuration.
In the several embodiments provided by the present invention, it should be understood that the disclosed electronic device and method may be implemented in other manners. For example, the above-described embodiments of the electronic device are merely illustrative; for instance, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation.
In addition, each functional unit in the embodiments of the present invention may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the term "comprising" does not exclude other elements, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (8)

1. A method of video pedestrian detection, the method comprising:
receiving a video stream image acquired by a camera;
carrying out human head and human body detection on each frame of image in the received video stream image to obtain a human head detection frame and/or a human body detection frame;
after the human head detection frame and the human body detection frame are paired, screening out a human head detection frame which cannot be paired, wherein the pairing of the human head detection frame and the human body detection frame comprises: judging whether the human head detection frame exists in a human body detection frame or not; when the human head detection frame exists in the human body detection frame, confirming the position relationship between the human head detection frame and the human body detection frame; when the human head detection frame is detected to be located near the top of the middle region of the human body detection frame, confirming that the human head detection frame and the human body detection frame are successfully paired; when the human head detection frame is detected not to be located near the top of the middle region of the human body detection frame, confirming that the pairing of the human head detection frame and the human body detection frame fails;
pedestrian tracking is carried out on the current frame image according to the human head detection frame in the previous frame image, wherein the pedestrian tracking is performed on the current frame image based on the relationship between the position of the human head detection frame in the previous frame image and the position of the human head detection frame in the current frame image: if the offset between the position of the human head detection frame in the current frame image and the position of the human head detection frame in the previous frame image is smaller than or equal to a preset value, confirming that the human head detection frame in the current frame image and the human head detection frame in the previous frame image belong to the same pedestrian; if the offset between the position of the human head detection frame in the current frame image and the position of the human head detection frame in the previous frame image is larger than the preset value, confirming that the human head detection frame in the current frame image and the human head detection frame in the previous frame image do not belong to the same pedestrian; and
completing, in the current frame image according to the tracking result, the human body detection frames corresponding to the screened-out unpaired human head detection frames.
2. The video pedestrian detection method of claim 1 further comprising:
pedestrian tracking is carried out on the current frame image according to the human body detection frame in the previous frame image;
if the offset between the position of the human body detection frame in the current frame image and the position of the human body detection frame in the previous frame image is smaller than or equal to the preset value, confirming that the human body detection frame in the current frame image and the human body detection frame in the previous frame image belong to the same pedestrian;
if the offset between the position of the human body detection frame in the current frame image and the position of the human body detection frame in the previous frame image is larger than the preset value, confirming that the human body detection frame in the current frame image and the human body detection frame in the previous frame image do not belong to the same pedestrian.
3. The video pedestrian detection method of claim 2 wherein the human body detection frame is preferentially employed for pedestrian tracking when both the human head detection frame and the human body detection frame are detected in a previous frame image.
4. The video pedestrian detection method of claim 1 wherein upon receiving a video stream image captured by a camera, the method further comprises:
decoding the video stream image to obtain each frame image in the video stream image.
5. The method for detecting a pedestrian in a video according to claim 1, wherein the step of performing head and body detection on each frame of image in the received video stream image to obtain a head detection frame and/or a body detection frame comprises:
and detecting each frame of image in the video stream image through a deep neural detection network to obtain a human head detection frame and/or a human body detection frame.
6. A video pedestrian detection device, the device comprising:
the receiving module is used for receiving the video stream image acquired by the camera;
the detection module is used for detecting the human head and the human body of each frame of image in the received video stream image to obtain a human head detection frame and/or a human body detection frame;
the screening module is used for screening out the human head detection frame which cannot be paired after the human head detection frame and the human body detection frame are paired, wherein the pairing of the human head detection frame and the human body detection frame comprises: judging whether the human head detection frame exists in a human body detection frame or not; when the human head detection frame exists in the human body detection frame, confirming the position relationship between the human head detection frame and the human body detection frame; when the human head detection frame is detected to be located near the top of the middle region of the human body detection frame, confirming that the human head detection frame and the human body detection frame are successfully paired; when the human head detection frame is detected not to be located near the top of the middle region of the human body detection frame, confirming that the pairing of the human head detection frame and the human body detection frame fails;
the tracking module is used for tracking pedestrians in the current frame image according to the human head detection frame in the previous frame image, wherein pedestrian tracking is performed on the current frame image based on the relationship between the position of the human head detection frame in the previous frame image and the position of the human head detection frame in the current frame image: if the offset between the position of the human head detection frame in the current frame image and the position of the human head detection frame in the previous frame image is smaller than or equal to a preset value, confirming that the human head detection frame in the current frame image and the human head detection frame in the previous frame image belong to the same pedestrian; if the offset between the position of the human head detection frame in the current frame image and the position of the human head detection frame in the previous frame image is larger than the preset value, confirming that the human head detection frame in the current frame image and the human head detection frame in the previous frame image do not belong to the same pedestrian; and
the processing module is used for completing, in the current frame image according to the tracking result, the human body detection frames corresponding to the screened-out unpaired human head detection frames.
7. A server comprising a processor and a memory, wherein the processor is configured to implement the video pedestrian detection method according to any one of claims 1 to 5 when executing a computer program stored in the memory.
8. A computer readable storage medium having a computer program stored thereon, wherein the computer program when executed by a processor implements the video pedestrian detection method of any one of claims 1 to 5.
CN201910533186.XA 2019-06-19 2019-06-19 Video pedestrian detection method, device, server and storage medium Active CN110443116B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910533186.XA CN110443116B (en) 2019-06-19 2019-06-19 Video pedestrian detection method, device, server and storage medium
PCT/CN2019/103393 WO2020252924A1 (en) 2019-06-19 2019-08-29 Method and apparatus for detecting pedestrian in video, and server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910533186.XA CN110443116B (en) 2019-06-19 2019-06-19 Video pedestrian detection method, device, server and storage medium

Publications (2)

Publication Number Publication Date
CN110443116A CN110443116A (en) 2019-11-12
CN110443116B true CN110443116B (en) 2023-06-20

Family

ID=68429243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910533186.XA Active CN110443116B (en) 2019-06-19 2019-06-19 Video pedestrian detection method, device, server and storage medium

Country Status (2)

Country Link
CN (1) CN110443116B (en)
WO (1) WO2020252924A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161320B (en) * 2019-12-30 2023-05-19 浙江大华技术股份有限公司 Target tracking method, target tracking device and computer readable medium
CN111709974B (en) * 2020-06-22 2022-08-02 苏宁云计算有限公司 Human body tracking method and device based on RGB-D image
WO2022144601A1 (en) * 2020-12-29 2022-07-07 Sensetime International Pte. Ltd. Method and apparatus for detecting associated objects
CN112926410B (en) * 2021-02-03 2024-05-14 深圳市维海德技术股份有限公司 Target tracking method, device, storage medium and intelligent video system
CN112861970B (en) * 2021-02-09 2023-01-03 哈尔滨工程大学 Fine-grained image classification method based on feature fusion
CN112926500B (en) * 2021-03-22 2022-09-20 重庆邮电大学 Pedestrian detection method combining head and overall information
CN113269065B (en) * 2021-05-14 2023-02-28 深圳印像数据科技有限公司 Method for counting people flow in front of screen based on target detection algorithm
CN113192048A (en) * 2021-05-17 2021-07-30 广州市勤思网络科技有限公司 Multi-mode fused people number identification and statistics method
CN114022803B (en) * 2021-09-30 2023-11-14 苏州浪潮智能科技有限公司 Multi-target tracking method and device, storage medium and electronic equipment
CN113823029A (en) * 2021-10-29 2021-12-21 北京市商汤科技开发有限公司 Video processing method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106204658A (en) * 2016-07-21 2016-12-07 北京邮电大学 Moving image tracking and device
WO2018215861A1 (en) * 2017-05-24 2018-11-29 Kpit Technologies Limited System and method for pedestrian detection
CN109740516A (en) * 2018-12-29 2019-05-10 深圳市商汤科技有限公司 A kind of user identification method, device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170017846A1 (en) * 2015-07-15 2017-01-19 Umm Al-Qura University Crowd and traffic monitoring apparatus and method
US9911198B2 (en) * 2015-12-17 2018-03-06 Canon Kabushiki Kaisha Method, system and apparatus for matching moving targets between camera views
CN108986137B (en) * 2017-11-30 2022-02-01 成都通甲优博科技有限责任公司 Human body tracking method, device and equipment
CN109598743B (en) * 2018-11-20 2021-09-03 北京京东尚科信息技术有限公司 Pedestrian target tracking method, device and equipment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106204658A (en) * 2016-07-21 2016-12-07 北京邮电大学 Moving image tracking and device
WO2018215861A1 (en) * 2017-05-24 2018-11-29 Kpit Technologies Limited System and method for pedestrian detection
CN109740516A (en) * 2018-12-29 2019-05-10 深圳市商汤科技有限公司 A kind of user identification method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110443116A (en) 2019-11-12
WO2020252924A1 (en) 2020-12-24

Similar Documents

Publication Publication Date Title
CN110443116B (en) Video pedestrian detection method, device, server and storage medium
CN110390262B (en) Video analysis method, device, server and storage medium
US20210312214A1 (en) Image recognition method, apparatus and non-transitory computer readable storage medium
US20220108542A1 (en) Image processing method and apparatus, electronic device and computer readable storage medium
US10706330B2 (en) Methods and systems for accurately recognizing vehicle license plates
CN108446698B (en) Method, device, medium and electronic equipment for detecting text in image
US8811663B2 (en) Object detection in crowded scenes
CN106878670B (en) A kind of method for processing video frequency and device
US20150077548A1 (en) Mobile device intermediary for content analysis
KR20220041901A (en) Image processing in car cabins
CN108154149B (en) License plate recognition method based on deep learning network sharing
CN110533955B (en) Method for determining parking space, terminal equipment and computer readable storage medium
SG191954A1 (en) An integrated intelligent server based system and method/systems adapted to facilitate fail-safe integration and /or optimized utilization of various sensory inputs
CN106651797B (en) Method and device for determining effective area of signal lamp
CN114093142A (en) Object-aware temperature anomaly monitoring and early warning by combining visual and thermal sensing
CN110852258A (en) Object detection method, device, equipment and storage medium
CN110348343A (en) A kind of act of violence monitoring method, device, storage medium and terminal device
CN110969115A (en) Pedestrian event detection method and device, electronic equipment and storage medium
CN112001912A (en) Object detection method and device, computer system and readable storage medium
CN109214326A (en) A kind of information processing method, device and system
US20200074228A1 (en) Rgbd sensing based object detection system and method thereof
Miller et al. Intelligent Sensor Information System For Public Transport–To Safely Go…
CN110913172A (en) Management method and device of video analysis equipment
US20120300021A1 (en) On-board camera system
Tamaki et al. An automatic compensation system for unclear area in 360-degree images using pan-tilt camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant