CN111209869A - Target following display method, system, equipment and medium based on video monitoring


Info

Publication number
CN111209869A
Authority
CN
China
Prior art keywords
target
screen display
attribute information
frame
information
Prior art date
Legal status
Granted
Application number
CN202010019820.0A
Other languages
Chinese (zh)
Other versions
CN111209869B (en)
Inventor
蔡可杰
陆冠宇
何凯
陈青松
Current Assignee
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd filed Critical Chongqing Unisinsight Technology Co Ltd
Priority to CN202010019820.0A priority Critical patent/CN111209869B/en
Publication of CN111209869A publication Critical patent/CN111209869A/en
Application granted granted Critical
Publication of CN111209869B publication Critical patent/CN111209869B/en
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/10: Terrestrial scenes
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/587: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/01: Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0135: Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving interpolation processes

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention provides a target following display method, system, device and medium based on video surveillance. The method comprises: acquiring a surveillance video and extracting image frames from it; analyzing each image frame to obtain the target's state and attribute information, and generating on-screen display (OSD) information for the target from that state and attribute information; when the algorithm frame rate is detected to be lower than the video stream frame rate, performing an interpolation operation on the target's OSD information using time-weighted filtering to fill in the OSD information missing for the target; embedding a data frame containing the target's OSD information into the video stream; and extracting the OSD data frame, parsing it to obtain the position, size and attribute information of the target's tracking frame, and displaying these on the WEB end. By interpolating the target's OSD information with time-weighted filtering and filling in the missing OSD information, the method raises the refresh frequency of the OSD information and solves the problems of flickering and unsmooth display of the superimposed text and graphics.

Description

Target following display method, system, equipment and medium based on video monitoring
Technical Field
The present invention relates to the technical field of image processing, and in particular to a target following display method, system, device and medium based on video surveillance.
Background
Video surveillance technology is widely used across industries, particularly in public-safety applications. Operators can view the images collected by each front-end camera of a video surveillance system on a screen. In many application scenarios, a user needs to track a target and have its position in the video identified and marked automatically.
However, the frequency at which existing video surveillance algorithms return processing results differs from the frame rate of the video stream. For example, face recognition and video structuring algorithms mostly use an input frequency of 10 Hz, while the front-end video preview interface is usually displayed at 25 Hz or more. This frequency difference easily causes the OSD (On-Screen Display) refresh frequency to fall below the persistence-of-vision frequency of the human eye, so that some video frames are not overlaid with the tracking-frame graphics and attribute text, making the target's OSD information flicker and appear unsmooth.
Summary of the invention
In view of the above shortcomings of the prior art, an object of the present invention is to provide a target following display method, system, device and medium based on video surveillance that solve the prior-art problem of OSD information flickering and appearing unsmooth to the user.
To achieve the above and other related objects, a first aspect of the present invention provides a target following display method based on video surveillance, including:
acquiring a monitoring video, and extracting an image frame of the monitoring video;
analyzing the image frame to obtain target state and attribute information of a target, and generating target on-screen display information according to the target state and attribute information; when the algorithm frame rate is detected to be lower than the video stream frame rate, performing an interpolation operation on the on-screen display information of the target by using time-weighted filtering, and supplementing the on-screen display information missing for the target in the image frame, wherein the on-screen display information comprises the position, size and attribute information of a tracking frame of the target;
embedding a data frame containing the target on-screen display information into a video stream;
and receiving the video stream, extracting an on-screen display data frame, parsing the on-screen display data frame to obtain the position, size and attribute information of the tracking frame of the target, and displaying them on a WEB end.
In a second aspect of the present invention, there is provided a target following display system based on video surveillance, comprising:
the image frame module is used for acquiring a monitoring video and extracting image frames of the monitoring video;
the OSD information generating module is used for analyzing the image frames to obtain target state and attribute information of a target and generating target on-screen display information according to the target state and attribute information; when the algorithm frame rate is detected to be lower than the video stream frame rate, performing an interpolation operation on the on-screen display information of the target by using time-weighted filtering, and supplementing the on-screen display information missing for the target in the image frame, wherein the on-screen display information comprises the position, size and attribute information of a tracking frame of the target;
an embedding module for embedding a data frame containing the target on-screen display information into a video stream;
and the display module is used for parsing the on-screen display data frame to obtain the position, size and attribute information of the tracking frame of the target and displaying them on a WEB end.
A third aspect of the present invention provides an electronic device comprising:
one or more processors;
a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors; the one or more processors execute the instructions so as to cause the electronic device to perform the video surveillance-based target following display method of any one of the first aspects.
A fourth aspect of the present invention provides a computer-readable storage medium storing at least one program which, when invoked and executed, implements the video surveillance-based object following display method according to any one of the first aspects.
As described above, the video monitoring-based target following display method, system, device and medium of the present invention have the following advantageous effects:
according to the method, on-screen display information of a target is generated according to the target state and attribute information of the target; when the algorithm frame rate is detected to be smaller than the video stream frame rate, carrying out interpolation operation on the on-screen display information of the target by utilizing time weighted filtering, and supplementing the on-screen display information missing from the target in the image frame; the refreshing frequency of the on-screen display information is improved, and the refreshing frequency is the same as the frame rate of the video stream, so that the problems of flickering and unsmooth display of superimposed characters and graphics are solved.
Drawings
Fig. 1 is a flowchart illustrating a target following display method based on video monitoring according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating target state switching provided in accordance with an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a video surveillance-based target following display method step S2 according to an embodiment of the present invention;
fig. 4 is a block diagram illustrating a structure of a target following display system based on video monitoring according to an embodiment of the present invention;
fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure.
In the following description, reference is made to the accompanying drawings that describe several embodiments of the invention. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present invention is defined only by the claims of the issued patent. Spatially relative terms, such as "upper," "lower," "left," "right," "below," and "above," may be used herein to facilitate describing one element or feature's relationship to another element or feature as illustrated in the figures.
Although the terms first, second, etc. may be used herein to describe various elements in some instances, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, the first preset threshold may be referred to as a second preset threshold, and similarly, the second preset threshold may be referred to as a first preset threshold, without departing from the scope of the various described embodiments. The first preset threshold and the second preset threshold are both preset thresholds, but they are not the same preset threshold unless the context clearly indicates otherwise. The same applies to a first volume and a second volume.
Furthermore, as used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It should be further understood that the terms "comprises" and "comprising" indicate the presence of the stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups. An expression such as "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. Exceptions to this definition arise only when combinations of elements, functions, steps or operations are inherently mutually exclusive in some manner.
Referring to fig. 1, a flowchart of a target following display method based on video monitoring according to an embodiment of the present invention includes:
step S1, acquiring a monitoring video, and extracting image frames of the monitoring video;
the monitoring video can be an image shot by a camera in a public area or an image collected by a camera in a specific place; intercepting the monitoring video to obtain an image frame, preprocessing the image frame, for example, cutting the size of the image frame according to the image size required by the algorithm, and converting the format and frame rate of the image frame to make the image frame meet the algorithm requirement.
Step S2, analyzing the image frame to obtain the target state and attribute information of the target, and generating target on-screen display information according to the target state and attribute information; when the algorithm frame rate is detected to be lower than the video stream frame rate, performing an interpolation operation on the on-screen display information of the target by using time-weighted filtering, and supplementing the on-screen display information missing for the target in the image frame, wherein the on-screen display information comprises the position, size and attribute information of a tracking frame of the target;
the algorithm frame rate is a processing frame rate of the image, the video stream frame rate is a video frame rate, and the target can be a moving target, such as a person, a vehicle, an unmanned aerial vehicle and the like.
Specifically, the interpolation operation is carried out on the basis of the target's previous on-screen display information: the position, size and attribute information of the target's tracking frame in the current image frame are predicted, thereby supplementing the on-screen display information missing for the target in that frame.
Step S3, embedding the data frame containing the target on-screen display information into the video stream;
the data frame structure comprises a scene graph, a tracking frame and a target attribute where a target is located, the scene graph, the tracking frame and the target attribute where the target is located are packaged into a private data frame, and the data frame is embedded into a video stream (video code stream) through coding, so that the OSD (on screen display) information frame rate after algorithm processing in the video stream is the same as the video stream frame rate.
Step S4, receiving the video stream, extracting an on-screen display data frame, parsing it to obtain the position, size and attribute information of the target's tracking frame, and displaying them on a WEB end (a terminal compatible with different operating systems, browsers, and resolutions).
In this embodiment, when the algorithm frame rate is detected to be lower than the video stream frame rate, an interpolation operation is performed on the target's on-screen display information using time-weighted filtering, filling in the on-screen display information missing for the target in the image frame. The refresh frequency of the on-screen display information is thereby raised to match the video stream frame rate, solving the flicker and unsmooth display of the superimposed text (attribute information) and graphics (tracking frame).
Referring to fig. 3, a flowchart of a video surveillance-based target following display method step S2 according to an embodiment of the present invention includes:
if a certain target is not tracked in a plurality of continuous frames, the target state of the target is regarded as an LOST state, the historical value of the target is destroyed, interpolation operation is stopped, and meanwhile, on-screen display information of the target is stopped; otherwise, if the target is not in the LOST state and whether the attribute information of the target is updated is judged, if the attribute information of the target is not updated, the position, the size and the attribute information of the target of the current frame are calculated according to the on-screen display information of the previous frames; if the attribute information of the target is updated, the attribute information is superposed with on-screen display information, and the target state and the attribute information of the last frames of the target are stored.
It should be noted that the target's moving speed in the current image frame is calculated from its moving speed in the previous three frames using a time-weighted average filtering method, and the target's position and size are then obtained from that speed; the attribute information of the current frame is obtained from the target's attribute information in the previous three frames.
In some examples, the image collected by the camera sensor is cropped and frame-rate-controlled by a video preprocessing unit and then input to the algorithm end. The intelligent algorithm analyzes the video image frame and detects each target in the image using a detection-and-tracking operator. The analysis operator extracts the attribute information, position, size and other information of each target one by one, over the whole process from a target's appearance to its disappearance from the camera's field of view. The method divides this whole process for each target into four states: CREATE, TRACK, CONF and LOST; fig. 2 shows the target state switching diagram provided by an embodiment of the present invention. After the algorithm end finishes processing the input image, all target frames and attribute information in the current image frame are returned to the DSP end. The DSP end decides whether a tracked target requires DSP interpolation according to the target's state output by the algorithm end in the previous frame. For example, when the target's state in the previous frame is CONF, the target has remained in a tracked state and the algorithm end returns valid attribute information; when the target's state in the previous frame is LOST, the target has been lost. When the frame rate of the results returned by the algorithm end is lower than the frame rate of the video stream reported to the WEB end, or the returned result lacks the target's tracking frame and attribute information, an interpolation operation must be performed on the target and its result superimposed into the OSD for display.
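The four-state lifecycle named above can be sketched as a small state machine. The three-consecutive-frame loss rule follows this example's description; treating CREATE, TRACK and CONF as advancing one step per detected frame is an assumption consistent with the one-frame transition mentioned for newly created targets.

```python
from enum import Enum

class TargetState(Enum):
    CREATE = "CREATE"   # target first appears
    TRACK = "TRACK"     # one-frame transitional tracking state
    CONF = "CONF"       # stable, confirmed tracking
    LOST = "LOST"       # target has disappeared from the field of view

def next_state(state, detected, missed_frames):
    """Advance the lifecycle one frame. `missed_frames` counts consecutive
    frames in which the target was not detected or tracked."""
    if missed_frames >= 3:
        return TargetState.LOST               # 3 consecutive misses: LOST
    if not detected:
        return state                          # hold state while DSP interpolates
    if state is TargetState.CREATE:
        return TargetState.TRACK
    if state is TargetState.TRACK:
        return TargetState.CONF               # stable after one-frame transition
    return state

s = TargetState.CREATE
for _ in range(2):                            # two detected frames after creation
    s = next_state(s, detected=True, missed_frames=0)
print(s)                                      # TargetState.CONF
```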
If the result returned by the algorithm end for the current frame contains the target's tracking frame and attribute information, no interpolation is needed: the returned result is directly encapsulated and superimposed into the OSD for display, and the last three algorithm-end results are stored. If the algorithm fails to detect and track a target for three consecutive frames, the algorithm end sets the target's state to LOST and returns it to the DSP end; the DSP end destroys the target's historical values, stops the interpolation operation, and stops the target's OSD display.
The image frames collected by the camera are preprocessed into frames whose size, format and frame rate meet the algorithm's requirements; after processing one frame of data, the algorithm end outputs the tracking frames and attribute information of all monitored objects (targets) in the current image frame. The algorithm end sends the structured information of the monitored targets to the DSP end. The DSP end encapsulates information such as the target's image, the scene image where the target is located, the target's tracking frame and the target's attributes into a private data frame and sends it to the front-end WEB via an RTP packet; the WEB end parses the private data frame and draws each target's OSD information on the video preview interface as graphics, text and so on. Since the moving target being monitored is continuous in both time and space, its motion trajectory is continuous across the consecutive frames of the surveillance video, and its moving speed also changes continuously. Exploiting these characteristics of the moving target, the target's current speed can be filtered using a time-weighted average, as follows:
V_{x_i} = Q_1 V_{x_{i-1}} + Q_2 V_{x_{i-2}} + Q_3 V_{x_{i-3}},  V_{y_i} = Q_1 V_{y_{i-1}} + Q_2 V_{y_{i-2}} + Q_3 V_{y_{i-3}}    (1)

In formula (1), V_{x_i} and V_{y_i} are respectively the transverse and longitudinal speeds of the target in the i-th image frame; Q_1, Q_2 and Q_3 are the weight coefficients corresponding to the target in the three preceding frames; V_{x_{i-1}}, V_{x_{i-2}}, V_{x_{i-3}} are the transverse speeds of the target in the previous frame through the previous three frames, and V_{y_{i-1}}, V_{y_{i-2}}, V_{y_{i-3}} are the corresponding longitudinal speeds. To reduce the time complexity of the calculation, the method selects the three frames nearest before the current frame as the reference for calculating the target's motion speed in the current frame.
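Formula (1) can be transcribed directly. The numeric weight values for Q_1, Q_2, Q_3 below are illustrative assumptions; the patent does not state them.

```python
def filtered_velocity(vels, weights=(0.5, 0.3, 0.2)):
    """Formula (1): time-weighted average of the target's velocity over the
    previous three frames.

    vels: (vx, vy) pairs for the previous frame, the frame before it, and
    the one before that, newest first. weights: (Q1, Q2, Q3), illustrative."""
    vx = sum(q * v[0] for q, v in zip(weights, vels))
    vy = sum(q * v[1] for q, v in zip(weights, vels))
    return vx, vy

# A target decelerating horizontally while moving at constant vertical speed.
vx, vy = filtered_velocity([(10.0, 2.0), (8.0, 2.0), (6.0, 2.0)])
print(vx, vy)  # approximately 8.6 and 2.0
```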
In another, more specific example, a face recognition algorithm is used to illustrate the scheme of the present application. The face recognition algorithm selects face and human-body targets in the video with a rectangular tracking frame and displays the attribute information of the face and body; the tracking frame and attribute information must follow the target via the OSD. As shown in table 1, assume a target appears in nine consecutive frame images, of which the frames numbered 1, 2, 4, 6, 8 and 9 are sent to the algorithm for processing, while the frames numbered 3, 5 and 7 are handled by the interpolation operation at the DSP end. According to the target state returned by the algorithm's tracking operator, after a target is created it passes through a one-frame transition into a stable tracking state, i.e. the CONF state.
Table 1: interpolation process for target OSD following display with the face recognition algorithm
[Table 1 appears as an image in the original. Per the text, it covers nine consecutive frames, of which frames 1, 2, 4, 6, 8 and 9 are processed by the algorithm and frames 3, 5 and 7 are filled in by the DSP interpolation operation.]
When the target is in the CONF state, its position in the next frame image can be predicted from the target's coordinate position in the previous frame and its speed over the previous three frames. Here Δ_t is the time interval between consecutive frames, and Δ_x and Δ_y are the displacements of the target on the X and Y components, obtained by weighted averaging of the target's speed values over the previous three frames:

x_i = x_{i-1} + Δ_x = x_{i-1} + V_{x_i} Δ_t,  y_i = y_{i-1} + Δ_y = y_{i-1} + V_{y_i} Δ_t    (2)

Equation (2) gives the OSD superimposition position. As for the OSD content, while the target remains in the CONF state the attribute information of the same target keeps the content from the previous state, for example the age and sex from face recognition, whether a hat is worn, whether glasses are worn, and so on. This process continues until the target is lost, i.e. its state changes to LOST, at which point the target's OSD content no longer needs to be displayed.
In this embodiment, the target's position in the current image frame is calculated from the target's moving speed and the time interval between consecutive frames. In addition, the size of the target frame keeps the size last returned by the algorithm end, and the target's attribute information keeps the attribute information last returned by the algorithm end; the tracking frame position, size, attribute information and so on of the target in the current image frame are thus obtained. Experiments prove that the method can raise the OSD display update frequency, keep the target's OSD update frequency consistent with the video display refresh frequency, reduce the OSD flicker phenomenon, and make the OSD display smoother.
Referring to fig. 4, a video surveillance-based target following display system according to an embodiment of the present invention includes:
the system comprises an image frame module 1, a video processing module and a video processing module, wherein the image frame module is used for acquiring a monitoring video and extracting an image frame of the monitoring video;
the OSD information generating module 2 is used for analyzing the image frames to obtain target state and attribute information of a target and generating target on-screen display information according to the target state and the attribute information; when the algorithm frame rate is detected to be smaller than the video stream frame rate, performing interpolation operation on the on-screen display information of the target by using time weighted filtering, and supplementing the on-screen display information missing the target in the image frame, wherein the on-screen display information comprises the position, the size and the attribute information of a tracking frame of the target;
the OSD information complementing unit 21 is configured to, if a certain target is not tracked in consecutive frames, regard a target state of the target as an LOST state, otherwise, determine whether attribute information of the target is updated when the target is not in the LOST state, and if the attribute information of the target is not updated, calculate a position, a size, and attribute information of the target of a current frame according to on-screen display information of previous frames; if the attribute information of the target is updated, the attribute information is superposed with on-screen display information, and the target state and the attribute information of the last frames of the target are stored.
Specifically, the target's moving speed in the current image frame is calculated from its moving speed in the previous three frames using a time-weighted average filtering method, and the target's position and size are obtained from that speed; the attribute information of the current frame is obtained from the target's attribute information in the previous three frames.
An embedding module 3, configured to embed a data frame containing the target on-screen display information into a video stream;
and the display module 4 is used for receiving the video stream, extracting an on-screen display data frame, parsing it to obtain the position, size and attribute information of the tracking frame of the target, and displaying them on a WEB end.
It should be noted that the above-mentioned target following display system based on video monitoring is only an example and not a limitation of the present invention. In fact, the target following display system based on video monitoring and the target following display method based on video monitoring are in a one-to-one correspondence relationship, and the related technical details and technical effects are the same, and are not detailed one by one here.
Please refer to fig. 5, which is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 5, the electronic device 5 provided in this embodiment mainly includes a memory 51, one or more processors 52, and one or more programs stored in the memory 51, where the memory 51 stores execution instructions, and when the electronic device 5 runs, the processors 52 communicate with the memory 51.
In some embodiments, the processor is also operatively coupled to I/O ports that enable the electronic device 5 to interact with various other electronic devices, and to input structures that enable a user to interact with the electronic device 5. The input structures may include buttons, keyboards, mice, touch pads, and the like. In addition, the electronic display may include a touch component that facilitates user input by detecting the occurrence and/or location of an object touching its screen (e.g., the surface of the electronic display).
The processor is operatively coupled to memory and/or non-volatile storage. More specifically, the processor may execute instructions stored in the memory and/or the non-volatile storage device to perform operations in the computing device, such as generating image data and/or transmitting image data to an electronic display. As such, the processor may include one or more general purpose microprocessors, one or more application specific processors (ASICs), one or more field programmable logic arrays (FPGAs), or any combination thereof.
The memory may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. In certain embodiments, the memory may also include memory remote from the one or more processors, such as network-attached storage accessed via RF circuitry or external ports and a communication network (not shown), which may be the internet, one or more intranets, local area networks (LANs), wide area networks (WANs), storage area networks (SANs), etc., or a suitable combination thereof. A memory controller may control access to the memory by other components of the device, such as the CPU and peripheral interfaces.
With this understanding in mind, aspects of the present invention, or the portions thereof that contribute over the prior art, may be embodied in the form of a software product. The software product may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, a network of computers, or other electronic devices, cause the one or more machines to perform operations in accordance with embodiments of the present invention, such as the steps in the video surveillance target following display method. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disc read-only memories), magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read-only memories), EEPROMs (electrically erasable programmable read-only memories), magnetic or optical cards, flash memory, or other types of media/machine-readable media suitable for storing machine-executable instructions. The storage medium may be located on a local server or a third-party server, such as a third-party cloud service platform; the specific cloud service platform is not limited herein, and may be, for example, Alibaba Cloud or Tencent Cloud. The invention is operational with numerous general-purpose or special-purpose computing system environments or configurations, for example: a personal computer, a dedicated server computer, a mainframe computer, or the like configured as a node in a distributed system.
Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used in this application, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
It should be understood that, in the various embodiments of the present invention, the sequence numbers of the above-mentioned processes do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
In summary, the present invention generates the on-screen display information of a target according to the target state and the attribute information of the target. When the algorithm frame rate is detected to be lower than the video stream frame rate, an interpolation operation is performed on the target's on-screen display information using time-weighted filtering, supplementing the on-screen display information missing for the target in the image frame. This raises the refresh rate of the on-screen display information to match the video stream frame rate, eliminating the flickering and stuttering of the superimposed text and graphics. The invention thereby effectively overcomes various defects in the prior art and has high industrial utilization value.
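As an illustration of the time-weighted interpolation described above, the sketch below predicts a missing tracking box from the most recent known boxes. The weight values, the `OsdFrame` structure, and the function names are assumptions for illustration only, not the patent's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class OsdFrame:
    x: float  # top-left x of the tracking box
    y: float  # top-left y of the tracking box
    w: float  # box width
    h: float  # box height

def interpolate_osd(history, weights=(0.5, 0.3, 0.2)):
    """Predict the current frame's box from the last four known boxes.

    The three most recent frame-to-frame velocities are combined with
    time weights (newest motion weighted highest), and the weighted
    velocity is applied to the latest box.
    """
    assert len(history) >= 4, "need four frames to form three velocities"
    recent = history[-4:]
    # Frame-to-frame velocities, newest first.
    vels = [
        (recent[i + 1].x - recent[i].x, recent[i + 1].y - recent[i].y)
        for i in range(2, -1, -1)
    ]
    vx = sum(w * v[0] for w, v in zip(weights, vels))
    vy = sum(w * v[1] for w, v in zip(weights, vels))
    last = recent[-1]
    # Size is carried over; only the position is extrapolated here.
    return OsdFrame(last.x + vx, last.y + vy, last.w, last.h)
```

For a target moving at a constant velocity, the weighted velocities coincide with the true velocity (the weights sum to one), so the predicted box continues the motion smoothly between algorithm outputs.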
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit it. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical scope of the present invention are intended to be covered by the claims of the present invention.

Claims (10)

1. A target following display method based on video monitoring is characterized by comprising the following steps:
acquiring a monitoring video, and extracting an image frame of the monitoring video;
analyzing the image frame to obtain target state and attribute information of a target, and generating target on-screen display information according to the target state and attribute information; when the algorithm frame rate is detected to be smaller than the video stream frame rate, performing an interpolation operation on the on-screen display information of the target by using time-weighted filtering, and supplementing the on-screen display information missing for the target in the image frame, wherein the on-screen display information comprises the position, the size and the attribute information of a tracking frame of the target;
embedding a data frame containing the target on-screen display information into a video stream;
and receiving the video stream, extracting an on-screen display data frame, analyzing the on-screen display data frame to obtain the position, the size and the attribute information of the tracking frame of the target, and displaying the tracking frame, the size and the attribute information on a WEB end.
2. The video surveillance-based target following display method according to claim 1, wherein the position, size and attribute information of the tracking frame of the target in the current image frame are predicted by performing interpolation operation according to the previous on-screen display information of the target.
3. The video surveillance-based target following display method according to claim 1 or 2, wherein the target states of the target include CREATE, TRACK, CONF, and LOST.
4. The video surveillance-based object following display method according to claim 3, wherein the step of complementing the on-screen display information that the object is missing in the image frame comprises:
if a certain target is not tracked in several consecutive frames, the target state of the target is regarded as the LOST state; otherwise, when the target is not in the LOST state, it is judged whether the attribute information of the target has been updated, and if the attribute information of the target has not been updated, the position, the size and the attribute information of the target in the current frame are calculated according to the on-screen display information of the previous frames; if the attribute information of the target has been updated, the attribute information is superimposed as on-screen display information, and the target state and the attribute information of the last several frames of the target are stored.
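The per-frame supplementing logic above can be sketched as a small decision function. The LOST threshold of three frames, the state names, and all identifiers here are hypothetical illustrations, not the patent's actual implementation.

```python
LOST_THRESHOLD = 3  # frames without a detection before a target is LOST (assumed)

def update_osd(target, detected, attrs_updated):
    """Return the OSD action for one target in the current frame.

    target: dict with mutable "miss_count" and "state" fields.
    detected: whether the tracking algorithm found the target this frame.
    attrs_updated: whether fresh attribute information arrived this frame.
    """
    if not detected:
        target["miss_count"] += 1
        if target["miss_count"] >= LOST_THRESHOLD:
            # Not tracked for several consecutive frames: regard as LOST.
            target["state"] = "LOST"
            return "drop"  # stop drawing this target
    else:
        target["miss_count"] = 0
    if target["state"] == "LOST":
        return "drop"
    if attrs_updated:
        return "overlay"      # superimpose fresh results and store history
    return "interpolate"      # predict box/attributes from previous frames
```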
5. The video surveillance-based target following display method according to claim 4, wherein the step of calculating the position, size and attribute information of the target in the current frame according to the on-screen display information of the previous frames comprises:
calculating the moving speed of the target in the current image frame by adopting a time-weighted average filtering method according to the moving speed of the target in the previous three frames, and obtaining the position and the size of the target according to the moving speed of the target in the current image frame; and obtaining the attribute information of the current frame by using the attribute information of the previous three frames of the target.
6. The video surveillance-based object following display method according to claim 1, wherein the step of embedding a data frame containing on-screen display information of the object into a video stream comprises:
and encapsulating the scene graph where the target is located, the tracking frame and the target attribute into a data frame, and embedding the data frame into a video stream through encoding.
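As an illustration of the encapsulation step above, a data frame might pair a short binary header with a serialized payload that the WEB end parses back out of the stream. The header layout, magic value, and all names below are assumptions for illustration, not the patent's actual wire format.

```python
import json
import struct

MAGIC = 0x4F53  # "OS" marker identifying an OSD data frame (assumed value)

def pack_osd_frame(target_id, box, attrs):
    """Serialize one target's tracking box and attributes into a data frame."""
    payload = json.dumps({"id": target_id, "box": box, "attrs": attrs}).encode()
    # Header: 2-byte magic + 4-byte payload length, big-endian.
    return struct.pack(">HI", MAGIC, len(payload)) + payload

def unpack_osd_frame(frame):
    """Parse a data frame back into (target_id, box, attrs) at the receiver."""
    magic, length = struct.unpack(">HI", frame[:6])
    assert magic == MAGIC, "not an OSD data frame"
    data = json.loads(frame[6:6 + length])
    return data["id"], tuple(data["box"]), data["attrs"]
```

In a real encoder such frames could ride alongside the video as user-data payloads, so the overlay information stays synchronized with the image frames it describes.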
7. A video surveillance based object following display system, the system comprising:
the image frame module is used for acquiring a monitoring video and extracting image frames of the monitoring video;
the OSD information generating module is used for analyzing the image frames to obtain target state and attribute information of a target and generating target on-screen display information according to the target state and the attribute information; when the algorithm frame rate is detected to be smaller than the video stream frame rate, performing an interpolation operation on the on-screen display information of the target by using time-weighted filtering, and supplementing the on-screen display information missing for the target in the image frame, wherein the on-screen display information comprises the position, the size and the attribute information of a tracking frame of the target;
an embedding module for embedding a data frame containing the target on-screen display information into a video stream;
and the display module is used for analyzing the on-screen display data frame to obtain the position, the size and the attribute information of the tracking frame of the target, and displaying the tracking frame, the size and the attribute information on a WEB end.
8. The video surveillance-based target following display system according to claim 7, wherein the OSD information generating module comprises:
the OSD information supplementing unit is used for regarding the target state of a target as the LOST state if the target is not tracked in several consecutive frames; otherwise, when the target is not in the LOST state, judging whether the attribute information of the target has been updated, and if the attribute information of the target has not been updated, calculating the position, the size and the attribute information of the target in the current frame according to the on-screen display information of the previous frames; if the attribute information of the target has been updated, superimposing the attribute information as on-screen display information, and storing the target state and the attribute information of the last several frames of the target.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory;
and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors to execute instructions, the one or more processors executing the instructions to cause the electronic device to perform the video surveillance-based object following display method of any of claims 1-6.
10. A computer-readable storage medium characterized by storing at least one program which, when invoked and executed, implements the video surveillance-based object following display method of any one of claims 1 to 6.
CN202010019820.0A 2020-01-08 2020-01-08 Target following display method, system, equipment and medium based on video monitoring Active CN111209869B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010019820.0A CN111209869B (en) 2020-01-08 2020-01-08 Target following display method, system, equipment and medium based on video monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010019820.0A CN111209869B (en) 2020-01-08 2020-01-08 Target following display method, system, equipment and medium based on video monitoring

Publications (2)

Publication Number Publication Date
CN111209869A true CN111209869A (en) 2020-05-29
CN111209869B CN111209869B (en) 2021-03-12

Family

ID=70789578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010019820.0A Active CN111209869B (en) 2020-01-08 2020-01-08 Target following display method, system, equipment and medium based on video monitoring

Country Status (1)

Country Link
CN (1) CN111209869B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709391A (en) * 2020-06-28 2020-09-25 重庆紫光华山智安科技有限公司 Human face and human body matching method, device and equipment
CN112950951A (en) * 2021-01-29 2021-06-11 浙江大华技术股份有限公司 Intelligent information display method, electronic device and storage medium
CN113223051A (en) * 2021-05-12 2021-08-06 北京百度网讯科技有限公司 Trajectory optimization method, apparatus, device, storage medium, and program product
CN114281024A (en) * 2021-12-23 2022-04-05 大族激光科技产业集团股份有限公司 Anti-collision detection method, numerical control system, machining equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102714744A (en) * 2010-11-10 2012-10-03 索尼公司 Image data transmission device, image data transmission method, image data reception device, and image data reception method
US8421922B2 (en) * 2010-05-19 2013-04-16 Sony Corporation Display device, frame rate conversion device, and display method
CN104135666A (en) * 2014-07-30 2014-11-05 北京华夏威科软件技术有限公司 Interpolation storage method and system for screen image as well as replay method
CN107689052A (en) * 2017-07-11 2018-02-13 西安电子科技大学 Visual target tracking method based on multi-model fusion and structuring depth characteristic
CN108933952A (en) * 2017-05-26 2018-12-04 中兴通讯股份有限公司 A kind of video broadcasting method, device, computer equipment and computer-readable medium
CN109003290A (en) * 2017-12-11 2018-12-14 罗普特(厦门)科技集团有限公司 A kind of video tracing method of monitoring system
CN109145781A (en) * 2018-08-03 2019-01-04 北京字节跳动网络技术有限公司 Method and apparatus for handling image
CN110334652A (en) * 2019-07-05 2019-10-15 无锡睿勤科技有限公司 Image processing method, electronic equipment and storage medium
CN110378259A (en) * 2019-07-05 2019-10-25 桂林电子科技大学 A kind of multiple target Activity recognition method and system towards monitor video

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709391A (en) * 2020-06-28 2020-09-25 重庆紫光华山智安科技有限公司 Human face and human body matching method, device and equipment
CN111709391B (en) * 2020-06-28 2022-12-02 重庆紫光华山智安科技有限公司 Human face and human body matching method, device and equipment
CN112950951A (en) * 2021-01-29 2021-06-11 浙江大华技术股份有限公司 Intelligent information display method, electronic device and storage medium
CN113223051A (en) * 2021-05-12 2021-08-06 北京百度网讯科技有限公司 Trajectory optimization method, apparatus, device, storage medium, and program product
CN114281024A (en) * 2021-12-23 2022-04-05 大族激光科技产业集团股份有限公司 Anti-collision detection method, numerical control system, machining equipment and storage medium

Also Published As

Publication number Publication date
CN111209869B (en) 2021-03-12

Similar Documents

Publication Publication Date Title
CN111209869B (en) Target following display method, system, equipment and medium based on video monitoring
US10593049B2 (en) System and method for real-time detection of objects in motion
US9589363B2 (en) Object tracking in encoded video streams
KR102512828B1 (en) Event signal processing method and apparatus
US10121080B2 (en) Systems and methods for controlling the recording, storing and transmitting of video surveillance content
CN108229353B (en) Human body image classification method and apparatus, electronic device, storage medium, and program
US10769849B2 (en) Use of temporal motion vectors for 3D reconstruction
CN112560684B (en) Lane line detection method, lane line detection device, electronic equipment, storage medium and vehicle
CN113808162B (en) Target tracking method, device, electronic equipment and storage medium
WO2020157157A1 (en) Method of processing information from an event-based sensor
CN112597895B (en) Confidence determining method based on offset detection, road side equipment and cloud control platform
CN111738225B (en) Crowd gathering detection method, device, equipment and storage medium
CN113326773A (en) Recognition model training method, recognition method, device, equipment and storage medium
CN111601013B (en) Method and apparatus for processing video frames
CN115761571A (en) Video-based target retrieval method, device, equipment and storage medium
CN113657518B (en) Training method, target image detection method, device, electronic device, and medium
CN113587928A (en) Navigation method, navigation device, electronic equipment, storage medium and computer program product
US11315256B2 (en) Detecting motion in video using motion vectors
CN111915713A (en) Three-dimensional dynamic scene creating method, computer equipment and storage medium
CN114998275A (en) State recognition method of target object and training method of deep learning model
CN113108919A (en) Human body temperature detection method, device and storage medium
CN112700657B (en) Method and device for generating detection information, road side equipment and cloud control platform
CN116246026B (en) Training method of three-dimensional reconstruction model, three-dimensional scene rendering method and device
Ya-Jun Design of remote motion detection and alarm monitoring system based on the ARM
CN113780322B (en) Safety detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant