CN111881320A

CN111881320A - Video query method, device, equipment and readable storage medium

Info

Publication number: CN111881320A
Application number: CN202010757187.5A
Authority: CN
Inventors: 夏钦展; 吕廷昌
Original assignee: Goertek Techology Co Ltd
Current assignee: Goertek Techology Co Ltd
Priority date: 2020-07-31
Filing date: 2020-07-31
Publication date: 2020-11-03

Abstract

The invention discloses a video query method, apparatus, device and readable storage medium. The method comprises: receiving and parsing a video viewing request, and determining a target event to be viewed and a target object set corresponding to the target event; determining, by using an identification record file, the start-stop time at which the target object set is detected, where the identification record file comprises object identification records acquired in real time during video monitoring; and calling a target video corresponding to the start-stop time. In the method, the time at which an event occurs is determined from the object identification results, so that when an event needs to be checked, its time is quickly located from the appearance of the related objects, and the video corresponding to the event is quickly retrieved.

Description

Video query method, device, equipment and readable storage medium

Technical Field

The present invention relates to the field of monitoring technologies, and in particular, to a video query method, apparatus, device, and readable storage medium.

Background

Existing security cameras in public areas generally record continuously, 24 hours a day and 7 days a week, so that historical video can be consulted whenever needed.

For such continuously recorded (7×24-hour) surveillance video, it is easy to retrieve the recording for a given point in time, but inconvenient to retrieve the recording of a specific event: a long recording has to be watched from start to finish to find the required content. For example, to find a person with a pet dog in the video of a given day, someone has to review roughly a full day of footage, which is time-consuming and labor-intensive. In the prior art, a video analysis system can locate the time corresponding to an event, but this takes a long time because the video must first be imported into the system, and the analysis result is affected by the video quality.

In summary, how to effectively solve the problems of video viewing time positioning and the like is a technical problem which needs to be solved urgently by those skilled in the art at present.

Disclosure of Invention

The invention aims to provide a video query method, apparatus, device and readable storage medium which, when an event needs to be checked, locate the target event quickly by means of an identification record file, saving time and reducing the waste of human and material resources.

In order to solve the technical problems, the invention provides the following technical scheme:

A video query method, comprising:

receiving and analyzing a video viewing request, and determining a target event to be viewed and a target object set corresponding to the target event;

determining the starting and ending time of the target object set by using the identification record file; the identification record file comprises an object identification record acquired in real time in the video monitoring process;

and calling a target video corresponding to the start-stop time.

Preferably, the process of obtaining the object identification record comprises:

in the video monitoring process, a camera is used for collecting monitoring videos;

and carrying out object recognition on the monitoring video by using an object recognition model to obtain the object recognition record.

Preferably, the performing object recognition on the surveillance video by using an object recognition model to obtain the object recognition record includes:

inputting each frame of image in the monitoring video into an object recognition model;

carrying out object recognition processing on the input images by using the object recognition model to obtain an image recognition result corresponding to each frame of image, wherein the image recognition result comprises an identification mark indicating whether an object is recognized, an object class mark of the recognized object, and a confidence;

counting the identification marks, the object type marks and the confidence degrees in the image identification results to obtain the object identification records; each of the object identification records includes an object identification, a time of appearance, and a time of disappearance.

Preferably, the obtaining the object identification record by performing statistics on the identification tag, the object category tag, and the confidence in each image identification result includes:

screening an image frame corresponding to the identified object by using the identification mark and the confidence coefficient;

carrying out classified statistics on the image frames by using the object class marks to obtain continuous frames corresponding to each object class mark;

determining the appearance time and the disappearance time in the object identification record using the successive frames;

and determining the object class marks corresponding to the continuous frames as the object identifications in the object identification records.

Preferably, determining the appearance time and the disappearance time in the object identification record using the successive frames comprises:

determining the acquisition time corresponding to the first frame in the continuous frames as the occurrence time;

and determining the acquisition time corresponding to the last frame in the continuous frames as the disappearance time.

Preferably, the target object set includes one element, and accordingly, the determining, by using the identification record file, the start-stop time at which the target object set is detected includes:

determining the start-stop time at which the element is detected by using the identification record file;

determining the start-stop time of the element as the start-stop time of the target object set.

Preferably, the target object set includes at least two elements, and the determining, by using the identification record file, a start-stop time when the target object set is detected includes:

determining the starting and ending time of each element by using the identification record file;

and determining the time intersection or the time union of the start-stop times of the elements as the start-stop time of the target object set.

A video query device, comprising:

the object set determining module is used for receiving and analyzing a video viewing request, and determining a target event to be viewed and a target object set corresponding to the target event;

the query module is used for determining the starting and ending time of the target object set by utilizing the identification record file; the identification record file comprises an object identification record acquired in real time in the video monitoring process;

and the video calling module is used for calling the target video corresponding to the start-stop time.

A video query device, comprising:

a memory for storing a computer program;

and the processor is used for realizing the steps of the video query method when executing the computer program.

A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned video querying method.

By applying the method provided by the embodiment of the invention, the video viewing request is received and analyzed, and the target event to be viewed and the target object set corresponding to the target event are determined; determining the starting and ending time of the detected target object set by using the identification record file; the identification record file comprises an object identification record acquired in real time in the video monitoring process; and calling a target video corresponding to the start-stop time.

During video monitoring, it is difficult to determine from the video what kind of event has occurred in the monitored scene, but it is possible to identify which objects appear or are present in it. In general, the occurrence of an event is accompanied by the appearance and disappearance of objects. On this basis, the method determines the time of an event from the object identification results, so that when an event needs to be checked, its time is quickly located from the appearance of the related objects and the corresponding video is quickly retrieved. Specifically, after a video viewing request is received, the target event to be viewed and the target object set corresponding to the target event are determined. The start-stop time corresponding to the target object set is then determined by using the identification record file, and the target video is called based on that start-stop time. The target video is the surveillance video recorded while the target event occurred. Because the identification record file is written during monitoring, locating the time of an event is not affected by the quality of the stored surveillance video; the event can be located quickly, which reduces the time spent and avoids wasting human resources.

Accordingly, embodiments of the present invention further provide a video query apparatus, a device and a readable storage medium corresponding to the video query method, which have the above technical effects and are not described herein again.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of an implementation of a video query method according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a video query apparatus according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a video query device according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a video query device according to an embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The first embodiment is as follows:

Referring to Fig. 1, which is a flowchart of a video query method according to an embodiment of the present invention, the method includes the following steps:

S101, receiving and analyzing a video viewing request, and determining a target event to be viewed and a target object set corresponding to the target event.

The video viewing request may include events that need to be viewed specifically, such as a person entering/exiting event, an animal entering/exiting event, a vehicle entering/exiting event, a person walking a dog event, and the like. For an event, the number of elements in the corresponding target object set may be one or more.

In the embodiment of the present invention, the correspondence between events and objects may be stored in advance, so that the target object set is determined from the target event. For example, a person entering/exiting event is related to a person, so its target object set includes the person; a person walking a dog event is related to both a person and a dog, so its target object set includes the person and the dog.

Of course, in the embodiment of the present invention, the video viewing request may itself carry the target object set corresponding to the target event to be viewed. For example, the request may carry an animal-walking event together with a person identifier and an animal identifier, or a vehicle-running-over-animal event together with a vehicle identifier and an animal identifier.
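As an illustration of step S101, the following minimal Python sketch resolves a request into a target event and a target object set. The request layout (`event` and `objects` keys), the event names and the object labels are all hypothetical; the source only states that the correspondence may be stored in advance or carried in the request.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical pre-stored correspondence between events and objects.
EVENT_OBJECTS: Dict[str, Set[str]] = {
    "person_in_out": {"person"},
    "vehicle_in_out": {"vehicle"},
    "person_walking_dog": {"person", "dog"},
}


def resolve_target_objects(request: dict) -> Tuple[str, FrozenSet[str]]:
    """Return (target event, target object set) for a video viewing request."""
    event = request["event"]
    # An object set carried in the request itself takes precedence
    # over the pre-stored correspondence (an assumed policy).
    if request.get("objects"):
        return event, frozenset(request["objects"])
    if event not in EVENT_OBJECTS:
        raise KeyError(f"no stored object set for event {event!r}")
    return event, frozenset(EVENT_OBJECTS[event])
```

A request such as `{"event": "person_walking_dog"}` would resolve to the set containing `person` and `dog`, while a request that lists its own objects bypasses the stored table.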

S102, determining the starting and ending time of the target object set by using the identification record file.

The identification record file comprises an object identification record acquired in real time in the video monitoring process.

It should be noted that, in the embodiment of the present invention, the number of elements in the target object set is not limited. That is, the number of elements in the target object set may be 1 or more.

Specifically, when the target object set includes one element, determining the start-stop time at which the target object set is detected by using the identification record file includes:

step one, determining the start-stop time at which the element is detected by using the identification record file;

step two, determining the start-stop time of the element as the start-stop time of the target object set.

That is, the start-stop time corresponding to a single element in the target object set is directly determined as the start-stop time corresponding to the entire target object set.

Specifically, when the target object set includes at least two elements, determining the start-stop time at which the target object set is detected by using the identification record file includes:

step one, determining the start-stop time at which each element is detected by using the identification record file;

step two, determining the time intersection or the time union of the start-stop times of the elements as the start-stop time of the target object set.

That is, when the target object set includes at least two elements, the start-stop times determined for the individual elements may overlap one another or be independent of one another, so either their time intersection or their time union may be used as the start-stop time of the target object set.

It should be noted that, the start-stop time corresponding to an element may be a start time and an end time of a time period, or may also be a start time and an end time corresponding to a plurality of time periods, respectively (that is, an object corresponding to the element appears in the video and is identified in different time periods); the starting and ending time for the target object set may be one time period or a plurality of time periods.
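The combination of per-element time periods can be sketched as follows. This is a minimal illustration, assuming each element's start-stop time is a list of `(start_ms, end_ms)` pairs; the function names and the `mode` parameter are hypothetical and not taken from the source.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms) of one time period


def union(periods: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching periods into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersection(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Periods during which both a and b are present."""
    a, b = union(a), union(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever period ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def start_stop_times(per_element: List[List[Interval]],
                     mode: str = "intersection") -> List[Interval]:
    """Combine the periods of every element in a non-empty target object set."""
    if mode == "union":
        return union([p for periods in per_element for p in periods])
    result = union(per_element[0])
    for periods in per_element[1:]:
        result = intersection(result, periods)
    return result
```

With a single element, `start_stop_times` simply returns that element's periods, which matches the one-element case described above. The intersection suits events in which all objects must be present together, while the union suits events in which any of the objects is of interest.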

Wherein the process of obtaining an object identification record comprises:

step 1, in the process of video monitoring, a camera is used for collecting monitoring videos;

and 2, carrying out object recognition on the monitoring video by using the object recognition model to obtain an object recognition record.

For convenience of description, the above two steps will be described in combination.

The object recognition model is a model that, after being trained or created, can perform object recognition on an input picture or video and output a recognition result. Specifically, it may be a model trained and constructed based on deep learning or machine learning. The embodiment of the invention does not limit the implementation principle or the structure of the object recognition model, provided that the model can recognize objects in the video and produce the corresponding recognition result.

In the process of video monitoring, after the camera collects the monitoring video, the object recognition model can be used for carrying out object recognition on the monitoring video to obtain an object recognition record. That is to say, the object identification record is not obtained after identification processing is performed on the basis of the stored monitoring video, but is obtained by identifying the monitoring video acquired by the camera in real time, so that the quality of the object identification is not affected by the video quality damaged by video storage or transmission, and the object identification record recorded in the identification record file has a higher reference value.

Wherein, step 2 may specifically include:

step 2.1, inputting each frame of image in the monitoring video into an object identification model;

step 2.2, carrying out object recognition processing on the input image by using the object recognition model to obtain an image recognition result corresponding to each frame of image, wherein the image recognition result comprises an identification mark indicating whether an object is recognized, an object class mark of the recognized object, and a confidence;

step 2.3, counting the identification marks, the object type marks and the confidence degrees in the image identification results to obtain object identification records; each object identification record includes an object identification, a time of appearance, and a time of disappearance.

That is, after the camera collects the surveillance video, the object recognition model can directly perform object recognition on the image frames collected by the camera. The image recognition result for each frame contains an identification mark indicating whether an object was recognized (for example, 1 indicates that an object was recognized and 0 indicates that none was). Numbers can be preset for the different objects to be recognized; when an object is recognized, its number is used directly as the object class mark. In addition, the confidence expresses the reliability of the recognition result.

It should be noted that, in the embodiment of the present invention, the recognition result corresponding to one frame image may include one, two or more object class marks, that is, it indicates that one, two or more objects to be recognized exist in the image frame.

For step 2.3, the method may further specifically include:

2.3.1, screening image frames corresponding to the identified object by using the identification mark and the confidence coefficient;

step 2.3.2, carrying out classified statistics on the image frames by utilizing the object class marks to obtain continuous frames corresponding to each object class mark;

step 2.3.3, determining the appearance time and disappearance time in the object identification record by using the continuous frames;

and 2.3.4, determining the object class marks corresponding to the continuous frames as object identifications in the object identification records.

The image frames corresponding to an identified object are screened based on the identification mark and the confidence. Specifically, only the image frames whose confidence is greater than a preset threshold and whose identification mark shows that an object was identified may be kept. Alternatively, frames may be kept up to the last frame (or a designated frame) whose confidence is greater than the preset threshold and whose identification mark shows the identified object; that is, the identified object is determined to have disappeared only after it has gone undetected for a designated number of frames. In other words, every frame in a run of continuous frames may contain an object with a given object class mark, or the front part of the run may contain that object while the rear part does not.

Wherein step 2.3.3 may specifically comprise:

step 2.3.3.1, determining the acquisition time corresponding to the first frame in the continuous frames as the occurrence time;

step 2.3.3.2, determining the acquisition time corresponding to the last frame in the consecutive frames as the disappearance time.

The acquisition time corresponding to the first frame in the continuous frames is determined as the appearance time, and the acquisition time corresponding to the last frame in the continuous frames is determined as the disappearance time.
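Steps 2.3.1 to 2.3.4 can be sketched in Python as below. This is only an illustration under stated assumptions: the `FrameResult` layout, the 0.5 default threshold, and the strict definition of "continuous" (adjacent frame indices, with no tolerance for missed frames) are all hypothetical choices, not details given in the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FrameResult:
    time_ms: int                          # acquisition time of the frame
    recognized: bool                      # identification mark
    detections: List[Tuple[int, float]]   # (object class mark, confidence)


def build_records(frames: List[FrameResult],
                  threshold: float = 0.5) -> List[Tuple[int, int, int]]:
    """Return (object identification, appearance_ms, disappearance_ms) records."""
    # Steps 2.3.1 and 2.3.2: keep confident detections, grouped by class mark.
    hits: Dict[int, List[int]] = {}
    for idx, frame in enumerate(frames):
        if not frame.recognized:
            continue
        for class_id, conf in frame.detections:
            if conf > threshold:
                idxs = hits.setdefault(class_id, [])
                if not idxs or idxs[-1] != idx:
                    idxs.append(idx)

    records: List[Tuple[int, int, int]] = []
    for class_id, idxs in hits.items():
        run_start = prev = idxs[0]
        # The trailing None closes the final run.
        for idx in idxs[1:] + [None]:
            if idx is not None and idx == prev + 1:
                prev = idx
                continue
            # Steps 2.3.3 and 2.3.4: first and last frame of the run give
            # the appearance and disappearance times; the class mark is the id.
            records.append((class_id, frames[run_start].time_ms, frames[prev].time_ms))
            if idx is not None:
                run_start = prev = idx
    return sorted(records, key=lambda r: r[1])
```

A variant that tolerates a designated number of missed frames before closing a run would follow the alternative screening rule described above.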

For ease of understanding, the following example illustrates what the identification record file is, how it is obtained, and how the start-stop time corresponding to the target event is determined from it:

A pre-trained object recognition model can be integrated into the camera in advance; it can recognize common moving objects such as people, common animals, and vehicles. The model outputs whether an object is recognized and, if so, the class information of the object.

When the camera starts, the object recognition model starts as well and creates a separate identification record file. Once the camera begins recording, the recognition module recognizes each frame of the original image. When an object is recognized in a frame for the first time, its name and appearance time are recorded, and the recognition result for that object is checked in every subsequent frame. When the object has not been recognized in 10 consecutive frames, the recognition module considers that the object has disappeared and appends a record containing the object's name, appearance time, and disappearance time to the identification record file.

Sometimes the recognition module recognizes several different objects in one frame of image. In that case a record is created for each object, and each object has its own appearance time and disappearance time.
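The per-frame bookkeeping described above can be sketched as a small streaming recorder. The class and method names are hypothetical, and one point is an explicit assumption: the disappearance time is taken as the time of the last frame in which the object was seen, since the source does not say whether the 10 missed frames count toward it.

```python
from typing import Dict, List, Set, Tuple

MISS_LIMIT = 10  # consecutive frames without the object before it "disappears"


class RecognitionRecorder:
    """Tracks each recognized object independently, frame by frame."""

    def __init__(self, miss_limit: int = MISS_LIMIT) -> None:
        self.miss_limit = miss_limit
        # object name -> [appearance_ms, last_seen_ms, consecutive misses]
        self.active: Dict[str, List[int]] = {}
        # finished records: (name, appearance_ms, disappearance_ms)
        self.records: List[Tuple[str, int, int]] = []

    def on_frame(self, time_ms: int, objects: Set[str]) -> None:
        # Objects recognized in this frame: start tracking or refresh.
        for name in objects:
            if name in self.active:
                state = self.active[name]
                state[1], state[2] = time_ms, 0
            else:
                self.active[name] = [time_ms, time_ms, 0]
        # Tracked objects missing from this frame: count the miss.
        for name in list(self.active):
            if name in objects:
                continue
            state = self.active[name]
            state[2] += 1
            if state[2] >= self.miss_limit:
                # Assumption: disappearance time = last frame where it was seen.
                self.records.append((name, state[0], state[1]))
                del self.active[name]
```

Because each object has its own entry in `active`, several objects recognized in the same frame produce separate records with independent times.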

Preferably, in order to reduce the storage overhead, the identification record file does not record the actual object name but a preset N-byte code (N is greater than or equal to 1; the specific value of N depends on the total number of object categories to be identified and is generally set to 1, which can represent 2^8 kinds of identified objects); for example, the code of a person is 0x00000001. The appearance time and the disappearance time may each be recorded in M bytes (M may be 8), as the number of milliseconds relative to the video start time. If 1 byte represents the object name (namely the object identification) and 8 bytes each represent the appearance time and the disappearance time, one object identification record requires 1 + 8 + 8 = 17 bytes. To further reduce storage overhead, no separator is needed between object identification records; each record is written directly after the previous one.

Table 1 below shows a segment of the identification record file (assuming that 0x00000001 represents a person and 0x00000002 represents a pet dog). The digits in the two time columns are binary; for example, 0x1101101011000000 denotes 56000 milliseconds:

Object name code	Appearance time	Disappearance time
…	…	…
0x00000001	0x1101101011000000	0x10001001001010110
0x00000002	0x1110101011000100	0x10001000111011110
0x00000001	0x10011100010000000	0x10100110001101100
…	…	…

TABLE 1
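The 17-byte, separator-free record layout can be sketched with Python's standard `struct` module. The big-endian byte order and the function names are assumptions made for illustration; the source specifies only the field sizes (1 byte for the code, 8 bytes for each time).

```python
import struct

# 1-byte object code + 8-byte appearance ms + 8-byte disappearance ms.
# ">" selects big-endian with no padding; the byte order is an assumption.
RECORD = struct.Struct(">BQQ")


def pack_record(code: int, appear_ms: int, disappear_ms: int) -> bytes:
    """Encode one object identification record."""
    return RECORD.pack(code, appear_ms, disappear_ms)


def unpack_records(blob: bytes):
    """Split separator-free bytes into (code, appear_ms, disappear_ms) tuples."""
    if len(blob) % RECORD.size:
        raise ValueError("truncated identification record file")
    return [RECORD.unpack_from(blob, off)
            for off in range(0, len(blob), RECORD.size)]
```

Since every record has the same fixed size, the reader can locate record boundaries by offset alone, which is why no separator is needed.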

When a segment needs to be retrieved from the video, the system reads the entire identification record file, restores the object code in each record to the object's name, and builds a list for each object whose elements are the appearance time and disappearance time of each period in which the object appears in the video. For example, the records in Table 1 above may be converted to the following:

person: (56000,70230), (80000,85100) … // the person appears twice: from 56 seconds to 70.23 seconds, and from 80 seconds to 85.1 seconds

pet dog: (60100,70110) … // the pet dog appears once, from 60.1 seconds to 70.11 seconds

At this point, if the segments in which a person appears are searched for, the system can directly determine the two periods in which the person appears.
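Building the per-object lists from decoded records can be sketched as follows. The code-to-name table is hypothetical apart from the two example codes given in the source, and the input is assumed to be already decoded into `(code, appearance_ms, disappearance_ms)` tuples.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical code table; the source gives only these two codes as examples.
CODE_TO_NAME = {1: "person", 2: "pet dog"}


def periods_by_object(
    records: Iterable[Tuple[int, int, int]]
) -> Dict[str, List[Tuple[int, int]]]:
    """Group (code, appear_ms, disappear_ms) records into one list per object."""
    periods: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for code, appear_ms, disappear_ms in records:
        name = CODE_TO_NAME.get(code, f"unknown:{code}")
        periods[name].append((appear_ms, disappear_ms))
    return dict(periods)
```

Looking up `"person"` in the result then yields the two periods directly, without touching the video itself.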

And S103, calling a target video corresponding to the start-stop time.

The start-stop time may correspond to the start time and the end time of one time period, or may correspond to the start time and the end time of a plurality of time periods. Therefore, the target video may be a surveillance video corresponding to one time period, or may also be a surveillance video corresponding to a plurality of time periods.

After the target video is called based on the start-stop time, the target video can be displayed to the user. Such as playing on a visual interface or sending to a client for viewing by the client.

For example, to search for footage in which a person and a pet dog appear together, the system first computes the intersection of the periods in which the person and the pet dog appear, and then displays the corresponding segments to the user, who can click any segment to play it.
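As a worked illustration using the millisecond values decoded from Table 1 (the set notation is introduced here only for the example), the intersection for the person and the pet dog works out as:

```latex
\begin{aligned}
T_{\text{person}} &= [56000,\,70230] \cup [80000,\,85100] \\
T_{\text{dog}} &= [60100,\,70110] \\
T_{\text{person}} \cap T_{\text{dog}}
  &= [\max(56000, 60100),\ \min(70230, 70110)] \\
  &= [60100,\,70110]
\end{aligned}
```

The second person period, [80000, 85100], begins after the pet dog has disappeared at 70110 and therefore contributes nothing. Under these values the user would be shown a single segment running from 60.1 seconds to 70.11 seconds.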

Example two:

corresponding to the above method embodiments, the embodiments of the present invention further provide a video query apparatus, and the video query apparatus described below and the video query method described above may be referred to correspondingly.

Referring to fig. 2, the apparatus includes the following modules:

an object set determining module 101, configured to receive and analyze a video viewing request, and determine a target event to be viewed and a target object set corresponding to the target event;

the query module 102 is configured to determine, by using the identification record file, start and end times of detecting the target object set; the identification record file comprises an object identification record acquired in real time in the video monitoring process;

and the video calling module 103 is used for calling the target video corresponding to the start-stop time.

The device provided by the embodiment of the invention is applied to receive and analyze the video viewing request and determine the target event to be viewed and the target object set corresponding to the target event; determining the starting and ending time of the detected target object set by using the identification record file; the identification record file comprises an object identification record acquired in real time in the video monitoring process; and calling a target video corresponding to the start-stop time.

During video monitoring, it is difficult to determine from the video what kind of event has occurred in the monitored scene, but it is possible to identify which objects appear or are present in it. In general, the occurrence of an event is accompanied by the appearance and disappearance of objects. Therefore, the apparatus determines the time of an event from the object identification results, so that when an event needs to be checked, its time is quickly located from the appearance of the related objects and the corresponding video is quickly retrieved. Specifically, after a video viewing request is received, the target event to be viewed and the target object set corresponding to the target event are determined. The start-stop time corresponding to the target object set is then determined by using the identification record file, and the target video is called based on that start-stop time. The target video is the surveillance video recorded while the target event occurred. Because the identification record file is written during monitoring, locating the time of an event is not affected by the quality of the stored surveillance video; the event can be located quickly, which reduces the time spent and avoids wasting human resources.

In one embodiment of the invention, the apparatus comprises: the monitoring real-time identification recording module is used for acquiring a monitoring video by using a camera in the video monitoring process; and carrying out object recognition on the monitoring video by using the object recognition model to obtain an object recognition record.

In a specific embodiment of the present invention, the monitoring real-time identification recording module is specifically configured to input each frame of image in the monitoring video into the object identification model; carry out object recognition processing on the input images by using the object recognition model to obtain an image recognition result corresponding to each frame of image, wherein the image recognition result comprises an identification mark indicating whether an object is recognized, an object class mark of the recognized object, and a confidence; and count the identification marks, the object class marks and the confidences in the image recognition results to obtain object identification records, where each object identification record includes an object identification, an appearance time, and a disappearance time.

In a specific embodiment of the present invention, the monitoring real-time identification recording module is specifically configured to screen an image frame corresponding to an identified object by using the identification mark and the confidence level; carrying out classification statistics on the image frames by using the object class marks to obtain continuous frames corresponding to each object class mark; determining the appearance time and disappearance time in the object identification record by using the continuous frames; and determining the object class marks corresponding to the continuous frames as the object identifications in the object identification records.

In a specific embodiment of the present invention, the monitoring real-time identification recording module is specifically configured to determine an acquisition time corresponding to a first frame of the consecutive frames as an occurrence time; and determining the acquisition time corresponding to the last frame in the continuous frames as the disappearance time.

In an embodiment of the present invention, the target object set includes an element, and accordingly, the query module 102 is specifically configured to determine, by using the identification record file, a start-stop time of the detected element; the start-stop time of an element is determined as the start-stop time.

In a specific embodiment of the present invention, the target object set includes at least two elements, and correspondingly, the query module 102 is specifically configured to determine, by using the identification record file, a start-stop time when each element is detected; and determining the time intersection or the time union corresponding to the start time and the end time of each element as the start time and the end time.

Example three:

Corresponding to the above method embodiment, an embodiment of the present invention further provides a video query device; the video query device described below and the video query method described above may be referred to in correspondence with each other.

Referring to fig. 3, the video query device includes:

a memory 332 for storing a computer program;

a processor 322 for implementing the steps of the video query method of the above method embodiment when executing the computer program.

Specifically, referring to fig. 4, which is a schematic diagram of a specific structure of the video query device provided in this embodiment, the video query device may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 322 and a memory 332, where the memory 332 stores one or more computer applications 342 or data 344. The memory 332 may be transient storage or persistent storage. The program stored in the memory 332 may include one or more modules (not shown), each of which may include a series of instruction operations for the data processing device. Further, the central processor 322 may be configured to communicate with the memory 332 and to execute, on the video query device 301, the series of instruction operations stored in the memory 332.

The video query device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input-output interfaces 358, and/or one or more operating systems 341.

The steps in the video query method described above may be implemented by the structure of the video query device.

Example four:

Corresponding to the above method embodiment, an embodiment of the present invention further provides a readable storage medium; the readable storage medium described below and the video query method described above may be referred to in correspondence with each other.

A readable storage medium has a computer program stored thereon, and the computer program, when executed by a processor, implements the steps of the video query method of the above method embodiment.

The readable storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium capable of storing program code.

Those skilled in the art will further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Claims

1. A video query method, comprising:

and calling a target video corresponding to the start-stop time.

2. The video query method of claim 1, wherein the process of obtaining the object identification record comprises:

3. The video query method of claim 2, wherein the performing object recognition on the surveillance video by using an object recognition model to obtain the object recognition record comprises:

4. The video query method according to claim 3, wherein the obtaining the object identification record by performing statistics on the identification tag, the object category tag, and the confidence in each of the image identification results includes:

5. The video query method of claim 4, wherein determining the appearance time and the disappearance time in the object identification record using the consecutive frames comprises:

6. The video query method of claim 1, wherein the target object set comprises one element, and wherein the determining, by using the identification record file, the start-stop time of the detected target object set comprises:

determining a start-stop time for the element as the start-stop time.

7. The video query method of claim 1, wherein the target object set comprises at least two elements, and wherein the determining, by using the identification record file, the start-stop time of the detected target object set comprises:

8. A video query apparatus, comprising:

9. A video query device, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the video query method according to any one of claims 1 to 7 when executing said computer program.

10. A readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the video query method according to any one of claims 1 to 7.