CN114554302A - Video tracing method and device, computer equipment and storage medium


Info

Publication number
CN114554302A
Authority
CN
China
Prior art keywords
event
video
stored
information
target
Prior art date
Legal status
Pending
Application number
CN202210175888.7A
Other languages
Chinese (zh)
Inventor
龚竞秋
蒋家堂
王金希
王雪
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202210175888.7A
Publication of CN114554302A
Legal status: Pending

Classifications

    • H04N 21/858: Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot
    • H04N 21/845: Structuring of content, e.g. decomposing content into time segments
    • H04N 21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N 21/8547: Content authoring involving timestamps for synchronizing content
    • H04N 7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a video tracing method and apparatus, a computer device, a storage medium, and a computer program product. It relates to the field of blockchain technology and can be used in the field of information security. Target event positioning information is obtained by querying a blockchain according to the target event description information and target time information carried in a tracing request, where the blockchain comprises a plurality of blocks storing the different event positioning information of different video clips, and the event positioning information of a video clip comprises its event description information and time information. A video database is then queried according to the target event positioning information to obtain the video clip corresponding to the target event. Compared with traditional manual retrieval, each block in the blockchain is queried using event description information and time information to obtain the corresponding event positioning information, the video clip of the event to be queried is obtained from that positioning information, and video tracing efficiency is thereby improved.

Description

Video tracing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of blockchain technology, and in particular to a video tracing method, apparatus, computer device, storage medium, and computer program product.
Background
Video monitoring is a widely used security measure in places such as data centers, bank branches, and vaults. When an incident occurs at a monitored site, the surveillance video must be traced in order to locate its key content. The current approach to tracing surveillance video is to identify the video content and its time of occurrence manually. However, surveillance videos are long, and manually distinguishing video content and occurrence time cannot quickly locate the required footage.
The existing video tracing method therefore suffers from low retrieval efficiency.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a video tracing method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve retrieval efficiency.
In a first aspect, the present application provides a video tracing method, where the method includes:
receiving a tracing request for a target event, and acquiring target event description information and target time information carried in the tracing request;
querying a blockchain according to the target event description information and the target time information to obtain corresponding target event positioning information, wherein the blockchain comprises a plurality of blocks storing different event positioning information corresponding to different video clips; the event positioning information of a video clip comprises its event description information and time information; the video clip is obtained by classifying a source video by event; the event description information represents a description of the set behavior in the video clip, and the time information represents the time the video clip occupies in the video;
and querying a video database according to the target event positioning information to obtain a video segment corresponding to the target event.
In one embodiment, the method further comprises:
acquiring a source video, classifying the source video by event to obtain a plurality of events to be stored and a plurality of pieces of time information to be stored, and constructing a plurality of blocks in a blockchain according to the plurality of pieces of time information to be stored;
acquiring, from the plurality of events to be stored, target events to be stored whose event contents do not overlap, and generating event description information corresponding to the target events to be stored;
and obtaining event positioning information according to the event description information and the time information to be stored, and storing the event positioning information into the block corresponding to the time information to be stored to obtain the blockchain.
In one embodiment, classifying the source video by event to obtain a plurality of events to be stored and a plurality of pieces of time information to be stored includes:
obtaining a candidate video clip containing a set behavior in the source video according to a classification behavior identifier, obtaining an extended video clip of the event corresponding to the set behavior according to an integrity identifier, and obtaining the video clip of the event to be detected and its corresponding timestamp based on the candidate video clip and the extended video clip;
aligning the video clip of the event to be detected with the timestamp;
acquiring a behavior probability and a completeness probability corresponding to the video clip of the event to be detected according to the aligned video clip and timestamp, where the behavior probability represents whether the aligned video clip of the event to be detected is an event to be stored, and the completeness probability represents whether the event to be stored in the aligned video clip is complete;
and determining the events to be stored and the time information to be stored containing complete events to be stored according to the behavior probability and the completeness probability.
In one embodiment, aligning the video clip of the event to be detected with the timestamp includes:
performing pyramid pooling on the timestamp, inputting the processed timestamp and the video clip of the event to be detected into a preset recurrent neural network, and aligning the video clip of the event to be detected with the timestamp through the preset recurrent neural network.
In one embodiment, constructing the plurality of blocks in the blockchain according to the plurality of pieces of time information to be stored includes:
for each piece of time information to be stored, determining the duration corresponding to that time information according to the timestamps it contains, and generating a block of corresponding size in the blockchain to be constructed;
and obtaining the blockchain according to the plurality of blocks.
In one embodiment, acquiring, from the plurality of events to be stored, target events to be stored whose event contents do not overlap includes:
for each frame of video image in the video clip corresponding to the event to be stored, acquiring an image frame to be detected whose similarity to the other frames in the video clip is smaller than a preset similarity threshold;
inputting the image frames to be detected into a preset probability function to obtain corresponding non-overlapping probability values, and determining the target image frames to be detected whose event contents do not overlap according to the comparison of the non-overlapping probability values with a preset hyper-parameter;
and obtaining the target event to be stored according to the plurality of target image frames to be detected.
In one embodiment, acquiring an image frame to be detected whose similarity to the other frames in the video clip is smaller than a preset similarity threshold includes:
if the frame is the first frame in the video clip, obtaining the image frame to be detected through a long short-term memory algorithm according to the visual features of the frame, the time information the frame occupies in the video clip, and the description information already generated;
and if the frame is not the first frame in the video clip, obtaining the image frame to be detected through the long short-term memory algorithm according to the previous image frame to be detected, the visual features of the frame, the time information the frame occupies in the video clip, and the description information already generated.
In one embodiment, generating the event description information corresponding to the target event to be stored includes:
acquiring a plurality of sub-events contained in the target event to be stored and, for each sub-event, determining the corresponding sub-event description through a long short-term memory algorithm according to the visual features of the video clip corresponding to the sub-event and the sub-event descriptions already generated;
and obtaining the event description information corresponding to the target event to be stored according to the plurality of sub-event descriptions.
In one embodiment, obtaining the event positioning information according to the event description information and the time information to be stored, and storing the event positioning information into the block corresponding to the time information to be stored to obtain the blockchain, includes:
splicing the event description information and the time information to be stored into a coordinate, using the coordinate as the event positioning information, and storing the event positioning information in coordinate form into the block corresponding to the time information to be stored to obtain the blockchain.
In a second aspect, the present application provides a video tracing apparatus, the apparatus including:
a receiving module, a tracing module, and a query module, wherein the receiving module is configured to receive a tracing request for a target event and acquire the target event description information and target time information carried in the tracing request;
the tracing module is configured to query the blockchain according to the target event description information and the target time information to obtain the corresponding target event positioning information, wherein the blockchain comprises a plurality of blocks storing different event positioning information corresponding to different video clips; the event positioning information of a video clip comprises its event description information and time information; the video clip is obtained by classifying a source video by event; the event description information represents the description of the set behavior in the video clip, and the time information represents the time the video clip occupies in the video;
and the query module is configured to query a video database according to the target event positioning information to obtain the video clip corresponding to the target event.
In a third aspect, the present application further provides a computer device comprising a memory storing a computer program and a processor which, when executing the computer program, implements the steps of the method described above.
In a fourth aspect, the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method described above.
In a fifth aspect, the present application further provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method described above.
According to the video tracing method, apparatus, computer device, storage medium, and computer program product, when a tracing request is received, a blockchain is queried according to the target event description information and target time information carried in the request to obtain the corresponding target event positioning information. The blockchain comprises a plurality of blocks storing the different event positioning information of the different video clips obtained by classifying the source video, and the event positioning information of a video clip comprises the event description information describing the set behavior in the clip and the time information of the clip within the video. A video database is then queried according to the target event positioning information to obtain the video clip corresponding to the target event. Compared with traditional manual retrieval, each block in the blockchain is queried using event description information and time information to obtain the corresponding event positioning information, so the video clip of the event to be queried can be obtained from that positioning information, and video tracing efficiency is improved.
Drawings
FIG. 1 is a diagram of an exemplary video tracing application environment;
FIG. 2 is a schematic flow chart diagram of a video tracing method in one embodiment;
FIG. 3 is a flowchart illustrating the event classification step in one embodiment;
FIG. 4 is a flowchart illustrating a video tracing method according to another embodiment;
FIG. 5 is a flowchart illustrating a video tracing method according to another embodiment;
FIG. 6 is a block diagram of a video tracing apparatus according to an embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The video tracing method provided by the embodiments of the application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates over a network with the server 104 in the blockchain. The data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or located on a cloud or other network server. The terminal 102 may receive a tracing request for a target event triggered by a user and query the blockchain based on the target event description information and target time information in the request to obtain the corresponding target event positioning information, and then query a video database based on the event description information and time information in the retrieved positioning information to obtain the video clip corresponding to the target event. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device such as a smart watch, smart bracelet, or head-mounted device. The server 104 in the blockchain may be implemented as an independent server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a video tracing method is provided. Taking its application to the terminal in FIG. 1 as an example, the method includes the following steps:
step S202, a tracing request for the target event is received, and target event description information and target time information carried in the tracing request are obtained.
The target event may be an event whose source the user needs to trace; the event may be recorded by an image acquisition device to form a corresponding video clip. The terminal 102 may receive a user's tracing request for a target event and acquire the target event description information and target time information carried in the request. The target event description information may be the user's description of the target event to be traced, which may take the form of natural language; the target time information may be the occurrence time of the target event in the source video. The terminal 102 may thus trace the target event based on the user's natural-language description and time information, for example by querying for the video clip corresponding to the target event. In addition, in some embodiments, the information input by the user may further include the place where the target event occurred; for example, if the target event is the user withdrawing money at a bank, the place of occurrence may identify a particular branch, and the terminal 102 may store the tracing information of the target event according to that place.
Step S204, a blockchain is queried according to the target event description information and the target time information to obtain the corresponding target event positioning information. The blockchain comprises a plurality of blocks storing different event positioning information corresponding to different video clips; the event positioning information of a video clip comprises its event description information and time information; the video clip is obtained by classifying the source video by event; the event description information represents the description of the set behavior in the video clip, and the time information represents the time the video clip occupies in the video.
The blockchain may be a certificate-storing blockchain comprising a plurality of blocks that store different event positioning information corresponding to different video clips; that is, each block in the chain may store the event positioning information of one video clip, and the size of each block may be determined by the duration of its video clip. The terminal 102 may query the blockchain according to the target event description information and target time information in the user-triggered tracing request and obtain the target event positioning information corresponding to them, from which the terminal 102 can read the event description information and time information. In other words, the user may initiate a tracing request as a natural-language description, and the terminal 102 may query the blockchain based on that description to find the target block and then read the target event positioning information from it. The event positioning information of a video clip includes the clip's event description information and its time information. The video clip may correspond to a particular event and be a part of a source video, the source video being the complete original video, for example the full recording of one image acquisition device. The terminal 102 may obtain a plurality of video clips by performing push-type event classification on the source video, so that each video clip represents one event, and each event carries event description information and time information. The event description information may describe the set behavior in the corresponding video clip; for example, if the clip shows a user withdrawing money, the event description information may describe the series of actions from the moment the user enters the frame until the user leaves it. Because the description is in natural language, the corresponding event positioning information can be queried at tracing time using only the user's natural-language description of the event, which improves video tracing efficiency. The time information may be the time of the corresponding event in the source video, for example its start time and end time.
There may be multiple blockchains, each corresponding to one video shooting location. Taking bank surveillance video as an example, image acquisition devices may be deployed at different branches; the terminal 102 may construct a blockchain for each branch, classify the source video acquired by that branch's devices by event, and store the resulting event positioning information into that branch's blockchain, thereby building multiple blockchains. When a user needs to trace an event, only the blockchain of the branch where the event occurred needs to be queried, so the corresponding event positioning information can be obtained quickly.
Step S206, a video database is queried according to the target event positioning information to obtain the video clip corresponding to the target event.
The target event positioning information may be the positioning information of the video clip containing the event the user needs to trace; it includes the event description information and time information of the target event. The video database stores a plurality of video clips, each representing one event. The terminal 102 may query the video database according to the target event positioning information to obtain the video clip corresponding to the target event, and may then display that clip together with the event description information of the event it contains and the occurrence time of the event within the clip, thereby tracing the video based on the event's description and time.
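For illustration, the following minimal sketch (in Python) shows how steps S202 through S206 fit together. The data structures and the matching rule are assumptions, since the application does not prescribe any concrete representation:

```python
# Illustrative sketch only: EventLocator, Block, and the matching rule
# below are assumptions, not structures specified by the application.
from dataclasses import dataclass
from typing import Optional

@dataclass
class EventLocator:
    description: str  # natural-language event description
    start: float      # event start time in the source video (seconds)
    end: float        # event end time (seconds)

@dataclass
class Block:
    locator: EventLocator  # one block stores one video clip's locator

def trace(chain: list[Block], video_db: dict,
          target_desc: str, target_time: float) -> Optional[str]:
    """Steps S204/S206: find the target locator, then query the video DB."""
    for block in chain:
        loc = block.locator
        if target_desc in loc.description and loc.start <= target_time <= loc.end:
            # The (description, start, end) coordinate keys the video database.
            return video_db.get((loc.description, loc.start, loc.end))
    return None  # no block matches the tracing request
```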
In this video tracing method, when a tracing request is received, a blockchain is queried according to the target event description information and target time information carried in the request to obtain the corresponding target event positioning information. The blockchain comprises a plurality of blocks storing the different event positioning information of the different video clips obtained by classifying the source video, and the event positioning information of a video clip comprises the event description information describing the set behavior in the clip and the time information of the clip within the video. A video database is then queried according to the target event positioning information to obtain the video clip corresponding to the target event. Compared with traditional manual retrieval, each block in the blockchain is queried using event description information and time information to obtain the corresponding event positioning information, so the video clip of the event to be queried can be obtained from that positioning information, and video tracing efficiency is improved.
In one embodiment, the method further includes: acquiring a source video, classifying the source video by event to obtain a plurality of events to be stored and a plurality of pieces of time information to be stored, and constructing a plurality of blocks in a blockchain according to the time information to be stored; acquiring, from the events to be stored, target events to be stored whose event contents do not overlap, and generating the event description information corresponding to those target events; and obtaining event positioning information according to the event description information and the time information to be stored, and storing it into the block corresponding to that time information to obtain the blockchain.
In this embodiment, the source video may be the complete video acquired by the image acquisition device. The terminal 102 may acquire the source video and classify it by event, where the classification may use a push-type, video-stage-based positioning method. Through this classification the terminal 102 obtains a plurality of events to be stored and a plurality of pieces of time information to be stored: an event to be stored may be an event contained in the source video, and its time information to be stored may be the time that event occupies in the source video. Since there are several pieces of time information to be stored, the terminal 102 may construct a plurality of blocks in the blockchain based on them.
After the terminal 102 obtains the events to be stored, it may select from them the target events to be stored whose event contents do not overlap and generate the event description information corresponding to each target event. The event description information may be natural-language information, so the terminal 102 can form the event positioning information of an event from its event description information and its time information to be stored; from the plurality of events and times to be stored, the terminal 102 obtains a plurality of pieces of event positioning information. The terminal 102 may store each piece of event positioning information into the block corresponding to its time to be stored; once all the event positioning information has been stored into the corresponding blocks, the blockchain is obtained. The number of blockchains can be determined by the acquisition locations of the source videos. Taking bank surveillance video as an example, the cameras are generally distributed across different branches, each branch acquiring its own source video, and the terminal 102 may construct a different blockchain for each branch location, obtaining multiple blockchains. When the user needs to trace an event, the event positioning information is queried on the blockchain corresponding to the place where the event occurred. The event classification may be performed by a preset evaluator, and the event description information may be generated by a long short-term memory (LSTM) algorithm.
Through this embodiment, the terminal 102 obtains a plurality of events and their times by classifying the source video, forms event positioning information from each event's description information and time, and stores it in the blockchain, so that at tracing time the corresponding video clip can be queried using only the description of the event, improving video tracing efficiency.
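As a rough illustration of this construction flow, the pipeline can be sketched as follows; `classify_events`, `overlaps_existing`, and `describe` are hypothetical stand-ins for the preset evaluator and the LSTM models described above:

```python
# Hedged sketch of chain construction: classify, deduplicate, describe, store.
def build_chain(source_video, classify_events, overlaps_existing, describe):
    chain = []
    for clip, (start, end) in classify_events(source_video):
        if overlaps_existing(clip, chain):
            continue  # keep only target events whose contents do not overlap
        description = describe(clip)  # natural-language event description
        # Event positioning information = (description, time), one block each.
        chain.append({"description": description, "start": start, "end": end})
    return chain
```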
In one embodiment, classifying the source video by event to obtain a plurality of events to be stored and a plurality of pieces of time information to be stored includes: obtaining a candidate video clip containing a set behavior in the source video according to a classification behavior identifier, obtaining an extended video clip of the event corresponding to the set behavior according to an integrity identifier, and obtaining the video clip of the event to be detected and its corresponding timestamp based on the candidate video clip and the extended video clip; aligning the video clip of the event to be detected with the timestamp; acquiring a behavior probability and a completeness probability corresponding to the video clip of the event to be detected according to the aligned clip and timestamp, where the behavior probability represents whether the aligned clip is an event to be stored and the completeness probability represents whether the event to be stored in the aligned clip is complete; and determining the events to be stored and the time information to be stored containing complete events according to the behavior probability and the completeness probability.
In this embodiment, the terminal 102 may obtain the video clips corresponding to a plurality of events by push-type classification of the source video. The terminal 102 may obtain, through the classification behavior identifier, a candidate video clip containing a set behavior in the source video, and obtain, through the integrity identifier, an extended video clip of the event corresponding to that behavior, so that from the candidate and extended clips the terminal 102 obtains the video clip of the event to be detected and its timestamp in the source video. For example, with bank surveillance video, a user withdrawing money at an ATM is one event: within that event's footage, the withdrawal action itself can serve as the candidate video clip, and the terminal 102 can use the integrity identifier to obtain the extended video clip of the withdrawal event, such as the segments of the user approaching and leaving the ATM. From the candidate and extended clips, the terminal 102 obtains the video clip of the withdrawal event and its timestamp in the source video. Specifically, as shown in FIG. 3, which is a schematic flowchart of the event classification step in one embodiment, during stage positioning of the source video the terminal 102 may identify and classify the video into a candidate region and an extension region through the classification behavior identifier and the integrity identifier and sample sparse segments, using the sparse segments as the video clips of events to be detected.
The terminal 102 may also align the video clip of the event to be detected with its timestamp. For example, in one embodiment, this alignment includes performing pyramid pooling on the timestamp, inputting the processed timestamp and the video clip of the event to be detected into a preset recurrent neural network (RNN), and aligning the clip with the timestamp through that network. Specifically, as shown in FIG. 3, the terminal 102 may perform pyramid pooling based on the timestamps of the events at the above stages and, through RNN training, align the video time structure and generate the text structure.
After alignment, the terminal 102 can obtain the behavior probability and completeness probability corresponding to the video clip of the event to be detected from the aligned clip and timestamp. The behavior probability represents whether the aligned clip is an event to be stored, for example whether it shows a user withdrawing money; the completeness probability represents whether the event to be stored in the aligned clip is complete, for example whether it covers the whole process from the user entering the frame to leaving it. Specifically, as shown in FIG. 3, after alignment the terminal 102 may compute the key frame from the precise time positioning and process the region features through a classifier to obtain the behavior probability and completeness probability. The terminal 102 judges whether the timestamp is a key-frame time from these two probabilities, for example treating it as key-frame time when both are greater than or equal to a preset probability threshold; after determining a key frame, the terminal 102 may retrieve the block height corresponding to the key frame's timestamp to obtain the corresponding block and record the key-frame probability into the blockchain. In this way, the terminal 102 can determine, from the behavior probability and completeness probability, a plurality of events to be stored and the time information to be stored that contains the complete events.
Through this embodiment, the terminal 102 classifies the source video based on the preset identifiers and determines the complete events to be stored and their time information through the behavior probability and completeness probability, so tracing can proceed from complete events and time information, improving video tracing efficiency.
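A minimal sketch of the key-frame decision just described, assuming both probabilities come from a trained classifier; the threshold value is illustrative, not taken from the text:

```python
PROB_THRESHOLD = 0.5  # assumed preset probability threshold (not specified)

def is_key_frame_time(behavior_prob: float, completeness_prob: float) -> bool:
    """A timestamp counts as key-frame time when both probabilities meet the
    threshold: the clip is an event to be stored AND that event is complete."""
    return (behavior_prob >= PROB_THRESHOLD and
            completeness_prob >= PROB_THRESHOLD)
```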
In one embodiment, constructing the plurality of blocks in the blockchain from the plurality of pieces of time information to be stored includes: for each piece of time information to be stored, determining the duration corresponding to that time information from the timestamps it contains, and generating a block of the corresponding size in the blockchain to be constructed; and obtaining the blockchain from the plurality of blocks.
In this embodiment, the terminal 102 may construct a plurality of blocks from the pieces of time information to be stored to form the blockchain. Since there may be several events to be stored, there may be several corresponding pieces of time information to be stored; for each piece, the terminal 102 may determine its duration from the timestamps it contains. For example, the terminal 102 may read the start and end times of the event within the video from the time information, obtaining several timestamps of the event to be stored, and determine the duration from them. Having determined the duration, the terminal 102 can set the size of the block for that event in the blockchain to be constructed based on the duration; since there are multiple events to be stored, the terminal 102 constructs multiple blocks in this way and obtains the blockchain from them. Specifically, the terminal 102 may divide time according to the sampled segments of the source video, form the timestamps in the blockchain from the divided times, construct blocks based on those timestamps, and record block contents by marking block heights with the timestamps. In addition, the terminal 102 may record into the blockchain the probability used to judge whether the image frame corresponding to a timestamp is a key frame: for example, it may judge this from the behavior probability and completeness probability and, if the frame is a key frame, retrieve the block height corresponding to that timestamp to find the block and record the key-frame probability into the chain.
Through this embodiment, the terminal 102 determines the duration of a block from the several timestamps within one piece of time information to be stored, so the block corresponding to that time information can be constructed, improving query efficiency during video tracing.
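The sizing rule can be sketched as follows; since the application only states that block size corresponds to event duration, the scale factor mapping duration to size is an assumption:

```python
# Hypothetical sketch: derive a block's size from its event's timestamps.
SIZE_PER_SECOND = 1024  # illustrative bytes-per-second factor, not from the text

def make_block(timestamps: list[float]) -> dict:
    start, end = min(timestamps), max(timestamps)
    duration = end - start  # duration of the time information to be stored
    return {"start": start, "end": end, "size": int(duration * SIZE_PER_SECOND)}
```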
In one embodiment, obtaining target events to be stored whose event contents do not overlap from the plurality of events to be stored includes: for each frame of video image in the video clip corresponding to the event to be stored, acquiring an image frame to be detected whose similarity to the other frames in the clip is smaller than a preset similarity threshold; inputting the image frames to be detected into a preset probability function to obtain the corresponding non-overlapping probability values, and determining the target image frames to be detected whose event contents do not overlap according to the comparison of the non-overlapping probability values with a preset hyper-parameter; and obtaining the target event to be stored from the plurality of target image frames to be detected.
In this embodiment, when the terminal 102 performs push-type classification of the events in the source video, some pictures may overlap, so the terminal 102 needs to deduplicate the events to be stored to obtain target events whose contents do not overlap. The video clip of an event to be stored may contain many frames. For each frame of video image in the clip, the terminal 102 may obtain the image frames to be detected whose similarity to the other frames in the clip is below a preset similarity threshold, input each image frame to be detected into a preset probability function to obtain the non-overlapping probability value representing that the frame is non-overlapping, and compare that value with a preset hyper-parameter to decide whether the frame is a target image frame to be detected with non-overlapping content. Since the clip contains multiple frames, the terminal 102 makes this judgment for each of them and then obtains the target event to be stored from the resulting target image frames.
The image frames to be detected whose similarity to the other frames in the video clip is below the preset similarity threshold may be obtained with an LSTM. For example, in one embodiment, acquiring such an image frame includes: if the frame is the first frame in the video clip, obtaining the image frame to be detected through a long short-term memory algorithm from the visual features of the frame, the time information the frame occupies in the clip, and the description information already generated; and if the frame is not the first frame, obtaining the image frame to be detected through the long short-term memory algorithm from the previous image frame to be detected, the visual features of the frame, the time information the frame occupies in the clip, and the description information already generated.
In this embodiment, when the terminal 102 extracts image frames to be detected from the video clip, it uses a different computation depending on whether the frame is the first one. If the frame is the first frame of the clip, the terminal 102 obtains the image frame to be detected through the LSTM algorithm from the frame's visual features, the time it occupies in the clip, and the description information already generated. If the frame is not the first frame, the terminal 102 additionally takes the previous image frame to be detected as input.
Specifically, the terminal 102 may generate the description of each frame of video image word by word; that is, the terminal 102 selects an event, forms text, and generates the next word from the video time structure and the word sequence already generated, thereby keeping the generated sentences consistent. To avoid heavy redundancy in the generated description, the terminal 102 may use the LSTM to pick segments that contain independent events and do not overlap previous events. The image frame to be detected obtained by the terminal 102 through the LSTM may be recorded as a vector:

h_t = LSTM(h_{t-1}, [v_t, r_t, c_{kt}])

where h_{t-1} is the previous image frame to be detected (if h_t is the first frame, the input may omit h_{t-1}); v_t is the visual feature extracted from the video frame picture by the video-stage positioning method; r_t is a distance feature, similar to an image mask, represented as a binary mask over the normalized time span of the segment relative to the entire duration; c_{kt} is a feature of the sentences already output, namely the last hidden state of the network that generated the caption of the previous sentence, so that the terminal 102 can take the words already produced into account when selecting an event; and t is the current time. After the terminal 102 obtains the image frame to be detected h_t, the non-overlapping probability value may be obtained by:

p_t = sigmoid(W_p h_t + b_p)

where p_t is the probability that relevant and unique information exists in the segment, that is, the non-overlapping probability value, and W_p and b_p are learned parameters of the scoring head. A hyper-parameter tau serves as the overall threshold: when the non-overlapping probability value is greater than tau, the terminal 102 determines that the segment needs a description to be generated, that is, that it is a target event to be stored.
Through this embodiment, the terminal 102 can determine the non-overlapping target events to be stored from the video images of events to be detected by means of the LSTM, improving query efficiency during video tracing.
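Numerically, the selection rule reduces to thresholding a score of the LSTM state. The sketch below assumes a linear-plus-sigmoid scoring head (the exact form is not given in the text) with placeholder weights:

```python
import math

def non_overlap_probability(h_t: list[float], w: list[float], b: float) -> float:
    """p_t = sigmoid(w . h_t + b): probability the segment holds unique content."""
    score = sum(wi * hi for wi, hi in zip(w, h_t)) + b
    return 1.0 / (1.0 + math.exp(-score))

def needs_description(h_t: list[float], w: list[float], b: float,
                      tau: float = 0.5) -> bool:
    # tau plays the role of the hyper-parameter threshold described above.
    return non_overlap_probability(h_t, w, b) > tau
```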
In one embodiment, generating the event description information corresponding to the target event to be stored includes: acquiring a plurality of sub-events contained in the target event and, for each sub-event, determining its sub-event description through a long short-term memory algorithm from the visual features of the video clip corresponding to the sub-event and the sub-event descriptions already generated; and obtaining the event description information of the target event from the plurality of sub-event descriptions.
In this embodiment, the terminal 102 may generate an event description for each non-overlapping target event to be stored. The target event to be stored may contain a plurality of sub-events; for example, each action of the user may constitute one sub-event. When generating the event description, the terminal 102 may obtain the sub-events contained in the target event and, for each sub-event, determine the corresponding sub-event description through the LSTM algorithm from the visual features of the video clip corresponding to that sub-event and the sub-event descriptions already generated. Since there are several sub-events, the terminal 102 obtains the event description information of the target event to be stored from their descriptions. Specifically, after the terminal 102 has obtained a segment with an independent event through the LSTM, a single-sentence description of the segment's event can be obtained from the LSTM, and the terminal 102 can splice all the sentences output by the LSTM into one long paragraph representing the description of the event content in the whole long video. The calculation may take the form:

h_l^k = LSTM(h_{l-1}^k, [f_k, w_{l-1}^k])

where h_l^k denotes the l-th step in describing the k-th event, with k ranging over the sub-events; f_k is the visual feature of the event's region; and w_{l-1}^k is the word generated in the previous step.
Through this embodiment, the terminal 102 generates the event description information of the target event to be stored through the LSTM algorithm, producing the description of the whole target event by splicing single-sentence descriptions, so video tracing can be performed from this description information, improving the query efficiency of video tracing.
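The splicing step can be sketched as below, where `caption_model` is a hypothetical callable standing in for the sentence-level LSTM; each sentence is conditioned on the sub-event's visual features and on the sentences generated so far:

```python
# Hedged sketch: one sentence per sub-event, joined into the event description.
def describe_event(sub_events: list[dict], caption_model) -> str:
    sentences: list[str] = []
    for sub_event in sub_events:
        # Conditioning on prior sentences mirrors the LSTM recurrence above.
        sentences.append(caption_model(sub_event["visual_features"], sentences))
    return " ".join(sentences)  # event description information for the event
```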
In one embodiment, obtaining the event positioning information from the event description information and the time information to be stored, and storing it into the block corresponding to that time information to obtain the blockchain, includes: splicing the event description information and the time information to be stored into a coordinate, using the coordinate as the event positioning information, and storing the event positioning information in coordinate form into the block corresponding to the time information to be stored to obtain the blockchain.
In this embodiment, after the terminal 102 obtains the event description information and the time information to be stored, it may splice them into a coordinate serving as the event positioning information and store that positioning information, in coordinate form, into the block corresponding to the time to be stored. Since there may be several events to be stored, the terminal 102 may generate several pieces of event positioning information and store each into the block corresponding to its time information, obtaining the blockchain. Specifically, the terminal 102 may combine the generated text description record w with the timestamp t and related information to form a coordinate L(w, t) recorded by the blockchain; the terminal 102 may further analyze the text and store the text information and timestamp information as identity authentication information, so that when the user encounters the same situation again, tracing can be done through the blockchain.
Through this embodiment, the terminal 102 stores the event to be stored and its time information in coordinate form, so that during video tracing the corresponding event positioning information can be queried through the event's description information, improving the query efficiency of tracing.
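A minimal sketch of recording the coordinate L(w, t) follows, with a plain list standing in for the ledger; a real deployment would append to an actual blockchain node:

```python
# Illustrative only: the chain is modeled as a list of blocks.
def store_locator(chain: list[dict], description: str,
                  start: float, end: float) -> tuple:
    coordinate = (description, start, end)  # the coordinate L(w, t)
    chain.append({"locator": coordinate})   # block for this time span
    return coordinate

# Usage: a later tracing request carrying the same description and a time
# inside [start, end] finds this block and, through it, the stored video clip.
```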
In one embodiment, as shown in FIG. 4, which is a flowchart of a video tracing method in another embodiment, take the application of this approach to bank surveillance video as an example. Bank surveillance video is unstructured data whose image content and features are difficult to describe structurally. Surveillance cameras are deployed at the bank's various locations such as branches, data centers, and vaults, and the surveillance video runs in real time. The terminal 102 can convert the video content: based on blockchain timestamps, it combines the LSTM and the RNN to convert the video content into text, combines the converted text with the timestamps to form coordinates, and stores the coordinates on the blockchain to enable tracing. Concretely, this comprises a push-type event classification procedure, including a push-type word-sequence generation method fusing blockchain timestamps based on video stages, and a blockchain-oriented work-safety traceability system. In the actual construction of the blockchain, the terminal 102 may obtain each surveillance video in the bank through the cameras, interact with the blockchain after push-type video-stage-based event positioning to obtain times and key-frame positions, select the independent events in the video based on the LSTM, and then splice the text carrying the independent event information.
Specifically, for each surveillance video in the bank, the terminal 102 may perform push-type video-stage-based event positioning, align the video time structure through the RNN and generate the text structure, obtain the timestamps of events by selecting key frames, and record the key-frame timestamps into the blockchain. The terminal 102 can also select the non-overlapping video clips corresponding to independent events through the LSTM and obtain single-sentence descriptions of the clips through the LSTM, so the terminal 102 can combine the generated text descriptions and timestamp information into coordinates and record them into the blockchain; the user can then trace an event by entering its description information.
In addition, in an embodiment, an application example is provided. As shown in FIG. 5, which is a flowchart of a video tracing method in another embodiment, take a user withdrawing money at a branch as an example. The interaction between the terminal 102 and the user proceeds as follows. The user uses the branch's withdrawal service; from the moment the user enters the video frame, the terminal 102 positions and records the action information, converts it into text using the push-type word-sequence generation method fusing blockchain timestamps based on video stages, matches it to the corresponding times, and traces the content of the withdrawal process on the blockchain. Because each window or ATM allows only one person to operate at a time, the operation records can be matched to the user's account. The specific steps are: the user withdraws money through a bank account; the blockchain records the withdrawal time, place, and operation; one block of the branch's node records this process; the branch's terminal 102 compares the operation content by time within the day and displays the operation content for that time to the customer; and the user can trace the video recorded at the branch through this video tracing method. The terminal 102 may also maintain the bank data center and record the maintenance process. For example, for daily maintenance and cleaning of the data center, and for hardware replacement and software operations, from the moment a worker enters the video frame until the worker leaves it, the terminal 102 can convert the footage into text through this video tracing method, store the key information by timestamp, compare the text against the daily operating requirements, and perform traceability verification based on the blockchain timestamps to ensure safety. The terminal 102 may likewise perform video storage and tracing for bank vaults and similar sites. For example, it may connect monitored areas such as the vault proper, the vault guard room, bill and coin processing areas, transfer areas, and main corridors to the system and the alarm-center control system; the terminal 102 may store, through blockchain nodes, the text formed each day by the push-type video-stage-based method, ensuring secure storage of the text for the different time periods, while retrieving traceability content through the coordinates combined with the text storage times.
Through the above embodiments, the terminal 102 queries each block in the block chain using the event description information and the time information to obtain the event positioning information corresponding to the event description information, so that a video segment of the event to be queried can be obtained based on the event positioning information, improving video tracing efficiency.
It should be understood that, although the steps in the flowcharts of the above embodiments are displayed in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated otherwise herein, the order of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and which are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides a video tracing apparatus for implementing the above video tracing method. The solution provided by the apparatus is similar to that described in the method above; therefore, for the specific limitations in one or more embodiments of the video tracing apparatus provided below, reference may be made to the limitations of the video tracing method above, and details are not repeated here.
In one embodiment, as shown in fig. 6, there is provided a video tracing apparatus, including: a receiving module 500, a tracing module 502 and a query module 504, wherein:
the receiving module 500 is configured to receive a tracing request for a target event, and obtain target event description information and target time information carried in the tracing request.
The tracing module 502 is configured to query the block chain according to the target event description information and the target time information to obtain corresponding target event positioning information. The block chain includes a plurality of blocks for storing different event positioning information corresponding to different video segments; the event positioning information corresponding to a video segment includes the event description information and time information of the video segment; the video segments are obtained by classifying a source video by event, the event description information represents a description of the set behavior in the video segment, and the time information represents the time of the video segment within the video.
The query module 504 is configured to query the video database according to the target event positioning information to obtain a video segment corresponding to the target event.
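As an illustration of how the three modules cooperate, the sketch below wires a receiving step, a tracing step, and a query step together, reusing the EventChain sketch given earlier; VideoDatabase and handle_trace_request are hypothetical names introduced here for exposition, not the apparatus itself.

class VideoDatabase:
    """Illustrative stand-in for the video database keyed by coordinates."""
    def __init__(self, clips):
        self.clips = clips  # maps (caption, timestamp) tuples to stored segments

    def fetch(self, coordinate):
        return self.clips.get(tuple(coordinate))

def handle_trace_request(chain, video_db, request):
    # receiving module 500: unpack description and time from the request
    description, timestamp = request["description"], request["time"]
    # tracing module 502: query the chain for matching event coordinates
    coordinates = chain.trace(description, timestamp)
    # query module 504: resolve each coordinate to its stored video segment
    return [video_db.fetch(c) for c in coordinates]

db = VideoDatabase({("customer withdraws cash at counter 3", 1645660800.0): "clip_0091.mp4"})
print(handle_trace_request(chain, db, {"description": "withdraws cash", "time": 1645660801.0}))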
In one embodiment, the above apparatus further includes a building module configured to obtain a source video, classify the source video by event to obtain a plurality of events to be stored and a plurality of pieces of time information to be stored, and construct a plurality of blocks in a block chain according to the plurality of pieces of time information to be stored; obtain, from the plurality of events to be stored, target events to be stored whose event contents do not overlap, and generate event description information corresponding to the target events to be stored; and obtain event positioning information according to the event description information and the time information to be stored, and store the event positioning information into the blocks corresponding to the time information to be stored to obtain the block chain.
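A minimal sketch of the building module's control flow might look as follows; classify_events, overlaps, and describe are hypothetical stand-ins for the event classification, overlap detection, and caption generation described above, and the EventChain type from the earlier sketch is assumed.

def build_chain(source_video, classify_events, overlaps, describe, chain):
    # classify the source video into (segment, time_info) pairs
    for segment, time_info in classify_events(source_video):
        # keep only target events whose content does not overlap
        if overlaps(segment):
            continue
        # caption the event and store its coordinate in the corresponding block
        chain.append_event(describe(segment), time_info)
    return chain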
In one embodiment, the building module is specifically configured to obtain candidate video segments containing a set behavior in the source video according to a classification behavior identifier, obtain extended video segments of the event corresponding to the set behavior according to an integrity identifier, and obtain the to-be-detected event video segments and their corresponding timestamps based on the candidate video segments and the extended video segments; align the to-be-detected event video segments with the timestamps; obtain, from the aligned to-be-detected event video segments and timestamps, a behavior probability and a completeness probability corresponding to each to-be-detected event video segment, where the behavior probability represents whether the aligned to-be-detected event video segment is an event to be stored and the completeness probability represents whether the event to be stored in the aligned to-be-detected event video segment is complete; and determine, according to the behavior probability and the completeness probability, the events to be stored and the time information to be stored containing the complete events to be stored.
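The behavior and completeness probabilities can be pictured as two small classification heads over the aligned segment feature. The sketch below uses PyTorch for illustration; the 512-dimensional feature, the 0.5 thresholds, and all names are assumptions, not the patent's model.

import torch
import torch.nn as nn

class ActionCompletenessHead(nn.Module):
    """Scores an aligned segment feature twice: behavior probability
    (is this an event to be stored?) and completeness probability
    (is the stored event captured in full?)."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.behavior = nn.Linear(feat_dim, 1)
        self.completeness = nn.Linear(feat_dim, 1)

    def forward(self, seg_feat):
        return (torch.sigmoid(self.behavior(seg_feat)),
                torch.sigmoid(self.completeness(seg_feat)))

head = ActionCompletenessHead()
feat = torch.randn(1, 512)          # one aligned segment+timestamp feature
p_behavior, p_complete = head(feat)
keep = (p_behavior > 0.5) & (p_complete > 0.5)  # store only complete events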
In one embodiment, the building module is specifically configured to perform pyramid pooling on the timestamps, input the processed timestamps and the to-be-detected event video segments into a preset recurrent neural network, and align the to-be-detected event video segments with the timestamps through the preset recurrent neural network.
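A rough PyTorch sketch of this alignment step follows: the timestamp feature sequence is pooled at several pyramid scales, summarized, and fused with the segment features through a recurrent network. The scales, dimensions, and the choice of a GRU are illustrative assumptions standing in for the preset recurrent neural network.

import torch
import torch.nn as nn

class TimestampPyramidAligner(nn.Module):
    """Pools the timestamp feature sequence at pyramid scales, then fuses
    the pooled summary with the segment features through a recurrent net."""
    def __init__(self, feat_dim=256, scales=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool1d(s) for s in scales])
        self.rnn = nn.GRU(feat_dim * 2, feat_dim, batch_first=True)

    def forward(self, ts_feats, seg_feats):
        # ts_feats, seg_feats: (batch, seq_len, feat_dim)
        x = ts_feats.transpose(1, 2)                      # (B, C, T) for pooling
        pooled = torch.cat([p(x) for p in self.pools], dim=2).transpose(1, 2)
        summary = pooled.mean(dim=1, keepdim=True).expand_as(seg_feats)
        aligned, _ = self.rnn(torch.cat([seg_feats, summary], dim=-1))
        return aligned                                    # timestamp-aligned features

aligner = TimestampPyramidAligner()
out = aligner(torch.randn(1, 32, 256), torch.randn(1, 32, 256))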
In one embodiment, the building module is specifically configured to determine, for each piece of time information to be stored, the duration corresponding to that time information according to the timestamps it contains, and to generate, in the block chain being constructed, a block whose size corresponds to that duration; the block chain is then obtained from the plurality of blocks.
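A toy illustration of duration-proportional block sizing; the bytes-per-second rate is an invented constant for exposition, not a value given in the patent.

def block_capacity(time_info, bytes_per_second=1024):
    # size the block in proportion to the duration spanned by its timestamps
    duration = max(time_info) - min(time_info)
    return max(1, int(duration * bytes_per_second))

print(block_capacity([0.0, 90.0]))   # a 90-second event gets a larger block
print(block_capacity([5.0, 15.0]))   # than a 10-second one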
In one embodiment, the building module is specifically configured to acquire, for each frame of video image in the video segment corresponding to an event to be stored, an image frame to be detected whose similarity to the other frames of video images in the video segment is smaller than a preset similarity threshold; input the image frames to be detected into a preset probability function to obtain corresponding non-overlap probability values, and determine the target image frames to be detected, whose event contents do not overlap, according to the comparison of the non-overlap probability values with a preset hyperparameter; and obtain the target events to be stored according to the plurality of target image frames to be detected.
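The non-overlap filtering step might be sketched as follows; prob_fn stands in for the preset probability function and the 0.7 threshold for the preset hyperparameter, both illustrative assumptions.

import torch

def select_non_overlapping(frame_feats, prob_fn, threshold=0.7):
    # score each candidate frame and keep those above the preset hyperparameter
    p_non_overlap = prob_fn(frame_feats)          # shape: (num_frames,)
    return frame_feats[p_non_overlap > threshold]

toy_prob = lambda f: torch.sigmoid(f.mean(dim=1))  # demonstration only
kept = select_non_overlapping(torch.randn(10, 256), toy_prob)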
In one embodiment, the building module is specifically configured to: if a frame of video image is the first frame in the video segment, obtain the image frame to be detected through a long short-term memory algorithm according to the visual features of that frame, the time information the frame occupies in the video segment, and the description information generated so far; and if the frame of video image is not the first frame in the video segment, obtain the image frame to be detected through the long short-term memory algorithm according to the previous image frame to be detected corresponding to that frame, the visual features of the frame, the time information the frame occupies in the video segment, and the description information generated so far.
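This recursion can be pictured with an LSTM cell whose hidden state carries the previous detection forward, as in the PyTorch sketch below; for the first frame the state is zero, so its score depends only on the frame's own features. All dimensions and names are assumptions.

import torch
import torch.nn as nn

class FrameDetector(nn.Module):
    """Recurrently scores candidate frames: the first frame is scored from
    its own visual, temporal, and text features (zero initial state); each
    later frame also conditions on the previous step through the LSTM state."""
    def __init__(self, dim=256):
        super().__init__()
        self.cell = nn.LSTMCell(dim * 3, dim)  # visual + temporal + text inputs

    def forward(self, visual, temporal, text):
        # each input: (num_frames, dim)
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros(1, self.cell.hidden_size)
        states = []
        for v, t, d in zip(visual, temporal, text):
            x = torch.cat([v, t, d]).unsqueeze(0)
            h, c = self.cell(x, (h, c))
            states.append(h)
        return torch.cat(states)               # per-frame detection features

detector = FrameDetector()
feats = detector(torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256))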
In one embodiment, the building module is specifically configured to obtain the plurality of sub-events contained in a target event to be stored and, for each sub-event, determine the sub-event description corresponding to that sub-event through a long short-term memory algorithm according to the visual features of the video segment corresponding to the sub-event and the sub-event descriptions generated so far; the event description information corresponding to the target event to be stored is then obtained from the plurality of sub-event descriptions.
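A minimal sketch of the sub-event captioning loop, in which each sub-event's description is generated conditioned on the descriptions produced so far; captioner stands in for the long short-term memory generator, and joining the sentences with spaces is an assumption.

def describe_event(sub_event_features, captioner):
    # caption each sub-event conditioned on the descriptions generated so far
    descriptions = []
    for feat in sub_event_features:
        descriptions.append(captioner(feat, previous=descriptions))
    return " ".join(descriptions)

toy_captioner = lambda feat, previous: f"sub-event {len(previous) + 1} occurs"
print(describe_event([None, None, None], toy_captioner))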
In one embodiment, the building module is specifically configured to splice the event description information and the time information to be stored into corresponding coordinates, use the coordinates as the event positioning information, and store the event positioning information in coordinate form into the block corresponding to the time information to be stored, thereby obtaining the block chain.
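The coordinate splicing itself can be as simple as serializing the description together with its timestamps, as in the sketch below; the JSON encoding and field names are illustrative assumptions, not the patent's format.

import json

def make_coordinate(description, time_info):
    # splice the event description and its timestamps into one coordinate record
    return json.dumps({"text": description, "time": time_info})

coord = make_coordinate("worker replaces a disk in rack 12",
                        [1645660800.0, 1645660932.5])
# the coordinate is then written into the block built for this time span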
All or part of the modules in the above video tracing apparatus may be implemented by software, by hardware, or by a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided; the computer device may be a terminal, and its internal structure may be as shown in fig. 7. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be implemented through Wi-Fi, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a video tracing method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, a key, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will appreciate that the structure shown in fig. 7 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program, and the processor implementing the above video tracing method when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, the computer program implementing the above video tracing method when executed by a processor.
In one embodiment, a computer program product is provided, including a computer program, the computer program implementing the above video tracing method when executed by a processor.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
Those of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to memory, database, or other medium used in the embodiments provided in the present application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase-change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided in the present application may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, block-chain-based distributed databases and the like. The processors referred to in the embodiments provided in the present application may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that, for a person of ordinary skill in the art, several variations and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (13)

1. A video tracing method, characterized in that the method comprises:
receiving a tracing request for a target event, and acquiring target event description information and target time information carried in the tracing request;
inquiring a block chain according to the target event description information and the target time information to obtain corresponding target event positioning information; the block chain comprises a plurality of blocks used for storing different event positioning information corresponding to different video clips; the event positioning information corresponding to the video clip comprises event description information and time information of the video clip; the video clip is obtained by classifying a source video based on events, the event description information represents the description of the set behavior in the video clip, and the time information represents the time of the video clip in the video;
and querying a video database according to the target event positioning information to obtain a video clip corresponding to the target event.
2. The method of claim 1, further comprising:
acquiring a source video, performing event classification on the source video to obtain a plurality of events to be stored and a plurality of time information to be stored, and constructing a plurality of blocks in a block chain according to the plurality of time information to be stored;
acquiring target events to be stored, the event contents of which are not overlapped, from the plurality of events to be stored, and generating event description information corresponding to the target events to be stored;
and obtaining event positioning information according to the event description information and the time information to be stored, and storing the event positioning information into a block corresponding to the time information to be stored to obtain the block chain.
3. The method of claim 2, wherein the event classifying the source video to obtain a plurality of events to be stored and a plurality of time information to be stored comprises:
obtaining a candidate video clip containing a set behavior in the source video according to a classification behavior identifier, obtaining an extended video clip of an event corresponding to the set behavior according to an integrity identifier, and obtaining a to-be-detected event video clip and its corresponding timestamp based on the candidate video clip and the extended video clip;
aligning the to-be-detected event video clip with the timestamp;
acquiring a behavior probability and a completeness probability corresponding to the to-be-detected event video clip according to the aligned to-be-detected event video clip and the timestamp; wherein the behavior probability represents whether the aligned to-be-detected event video clip is an event to be stored, and the completeness probability represents whether the event to be stored in the aligned to-be-detected event video clip is complete;
and determining, according to the behavior probability and the completeness probability, the event to be stored and the time information to be stored containing the complete event to be stored.
4. The method of claim 3, wherein the aligning the to-be-detected event video clip and the timestamp comprises:
performing pyramid pooling on the timestamp, inputting the processed timestamp and the to-be-detected event video clip into a preset recurrent neural network, and aligning the to-be-detected event video clip with the timestamp through the preset recurrent neural network.
5. The method of claim 3, wherein the constructing the plurality of blocks in the block chain according to the plurality of time information to be stored comprises:
for each piece of time information to be stored, determining the duration corresponding to the time information to be stored according to a timestamp contained in the time information to be stored, and generating, in a block chain to be constructed, a block with a size corresponding to the duration;
and obtaining the block chain according to the plurality of blocks.
6. The method according to claim 2, wherein the obtaining target events to be stored, whose event contents do not overlap, from the plurality of events to be stored comprises:
for each frame of video image in the video clip corresponding to the event to be stored, acquiring an image frame to be detected whose similarity with the other frames of video images in the video clip is smaller than a preset similarity threshold;
inputting the image frames to be detected into a preset probability function to obtain corresponding non-overlapping probability values, and determining corresponding target image frames to be detected with non-overlapping event contents according to the comparison result of the non-overlapping probability values and preset hyper-parameters;
and obtaining a target event to be stored according to the plurality of target image frames to be detected.
7. The method according to claim 6, wherein the acquiring an image frame to be detected whose similarity with the other frames of video images in the video clip is smaller than the preset similarity threshold comprises:
if the frame of video image is the first frame in the video clip, obtaining the image frame to be detected through a long short-term memory algorithm according to the visual features of the frame of video image, the time information the frame of video image occupies in the video clip, and the generated description information;
and if the frame of video image is not the first frame in the video clip, obtaining the image frame to be detected through the long short-term memory algorithm according to the previous image frame to be detected corresponding to the frame of video image, the visual features of the frame of video image, the time information the frame of video image occupies in the video clip, and the generated description information.
8. The method according to claim 2, wherein the generating event description information corresponding to the target event to be stored includes:
acquiring a plurality of sub-events contained in the target event to be stored, and, for each sub-event, determining the sub-event description corresponding to the sub-event through a long short-term memory algorithm according to the visual features of the video clip corresponding to the sub-event and the generated sub-event descriptions;
and obtaining event description information corresponding to the target event to be stored according to the sub-event descriptions.
9. The method according to claim 2, wherein the obtaining event location information according to the event description information and the time information to be stored, and storing the event location information into a block corresponding to the time information to be stored to obtain the block chain comprises:
and splicing the event description information and the time information to be stored to obtain corresponding coordinates, using the coordinates as event positioning information, and storing the event positioning information into a block corresponding to the time information to be stored in a coordinate form to obtain the block chain.
10. A video tracing apparatus, said apparatus comprising:
a receiving module, configured to receive a tracing request for a target event and acquire target event description information and target time information carried in the tracing request;
a tracing module, configured to query a block chain according to the target event description information and the target time information to obtain corresponding target event positioning information; wherein the block chain comprises a plurality of blocks for storing different event positioning information corresponding to different video clips; the event positioning information corresponding to a video clip comprises event description information and time information of the video clip; the video clip is obtained by classifying a source video based on events, the event description information represents a description of the set behavior in the video clip, and the time information represents the time of the video clip in the video;
and a query module, configured to query a video database according to the target event positioning information to obtain a video clip corresponding to the target event.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 9 when executing the computer program.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 9.
13. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 9 when executed by a processor.
CN202210175888.7A 2022-02-24 2022-02-24 Video tracing method and device, computer equipment and storage medium Pending CN114554302A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210175888.7A CN114554302A (en) 2022-02-24 2022-02-24 Video tracing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114554302A true CN114554302A (en) 2022-05-27

Family

ID=81679120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210175888.7A Pending CN114554302A (en) 2022-02-24 2022-02-24 Video tracing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114554302A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170353593A1 (en) * 2015-05-22 2017-12-07 Tencent Technology (Shenzhen) Company Limited Communication event processing method and apparatus
CN108537916A (en) * 2018-04-17 2018-09-14 深圳市元征科技股份有限公司 A kind of driving recording information processing method and device based on block chain


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination