CN117911914A - Distributed intelligent video analysis system and method based on message queue - Google Patents

Distributed intelligent video analysis system and method based on message queue

Info

Publication number
CN117911914A
Authority
CN
China
Prior art keywords: data, video frame, information, video stream, target video
Prior art date
Legal status
Pending
Application number
CN202311618591.4A
Other languages
Chinese (zh)
Inventor
贺江
兰雨晴
余丹
李易君
彭建强
Current Assignee
China Standard Intelligent Security Technology Co Ltd
Original Assignee
China Standard Intelligent Security Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Standard Intelligent Security Technology Co Ltd filed Critical China Standard Intelligent Security Technology Co Ltd
Priority: CN202311618591.4A
Publication: CN117911914A
Legal status: Pending


Abstract

The invention provides a distributed intelligent video analysis system and method based on a message queue. The system comprises a pull-stream frame extraction module, a message queue agent module and an algorithm analysis module. The pull-stream frame extraction module is used for reading a video stream to be processed, extracting a target video frame matched with configuration information from the video stream according to the configuration information of the video stream, and pushing the target video frame to the message queue agent module; the message queue agent module is used for encapsulating the received target video frame into message data and writing the encapsulated message data into a message queue; the algorithm analysis module is used for reading the message data from the message queue, extracting the target video frame from the message data, generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data. The technical scheme provided by the invention can improve the efficiency of video analysis.

Description

Distributed intelligent video analysis system and method based on message queue
Technical Field
The invention relates to the technical field of data processing, in particular to a distributed intelligent video analysis system and method based on a message queue.
Background
At present, when analyzing video content, stream pulling, frame extraction and algorithm analysis for multiple video channels are usually performed on a single machine. However, this approach demands high machine performance, and once the machine fails, all video analysis is interrupted.
If the video analysis tasks are instead distributed across several machines for processing, the analysis progress on those machines cannot be kept synchronized. A more efficient video analysis method is therefore needed.
Disclosure of Invention
The invention provides a distributed intelligent video analysis system and a method based on a message queue, which can improve the efficiency of video analysis.
In view of this, the present invention provides, in one aspect, a distributed intelligent video analysis system based on a message queue, the system including a pull stream frame extraction module, an algorithm analysis module, and a message queue agent module, wherein:
The pull stream frame extraction module is used for reading a video stream to be processed, extracting a target video frame matched with the configuration information from the video stream according to the configuration information of the video stream, and pushing the target video frame to the message queue agent module;
the message queue agent module is used for encapsulating the received target video frame into message data and writing the encapsulated message data into a message queue;
the algorithm analysis module is used for reading message data from the message queue and extracting a target video frame from the message data; and generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data.
In one embodiment, the pull-stream frame extraction module is specifically configured to read title information and comment data from the configuration information of the video stream, and generate semantic keywords of the video stream according to the title information and the comment data; and identify foreground semantic information of each video frame in the video stream, and take the video frame whose foreground semantic information matches the semantic keywords as the extracted target video frame.
In one embodiment, the pull-stream frame extraction module is further specifically configured to splice the title information and the comment data into corpus data; for any sentence in the corpus data, split the sentence into a plurality of phrases and generate word expression features of each phrase based on a BERT model; and input each word expression feature into a fully connected layer so as to output the semantic keywords of the sentence through the fully connected layer.
In one embodiment, the algorithm analysis module is specifically configured to identify a target object in the target video frame, and crop a region image of the target object from the target video frame; the region image is input into a quality evaluation model to output quality data of the region image through the quality evaluation model, and the quality data of the region image is determined as the quality data of the target video frame.
Another aspect of the present invention provides a distributed intelligent video analysis method based on a message queue, the method comprising:
Reading a video stream to be processed, and extracting a target video frame matched with configuration information from the video stream according to the configuration information of the video stream;
encapsulating the target video frame into message data, and writing the encapsulated message data into a message queue;
Reading message data from the message queue, and extracting a target video frame from the message data;
And generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data.
In one embodiment, extracting, from the video stream, a target video frame matching the configuration information according to the configuration information of the video stream includes:
Reading title information and comment data from the configuration information of the video stream, and generating semantic keywords of the video stream according to the title information and the comment data;
And identifying foreground semantic information of each video frame in the video stream, and taking the video frame with the foreground semantic information matched with the semantic keyword as the extracted target video frame.
In one embodiment, generating the semantic keywords of the video stream includes:
Splicing the title information and the comment data into corpus data, splitting any sentence in the corpus data into a plurality of phrases, and generating word expression features of each phrase based on a BERT model;
And inputting each word expression characteristic into a full connection layer so as to output semantic keywords of the sentence through the full connection layer.
In one embodiment, generating quality data for the target video frame includes:
identifying a target object in the target video frame, and cutting out a region image of the target object from the target video frame;
the region image is input into a quality evaluation model to output quality data of the region image through the quality evaluation model, and the quality data of the region image is determined as the quality data of the target video frame.
In one embodiment, if the title information and comment data cannot be read from the configuration information of the video stream, the step of generating the semantic keyword of the video stream includes:
Step one: when the title information and comment data cannot be read, voice extraction is carried out on the video stream, and the speech in the video stream is transcribed into text information; let the text information be W, let the total number of sentences in the text W be M, and let N_m be the total number of words in the m-th sentence; the total number of words in the text W is then:

N_W = Σ_{m=1}^{M} N_m

wherein N_W is the total number of words in the text W, and m is the sentence index in the text W, an integer greater than or equal to 1 and less than or equal to M;
Step two: let the text information W contain P non-repeated words in total, and let F_p be the number of occurrences of the p-th non-repeated word in the text information W, where p is the index of the non-repeated word, an integer greater than or equal to 1 and less than or equal to P; the word frequency of the p-th non-repeated word is then:

TP_p = F_p / N_W

wherein TP_p is the word frequency of the p-th non-repeated word;
Step three: according to the result of step two, calculate the importance of the p-th non-repeated word by:

K_p = TP_p × log(M / M_p)

wherein K_p is the importance of the p-th non-repeated word, and M_p is the total number of sentences containing the p-th non-repeated word;
and sorting the non-repeated words in descending order of importance, and taking a specified number of top-ranked words as the semantic keywords.
According to the technical scheme provided by the invention, pull-stream frame extraction and algorithm analysis are decoupled, and the video frames to be processed are passed between them through a message queue, so that frame extraction and analysis can be distributed across different machines without a strong dependency between them. At the same time, because the two modules are linked through the message queue, the progress of the independent processing steps on different machines stays consistent, video frames are neither missed nor processed repeatedly, and the efficiency of video analysis is greatly improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a schematic diagram of functional modules of a distributed intelligent video analysis system based on a message queue according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of steps of a distributed intelligent video analysis method based on a message queue according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
Referring to fig. 1, one embodiment of the present application provides a distributed intelligent video analysis system based on a message queue, the system includes a pull stream frame extraction module, an algorithm analysis module, and a message queue agent module, wherein:
The pull stream frame extraction module is used for reading a video stream to be processed, extracting a target video frame matched with the configuration information from the video stream according to the configuration information of the video stream, and pushing the target video frame to the message queue agent module;
the message queue agent module is used for encapsulating the received target video frame into message data and writing the encapsulated message data into a message queue;
the algorithm analysis module is used for reading message data from the message queue and extracting a target video frame from the message data; and generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data.
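As a concrete illustration of this decoupling, the three modules can be sketched as a producer and a consumer joined by a broker. The sketch below is a minimal single-process stand-in: an in-memory `queue.Queue` replaces a real broker such as Kafka or RabbitMQ, JSON strings stand in for the message format, and the quality score is a placeholder; none of these specifics come from the patent.

```python
import json
import queue
import threading

broker = queue.Queue()  # stands in for the message queue agent module

def pull_and_extract(frames, keywords):
    """Pull-stream frame-extraction module: push matching frames as messages."""
    for idx, (payload, semantics) in enumerate(frames):
        if semantics in keywords:                 # simplified keyword match
            message = json.dumps({"frame_id": idx, "payload": payload})
            broker.put(message)                   # write into the message queue
    broker.put(None)                              # end-of-stream marker

def analyze():
    """Algorithm-analysis module: read messages, score each target frame."""
    quality = {}
    while (message := broker.get()) is not None:
        data = json.loads(message)
        quality[data["frame_id"]] = len(data["payload"])  # placeholder score
    return quality

frames = [("frame-a", "dialogue"), ("frame-b", "static"), ("frame-c", "dialogue")]
producer = threading.Thread(target=pull_and_extract, args=(frames, {"dialogue"}))
producer.start()
result = analyze()
producer.join()
```

Because the producer and consumer share only the queue, they could run on different machines once the in-memory queue is replaced by a networked broker, which is the distribution the patent describes.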
In one embodiment, the pull-stream frame extraction module is specifically configured to read title information and comment data from the configuration information of the video stream, and generate semantic keywords of the video stream according to the title information and the comment data; and identify foreground semantic information of each video frame in the video stream, and take the video frame whose foreground semantic information matches the semantic keywords as the extracted target video frame.
The foreground semantic information of a video frame may be the semantic information of the main content on which the frame focuses. For example, if two people shown in a video frame are having a conversation, the conversation shot of the two people belongs to the foreground content, and the main semantic information of the video frame can be determined by analyzing the semantics of that shot. If the similarity between the foreground semantic information and the semantic keywords is higher than a certain threshold, the two are considered to match. This screening removes meaningless video frames from the video stream and keeps the frames that represent the main content of the video, thereby reducing the amount of data for subsequent video analysis.
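The threshold match described in the paragraph above can be sketched as follows, assuming the foreground semantics of a frame and the video's keywords are both available as plain word lists, and using cosine similarity over word counts with an illustrative 0.5 threshold; the patent names neither the similarity measure nor the threshold value.

```python
import math
from collections import Counter

def cosine_similarity(words_a, words_b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(words_a), Counter(words_b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_target_frames(frames, keywords, threshold=0.5):
    """Keep frames whose foreground semantics resemble the semantic keywords."""
    return [frame_id for frame_id, foreground in frames
            if cosine_similarity(foreground, keywords) > threshold]

frames = [("f1", ["two", "people", "talking"]),
          ("f2", ["empty", "street"])]
targets = select_target_frames(frames, ["people", "talking", "conversation"])
```

Here "f1" passes (similarity 2/3) while "f2" shares no words with the keywords and is screened out.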
In one embodiment, the pull-stream frame extraction module is further specifically configured to splice the title information and the comment data into corpus data; for any sentence in the corpus data, split the sentence into a plurality of phrases and generate word expression features of each phrase based on a BERT model; and input each word expression feature into a fully connected layer so as to output the semantic keywords of the sentence through the fully connected layer.
In one embodiment, the algorithm analysis module is specifically configured to identify a target object in the target video frame, and crop a region image of the target object from the target video frame; the region image is input into a quality evaluation model to output quality data of the region image through the quality evaluation model, and the quality data of the region image is determined as the quality data of the target video frame.
The quality evaluation model can be obtained in advance by training on sample images of different quality, each image sample labelled with a corresponding quality score. With the quality scores used as labels, the training process of the quality evaluation model can be completed.
Referring to fig. 2, another aspect of the present invention provides a distributed intelligent video analysis method based on a message queue, the method including:
S1: reading a video stream to be processed, and extracting a target video frame matched with configuration information from the video stream according to the configuration information of the video stream;
s2: encapsulating the target video frame into message data, and writing the encapsulated message data into a message queue;
s3: reading message data from the message queue, and extracting a target video frame from the message data;
s4: and generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data.
In one embodiment, extracting, from the video stream, a target video frame matching the configuration information according to the configuration information of the video stream includes:
Reading title information and comment data from the configuration information of the video stream, and generating semantic keywords of the video stream according to the title information and the comment data;
And identifying foreground semantic information of each video frame in the video stream, and taking the video frame with the foreground semantic information matched with the semantic keyword as the extracted target video frame.
The title information and comment data may not be readable from the configuration information of the video stream, for example when the video has no comments or has only just gone online; in that case, the semantic keywords of the video stream cannot be generated from the title information and comment data. To ensure that semantic keywords can still be generated even when the title information and comment data cannot be read, the following algorithm is adopted:
Step one: when the title information and comment data cannot be read, voice extraction is carried out on the video stream, and the speech in the video stream is transcribed into text information; let the text information be W, let the total number of sentences in the text W be M, and let N_m be the total number of words in the m-th sentence; the total number of words in the text W is then:

N_W = Σ_{m=1}^{M} N_m

where N_W is the total number of words in the text W, and m is the sentence index in the text W, an integer greater than or equal to 1 and less than or equal to M. Note that the words counted here exclude stop words, i.e. function words that carry no content meaning.
Step two: let the text information W contain P non-repeated words in total, and let F_p be the number of occurrences of the p-th non-repeated word in the text information W, where p is the index of the non-repeated word, an integer greater than or equal to 1 and less than or equal to P; the word frequency of the p-th non-repeated word is then:

TP_p = F_p / N_W

where TP_p is the word frequency of the p-th non-repeated word.
Step three: according to the result of step two, calculate the importance of the p-th non-repeated word by:

K_p = TP_p × log(M / M_p)

where K_p is the importance of the p-th non-repeated word, and M_p is the total number of sentences containing the p-th non-repeated word.
The non-repeated words are then sorted in descending order of importance, and the top-ranked words are taken as semantic keywords; a desired number of words can be selected as needed, for example the top three or top five. Because this method generates semantic keywords from text transcribed from the speech in the video, it allows semantic keywords of the video stream to be generated even when the title information and comment data cannot be read, provides effective data for subsequent video analysis, and improves the compatibility and usability of the system.
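Steps one to three can be sketched directly in code. The log-based combination K_p = TP_p · log(M / M_p) mirrors the TF-IDF family that the named quantities suggest; treat the exact weighting as an assumption rather than the patent's definitive formula.

```python
import math

def extract_keywords(sentences, top_n, stop_words=frozenset()):
    """sentences: list of word lists transcribed from the video's speech."""
    words = [w for s in sentences for w in s if w not in stop_words]
    n_w = len(words)                               # total word count N_W
    m = len(sentences)                             # total sentence count M
    importance = {}
    for word in set(words):                        # each non-repeated word
        tp = words.count(word) / n_w               # word frequency TP_p
        m_p = sum(1 for s in sentences if word in s)
        importance[word] = tp * math.log(m / m_p)  # importance K_p
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:top_n]

sentences = [["video", "analysis", "queue"],
             ["video", "frame"],
             ["queue", "broker", "queue"]]
keywords = extract_keywords(sentences, top_n=1)
```

In this toy transcript "queue" scores highest: it is both frequent and concentrated in few sentences, which is exactly the behaviour the importance measure rewards.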
In one embodiment, generating the semantic keywords of the video stream includes:
Splicing the title information and the comment data into corpus data, splitting any sentence in the corpus data into a plurality of phrases, and generating word expression features of each phrase based on a BERT model;
And inputting each word expression characteristic into a full connection layer so as to output semantic keywords of the sentence through the full connection layer.
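A minimal sketch of the final fully connected layer: the BERT encoder is replaced by fixed toy "word expression features", and the layer is a plain weight matrix plus bias that scores each feature vector against a small keyword vocabulary. The vocabulary, weights, and feature values are all illustrative assumptions, not taken from the patent.

```python
VOCAB = ["video", "analysis", "other"]  # hypothetical keyword vocabulary

# identity weights and zero bias, purely for illustration
WEIGHTS = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]             # WEIGHTS[j] scores vocab entry j
BIAS = [0.0, 0.0, 0.0]

def dense(features, weights, bias):
    """Fully connected layer: y_j = sum_i features_i * weights[j][i] + bias_j."""
    return [sum(x * w for x, w in zip(features, row)) + b
            for row, b in zip(weights, bias)]

def keyword_for(features):
    """Map one phrase's word expression features to the highest-scoring keyword."""
    scores = dense(features, WEIGHTS, BIAS)
    return VOCAB[scores.index(max(scores))]

label = keyword_for([0.1, 0.8, 0.2])    # toy feature vector for one phrase
```

In a real implementation the feature vectors would come from a BERT encoder and the weights from training; only the dense-layer arithmetic is shown here.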
In one embodiment, generating quality data for the target video frame includes:
identifying a target object in the target video frame, and cutting out a region image of the target object from the target video frame;
the region image is input into a quality evaluation model to output quality data of the region image through the quality evaluation model, and the quality data of the region image is determined as the quality data of the target video frame.
According to the technical scheme provided by the invention, pull-stream frame extraction and algorithm analysis are decoupled, and the video frames to be processed are passed between them through a message queue, so that frame extraction and analysis can be distributed across different machines without a strong dependency between them. At the same time, because the two modules are linked through the message queue, the progress of the independent processing steps on different machines stays consistent, video frames are neither missed nor processed repeatedly, and the efficiency of video analysis is greatly improved.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (9)

1. A distributed intelligent video analysis system based on a message queue, characterized in that the system comprises a pull-stream frame extraction module, an algorithm analysis module and a message queue agent module, wherein:
The pull stream frame extraction module is used for reading a video stream to be processed, extracting a target video frame matched with the configuration information from the video stream according to the configuration information of the video stream, and pushing the target video frame to the message queue agent module;
the message queue agent module is used for encapsulating the received target video frame into message data and writing the encapsulated message data into a message queue;
the algorithm analysis module is used for reading message data from the message queue and extracting a target video frame from the message data; and generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data.
2. The system of claim 1, wherein the pull-stream frame extraction module is specifically configured to read title information and comment data from the configuration information of the video stream, and generate semantic keywords of the video stream according to the title information and the comment data; and identify foreground semantic information of each video frame in the video stream, and take the video frame whose foreground semantic information matches the semantic keywords as the extracted target video frame.
3. The system of claim 2, wherein the pull-stream frame extraction module is further specifically configured to splice the title information and the comment data into corpus data; for any sentence in the corpus data, split the sentence into a plurality of phrases and generate word expression features of each phrase based on a BERT model; and input each word expression feature into a fully connected layer so as to output the semantic keywords of the sentence through the fully connected layer.
4. The system of claim 1, wherein the algorithm analysis module is specifically configured to identify a target object in the target video frame and crop a region image of the target object from the target video frame; the region image is input into a quality evaluation model to output quality data of the region image through the quality evaluation model, and the quality data of the region image is determined as the quality data of the target video frame.
5. A distributed intelligent video analysis method based on a message queue, the method comprising:
Reading a video stream to be processed, and extracting a target video frame matched with configuration information from the video stream according to the configuration information of the video stream;
encapsulating the target video frame into message data, and writing the encapsulated message data into a message queue;
Reading message data from the message queue, and extracting a target video frame from the message data;
And generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data.
6. The method of claim 5, wherein extracting, from the video stream, a target video frame that matches the configuration information based on the configuration information of the video stream comprises:
Reading title information and comment data from the configuration information of the video stream, and generating semantic keywords of the video stream according to the title information and the comment data;
And identifying foreground semantic information of each video frame in the video stream, and taking the video frame with the foreground semantic information matched with the semantic keyword as the extracted target video frame.
7. The method of claim 6, wherein generating semantic keywords for the video stream comprises:
Splicing the title information and the comment data into corpus data, splitting any sentence in the corpus data into a plurality of phrases, and generating word expression features of each phrase based on a BERT model;
And inputting each word expression characteristic into a full connection layer so as to output semantic keywords of the sentence through the full connection layer.
8. The method of claim 5, wherein generating quality data for the target video frame comprises:
identifying a target object in the target video frame, and cutting out a region image of the target object from the target video frame;
the region image is input into a quality evaluation model to output quality data of the region image through the quality evaluation model, and the quality data of the region image is determined as the quality data of the target video frame.
9. The method of claim 6, wherein the step of generating semantic keywords for the video stream if header information and comment data cannot be read from configuration information of the video stream comprises:
Step one: when the title information and comment data cannot be read, voice extraction is carried out on the video stream, and the speech in the video stream is transcribed into text information; let the text information be W, let the total number of sentences in the text W be M, and let N_m be the total number of words in the m-th sentence; the total number of words in the text W is then:

N_W = Σ_{m=1}^{M} N_m

wherein N_W is the total number of words in the text W, and m is the sentence index in the text W, an integer greater than or equal to 1 and less than or equal to M;
Step two: let the text information W contain P non-repeated words in total, and let F_p be the number of occurrences of the p-th non-repeated word in the text information W, where p is the index of the non-repeated word, an integer greater than or equal to 1 and less than or equal to P; the word frequency of the p-th non-repeated word is then:

TP_p = F_p / N_W

wherein TP_p is the word frequency of the p-th non-repeated word;
Step three: according to the result of step two, calculate the importance of the p-th non-repeated word by:

K_p = TP_p × log(M / M_p)

wherein K_p is the importance of the p-th non-repeated word, and M_p is the total number of sentences containing the p-th non-repeated word;
and sorting the non-repeated words in descending order of importance, and taking a specified number of top-ranked words as the semantic keywords.
CN202311618591.4A 2023-11-30 2023-11-30 Distributed intelligent video analysis system and method based on message queue Pending CN117911914A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311618591.4A CN117911914A (en) 2023-11-30 2023-11-30 Distributed intelligent video analysis system and method based on message queue


Publications (1)

Publication Number Publication Date
CN117911914A 2024-04-19

Family

ID=90682840




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination