CN117911914A - Distributed intelligent video analysis system and method based on message queue - Google Patents

Distributed intelligent video analysis system and method based on message queue

Info

Publication number
CN117911914A
Authority
CN
China
Prior art keywords: data, video frame, information, video stream, target video
Prior art date
Legal status
Pending
Application number
CN202311618591.4A
Other languages
Chinese (zh)
Inventor
贺江
兰雨晴
余丹
李易君
彭建强
Current Assignee
China Standard Intelligent Security Technology Co Ltd
Original Assignee
China Standard Intelligent Security Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Standard Intelligent Security Technology Co Ltd filed Critical China Standard Intelligent Security Technology Co Ltd
Priority: CN202311618591.4A
Publication: CN117911914A
Legal status: Pending


Abstract

The invention provides a distributed intelligent video analysis system and method based on a message queue. The system comprises a pull-stream frame extraction module, a message queue agent module and an algorithm analysis module. The pull-stream frame extraction module is used for reading a video stream to be processed, extracting a target video frame matched with configuration information from the video stream according to the configuration information of the video stream, and pushing the target video frame to the message queue agent module; the message queue agent module is used for encapsulating the received target video frame into message data and writing the encapsulated message data into a message queue; the algorithm analysis module is used for reading the message data from the message queue, extracting the target video frame from the message data, generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data. The technical scheme provided by the invention can improve the efficiency of video analysis.

Description

Distributed intelligent video analysis system and method based on message queue
Technical Field
The invention relates to the technical field of data processing, in particular to a distributed intelligent video analysis system and method based on a message queue.
Background
At present, when analyzing video content, stream pulling, frame extraction and algorithm analysis for multiple video channels are usually performed on a single machine. However, this approach demands high machine performance, and once the machine fails, all video analysis is interrupted.
If the video analysis tasks are instead distributed across several machines for processing, the analysis progress on those machines cannot be kept synchronized. A more efficient video analysis method is therefore needed.
Disclosure of Invention
The invention provides a distributed intelligent video analysis system and a method based on a message queue, which can improve the efficiency of video analysis.
In view of this, the present invention provides, in one aspect, a distributed intelligent video analysis system based on a message queue, the system including a pull stream frame extraction module, an algorithm analysis module, and a message queue agent module, wherein:
The pull stream frame extraction module is used for reading a video stream to be processed, extracting a target video frame matched with the configuration information from the video stream according to the configuration information of the video stream, and pushing the target video frame to the message queue agent module;
the message queue agent module is used for encapsulating the received target video frame into message data and writing the encapsulated message data into a message queue;
the algorithm analysis module is used for reading message data from the message queue and extracting a target video frame from the message data; and generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data.
In one embodiment, the pull-stream frame extraction module is specifically configured to read title information and comment data from the configuration information of the video stream, and generate semantic keywords of the video stream according to the title information and the comment data; and identify foreground semantic information of each video frame in the video stream, and take the video frame whose foreground semantic information matches the semantic keywords as the extracted target video frame.
In one embodiment, the pull-stream frame extraction module is further specifically configured to splice the title information and the comment data into corpus data; for any sentence in the corpus data, split the sentence into a plurality of phrases and generate word expression features of each phrase based on a BERT model; and input each word expression feature into a fully connected layer so as to output the semantic keywords of the sentence through the fully connected layer.
In one embodiment, the algorithm analysis module is specifically configured to identify a target object in the target video frame, and crop a region image of the target object from the target video frame; the region image is input into a quality evaluation model to output quality data of the region image through the quality evaluation model, and the quality data of the region image is determined as the quality data of the target video frame.
Another aspect of the present invention provides a distributed intelligent video analysis method based on a message queue, the method comprising:
Reading a video stream to be processed, and extracting a target video frame matched with configuration information from the video stream according to the configuration information of the video stream;
encapsulating the target video frame into message data, and writing the encapsulated message data into a message queue;
Reading message data from the message queue, and extracting a target video frame from the message data;
And generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data.
In one embodiment, extracting, from the video stream, a target video frame matching the configuration information according to the configuration information of the video stream includes:
Reading title information and comment data from the configuration information of the video stream, and generating semantic keywords of the video stream according to the title information and the comment data;
And identifying foreground semantic information of each video frame in the video stream, and taking the video frame with the foreground semantic information matched with the semantic keyword as the extracted target video frame.
In one embodiment, generating the semantic keywords of the video stream includes:
Splicing the title information and the comment data into corpus data, splitting any sentence in the corpus data into a plurality of phrases, and generating word expression features of each phrase based on a BERT model;
And inputting each word expression characteristic into a full connection layer so as to output semantic keywords of the sentence through the full connection layer.
In one embodiment, generating quality data for the target video frame includes:
identifying a target object in the target video frame, and cutting out a region image of the target object from the target video frame;
the region image is input into a quality evaluation model to output quality data of the region image through the quality evaluation model, and the quality data of the region image is determined as the quality data of the target video frame.
In one embodiment, if the title information and comment data cannot be read from the configuration information of the video stream, the step of generating the semantic keyword of the video stream includes:
Step one: when the title information and comment data cannot be read, voice extraction is carried out on the video stream, and the speech in the video stream is transcribed into text information; let the text information be W, let the total number of sentences in the text W be M, and let N_m be the total number of words in the m-th sentence; the total number of words in the text W is then:

N_W = Σ_{m=1}^{M} N_m

wherein N_W is the total number of words in the text W, and m is the sentence index in the text W, an integer greater than or equal to 1 and less than or equal to M;
Step two: let the text information W contain P non-repeated words in total, and let F_p be the number of occurrences of the p-th non-repeated word in the text information W, where p is the index of the non-repeated word, an integer greater than or equal to 1 and less than or equal to P; the word frequency of the p-th non-repeated word is then:

TP_p = F_p / N_W

wherein TP_p is the word frequency of the p-th non-repeated word;
Step three: according to the result of step two, calculate the importance of the p-th non-repeated word by:

K_p = TP_p × log(M / M_p)

wherein K_p is the importance of the p-th non-repeated word, and M_p is the total number of sentences containing the p-th non-repeated word;
and sorting the non-repeated words in descending order of importance, and taking a specified number of top-ranked words as the semantic keywords.
According to the technical scheme provided by the invention, pull-stream frame extraction and algorithm analysis are decoupled, and the video frames to be processed are passed between them through a message queue, so that frame extraction and analysis can be distributed across different machines without a strong dependency between them. At the same time, because the two modules are linked through the message queue, the progress of the independent processing steps on different machines stays consistent, video frames are neither missed nor processed repeatedly, and the efficiency of video analysis is greatly improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a schematic diagram of functional modules of a distributed intelligent video analysis system based on a message queue according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of steps of a distributed intelligent video analysis method based on a message queue according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
Referring to fig. 1, one embodiment of the present application provides a distributed intelligent video analysis system based on a message queue, the system includes a pull stream frame extraction module, an algorithm analysis module, and a message queue agent module, wherein:
The pull stream frame extraction module is used for reading a video stream to be processed, extracting a target video frame matched with the configuration information from the video stream according to the configuration information of the video stream, and pushing the target video frame to the message queue agent module;
the message queue agent module is used for encapsulating the received target video frame into message data and writing the encapsulated message data into a message queue;
the algorithm analysis module is used for reading message data from the message queue and extracting a target video frame from the message data; and generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data.
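As a concrete illustration of this decoupling, the three modules can be sketched as a producer and a consumer joined by a broker. The sketch below is a minimal single-process stand-in: an in-memory `queue.Queue` replaces a real broker such as Kafka or RabbitMQ, JSON strings stand in for the message format, and the quality score is a placeholder; none of these specifics come from the patent.

```python
import json
import queue
import threading

broker = queue.Queue()  # stands in for the message queue agent module

def pull_and_extract(frames, keywords):
    """Pull-stream frame-extraction module: push matching frames as messages."""
    for idx, (payload, semantics) in enumerate(frames):
        if semantics in keywords:                 # simplified keyword match
            message = json.dumps({"frame_id": idx, "payload": payload})
            broker.put(message)                   # write into the message queue
    broker.put(None)                              # end-of-stream marker

def analyze():
    """Algorithm-analysis module: read messages, score each target frame."""
    quality = {}
    while (message := broker.get()) is not None:
        data = json.loads(message)
        quality[data["frame_id"]] = len(data["payload"])  # placeholder score
    return quality

frames = [("frame-a", "dialogue"), ("frame-b", "static"), ("frame-c", "dialogue")]
producer = threading.Thread(target=pull_and_extract, args=(frames, {"dialogue"}))
producer.start()
result = analyze()
producer.join()
```

Because the producer and consumer share only the queue, they could run on different machines once the in-memory queue is replaced by a networked broker, which is the distribution the patent describes.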
In one embodiment, the pull-stream frame extraction module is specifically configured to read title information and comment data from the configuration information of the video stream, and generate semantic keywords of the video stream according to the title information and the comment data; and identify foreground semantic information of each video frame in the video stream, and take the video frame whose foreground semantic information matches the semantic keywords as the extracted target video frame.
The foreground semantic information of a video frame may be the semantic information of the main content on which the frame focuses. For example, if two people shown in a video frame are having a conversation, the conversation shot of the two people belongs to the foreground content, and the main semantic information of the video frame can be determined by analyzing the semantics of that shot. If the similarity between the foreground semantic information and the semantic keywords is higher than a certain threshold, the two are considered to match. This screening removes meaningless video frames from the video stream and keeps the frames that represent the main content of the video, thereby reducing the amount of data for subsequent video analysis.
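The threshold match described in the paragraph above can be sketched as follows, assuming the foreground semantics of a frame and the video's keywords are both available as plain word lists, and using cosine similarity over word counts with an illustrative 0.5 threshold; the patent names neither the similarity measure nor the threshold value.

```python
import math
from collections import Counter

def cosine_similarity(words_a, words_b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(words_a), Counter(words_b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_target_frames(frames, keywords, threshold=0.5):
    """Keep frames whose foreground semantics resemble the semantic keywords."""
    return [frame_id for frame_id, foreground in frames
            if cosine_similarity(foreground, keywords) > threshold]

frames = [("f1", ["two", "people", "talking"]),
          ("f2", ["empty", "street"])]
targets = select_target_frames(frames, ["people", "talking", "conversation"])
```

Here "f1" passes (similarity 2/3) while "f2" shares no words with the keywords and is screened out.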
In one embodiment, the pull-stream frame extraction module is further specifically configured to splice the title information and the comment data into corpus data; for any sentence in the corpus data, split the sentence into a plurality of phrases and generate word expression features of each phrase based on a BERT model; and input each word expression feature into a fully connected layer so as to output the semantic keywords of the sentence through the fully connected layer.
In one embodiment, the algorithm analysis module is specifically configured to identify a target object in the target video frame, and crop a region image of the target object from the target video frame; the region image is input into a quality evaluation model to output quality data of the region image through the quality evaluation model, and the quality data of the region image is determined as the quality data of the target video frame.
The quality evaluation model can be obtained in advance by training on sample images of different quality, each image sample labelled with a corresponding quality score. With the quality scores used as labels, the training process of the quality evaluation model can be completed.
Referring to fig. 2, another aspect of the present invention provides a distributed intelligent video analysis method based on a message queue, the method including:
S1: reading a video stream to be processed, and extracting a target video frame matched with configuration information from the video stream according to the configuration information of the video stream;
s2: encapsulating the target video frame into message data, and writing the encapsulated message data into a message queue;
s3: reading message data from the message queue, and extracting a target video frame from the message data;
s4: and generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data.
In one embodiment, extracting, from the video stream, a target video frame matching the configuration information according to the configuration information of the video stream includes:
Reading title information and comment data from the configuration information of the video stream, and generating semantic keywords of the video stream according to the title information and the comment data;
And identifying foreground semantic information of each video frame in the video stream, and taking the video frame with the foreground semantic information matched with the semantic keyword as the extracted target video frame.
The title information and comment data may not be readable from the configuration information of the video stream, for example when the video has no comments or has only just gone online; in that case, the semantic keywords of the video stream cannot be generated from the title information and comment data. To ensure that semantic keywords can still be generated even when the title information and comment data cannot be read, the following algorithm is adopted:
Step one: when the title information and comment data cannot be read, voice extraction is carried out on the video stream, and the speech in the video stream is transcribed into text information; let the text information be W, let the total number of sentences in the text W be M, and let N_m be the total number of words in the m-th sentence; the total number of words in the text W is then:

N_W = Σ_{m=1}^{M} N_m

where N_W is the total number of words in the text W, and m is the sentence index in the text W, an integer greater than or equal to 1 and less than or equal to M. Note that the words counted here exclude stop words, i.e. function words that carry no content meaning.
Step two: let the text information W contain P non-repeated words in total, and let F_p be the number of occurrences of the p-th non-repeated word in the text information W, where p is the index of the non-repeated word, an integer greater than or equal to 1 and less than or equal to P; the word frequency of the p-th non-repeated word is then:

TP_p = F_p / N_W

where TP_p is the word frequency of the p-th non-repeated word.
Step three: according to the result of step two, calculate the importance of the p-th non-repeated word by:

K_p = TP_p × log(M / M_p)

where K_p is the importance of the p-th non-repeated word, and M_p is the total number of sentences containing the p-th non-repeated word.
The non-repeated words are then sorted in descending order of importance, and the top-ranked words are taken as semantic keywords; a desired number of words can be selected as needed, for example the top three or top five. Because this method generates semantic keywords from text transcribed from the speech in the video, it allows semantic keywords of the video stream to be generated even when the title information and comment data cannot be read, provides effective data for subsequent video analysis, and improves the compatibility and usability of the system.
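Steps one to three can be sketched directly in code. The log-based combination K_p = TP_p · log(M / M_p) mirrors the TF-IDF family that the named quantities suggest; treat the exact weighting as an assumption rather than the patent's definitive formula.

```python
import math

def extract_keywords(sentences, top_n, stop_words=frozenset()):
    """sentences: list of word lists transcribed from the video's speech."""
    words = [w for s in sentences for w in s if w not in stop_words]
    n_w = len(words)                               # total word count N_W
    m = len(sentences)                             # total sentence count M
    importance = {}
    for word in set(words):                        # each non-repeated word
        tp = words.count(word) / n_w               # word frequency TP_p
        m_p = sum(1 for s in sentences if word in s)
        importance[word] = tp * math.log(m / m_p)  # importance K_p
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:top_n]

sentences = [["video", "analysis", "queue"],
             ["video", "frame"],
             ["queue", "broker", "queue"]]
keywords = extract_keywords(sentences, top_n=1)
```

In this toy transcript "queue" scores highest: it is both frequent and concentrated in few sentences, which is exactly the behaviour the importance measure rewards.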
In one embodiment, generating the semantic keywords of the video stream includes:
Splicing the title information and the comment data into corpus data, splitting any sentence in the corpus data into a plurality of phrases, and generating word expression features of each phrase based on a BERT model;
And inputting each word expression characteristic into a full connection layer so as to output semantic keywords of the sentence through the full connection layer.
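A minimal sketch of the final fully connected layer: the BERT encoder is replaced by fixed toy "word expression features", and the layer is a plain weight matrix plus bias that scores each feature vector against a small keyword vocabulary. The vocabulary, weights, and feature values are all illustrative assumptions, not taken from the patent.

```python
VOCAB = ["video", "analysis", "other"]  # hypothetical keyword vocabulary

# identity weights and zero bias, purely for illustration
WEIGHTS = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]             # WEIGHTS[j] scores vocab entry j
BIAS = [0.0, 0.0, 0.0]

def dense(features, weights, bias):
    """Fully connected layer: y_j = sum_i features_i * weights[j][i] + bias_j."""
    return [sum(x * w for x, w in zip(features, row)) + b
            for row, b in zip(weights, bias)]

def keyword_for(features):
    """Map one phrase's word expression features to the highest-scoring keyword."""
    scores = dense(features, WEIGHTS, BIAS)
    return VOCAB[scores.index(max(scores))]

label = keyword_for([0.1, 0.8, 0.2])    # toy feature vector for one phrase
```

In a real implementation the feature vectors would come from a BERT encoder and the weights from training; only the dense-layer arithmetic is shown here.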
In one embodiment, generating quality data for the target video frame includes:
identifying a target object in the target video frame, and cutting out a region image of the target object from the target video frame;
the region image is input into a quality evaluation model to output quality data of the region image through the quality evaluation model, and the quality data of the region image is determined as the quality data of the target video frame.
According to the technical scheme provided by the invention, pull-stream frame extraction and algorithm analysis are decoupled, and the video frames to be processed are passed between them through a message queue, so that frame extraction and analysis can be distributed across different machines without a strong dependency between them. At the same time, because the two modules are linked through the message queue, the progress of the independent processing steps on different machines stays consistent, video frames are neither missed nor processed repeatedly, and the efficiency of video analysis is greatly improved.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (9)

1. A distributed intelligent video analysis system based on a message queue, characterized in that the system comprises a pull-stream frame extraction module, an algorithm analysis module and a message queue agent module, wherein:
The pull stream frame extraction module is used for reading a video stream to be processed, extracting a target video frame matched with the configuration information from the video stream according to the configuration information of the video stream, and pushing the target video frame to the message queue agent module;
the message queue agent module is used for encapsulating the received target video frame into message data and writing the encapsulated message data into a message queue;
the algorithm analysis module is used for reading message data from the message queue and extracting a target video frame from the message data; and generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data.
2. The system of claim 1, wherein the pull-stream frame extraction module is specifically configured to read title information and comment data from the configuration information of the video stream, and generate semantic keywords of the video stream according to the title information and the comment data; and identify foreground semantic information of each video frame in the video stream, and take the video frame whose foreground semantic information matches the semantic keywords as the extracted target video frame.
3. The system of claim 2, wherein the pull-stream frame extraction module is further specifically configured to splice the title information and the comment data into corpus data; for any sentence in the corpus data, split the sentence into a plurality of phrases and generate word expression features of each phrase based on a BERT model; and input each word expression feature into a fully connected layer so as to output the semantic keywords of the sentence through the fully connected layer.
4. The system of claim 1, wherein the algorithm analysis module is specifically configured to identify a target object in the target video frame and crop a region image of the target object from the target video frame; the region image is input into a quality evaluation model to output quality data of the region image through the quality evaluation model, and the quality data of the region image is determined as the quality data of the target video frame.
5. A distributed intelligent video analysis method based on a message queue, the method comprising:
Reading a video stream to be processed, and extracting a target video frame matched with configuration information from the video stream according to the configuration information of the video stream;
encapsulating the target video frame into message data, and writing the encapsulated message data into a message queue;
Reading message data from the message queue, and extracting a target video frame from the message data;
And generating quality data of the target video frame, and outputting evaluation information for the video stream to be processed based on the quality data.
6. The method of claim 5, wherein extracting, from the video stream, a target video frame that matches the configuration information based on the configuration information of the video stream comprises:
Reading title information and comment data from the configuration information of the video stream, and generating semantic keywords of the video stream according to the title information and the comment data;
And identifying foreground semantic information of each video frame in the video stream, and taking the video frame with the foreground semantic information matched with the semantic keyword as the extracted target video frame.
7. The method of claim 6, wherein generating semantic keywords for the video stream comprises:
Splicing the title information and the comment data into corpus data, splitting any sentence in the corpus data into a plurality of phrases, and generating word expression features of each phrase based on a BERT model;
And inputting each word expression characteristic into a full connection layer so as to output semantic keywords of the sentence through the full connection layer.
8. The method of claim 5, wherein generating quality data for the target video frame comprises:
identifying a target object in the target video frame, and cutting out a region image of the target object from the target video frame;
the region image is input into a quality evaluation model to output quality data of the region image through the quality evaluation model, and the quality data of the region image is determined as the quality data of the target video frame.
9. The method of claim 6, wherein the step of generating semantic keywords for the video stream if header information and comment data cannot be read from configuration information of the video stream comprises:
Step one: when the title information and comment data cannot be read, voice extraction is carried out on the video stream, and the speech in the video stream is transcribed into text information; let the text information be W, let the total number of sentences in the text W be M, and let N_m be the total number of words in the m-th sentence; the total number of words in the text W is then:

N_W = Σ_{m=1}^{M} N_m

wherein N_W is the total number of words in the text W, and m is the sentence index in the text W, an integer greater than or equal to 1 and less than or equal to M;
Step two: let the text information W contain P non-repeated words in total, and let F_p be the number of occurrences of the p-th non-repeated word in the text information W, where p is the index of the non-repeated word, an integer greater than or equal to 1 and less than or equal to P; the word frequency of the p-th non-repeated word is then:

TP_p = F_p / N_W

wherein TP_p is the word frequency of the p-th non-repeated word;
Step three: according to the result of step two, calculate the importance of the p-th non-repeated word by:

K_p = TP_p × log(M / M_p)

wherein K_p is the importance of the p-th non-repeated word, and M_p is the total number of sentences containing the p-th non-repeated word;
and sorting the non-repeated words in descending order of importance, and taking a specified number of top-ranked words as the semantic keywords.
CN202311618591.4A 2023-11-30 2023-11-30 Distributed intelligent video analysis system and method based on message queue Pending CN117911914A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311618591.4A CN117911914A (en) 2023-11-30 2023-11-30 Distributed intelligent video analysis system and method based on message queue


Publications (1)

Publication Number Publication Date
CN117911914A 2024-04-19

Family

ID=90682840




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination