CN116524417B - Method and device for extracting distributed real-time video key frames based on Flink - Google Patents

Method and device for extracting distributed real-time video key frames based on Flink

Info

Publication number
CN116524417B
CN116524417B (Application CN202310789876.8A)
Authority
CN
China
Prior art keywords
pixel
key
identifier
frame
custom
Prior art date
Legal status
Active
Application number
CN202310789876.8A
Other languages
Chinese (zh)
Other versions
CN116524417A (en)
Inventor
覃克春 (Qin Kechun)
王子立 (Wang Zili)
Current Assignee
Shenzhen SDMC Technology Co Ltd
Original Assignee
Shenzhen SDMC Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen SDMC Technology Co Ltd filed Critical Shenzhen SDMC Technology Co Ltd
Priority to CN202310789876.8A priority Critical patent/CN116524417B/en
Publication of CN116524417A publication Critical patent/CN116524417A/en
Application granted granted Critical
Publication of CN116524417B publication Critical patent/CN116524417B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V10/95 Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/96 Management of image or video recognition tasks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/44 Secrecy systems
    • H04N1/448 Rendering the image unintelligible, e.g. scrambling
    • H04N1/4486 Rendering the image unintelligible, e.g. scrambling using digital data encryption
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/548 Queue
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the application discloses a method and a device for extracting distributed real-time video key frames based on Flink, wherein the method comprises the following steps: extracting video frames of the target video read in real time; performing key identification processing on the pixel blocks corresponding to each frame to obtain corresponding keys, performing MQ message encapsulation processing on the selected keys and the corresponding pixel blocks through an MQ cluster, and sending the obtained MQ messages to corresponding MQ topics; monitoring the MQ topics through the Flink cluster to obtain the message objects in the corresponding MQ, sequentially performing corresponding pixel information processing using a plurality of Flink custom pixel operators, and performing key encapsulation and combination on the detected key frames and sending the result to an MQ output topic, so that the corresponding MQ consumers generate the corresponding key frames based on the pixel information and frame information in the encapsulated keys; and obtaining the key frames generated by the MQ consumers.

Description

Method and device for extracting distributed real-time video key frames based on Flink
Technical Field
The application relates to the technical field of video, and in particular to a method and a device for extracting distributed real-time video key frames based on Flink.
Background
High-resolution video data is continually generated in a wide range of applications, from surveillance and movies to streaming media platforms for social media, entertainment, and news content. This creates the need to process video in real time, namely: as fast as the video is generated. Extracting meaningful descriptions from video is the first step in understanding content and in indexing and searching based on that content. The first step in this process is to detect sequences of frames that show continuous action in time and space, called shots. Video key frame extraction is the process of partitioning a video into meaningful sequences of frames according to similarity. It is a fundamental problem in video content applications such as robotics, surveillance, video conferencing, sports, and the Web, where video content is captured, produced, or distributed. It plays an important role in video production and can improve the viewing experience and the profitability of the content (e.g., finding places to insert advertisements, creating promotional videos from representative shots, and creating video summaries).
The existing extraction method of the distributed video key frames mainly has the following problems:
Problem 1: compressed (e.g., MPEG-4) video is often more complex to process, especially when it is not first converted to a raw format. Computation on compressed video is not accurate enough; processing raw video provides more accurate content results, but requires a large amount of memory and bandwidth. A single raw video frame can exceed 6 megabytes (for typical 1920x1080 color video). Frames of this size are too large to process without splitting them into smaller parts. Moreover, in video production, multiple videos must be processed simultaneously in real time.
Problem 2: existing systems combine coarse-grained and fine-grained parallelism (i.e., using MQs to distribute video input from multiple sources to multiple topics across multiple machines and multiple CPU or GPU cores), splitting video frames into blocks for finer-grained parallel processing. However, to avoid communication overhead, all parts of the same frame are sent to the same processing server. As a result, some servers may remain idle, and the approach relies on custom hardware and software implementations. This solution is therefore less principled and difficult to replicate.
How to solve the problems of a complex video processing flow and high communication overhead in the above methods for extracting real-time video key frames, and how to find a novel method for extracting distributed, real-time, multi-resolution, multi-video key frames, is a technical problem to be solved.
Disclosure of Invention
Based on this, aiming at the defects of a complex video processing flow and high communication overhead in existing methods for extracting real-time video key frames, it is necessary to provide a method, an apparatus, a storage medium, an electronic device and a computer program product for extracting distributed real-time video key frames based on Flink.
In a first aspect, an embodiment of the present application provides a method for extracting distributed real-time video key frames based on Flink, where the method includes:
extracting video frames of the target video read in real time to divide each frame into corresponding pixel blocks;
performing key identification processing on the pixel blocks corresponding to each frame to obtain corresponding keys, performing MQ message encapsulation processing on the selected keys and the corresponding pixel blocks through an MQ cluster, and sending the obtained MQ messages to corresponding MQ topics; monitoring the MQ topics through the Flink cluster to obtain the message objects in the corresponding MQ, sequentially performing corresponding pixel information processing using a plurality of Flink custom pixel operators, and performing key encapsulation and combination on the detected key frames and sending the result to an MQ output topic, so that the corresponding MQ consumers generate the corresponding key frames based on the pixel information and frame information in the encapsulated keys;
the key frames generated by the MQ consumers are obtained.
Preferably, after the key identification processing is performed on the pixel block corresponding to each frame to obtain the corresponding key, the method further includes:
reading a key corresponding to a pixel block corresponding to each frame;
Each key read comprises a first identifier, a second identifier, an identifier for identifying a corresponding resolution, an identifier for identifying a corresponding total frame number, an identifier for identifying a corresponding data block size, an identifier for identifying a corresponding pixel data code, a corresponding video frame identifier and a corresponding pixel block identifier in sequence, wherein the first identifier is used for identifying the time of reading the corresponding video, the second identifier is a random integer generated by a snowflake model, and the first identifier and the second identifier form a unique identifier of the corresponding video.
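The second identifier above is described only as a random integer produced by a snowflake model. As an illustrative, non-limiting sketch, a minimal snowflake-style generator can be written as follows; the bit layout (41-bit millisecond timestamp, 10-bit machine id, 12-bit per-millisecond sequence) is an assumption of this example, since the patent does not specify one:

```python
import time

class SnowflakeSketch:
    """Minimal snowflake-style ID generator: timestamp | machine id | sequence.
    The exact bit layout used by the described system is not specified; this
    sketch uses the common 41/10/12-bit split."""

    def __init__(self, machine_id):
        self.machine_id = machine_id & 0x3FF  # 10 bits
        self.sequence = 0                     # 12-bit counter within one ms
        self.last_ms = -1

    def next_id(self):
        ms = int(time.time() * 1000)
        if ms == self.last_ms:
            # same millisecond: bump the sequence counter
            self.sequence = (self.sequence + 1) & 0xFFF
        else:
            self.sequence = 0
            self.last_ms = ms
        return (ms << 22) | (self.machine_id << 12) | self.sequence
```

IDs generated this way are unique per machine and monotonically increasing, which is what lets the pair (read time, snowflake integer) uniquely identify a video stream.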
Preferably, the plurality of Flink custom pixel operators include a first Flink custom pixel operator, a second Flink custom pixel operator, a third Flink custom pixel operator, a fourth Flink custom pixel operator and a fifth Flink custom pixel operator, and sequentially performing corresponding pixel information processing using the plurality of Flink custom pixel operators includes:
performing grayscale data conversion processing through the first Flink custom pixel operator;
receiving blocks with grayscale image data through the second Flink custom pixel operator;
sequentially adding all partial histograms corresponding to a frame through the third Flink custom pixel operator to generate a total histogram of the corresponding frame;
calculating the difference value of two adjacent frames of the same video through the fourth Flink custom pixel operator;
and performing shot detection processing through the fifth Flink custom pixel operator to obtain a corresponding detection result.
Preferably, the performing shot detection processing through the fifth Flink custom pixel operator includes:
receiving the histogram differences between adjacent frames through a shot detection operator, and, in response to a detection result being greater than a preset difference threshold, outputting corresponding shot change data to the corresponding MQ output topic, wherein the shot change data includes data related to shot cuts and data related to gradual fades.
In a second aspect, an embodiment of the present application provides a device for extracting distributed real-time video key frames based on Flink, where the device includes:
the extraction module is used for extracting video frames of the target video read in real time so as to divide each frame into corresponding pixel blocks;
the processing module is used for performing key identification processing on the pixel blocks corresponding to each frame to obtain corresponding keys, performing MQ message encapsulation processing on the selected keys and the corresponding pixel blocks through the MQ cluster, and sending the obtained MQ messages to the corresponding MQ topics; monitoring the MQ topics through the Flink cluster to obtain the message objects in the corresponding MQ, sequentially performing corresponding pixel information processing using a plurality of Flink custom pixel operators, and performing key encapsulation and combination on the detected key frames and sending the result to the MQ output topic, so that the corresponding MQ consumers generate the corresponding key frames based on the pixel information and frame information in the encapsulated keys;
And the acquisition module is used for acquiring the key frames generated by the MQ consumers.
Preferably, the apparatus further comprises:
the reading module is used for reading the key corresponding to the pixel block corresponding to each frame after the key identification processing is carried out on the pixel block corresponding to each frame to obtain the corresponding key;
each key read by the reading module sequentially comprises a first identifier, a second identifier, an identifier for identifying the corresponding resolution, an identifier for identifying the corresponding total frame number, an identifier for identifying the corresponding data block size, an identifier for identifying the corresponding pixel data code, a corresponding video frame identifier and a corresponding pixel block identifier, wherein the first identifier is used for identifying the time of reading the corresponding video, the second identifier is a random integer generated by a snowflake model, and the first identifier and the second identifier form a unique identifier of the corresponding video.
Preferably, the plurality of Flink custom pixel operators include a first Flink custom pixel operator, a second Flink custom pixel operator, a third Flink custom pixel operator, a fourth Flink custom pixel operator, and a fifth Flink custom pixel operator, and the processing module includes:
The first processing submodule is used for carrying out gray data conversion processing through the first Flink custom pixel operator;
the second processing submodule is used for receiving blocks with grayscale image data through the second Flink custom pixel operator;
the third processing sub-module is used for sequentially adding all partial histograms corresponding to one frame through the third Flink custom pixel operator to generate a total histogram of the corresponding frame;
a fourth processing sub-module, configured to calculate, through the fourth Flink custom pixel operator, the difference value between two adjacent frames of the same video;
and a fifth processing sub-module, configured to perform shot detection processing through the fifth Flink custom pixel operator to obtain a corresponding detection result.
Preferably, the fifth processing submodule is specifically configured to:
and receiving the histogram differences between adjacent frames through a shot detection operator, and, in response to a detection result being greater than a preset difference threshold, outputting corresponding shot change data to the corresponding MQ output topic, wherein the shot change data includes data related to shot cuts and data related to gradual fades.
In a third aspect, embodiments of the present application provide a computer readable storage medium storing a computer program for performing the above-described method steps.
In a fourth aspect, an embodiment of the present application provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the executable instructions to implement the method steps described above.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements the above-mentioned method steps.
In the embodiment of the application, video frames of the target video read in real time are extracted so as to divide each frame into corresponding pixel blocks; key identification processing is performed on the pixel blocks corresponding to each frame to obtain corresponding keys, MQ message encapsulation processing is performed on the selected keys and the corresponding pixel blocks through an MQ cluster, and the obtained MQ messages are sent to the corresponding MQ topics; the MQ topics are monitored through the Flink cluster to obtain the message objects in the corresponding MQ, corresponding pixel information processing is sequentially performed using a plurality of Flink custom pixel operators, and the detected key frames are key-encapsulated, combined, and sent to the MQ output topic, so that the corresponding MQ consumers generate the corresponding key frames based on the pixel information and frame information in the encapsulated keys. According to the method for extracting distributed real-time video key frames based on Flink provided by the embodiment of the application, an MQ cluster is introduced to perform MQ message encapsulation on the selected keys and the corresponding pixel blocks and to send the obtained MQ messages to the corresponding MQ topics; and a Flink cluster is introduced to monitor the MQ topics, obtain the message objects in the corresponding MQ, sequentially perform corresponding pixel information processing using a plurality of Flink custom pixel operators, and key-encapsulate and combine the detected key frames and send them to the MQ output topic, so that the corresponding MQ consumers generate the corresponding key frames based on the pixel information and frame information in the encapsulated keys, thereby simplifying the flow of the real-time video key frame extraction method and effectively reducing communication overhead.
Drawings
Exemplary embodiments of the present application may be more fully understood by reference to the following drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification; they serve to explain the application together with its embodiments and do not constitute a limitation of the application. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 is a flowchart of a method for extracting a distributed real-time video key frame based on a Flink according to an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of a key corresponding to a pixel block corresponding to each frame read in a specific application scenario;
fig. 3 is a schematic structural diagram of a device 300 for extracting Flink-based distributed real-time video key frames according to an exemplary embodiment of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.
In addition, the terms "first" and "second" etc. are used to distinguish different objects and are not used to describe a particular order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
The embodiment of the application provides a method and a device for extracting distributed real-time video key frames based on Flink, an electronic device and a computer readable medium, which are described below with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a method for extracting Flink-based distributed real-time video key frames according to some embodiments of the present application is shown. The method is applied to a client in a client cluster and, as shown in fig. 1, may include the following steps:
Step S101: extracting video frames of the target video read in real time so as to divide each frame into corresponding pixel blocks.
In an actual application scenario, the process of extracting the video frames of the target video and dividing each frame into corresponding pixel blocks is specifically as follows:
after receiving the target video, the client reads the video and extracts its video frames, then divides each frame into smaller pixel blocks. Each client may receive a different video as input. If each pixel block is one row of the frame in size, a 2560x1440 color frame (24 bits per pixel, 8 bits per color channel) yields 1440 blocks of 7.5 KB each. For a grayscale frame, the block size is exactly one third of that value (i.e., 2.5 KB).
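The block-size arithmetic above can be checked with a short sketch; the function name and the one-row-per-block grouping are illustrative assumptions of this example:

```python
def block_layout(width, height, bytes_per_pixel, rows_per_block=1):
    """Return (number_of_blocks, block_size_in_bytes) when each block
    carries `rows_per_block` rows of a width x height frame."""
    blocks = height // rows_per_block
    block_size = width * rows_per_block * bytes_per_pixel
    return blocks, block_size

# 2560x1440 color frame, 24 bits (3 bytes) per pixel:
color_blocks, color_size = block_layout(2560, 1440, 3)  # 1440 blocks of 7680 B (7.5 KB)
# the same frame in grayscale, 8 bits (1 byte) per pixel:
gray_blocks, gray_size = block_layout(2560, 1440, 1)    # 1440 blocks of 2560 B (2.5 KB)
```

The grayscale block size is one third of the color block size, matching the ratio stated above.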
Step S102: performing key identification processing on the pixel blocks corresponding to each frame to obtain corresponding keys, performing MQ message encapsulation processing on the selected keys and the corresponding pixel blocks through an MQ cluster, and sending the obtained MQ messages to the corresponding MQ topics; monitoring the MQ topics through the Flink cluster to obtain the message objects in the corresponding MQ, sequentially performing corresponding pixel information processing using a plurality of Flink custom pixel operators, and performing key encapsulation and combination on the detected key frames and sending the result to the MQ output topic, so that the corresponding MQ consumers generate the corresponding key frames based on the pixel information and frame information in the encapsulated keys.
In a possible implementation manner, after performing key identification processing on the pixel block corresponding to each frame to obtain a corresponding key, the method for extracting distributed real-time video key frames based on Flink provided by the embodiment of the application further includes the following steps:
reading a key corresponding to a pixel block corresponding to each frame;
each key read comprises a first identifier, a second identifier, an identifier for identifying a corresponding resolution, an identifier for identifying a corresponding total frame number, an identifier for identifying a corresponding data block size, an identifier for identifying a corresponding pixel data code, a corresponding video frame identifier and a corresponding pixel block identifier in sequence, wherein the first identifier is used for identifying the time of reading the corresponding video, the second identifier is a random integer generated by a snowflake model, and the first identifier and the second identifier form a unique identifier of the corresponding video.
Fig. 2 is a schematic diagram of a key corresponding to a pixel block corresponding to each frame read in a specific application scenario.
The client performs key identification processing on each pixel block, so that each pixel block is assigned a key when it is generated. As shown in fig. 2, the key corresponding to each pixel block of each frame contains the unique identifiers (A and B) of the video. It also contains an identifier (H) of the block itself and of the frame (G) from which it comes. Identifier (A) is the time when the client starts to read the video, and identifier (B) is a random integer generated by the snowflake algorithm (the algorithm employed by the snowflake model). Identifiers A and B together constitute a unique identifier for each video. Identifier (C) is the resolution, and identifier (D) is the total frame number. Identifier (E) is the data block size: it indicates how many rows each block contains, and, if the height of the frame is known, the number of blocks per frame can be calculated from it. Identifier (F) represents the encoding of the pixel data: 0 represents grayscale and 1 represents RGB color.
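The key layout (fields A through H) can be sketched as a small encode/decode pair. The underscore separator and the concrete field types are assumptions of this example; the patent specifies only the field order:

```python
from dataclasses import dataclass

@dataclass
class BlockKey:
    read_time: int       # A: time the client started reading the video
    video_rand: int      # B: snowflake-generated random integer
    resolution: str      # C: e.g. "2560x1440"
    total_frames: int    # D
    rows_per_block: int  # E: data block size in rows
    encoding: int        # F: 0 = grayscale, 1 = RGB
    frame_id: int        # G: source frame of this block
    block_id: int        # H: the block itself

    def encode(self, sep="_"):
        return sep.join(str(v) for v in (
            self.read_time, self.video_rand, self.resolution,
            self.total_frames, self.rows_per_block, self.encoding,
            self.frame_id, self.block_id))

def decode_key(s, sep="_"):
    a, b, c, d, e, f, g, h = s.split(sep)
    return BlockKey(int(a), int(b), c, int(d), int(e), int(f), int(g), int(h))
```

Fields A and B together identify the video; G and H locate the block within it, which is what later operators use for keyed routing.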
In one possible implementation, the plurality of Flink custom pixel operators include a first Flink custom pixel operator, a second Flink custom pixel operator, a third Flink custom pixel operator, a fourth Flink custom pixel operator, and a fifth Flink custom pixel operator, and sequentially performing corresponding pixel information processing using the plurality of Flink custom pixel operators includes the following steps:
performing grayscale data conversion processing through the first Flink custom pixel operator;
receiving blocks with grayscale image data through the second Flink custom pixel operator;
sequentially adding all partial histograms corresponding to a frame through the third Flink custom pixel operator to generate the total histogram of the corresponding frame;
calculating the difference value of two adjacent frames of the same video through the fourth Flink custom pixel operator;
and performing shot detection processing through the fifth Flink custom pixel operator to obtain a corresponding detection result.
In an actual application scenario, the process of performing grayscale data conversion processing through the first Flink custom pixel operator is specifically as follows: after receiving a data object, the Flink source operator splits the data object into image data and a unique key, and judges whether the image data is color (24 bits per pixel); if so, it converts the three-channel RGB image into a grayscale image; if it is already a grayscale image, the data is passed directly downstream.
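The color-to-grayscale conversion performed by the first operator can be sketched as below. The ITU-R BT.601 luma weights are a common choice for this step and are an assumption of this example; the patent does not state which conversion is actually used:

```python
def to_grayscale(rgb_block):
    """Convert a block of (R, G, B) pixels to 8-bit grey values using
    the common BT.601 luma weights (an assumption of this sketch)."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_block]
```

This also explains the 3:1 size ratio mentioned earlier: one grey byte replaces three color bytes per pixel.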
In a practical application scenario, the process of receiving blocks with grayscale image data through the second Flink custom pixel operator (histogram calculation) is specifically as follows:
For each input block, its intensity histogram H(i) = n_i / n is calculated, where n is the total number of pixels in the block and n_i is the number of pixels with grey level i. The output of this custom operator is redistributed among the third Flink custom pixel operator instances based on the identifier of the source frame (i.e., contained in the key of each block), so that each third Flink custom pixel operator instance receives the data (i.e., block histograms) of the same frame. Each time a new partial histogram arrives, the operator uses the corresponding state (e.g., the state assigned to the key Frame10_Video3) to identify the video and the frame from which it came.
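The per-block histogram H(i) = n_i / n can be computed as follows; the function name is illustrative:

```python
from collections import Counter

def intensity_histogram(pixels, levels=256):
    """Normalized histogram H(i) = n_i / n of one pixel block, where n is
    the total number of pixels and n_i the count of pixels at grey level i."""
    n = len(pixels)
    counts = Counter(pixels)
    return [counts.get(i, 0) / n for i in range(levels)]
```

Because each H(i) is a fraction of the block's pixels, the entries sum to 1, and per-block histograms can later be added and renormalized into a frame histogram.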
In an actual application scenario, the process of sequentially adding all partial histograms corresponding to a frame through the third Flink custom pixel operator to generate the total histogram of the corresponding frame (i.e., aggregating to generate the frame's total histogram) is specifically as follows:
The third Flink custom pixel operator receives the histograms of all blocks of a frame. The operator adds all partial histograms to generate the overall histogram of the frame. If a frame is partitioned into K blocks, the operator waits for K block histograms and then outputs the total histogram. The operation times out after a few seconds, in which case the histogram is calculated using the partial histograms received up to that point. The same operator can process multiple frames at the same time, even frames from different videos. Each output (i.e., the complete histogram of one frame) is sent twice, each time using a different key for the adjacent frame pair, so that the difference between any two consecutive frames can be calculated at the next, fourth Flink custom pixel operator.
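The wait-for-K-blocks aggregation of operator C can be sketched with a plain dictionary standing in for Flink keyed state (class and method names are illustrative; the timeout path is omitted for brevity):

```python
class FrameHistogramAggregator:
    """Sketch of the third operator: sum K partial (per-block) histograms
    into one total histogram per frame. In Flink this state would be keyed
    by a frame key such as "Frame10_Video3"; here a dict stands in."""

    def __init__(self, blocks_per_frame, levels=256):
        self.k = blocks_per_frame
        self.levels = levels
        self.state = {}  # frame_key -> (received_count, running_sum)

    def add_partial(self, frame_key, partial):
        count, total = self.state.get(frame_key, (0, [0.0] * self.levels))
        total = [a + b for a, b in zip(total, partial)]
        count += 1
        if count == self.k:            # all K block histograms arrived
            del self.state[frame_key]
            return total               # emit the frame's total histogram
        self.state[frame_key] = (count, total)
        return None                    # still waiting (timeout omitted)
```

Because the state is looked up per frame key, one instance can interleave blocks from many frames and even many videos, as the text describes.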
In an actual application scenario, the process of calculating the difference between two adjacent frames of the same video through the fourth Flink custom pixel operator is specifically as follows:
The operator receives the histograms of two adjacent frames and calculates their difference value. One key is used for every two consecutive frames. For example, for four example frames F1, F2, F3, F4, three keys are created: F1_F2, F2_F3 and F3_F4. Each full-frame histogram is routed twice by the previous Flink custom pixel operator C. For example, operator C sends the histogram of frame F3 twice, once using the F2_F3 key and a second time using the F3_F4 key. The histogram of frame F3 with the key F2_F3 is compared to the histogram of frame F2 with the same key on operator instance D.2 (i.e., the second instance of operator D), which outputs the difference between the two. Likewise, in operator instance D.3, the difference between frame F3 and frame F4 is calculated. Each instance of the parallel operator holds data for multiple keys (keyed states). For each key, it waits for the histograms of the two adjacent frames to arrive. When the first histogram arrives, it is saved in the local state of the key. When the second histogram arrives, the operator calculates their difference and outputs the result.
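The dual-key routing and the buffer-then-diff behavior of operator D can be sketched as follows. The key format "F2_F3", the absolute-difference metric, and the class names are illustrative assumptions (in practice the first and last frames each carry only one usable pair key):

```python
def pair_keys(frame_id):
    """Keys under which a full-frame histogram is emitted twice,
    e.g. frame F3 goes out as F2_F3 and F3_F4."""
    keys = []
    if frame_id > 1:                       # F1 has no predecessor
        keys.append(f"F{frame_id - 1}_F{frame_id}")
    keys.append(f"F{frame_id}_F{frame_id + 1}")
    return keys

class PairDiff:
    """Sketch of one operator-D instance: per pair key, buffer the first
    histogram and emit a difference when the second one arrives."""

    def __init__(self):
        self.pending = {}  # pair key -> first histogram (local keyed state)

    def on_histogram(self, key, hist):
        if key not in self.pending:
            self.pending[key] = hist       # first of the pair: save
            return None
        other = self.pending.pop(key)      # second of the pair: emit diff
        return sum(abs(a - b) for a, b in zip(other, hist))
```

Sending every histogram under two keys is what lets consecutive-frame pairs land on the same instance without any global ordering.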
In an actual application scenario, the process of performing shot detection through the fifth Flink custom pixel operator to obtain a corresponding detection result is specifically as follows:
The shot detection operator checks for differences exceeding the threshold Tb. It receives all histogram differences between adjacent frames and outputs shot changes (shot cuts and fades) to the MQ output topic. A key is also used on this operator to distinguish between the different videos being processed: the histogram differences arrive unordered, so they are grouped according to the identifier of the video they come from (e.g., Video1) by assigning them the same key. All histogram differences with the same key are routed to the same operator instance; the differences from Video1 are sent to instance E.1 and, likewise, all differences from Video2 are sent to instance E.2. Within each instance, the histogram differences of the same video share a local state. A difference exceeding Tb indicates a shot cut. The operator also searches for potential fades. This requires keeping differences in memory, accumulating the differences of all adjacent frames that exceed the threshold Ts, and checking their sum against the threshold Tb; when the sum exceeds Tb, a shot change is detected as a gradual transition (fade). Because the histogram differences are generated regardless of frame order, a priority queue on each fifth Flink custom pixel operator instance holds all arriving histogram differences for each video, and the top element of the queue is always the earliest available histogram difference. Whenever the gray-level-histogram shot detection algorithm needs the next histogram difference, the operator first checks the top element of the queue: if the expected difference is found, the algorithm continues to execute; otherwise it waits, while newly arriving elements are inserted into the queue.
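The combination of heap-based reordering and the twin thresholds Ts/Tb can be sketched as below. The exact fade-accumulation rule and the event shape are assumptions inferred from the description above, not the patent's literal algorithm:

```python
import heapq

def detect_shots(diffs, tb, ts):
    """Twin-comparison sketch: a cut where a difference exceeds Tb; a fade
    where a run of differences each above Ts accumulates past Tb. Because
    differences may arrive out of order, a heap keyed by frame index
    restores chronological order before the checks run."""
    heap = []
    for frame_idx, diff in diffs:        # e.g. (2, 40) = diff between F2, F3
        heapq.heappush(heap, (frame_idx, diff))
    events, run = [], 0
    while heap:
        frame_idx, diff = heapq.heappop(heap)   # earliest available difference
        if diff > tb:
            events.append(("cut", frame_idx))
            run = 0
        elif diff > ts:
            run += diff                  # accumulate a candidate gradual change
            if run > tb:
                events.append(("fade", frame_idx))
                run = 0
        else:
            run = 0                      # a small difference ends the candidate
    return events
```

A streaming implementation would pop from the queue only when the next expected frame index is at the top, waiting otherwise, as the text describes.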
In one possible implementation manner, the shot detection process performed by the fifth Flink custom pixel operator includes the following steps:
in response to the detection result being greater than a preset difference threshold (the aforementioned threshold Tb), receiving the histogram difference between any adjacent frames through the shot detection operator and outputting the corresponding shot change data to the corresponding MQ output topic, the shot change data including data related to shot cuts and data related to gradual fade-ins.
It should be noted that the preset difference threshold in the above steps is not specifically limited, and may be adjusted according to the requirements of different application scenarios.
Step S103: key frames generated by MQ consumers are obtained.
In a specific application scenario, the method for extracting distributed real-time video key frames based on Flink provided by the embodiment of the application involves a client cluster, an MQ cluster and a Flink cluster. After a user uploads a video, each client reads the video in real time and extracts its frames; each frame is divided into smaller pixel blocks that are easier to transmit and process in parallel in a highly distributed system, and a unique key is then generated and packaged together with each pixel block. The identified blocks are encapsulated as MQ messages and sent to the corresponding topics, where one topic can simultaneously receive multiple videos (of any resolution and frame rate). A series of Flink custom pixel operators is then applied, the plurality of Flink custom pixel operators comprising a first, a second, a third, a fourth and a fifth Flink custom pixel operator: the first Flink custom pixel operator converts each three-channel pixel block into a grayscale image; the second Flink custom pixel operator receives the blocks with grayscale image data and calculates a partial histogram for each block; and the third Flink custom pixel operator adds all partial histograms of a frame to generate the overall histogram of that frame. If a frame is split into K blocks, the operator waits for K blocks before outputting the histogram.
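The grayscale conversion and per-block histogram steps performed by the first two operators can be sketched as follows. The BT.601 luma weights are a standard choice assumed here; the patent only states that three-channel pixels are converted to grayscale:

```python
def to_gray(pixel):
    """ITU-R BT.601 luma approximation for an (R, G, B) pixel (an assumed
    weighting; the patent does not name the conversion coefficients)."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))

def block_histogram(block, bins=256):
    """256-bin gray-level histogram of one pixel block, the partial
    histogram that the third operator later sums across the frame."""
    hist = [0] * bins
    for pixel in block:
        hist[to_gray(pixel)] += 1
    return hist
```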
Custom operator 4 (corresponding to the fourth Flink custom pixel operator described above) performs the histogram difference calculation: it receives the histograms of two adjacent frames, calculates their difference value, and outputs the result to custom operator 5 (corresponding to the fifth Flink custom pixel operator described above), which applies the shot detection algorithm to all histogram differences between adjacent frames and outputs the shot changes (shot cuts and fades) to the MQ output topic. Upon receiving the message, the consumer of the MQ output topic performs video conversion and obtains the generated key frames.
According to the method for extracting distributed real-time video key frames based on Flink provided by the embodiment of the application, an MQ cluster is introduced to encapsulate each selected key together with its corresponding pixel block into an MQ message and send the obtained MQ message to the corresponding MQ topic; a Flink cluster is introduced to monitor the MQ topic and acquire the message objects in the corresponding MQ, a plurality of Flink custom pixel operators sequentially perform the corresponding pixel information processing, and the detected key frames are key-encapsulated, combined and sent to the MQ output topic, so that the corresponding MQ consumers generate the corresponding key frames based on the pixel information and frame information in the encapsulated keys, thereby simplifying the flow of the real-time video key frame extraction method and effectively reducing communication overhead.
In the above embodiment, a method for extracting distributed real-time video key frames based on Flink is provided; correspondingly, the application also provides a device for extracting distributed real-time video key frames based on Flink. The Flink-based device for extracting distributed real-time video key frames can implement the above Flink-based method, and can be realized by software, hardware, or a combination of software and hardware. For example, the Flink-based distributed real-time video key frame extraction apparatus may comprise integrated or separate functional modules or units to perform the corresponding steps in the methods described above.
Referring to fig. 3, a schematic diagram of a Flink-based device for extracting distributed real-time video key frames according to some embodiments of the present application is shown. The extraction device is applied to the clients in the client cluster. Since the device embodiment is basically similar to the method embodiment, its description is brief; for relevant parts, refer to the corresponding description of the method embodiment. The device embodiments described below are merely illustrative.
As shown in fig. 3, the Flink-based extraction apparatus 300 for distributed real-time video key frames may include:
the extracting module 301 is configured to extract video frames of the target video read in real time, so as to divide each frame into corresponding pixel blocks;
the processing module 302 is configured to perform key identification processing on a pixel block corresponding to each frame to obtain a corresponding key, perform MQ message encapsulation processing on the selected key and the corresponding pixel block through the MQ cluster, and send the obtained MQ message to a corresponding MQ theme; the method comprises the steps that a message object in a corresponding MQ is obtained through monitoring processing of an MQ topic by a Flink cluster, corresponding pixel information processing is sequentially conducted by using a plurality of Flink custom pixel operators, key packaging combination is conducted on detected key frames, the key packaging combination is sent to an MQ output topic, and accordingly corresponding key frames are generated by corresponding MQ consumers based on pixel information and frame information in the packaged key;
An obtaining module 303, configured to obtain a key frame generated by the MQ consumer.
In some implementations of the embodiments of the present application, the Flink-based apparatus 300 for extracting distributed real-time video key frames further includes:
a reading module (not shown in fig. 3) for reading the key corresponding to the pixel block corresponding to each frame after performing the key identification processing on the pixel block corresponding to each frame to obtain the corresponding key;
each key read by the reading module comprises, in order, a first identifier, a second identifier, an identifier identifying the corresponding resolution, an identifier identifying the corresponding total frame number, an identifier identifying the corresponding data block size, an identifier identifying the corresponding pixel data encoding, the corresponding video frame identifier, and the corresponding pixel block identifier; the first identifier identifies the time at which the corresponding video was read, the second identifier is a random integer generated by the snowflake model, and the first identifier and the second identifier together form a unique identifier of the corresponding video.
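The key layout described above can be illustrated with a small sketch. The separator, field formatting, and the stand-in for the snowflake-generated integer are all assumptions; a real snowflake ID encodes timestamp, worker and sequence bits rather than plain random bits:

```python
import time
import random

def make_video_key(resolution, total_frames, block_size, encoding,
                   frame_id, block_id, sep="_"):
    """Compose a block key in the documented field order: read-time
    identifier, snowflake-style integer (stubbed here with random bits),
    resolution, total frame count, data block size, pixel encoding,
    video frame identifier, and pixel block identifier."""
    read_time = int(time.time() * 1000)   # first identifier: video read time
    snowflake = random.getrandbits(63)    # second identifier (illustrative stub)
    fields = [read_time, snowflake, resolution, total_frames,
              block_size, encoding, frame_id, block_id]
    return sep.join(str(f) for f in fields)
```

The first two fields together make the key unique per video, while the trailing frame and block identifiers make it unique per pixel block.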
In some implementations of the embodiments of the present application, the plurality of Flink custom pixel operators includes a first Flink custom pixel operator, a second Flink custom pixel operator, a third Flink custom pixel operator, a fourth Flink custom pixel operator, and a fifth Flink custom pixel operator, and the processing module 302 includes:
A first processing sub-module (not shown in fig. 3) for performing gray data conversion processing by the first Flink custom pixel operator;
a second processing sub-module (not shown in fig. 3) for receiving blocks with grayscale image data via the second Flink custom pixel operator;
a third processing sub-module (not shown in fig. 3) configured to sequentially add all partial histograms corresponding to a frame by using the third Flink custom pixel operator, to generate a total histogram of the corresponding frame;
a fourth processing sub-module (not shown in fig. 3) for calculating a difference value between two adjacent frames of the same video by the fourth Flink custom pixel operator;
and a fifth processing sub-module (not shown in fig. 3) configured to perform shot detection processing by using the fifth Flink custom pixel operator, so as to obtain a corresponding detection result.
In some implementations of the embodiments of the application, the fifth processing submodule is specifically configured to:
and receiving a histogram difference between any adjacent frames through a shot detection operator in response to the detection result being greater than a preset difference threshold, and outputting corresponding shot change data to a corresponding MQ output theme, wherein the shot change data comprises data related to shot cutting and data related to gradual fade-in.
Since it is based on the same inventive concept, the Flink-based device 300 for extracting distributed real-time video key frames provided by the embodiments of the present application has the same beneficial effects as the Flink-based method for extracting distributed real-time video key frames provided by the foregoing embodiments of the present application.
A third aspect of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a program for the Flink-based method of extracting distributed real-time video key frames; when the program is executed by a processor, the steps of the Flink-based method for extracting distributed real-time video key frames as described in any one of the above are implemented.
The application discloses a method, a device and a readable storage medium for extracting distributed real-time video key frames based on Flink. An MQ cluster is introduced to encapsulate each selected key and its corresponding pixel block into an MQ message and send the obtained MQ message to the corresponding MQ topic; a Flink cluster is introduced to monitor the MQ topic to acquire the message objects in the corresponding MQ, a plurality of Flink custom pixel operators sequentially perform the corresponding pixel information processing, and the detected key frames are key-encapsulated, combined and sent to the MQ output topic, so that the corresponding MQ consumers generate the corresponding key frames based on the pixel information and frame information in the encapsulated keys, thereby simplifying the flow of the real-time video key frame extraction method and effectively reducing communication overhead.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in practice, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; can be located in one place or distributed to a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, and the foregoing program may be stored in a computer readable storage medium, where the program, when executed, performs steps including the above method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk or an optical disk, or the like, which can store program codes.
Alternatively, the above-described integrated units of the present invention may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in essence or a part contributing to the prior art in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application, and are intended to be included within the scope of the appended claims and description.

Claims (10)

1. A method for extracting distributed real-time video key frames based on Flink, characterized by comprising the following steps:
extracting video frames of the target video read in real time to divide each frame into corresponding pixel blocks;
performing key identification processing on the pixel blocks corresponding to each frame to obtain corresponding keys, performing MQ message encapsulation processing on the selected keys and the corresponding pixel blocks through an MQ cluster, and sending the obtained MQ messages to corresponding MQ topics; the MQ theme is monitored through the Flink cluster to acquire a message object in the corresponding MQ, a plurality of Flink custom pixel operators are used for sequentially processing corresponding pixel information, key packaging combination is carried out on the detected key frames and the key packaging combination is sent to the MQ output theme, so that corresponding key frames are generated by corresponding MQ consumers based on the pixel information and the frame information in the packaged keys;
The key frames generated by the MQ consumers are obtained.
2. The method according to claim 1, wherein after performing the key identification processing on the pixel block corresponding to each frame to obtain the corresponding key, the method further comprises:
reading a key corresponding to a pixel block corresponding to each frame;
each key read comprises a first identifier, a second identifier, an identifier for identifying a corresponding resolution, an identifier for identifying a corresponding total frame number, an identifier for identifying a corresponding data block size, an identifier for identifying a corresponding pixel data code, a corresponding video frame identifier and a corresponding pixel block identifier in sequence, wherein the first identifier is used for identifying the time of reading the corresponding video, the second identifier is a random integer generated by a snowflake model, and the first identifier and the second identifier form a unique identifier of the corresponding video.
3. The extraction method according to claim 1, wherein the plurality of Flink custom pixel operators includes a first Flink custom pixel operator, a second Flink custom pixel operator, a third Flink custom pixel operator, a fourth Flink custom pixel operator, and a fifth Flink custom pixel operator, and the sequentially performing corresponding pixel information processing using the plurality of Flink custom pixel operators includes: gray data conversion processing is carried out through the first Flink custom pixel operator;
Receiving a block with gray image data through the second flank custom pixel operator;
sequentially adding all partial histograms corresponding to a frame through the third Flink custom pixel operator to generate a total histogram of the corresponding frame;
calculating the difference value of two adjacent frames of the same video through the fourth Flink custom pixel operator;
and performing shot detection processing through the fifth Flink custom pixel operator to obtain a corresponding detection result.
4. The extraction method according to claim 3, wherein the performing shot detection processing by the fifth Flink custom pixel operator includes:
and receiving the histogram difference between any adjacent frames through a shot detection operator in response to the detection result being greater than a preset difference threshold, and outputting corresponding shot change data to a corresponding MQ output theme, wherein the shot change data comprises data related to shot cutting and data related to gradual fade-in.
5. A device for extracting distributed real-time video key frames based on Flink, the device comprising:
the extraction module is used for extracting video frames of the target video read in real time so as to divide each frame into corresponding pixel blocks;
The processing module is used for carrying out key identification processing on the pixel blocks corresponding to each frame to obtain corresponding keys, carrying out MQ message encapsulation processing on the selected keys and the corresponding pixel blocks through the MQ cluster, and sending the obtained MQ messages to the corresponding MQ topics; the MQ theme is monitored through the Flink cluster to acquire a message object in the corresponding MQ, a plurality of Flink custom pixel operators are used for sequentially processing corresponding pixel information, key packaging combination is carried out on the detected key frames and the key packaging combination is sent to the MQ output theme, so that corresponding key frames are generated by corresponding MQ consumers based on the pixel information and the frame information in the packaged keys;
and the acquisition module is used for acquiring the key frames generated by the MQ consumers.
6. The extraction device of claim 5, wherein the device further comprises:
the reading module is used for reading the key corresponding to the pixel block corresponding to each frame after the key identification processing is carried out on the pixel block corresponding to each frame to obtain the corresponding key;
each key read by the reading module sequentially comprises a first identifier, a second identifier, an identifier for identifying the corresponding resolution, an identifier for identifying the corresponding total frame number, an identifier for identifying the corresponding data block size, an identifier for identifying the corresponding pixel data code, a corresponding video frame identifier and a corresponding pixel block identifier, wherein the first identifier is used for identifying the time of reading the corresponding video, the second identifier is a random integer generated by a snowflake model, and the first identifier and the second identifier form a unique identifier of the corresponding video.
7. The extraction apparatus of claim 5, wherein the plurality of Flink custom pixel operators comprises a first Flink custom pixel operator, a second Flink custom pixel operator, a third Flink custom pixel operator, a fourth Flink custom pixel operator, and a fifth Flink custom pixel operator, the processing module comprising:
the first processing submodule is used for carrying out gray data conversion processing through the first Flink custom pixel operator;
the second processing submodule is used for receiving the blocks with the gray image data through the second Flink custom pixel operators;
the third processing sub-module is used for sequentially adding all partial histograms corresponding to one frame through the third Flink custom pixel operator to generate a total histogram of the corresponding frame;
a fourth processing sub-module, configured to calculate, by using the fourth flank custom pixel operator, a difference value between two adjacent frames of the same video;
and the fifth processing sub-module is used for performing shot detection processing through the fifth Flink custom pixel operator to obtain a corresponding detection result.
8. The extraction device according to claim 7, wherein the fifth processing submodule is specifically configured to:
and receiving the histogram difference between any adjacent frames through a shot detection operator in response to the detection result being greater than a preset difference threshold, and outputting corresponding shot change data to a corresponding MQ output theme, wherein the shot change data comprises data related to shot cutting and data related to gradual fade-in.
9. A computer readable storage medium, characterized in that it stores a computer program for executing the method of any of the preceding claims 1 to 4.
10. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the executable instructions to implement the method of any one of the preceding claims 1 to 4.
CN202310789876.8A 2023-06-30 2023-06-30 Method and device for extracting distributed real-time video key frames based on Flink Active CN116524417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310789876.8A CN116524417B (en) 2023-06-30 2023-06-30 Method and device for extracting distributed real-time video key frames based on Flink

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310789876.8A CN116524417B (en) 2023-06-30 2023-06-30 Method and device for extracting distributed real-time video key frames based on Flink

Publications (2)

Publication Number Publication Date
CN116524417A CN116524417A (en) 2023-08-01
CN116524417B true CN116524417B (en) 2023-10-20

Family

ID=87396242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310789876.8A Active CN116524417B (en) 2023-06-30 2023-06-30 Method and device for extracting distributed real-time video key frames based on Flink

Country Status (1)

Country Link
CN (1) CN116524417B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017000465A1 (en) * 2015-07-01 2017-01-05 中国矿业大学 Method for real-time selection of key frames when mining wireless distributed video coding
KR20210040323A (en) * 2020-06-28 2021-04-13 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. Method and apparatus for recognizing key identifier in video, device and storage medium
CN112990191A (en) * 2021-01-06 2021-06-18 中国电子科技集团公司信息科学研究院 Shot boundary detection and key frame extraction method based on subtitle video
CN115617495A (en) * 2022-12-06 2023-01-17 深圳安德空间技术有限公司 Ground penetrating radar data reasoning method and system based on distributed architecture
CN115643255A (en) * 2022-10-24 2023-01-24 中国农业银行股份有限公司 Video transmission method, device, equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of the Development of Stream Processing Frameworks; Qi Hongyu; Informatization Research (06); 5-12 *

Also Published As

Publication number Publication date
CN116524417A (en) 2023-08-01

Similar Documents

Publication Publication Date Title
JP6972260B2 (en) Systems and methods for partitioning search indexes to improve media segment identification efficiency
EP3703375A1 (en) Three-dimensional model encoding device, three-dimensional model decoding device, three-dimensional model encoding method, and three-dimensional model decoding method
CN112272327B (en) Data processing method, device, storage medium and equipment
Dou et al. Edge computing-enabled deep learning for real-time video optimization in IIoT
CN113051236B (en) Method and device for auditing video and computer-readable storage medium
US20190327525A1 (en) Video Fingerprinting Based on Fourier Transform of Histogram
US20220377395A1 (en) Method and system for automatic real-time frame segmentation of high resolution video streams into constituent features and modifications of features in each frame to simultaneously create multiple different linear views from same video source
US20230222726A1 (en) Information processing device and method
CN115022679B (en) Video processing method, device, electronic equipment and medium
EP2296095B1 (en) Video descriptor generator
CN114051120A (en) Video alarm method, device, storage medium and electronic equipment
CN116524417B (en) Method and device for extracting distributed real-time video key frames based on Flink
CN110049379B (en) Video delay detection method and system
CN110582021A (en) Information processing method and device, electronic equipment and storage medium
US20140078401A1 (en) Distribution and use of video statistics for cloud-based video encoding
CN111340101B (en) Stability evaluation method, apparatus, electronic device, and computer-readable storage medium
CN111447444A (en) Image processing method and device
JP6850166B2 (en) Video stream match judgment program
EP1410335A1 (en) Method and system for image compression with improved colour palette selection
CN113824715B (en) Method and device for playing real-time video stream
CN114125493B (en) Distributed storage method, device and equipment for streaming media
KR102097753B1 (en) Method and Apparatus for Processing Video for Monitoring Video
CN114005062A (en) Abnormal frame processing method, abnormal frame processing device, server and storage medium
CN117612482B (en) Network playing method and system for LED display screen
US20240107105A1 (en) Qr attribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant