CN114598921A - Video frame extraction method and device, terminal equipment and storage medium - Google Patents


Info

Publication number
CN114598921A
CN114598921A (application CN202210223894.5A; granted as CN114598921B)
Authority
CN
China
Prior art keywords
video
processed
video frame
frame
frames
Prior art date
Legal status
Granted
Application number
CN202210223894.5A
Other languages
Chinese (zh)
Other versions
CN114598921B (en)
Inventor
李�浩
李富强
Current Assignee
Guangdong Genius Technology Co Ltd
Original Assignee
Guangdong Genius Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Genius Technology Co Ltd filed Critical Guangdong Genius Technology Co Ltd
Priority to CN202210223894.5A priority Critical patent/CN114598921B/en
Publication of CN114598921A publication Critical patent/CN114598921A/en
Application granted granted Critical
Publication of CN114598921B publication Critical patent/CN114598921B/en
Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a video frame extraction method and device, a terminal device, and a storage medium. The method comprises the following steps: extracting a plurality of video frames to be processed from a video stream to be processed; performing text line detection on the plurality of video frames to be processed to obtain the maximum width value of a text line rectangle for each video frame to be processed; and determining a target video frame from the plurality of video frames to be processed, wherein the target video frame is the video frame whose text line rectangle has the largest maximum width value among the plurality of video frames to be processed. The method can process the video stream at a higher speed to extract the required representative frame, so that efficiency is improved and labor and time costs are reduced.

Description

Video frame extraction method and device, terminal equipment and storage medium
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a method and an apparatus for extracting a video frame, a terminal device, and a storage medium.
Background
With the development of science and technology, and under certain environmental influences, transmitting information through video has become widespread, for example in online classes or live broadcasts. In a live home-tutoring video scene, a teacher explains knowledge points to students in the video against a PPT (PowerPoint presentation) document, and the teacher often moves back and forth during video playback, so that subject information in the document behind the teacher may be occluded.
For such videos, suitable video pages often need to be extracted for purposes such as a tutoring-video cover introduction or course promotion. Conventionally, extracting these pages requires people to screen and capture frames from the video streams one by one, relying on human eyes to identify pages without occlusion; the screening efficiency is low, and a large amount of labor and time is consumed.
Disclosure of Invention
The application provides a video frame extraction method, a video frame extraction device, terminal equipment and a storage medium.
In a first aspect, a method for extracting a video frame is provided, where the method includes:
extracting a plurality of video frames to be processed from a video stream to be processed;
performing text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of a text line rectangle of each video frame to be processed;
and determining a target video frame from the plurality of video frames to be processed, wherein the target video frame is the video frame with the largest maximum width value of the text line rectangle in the plurality of video frames to be processed.
Optionally, the determining a target video frame from the plurality of video frames to be processed includes:
step S1, constructing a video frame dictionary, wherein a key of the video frame dictionary is the maximum width value of a text line rectangle of a first video frame, the value of the video frame dictionary is the first video frame, and the first video frame belongs to the video frame to be processed;
step S2, obtaining the maximum width value of a text line rectangle of a second video frame, wherein the second video frame is the next frame of the first video frame in the video frames to be processed;
step S3, if the maximum width value of the text line rectangle of the second video frame is larger than the key of the video frame dictionary, updating the video frame dictionary; if the maximum width value of the text line rectangle of the second video frame is not larger than the key of the video frame dictionary, not updating the video frame dictionary;
step S4, regarding the second video frame as the first video frame;
and repeating the steps S2-S4 until all the video frames to be processed are processed, and determining the target video frame, wherein the target video frame is the value of the video frame dictionary.
Optionally, the updating the video frame dictionary includes:
updating the key of the video frame dictionary to be the maximum width value of the text line rectangle of the second video frame;
and updating the value of the video frame dictionary to be the video frame of the second video frame.
Optionally, the performing text line detection processing on the multiple video frames to be processed to obtain a maximum width value of a text line rectangle of each video frame to be processed includes:
performing text line detection processing on the plurality of video frames to be processed to obtain a text line in each video frame to be processed;
acquiring a text line rectangle which is the minimum bounding rectangle of the text line;
and calculating the widths of all the text line rectangles in each video frame to be processed to obtain the maximum width value of the text line rectangles of each video frame to be processed.
Optionally, the extracting a plurality of to-be-processed video frames from the to-be-processed video stream includes:
and performing difference frame extraction on the video stream to be processed according to a preset extraction rule to obtain a plurality of video frames to be processed, wherein the number of the video frames to be processed is less than that of all the video frames in the video stream to be processed.
Optionally, the performing, according to a preset extraction rule, difference frame extraction on the video stream to be processed to obtain the multiple video frames to be processed includes:
based on a preset difference frame number N, performing difference frame extraction on the video stream to be processed by extracting one video frame every N frames to obtain the plurality of video frames to be processed, wherein N is an integer greater than 1; or,
and according to a preset time interval, performing difference frame extraction on the video stream to be processed to obtain a plurality of video frames to be processed.
Optionally, the method further includes:
performing text recognition processing on the plurality of video frames to be processed, and determining whether the plurality of video frames to be processed contain similar frames, wherein similar frames are frames whose in-frame text content is the same;
when the plurality of video frames to be processed contain similar frames, performing text line detection processing on the similar frames to obtain the width sum of the text line rectangles of each of the similar frames;
and determining a representative video frame from the similar frames, wherein the representative video frame is the video frame with the largest sum of text line rectangle widths among the similar frames.
In a second aspect, there is provided a video frame extraction apparatus, including:
the frame extraction module is used for extracting a plurality of video frames to be processed from the video stream to be processed;
the text line detection module is used for carrying out text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of a text line rectangle of each video frame to be processed;
and the determining module is used for determining a target video frame from the plurality of video frames to be processed, wherein the target video frame is the video frame with the largest maximum width value of the text line rectangle in the plurality of video frames to be processed.
In a third aspect, a terminal device is provided, comprising a memory and a processor, the memory storing a computer program, which, when executed by the processor, causes the processor to perform the steps as in the first aspect and any one of its possible implementations.
In a fourth aspect, there is provided a computer storage medium storing one or more instructions adapted to be loaded by a processor and to perform the steps of the first aspect and any possible implementation thereof.
The method comprises the steps of extracting a plurality of video frames to be processed from a video stream to be processed; performing text line detection on the plurality of video frames to be processed to obtain the maximum width value of a text line rectangle for each video frame to be processed; and determining a target video frame from the plurality of video frames to be processed, wherein the target video frame is the video frame whose text line rectangle has the largest maximum width value among the plurality of video frames to be processed. The method can process the video stream at a higher speed to extract the required representative frame, so that efficiency is improved and labor and time costs are reduced.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
Fig. 1 is a schematic flowchart of a video frame extraction method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a certain video frame to be processed according to an embodiment of the present application;
fig. 3 is a schematic diagram of another video frame to be processed according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating a video frame text line detection according to an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of a target video frame determination method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a video frame extraction apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The embodiments of the present application will be described below with reference to the drawings.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a video frame extraction method according to an embodiment of the present disclosure. The method can comprise the following steps:
101. a plurality of video frames to be processed are extracted from a video stream to be processed.
The execution subject of the embodiment of the present application may be a video frame extraction apparatus, and in a specific implementation, may be an electronic device or a terminal device, including but not limited to a desktop computer, or other portable devices such as a laptop computer and a tablet computer.
In an optional implementation manner, before the extracting the plurality of to-be-processed video frames from the to-be-processed video stream, the method further includes:
acquiring a video stream shot in real time as the video stream to be processed; or,
and acquiring the video stream generated by the history as the video stream to be processed.
Specifically, the video stream to be processed in the embodiment of the present application may obtain or manually select a file according to a preset path. In practical application, the video stream file to be processed may be a live family education video shot in real time, or may also be a family education tutoring video file prepared in advance, and the background of the video file contains an online document PPT explained by a teacher, which is not limited in this application embodiment. The video frame extraction method in the embodiment of the application can process the video files to automatically extract the suitable video frames from the video files.
In an alternative embodiment, the step 101 includes:
and performing difference frame extraction on the video stream to be processed according to a preset extraction rule to obtain the plurality of video frames to be processed, wherein the number of the plurality of video frames to be processed is less than the number of all video frames in the video stream to be processed.
Specifically, since the video stream file may be long and contain a large number of frames, many of them repeated or similar, the video frames may be obtained by difference frame extraction in order to increase the processing speed, i.e., by extracting only part of the video frames at intervals. The preset extraction rule can be set as required.
Referring to fig. 2 and fig. 3, fig. 2 and fig. 3 are schematic diagrams of a certain video frame to be processed according to an embodiment of the present application, respectively, and fig. 2 and fig. 3 are different video frames extracted from the same video stream, where the video stream is a real person lecture video and a person is behind a courseware PPT page, where the PPT page in fig. 2 and fig. 3 is the same, but part of the content in fig. 3 is occluded by the person.
Further optionally, the performing, according to a preset extraction rule, difference frame extraction on the video stream to be processed to obtain the multiple video frames to be processed includes:
based on a preset difference frame number N, performing difference frame extraction on the video stream to be processed by extracting one video frame every N frames to obtain the plurality of video frames to be processed, wherein N is an integer greater than 1; or,
and according to a preset time interval, performing difference frame extraction on the video stream to be processed to obtain the plurality of video frames to be processed.
In the embodiment of the present application, the difference frame number N may be preset, that is, difference frame extraction may be performed by taking one video frame every N frames to obtain the plurality of video frames to be processed. The number N may be set as needed, for example N = 5 or N = 10, which is not limited in the embodiment of the present application. If N is 5, one frame is extracted every 5 frames; for example, the 1st, 6th and 11th frames in the video stream may be extracted.
In addition, the extracted duration interval t may also be preset, that is, one frame is extracted every duration t in the video stream to be processed, so as to obtain a plurality of video frames to be processed. The duration interval t may be set as needed, for example, t is 4s, which is not limited in the embodiment of the present application.
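The two sampling rules above can be sketched as index selection in Python. This is a minimal sketch, not part of the patent text; the function names `frame_indices_by_step` and `frame_indices_by_interval` are hypothetical, and rounding of the time-based step is one possible choice.

```python
def frame_indices_by_step(total_frames: int, n: int) -> list[int]:
    """Indices kept when one frame is taken every n frames (n > 1).
    Index 0 corresponds to the 1st frame of the stream."""
    return list(range(0, total_frames, n))

def frame_indices_by_interval(duration_s: float, fps: float, t: float) -> list[int]:
    """Indices kept when one frame is taken every t seconds."""
    step = max(1, round(fps * t))          # frames per sampling interval
    return list(range(0, int(duration_s * fps), step))

# With N = 5, a 12-frame clip keeps frames 0, 5, 10 (the 1st, 6th, 11th frames).
print(frame_indices_by_step(12, 5))            # [0, 5, 10]
# A 2-second clip at 25 fps, sampled every 0.4 s:
print(frame_indices_by_interval(2.0, 25.0, 0.4))  # [0, 10, 20, 30, 40]
```

In practice the indices would be used to seek into the stream with a video reader; the selection logic itself is independent of the decoding library.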
102. And performing text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of the text line rectangle of each video frame to be processed.
Specifically, for the extracted video frame to be processed, text line detection processing may be performed. The text line detection technology involved in the embodiment of the application detects all possible text line areas on the basis of an input image.
Optionally, any text line detection scheme based on deep learning may be adopted in this embodiment of the present application.
In an alternative embodiment, the step 102 includes:
performing text line detection processing on the plurality of video frames to be processed to obtain a text line in each video frame to be processed;
acquiring a text line rectangle which is the minimum enclosing rectangle of the text line;
and calculating the widths of all the text line rectangles in each video frame to be processed to obtain the maximum width value of the text line rectangles of each video frame to be processed.
Specifically, after the text line detection processing, the text line regions are determined, and the minimum bounding rectangle of each text line region is found, which is referred to as a text line rectangle. The width of the text line rectangle can represent the length of the line of text, so that the width of each text line rectangle in each video frame can be obtained, and the largest width value of the width values, namely the largest width value of the text line rectangle of the video frame to be processed, can be determined.
In one embodiment, the width of the text line rectangle may be calculated by obtaining the coordinates of the upper left corner and the lower right corner of the text line rectangle.
For example, referring to the schematic diagram of video frame text line detection shown in fig. 4: after text line detection is performed on a video frame, all text lines appearing in the page of that frame are obtained, the minimum bounding rectangle of each text line is found (as shown by the rectangular boxes in the figure), and the width of each rectangle, that is, the width of the text line rectangle, can be calculated. For example, for a text line text_line, if the coordinate of the upper left corner of its minimum bounding rectangle is (x_min, y_min) and the coordinate of the lower right corner is (x_max, y_max), then the width of the text line rectangle is x_max - x_min.
103. And determining a target video frame from the plurality of video frames to be processed, wherein the target video frame is the video frame with the largest maximum width value of the text line rectangle in the plurality of video frames to be processed.
Specifically, in the embodiment of the present application, the maximum width value of the text line rectangle in each video frame to be processed may be obtained, and the video frame with the largest of these maximum width values is kept as the target video frame. The maximum width value of the text line rectangle in a video frame reflects, to a certain extent, how much of the text in the picture is occluded: the smaller the maximum width value, the more text may be occluded. For example, the maximum width value of the text line rectangle in fig. 2 is greater than that in fig. 3, so the video frame shown in fig. 2 is preferred.
The target video frame is a video frame which needs to be extracted finally and can be used as a video representative picture, for example, the target video frame can be extracted and used as a video tutoring cover introduction or course propaganda; the target video frame may be stored and used as learning material.
It should be noted that, with the video frame extraction method of the embodiment of the present application, the video stream to be processed may be a video stream A with the same text background throughout, for example a video segment during which the same PPT page A is explained. Using the video frame extraction method, the unoccluded (or least occluded) frame in the segment is extracted as the representative video frame, which displays the content of video stream A or PPT page A more fully.
For a video stream B containing different text backgrounds, it can be regarded as a video clip set with multiple PPT pages. The page text may be recognized first, the plurality of video segments are divided according to the change of the text background, the video frame extraction method may be independently performed for each video segment to obtain a representative video frame of each video segment, all the extracted representative video frames may be saved, or one frame may be further selected from the representative video frames as the final representative video frame of the video stream B.
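The segmentation by text-background change described above can be sketched with consecutive grouping. This is a minimal sketch under the assumption that text recognition has already produced one text string per frame; the function name and data layout are hypothetical.

```python
from itertools import groupby

def split_by_text_background(frames):
    """frames: (recognized_text, frame_data) pairs in playback order.
    Consecutive frames with identical recognized text form one segment,
    i.e. one run of frames sharing the same text background."""
    return [[img for _, img in group]
            for _, group in groupby(frames, key=lambda f: f[0])]

frames = [("page A", "f1"), ("page A", "f2"), ("page B", "f3")]
print(split_by_text_background(frames))  # [['f1', 'f2'], ['f3']]
```

The per-segment extraction method can then be applied to each returned segment independently to obtain its representative frame.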
Step 103 may also refer to the specific description in the embodiment shown in fig. 5, which is not described herein again.
In the embodiment of the application, a plurality of video frames to be processed are extracted from a video stream to be processed; text line detection is performed on the plurality of video frames to be processed to obtain the maximum width value of a text line rectangle for each video frame to be processed; and a target video frame is determined from the plurality of video frames to be processed, the target video frame being the video frame whose text line rectangle has the largest maximum width value among the plurality of video frames to be processed. The method can process the video stream at a higher speed to extract the required representative frame and supports automatic batch processing, so that efficiency is improved and labor and time costs are reduced.
Fig. 5 is a flowchart illustrating a method for determining a target video frame according to an embodiment of the present application, and as shown in fig. 5, the method is an alternative implementation of step 103 in the embodiment shown in fig. 1.
501. Constructing a video frame dictionary, wherein a key of the video frame dictionary is the maximum width value of a text line rectangle of a first video frame, the value of the video frame dictionary is the first video frame, and the first video frame belongs to the video frame to be processed;
502. acquiring the maximum width value of a text line rectangle of a second video frame, wherein the second video frame is the next frame of the first video frame in the plurality of video frames to be processed;
503. if the maximum width value of the text line rectangle of the second video frame is larger than the key of the video frame dictionary, updating the video frame dictionary; if the maximum width value of the text line rectangle of the second video frame is not larger than the key of the video frame dictionary, not updating the video frame dictionary;
504. taking the second video frame as the first video frame;
505. and repeating the step 502 to the step 504 until all the to-be-processed video frames are processed, and determining the target video frame, wherein the target video frame is the value of the video frame dictionary.
After the maximum width value of the text line rectangle in each frame (i.e., x_max - x_min) is acquired, a video frame dictionary may be constructed, where the key of the dictionary is the maximum width value of the text line rectangle (max_rectangle_length) and the value of the dictionary is the corresponding video frame page (img_data). The dictionary may specifically be represented as follows:
dict={"max_rectangle_length":"img_data"};
The video frames to be processed are handled in sequence: the first video frame and the maximum width value of its text line rectangle are first placed into the dictionary, then the next video frame (the second video frame) is processed in a loop by the method above, and the maximum width value of the text line rectangle in the second video frame is extracted. If that value is not greater than the width value in the current dictionary, the dictionary is kept unchanged; if it is greater, the dictionary is updated.
In one embodiment, the updating the video frame dictionary includes:
updating the key of the video frame dictionary to be the maximum width value of the text line rectangle of the second video frame;
and updating the value of the video frame dictionary to be the video frame of the second video frame.
And sequentially processing the video frames to be processed until all the video frames to be processed are processed, and obtaining the finally updated dictionary. At this time, the key in the dictionary is the width of the largest text line rectangle appearing in the video frame to be processed, and the value of the dictionary is the corresponding video frame to be extracted — the target video frame, which can be expressed as:
final_dict={“find_max_rectangle_length”:“final_img_data”};
in an alternative embodiment, the method further comprises:
601. performing text recognition processing on a plurality of video frames to be processed, and determining whether the plurality of video frames to be processed contain similar frames, wherein the intra-frame character contents of the similar frames are the same;
602. when the plurality of video frames to be processed contain similar frames, performing text line detection processing on the similar frames to obtain the width sum of the text line rectangles of each of the similar frames;
603. and determining a representative video frame from the similar frames, wherein the representative video frame is the video frame with the largest sum of text line rectangle widths among the similar frames.
The similar frames refer to video frames containing the same text background. As mentioned in the foregoing embodiment, a video stream B containing different text backgrounds can be regarded as a set of video clips, each with its own PPT page. The page text may be recognized first, and the video stream divided into multiple video segments according to changes in the text background; the above video frame extraction method can then be performed independently on each video segment, the frames extracted within one segment being similar frames, to determine the representative video frame of that segment. All extracted representative video frames may be saved, or one frame may be further selected from them as the final representative video frame of the video stream B.
Optionally, in this embodiment of the present application, any text recognition algorithm may be used for the text recognition step. For the text line detection processing in step 602 and the computation of the width sum of the text line rectangles of each of the similar frames, reference may be made to the descriptions of the embodiments shown in fig. 1 and fig. 4, including how to obtain the width of a text line rectangle in a video frame: the widths of all text line rectangles detected in one video frame need only be added to obtain that frame's width sum, which is not described again here.
Furthermore, the video frame with the largest sum of text line rectangle widths among the similar frames can be selected as their representative video frame, achieving the effect of extracting the unoccluded (or least occluded) video frame from video clips with the same text background.
Based on the description of the above video frame extraction method embodiment, the embodiment of the present application further discloses a video frame extraction device. Referring to fig. 6, the video frame extracting apparatus 600 includes:
a frame extraction module 610, configured to extract a plurality of video frames to be processed from a video stream to be processed;
the text line detection module 620 is configured to perform text line detection processing on the multiple video frames to be processed to obtain a maximum width value of a text line rectangle of each video frame to be processed;
a determining module 630, configured to determine a target video frame from the multiple video frames to be processed, where the target video frame is a video frame with a largest maximum width value of the text line rectangle in the multiple video frames to be processed.
According to an embodiment of the present application, each step involved in the methods shown in fig. 1 and fig. 5 may be performed by each module in the video frame extracting apparatus 600 shown in fig. 6, and is not described herein again.
The video frame extraction apparatus 600 in the embodiment of the present application may extract a plurality of video frames to be processed from a video stream to be processed; perform text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of a text line rectangle of each video frame to be processed; and determine a target video frame from the plurality of video frames to be processed, the target video frame being the video frame with the largest maximum text-line-rectangle width value among them. The apparatus can thus process a video stream at high speed to extract the required representative frame, supports automatic batch processing, improves efficiency, and reduces labor and time costs.
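A minimal sketch of the three modules working together, under the assumption that the frames are available as a list and that `max_width` is a hypothetical scoring callback standing in for the text line detection of module 620:

```python
def extract_target_frame(video_frames, max_width, n=5):
    """End-to-end sketch of apparatus 600.

    Frame extraction module 610: sample one frame out of every n.
    Text line detection module 620: `max_width(frame)` returns the frame's
    maximum text-line-rectangle width.
    Determining module 630: keep the frame with the largest such width.
    """
    sampled = video_frames[::n]        # difference-frame extraction
    return max(sampled, key=max_width) # frame with largest maximum width
```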
Based on the description of the method embodiment and the apparatus embodiment, the embodiment of the present application further provides a terminal device. Referring to fig. 7, the terminal device 700 includes at least a processor 701, an input device 702, an output device 703, and a computer storage medium 704. The processor 701, the input device 702, the output device 703, and the computer storage medium 704 in the terminal device 700 may be connected by a bus or other means.
The computer storage medium 704 may reside in a memory of the terminal device 700 and is configured to store a computer program comprising program instructions, while the processor 701 is configured to execute the program instructions stored in the computer storage medium 704. The processor 701 (or CPU, Central Processing Unit) is the computing core and control core of the terminal device 700, adapted to implement one or more instructions and, specifically, to load and execute the one or more instructions so as to realize a corresponding method flow or function; in one embodiment, the processor 701 of the embodiment of the present application may be configured to perform a series of processes, including the methods of the embodiments shown in fig. 1 and fig. 5.
An embodiment of the present application further provides a computer storage medium (Memory), where the computer storage medium is a Memory device in a terminal device and is used to store programs and data. It is understood that the computer storage medium herein may include a built-in storage medium in the terminal device, and may also include an extended storage medium supported by the terminal device. The computer storage medium provides a storage space that stores an operating system of the terminal device. Also stored in this memory space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by processor 701. It should be noted that the computer storage medium herein may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as at least one disk memory; and optionally at least one computer storage medium located remotely from the processor.
In one embodiment, one or more instructions stored in a computer storage medium may be loaded and executed by the processor 701 to implement the corresponding steps in the above embodiments; in a specific implementation, one or more instructions in the computer storage medium may be loaded by the processor 701 and perform any step of the method in fig. 1 and/or fig. 5, which is not described herein again.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the division of the module is only one logical division, and other divisions may be possible in actual implementation, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. The shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some interfaces, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on, or transmitted over, a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic medium such as a floppy disk, hard disk, magnetic tape, or magnetic disk, an optical medium such as a Digital Versatile Disc (DVD), or a semiconductor medium such as a Solid State Disk (SSD).

Claims (10)

1. A method for extracting video frames, the method comprising:
extracting a plurality of video frames to be processed from a video stream to be processed;
performing text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of a text line rectangle of each video frame to be processed;
and determining a target video frame from the plurality of video frames to be processed, wherein the target video frame is the video frame with the largest maximum width value of the text line rectangle in the plurality of video frames to be processed.
2. The method according to claim 1, wherein said determining a target video frame from the plurality of video frames to be processed comprises:
step S1, constructing a video frame dictionary, wherein a key of the video frame dictionary is the maximum width value of a text line rectangle of a first video frame, the value of the video frame dictionary is the first video frame, and the first video frame belongs to the video frame to be processed;
step S2, obtaining the maximum width value of a text line rectangle of a second video frame, wherein the second video frame is the next frame of the first video frame in the video frames to be processed;
step S3, if the maximum width value of the text line rectangle of the second video frame is larger than the key of the video frame dictionary, updating the video frame dictionary; if the maximum width value of the text line rectangle of the second video frame is not larger than the key of the video frame dictionary, not updating the video frame dictionary;
step S4, regarding the second video frame as the first video frame;
and repeating the steps S2-S4 until all the video frames to be processed are processed, and determining the target video frame, wherein the target video frame is the value of the video frame dictionary.
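The single-entry dictionary scan of steps S1-S4 can be sketched as follows (illustration only); `max_width` is a hypothetical callback returning a frame's maximum text-line-rectangle width:

```python
def find_target_frame(frames, max_width):
    """Running-maximum scan of claim 2, kept as a one-entry dictionary."""
    frames = iter(frames)
    first = next(frames)
    # Step S1: key = maximum text-line-rectangle width, value = that frame.
    best = {max_width(first): first}
    for frame in frames:        # steps S2-S4, repeated over all frames
        w = max_width(frame)
        (key,) = best           # the dictionary always holds a single key
        if w > key:             # step S3: update only on a strictly larger width
            best = {w: frame}
    # The target video frame is the value of the dictionary.
    return next(iter(best.values()))
```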
3. The method of claim 2, wherein the updating the video frame dictionary comprises:
updating the key of the video frame dictionary to be the maximum width value of the text line rectangle of the second video frame;
and updating the value of the video frame dictionary to be the video frame of the second video frame.
4. The method according to claim 1, wherein said performing text line detection processing on the plurality of video frames to be processed to obtain a maximum width value of a text line rectangle of each video frame to be processed comprises:
performing text line detection processing on the plurality of video frames to be processed to obtain a text line in each video frame to be processed;
acquiring a text line rectangle which is the minimum bounding rectangle of the text line;
and calculating the widths of all the text line rectangles in each video frame to be processed to obtain the maximum width value of the text line rectangles of each video frame to be processed.
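As a hedged illustration of claim 4's width computation, assuming a text detector that returns each text line as a polygon of (x, y) points; the claim's minimum bounding rectangle is approximated here by an axis-aligned bounding box (a rotated minimum-area rectangle, e.g. via `cv2.minAreaRect`, could be substituted):

```python
def max_textline_width(text_line_polys):
    """Maximum text-line-rectangle width for one frame.

    text_line_polys: list of detected text-line outlines, each a list of
    (x, y) points, as a generic detector (e.g. EAST, DBNet) might return.
    """
    widths = []
    for poly in text_line_polys:
        xs = [x for x, _ in poly]
        widths.append(max(xs) - min(xs))  # bounding-box width of this line
    return max(widths) if widths else 0   # 0 if no text line was detected
```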
5. The method according to any one of claims 1 to 4, wherein said extracting a plurality of video frames to be processed from a video stream to be processed comprises:
and performing difference frame extraction on the video stream to be processed according to a preset extraction rule to obtain a plurality of video frames to be processed, wherein the number of the video frames to be processed is less than that of all the video frames in the video stream to be processed.
6. The method according to claim 5, wherein said performing difference frame extraction on the video stream to be processed according to a preset extraction rule to obtain the plurality of video frames to be processed comprises:
based on a preset difference frame number N, performing difference frame extraction on the video stream to be processed by extracting one video frame from every N frames to obtain the plurality of video frames to be processed, wherein N is an integer greater than 1; or,
and according to a preset time interval, performing difference frame extraction on the video stream to be processed to obtain a plurality of video frames to be processed.
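The two sampling branches of claim 6 can be sketched as simple list slicing (illustration only); `fps` and `interval_s` are hypothetical parameters for the time-interval variant:

```python
def sample_every_n(frames, n):
    """Claim 6, first branch: keep one frame out of every n (n > 1)."""
    return frames[::n]

def sample_by_interval(frames, fps, interval_s):
    """Claim 6, second branch: keep one frame per interval_s seconds,
    given the stream's frame rate fps."""
    step = max(1, int(fps * interval_s))  # frames per sampling interval
    return frames[::step]
```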
7. The method of claim 1, further comprising:
performing text recognition processing on the plurality of video frames to be processed, and determining whether the plurality of video frames to be processed contain similar frames, wherein the in-frame text content of the similar frames is the same;
when the plurality of video frames to be processed contain similar frames, performing text line detection processing on the similar frames to obtain the width sum of the text line rectangles of each of the similar frames;
and determining a representative video frame from the similar frames, wherein the representative video frame is the video frame with the largest sum of text-line-rectangle widths among the similar frames.
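The grouping-then-selection of claim 7 can be sketched as follows (illustration only); `recognize_text` and `rect_widths` are hypothetical stand-ins for the text recognition and text line detection steps:

```python
def representative_per_group(frames, recognize_text, rect_widths):
    """Group frames whose recognized text is identical (similar frames),
    then pick each group's frame with the largest text-line width sum."""
    groups = {}
    for frame in frames:
        # Frames with the same in-frame text content form one group.
        groups.setdefault(recognize_text(frame), []).append(frame)
    return {
        text: max(members, key=lambda f: sum(rect_widths(f)))
        for text, members in groups.items()
    }
```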
8. A video frame extraction apparatus, comprising:
the frame extraction module is used for extracting a plurality of video frames to be processed from the video stream to be processed;
the text line detection module is used for carrying out text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of a text line rectangle of each video frame to be processed;
and the determining module is used for determining a target video frame from the plurality of video frames to be processed, wherein the target video frame is a video frame with the largest maximum width value of the rectangle of the text line in the plurality of video frames to be processed.
9. A terminal device, characterized in that it comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the video frame extraction method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, causes the processor to carry out the steps of the video frame extraction method according to any one of claims 1 to 7.
CN202210223894.5A 2022-03-07 2022-03-07 Video frame extraction method, device, terminal equipment and storage medium Active CN114598921B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210223894.5A CN114598921B (en) 2022-03-07 2022-03-07 Video frame extraction method, device, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210223894.5A CN114598921B (en) 2022-03-07 2022-03-07 Video frame extraction method, device, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114598921A true CN114598921A (en) 2022-06-07
CN114598921B CN114598921B (en) 2024-04-12

Family

ID=81807060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210223894.5A Active CN114598921B (en) 2022-03-07 2022-03-07 Video frame extraction method, device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114598921B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116090417A (en) * 2023-04-11 2023-05-09 福昕鲲鹏(北京)信息科技有限公司 Layout document text selection rendering method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6937766B1 (en) * 1999-04-15 2005-08-30 MATE—Media Access Technologies Ltd. Method of indexing and searching images of text in video
CN111768346A (en) * 2020-05-12 2020-10-13 北京奇艺世纪科技有限公司 Method, device and equipment for correcting back image of identity card and storage medium
CN113033552A (en) * 2021-03-19 2021-06-25 北京字跳网络技术有限公司 Text recognition method and device and electronic equipment
CN113408241A (en) * 2021-07-16 2021-09-17 网易(杭州)网络有限公司 Text data processing method and device, electronic equipment and readable medium
CN113591530A (en) * 2021-02-24 2021-11-02 腾讯科技(深圳)有限公司 Video detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114598921B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
CN107181976B (en) Bullet screen display method and electronic equipment
CN109803180B (en) Video preview generation method and device, computer equipment and storage medium
JP6240199B2 (en) Method and apparatus for identifying object in image
CN108230346B (en) Method and device for segmenting semantic features of image and electronic equipment
US9046991B2 (en) System and method for dynamically displaying structurally dissimilar thumbnail images of an electronic document
CN110225366B (en) Video data processing and advertisement space determining method, device, medium and electronic equipment
CN110688524B (en) Video retrieval method and device, electronic equipment and storage medium
US20220172476A1 (en) Video similarity detection method, apparatus, and device
JP7331146B2 (en) Subtitle cross-border processing method, device and electronic device
CN111385665A (en) Bullet screen information processing method, device, equipment and storage medium
EP3408752B1 (en) Object management and visualization using a computing device
CN112149570A (en) Multi-person living body detection method and device, electronic equipment and storage medium
CN114598921A (en) Video frame extraction method and device, terminal equipment and storage medium
CN105184838A (en) Picture processing method and terminal
EP3564833A1 (en) Method and device for identifying main picture in web page
CN109522429B (en) Method and apparatus for generating information
CN112215221A (en) Automatic vehicle frame number identification method
CN113391779B (en) Parameter adjusting method, device and equipment for paper-like screen
CN111127310B (en) Image processing method and device, electronic equipment and storage medium
CN113676734A (en) Image compression method and image compression device
CN114640876A (en) Multimedia service video display method and device, computer equipment and storage medium
CN112988005A (en) Method for automatically loading captions
CN111083552A (en) Thumbnail generation method, device, equipment and medium
CN111208955A (en) Printing method, printing device and server
CN113448470B (en) Webpage long screenshot method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant