CN114598921A - Video frame extraction method and device, terminal equipment and storage medium - Google Patents
Video frame extraction method and device, terminal equipment and storage medium
- Publication number
- CN114598921A CN202210223894.5A
- Authority
- CN
- China
- Prior art keywords
- video
- processed
- video frame
- frame
- frames
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Images
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a video frame extraction method and device, a terminal device, and a storage medium. The method comprises the following steps: extracting a plurality of video frames to be processed from a video stream to be processed; performing text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of the text line rectangle of each video frame to be processed; and determining a target video frame from the plurality of video frames to be processed, the target video frame being the video frame with the largest maximum text-line-rectangle width value among the plurality of video frames to be processed. The method can process the video stream at a higher speed to extract a required representative frame, thereby improving efficiency and reducing labor and time costs.
Description
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a method and an apparatus for extracting a video frame, a terminal device, and a storage medium.
Background
With the development of science and technology and certain environmental influences, transmitting information through video has become widely used, for example in web lessons or live broadcasting. In a real-person family tutoring video scene, a teacher explains knowledge points to students in the form of a PPT (PowerPoint) document shown in the video, and because the teacher often moves back and forth while the video plays, the subject-matter information in the background document can be occluded.
For such videos, suitable video pages often need to be extracted for tutoring-video cover introductions, course promotion, and the like. Conventionally, extracting these pages requires people to screen and capture frames from the video streams one by one, judging by eye which pages are unoccluded; this screening is inefficient and consumes a large amount of manpower and time.
Disclosure of Invention
The application provides a video frame extraction method, a video frame extraction device, terminal equipment and a storage medium.
In a first aspect, a method for extracting a video frame is provided, where the method includes:
extracting a plurality of video frames to be processed from a video stream to be processed;
performing text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of a text line rectangle of each video frame to be processed;
and determining a target video frame from the plurality of video frames to be processed, wherein the target video frame is the video frame with the largest maximum width value of the text line rectangle in the plurality of video frames to be processed.
Optionally, the determining a target video frame from the plurality of video frames to be processed includes:
step S1, constructing a video frame dictionary, wherein a key of the video frame dictionary is the maximum width value of a text line rectangle of a first video frame, the value of the video frame dictionary is the first video frame, and the first video frame belongs to the video frame to be processed;
step S2, obtaining the maximum width value of a text line rectangle of a second video frame, wherein the second video frame is the next frame of the first video frame in the video frames to be processed;
step S3, if the maximum width value of the text line rectangle of the second video frame is larger than the key of the video frame dictionary, updating the video frame dictionary; if the maximum width value of the text line rectangle of the second video frame is not larger than the key of the video frame dictionary, not updating the video frame dictionary;
step S4, regarding the second video frame as the first video frame;
and repeating the steps S2-S4 until all the video frames to be processed are processed, and determining the target video frame, wherein the target video frame is the value of the video frame dictionary.
Optionally, the updating the video frame dictionary includes:
updating the key of the video frame dictionary to be the maximum width value of the text line rectangle of the second video frame;
and updating the value of the video frame dictionary to be the video frame of the second video frame.
Optionally, the performing text line detection processing on the multiple video frames to be processed to obtain a maximum width value of a text line rectangle of each video frame to be processed includes:
performing text line detection processing on the plurality of video frames to be processed to obtain a text line in each video frame to be processed;
acquiring a text line rectangle which is the minimum bounding rectangle of the text line;
and calculating the widths of all the text line rectangles in each video frame to be processed to obtain the maximum width value of the text line rectangles of each video frame to be processed.
Optionally, the extracting a plurality of to-be-processed video frames from the to-be-processed video stream includes:
and performing difference frame extraction on the video stream to be processed according to a preset extraction rule to obtain a plurality of video frames to be processed, wherein the number of the video frames to be processed is less than that of all the video frames in the video stream to be processed.
Optionally, the performing, according to a preset extraction rule, difference frame extraction on the video stream to be processed to obtain the multiple video frames to be processed includes:
based on a preset difference frame number N, performing difference frame extraction on the video stream to be processed in a mode of extracting one video frame every N frames to obtain the plurality of video frames to be processed, wherein N is an integer greater than 1; or,
and according to a preset time interval, performing difference frame extraction on the video stream to be processed to obtain a plurality of video frames to be processed.
Optionally, the method further includes:
performing text recognition processing on the plurality of video frames to be processed, and determining whether the plurality of video frames to be processed contain similar frames, wherein the intra-frame character contents of the similar frames are the same;
under the condition that the plurality of video frames to be processed contain the similar frames, performing text line detection processing on the similar frames to obtain the sum of the widths of the text line rectangles of each frame in the similar frames;
and determining a representative video frame from the similar frames, wherein the representative video frame is the video frame with the largest sum of the widths of the text line rectangles among the similar frames.
In a second aspect, there is provided a video frame extraction apparatus, including:
the frame extraction module is used for extracting a plurality of video frames to be processed from the video stream to be processed;
the text line detection module is used for carrying out text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of a text line rectangle of each video frame to be processed;
and the determining module is used for determining a target video frame from the plurality of video frames to be processed, wherein the target video frame is the video frame with the largest maximum width value of the text line rectangle in the plurality of video frames to be processed.
In a third aspect, a terminal device is provided, comprising a memory and a processor, the memory storing a computer program, which, when executed by the processor, causes the processor to perform the steps as in the first aspect and any one of its possible implementations.
In a fourth aspect, there is provided a computer storage medium storing one or more instructions adapted to be loaded by a processor and to perform the steps of the first aspect and any possible implementation thereof.
The method comprises the steps of extracting a plurality of video frames to be processed from a video stream to be processed; performing text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of the text line rectangle of each video frame to be processed; and determining a target video frame from the plurality of video frames to be processed, the target video frame being the video frame with the largest maximum text-line-rectangle width value among the plurality of video frames to be processed. The method can process the video stream at a higher speed to extract a required representative frame, thereby improving efficiency and reducing labor and time costs.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
Fig. 1 is a schematic flowchart of a video frame extraction method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a certain video frame to be processed according to an embodiment of the present application;
fig. 3 is a schematic diagram of another video frame to be processed according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating a video frame text line detection according to an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of a target video frame determination method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a video frame extraction apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The embodiments of the present application will be described below with reference to the drawings.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a video frame extraction method according to an embodiment of the present disclosure. The method can comprise the following steps:
101. a plurality of video frames to be processed are extracted from a video stream to be processed.
The execution subject of the embodiment of the present application may be a video frame extraction apparatus, and in a specific implementation, may be an electronic device or a terminal device, including but not limited to a desktop computer, or other portable devices such as a laptop computer and a tablet computer.
In an optional implementation manner, before the extracting the plurality of to-be-processed video frames from the to-be-processed video stream, the method further includes:
acquiring a video stream shot in real time as the video stream to be processed; or,
and acquiring the video stream generated by the history as the video stream to be processed.
Specifically, the video stream to be processed in the embodiment of the present application may be obtained according to a preset path or selected manually. In practical application, the video stream file to be processed may be a live family-education video shot in real time, or a family-education tutoring video file prepared in advance, the background of which contains an online PPT document explained by a teacher; this is not limited in the embodiment of the present application. The video frame extraction method in the embodiment of the present application can process such video files to automatically extract suitable video frames from them.
In an alternative embodiment, the step 101 includes:
and performing difference frame extraction on the video stream to be processed according to a preset extraction rule to obtain the plurality of video frames to be processed, wherein the number of the plurality of video frames to be processed is less than the number of all video frames in the video stream to be processed.
Specifically, since a video stream file may be long and contain a large number of frames, many of which are repeated or similar, the video frames may be obtained by difference frame extraction in order to increase the processing speed. That is, only part of the video frames are extracted, at intervals. The preset extraction rule can be set as required.
Referring to fig. 2 and fig. 3, fig. 2 and fig. 3 are schematic diagrams of video frames to be processed according to an embodiment of the present application. Fig. 2 and fig. 3 are different video frames extracted from the same video stream, where the video stream is a real-person lecture video and a person stands in front of a courseware PPT page. The PPT page in fig. 2 and fig. 3 is the same, but part of the content in fig. 3 is occluded by the person.
Further optionally, the performing, according to a preset extraction rule, difference frame extraction on the video stream to be processed to obtain the multiple video frames to be processed includes:
based on a preset difference frame number N, performing difference frame extraction on the video stream to be processed in a mode of extracting one video frame every N frames to obtain the plurality of video frames to be processed, wherein N is an integer greater than 1; or,
and according to a preset time interval, performing difference frame extraction on the video stream to be processed to obtain the plurality of video frames to be processed.
In the embodiment of the present application, the difference frame number N may be preset, that is, difference frame extraction may be performed by extracting one video frame every N frames, so as to obtain the plurality of video frames to be processed. The difference frame number N may be set as needed, for example, N is 5 or 10, which is not limited in the embodiment of the present application. If N is 5, one frame is extracted every 5 frames; for example, the 1st frame, the 6th frame, and the 11th frame in the video stream may be extracted.
In addition, the extracted duration interval t may also be preset, that is, one frame is extracted every duration t in the video stream to be processed, so as to obtain a plurality of video frames to be processed. The duration interval t may be set as needed, for example, t is 4s, which is not limited in the embodiment of the present application.
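Both extraction rules described above (one frame every N frames, or one frame every t seconds) reduce to keeping every step-th frame index. The following is a minimal sketch of that index computation; the function name and parameters are illustrative, not taken from the patent, and a frame rate is assumed when converting a time interval into a frame step:

```python
def frame_indices(total_frames, n=None, fps=25.0, interval_s=None):
    """Indices of frames kept under the two difference-frame rules:
    every n frames, or every interval_s seconds at the given fps."""
    if interval_s is not None:
        # Convert the time interval into a frame step, at least 1.
        step = max(1, round(fps * interval_s))
    else:
        step = n
    return list(range(0, total_frames, step))


# With n = 5 the 1st, 6th, 11th, ... frames are kept (0-based 0, 5, 10),
# matching the example in the text.
print(frame_indices(12, n=5))
print(frame_indices(250, fps=25.0, interval_s=4))
```

In a real implementation the selected indices would be read from the stream with a decoder such as OpenCV's `VideoCapture`; that dependency is omitted here to keep the sketch self-contained.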
102. And performing text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of the text line rectangle of each video frame to be processed.
Specifically, for the extracted video frame to be processed, text line detection processing may be performed. The text line detection technology involved in the embodiment of the application detects all possible text line areas on the basis of an input image.
Optionally, any text line detection scheme based on deep learning may be adopted in this embodiment of the present application.
In an alternative embodiment, the step 102 includes:
performing text line detection processing on the plurality of video frames to be processed to obtain a text line in each video frame to be processed;
acquiring a text line rectangle which is the minimum enclosing rectangle of the text line;
and calculating the widths of all the text line rectangles in each video frame to be processed to obtain the maximum width value of the text line rectangles of each video frame to be processed.
Specifically, after the text line detection processing, the text line regions are determined, and the minimum bounding rectangle of each text line region is found, which is referred to as a text line rectangle. The width of the text line rectangle can represent the length of the line of text, so that the width of each text line rectangle in each video frame can be obtained, and the largest width value of the width values, namely the largest width value of the text line rectangle of the video frame to be processed, can be determined.
In one embodiment, the width of the text line rectangle may be calculated by obtaining the coordinates of the upper left corner and the lower right corner of the text line rectangle.
For example, referring to the schematic diagram of video frame text line detection shown in fig. 4, after text line detection is performed on a video frame, all text lines appearing in the page of that frame are obtained, the minimum bounding rectangular box of each text line (as shown by the rectangular boxes in the figure) is obtained, and the width of the rectangular box, that is, the width of the text line rectangle, can be calculated. For example, for a text line text_line, if the coordinate of the upper left corner of its minimum bounding rectangle is (x_min, y_min) and the coordinate of the lower right corner is (x_max, y_max), then the width of the text line rectangle is: x_max - x_min.
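The width calculation above can be sketched with two hypothetical helpers, assuming each detected rectangle is given by its upper-left and lower-right corner coordinates (these function names are illustrative, not from the patent):

```python
def rect_width(top_left, bottom_right):
    """Width of a text-line bounding rectangle from its corners:
    x_max - x_min, the width measure described in the text."""
    x_min, _y_min = top_left
    x_max, _y_max = bottom_right
    return x_max - x_min


def max_rect_width(rects):
    """Maximum text-line-rectangle width within one video frame.
    `rects` is a list of (top_left, bottom_right) coordinate pairs."""
    return max(rect_width(tl, br) for tl, br in rects)


print(rect_width((10, 20), (110, 40)))
print(max_rect_width([((0, 0), (50, 10)), ((5, 20), (200, 30))]))
```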
103. And determining a target video frame from the plurality of video frames to be processed, wherein the target video frame is the video frame with the largest maximum width value of the text line rectangle in the plurality of video frames to be processed.
Specifically, in the embodiment of the present application, the maximum width value of the text line rectangles in each video frame to be processed may be obtained, and the video frame with the largest such value is kept as the target video frame. The maximum width value of the text line rectangles in a video frame reflects, to a certain extent, how much the text in the picture is occluded: the smaller the maximum width value, the more text may be occluded. For example, the maximum text-line-rectangle width value in fig. 2 is greater than that in fig. 3, so the video frame shown in fig. 2 is preferably selected.
The target video frame is the video frame that finally needs to be extracted and can serve as a representative picture of the video; for example, it can be extracted and used as a tutoring-video cover introduction or for course promotion, or stored and used as learning material.
It should be noted that, with the video frame extraction method in the embodiment of the present application, the video stream to be processed may be a video stream a with the same text background, for example, a video segment when the same PPT page a is explained, by using the video frame extraction method, an unoccluded (or relatively least occluded) frame in the video segment is extracted as a representative video frame, and the content of the video stream a or the PPT page a may be more fully displayed.
For a video stream B containing different text backgrounds, it can be regarded as a video clip set with multiple PPT pages. The page text may be recognized first, the plurality of video segments are divided according to the change of the text background, the video frame extraction method may be independently performed for each video segment to obtain a representative video frame of each video segment, all the extracted representative video frames may be saved, or one frame may be further selected from the representative video frames as the final representative video frame of the video stream B.
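Splitting video stream B into per-page clips by the change of the recognized page text can be sketched as follows; the function name and the `(frame_id, text)` input format are illustrative assumptions, and real page text would come from a text recognition step not shown here:

```python
def split_by_text_background(frames_text):
    """Group consecutive frames into segments, starting a new segment
    whenever the recognized page text changes.
    `frames_text` is a list of (frame_id, recognized_text) pairs."""
    segments = []
    for frame_id, text in frames_text:
        if segments and segments[-1][0] == text:
            segments[-1][1].append(frame_id)  # same PPT page continues
        else:
            segments.append((text, [frame_id]))  # new PPT page begins
    return [ids for _text, ids in segments]


# Frames 1-2 show page A, 3-4 show page B, frame 5 returns to page A.
print(split_by_text_background(
    [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (5, "A")]))
```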
Step 103 may also refer to the specific description in the embodiment shown in fig. 5, which is not described herein again.
In the embodiment of the application, a plurality of video frames to be processed are extracted from a video stream to be processed; text line detection processing is performed on the plurality of video frames to be processed to obtain the maximum width value of the text line rectangle of each video frame to be processed; and a target video frame is determined from the plurality of video frames to be processed, the target video frame being the video frame with the largest maximum text-line-rectangle width value among the plurality of video frames to be processed. The method can process the video stream at a higher speed to extract a required representative frame, realize batch automatic processing, improve efficiency, and reduce labor and time costs.
Fig. 5 is a flowchart illustrating a method for determining a target video frame according to an embodiment of the present application, and as shown in fig. 5, the method is an alternative implementation of step 103 in the embodiment shown in fig. 1.
501. Constructing a video frame dictionary, wherein a key of the video frame dictionary is the maximum width value of a text line rectangle of a first video frame, the value of the video frame dictionary is the first video frame, and the first video frame belongs to the video frame to be processed;
502. acquiring the maximum width value of a text line rectangle of a second video frame, wherein the second video frame is the next frame of the first video frame in the plurality of video frames to be processed;
503. if the maximum width value of the text line rectangle of the second video frame is larger than the key of the video frame dictionary, updating the video frame dictionary; if the maximum width value of the text line rectangle of the second video frame is not larger than the key of the video frame dictionary, not updating the video frame dictionary;
504. taking the second video frame as the first video frame;
505. and repeating the step 502 to the step 504 until all the to-be-processed video frames are processed, and determining the target video frame, wherein the target video frame is the value of the video frame dictionary.
After the maximum width value of the text line rectangle (i.e., x_max - x_min) in each frame is acquired, a video frame dictionary may be constructed, where the key of the dictionary is the maximum width value of the text line rectangle (max_rectangle_length) and the value of the dictionary is the corresponding video frame page (img_data). The dictionary may specifically be represented as follows:
dict={"max_rectangle_length":"img_data"};
The video frames to be processed are handled in sequence. The first video frame and the maximum width value of its text line rectangles are first substituted into the dictionary; the next video frame (the second video frame) is then processed in a loop by the above method, and the maximum width value of the text line rectangles in the second video frame is extracted. If that value is not greater than the width value already in the dictionary, the dictionary is kept unchanged; if it is greater, the dictionary is updated.
In one embodiment, the updating the video frame dictionary includes:
updating the key of the video frame dictionary to be the maximum width value of the text line rectangle of the second video frame;
and updating the value of the video frame dictionary to be the video frame of the second video frame.
And sequentially processing the video frames to be processed until all the video frames to be processed are processed, and obtaining the finally updated dictionary. At this time, the key in the dictionary is the width of the largest text line rectangle appearing in the video frame to be processed, and the value of the dictionary is the corresponding video frame to be extracted — the target video frame, which can be expressed as:
final_dict={“find_max_rectangle_length”:“final_img_data”};
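Steps S1 to S4 (steps 501 to 505 above) amount to a running-maximum search kept in a one-entry dictionary. A minimal sketch, with illustrative names and with each frame represented by its precomputed maximum text-line-rectangle width:

```python
def select_target_frame(frames_with_widths):
    """Steps S1-S4: keep a one-entry dict whose key is the largest
    text-line-rectangle width seen so far and whose value is the
    corresponding frame; the final value is the target video frame.
    `frames_with_widths` is an iterable of (max_width, frame_data)."""
    it = iter(frames_with_widths)
    first_w, first_frame = next(it)      # step S1: build the dictionary
    d = {first_w: first_frame}
    for w, frame in it:                  # steps S2-S4: loop over frames
        (key,) = d
        if w > key:                      # step S3: update only when the
            d = {w: frame}               # new width is strictly larger
    (final_key,) = d
    return final_key, d[final_key]


print(select_target_frame([(10, "frame_a"), (30, "frame_b"), (20, "frame_c")]))
```

Note that because the dictionary is updated only on a strictly larger width, the earliest frame wins when several frames share the maximum width, consistent with "not updating the video frame dictionary" in step S3.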
in an alternative embodiment, the method further comprises:
601. performing text recognition processing on a plurality of video frames to be processed, and determining whether the plurality of video frames to be processed contain similar frames, wherein the intra-frame character contents of the similar frames are the same;
602. under the condition that the plurality of video frames to be processed contain similar frames, performing text line detection processing on the similar frames to obtain the sum of the widths of the text line rectangles of each frame in the similar frames;
603. and determining a representative video frame from the similar frames, wherein the representative video frame is the video frame with the maximum sum of the widths of the text line rectangles in the similar frames.
The similar frames refer to video frames containing the same text background. As mentioned in the foregoing embodiment, a video stream B containing different text backgrounds can be regarded as a set of video clips, each with its own PPT page. The page text can be recognized first, and the video stream divided into a plurality of video segments according to changes of the text background; the above video frame extraction method can then be performed independently on each video segment, the frames within a segment being similar frames, to determine the representative video frame of that segment. All the extracted representative video frames may be saved, or one frame may be further selected from among them as the final representative video frame of video stream B.
Optionally, in this embodiment of the present application, any text recognition algorithm may be selected to perform the text recognition processing step. For the text line detection processing in step 602 and the calculation of the width sum of the text line rectangles of each similar frame, reference may be made to the specific descriptions in the embodiments shown in fig. 1 and fig. 4 for how to obtain the width of a text line rectangle in a video frame; the widths of all text line rectangles detected in one video frame only need to be added to obtain the width sum for that frame, which is not described herein again.
Furthermore, the video frame with the largest sum of the widths of the text line rectangles in the same type of frame can be selected as the representative video frame of the same type of frame, so that the effect of extracting the video frame without occlusion or with the smallest occlusion in the video clips with the same text background can be achieved.
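Selecting the representative frame among similar frames by the largest width sum can be sketched as follows; the input format (a mapping from a frame identifier to the list of its text-line-rectangle widths) is an illustrative assumption:

```python
def representative_frame(similar_frames):
    """Pick the frame whose text-line-rectangle widths sum to the
    largest value, i.e. the least-occluded frame among frames that
    share the same text background.
    `similar_frames` maps a frame id to a list of rectangle widths."""
    return max(similar_frames, key=lambda f: sum(similar_frames[f]))


# f1's text lines are wider overall, so f1 is less occluded.
print(representative_frame({"f1": [100, 80], "f2": [100, 30]}))
```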
Based on the description of the above video frame extraction method embodiments, an embodiment of the present application further discloses a video frame extraction apparatus. Referring to fig. 6, the video frame extraction apparatus 600 includes:
a frame extraction module 610, configured to extract a plurality of video frames to be processed from a video stream to be processed;
the text line detection module 620 is configured to perform text line detection processing on the multiple video frames to be processed to obtain a maximum width value of a text line rectangle of each video frame to be processed;
a determining module 630, configured to determine a target video frame from the multiple video frames to be processed, where the target video frame is a video frame with a largest maximum width value of the text line rectangle in the multiple video frames to be processed.
According to an embodiment of the present application, the steps of the methods shown in fig. 1 and fig. 5 may be performed by the corresponding modules of the video frame extraction apparatus 600 shown in fig. 6, and are not described herein again.
The video frame extraction apparatus 600 in the embodiment of the present application may extract a plurality of video frames to be processed from a video stream to be processed; perform text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of the text line rectangle of each video frame to be processed; and determine a target video frame from the plurality of video frames to be processed, the target video frame being the video frame with the largest maximum text line rectangle width among them. The apparatus can thus process a video stream at high speed to extract the required representative frame, supports automatic batch processing, improves efficiency, and reduces labor and time costs.
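The cooperation of the three modules can be sketched as a hypothetical pipeline. Here `detect_lines` stands in for any text line detector returning `(x, y, w, h)` rectangles; it is an assumed interface, not an API from the patent:

```python
def extract_target_frame(frames, detect_lines):
    # frames: output of the frame extraction module 610.
    # detect_lines(frame) -> list of (x, y, w, h) text-line rectangles,
    # standing in for the text line detection module 620.
    best_frame, best_width = None, -1
    for frame in frames:
        widths = [w for _, _, w, _ in detect_lines(frame)]
        max_w = max(widths, default=0)     # frame's maximum rectangle width
        if max_w > best_width:             # module 630: keep the running best
            best_frame, best_width = frame, max_w
    return best_frame
```

A frame with no detected text contributes a width of 0 and is never selected unless every frame is text-free.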
Based on the description of the method embodiment and the apparatus embodiment, the embodiment of the present application further provides a terminal device. Referring to fig. 7, the terminal device 700 includes at least a processor 701, an input device 702, an output device 703, and a computer storage medium 704. The processor 701, the input device 702, the output device 703, and the computer storage medium 704 in the terminal device 700 may be connected by a bus or other means.
The computer storage medium 704 may reside in the memory of the terminal device 700 and is configured to store a computer program comprising program instructions; the processor 701 is configured to execute the program instructions stored in the computer storage medium 704. The processor 701 (or CPU, Central Processing Unit) is the computing and control core of the terminal device 700, adapted to load and execute one or more instructions so as to implement the corresponding method flow or function; in one embodiment, the processor 701 described in this embodiment of the present application may be configured to perform a series of processing, including the methods of the embodiments shown in fig. 1 and fig. 5.
An embodiment of the present application further provides a computer storage medium (memory), which is a storage device in the terminal device used to store programs and data. It is understood that the computer storage medium here may include a built-in storage medium of the terminal device and may also include an extended storage medium supported by the terminal device. The computer storage medium provides storage space that stores the operating system of the terminal device. One or more instructions suitable for loading and execution by the processor 701, which may be one or more computer programs (including program code), are also stored in this storage space. It should be noted that the computer storage medium here may be a high-speed RAM memory or a non-volatile memory, such as at least one magnetic disk memory; optionally, it may also be at least one computer storage medium located remotely from the processor.
In one embodiment, one or more instructions stored in a computer storage medium may be loaded and executed by the processor 701 to implement the corresponding steps in the above embodiments; in a specific implementation, one or more instructions in the computer storage medium may be loaded by the processor 701 and perform any step of the method in fig. 1 and/or fig. 5, which is not described herein again.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the division of the module is only one logical division, and other divisions may be possible in actual implementation, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. The shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some interfaces, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated wholly or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic medium (e.g., a floppy disk, hard disk, magnetic tape, or magnetic disk), an optical medium (e.g., a Digital Versatile Disc (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)).
Claims (10)
1. A method for extracting video frames, the method comprising:
extracting a plurality of video frames to be processed from a video stream to be processed;
performing text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of a text line rectangle of each video frame to be processed;
and determining a target video frame from the plurality of video frames to be processed, wherein the target video frame is the video frame with the largest maximum width value of the text line rectangle in the plurality of video frames to be processed.
2. The method according to claim 1, wherein said determining a target video frame from the plurality of video frames to be processed comprises:
step S1, constructing a video frame dictionary, wherein a key of the video frame dictionary is the maximum width value of a text line rectangle of a first video frame, the value of the video frame dictionary is the first video frame, and the first video frame belongs to the video frame to be processed;
step S2, obtaining the maximum width value of a text line rectangle of a second video frame, wherein the second video frame is the next frame of the first video frame in the video frames to be processed;
step S3, if the maximum width value of the text line rectangle of the second video frame is larger than the key of the video frame dictionary, updating the video frame dictionary; if the maximum width value of the text line rectangle of the second video frame is not larger than the key of the video frame dictionary, not updating the video frame dictionary;
step S4, regarding the second video frame as the first video frame;
and repeating the steps S2-S4 until all the video frames to be processed are processed, and determining the target video frame, wherein the target video frame is the value of the video frame dictionary.
3. The method of claim 2, wherein the updating the video frame dictionary comprises:
updating the key of the video frame dictionary to be the maximum width value of the text line rectangle of the second video frame;
and updating the value of the video frame dictionary to be the video frame of the second video frame.
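The single-entry dictionary procedure of claims 2 and 3 (steps S1 to S4) can be sketched as follows; the frame representation and width values are hypothetical:

```python
def select_target_frame(frame_width_pairs):
    # frame_width_pairs: iterable of (frame, max_text_line_width) tuples.
    it = iter(frame_width_pairs)
    frame, width = next(it)
    dictionary = {width: frame}        # S1: key = max width, value = frame
    for frame, width in it:            # S2: the next frame's max width
        (best_width,) = dictionary     # single-key dict: unpack the key
        if width > best_width:         # S3: update only if strictly larger
            dictionary = {width: frame}
    (target,) = dictionary.values()    # the target frame is the dict's value
    return target
```

Only a single key/value pair ever exists, so the dictionary acts as a running maximum over the stream of frames.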
4. The method according to claim 1, wherein said performing text line detection processing on the plurality of video frames to be processed to obtain a maximum width value of a text line rectangle of each video frame to be processed comprises:
performing text line detection processing on the plurality of video frames to be processed to obtain a text line in each video frame to be processed;
acquiring a text line rectangle which is the minimum bounding rectangle of the text line;
and calculating the widths of all the text line rectangles in each video frame to be processed to obtain the maximum width value of the text line rectangles of each video frame to be processed.
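The steps of claim 4 can be illustrated with a minimal sketch that assumes each detected text line is given as a point set; real detectors produce richer output, so this is an assumption for illustration only:

```python
def min_bounding_rect(points):
    # Axis-aligned minimum bounding rectangle (x, y, w, h) of one text line,
    # given a hypothetical point-set representation of the detection.
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))

def max_rect_width(text_lines):
    # Largest text-line rectangle width in one frame (0 if no text found).
    return max((min_bounding_rect(pts)[2] for pts in text_lines), default=0)
```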
5. The method according to any one of claims 1 to 4, wherein said extracting a plurality of video frames to be processed from a video stream to be processed comprises:
and performing difference frame extraction on the video stream to be processed according to a preset extraction rule to obtain the plurality of video frames to be processed, wherein the number of video frames to be processed is less than the total number of video frames in the video stream to be processed.
6. The method according to claim 5, wherein said performing difference frame extraction on the video stream to be processed according to a preset extraction rule to obtain the plurality of video frames to be processed comprises:
based on a preset difference frame number N, performing difference frame extraction on the video stream to be processed by extracting one video frame every N frames to obtain the plurality of video frames to be processed, wherein N is an integer greater than 1; or,
and according to a preset time interval, performing difference frame extraction on the video stream to be processed to obtain a plurality of video frames to be processed.
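The two difference-frame extraction rules of claim 6 amount to index sampling. A sketch under the assumption of a constant frame rate (function names are illustrative):

```python
def sample_every_n(total_frames, n):
    # Rule 1: keep one frame out of every n (n > 1): indices 0, n, 2n, ...
    return list(range(0, total_frames, n))

def sample_by_interval(duration_s, fps, interval_s):
    # Rule 2: frame indices at a fixed time interval; with a constant
    # frame rate, an interval of interval_s seconds is a stride of
    # round(interval_s * fps) frames.
    step = max(1, int(round(interval_s * fps)))
    return list(range(0, int(duration_s * fps), step))
```

Either rule yields far fewer frames than the stream contains, which is what makes batch processing fast.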
7. The method of claim 1, further comprising:
performing text recognition processing on the plurality of video frames to be processed, and determining whether the plurality of video frames to be processed contain similar frames, wherein the similar frames have the same in-frame text content;
under the condition that the plurality of video frames to be processed contain similar frames, performing text line detection processing on the similar frames to obtain the width sum of the text line rectangles of each of the similar frames;
and determining a representative video frame from the similar frames, wherein the representative video frame is the video frame with the largest sum of text line rectangle widths among the similar frames.
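Claim 7's similar-frame detection can be sketched as grouping frames by their recognized text. `recognize_text` is a pluggable OCR stand-in, an assumed interface rather than a real library call:

```python
from collections import defaultdict

def find_similar_frames(frames, recognize_text):
    # Group frames whose recognized text content is identical; groups
    # with more than one member are the patent's "similar frames".
    groups = defaultdict(list)
    for frame in frames:
        groups[recognize_text(frame)].append(frame)
    return [g for g in groups.values() if len(g) > 1]
```

Each returned group can then be passed to the width-sum selection to pick its representative frame.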
8. A video frame extraction apparatus, comprising:
the frame extraction module is used for extracting a plurality of video frames to be processed from the video stream to be processed;
the text line detection module is used for carrying out text line detection processing on the plurality of video frames to be processed to obtain the maximum width value of a text line rectangle of each video frame to be processed;
and the determining module is used for determining a target video frame from the plurality of video frames to be processed, wherein the target video frame is the video frame with the largest maximum width value of the text line rectangle in the plurality of video frames to be processed.
9. A terminal device, characterized in that it comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the video frame extraction method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, causes the processor to carry out the steps of the video frame extraction method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210223894.5A CN114598921B (en) | 2022-03-07 | 2022-03-07 | Video frame extraction method, device, terminal equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210223894.5A CN114598921B (en) | 2022-03-07 | 2022-03-07 | Video frame extraction method, device, terminal equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114598921A true CN114598921A (en) | 2022-06-07 |
CN114598921B CN114598921B (en) | 2024-04-12 |
Family
ID=81807060
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210223894.5A Active CN114598921B (en) | 2022-03-07 | 2022-03-07 | Video frame extraction method, device, terminal equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114598921B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116090417A (en) * | 2023-04-11 | 2023-05-09 | 福昕鲲鹏(北京)信息科技有限公司 | Layout document text selection rendering method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6937766B1 (en) * | 1999-04-15 | 2005-08-30 | MATE—Media Access Technologies Ltd. | Method of indexing and searching images of text in video |
CN111768346A (en) * | 2020-05-12 | 2020-10-13 | 北京奇艺世纪科技有限公司 | Method, device and equipment for correcting back image of identity card and storage medium |
CN113033552A (en) * | 2021-03-19 | 2021-06-25 | 北京字跳网络技术有限公司 | Text recognition method and device and electronic equipment |
CN113408241A (en) * | 2021-07-16 | 2021-09-17 | 网易(杭州)网络有限公司 | Text data processing method and device, electronic equipment and readable medium |
CN113591530A (en) * | 2021-02-24 | 2021-11-02 | 腾讯科技(深圳)有限公司 | Video detection method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114598921B (en) | 2024-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107181976B (en) | Bullet screen display method and electronic equipment | |
CN109803180B (en) | Video preview generation method and device, computer equipment and storage medium | |
JP6240199B2 (en) | Method and apparatus for identifying object in image | |
CN108230346B (en) | Method and device for segmenting semantic features of image and electronic equipment | |
US9046991B2 (en) | System and method for dynamically displaying structurally dissimilar thumbnail images of an electronic document | |
CN110225366B (en) | Video data processing and advertisement space determining method, device, medium and electronic equipment | |
CN110688524B (en) | Video retrieval method and device, electronic equipment and storage medium | |
US20220172476A1 (en) | Video similarity detection method, apparatus, and device | |
JP7331146B2 (en) | Subtitle cross-border processing method, device and electronic device | |
CN111385665A (en) | Bullet screen information processing method, device, equipment and storage medium | |
EP3408752B1 (en) | Object management and visualization using a computing device | |
CN112149570A (en) | Multi-person living body detection method and device, electronic equipment and storage medium | |
CN114598921A (en) | Video frame extraction method and device, terminal equipment and storage medium | |
CN105184838A (en) | Picture processing method and terminal | |
EP3564833A1 (en) | Method and device for identifying main picture in web page | |
CN109522429B (en) | Method and apparatus for generating information | |
CN112215221A (en) | Automatic vehicle frame number identification method | |
CN113391779B (en) | Parameter adjusting method, device and equipment for paper-like screen | |
CN111127310B (en) | Image processing method and device, electronic equipment and storage medium | |
CN113676734A (en) | Image compression method and image compression device | |
CN114640876A (en) | Multimedia service video display method and device, computer equipment and storage medium | |
CN112988005A (en) | Method for automatically loading captions | |
CN111083552A (en) | Thumbnail generation method, device, equipment and medium | |
CN111208955A (en) | Printing method, printing device and server | |
CN113448470B (en) | Webpage long screenshot method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||