CN110213614B - Method and device for extracting key frame from video file - Google Patents

Method and device for extracting key frame from video file

Info

Publication number
CN110213614B
CN110213614B (application CN201910380622.4A)
Authority
CN
China
Prior art keywords
image
frame
video file
processed
target number
Prior art date
Legal status
Active
Application number
CN201910380622.4A
Other languages
Chinese (zh)
Other versions
CN110213614A (en)
Inventor
何茜
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910380622.4A
Publication of CN110213614A
Application granted
Publication of CN110213614B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Embodiments of the present disclosure disclose a method and a device for extracting key frames from a video file. One embodiment of the method comprises: determining, based on file header information decoded from an acquired video file to be processed, the size occupied by each frame of image in the video file and the time indicated by the timestamp information of each frame of image; and selecting a target number of key frames from the video file to be processed based on that size and time information. This embodiment reduces the workload of extracting key frames.

Description

Method and device for extracting key frame from video file
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a method and a device for extracting key frames from a video file.
Background
Video files often consist of a large number of image frames. When processing a video file, a certain number of key frames may first be extracted from it, and the video file is then handled by processing only the extracted key frames. In practice, the key frames are usually the image frames in the video file to be processed that contain more features.
At present, a certain number of key frames are extracted from a video file mainly through a built-in interface. Specifically, the features contained in each frame of image are extracted through the interface, and a certain number of image frames are then selected as key frames based on an analysis of the features contained in each frame.
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for extracting key frames from a video file.
In a first aspect, an embodiment of the present disclosure provides a method for extracting a key frame from a video file, the method including: determining the size of each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image based on file header information decoded from the acquired video file to be processed; and selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image.
In some embodiments, the selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image includes: selecting a target number of image frames from a video file to be processed, and executing the following weight determination steps: determining the file size of the selected image frame; determining a time interval parameter of the selected image frame based on a time interval between any adjacent image frames in the selected image frame; determining the selection weight of the selected image frame according to the file size and the time interval parameter of the selected image frame; and in response to the fact that the selection weight of the selected image frame is larger than that of any other target number of image frames in the video file to be processed, taking the selected image frame as the target number of key frames selected from the video file to be processed.
In some embodiments, the selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image includes: and selecting a target number of key frames from the image frames of which the occupied sizes are larger than a preset threshold value in the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image.
In some embodiments, before the selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image, the method further includes: displaying a relation graph between the size occupied by each frame of image in the generated video file to be processed and the time indicated by the timestamp information of each frame of image; and selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image, wherein the key frames comprise: and selecting a target number of key frames from the image frames indicated by the detected user selection operation for the relational graph.
In some embodiments, before the selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image, the method further includes: and taking the selected number of the received image frames input by the user as a target number.
In a second aspect, an embodiment of the present disclosure provides an apparatus for extracting a key frame from a video file, the apparatus including: a determining unit configured to determine a size occupied by each frame of image in the video file to be processed and a time indicated by timestamp information of each frame of image based on header information decoded from the acquired video file to be processed; and the selecting unit is configured to select a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image.
In some embodiments, the selecting unit includes: a determining module configured to select a target number of image frames from the video file to be processed, and perform the following weight determining steps: determining the file size of the selected image frame; determining a time interval parameter of the selected image frame based on a time interval between any adjacent image frames in the selected image frame; determining the selection weight of the selected image frame according to the file size and the time interval parameter of the selected image frame; and the selecting module is configured to respond to the selecting weight of the selected image frame being larger than the selecting weight of any other target number of image frames in the video file to be processed, and take the selected image frame as the target number of key frames selected from the video file to be processed.
In some embodiments, the selecting unit is further configured to: and selecting a target number of key frames from the image frames of which the occupied sizes are larger than a preset threshold value in the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image.
In some embodiments, the above apparatus further comprises: a display unit configured to display a relational graph between a size occupied by each frame of image in the generated video file to be processed and a time indicated by timestamp information of each frame of image; the selecting unit is further configured to: and selecting a target number of key frames from the image frames indicated by the detected user selection operation for the relational graph.
In some embodiments, the above apparatus further comprises: a receiving unit configured to take the received image frame selection number input by the user as a target number.
In a third aspect, an embodiment of the present disclosure provides a terminal, including: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, which when executed by a processor implements the method as described in any of the implementations of the first aspect.
The embodiments of the disclosure provide a method and a device for extracting key frames from a video file. The method comprises the following steps: firstly, decoding the acquired video file to be processed to obtain file header information; then, based on the file header information, determining the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image; finally, selecting a target number of key frames from the video file to be processed based on the determined size occupied by each frame of image and the time indicated by the timestamp information of each frame of image. Thus, the workload of extracting key frames is reduced.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of a method of extracting key frames from a video file, according to the present disclosure;
FIG. 3 is a schematic diagram of an application scenario of a method of extracting key frames from a video file according to an embodiment of the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a method of extracting key frames from a video file according to the present disclosure;
FIG. 5 is a schematic block diagram illustrating an embodiment of an apparatus for extracting key frames from a video file according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an exemplary architecture 100 to which the method of extracting key frames from a video file or the apparatus of extracting key frames from a video file of the present disclosure may be applied.
As shown in fig. 1, system architecture 100 may include terminal device 101, network 102, and data server 103. Network 102 is the medium used to provide a communication link between terminal device 101 and data server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal apparatus 101 interacts with the data server 103 through the network 102 to receive or transmit messages and the like. Various communication client applications, such as an image processing application, a video processing application, a search application, and the like, may be installed on the terminal device 101.
The terminal apparatus 101 may be hardware or software. When the terminal device 101 is hardware, it may be any of various electronic devices that have a display screen and support information transmission, including but not limited to smartphones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal device 101 is software, it may be installed in the electronic devices listed above and implemented as multiple pieces of software or software modules, or as a single piece of software or software module; this is not specifically limited herein.
The data server 103 may be hardware or software. When the data server 103 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the data server 103 is software, it may be implemented as multiple pieces of software or software modules (e.g., for providing distributed services) or as a single piece of software or software module; this is not specifically limited herein.
The data server 103 may be a server that provides various services. As an example, the terminal device 101 may select a target number of key frames from the acquired video file to be processed, and then store, in the data server 103, the video file to be processed in association with the selected target number of key frames.
It should be noted that the method for extracting a key frame from a video file provided by the embodiment of the present disclosure is executed by the terminal device 101 in some application scenarios, and accordingly, the apparatus for extracting a key frame from a video file is disposed in the terminal device 101. In other application scenarios, the method is performed by a server in communication connection with the terminal device, and accordingly, the means for extracting key frames from the video file is disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method of extracting key frames from a video file in accordance with the present disclosure is shown. The method for extracting the key frame from the video file comprises the following steps:
step 201, determining the size of each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image based on the file header information decoded from the acquired video file to be processed.
In the present embodiment, the execution body of the method of extracting a key frame from a video file (such as the terminal device 101 shown in fig. 1) may acquire the video file to be processed locally or from a communicatively connected data server (such as the data server 103 shown in fig. 1).
In this embodiment, after the video file to be processed is acquired, the execution body may decode it. Specifically, the execution body may parse the video file to be processed according to its file format, and then decode it according to the encoding format of the video stream it contains.
In this embodiment, after decoding the video file to be processed, the execution body may obtain the file header information in it. In general, the header information may include information related to each frame of image: for example, the start byte and end byte of each frame of image in the video file to be processed, as well as the timestamp information of each frame of image. The timestamp information may represent the time corresponding to the image frame on the time axis of the video file to be processed. It should be noted that, depending on the file format, a video file to be processed may contain one or more pieces of file header information.
In this embodiment, after obtaining the file header information of the video file to be processed, the execution body may determine the size occupied by each frame of image from the difference between that frame's end byte and start byte. Further, the execution body may determine the time indicated by the timestamp information of each frame of image.
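To make the byte arithmetic concrete, here is a minimal sketch in Python. It assumes the decoded file header information has already been parsed into per-frame records carrying a start byte, an end byte, and a timestamp; this record layout is hypothetical, since the actual fields depend on the container format.

```python
from typing import List, NamedTuple, Tuple

class FrameRecord(NamedTuple):
    start_byte: int   # offset of the frame's first byte in the file
    end_byte: int     # offset one past the frame's last byte
    timestamp: float  # time on the video's time axis, in seconds

def frame_sizes_and_times(records: List[FrameRecord]) -> List[Tuple[int, float]]:
    """Derive a (size, time) pair for every frame from parsed header records."""
    return [(r.end_byte - r.start_byte, r.timestamp) for r in records]
```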
Step 202, selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image.
In this embodiment, after determining the size occupied by each frame of image and the time indicated by the timestamp information of each frame of image, the execution body may further select a target number of key frames from the video file to be processed. The target number may be a preset value, or a value determined according to actual requirements; for example, it may be associated with the number of image frames the video file to be processed contains. In practice, an association between the number of image frames in a video file and the target number may be established in advance, so that the execution body can determine the target number from the number of image frames in the video file to be processed based on that association.
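As a toy illustration of such a pre-established association between frame count and target number (the breakpoints below are invented for illustration, not taken from the disclosure):

```python
def target_number_for(frame_count: int) -> int:
    """Map the number of frames in the video file to a target number of
    key frames, per a hypothetical pre-established association."""
    if frame_count < 300:
        return 5
    if frame_count < 3000:
        return 10
    return 20
```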
As an example, the execution body may first divide the times indicated by the timestamp information of the image frames, taken in ascending order, into a preset number of intervals. Then, from each interval, it may select a certain number of image frames whose occupied size is greater than a preset threshold, thereby obtaining the target number of key frames.
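A hedged sketch of this interval-based selection follows; `frames` is the list of (size, time) pairs from the earlier sketch, and the equal-width partition with a fixed per-interval quota is one plausible reading of the example, not a prescribed procedure.

```python
def select_by_intervals(frames, num_intervals, per_interval, threshold):
    """Split the time axis into equal intervals and keep the largest frames
    above `threshold` in each interval."""
    times = [t for _, t in frames]
    t_min, t_max = min(times), max(times)
    span = (t_max - t_min) / num_intervals or 1.0  # guard zero-length videos
    buckets = [[] for _ in range(num_intervals)]
    for size, t in frames:
        idx = min(int((t - t_min) / span), num_intervals - 1)
        buckets[idx].append((size, t))
    selected = []
    for bucket in buckets:
        big = [f for f in bucket if f[0] > threshold]
        big.sort(key=lambda f: f[0], reverse=True)  # largest frames first
        selected.extend(big[:per_interval])
    return sorted(selected, key=lambda f: f[1])     # restore time order
```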
In some optional implementations of this embodiment, the execution body may select a target number of key frames from those image frames in the video file to be processed whose occupied size is larger than a preset threshold.
Specifically, the execution body may first select, according to the size occupied by each frame of image, the image frames whose occupied size is larger than a preset threshold from the video file to be processed. Then, from the selected image frames, it may further select a target number of image frames in which the difference between the times indicated by the timestamp information of adjacent image frames is greater than a preset difference, obtaining the target number of key frames. Here, adjacent image frames are the two selected image frames whose timestamp-indicated times are closest to each other.
In these implementations, after the image frames whose size is larger than the preset threshold are selected from the video file to be processed, the unselected image frames no longer need to be processed, which reduces the workload of the execution body and further shortens the time for extracting key frames.
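A sketch of this optional implementation is below. The greedy forward scan is one possible way to enforce the preset timestamp-difference condition; the disclosure does not prescribe a particular scan order.

```python
def select_by_threshold_and_gap(frames, threshold, min_gap, target_number):
    """Keep frames larger than `threshold`, then greedily keep frames whose
    timestamp gap to the previously kept frame exceeds `min_gap`."""
    big = sorted((f for f in frames if f[0] > threshold), key=lambda f: f[1])
    selected = []
    for size, t in big:
        if not selected or t - selected[-1][1] > min_gap:
            selected.append((size, t))
        if len(selected) == target_number:
            break
    return selected
```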
In some optional implementations of this embodiment, before the target number of key frames is selected from the video file to be processed, the execution body may further display the generated relationship graph between the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image.
Specifically, the execution body may first generate a relationship graph between the size occupied by each frame of image and the time indicated by the timestamp information of each frame of image. In practice, this relationship graph is usually a scatter plot: each coordinate point is derived from the size occupied by one frame of image and the time indicated by that frame's timestamp information. The execution body may then display the relationship graph.
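Purely as an illustration, such a scatter plot could be rendered with matplotlib (the library choice is an assumption; the disclosure names no plotting tool):

```python
import matplotlib.pyplot as plt

def show_size_time_scatter(frames):
    """Plot one point per frame: x = timestamp-indicated time, y = occupied size."""
    sizes = [s for s, _ in frames]
    times = [t for _, t in frames]
    plt.scatter(times, sizes, s=10)
    plt.xlabel("time indicated by timestamp information (s)")
    plt.ylabel("size occupied in file (bytes)")
    plt.title("frame size vs. time")
    plt.show()
```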
It can be understood that, after the relationship graph is displayed, the user can perform a selection operation on it according to the actual situation. The user selection operation may be an operation of selecting a subset of the coordinate points in the graph.
Accordingly, the execution body may select a target number of key frames from the image frames indicated by the detected user selection operation on the relationship graph.
Specifically, in response to detecting a user selection operation, the execution body may determine the coordinate points selected by the user from that operation. Further, the execution body may select a target number of key frames from the image frames indicated by the selected coordinate points, using a method similar to that described above. In practice, the execution body may detect the user selection operation through a built-in interface.
In these implementations, the user's selection operation determines the rough range from which key frames are selected, so the key frames are chosen according to the user's needs.
In some optional implementations of this embodiment, before selecting a target number of key frames from the video file to be processed, the execution body may take the received image-frame selection number input by the user as the target number.
In these implementations, the target number may be determined according to the needs of the user, making the determination of the target number more flexible.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for extracting a key frame from a video file according to the present embodiment. In the application scenario of fig. 3, first, the terminal device 301 may obtain a to-be-processed video file 302 from a communicatively connected database server. Then, the terminal device 301 may decode the acquired to-be-processed video file 302 to obtain the header information 303 in the to-be-processed video file 302. Thus, the terminal device 301 can determine the size occupied by each frame of image in the video file to be processed 302 and the time indicated by the timestamp information of each frame of image according to the decoded header information 303.
Then, the terminal device 301 may select, according to the size occupied by each frame of image, an image frame whose occupied size is greater than a preset threshold from the video file 302 to be processed. Then, the terminal device 301 may further select, from the selected image frames, a target number of image frames whose difference between the time points indicated by the timestamp information of the adjacent image frames is greater than a preset difference, that is, the target number of key frames 304.
Currently, one prior-art approach to extracting key frames from a video file to be processed, as described in the background of the present disclosure, extracts a certain number of key frames through a built-in interface. In general, extracting and analyzing the features contained in each frame of image costs the execution body considerable time and computation. In the method provided by the above embodiment of the present disclosure, the size occupied by each frame of image and the time indicated by each frame's timestamp information are determined from the header information of the video file to be processed. In general, the larger the size an image frame occupies in the video file, the more features it contains, and adjacent image frames tend to contain similar features. Therefore, without extracting the features contained in each frame of image, key frames with larger feature differences can be selected from the video file to be processed according to the size of each frame of image and the times indicated by the timestamp information. Since there is no need to extract and analyze per-frame features, the workload of the execution body is reduced and the time for extracting key frames is shortened.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method of extracting key frames from a video file is shown. The process 400 of the method for extracting key frames from a video file comprises the following steps:
step 401, determining the size of each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image based on the file header information decoded from the acquired video file to be processed.
Step 401 may be performed in a similar manner as step 201 in the embodiment shown in fig. 2, and the above description for step 201 also applies to step 401, which is not described herein again.
Step 402, selecting a target number of image frames from a video file to be processed, and performing the following weight determination steps: determining the file size of the selected image frame; determining a time interval parameter of the selected image frame based on a time interval between any adjacent image frames in the selected image frame; and determining the selection weight of the selected image frame according to the file size and the time interval parameter of the selected image frame.
In this embodiment, after determining the size occupied by each frame of image and the time indicated by the timestamp information of each frame of image, the execution body of the method of extracting a key frame from a video file (for example, the terminal device 101 shown in fig. 1) may select a target number of image frames from the video file to be processed, and then perform the weight determination step described below on them.
First, the execution body may determine the file size of the selected target number of image frames. Here, the file size may be the sum of the sizes occupied by the target number of image frames in the video file to be processed. In practice, the file size may be described as

$S = \sum_{i=1}^{m} A_i$

where $i$ numbers the selected image frames, $A_i$ represents the size occupied by the $i$-th selected frame in the video file to be processed, and $m$ represents the total number of selected image frames, i.e. the target number.

Furthermore, the execution body may determine the time interval between any adjacent image frames among the selected target number of image frames. Here, the time interval may be the difference between the times indicated by the timestamp information of adjacent image frames. In practice, the time interval may be described as $I_{i+1} - I_i$, where $I_i$ represents the time indicated by the timestamp information of the $i$-th selected frame. In practice, $I_i$ increases with increasing $i$; that is, within the selected target number of image frames, the time indicated by an image frame's timestamp information grows with its sequence number.

In general, in a video file to be processed, the features contained in adjacent frames are similar. Therefore, in practice it is necessary to ensure that the selected key frames are distributed as uniformly as possible on the time axis of the video file to be processed, so that key frames with larger information differences are selected. After determining the time interval between each pair of adjacent image frames, the execution body may further determine a time interval parameter for the selected target number of image frames. Here, the time interval parameter may be a parameter characterizing how uniformly the target number of image frames are distributed; one form consistent with this description is

$D = \frac{1}{I'_n - I'_1} \sum_{i=1}^{m-1} \left| (I_{i+1} - I_i) - \bar{I} \right|$

where $I'_1$ represents the minimum time indicated by any timestamp information in the video file to be processed, $I'_n$ represents the maximum such time, and $\bar{I}$ may be the average of the determined time intervals, which may be described as

$\bar{I} = \frac{1}{m-1} \sum_{i=1}^{m-1} (I_{i+1} - I_i)$

It will be appreciated that the smaller the time interval parameter, the more uniformly the selected target number of image frames are distributed.

Then, the execution body may determine the selection weight of the target number of image frames according to their file size and time interval parameter. The selection weight may be used to characterize both how critical and how uniformly distributed the target number of image frames are. In practice, the selection weight may be the ratio between the determined file size and the time interval parameter, i.e.

$W = \frac{S}{D}$

It will be appreciated that the larger the file size and the smaller the time interval parameter, the greater the determined selection weight.
In practice, the execution body may select a target number of image frames from the video file to be processed multiple times, performing the above weight determination step each time to obtain the corresponding selection weights.
Step 403, in response to the selection weight of the selected image frames being greater than the selection weight of any other target number of image frames in the video file to be processed, take the selected image frames as the target number of key frames selected from the video file to be processed.
After determining the plurality of selection weights, the execution body may take the target number of image frames corresponding to the maximum selection weight as the selected target number of key frames.
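A sketch of the whole weight-based selection of steps 402-403 follows, under two stated assumptions: candidate sets are drawn by random sampling (the disclosure only says a target number of image frames is selected multiple times), and the interval parameter D is the duration-normalized deviation of adjacent intervals from their mean, matching the description above.

```python
import random

def selection_weight(candidate, video_t_min, video_t_max):
    """Selection weight W = S / D for one candidate set of (size, time) frames.

    S is the summed file size of the candidate frames; D is the
    duration-normalized total absolute deviation of adjacent-frame
    intervals (smaller D = more uniform distribution)."""
    candidate = sorted(candidate, key=lambda f: f[1])
    s_total = sum(s for s, _ in candidate)                # file size S
    times = [t for _, t in candidate]
    gaps = [b - a for a, b in zip(times, times[1:])]      # adjacent intervals
    if not gaps:                                          # degenerate 1-frame set
        return float(s_total)
    mean_gap = sum(gaps) / len(gaps)                      # average interval
    duration = (video_t_max - video_t_min) or 1.0
    d = sum(abs(g - mean_gap) for g in gaps) / duration   # interval parameter D
    return s_total / d if d else float("inf")

def select_by_weight(frames, target_number, trials=1000):
    """Draw candidate sets repeatedly and keep the one with the largest weight."""
    t_min = min(t for _, t in frames)
    t_max = max(t for _, t in frames)
    best, best_w = None, float("-inf")
    for _ in range(trials):
        candidate = random.sample(frames, target_number)
        w = selection_weight(candidate, t_min, t_max)
        if w > best_w:
            best, best_w = candidate, w
    return sorted(best, key=lambda f: f[1])
```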
As can be seen from fig. 4, compared with the embodiment shown in fig. 2, the flow 400 of the method for extracting a key frame from a video file in this embodiment highlights the step of determining the selection weight of a selected target number of image frames. The scheme described in this embodiment can therefore select a target number of key frames from the video file to be processed according to the determined selection weights, ensuring that, while key frames with larger feature differences are selected, the selected key frames are also uniformly distributed on the time axis of the video file to be processed.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of an apparatus for extracting a key frame from a video file, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for extracting a key frame from a video file provided by the present embodiment includes a determining unit 501 and a selecting unit 502. Wherein, the determining unit 501 may be configured to: and determining the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image based on the file header information decoded from the acquired video file to be processed. The selecting unit 502 may be configured to: and selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image.
In the present embodiment, in the apparatus 500 for extracting a key frame from a video file: the specific processing of the determining unit 501 and the selecting unit 502 and the technical effects thereof can refer to the related descriptions of step 201 and step 202 in the corresponding embodiment of fig. 2, which are not repeated herein.
In some optional implementation manners of this embodiment, the selecting unit 502 includes: a determination module (not shown) and a selection module (not shown). Wherein the determining module may be configured to: selecting a target number of image frames from a video file to be processed, and executing the following weight determination steps: determining the file size of the selected image frame; determining a time interval parameter of the selected image frame based on a time interval between any adjacent image frames in the selected image frame; and determining the selection weight of the selected image frame according to the file size and the time interval parameter of the selected image frame. The selecting module may be configured to: and in response to the fact that the selection weight of the selected image frame is larger than that of any other target number of image frames in the video file to be processed, taking the selected image frame as the target number of key frames selected from the video file to be processed.
In some optional implementations of this embodiment, the selecting unit 502 may be further configured to: and selecting a target number of key frames from the image frames of which the occupied sizes are larger than a preset threshold value in the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image.
In some optional implementations of this embodiment, the apparatus 500 may further include: a display unit (not shown in the figure). Wherein the display unit may be configured to: and displaying a relation graph between the occupied size of each frame of image in the generated video file to be processed and the time indicated by the timestamp information of each frame of image. The selecting unit 502 may be further configured to: and selecting a target number of key frames from the image frames indicated by the detected user selection operation for the relational graph.
In some optional implementations of this embodiment, the apparatus 500 may further include: a receiving unit (not shown in the figure). A receiving unit, which may be configured to: and taking the selected number of the received image frames input by the user as a target number.
The above embodiments of the present disclosure provide an apparatus: first, the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image can be determined by the determining unit 501 based on the header information decoded from the acquired video file to be processed; then, a target number of key frames may be selected from the video file to be processed by the selecting unit 502 based on the determined size occupied by each frame of image and the time indicated by the timestamp information of each frame of image. Thus, the workload of extracting key frames is reduced.
Referring now to fig. 6, shown is a schematic diagram of an electronic device (e.g., terminal device in fig. 1) 600 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the use range of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be included in the terminal device; or may exist separately without being assembled into the terminal device. The computer readable medium carries one or more programs which, when executed by the terminal device, cause the terminal device to: determining the size of each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image based on file header information decoded from the acquired video file to be processed; and selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a determination unit and a selection unit. The names of the units do not form a limitation on the units themselves in some cases, and for example, the selecting unit may be further described as "a unit that selects a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image".
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is possible without departing from the inventive concept defined above. For example, a technical solution may be formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.

Claims (10)

1. A method of extracting key frames from a video file, comprising:
determining the size of each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image based on file header information decoded from the acquired video file to be processed;
selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image, wherein the selecting comprises the following steps: and selecting image frames with the occupied size larger than a preset threshold value from the video file to be processed according to the size of each image frame, and selecting image frames with the target number, wherein the difference between the moments indicated by the timestamp information of the adjacent image frames is larger than a preset difference value, from the selected image frames to obtain the key frames with the target number.
2. The method of claim 1, wherein the selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image comprises:
selecting a target number of image frames from the video file to be processed, and executing the following weight determination steps: determining the file size of the selected image frame; determining a time interval parameter of the selected image frame based on a time interval between any adjacent image frames in the selected image frame; determining the selection weight of the selected image frame according to the file size and the time interval parameter of the selected image frame;
and in response to the fact that the selection weight of the selected image frame is larger than the selection weight of any other target number of image frames in the video file to be processed, taking the selected image frame as the target number of key frames selected from the video file to be processed.
3. The method of claim 1, wherein before the selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image, the method further comprises:
displaying a relation graph between the size occupied by each frame of image in the generated video file to be processed and the time indicated by the timestamp information of each frame of image; and
the selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image comprises:
and selecting a target number of key frames from the detected image frames indicated by the user selection operation aiming at the relational graph.
4. The method according to any one of claims 1-3, wherein before the selecting a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image, the method further comprises:
and taking the selected number of the received image frames input by the user as the target number.
5. An apparatus for extracting key frames from a video file, comprising:
a determining unit configured to determine a size occupied by each frame of image in the video file to be processed and a time indicated by timestamp information of each frame of image based on file header information decoded from the acquired video file to be processed;
the selecting unit is configured to select a target number of key frames from the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image, and comprises the following steps: and selecting image frames with the occupied size larger than a preset threshold value from the video file to be processed according to the size of each image frame, and selecting image frames with the target number, wherein the difference between the moments indicated by the timestamp information of the adjacent image frames is larger than a preset difference value, from the selected image frames to obtain the key frames with the target number.
6. The apparatus of claim 5, wherein the selecting unit comprises:
a determining module configured to select a target number of image frames from the video file to be processed, and perform the following weight determining steps: determining the file size of the selected image frame; determining a time interval parameter of the selected image frame based on a time interval between any adjacent image frames in the selected image frame; determining the selection weight of the selected image frame according to the file size and the time interval parameter of the selected image frame;
a selecting module configured to take the selected image frame as a key frame of a target number selected from the video file to be processed in response to the selection weight of the selected image frame being greater than the selection weight of any other target number of image frames in the video file to be processed.
7. The apparatus of claim 5, wherein the selecting unit is further configured to:
and selecting a target number of key frames from the image frames of which the occupied sizes are larger than a preset threshold value in the video file to be processed based on the size occupied by each frame of image in the video file to be processed and the time indicated by the timestamp information of each frame of image.
8. The apparatus of any of claims 5-7, wherein the apparatus further comprises:
a receiving unit configured to take the received image frame selection number input by the user as the target number.
9. A terminal, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.
10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-4.
CN201910380622.4A 2019-05-08 2019-05-08 Method and device for extracting key frame from video file Active CN110213614B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910380622.4A CN110213614B (en) 2019-05-08 2019-05-08 Method and device for extracting key frame from video file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910380622.4A CN110213614B (en) 2019-05-08 2019-05-08 Method and device for extracting key frame from video file

Publications (2)

Publication Number Publication Date
CN110213614A CN110213614A (en) 2019-09-06
CN110213614B (en) 2021-11-02

Family

Family ID: 67786926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910380622.4A Active CN110213614B (en) 2019-05-08 2019-05-08 Method and device for extracting key frame from video file

Country Status (1)

Country Link
CN (1) CN110213614B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523400B (en) * 2020-03-31 2023-10-13 易视腾科技股份有限公司 Video representative frame extraction method and device
CN111881726B (en) * 2020-06-15 2022-11-25 马上消费金融股份有限公司 Living body detection method and device and storage medium
CN111857517B (en) * 2020-07-28 2022-05-17 腾讯科技(深圳)有限公司 Video information processing method and device, electronic equipment and storage medium
CN114189646B (en) * 2020-09-15 2023-03-21 深圳市万普拉斯科技有限公司 Terminal control method and device, electronic equipment and storage medium
CN112613396B (en) * 2020-12-19 2022-10-25 河北志晟信息技术股份有限公司 Task emergency degree processing method and system
CN114827443A (en) * 2021-01-29 2022-07-29 深圳市万普拉斯科技有限公司 Video frame selection method, video delay processing method and device and computer equipment
CN114627036B (en) * 2022-03-14 2023-10-27 北京有竹居网络技术有限公司 Processing method and device of multimedia resources, readable medium and electronic equipment
CN114697761B (en) * 2022-04-07 2024-02-13 脸萌有限公司 Processing method, processing device, terminal equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103475935A (en) * 2013-09-06 2013-12-25 北京锐安科技有限公司 Method and device for retrieving video segments
CN103716640A (en) * 2010-12-17 2014-04-09 华为技术有限公司 Method and device for detecting frame type
CN105554517A (en) * 2015-12-03 2016-05-04 浙江大华技术股份有限公司 Method and device for sending video stream
CN109587581A (en) * 2017-09-29 2019-04-05 阿里巴巴集团控股有限公司 Video breviary generation method and video breviary generating means

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7035435B2 (en) * 2002-05-07 2006-04-25 Hewlett-Packard Development Company, L.P. Scalable video summarization and navigation system and method
KR20110063004A (en) * 2009-12-04 2011-06-10 삼성전자주식회사 Apparatus and method for extracting key frames and apparatus and method for recording broadcast signal using thereof
CN102196001B (en) * 2010-03-15 2014-03-19 腾讯科技(深圳)有限公司 Movie file downloading device and method
CN103634698B (en) * 2012-08-21 2014-12-03 华为技术有限公司 Methods for detecting frame type and frame size of video stream and apparatus
CN107180074A (en) * 2017-03-31 2017-09-19 北京奇艺世纪科技有限公司 A kind of video classification methods and device
CN107484039A (en) * 2017-08-22 2017-12-15 四川长虹电器股份有限公司 A kind of method that streaming media on demand seek pictures are quickly shown

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103716640A (en) * 2010-12-17 2014-04-09 华为技术有限公司 Method and device for detecting frame type
CN103475935A (en) * 2013-09-06 2013-12-25 北京锐安科技有限公司 Method and device for retrieving video segments
CN105554517A (en) * 2015-12-03 2016-05-04 浙江大华技术股份有限公司 Method and device for sending video stream
CN109587581A (en) * 2017-09-29 2019-04-05 阿里巴巴集团控股有限公司 Video breviary generation method and video breviary generating means

Also Published As

Publication number Publication date
CN110213614A (en) 2019-09-06

Similar Documents

Publication Publication Date Title
CN110213614B (en) Method and device for extracting key frame from video file
CN110809189B (en) Video playing method and device, electronic equipment and computer readable medium
CN110365973B (en) Video detection method and device, electronic equipment and computer readable storage medium
CN111784712B (en) Image processing method, device, equipment and computer readable medium
CN109862100B (en) Method and device for pushing information
CN110516159B (en) Information recommendation method and device, electronic equipment and storage medium
CN110619078B (en) Method and device for pushing information
CN109934142B (en) Method and apparatus for generating feature vectors of video
CN110288625B (en) Method and apparatus for processing image
CN109815448B (en) Slide generation method and device
US20240177374A1 (en) Video processing method, apparatus and device
CN109919220B (en) Method and apparatus for generating feature vectors of video
CN111209432A (en) Information acquisition method and device, electronic equipment and computer readable medium
CN110705536A (en) Chinese character recognition error correction method and device, computer readable medium and electronic equipment
CN111756953A (en) Video processing method, device, equipment and computer readable medium
CN108664610B (en) Method and apparatus for processing data
CN112287171A (en) Information processing method and device and electronic equipment
CN112004116B (en) Method, device, electronic equipment and medium for determining object adding mode
CN111680754B (en) Image classification method, device, electronic equipment and computer readable storage medium
CN110209851B (en) Model training method and device, electronic equipment and storage medium
CN109889737B (en) Method and apparatus for generating video
CN111367592B (en) Information processing method and device
CN113296771A (en) Page display method, device, equipment and computer readable medium
CN112699289A (en) House resource information aggregation display method and device, electronic equipment and computer readable medium
CN111949819A (en) Method and device for pushing video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant