CN113301422A - Method, terminal and storage medium for acquiring video cover - Google Patents

Method, terminal and storage medium for acquiring video cover

Info

Publication number
CN113301422A
Authority
CN
China
Prior art keywords
video
target
frame
video frame
decoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110565290.4A
Other languages
Chinese (zh)
Other versions
CN113301422B (en)
Inventor
王豪
Current Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN202110565290.4A
Publication of CN113301422A
Application granted
Publication of CN113301422B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/485 End-user interface for client configuration

Abstract

The application discloses a method, a terminal and a storage medium for acquiring a video cover, belonging to the field of internet technologies. The method comprises: acquiring page data of a video upload page and displaying the video upload page based on the page data; acquiring a target video uploaded through the video upload page, and performing a first decoding on the target video based on a video decoding tool to obtain a plurality of video frames in an intermediate format; performing, based on the video decoding tool, a second decoding on the intermediate-format video frames corresponding to at least one preset time point to obtain at least one first decoded video frame; acquiring a target decoded video frame from the at least one first decoded video frame as a cover image of the target video; and uploading the cover image through the video upload page. The method can solve the problem in the prior art that a browser's video decoder can decode only a limited set of video formats and therefore cannot acquire a video cover.

Description

Method, terminal and storage medium for acquiring video cover
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method, a terminal, and a storage medium for acquiring a video cover.
Background
With the continuous development of internet technology, users often upload self-made video files and video covers through a browser, and the browser sends the uploaded video files and covers to a server, thereby achieving video sharing. If the user has not made a video cover in advance, a decoded video frame from the video file can be used as the video cover when the file is uploaded.
In the related art, the browser's built-in video decoder decodes the video file uploaded by the user to obtain at least one decoded video frame, from which the user can then select a target decoded video frame as the video cover.
In this process, the browser's video decoder can decode only one or two video formats; for a video file that the decoder cannot decode, no video frame can be obtained from the file, and consequently no video cover can be acquired.
Disclosure of Invention
The embodiments of the present application provide a method, a terminal, and a storage medium for acquiring a video cover, which can solve the problem in the prior art that a video decoder can decode only a limited set of video formats and therefore cannot acquire a video cover. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a method for acquiring a video cover, where the method includes:
acquiring page data of a video upload page, and displaying the video upload page based on the page data, wherein the page data comprises a video decoding tool;
acquiring a target video uploaded through the video upload page, and performing a first decoding on the target video based on the video decoding tool to obtain a plurality of video frames in an intermediate format;
performing, based on the video decoding tool, a second decoding on the intermediate-format video frames corresponding to at least one preset time point to obtain at least one first decoded video frame;
acquiring a target decoded video frame from the at least one first decoded video frame as a cover image of the target video; and
uploading the cover image through the video upload page.
Optionally, the acquiring a target video uploaded through the video upload page, and performing a first decoding on the target video based on the video decoding tool to obtain a plurality of video frames in an intermediate format comprises:
after a video upload instruction is received, loading the video decoding tool in the page data, and after the video decoding tool is loaded, performing the first decoding on the target video based on the video decoding tool to obtain the plurality of intermediate-format video frames; or
after a video upload instruction is received, loading the video decoding tool in the page data, and after the target video is uploaded, performing the first decoding on the target video based on the video decoding tool to obtain the plurality of intermediate-format video frames.
Optionally, the acquiring a target decoded video frame from the at least one first decoded video frame as a cover image of the target video comprises:
displaying the at least one first decoded video frame on the video upload page; and
when a selection instruction for a target decoded video frame among the at least one first decoded video frame is received, determining the target decoded video frame as the cover image of the target video.
Optionally, after the displaying the at least one first decoded video frame on the video upload page, the method further comprises:
when a manual cover-search instruction is received, displaying a time axis corresponding to the target video;
when a selection instruction corresponding to a target time point on the time axis is received, determining, based on the target time point, at least one intermediate-format video frame from the plurality of intermediate-format video frames stored in a terminal memory;
performing a second decoding on the at least one intermediate-format video frame based on the video decoding tool to obtain at least one second decoded video frame; and
acquiring a target decoded video frame from the at least one second decoded video frame as a cover image of the target video.
Optionally, the determining, based on the target time point, at least one intermediate-format video frame from the plurality of intermediate-format video frames stored in the terminal memory comprises:
determining a target time range of a preset duration with the target time point as its center time point; and
determining, from the plurality of intermediate-format video frames stored in the terminal memory, the intermediate-format video frames within the target time range.
Optionally, the determining, based on the target time point, at least one intermediate-format video frame from the plurality of intermediate-format video frames stored in the terminal memory comprises:
determining, from the plurality of intermediate-format video frames stored in the terminal memory, a first video frame corresponding to the target time point, together with a preset number of video frames before the first video frame and a preset number of video frames after it.
Optionally, the method further comprises:
after acquiring, from the at least one second decoded video frame, a target decoded video frame as the cover image of the target video, deleting the plurality of intermediate-format video frames stored in the terminal memory; or
after the cover image is uploaded through the video upload page, deleting the plurality of intermediate-format video frames stored in the terminal memory.
Optionally, the performing, based on the video decoding tool, a second decoding on the intermediate-format video frames corresponding to the at least one preset time point to obtain at least one first decoded video frame comprises:
determining frame sequence numbers respectively corresponding to the at least one preset time point according to the at least one preset time point and a frame interval duration;
determining memory addresses of the intermediate-format video frames respectively corresponding to the at least one preset time point according to the frame sequence numbers, an initial memory address corresponding to the first intermediate-format video frame, and a frame data amount corresponding to each intermediate-format video frame; and
performing the second decoding, based on the video decoding tool, on the intermediate-format video frames indicated by the memory addresses to obtain the at least one first decoded video frame.
In one aspect, an embodiment of the present application provides an apparatus for acquiring a video cover, where the apparatus includes:
a first acquisition module configured to acquire page data of a video upload page and display the video upload page based on the page data, wherein the page data comprises a video decoding tool;
a first decoding module configured to acquire a target video uploaded through the video upload page, and perform a first decoding on the target video based on the video decoding tool to obtain a plurality of video frames in an intermediate format;
a second decoding module configured to perform, based on the video decoding tool, a second decoding on the intermediate-format video frames corresponding to at least one preset time point to obtain at least one first decoded video frame;
a second acquisition module configured to acquire a target decoded video frame from the at least one first decoded video frame as a cover image of the target video; and
an upload module configured to upload the cover image through the video upload page.
Optionally, the first decoding module is configured to:
after a video upload instruction is received, load the video decoding tool in the page data, and after the video decoding tool is loaded, perform the first decoding on the target video based on the video decoding tool to obtain the plurality of intermediate-format video frames; or
after a video upload instruction is received, load the video decoding tool in the page data, and after the target video is uploaded, perform the first decoding on the target video based on the video decoding tool to obtain the plurality of intermediate-format video frames.
Optionally, the second acquisition module is configured to:
display the at least one first decoded video frame on the video upload page; and
when a selection instruction for a target decoded video frame among the at least one first decoded video frame is received, determine the target decoded video frame as the cover image of the target video.
Optionally, the apparatus further comprises a manual cover-search module configured to:
when a manual cover-search instruction is received, display a time axis corresponding to the target video;
when a selection instruction corresponding to a target time point on the time axis is received, determine, based on the target time point, at least one intermediate-format video frame from the plurality of intermediate-format video frames stored in a terminal memory;
perform a second decoding on the at least one intermediate-format video frame based on the video decoding tool to obtain at least one second decoded video frame; and
acquire a target decoded video frame from the at least one second decoded video frame as a cover image of the target video.
Optionally, the manual cover-search module is configured to:
determine a target time range of a preset duration with the target time point as its center time point; and
determine, from the plurality of intermediate-format video frames stored in the terminal memory, the intermediate-format video frames within the target time range.
Optionally, the manual cover-search module is configured to:
determine, from the plurality of intermediate-format video frames stored in the terminal memory, a first video frame corresponding to the target time point, together with a preset number of video frames before the first video frame and a preset number of video frames after it.
Optionally, the apparatus further comprises a deletion module configured to:
after acquiring, from the at least one second decoded video frame, a target decoded video frame as the cover image of the target video, delete the plurality of intermediate-format video frames stored in the terminal memory; or
after the cover image is uploaded through the video upload page, delete the plurality of intermediate-format video frames stored in the terminal memory.
Optionally, the second decoding module is configured to:
determine frame sequence numbers respectively corresponding to the at least one preset time point according to the at least one preset time point and a frame interval duration;
determine memory addresses of the intermediate-format video frames respectively corresponding to the at least one preset time point according to the frame sequence numbers, an initial memory address corresponding to the first intermediate-format video frame, and a frame data amount corresponding to each intermediate-format video frame; and
perform the second decoding, based on the video decoding tool, on the intermediate-format video frames indicated by the memory addresses to obtain the at least one first decoded video frame.
In one aspect, an embodiment of the present application provides a terminal, which includes a processor and a memory, where the memory stores at least one program code, and the at least one program code is loaded and executed by the processor to implement the method for acquiring a video cover page described above.
In one aspect, an embodiment of the present application provides a computer-readable storage medium, in which at least one program code is stored, and the at least one program code is loaded and executed by a processor to implement the method for acquiring a video cover as described above.
In one aspect, the present application provides a computer program product or a computer program comprising computer program code stored in a computer-readable storage medium. A processor of a computer device reads the computer program code from the computer-readable storage medium and executes it, so that the computer device performs the above method for acquiring a video cover.
When the decoded video frames are obtained, the browser's video decoder is not used; instead, the video decoding tool contained in the page data of the video upload page is used, which solves the problem in the prior art that the video decoder can decode only a limited set of video formats and therefore cannot obtain the video cover.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort.
FIG. 1 is a schematic diagram of an implementation environment of a method for obtaining a video cover according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for obtaining a video cover according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a method for obtaining a video cover according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a method for obtaining a video cover according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a method for obtaining a video cover according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an apparatus for acquiring a video cover according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment of a method for acquiring a video cover according to an embodiment of the present application. Referring to fig. 1, the implementation environment includes: a terminal 101 and a server 102.
The terminal 101 may include components such as a processor and a memory. The processor may be a Central Processing Unit (CPU) and may be configured to receive page data sent by the server, display a video upload page based on the page data, perform the first decoding and the second decoding on a target video based on a video decoding tool, and display a target decoded video frame. The memory may be a RAM (Random Access Memory), a Flash memory, or the like, and may be configured to store received data, data required by the processing procedure, data generated during processing, and so on, such as the page data, the intermediate-format video frames, and the at least one first decoded video frame. The terminal 101 may also include a transceiver, an image detection component, a screen, an audio output component, an audio input component, and the like. The transceiver may be configured to exchange data with other devices, for example to receive the page data of the video upload page sent by the server, and may include an antenna, a matching circuit, a modem, and the like. The image detection component may be a camera or the like. The screen may be a touch screen and may be used to display the at least one decoded video frame. The audio output component may be a speaker, headphones, or the like, and the audio input component may be a microphone or the like.
The server 102 may include components such as a processor and a memory. The processor may be a CPU and may be configured to send the page data of the video upload page to the terminal. The memory may be a RAM, a Flash memory, or the like, and may be configured to store received data, data required by the processing procedure, data generated during processing, and so on, such as the page data of the video upload page.
Fig. 2 is a flowchart of a method for acquiring a video cover according to an embodiment of the present application. This embodiment is described with the terminal as the execution body. Referring to fig. 2, the embodiment includes:
step 201, obtaining page data of a video uploading page, and displaying the video uploading page based on the page data.
The server stores the page data corresponding to each web page. The page data includes a script text file (such as a JavaScript file), a style text file (such as a CSS file), text data, and so on. The script text file controls the jump and display logic of the web page, and the style text file controls the layout and display style of the page data in the web page. In the embodiment of the present application, the video decoding tool may be stored in the page data as a WebAssembly file.
In implementation, when a user opens the video upload page in a browser, the terminal sends a request to the server for the page data of the video upload page. After receiving the request, the server sends the page data of the video upload page to the terminal, and the browser then displays the video upload page based on the page data.
It should be noted that the video decoding tool in the embodiment of the present application may support many decoding modes, or only a few fixed decoding modes corresponding to the video formats supported by the video upload page. For example, as shown in fig. 1, if the video upload page supports only the four video formats mp4, mov, mkv, and webm, the video decoding tool may include only the decoding modes corresponding to these four formats. In this way, the decoding modes included in the video decoding tool can be tailored to the characteristics of the video upload page, which also reduces the space occupied by the tool.
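A minimal sketch of restricting the decoding tool to the formats the upload page accepts. The format-to-decoder mapping and the decoder names here are illustrative assumptions (the patent does not name specific codecs), not part of the disclosed implementation.

```javascript
// Hypothetical table: only the formats the upload page accepts get a
// decoding mode; everything else is rejected before any decoding starts.
const SUPPORTED_DECODERS = {
  mp4: "h264Decoder",
  mov: "h264Decoder",
  mkv: "vp9Decoder",
  webm: "vp9Decoder",
};

function selectDecoder(fileName) {
  const ext = fileName.split(".").pop().toLowerCase();
  const decoder = SUPPORTED_DECODERS[ext];
  if (!decoder) {
    throw new Error(`Format .${ext} is not supported by this upload page`);
  }
  return decoder;
}
```

Bundling only these decoding modes into the WebAssembly file is what keeps the tool's footprint small, as the paragraph above notes.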
Step 202, acquiring the target video uploaded through the video upload page, and performing a first decoding on the target video based on the video decoding tool to obtain a plurality of video frames in an intermediate format.
The target video in the embodiment of the present application is obtained through two encoding processes: the first encodes the decoded video frames into intermediate-format video frames, and the second encodes all the intermediate-format video frames into the target video. Since the target video is obtained through two encoding processes, two decoding processes are required: the first decoding process corresponds to step 202 of this embodiment, and the second corresponds to step 203.
An intermediate-format video frame is obtained by encoding a decoded video frame. Both are video frames; the difference is that the data amount of an intermediate-format video frame is smaller than that of a decoded video frame.
Optionally, after the terminal determines the target video to be uploaded, the target video may be decoded for the first time by the video decoding tool. The specific steps are as follows: after a video upload instruction is received, the video decoding tool in the page data is loaded, and after the video decoding tool is loaded, the target video is decoded for the first time based on the video decoding tool to obtain a plurality of intermediate-format video frames.
Here, the target video is the video that the user wants to upload.
In implementation, after the terminal receives a video upload instruction, it determines the storage address of the target video on the terminal's hard disk, finds the target video according to that address, reads the target video as binary data, and stores it in the terminal memory. Finally, the loaded video decoding tool performs the first decoding on the target video in the terminal memory to obtain a plurality of intermediate-format video frames.
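The "read as binary and store in the terminal memory" step can be modeled with a simple linear memory buffer standing in for the terminal memory (or a WebAssembly heap). This is an illustrative simulation under that assumption; the class and method names are hypothetical.

```javascript
// Toy linear memory: bytes are copied in sequentially and the start
// address of each copy is recorded, so later steps can address frames
// relative to it (as in step 203's address arithmetic).
class LinearMemory {
  constructor(size) {
    this.heap = new Uint8Array(size);
    this.nextFree = 0;
  }
  // Copy `bytes` into the heap and return the start address of the copy.
  store(bytes) {
    const startAddress = this.nextFree;
    this.heap.set(bytes, startAddress);
    this.nextFree += bytes.length;
    return startAddress;
  }
}
```

Storing the binary target video this way is what lets the terminal "directly acquire the initial memory address corresponding to the target video" mentioned later in the text: the address is simply where the copy began.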
Optionally, after the terminal finishes uploading the target video, the video decoding tool decodes the target video for the first time to obtain a plurality of intermediate-format video frames. The specific steps are as follows: after a video upload instruction is received, the video decoding tool in the page data is loaded, and after the target video is uploaded, the target video is decoded for the first time based on the video decoding tool to obtain a plurality of intermediate-format video frames.
For example, as shown in fig. 3, when the user clicks to select a video file, the video file selection page is entered. The user selects a video file on this page, clicks the confirm button, and returns to the video upload page. The instruction corresponding to the confirm button is taken as the video upload instruction, and the video decoding tool in the page data is loaded. After the video decoding tool is loaded, or after both the tool is loaded and the target video is uploaded, the loaded video decoding tool decodes the target video for the first time to obtain a plurality of intermediate-format video frames.
Step 203, performing, based on the video decoding tool, a second decoding on the intermediate-format video frames corresponding to at least one preset time point to obtain at least one first decoded video frame.
The page data may store at least one preset time point, which the terminal acquires when receiving the page data.
In implementation, the terminal acquires the at least one preset time point from the page data and, based on the video decoding tool, performs the second decoding on the intermediate-format video frames corresponding to the at least one preset time point to obtain at least one first decoded video frame.
Optionally, the frame sequence numbers corresponding to the at least one preset time point are determined according to the at least one preset time point and the frame interval duration; the memory addresses of the corresponding intermediate-format video frames are determined according to those frame sequence numbers, the initial memory address of the first intermediate-format video frame, and the frame data amount of each intermediate-format video frame; and, based on the video decoding tool, the second decoding is performed on the intermediate-format video frames indicated by the memory addresses to obtain the at least one first decoded video frame.
The frame interval duration between intermediate-format video frames is equal and preset by a technician, for example 10 ms. The terminal allocates an equal storage space for each intermediate-format video frame, i.e., the frame data amount of each intermediate-format video frame is equal. When the terminal reads the target video from the hard disk as binary data and stores it in the memory, the terminal can directly obtain the initial memory address corresponding to the target video.
In implementation, for any one of the at least one preset time point, the frame sequence number corresponding to the preset time point is determined according to the preset time point and the preset frame interval duration. The number of frames between the intermediate-format video frame corresponding to the preset time point and the first intermediate-format video frame is then determined from that frame sequence number. This number of frames is multiplied by the frame data amount, the product is added to the initial memory address, and the sum gives the memory address corresponding to the preset time point. The memory address for each preset time point is determined in the same way. The terminal then uses the video decoding tool to perform the second decoding on the intermediate-format video frames indicated by these memory addresses to obtain the at least one first decoded video frame.
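The address arithmetic just described can be sketched as follows, assuming equal frame intervals and equal per-frame storage as stated above. Variable and function names are illustrative.

```javascript
// A preset time point maps to a frame sequence number on a fixed grid of
// frameIntervalMs-spaced frames.
function frameNumberForTime(timeMs, frameIntervalMs) {
  return Math.floor(timeMs / frameIntervalMs);
}

// address = initial memory address of the first intermediate-format frame
//         + (frames skipped) * (frame data amount per frame)
function frameMemoryAddress(timeMs, frameIntervalMs, baseAddress, frameBytes) {
  return baseAddress + frameNumberForTime(timeMs, frameIntervalMs) * frameBytes;
}
```

With a 10 ms frame interval, 1024-byte frames, and a base address of 0, the preset time point 25 ms maps to frame 2 and memory address 2048; the decoding tool is then pointed at that address for the second decoding.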
In the embodiment of the present application, the preset time point which does not meet the condition may also be deleted. The terminal stores the duration corresponding to the target video, or reads the duration corresponding to the target video when the target video is uploaded on the video uploading page. Before the terminal determines the frame number corresponding to at least one preset time point, the maximum time point corresponding to the target video is determined according to the duration corresponding to the target video. And reserving preset time points which are less than or equal to the maximum time point, and determining first decoded video frames corresponding to the preset time points respectively based on the reserved preset time points. Deleting the preset time points which are larger than the maximum time point, and not determining the first decoding video frames respectively corresponding to the preset time points.
Alternatively, the terminal stores the maximum frame number corresponding to the target video. After determining the frame numbers corresponding to the at least one preset time point, the terminal may obtain the maximum frame number of the target video, retain the frame numbers less than or equal to the maximum frame number, and determine the corresponding first decoded video frames based on the retained frame numbers. Frame numbers greater than the maximum frame number are deleted, and the step of determining their first decoded video frames is not performed.
Alternatively, the terminal stores the terminating memory address corresponding to the target video, obtained as follows: when the terminal stores the binary file corresponding to the target video in the terminal memory, it can also record the terminating memory address of the video. After determining the memory addresses of the intermediate-format video frames corresponding to the at least one preset time point, the terminal determines whether each memory address lies before the terminating memory address. If it does, the terminal performs second decoding on the intermediate-format video frame indicated by that memory address based on the video decoding tool to obtain at least one first decoded video frame. If the memory address lies after the terminating memory address, it is deleted.
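The three deletion schemes above amount to simple range checks. A hedged sketch (hypothetical names; not the embodiment's actual code) might look like:

```typescript
// Illustrative validity filters for preset time points, frame numbers, and
// memory addresses; the thresholds come from the stored video metadata.
function retainTimePoints(pointsMs: number[], maxTimePointMs: number): number[] {
  // Keep time points less than or equal to the maximum time point.
  return pointsMs.filter((t) => t <= maxTimePointMs);
}

function retainFrameNumbers(frameNumbers: number[], maxFrameNumber: number): number[] {
  // Keep frame numbers less than or equal to the maximum frame number.
  return frameNumbers.filter((f) => f <= maxFrameNumber);
}

function retainAddresses(addresses: number[], terminatingAddress: number): number[] {
  // Keep only addresses that lie before the terminating memory address.
  return addresses.filter((a) => a < terminatingAddress);
}
```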
Step 204: from the at least one first decoded video frame, acquire a target decoded video frame as the cover image of the target video.
Optionally, displaying at least one first decoded video frame on the video upload page; when a selection instruction for a target decoded video frame among the at least one first decoded video frame is received, the target decoded video frame is determined as a cover image of the target video.
In implementation, the at least one first decoded video frame is displayed on the video upload page. When the terminal receives the selection instruction, it determines the target decoded video frame corresponding to the selection instruction as the cover image of the target video.
Taking as an example the case where the terminal acquires only one preset time point in the page data: after the second decoding, only one first decoded video frame is obtained, so the terminal can directly display that first decoded video frame on the video upload page as the cover image of the target video; the specific display result is shown in fig. 3.
Optionally, when a manual cover searching instruction is received, displaying a time axis corresponding to the target video; when a selection instruction corresponding to a target time point on a time axis is received, determining at least one video frame in an intermediate format from a plurality of video frames in the intermediate format stored in a terminal memory based on the target time point; performing secondary decoding on the at least one video frame with the intermediate format based on a video decoding tool to obtain at least one second decoded video frame; and acquiring a target decoded video frame as a cover image of the target video in the at least one second decoded video frame.
The manual cover search instruction may be triggered by a virtual key on the terminal, by a physical key on the terminal, or by a specific user gesture, which is not specifically limited in this application. For example, when the user operates the mouse to click the cover image on the page, the manual cover search instruction is triggered.
In implementation, when none of the decoded video frames displayed on the video upload page is the cover the user wants, the user may click a certain position on the video upload page, or perform some other operation, to trigger the manual cover search instruction. The terminal receives the manual cover search instruction and displays the time axis, so that the user can select a time point on it. When the terminal receives a selection instruction corresponding to a target time point on the time axis, it determines, based on the target time point, at least one intermediate-format video frame among the plurality of intermediate-format video frames stored in the terminal memory. The terminal uses the video decoding tool to perform second decoding on the at least one intermediate-format video frame to obtain at least one second decoded video frame. The terminal displays the at least one second decoded video frame and, upon receiving a selection instruction for a target decoded video frame, takes that frame as the cover image of the target video.
As shown in fig. 6, when the user operates the mouse to click a certain position on the page, the manual cover search instruction is triggered. The terminal may then display a manual cover search page showing a time axis, a time control on the time axis, 6 initial second decoded video frames, and a plurality of buttons. The time control lets the user select a time on the time axis, and the initial second decoded video frames are the second decoded video frames determined based on an initial time point. The user drags the time control so that it stays at a certain position. When the terminal detects that the time control has stayed at that position for longer than a preset value, it determines the target time point corresponding to the position, determines at least one second decoded video frame according to the target time point, and displays the corresponding thumbnails on the manual cover search page. With thumbnails of 6 second decoded video frames displayed on the manual cover search page, after receiving a selection instruction for the thumbnail of the first of these second decoded video frames, the terminal displays an enlarged image of that second decoded video frame on the manual cover search page. When the user clicks the confirm button, that second decoded video frame is taken as the cover image of the target video.
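The dwell detection on the time control (stay time exceeding a preset value) can be sketched as a pure function over position samples. This is a hypothetical illustration; the embodiment does not specify this logic, and all names are invented:

```typescript
// Illustrative dwell detection for the time control: the target time point is
// selected only once the control has stayed at one position for at least
// dwellMs milliseconds. Names and structure are hypothetical.
interface DwellState {
  positionMs: number; // time-axis position the control is resting at
  sinceMs: number;    // wall-clock time when it arrived at that position
}

function updateDwell(
  state: DwellState | null,
  positionMs: number,
  nowMs: number,
  dwellMs: number
): { state: DwellState; triggered: boolean } {
  if (state === null || state.positionMs !== positionMs) {
    // The control moved: restart the dwell clock at the new position.
    return { state: { positionMs, sinceMs: nowMs }, triggered: false };
  }
  // Same position: trigger once the stay time reaches the preset value.
  return { state, triggered: nowMs - state.sinceMs >= dwellMs };
}
```

Modeling the check as a pure function over timestamps (rather than a timer callback) keeps the trigger condition easy to test and independent of the page's event loop.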
The embodiment of the application provides three modes for determining at least one intermediate-format video frame based on a target time point. The specific steps are as follows:
In the first mode, a first video frame corresponding to the target time point, together with a preset number of video frames before it and a preset number of video frames after it, are determined from the plurality of intermediate-format video frames stored in the terminal memory.
Wherein the preset number may be preset empirically by the technician, for example, the technician sets the preset number to 5.
In implementation, among the plurality of intermediate-format video frames stored in the terminal memory, the frame number corresponding to the target time point is determined according to the target time point and the frame interval duration; the first video frame corresponding to the target time point, together with the preset number of video frames before it and the preset number of video frames after it, are then determined.
The preset number of video frames before/after the first video frame may be the preset number of intermediate-format video frames immediately adjacent to the first video frame, or a preset number of non-adjacent intermediate-format video frames before/after it.
In order to ensure that the differences among the preset number of intermediate-format video frames are large, the similarity between intermediate-format video frames can be detected in advance, and intermediate-format video frames whose similarity is smaller than a preset threshold are recommended to the user. Taking the determination of the n non-adjacent intermediate-format video frames after the first video frame as an example: the terminal is provided with a similarity calculation model, which calculates the similarity between the first video frame and each subsequent intermediate-format video frame until a second video frame whose similarity with the first video frame is lower than the preset threshold is found; this second video frame is taken as the first of the n non-adjacent frames. The model then calculates the similarity between the second video frame and each intermediate-format video frame after it, until a third video frame whose similarity with the second video frame is lower than the preset threshold is found; this third video frame is taken as the second of the n non-adjacent frames. In this way, the nth of the n non-adjacent intermediate-format video frames is determined. Here n denotes the preset number and is an integer greater than or equal to 1.
The similarity calculation model may be a machine learning model or a non-machine-learning model. For example, the similarity calculation model may calculate the Euclidean distance between any two intermediate-format video frames and compute their similarity based on the Euclidean distance and a Softmax function.
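A hedged sketch of the iterative selection with a distance-based similarity follows. The exact similarity formula is not fixed by the embodiment; here a simple 1/(1 + distance) mapping stands in for the distance-plus-Softmax variant, and all names are illustrative:

```typescript
// Euclidean distance between two frames represented as flat pixel buffers.
function euclideanDistance(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Map distance into (0, 1]; identical frames have similarity 1. This is one
// plausible stand-in for the distance-based similarity described above.
function similarity(a: Float32Array, b: Float32Array): number {
  return 1 / (1 + euclideanDistance(a, b));
}

// Pick up to n non-adjacent frames after startIdx, advancing the anchor each
// time a frame's similarity to the last picked frame falls below threshold.
function pickDissimilarFrames(
  frames: Float32Array[],
  startIdx: number,
  n: number,
  threshold: number
): number[] {
  const picked: number[] = [];
  let anchor = startIdx;
  for (let i = startIdx + 1; i < frames.length && picked.length < n; i++) {
    if (similarity(frames[anchor], frames[i]) < threshold) {
      picked.push(i);
      anchor = i; // subsequent comparisons use the newly picked frame
    }
  }
  return picked;
}
```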
In an alternative scheme, after the first video frame corresponding to the target time point is determined and before the preset number of video frames before and after it are determined, the terminal analyzes the content of the video and classifies the video according to that content to obtain the category corresponding to the video. Based on the category and the relationship between categories and extracted content, video frames corresponding to the extracted content are preferentially extracted from the video, and the preset number of video frames before and after the first video frame are determined among those frames.
For example, when the category of the video is determined to be dance, video frames that correspond to the dance category and include faces are preferentially extracted, and the preset number of video frames before and after the first video frame are determined among the video frames that include faces.
In another alternative, before the first video frame corresponding to the target time point is determined, a priority may be set for each video frame based on its content; the preset number of video frames before and after the first video frame are then determined among the high-priority video frames.
For example, based on the content of the video frames, the priority of video frames with rich content may be set high, and the priority of pure-color video frames may be set low.
In a further alternative, pure-color video frames are filtered out before the first video frame corresponding to the target time point is determined. The first video frame, and the preset number of video frames before and after it, are then determined from the filtered video frames.
In the second mode, a target time range of preset duration is determined with the target time point as its center; the intermediate-format video frames within the target time range are then determined from the plurality of intermediate-format video frames stored in the terminal memory.
In implementation, a target time range of preset duration is determined with the target time point as its center, and the maximum and minimum time points of the range are determined. The intermediate-format video frames corresponding to the maximum and minimum time points are determined from the plurality of intermediate-format video frames stored in the terminal memory and are taken as a fourth video frame and a fifth video frame, respectively. The intermediate-format video frames between the fourth and fifth video frames are determined as the intermediate-format video frames within the target time range.
It should be noted that the intermediate-format video frames within the target time range may or may not include the fourth and fifth video frames themselves.
In the embodiment of the present application, every intermediate-format video frame between the fourth and fifth video frames may be determined as an intermediate-format video frame within the target time range, or a preset number of non-adjacent intermediate-format video frames between them may be so determined.
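The second mode reduces to computing the index range covered by the target time range. A minimal sketch under the same fixed-frame-interval assumption as above (hypothetical names):

```typescript
// Illustrative mode-two selection: return indices of the intermediate-format
// frames whose time falls within a range of rangeMs centered on targetMs.
function frameIndicesInRange(
  targetMs: number,
  rangeMs: number,
  frameIntervalMs: number,
  totalFrames: number
): number[] {
  const minTime = Math.max(0, targetMs - rangeMs / 2); // minimum time point
  const maxTime = targetMs + rangeMs / 2;              // maximum time point
  const first = Math.ceil(minTime / frameIntervalMs);
  const last = Math.min(totalFrames - 1, Math.floor(maxTime / frameIntervalMs));
  const indices: number[] = [];
  for (let i = first; i <= last; i++) indices.push(i);
  return indices;
}
```

Whether the boundary frames (the fourth and fifth video frames above) are included is a design choice; this sketch includes them.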
In the third mode, a target time range of preset duration is determined with the target time point as its center; a preset number of intermediate-format video frames within the target time range are then determined from the plurality of intermediate-format video frames stored in the terminal memory.
Step 205: upload the cover image through the video upload page.
In implementation, after the user selects the cover image, it is uploaded through the video upload page.
Optionally, after acquiring a target decoded video frame from at least one second decoded video frame and using the target decoded video frame as a cover image of a target video, deleting a plurality of video frames in an intermediate format stored in a memory of the terminal; or deleting a plurality of video frames in the intermediate format stored in the memory of the terminal after the cover page image is uploaded through the video uploading page.
In the embodiment of the application, the plurality of intermediate-format video frames stored in the terminal memory are deleted only after the cover image of the target video is acquired, or after the cover image is uploaded. Thus, the target video stored on the hard disk does not need to be read again after the target time point is acquired, which saves the time for extracting decoded video frames and enables them to be displayed on the terminal quickly.
After uploading the cover image, the user can also fill in related information about the target video on the video upload page. After finishing, the user can click the submit button on the video upload page. The terminal receives the submission instruction and sends the target video, the cover image corresponding to the target video, and the related information of the target video to the server.
For example, as shown in fig. 6, when the user opens the video upload page corresponding to website A, the user may upload the target video and the cover image corresponding to the target video, fill in the related information on the video upload page, and click the submit button. After receiving the submission instruction, the terminal sends the target video, the cover image corresponding to the target video, and the related information of the target video to the server corresponding to website A.
When the decoded video frames are obtained, the video decoder of the browser is not used; instead, the video decoding tool in the page data of the video upload page is adopted. This solves the problem in the prior art that the browser's video decoder can only decode a limited set of video formats and therefore cannot obtain a video cover.
Fig. 6 is a schematic structural diagram of an apparatus for acquiring a video cover according to an embodiment of the present application, and referring to fig. 6, the apparatus is applied to a terminal installed with a browser, and includes:
a first obtaining module 610, configured to obtain page data of a video upload page, and display the video upload page based on the page data, where the page data includes a video decoding tool;
a first decoding module 620, configured to obtain the target video uploaded through the video upload page, and perform first decoding on the target video based on the video decoding tool to obtain a plurality of video frames in an intermediate format;
a second decoding module 630, configured to perform second decoding on the video frame with the intermediate format corresponding to the at least one preset time point based on the video decoding tool, so as to obtain at least one first decoded video frame;
a second obtaining module 640 configured to obtain, in the at least one first decoded video frame, a target decoded video frame as a cover image of the target video;
an upload module 650 configured to upload the cover image through the video upload page.
Optionally, the first decoding module 620 is configured to:
after a video uploading instruction is received, load the video decoding tool in the page data and, after the video decoding tool is loaded, perform first decoding on the target video based on the video decoding tool to obtain a plurality of video frames in an intermediate format; or
after a video uploading instruction is received, load the video decoding tool in the page data and, after the target video is uploaded, perform first decoding on the target video based on the video decoding tool to obtain a plurality of video frames in an intermediate format.
Optionally, the second obtaining module 640 is configured to:
displaying the at least one first decoded video frame on the video upload page;
when a selection instruction for a target decoded video frame of the at least one first decoded video frame is received, determining the target decoded video frame as a cover image of the target video.
Optionally, the apparatus further comprises a manual cover finding module configured to:
when a manual cover searching instruction is received, displaying a time axis corresponding to the target video;
when a selection instruction corresponding to a target time point on the time axis is received, determining at least one video frame in an intermediate format from a plurality of video frames in the intermediate format stored in a terminal memory based on the target time point;
performing second decoding on the at least one video frame of the intermediate format based on the video decoding tool to obtain at least one second decoded video frame;
and acquiring a target decoded video frame from the at least one second decoded video frame to serve as a cover image of the target video.
Optionally, the manual cover finding module is configured to:
determining a target time range of preset duration by taking the target time point as a central time point;
and determining the video frames with the intermediate format in the target time range from the plurality of video frames with the intermediate format stored in the terminal memory.
Optionally, the manual cover finding module is configured to:
and determining a first video frame corresponding to the target time point, and a preset number of video frames before and a preset number of video frames after the first video frame in a plurality of video frames with intermediate formats stored in the terminal memory.
Optionally, the apparatus further includes a deletion module configured to:
after acquiring, from the at least one second decoded video frame, a target decoded video frame as the cover image of the target video, delete the plurality of intermediate-format video frames stored in the terminal memory; or
after the cover image is uploaded through the video upload page, delete the plurality of intermediate-format video frames stored in the terminal memory.
Optionally, the second decoding module 630 is configured to:
determining frame serial numbers respectively corresponding to the at least one preset time point according to the at least one preset time point and the frame interval duration;
determining the memory addresses of the intermediate-format video frames corresponding to the at least one preset time point, according to the frame numbers corresponding to the at least one preset time point, the start memory address of the first intermediate-format video frame, and the frame data amount of an intermediate-format video frame;
and performing secondary decoding on the video frame with the intermediate format indicated by the memory address based on the video decoding tool to obtain at least one first decoded video frame.
It should be noted that: in the apparatus for acquiring a video cover page provided in the above embodiment, when acquiring a video cover page, only the division of the above functional modules is used for illustration, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the above described functions. In addition, the apparatus for acquiring a video cover and the method for acquiring a video cover provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.
Fig. 7 shows a block diagram of a terminal 700 according to an exemplary embodiment of the present application. The terminal 700 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal 700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so on.
In general, terminal 700 includes: a processor 701 and a memory 702.
The processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit) which is responsible for rendering and drawing the content required to be displayed by the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. Memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 702 is used to store at least one program code for execution by processor 701 to implement the method of acquiring video covers provided by the method embodiments herein.
In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral devices include: at least one of a radio frequency circuit 704, a display screen 705, a camera assembly 706, an audio circuit 707, a positioning component 708, and a power source 709.
The peripheral interface 703 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 701 and the memory 702. In some embodiments, processor 701, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 704 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 704 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 also has the ability to capture touch signals on or over the surface of the display screen 705. The touch signal may be input to the processor 701 as a control signal for processing. At this point, the display 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 705 may be one, disposed on a front panel of the terminal 700; in other embodiments, the display 705 can be at least two, respectively disposed on different surfaces of the terminal 700 or in a folded design; in other embodiments, the display 705 may be a flexible display disposed on a curved surface or on a folded surface of the terminal 700. Even more, the display 705 may be arranged in a non-rectangular irregular pattern, i.e. a shaped screen. The Display 705 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or the like.
The camera assembly 706 is used to capture images or video. Optionally, the camera assembly 706 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fused shooting functions. In some embodiments, camera assembly 706 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuitry 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing or inputting the electric signals to the radio frequency circuit 704 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 700. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 707 may also include a headphone jack.
The positioning component 708 is used to locate the current geographic location of the terminal 700 for navigation or LBS (Location Based Service). The positioning component 708 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
Power supply 709 is provided to supply power to various components of terminal 700. The power source 709 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When power source 709 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 700 also includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyro sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.
The acceleration sensor 711 can detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the terminal 700. For example, the acceleration sensor 711 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 701 may control the display screen 705 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 711. The acceleration sensor 711 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 712 may detect a body direction and a rotation angle of the terminal 700, and the gyro sensor 712 may cooperate with the acceleration sensor 711 to acquire a 3D motion of the terminal 700 by the user. From the data collected by the gyro sensor 712, the processor 701 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 713 may be disposed on a side frame of terminal 700 and/or underneath display 705. When the pressure sensor 713 is disposed on a side frame of the terminal 700, a user's grip signal on the terminal 700 may be detected, and the processor 701 performs right-left hand recognition or shortcut operation according to the grip signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at a lower layer of the display screen 705, the processor 701 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 705. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 714 collects the user's fingerprint, and either the processor 701 or the fingerprint sensor 714 itself identifies the user from the collected fingerprint. When the user is identified as a trusted identity, the processor 701 authorizes the user to perform sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 714 may be disposed on the front, back, or side of the terminal 700. When a physical button or a vendor logo is provided on the terminal 700, the fingerprint sensor 714 may be integrated with it.
The optical sensor 715 collects the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the display screen 705 based on the ambient light intensity collected by the optical sensor 715: when the ambient light intensity is high, the display brightness is increased; when it is low, the display brightness is decreased. In another embodiment, the processor 701 may also dynamically adjust the shooting parameters of the camera assembly 707 according to the ambient light intensity collected by the optical sensor 715.
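A minimal sketch of the brightness policy described above, assuming a simple linear mapping; the embodiment only states that brightness moves with ambient light, so the function name, the clamping range, and all constants are illustrative assumptions:

```python
def display_brightness(ambient_lux: float,
                       min_b: float = 0.2,
                       max_b: float = 1.0,
                       max_lux: float = 1000.0) -> float:
    """Map ambient light intensity to a display brightness fraction.

    Brightness rises linearly with ambient light and is clamped to
    [min_b, max_b]; every constant here is an illustrative assumption.
    """
    frac = min(max(ambient_lux / max_lux, 0.0), 1.0)
    return min_b + frac * (max_b - min_b)
```

For example, total darkness yields the floor brightness `min_b`, and any reading at or above `max_lux` saturates at `max_b`.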
The proximity sensor 716, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 700 and collects the distance between the user and the front surface of the terminal 700. In one embodiment, when the proximity sensor 716 detects that this distance is gradually decreasing, the processor 701 controls the display 705 to switch from the bright-screen state to the dark-screen state; when the proximity sensor 716 detects that the distance is gradually increasing, the processor 701 controls the display 705 to switch from the dark-screen state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in Fig. 7 does not limit the terminal 700, which may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
The computer device provided by the embodiments of the present application may also be provided as a server. Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application. The server 800 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 801 and one or more memories 802, where the memory 802 stores at least one program code that is loaded and executed by the processor 801 to implement the method for acquiring a video cover provided by the above method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and may include other components for implementing device functions, which are not described here again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including program code, is also provided. The program code is executable by a processor in a terminal or a server to perform the method of acquiring a video cover in the above embodiments. For example, the computer-readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a compact-disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by program code instructing the relevant hardware, where the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc.
The above description covers only exemplary embodiments of the present application and is not intended to be limiting; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

1. A method of obtaining a video cover, the method comprising:
acquiring page data of a video uploading page, and displaying the video uploading page based on the page data, wherein the page data comprises a video decoding tool;
acquiring a target video uploaded through the video uploading page, and performing first decoding on the target video based on the video decoding tool to obtain a plurality of video frames in an intermediate format;
performing, based on the video decoding tool, second decoding on the video frames in the intermediate format corresponding to at least one preset time point to obtain at least one first decoded video frame;
acquiring a target decoded video frame from the at least one first decoded video frame to serve as a cover image of the target video;
and uploading the cover image through the video uploading page.
2. The method of claim 1, wherein the acquiring a target video uploaded through the video uploading page and performing first decoding on the target video based on the video decoding tool to obtain a plurality of video frames in an intermediate format comprises:
after a video uploading instruction is received, loading the video decoding tool in the page data, and after the video decoding tool is loaded, performing first decoding on the target video based on the video decoding tool to obtain the plurality of video frames in the intermediate format; or
after a video uploading instruction is received, loading the video decoding tool in the page data, and after the target video is uploaded, performing first decoding on the target video based on the video decoding tool to obtain the plurality of video frames in the intermediate format.
3. The method according to claim 1, wherein the acquiring, from the at least one first decoded video frame, a target decoded video frame to serve as a cover image of the target video comprises:
displaying the at least one first decoded video frame on the video upload page;
when a selection instruction for a target decoded video frame of the at least one first decoded video frame is received, determining the target decoded video frame as a cover image of the target video.
4. The method of claim 3, wherein after displaying the at least one first decoded video frame on the video upload page, the method further comprises:
when a manual cover search instruction is received, displaying a time axis corresponding to the target video;
when a selection instruction for a target time point on the time axis is received, determining, based on the target time point, at least one video frame in the intermediate format from the plurality of video frames in the intermediate format stored in a terminal memory;
performing second decoding on the at least one video frame in the intermediate format based on the video decoding tool to obtain at least one second decoded video frame;
and acquiring a target decoded video frame from the at least one second decoded video frame to serve as a cover image of the target video.
5. The method according to claim 4, wherein the determining, based on the target time point, at least one video frame in the intermediate format from the plurality of video frames in the intermediate format stored in the terminal memory comprises:
determining a target time range of a preset duration with the target time point as its center time point;
and determining, from the plurality of video frames in the intermediate format stored in the terminal memory, the video frames in the intermediate format within the target time range.
6. The method according to claim 4, wherein the determining, based on the target time point, at least one video frame in the intermediate format from the plurality of video frames in the intermediate format stored in the terminal memory comprises:
and determining, from the plurality of video frames in the intermediate format stored in the terminal memory, a first video frame corresponding to the target time point, together with a preset number of video frames before and a preset number of video frames after the first video frame.
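The two alternative selection strategies of claims 5 and 6 reduce to a time-window filter and an index-window slice, respectively. A sketch follows (not part of the claims; function names, millisecond units, and the example frame spacing are illustrative assumptions):

```python
def frames_in_centered_range(frame_times, target_ms, duration_ms):
    """Claim 5 style: keep frames whose timestamps fall within a range
    of the preset duration centered on the target time point."""
    half = duration_ms / 2
    return [t for t in frame_times if target_ms - half <= t <= target_ms + half]


def frames_around_index(frame_count, target_index, preset_number):
    """Claim 6 style: the frame at the target time point plus a preset
    number of frames before and after it, clamped to valid indices."""
    lo = max(0, target_index - preset_number)
    hi = min(frame_count - 1, target_index + preset_number)
    return list(range(lo, hi + 1))


# Frames every 40 ms; a 200 ms window centered on t = 1000 ms:
times = [i * 40 for i in range(50)]
print(frames_in_centered_range(times, 1000, 200))  # [920, 960, 1000, 1040, 1080]
# Frame 25 plus two neighbors on each side:
print(frames_around_index(50, 25, 2))              # [23, 24, 25, 26, 27]
```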
7. The method of claim 4, further comprising:
after acquiring, from the at least one second decoded video frame, a target decoded video frame as the cover image of the target video, deleting the plurality of video frames in the intermediate format stored in the terminal memory; or
deleting the plurality of video frames in the intermediate format stored in the terminal memory after the cover image is uploaded through the video uploading page.
8. The method according to claim 1, wherein the performing, based on the video decoding tool, second decoding on the video frames in the intermediate format corresponding to the at least one preset time point to obtain at least one first decoded video frame comprises:
determining frame sequence numbers respectively corresponding to the at least one preset time point according to the at least one preset time point and the frame interval duration;
determining memory addresses of the video frames in the intermediate format respectively corresponding to the at least one preset time point according to the frame sequence numbers respectively corresponding to the at least one preset time point, the start memory address of the first video frame in the intermediate format, and the frame data amount per video frame in the intermediate format;
and performing, based on the video decoding tool, second decoding on the video frames in the intermediate format indicated by the memory addresses to obtain the at least one first decoded video frame.
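The frame-location arithmetic of claim 8 can be sketched in a few lines; the units (milliseconds, bytes), the names, and the assumption of a fixed frame interval and fixed per-frame data amount are all illustrative:

```python
def frame_memory_address(time_point_ms: int,
                         frame_interval_ms: int,
                         base_addr: int,
                         frame_size: int) -> tuple[int, int]:
    """Locate the intermediate-format frame for a preset time point.

    The frame sequence number is the time point divided by the frame
    interval duration; its memory address is the start address of the
    first frame plus the sequence number times the per-frame data amount.
    """
    seq = time_point_ms // frame_interval_ms
    return seq, base_addr + seq * frame_size


# 25 fps (40 ms per frame), 1.5 MB frames stored from address 0:
# the frame at t = 2000 ms is frame 50, at byte offset 75_000_000.
seq, addr = frame_memory_address(2000, 40, 0, 1_500_000)
```

The second decoding then reads `frame_size` bytes at `addr` and hands them to the decoder, avoiding a full re-decode of the video.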
9. A terminal, characterized in that the terminal comprises a processor and a memory, wherein the memory stores at least one program code, and the at least one program code is loaded and executed by the processor to implement the operations performed by the method for acquiring a video cover according to any one of claims 1 to 8.
10. A computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to perform the operations performed by the method for acquiring a video cover according to any one of claims 1 to 8.
CN202110565290.4A 2021-05-24 2021-05-24 Method, terminal and storage medium for acquiring video cover Active CN113301422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110565290.4A CN113301422B (en) 2021-05-24 2021-05-24 Method, terminal and storage medium for acquiring video cover


Publications (2)

Publication Number Publication Date
CN113301422A true CN113301422A (en) 2021-08-24
CN113301422B CN113301422B (en) 2023-05-02

Family

ID=77324229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110565290.4A Active CN113301422B (en) 2021-05-24 2021-05-24 Method, terminal and storage medium for acquiring video cover

Country Status (1)

Country Link
CN (1) CN113301422B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635747A (en) * 2018-12-14 2019-04-16 央视国际网络无锡有限公司 The automatic abstracting method of video cover and device
CN110110140A (en) * 2019-04-19 2019-08-09 天津大学 Video summarization method based on attention expansion coding and decoding network
CN110324706A (en) * 2018-03-30 2019-10-11 优酷网络技术(北京)有限公司 A kind of generation method, device and the computer storage medium of video cover
CN110446063A (en) * 2019-07-26 2019-11-12 腾讯科技(深圳)有限公司 Generation method, device and the electronic equipment of video cover
CN110879851A (en) * 2019-10-15 2020-03-13 北京三快在线科技有限公司 Video dynamic cover generation method and device, electronic equipment and readable storage medium
CN112511897A (en) * 2020-10-26 2021-03-16 长沙市到家悠享网络科技有限公司 Video cover setting method, device, equipment and storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114884925A (en) * 2022-04-18 2022-08-09 深圳市绿联科技股份有限公司 Method, device and system for transmitting composite photo data and electronic equipment
CN114884925B (en) * 2022-04-18 2023-04-18 深圳市绿联科技股份有限公司 Method, device and system for transmitting composite photo data and electronic equipment

Also Published As

Publication number Publication date
CN113301422B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN110022489B (en) Video playing method, device and storage medium
CN108965922B (en) Video cover generation method and device and storage medium
CN109327608B (en) Song sharing method, terminal, server and system
CN111327694B (en) File uploading method and device, storage medium and electronic equipment
CN111753784A (en) Video special effect processing method and device, terminal and storage medium
CN110288689B (en) Method and device for rendering electronic map
CN112667835A (en) Work processing method and device, electronic equipment and storage medium
CN111459363A (en) Information display method, device, equipment and storage medium
CN111083526B (en) Video transition method and device, computer equipment and storage medium
CN110677713B (en) Video image processing method and device and storage medium
CN111625315A (en) Page display method and device, electronic equipment and storage medium
CN111586279B (en) Method, device and equipment for determining shooting state and storage medium
CN111510482A (en) Method and device for determining failed network request and computer storage medium
CN111083554A (en) Method and device for displaying live gift
CN112616082A (en) Video preview method, device, terminal and storage medium
CN110191236B (en) Song playing queue management method and device, terminal equipment and storage medium
CN113301422B (en) Method, terminal and storage medium for acquiring video cover
CN111275607A (en) Interface display method and device, computer equipment and storage medium
CN109189525B (en) Method, device and equipment for loading sub-page and computer readable storage medium
CN110267114B (en) Video file playing method, device, terminal and storage medium
CN114388001A (en) Multimedia file playing method, device, equipment and storage medium
CN109597951B (en) Information sharing method and device, terminal and storage medium
CN108664421B (en) Method, device and storage medium for acquiring multimedia data
CN111464829A (en) Method, device and equipment for switching media data and storage medium
CN111241451A (en) Webpage processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant