CN115914738B - Video generation method, device, server and storage medium - Google Patents

Video generation method, device, server and storage medium

Info

Publication number
CN115914738B
CN115914738B
Authority
CN
China
Prior art keywords
video
video frame
matched
frame
target
Prior art date
Legal status
Active
Application number
CN202211389242.5A
Other languages
Chinese (zh)
Other versions
CN115914738A
Inventor
刘志红
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202211389242.5A
Publication of CN115914738A
Application granted
Publication of CN115914738B

Landscapes

  • Television Signal Processing For Recording (AREA)

Abstract

The embodiment of the invention provides a video generation method, apparatus, server, and storage medium, relating to the field of image processing. The method comprises the following steps: acquiring a picture fingerprint of a video frame to be matched in a first video; acquiring a second video corresponding to the first video; determining, from the second video, a video frame whose picture fingerprint is consistent with that of the video frame to be matched, as a target video frame; and editing the target video frame according to the editing processing mode corresponding to the video frame to be matched, to obtain the target video. By applying the embodiment of the invention, the second video is edited automatically by the server rather than manually by a user, which reduces time cost and labor cost and thereby improves video generation efficiency.

Description

Video generation method, device, server and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a video generating method, a video generating device, a server, and a storage medium.
Background
During a television live broadcast, the live signal source sends live video to servers deployed in different areas. Accordingly, each server can provide live services to users in its area based on the received video. To provide a higher-quality live service, a user may edit the video received by the server (referred to as a first video), for example by encoding it or adding special effects, to obtain a target video.
In the related art, if the video quality of the first video received by the server is low, for example because some video frames are missing or the resolution is low, the server can acquire the video sent by the same live signal source and received by a server deployed in another area (referred to as a second video), and the user performs the same editing on the second video according to the processing applied to the first video, to obtain the target video. For example, if a special effect was added to the first video frame of the first video, the user again adds the same effect to the first video frame of the second video.
However, manually editing multiple videos multiple times requires considerable time and labor, so video generation in the related art is inefficient.
Disclosure of Invention
The embodiment of the invention aims to provide a video generation method, a video generation device, a server and a storage medium, so as to improve video generation efficiency. The specific technical scheme is as follows:
in a first aspect of the present invention, there is provided a video generating method, the method comprising:
Acquiring a picture fingerprint of a video frame to be matched in a first video; wherein the video frames to be matched include: video frames in the first video that have undergone editing processing; the picture fingerprint of a video frame to be matched is determined based on the pixel points contained in that video frame; and the first video is obtained based on a video sent by a live signal source;
acquiring a second video corresponding to the first video; wherein the second video is obtained based on the same video sent by the live signal source through a standby transmission line;
Determining a video frame consistent with the picture fingerprint of the video frame to be matched from the second video as a target video frame;
And editing the target video frame according to the editing processing mode corresponding to the video frame to be matched to obtain the target video.
Optionally, the video frame to be matched includes: in the first video, a start video frame and an end video frame in video frames subjected to editing processing;
the determining, from the second video, a video frame consistent with the picture fingerprint of the video frame to be matched as a target video frame includes:
Determining a video frame consistent with the picture fingerprint of the initial video frame from the second video to obtain a target video frame corresponding to the initial video frame, and determining a video frame consistent with the picture fingerprint of the end video frame to obtain a target video frame corresponding to the end video frame;
determining the position of a target video frame corresponding to the starting video frame in the second video as a first position, and determining the position of a target video frame corresponding to the ending video frame in the second video as a second position;
Determining, from the second video, a video segment from a target video frame corresponding to the start video frame to a target video frame corresponding to the end video frame as a first video segment, in a case where the first position coincides with a position of the start video frame in the first video and the second position coincides with a position of the end video frame in the first video;
And for each video frame to be matched between the start video frame and the end video frame, determining the video frame at the position corresponding to that video frame in the first video segment, to obtain the target video frame corresponding to the video frame to be matched.
Optionally, before determining, from the second video, a video frame consistent with the picture fingerprint of the video frame to be matched as the target video frame, the method further includes:
Determining a key video frame corresponding to the video frame to be matched in the first video;
Acquiring a time stamp and a picture fingerprint of a key video frame corresponding to the video frame to be matched; wherein the timestamp of a video frame indicates the position of the video frame in the belonging video;
the determining, from the second video, a video frame consistent with the picture fingerprint of the video frame to be matched as a target video frame includes:
Determining an alternative video frame from the second video based on the timestamp and the picture fingerprint of the key video frame corresponding to the video frame to be matched;
And determining a video frame consistent with the picture fingerprint of the video frame to be matched from the alternative video frames as a target video frame.
Optionally, the determining, based on the timestamp and the picture fingerprint of the key video frame corresponding to the video frame to be matched, an alternative video frame from the second video includes:
Determining, from the second video according to the timestamp of the key video frame corresponding to the video frame to be matched, a video segment of a specified duration that contains the video frame corresponding to the timestamp, as a second video segment;
determining, from the second video segment, a video frame whose picture fingerprint is consistent with that of the key video frame corresponding to the video frame to be matched, to obtain an intermediate video frame;
And determining a video frame taking the intermediate video frame as a key video frame from the second video as an alternative video frame.
Optionally, before the capturing the picture fingerprint of the video frame to be matched in the first video, the method further includes:
splitting the first video according to a preset time interval to obtain a plurality of sub-videos;
determining a video frame which is edited by a user from the first video as a video frame to be matched;
For each video frame to be matched, acquiring a timestamp and a picture fingerprint of the key video frame in the sub-video to which that video frame belongs;
Storing the offset of each video frame to be matched relative to the corresponding key video frame in the sub-video, the time stamp and the picture fingerprint of the corresponding key video frame as editing information of the first video into a preset database;
The obtaining the picture fingerprint of the video frame to be matched in the first video includes:
and acquiring the picture fingerprint of the video frame to be matched in the first video based on the editing information of the first video.
Optionally, the obtaining, based on the editing information of the first video, a picture fingerprint of a video frame to be matched in the first video includes:
acquiring editing information of the first video from the preset database;
Determining a sub-video to which the key video frame belongs as a first sub-video according to the time stamp of the key video frame in the editing information;
Extracting the video frames to be matched from the first sub-video according to the offset of the video frames to be matched in the editing information relative to the corresponding key video frames;
And calculating the picture fingerprint of the video frame to be matched according to the pixel points contained in the video frame to be matched.
In a second aspect of the implementation of the present invention, there is also provided a video generating apparatus, including:
the first fingerprint acquisition module is used for acquiring the picture fingerprints of the video frames to be matched in the first video; wherein the video frames to be matched include: video frames in the first video that have undergone editing processing; the picture fingerprint of a video frame to be matched is determined based on the pixel points contained in that video frame; and the first video is obtained based on a video sent by a live signal source;
The video acquisition module is used for acquiring a second video corresponding to the first video; wherein the second video is obtained based on the same video sent by the live signal source through a standby transmission line;
the target video frame determining module is used for determining a video frame consistent with the picture fingerprint of the video frame to be matched from the second video as a target video frame;
and the target video generation module is used for carrying out editing processing on the target video frames according to the editing processing mode corresponding to the video frames to be matched to obtain target videos.
Optionally, the video frame to be matched includes: in the first video, a start video frame and an end video frame in video frames subjected to editing processing;
The target video frame determining module is specifically configured to determine, from the second video, a video frame consistent with a picture fingerprint of the start video frame, obtain a target video frame corresponding to the start video frame, and determine a video frame consistent with a picture fingerprint of the end video frame, obtain a target video frame corresponding to the end video frame;
determining the position of a target video frame corresponding to the starting video frame in the second video as a first position, and determining the position of a target video frame corresponding to the ending video frame in the second video as a second position;
Determining, from the second video, a video segment from a target video frame corresponding to the start video frame to a target video frame corresponding to the end video frame as a first video segment, in a case where the first position coincides with a position of the start video frame in the first video and the second position coincides with a position of the end video frame in the first video;
And for each video frame to be matched between the start video frame and the end video frame, determining the video frame at the position corresponding to that video frame in the first video segment, to obtain the target video frame corresponding to the video frame to be matched.
Optionally, the apparatus further includes:
The key video frame determining module is used for determining, in the first video, the key video frame corresponding to the video frame to be matched, before the target video frame determining module determines, from the second video, a video frame consistent with the picture fingerprint of the video frame to be matched as the target video frame;
The second fingerprint acquisition module is used for acquiring the time stamp and the picture fingerprint of the key video frame corresponding to the video frame to be matched; wherein the timestamp of a video frame indicates the position of the video frame in the belonging video;
the target video frame determining module is specifically configured to determine an alternative video frame from the second video based on a timestamp and a picture fingerprint of a key video frame corresponding to the video frame to be matched;
And determining a video frame consistent with the picture fingerprint of the video frame to be matched from the alternative video frames as a target video frame.
Optionally, the target video frame determining module is specifically configured to determine, from the second video according to the timestamp of the key video frame corresponding to the video frame to be matched, a video segment of a specified duration that contains the video frame corresponding to the timestamp, as a second video segment;
determining, from the second video segment, a video frame whose picture fingerprint is consistent with that of the key video frame corresponding to the video frame to be matched, to obtain an intermediate video frame;
And determining a video frame taking the intermediate video frame as a key video frame from the second video as an alternative video frame.
Optionally, the apparatus further includes:
The splitting module is used for splitting the first video according to a preset time interval before the first fingerprint acquisition module executes the acquisition of the picture fingerprints of the video frames to be matched in the first video, so as to obtain a plurality of sub videos;
The video frame to be matched determining module is used for determining video frames which are edited by a user from the first video and used as video frames to be matched;
The third fingerprint acquisition module is used for acquiring, for each video frame to be matched, a timestamp and a picture fingerprint of the key video frame in the sub-video to which that video frame belongs;
the storage module is used for storing the offset of each video frame to be matched relative to the corresponding key video frame in the sub-video, the time stamp of the corresponding key video frame and the picture fingerprint as editing information of the first video into a preset database;
the first fingerprint acquisition module is specifically configured to acquire a picture fingerprint of a video frame to be matched in the first video based on editing information of the first video.
Optionally, the first fingerprint acquisition module is specifically configured to acquire editing information of the first video from the preset database;
Determining a sub-video to which the key video frame belongs as a first sub-video according to the time stamp of the key video frame in the editing information;
Extracting the video frames to be matched from the first sub-video according to the offset of the video frames to be matched in the editing information relative to the corresponding key video frames;
And calculating the picture fingerprint of the video frame to be matched according to the pixel points contained in the video frame to be matched.
In a third aspect of the embodiments of the present invention, there is provided a server, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
A memory for storing a computer program;
a processor for implementing the method steps of any of the above first aspects when executing a program stored on a memory.
In a fourth aspect of the present invention, there is also provided a computer-readable storage medium having stored therein a computer program which, when executed by a processor, implements any of the video generation methods described above.
In a fifth aspect of the invention there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the video generation methods described above.
The embodiment of the invention provides a video generation method, apparatus, server, and storage medium. The method acquires a picture fingerprint of a video frame to be matched in a first video, where the video frames to be matched include video frames in the first video that have undergone editing processing, the picture fingerprint of a video frame to be matched is determined based on the pixel points it contains, and the first video is obtained based on a video sent by a live signal source; acquires a second video corresponding to the first video, where the second video is obtained based on the same video sent by the live signal source through a standby transmission line; determines, from the second video, a video frame consistent with the picture fingerprint of the video frame to be matched, as a target video frame; and edits the target video frame according to the editing processing mode corresponding to the video frame to be matched in the first video, to obtain the target video.
Based on this processing, a target video frame is determined from the second video according to the picture fingerprint of the video frame to be matched in the first video; the target video frame is then edited according to the editing processing mode corresponding to the video frame to be matched, to obtain the target video. That is, the second video is edited automatically by the server rather than manually by the user, which reduces time cost and labor cost and thereby improves video generation efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a first flowchart of a video generating method according to an embodiment of the present invention;
fig. 2 is a second flowchart of a video generating method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a video generating method according to an embodiment of the present invention;
fig. 4 is a third flowchart of a video generating method according to an embodiment of the present invention;
Fig. 5 is a fourth flowchart of a video generating method according to an embodiment of the present invention;
fig. 6 is a fifth flowchart of a video generating method according to an embodiment of the present invention;
Fig. 7 is a sixth flowchart of a video generating method according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a video generating system according to an embodiment of the present invention;
fig. 9 is a block diagram of a video generating apparatus according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
In the related art, during a television live broadcast, a user can process the first video received by the server. If the video quality of the first video is low, the server can acquire a second video, sent by the same live signal source and received by a server deployed in another area, and the user processes the second video again in the same way as the first video to obtain the target video. For example, the live broadcast may be of a variety program, servers deployed in different areas may each receive video of that program, and the first video may be the video of the program shot by a camera. Correspondingly, the user can edit the first video, for example by encoding it or adding special effects, to obtain the processed video, which the server may then play for users. However, manually processing multiple videos multiple times requires considerable time and labor, so the generation efficiency of videos in the related art is not high.
To solve the above problems, an embodiment of the present invention provides a video generation method applied to a server. The server acquires the picture fingerprint of each video frame to be matched in the first video and, according to the video generation method provided by the embodiment of the invention, determines the target video frame corresponding to each video frame to be matched; the server can then edit the target video frames to obtain the target video, which improves video generation efficiency. Subsequently, the server may provide the target video to users.
Referring to fig. 1, fig. 1 is a first flowchart of a video generating method according to an embodiment of the present invention, where the method may include the following steps:
s101: and acquiring the picture fingerprint of the video frame to be matched in the first video.
Wherein, the video frame to be matched comprises: video frames subjected to editing processing in the first video; the picture fingerprint of the video frame to be matched is determined based on the pixel points contained in the video frame to be matched; the first video is: based on the video sent by the live signal source.
S102: and acquiring a second video corresponding to the first video.
Wherein, the second video is: based on the same video transmitted by the live signal source through the standby transmission line.
S103: and determining a video frame which is consistent with the picture fingerprint of the video frame to be matched from the second video as a target video frame.
S104: and editing the target video frame according to the editing processing mode corresponding to the video frame to be matched to obtain the target video.
Based on the video generation method provided by the embodiment of the invention, a target video frame is determined from the second video according to the picture fingerprint of the video frame to be matched in the first video; the target video frame is then edited according to the editing processing mode corresponding to the video frame to be matched, to obtain the target video. That is, the second video is edited automatically by the server rather than manually by the user, which reduces time cost and labor cost and thereby improves video generation efficiency.
For step S101, the live signal source may be a television station. During a live broadcast, the live signal source sends live video to the server, and the server receives the video (i.e., the first video) sent by the live signal source. The user may then edit video frames in the first video, for example by encoding the first video or adding special effects.
The video frames to be matched are the video frames in the first video that have undergone editing processing. For example, if the user clips out a video segment from the 10th frame to the 20th frame of the first video and adds a special effect to the 15th frame, the video frames to be matched include: the 10th, 15th, and 20th video frames in the first video.
In one implementation, for each video frame to be matched, the server may calculate the picture fingerprint of the video frame from its pixel points. For example, the server may obtain the RGB (red, green, blue) value of each pixel in the video frame to be matched and convert the RGB values to YUV (luma and chroma) values. The server may then calculate the picture fingerprint of the video frame to be matched from its YUV values using a hash algorithm.
In another implementation, the server may convert the video frame to be matched to grayscale to obtain the gray value of each of its pixel points. The server may then calculate the picture fingerprint of the video frame to be matched from its gray values using a hash algorithm. It will be appreciated that, for each pixel, the gray value takes less data than the YUV value. Therefore, calculating the picture fingerprint in this way reduces the consumption of the server's computing resources and further improves video generation efficiency.
For each video frame to be matched, the server calculates the picture fingerprint from the pixel points of that frame, so the picture fingerprint represents the frame's picture content. If the picture fingerprint of a video frame to be matched is consistent with that of another video frame, the server can determine that the two frames have the same picture content.
In some embodiments, to facilitate management of the first video, improve the efficiency of obtaining video frames to be matched, and further improve video generation efficiency, the server may split the first video into a plurality of sub-videos and then obtain the video frames to be matched from the sub-videos. On the basis of fig. 1, referring to fig. 2, before step S101, the method may further include the following steps:
s105: splitting the first video according to a preset time interval to obtain a plurality of sub-videos.
S106: and determining the video frame which is edited by the user from the first video as a video frame to be matched.
S107: and aiming at each video frame to be matched, acquiring a time stamp and a picture fingerprint of a key video frame in the sub-video to which the video frame to be matched belongs.
S108: and storing the offset of each video frame to be matched relative to the corresponding key video frame in the sub-video, the time stamp and the picture fingerprint of the corresponding key video frame as editing information of the first video into a preset database.
Accordingly, step S101 may include the steps of:
S1011: and acquiring the picture fingerprint of the video frame to be matched in the first video based on the editing information of the first video.
The preset time interval may be set by the user based on demand. For example, the preset time interval may be 2s (seconds), or the preset time interval may be 3s, but is not limited thereto.
When the user edits the video frames in the first video, the server may determine the video frames that the user has edited as the video frames to be matched.
When the live signal source transmits the video, it may encode each video frame to improve transmission efficiency. During encoding, the live signal source may determine user-indicated video frames as key video frames and encode the other video frames in the video based on those key video frames. Correspondingly, for each video frame to be matched, if that frame is encoded based on a particular key video frame, that key video frame is the one corresponding to the video frame to be matched.
For example, a video may include 50 video frames, and the live signal source may determine that the frames indicated by the user are the 1st and 26th video frames, i.e., the 1st and 26th video frames in the video are key video frames. The 2nd to 25th video frames may be encoded based on the 1st video frame, and the 27th to 50th video frames based on the 26th. That is, the 1st video frame is the key video frame for the 2nd to 25th video frames of the video, and the 26th video frame is the key video frame for the 27th to 50th video frames.
The timestamp of a video frame indicates the position of the video frame in the video it belongs to. For example, if the FPS (frames per second) of the first video is 25, the duration of each video frame in the first video is 40 ms. The 1st video frame in the first video then has a timestamp of 40 ms, the 2nd video frame a timestamp of 80 ms, the 3rd video frame a timestamp of 120 ms, and so on. Accordingly, if the timestamp of a video frame is 80 ms, that frame is the 2nd video frame in the first video.
The manner in which the server determines the picture fingerprint of a key video frame is similar to that for a video frame to be matched; refer to the relevant description of the previous embodiments.
For each video frame to be matched, its offset relative to the corresponding key video frame represents its position relative to that key video frame within the sub-video it belongs to.
For example, if a sub-video contains 50 video frames and its key video frame is the 20th video frame, then an offset of 10 video frames relative to the key video frame indicates that the video frame to be matched is the 10th video frame after the key video frame, i.e., the 30th video frame in the sub-video.
In order to improve the efficiency of obtaining the video frames to be matched and further improve the generation efficiency of the video, the server may also store the offset of each video frame to be matched relative to the corresponding key video frame, and the timestamp and the picture fingerprint of the corresponding key video frame as editing information of the first video into a preset database. Subsequently, the server can directly acquire the editing information of the first video from the preset database, acquire the video frame to be matched from the first video according to the editing information of the first video, and determine the picture fingerprint of the video frame to be matched.
The preset database may be a pre-established database, for example a cloud database or a local database.
Based on the above processing, splitting the first video into a plurality of sub-videos makes the first video easier to manage. Subsequently, the server can obtain the video frames to be matched directly from the editing information of the first video, without traversing every video frame in the first video, which improves the efficiency of obtaining the video frames to be matched and further improves video generation efficiency.
Referring to fig. 3, fig. 3 is a schematic diagram of a video generating method according to an embodiment of the present invention. The primary clipping point positions are the video frames to be matched in the previous embodiments. The key frames are the key video frames in the foregoing embodiments, and the secondary cropping sequence includes the timestamp of each key frame. For example, the secondary cropping sequence includes: timestamp A of key frame 1, timestamp B of key frame 2, ..., timestamp C of key frame n.
As shown in fig. 3, the server may split the first video into a plurality of target files, where each target file is a GOP (Group of Pictures). If one target file includes 25 video frames and the FPS of the first video is 25, the duration of each target file is 2 s; for example, splitting the first video yields target file 1 of 2 s, target file 2 of 2 s, ..., target file n of 2 s. The server can determine the key frames and timestamps in each target file to obtain the secondary cropping sequence. If, when editing the first video, the user extracts the video segment from the video frame represented by boundary 1 to the video frame represented by boundary 2, then boundary 1 and boundary 2 are video frames to be matched.
In some embodiments, based on fig. 2, referring to fig. 4, step S1011 may include the steps of:
S10111: and acquiring editing information of the first video from a preset database.
S10112: and determining the sub-video to which the key video frame belongs as a first sub-video according to the time stamp of the key video frame in the editing information.
S10113: and extracting the video frames to be matched from the first sub-video according to the offset of the video frames to be matched in the editing information relative to the corresponding key video frames.
S10114: and calculating the picture fingerprint of the video frame to be matched according to the pixel points contained in the video frame to be matched.
If the quality of the first video is low, the quality of a target video obtained by editing the first video would also be low. To improve the quality of the generated target video, the server acquires the editing information of the first video and obtains the video frames to be matched according to that editing information. Subsequently, the server can determine, from the second video, the target video frame corresponding to each video frame to be matched, and edit the target video frames to obtain the target video.
The editing information includes a timestamp of a key video frame corresponding to the video frame to be matched, and the server may determine a sub-video (i.e., a first sub-video) to which the key video frame belongs according to the timestamp of the key video frame in the editing information.
For example, the sub-video includes: sub-video 1 including video frames of 1 st to 2 nd seconds, sub-video 2 including video frames of 3 rd to 4 th seconds, sub-video 3 including video frames of 5 th to 6 th seconds, and key video frames in the editing information include: key video frame 1, key video frame 2, and key video frame 3. The timestamp of key video frame 1 is 80ms, the timestamp of key video frame 2 is 2080ms, and the timestamp of key video frame 3 is 4080ms. The server may determine that the sub-video 1 is a first sub-video to which the key video frame 1 belongs, the sub-video 2 is a first sub-video to which the key video frame 2 belongs, and the sub-video 3 is a first sub-video to which the key video frame 3 belongs.
For each video frame to be matched, the server determines, when acquiring it, its position in the first sub-video according to the timestamp of its corresponding key video frame in the editing information and its offset relative to that key video frame, and then extracts the video frame at that position to obtain the video frame to be matched.
For example, if the offset of the video frame to be matched relative to its corresponding key video frame is 10 video frames and the key video frame is the 10th video frame in the first sub-video, the server may determine that the video frame to be matched is the 20th video frame in the first sub-video and extract that frame. The server may then calculate the picture fingerprint of the video frame to be matched.
Based on the above processing, the editing information needs to contain only the timestamps of the key video frames corresponding to the video frames to be matched. Through the editing information of the first video, the server can determine the first sub-video containing a video frame to be matched and locate that frame within it, without traversing all video frames in the first video, which improves the efficiency of obtaining the video frames to be matched and further improves video generation efficiency.
For step S102, the live signal source may send the same video to the servers deployed in different regions, and accordingly, each server may receive the video sent by the live signal source.
The first video and the second video may be obtained by the same server based on the same video transmitted by the live signal source. For example, when a live broadcast signal source broadcasts a television program for the first time, a video of the television program is sent to a server through a transmission line, and then the video of the television program received by the server is a first video; when the live broadcast signal source rebroadcasts the television program, the video of the television program is sent to the server through another transmission line (namely a standby transmission line), and at the moment, the video of the television program received by the server is a second video.
Alternatively, the first video and the second video may be obtained by different servers based on the same video sent by the live signal source. For example, when the live signal source broadcasts a television program, it sends the video of the program to server A through one transmission line; the video received by server A may then be the first video. The same video is sent through another transmission line (i.e., a standby transmission line) to server B in a different region, and the video received by server B may be the second video. Servers deployed in different areas can communicate data, and each server can obtain the videos other servers have received.
It will be appreciated that the video sent by the live signal source to each server is the same, however, during the process of sending video by the live signal source to each server, there may be some factors, such as weather, network bandwidth, etc., which cause delay in sending video by the live signal source to different servers, or loss of the sent video. That is, the video received by each server may be different.
Because the live signal source sends the same video to each server, the first video and the second video represent the same content, and when the quality of the first video is low, the target video can be generated based on the second video, which represents the same content. For example, if the first video represents a period of content in a television program, the second video also represents that period of content.
For step S103, the target video frame corresponding to a video frame to be matched is the video frame whose picture fingerprint is consistent with that of the video frame to be matched.
There may be multiple video frames to be matched, and for each of them the server may determine the corresponding target video frame from the second video in various ways. Each mode below has a different emphasis in determining the target video frame; depending on the actual scenario, other implementations are also possible. The modes are detailed below.
Mode 1:
For each video frame in the second video, the server generates a picture fingerprint for the video frame. And comparing the picture fingerprint of the video frame to be matched with the picture fingerprint of each video frame in the second video by the server. If the picture fingerprint of one video frame to be matched is consistent with the picture fingerprint of one video frame in the second video, the server determines that the video frame is a target video frame corresponding to the video frame to be matched.
Mode 2:
To improve the efficiency of determining the target video frame, and thus video generation efficiency, the server may determine candidate video frames based on the timestamp and the picture fingerprint of the key video frame corresponding to the video frame to be matched, and then determine the target video frame corresponding to the video frame to be matched from among those candidates.
Referring to fig. 5 on the basis of fig. 1, fig. 5 is a fourth flowchart of a video generating method according to an embodiment of the present invention, and before step S103, the method may further include the following steps:
s109: and determining a key video frame corresponding to the video frame to be matched in the first video.
S110: and acquiring a time stamp and a picture fingerprint of a key video frame corresponding to the video frame to be matched.
Wherein the timestamp of a video frame indicates the position of the video frame in the belonging video.
Accordingly, step S103 may include the steps of:
s1031: and determining an alternative video frame from the second video based on the time stamp and the picture fingerprint of the key video frame corresponding to the video frame to be matched.
S1032: and determining a video frame consistent with the picture fingerprint of the video frame to be matched from the candidate video frames as a target video frame.
The server may obtain the timestamp and the picture fingerprint of the key video frame corresponding to the video frame to be matched based on the manner in the foregoing embodiment, and further determine the candidate video frame based on the timestamp and the picture fingerprint of the key video frame corresponding to the video frame to be matched.
Correspondingly, the server may generate a picture fingerprint of each candidate video frame, and compare the picture fingerprint of the video frame to be matched with the picture fingerprint of each candidate video frame. If the picture fingerprint of one video frame to be matched is consistent with the picture fingerprint of one alternative video frame, the server determines that the video frame is a target video frame corresponding to the video frame to be matched.
Based on the above processing, the server only needs to calculate the picture fingerprint of the determined alternative video frame. Furthermore, the picture fingerprints of the video frames to be matched are compared with the picture fingerprints of the alternative video frames, and the picture fingerprints of each video frame in the second video do not need to be calculated, so that the efficiency of determining the target video frame can be improved, and the generation efficiency of the video is further improved.
In some embodiments, during the transmission of video by the live signal source, there may be some other limiting factors, such as weather, network bandwidth, etc. Thus, there may be a delay in the live source sending video to a different server, or the video sent by the live source may lose data during transmission.
Thus, the videos received by the servers may not be exactly the same; that is, video frames with consistent picture fingerprints may not occupy the same positions in the videos obtained by different servers. If the server directly selected the video frame whose timestamp matches that of the key video frame, the selected frame might differ from the key video frame in picture content, the alternative video frames determined from it would be inaccurate, and the target video frames determined from those would be inaccurate in turn. To improve the accuracy of the determined target video frames, the server may therefore determine the alternative video frames in the following ways.
Mode one:
the server determines, from the second video according to the timestamp of the key video frame corresponding to the video frame to be matched, a video segment of a specified duration that contains the video frame corresponding to that timestamp, and then determines all video frames in that segment as candidate video frames.
Based on the above processing, the number of the candidate video frames is less than the number of all video frames contained in the second video, so that the server does not need to calculate the picture fingerprint of each video frame in the second video, the efficiency of determining the target video frame can be improved, and the generation efficiency of the video is further improved.
Mode two:
In order to further improve the video generation efficiency and improve the accuracy of the determined candidate video frames, referring to fig. 6 on the basis of fig. 5, fig. 6 is a fifth flowchart of a video generation method according to an embodiment of the present invention, and step S1031 may include the following steps:
s10311: and determining the video frame corresponding to the time stamp from the second video according to the time stamp of the key video frame corresponding to the video frame to be matched, wherein the video frame is a video fragment with a specified duration and is used as the second video fragment.
S10312: and determining a video frame with the picture fingerprint of the key video frame corresponding to the video frame to be matched from the second video segment to obtain an intermediate video frame.
S10313: and determining video frames taking the intermediate video frames as key video frames from the second video as alternative video frames.
The server may determine the second video segment from the second video in a number of ways. Each way below places the second video segment differently; depending on the actual scenario, other implementations are also possible. The ways are detailed below.
Mode a:
The server determines, in the second video, a video frame whose timestamp is identical to that of the key video frame corresponding to the video frame to be matched (which may be referred to as the first boundary video frame). The server then determines, in the second video, a video frame located before the first boundary video frame at a first duration from it (the second boundary video frame), and a video frame located after the first boundary video frame at a second duration from it (the third boundary video frame). The sum of the first duration and the second duration is the specified duration. The server may then take the video segment from the second boundary video frame to the third boundary video frame as the second video segment.
Mode B:
the server may also determine, in the second video, a video frame (which may be referred to as a fourth boundary video frame) that precedes the first boundary video frame and is a specified duration from the first boundary video frame. Further, the server may take a video clip from the fourth boundary video frame to the first boundary video frame as the second video clip.
Mode C:
The server may also determine, in the second video, a video frame (which may be referred to as a fifth boundary video frame) that is located after the first boundary video frame and is a specified duration from the first boundary video frame. Further, the server may take video clips from the first boundary video frame to the fifth boundary video frame as the second video clip.
The specified duration can be set according to actual requirements. To improve video generation efficiency, a smaller specified duration may be set, for example 2 s; to improve the accuracy of the determined second video segment, a larger specified duration may be set, for example 5 s.
Alternatively, the specified duration may be determined based on the total duration of the second video, for example, the specified duration may be 10% of the total duration of the second video, or the specified duration may be 20% of the total duration of the second video, but is not limited thereto.
For each video frame to be matched, the server can determine a key video frame corresponding to the video frame to be matched, and acquire a picture fingerprint of the key video frame corresponding to the video frame to be matched. Furthermore, the server may obtain a picture fingerprint of each video frame in the second video segment, compare the picture fingerprint of each video frame in the second video segment with a picture fingerprint of a key video frame corresponding to the video frame to be matched, and determine, in the second video segment, a video frame (i.e., an intermediate video frame) consistent with the picture fingerprint of the key video frame corresponding to the video frame to be matched.
Since the picture fingerprint of the intermediate video frame is identical to that of the key video frame corresponding to the video frame to be matched, their picture content is the same; that is, the intermediate video frame is a key video frame in the second video. Further, the server may determine, from the second video, the video frames having the intermediate video frame as their key video frame, as the alternative video frames.
Based on the processing, the server can determine the candidate video frames through the time stamp and the picture fingerprint of the key video frames corresponding to the video frames to be matched, and the number of the candidate video frames is smaller than the number of all video frames contained in the second video segment, so that the server only needs to determine the target video frames based on the picture fingerprint of the candidate video frames, and does not need to calculate the picture fingerprint of each video frame in the second video segment, the number of the processed video frames can be reduced, the consumption of operation resources in the server is reduced, and the generation efficiency of the video is further improved.
In some embodiments, the video frames to be matched comprise: in the first video, a start video frame and an end video frame in the video frames subjected to the editing processing are adopted, wherein the start video frame is a first video frame in the video frames subjected to the editing processing, and the end video frame is a last video frame in the video frames subjected to the editing processing. Accordingly, referring to fig. 7 on the basis of fig. 1, fig. 7 is a sixth flowchart of a video generating method according to an embodiment of the present invention, and step S103 may include the following steps:
S1033: and determining a video frame consistent with the picture fingerprint of the initial video frame from the second video to obtain a target video frame corresponding to the initial video frame, and determining a video frame consistent with the picture fingerprint of the end video frame to obtain a target video frame corresponding to the end video frame.
S1034: the method comprises the steps of determining the position of a target video frame corresponding to a starting video frame in a second video as a first position, and determining the position of a target video frame corresponding to an ending video frame in the second video as a second position.
S1035: and determining a video segment from a target video frame corresponding to the start video frame to a target video frame corresponding to the end video frame from the second video as a first video segment under the condition that the first position is consistent with the position of the start video frame in the first video and the second position is consistent with the position of the end video frame in the first video.
S1036: and aiming at each video frame to be matched between the starting video frame and the ending video frame, determining the video frame at the position corresponding to the video frame to be matched in the first video segment to obtain a target video frame corresponding to the video frame to be matched.
The manner of determining the target video frame corresponding to the start video frame and the target video frame corresponding to the end video frame is similar to that of determining the target video frame corresponding to the video frame to be matched, and reference may be made to the relevant description of the foregoing embodiments.
The location of the target video frame corresponding to the start video frame in the second video may be represented by a timestamp of the target video frame corresponding to the start video frame in the second video. Accordingly, the position of the target video frame corresponding to the ending video frame in the second video may be represented by a timestamp of the target video frame corresponding to the ending video frame in the second video.
Since the time stamp of a video frame indicates the position of the video frame in the video to which it belongs, if the time stamp of the start video frame is consistent with the time stamp of the target video frame corresponding to the start video frame, it can be determined that the position (i.e., the first position) of the target video frame corresponding to the start video frame in the second video is consistent with the position of the start video frame in the first video. Accordingly, if the time stamp of the end video frame is consistent with the time stamp of the target video frame corresponding to the end video frame, it can be determined that the position (i.e., the second position) of the target video frame corresponding to the end video frame in the second video is consistent with the position of the end video frame in the first video.
When the first position is consistent with the position of the start video frame in the first video and the second position is consistent with the position of the end video frame in the first video, the video frames located between the start video frame and the end video frame in the first video correspond one to one with the video frames located between the first position and the second position in the second video. Accordingly, the server can acquire, from the second video, the video segment from the target video frame corresponding to the start video frame to the target video frame corresponding to the end video frame, to obtain the first video segment.
For each video frame to be matched between the start video frame and the end video frame, the server determines, in the first video segment, the video frame whose position is consistent with the position of that video frame to be matched between the start video frame and the end video frame, thereby obtaining the target video frame corresponding to the video frame to be matched.
For example, if a video frame to be matched is the 2nd video frame between the start video frame and the end video frame, the server may determine that the 2nd video frame in the first video segment is the target video frame corresponding to that video frame to be matched.
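By way of a non-limiting illustration, the positional mapping of step S1036 can be sketched as follows, assuming frames are addressed by a 1-based offset within the span; the helper name and the list-based frame representation are illustrative assumptions rather than anything prescribed by the embodiments:

```python
def target_frame_for(offset_in_span: int, first_video_segment: list):
    """Map the k-th video frame to be matched between the start and end video
    frames to the k-th frame of the first video segment (1-based); no further
    picture-fingerprint comparison is needed on this path."""
    return first_video_segment[offset_in_span - 1]

# Usage for the example above: the 2nd frame between the start and end video
# frames maps to the 2nd frame of the first video segment.
# target = target_frame_for(2, first_video_segment)
```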
When the first position is inconsistent with the position of the start video frame in the first video and/or the second position is inconsistent with the position of the end video frame in the first video, the server may determine the video frame (which may be referred to as a first reference video frame) located at the middle position between the start video frame and the end video frame, and determine, from the second video, a video frame consistent with the picture fingerprint of the first reference video frame, so as to obtain the target video frame corresponding to the first reference video frame in the second video.
Further, the server may determine the position (which may be referred to as a third position) of the target video frame corresponding to the first reference video frame in the second video. If the third position is consistent with the position of the first reference video frame in the first video, and the first position is consistent with the position of the start video frame in the first video, the video frames located between the start video frame and the first reference video frame in the first video correspond one to one with the video frames located between the first position and the third position in the second video. Accordingly, the server may obtain, from the second video, the video segment (which may be referred to as a third video segment) from the target video frame corresponding to the start video frame to the target video frame corresponding to the first reference video frame.
For each video frame to be matched between the start video frame and the first reference video frame, the server determines, in the third video segment, the video frame whose position is consistent with the position of that video frame to be matched between the start video frame and the first reference video frame, so as to obtain the target video frame corresponding to the video frame to be matched.
If the third position is inconsistent with the position of the first reference video frame in the first video, and/or the first position is inconsistent with the position of the start video frame in the first video, the server may determine the video frame (which may be referred to as a second reference video frame) located at the middle position between the start video frame and the first reference video frame, and determine, from the second video, a video frame consistent with the picture fingerprint of the second reference video frame, so as to obtain the target video frame corresponding to the second reference video frame in the second video.
Further, for each video frame to be matched between the start video frame and the second reference video frame, the server determines the corresponding target video frame in the second video based on the position of that video frame to be matched between the start video frame and the second reference video frame; likewise, for each video frame to be matched between the second reference video frame and the first reference video frame, the server determines the corresponding target video frame in the second video. The process continues in this manner until the server has determined the target video frames corresponding to all the video frames to be matched in the second video.
If the third position is consistent with the position of the first reference video frame in the first video, and the second position is consistent with the position of the end video frame in the first video, the video frames located between the first reference video frame and the end video frame in the first video correspond one to one with the video frames located between the third position and the second position in the second video. Accordingly, the server may obtain, from the second video, the video segment (which may be referred to as a fourth video segment) from the target video frame corresponding to the first reference video frame to the target video frame corresponding to the end video frame.
For each video frame to be matched between the first reference video frame and the end video frame, the server determines, in the fourth video segment, the video frame whose position is consistent with the position of that video frame to be matched between the first reference video frame and the end video frame, so as to obtain the target video frame corresponding to the video frame to be matched.
If the third position is inconsistent with the position of the first reference video frame in the first video and/or the second position is inconsistent with the position of the end video frame in the first video, the server may determine the video frame (which may be referred to as a third reference video frame) located at the middle position between the first reference video frame and the end video frame, and determine, from the second video, a video frame consistent with the picture fingerprint of the third reference video frame, so as to obtain the target video frame corresponding to the third reference video frame in the second video.
Further, for each video frame to be matched between the third reference video frame and the end video frame, the server determines the corresponding target video frame in the second video based on the position of that video frame to be matched between the third reference video frame and the end video frame; likewise, for each video frame to be matched between the first reference video frame and the third reference video frame, the server determines the corresponding target video frame in the second video. The process continues in this manner until the server has determined the target video frames corresponding to all the video frames to be matched in the second video.
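The recursive refinement described above can be summarized in a short sketch. This is a hedged illustration only: the frame attributes, the matches map, and the linear find_by_fingerprint helper are assumptions made for readability (the embodiments narrow the fingerprint search using key-frame timestamps, which is omitted here), not a prescribed implementation:

```python
def find_by_fingerprint(video, fingerprint):
    # Linear stand-in for the fingerprint lookup; returns an index into `video`.
    for j, frame in enumerate(video):
        if frame.fingerprint == fingerprint:
            return j
    raise LookupError("no frame with a matching picture fingerprint")

def align_span(first_video, second_video, lo, hi, matches):
    """Align first_video[lo..hi] to second_video.

    lo and hi are indices whose counterparts in second_video have already been
    matched by picture fingerprint and stored in `matches` (index -> index).
    """
    if hi - lo <= 1:
        return
    # Positions are compared via time stamps, as in the embodiments above.
    endpoints_agree = (
        first_video[lo].timestamp == second_video[matches[lo]].timestamp
        and first_video[hi].timestamp == second_video[matches[hi]].timestamp
    )
    if endpoints_agree:
        # One-to-one correspondence: map the frames in between by position alone.
        for i in range(lo + 1, hi):
            matches[i] = matches[lo] + (i - lo)
        return
    # Otherwise match the middle frame (the "reference video frame") by
    # fingerprint and recurse into both halves.
    mid = (lo + hi) // 2
    matches[mid] = find_by_fingerprint(second_video, first_video[mid].fingerprint)
    align_span(first_video, second_video, lo, mid, matches)
    align_span(first_video, second_video, mid, hi, matches)
```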
Based on the above processing, when the first position is consistent with the position of the start video frame in the first video and the second position is consistent with the position of the end video frame in the first video, the server may directly determine, for each video frame to be matched between the start video frame and the end video frame, the video frame at the corresponding position in the first video segment as its target video frame. There is no need to compare the picture fingerprint of each such video frame to be matched against the picture fingerprints of all video frames in the first video segment, which further improves video generation efficiency.
For step S104, after the target video frame corresponding to each video frame to be matched has been determined, each target video frame may be edited according to the editing processing mode corresponding to its video frame to be matched, so as to obtain the target video.
For example, suppose the user manually added a mosaic in the upper right corner of the 1st video frame in the first video and added a specified special effect in its lower left corner. Correspondingly, according to the video generation method, the server can determine, in the second video, the target video frame corresponding to the 1st video frame in the first video and perform the same editing processing on that target video frame according to the editing processing mode corresponding to the 1st video frame in the first video; that is, the same mosaic is added at the upper right corner of the target video frame and the same specified special effect is added at its lower left corner.
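As a purely illustrative sketch of replaying such an editing processing mode on a matched target frame, the following uses OpenCV; the region coordinates, the pixelation-style mosaic, and the alpha-blended special effect are assumptions of the example, since the embodiments do not fix a particular editing API:

```python
import cv2
import numpy as np

def add_mosaic(frame: np.ndarray, x: int, y: int, w: int, h: int, block: int = 16) -> np.ndarray:
    """Pixelate the rectangle (x, y, w, h), e.g. in the upper right corner."""
    roi = frame[y:y + h, x:x + w]
    # Shrink, then blow back up with nearest-neighbour to produce mosaic blocks.
    small = cv2.resize(roi, (max(1, w // block), max(1, h // block)),
                       interpolation=cv2.INTER_LINEAR)
    frame[y:y + h, x:x + w] = cv2.resize(small, (w, h),
                                         interpolation=cv2.INTER_NEAREST)
    return frame

def add_overlay(frame: np.ndarray, overlay: np.ndarray, x: int, y: int,
                alpha: float = 0.8) -> np.ndarray:
    """Blend a special-effect image into the frame at (x, y), e.g. the lower left corner."""
    h, w = overlay.shape[:2]
    roi = frame[y:y + h, x:x + w]
    frame[y:y + h, x:x + w] = cv2.addWeighted(roi, 1 - alpha, overlay, alpha, 0)
    return frame
```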
In some embodiments, referring to fig. 8, fig. 8 is a schematic structural diagram of a video generating system according to an embodiment of the present invention. The video recording and dotting system is a system for receiving video sent by a live broadcast signal source, and comprises a recording server A, a recording server B and a recording server C. Recording server A is deployed in the Chongqing region, recording server B in the Shanghai region, and recording server C in the Beijing region. The live signal source may transmit satellite signals, i.e., video, to each of the recording servers: for example, the Chongqing satellite signal is the video transmitted to recording server A in the Chongqing region, the Shanghai satellite signal is the video transmitted to recording server B in the Shanghai region, and the Beijing satellite signal is the video transmitted to recording server C in the Beijing region.
Furthermore, the user may edit the video (i.e., the first video) acquired by the recording server and store the boundary point location information through a cloud DB (Database) service. The boundary point location information is the editing information of the first video in the foregoing embodiments; the cloud DB service is a server for storing editing information; and the boundary reflection service is the server in the foregoing embodiments.
When the quality of the first video received by a recording server is low, for example, some video frames are missing from the first video or the resolution of the first video is low, the recording server can acquire the video (i.e., the second video) sent by the same live broadcast signal source and received by a recording server deployed in another region. According to the video generation method provided by the embodiment of the invention, each video frame to be matched in the first video is determined according to the editing information of the first video stored in the cloud DB service, the target video frame corresponding to each video frame to be matched is determined in the second video, and each target video frame is edited according to the editing processing mode corresponding to its video frame to be matched in the first video, so as to obtain the target video.
Based on the above processing, when the quality of the first video is low, the recording server can acquire the second video received by a recording server deployed in another region; for each video frame to be matched in the first video, it determines the corresponding target video frame from the second video according to the stored editing information of the first video, and then edits that target video frame according to the editing processing mode corresponding to the video frame to be matched, so as to obtain the target video. The second video is edited automatically by the server rather than manually by a user, which reduces time cost and labor cost and further improves video generation efficiency.
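Putting the pieces of this section together, the failover flow might be sketched at a high level as follows; every name here (the service handles, the record fields, and the assemble_video helper, with find_by_fingerprint taken from the alignment sketch above) is an illustrative assumption rather than an interface defined by the embodiments:

```python
def regenerate_target_video(first_video_id, signal_source_id, peer_servers, cloud_db):
    """Rebuild the target video from a peer region's copy of the live signal."""
    # Editing information ("boundary point location information") saved earlier
    # through the cloud DB service.
    edit_records = cloud_db.load_edit_info(first_video_id)
    # The second video: the same live signal, as received by a recording server
    # deployed in another region (here simply the first reachable peer).
    second_video = peer_servers[0].fetch_video(signal_source_id)
    edited = {}
    for record in edit_records:
        # Locate the target frame by picture fingerprint.
        index = find_by_fingerprint(second_video, record.fingerprint)
        # Replay the recorded editing processing mode on the matched frame.
        edited[index] = record.apply_edits(second_video[index])
    return assemble_video(second_video, edited)  # assumed muxing helper
```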
Based on the same inventive concept as the video generation method, the embodiment of the invention also provides a video generation device. Referring to fig. 9, fig. 9 is a block diagram of a video generating apparatus according to an embodiment of the present invention, where the apparatus includes:
The first fingerprint acquisition module 901 is configured to acquire a picture fingerprint of a video frame to be matched in a first video; wherein the video frame to be matched comprises: a video frame subjected to editing processing in the first video; the picture fingerprint of the video frame to be matched is determined based on the pixel points contained in the video frame to be matched; and the first video is obtained based on video sent by a live broadcast signal source;
The video acquisition module 902 is configured to acquire a second video corresponding to the first video; wherein the second video is obtained based on the same video sent by the live broadcast signal source through a standby transmission line;
The target video frame determining module 903 is configured to determine, from the second video, a video frame that is consistent with a picture fingerprint of the video frame to be matched, as a target video frame;
The target video generating module 904 is configured to perform editing processing on the target video frame according to an editing processing mode corresponding to the video frame to be matched, so as to obtain a target video.
Optionally, the video frames to be matched include: in the first video, a start video frame and an end video frame in video frames subjected to editing processing;
the target video frame determining module 903 is specifically configured to determine, from the second video, a video frame that is consistent with a picture fingerprint of the start video frame, obtain a target video frame corresponding to the start video frame, and determine a video frame that is consistent with a picture fingerprint of the end video frame, obtain a target video frame corresponding to the end video frame;
determine the position, in the second video, of the target video frame corresponding to the start video frame as a first position, and determine the position, in the second video, of the target video frame corresponding to the end video frame as a second position;
in a case where the first position is consistent with the position of the start video frame in the first video and the second position is consistent with the position of the end video frame in the first video, determine, from the second video, the video segment from the target video frame corresponding to the start video frame to the target video frame corresponding to the end video frame as a first video segment;
and for each video frame to be matched between the start video frame and the end video frame, determine the video frame at the corresponding position in the first video segment to obtain the target video frame corresponding to that video frame to be matched.
Optionally, the apparatus further comprises:
the key video frame determining module is configured to determine, in the first video, a key video frame corresponding to the video frame to be matched, before the target video frame determining module 903 determines, from the second video, a video frame consistent with a picture fingerprint of the video frame to be matched;
The second fingerprint acquisition module is used for acquiring the time stamp and the picture fingerprint of the key video frame corresponding to the video frame to be matched; wherein the timestamp of a video frame indicates the position of the video frame in the belonging video;
the target video frame determining module 903 is specifically configured to determine an alternative video frame from the second video based on a timestamp and a picture fingerprint of a key video frame corresponding to the video frame to be matched;
and determining a video frame consistent with the picture fingerprint of the video frame to be matched from the candidate video frames as a target video frame.
Optionally, the target video frame determining module 903 is specifically configured to determine, from the second video according to the timestamp of the key video frame corresponding to the video frame to be matched, a video segment of a specified duration that contains the video frame corresponding to the timestamp, as a second video segment;
determine, from the second video segment, a video frame whose picture fingerprint is consistent with that of the key video frame corresponding to the video frame to be matched, to obtain an intermediate video frame;
and determine, from the second video, the video frames whose key video frame is the intermediate video frame, as alternative video frames.
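A minimal sketch of this candidate-selection logic, assuming each frame carries a timestamp, a picture fingerprint, and a reference to its key frame; centring the second video segment on the key frame's timestamp is an assumption, since the embodiments only require that the segment contain the frame at that timestamp:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Frame:
    timestamp: float                    # seconds; position in the video it belongs to
    fingerprint: str                    # picture fingerprint computed from its pixels
    key_frame: Optional["Frame"] = None # the key video frame this frame is grouped under

def candidate_frames(second_video, key_ts, key_fp, specified_duration=10.0):
    """Return the alternative video frames for one video frame to be matched."""
    half = specified_duration / 2
    # Second video segment: a clip of the specified duration containing the
    # frame at the key frame's timestamp.
    segment = [f for f in second_video
               if key_ts - half <= f.timestamp <= key_ts + half]
    # Intermediate video frame: the frame in the segment whose picture
    # fingerprint matches that of the key video frame.
    intermediate = next(f for f in segment if f.fingerprint == key_fp)
    # Alternative frames: frames of the second video whose key frame is the
    # intermediate frame.
    return [f for f in second_video if f.key_frame is intermediate]
```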
Optionally, the apparatus further comprises:
The splitting module is configured to split the first video according to a preset time interval to obtain a plurality of sub-videos, before the first fingerprint acquisition module 901 acquires the picture fingerprints of the video frames to be matched in the first video;
the video frame to be matched determining module is used for determining video frames which are edited by a user from the first video and used as video frames to be matched;
The third fingerprint acquisition module is configured to acquire, for each video frame to be matched, a time stamp and a picture fingerprint of a key video frame in the sub-video to which that video frame to be matched belongs;
the storage module is used for storing the offset of each video frame to be matched relative to the corresponding key video frame in the sub-video, the time stamp of the corresponding key video frame and the picture fingerprint as editing information of the first video to a preset database;
The first fingerprint acquisition module 901 is specifically configured to acquire a picture fingerprint of a video frame to be matched in a first video based on editing information of the first video.
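For concreteness, one way the editing information persisted by the storage module might be laid out is sketched below; the field names are illustrative assumptions, not taken from the embodiments:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EditRecord:
    key_frame_timestamp: float  # position of the key video frame in the sub-video it belongs to
    key_frame_fingerprint: str  # picture fingerprint of that key video frame
    offset: int                 # offset of the frame to be matched relative to the key frame
    edit_ops: List[dict]        # the editing operations to replay (mosaic, special effect, ...)
```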
Optionally, the first fingerprint acquisition module 901 is specifically configured to acquire editing information of the first video from a preset database;
determining a sub-video to which the key video frame belongs as a first sub-video according to the time stamp of the key video frame in the editing information;
extracting the video frames to be matched from the first sub-video according to the offset of the video frames to be matched in the editing information relative to the corresponding key video frames;
And calculating the picture fingerprint of the video frame to be matched according to the pixel points contained in the video frame to be matched.
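The embodiments derive the picture fingerprint from the pixel points of the frame but do not name a specific algorithm; an average hash (aHash) is one common choice and is used below purely as an illustrative stand-in:

```python
import cv2
import numpy as np

def picture_fingerprint(frame_bgr: np.ndarray) -> str:
    """Compute a 64-bit average-hash fingerprint from the frame's pixels."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (8, 8), interpolation=cv2.INTER_AREA)
    bits = small > small.mean()  # True where a pixel is brighter than the mean
    return "".join("1" if b else "0" for b in bits.flatten())
```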
Based on the video generating device provided by the embodiment of the invention, a target video frame is determined from the second video according to the picture fingerprint of the video frame to be matched in the first video, and the target video frame is then edited according to the editing processing mode corresponding to the video frame to be matched to obtain the target video. That is, the second video is edited automatically by the server without the user manually editing it, so that time cost and labor cost can be reduced and video generation efficiency further improved.
The embodiment of the present invention further provides a server. As shown in fig. 10, fig. 10 is a schematic structural diagram of the server provided in the embodiment of the present invention, including a processor 1001, a communication interface 1002, a memory 1003, and a communication bus 1004, where the processor 1001, the communication interface 1002, and the memory 1003 communicate with each other through the communication bus 1004;
A memory 1003 for storing a computer program;
The processor 1001 is configured to implement the steps of any one of the video generating methods in the above embodiments when executing the program stored in the memory 1003.
The communication bus mentioned for the server may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the server and other devices.
The memory may include random access memory (Random Access Memory, RAM) or may include non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present invention, there is also provided a computer readable storage medium having a computer program stored therein, which when executed by a processor, implements the video generating method according to any one of the above embodiments.
In a further embodiment of the present invention, a computer program product comprising instructions which, when run on a computer, cause the computer to perform the video generation method of any of the above embodiments is also provided.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus, server, computer readable storage medium, computer program product embodiments, the description is relatively simple, as it is substantially similar to method embodiments, with reference to the part of the description of method embodiments being relevant.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (10)

1. A method of video generation, the method comprising:
Acquiring a picture fingerprint of a video frame to be matched in a first video; wherein the video frame to be matched comprises: a video frame subjected to editing processing in the first video; the picture fingerprint of the video frame to be matched is determined based on pixel points contained in the video frame to be matched; and the first video is obtained based on video sent by a live broadcast signal source;
acquiring a second video corresponding to the first video; wherein the second video is obtained based on the same video sent by the live broadcast signal source through a standby transmission line;
Determining a video frame consistent with the picture fingerprint of the video frame to be matched from the second video as a target video frame;
And editing the target video frame according to the editing processing mode corresponding to the video frame to be matched to obtain the target video.
2. The method of claim 1, wherein the video frames to be matched comprise: in the first video, a start video frame and an end video frame in video frames subjected to editing processing;
the determining, from the second video, a video frame consistent with the picture fingerprint of the video frame to be matched as a target video frame includes:
Determining, from the second video, a video frame consistent with the picture fingerprint of the start video frame to obtain a target video frame corresponding to the start video frame, and determining a video frame consistent with the picture fingerprint of the end video frame to obtain a target video frame corresponding to the end video frame;
determining the position, in the second video, of the target video frame corresponding to the start video frame as a first position, and determining the position, in the second video, of the target video frame corresponding to the end video frame as a second position;
Determining, from the second video, a video segment from a target video frame corresponding to the start video frame to a target video frame corresponding to the end video frame as a first video segment, in a case where the first position coincides with a position of the start video frame in the first video and the second position coincides with a position of the end video frame in the first video;
And aiming at each video frame to be matched between the starting video frame and the ending video frame, determining the video frame at the position corresponding to the video frame to be matched in the first video segment, and obtaining a target video frame corresponding to the video frame to be matched.
3. The method of claim 1, wherein before the determining, from the second video, a video frame consistent with the picture fingerprint of the video frame to be matched as a target video frame, the method further comprises:
Determining a key video frame corresponding to the video frame to be matched in the first video;
Acquiring a time stamp and a picture fingerprint of a key video frame corresponding to the video frame to be matched; wherein the timestamp of a video frame indicates the position of the video frame in the belonging video;
the determining, from the second video, a video frame consistent with the picture fingerprint of the video frame to be matched as a target video frame includes:
Determining an alternative video frame from the second video based on the timestamp and the picture fingerprint of the key video frame corresponding to the video frame to be matched;
And determining a video frame consistent with the picture fingerprint of the video frame to be matched from the alternative video frames as a target video frame.
4. The method of claim 3, wherein the determining an alternative video frame from the second video based on the timestamp and the picture fingerprint of the key video frame corresponding to the video frame to be matched comprises:
Determining, from the second video according to the timestamp of the key video frame corresponding to the video frame to be matched, a video segment of a specified duration that contains the video frame corresponding to the timestamp, as a second video segment;
Determining a video frame with consistent picture fingerprints of the key video frames corresponding to the video frames to be matched from the second video segment to obtain an intermediate video frame;
And determining a video frame taking the intermediate video frame as a key video frame from the second video as an alternative video frame.
5. The method of claim 1, wherein prior to the acquiring the picture fingerprint of the video frame to be matched in the first video, the method further comprises:
splitting the first video according to a preset time interval to obtain a plurality of sub-videos;
determining a video frame which is edited by a user from the first video as a video frame to be matched;
For each video frame to be matched, acquiring a time stamp and a picture fingerprint of a key video frame in the sub-video to which the video frame to be matched belongs;
Storing the offset of each video frame to be matched relative to the corresponding key video frame in the sub-video, the time stamp and the picture fingerprint of the corresponding key video frame as editing information of the first video into a preset database;
The obtaining the picture fingerprint of the video frame to be matched in the first video includes:
and acquiring the picture fingerprint of the video frame to be matched in the first video based on the editing information of the first video.
6. The method according to claim 5, wherein the obtaining the picture fingerprint of the video frame to be matched in the first video based on the editing information of the first video includes:
acquiring editing information of the first video from the preset database;
Determining a sub-video to which the key video frame belongs as a first sub-video according to the time stamp of the key video frame in the editing information;
Extracting the video frames to be matched from the first sub-video according to the offset of the video frames to be matched in the editing information relative to the corresponding key video frames;
And calculating the picture fingerprint of the video frame to be matched according to the pixel points contained in the video frame to be matched.
7. A video generating apparatus, the apparatus comprising:
the first fingerprint acquisition module is used for acquiring the picture fingerprint of a video frame to be matched in a first video; wherein the video frame to be matched comprises: a video frame subjected to editing processing in the first video; the picture fingerprint of the video frame to be matched is determined based on pixel points contained in the video frame to be matched; and the first video is obtained based on video sent by a live broadcast signal source;
The video acquisition module is used for acquiring a second video corresponding to the first video; wherein the second video is obtained based on the same video sent by the live broadcast signal source through a standby transmission line;
the target video frame determining module is used for determining a video frame consistent with the picture fingerprint of the video frame to be matched from the second video as a target video frame;
and the target video generation module is used for carrying out editing processing on the target video frames according to the editing processing mode corresponding to the video frames to be matched to obtain target videos.
8. The apparatus of claim 7, wherein the video frames to be matched comprise: in the first video, a start video frame and an end video frame in video frames subjected to editing processing;
The target video frame determining module is specifically configured to determine, from the second video, a video frame consistent with a picture fingerprint of the start video frame, obtain a target video frame corresponding to the start video frame, and determine a video frame consistent with a picture fingerprint of the end video frame, obtain a target video frame corresponding to the end video frame;
determining the position, in the second video, of the target video frame corresponding to the start video frame as a first position, and determining the position, in the second video, of the target video frame corresponding to the end video frame as a second position;
Determining, from the second video, a video segment from a target video frame corresponding to the start video frame to a target video frame corresponding to the end video frame as a first video segment, in a case where the first position coincides with a position of the start video frame in the first video and the second position coincides with a position of the end video frame in the first video;
And aiming at each video frame to be matched between the starting video frame and the ending video frame, determining the video frame at the position corresponding to the video frame to be matched in the first video segment, and obtaining a target video frame corresponding to the video frame to be matched.
9. A server, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
A memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-6 when executing the program stored on the memory.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-6.
CN202211389242.5A 2022-11-08 2022-11-08 Video generation method, device, server and storage medium Active CN115914738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211389242.5A CN115914738B (en) 2022-11-08 2022-11-08 Video generation method, device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211389242.5A CN115914738B (en) 2022-11-08 2022-11-08 Video generation method, device, server and storage medium

Publications (2)

Publication Number Publication Date
CN115914738A CN115914738A (en) 2023-04-04
CN115914738B true CN115914738B (en) 2024-06-04

Family

ID=86479722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211389242.5A Active CN115914738B (en) 2022-11-08 2022-11-08 Video generation method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN115914738B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105338403A (en) * 2014-08-06 2016-02-17 腾讯科技(北京)有限公司 Filter processing method and device as well as electronic equipment
US10321143B1 (en) * 2017-12-28 2019-06-11 Facebook, Inc. Systems and methods for increasing resolution of video data
CN110099298A (en) * 2018-01-29 2019-08-06 北京三星通信技术研究有限公司 Multimedia content processing method and terminal device
CN110213630A (en) * 2019-07-04 2019-09-06 北京奇艺世纪科技有限公司 A kind of method for processing video frequency, device, electronic equipment and medium
CN110798709A (en) * 2019-11-01 2020-02-14 腾讯科技(深圳)有限公司 Video processing method and device, storage medium and electronic device
CN112770176A (en) * 2020-12-23 2021-05-07 北京爱奇艺科技有限公司 Video frame determination method and device, electronic equipment and computer readable medium
CN113542855A (en) * 2021-07-21 2021-10-22 Oppo广东移动通信有限公司 Video processing method and device, electronic equipment and readable storage medium
CN114390324A (en) * 2022-03-23 2022-04-22 阿里云计算有限公司 Video processing method and system and cloud rebroadcasting method
US11425448B1 (en) * 2021-03-31 2022-08-23 Amazon Technologies, Inc. Reference-based streaming video enhancement

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11875491B2 (en) * 2021-03-16 2024-01-16 Monsters Aliens Robots Zombies Inc. Method and system for image processing


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Low complexity image correction using color and focus matching for stereo video coding; Wooseok Kim et al.; 2013 IEEE International Symposium on Circuits and Systems (ISCAS); 20130801 *
Research on a face replacement algorithm based on video reconstruction; 马军福; CNKI Master's Theses Full-text Database; 20190215 *
Research and application of several video special effects for ordinary users; 顾标准; Wanfang Dissertations; 20160809 *

Also Published As

Publication number Publication date
CN115914738A (en) 2023-04-04

Similar Documents

Publication Publication Date Title
CN108769731B (en) Method and device for detecting target video clip in video and electronic equipment
CN106412677B (en) Method and device for generating playback video file
CN111147955B (en) Video playing method, server and computer readable storage medium
CN104284241A (en) Video editing method and device
US20230144483A1 (en) Method for encoding video data, device, and storage medium
US11792254B2 (en) Use of in-band metadata as basis to access reference fingerprints to facilitate content-related action
EP3780642A1 (en) Streaming media data processing method and streaming media processing server
CN112383790A (en) Live broadcast screen recording method and device, electronic equipment and storage medium
CN113225585A (en) Video definition switching method and device, electronic equipment and storage medium
CN111263097B (en) Media data transmission method and related equipment
CN112235600B (en) Method, device and system for processing video data and video service request
CN115914738B (en) Video generation method, device, server and storage medium
CN110636323B (en) Global live broadcast and video on demand system and method based on cloud platform
CN102118633B (en) Method, device and system for playing video files
US20140010521A1 (en) Video processing system, video processing method, video processing apparatus, control method of the apparatus, and storage medium storing control program of the apparatus
KR20140134100A (en) Method for generating user video and Apparatus therefor
CN112770176B (en) Video frame determination method and device, electronic equipment and computer readable medium
US20150288731A1 (en) Content switching method and apparatus
CN112449209B (en) Video storage method and device, cloud server and computer readable storage medium
KR20140134126A (en) Content creation method and apparatus
TW201811005A (en) Image compressing method, image reconstructing method, image compressing device, image reconstructing device, image compressing program product, and image reconstructing program product
US7693400B2 (en) Imaging apparatus, recording apparatus and recording method
US11051051B1 (en) Systems, methods, and devices for managing storage of media objects
CN113038254B (en) Video playing method, device and storage medium
CN107734387B (en) Video cutting method, device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant