CN114598940A - Processing method and processing device for video - Google Patents

Processing method and processing device for video

Info

Publication number
CN114598940A
CN114598940A
Authority
CN
China
Prior art keywords
video
cover
stream
frame
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210305685.5A
Other languages
Chinese (zh)
Inventor
李林超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gaoding Xiamen Technology Co Ltd
Original Assignee
Gaoding Xiamen Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gaoding Xiamen Technology Co Ltd filed Critical Gaoding Xiamen Technology Co Ltd
Priority to CN202210305685.5A priority Critical patent/CN114598940A/en
Publication of CN114598940A publication Critical patent/CN114598940A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip

Abstract

Embodiments of the present disclosure provide a processing method and a processing apparatus for video. In the method, an instruction to add a video cover to a video is received. In response to receiving the instruction, a cover stream is created. The video cover is acquired according to the instruction, and the acquired video cover is added to the cover stream. Finally, the cover stream is encapsulated together with the video stream and the audio stream of the video into a video file.

Description

Processing method and processing device for video
Technical Field
The embodiment of the disclosure relates to the technical field of computers, and in particular relates to a processing method and a processing device for videos.
Background
With the rise of short videos, videos are frequently shared on social media or among friends, and users can search for videos as needed. To make a video easier to present, an interesting video cover is often added to it. The video cover may be a picture or a video frame in the video, and may be selected by the user.
Disclosure of Invention
Embodiments described herein provide a processing method, a processing apparatus, and a computer-readable storage medium storing a computer program for a video.
According to a first aspect of the present disclosure, a processing method for video is provided. In the method, an instruction to add a video cover to a video is received. In response to receiving the instruction, a cover stream is created. The video cover is acquired according to the instruction, and the acquired video cover is added to the cover stream. Finally, the cover stream is encapsulated together with the video stream and the audio stream of the video into a video file.
In some embodiments of the present disclosure, adding the acquired video cover to the cover stream comprises: in response to the instruction indicating that a target picture is set as the video cover, decoding the target picture; encoding the decoded target picture according to a target picture format; and adding the encoded target picture to the cover stream.
In some embodiments of the present disclosure, adding the acquired video cover to the cover stream comprises: in response to the instruction indicating that a target video frame in the video is set as the video cover, decoding the target video frame; encoding the decoded target video frame according to a target picture format; and adding the encoded target video frame to the cover stream.
In some embodiments of the present disclosure, the acquired video cover includes N video covers. Adding the acquired video cover to the cover stream comprises: displaying a plurality of cover stitching templates, wherein each of the plurality of cover stitching templates defines a manner in which the N video covers are stitched into one target video cover; receiving a user selection of one of the plurality of cover stitching templates; stitching the N video covers into the target video cover according to the selected cover stitching template; and adding the target video cover to the cover stream.
In some embodiments of the present disclosure, the N video covers are associated with different picture tags. The processing method further comprises: merging the picture tags associated with the N video covers into one video tag according to the positions of the N video covers in the selected cover stitching template; in response to receiving a video recommendation request carrying a search keyword, determining whether the video tag matches the keyword; and in response to determining that the keyword matches the video tag, displaying recommendation information for the video, wherein the recommendation information includes the target video cover.
In some embodiments of the present disclosure, encapsulating the cover stream together with the video stream and the audio stream in the video into a video file comprises: extracting the video stream and the audio stream from the video; decoding the extracted video stream and audio stream; encoding the acquired video cover into a first video frame of the video file; adding the first video frame to a newly created video stream; encoding the decoded video stream; adding the encoded video stream to the newly created video stream; encoding the decoded audio stream; adding the encoded audio stream to a newly created audio stream so as to synchronize the newly created audio stream with the newly created video stream; and encapsulating the cover stream together with the newly created video stream and the newly created audio stream into the video file.
In some embodiments of the present disclosure, encoding the acquired video cover into the first video frame of the video file comprises: in response to the instruction indicating that a target picture is set as the video cover, decoding the target picture; cropping the decoded target picture according to the aspect ratio of the video frames in the video; scaling the cropped target picture to a target size; and encoding the scaled target picture into a video frame as the first video frame of the video file.
In some embodiments of the present disclosure, cropping the decoded target picture according to the aspect ratio of the video frames in the video comprises: receiving a cropping start point input by a user, wherein the cropping start point indicates the position of the upper-left corner of a crop box in the decoded target picture, and the aspect ratio of the crop box is the same as that of the video frames; receiving a user setting for one of the width and the height of the crop box, wherein the other of the width and the height of the crop box is determined according to the aspect ratio of the crop box, the upper limit of the width of the crop box is the width of the decoded target picture, and the upper limit of the height of the crop box is the height of the decoded target picture; and cropping the decoded target picture according to the crop box, starting from the cropping start point.
According to a second aspect of the present disclosure, a processing apparatus for video is provided. The processing apparatus comprises at least one processor and at least one memory storing a computer program. The computer program, when executed by the at least one processor, causes the processing apparatus to: receive an instruction to add a video cover to a video; in response to receiving the instruction, create a cover stream; acquire the video cover according to the instruction; add the acquired video cover to the cover stream; and encapsulate the cover stream together with the video stream and the audio stream in the video into a video file.
In some embodiments of the disclosure, the computer program, when executed by the at least one processor, causes the processing apparatus to add the acquired video cover to the cover stream by: in response to the instruction indicating that a target picture is set as the video cover, decoding the target picture; encoding the decoded target picture according to a target picture format; and adding the encoded target picture to the cover stream.
In some embodiments of the disclosure, the computer program, when executed by the at least one processor, causes the processing apparatus to add the acquired video cover to the cover stream by: in response to the instruction indicating that a target video frame in the video is set as the video cover, decoding the target video frame; encoding the decoded target video frame according to a target picture format; and adding the encoded target video frame to the cover stream.
In some embodiments of the present disclosure, the acquired video cover includes N video covers. The computer program, when executed by the at least one processor, causes the processing apparatus to add the acquired video cover to the cover stream by: displaying a plurality of cover stitching templates, wherein each of the plurality of cover stitching templates defines a manner in which the N video covers are stitched into one target video cover; receiving a user selection of one of the plurality of cover stitching templates; stitching the N video covers into the target video cover according to the selected cover stitching template; and adding the target video cover to the cover stream.
In some embodiments of the present disclosure, the N video covers are associated with different picture tags. The computer program, when executed by the at least one processor, further causes the processing apparatus to: merge the picture tags associated with the N video covers into one video tag according to the positions of the N video covers in the selected cover stitching template; in response to receiving a video recommendation request carrying a search keyword, determine whether the video tag matches the keyword; and in response to determining that the keyword matches the video tag, display recommendation information for the video, wherein the recommendation information includes the target video cover.
In some embodiments of the disclosure, the computer program, when executed by the at least one processor, causes the processing apparatus to encapsulate the cover stream together with the video stream and the audio stream in the video into a video file by: extracting the video stream and the audio stream from the video; decoding the extracted video stream and audio stream; encoding the acquired video cover into a first video frame of the video file; adding the first video frame to a newly created video stream; encoding the decoded video stream; adding the encoded video stream to the newly created video stream; encoding the decoded audio stream; adding the encoded audio stream to a newly created audio stream so as to synchronize the newly created audio stream with the newly created video stream; and encapsulating the cover stream together with the newly created video stream and the newly created audio stream into the video file.
In some embodiments of the disclosure, the computer program, when executed by the at least one processor, causes the processing apparatus to encode the acquired video cover into the first video frame of the video file by: in response to the instruction indicating that a target picture is set as the video cover, decoding the target picture; cropping the decoded target picture according to the aspect ratio of the video frames in the video; scaling the cropped target picture to a target size; and encoding the scaled target picture into a video frame as the first video frame of the video file.
In some embodiments of the disclosure, the computer program, when executed by the at least one processor, causes the processing apparatus to crop the decoded target picture according to the aspect ratio of the video frames in the video by: receiving a cropping start point input by a user, wherein the cropping start point indicates the position of the upper-left corner of a crop box in the decoded target picture, and the aspect ratio of the crop box is the same as that of the video frames; receiving a user setting for one of the width and the height of the crop box, wherein the other of the width and the height of the crop box is determined according to the aspect ratio of the crop box, the upper limit of the width of the crop box is the width of the decoded target picture, and the upper limit of the height of the crop box is the height of the decoded target picture; and cropping the decoded target picture according to the crop box, starting from the cropping start point.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to the first aspect of the present disclosure.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments are briefly described below. It should be understood that the drawings described below relate only to some embodiments of the present disclosure and are not intended to limit the present disclosure, wherein:
FIG. 1 is an exemplary flow diagram of a processing method for video according to an embodiment of the present disclosure;
FIG. 2 is a schematic view of cover stitching templates according to an embodiment of the present disclosure;
FIG. 3 is an exemplary flow diagram of a process for encapsulating a cover stream with a video stream and an audio stream in a video into a video file in the embodiment shown in FIG. 1; and
FIG. 4 is a schematic block diagram of a processing apparatus for video according to an embodiment of the present disclosure.
The elements in the drawings are schematic and not drawn to scale.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are also within the scope of protection of the disclosure.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the presently disclosed subject matter belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the specification and relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein. Terms such as "first" and "second" are only used to distinguish one element (or a portion of an element) from another element (or another portion of an element).
Embodiments of the present disclosure provide a video processing method. FIG. 1 illustrates an exemplary flowchart of a processing method for video according to an embodiment of the present disclosure. "Video" herein refers to a video file that includes a digitized representation of a set of sequential images (i.e., a video stream). The set of sequential images comprises, for example, more than 24 images per second. Some video files also include a digitized representation of audio (i.e., an audio stream) corresponding to the set of sequential images. Embodiments of the present disclosure discuss video files that include both a video stream and an audio stream. However, it will be understood by those skilled in the art that video files that include only a video stream are also suitable for the processing methods described in this disclosure. The processing method 100 for video is described below with reference to FIG. 1.
In the processing method 100, at block S102, an instruction to add a video cover to a video is received. The instruction may include information about the video cover. The video cover can be a picture or a video frame in the video. In examples where the video cover is a picture (which may also be referred to herein as a target picture), the information may include a path from which to obtain the video cover, the path including the file name of the video cover. In examples where the video cover is a video frame in the video (which may also be referred to herein as a target video frame), the information may include the Presentation Time Stamp (PTS) of that video frame. In one example, the instruction may come from a local user or a remote terminal.
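As an illustration of what such an instruction might carry, the following minimal sketch (all names are hypothetical, not from the patent) models the two cases described above, a picture path or a target frame PTS:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AddCoverInstruction:
    """Hypothetical shape of an 'add video cover' instruction.

    Exactly one of `picture_path` (cover is a standalone picture) or
    `frame_pts` (cover is a video frame, identified by its presentation
    time stamp) is expected to be set.
    """
    picture_path: Optional[str] = None  # path to the target picture, incl. file name
    frame_pts: Optional[int] = None     # PTS of the target video frame

    def cover_source(self) -> str:
        # Both set or both unset is an invalid instruction.
        if (self.picture_path is None) == (self.frame_pts is None):
            raise ValueError("set exactly one of picture_path or frame_pts")
        return "picture" if self.picture_path is not None else "frame"

# A cover taken from a picture file, and one taken from the frame at PTS 40000:
print(AddCoverInstruction(picture_path="cover.png").cover_source())  # picture
print(AddCoverInstruction(frame_pts=40000).cover_source())           # frame
```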
At block S104, it is determined whether an instruction to add a video cover to the video has been received. If such an instruction is received ("yes" at block S104), a cover stream is created at block S106. The cover stream created here may be an empty stream that defines the necessary parameters required for the cover stream, such as its storage space. The cover stream is a separate stream, independent of the video stream and the audio stream.
At block S108, the video cover is acquired according to the instruction. As described above, the instruction may include information about the video cover. In the example where the video cover is a target picture, the video cover may be acquired via its path and stored in a cache. In the example where the video cover is a target video frame, the video frame to be acquired may be determined according to the PTS of the target video frame.
At block S110, the acquired video cover is added to the cover stream. In some embodiments of the present disclosure, if the instruction indicates that a target picture is set as the video cover, the target picture is decoded. The decoded target picture is then encoded according to the target picture format. The encoded target picture need not be output externally. The encoded target picture is then added to the cover stream; specifically, it is added to the storage space corresponding to the cover stream.
In some embodiments of the present disclosure, if the instruction indicates that a target video frame in the video is set as the video cover, the target video frame is decoded. In the process of decoding the target video frame, the key frame (also referred to as an I frame) associated with the target video frame can be determined from the attribute information of the target video frame. The attribute information of a video frame includes information indicating whether the video frame is a key frame. A key frame stores the complete information of one picture, so its original data can be obtained by intra-frame decoding alone. If the target video frame is a key frame, it can be decoded directly. The target video frame may also be a non-key frame; in this case, decoding proceeds chronologically from the preceding key frame until the target video frame at the given PTS is decoded. The decoded target video frame is then encoded according to the target picture format. The encoded target video frame need not be output externally. The encoded target video frame is then added to the cover stream; specifically, it is added to the storage space corresponding to the cover stream.
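The decode-from-key-frame logic described above can be sketched as follows. This is a pure-Python illustration of the frame-selection step only; real decoding would go through a codec, and all names here are illustrative:

```python
def frames_to_decode(frames, target_pts):
    """Return the PTS values that must be decoded, in order, to recover
    the frame at `target_pts`: start from the nearest key frame at or
    before the target and decode forward chronologically.

    `frames` is a list of (pts, is_key_frame) tuples sorted by PTS.
    """
    start = None
    for i, (pts, is_key) in enumerate(frames):
        if is_key and pts <= target_pts:
            start = i            # remember the latest preceding key frame
        if pts == target_pts:
            if start is None:
                raise ValueError("no key frame precedes the target frame")
            return [p for p, _ in frames[start:i + 1]]
    raise ValueError("target PTS not found in stream")

# A stream with key frames at PTS 0 and 120 (40 ms between frames):
stream = [(0, True), (40, False), (80, False), (120, True), (160, False)]
print(frames_to_decode(stream, 120))  # [120] - the target is itself a key frame
print(frames_to_decode(stream, 160))  # [120, 160] - decode forward from the key frame
```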
In some embodiments of the present disclosure, the acquired video cover may include N video covers. When adding the acquired video covers to the cover stream, multiple cover stitching templates may be displayed to the user. Each cover stitching template defines the manner in which the N video covers are stitched into one target video cover. FIG. 2 shows a schematic view of cover stitching templates according to an embodiment of the present disclosure. In the example of FIG. 2, N is equal to 4. The first cover stitching template 210 defines 4 boxes 211, 212, 213, and 214. The 4 boxes are equal in size and symmetrically distributed. After 4 video covers are added to the 4 boxes, respectively, they can be combined into one target video cover. The second cover stitching template 220 defines 4 boxes 221, 222, 223, and 224. Among these, box 221 is arranged above boxes 222, 223, and 224 and is larger than them. Boxes 222, 223, and 224 are equal in size and arranged in a row. After 4 video covers are added to the 4 boxes, respectively, they can likewise be combined into one target video cover. The cover stitching templates in FIG. 2 are merely exemplary; cover stitching templates according to embodiments of the present disclosure may take other forms.
The user may select one cover stitching template among the displayed plurality of cover stitching templates. If a user selection of one of the cover stitching templates is received, the N video covers are stitched into the target video cover according to the selected template. In the example of FIG. 2, if the user selects the first cover stitching template 210, 4 video covers may be added to boxes 211, 212, 213, and 214, respectively, to form the target video cover. The target video cover is then added to the cover stream.
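As a rough illustration of how an equal-size, symmetric template such as the first template 210 could be laid out, the following sketch computes paste boxes for a rows x cols grid on a pixel canvas (the canvas size and function name are assumptions, not from the patent); a stitching step would paste cover i into boxes[i] to form the target video cover:

```python
def grid_boxes(canvas_w, canvas_h, rows, cols):
    """Boxes (x, y, w, h) for an equal-size grid template, such as the
    2x2 template 210 in FIG. 2, laid out row by row on the canvas."""
    w, h = canvas_w // cols, canvas_h // rows
    return [(c * w, r * h, w, h) for r in range(rows) for c in range(cols)]

# A 640x640 target cover split into the 4 equal boxes of template 210:
print(grid_boxes(640, 640, 2, 2))
# [(0, 0, 320, 320), (320, 0, 320, 320), (0, 320, 320, 320), (320, 320, 320, 320)]
```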
A target video cover stitched from multiple video covers can convey more information about the video and helps a user grasp the video content more quickly.
For a media website hosting a huge number of videos, learning the content of a video from its cover alone is inefficient. Therefore, video tags can be added to videos so that users can search the tags by keyword to quickly locate a video. In some cases, video tags may be isolated, scattered, and/or unordered, and may not support searching according to user-defined logic. Embodiments of the present disclosure improve the way video tags are set.
In some embodiments of the present disclosure, the N video covers may be associated with different picture tags. A picture tag indicates the main content of the picture (or video frame), for example in the form of a keyword. In an example where the video cover is a photograph of a celebrity, the picture tag may be the celebrity's name (e.g., Large A). In an example where the video cover is a photograph of an attraction, the picture tag may be the name of the attraction (e.g., X park). In an example where the video cover depicts a sport, the picture tag may be the name of the sport (e.g., Y ball). In an example where the video cover depicts a weather condition, the picture tag may be the name of the weather condition (e.g., heavy rain). In the above embodiment, the picture tags associated with the N video covers may be merged into one video tag according to the positions of the N video covers in the selected cover stitching template.
In one example, a picture of Large A is added to box 211 of the first cover stitching template 210, a picture of heavy rain is added to box 212, a picture of X park is added to box 213, and a picture of Y ball is added to box 214. The picture tags associated with these 4 video covers are then merged, in the order of boxes 211-214, into one video tag: "Large A heavy rain X park Y ball".
In another example, a picture of heavy rain is added to box 211 of the first cover stitching template 210, a picture of X park is added to box 212, a picture of Y ball is added to box 213, and a picture of Large A is added to box 214. The picture tags associated with these 4 video covers are then merged, in the order of boxes 211-214, into one video tag: "heavy rain X park Y ball Large A".
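The merge-by-box-order rule in these two examples can be sketched as follows (a hypothetical helper, not the patent's implementation; box ids and tags are those used above):

```python
def merge_tags(covers_by_box):
    """Merge the picture tags of the covers into one video tag, in the
    order of the template's boxes (e.g. boxes 211-214).

    `covers_by_box` maps a box id to the picture tag of the cover placed
    in that box; the merge order is ascending box order.
    """
    return " ".join(covers_by_box[box] for box in sorted(covers_by_box))

# The two orderings from the examples above:
print(merge_tags({211: "Large A", 212: "heavy rain", 213: "X park", 214: "Y ball"}))
print(merge_tags({211: "heavy rain", 212: "X park", 213: "Y ball", 214: "Large A"}))
```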
When a video recommendation request carrying a search keyword is received, it is determined whether the video tag matches the keyword. If the keyword is determined to match the video tag, recommendation information for the video is displayed, wherein the recommendation information includes the target video cover.
A video tag that merges multiple picture tags in a user-specified order facilitates searching for videos by a user-specified keyword order. This is particularly suitable for videos stitched from multiple video clips. With the video processing method described above, videos stitched in the order specified by a user can be searched more accurately and recommended to the user.
Returning to FIG. 1, at block S112, the cover stream is encapsulated together with the video stream and the audio stream in the video into a video file. FIG. 3 shows an exemplary flow diagram of this encapsulation process.
At block S302, the video stream and the audio stream in the video are extracted. The video stream and the audio stream are extracted in their original forms and stored in designated storage spaces, respectively. Alternatively, the storage addresses of the video stream and the audio stream may be obtained separately, and the streams extracted frame by frame according to those addresses.
At block S304, the extracted video stream and audio stream are decoded. In some embodiments of the present disclosure, multiple threads may be used to decode the extracted video stream to increase decoding speed, and a separate decoding thread may be provided for the audio stream.
At block S306, the acquired video cover is encoded into the first video frame of the video file. In some embodiments of the present disclosure, if the instruction indicates that a target picture is set as the video cover, the target picture is decoded. In some cases, the size (resolution) of the target picture may not match the size of the video frames. If the decoded target picture is encoded directly into the first video frame of the video file, the first video frame may appear distorted, degrading the display of the video. In some implementations, the target picture may be scaled directly to the size of the video frame. However, when the aspect ratio of the target picture differs from that of the video frame, a scaling operation that keeps the width and height of the target picture within those of the video frame leaves unwanted blank areas in the scaled target picture (i.e., the first video frame), producing visual artifacts. In this embodiment, the decoded target picture is therefore cropped according to the aspect ratio of the video frames in the video, the cropped target picture is scaled to the target size, and the scaled target picture is encoded into a video frame as the first video frame of the video file. This avoids the display problem described above.
In some embodiments of the present disclosure, the decoded target picture may be cropped as follows. A cropping start point input by the user is received, where the cropping start point indicates the position of the upper-left corner of a crop box (e.g., a rectangular box) in the decoded picture. The aspect ratio of the crop box is the same as the aspect ratio of the video frames. A user setting for one of the width and the height of the crop box is received, and the other of the two is determined from the aspect ratio of the crop box. The upper limit of the crop box's width is the width of the decoded picture, and the upper limit of its height is the height of the decoded picture. In one example, the aspect ratio of a video frame is a:b, and the decoded picture has width 1.1a and height 1.2b. When the user sets the width of the crop box to 0.9a, its height is determined to be 0.9b. When the user sets the width to a, the height is determined to be b. When the user sets the width to 1.2a, the width exceeds the upper limit and is therefore adjusted to the upper limit 1.1a, so the height is determined to be 1.1b. After the crop box is determined, the decoded picture is cropped according to the crop box, starting from the cropping start point.
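The crop-box rule above (one dimension set by the user, the other derived from the aspect ratio, both clamped to the picture) might be sketched as follows; the function name and signature are assumptions, and the numeric checks reuse the a:b example with a = 160, b = 90:

```python
def crop_box(pic_w, pic_h, frame_w, frame_h, set_width=None, set_height=None):
    """Determine the crop box size from one user-set dimension.

    The crop box keeps the video frame's aspect ratio frame_w:frame_h;
    the other dimension is derived from it, and the box is clamped so it
    never exceeds the decoded picture (pic_w x pic_h).
    """
    if set_width is not None:
        w = min(set_width, pic_w)      # upper limit: picture width
        h = w * frame_h / frame_w      # derive height from the aspect ratio
    else:
        h = min(set_height, pic_h)     # upper limit: picture height
        w = h * frame_w / frame_h
    if h > pic_h:                      # re-clamp if the derived side overflows
        h, w = pic_h, pic_h * frame_w / frame_h
    if w > pic_w:
        w, h = pic_w, pic_w * frame_h / frame_w
    return float(w), float(h)

a, b = 160, 90                         # video frame aspect ratio a:b
# The decoded picture is 1.1a x 1.2b = 176 x 108.
print(crop_box(176, 108, a, b, set_width=144))  # (144.0, 81.0)  i.e. 0.9a x 0.9b
print(crop_box(176, 108, a, b, set_width=192))  # (176.0, 99.0)  1.2a clamps to 1.1a
```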
At block S308, the first video frame is added to the newly created video stream. In some embodiments of the present disclosure, a separate storage space may be defined for the newly created video stream.
At block S310, the decoded video stream (acquired at block S304) is encoded. The action performed at block S310 may be performed in parallel with the action performed at block S306. Since the first video frame has already been added to the newly created video stream at block S308, when the first video frame is sufficiently similar to the first frame of the decoded video stream, that first frame may be encoded using the first video frame as its key frame.
At block S312, the encoded video stream is added to the newly created video stream. Because a cover frame now precedes it, the PTS of each video frame in the encoded video stream must be increased by a time increment relative to its PTS in the original video stream (the video stream extracted at block S302). The time increment is the difference between the PTSs of two adjacent video frames, i.e., one frame duration.
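A minimal sketch of this PTS adjustment, assuming integer PTS values and a constant frame duration (the function name is illustrative):

```python
def build_cover_stream_pts(original_pts):
    """Prepend a cover frame and shift the original PTS values.

    The time increment is the difference between two adjacent frames'
    PTS (one frame duration); the cover frame takes the first slot.
    """
    if len(original_pts) < 2:
        raise ValueError("need at least two frames to derive the increment")
    increment = original_pts[1] - original_pts[0]
    cover_pts = original_pts[0]                   # cover frame's PTS
    shifted = [pts + increment for pts in original_pts]
    return [cover_pts] + shifted
```

For example, original PTS values `[0, 40, 80]` (25 fps, millisecond units) become `[0, 40, 80, 120]` after the cover frame is prepended.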
At block S314, the decoded audio stream is encoded. The decoded audio stream is acquired at block S304. The actions performed at block S314 may be performed in parallel with the actions performed at blocks S306-S312.
At block S316, the encoded audio stream is added to the newly created audio stream so that the newly created audio stream stays synchronized with the newly created video stream. Because the timestamps of the newly created video stream have been shifted relative to the original video stream, the audio timestamps must be adjusted correspondingly to keep sound and picture in sync.
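One common convention (an assumption here, not stated in the disclosure) is that each stream expresses its timestamps in its own time base; the video offset must then be rescaled into audio time-base ticks so both streams shift by the same wall-clock duration:

```python
from fractions import Fraction

def rescale_offset(video_offset_ticks, video_time_base, audio_time_base):
    """Convert a PTS offset expressed in the video time base into an
    equivalent number of ticks in the audio time base."""
    seconds = video_offset_ticks * video_time_base   # exact rational seconds
    return int(seconds / audio_time_base)
```

For a 25 fps video (time base 1/25), an offset of one frame (40 ms) corresponds to 1920 ticks in a 48 kHz audio time base (1/48000).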
At block S318, the cover stream is packaged with the newly created video stream and the newly created audio stream into a video file. In some embodiments of the present disclosure, the cover stream, the video stream, and the audio stream may be packaged into one video file according to a predetermined packaging format. The video file can then display a video cover (video thumbnail), and the cover is also shown when the video file has just been opened and is paused on its first frame.
Setting a cover stream for each individual video file makes it possible to replace or change the video cover more flexibly, which facilitates later editing of the video file.
Returning to fig. 1, if no instruction to add a video cover to the video is received (no at block S104), the processing procedure ends at block S114. The determination that no such instruction has been received may be triggered when a specified time point is reached, or when another operation on the video file is received, such as an operation to publish the video.
Fig. 4 shows a schematic block diagram of a processing apparatus 400 for video according to an embodiment of the present disclosure. As shown in fig. 4, the processing device 400 may include a processor 410 and a memory 420 in which computer programs are stored. The computer program, when executed by the processor 410, causes the processing apparatus 400 to perform the steps of the method 100 as shown in fig. 1. In one example, the processing apparatus 400 may be a computer device or a cloud computing node. The processing device 400 may receive an instruction to add a video cover for the video. In response to receiving an instruction to add a video cover to the video, processing device 400 may create a cover stream. The processing device 400 may obtain the video cover according to the instruction. The processing device 400 may then add the acquired video cover to the cover stream. Thereafter, the processing device 400 may encapsulate the cover stream with the video stream and the audio stream in the video into a video file.
In some embodiments of the disclosure, the processing device 400 may decode the target picture in response to the instruction indicating that the target picture is set as the video cover. The processing device 400 may encode the decoded target picture in the target picture format. The processing device 400 may add the encoded picture to the cover stream.
In some embodiments of the disclosure, in response to the instruction indicating that a target video frame in the video is set as the video cover, the processing device 400 may decode the target video frame. The processing device 400 may encode the decoded target video frame in the target picture format. The processing device 400 may add the encoded target video frame to the cover stream.
In some embodiments of the present disclosure, the acquired video cover includes N video covers. The processing device 400 may display a plurality of cover stitching templates, wherein each cover stitching template of the plurality of cover stitching templates defines a manner in which the N video covers are stitched into one target video cover. The processing device 400 may receive a user selection of one of the plurality of cover stitching templates. The processing device 400 may stitch the N video covers into one target video cover according to the selected cover stitching template. The processing device 400 may add the target video cover to the cover stream.
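For illustration, a grid-style stitching template might assign the N covers positions as below. A simple row-major grid with uniform cover sizes is an assumption for this sketch; actual templates may define arbitrary layouts:

```python
def grid_slots(n, cover_w, cover_h, cols):
    """Upper-left position of each of the n covers in a row-major grid
    template with `cols` columns and uniformly sized covers."""
    return [((i % cols) * cover_w, (i // cols) * cover_h) for i in range(n)]
```

A 2×2 template for four covers of 100×80 pixels would place them at (0, 0), (100, 0), (0, 80), and (100, 80); compositing the covers at these positions yields the target video cover.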
In some embodiments of the present disclosure, N video covers are associated with different picture tags. The processing device 400 may merge the picture tags associated with the N video covers into one video tag according to the positions of the N video covers in the selected cover stitching template. In response to receiving a video recommendation request carrying a search keyword, the processing device 400 may determine whether the video tag matches the keyword. In response to determining that the keyword matches the video tag, the processing device 400 may display recommendation information for the video. Wherein the recommendation information includes a target video cover.
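A minimal sketch of the tag-merging and matching logic. The function names and the comma-separated tag format are illustrative assumptions; the disclosure only specifies that the picture tags are merged in template-position order and matched against a search keyword:

```python
def merge_tags(cover_tags, template_order):
    """Merge per-cover picture tags into one video tag, ordered by each
    cover's slot in the selected stitching template.

    cover_tags: mapping from cover id to its picture tag
    template_order: cover ids in template-position order
    """
    return ",".join(cover_tags[cid] for cid in template_order)

def tag_matches(video_tag, keyword):
    """Check a search keyword against the merged video tag."""
    return keyword in video_tag.split(",")
```

For covers tagged "cat" and "dog" placed dog-first in the template, the merged video tag is "dog,cat", and a recommendation request with keyword "dog" matches it.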
In some embodiments of the disclosure, the processing device 400 may extract the video stream and the audio stream in the video. The processing device 400 may decode the extracted video stream and audio stream. The processing device 400 may encode the acquired video cover into the first video frame of the video file. The processing device 400 may add the first video frame to the newly created video stream. The processing device 400 may encode the decoded video stream. The processing device 400 may add the encoded video stream to the newly created video stream. The processing device 400 may encode the decoded audio stream. The processing device 400 may add the encoded audio stream to the newly created audio stream to synchronize the newly created audio stream with the newly created video stream. The processing device 400 may package the cover stream with the newly created video stream and the newly created audio stream into a video file.
In some embodiments of the disclosure, the processing device 400 may decode the target picture in response to the instruction indicating that the target picture is set as the video cover. The processing device 400 may crop the decoded target picture according to the aspect ratio of the video frames in the video. The processing device 400 may scale the cropped target picture to the target size. The processing device 400 may encode the scaled target picture into a video frame as the first video frame of the video file.
In some embodiments of the present disclosure, the processing device 400 may receive a crop start point input by a user. Wherein the cropping start point indicates a position of an upper left corner of the cropping frame in the decoded target picture, and an aspect ratio of the cropping frame is the same as an aspect ratio of the video frame. The processing device 400 may receive a user's setting of one of a width and a height of a crop box. Wherein the other of the width and the height of the crop box is determined according to the aspect ratio of the crop box. The upper limit value of the width of the crop box is the width of the decoded target picture. The upper limit value of the height of the crop box is the height of the decoded target picture. The processing device 400 may crop the decoded target picture according to the crop box starting from the crop start point.
In an embodiment of the present disclosure, the processor 410 may be, for example, a Central Processing Unit (CPU), a microprocessor, a Digital Signal Processor (DSP), a processor based on a multi-core processor architecture, or the like. The memory 420 may be any type of memory implemented using data storage technology including, but not limited to, random access memory, read only memory, semiconductor-based memory, flash memory, disk memory, and the like.
Further, in an embodiment of the present disclosure, the processing apparatus 400 may also include an input device 430, such as a microphone, a keyboard, a mouse, etc., for inputting the video file and the video cover. In addition, the processing apparatus 400 may further comprise an output device 440, such as a speaker, a display, etc., for outputting the processed video file.
In other embodiments of the present disclosure, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the methods shown in fig. 1 and fig. 3.
In summary, the video processing method according to the embodiments of the present disclosure sets a cover stream for each individual video file, which allows the video cover to be replaced or changed more flexibly and facilitates later editing of the video file. Stitching multiple video covers into one target video cover conveys more information about the video, helping users grasp the video content more quickly. Merging the picture tags of multiple video covers into one video tag in an order specified by the user facilitates searching for videos by keywords in that order; this is especially suitable for videos stitched from multiple clips, which can then be matched more accurately and recommended to the user. Cropping the decoded target picture according to the aspect ratio of the video frames and then scaling it avoids the display corruption that could otherwise occur when the target picture is set as the first frame of the video.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus and methods according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As used herein and in the appended claims, the singular forms of words include the plural and vice versa, unless the context clearly dictates otherwise. Thus, when reference is made to the singular, it is generally intended to include the plural of the corresponding term. Similarly, the terms "comprising" and "including" are to be construed as being inclusive rather than exclusive. Likewise, the terms "include" and "or" should be construed as inclusive unless such interpretation is explicitly prohibited herein. Where the term "example" is used herein, particularly when it comes after a set of terms, it is merely exemplary and illustrative and should not be considered exclusive or exhaustive.
Further aspects and areas of applicability will become apparent from the description provided herein. It should be understood that various aspects of the present application may be implemented alone or in combination with one or more other aspects. It should also be understood that the description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
Several embodiments of the present disclosure have been described in detail above, but it is apparent that various modifications and variations can be made to the embodiments of the present disclosure by those skilled in the art without departing from the spirit and scope of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (10)

1. A method of processing for video, comprising:
receiving an instruction of adding a video cover to the video;
in response to receiving an instruction to add a video cover to the video, creating a cover stream;
acquiring a video cover according to the instruction;
adding the obtained video cover to the cover stream; and
encapsulating the cover stream together with a video stream and an audio stream in the video into a video file.
2. The processing method of claim 1, wherein adding the acquired video cover to the cover stream comprises:
in response to the instruction indicating that a target picture is set as the video cover, decoding the target picture;
encoding the decoded target picture according to a target picture format; and
adding the encoded target picture to the cover stream.
3. The processing method of claim 1, wherein adding the acquired video cover to the cover stream comprises:
in response to the instruction indicating that a target video frame in the video is set as the video cover, decoding the target video frame;
encoding the decoded target video frame according to a target picture format; and
adding the encoded target video frame to the cover stream.
4. The processing method of claim 1, wherein the acquired video cover includes N video covers, and adding the acquired video cover to the cover stream includes:
displaying a plurality of cover stitching templates, wherein each cover stitching template of the plurality of cover stitching templates defines a manner in which the N video covers are stitched into one target video cover;
receiving a user selection of one of the plurality of cover stitching templates;
splicing the N video covers into a target video cover according to the selected cover splicing template; and
adding the target video cover to the cover stream.
5. The processing method of claim 4, wherein the N video covers are associated with different picture tags, the processing method further comprising:
merging the picture tags associated with the N video covers into one video tag according to the positions of the N video covers in the selected cover splicing template;
in response to receiving a video recommendation request carrying a search keyword, determining whether the video tag is matched with the keyword; and
in response to determining that the keyword matches the video tag, displaying recommendation information for the video, wherein the recommendation information includes the target video cover.
6. The processing method of claim 1, wherein encapsulating the cover stream with a video stream and an audio stream in the video into a video file comprises:
extracting a video stream and an audio stream in the video;
decoding the extracted video stream and audio stream;
encoding the obtained video cover into a first video frame of the video file;
adding the first video frame to the newly created video stream;
encoding the decoded video stream;
adding the encoded video stream to the newly created video stream;
encoding the decoded audio stream;
adding the encoded audio stream to a newly created audio stream to synchronize the newly created audio stream with the newly created video stream; and
encapsulating the cover stream, the newly created video stream, and the newly created audio stream into the video file.
7. The processing method of claim 6, wherein encoding the acquired video cover into a first video frame of the video file comprises:
in response to the instruction indicating that a target picture is set as the video cover, decoding the target picture;
cropping the decoded target picture according to the aspect ratio of the video frames in the video;
scaling the cropped target picture to a target size; and
encoding the scaled target picture into a video frame as the first video frame of the video file.
8. The processing method of claim 7, wherein cropping the decoded target picture according to the aspect ratio of the video frames in the video comprises:
receiving a cropping start point input by a user, wherein the cropping start point indicates the position of the upper left corner of a crop box in the decoded target picture, and the aspect ratio of the crop box is the same as that of the video frame;
receiving a setting of the user for one of a width and a height of the crop box, wherein the other one of the width and the height of the crop box is determined according to the aspect ratio of the crop box, an upper limit value of the width of the crop box is a width of the decoded target picture, and an upper limit value of the height of the crop box is a height of the decoded target picture; and
cropping the decoded target picture according to the crop box, starting from the cropping start point.
9. A processing apparatus for video, comprising:
at least one processor; and
at least one memory storing a computer program;
wherein the computer program, when executed by the at least one processor, causes the processing apparatus to perform the steps of the processing method according to any one of claims 1 to 8.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the processing method according to any one of claims 1 to 8.
CN202210305685.5A 2022-03-25 2022-03-25 Processing method and processing device for video Pending CN114598940A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210305685.5A CN114598940A (en) 2022-03-25 2022-03-25 Processing method and processing device for video

Publications (1)

Publication Number Publication Date
CN114598940A true CN114598940A (en) 2022-06-07

Family

ID=81810574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210305685.5A Pending CN114598940A (en) 2022-03-25 2022-03-25 Processing method and processing device for video

Country Status (1)

Country Link
CN (1) CN114598940A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104244024A (en) * 2014-09-26 2014-12-24 北京金山安全软件有限公司 Video cover generation method and device and terminal
CN108737882A (en) * 2018-05-09 2018-11-02 腾讯科技(深圳)有限公司 Display methods, device, storage medium and the electronic device of image
CN110309353A (en) * 2018-02-06 2019-10-08 上海全土豆文化传播有限公司 Video index method and device
CN113055706A (en) * 2019-12-27 2021-06-29 北京达佳互联信息技术有限公司 Video synthesis method and device, electronic equipment and storage medium
CN113132752A (en) * 2019-12-30 2021-07-16 阿里巴巴集团控股有限公司 Video processing method and device
CN113596351A (en) * 2021-07-28 2021-11-02 北京达佳互联信息技术有限公司 Video display method and device
CN113761253A (en) * 2021-05-20 2021-12-07 腾讯科技(深圳)有限公司 Video tag determination method, device, equipment and storage medium
CN114071244A (en) * 2021-11-10 2022-02-18 广州博冠信息科技有限公司 Method and device for generating live cover, computer storage medium and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination