CN113923477A - Video processing method, video processing device, electronic equipment and storage medium


Info

Publication number: CN113923477A
Application number: CN202111168023.XA
Authority: CN (China)
Prior art keywords: video, clipping, style, target, clip
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: 乔钢
Current and original assignee: Beijing Baidu Netcom Science and Technology Co., Ltd. (the listed assignee may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Events: application CN202111168023.XA filed 2021-09-30 by Beijing Baidu Netcom Science and Technology Co., Ltd.; publication of CN113923477A on 2022-01-11; legal status pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; operations thereof
    • H04N 21/21: Server components or server architectures
    • H04N 21/218: Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187: Live feed
    • H04N 21/23: Processing of content or additional data; elementary server operations; server middleware
    • H04N 21/233: Processing of audio elementary streams
    • H04N 21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/235: Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N 21/25: Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N 21/258: Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users' preferences to derive collaborative data
    • H04N 21/25866: Management of end-user data
    • H04N 21/25891: Management of end-user data being end-user preferences
    • H04N 21/27: Server based end-user applications
    • H04N 21/274: Storing end-user multimedia data in response to end-user request, e.g. network recorder
    • H04N 21/2743: Video hosting of uploaded data from client

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Graphics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The disclosure provides a video processing method, a video processing device, an electronic device, a storage medium and a program product, and belongs to the technical field of artificial intelligence, in particular to the technical field of cloud computing. The specific implementation scheme is as follows: determining identification information of the user terminal in response to a request for processing a video to be clipped; acquiring a historical clip video based on the identification information, wherein the historical clip video is a video that the user terminal clipped and uploaded within a historical time period; determining a target clipping style based on the historical clip video; and processing the video to be clipped based on the target clipping style to obtain the target clip video.

Description

Video processing method, video processing device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and more particularly to the field of cloud computing technology; in particular, it relates to a video processing method, apparatus, electronic device, storage medium and program product.
Background
With the rapid development of networks, the efficiency of information propagation has improved markedly, which in turn has driven applications in which everyone shoots and shares video works. Post-production can be applied to a shot video work to bring out rich content, complete pictures, and a sense of layering and rhythm in the storytelling. However, post-production requires the user to spend time and effort on manual editing.
Disclosure of Invention
The present disclosure provides a video processing method, apparatus, electronic device, storage medium, and program product.
According to an aspect of the present disclosure, there is provided a video processing method including: determining identification information of the user terminal in response to a request for processing a video to be clipped; acquiring a history clipping video based on the identification information, wherein the history clipping video is a video uploaded by the user terminal after clipping in a history time period; determining a target clipping style based on the historical clipped video; and processing the video to be clipped based on the target clipping style to obtain a target clipping video.
According to another aspect of the present disclosure, there is provided a video processing apparatus including: an identification determination module for determining identification information of the user terminal in response to a request for processing a video to be clipped; an obtaining module, configured to obtain a history clip video based on the identification information, where the history clip video is a video uploaded by the user terminal after being clipped within a history time period; a style determination module for determining a target clipping style based on the historical clipped video; and the processing module is used for processing the video to be clipped based on the target clipping style to obtain the target clipping video.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method as described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method as described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 schematically illustrates an exemplary system architecture to which the video processing method and apparatus may be applied, according to an embodiment of the present disclosure;
fig. 2 schematically shows a flow chart of a video processing method according to an embodiment of the present disclosure;
fig. 3 schematically shows a flow diagram of a video processing method according to another embodiment of the present disclosure;
fig. 4 schematically shows an application scenario diagram of a video processing method according to an embodiment of the present disclosure;
fig. 5 schematically shows a block diagram of a video processing apparatus according to an embodiment of the present disclosure; and
fig. 6 schematically shows a block diagram of an electronic device adapted to implement a video processing method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The disclosure provides a video processing method, an apparatus, an electronic device, a storage medium, and a program product.
According to an embodiment of the present disclosure, a video processing method may include: determining identification information of the user terminal in response to a request for processing a video to be clipped; acquiring a historical clip video based on the identification information, wherein the historical clip video is a video uploaded by the user terminal after being clipped in a historical time period; determining a target clipping style based on the historical clipped video; and processing the video to be clipped based on the target clipping style to obtain the target clipping video.
With the video processing method provided by the embodiments of the present disclosure, for live game videos, teaching videos and the like, the user does not need to clip the video manually: the user's personalized target clipping style can be determined automatically from the historical clip videos the user has published, and the target clip video is obtained by clipping according to that style. This improves both the clipping quality and the degree of intelligence of the target clip video.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.
Fig. 1 schematically shows an exemplary system architecture to which the video processing method and apparatus may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the video processing method and apparatus may be applied may include a user terminal, but the user terminal may implement the video processing method and apparatus provided by the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, the system architecture 100 according to this embodiment may include user terminals 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the user terminals 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
A user may use user terminals 101, 102, 103 to interact with a server 105 over a network 104 to receive or transmit video information or the like. The user terminals 101, 102, 103 may have installed thereon various messaging client applications, such as a knowledge reading type application, a web browser application, a search type application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The user terminals 101, 102, 103 may be various electronic devices having a display screen and supporting live webcasting, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, for example a background management server (by way of example only) that supports processing of the videos to be clipped uploaded from the user terminals 101, 102, 103. The background management server can clip a received video to be clipped and feed the target clip video back to other user terminals.
According to embodiments of the present disclosure, the server may be a cloud server, also called a cloud computing server or cloud host, a host product in the cloud computing service system that remedies the defects of high management difficulty and weak service extensibility found in conventional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be noted that the video processing method provided by the embodiment of the present disclosure may also be generally executed by the server 105. Accordingly, the video processing apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 105. The video processing method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster different from the server 105 and capable of communicating with the user terminals 101, 102, 103 and/or the server 105. Accordingly, the video processing apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the user terminals 101, 102, 103 and/or the server 105.
It should be understood that the number of user terminals, networks and servers in fig. 1 is merely illustrative. There may be any number of user terminals, networks, and servers, as desired for an implementation.
Fig. 2 schematically shows a flow chart of a video processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S240.
In operation S210, identification information of the user terminal is determined in response to a request for processing a video to be clipped.
In operation S220, a history clip video is acquired based on the identification information, wherein the history clip video is a video uploaded by the user terminal after being clipped within a history time period.
In operation S230, a target clipping style is determined based on the historical clip video.
In operation S240, the video to be clipped is processed based on the target clipping style, resulting in a target clip video.
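Read together, operations S210 to S240 form a small pipeline. The following is a minimal sketch in Python; the helper functions fetch_history_clips, infer_clip_style and apply_clip_style are hypothetical stubs standing in for the storage, style-model and editing layers that the disclosure leaves unspecified.

```python
# Minimal sketch of operations S210-S240; the three helpers below are
# assumptions, not an API named by the patent.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ClipRequest:
    terminal_id: str   # identification information of the user terminal
    video_path: str    # the video to be clipped

def fetch_history_clips(terminal_id: str, days: int = 90) -> List[str]:
    # Stub: a real system would query the upload store for videos this
    # terminal clipped and uploaded within the historical time period.
    return []

def infer_clip_style(history_clips: List[str]) -> Dict[str, str]:
    # Stub: a real system would run the text/picture/sound style models
    # over the historical clip videos (see the sketches further below).
    return {"text": "white SimSun, bottom", "sound": "soothing female song"}

def apply_clip_style(video_path: str, style: Dict[str, str]) -> str:
    # Stub: a real system would render subtitles, filters and audio here.
    return video_path

def process_video(request: ClipRequest) -> str:
    terminal_id = request.terminal_id                          # S210
    history = fetch_history_clips(terminal_id, days=90)        # S220
    target_style = infer_clip_style(history)                   # S230
    return apply_clip_style(request.video_path, target_style)  # S240
```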
According to an embodiment of the present disclosure, the user terminal may be various electronic devices having a display screen and supporting live webcasting, such as a smart phone, a tablet computer, a laptop portable computer, a desktop computer, and the like.
According to embodiments of the present disclosure, the identification information may be any one of, or a combination of, letters, numbers and other characters, or information in another form, as long as it uniquely identifies its subject. It should be noted that the identification information is not limited to that of the user terminal; it may also identify the user, for example an account number, an identity card number or a mobile phone number.
According to the embodiment of the present disclosure, based on the identification information, a history clipped video that is consistent with the identification information, that is, a video that is uploaded by a user using a user terminal after being clipped within a history period of time, may be acquired.
According to embodiments of the present disclosure, the target clipping style that the user likes and is interested in can be determined from the videos the user has clipped in the past. The video to be clipped is then processed automatically based on the target clipping style to obtain the target clip video. The target clip video thus matches the user's personalized taste, which improves processing efficiency and the user's experience.
A video processing method such as that shown in fig. 2 will be further described with reference to fig. 3 to 4 in conjunction with the specific embodiments.
According to an embodiment of the present disclosure, the target clipping style may include at least one of: a target character clipping style, a target sound clipping style and a target picture special effect clipping style.
According to an exemplary embodiment of the present disclosure, the target clipping styles may include a target text clipping style, a target sound clipping style, and a target picture special effect clipping style.
According to embodiments of the present disclosure, the target text-clipping style may include at least one of: subtitle font clipping style, subtitle dynamic clipping style and subtitle color clipping style.
According to an embodiment of the present disclosure, the target sound clipping style may include at least one of: sound effect clip style, soundtrack clip style.
According to an embodiment of the present disclosure, the target screen special effect clipping style may include at least one of: filter clip style, picture restoration clip style, transition clip style.
According to the embodiment of the disclosure, the target clipping style is considered from multiple aspects such as characters, sound, special effects of pictures and the like, so that the content of the target clipping video is rich, and the interest and the ornamental property of the target clipping video are improved.
According to an embodiment of the present disclosure, the operation S230 of determining the target clipping style based on the historical clip video may include the following operations.
For example, in response to detecting the presence of text clip information in the historical clipped video, a target text clip style is determined based on the text clip information; in response to detecting that picture special effect clipping information exists in the history clipped video, determining a target picture special effect clipping style based on the picture special effect clipping information; and in response to detecting the presence of sound clip information in the historical clip video, determining a target sound clip style based on the sound clip information.
According to an embodiment of the present disclosure, whether text clip information exists in a history clip video may be detected by the following operations. For example, video frames in a history clip video are extracted, text clip information in the video frames is identified, and in the case where the presence of the text clip information is detected, a text clip style is identified, and a target text clip style is determined.
According to embodiments of the present disclosure, various text recognition methods may be utilized to recognize text clip information in a video frame, which is not limited by the present disclosure.
According to embodiments of the present disclosure, a text-clip style model may be utilized to determine a target text-clip style from text-clip information.
According to embodiments of the present disclosure, a text clipping style model may be generated using a deep-learning neural network model. Images carrying text clip information serve as training data, and the font format, color, display position, dynamic effect and the like of the text clip information are divided into style category labels. The model is trained on this data and these labels; the trained text clipping style model can then better determine the target text clipping style from the text clip information.
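As one way to make the detection step concrete: a minimal sketch that samples frames and looks for subtitle text in the lower third of the picture, assuming OpenCV for frame sampling and pytesseract as a stand-in OCR engine (the patent does not name a specific text-recognition method).

```python
import cv2
import pytesseract  # assumes the Tesseract OCR binary is installed

def detect_text_clip_info(video_path: str, every_n_frames: int = 30):
    """Return (frame_index, text) pairs found in the lower third of sampled
    frames, where burned-in subtitles are usually rendered; an empty list
    means no text clip information was detected."""
    cap = cv2.VideoCapture(video_path)
    texts, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:
            h = frame.shape[0]
            band = frame[int(2 * h / 3):, :]            # lower third only
            gray = cv2.cvtColor(band, cv2.COLOR_BGR2GRAY)
            text = pytesseract.image_to_string(gray).strip()
            if text:
                texts.append((idx, text))
        idx += 1
    cap.release()
    return texts
```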
According to the embodiments of the present disclosure, in the case where no text clip information is detected from the history clip video, it can be made clear that the user does not need to perform processing of adding a text clip such as a subtitle to the video to be clipped, and thus the operation of determining the target text clip style based on the text clip information can be stopped.
According to an embodiment of the present disclosure, whether screen special effect clipping information exists in a history clip video may be detected by the following operations. For example, video frames in a history clip video are extracted, picture special effect clip information in the video frames is identified, and in the case where the presence of the picture special effect clip information is detected, a picture special effect clip style is identified, and further, a target picture special effect clip style is determined.
According to the embodiments of the present disclosure, various image feature extraction methods may be utilized to identify picture special effect clipping information in a video frame, which is not limited by the present disclosure.
According to an embodiment of the present disclosure, a picture effect clipping style model may be utilized to determine a target picture effect clipping style from picture effect clipping information.
For example, the color category and the texture category are obtained based on the picture special effect clipping information using the picture special effect clipping style model. The color category may include a hue category, a saturation category, a brightness category, and the like. The target picture special effect clipping style can be determined based on category information such as color category and texture category.
According to an embodiment of the present disclosure, the picture special effect clipping style model may be generated using a deep-learning neural network model. Sequences of video frames carrying picture special effect clipping information serve as training data, and texture categories, color categories and the like are divided into category labels. The model is trained on this data and these labels; the trained picture special effect clipping style model can then better determine the target picture special effect clipping style from the picture special effect clipping information.
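A sketch of the kind of per-frame color and texture features such a model could consume, using OpenCV; the 16-bin histograms and the Laplacian-variance texture proxy are illustrative assumptions, not features specified by the patent.

```python
import cv2
import numpy as np

def frame_picture_features(frame: np.ndarray) -> np.ndarray:
    """Hue/saturation/value histograms plus a crude texture cue for one frame."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    ranges = [[0, 180], [0, 256], [0, 256]]   # OpenCV hue spans 0-179
    hists = [cv2.calcHist([hsv], [ch], None, [16], ranges[ch]).flatten()
             for ch in range(3)]
    color = np.concatenate(hists)
    color = color / max(color.sum(), 1.0)     # normalise to a distribution
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    texture = cv2.Laplacian(gray, cv2.CV_64F).var()  # sharpness/texture proxy
    return np.append(color, texture)
```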
According to the embodiments of the present disclosure, in the case where no screen special effect clipping information is detected from the history clipped video, it can be made clear that the user does not need to perform processing of screen special effect clipping such as filter addition, screen restoration, and the like on the video to be clipped, and thus the execution of the operation of determining the target screen special effect clipping style based on the screen special effect clipping information can be stopped.
According to an embodiment of the present disclosure, whether sound clip information exists in a history clip video may be detected by the following operations. For example, audio data in a history clip video is extracted, sound clip information in the audio data is identified, and in the case where the presence of the sound clip information is detected, a sound clip style is identified, thereby determining a target sound clip style.
According to the embodiments of the present disclosure, various voiceprint feature recognition methods may be utilized to recognize sound clip information in audio data, which the present disclosure does not limit.
According to embodiments of the present disclosure, a sound clipping style model may be utilized to identify a sound clipping style from sound clipping information.
According to embodiments of the present disclosure, the sound clipping style model may be generated using a deep-learning neural network model. The audio data is divided into category labels of different styles, such as soothing, sad, cheerful or tense; or pure music, song, male voice or female voice; or a combination of several labels, such as "cheerful female song". But it is not limited thereto: the category labels of the target sound clipping style may also be audio processing parameters, such as a speech-rate parameter, a frequency parameter or a volume parameter. The model is trained on this training data and these category labels; the trained sound clipping style model can then better determine the target sound clipping style from the sound clip information.
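A sketch of audio features such a model might consume, using librosa; the tempo/energy measures, the threshold, and the two labels are illustrative assumptions standing in for the trained model, not values from the patent.

```python
import librosa
import numpy as np

def sound_clip_features(audio_path: str) -> dict:
    y, sr = librosa.load(audio_path, sr=None, mono=True)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)   # rough musical tempo (BPM)
    energy = float(np.mean(librosa.feature.rms(y=y)))
    return {"tempo": float(np.atleast_1d(tempo)[0]), "energy": energy}

def rough_sound_label(features: dict) -> str:
    # Toy decision rule standing in for the trained sound clipping style model.
    return "cheerful" if features["tempo"] > 110 else "soothing"
```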
According to the embodiments of the present disclosure, in the case where sound clip information is not detected from the history clip video, it can be made clear that the user does not need to perform processing of adding a sound clip such as background music, a sound effect, or the like to the video to be clipped, and thus the operation of determining the target sound clip style based on the sound clip information can be stopped from being performed.
According to an embodiment of the present disclosure, the operation S240 of processing the video to be clipped based on the target clipping style to obtain the target clipping video may include the following operations.
For example, in response to determining a target text-clipping style, identifying text information to be presented from a video to be clipped; and processing the video to be edited by using the character information to be displayed according to the target character editing style to obtain the character editing video.
According to the embodiment of the disclosure, the video to be clipped can be decoded, the audio data can be extracted from the decoded video to be clipped, and the audio data can be converted into the text information to be displayed. And adding the character information to be displayed to the video to be edited according to the target character editing style to obtain the character editing video.
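A sketch of that decode-and-transcribe step, assuming moviepy 1.x for audio extraction and the SpeechRecognition package (with its Google Web Speech backend) as a stand-in recognizer; the patent does not name a specific speech-to-text method, and the zh-CN language code is an assumption.

```python
import speech_recognition as sr
from moviepy.editor import VideoFileClip

def extract_display_text(video_path: str, wav_path: str = "tmp_audio.wav") -> str:
    # Decode the video and write its audio track to a wav file.
    VideoFileClip(video_path).audio.write_audiofile(wav_path)
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    # Transcribe the speech into the text information to be displayed.
    return recognizer.recognize_google(audio, language="zh-CN")
```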
For example, the target text clipping style is determined to be: the subtitle is displayed at the bottom of the picture, in a Song (SimSun) typeface, in white. The text clip video is then obtained by processing the video to be clipped with the text information to be displayed, such as "accelerate, run ahead", according to this target text clipping style.
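A minimal sketch of burning that example style into the video, assuming moviepy 1.x with ImageMagick available for TextClip; the "SimSun" font name, the 40 pt size, and the whole-clip timing are illustrative assumptions.

```python
from moviepy.editor import CompositeVideoClip, TextClip, VideoFileClip

def burn_subtitle(video_path: str, text: str, out_path: str) -> None:
    video = VideoFileClip(video_path)
    subtitle = (TextClip(text, fontsize=40, color="white", font="SimSun")
                .set_position(("center", "bottom"))   # bottom of the picture
                .set_duration(video.duration))        # shown for the whole clip
    CompositeVideoClip([video, subtitle]).write_videofile(out_path)
```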
According to the embodiment of the disclosure, in the case where no target text clipping style is determined, it is clear that the user does not need text clips such as subtitles added to the video to be clipped. The operations of recognizing the text information to be displayed from the video to be clipped, and of processing the video with that text according to the target text clipping style to obtain the text clip video, can therefore be skipped.
According to an embodiment of the present disclosure, the operation S240 of processing the video to be clipped based on the target clipping style to obtain the target clipping video may include the following operations.
For example, in response to determining the target picture special effect clipping style, the video to be clipped is processed in accordance with the target picture special effect clipping style, resulting in a picture special effect clipped video.
According to the embodiments of the present disclosure, the picture special effect clipping processing manner may be determined from the determined target picture special effect clipping style, for example, whether there is a dynamic picture special effect clipping process, whether there is a filter process, whether there is a transition process, or the like is determined from the determined target picture special effect clipping style. Under the condition that a specific picture special effect clipping processing mode is clear, video frames in a video to be clipped are extracted, picture special effect clipping processing is carried out according to picture special effect clipping parameters in a target picture special effect clipping style, and each processed video frame is encoded to finally obtain a picture special effect clipping video.
For example, according to the picture special effect clipping parameters in the target picture special effect clipping style, a filter effect is added to each video frame, or a dynamic picture effect is added to each video frame, and a picture special effect clipped video is finally obtained.
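A sketch of that decode, per-frame effect, re-encode loop with OpenCV; the warm-tint channel gains stand in for whatever filter parameters the target picture special effect clipping style carries.

```python
import cv2
import numpy as np

def apply_warm_filter(in_path: str, out_path: str) -> None:
    # Note: OpenCV handles only the picture; the audio track would be
    # re-attached in a later pass.
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    gain = np.array([0.9, 1.0, 1.15])   # scale B, G, R channels toward warm tones
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(np.clip(frame * gain, 0, 255).astype(np.uint8))
    cap.release()
    writer.release()
```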
According to the embodiment of the disclosure, under the condition that the target picture special effect clipping style is not determined, it can be clear that the user does not need to perform picture special effect clipping processing such as filter adding on the video to be clipped, and therefore the operation of processing the video to be clipped according to the target picture special effect clipping style to obtain the picture special effect clipped video can be stopped.
According to an embodiment of the present disclosure, the operation S240 of processing the video to be clipped based on the target clipping style to obtain the target clipping video may include the following operations.
For example, in response to determining the target sound clipping style, target sound information matching the target sound clipping style is acquired; and processing the video to be clipped by using the target sound information to obtain the audio clip video.
According to an embodiment of the present disclosure, whether to add background music to a video to be clipped may be determined based on a target sound clipping style. Under the condition that the audio clipping processing is determined, the background music category label can be determined according to the target sound clipping style, the song which is the same as the background music category label is identified from the music database to serve as target sound information matched with the target sound clipping style, and the target sound information is combined with the video to be clipped to obtain the audio clipping video.
For example, a soothing female song is added to a video to be edited as background music, resulting in an audio-edited video, so that the background music emphasizes the atmosphere of the video to be edited.
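A sketch of the music-matching step, assuming moviepy 1.x and a toy dict-of-lists standing in for the music database; the category label and file paths are illustrative assumptions.

```python
import random
from moviepy.editor import AudioFileClip, CompositeAudioClip, VideoFileClip

# Toy stand-in for the music database, keyed by background-music category label.
MUSIC_DB = {"soothing female song": ["bgm/soothing_01.mp3", "bgm/soothing_02.mp3"]}

def add_background_music(video_path: str, label: str, out_path: str) -> None:
    video = VideoFileClip(video_path)
    bgm = (AudioFileClip(random.choice(MUSIC_DB[label]))
           .volumex(0.3)                      # keep the music under the speech
           .set_duration(video.duration))
    mixed = CompositeAudioClip([video.audio, bgm]) if video.audio else bgm
    video.set_audio(mixed).write_videofile(out_path)
```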
According to the embodiment of the disclosure, whether audio clipping processing is performed for sound information in a video to be clipped can also be determined based on a target sound clipping style. Under the condition that the audio clipping processing is determined, audio data can be extracted from a video to be clipped, and the audio data is processed according to audio processing parameters in the target sound clipping style, so that target sound information matched with the target sound clipping style is obtained; and combining the target sound information with the video to be clipped to obtain the audio clip video.
For example, the audio data in the video to be clipped is audio clipped, and the voice of the speaker is adjusted to slow down the speech rate and increase the volume, so that the audience can hear more clearly.
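A sketch of that adjustment with librosa and soundfile; the 0.9x speech rate and 1.5x gain are illustrative parameters, not values from the patent.

```python
import librosa
import numpy as np
import soundfile as sf

def adjust_speech(in_wav: str, out_wav: str,
                  rate: float = 0.9, gain: float = 1.5) -> None:
    y, sr = librosa.load(in_wav, sr=None, mono=True)
    slower = librosa.effects.time_stretch(y, rate=rate)  # rate < 1 slows speech
    louder = np.clip(slower * gain, -1.0, 1.0)           # boost, avoid clipping
    sf.write(out_wav, louder, sr)
```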
According to the embodiment of the disclosure, in the case where no target sound clipping style is determined, it is clear that the user does not need sound clips such as background music added to the video to be clipped. The operations of acquiring target sound information matching the target sound clipping style, and of processing the video to be clipped with that information to obtain the audio clip video, can therefore be skipped.
Fig. 3 schematically shows a flow diagram of a video processing method according to another embodiment of the present disclosure.
As shown in fig. 3, based on the history clip video 310, an operation of S311 to detect whether there is text clip information, an operation of S312 to detect whether there is screen special effect clip information, and an operation of S313 to detect whether there is sound clip information may be performed.
In operation S321, in the case where the text clip information is detected, a target text clip style is determined. Otherwise, the operation of determining the style of the target character clip is finished.
In operation S322, in the case where the picture special effect clipping information is detected, a target picture special effect clipping style is determined. Otherwise, the operation of determining the special effect clipping style of the target picture is finished.
In operation S323, in the case where the sound clip information is detected, a target sound clip style is determined. Otherwise, the operation of determining the target sound clipping style is ended.
Based on the target text-clip style, the video to be clipped 320 is processed to obtain a text-clip video 331.
And processing the video to be clipped 320 based on the target picture special effect clipping style to obtain a picture special effect clipped video 332.
Based on the target sound clipping style, the video to be clipped 320 is processed, resulting in a sound clip video 333.
The text clip video 331, the screen special effect clip video 332, and the audio clip video 333 are combined to obtain the target clip video 340.
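One reading of this merge step is that the three style passes land on the same timeline rather than producing three independent files; a sketch chaining the hypothetical functions from the sketches above (apply_warm_filter, burn_subtitle, add_background_music).

```python
def clip_with_target_style(video_path: str, subtitle: str, bgm_label: str) -> str:
    # Picture pass, then text pass, then sound pass, each consuming the
    # previous stage's output; the staging file names are arbitrary.
    apply_warm_filter(video_path, "stage_picture.mp4")
    burn_subtitle("stage_picture.mp4", subtitle, "stage_text.mp4")
    add_background_music("stage_text.mp4", bgm_label, "target_clip.mp4")
    return "target_clip.mp4"
```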
Fig. 4 schematically shows an application scenario diagram of a video processing method according to an embodiment of the present disclosure.
As shown in fig. 4, the video processing method may be applied to the cloud server 410. The cloud server 410 can receive live video of user a in live broadcast sent by the user terminal 420 in real time, acquire historical clip video from a database of the cloud server 410, and determine a target clip style based on the historical clip video. And processing the live video based on the target editing style to obtain the target live video. The user a is not required to manually perform the video clip.
After the target live video is obtained, the cloud server 410 may send the target live video to each live platform server in real time, and then the target live video is transmitted to the other user terminals 430 by the live platform servers, so that the audience B can watch the clipped target live video online.
With the video processing method provided by the embodiments of the present disclosure, the user does not need to clip the video manually: the cloud server can automatically determine the user's personalized target clipping style from the clipped videos the user has published, and clip according to that style to obtain the target live video. Moreover, the live video can be published automatically to each live platform server through the cloud server, which improves the clipping quality and degree of intelligence of the target live video.
Fig. 5 schematically shows a block diagram of a video processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, video processing apparatus 500 may include an identification determination module 510, an acquisition module 520, a genre determination module 530, and a processing module 540.
An identification determination module 510 for determining identification information of the user terminal in response to the request for processing the video to be clipped.
An obtaining module 520, configured to obtain a history clip video based on the identification information, where the history clip video is a video uploaded by the user terminal after being clipped in a history time period.
A style determination module 530 for determining a target clipping style based on the historical clipped video.
And the processing module 540 is configured to process the video to be clipped based on the target clipping style to obtain the target clipped video.
According to an embodiment of the present disclosure, the style determination module may include a text style determination unit, a picture style determination unit, and a sound style determination unit.
And a text style determination unit for determining a target text clip style based on the text clip information in response to detecting the presence of the text clip information in the history clip video.
And a picture style determination unit for determining a target picture special effect clipping style based on the picture special effect clipping information in response to detecting that the picture special effect clipping information exists in the history clipped video.
A sound style determination unit for determining a target sound clipping style based on the sound clipping information in response to detecting that the sound clipping information exists in the history-clipped video.
According to an embodiment of the present disclosure, the processing module may include a text recognition unit and a text processing unit.
And the text recognition unit is used for recognizing, in response to determining the target text clipping style, the text information to be displayed from the video to be clipped.
And the text processing unit is used for processing the video to be clipped by using the text information to be displayed according to the target text clipping style to obtain the text clip video.
According to an embodiment of the present disclosure, the processing module may include a picture processing unit.
And the picture processing unit is used for responding to the determined special effect clipping style of the target picture and processing the video to be clipped according to the special effect clipping style of the target picture to obtain the special effect clipped video of the picture.
According to an embodiment of the present disclosure, a processing module may include an acquisition unit, and a sound processing unit.
An acquisition unit configured to acquire target sound information matching a target sound clipping style in response to determining the target sound clipping style.
And the sound processing unit is used for processing the video to be clipped by utilizing the target sound information to obtain the audio clip video.
According to an embodiment of the present disclosure, the target text-clipping style includes at least one of: subtitle font clipping style, subtitle dynamic clipping style and subtitle color clipping style.
According to an embodiment of the present disclosure, the target sound clipping style comprises at least one of: sound effect clip style, soundtrack clip style.
According to an embodiment of the present disclosure, the target screen special effect clipping style includes at least one of: filter clip style, picture restoration clip style, transition clip style.
According to an embodiment of the present disclosure, the processing module may further include a merging unit.
And the merging unit is used for merging the character clip video, the picture special effect clip video and the audio clip video to obtain the target clip video.
According to an embodiment of the present disclosure, the processing module may include a receiving unit and a processing unit.
According to an embodiment of the present disclosure, a video processing apparatus may be applied to a cloud server.
And the receiving unit is used for receiving, in real time, video that is being live-streamed.
And the processing unit is used for processing the live video based on the target clipping style to obtain the target live video.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to an embodiment of the present disclosure, a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described above.
According to an embodiment of the disclosure, a computer program product comprising a computer program which, when executed by a processor, implements the method as described above.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store the various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 601 performs the methods and processes described above, such as the video processing method. For example, in some embodiments, the video processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the video processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the video processing method in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A video processing method, comprising:
determining identification information of the user terminal in response to a request for processing a video to be clipped;
acquiring a history clipping video based on the identification information, wherein the history clipping video is a video uploaded by the user terminal after clipping in a history time period;
determining a target clipping style based on the historical clipped video; and
and processing the video to be clipped based on the target clipping style to obtain a target clipping video.
2. The method of claim 1, wherein said determining a target clipping style based on said historical clip video comprises:
in response to detecting the presence of text clipping information in the historical clipped video, determining a target text clipping style based on the text clipping information;
in response to detecting that picture special effect clipping information exists in the history clipped video, determining a target picture special effect clipping style based on the picture special effect clipping information; and
in response to detecting the presence of sound clip information in the historical clip video, a target sound clip style is determined based on the sound clip information.
3. The method of claim 1 or 2, wherein the processing the video to be clipped based on the target clipping style to obtain the target clipping video comprises:
in response to determining the target text-clipping style, identifying text information to be presented from the video to be clipped; and
and processing the video to be edited by using the character information to be displayed according to the target character editing style to obtain the character editing video.
4. The method of claim 3, wherein the processing the video to be clipped based on the target clipping style to obtain the target clipping video comprises:
in response to determining the target sound clipping style, obtaining target sound information that matches the target sound clipping style; and
and processing the video to be clipped by utilizing the target sound information to obtain the audio clip video.
5. The method of claim 4, wherein the processing the video to be clipped based on the target clipping style to obtain a target clipping video further comprises:
and combining the character clip video, the picture special effect clip video and the audio clip video to obtain the target clip video.
6. The method of claim 2, wherein the target text-clipping style comprises at least one of: subtitle font clipping style, subtitle dynamic clipping style and subtitle color clipping style;
wherein the target sound clipping style comprises at least one of: sound effect clip style, soundtrack clip style, dubbing clip style;
wherein the target picture special effect clipping style comprises at least one of: filter clip style, picture restoration clip style, transition clip style.
7. The method of claim 1, wherein the processing the video to be clipped based on the target clipping style to obtain the target clipping video comprises:
receiving live broadcast video in real time; and
and processing the live video based on the target editing style to obtain a target live video.
8. The method according to any one of claims 1-7, wherein the method is applied to a cloud server.
9. A video processing apparatus comprising:
an identification determination module for determining identification information of the user terminal in response to a request for processing a video to be clipped;
an obtaining module, configured to obtain a history clip video based on the identification information, where the history clip video is a video uploaded by the user terminal after being clipped within a history time period;
a style determination module for determining a target clipping style based on the historical clipped video; and
and the processing module is used for processing the video to be clipped based on the target clipping style to obtain a target clipping video.
10. The apparatus of claim 9, wherein the style determination module comprises:
a text style determination unit configured to determine, in response to detecting the presence of text clipping information in the historical clipped video, a target text clipping style based on the text clipping information;
a picture style determination unit configured to determine, in response to detecting the presence of picture special effect clipping information in the historical clipped video, a target picture special effect clipping style based on the picture special effect clipping information; and
a sound style determination unit configured to determine, in response to detecting the presence of sound clipping information in the historical clipped video, a target sound clipping style based on the sound clipping information.
11. The apparatus of claim 9 or 10, wherein the processing module comprises:
a text recognition unit configured to identify, in response to determining the target text clipping style, text information to be presented from the video to be clipped; and
a text processing unit configured to process the video to be clipped with the text information to be presented according to the target text clipping style to obtain a text clip video.
12. The apparatus of claim 11, wherein the processing module comprises:
an acquisition unit configured to acquire target sound information matching the target sound clipping style in response to determining the target sound clipping style; and
a sound processing unit configured to process the video to be clipped with the target sound information to obtain an audio clip video.
13. The apparatus of claim 12, wherein the processing module further comprises:
a merging unit configured to merge the text clip video, the picture special effect clip video, and the audio clip video to obtain the target clip video.
14. The apparatus of claim 9, wherein the processing module comprises:
a receiving unit configured to receive a live broadcast video in real time; and
a processing unit configured to process the live video based on the target clipping style to obtain a target live video.
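Claims 9 through 14 describe the apparatus as a set of modules and sub-units. A structural Python sketch is given below; each method corresponds to one claimed module, every body is a stub, and none of the identifiers come from the disclosure.

```python
class VideoProcessingApparatus:
    """Structural sketch of the claimed apparatus; bodies are stubs only."""

    def determine_identification(self, request: dict) -> str:
        # identification determination module (claim 9); the key name is hypothetical
        return request.get("terminal_id", "")

    def obtain_historical_clipped_video(self, terminal_id: str) -> dict:
        # obtaining module (claim 9): would fetch the video the terminal
        # clipped and uploaded within a historical time period
        return {"terminal_id": terminal_id, "video": None}

    def determine_target_style(self, historical_clipped_video: dict) -> dict:
        # style determination module (claims 9 and 10)
        return {}

    def process(self, video_path: str, target_style: dict) -> str:
        # processing module (claims 9 and 11 to 14); returns the target clip video
        return video_path
```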
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
CN202111168023.XA 2021-09-30 2021-09-30 Video processing method, video processing device, electronic equipment and storage medium Pending CN113923477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111168023.XA CN113923477A (en) 2021-09-30 2021-09-30 Video processing method, video processing device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111168023.XA CN113923477A (en) 2021-09-30 2021-09-30 Video processing method, video processing device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113923477A true CN113923477A (en) 2022-01-11

Family

ID=79238242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111168023.XA Pending CN113923477A (en) 2021-09-30 2021-09-30 Video processing method, video processing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113923477A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110381371A (en) * 2019-07-30 2019-10-25 维沃移动通信有限公司 A kind of video clipping method and electronic equipment
CN110389796A (en) * 2019-07-01 2019-10-29 北京字节跳动网络技术有限公司 Edit operation processing method, device and electronic equipment
CN111243632A (en) * 2020-01-02 2020-06-05 北京达佳互联信息技术有限公司 Multimedia resource generation method, device, equipment and storage medium
CN112289347A (en) * 2020-11-02 2021-01-29 李宇航 Stylized intelligent video editing method based on machine learning
CN113065319A (en) * 2021-03-23 2021-07-02 北京达佳互联信息技术有限公司 Text generation method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023185488A1 (en) * 2022-03-29 2023-10-05 北京字跳网络技术有限公司 Method and apparatus for generating material package, video editing method and apparatus, and device and medium

Similar Documents

Publication Publication Date Title
CN114578969B (en) Method, apparatus, device and medium for man-machine interaction
CN107393541B (en) Information verification method and device
JP7394809B2 (en) Methods, devices, electronic devices, media and computer programs for processing video
CN110602516A (en) Information interaction method and device based on live video and electronic equipment
CN104866275B (en) Method and device for acquiring image information
CN112399258A (en) Live playback video generation playing method and device, storage medium and electronic equipment
CN110750996A (en) Multimedia information generation method and device and readable storage medium
CN111158924A (en) Content sharing method and device, electronic equipment and readable storage medium
CN110072140A (en) A kind of video information reminding method, device, equipment and storage medium
CN114374885B (en) Video key fragment determining method and device, electronic equipment and readable storage medium
CN113301382B (en) Video processing method, device, medium, and program product
CN113923477A (en) Video processing method, video processing device, electronic equipment and storage medium
CN114880498B (en) Event information display method and device, equipment and medium
CN113873323B (en) Video playing method, device, electronic equipment and medium
CN112714340B (en) Video processing method, device, equipment, storage medium and computer program product
CN113221514A (en) Text processing method and device, electronic equipment and storage medium
CN113392625A (en) Method and device for determining label information, electronic equipment and storage medium
CN112487164A (en) Artificial intelligence interaction method
WO2020154883A1 (en) Speech information processing method and apparatus, and storage medium and electronic device
CN113490045B (en) Special effect adding method, device, equipment and storage medium for live video
CN115022712B (en) Video processing method, device, equipment and storage medium
US20240244290A1 (en) Video processing method and apparatus, device and storage medium
CN113656642B (en) Cover image generation method, device, apparatus, storage medium and program product
CN113297824B (en) Text display method, text display device, electronic equipment and storage medium
CN113360712B (en) Video representation generation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination