WO2023197814A1 - Video processing method and system, and related device - Google Patents

Video processing method and system, and related device Download PDF

Info

Publication number
WO2023197814A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
stripping
splitting
type
user
Prior art date
Application number
PCT/CN2023/081604
Other languages
French (fr)
Chinese (zh)
Inventor
童贝
喻晓源
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司 filed Critical 华为云计算技术有限公司
Publication of WO2023197814A1 publication Critical patent/WO2023197814A1/en

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments

Definitions

  • the present application relates to the field of computers, and in particular to a video processing method, system and related equipment.
  • Video stripping is a secondary processing of the original video content, splitting the original video content into several video clips as needed, so that users can watch the video clips they are interested in on demand. Video stripping technology can deeply mine valuable information points in long videos and help users understand videos better and faster.
  • video stripping technology is usually limited to a specific scene.
  • the video stripping technology of news videos is usually based on the title of the news, news shot changes and other features.
  • The video stripping technology for film and television dramas is usually based on their subtitles. Therefore, current video stripping technology has a single application scenario and low flexibility, which degrades the user experience.
  • This application provides a video processing method, system and related equipment to solve the problems of single application scenario, low flexibility and poor user experience of video stripping technology.
  • A video processing method includes the following steps: the video processing system obtains configuration parameters and a video from the user through a configuration interface, where the configuration parameters include the video type of the video; the video processing system then outputs multiple short videos, where the multiple short videos are obtained after splitting the video according to the configuration parameters.
  • the video processing system can be deployed on a computing device, which can be a bare metal server (Bare Metal Server, BMS), a virtual machine or a container.
  • A BMS refers to a general physical server, such as an ARM server or an X86 server; a virtual machine is a complete computer system that is simulated by software and runs in a completely isolated environment.
  • a container refers to a group of processes that are subject to resource constraints and isolated from each other.
  • the computing device can also be an edge computing device, a storage server or a storage array, which is not specifically limited in this application.
  • the above configuration interface can be an application page, a web page or an application programming interface (API) for the user to interact with the video processing system.
  • The video processing system can display the application page or web page on the client's screen, or provide API parameters to the user.
  • The user can use the API parameters to integrate the video processing system 200 into a third-party system for secondary development.
  • the above video types may include one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animations, and conferences. It should be understood that the classification of video types can be divided into more categories according to the user's business scenarios, and examples are not given here.
  • the user can input the type of video that the user needs to split into strips. For example, if the user needs to split into strips for a movie or TV series, then the video type can be selected as a movie or TV series.
  • Because the configuration parameters include at least the video type of the video, the video processing system can select a pre-trained splitting model according to that video type, use it to split the video input by the user, and output the multiple short videos obtained after splitting. The splitting model has a corresponding relationship with the video type input by the user, so the multiple short videos can meet the user's diverse needs: whatever splitting scenario the user faces, suitable configuration parameters can be entered. This achieves video splitting that is universal across multiple scenarios, meets user needs, and improves the user experience.
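  • As a non-authoritative sketch, the selection of a pre-trained splitting model by video type described above could look like the following; the registry contents, model names, and the fallback to a generic model are illustrative assumptions, not the patent's actual implementation:

```python
# Illustrative registry mapping each video type to its pre-trained
# splitting model (names are hypothetical placeholders).
SPLITTING_MODELS = {
    "film_tv": "film_tv_splitting_model",
    "variety_show": "variety_show_splitting_model",
    "news": "news_splitting_model",
    "documentary": "documentary_splitting_model",
}

# Assumed fallback used when the video type has no dedicated model.
GENERIC_MODEL = "generic_splitting_model"


def select_splitting_model(video_type: str) -> str:
    """Return the splitting model corresponding to the user's video type,
    falling back to a generic model for unregistered types."""
    return SPLITTING_MODELS.get(video_type, GENERIC_MODEL)
```

A lookup like `select_splitting_model("news")` would then route a news video to the news-specific model, matching the one-model-per-type correspondence described in the text.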
  • the configuration parameters also include stripping features, and the stripping features include one or more of scene, character, audio, subtitle, action, optical character recognition (OCR), and appearance.
  • scene refers to splitting the video into strips according to different scenes.
  • The multiple short videos after splitting can be short videos of the same scene. For example, for a school documentary, if the splitting feature is scene, then among the multiple short videos after splitting, short video 1 may contain the classroom scenes of the documentary, short video 2 the dormitory scenes, short video 3 the playground scenes, and so on; this application does not make specific limitations.
  • Characters refer to splitting videos according to different characters.
  • The multiple short videos after splitting can be short videos of the same character. For example, in a talent-show variety program, if the splitting feature is character, then among the multiple short videos after splitting, short video 1 may be the performance clips of contestant A, short video 2 the performance clips of contestant B, and so on; this application does not make specific limitations.
  • Subtitles refer to splitting the video according to its subtitles; text recognition technology needs to be combined to determine the splitting points based on the semantics of the subtitles. For example, in a news video, if the splitting feature is subtitles, then among the multiple short videos after splitting, short video 1 may be the segment for news item 1, short video 2 the segment for news item 2, and so on; this application does not make specific limitations.
  • Action refers to splitting the video into strips according to different actions.
  • The multiple short videos after splitting can be short videos of the same action. For example, in a cultural evening show, if the splitting feature is action, then among the multiple short videos after splitting, short video 1 may contain the dance program clips, short video 2 the singing program clips, short video 3 the sketch program clips, and so on; this application does not make specific limitations.
  • OCR refers to splitting the video with the help of image and text recognition technology. For example, the text in certain scenes of the video, such as billboards or traffic signs, may need to be recognized to determine the meaning of the scene; this application does not make specific limitations.
  • Appearance refers to splitting the video into strips according to different appearances.
  • the multiple short videos after stripping can be short videos with the same appearance.
  • The appearance here can refer to the same clothes, the same hats, etc.; this application does not make specific limitations.
  • the strip feature can be divided into more feature types according to the user's business scenario, and examples are not given here.
  • Users can input the type of splitting feature they need to use to split the video. For example, if the user wants all the clips of each actor in a film or TV series, the user can input "character" into the configuration interface as the splitting feature; or, if the user wants to extract the dancing clips from a variety show video, the user can input "action" as the splitting feature.
  • each split feature can be further subdivided.
  • The splitting feature "action" can be further subdivided into "dancing", "running", "conflict", etc., and "audio" can be further subdivided into "singing", "quarrel", etc. Still taking the above example, if the user needs the "dancing" clips in the video, the user can select "action" among the splitting features and then select the "dancing" sub-feature under the "action" category. It should be understood that the above examples are for illustration and are not specifically limited in this application.
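  • The configuration parameters discussed above (video type, splitting features, optional sub-features, and an optional splitting speed) could be represented by a simple structure such as the following; all field names are hypothetical and chosen only for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SplittingConfig:
    """Hypothetical container for the user's configuration parameters."""
    video_type: str                                         # e.g. "variety_show"
    features: List[str] = field(default_factory=list)       # e.g. ["action"]
    sub_features: Dict[str, str] = field(default_factory=dict)  # e.g. {"action": "dancing"}
    speed_s: Optional[float] = None                         # optional splitting speed in seconds


# A user who wants dancing clips from a variety show might submit:
config = SplittingConfig(
    video_type="variety_show",
    features=["action"],
    sub_features={"action": "dancing"},
)
```

The nested `sub_features` mapping mirrors the subdivision described above, where "dancing" is selected under the "action" category.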
  • The configuration parameters also include the splitting speed.
  • the strip-splitting speed can be a speed value or a speed range.
  • the video processing system can determine the strip-splitting speed according to the speed value or speed range input by the user.
  • The splitting speed can be one of several preset ranges, such as 0~1s, 1~5s, 5~10s, 10~15s, 15~20s, 20~30s, etc. For example, if the speed value input by the user is 3s, the splitting speed can be 1~5s; if the range input by the user is 4~8s, the splitting speed can be 5~10s.
  • The splitting speed can be used when splitting the video, further meeting the user's needs and improving the user experience. It should be understood that the faster the video splitting speed, the lower the splitting accuracy. Some users' needs focus on splitting speed, while others focus on splitting accuracy; users can choose the splitting speed according to their own needs, which improves the user experience.
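  • The mapping from a user-supplied speed value or range to one of the preset ranges could be sketched as follows; the preset boundaries come from the examples above, while the tie-breaking rules are assumptions:

```python
# Preset splitting-speed ranges in seconds, as listed in the text.
SPEED_RANGES = [(0, 1), (1, 5), (5, 10), (10, 15), (15, 20), (20, 30)]


def snap_value(value_s: float) -> tuple:
    """Map a single speed value to the first preset range containing it,
    e.g. a value of 3s maps to (1, 5) as in the example above."""
    for lo, hi in SPEED_RANGES:
        if lo <= value_s <= hi:
            return (lo, hi)
    return SPEED_RANGES[-1]  # assumed: clamp out-of-range values to the last preset


def snap_range(lo_s: float, hi_s: float) -> tuple:
    """Map a user-supplied range to the first preset range whose upper
    bound covers it, e.g. 4~8s maps to (5, 10) as in the example above."""
    for lo, hi in SPEED_RANGES:
        if hi >= hi_s:
            return (lo, hi)
    return SPEED_RANGES[-1]
```

This reproduces both worked examples from the text: a value of 3s snaps to the 1~5s preset, and a 4~8s range snaps to 5~10s.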
  • the video type includes an unknown type.
  • The method may also include the following steps: the video processing system performs type detection on the video to obtain the detection type of the video, then splits the video based on the detection type and the configuration parameters, and outputs multiple short videos.
  • the video processing system performs feature detection on the video, obtains the stripping characteristics of the video, and strips the video according to the stripping characteristics.
  • The configuration interface can display the preset splitting speeds for the user to select. If the user does not select a splitting speed, the video processing system uses the default splitting speed, or the user's historical splitting speed, as the user-input splitting speed to split the video and output multiple short videos.
  • The video processing system can detect the video to obtain its video type, splitting features or splitting speed, thereby predicting the type, features or speed with which the user may need to split the video, which improves the user experience.
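  • The fallback flow for an unknown video type described above could be sketched like this; `detect_type` and the `models` registry are injected stand-ins for the system's actual detection unit and splitting models, not real APIs:

```python
def split_with_detection(video, config, detect_type, models):
    """If the configured video type is unknown, detect it first, then
    split with the model corresponding to the (detected) type.

    `detect_type` is a callable standing in for the detection unit (e.g. a
    classifier over sampled frames); `models` maps type -> splitting callable.
    """
    video_type = config.get("video_type", "unknown")
    if video_type == "unknown":
        video_type = detect_type(video)
    model = models.get(video_type, models["generic"])
    return model(video, config)
```

For example, a video submitted with type "unknown" would first be classified, then routed to the detected type's model, matching the detection step in the text.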
  • the video processing system includes multiple splitting models, where one splitting model corresponds to a video type.
  • The video processing system obtains the splitting model corresponding to the video type, inputs the video into that model, and the model outputs multiple short videos obtained after splitting.
  • the video and stripping features are input into the stripping model corresponding to the video type, and multiple short videos obtained after stripping are output.
  • The splitting model is obtained after training a machine learning model using a sample set. The sample set includes sample input data and sample output data, where the sample input data includes known videos and known features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features.
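  • One plausible layout for a single sample in such a training set, pairing a known video and known features (input) with the known short videos (output); all field names and values are hypothetical:

```python
# Hypothetical layout of one training sample: the input pairs a known
# video with known splitting features; the output lists the known short
# videos obtained by splitting with those features.
sample = {
    "input": {
        "video": "talent_show_ep1.mp4",    # known video
        "features": ["character"],          # known splitting features
    },
    "output": [                             # known short videos
        {"clip": "contestant_A.mp4", "start_s": 120.0, "end_s": 305.0},
        {"clip": "contestant_B.mp4", "start_s": 310.0, "end_s": 480.0},
    ],
}
```

A supervised splitting model would then learn to reproduce the labeled clip boundaries from the (video, features) pair.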
  • the video processing system can use the stripping model corresponding to the detection type to strip the video.
  • the general splitting model can be used to split the video.
  • The above universal splitting model can be a splitting model common to multiple video types.
  • one split model corresponds to one video type, which can meet the needs of users in different application scenarios.
  • The model structures used by the machine learning models of different video types can be the same or different, depending on the corresponding video type.
  • The machine learning model structures used for video types with similar application scenarios can be similar or identical, and the sample sets used during training can correspond to the respective video types, thereby reducing the workload of model construction and improving the efficiency of preparing the splitting models.
  • The above-mentioned splitting model can split videos according to different splitting features. For example, assuming the video type selected by the user is "variety show", the corresponding splitting model is the variety-show splitting model; if the user selects the splitting feature "character" and the uploaded video is an episode of a talent-show variety program, the video can be split according to the character feature, and the short videos obtained can be all the performance clips of contestant A in the variety show.
  • The above sample set includes sample input data and sample output data, wherein the sample input data includes known videos and known features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features.
  • the trained splitting model can split the videos according to the splitting features input by the user.
  • the above machine learning models may include but are not limited to CNN, LSTM, Yolo model, SSD model, RCNN model or Fast-RCNN model, etc., which are not specifically limited in this application.
  • The splitting model can split videos according to different splitting features, so the usage needs of different users for the same video type can be met. It should be understood that for the same video type, different users may focus on different aspects of splitting. For example, for variety-show videos, some users only want performance clips of their favorite actors and stars, some only want dancing clips, and some only want singing clips. Through the above configuration interface, this application obtains the required splitting features from the user and splits the video based on those features, which can meet the user's diverse needs.
  • The splitting model under each video type may include multiple speed splitting models, where one speed splitting model corresponds to one splitting speed. The splitting unit 220 can determine the corresponding splitting model according to the video type obtained from the configuration interface, then determine the speed splitting model within it based on the splitting speed obtained from the configuration interface, and then use that speed splitting model to split the video to obtain multiple short videos.
  • The structures of the multiple speed splitting models under each video type can be the same or different; the details can be determined according to the actual processing situation and are not specifically limited in this application.
  • the stripping model under each video type includes multiple speed stripping models, and different stripping speeds correspond to different speed stripping models, thereby further meeting the user needs and improving the user experience.
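  • The two-level selection described above (video type first, then splitting speed within that type) could be sketched as a nested lookup; all registry contents and model names are illustrative assumptions:

```python
# Nested registry: video type -> preset splitting-speed range -> speed model.
SPEED_MODELS = {
    "variety_show": {(1, 5): "variety_fast_model", (5, 10): "variety_accurate_model"},
    "news": {(1, 5): "news_fast_model", (5, 10): "news_accurate_model"},
}


def select_speed_model(video_type: str, speed_range: tuple) -> str:
    """First pick the model family by video type, then pick the speed
    splitting model within it by the preset splitting-speed range."""
    family = SPEED_MODELS[video_type]   # splitting model for the video type
    return family[speed_range]          # speed splitting model within it
```

Faster presets would typically trade accuracy for speed, consistent with the speed/accuracy trade-off noted earlier in the text.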
  • In a second aspect, this application provides a video processing system, including: an acquisition unit for acquiring configuration parameters and videos from the user through a configuration interface.
  • the configuration parameters include the video type of the video; and a splitting unit for outputting multiple short videos.
  • Because the configuration parameters include at least the video type of the video, the system can select a pre-trained splitting model according to that video type, use it to split the video input by the user, and output the multiple short videos obtained after splitting. The splitting model has a corresponding relationship with the video type input by the user, so the multiple short videos can meet the user's diverse needs: whatever splitting scenario the user faces, suitable configuration parameters can be entered. This achieves video splitting that is universal across multiple scenarios, meets user needs, and improves the user experience.
  • the video type includes one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animations, and conferences.
  • the configuration parameters also include stripping features, and the stripping features include one or more of scene, character, audio, subtitle, action, optical character recognition (OCR), and appearance.
  • The configuration parameters also include the splitting speed.
  • the system further includes a detection unit, and the video type includes an unknown type.
  • the detection unit is used to perform type detection on the video and obtain the detection type of the video;
  • the splitting unit is used to split the video into strips according to the detection type and configuration parameters, and output multiple short videos.
  • When the configuration interface does not obtain a splitting feature input by the user, the detection unit is used to perform feature detection on the video, obtain the splitting features of the video, and split the video according to those features.
  • The video processing system includes multiple splitting models, where one splitting model corresponds to one video type; the splitting unit is used to obtain the splitting model corresponding to the video type, input the video into that model, and output multiple short videos obtained after splitting.
  • The video and splitting features are input into a splitting model corresponding to the video type, and multiple short videos obtained after splitting are output, where the splitting model is obtained after training a machine learning model using a sample set. The sample set includes sample input data and sample output data, where the sample input data includes known videos and known features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features.
  • a computing device including a processor and a memory.
  • the memory is used to store codes.
  • The processor is configured to execute the code to implement the functions of each module in the first aspect or any possible implementation of the first aspect.
  • a computer storage medium stores instructions, which when run on a computing device, cause the computing device to execute the methods described in the above aspects.
  • A fifth aspect provides a program product containing instructions that, when run on a computing device, cause the computing device to perform the methods described in the above aspects.
  • Figure 1 is a schematic architectural diagram of a video processing system provided by this application.
  • Figure 2 is a schematic diagram of a splitting model in a video processing system provided by this application.
  • Figure 3 is a schematic flow chart of the steps of a video processing method provided by this application.
  • Figure 4 is an example diagram of a configuration interface in a video processing system provided by this application.
  • Figure 5 is a schematic flow chart of the steps of the video processing system in an application scenario provided by this application;
  • Figure 6 is a schematic structural diagram of a computing device provided by this application.
  • Video splitting is a secondary processing of the original video content, splitting the original video content into several video clips as needed, so that users can watch the video clips they are interested in on demand.
  • Video stripping technology can deeply mine valuable information points in long videos and help users understand videos better and faster.
  • video stripping technology is usually limited to a specific scene.
  • the video stripping technology of news videos is usually based on the title of the news, news shot changes and other features.
  • the video stripping technology of film and television dramas is usually based on the subtitles of film and television dramas. Therefore, most video stripping technologies can only play a role in the corresponding application scenarios, resulting in a single application scenario for video stripping technology.
  • One video splitting model cannot split multiple types of videos; as a result, when a platform performs video splitting, a splitting model needs to be customized for each application scenario, which is costly and inefficient.
  • Users have diverse needs. For example, in news video scenarios, some users need fast processing speed, some users need a large number of video splits, some need precise splitting, and some need news videos with specific content. The splitting model commonly used in a specific scenario cannot meet the diverse needs of users and has low flexibility.
  • FIG. 1 is a schematic diagram of the architecture of a video processing system provided by this application.
  • The architecture includes a client 100, a video processing system 200 and a storage server 300, where communication connections can be established among the client 100, the video processing system 200 and the storage server 300; the connections may be wired or wireless, which is not specifically limited in this application.
  • the number of clients 100 and storage servers 300 may be one or more.
  • FIG. 1 takes one client 100 and one storage server 300 as an example. This application does not specifically limit this.
  • The client 100 can run on a terminal device held by the user, which can be a computer, a smartphone, a handheld processing device, a tablet computer, a mobile notebook, an augmented reality (AR) device, a virtual reality (VR) device, an integrated handheld device, a wearable device, a vehicle-mounted device, smart conference equipment, smart advertising equipment, smart home appliances, etc.; no specific limitations are made here.
  • The client 100 can be an application (APP) client, a web-based client in a browser, or an application programming interface (API); this application does not make specific limitations.
  • the video processing system 200 can be deployed on a computing device, which can be a bare metal server (Bare Metal Server, BMS), a virtual machine or a container.
  • A BMS refers to a general physical server, such as an ARM server or an X86 server; a virtual machine is a complete computer system that is simulated by software and runs in a completely isolated environment.
  • a container refers to a group of processes that are subject to resource constraints and isolated from each other.
  • the computing device can also be an edge computing device, a storage server or a storage array, which is not specifically limited in this application.
  • the storage server 300 may be a server with a storage function.
  • the server may be a physical server such as an ARM server or an X86 server, or a virtual machine, which is not specifically limited in this application.
  • the storage server 300 may be a storage server in a video platform (such as a TV station, a video website, a live broadcast platform, etc.) or a public cloud platform, and is used to store videos to be split and short videos after splitting.
  • the video processing system 200 can also be deployed on the storage server 300.
  • the storage server 300 has the function of video stripping.
  • the video processing system 200 and the client 100 can also be deployed on the storage server 300.
  • the application is not specifically limited.
  • the client 100 can also be deployed on the storage server 300, and the video processing system can be deployed on other servers.
  • The client 100 and the video processing system 200 can also be deployed on devices other than the storage server 300.
  • this application does not make specific limitations.
  • the client 100 can upload the video to the video processing system 200 to split the video into strips.
  • the video processing system 200 splits the video into strips to obtain multiple short videos, and then returns them to the client 100 or stores them in the storage server 300.
  • the storage server 300 can also send the video to the video processing system 200 for video splitting.
  • the video processing system splits the video to obtain multiple short videos, and then returns them to the storage server 300, or returns them to the client 100 for use.
  • the details can be determined according to the actual application scenario, and are not specifically limited in this application.
  • the video processing system 200 can also be deployed in a public cloud to provide users with cloud services for video stripping.
  • for example, users can check the box for the video stripping service when purchasing a content delivery network (CDN) service.
  • the public cloud platform can use the video processing system 200 to split some videos spread in the CDN network according to user needs. It should be understood that the above examples are for illustration and are not specifically limited in this application.
  • the video processing system 200 can be divided into multiple unit modules, and each unit module can be a software module or a hardware module, or can be part software module and part hardware module, which is not specifically limited in this application.
  • FIG. 1 is an exemplary division method. As shown in FIG. 1 , the video processing system 200 may include an acquisition unit 210 , a strip splitting unit 220 and a strip splitting model 230 .
  • the acquisition unit 210 is used to obtain configuration parameters and videos from the user through the configuration interface, where the video may be a long video that the user needs to split, such as an episode of a variety show, a documentary, an interview program, etc.
  • the configuration interface may be an application page, web page or API for the user to interact with the video processing system 200.
  • the video processing system 200 may display the application page or web page on the screen of the client 100, or provide API interface parameters to the user.
  • the user can use the API interface parameters to integrate the video processing system 200 into a third-party system for secondary development.
  • the above-mentioned users can be users who use the stripping service.
  • for example, video website users can input configuration parameters and videos through the application page or web page to use the video website's video stripping service to split different types of videos.
  • it should be understood that the above example is for illustration and is not specifically limited in this application.
  • the above-mentioned users can also be development users who integrate the stripping service into a third-party system for secondary development.
  • the configuration interface can be the console of the public cloud platform or an API.
  • the console can be a web-based service management system, through which users can purchase cloud services and connect to cloud service instances that have the functions of the video processing system 200.
  • the API can be integrated by users into third-party systems for secondary development. For example, a short video platform can establish a connection between the API of this configuration interface and its internal server used to store long videos, so that long videos uploaded by users can automatically be split into strips through this API interface. It should be understood that the above example is for illustration and is not specifically limited in this application.
  • the configuration interface can also be the console of a video website. Users can upload videos through the console of the video website and enter the above configuration parameters, and the video processing system 200 in the video website splits the videos into strips according to the configuration parameters to obtain multiple short videos. It should be understood that the above examples are for illustration and are not specifically limited in this application.
  • the configuration parameter may include a video type of the video
  • the video type may include one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animations, and conferences. It should be understood that video types can be divided into more categories according to the user's business scenarios, and examples are not given here one by one.
  • the user can input the type of the video that needs to be split into strips. For example, if the video the user needs to split is a film or TV drama, then the video type can be selected as film and television drama.
  • the video type includes an unknown type, where the unknown type may refer to a video type that the user cannot determine, or the unknown type may refer to the user not inputting the video type, that is, the configuration interface does not obtain the video type.
  • the detection unit 240 may perform type detection on the video to obtain the detection type of the video.
  • the stripping unit 220 of the video processing system 200 can strip the video according to the configuration parameters and detection type, and output multiple short videos.
  • the configuration parameters may also include stripping features, which may include one or more of scenes, characters, audio, subtitles, actions, optical character recognition (optical character recognition, OCR), and appearance.
  • scene refers to splitting the video into strips according to different scenes.
  • the multiple short videos after splitting can be short videos of the same scene. For example, for a school documentary, if the stripping feature is scene, then among the multiple short videos after splitting, short video 1 is a short video of a classroom scene in the documentary, short video 2 is a short video of a dormitory scene in the documentary, short video 3 is a short video of a playground scene in the documentary, etc.
  • This application does not make specific limitations.
  • Characters refer to splitting videos according to different characters.
  • the multiple short videos after splitting can be short videos of the same character. For example, for a talent show variety show, if the stripping feature is character, then among the multiple short videos after splitting, short video 1 is a performance clip of contestant A, short video 2 is a performance clip of contestant B, etc. This application does not make specific limitations.
  • Subtitles refer to splitting videos according to subtitles, which needs to be combined with text recognition technology to determine the content of the split based on the semantics of the subtitles. For example, for news videos, if the stripping feature is subtitles, then among the multiple short videos after splitting, short video 1 is a clip of news 1, short video 2 is a clip of news 2, etc. This application does not make specific limitations.
  • Action refers to splitting the video into strips according to different actions.
  • the multiple short videos after stripping can be short videos of the same action. For example, for a cultural evening show, if the stripping feature is action, then among the multiple short videos after stripping, short video 1 is a dance program clip, short video 2 is a singing program clip, short video 3 is a sketch program clip, etc. This application does not make specific limitations.
  • OCR refers to splitting the video in combination with image and text recognition technology. For example, it may be necessary to identify the pictures in some scenes in the video, such as billboards and traffic signs, to determine the meaning of the scene. This application does not make specific limitations.
  • Appearance refers to splitting the video into strips according to different appearances.
  • the multiple short videos after stripping can be short videos with the same appearance.
  • the appearance here can refer to the same clothes, the same hats, etc. This application does not make specific limitations.
  • the strip feature can be divided into more feature types according to the user's business scenario, and examples are not given here.
  • Users can input the type of stripping features they need to use to split the video. For example, if the user wants all the clips of each actor in a film and television drama, then character can be input into the configuration interface as the stripping feature; or if the user wants to edit the dancing clips in a variety show video, then the user can input action as the stripping feature into the configuration interface.
  • each split feature can be further subdivided.
  • the split feature "action” can be further subdivided into “dancing”, “running”, “conflict”, etc., and “audio” can be further subdivided. Divided into “singing", “quarrel”, etc., still taking the above example as an example, the user needs the “dancing” clip in the video, then he can select “action” in the split feature, and then select the “action” category.
  • the “dancing” feature it should be understood that the above examples are for illustration and are not specifically limited in this application.
  • the configuration interface can display a variety of stripping features to the user for selection. If the user cannot determine the stripping feature or does not select one, that is, if the configuration interface does not obtain a stripping feature, the detection unit 240 can detect the video and obtain a stripping feature for it, where the stripping feature can be the most commonly used feature type for that video type, or the feature type previously input by the user for that video type. This application does not make specific limitations.
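  • As a rough illustration of the feature-based splitting described above, the sketch below groups labeled shots into clips per feature value. The shot data layout and the label names are assumptions for this sketch, not the system's actual representation.

```python
# Minimal sketch of feature-based stripping, assuming some recognition model
# has already labeled each shot; adjacent shots sharing the requested label
# value are merged into one clip. The data layout is an assumption.
def strip_by_feature(shots, feature):
    """shots: list of dicts like {"start": 0.0, "end": 4.2, "labels": {"scene": "classroom"}}.
    Returns a dict mapping each label value to a list of (start, end) clips."""
    clips = {}
    for shot in shots:
        value = shot["labels"].get(feature)
        if value is None:
            continue  # shot carries no label for this feature
        spans = clips.setdefault(value, [])
        if spans and spans[-1][1] == shot["start"]:
            spans[-1] = (spans[-1][0], shot["end"])  # extend contiguous clip
        else:
            spans.append((shot["start"], shot["end"]))
    return clips
```

  • For a documentary labeled by scene, this would yield one clip list per scene (classroom, dormitory, playground, ...), matching the documentary example above.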
  • the configuration parameters may also include a strip-splitting speed, which may be a speed value or a speed range.
  • the video processing system 200 may determine the strip-splitting speed according to the speed value or speed range input by the user.
  • the stripping speed can be a preset range, such as 0~1s, 1~5s, 5~10s, 10~15s, 15~20s, 20~30s, etc. For example, if the speed value input by the user is 3s, then the stripping speed can be 1~5s; if the speed range input by the user is 4~8s, then the stripping speed can be 5~10s.
  • the configuration interface can display the preset stripping speeds to the user for the user to choose. If the user does not select a stripping speed, the detection unit 240 can use the default stripping speed or the user's historical stripping speed to split the video, which is not specifically limited in this application.
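  • The mapping from a user-supplied speed value or range to one of the preset ranges above could be sketched as follows. The preset buckets mirror the examples in the text; using the midpoint of a user-supplied range is an assumption of this sketch, not a rule from the source.

```python
# Sketch of matching a user-supplied stripping speed to a preset range.
PRESET_RANGES = [(0, 1), (1, 5), (5, 10), (10, 15), (15, 20), (20, 30)]  # seconds

def match_speed_range(value=None, user_range=None):
    """Return the preset range covering a single value, or covering the
    midpoint of a user-supplied (low, high) range (midpoint rule assumed)."""
    if value is None:
        value = (user_range[0] + user_range[1]) / 2  # e.g. 4~8s -> 6s
    for low, high in PRESET_RANGES:
        if low <= value <= high:
            return (low, high)
    return PRESET_RANGES[-1]  # clamp values beyond 30s to the last range
```

  • With these buckets, an input of 3s maps to 1~5s and an input of 4~8s maps to 5~10s, as in the examples above.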
  • the configuration interface may include a parameter interface and a video interface.
  • the parameter interface obtains the configuration parameters
  • the video interface obtains the user's video.
  • the user can first configure the parameters through the parameter interface, and then upload multiple videos through the video interface.
  • For example, the user sets the video type to TV series, the stripping feature to actor A, and the stripping speed to 1~5 seconds, and then uploads 24 episodes of a certain TV series in sequence.
  • the video processing system 200 can split the 24 episodes in turn according to the configuration parameters and output a short video for each episode.
  • the content of each short video is a performance clip of actor A.
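  • The 24-episode example above could be expressed as one stripping request per episode sharing one set of configuration parameters. All field names and filenames below are illustrative assumptions rather than the system's actual interface.

```python
# Illustrative stripping requests for the 24-episode example above.
# Field names are assumptions for this sketch, not the system's real API.
ALLOWED_FEATURES = {"scene", "character", "audio", "subtitle",
                    "action", "ocr", "appearance"}

def make_config(video_type, feature, feature_value=None, speed=(1, 5)):
    if feature not in ALLOWED_FEATURES:
        raise ValueError(f"unknown stripping feature: {feature}")
    return {"video_type": video_type, "strip_feature": feature,
            "feature_value": feature_value, "strip_speed": speed}

# One request per episode, all sharing the same configuration parameters:
episodes = [f"series_ep{i:02d}.mp4" for i in range(1, 25)]  # hypothetical filenames
requests = [dict(make_config("tv_series", "character", "actor_A"), video=ep)
            for ep in episodes]
```

  • Validating the feature against a fixed set mirrors how the configuration interface only offers the stripping features listed above.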
  • the stripping unit 220 is used to output multiple short videos, wherein the multiple short videos are obtained by stripping the videos according to the above configuration parameters.
  • the strip splitting unit 220 can output multiple short videos to the user client 100, and can also output multiple short videos to the storage server 300, which is not specifically limited in this application.
  • the video processing system 200 may include multiple splitting models 230, and one splitting model corresponds to one video type.
  • the stripping unit 220 can determine the corresponding stripping model according to the video type obtained by the configuration interface, use the stripping model to strip the video, and output multiple short videos.
  • video types may include film and television dramas, variety shows, news, and documentaries.
  • the split model 1 corresponds to the film and television drama type
  • the split model 2 corresponds to the variety show type
  • the split model 3 corresponds to the news type
  • the split model 4 corresponds to the documentary type.
  • the above-mentioned bar splitting model can be obtained by training the machine learning model using sample sets of different video types.
  • a bar splitting model of the film and television drama video type is obtained after training the machine learning model with a sample set of film and television drama types.
  • similarly, a stripping model of the news video type is obtained after training the machine learning model with a sample set of news types, and by analogy, the stripping models of multiple video types are obtained.
  • model structures used by machine learning models of different video types can be the same or different, and the details can be determined according to the corresponding video types.
  • the machine learning model structures used for video types with similar application scenarios can be similar or identical, and the sample sets used during training can correspond to the respective video types, thereby reducing the workload of model construction and improving the efficiency of preparing the stripping models.
  • For example, news video stripping and conference video stripping usually do not focus on character movements and scene changes, but focus more on subtitles or audio.
  • therefore, the machine learning model structures used for these video types can focus on the extraction and recognition of speech and text features rather than image features.
  • accordingly, the model structures used by the machine learning models of these two video types can be the same or similar.
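  • The one-stripping-model-per-video-type correspondence described earlier (model 1 for film and TV dramas, model 2 for variety shows, model 3 for news, model 4 for documentaries) can be sketched as a simple registry keyed by video type. The string values are placeholders standing in for the separately trained networks.

```python
# Sketch of the one-splitting-model-per-video-type correspondence described
# above; the string values are placeholders for separately trained models.
MODEL_REGISTRY = {
    "film_tv": "strip_model_1",      # trained on a film/TV-drama sample set
    "variety": "strip_model_2",      # trained on a variety-show sample set
    "news": "strip_model_3",         # trained on a news sample set
    "documentary": "strip_model_4",  # trained on a documentary sample set
}

def model_for_type(video_type):
    """Look up the stripping model trained for the given video type."""
    if video_type not in MODEL_REGISTRY:
        raise KeyError(f"no splitting model registered for type: {video_type}")
    return MODEL_REGISTRY[video_type]
```

  • A registry like this also makes it cheap to add new video types later: train a model on the new type's sample set and register it under the new key.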
  • the above-mentioned stripping model can split the video according to different stripping features. For example, assuming that the video type selected by the user is "variety show", the corresponding stripping model is a variety show stripping model. If the stripping feature selected by the user is "character" and the uploaded video is an episode of a talent show variety show, then the stripping model can split the video according to character features, and the short videos obtained can be all the performance clips of contestant A in the variety show. If the stripping feature selected by the user is "action", then the stripping model can split the video according to action features.
  • the short videos obtained after splitting can be video clips in which the action appears in the variety show. For example, if the action set by the user is "dance", then the short videos obtained after splitting can be a collection of performances by dancers in the variety show. It should be understood that the above examples are for illustration and are not specifically limited in this application.
  • the above sample set includes sample input data and sample output data, where the sample input data includes known videos and known stripping features, and the sample output data includes multiple short videos obtained by splitting the known videos using the known stripping features.
  • the trained splitting model can split the videos according to the splitting features input by the user.
  • the above machine learning models may include but are not limited to convolutional neural network (CNN) models, long short-term memory (LSTM) network models, one-stage you only look once (YOLO) unified real-time object detection models, single shot multibox detector (SSD) models, region convolutional neural network (RCNN) models, fast region convolutional neural network (Fast-RCNN) models, etc., which are not specifically limited in this application.
  • the stripping model under each video type may include multiple speed stripping models, where one speed stripping model corresponds to one stripping speed. The stripping unit 220 may determine the corresponding stripping model according to the video type obtained by the configuration interface, then determine the speed stripping model corresponding to the stripping speed obtained by the configuration interface, and then use that speed stripping model to split the video to obtain multiple short videos.
  • the structures of the multiple speed stripping models under each video type can be the same or different, and the details can be determined according to the actual processing situation, which are not specifically limited in this application.
  • FIG. 2 is an example diagram of a stripping model stored in the video processing system provided by this application.
  • the multiple stripping models 230 in the video processing system 200 shown in FIG. 1 can be as shown in FIG. 2: stripping model 11, stripping model 12, stripping model 21 and stripping model 22, where the video type of stripping model 11 and stripping model 12 is type 1, the video type of stripping model 21 and stripping model 22 is type 2, the stripping speed of stripping models 11 and 21 is speed 1, and the stripping speed of stripping models 12 and 22 is speed 2.
  • each splitting model can correspond to different video types and video speeds, and the corresponding splitting model can be selected according to the configuration parameters input by the user for video splitting.
  • the input data of each splitting model includes the video to be split and splitting features.
  • the output data is multiple short videos obtained after splitting the video using the stripping feature. For example, after stripping feature 1 and the video are input into stripping model 11, multiple short videos of stripping feature 1 are obtained; after stripping feature 2 and the video are input into stripping model 11, multiple short videos of stripping feature 2 are obtained, and so on, which will not be detailed here. For example, if the video type selected by the user through the configuration interface is video type 1, the stripping feature is stripping feature 2, and the stripping speed is speed 2, then the video processing system can, based on the configuration parameters input by the user, select stripping model 12 in Figure 2.
  • by inputting stripping feature 2 and the video into stripping model 12, multiple short videos of stripping feature 2 can be obtained. In this way, the multiple short videos finally output are obtained by splitting the video based on the video type, the stripping feature and the stripping speed required by the user, which meets the diverse needs of users to the greatest extent and improves the user experience.
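  • The Figure 2 lookup can be sketched as a table keyed by (video type, stripping speed), with the stripping feature and the video as the selected model's inputs. The model names are placeholders for the trained models in the figure, and the routing dictionary returned here stands in for an actual model invocation.

```python
# Sketch of the Figure 2 lookup: models keyed by (video type, stripping speed),
# with the stripping feature and the video as the chosen model's inputs.
MODELS = {
    ("type1", "speed1"): "strip_model_11",
    ("type1", "speed2"): "strip_model_12",
    ("type2", "speed1"): "strip_model_21",
    ("type2", "speed2"): "strip_model_22",
}

def run_stripping(video, video_type, feature, speed):
    """Route a request to the model for (video_type, speed), then feed it
    the video and stripping feature."""
    model = MODELS[(video_type, speed)]
    # A real model would return short videos; this sketch returns the routing.
    return {"model": model, "video": video, "feature": feature}
```

  • For the example in the text (video type 1, stripping feature 2, speed 2), the request is routed to stripping model 12.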
  • FIG. 2 is used for illustration.
  • the stripping model 230 in the video processing system 200 may also include more or less video types, video speeds, and stripping characteristics, which are not specifically limited in this application.
  • the video processing system obtains the configuration parameters input by the user through the configuration interface.
  • the configuration parameters at least include the video type of the video.
  • a pre-trained stripping model is selected to split the video input by the user into strips, and multiple short videos are output after splitting.
  • the stripping model has a corresponding relationship with the video type input by the user, so that the above multiple short videos can meet the diverse needs of users: whatever scenario the user needs to split videos in, the user only needs to input the corresponding configuration parameters, thereby achieving a video stripping model that is universal across multiple scenarios, meets user needs, and improves the user experience.
  • Figure 3 is a schematic flow chart of the steps of a video processing method provided by this application. The method can be applied to the video processing system 200 shown in Figure 1. As shown in Figure 3, the method can include the following steps:
  • Step S310: The video processing system 200 obtains configuration parameters and a video from the user through the configuration interface, where the video can be a long video that the user needs to split, such as a TV series, a variety show, an interview video, a documentary, etc.
  • the video processing system 200 can be deployed on a server or a public cloud.
  • the server can be one of a physical server, a virtual machine, a container, and an edge computing device.
  • for specific deployment methods, refer to the description of the video processing system 200 in the embodiment of Figure 1, which will not be repeated here.
  • the above configuration interface may be an application page, web page or API for the user to interact with the video processing system 200.
  • the video processing system 200 may display the application page or web page on the screen of the client 100, or provide API interface parameters to the user, and the user can use the API interface parameters to integrate the video processing system 200 into a third-party system for secondary development.
  • for details, refer to the description of the configuration interface in the embodiment of Figure 1, which will not be repeated here.
  • the configuration parameter may include a video type
  • the video type may include one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animations, and conferences. It should be understood that video types can be divided into more categories according to the user's business scenarios, and examples are not given here one by one.
  • the user can input the type of the video that needs to be split into strips. For example, if the video the user needs to split is a film or TV drama, then the video type can be selected as film and television drama.
  • the video type includes an unknown type, where the unknown type may refer to a video type that the user cannot determine, or the unknown type may refer to the user not inputting the video type, that is, the configuration interface does not obtain the video type.
  • the video processing system 200 may perform type detection on the video to obtain the detected type of the video.
  • the video processing system 200 can split the video into strips according to the configuration parameters and detection type, and output multiple short videos.
  • the configuration parameters may also include stripping features, which may include one or more of scenes, characters, audio, subtitles, actions, OCR, and appearance.
  • the stripping feature can be divided into more feature types according to the user's business scenario, and examples are not given here one by one. Users can input the type of stripping features they need to use to split the video. For example, if the user wants all the clips of each actor in a film and television drama, then character can be input into the configuration interface as the stripping feature; or if the user wants to edit the dancing clips in a variety show video, then the user can input action as the stripping feature into the configuration interface.
  • each stripping feature can be further subdivided.
  • for example, the stripping feature "action" can be further subdivided into "dancing", "running", "conflict", etc., and "audio" can be further subdivided into "singing", "quarrel", etc. Still taking the above example, if the user needs the "dancing" clips in the video, the user can select "action" in the stripping feature, and then select the "dancing" feature under that category.
  • the configuration interface can display a variety of stripping features to the user for selection. If the user cannot determine the stripping feature or does not select one, that is, if the configuration interface does not obtain a stripping feature, the video processing system 200 can detect the video and obtain a stripping feature for it, where the stripping feature can be the most commonly used feature type for that video type, or the feature type previously input by the user for that video type, and then split the video according to that stripping feature, which is not specifically limited in this application.
  • the configuration parameters may also include a strip-splitting speed, which may be a speed value or a speed range.
  • the video processing system 200 may determine the strip-splitting speed according to the speed value or speed range input by the user.
  • the stripping speed can be a preset range, such as 0~1s, 1~5s, 5~10s, 10~15s, 15~20s, 20~30s, etc. For example, if the speed value input by the user is 3s, then the stripping speed can be 1~5s; if the speed range input by the user is 4~8s, then the stripping speed can be 5~10s.
  • the configuration interface can display the preset stripping speeds to the user for the user to choose. If the user does not select a stripping speed, the video processing system 200 can use the default stripping speed or the user's historical stripping speed to split the video, which is not specifically limited in this application.
  • Step S320 The video processing system 200 outputs multiple short videos, wherein the multiple short videos are obtained by splitting the videos according to the configuration parameters.
  • the video processing system 200 can output the multiple short videos to the client 100, or output the multiple short videos to the storage server 300, which is not specifically limited in this application.
  • for specific descriptions of the client 100 and the storage server 300, reference can be made to the embodiment of FIG. 1, and the details will not be repeated here.
  • the video processing system 200 may include multiple splitting models, and one splitting model corresponds to one video type.
  • the stripping unit 220 can determine the corresponding stripping model according to the video type obtained by the configuration interface, use the stripping model to strip the video, and output multiple short videos.
  • the above-mentioned bar splitting model can be obtained by training the machine learning model using sample sets of different video types.
  • a bar splitting model of the film and television drama video type is obtained after training the machine learning model with a sample set of film and television drama types.
  • similarly, a stripping model of the news video type is obtained after training the machine learning model with a sample set of news types, and by analogy, the stripping models of multiple video types are obtained.
  • model structures used by machine learning models of different video types can be the same or different, and the details can be determined according to the corresponding video types.
  • the machine learning model structures used for video types with similar application scenarios can be similar or identical, and the sample sets used during training can correspond to the respective video types, thereby reducing the workload of model construction and improving the efficiency of preparing the stripping models.
  • the video processing system 200 can use the stripping model corresponding to the detected type to split the video into strips and output multiple short videos; if the video processing system 200 fails to perform type detection on the video and does not obtain the detected type of the video, or the type detection succeeds but the confidence of the detected type is very low, then a general stripping model can be used to split the video into strips.
  • the above universal splitting model can be a splitting model common to multiple video types.
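  • The fallback logic described above might look like the sketch below: use the model for the user's type when available, otherwise trust the detector only above some confidence, and fall back to the general model shared across video types. The threshold value and all names are assumptions, since the source gives no concrete values.

```python
# Sketch of the model-selection fallback described above; the threshold is an
# illustrative assumption, not a value from the source.
CONFIDENCE_THRESHOLD = 0.5

def choose_model(user_type, detected=None, registry=None, general="general_model"):
    """user_type: type from the configuration interface ("unknown" if absent).
    detected: optional (type, confidence) pair from the type detector."""
    registry = registry or {}
    if user_type != "unknown" and user_type in registry:
        return registry[user_type]          # user supplied a known type
    if detected is not None:
        dtype, confidence = detected
        if confidence >= CONFIDENCE_THRESHOLD and dtype in registry:
            return registry[dtype]          # trust a confident detection
    return general                          # detection failed or low confidence
```

  • In this sketch the general model is simply the last resort; the source leaves open how such a model common to multiple video types would be trained.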
  • the above-mentioned stripping model can split the video according to different stripping features. For example, assuming that the video type selected by the user is "variety show", the corresponding stripping model is a variety show stripping model. If the stripping feature selected by the user is "character" and the uploaded video is an episode of a talent show variety show, then the stripping model can split the video according to character features, and the short videos obtained can be all the performance clips of contestant A in the variety show.
  • the above sample set includes sample input data and sample output data, where the sample input data includes known videos and known stripping features, and the sample output data includes multiple short videos obtained by splitting the known videos using the known stripping features.
  • the trained splitting model can split the videos according to the splitting features input by the user.
  • the above machine learning models may include but are not limited to CNN, LSTM, Yolo model, SSD model, RCNN model or Fast-RCNN model, etc., which are not specifically limited in this application.
  • the stripping model under each video type may include multiple speed stripping models, where one speed stripping model corresponds to one stripping speed. The stripping unit 220 may determine the corresponding stripping model according to the video type obtained by the configuration interface, then determine the speed stripping model corresponding to the stripping speed obtained by the configuration interface, and then use that speed stripping model to split the video to obtain multiple short videos.
  • the structures of the multiple speed stripping models under each video type can be the same or different, and the details can be determined according to actual processing conditions, which are not specifically limited in this application.
  • FIG. 2 is an example diagram of a stripping model stored in the video processing system provided by this application.
  • the video processing system 200 shown in FIG. 1 includes multiple stripping models 230, as shown in FIG. 2: stripping model 11, stripping model 12, stripping model 21 and stripping model 22.
  • the video type of stripping model 11 and stripping model 12 is type 1.
  • the video type of stripping model 21 and stripping model 22 is type 2.
  • the stripping speed of stripping models 11 and 21 is speed 1, and the stripping speed of stripping models 12 and 22 is speed 2.
  • each splitting model can correspond to different video types and video speeds, and the corresponding splitting model can be selected according to the configuration parameters input by the user for video splitting.
  • the input data of each splitting model includes the video to be split and the splitting features.
  • the output data is multiple short videos obtained after splitting the video using the stripping feature. For example, after stripping feature 1 and the video are input into stripping model 11, multiple short videos of stripping feature 1 are obtained; after stripping feature 2 and the video are input into stripping model 11, multiple short videos of stripping feature 2 are obtained, and so on, which will not be detailed here. For example, if the video type selected by the user through the configuration interface is video type 1, the stripping feature is stripping feature 2, and the stripping speed is speed 2, then the video processing system can, based on the configuration parameters input by the user, select stripping model 12 in Figure 2.
  • the stripping feature 2 and the video into the stripping model 12 By inputting the stripping feature 2 and the video into the stripping model 12, multiple short videos of the stripping feature 2 can be obtained. In this way, the multiple short videos finally output are obtained by splitting the videos based on the video type and the splitting characteristics and splitting speed required by the user. This meets the diverse needs of the users to the greatest extent and improves the user experience.
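The mapping from (video type, splitting speed) to a splitting model in FIG. 2 can be sketched as a lookup table. All names below (`make_model`, `MODEL_TABLE`, `split_video`) are hypothetical and only illustrate the selection logic described above, not an actual implementation:

```python
# Hypothetical sketch of the splitting-model selection of FIG. 2.
# Each (video_type, splitting_speed) pair maps to one splitting model;
# the splitting feature is passed to the model as input data.

def make_model(name):
    """Stand-in for a trained splitting model: returns a callable that
    tags each produced short video with the model and feature used."""
    def split(video, feature):
        # A real model would cut `video` into clips matching `feature`;
        # here we just return placeholder clip descriptions.
        return [f"{name}:{feature}:clip{i}" for i in range(2)]
    return split

# FIG. 2 layout: models 11/12 belong to type 1, models 21/22 to type 2;
# models 11/21 use speed 1, models 12/22 use speed 2.
MODEL_TABLE = {
    ("type1", "speed1"): make_model("model11"),
    ("type1", "speed2"): make_model("model12"),
    ("type2", "speed1"): make_model("model21"),
    ("type2", "speed2"): make_model("model22"),
}

def split_video(video, video_type, feature, speed):
    model = MODEL_TABLE[(video_type, speed)]
    return model(video, feature)

# The example above picks video type 1, splitting feature 2, speed 2,
# which selects splitting model 12.
clips = split_video("long_video.mp4", "type1", "feature2", "speed2")
print(clips)
```

The splitting feature deliberately does not participate in model selection here, matching the text: it is input data to the chosen model, while video type and splitting speed determine which model is chosen.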
  • It should be understood that FIG. 2 is for illustration only.
  • The splitting models 230 in the video processing system 200 may also include more or fewer video types, splitting speeds and splitting features, which are not specifically limited in this application.
  • the video processing system obtains the configuration parameters input by the user through the configuration interface.
  • the configuration parameters at least include the video type of the video.
  • A pre-trained splitting model is selected to split the video input by the user, and multiple short videos are output after splitting.
  • The splitting model corresponds to the video type input by the user, so that the above multiple short videos can meet the diverse needs of users: whatever scenario the user needs splitting for, the user enters the corresponding configuration parameters. This achieves a video splitting model that is universal across multiple scenarios and meets user needs, improving the user experience.
  • The process described in the above steps S310 to S320 is illustrated below with reference to the specific application scenarios shown in FIGS. 4 to 5.
  • FIG. 4 illustrates an example diagram of a configuration interface.
  • the configuration interface is a console in the form of a web page or an application program.
  • the console can be a console of a public cloud platform. It should be understood that Figure 4 is used for illustration.
  • the console can also be the console of a non-public cloud platform, and the configuration interface can also be in the form of an API, which is not specifically limited in this application.
  • the web page or application program interface of the configuration interface at least includes a video type selection area 410 , a splitting feature selection area 420 , a splitting speed selection area 430 , an upload video area 440 and a control area 450 .
  • the video type selection area 410 is used for the user to select the video type of the video.
  • The video type selection area 410 in FIG. 4 shows the user video types such as "movies and TV series", "news", "variety shows" and "unknown type". It should be understood that the configuration interface can also display more video types to the user; for example, dragging down the progress bar in the video type selection area 410 in FIG. 4 can display more video types.
  • The video type selection area 410 can set the video type to the "unknown type" option by default. Alternatively, if the user cannot determine the video type, the user can select the "unknown type" option in the video type selection area 410, and the video processing system 200 can perform video type detection on the video uploaded by the user and split the video according to the detected type and the configuration parameters input by the user (such as the splitting feature and splitting speed).
  • The splitting feature selection area 420 is used for the user to select the splitting feature of the video.
  • The splitting feature selection area 420 in FIG. 4 shows the user splitting features such as "characters", "scenes", "subtitles" and "others".
  • It should be understood that the configuration interface can also display more splitting features to the user; for example, dragging down the progress bar in the splitting feature selection area 420 in FIG. 4 can display more types of splitting features.
  • Each splitting feature can be further subdivided.
  • the strip feature "character” can be further divided into “actors”, “actresses”, “upload photos of actors”, etc.
  • the multiple short videos obtained after splitting the video can be video clips that include male actors in the video.
  • the video will be splitted.
  • the multiple short videos obtained in the end can be video clips including actresses in the video.
  • upload photos of actors is selected as the splitting feature, the user can upload a screenshot of an actor in the video, and multiple videos can be obtained after splitting the video.
  • a short video can be a video clip including the actor.
  • The splitting feature selection area 420 can set the splitting feature to the "others" option by default, and the video processing system 200 can detect the video uploaded by the user to obtain the splitting feature of the video, where the splitting feature can be the most commonly used feature type under the video type, or a feature type historically input by the user; this is not specifically limited in this application.
  • The splitting speed selection area 430 is used for the user to select the splitting speed of the video.
  • The splitting speed selection area 430 in FIG. 4 shows the user splitting speeds such as "0-1 second", "2-5 seconds", "6-10 seconds", "11-15 seconds" and "others".
  • It should be understood that the configuration interface can also show the user more splitting speeds; for example, dragging down the progress bar in the splitting speed selection area 430 in FIG. 4 can display more splitting speeds.
  • The splitting speed selection area 430 can set the splitting speed to the "others" option by default, and the video processing system 200 can detect the video uploaded by the user to obtain the splitting speed of the video, where the splitting speed can be the most commonly used splitting speed under the video type and splitting feature, or a splitting speed historically input by the user; this is not specifically limited.
  • the upload video area 440 is used for users to upload videos, which are long videos to be split.
  • The video may also be a video uploaded by the user to the object storage service (OBS), and the video processing system 200 can download the video from the OBS bucket bound by the user to perform video splitting; this application does not specifically limit this.
  • The control area 450 includes a "Save Configuration" control and a "Start Splitting" control, where the "Save Configuration" control is used to save the parameter configuration input by the user in the video type selection area 410, the splitting feature selection area 420 and the splitting speed selection area 430, and the "Start Splitting" control is used to start splitting the video using the above parameter configuration in response to the user's operation.
  • the video type is "movie and TV drama”
  • the teardown feature is "upload actor photos” (assuming that the uploaded actor photo is actor A)
  • the teardown feature The speed is "2 to 5 seconds”.
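The parameters collected by areas 410-450 of the configuration interface can be represented as a simple parameter object. The class and field names below are illustrative assumptions, not part of this application; `None` stands for "not selected", in which case the system falls back to detection or defaults as described later:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SplitConfig:
    """Hypothetical container for the parameters collected by the
    configuration interface of FIG. 4. None means 'not selected'."""
    video_type: Optional[str] = None     # area 410, e.g. "movies and TV dramas"
    feature: Optional[str] = None        # area 420, e.g. "upload photos of actors"
    feature_photo: Optional[str] = None  # uploaded actor screenshot, if any
    speed: Optional[str] = None          # area 430, e.g. "2-5 seconds"
    video_path: Optional[str] = None     # area 440, the long video to split

# The example configuration described above:
config = SplitConfig(
    video_type="movies and TV dramas",
    feature="upload photos of actors",
    feature_photo="actor_A.png",
    speed="2-5 seconds",
    video_path="drama_episode.mp4",
)
print(config.video_type, config.speed)
```

The "Save Configuration" control would persist such an object, and "Start Splitting" would pass it to the splitting unit.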
  • FIG. 5 is a schematic flowchart of steps of a video processing method in an application scenario provided by this application.
  • the application scenario may be the application scenario shown in FIG. 4 .
  • the method may include the following steps:
  • Step 1: Input the video, where the video is a long video to be split that is input by the user.
  • Step 2: Determine whether the user has selected the video type. If yes, determine the splitting model corresponding to the video type selected by the user. In the application scenario shown in FIG. 4, the video type selected by the user is "movies and TV dramas", so after the video type is determined in step 2 of the flowchart shown in FIG. 5, step 3 is performed.
  • Step 3: Obtain the movie-and-TV-drama splitting model.
  • It should be understood that the video processing system 200 includes multiple splitting models, wherein one splitting model corresponds to one video type.
  • The video type selected by the user is "movies and TV dramas", so step 3 obtains the splitting model of the movie-and-TV-drama type.
  • Step 4: Determine whether the user has selected the splitting feature. If yes, perform step 5; if no, perform step 9 and then step 5. It should be understood that in the application scenario shown in FIG. 4, the splitting feature selected by the user is "character", so the splitting feature determined in step 4 of the flowchart shown in FIG. 5 is "character".
  • Step 5: Determine whether the user has selected the splitting speed. If yes, perform step 6; if no, perform step 10 and then step 6. It should be understood that the splitting speed selected by the user in the application scenario shown in FIG. 4 is "2-5 seconds", so the splitting speed determined in step 5 of the flowchart shown in FIG. 5 is "2-5 seconds".
  • Step 6: Select the corresponding splitting model to split the video.
  • It should be understood that the splitting model selected in step 6 corresponds to the configuration parameters input by the user, where the configuration parameters include "movies and TV dramas" (video type), "character" (splitting feature) and "2-5 seconds" (splitting speed).
  • The video processing system 200 includes a plurality of splitting models, each of which corresponds to one video type and one splitting speed. For example, the splitting models under the movie-and-TV-drama type may include splitting models corresponding to the "0-1 second" and "2-5 seconds" splitting speeds; based on the "2-5 seconds" splitting speed selected by the user, the movie-and-TV-drama splitting model corresponding to that speed can be chosen and used to split the video.
  • When training the splitting model, the sample set used includes sample input data and sample output data, where the sample input data includes known videos and known splitting features.
  • the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features.
  • In this way, the trained splitting model can split videos according to the splitting feature input by the user.
  • For example, if the splitting feature input by the user is "character", multiple short videos containing that character can be output; if the user selects an image of actor A that the user uploaded, the multiple short videos output can include multiple short videos of actor A.
  • Step 7: Detect the video type. It should be understood that if the user does not select a video type in step 2, or selects the unknown type, step 7 can be performed to detect the video type. If the detection succeeds, steps 3 and 4 are performed to determine the splitting model corresponding to the detected type; if the detection fails, steps 8 and 4 are performed.
  • Step 8: Obtain a general model. It is understandable that the categories of some videos are not clear-cut, and it may not be possible to detect the video type of such a video. In this case, a general splitting model can be used for the video.
  • Step 9: The system selects the splitting feature. It should be understood that if the user does not select a splitting feature in step 4, step 9 can be performed: the system selects a splitting feature commonly used under this video type, or a splitting feature historically selected by the user, etc., and then step 5 is performed.
  • Step 10: The system selects the splitting speed. It should be understood that if the user does not select a splitting speed in step 5, step 10 can be performed: the system selects a splitting speed commonly used under the video type and splitting feature, or a splitting speed historically selected by the user, etc., and then step 6 is performed.
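The decision flow of steps 1-10 above (fall back to detection or system defaults whenever the user leaves a parameter unselected) can be sketched as follows. The detection function and the default-lookup tables are placeholders for the model-based components described in the text, not real APIs:

```python
def detect_video_type(video):
    # Placeholder for the video-type detector of step 7;
    # returns None when detection fails, in which case step 8
    # (the general model) applies.
    return None

COMMON_FEATURE = {"movies and TV dramas": "character"}
COMMON_SPEED = {("movies and TV dramas", "character"): "2-5 seconds"}

def choose_splitting_model(video, video_type=None, feature=None, speed=None):
    """Hypothetical sketch of steps 2-10 of FIG. 5, returning the
    resolved (video_type, feature, speed) used to pick the model."""
    # Steps 2/7/8: resolve the video type, else fall back to a general model.
    if video_type is None or video_type == "unknown":
        video_type = detect_video_type(video) or "general"
    # Step 9: the system picks a feature commonly used under this type.
    if feature is None:
        feature = COMMON_FEATURE.get(video_type, "others")
    # Step 10: the system picks a speed commonly used under (type, feature).
    if speed is None:
        speed = COMMON_SPEED.get((video_type, feature), "others")
    # Step 6: the model corresponding to (type, speed) is then selected,
    # with the feature supplied to it as input data.
    return video_type, feature, speed

# User selected everything, as in the FIG. 4 scenario:
print(choose_splitting_model("v.mp4", "movies and TV dramas", "character", "2-5 seconds"))
# User selected only the type; steps 9 and 10 fill in the rest:
print(choose_splitting_model("v.mp4", "movies and TV dramas"))
```

Note how every branch still produces a complete parameter triple, so step 6 can always select a model even when the user configured nothing.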
  • In summary, the video processing method provided by this application obtains the configuration parameters input by the user through the configuration interface, where the configuration parameters at least include the video type of the video.
  • A pre-trained splitting model is selected to split the video input by the user, and multiple short videos are output after splitting.
  • The splitting model corresponds to the video type input by the user, thereby realizing a video splitting model that is universal across multiple scenarios and meets user needs, improving the user experience.
  • FIG. 6 is a schematic structural diagram of a computing device provided by this application.
  • the video processing system 200 described in the embodiments of FIGS. 1 to 5 can be deployed on the computing device 600 shown in FIG. 6 .
  • The computing device 600 includes a processor 601, a storage unit 602, a storage medium 603 and a communication interface 604, wherein the processor 601, the storage unit 602, the storage medium 603 and the communication interface 604 communicate through the bus 605, or communicate through other means such as wireless transmission.
  • The computing device can be a BMS, a virtual machine or a container.
  • A BMS refers to a general physical server, such as an ARM server or an X86 server; a virtual machine refers to a complete computer system that is implemented through network functions virtualization (NFV) technology, simulated by software with complete hardware system functions, and runs in a completely isolated environment.
  • A container refers to a group of processes that are subject to resource constraints and isolated from each other.
  • the computing device can also be an edge computing device, a storage server or a storage array, which is not specifically limited in this application.
  • the processor 601 is composed of at least one general-purpose processor, such as a CPU, an NPU, or a combination of a CPU and a hardware chip.
  • the above-mentioned hardware chip is an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), a programmable logic device (Programmable Logic Device, PLD), or a combination thereof.
  • the above-mentioned PLD is a complex programmable logic device (Complex Programmable Logic Device, CPLD), a field-programmable gate array (Field-Programmable Gate Array, FPGA), a general array logic (Generic Array Logic, GAL) or any combination thereof.
  • the processor 601 executes various types of digital storage instructions, such as software or firmware programs stored in the storage unit 602, which enables the computing device 600 to provide a wide variety of services.
  • the processor 601 includes one or more CPUs, such as CPU0 and CPU1 shown in FIG. 6 .
  • the computing device 600 also includes multiple processors, such as the processor 601 and the processor 606 shown in FIG. 6 .
  • processors can be a single-core processor (single-CPU) or a multi-core processor (multi-CPU).
  • a processor here refers to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
  • the storage unit 602 is used to store program codes, and is controlled and executed by the processor 601 to perform the processing steps of the video processing system 200 in any of the above embodiments in FIGS. 1 to 6 .
  • the program code includes one or more software units.
  • The one or more software units mentioned above can be the acquisition unit and the splitting unit in the embodiment of FIG. 1.
  • The acquisition unit is used to provide a configuration interface to the user, and the splitting unit is used to split the video according to the configuration parameters. For specific implementation methods, refer to the embodiments in FIGS. 1 to 5, which will not be described again here.
  • Storage unit 602 may include read-only memory and random access memory, and provides instructions and data to processor 601. Storage unit 602 may also include non-volatile random access memory. Storage unit 602 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM) or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache.
  • By way of example rather than limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM) and direct rambus random access memory (DR RAM).
  • Storage medium 603 is a carrier for storing data, such as a hard disk, a USB flash drive (universal serial bus, USB), a flash memory (flash), an SD card (secure digital memory card) or a memory stick.
  • The hard disk can be a hard disk drive (HDD), a solid state disk (SSD), a mechanical hard disk, etc., which is not specifically limited in this application.
  • The communication interface 604 can be a wired interface (such as an Ethernet interface), an internal interface (such as a peripheral component interconnect express (PCIe) bus interface), or a wireless interface (such as a cellular network interface or a wireless LAN interface) for communicating with other servers or units.
  • Bus 605 can be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), etc.
  • the bus 605 is divided into an address bus, a data bus, a control bus, etc.
  • bus 605 also includes a power bus, a control bus, a status signal bus, etc.
  • For clarity, however, the various buses are labeled as bus 605 in the figure.
  • FIG. 6 is only a possible implementation manner of the embodiment of the present application.
  • the computing device 600 may also include more or fewer components, which is not limited here.
  • For contents not shown or described in the embodiments of the present application, please refer to the relevant explanations in the embodiments of FIGS. 1 to 5, which will not be described again here.
  • Embodiments of the present application provide a computer storage medium in which instructions are stored; when the instructions are run on a computing device, the computing device is caused to execute the video processing method described in the embodiments of FIGS. 1 to 5 .
  • Embodiments of the present application provide a computer program product containing programs or instructions.
  • When the programs or instructions are run on a computing device, the computing device is caused to execute the video processing method described in the embodiments of FIGS. 1 to 5.
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • the above-described embodiments are implemented in whole or in part in the form of a computer program product.
  • a computer program product includes at least one computer instruction.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center via a wired connection (such as coaxial cable, optical fiber cable, or digital subscriber line (DSL)) or a wireless connection (such as infrared, radio or microwave).
  • The computer-readable storage medium may be any available medium accessible by a computer, or a data storage node, such as a server or data center, that contains at least one available medium.
  • The available medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a high-density digital video disc (DVD)), or a semiconductor medium (for example, an SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present application provides a video processing method and system and a related device. The method comprises the following steps: a video processing system obtains configuration parameters and a video from a user via a configuration interface, the configuration parameters comprising the video type of the video; the video processing system then outputs a plurality of short videos, wherein the plurality of short videos are obtained by splitting the video according to the configuration parameters, so that the plurality of short videos satisfy the various requirements of the user. The types of configuration parameters input by the user are determined by the type of scenario in which splitting is needed, thereby achieving a video splitting model which can be used universally across a plurality of scenarios and which satisfies user requirements, thus enhancing the user experience.

Description

A video processing method, system and related device
This application claims priority to the Chinese patent application filed with the China Patent Office on April 13, 2022, with application number 202210384788.5 and titled "A video processing method, system and related device", the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the field of computers, and in particular to a video processing method, system and related device.
Background
With the rapid growth of short video content, video viewers' patience for watching long videos is gradually decreasing. In order to provide video viewers with more exciting video highlights and to improve the utilization of users' fragmented time, video splitting technology came into being. Video splitting is secondary processing of original video content: the original video content is split into several video clips as needed, so that users can watch the video clips they are interested in on demand. Video splitting technology can deeply mine valuable information points in long videos and help users understand videos better and faster.
However, video splitting technology is usually limited to a specific scenario. For example, video splitting for news videos is usually implemented based on features such as news titles and news shot changes, while video splitting for film and television dramas is usually implemented based on their subtitles. Therefore, current video splitting technology has a single application scenario and low flexibility, which reduces the user experience.
Summary of the Invention
The present application provides a video processing method, system and related device, which are used to solve the problems that video splitting technology has a single application scenario, low flexibility and a poor user experience.
In a first aspect, a video processing method is provided. The method includes the following steps: a video processing system obtains configuration parameters and a video from a user through a configuration interface, where the configuration parameters include the video type of the video; the video processing system outputs multiple short videos, where the multiple short videos are obtained by splitting the video according to the configuration parameters.
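The two steps of the first aspect (obtain configuration parameters and a video through the configuration interface, then output the short videos produced by splitting) can be sketched as a minimal interface. All class, method and file names here are hypothetical stand-ins, not the actual implementation:

```python
class VideoProcessingSystem:
    """Hypothetical sketch of the first-aspect method: configuration
    parameters and a video in, multiple short videos out."""

    def __init__(self):
        # One splitting model per video type, as described in the text;
        # both entries share a stand-in splitter for illustration.
        self.models = {
            "news": self._split,
            "film and television dramas": self._split,
        }

    @staticmethod
    def _split(video, params):
        # Stand-in splitter: a real model would cut `video` into clips
        # matching the configured type, feature and speed.
        return [f"{video}#clip{i}" for i in range(3)]

    def process(self, video, params):
        # params must at least carry the video type (first aspect);
        # unknown types fall through to a general stand-in model.
        model = self.models.get(params["video_type"], self._split)
        return model(video, params)

system = VideoProcessingSystem()
shorts = system.process("long.mp4", {"video_type": "news"})
print(shorts)
```

The configuration interface itself (web page, application page or API) would simply populate `params` before calling `process`.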
In a specific implementation, the video processing system can be deployed on a computing device, which can be a bare metal server (BMS), a virtual machine or a container. A BMS refers to a general physical server, for example, an ARM server or an X86 server; a virtual machine refers to a complete computer system that is implemented through network functions virtualization (NFV) technology, simulated by software with complete hardware system functions, and runs in a completely isolated environment; a container refers to a group of processes that are subject to resource constraints and isolated from each other. The computing device can also be an edge computing device, a storage server or a storage array, which is not specifically limited in this application.
The above configuration interface can be an application page, a web page or an application programming interface (API) through which the user interacts with the video processing system. The video processing system can display the application page or web page on the client's screen, or provide API parameters to the user, and the user can use the API parameters to integrate the video processing system 200 into a third-party system for secondary development.
The above video types may include one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animation and conferences. It should be understood that video types can be divided into more categories according to the user's business scenarios, which are not enumerated here one by one. The user can input whatever video type the user needs to split; for example, if the user needs to split a film or television drama video, the video type can be selected as film and television drama.
By implementing the method described in the first aspect, the configuration parameters input by the user are obtained through the configuration interface, where the configuration parameters at least include the video type of the video; a pre-trained splitting model is selected according to the video type to split the video input by the user, and multiple short videos are output after splitting. The splitting model corresponds to the video type input by the user, so that the above multiple short videos can meet the diverse needs of the user: whatever scenario the user needs splitting for, the user inputs the corresponding configuration parameters. This achieves a video splitting model that is universal across multiple scenarios and meets user needs, improving the user experience.
In a possible implementation, the configuration parameters also include a splitting feature, and the splitting feature includes one or more of scene, character, audio, subtitle, action, optical character recognition (OCR) and appearance.
Scene means splitting the video according to different scenes, and the multiple short videos after splitting can be short videos of the same scene. For example, for a documentary about a school with the splitting feature set to scene, among the multiple short videos after splitting, short video 1 may be a short video whose scene is a classroom, short video 2 a short video whose scene is a dormitory, short video 3 a short video whose scene is a playground, and so on; this application does not make specific limitations.
Character means splitting the video according to different characters, and the multiple short videos after splitting can be short videos of the same character. For example, for a talent show with the splitting feature set to character, short video 1 may be performance clips of contestant A, short video 2 performance clips of contestant B, and so on; this application does not make specific limitations.
Subtitle means splitting the video according to subtitles, which requires combining text recognition technology and determining the split content based on the semantics of the subtitles. For example, for a news video with the splitting feature set to subtitle, short video 1 may be a clip of news item 1, short video 2 a clip of news item 2, and so on; this application does not make specific limitations.
Action means splitting the video according to different actions, and the multiple short videos after splitting can be short videos of the same action. For example, for an evening gala program with the splitting feature set to action, short video 1 may be all dance program clips, short video 2 singing program clips, short video 3 sketch program clips, and so on; this application does not make specific limitations.
OCR means that image text recognition technology needs to be combined to split the video, for example, pictures in some scenes of the video need to be recognized to determine the meaning of the scene, such as billboards, traffic signs and so on; this application does not make specific limitations.
外观指的是按照不同外观对视频进行拆条,拆条后的多个短视频可以是相同外观的短视频,这里的外观可以指的是相同的衣服外观、相同的帽子外观等等,本申请不作具体限定。Appearance refers to splitting the video into strips according to different appearances. The multiple short videos after stripping can be short videos with the same appearance. The appearance here can refer to the same appearance of clothes, the same appearance of hats, etc. This application No specific limitation is made.
It should be understood that the splitting features can be divided into more feature types according to the user's business scenario, which are not enumerated one by one here. The user can input whichever type of splitting feature is needed to split the video. For example, if the user wants all the clips of each actor in a film or television series, the character feature can be input into the configuration interface as the splitting feature; or, if the user wants the dancing clips in a variety-show video, the action feature can be input into the configuration interface as the splitting feature.
It should be understood that each splitting feature can be further subdivided. For example, the splitting feature "action" can be further subdivided into "dancing", "running", "conflict", and so on, and "audio" can be further subdivided into "singing", "quarreling", and so on. Continuing the example above, if the user needs the "dancing" clips in a video, the user can select "action" among the splitting features and then select the "dancing" feature under the "action" category. It should be understood that the above examples are for illustration and are not specifically limited in this application.
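The two-level feature selection described above can be sketched as a small taxonomy lookup. The top-level features come from this application's list of splitting features; the sub-features shown are only the examples mentioned in the text, and the data structure and function name are illustrative assumptions rather than the actual implementation:

```python
# Illustrative two-level splitting-feature taxonomy; sub-feature lists are
# only the examples mentioned in the text, not an exhaustive catalogue.
FEATURE_TAXONOMY = {
    "scene": [],
    "character": [],
    "audio": ["singing", "quarreling"],
    "subtitle": [],
    "action": ["dancing", "running", "conflict"],
    "ocr": [],
    "appearance": [],
}

def resolve_feature(top_level, sub_feature=None):
    """Validate a user's feature selection against the taxonomy and return
    the (top-level, sub-feature) pair that would be passed to the splitter."""
    if top_level not in FEATURE_TAXONOMY:
        raise ValueError(f"unknown splitting feature: {top_level!r}")
    if sub_feature is not None and sub_feature not in FEATURE_TAXONOMY[top_level]:
        raise ValueError(f"unknown sub-feature of {top_level!r}: {sub_feature!r}")
    return (top_level, sub_feature)
```

For instance, a user who wants only dancing clips would select `resolve_feature("action", "dancing")`, mirroring the "action" then "dancing" selection described above.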
By implementing the above implementation and splitting the video according to the splitting feature obtained from the user, the diverse needs of different users under the same video-type scenario can be met and the user experience improved. It should be understood that, for the same video type, different users focus on different things when splitting a video. For variety-show videos, for example, some users only want to watch performance clips of their favorite actors and stars, some only want to watch dancing clips, and some only want to watch singing clips. This application obtains the required splitting feature from the user through the above configuration interface and splits the video according to that feature, which can meet users' diverse needs.
In a possible implementation, the configuration parameters further include a splitting speed.
In a specific implementation, the splitting speed may be a speed value or a speed range. The video processing system can determine the splitting speed according to the speed value or speed range input by the user, and the splitting speed may be one of a set of preset ranges, such as 0-1 s, 1-5 s, 5-10 s, 10-15 s, 15-20 s, 20-30 s, and so on. For example, if the speed value input by the user is 3 s, the splitting speed may be 1-5 s; if the speed range input by the user is 4-8 s, the splitting speed may be 5-10 s.
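The mapping from a user-supplied value or range onto a preset speed bucket can be sketched as follows. The bucket boundaries reproduce the example ranges above; the function name and the choice of snapping a range by its upper end are assumptions for illustration:

```python
# Preset splitting-speed buckets, in seconds, from the example above.
PRESET_BUCKETS = [(0, 1), (1, 5), (5, 10), (10, 15), (15, 20), (20, 30)]

def resolve_speed_bucket(speed):
    """Map a single value like 3 or a range like (4, 8) to the first preset
    bucket containing it; a range is placed by its upper end so the whole
    requested range fits inside the chosen bucket."""
    upper = speed[1] if isinstance(speed, tuple) else speed
    for low, high in PRESET_BUCKETS:
        if low < upper <= high:
            return (low, high)
    return PRESET_BUCKETS[-1]  # clamp anything beyond 30 s to the last bucket

print(resolve_speed_bucket(3))       # a 3 s value falls in the 1-5 s bucket
print(resolve_speed_bucket((4, 8)))  # a 4-8 s range resolves to 5-10 s
```

This reproduces the two worked examples above: an input of 3 s resolves to the 1-5 s bucket, and an input of 4-8 s resolves to the 5-10 s bucket.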
By implementing the above implementation and obtaining the splitting speed input by the user, the video can be split at that speed, further meeting the user's needs and improving the user experience. It should be understood that the faster the video is split, the lower the splitting precision. Some users' needs emphasize splitting speed while others emphasize splitting precision; letting users choose the splitting speed according to their own needs can improve the user experience.
In a possible implementation, the video type includes an unknown type. When the video type is the unknown type, the method may further include the following steps: the video processing system performs type detection on the video to obtain a detected type of the video, splits the video according to the detected type and the configuration parameters, and outputs multiple short videos.
Optionally, when the configuration interface does not obtain a splitting feature input by the user, the video processing system performs feature detection on the video to obtain a splitting feature of the video, and splits the video according to that splitting feature.
Optionally, the configuration interface can display preset splitting speeds for the user to choose from. If the user does not select a splitting speed, the video processing system uses a default splitting speed or the user's historical splitting speed as the splitting speed input by the user, splits the video at that speed, and outputs multiple short videos.
By implementing the above implementation, if the user does not input, or cannot determine, the required video type, splitting feature, or splitting speed, the video processing system can detect the video to obtain its video type, splitting feature, or splitting speed, predicting the user's likely splitting needs and improving the user experience.
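The fallback behaviour for a missing splitting speed described above can be sketched as a small helper. The default value, the preference for the most recent historical choice, and the function name are all assumptions made for illustration:

```python
# Hypothetical default splitting-speed bucket, in seconds.
DEFAULT_SPLITTING_SPEED = (5, 10)

def effective_splitting_speed(user_speed, history):
    """Return the speed to use for splitting.

    history is a list of the user's past splitting speeds, oldest first.
    If the user selected nothing, fall back to the most recent historical
    speed, and failing that to the system default.
    """
    if user_speed is not None:
        return user_speed
    if history:
        return history[-1]  # most recent historical splitting speed
    return DEFAULT_SPLITTING_SPEED
```

A comparable fallback could be applied to the video type and splitting feature, with detection results taking the place of history.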
In a possible implementation, the video processing system includes multiple splitting models, where one splitting model corresponds to one video type. The video processing system obtains the splitting model corresponding to the video type, inputs the video into that splitting model, and outputs the multiple short videos obtained after splitting.
In a specific implementation, the video and the splitting feature are input into the splitting model corresponding to the video type, and the multiple short videos obtained after splitting are output. The splitting model is obtained by training a machine learning model with a sample set, where the sample set includes sample input data and sample output data: the sample input data includes known videos and known splitting features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features.
Optionally, when the video type is the unknown type, if the video processing system successfully performs type detection on the video and obtains the detected type, the video processing system can use the splitting model corresponding to the detected type to split the video and output multiple short videos. If type detection fails and no detected type is obtained, or type detection succeeds but the confidence of the detected type is very low, a generic splitting model can be used to split the video and output multiple short videos, where the generic splitting model may be a splitting model common to multiple video types.
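The selection-with-fallback logic described above can be sketched as follows. The registry contents, the 0.5 confidence threshold, and the string placeholders standing in for trained models are all illustrative assumptions, since the text does not fix concrete values:

```python
# Hypothetical registry mapping video types to their splitting models;
# strings stand in for trained model objects.
SPLITTING_MODELS = {"news": "news_splitter", "variety": "variety_splitter"}
GENERIC_MODEL = "generic_splitter"  # model common to multiple video types
CONFIDENCE_THRESHOLD = 0.5          # assumed cut-off for "very low" confidence

def select_splitting_model(detected_type, confidence):
    """Pick the type-specific model, or fall back to the generic one when
    detection failed, is unreliable, or the type has no dedicated model."""
    if detected_type is None or confidence < CONFIDENCE_THRESHOLD:
        return GENERIC_MODEL
    return SPLITTING_MODELS.get(detected_type, GENERIC_MODEL)
```

For example, a video confidently detected as "news" would be split by the news model, while a failed or low-confidence detection falls through to the generic model.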
In the above implementation, one splitting model corresponds to one video type, which can meet users' needs in different application scenarios. Moreover, the machine learning models for different video types may use the same or different model structures, which can be determined according to the respective video types. For example, video types with similar application scenarios may use similar or identical machine learning model structures, with the sample sets used for training corresponding to the respective video types, thereby reducing the workload of model building and improving the efficiency of preparing the splitting models.
In a possible implementation, the above splitting model can split the video according to different splitting features. For example, assuming the video type selected by the user is "variety show", the corresponding splitting model is the variety-show splitting model. If the splitting feature selected by the user is "character" and the uploaded video is one episode of a talent-show variety program, the video can be split according to the character feature, and the short videos obtained may be all the performance clips of contestant A in the variety show.
Optionally, the above sample set includes sample input data and sample output data, where the sample input data includes known videos and known splitting features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features. The trained splitting model can split a video according to the splitting feature input by the user.
In a specific implementation, the above machine learning model may include, but is not limited to, a CNN, an LSTM, a Yolo model, an SSD model, an RCNN model, or a Fast-RCNN model, which is not specifically limited in this application.
In the above implementation, the splitting model can split the video according to different splitting features, so that the needs of different users for the same video type can all be met. It should be understood that, for the same video type, different users focus on different things when splitting a video. For variety-show videos, for example, some users only want to watch performance clips of their favorite actors and stars, some only want to watch dancing clips, and some only want to watch singing clips. This application obtains the required splitting feature from the user through the above configuration interface and splits the video according to that feature, which can meet users' diverse needs.
In a possible implementation, the splitting model for each video type may include multiple speed splitting models, where one speed splitting model corresponds to one splitting speed. The splitting unit 220 can determine the corresponding splitting model according to the video type obtained through the configuration interface, then determine the speed splitting model corresponding to that splitting model according to the splitting speed obtained through the configuration interface, and then use that speed splitting model to split the video to obtain multiple short videos. In a specific implementation, the structures of the multiple speed splitting models under each video type may be the same or different, which can be determined according to the actual processing situation and is not specifically limited in this application.
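The two-level selection above, video type first and splitting speed second, can be sketched as a nested lookup. The registry contents are placeholders invented for illustration; in practice each entry would be a trained speed splitting model:

```python
# Hypothetical registry: video type -> {speed bucket -> speed splitting model}.
MODEL_REGISTRY = {
    "variety": {(1, 5): "variety_fast", (5, 10): "variety_precise"},
    "news": {(1, 5): "news_fast", (5, 10): "news_precise"},
}

def select_speed_model(video_type, speed_bucket):
    speed_models = MODEL_REGISTRY[video_type]  # splitting model for the type
    return speed_models[speed_bucket]          # its variant for the speed
```

This reflects the speed/precision trade-off noted earlier: a faster bucket selects a faster but less precise variant of the same type's splitting model.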
In the above implementation, the splitting model for each video type includes multiple speed splitting models, with different splitting speeds corresponding to different speed splitting models, further meeting users' needs and improving the user experience.
In a second aspect, a video processing system is provided. The system includes: an acquisition unit, configured to obtain configuration parameters and a video from a user through a configuration interface, where the configuration parameters include the video type of the video; and a splitting unit, configured to output multiple short videos, where the multiple short videos are obtained by splitting the video according to the configuration parameters.
By implementing the system described in the second aspect, the configuration parameters input by the user are obtained through the configuration interface, the configuration parameters including at least the video type of the video; a pre-trained splitting model is selected according to the video type to split the video input by the user, and the multiple short videos obtained after splitting are output, where the splitting model corresponds to the video type input by the user. In this way, the multiple short videos can meet the user's diverse needs: for whatever scenario the user needs splitting, the user inputs the corresponding configuration parameters, thereby realizing a video splitting model that is universal across multiple scenarios and meets user needs, and improving the user experience.
In a possible implementation, the video type includes one or more of film and television drama, variety show, news, documentary, interview, sports, animation, and conference.
In a possible implementation, the configuration parameters further include a splitting feature, and the splitting feature includes one or more of scene, character, audio, subtitle, action, optical character recognition (OCR), and appearance.
In a possible implementation, the configuration parameters further include a splitting speed.
In a possible implementation, the system further includes a detection unit, and the video type includes an unknown type. When the video type is the unknown type, the detection unit is configured to perform type detection on the video to obtain a detected type of the video;
the splitting unit is configured to split the video according to the detected type and the configuration parameters, and output multiple short videos.
In a possible implementation, when the configuration interface does not obtain a splitting feature input by the user, the detection unit is configured to perform feature detection on the video to obtain a splitting feature of the video, so that the video is split according to that splitting feature.
In a possible implementation, the video processing system includes multiple splitting models, where one splitting model corresponds to one video type, and the splitting unit is configured to obtain the splitting model corresponding to the video type, input the video into that splitting model, and output the multiple short videos obtained after splitting.
In a possible implementation, the splitting unit is configured to input the video and the splitting feature into the splitting model corresponding to the video type, and output the multiple short videos obtained after splitting, where the splitting model is obtained by training a machine learning model with a sample set; the sample set includes sample input data and sample output data, where the sample input data includes known videos and known splitting features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features.
In a third aspect, a computing device is provided, including a processor and a memory. The memory is configured to store code, and the processor is configured to execute the code to perform the functions of the modules in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, a computer storage medium is provided. The computer storage medium stores instructions that, when run on a computing device, cause the computing device to perform the methods described in the above aspects.
In a fifth aspect, a program product containing instructions is provided, including a program or instructions that, when run on a computing device, cause the computing device to perform the methods described in the above aspects.
On the basis of the implementations provided in the above aspects, this application can further combine them to provide more implementations.
Description of the drawings
Figure 1 is a schematic architectural diagram of a video processing system provided by this application;
Figure 2 is a schematic diagram of a splitting model in a video processing system provided by this application;
Figure 3 is a schematic flowchart of the steps of a video processing method provided by this application;
Figure 4 is an example diagram of a configuration interface in a video processing system provided by this application;
Figure 5 is a schematic flowchart of the steps of a video processing system in an application scenario provided by this application;
Figure 6 is a schematic structural diagram of a computing device provided by this application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
First, the video splitting technology involved in this application is described.
With the rapid growth of short-video content, video viewers' patience for watching long videos is gradually decreasing. Video splitting technology emerged to provide video viewers with more engaging video highlights and to improve the utilization of users' fragmented time. Video splitting is a secondary processing of the original video content that splits it into several video clips as needed, so that users can watch the clips they are interested in on demand. Video splitting technology can deeply mine the valuable information points in long videos and help users understand videos better and faster.
However, video splitting technology is usually limited to a particular scenario. For example, video splitting for news videos is usually implemented based on features such as news titles and news shot transitions, while video splitting for film and television dramas is usually implemented based on their subtitles. As a result, most video splitting technologies can only work in their corresponding application scenarios, so each has a single application scenario and one video splitting model cannot split multiple types of videos. Consequently, a platform performing video splitting needs to customize a video splitting model for each application scenario, which is costly and inefficient.
Moreover, within a particular scenario, users' needs are diverse. In a news-video scenario, for example, some users need fast processing, some need a large number of video splits, some need precise video splitting, and some need news videos with specific content. Yet the video splitting technology for a particular scenario is usually a generic model for that scenario, which cannot meet users' diverse needs and has low flexibility.
In summary, since video splitting technology can only work in its corresponding single application scenario, and video splitting technology for a single scenario often cannot meet users' diverse needs, how to realize a video splitting model that is universal across multiple scenarios and meets user needs is an urgent problem to be solved.
To solve the above problems, this application provides a video processing system. Figure 1 is a schematic architectural diagram of a video processing system provided by this application. As shown in Figure 1, the architecture includes a client 100, a video processing system 200, and a storage server 300, where communication connections can be established among the client 100, the video processing system 200, and the storage server 300; the connections may be wired or wireless, which is not specifically limited in this application. Moreover, the number of clients 100 and storage servers 300 may each be one or more; Figure 1 takes one client 100 and one storage server 300 as an example, and this application does not specifically limit this.
The client 100 can run on a terminal device held by the user. The terminal device may be a computer, a smartphone, a handheld processing device, a tablet, a mobile notebook, an augmented reality (AR) device, a virtual reality (VR) device, an integrated handheld console, a wearable device, a vehicle-mounted device, a smart conference device, a smart advertising device, a smart home appliance, and so on, which is not specifically limited here.
In a specific implementation, the client 100 may be an application client, a web-based client in a browser, an application (APP) client, or an application programming interface (API), which is not specifically limited in this application.
The video processing system 200 can be deployed on a computing device, which may be a bare metal server (BMS), a virtual machine, or a container. A BMS refers to a general-purpose physical server, for example, an ARM server or an X86 server; a virtual machine refers to a complete computer system with full hardware system functionality, simulated by software using network functions virtualization (NFV) technology and running in a completely isolated environment; a container refers to a group of processes that are subject to resource limits and isolated from one another. The computing device may also be an edge computing device, a storage server, or a storage array, which is not specifically limited in this application.
The storage server 300 may be a server with a storage function. The server may be a physical server, such as an ARM server or an X86 server, or a virtual machine, which is not specifically limited in this application. The storage server 300 may be a storage server of a video platform (such as a TV station, a video website, or a live-streaming platform) or of a public cloud platform, and is used to store videos to be split and the short videos obtained after splitting.
Optionally, the video processing system 200 may also be deployed on the storage server 300; in other words, the storage server 300 has the video splitting function. The video processing system 200 and the client 100 may also both be deployed on the storage server 300, which is not specifically limited in this application. Alternatively, the client 100 may be deployed on the storage server 300 while the video processing system is deployed on another server, or, as shown in Figure 1, the client 100 and the video processing system 200 may be deployed on servers other than the storage server 300, which is not specifically limited in this application.
In this embodiment of the present application, the client 100 can upload a video to the video processing system 200 for video splitting. The video processing system 200 splits the video to obtain multiple short videos and then returns them to the client 100 or stores them in the storage server 300. Of course, the storage server 300 can also send a video to the video processing system 200 for video splitting; the video processing system splits the video to obtain multiple short videos and then returns them to the storage server 300, or returns them to the client 100 for use, which can be determined according to the actual application scenario and is not specifically limited in this application.
Optionally, the video processing system 200 may also be deployed in a public cloud to provide users with a cloud service for video splitting. For example, when purchasing a content delivery network (CDN) service, a user can opt in to a video splitting service, and the public cloud platform can use the video processing system 200 to split some of the videos distributed in the CDN network according to the user's needs. It should be understood that the above example is for illustration and is not specifically limited in this application.
Further, the video processing system 200 can be divided into multiple unit modules. Each unit module may be a software module or a hardware module, or partly a software module and partly a hardware module, which is not specifically limited in this application. Figure 1 shows an exemplary division. As shown in Figure 1, the video processing system 200 may include an acquisition unit 210, a splitting unit 220, and a splitting model 230.
The acquisition unit 210 is configured to obtain configuration parameters and a video from the user through the configuration interface, where the video may be a long video that the user needs to split, such as an episode of a variety show, a documentary, or an interview program.
In a specific implementation, the configuration interface may be an application page, a web page, or an API through which the user interacts with the video processing system 200. The video processing system 200 can display the application page or web page on the screen of the client 100, or provide API parameters to the user, who can use the API parameters to integrate the video processing system 200 into a third-party system for secondary development.
It should be noted that the above user may be a user of the splitting service. For example, a video-website user can input configuration parameters and videos through an application page or web page to use the video website's splitting service to split different types of videos. The above example is for illustration and is not specifically limited in this application.
The above user may also be a developer who integrates the splitting service into a third-party system for secondary development. For example, if the video processing system 200 is deployed in a public cloud, the configuration interface may be the console or API of the public cloud platform. The console may be a web-based service management system through which the user can purchase cloud services and connect to a cloud service instance with the functions of the video processing system 200; the API can be integrated by the user into a third-party system for secondary development. For example, a short-video platform can connect the API of the configuration interface with its internal server used to store long videos, so that long videos uploaded by users can automatically be split through the API. It should be understood that the above example is for illustration and is not specifically limited in this application.
If the video processing system 200 is deployed in a video website, the configuration interface may be the console of the video website. Users can upload videos and input the above configuration parameters through the console, so that the video processing system 200 in the video website splits the videos according to the configuration parameters to obtain multiple short videos. It should be understood that the above example is for illustration and is not specifically limited in this application.
Optionally, the configuration parameters may include the video type of the video, and the video type may include one or more of film and television drama, variety show, news, documentary, interview, sports, animation, and conference. It should be understood that video types may be divided into more categories according to the user's business scenario, which are not enumerated here one by one. The user may input whichever video type needs to be split; for example, if the user needs to split a film or TV drama video, the video type may be selected as film and television drama.
Optionally, the video type includes an unknown type. The unknown type may refer to a video type that the user cannot determine, or may mean that the user did not input a video type, that is, the configuration interface did not obtain a video type. When the video type is unknown, the detection unit 240 may perform type detection on the video to obtain a detected type of the video. The splitting unit 220 of the video processing system 200 may then split the video according to the configuration parameters and the detected type, and output multiple short videos.
Optionally, the configuration parameters may further include a splitting feature, and the splitting feature may include one or more of scene, person, audio, subtitle, action, optical character recognition (OCR), and appearance.
Here, scene means splitting the video by different scenes, so that each resulting short video belongs to one scene. For example, for a documentary about a school with the splitting feature set to scene, short video 1 may be the segments whose scene is a classroom, short video 2 the segments whose scene is a dormitory, short video 3 the segments whose scene is a playground, and so on. This application does not impose specific limitations.
Person means splitting the video by different persons, so that each resulting short video features one person. For example, for a talent-show variety program with the splitting feature set to person, short video 1 may be the performance segments of contestant A, short video 2 the performance segments of contestant B, and so on. This application does not impose specific limitations.
Subtitle means splitting the video according to subtitles, which requires text recognition technology to determine the splitting boundaries from the semantics of the subtitles. For example, for a news video with the splitting feature set to subtitle, short video 1 may be the segment of news item 1, short video 2 the segment of news item 2, and so on. This application does not impose specific limitations.
Action means splitting the video by different actions, so that each resulting short video contains one kind of action. For example, for a gala program with the splitting feature set to action, short video 1 may be all the dance segments, short video 2 the singing segments, short video 3 the sketch segments, and so on. This application does not impose specific limitations.
OCR means splitting the video with the help of image text recognition technology, for example recognizing pictures in certain scenes of the video, such as billboards and traffic signs, to determine the meaning of the scene. This application does not impose specific limitations.
Appearance means splitting the video by different appearances, so that each resulting short video shares one appearance, for example the same clothes or the same hat. This application does not impose specific limitations.
It should be understood that the splitting features may be divided into more feature types according to the user's business scenario, which are not enumerated here one by one. The user may input whichever splitting feature is needed to split the video. For example, if the user wants all the segments of each actor in a drama series, person may be input into the configuration interface as the splitting feature; if the user wants the dance segments of a variety-show video, action may be input into the configuration interface as the splitting feature.
It should be understood that each splitting feature may be further subdivided. For example, the splitting feature "action" may be subdivided into "dancing", "running", "conflict", and so on, and "audio" may be subdivided into "singing", "quarreling", and so on. Continuing the above example, if the user needs the "dancing" segments of a video, the user may select "action" among the splitting features and then select the "dancing" feature under the "action" category. It should be understood that the above example is for illustration and is not specifically limited in this application.
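The two-level feature selection described above can be sketched as a small taxonomy lookup. This is a minimal illustrative sketch: the function name and the particular subcategories are taken from the examples in the text, not from any actual implementation.

```python
# Illustrative two-level splitting-feature taxonomy; the categories and
# subcategories shown are the examples given in the text above.
FEATURE_TAXONOMY = {
    "action": ["dancing", "running", "conflict"],
    "audio": ["singing", "quarreling"],
}

def resolve_feature(category, sub=None):
    """Validate a user's selection, e.g. ('action', 'dancing')."""
    subs = FEATURE_TAXONOMY.get(category)
    if subs is None:
        raise ValueError(f"unknown feature category: {category}")
    if sub is not None and sub not in subs:
        raise ValueError(f"unknown subfeature: {sub}")
    return (category, sub)
```

A user wanting the "dancing" segments would thus select the "action" category first and "dancing" within it.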
In a specific implementation, the configuration interface may present multiple splitting features for the user to choose from. If the user cannot determine a splitting feature, or does not make a selection, that is, the configuration interface does not obtain a splitting feature, the detection unit 240 may analyze the video to obtain a splitting feature for it, where that splitting feature may be the feature type most commonly used for this video type, or the feature type the user has historically input for this video type. This application does not impose specific limitations.
Optionally, the configuration parameters may further include a splitting speed, which may be a speed value or a speed range. The video processing system 200 may determine the splitting speed according to the speed value or speed range input by the user, and the splitting speed may be one of preset ranges, such as 0–1 s, 1–5 s, 5–10 s, 10–15 s, 15–20 s, 20–30 s, and so on. For example, if the speed value input by the user is 3 s, the splitting speed may be 1–5 s; if the speed range input by the user is 4–8 s, the splitting speed may be 5–10 s.
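The mapping from a user-supplied value or range onto a preset splitting-speed range can be sketched as follows. This is a minimal sketch for illustration only: the midpoint rule used to match an input range is an assumption consistent with the 4–8 s example above, not a rule stated by this application.

```python
# Preset splitting-speed ranges, in seconds, as listed above.
PRESET_RANGES = [(0, 1), (1, 5), (5, 10), (10, 15), (15, 20), (20, 30)]

def pick_splitting_speed(lo, hi=None):
    """Return the preset (low, high) range covering the input.

    A single value selects the preset range containing it (3 s -> 1-5 s);
    an input range such as 4-8 s is matched by its midpoint, giving 5-10 s.
    """
    point = lo if hi is None else (lo + hi) / 2.0
    for low, high in PRESET_RANGES:
        if low <= point <= high:
            return (low, high)
    return PRESET_RANGES[-1]  # clamp inputs beyond 30 s
```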
In a specific implementation, the configuration interface may present the preset splitting speeds for the user to choose from. If the user does not select a splitting speed, the detection unit 240 may split the video using a default splitting speed or the user's historical splitting speed. This application does not impose specific limitations.
Optionally, the configuration interface may include a parameter interface and a video interface, where the parameter interface obtains the configuration parameters and the video interface obtains the user's videos. The user may first configure parameters through the parameter interface and then upload multiple videos through the video interface for splitting. For example, the user sets the video type to TV drama, the splitting feature to actor A, and the splitting speed to 1–5 s, and then uploads the 24 episodes of a TV series one by one. The video processing system 200 may split the 24 videos in turn according to the configuration parameters and output the short videos obtained from each episode, where the content of each short video is a performance segment of actor A. It should be understood that the above example is for illustration and is not specifically limited in this application.
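The configure-once, upload-many flow above can be sketched as follows. All names here (SplitConfig, submit, the episode file names) are hypothetical placeholders for illustration; this is not an actual API of the system.

```python
from dataclasses import dataclass

@dataclass
class SplitConfig:
    video_type: str     # e.g. "tv_drama"; may be "unknown"
    feature: str        # e.g. "person:actor_A"
    speed_range: tuple  # preset range in seconds, e.g. (1, 5)

def submit(config, videos):
    """Queue every uploaded video for splitting under one configuration."""
    return [{"video": v, "config": config, "status": "queued"} for v in videos]

# Parameter interface: set the configuration once.
cfg = SplitConfig("tv_drama", "person:actor_A", (1, 5))
# Video interface: upload the 24 episodes against that configuration.
jobs = submit(cfg, [f"episode_{i:02d}.mp4" for i in range(1, 25)])
```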
The splitting unit 220 is configured to output multiple short videos, where the multiple short videos are obtained by splitting the video according to the above configuration parameters. In a specific implementation, the splitting unit 220 may output the multiple short videos to the client 100 of the user, or output them to the storage server 300. This application does not impose specific limitations.
In a specific implementation, the video processing system 200 may include multiple splitting models 230, where one splitting model corresponds to one video type. The splitting unit 220 may determine the corresponding splitting model according to the video type obtained through the configuration interface, split the video using that splitting model, and output multiple short videos. For example, the video types may include film and television drama, variety show, news, and documentary: splitting model 1 corresponds to film and television drama, splitting model 2 to variety show, splitting model 3 to news, and splitting model 4 to documentary.
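The one-model-per-type correspondence in the example above amounts to a simple lookup. This is an illustrative sketch only; the dictionary keys and model names are placeholders for trained splitting models, not real identifiers from the system.

```python
# One splitting model per video type, following the example above.
SPLITTING_MODELS = {
    "drama": "splitting_model_1",
    "variety": "splitting_model_2",
    "news": "splitting_model_3",
    "documentary": "splitting_model_4",
}

def select_model(video_type):
    """Pick the splitting model matching the type from the configuration interface."""
    return SPLITTING_MODELS[video_type]
```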
It should be understood that the above splitting models may be obtained by training machine learning models with sample sets of different video types. For example, training a machine learning model with a sample set of film and television dramas yields a splitting model for the film and television drama type, training with a sample set of news yields a splitting model for the news type, and so on, yielding splitting models for multiple video types.
It should be noted that the model structures used by the machine learning models of different video types may be the same or different, which may be determined according to the respective video types. For example, video types with similar application scenarios may use similar or identical machine learning model structures, with the training sample sets simply corresponding to the respective video types, thereby reducing the workload of model construction and improving the efficiency of preparing the splitting models.
For example, in interview programs and conferences, the persons in the video are usually only the host and the guests (or participants), whose actions do not change much. Splitting such videos usually focuses on subtitles or audio rather than on person actions or scene changes, so the machine learning model structure for these video types may emphasize the extraction and recognition of speech and text features rather than image recognition, and the machine learning models of these two video types may use the same or similar model structures.
As another example, variety shows and film and television dramas usually contain many persons, constantly changing scenes, and varied actions, so the machine learning model structure for these video types may emphasize the extraction and recognition of scene, face, and action features rather than speech and text recognition. It should be understood that the above examples are for illustration and are not specifically limited in this application.
Optionally, the above splitting models may split a video according to different splitting features. For example, suppose the video type selected by the user is variety show, so the corresponding splitting model is the variety-show splitting model. If the splitting feature selected by the user is person and the uploaded video is one episode of a talent-show variety program, the video may be split by person features, and a resulting short video may contain all the performance segments of contestant A in that program. If the splitting feature selected by the user is action, the video may be split by action features, and the resulting short videos may be the segments in which that action appears; for example, if the user sets the action to dancing, the resulting short video may be a compilation of the dance performances in that program. It should be understood that the above examples are for illustration and are not specifically limited in this application.
Optionally, the above sample set includes sample input data and sample output data, where the sample input data includes known videos and known splitting features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features. The trained splitting model can then split a video according to the splitting feature input by the user.
It should be understood that, for the same video type, different users have different concerns when splitting videos. For variety-show videos, for example, some users only want to watch the performance segments of their favorite actors and stars, some only want to watch the dance segments, and some only want to watch the singing segments. This application obtains the required splitting feature from the user through the above configuration interface and splits the video according to that splitting feature, which can satisfy the diverse needs of users.
In a specific implementation, the above machine learning models may include, but are not limited to, a convolutional neural network (CNN) model, a long short-term memory (LSTM) model, a you-only-look-once unified real-time object detection (YOLO) model, a single shot multibox detector (SSD) model, a region-based convolutional neural network (RCNN) model, or a fast region-based convolutional neural network (Fast-RCNN) model, which is not specifically limited in this application.
Optionally, the splitting model under each video type may include multiple speed splitting models, where one speed splitting model corresponds to one splitting speed. The splitting unit 220 may determine the corresponding splitting model according to the video type obtained through the configuration interface, determine the speed splitting model corresponding to that splitting model according to the splitting speed obtained through the configuration interface, and then split the video using that speed splitting model to obtain multiple short videos. In a specific implementation, the structures of the multiple speed splitting models under each video type may be the same or different, which may be determined according to the actual processing situation and is not specifically limited in this application.
Illustratively, FIG. 2 is an example diagram of the splitting models stored in the video processing system provided by this application. As shown in FIG. 2, the multiple splitting models 230 in the video processing system 200 shown in FIG. 1 may be splitting model 11, splitting model 12, splitting model 21, and splitting model 22 in FIG. 2, where the video type of splitting model 11 and splitting model 12 is type 1, the video type of splitting model 21 and splitting model 22 is type 2, the splitting speed of splitting model 11 and splitting model 21 is speed 1, and the splitting speed of splitting model 12 and splitting model 22 is speed 2.
Each splitting model corresponds to a different combination of video type and splitting speed, and the corresponding splitting model may be selected for video splitting according to the configuration parameters input by the user. The input data of each splitting model includes the video to be split and a splitting feature, and the output data is the multiple short videos obtained by splitting the video using that splitting feature. For example, inputting splitting feature 1 and a video into splitting model 11 yields multiple short videos of splitting feature 1, inputting splitting feature 2 and a video into splitting model 11 yields multiple short videos of splitting feature 2, and so on, which are not enumerated here. For example, if the video type selected by the user through the configuration interface is video type 1, the splitting feature is splitting feature 2, and the splitting speed is speed 2, the video processing system may select splitting model 12 in FIG. 2 according to the configuration parameters input by the user, and input splitting feature 2 and the video into splitting model 12 to obtain multiple short videos of splitting feature 2. In this way, the multiple short videos finally output are obtained by splitting the video according to the video type together with the splitting feature and splitting speed required by the user, which satisfies the diverse needs of users to the greatest extent and improves the user experience.
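The two-level selection illustrated by FIG. 2 can be sketched as a lookup keyed by video type and splitting speed. The keys and model names below follow the figure's example; they are illustrative placeholders, not identifiers from an actual implementation.

```python
# Model grid from FIG. 2: the video type and splitting speed together
# select one splitting model.
SPEED_MODELS = {
    ("type1", "speed1"): "splitting_model_11",
    ("type1", "speed2"): "splitting_model_12",
    ("type2", "speed1"): "splitting_model_21",
    ("type2", "speed2"): "splitting_model_22",
}

def select_speed_model(video_type, speed):
    """Select the splitting model for a (video type, splitting speed) pair."""
    return SPEED_MODELS[(video_type, speed)]
```

For the worked example above, `select_speed_model("type1", "speed2")` picks splitting model 12, into which the splitting feature and the video are then input.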
It should be understood that FIG. 2 is used for illustration; the splitting models 230 in the video processing system 200 may also include more or fewer video types, splitting speeds, and splitting features, which are not specifically limited in this application.
In summary, the video processing system provided by this application obtains, through the configuration interface, the configuration parameters input by the user, where the configuration parameters include at least the video type of the video, selects a pre-trained splitting model according to the video type to split the video input by the user, and outputs the multiple short videos after splitting, where the splitting model corresponds to the video type input by the user. The multiple short videos can thus satisfy the diverse needs of users: whatever scenario the user needs to split, the user inputs the corresponding configuration parameters, thereby realizing a video splitting model that is universal across multiple scenarios and meets user needs, and improving the user experience.
FIG. 3 is a schematic flowchart of the steps of a video processing method provided by this application. The method may be applied to the video processing system 200 shown in FIG. 1. As shown in FIG. 3, the method may include the following steps:
Step S310: The video processing system 200 obtains configuration parameters and a video from the user through the configuration interface. The video may be a long video that the user needs to split, such as one episode of a TV series, one episode of a variety show, an interview recording, a documentary, and so on.
The video processing system 200 may be deployed on a server or in a public cloud, and the server may be one of a physical server, a virtual machine, a container, or an edge computing device. For the specific deployment manner, reference may be made to the description of the video processing system 200 in the embodiment of FIG. 1, which is not repeated here.
In a specific implementation, the above configuration interface may be an application page, a web page, or an API through which the user interacts with the video processing system 200. The video processing system 200 may display the application page or web page on the screen of the client 100, or provide API parameters to the user, and the user may use the API parameters to integrate the video processing system 200 into a third-party system for secondary development. For details, reference may be made to the description of the configuration interface in the embodiment of FIG. 1, which is not repeated here.
Optionally, the configuration parameters may include a video type, and the video type may include one or more of film and television drama, variety show, news, documentary, interview, sports, animation, and conference. It should be understood that video types may be divided into more categories according to the user's business scenario, which are not enumerated here one by one. The user may input whichever video type needs to be split; for example, if the user needs to split a film or TV drama video, the video type may be selected as film and television drama.
Optionally, the video type includes an unknown type. The unknown type may refer to a video type that the user cannot determine, or may mean that the user did not input a video type, that is, the configuration interface did not obtain a video type. When the video type is unknown, the video processing system 200 may perform type detection on the video to obtain a detected type of the video, and may then split the video according to the configuration parameters and the detected type and output multiple short videos.
Optionally, the configuration parameters may further include a splitting feature, and the splitting feature may include one or more of scene, person, audio, subtitle, action, OCR, and appearance. It should be understood that the splitting features may be divided into more feature types according to the user's business scenario, which are not enumerated here one by one. The user may input whichever splitting feature is needed to split the video. For example, if the user wants all the segments of each actor in a drama series, person may be input into the configuration interface as the splitting feature; if the user wants the dance segments of a variety-show video, action may be input into the configuration interface as the splitting feature.
It should be understood that each splitting feature may be further subdivided. For example, the splitting feature "action" may be subdivided into "dancing", "running", "conflict", and so on, and "audio" may be subdivided into "singing", "quarreling", and so on. Continuing the above example, if the user needs the "dancing" segments of a video, the user may select "action" among the splitting features and then select the "dancing" feature under the "action" category. It should be understood that the above example is for illustration and is not specifically limited in this application.
In a specific implementation, the configuration interface may present multiple splitting features for the user to choose from. If the user cannot determine a splitting feature, or does not make a selection, that is, the configuration interface does not obtain a splitting feature, the video processing system 200 may analyze the video to obtain a splitting feature for it, where that splitting feature may be the feature type most commonly used for this video type, or the feature type the user has historically input for this video type, and then split the video according to that splitting feature. This application does not impose specific limitations.
Optionally, the configuration parameters may further include a splitting speed, which may be a speed value or a speed range. The video processing system 200 may determine the splitting speed according to the speed value or speed range input by the user, and the splitting speed may be one of preset ranges, such as 0–1 s, 1–5 s, 5–10 s, 10–15 s, 15–20 s, 20–30 s, and so on. For example, if the speed value input by the user is 3 s, the splitting speed may be 1–5 s; if the speed range input by the user is 4–8 s, the splitting speed may be 5–10 s.
In a specific implementation, the configuration interface may present the preset splitting speeds for the user to choose from. If the user does not select a splitting speed, the video processing system 200 may split the video using a default splitting speed or the user's historical splitting speed. This application does not impose specific limitations.
步骤S320:视频处理系统200输出多个短视频,其中,上述多个短视频是根据配置参数对视频进行拆条后获得的,视频处理系统200可以将多个短视频输出至客户端100,也可以将多个短视频输出至存储服务器300,本申请不作具体限定,其中,客户端100和存储服务器300的具体描述可参考图1实施例,这里不重复赘述。Step S320: The video processing system 200 outputs multiple short videos, wherein the multiple short videos are obtained by splitting the videos according to the configuration parameters. The video processing system 200 can output the multiple short videos to the client 100, or Multiple short videos can be output to the storage server 300, which is not specifically limited in this application. For specific descriptions of the client 100 and the storage server 300, reference can be made to the embodiment of FIG. 1, and the details will not be repeated here.
具体实现中,视频处理系统200中可包括多个拆条模型,一个拆条模型对应一种视频类型。拆条单元220可以根据配置接口获取的视频类型确定对应的拆条模型,使用该拆条模型对视频进行拆条,输出多个短视频。In specific implementation, the video processing system 200 may include multiple splitting models, and one splitting model corresponds to one video type. The stripping unit 220 can determine the corresponding stripping model according to the video type obtained by the configuration interface, use the stripping model to strip the video, and output multiple short videos.
应理解,上述拆条模型可以是使用不同视频类型的样本集对机器学习模型进行训练后获得的,比如影视剧类型的样本集对机器学习模型进行训练后获得影视剧视频类型的拆条模型,新闻类型的样本集对机器学习模型进行训练后获得新闻视频类型的拆条模型,以此类推,获得多种视频类型的拆条模型。It should be understood that the above-mentioned bar splitting model can be obtained by training the machine learning model using sample sets of different video types. For example, a bar splitting model of the film and television drama video type is obtained after training the machine learning model with a sample set of film and television drama types. After training the machine learning model on the news type sample set, the news video type stripping model is obtained, and by analogy, the stripping models of multiple video types are obtained.
需要说明的是，不同视频类型的机器学习模型采用的模型结构可以相同或者不同，具体可根据各自对应的视频类型确定。比如应用场景类似的视频类型所采用的机器学习模型结构可以是类似的或者相同的，训练时使用的样本集对应各自的视频类型即可，从而减少模型搭建的工作量，提高拆条模型的准备效率。It should be noted that the model structures used by the machine learning models of different video types may be the same or different, and may be determined according to the respective video types. For example, video types with similar application scenarios may use similar or identical machine learning model structures, with only the training sample sets differing by video type, thereby reducing the workload of model construction and improving the efficiency of preparing the splitting models.
具体实现中，在视频类型为未知类型的情况下，若视频处理系统200对视频进行类型检测成功获得了视频的检测类型，视频处理系统200可以使用该检测类型对应的拆条模型对视频进行拆条，输出多个短视频；若视频处理系统200对视频进行类型检测失败，未获得视频的检测类型，或者类型检测成功但是检测类型的置信度很低，此时可使用通用拆条模型对视频进行拆条，输出多个短视频，上述通用拆条模型可以是多种视频类型通用的拆条模型。In a specific implementation, when the video type is unknown, if the video processing system 200 successfully detects the type of the video, the video processing system 200 may split the video using the splitting model corresponding to the detected type and output multiple short videos. If type detection fails and no detected type is obtained, or type detection succeeds but the confidence of the detected type is very low, a generic splitting model may be used to split the video and output multiple short videos, where the generic splitting model may be a splitting model common to multiple video types.
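The fallback logic just described — use the user's type if given, otherwise detect it, and fall back to the generic model on detection failure or low confidence — can be condensed into one selection routine. This is a minimal sketch; the function names, the `"unknown"` sentinel, and the confidence threshold are assumptions for illustration:

```python
def select_splitting_model(models, generic_model, video_type, detect_fn, video,
                           confidence_threshold=0.5):
    """Pick a splitting model for a video.

    models: dict mapping video type -> splitting model.
    generic_model: fallback used when detection fails or confidence is low.
    detect_fn(video): returns (detected_type, confidence), or None on failure.
    """
    if video_type != "unknown":
        return models[video_type]
    detection = detect_fn(video)
    if detection is None:          # type detection failed
        return generic_model
    detected_type, confidence = detection
    if confidence < confidence_threshold or detected_type not in models:
        return generic_model       # low confidence: use the generic model
    return models[detected_type]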
可选地，上述拆条模型可针对不同的拆条特征对视频进行拆条，举例来说，假设用户选择的视频类型为“综艺”，那么对应的拆条模型为综艺拆条模型，若用户选择的拆条特征为“人物”，上传的视频为选秀类综艺节目的一期视频，那么拆条模型可按照人物特征对视频进行拆条，获得的短视频可以是该综艺中选手A全部的演出片段。Optionally, the above splitting model may split the video according to different splitting features. For example, suppose the video type selected by the user is "variety show"; the corresponding splitting model is then a variety-show splitting model. If the splitting feature selected by the user is "person" and the uploaded video is one episode of a talent-show program, the model may split the video by person, and the resulting short videos may be all of contestant A's performance clips in that show.
可选地，上述样本集包括样本输入数据和样本输出数据，其中，样本输入数据包括已知视频和已知拆条特征，样本输出数据包括使用所述已知拆条特征对所述已知视频进行拆条后获得的多个已知短视频，训练好的拆条模型可根据用户输入的拆条特征对视频进行拆条。Optionally, the above sample set includes sample input data and sample output data, where the sample input data includes a known video and a known splitting feature, and the sample output data includes multiple known short videos obtained by splitting the known video using the known splitting feature. The trained splitting model can then split a video according to the splitting feature input by the user.
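The supervised sample structure described above — a (known video, known splitting feature) input paired with the known short clips as output — might be represented as follows. This is a minimal sketch; the field names are illustrative assumptions, not terms from the application:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingSample:
    """One supervised sample: the model learns to map a (video, splitting
    feature) pair to the known list of short clips produced with that feature."""
    video: str                 # identifier or path of a known long video
    feature: str               # known splitting feature, e.g. "person"
    short_clips: List[str]     # known short videos obtained with that feature
```

A sample set is then simply a list of such records, one per (video, feature) pairing used during training.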
应理解，对于同一个视频类型，不同的用户进行视频拆条时的关注点也是不同的，比如综艺类型的视频，有的用户只想看自己喜爱的演员明星的演出片段，有的用户只想看跳舞片段，有的用户只想看演唱片段，本申请通过上述配置接口向用户获取需求的拆条特征，根据该拆条特征对视频进行拆条，可以满足用户多样化需求。It should be understood that, for the same video type, different users have different concerns when splitting a video. For a variety-show video, for example, some users only want to watch the performance clips of their favorite actors and stars, some only want to watch the dancing clips, and some only want to watch the singing clips. This application obtains the required splitting feature from the user through the above configuration interface and splits the video according to that feature, which can meet users' diverse needs.
具体实现中,上述机器学习模型可包括但不限于比如CNN、LSTM、Yolo模型、SSD模型、RCNN模型或Fast-RCNN模型等,本申请不作具体限定。In specific implementation, the above machine learning models may include but are not limited to CNN, LSTM, Yolo model, SSD model, RCNN model or Fast-RCNN model, etc., which are not specifically limited in this application.
可选地，每个视频类型下的拆条模型可包括多种速度拆条模型，其中，一个速度拆条模型对应一个拆条速度，拆条单元220可以根据配置接口获取的视频类型确定对应的拆条模型，然后根据配置接口获取的拆条速度确定该拆条模型对应的速度拆条模型，然后使用该速度拆条模型对视频进行拆条，获得多个短视频。具体实现中，每种视频类型下的多种速度拆条模型的结构可以相同也可以不同，具体可根据实际处理情况决定，本申请不作具体限定。Optionally, the splitting model under each video type may include multiple speed splitting models, where one speed splitting model corresponds to one strip-splitting speed. The splitting unit 220 may determine the corresponding splitting model according to the video type obtained through the configuration interface, then determine the speed splitting model corresponding to that splitting model according to the strip-splitting speed obtained through the configuration interface, and then split the video using that speed splitting model to obtain multiple short videos. In a specific implementation, the structures of the multiple speed splitting models under each video type may be the same or different, which may be decided according to actual processing conditions and is not specifically limited in this application.
示例性地，图2是本申请提供的视频处理系统中存储的拆条模型示例图，如图2所示，图1所示的视频处理系统200包括多种视频的拆条模型230，例如图2中拆条模型11、拆条模型12、拆条模型21和拆条模型22，其中，拆条模型11和拆条模型12的视频类型为类型1，拆条模型21和拆条模型22的视频类型为类型2，拆条模型11和拆条模型21的拆条速度为速度1，拆条模型12和拆条模型22的拆条速度为速度2。Exemplarily, FIG. 2 is an example diagram of the splitting models stored in the video processing system provided by this application. As shown in FIG. 2, the video processing system 200 shown in FIG. 1 includes splitting models 230 for multiple kinds of videos, for example splitting model 11, splitting model 12, splitting model 21 and splitting model 22 in FIG. 2, where the video type of splitting models 11 and 12 is type 1, the video type of splitting models 21 and 22 is type 2, the strip-splitting speed of splitting models 11 and 21 is speed 1, and the strip-splitting speed of splitting models 12 and 22 is speed 2.
其中，每个拆条模型可对应不同的视频类型和拆条速度，可根据用户输入的配置参数选择对应的拆条模型进行视频拆条。每个拆条模型的输入数据包括待拆条的视频以及拆条特征，输出数据是使用该拆条特征对视频进行拆条后获得的多个短视频，比如拆条特征1和视频输入拆条模型11后，获得拆条特征1的多个短视频，拆条特征2和视频输入拆条模型11后，获得拆条特征2的多个短视频，以此类推，这里不一一展开赘述。举例来说，若用户通过配置接口选择的视频类型为视频类型1，拆条特征为拆条特征2，拆条速度为速度2，那么视频处理系统可根据用户输入的配置参数，选择图2中的拆条模型12，将拆条特征2和视频输入上述拆条模型12，即可获得拆条特征2的多个短视频。这样，最终输出的多个短视频是结合视频类型和用户所需要的拆条特征以及拆条速度对视频进行拆条后获得的，最大程度满足了用户的多样化需求，提高用户的使用体验。Each splitting model may correspond to a different video type and strip-splitting speed, and the corresponding splitting model may be selected for video splitting according to the configuration parameters input by the user. The input data of each splitting model includes the video to be split and a splitting feature, and the output data is the multiple short videos obtained by splitting the video using that splitting feature. For example, after splitting feature 1 and the video are input into splitting model 11, multiple short videos for splitting feature 1 are obtained; after splitting feature 2 and the video are input into splitting model 11, multiple short videos for splitting feature 2 are obtained, and so on, which are not enumerated here. For example, if the video type selected by the user through the configuration interface is video type 1, the splitting feature is splitting feature 2, and the strip-splitting speed is speed 2, the video processing system may select splitting model 12 in FIG. 2 according to the configuration parameters input by the user, and input splitting feature 2 and the video into splitting model 12 to obtain multiple short videos for splitting feature 2. In this way, the finally output short videos are obtained by splitting the video in accordance with the video type together with the splitting feature and strip-splitting speed required by the user, which meets the user's diverse needs to the greatest extent and improves the user experience.
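The lookup scheme of Figure 2 — one model per (video type, strip-splitting speed) pair, with the splitting feature passed to the selected model at inference time — can be organized as a small registry. This is an illustrative sketch only; the class and method names are assumptions:

```python
class ModelRegistry:
    """Stores one splitting model per (video_type, speed) pair, as in Figure 2."""

    def __init__(self):
        self._models = {}

    def register(self, video_type, speed, model):
        """Associate a splitting model with a (video type, speed) pair."""
        self._models[(video_type, speed)] = model

    def split(self, video_type, speed, feature, video):
        """Select the model for the configured type and speed, then run it on
        the video with the requested splitting feature."""
        model = self._models[(video_type, speed)]
        return model(video, feature)
```

In the worked example above, the pair (video type 1, speed 2) would select splitting model 12, and passing splitting feature 2 together with the video would yield the short videos for that feature.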
应理解图2用于举例说明,视频处理系统200中的拆条模型230还可包括更多或者更少的视频类型、视频速度以及拆条特征,本申请不作具体限定。It should be understood that FIG. 2 is used for illustration. The stripping model 230 in the video processing system 200 may also include more or less video types, video speeds, and stripping characteristics, which are not specifically limited in this application.
综上可知，本申请提供的视频处理系统，通过配置接口获取用户输入的配置参数，该配置参数至少包括视频的视频类型，根据视频类型选择预先训练好的拆条模型对用户输入的视频进行拆条，输出拆条后的多个短视频，其中，该拆条模型与用户输入的视频类型呈对应关系，使得上述多个短视频可以满足用户的多样化需求，用户需要进行何种场景下的拆条就输入何种配置参数，从而实现一个多场景下通用的且符合用户需求的视频拆条模型，提高用户的使用体验。In summary, the video processing system provided by this application obtains the configuration parameters input by the user through a configuration interface, where the configuration parameters include at least the video type of the video, and selects a pre-trained splitting model according to the video type to split the video input by the user and output multiple short videos. The splitting model corresponds to the video type input by the user, so that the multiple short videos can meet the user's diverse needs: for whatever scenario the user needs to split in, the user simply inputs the corresponding configuration parameters, thereby realizing a video splitting model that is applicable across multiple scenarios, meets user needs, and improves the user experience.
为了使本申请能够被更好地理解,下面结合图4~图5所示的具体的应用场景,对上述步骤S310~步骤S320描述的步骤流程进行举例说明。In order to enable the present application to be better understood, the step process described in the above steps S310 to S320 is illustrated below with reference to the specific application scenarios shown in FIGS. 4 to 5 .
图4示例性地给出了一种配置接口的示例图，该配置接口为网页或者应用程序形态的console，该console可以是公有云平台的console，应理解，图4用于举例说明，本申请提供的方案中，该console也可以是非公有云平台的console，配置接口也可以是API形态，本申请不作具体限定。Figure 4 illustrates an example of a configuration interface. The configuration interface is a console in the form of a web page or an application, and the console may be a console of a public cloud platform. It should be understood that Figure 4 is used for illustration; in the solutions provided by this application, the console may also be the console of a non-public cloud platform, and the configuration interface may also take the form of an API, which is not specifically limited in this application.
如图4所示,该配置接口的网页或者应用程序界面至少包括视频类型选择区域410、拆条特征选择区域420、拆条速度选择区域430、上传视频区域440以及控件区域450。As shown in FIG. 4 , the web page or application program interface of the configuration interface at least includes a video type selection area 410 , a splitting feature selection area 420 , a splitting speed selection area 430 , an upload video area 440 and a control area 450 .
其中，视频类型选择区域410用于供用户选择视频的视频类型，示例性地，图4中的视频类型选择区域410向用户展示了“影视剧”、“新闻”、“综艺节目”、“未知类型”等视频类型，应理解，配置接口还可以向用户展示更多的视频类型，比如向下拖拽图4中视频类型选择区域410中的进度拉条，可以展示更多种类的视频类型。The video type selection area 410 is used for the user to select the video type of the video. For example, the video type selection area 410 in Figure 4 shows the user video types such as "film and TV drama", "news", "variety show" and "unknown type". It should be understood that the configuration interface may also show the user more video types; for example, dragging down the scroll bar in the video type selection area 410 in Figure 4 can display more kinds of video types.
可选地，若用户不在视频类型选择区域410中选择视频类型，视频类型选择区域410可以将视频类型默认设置为“未知类型”选项，或者，用户无法确定视频类型时，也可以在视频类型选择区域410中选择“未知类型”选项，视频处理系统200可以对用户上传的视频进行视频类型检测，根据检测类型和用户输入的配置参数(比如拆条特征和拆条速度)对视频进行拆条。Optionally, if the user does not select a video type in the video type selection area 410, the video type selection area 410 may set the video type to the "unknown type" option by default; alternatively, if the user cannot determine the video type, the user may also select the "unknown type" option in the video type selection area 410. The video processing system 200 may then perform video type detection on the video uploaded by the user and split the video according to the detected type and the configuration parameters input by the user (such as the splitting feature and the strip-splitting speed).
拆条特征选择区域420用于供用户选择视频的拆条特征，示例性地，图4中的拆条特征选择区域420向用户展示了“人物”、“场景”、“字幕”、“其他”等拆条特征，应理解，配置接口还可以向用户展示更多的拆条特征，比如向下拖拽图4中拆条特征选择区域420中的进度拉条，可以展示更多种类的拆条特征。The splitting feature selection area 420 is used for the user to select the splitting feature of the video. For example, the splitting feature selection area 420 in Figure 4 shows the user splitting features such as "person", "scene", "subtitle" and "other". It should be understood that the configuration interface may also show the user more splitting features; for example, dragging down the scroll bar in the splitting feature selection area 420 in Figure 4 can display more kinds of splitting features.
应理解，每个拆条特征还可以进行进一步的细分，例如图4中，拆条特征“人物”还可进一步划分为“男演员”、“女演员”、“上传演员照片”等等，若选择“男演员”作为拆条特征，对视频进行拆条后获得的多个短视频可以是视频中包括男演员的视频片段，若选择“女演员”作为拆条特征，对视频进行拆条后获得的多个短视频可以是视频中包括女演员的视频片段，若选择“上传演员照片”作为拆条特征，用户可以上传视频中某个演员的截图，对视频进行拆条后获得的多个短视频可以是包括该演员的视频片段。应理解，上述举例用于说明，本申请不作具体限定。It should be understood that each splitting feature may be further subdivided. For example, in Figure 4, the splitting feature "person" may be further divided into "male actor", "actress", "upload actor photo", and so on. If "male actor" is selected as the splitting feature, the multiple short videos obtained after splitting may be the clips of the video that include male actors; if "actress" is selected as the splitting feature, the multiple short videos obtained after splitting may be the clips of the video that include actresses; if "upload actor photo" is selected as the splitting feature, the user may upload a screenshot of a certain actor in the video, and the multiple short videos obtained after splitting may be the clips that include that actor. It should be understood that the above examples are for illustration and are not specifically limited in this application.
可选地，若用户不在拆条特征选择区域420中选择拆条特征，拆条特征选择区域420可以将拆条特征默认设置为“其他”选项，视频处理系统200可以对用户上传的视频进行检测，获得该视频的拆条特征，其中，该拆条特征可以是该种视频类型中最常用的特征类型，或者是该用户历史输入的特征类型，本申请不作具体限定。Optionally, if the user does not select a splitting feature in the splitting feature selection area 420, the splitting feature selection area 420 may set the splitting feature to the "other" option by default, and the video processing system 200 may analyze the video uploaded by the user to obtain a splitting feature for the video, where the splitting feature may be the feature type most commonly used for that video type, or a feature type historically input by the user, which is not specifically limited in this application.
拆条速度选择区域430用于供用户选择视频的拆条速度，示例性地，图4中的拆条速度选择区域430向用户展示了“0~1秒”、“2~5秒”、“6~10秒”、“11~15秒”、“其他”等拆条速度，应理解，配置接口还可以向用户展示更多的拆条速度，比如向下拖拽图4中拆条速度选择区域430中的进度拉条，可以展示更多种类的拆条速度。The strip-splitting speed selection area 430 is used for the user to select the strip-splitting speed of the video. For example, the strip-splitting speed selection area 430 in Figure 4 shows the user strip-splitting speeds such as "0~1 seconds", "2~5 seconds", "6~10 seconds", "11~15 seconds" and "other". It should be understood that the configuration interface may also show the user more strip-splitting speeds; for example, dragging down the scroll bar in the strip-splitting speed selection area 430 in Figure 4 can display more kinds of strip-splitting speeds.
可选地，若用户不在拆条速度选择区域430中选择拆条速度，拆条速度选择区域430可以将拆条速度默认设置为“其他”选项，视频处理系统200可以对用户上传的视频进行检测，获得该视频的拆条速度，其中，该拆条速度可以是该种视频类型和拆条特征下最常用的拆条速度，或者是该用户历史输入的拆条速度，本申请不作具体限定。Optionally, if the user does not select a strip-splitting speed in the strip-splitting speed selection area 430, the strip-splitting speed selection area 430 may set the strip-splitting speed to the "other" option by default, and the video processing system 200 may analyze the video uploaded by the user to obtain a strip-splitting speed for the video, where the strip-splitting speed may be the speed most commonly used for that video type and splitting feature, or a strip-splitting speed historically input by the user, which is not specifically limited in this application.
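The default-resolution behavior described for the three selection areas — use the user's explicit choice if given, otherwise a value from the user's history, otherwise the most common value for the video type — can be sketched as one small helper. The function name and the `"other"` sentinel are illustrative assumptions:

```python
def resolve_parameter(user_choice, history, most_common_default):
    """Fill in a missing configuration parameter as described above.

    Precedence: the user's explicit choice, then the user's most recent
    historical choice, then the most common value for this video type.
    A choice of "other" (or None) counts as "not selected".
    """
    if user_choice is not None and user_choice != "other":
        return user_choice
    if history:
        return history[-1]         # most recent historical choice
    return most_common_default
```

The same helper applies uniformly to the splitting feature (area 420) and the strip-splitting speed (area 430), since both fall back the same way.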
上传视频区域440用于供用户上传视频，该视频是待拆条的长视频。可选地，若上述视频处理系统200部署于公有云，该视频是用户上传至对象存储服务(object storage service，OBS)的视频，视频处理系统200可以从用户绑定的OBS桶中下载视频进行视频拆条，本申请不对此进行具体限定。The upload video area 440 is used for the user to upload a video, which is a long video to be split. Optionally, if the above video processing system 200 is deployed on a public cloud, the video may be one that the user has uploaded to an object storage service (OBS), and the video processing system 200 may download the video from the OBS bucket bound to the user and split it, which is not specifically limited in this application.
控件区域450包括“保存配置”控件,以及“开始拆条”控件,其中,“保存配置”用于保存视频类型选择区域410、拆条特征选择区域420、拆条速度选择区域430中用户输入的参数配置,“开始拆条”控件用于响应用户的操作,开始使用上述参数配置对视频进行视频拆条。The control area 450 includes a "Save Configuration" control and a "Start Splitting" control, where the "Save Configuration" is used to save the user input in the video type selection area 410, the splitting feature selection area 420, and the splitting speed selection area 430. Parameter configuration, the "Start splitting" control is used to respond to the user's operation and start splitting the video using the above parameter configuration.
示例性地，图4所示的配置接口中，用户输入的视频参数中，视频类型为“影视剧”，拆条特征为“上传演员照片”(假设上传的演员照片为演员A)，拆条速度为“2~5秒”。那么视频处理系统200在该应用场景下的处理流程可以如图5所示。For example, in the configuration interface shown in Figure 4, among the video parameters input by the user, the video type is "film and TV drama", the splitting feature is "upload actor photo" (assuming the uploaded actor photo is of actor A), and the strip-splitting speed is "2~5 seconds". The processing flow of the video processing system 200 in this application scenario may then be as shown in Figure 5.
图5是本申请提供的一种应用场景下的视频处理方法的步骤流程示意图,该应用场景可以是图4所示的应用场景。如图5所示,该方法可包括以下步骤:FIG. 5 is a schematic flowchart of steps of a video processing method in an application scenario provided by this application. The application scenario may be the application scenario shown in FIG. 4 . As shown in Figure 5, the method may include the following steps:
步骤1.输入视频,其中,该视频是用户输入的待拆条的长视频。Step 1. Enter the video, where the video is a long video input by the user to be split.
步骤2.确定用户是否选择视频类型。在是的情况下确定用户选择的视频类型对应的拆条模型，如图4所示的应用场景中用户选择的视频类型为“影视剧”，因此图5所示的流程图中步骤2确定视频类型后执行步骤3。Step 2. Determine whether the user has selected a video type. If yes, determine the splitting model corresponding to the video type selected by the user. In the application scenario shown in Figure 4, the video type selected by the user is "film and TV drama", so in the flowchart shown in Figure 5, step 3 is performed after the video type is determined in step 2.
在其他应用场景中,若用户未选择视频类型,即否的情况下,可执行步骤7、步骤8和步骤4。In other application scenarios, if the user does not select a video type, that is, if No, steps 7, 8, and 4 can be performed.
步骤3.获取影视剧类型拆条模型,应理解,参考前述内容可知,视频处理系统200包括多个拆条模型,其中,一个拆条模型对应一个视频类型,图4所示的应用场景中用户选择的视频类型为“影视剧”,因此步骤3获取影视剧类型的拆条模型。Step 3. Obtain the film and television drama type stripping model. It should be understood that with reference to the foregoing content, the video processing system 200 includes multiple stripping models, wherein one stripping model corresponds to one video type. In the application scenario shown in Figure 4, the user The selected video type is "film and television drama", so step 3 obtains the strip model of the film and television drama type.
步骤4.确定用户是否选择拆条特征,在是的情况下执行步骤5,在否的情况下执行步骤9和步骤5。应理解,图4所示的应用场景中用户选择的拆条特征为“人物”,因此图5所示的流程图中步骤4确定的拆条特征为“人物”。Step 4. Determine whether the user selects the strip feature. If yes, perform step 5. If no, perform step 9 and step 5. It should be understood that in the application scenario shown in Figure 4, the stripping feature selected by the user is "person", so the stripping feature determined in step 4 of the flowchart shown in Figure 5 is "person".
步骤5.确定用户是否选择拆条速度，在是的情况下，执行步骤6，在否的情况下执行步骤10和步骤6。应理解，图4所示的应用场景中用户选择的拆条速度为“2~5秒”，因此图5所示的流程图中步骤5确定的拆条速度为“2~5秒”。Step 5. Determine whether the user has selected a strip-splitting speed. If yes, perform step 6; if no, perform step 10 and then step 6. It should be understood that in the application scenario shown in Figure 4, the strip-splitting speed selected by the user is "2~5 seconds", so the strip-splitting speed determined in step 5 of the flowchart shown in Figure 5 is "2~5 seconds".
步骤6.选择对应的拆条模型,对视频进行拆条。其中,步骤6选择的拆条模型与用户数据的配置参数对应,该配置参数包括“影视剧”(视频类型)、“人物”(拆条特征)和“2~5秒”(拆条速度)。Step 6. Select the corresponding stripping model to strip the video. Among them, the bar splitting model selected in step 6 corresponds to the configuration parameters of the user data. The configuration parameters include "movie and TV drama" (video type), "character" (bar splitting characteristics) and "2~5 seconds" (bar splitting speed) .
结合图2实施例可知，视频处理系统200包括多个拆条模型，每个拆条模型对应一种视频类型和拆条速度，影视剧类型下的拆条模型可包括“0~1秒”拆条速度对应的影视剧拆条模型，“2~5秒”拆条速度对应的影视剧拆条模型等，这里不一一举例说明。根据拆条速度可选择“2~5秒”拆条速度对应的影视剧拆条模型，使用该拆条模型对视频进行拆条。其中，该“2~5秒”拆条速度对应的影视剧拆条模型在训练过程中，使用的样本集包括样本输入数据和样本输出数据，其中，样本输入数据包括已知视频和已知拆条特征，样本输出数据包括使用所述已知拆条特征对所述已知视频进行拆条后获得的多个已知短视频，训练好的拆条模型可根据用户输入的拆条特征对视频进行拆条。With reference to the embodiment of Figure 2, the video processing system 200 includes multiple splitting models, each corresponding to one video type and one strip-splitting speed. The splitting models under the film-and-TV-drama type may include a film-and-TV-drama splitting model for the "0~1 seconds" strip-splitting speed, a film-and-TV-drama splitting model for the "2~5 seconds" strip-splitting speed, and so on, which are not enumerated here. According to the strip-splitting speed, the film-and-TV-drama splitting model corresponding to the "2~5 seconds" strip-splitting speed may be selected and used to split the video. During training, the sample set used by this model includes sample input data and sample output data, where the sample input data includes a known video and a known splitting feature, and the sample output data includes multiple known short videos obtained by splitting the known video using the known splitting feature. The trained splitting model can split a video according to the splitting feature input by the user.
因此，将用户选择的拆条特征“人物”和用户上传的视频输入至上述“2~5秒”拆条速度对应的影视剧拆条模型之后，可以输出包含该人物的多个短视频，比如用户选择的是自行上传的演员A的图像，那么输出的多个短视频可以是包含演员A的多个短视频。Therefore, after the splitting feature "person" selected by the user and the video uploaded by the user are input into the film-and-TV-drama splitting model corresponding to the "2~5 seconds" strip-splitting speed, multiple short videos containing that person can be output. For example, if the user selects an image of actor A that the user uploaded, the output may be multiple short videos containing actor A.
步骤7.检测视频类型,应理解,若用户在步骤2没有选择视频类型或者用户选择了未知类型,可执行步骤7对视频类型进行检测,在检测成功的情况下,执行步骤3和步骤4,确定检测类型对应的拆条模型,在检测失败的情况下,执行步骤8和步骤4。Step 7. Detect the video type. It should be understood that if the user does not select the video type in step 2 or the user selects an unknown type, step 7 can be performed to detect the video type. If the detection is successful, perform steps 3 and 4. Determine the splitting model corresponding to the detection type. If the detection fails, perform steps 8 and 4.
步骤8.获取通用模型，可以理解的，一些视频的类别并不是非常清晰，可能无法检测出视频的视频类型，此时可使用通用的拆条模型对视频进行拆条。Step 8. Obtain the generic model. Understandably, the categories of some videos are not very clear and the video type may not be detectable; in this case, the generic splitting model may be used to split the video.
步骤9.系统选择拆条特征，应理解，若用户在步骤4没有选择拆条特征，可执行步骤9，由系统检测出该视频类型下常用的拆条特征，或者该用户历史选择的拆条特征等等，然后执行步骤5。Step 9. The system selects a splitting feature. It should be understood that if the user does not select a splitting feature in step 4, step 9 may be performed, in which the system determines a splitting feature commonly used for this video type, or a splitting feature historically selected by the user, and so on; then step 5 is performed.
步骤10.系统选择拆条速度。应理解,若用户在步骤5没有选择拆条速度,可执行步骤10,由系统检测出该视频类型和拆条特征下常用的拆条速度,或者该用户历史选择的拆条速度等等,然后执行步骤6。Step 10. The system selects the strip splitting speed. It should be understood that if the user does not select the stripping speed in step 5, step 10 can be performed, and the system detects the stripping speed commonly used under the video type and stripping characteristics, or the stripping speed selected by the user in history, etc., and then Go to step 6.
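The steps above (steps 1–10 of the Figure 5 flow) can be condensed into one control-flow sketch. All names are illustrative assumptions, and the detection and default-selection helpers are assumed to be supplied by the system:

```python
def process_video(video, cfg, models, generic_model,
                  detect_type, pick_feature, pick_speed):
    """Figure-5 flow: resolve type, feature and speed, then split (step 6).

    cfg: dict with optional 'video_type', 'feature', 'speed' (None = not chosen).
    detect_type(video) -> detected type, or None on failure (steps 2/7);
    pick_feature / pick_speed supply system defaults (steps 9/10).
    """
    video_type = cfg.get("video_type")
    if video_type in (None, "unknown"):                   # steps 2 and 7
        video_type = detect_type(video)
    model = models.get(video_type, generic_model)         # steps 3 and 8
    feature = cfg.get("feature") or pick_feature(video_type)       # steps 4/9
    speed = cfg.get("speed") or pick_speed(video_type, feature)    # steps 5/10
    return model(video, feature, speed)                   # step 6
```

With the Figure 4 configuration ("film and TV drama", "person", "2~5 seconds"), all three parameters are explicit, so only steps 2, 3 and 6 do real work; with an empty configuration, the detection and default branches (steps 7–10) are exercised instead.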
应理解,图5用于举例说明,本申请不作具体限定。It should be understood that Figure 5 is used for illustration and is not specifically limited in this application.
综上可知，本申请提供的视频处理方法，通过配置接口获取用户输入的配置参数，该配置参数至少包括视频的视频类型，根据视频类型选择预先训练好的拆条模型对用户输入的视频进行拆条，输出拆条后的多个短视频，其中，该拆条模型与用户输入的视频类型呈对应关系，从而实现一个多场景下通用的且符合用户需求的视频拆条模型，提高用户的使用体验。In summary, the video processing method provided by this application obtains the configuration parameters input by the user through a configuration interface, where the configuration parameters include at least the video type of the video, selects a pre-trained splitting model according to the video type to split the video input by the user, and outputs multiple short videos after splitting. The splitting model corresponds to the video type input by the user, thereby realizing a video splitting model that is applicable across multiple scenarios and meets user needs, and improving the user experience.
图6是本申请提供的一种计算设备的结构示意图,图1~图5实施例中描述的视频处理系统200可部署于图6所示的计算设备600上。FIG. 6 is a schematic structural diagram of a computing device provided by this application. The video processing system 200 described in the embodiments of FIGS. 1 to 5 can be deployed on the computing device 600 shown in FIG. 6 .
进一步地，计算设备600包括处理器601、存储单元602、存储介质603和通信接口604，其中，处理器601、存储单元602、存储介质603和通信接口604通过总线605进行通信，也可以通过无线传输等其他手段实现通信。Further, the computing device 600 includes a processor 601, a storage unit 602, a storage medium 603 and a communication interface 604, where the processor 601, the storage unit 602, the storage medium 603 and the communication interface 604 communicate through a bus 605, and may also communicate through other means such as wireless transmission.
该计算设备可以是BMS、虚拟机或容器。其中，BMS指的是通用的物理服务器，例如，ARM服务器或者X86服务器；虚拟机指的是NFV技术实现的、通过软件模拟的具有完整硬件系统功能的、运行在一个完全隔离环境中的完整计算机系统；容器指的是一组受到资源限制，彼此间相互隔离的进程。计算设备还可以是边缘计算设备、存储服务器或者存储阵列，本申请不作具体限定。The computing device may be a BMS, a virtual machine or a container. A BMS refers to a general-purpose physical server, for example, an ARM server or an X86 server; a virtual machine refers to a complete computer system implemented with NFV technology, simulated by software, having the functions of a complete hardware system, and running in a completely isolated environment; a container refers to a group of processes that are subject to resource constraints and isolated from one another. The computing device may also be an edge computing device, a storage server or a storage array, which is not specifically limited in this application.
处理器601由至少一个通用处理器构成，例如CPU、NPU或者CPU和硬件芯片的组合。上述硬件芯片是专用集成电路(Application-Specific Integrated Circuit，ASIC)、可编程逻辑器件(Programmable Logic Device，PLD)或其组合。上述PLD是复杂可编程逻辑器件(Complex Programmable Logic Device，CPLD)、现场可编程逻辑门阵列(Field-Programmable Gate Array，FPGA)、通用阵列逻辑(Generic Array Logic，GAL)或其任意组合。处理器601执行各种类型的数字存储指令，例如存储在存储单元602中的软件或者固件程序，它能使计算设备600提供较宽的多种服务。The processor 601 is composed of at least one general-purpose processor, such as a CPU, an NPU, or a combination of a CPU and a hardware chip. The hardware chip is an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD is a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 601 executes various types of digital storage instructions, such as software or firmware programs stored in the storage unit 602, which enable the computing device 600 to provide a wide variety of services.
具体实现中,作为一种实施例,处理器601包括一个或多个CPU,例如图6中所示的CPU0和CPU1。In specific implementation, as an embodiment, the processor 601 includes one or more CPUs, such as CPU0 and CPU1 shown in FIG. 6 .
在具体实现中,作为一种实施例,计算设备600也包括多个处理器,例如图6中所示的处理器601和处理器606。这些处理器中的每一个可以是一个单核处理器(single-CPU),也可以是一个多核处理器(multi-CPU)。这里的处理器指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。In a specific implementation, as an embodiment, the computing device 600 also includes multiple processors, such as the processor 601 and the processor 606 shown in FIG. 6 . Each of these processors can be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). A processor here refers to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
存储单元602用于存储程序代码，并由处理器601来控制执行，以执行上述图1-图6中任一实施例中视频处理系统200的处理步骤。程序代码中包括一个或多个软件单元，上述一个或多个软件单元是图1实施例中的获取单元和拆条单元，其中，获取单元用于向用户提供配置接口，拆条单元用于根据配置参数对视频进行拆条。具体实现方式参考图1~图5实施例，此处不再赘述。The storage unit 602 is used to store program code, and the processor 601 controls its execution to perform the processing steps of the video processing system 200 in any of the embodiments of FIG. 1 to FIG. 6 above. The program code includes one or more software units, namely the acquisition unit and the splitting unit in the embodiment of FIG. 1, where the acquisition unit is configured to provide a configuration interface to the user, and the splitting unit is configured to split the video according to the configuration parameters. For specific implementations, reference may be made to the embodiments of FIG. 1 to FIG. 5; details are not repeated here.
存储单元602包括只读存储器和随机存取存储器，并向处理器601提供指令和数据。存储单元602还包括非易失性随机存取存储器。存储单元602是易失性存储器或非易失性存储器，或包括易失性和非易失性存储器两者。其中，非易失性存储器是只读存储器(read-only memory，ROM)、可编程只读存储器(programmable ROM，PROM)、可擦除可编程只读存储器(erasable PROM，EPROM)、电可擦除可编程只读存储器(electrically EPROM，EEPROM)或闪存。易失性存储器是随机存取存储器(random access memory，RAM)，其用作外部高速缓存。通过示例性但不是限制性说明，许多形式的RAM可用，例如静态随机存取存储器(static RAM，SRAM)、动态随机存取存储器(DRAM)、同步动态随机存取存储器(synchronous DRAM，SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM，DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM，ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM，SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM，DR RAM)。存储单元602还可以是硬盘(hard disk)、U盘(universal serial bus，USB)、闪存(flash)、SD卡(secure digital memory card，SD card)、记忆棒等等，硬盘可以是硬盘驱动器(hard disk drive，HDD)、固态硬盘(solid state disk，SSD)、机械硬盘(mechanical hard disk，HDD)等，本申请不作具体限定。The storage unit 602 includes a read-only memory and a random access memory, and provides instructions and data to the processor 601. The storage unit 602 also includes a non-volatile random access memory. The storage unit 602 may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM) or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM) and direct rambus RAM (DR RAM). The storage unit 602 may also be a hard disk, a USB flash drive, a flash memory, an SD card (secure digital memory card), a memory stick and the like, where the hard disk may be a hard disk drive (HDD), a solid state disk (SSD), a mechanical hard disk, etc., which is not specifically limited in this application.
存储介质603是存储数据的载体，比如硬盘(hard disk)、U盘(universal serial bus,USB)、闪存(flash)、SD卡(secure digital memory card,SD card)、记忆棒等等，硬盘可以是硬盘驱动器(hard disk drive,HDD)、固态硬盘(solid state disk,SSD)、机械硬盘(mechanical hard disk,HDD)等，本申请不作具体限定。The storage medium 603 is a carrier for storing data, such as a hard disk, a USB flash drive, a flash memory, a secure digital memory card (SD card), a memory stick, or the like. The hard disk may be a hard disk drive (HDD), a solid state disk (SSD), a mechanical hard disk, or the like, which is not specifically limited in this application.
通信接口604为内部接口(例如高速串行计算机扩展总线(Peripheral Component Interconnect express,PCIe)总线接口)、有线接口(例如以太网接口)或无线接口(例如蜂窝网络接口或使用无线局域网接口)，用于与其他服务器或单元进行通信。The communication interface 604 is an internal interface (for example, a Peripheral Component Interconnect express (PCIe) bus interface), a wired interface (for example, an Ethernet interface), or a wireless interface (for example, a cellular network interface or a wireless local area network interface), and is used to communicate with other servers or units.
总线605是快捷外围部件互联标准(Peripheral Component Interconnect Express,PCIe)总线，或扩展工业标准结构(extended industry standard architecture,EISA)总线、统一总线(unified bus,Ubus或UB)、计算机快速链接(compute express link,CXL)、缓存一致互联协议(cache coherent interconnect for accelerators,CCIX)等。总线605分为地址总线、数据总线、控制总线等。The bus 605 is a Peripheral Component Interconnect Express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX) bus, or the like. The bus 605 may be divided into an address bus, a data bus, a control bus, and the like.
总线605除包括数据总线之外,还包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都标为总线605。In addition to the data bus, the bus 605 also includes a power bus, a control bus, a status signal bus, etc. However, for the sake of clarity, the various buses are labeled bus 605 in the figure.
需要说明的，图6仅仅是本申请实施例的一种可能的实现方式，实际应用中，计算设备600还可以包括更多或更少的部件，这里不作限制。关于本申请实施例中未示出或未描述的内容，参见前述图1-图5实施例中的相关阐述，这里不再赘述。It should be noted that FIG. 6 is only one possible implementation of the embodiments of this application. In practical applications, the computing device 600 may further include more or fewer components, which is not limited here. For content not shown or described in the embodiments of this application, refer to the related descriptions in the embodiments of FIG. 1 to FIG. 5, which are not repeated here.
本申请实施例提供一种计算机存储介质,该计算机存储介质中存储有指令;当该指令在计算设备上运行时,使得该计算设备执行上述图1~图5实施例描述的视频处理方法。Embodiments of the present application provide a computer storage medium in which instructions are stored; when the instructions are run on a computing device, the computing device is caused to execute the video processing method described in the embodiments of FIGS. 1 to 5 .
本申请实施例提供了一种包含指令的程序产品,包括程序或指令,当该程序或指令在计算设备上运行时,使得该计算设备执行上述图1~图5实施例描述的视频处理方法。Embodiments of the present application provide a program product containing instructions, including programs or instructions. When the program or instructions are run on a computing device, the computing device executes the video processing method described in the embodiments of FIGS. 1 to 5 .
上述实施例，可以全部或部分地通过软件、硬件、固件或其他任意组合来实现。当使用软件实现时，上述实施例可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括至少一个计算机指令。在计算机上加载或执行计算机程序指令时，全部或部分地产生按照本发明实施例的流程或功能。计算机可以为通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一个计算机可读存储介质传输，例如，计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含至少一个可用介质集合的服务器、数据中心等数据存储节点。可用介质可以是磁性介质(例如，软盘、硬盘、磁带)、光介质(例如，高密度数字视频光盘(digital video disc,DVD))、或者半导体介质。半导体介质可以是SSD。The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes at least one computer instruction. When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the present invention are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any available medium accessible to the computer, or a data storage node, such as a server or a data center, that contains at least one available medium. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a high-density digital video disc (DVD)), or a semiconductor medium. The semiconductor medium may be an SSD.
以上，仅为本发明的具体实施方式，但本发明的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本发明揭露的技术范围内，可轻易想到各种等效的修改或替换，这些修改或替换都应涵盖在本发明的保护范围之内。因此，本发明的保护范围应以权利要求的保护范围为准。 The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (17)

  1. 一种视频处理方法,其特征在于,所述方法包括:A video processing method, characterized in that the method includes:
    视频处理系统通过配置接口向用户获取配置参数和视频,所述配置参数包括所述视频的视频类型;The video processing system obtains configuration parameters and videos from the user through a configuration interface, where the configuration parameters include the video type of the video;
    所述视频处理系统输出多个短视频,其中,所述多个短视频是根据所述配置参数对所述视频进行拆条后获得的。The video processing system outputs multiple short videos, wherein the multiple short videos are obtained by splitting the video into strips according to the configuration parameters.
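For illustration only (not part of the claims), the two steps of claim 1 — obtaining configuration parameters through a configuration interface and outputting the short videos obtained by splitting — can be sketched as follows. All names and the segment-based video model here are hypothetical, not taken from the patent:

```python
# Hypothetical sketch of the claimed flow: a configuration interface
# supplies the video type (and optional splitting features), and the
# system splits the video into multiple short videos accordingly.
from dataclasses import dataclass, field

@dataclass
class ConfigParams:
    video_type: str                                # e.g. "news", "sports" (claim 2)
    features: list = field(default_factory=list)   # optional splitting features (claim 3)
    speed: str = "balanced"                        # optional splitting speed (claim 4)

def split_video(segments, config):
    """Split a video (modeled as a list of labeled segments) into short
    videos wherever the configured feature changes between segments."""
    key = config.features[0] if config.features else "scene"
    clips, current = [], [segments[0]]
    for seg in segments[1:]:
        if seg[key] != current[-1][key]:
            clips.append(current)   # feature changed: close current clip
            current = [seg]
        else:
            current.append(seg)
    clips.append(current)
    return clips

video = [{"scene": "studio"}, {"scene": "studio"}, {"scene": "field"}]
clips = split_video(video, ConfigParams(video_type="news", features=["scene"]))
print(len(clips))  # 2 short videos: the studio part and the field part
```

A real implementation would operate on decoded frames and timestamps rather than pre-labeled segments; the sketch only shows how the configuration parameters drive the split.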
  2. 根据权利要求1所述的方法,其特征在于,所述视频类型包括影视剧、综艺、新闻、纪录片、访谈、体育、动漫以及会议中的一种或者多种。The method according to claim 1, wherein the video type includes one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animations and conferences.
  3. 根据权利要求1或2所述的方法，其特征在于，所述配置参数还包括拆条特征，所述拆条特征包括场景、人物、音频、字幕、动作、光学字符识别OCR以及外观中的一种或者多种。The method according to claim 1 or 2, characterized in that the configuration parameters further include splitting features, and the splitting features include one or more of scene, character, audio, subtitle, action, optical character recognition (OCR), and appearance.
  4. 根据权利要求1至3任一权利要求所述的方法，其特征在于，所述配置参数还包括拆条速度。The method according to any one of claims 1 to 3, characterized in that the configuration parameters further include a splitting speed.
  5. 根据权利要求1至4任一权利要求所述的方法,其特征在于,所述视频类型还包括未知类型,在所述视频类型为未知类型的情况下,所述方法还包括:The method according to any one of claims 1 to 4, characterized in that the video type also includes an unknown type. When the video type is an unknown type, the method further includes:
    所述视频处理系统对所述视频进行类型检测,获得所述视频的检测类型;The video processing system performs type detection on the video to obtain the detection type of the video;
    所述视频处理系统输出多个短视频包括:The video processing system outputs multiple short videos including:
    所述视频处理系统根据所述检测类型和所述配置参数对所述视频进行拆条,输出所述多个短视频。The video processing system splits the video into strips according to the detection type and the configuration parameters, and outputs the multiple short videos.
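For illustration only, the fallback of claim 5 — running type detection when the user-supplied type is unknown, then splitting with the detected type — can be sketched as below. `detect_type` is a stand-in for a real video classifier; the metadata-hint lookup is purely hypothetical:

```python
# Illustrative sketch of claim 5: if the configured video type is
# "unknown", detect the type first, then split with the corresponding
# type-specific model (claims 7-8).
def detect_type(video):
    # Stand-in classifier: a real system would classify the video
    # content itself; here we just read a hypothetical metadata hint.
    return video.get("hint", "news")

def process(video, config_type):
    video_type = detect_type(video) if config_type == "unknown" else config_type
    return f"split with {video_type} model"

print(process({"hint": "sports"}, "unknown"))  # split with sports model
print(process({"hint": "sports"}, "drama"))    # split with drama model
```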
  6. 根据权利要求1至3任一权利要求所述的方法,其特征在于,在所述配置接口未获取到所述用户输入的拆条特征的情况下,所述方法还包括:The method according to any one of claims 1 to 3, characterized in that, in the case where the configuration interface does not obtain the bar splitting feature input by the user, the method further includes:
    所述视频处理系统对所述视频进行特征检测,获得所述视频的拆条特征,根据所述拆条特征对所述视频进行拆条。The video processing system performs feature detection on the video, obtains stripping features of the video, and strips the video according to the stripping features.
  7. 根据权利要求1至6任一权利要求所述的方法,其特征在于,所述视频处理系统包括多个拆条模型,其中,一个拆条模型对应一种视频类型;The method according to any one of claims 1 to 6, characterized in that the video processing system includes multiple splitting models, wherein one splitting model corresponds to one video type;
    所述根据所述配置参数对所述视频进行拆条，输出拆条后的多个视频包括：The splitting the video into strips according to the configuration parameters and outputting the multiple split videos includes:
    所述视频处理系统获取所述视频类型对应的拆条模型,将所述视频输入所述视频类型对应的拆条模型,输出拆条后获得的多个短视频。The video processing system obtains the splitting model corresponding to the video type, inputs the video into the splitting model corresponding to the video type, and outputs multiple short videos obtained after splitting.
  8. 根据权利要求7所述的方法,其特征在于,所述将所述视频输入所述视频类型对应的拆条模型获得所述拆条后的多个视频包括:The method according to claim 7, wherein said inputting the video into the splitting model corresponding to the video type to obtain the plurality of split videos includes:
    将所述视频和所述拆条特征输入所述视频类型对应的拆条模型，输出拆条后获得的多个短视频，其中，所述拆条模型是使用样本集对机器学习模型进行训练后获得的，所述样本集包括样本输入数据和样本输出数据，其中，所述样本输入数据包括已知视频和已知拆条特征，所述样本输出数据包括使用所述已知拆条特征对所述已知视频进行拆条后获得的多个已知短视频。The video and the splitting features are input into the splitting model corresponding to the video type, and the multiple short videos obtained after splitting are output, wherein the splitting model is obtained by training a machine learning model using a sample set, the sample set includes sample input data and sample output data, the sample input data includes a known video and known splitting features, and the sample output data includes multiple known short videos obtained by splitting the known video using the known splitting features.
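For illustration only, the one-model-per-type arrangement of claims 7 and 8 — a registry keyed by video type, each model trained on known videos and their known short-video boundaries — can be sketched as follows. The trivial boundary memorizer below stands in for a real trained machine learning model; all names are hypothetical:

```python
# Hypothetical sketch of claims 7-8: one splitting model per video
# type, trained on sample input data (known videos + known splitting
# features) and sample output data (known short-video cut points).
class SplittingModel:
    def __init__(self):
        self.boundaries = []

    def train(self, sample_inputs, sample_outputs):
        # sample_inputs: (known video, known splitting features) pairs
        # sample_outputs: lists of known cut points for each known video
        self.boundaries = sorted({b for cuts in sample_outputs for b in cuts})

    def split(self, video_len):
        # Cut the video at every learned boundary inside its duration.
        cuts = [b for b in self.boundaries if 0 < b < video_len]
        edges = [0] + cuts + [video_len]
        return list(zip(edges, edges[1:]))

models = {"news": SplittingModel()}  # one model per video type (claim 7)
models["news"].train([("known_video_1", ["scene"])], [[30, 60]])
print(models["news"].split(90))  # [(0, 30), (30, 60), (60, 90)]
```

A production system would train a sequence model on frame or audio features instead of memorizing cut points; the sketch only shows the sample-set shape and the per-type dispatch.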
  9. 一种视频处理系统，其特征在于，所述系统包括：A video processing system, characterized in that the system includes:
    获取单元,用于通过配置接口向用户获取配置参数和视频,所述配置参数包括所述视频的视频类型;An acquisition unit, configured to acquire configuration parameters and videos from the user through the configuration interface, where the configuration parameters include the video type of the video;
    拆条单元,用于输出多个短视频,其中,所述多个短视频是根据所述配置参数对所述视频进行拆条后获得的。 A splitting unit is configured to output multiple short videos, wherein the multiple short videos are obtained by splitting the video according to the configuration parameters.
  10. 根据权利要求9所述的系统,其特征在于,所述视频类型包括影视剧、综艺、新闻、纪录片、访谈、体育、动漫以及会议中的一种或者多种。The system according to claim 9, wherein the video type includes one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animations and conferences.
  11. 根据权利要求9或10所述的系统，其特征在于，所述配置参数还包括拆条特征，所述拆条特征包括场景、人物、音频、字幕、动作、光学字符识别OCR以及外观中的一种或者多种。The system according to claim 9 or 10, characterized in that the configuration parameters further include splitting features, and the splitting features include one or more of scene, character, audio, subtitle, action, optical character recognition (OCR), and appearance.
  12. 根据权利要求9至11任一权利要求所述的系统，其特征在于，所述配置参数还包括拆条速度。The system according to any one of claims 9 to 11, characterized in that the configuration parameters further include a splitting speed.
  13. 根据权利要求9至12任一权利要求所述的系统，其特征在于，所述系统还包括检测单元，所述视频类型包括未知类型，在所述视频类型为未知类型的情况下，所述检测单元，用于对所述视频进行类型检测，获得所述视频的检测类型；The system according to any one of claims 9 to 12, characterized in that the system further includes a detection unit, the video type includes an unknown type, and when the video type is an unknown type, the detection unit is configured to perform type detection on the video to obtain a detection type of the video;
    所述拆条单元,用于根据所述检测类型和所述配置参数对所述视频进行拆条,输出所述多个短视频。The stripping unit is used to strip the video into strips according to the detection type and the configuration parameter, and output the multiple short videos.
  14. 根据权利要求9至13任一权利要求所述的系统，其特征在于，在所述配置接口未获取到所述用户输入的拆条特征的情况下，所述检测单元，用于对所述视频进行特征检测，获得所述视频的拆条特征，根据所述拆条特征对所述视频进行拆条。The system according to any one of claims 9 to 13, characterized in that, when the configuration interface does not obtain the splitting features input by the user, the detection unit is configured to perform feature detection on the video to obtain splitting features of the video, and the video is split according to the splitting features.
  15. 根据权利要求9至14任一权利要求所述的系统,其特征在于,所述视频处理系统包括多个拆条模型,其中,一个拆条模型对应一种视频类型;The system according to any one of claims 9 to 14, characterized in that the video processing system includes a plurality of splitting models, wherein one splitting model corresponds to one video type;
    所述拆条单元,用于获取所述视频类型对应的拆条模型,将所述视频输入所述视频类型对应的拆条模型,输出拆条后获得的多个短视频。The stripping unit is used to obtain a stripping model corresponding to the video type, input the video into the stripping model corresponding to the video type, and output a plurality of short videos obtained after stripping.
  16. 根据权利要求15所述的系统，其特征在于，所述拆条单元，用于将所述视频和所述拆条特征输入所述视频类型对应的拆条模型，输出拆条后获得的多个短视频，其中，所述拆条模型是使用样本集对机器学习模型进行训练后获得的，所述样本集包括样本输入数据和样本输出数据，其中，所述样本输入数据包括已知视频和已知特征，所述样本输出数据包括使用所述已知拆条特征对所述已知视频进行拆条后获得的多个已知短视频。The system according to claim 15, characterized in that the splitting unit is configured to input the video and the splitting features into the splitting model corresponding to the video type, and output the multiple short videos obtained after splitting, wherein the splitting model is obtained by training a machine learning model using a sample set, the sample set includes sample input data and sample output data, the sample input data includes a known video and known features, and the sample output data includes multiple known short videos obtained by splitting the known video using the known splitting features.
  17. 一种计算设备,其特征在于,所述计算设备包括处理器和存储器,所述存储器用于存储代码,所述处理器用于执行所述代码实现如权利要求1至8任一权利要求所述的方法。 A computing device, characterized in that the computing device includes a processor and a memory, the memory is used to store code, and the processor is used to execute the code to implement the method described in any one of claims 1 to 8. method.
PCT/CN2023/081604 2022-04-13 2023-03-15 Video processing method and system, and related device WO2023197814A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210384788.5 2022-04-13
CN202210384788.5A CN116980644A (en) 2022-04-13 2022-04-13 Video processing method, system and related equipment

Publications (1)

Publication Number Publication Date
WO2023197814A1 true WO2023197814A1 (en) 2023-10-19

Family

ID=88328835

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/081604 WO2023197814A1 (en) 2022-04-13 2023-03-15 Video processing method and system, and related device

Country Status (2)

Country Link
CN (1) CN116980644A (en)
WO (1) WO2023197814A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108965920A (en) * 2018-08-08 2018-12-07 北京未来媒体科技股份有限公司 A kind of video content demolition method and device
CN110166828A (en) * 2019-02-19 2019-08-23 腾讯科技(深圳)有限公司 A kind of method for processing video frequency and device
CN111726682A (en) * 2020-06-30 2020-09-29 北京百度网讯科技有限公司 Video clip generation method, device, equipment and computer storage medium
CN112423151A (en) * 2020-11-17 2021-02-26 北京金山云网络技术有限公司 Video strip splitting method, system, device, equipment and storage medium
CN113539304A (en) * 2020-04-21 2021-10-22 华为技术有限公司 Video strip splitting method and device
US20220027663A1 (en) * 2019-11-21 2022-01-27 Samsung Electronics Co., Ltd. Electronic apparatus and controlling method thereof


Also Published As

Publication number Publication date
CN116980644A (en) 2023-10-31

Similar Documents

Publication Publication Date Title
CN107832434B (en) Method and device for generating multimedia play list based on voice interaction
US11061962B2 (en) Recommending and presenting comments relative to video frames
US11417341B2 (en) Method and system for processing comment information
US9460752B2 (en) Multi-source journal content integration systems and methods
CN109408639B (en) Bullet screen classification method, bullet screen classification device, bullet screen classification equipment and storage medium
CN111432235A (en) Live video generation method and device, computer readable medium and electronic equipment
Hu et al. Toward multiscreen social TV with geolocation-aware social sense
US20170235828A1 (en) Text Digest Generation For Searching Multiple Video Streams
US20150100582A1 (en) Association of topic labels with digital content
CN109862100B (en) Method and device for pushing information
CN109255035B (en) Method and device for constructing knowledge graph
CN111314732A (en) Method for determining video label, server and storage medium
WO2023142917A1 (en) Video generation method and apparatus, and device, medium and product
CN111279709A (en) Providing video recommendations
US20200007940A1 (en) Echo bullet screen
CN108470057B (en) Generating and pushing method, device, terminal, server and medium of integrated information
CN109600625A (en) A kind of program searching method, device, equipment and medium
WO2018145572A1 (en) Method and device for implementing vr live streaming, ott service system, and storage medium
US20240121485A1 (en) Method, apparatus, device, medium and program product for obtaining text material
WO2023197814A1 (en) Video processing method and system, and related device
CN113282770A (en) Multimedia recommendation system and method
CN113343069A (en) User information processing method, device, medium and electronic equipment
CN111901629A (en) Method and device for generating and playing video stream
US12010405B2 (en) Generating video summary
KR102615377B1 (en) Method of providing a service to experience broadcasting

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23787462

Country of ref document: EP

Kind code of ref document: A1