WO2023197814A1 - Video processing method and system, and related device - Google Patents

Video processing method and system, and related device Download PDF

Info

Publication number
WO2023197814A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
stripping
splitting
type
user
Prior art date
Application number
PCT/CN2023/081604
Other languages
French (fr)
Chinese (zh)
Inventor
童贝
喻晓源
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司 filed Critical 华为云计算技术有限公司
Publication of WO2023197814A1 publication Critical patent/WO2023197814A1/en

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments

Definitions

  • the present application relates to the field of computers, and in particular to a video processing method, system and related equipment.
  • Video stripping is a secondary processing of the original video content, splitting the original video content into several video clips as needed, so that users can watch the video clips they are interested in on demand. Video stripping technology can deeply mine valuable information points in long videos and help users understand videos better and faster.
  • video stripping technology is usually limited to a specific scene.
  • the video stripping technology of news videos is usually based on the title of the news, news shot changes and other features.
  • The video stripping technology for film and television dramas is usually based on their subtitles. Therefore, current video stripping technology has a single application scenario and low flexibility, which degrades the user experience.
  • This application provides a video processing method, system and related equipment to solve the problems of single application scenario, low flexibility and poor user experience of video stripping technology.
  • A video processing method includes the following steps: the video processing system obtains configuration parameters and a video from the user through a configuration interface, where the configuration parameters include the video type of the video; the video processing system then outputs multiple short videos, where the multiple short videos are obtained after splitting the video according to the configuration parameters.
  • the video processing system can be deployed on a computing device, which can be a bare metal server (Bare Metal Server, BMS), a virtual machine or a container.
  • A BMS refers to a general physical server, such as an ARM server or an X86 server; a virtual machine is a complete computer system that is simulated by software and runs in a completely isolated environment.
  • a container refers to a group of processes that are subject to resource constraints and isolated from each other.
  • the computing device can also be an edge computing device, a storage server or a storage array, which is not specifically limited in this application.
  • the above configuration interface can be an application page, a web page or an application programming interface (API) for the user to interact with the video processing system.
  • The video processing system can display the application page or web page on the client's screen, or provide API parameters to the user.
  • The user can use the API parameters to integrate the video processing system 200 into a third-party system for secondary development.
  • the above video types may include one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animations, and conferences. It should be understood that the classification of video types can be divided into more categories according to the user's business scenarios, and examples are not given here.
  • the user can input the type of video that the user needs to split into strips. For example, if the user needs to split into strips for a movie or TV series, then the video type can be selected as a movie or TV series.
  • Because the configuration parameters include at least the video type of the video, the video processing system can select a pre-trained splitting model according to that video type, use it to split the video input by the user, and output the multiple short videos obtained after splitting. The splitting model has a corresponding relationship with the video type input by the user, so the multiple short videos can meet the user's diverse needs: whatever splitting scenario the user faces, suitable configuration parameters can be entered. This achieves video splitting that is universal across multiple scenarios, meets user needs, and improves the user experience.
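  • As a non-authoritative sketch, the selection of a pre-trained splitting model by video type described above could look like the following; the registry contents, model names, and the fallback to a generic model are illustrative assumptions, not the patent's actual implementation:

```python
# Illustrative registry mapping each video type to its pre-trained
# splitting model (names are hypothetical placeholders).
SPLITTING_MODELS = {
    "film_tv": "film_tv_splitting_model",
    "variety_show": "variety_show_splitting_model",
    "news": "news_splitting_model",
    "documentary": "documentary_splitting_model",
}

# Assumed fallback used when the video type has no dedicated model.
GENERIC_MODEL = "generic_splitting_model"


def select_splitting_model(video_type: str) -> str:
    """Return the splitting model corresponding to the user's video type,
    falling back to a generic model for unregistered types."""
    return SPLITTING_MODELS.get(video_type, GENERIC_MODEL)
```

A lookup like `select_splitting_model("news")` would then route a news video to the news-specific model, matching the one-model-per-type correspondence described in the text.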
  • the configuration parameters also include stripping features, and the stripping features include one or more of scene, character, audio, subtitle, action, optical character recognition (OCR), and appearance.
  • scene refers to splitting the video into strips according to different scenes.
  • The multiple short videos after splitting can be short videos of the same scene. For example, for a school documentary, if the splitting feature is scene, then among the multiple short videos after splitting, short video 1 may contain the classroom scenes of the documentary, short video 2 the dormitory scenes, short video 3 the playground scenes, and so on; this application does not make specific limitations.
  • Characters refer to splitting videos according to different characters.
  • The multiple short videos after splitting can be short videos of the same character. For example, in a talent-show variety program, if the splitting feature is character, then among the multiple short videos after splitting, short video 1 may be the performance clips of contestant A, short video 2 the performance clips of contestant B, and so on; this application does not make specific limitations.
  • Subtitles refer to splitting the video according to its subtitles; text recognition technology needs to be combined to determine the splitting points based on the semantics of the subtitles. For example, in a news video, if the splitting feature is subtitles, then among the multiple short videos after splitting, short video 1 may be the segment for news item 1, short video 2 the segment for news item 2, and so on; this application does not make specific limitations.
  • Action refers to splitting the video into strips according to different actions.
  • The multiple short videos after splitting can be short videos of the same action. For example, in a cultural evening show, if the splitting feature is action, then among the multiple short videos after splitting, short video 1 may contain the dance program clips, short video 2 the singing program clips, short video 3 the sketch program clips, and so on; this application does not make specific limitations.
  • OCR refers to splitting the video with the help of image and text recognition technology. For example, the text in certain scenes of the video, such as billboards or traffic signs, may need to be recognized to determine the meaning of the scene; this application does not make specific limitations.
  • Appearance refers to splitting the video into strips according to different appearances.
  • the multiple short videos after stripping can be short videos with the same appearance.
  • The appearance here can refer to the same clothes, the same hats, etc.; this application does not make specific limitations.
  • the strip feature can be divided into more feature types according to the user's business scenario, and examples are not given here.
  • Users can input the type of splitting feature they need to use to split the video. For example, if the user wants all the clips of each actor in a film or TV series, the user can input "character" into the configuration interface as the splitting feature; or, if the user wants to extract the dancing clips from a variety show video, the user can input "action" as the splitting feature.
  • each split feature can be further subdivided.
  • The splitting feature "action" can be further subdivided into "dancing", "running", "conflict", etc., and "audio" can be further subdivided into "singing", "quarrel", etc. Still taking the above example, if the user needs the "dancing" clips in the video, the user can select "action" among the splitting features and then select the "dancing" sub-feature under the "action" category. It should be understood that the above examples are for illustration and are not specifically limited in this application.
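  • The configuration parameters discussed above (video type, splitting features, optional sub-features, and an optional splitting speed) could be represented by a simple structure such as the following; all field names are hypothetical and chosen only for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SplittingConfig:
    """Hypothetical container for the user's configuration parameters."""
    video_type: str                                         # e.g. "variety_show"
    features: List[str] = field(default_factory=list)       # e.g. ["action"]
    sub_features: Dict[str, str] = field(default_factory=dict)  # e.g. {"action": "dancing"}
    speed_s: Optional[float] = None                         # optional splitting speed in seconds


# A user who wants dancing clips from a variety show might submit:
config = SplittingConfig(
    video_type="variety_show",
    features=["action"],
    sub_features={"action": "dancing"},
)
```

The nested `sub_features` mapping mirrors the subdivision described above, where "dancing" is selected under the "action" category.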
  • The configuration parameters also include the splitting speed.
  • the strip-splitting speed can be a speed value or a speed range.
  • the video processing system can determine the strip-splitting speed according to the speed value or speed range input by the user.
  • The splitting speed can be one of several preset ranges, such as 0~1s, 1~5s, 5~10s, 10~15s, 15~20s, 20~30s, etc. For example, if the speed value input by the user is 3s, the splitting speed can be 1~5s; if the range input by the user is 4~8s, the splitting speed can be 5~10s.
  • The splitting speed can be used when splitting the video, further meeting the user's needs and improving the user experience. It should be understood that the faster the video splitting speed, the lower the splitting accuracy. Some users' needs focus on splitting speed, while others focus on splitting accuracy; users can choose the splitting speed according to their own needs, which improves the user experience.
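  • The mapping from a user-supplied speed value or range to one of the preset ranges could be sketched as follows; the preset boundaries come from the examples above, while the tie-breaking rules are assumptions:

```python
# Preset splitting-speed ranges in seconds, as listed in the text.
SPEED_RANGES = [(0, 1), (1, 5), (5, 10), (10, 15), (15, 20), (20, 30)]


def snap_value(value_s: float) -> tuple:
    """Map a single speed value to the first preset range containing it,
    e.g. a value of 3s maps to (1, 5) as in the example above."""
    for lo, hi in SPEED_RANGES:
        if lo <= value_s <= hi:
            return (lo, hi)
    return SPEED_RANGES[-1]  # assumed: clamp out-of-range values to the last preset


def snap_range(lo_s: float, hi_s: float) -> tuple:
    """Map a user-supplied range to the first preset range whose upper
    bound covers it, e.g. 4~8s maps to (5, 10) as in the example above."""
    for lo, hi in SPEED_RANGES:
        if hi >= hi_s:
            return (lo, hi)
    return SPEED_RANGES[-1]
```

This reproduces both worked examples from the text: a value of 3s snaps to the 1~5s preset, and a 4~8s range snaps to 5~10s.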
  • the video type includes an unknown type.
  • The method may also include the following steps: the video processing system performs type detection on the video to obtain the detection type of the video, then splits the video based on the detection type and the configuration parameters, and outputs multiple short videos.
  • the video processing system performs feature detection on the video, obtains the stripping characteristics of the video, and strips the video according to the stripping characteristics.
  • The configuration interface can display the preset splitting speeds for the user to select. If the user does not select a splitting speed, the video processing system uses the default splitting speed, or the user's historical splitting speed, as the user-input splitting speed to split the video and output multiple short videos.
  • The video processing system can detect the video to obtain its video type, splitting features or splitting speed, thereby predicting the type, features or speed with which the user may need to split the video, which improves the user experience.
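  • The fallback flow for an unknown video type described above could be sketched like this; `detect_type` and the `models` registry are injected stand-ins for the system's actual detection unit and splitting models, not real APIs:

```python
def split_with_detection(video, config, detect_type, models):
    """If the configured video type is unknown, detect it first, then
    split with the model corresponding to the (detected) type.

    `detect_type` is a callable standing in for the detection unit (e.g. a
    classifier over sampled frames); `models` maps type -> splitting callable.
    """
    video_type = config.get("video_type", "unknown")
    if video_type == "unknown":
        video_type = detect_type(video)
    model = models.get(video_type, models["generic"])
    return model(video, config)
```

For example, a video submitted with type "unknown" would first be classified, then routed to the detected type's model, matching the detection step in the text.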
  • the video processing system includes multiple splitting models, where one splitting model corresponds to a video type.
  • The video processing system obtains the splitting model corresponding to the video type, inputs the video into that model, and the model outputs multiple short videos obtained after splitting.
  • the video and stripping features are input into the stripping model corresponding to the video type, and multiple short videos obtained after stripping are output.
  • The splitting model is obtained after training a machine learning model using a sample set. The sample set includes sample input data and sample output data, where the sample input data includes known videos and known features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features.
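  • One plausible layout for a single sample in such a training set, pairing a known video and known features (input) with the known short videos (output); all field names and values are hypothetical:

```python
# Hypothetical layout of one training sample: the input pairs a known
# video with known splitting features; the output lists the known short
# videos obtained by splitting with those features.
sample = {
    "input": {
        "video": "talent_show_ep1.mp4",    # known video
        "features": ["character"],          # known splitting features
    },
    "output": [                             # known short videos
        {"clip": "contestant_A.mp4", "start_s": 120.0, "end_s": 305.0},
        {"clip": "contestant_B.mp4", "start_s": 310.0, "end_s": 480.0},
    ],
}
```

A supervised splitting model would then learn to reproduce the labeled clip boundaries from the (video, features) pair.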
  • the video processing system can use the stripping model corresponding to the detection type to strip the video.
  • the general splitting model can be used to split the video.
  • The above universal splitting model can be a splitting model common to multiple video types.
  • one split model corresponds to one video type, which can meet the needs of users in different application scenarios.
  • The model structures used by the machine learning models of different video types can be the same or different, depending on the corresponding video type.
  • The machine learning model structures used for video types with similar application scenarios can be similar or identical, and the sample sets used during training can correspond to the respective video types, thereby reducing the workload of model construction and improving the efficiency of preparing the splitting models.
  • The above-mentioned splitting model can split videos according to different splitting features. For example, assuming the video type selected by the user is "variety show", the corresponding splitting model is the variety-show splitting model; if the user selects the splitting feature "character" and the uploaded video is an episode of a talent-show variety program, the video can be split according to the character feature, and the short videos obtained can be all the performance clips of contestant A in the variety show.
  • The above sample set includes sample input data and sample output data, wherein the sample input data includes known videos and known features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features.
  • the trained splitting model can split the videos according to the splitting features input by the user.
  • the above machine learning models may include but are not limited to CNN, LSTM, Yolo model, SSD model, RCNN model or Fast-RCNN model, etc., which are not specifically limited in this application.
  • The splitting model can split videos according to different splitting features, so the usage needs of different users for the same video type can be met. It should be understood that for the same video type, different users may focus on different aspects of splitting. For example, for variety-show videos, some users only want performance clips of their favorite actors and stars, some only want dancing clips, and some only want singing clips. Through the above configuration interface, this application obtains the required splitting features from the user and splits the video based on those features, which can meet the user's diverse needs.
  • The splitting model under each video type may include multiple speed splitting models, where one speed splitting model corresponds to one splitting speed. The splitting unit 220 can determine the corresponding splitting model according to the video type obtained from the configuration interface, then determine the speed splitting model within it based on the splitting speed obtained from the configuration interface, and then use that speed splitting model to split the video to obtain multiple short videos.
  • The structures of the multiple speed splitting models under each video type can be the same or different; the details can be determined according to the actual processing situation and are not specifically limited in this application.
  • the stripping model under each video type includes multiple speed stripping models, and different stripping speeds correspond to different speed stripping models, thereby further meeting the user needs and improving the user experience.
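  • The two-level selection described above (video type first, then splitting speed within that type) could be sketched as a nested lookup; all registry contents and model names are illustrative assumptions:

```python
# Nested registry: video type -> preset splitting-speed range -> speed model.
SPEED_MODELS = {
    "variety_show": {(1, 5): "variety_fast_model", (5, 10): "variety_accurate_model"},
    "news": {(1, 5): "news_fast_model", (5, 10): "news_accurate_model"},
}


def select_speed_model(video_type: str, speed_range: tuple) -> str:
    """First pick the model family by video type, then pick the speed
    splitting model within it by the preset splitting-speed range."""
    family = SPEED_MODELS[video_type]   # splitting model for the video type
    return family[speed_range]          # speed splitting model within it
```

Faster presets would typically trade accuracy for speed, consistent with the speed/accuracy trade-off noted earlier in the text.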
  • In a second aspect, this application provides a video processing system, including: an acquisition unit for acquiring configuration parameters and videos from the user through a configuration interface.
  • the configuration parameters include the video type of the video; and a splitting unit for outputting multiple short videos.
  • Because the configuration parameters include at least the video type of the video, the system can select a pre-trained splitting model according to that video type, use it to split the video input by the user, and output the multiple short videos obtained after splitting. The splitting model has a corresponding relationship with the video type input by the user, so the multiple short videos can meet the user's diverse needs: whatever splitting scenario the user faces, suitable configuration parameters can be entered. This achieves video splitting that is universal across multiple scenarios, meets user needs, and improves the user experience.
  • the video type includes one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animations, and conferences.
  • the configuration parameters also include stripping features, and the stripping features include one or more of scene, character, audio, subtitle, action, optical character recognition (OCR), and appearance.
  • The configuration parameters also include the splitting speed.
  • the system further includes a detection unit, and the video type includes an unknown type.
  • the detection unit is used to perform type detection on the video and obtain the detection type of the video;
  • the splitting unit is used to split the video into strips according to the detection type and configuration parameters, and output multiple short videos.
  • When the configuration interface does not obtain a splitting feature input by the user, the detection unit is used to perform feature detection on the video, obtain the splitting features of the video, and split the video according to those features.
  • The video processing system includes multiple splitting models, where one splitting model corresponds to one video type; the splitting unit is used to obtain the splitting model corresponding to the video type, input the video into that model, and output multiple short videos obtained after splitting.
  • The video and splitting features are input into a splitting model corresponding to the video type, and multiple short videos obtained after splitting are output, where the splitting model is obtained after training a machine learning model using a sample set. The sample set includes sample input data and sample output data, where the sample input data includes known videos and known features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features.
  • a computing device including a processor and a memory.
  • the memory is used to store codes.
  • The processor is configured to execute the code to implement the functions of each module in the first aspect or any possible implementation of the first aspect.
  • a computer storage medium stores instructions, which when run on a computing device, cause the computing device to execute the methods described in the above aspects.
  • A fifth aspect provides a program product containing instructions that, when run on a computing device, cause the computing device to perform the methods described in the above aspects.
  • Figure 1 is a schematic architectural diagram of a video processing system provided by this application.
  • Figure 2 is a schematic diagram of a splitting model in a video processing system provided by this application.
  • Figure 3 is a schematic flow chart of the steps of a video processing method provided by this application.
  • Figure 4 is an example diagram of a configuration interface in a video processing system provided by this application.
  • Figure 5 is a schematic flow chart of the steps of the video processing system in an application scenario provided by this application;
  • Figure 6 is a schematic structural diagram of a computing device provided by this application.
  • Video splitting is a secondary processing of the original video content, splitting the original video content into several video clips as needed, so that users can watch the video clips they are interested in on demand.
  • Video stripping technology can deeply mine valuable information points in long videos and help users understand videos better and faster.
  • video stripping technology is usually limited to a specific scene.
  • the video stripping technology of news videos is usually based on the title of the news, news shot changes and other features.
  • the video stripping technology of film and television dramas is usually based on the subtitles of film and television dramas. Therefore, most video stripping technologies can only play a role in the corresponding application scenarios, resulting in a single application scenario for video stripping technology.
  • One video splitting model cannot split multiple types of videos; as a result, when a platform performs video splitting, a splitting model needs to be customized for each application scenario, which is costly and inefficient.
  • Users have diverse needs. For example, in news video scenarios, some users need fast processing speed, some users need a large number of video splits, some need precise splitting, and some need news videos with specific content. The splitting model commonly used in a specific scenario cannot meet the diverse needs of users and has low flexibility.
  • FIG. 1 is a schematic diagram of the architecture of a video processing system provided by this application.
  • The architecture includes a client 100, a video processing system 200 and a storage server 300, where communication connections can be established among the client 100, the video processing system 200 and the storage server 300; the connections may be wired or wireless, which is not specifically limited in this application.
  • the number of clients 100 and storage servers 300 may be one or more.
  • FIG. 1 takes one client 100 and one storage server 300 as an example. This application does not specifically limit this.
  • The client 100 can run on a terminal device held by the user, which can be a computer, a smartphone, a handheld processing device, a tablet computer, a mobile notebook, an augmented reality (AR) device, a virtual reality (VR) device, an integrated handheld device, a wearable device, a vehicle-mounted device, smart conference equipment, smart advertising equipment, smart home appliances, etc.; no specific limitations are made here.
  • The client 100 can be an application (APP) client, a web-based client in a browser, or an application programming interface (API); this application does not make specific limitations.
  • the video processing system 200 can be deployed on a computing device, which can be a bare metal server (Bare Metal Server, BMS), a virtual machine or a container.
  • A BMS refers to a general physical server, such as an ARM server or an X86 server; a virtual machine is a complete computer system that is simulated by software and runs in a completely isolated environment.
  • a container refers to a group of processes that are subject to resource constraints and isolated from each other.
  • the computing device can also be an edge computing device, a storage server or a storage array, which is not specifically limited in this application.
  • the storage server 300 may be a server with a storage function.
  • the server may be a physical server such as an ARM server or an X86 server, or a virtual machine, which is not specifically limited in this application.
  • the storage server 300 may be a storage server in a video platform (such as a TV station, a video website, a live broadcast platform, etc.) or a public cloud platform, and is used to store videos to be split and short videos after splitting.
  • the video processing system 200 can also be deployed on the storage server 300.
  • the storage server 300 has the function of video stripping.
  • the video processing system 200 and the client 100 can also be deployed on the storage server 300.
  • the application is not specifically limited.
  • the client 100 can also be deployed on the storage server 300, and the video processing system can be deployed on other servers.
  • The client 100 and the video processing system 200 can also be deployed on devices other than the storage server 300.
  • this application does not make specific limitations.
  • the client 100 can upload the video to the video processing system 200 to split the video into strips.
  • the video processing system 200 splits the video into strips to obtain multiple short videos, and then returns them to the client 100 or stores them in the storage server 300.
  • the storage server 300 can also send the video to the video processing system 200 for video splitting.
  • the video processing system splits the video to obtain multiple short videos, and then returns them to the storage server 300, or returns them to the client 100 for use.
  • the details can be determined according to the actual application scenario, and are not specifically limited in this application.
  • the video processing system 200 can also be deployed in a public cloud to provide users with cloud services for video stripping.
  • for example, users can check the box for the video stripping service when purchasing a content delivery network (CDN) service.
  • the public cloud platform can use the video processing system 200 to split some videos spread in the CDN network according to user needs. It should be understood that the above examples are for illustration and are not specifically limited in this application.
  • the video processing system 200 can be divided into multiple unit modules, and each unit module can be a software module or a hardware module, or can be part software module and part hardware module, which is not specifically limited in this application.
  • FIG. 1 is an exemplary division method. As shown in FIG. 1 , the video processing system 200 may include an acquisition unit 210 , a strip splitting unit 220 and a strip splitting model 230 .
  • the acquisition unit 210 is used to obtain configuration parameters and videos from the user through the configuration interface, where the video may be a long video that the user needs to split, such as an episode of a variety show, a documentary, an interview program, etc.
  • the configuration interface may be an application page, web page or API for the user to interact with the video processing system 200.
  • the video processing system 200 may display the application page or web page on the screen of the client 100, or provide API interface parameters to the user.
  • the user can use the API interface parameters to integrate the video processing system 200 into a third-party system for secondary development.
  • the above-mentioned users can be users who use the stripping service.
  • for example, video website users can input configuration parameters and videos through the application page or web page to use the video website's video stripping service to split different types of videos.
  • it should be understood that the above example is for illustration and is not specifically limited in this application.
  • the above-mentioned users can also be development users who integrate the stripping service into a third-party system for secondary development.
  • the configuration interface can be the console of the public cloud platform or an API.
  • the console can be a web-based service management system, through which users can purchase cloud services and connect to cloud service instances that have the functions of the video processing system 200.
  • the API can be integrated by users into third-party systems for secondary development. For example, a short video platform can establish a connection between the API of this configuration interface and its internal server used to store long videos, so that long videos uploaded by users can automatically be split into strips through this API interface. It should be understood that the above example is for illustration and is not specifically limited in this application.
  • the configuration interface can also be the console of a video website. Users can upload videos through the console of the video website and enter the above configuration parameters, and the video processing system 200 in the video website splits the videos into strips according to the configuration parameters to obtain multiple short videos. It should be understood that the above examples are for illustration and are not specifically limited in this application.
  • the configuration parameter may include a video type of the video
  • the video type may include one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animations, and conferences. It should be understood that video types can be divided into more categories according to the user's business scenarios, and examples are not given here one by one.
  • the user can input the type of the video that needs to be split into strips. For example, if the video the user needs to split is a film or TV drama, then the video type can be selected as film and television drama.
  • the video type includes an unknown type, where the unknown type may refer to a video type that the user cannot determine, or the unknown type may refer to the user not inputting the video type, that is, the configuration interface does not obtain the video type.
  • the detection unit 240 may perform type detection on the video to obtain the detection type of the video.
  • the stripping unit 220 of the video processing system 200 can strip the video according to the configuration parameters and detection type, and output multiple short videos.
  • the configuration parameters may also include stripping features, which may include one or more of scenes, characters, audio, subtitles, actions, optical character recognition (optical character recognition, OCR), and appearance.
  • scene refers to splitting the video into strips according to different scenes.
  • the multiple short videos after splitting can be short videos of the same scene. For example, for a school documentary, if the stripping feature is scene, then among the multiple short videos after splitting, short video 1 is a short video of a classroom scene in the documentary, short video 2 is a short video of a dormitory scene in the documentary, short video 3 is a short video of a playground scene in the documentary, etc.
  • This application does not make specific limitations.
  • Characters refer to splitting videos according to different characters.
  • the multiple short videos after splitting can be short videos of the same character. For example, for a talent show variety show, if the stripping feature is character, then among the multiple short videos after splitting, short video 1 is a performance clip of contestant A, short video 2 is a performance clip of contestant B, etc. This application does not make specific limitations.
  • Subtitles refer to splitting videos according to subtitles, which needs to be combined with text recognition technology to determine the content of the split based on the semantics of the subtitles. For example, for news videos, if the stripping feature is subtitles, then among the multiple short videos after splitting, short video 1 is a clip of news 1, short video 2 is a clip of news 2, etc. This application does not make specific limitations.
  • Action refers to splitting the video into strips according to different actions.
  • the multiple short videos after stripping can be short videos of the same action. For example, for a cultural evening show, if the stripping feature is action, then among the multiple short videos after stripping, short video 1 is a dance program clip, short video 2 is a singing program clip, short video 3 is a sketch program clip, etc. This application does not make specific limitations.
  • OCR refers to splitting the video in combination with image and text recognition technology. For example, it may be necessary to identify the pictures in some scenes in the video, such as billboards and traffic signs, to determine the meaning of the scene. This application does not make specific limitations.
  • Appearance refers to splitting the video into strips according to different appearances.
  • the multiple short videos after stripping can be short videos with the same appearance.
  • the appearance here can refer to the same clothes, the same hats, etc. This application does not make specific limitations.
  • the strip feature can be divided into more feature types according to the user's business scenario, and examples are not given here.
  • Users can input the type of stripping features they need to use to split the video. For example, if the user wants all the clips of each actor in a film and television drama, then character can be input into the configuration interface as the stripping feature; or if the user wants to edit the dancing clips in a variety show video, then the user can input action as the stripping feature into the configuration interface.
  • each split feature can be further subdivided.
  • the split feature "action” can be further subdivided into “dancing”, “running”, “conflict”, etc., and “audio” can be further subdivided. Divided into “singing", “quarrel”, etc., still taking the above example as an example, the user needs the “dancing” clip in the video, then he can select “action” in the split feature, and then select the “action” category.
  • the “dancing” feature it should be understood that the above examples are for illustration and are not specifically limited in this application.
  • the configuration interface can display a variety of stripping features to the user for selection. If the user cannot determine the stripping feature or does not select one, that is, if the configuration interface does not obtain a stripping feature, the detection unit 240 can detect the video and obtain a stripping feature for it, where the stripping feature can be the most commonly used feature type for that video type, or the feature type previously input by the user for that video type. This application does not make specific limitations.
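  • As a rough illustration of the feature-based splitting described above, the sketch below groups labeled shots into clips per feature value. The shot data layout and the label names are assumptions for this sketch, not the system's actual representation.

```python
# Minimal sketch of feature-based stripping, assuming some recognition model
# has already labeled each shot; adjacent shots sharing the requested label
# value are merged into one clip. The data layout is an assumption.
def strip_by_feature(shots, feature):
    """shots: list of dicts like {"start": 0.0, "end": 4.2, "labels": {"scene": "classroom"}}.
    Returns a dict mapping each label value to a list of (start, end) clips."""
    clips = {}
    for shot in shots:
        value = shot["labels"].get(feature)
        if value is None:
            continue  # shot carries no label for this feature
        spans = clips.setdefault(value, [])
        if spans and spans[-1][1] == shot["start"]:
            spans[-1] = (spans[-1][0], shot["end"])  # extend contiguous clip
        else:
            spans.append((shot["start"], shot["end"]))
    return clips
```

  • For a documentary labeled by scene, this would yield one clip list per scene (classroom, dormitory, playground, ...), matching the documentary example above.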
  • the configuration parameters may also include a strip-splitting speed, which may be a speed value or a speed range.
  • the video processing system 200 may determine the strip-splitting speed according to the speed value or speed range input by the user.
  • the stripping speed can be a preset range, such as 0~1s, 1~5s, 5~10s, 10~15s, 15~20s, 20~30s, etc. For example, if the speed value input by the user is 3s, then the stripping speed can be 1~5s; if the speed range input by the user is 4~8s, then the stripping speed can be 5~10s.
  • the configuration interface can display the preset stripping speeds to the user for the user to choose. If the user does not select a stripping speed, the detection unit 240 can use the default stripping speed or the user's historical stripping speed to split the video, which is not specifically limited in this application.
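  • The mapping from a user-supplied speed value or range to one of the preset ranges above could be sketched as follows. The preset buckets mirror the examples in the text; using the midpoint of a user-supplied range is an assumption of this sketch, not a rule from the source.

```python
# Sketch of matching a user-supplied stripping speed to a preset range.
PRESET_RANGES = [(0, 1), (1, 5), (5, 10), (10, 15), (15, 20), (20, 30)]  # seconds

def match_speed_range(value=None, user_range=None):
    """Return the preset range covering a single value, or covering the
    midpoint of a user-supplied (low, high) range (midpoint rule assumed)."""
    if value is None:
        value = (user_range[0] + user_range[1]) / 2  # e.g. 4~8s -> 6s
    for low, high in PRESET_RANGES:
        if low <= value <= high:
            return (low, high)
    return PRESET_RANGES[-1]  # clamp values beyond 30s to the last range
```

  • With these buckets, an input of 3s maps to 1~5s and an input of 4~8s maps to 5~10s, as in the examples above.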
  • the configuration interface may include a parameter interface and a video interface.
  • the parameter interface obtains the configuration parameters
  • the video interface obtains the user's video.
  • the user can first configure the parameters through the parameter interface, and then upload multiple videos through the video interface.
  • For example, the user sets the video type to TV series, the stripping feature to actor A, and the stripping speed to 1~5 seconds, and then uploads 24 episodes of a certain TV series in sequence.
  • the video processing system 200 can split the 24 episodes in turn according to the configuration parameters and output a short video for each episode.
  • the content of each short video is a performance clip of actor A.
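  • The 24-episode example above could be expressed as one stripping request per episode sharing one set of configuration parameters. All field names and filenames below are illustrative assumptions rather than the system's actual interface.

```python
# Illustrative stripping requests for the 24-episode example above.
# Field names are assumptions for this sketch, not the system's real API.
ALLOWED_FEATURES = {"scene", "character", "audio", "subtitle",
                    "action", "ocr", "appearance"}

def make_config(video_type, feature, feature_value=None, speed=(1, 5)):
    if feature not in ALLOWED_FEATURES:
        raise ValueError(f"unknown stripping feature: {feature}")
    return {"video_type": video_type, "strip_feature": feature,
            "feature_value": feature_value, "strip_speed": speed}

# One request per episode, all sharing the same configuration parameters:
episodes = [f"series_ep{i:02d}.mp4" for i in range(1, 25)]  # hypothetical filenames
requests = [dict(make_config("tv_series", "character", "actor_A"), video=ep)
            for ep in episodes]
```

  • Validating the feature against a fixed set mirrors how the configuration interface only offers the stripping features listed above.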
  • the stripping unit 220 is used to output multiple short videos, wherein the multiple short videos are obtained by stripping the videos according to the above configuration parameters.
  • the strip splitting unit 220 can output multiple short videos to the user client 100, and can also output multiple short videos to the storage server 300, which is not specifically limited in this application.
  • the video processing system 200 may include multiple splitting models 230, and one splitting model corresponds to one video type.
  • the stripping unit 220 can determine the corresponding stripping model according to the video type obtained by the configuration interface, use the stripping model to strip the video, and output multiple short videos.
  • video types may include film and television dramas, variety shows, news, and documentaries.
  • the split model 1 corresponds to the film and television drama type
  • the split model 2 corresponds to the variety show type
  • the split model 3 corresponds to the news type
  • the split model 4 corresponds to the documentary type.
  • the above-mentioned bar splitting model can be obtained by training the machine learning model using sample sets of different video types.
  • a bar splitting model of the film and television drama video type is obtained after training the machine learning model with a sample set of film and television drama types.
  • similarly, a stripping model of the news video type is obtained after training the machine learning model with a sample set of news types, and by analogy, the stripping models of multiple video types are obtained.
  • model structures used by machine learning models of different video types can be the same or different, and the details can be determined according to the corresponding video types.
  • the machine learning model structures used for video types with similar application scenarios can be similar or identical, and the sample sets used during training can correspond to the respective video types, thereby reducing the workload of model construction and improving the efficiency of preparing the stripping models.
  • For example, news video stripping and conference video stripping usually do not focus on character movements and scene changes, but focus more on subtitles or audio.
  • therefore, the machine learning model structures used for these video types can focus on the extraction and recognition of speech and text features rather than image features.
  • accordingly, the model structures used by the machine learning models of these two video types can be the same or similar.
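  • The one-stripping-model-per-video-type correspondence described earlier (model 1 for film and TV dramas, model 2 for variety shows, model 3 for news, model 4 for documentaries) can be sketched as a simple registry keyed by video type. The string values are placeholders standing in for the separately trained networks.

```python
# Sketch of the one-splitting-model-per-video-type correspondence described
# above; the string values are placeholders for separately trained models.
MODEL_REGISTRY = {
    "film_tv": "strip_model_1",      # trained on a film/TV-drama sample set
    "variety": "strip_model_2",      # trained on a variety-show sample set
    "news": "strip_model_3",         # trained on a news sample set
    "documentary": "strip_model_4",  # trained on a documentary sample set
}

def model_for_type(video_type):
    """Look up the stripping model trained for the given video type."""
    if video_type not in MODEL_REGISTRY:
        raise KeyError(f"no splitting model registered for type: {video_type}")
    return MODEL_REGISTRY[video_type]
```

  • A registry like this also makes it cheap to add new video types later: train a model on the new type's sample set and register it under the new key.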
  • the above-mentioned stripping model can split the video according to different stripping features. For example, assuming that the video type selected by the user is "variety show", the corresponding stripping model is a variety show stripping model. If the stripping feature selected by the user is "character" and the uploaded video is an episode of a talent show variety show, then the stripping model can split the video according to character features, and the short videos obtained can be all the performance clips of contestant A in the variety show. If the stripping feature selected by the user is "action", then the stripping model can split the video according to action features.
  • the short videos obtained after splitting can be video clips in which the action appears in the variety show. For example, if the action set by the user is "dance", then the short videos obtained after splitting can be a collection of performances by dancers in the variety show. It should be understood that the above examples are for illustration and are not specifically limited in this application.
  • the above sample set includes sample input data and sample output data, where the sample input data includes known videos and known stripping features, and the sample output data includes multiple short videos obtained by splitting the known videos using the known stripping features.
  • the trained splitting model can split the videos according to the splitting features input by the user.
  • the above machine learning models may include but are not limited to convolutional neural network (CNN) models, long short-term memory (LSTM) network models, one-stage you only look once (YOLO) unified real-time object detection models, single shot multibox detector (SSD) models, region convolutional neural network (RCNN) models, fast region convolutional neural network (Fast-RCNN) models, etc., which are not specifically limited in this application.
  • the stripping model under each video type may include multiple speed stripping models, where one speed stripping model corresponds to one stripping speed. The stripping unit 220 may determine the corresponding stripping model according to the video type obtained by the configuration interface, then determine the speed stripping model corresponding to the stripping speed obtained by the configuration interface, and then use that speed stripping model to split the video to obtain multiple short videos.
  • the structures of the multiple speed stripping models under each video type can be the same or different, and the details can be determined according to the actual processing situation, which are not specifically limited in this application.
  • FIG. 2 is an example diagram of a stripping model stored in the video processing system provided by this application.
  • the multiple stripping models 230 in the video processing system 200 shown in FIG. 1 can be as shown in FIG. 2: stripping model 11, stripping model 12, stripping model 21 and stripping model 22, where the video type of stripping model 11 and stripping model 12 is type 1, the video type of stripping model 21 and stripping model 22 is type 2, the stripping speed of stripping models 11 and 21 is speed 1, and the stripping speed of stripping models 12 and 22 is speed 2.
  • each splitting model can correspond to different video types and video speeds, and the corresponding splitting model can be selected according to the configuration parameters input by the user for video splitting.
  • the input data of each splitting model includes the video to be split and splitting features.
  • the output data is multiple short videos obtained after splitting the video using the stripping feature. For example, after stripping feature 1 and the video are input into stripping model 11, multiple short videos of stripping feature 1 are obtained; after stripping feature 2 and the video are input into stripping model 11, multiple short videos of stripping feature 2 are obtained, and so on, which will not be detailed here. For example, if the video type selected by the user through the configuration interface is video type 1, the stripping feature is stripping feature 2, and the stripping speed is speed 2, then the video processing system can, based on the configuration parameters input by the user, select stripping model 12 in Figure 2.
  • by inputting stripping feature 2 and the video into stripping model 12, multiple short videos of stripping feature 2 can be obtained. In this way, the multiple short videos finally output are obtained by splitting the video based on the video type, the stripping feature and the stripping speed required by the user, which meets the diverse needs of users to the greatest extent and improves the user experience.
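  • The Figure 2 lookup can be sketched as a table keyed by (video type, stripping speed), with the stripping feature and the video as the selected model's inputs. The model names are placeholders for the trained models in the figure, and the routing dictionary returned here stands in for an actual model invocation.

```python
# Sketch of the Figure 2 lookup: models keyed by (video type, stripping speed),
# with the stripping feature and the video as the chosen model's inputs.
MODELS = {
    ("type1", "speed1"): "strip_model_11",
    ("type1", "speed2"): "strip_model_12",
    ("type2", "speed1"): "strip_model_21",
    ("type2", "speed2"): "strip_model_22",
}

def run_stripping(video, video_type, feature, speed):
    """Route a request to the model for (video_type, speed), then feed it
    the video and stripping feature."""
    model = MODELS[(video_type, speed)]
    # A real model would return short videos; this sketch returns the routing.
    return {"model": model, "video": video, "feature": feature}
```

  • For the example in the text (video type 1, stripping feature 2, speed 2), the request is routed to stripping model 12.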
  • FIG. 2 is used for illustration.
  • the stripping model 230 in the video processing system 200 may also include more or less video types, video speeds, and stripping characteristics, which are not specifically limited in this application.
  • the video processing system obtains the configuration parameters input by the user through the configuration interface.
  • the configuration parameters at least include the video type of the video.
  • a pre-trained stripping model is selected to split the video input by the user into strips, and multiple short videos are output after splitting.
  • the stripping model has a corresponding relationship with the video type input by the user, so that the above multiple short videos can meet the diverse needs of users: whatever scenario the user needs to split videos in, the user only needs to input the corresponding configuration parameters, thereby achieving a video stripping model that is universal across multiple scenarios, meets user needs, and improves the user experience.
  • Figure 3 is a schematic flow chart of the steps of a video processing method provided by this application. The method can be applied to the video processing system 200 shown in Figure 1. As shown in Figure 3, the method can include the following steps:
  • Step S310: The video processing system 200 obtains configuration parameters and a video from the user through the configuration interface, where the video can be a long video that the user needs to split, such as a TV series, a variety show, an interview video, a documentary, etc.
  • the video processing system 200 can be deployed on a server or a public cloud.
  • the server can be one of a physical server, a virtual machine, a container, and an edge computing device.
  • for specific deployment methods, refer to the description of the video processing system 200 in the embodiment of Figure 1, which will not be repeated here.
  • the above configuration interface may be an application page, web page or API for the user to interact with the video processing system 200.
  • the video processing system 200 may display the application page or web page on the screen of the client 100, or provide API interface parameters to the user, and the user can use the API interface parameters to integrate the video processing system 200 into a third-party system for secondary development.
  • for details, refer to the description of the configuration interface in the embodiment of Figure 1, which will not be repeated here.
  • the configuration parameter may include a video type
  • the video type may include one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animations, and conferences. It should be understood that video types can be divided into more categories according to the user's business scenarios, and examples are not given here one by one.
  • the user can input the type of the video that needs to be split into strips. For example, if the video the user needs to split is a film or TV drama, then the video type can be selected as film and television drama.
  • the video type includes an unknown type, where the unknown type may refer to a video type that the user cannot determine, or the unknown type may refer to the user not inputting the video type, that is, the configuration interface does not obtain the video type.
  • the video processing system 200 may perform type detection on the video to obtain the detected type of the video.
  • the video processing system 200 can split the video into strips according to the configuration parameters and detection type, and output multiple short videos.
  • the configuration parameters may also include stripping features, which may include one or more of scenes, characters, audio, subtitles, actions, OCR, and appearance.
  • the stripping feature can be divided into more feature types according to the user's business scenario, and examples are not given here one by one. Users can input the type of stripping features they need to use to split the video. For example, if the user wants all the clips of each actor in a film and television drama, then character can be input into the configuration interface as the stripping feature; or if the user wants to edit the dancing clips in a variety show video, then the user can input action as the stripping feature into the configuration interface.
  • each stripping feature can be further subdivided.
  • for example, the stripping feature "action" can be further subdivided into "dancing", "running", "conflict", etc., and "audio" can be further subdivided into "singing", "quarrel", etc. Still taking the above example, if the user needs the "dancing" clips in the video, the user can select "action" in the stripping feature, and then select the "dancing" feature under that category.
  • the configuration interface can display a variety of stripping features to the user for selection. If the user cannot determine the stripping feature or does not select one, that is, if the configuration interface does not obtain a stripping feature, the video processing system 200 can detect the video and obtain a stripping feature for it, where the stripping feature can be the most commonly used feature type for that video type, or the feature type previously input by the user for that video type, and then split the video according to that stripping feature, which is not specifically limited in this application.
  • the configuration parameters may also include a strip-splitting speed, which may be a speed value or a speed range.
  • the video processing system 200 may determine the strip-splitting speed according to the speed value or speed range input by the user.
  • the stripping speed can be a preset range, such as 0~1s, 1~5s, 5~10s, 10~15s, 15~20s, 20~30s, etc. For example, if the speed value input by the user is 3s, then the stripping speed can be 1~5s; if the speed range input by the user is 4~8s, then the stripping speed can be 5~10s.
  • the configuration interface can display the preset stripping speeds to the user for the user to choose. If the user does not select a stripping speed, the video processing system 200 can use the default stripping speed or the user's historical stripping speed to split the video, which is not specifically limited in this application.
  • Step S320 The video processing system 200 outputs multiple short videos, wherein the multiple short videos are obtained by splitting the videos according to the configuration parameters.
  • the video processing system 200 can output the multiple short videos to the client 100, or output the multiple short videos to the storage server 300, which is not specifically limited in this application.
  • for specific descriptions of the client 100 and the storage server 300, reference can be made to the embodiment of FIG. 1, and the details will not be repeated here.
  • the video processing system 200 may include multiple splitting models, and one splitting model corresponds to one video type.
  • the stripping unit 220 can determine the corresponding stripping model according to the video type obtained by the configuration interface, use the stripping model to strip the video, and output multiple short videos.
  • the above-mentioned bar splitting model can be obtained by training the machine learning model using sample sets of different video types.
  • a bar splitting model of the film and television drama video type is obtained after training the machine learning model with a sample set of film and television drama types.
  • similarly, a stripping model of the news video type is obtained after training the machine learning model with a sample set of news types, and by analogy, the stripping models of multiple video types are obtained.
  • model structures used by machine learning models of different video types can be the same or different, and the details can be determined according to the corresponding video types.
  • the machine learning model structures used for video types with similar application scenarios can be similar or identical, and the sample sets used during training can correspond to the respective video types, thereby reducing the workload of model construction and improving the efficiency of preparing the stripping models.
  • the video processing system 200 can use the stripping model corresponding to the detected type to split the video into strips and output multiple short videos; if the video processing system 200 fails to perform type detection on the video and does not obtain the detected type of the video, or the type detection succeeds but the confidence of the detected type is very low, then a general stripping model can be used to split the video into strips.
  • the above universal splitting model can be a splitting model common to multiple video types.
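  • The fallback logic described above might look like the sketch below: use the model for the user's type when available, otherwise trust the detector only above some confidence, and fall back to the general model shared across video types. The threshold value and all names are assumptions, since the source gives no concrete values.

```python
# Sketch of the model-selection fallback described above; the threshold is an
# illustrative assumption, not a value from the source.
CONFIDENCE_THRESHOLD = 0.5

def choose_model(user_type, detected=None, registry=None, general="general_model"):
    """user_type: type from the configuration interface ("unknown" if absent).
    detected: optional (type, confidence) pair from the type detector."""
    registry = registry or {}
    if user_type != "unknown" and user_type in registry:
        return registry[user_type]          # user supplied a known type
    if detected is not None:
        dtype, confidence = detected
        if confidence >= CONFIDENCE_THRESHOLD and dtype in registry:
            return registry[dtype]          # trust a confident detection
    return general                          # detection failed or low confidence
```

  • In this sketch the general model is simply the last resort; the source leaves open how such a model common to multiple video types would be trained.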
  • the above-mentioned stripping model can split the video according to different stripping features. For example, assuming that the video type selected by the user is "variety show", the corresponding stripping model is a variety show stripping model. If the stripping feature selected by the user is "character" and the uploaded video is an episode of a talent show variety show, then the stripping model can split the video according to character features, and the short videos obtained can be all the performance clips of contestant A in the variety show.
  • the above sample set includes sample input data and sample output data, where the sample input data includes known videos and known stripping features, and the sample output data includes multiple short videos obtained by splitting the known videos using the known stripping features.
  • the trained splitting model can split the videos according to the splitting features input by the user.
  • the above machine learning models may include but are not limited to CNN, LSTM, Yolo model, SSD model, RCNN model or Fast-RCNN model, etc., which are not specifically limited in this application.
  • the stripping model under each video type may include multiple speed stripping models, where one speed stripping model corresponds to one stripping speed. The stripping unit 220 may determine the corresponding stripping model according to the video type obtained by the configuration interface, then determine the speed stripping model corresponding to the stripping speed obtained by the configuration interface, and then use that speed stripping model to split the video to obtain multiple short videos.
  • the structures of the multiple speed stripping models under each video type can be the same or different, and the details can be determined according to actual processing conditions, which are not specifically limited in this application.
  • FIG. 2 is an example diagram of a stripping model stored in the video processing system provided by this application.
  • the video processing system 200 shown in FIG. 1 includes multiple stripping models 230, as shown in FIG. 2: stripping model 11, stripping model 12, stripping model 21 and stripping model 22.
  • the video type of stripping model 11 and stripping model 12 is type 1.
  • the video type of stripping model 21 and stripping model 22 is type 2.
  • the stripping speed of stripping models 11 and 21 is speed 1, and the stripping speed of stripping models 12 and 22 is speed 2.
  • each splitting model can correspond to different video types and video speeds, and the corresponding splitting model can be selected according to the configuration parameters input by the user for video splitting.
  • the input data of each splitting model includes the video to be split and the splitting features.
  • the output data is multiple short videos obtained after splitting the video using the stripping feature. For example, after stripping feature 1 and the video are input into stripping model 11, multiple short videos of stripping feature 1 are obtained; after stripping feature 2 and the video are input into stripping model 11, multiple short videos of stripping feature 2 are obtained, and so on, which will not be detailed here. For example, if the video type selected by the user through the configuration interface is video type 1, the stripping feature is stripping feature 2, and the stripping speed is speed 2, then the video processing system can, based on the configuration parameters input by the user, select stripping model 12 in Figure 2.
  • the stripping feature 2 and the video into the stripping model 12 By inputting the stripping feature 2 and the video into the stripping model 12, multiple short videos of the stripping feature 2 can be obtained. In this way, the multiple short videos finally output are obtained by splitting the videos based on the video type and the splitting characteristics and splitting speed required by the user. This meets the diverse needs of the users to the greatest extent and improves the user experience.
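The mapping from (video type, splitting speed) to a splitting model in FIG. 2 can be sketched as a lookup table. All names below (`make_model`, `MODEL_TABLE`, `split_video`) are hypothetical and only illustrate the selection logic described above, not an actual implementation:

```python
# Hypothetical sketch of the splitting-model selection of FIG. 2.
# Each (video_type, splitting_speed) pair maps to one splitting model;
# the splitting feature is passed to the model as input data.

def make_model(name):
    """Stand-in for a trained splitting model: returns a callable that
    tags each produced short video with the model and feature used."""
    def split(video, feature):
        # A real model would cut `video` into clips matching `feature`;
        # here we just return placeholder clip descriptions.
        return [f"{name}:{feature}:clip{i}" for i in range(2)]
    return split

# FIG. 2 layout: models 11/12 belong to type 1, models 21/22 to type 2;
# models 11/21 use speed 1, models 12/22 use speed 2.
MODEL_TABLE = {
    ("type1", "speed1"): make_model("model11"),
    ("type1", "speed2"): make_model("model12"),
    ("type2", "speed1"): make_model("model21"),
    ("type2", "speed2"): make_model("model22"),
}

def split_video(video, video_type, feature, speed):
    model = MODEL_TABLE[(video_type, speed)]
    return model(video, feature)

# The example above picks video type 1, splitting feature 2, speed 2,
# which selects splitting model 12.
clips = split_video("long_video.mp4", "type1", "feature2", "speed2")
print(clips)
```

The splitting feature deliberately does not participate in model selection here, matching the text: it is input data to the chosen model, while video type and splitting speed determine which model is chosen.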
  • It should be understood that FIG. 2 is for illustration only.
  • The splitting models 230 in the video processing system 200 may also include more or fewer video types, splitting speeds and splitting features, which are not specifically limited in this application.
  • the video processing system obtains the configuration parameters input by the user through the configuration interface.
  • the configuration parameters at least include the video type of the video.
  • A pre-trained splitting model is selected to split the video input by the user, and multiple short videos are output after splitting.
  • The splitting model corresponds to the video type input by the user, so that the above multiple short videos can meet the diverse needs of users: whatever scenario the user needs splitting for, the user enters the corresponding configuration parameters. This achieves a video splitting model that is universal across multiple scenarios and meets user needs, improving the user experience.
  • The process described in the above steps S310 to S320 is illustrated below with reference to the specific application scenarios shown in FIGS. 4 to 5.
  • FIG. 4 illustrates an example diagram of a configuration interface.
  • the configuration interface is a console in the form of a web page or an application program.
  • the console can be a console of a public cloud platform. It should be understood that Figure 4 is used for illustration.
  • the console can also be the console of a non-public cloud platform, and the configuration interface can also be in the form of an API, which is not specifically limited in this application.
  • the web page or application program interface of the configuration interface at least includes a video type selection area 410 , a splitting feature selection area 420 , a splitting speed selection area 430 , an upload video area 440 and a control area 450 .
  • the video type selection area 410 is used for the user to select the video type of the video.
  • The video type selection area 410 in FIG. 4 shows the user video types such as "movies and TV series", "news", "variety shows" and "unknown type". It should be understood that the configuration interface can also display more video types to the user; for example, dragging down the progress bar in the video type selection area 410 in FIG. 4 can display more video types.
  • The video type selection area 410 can set the video type to the "unknown type" option by default. Alternatively, if the user cannot determine the video type, the user can select the "unknown type" option in the video type selection area 410, and the video processing system 200 can perform video type detection on the video uploaded by the user and split the video according to the detected type and the configuration parameters input by the user (such as the splitting feature and splitting speed).
  • The splitting feature selection area 420 is used for the user to select the splitting feature of the video.
  • The splitting feature selection area 420 in FIG. 4 shows the user splitting features such as "characters", "scenes", "subtitles" and "others".
  • It should be understood that the configuration interface can also display more splitting features to the user; for example, dragging down the progress bar in the splitting feature selection area 420 in FIG. 4 can display more types of splitting features.
  • Each splitting feature can be further subdivided.
  • the strip feature "character” can be further divided into “actors”, “actresses”, “upload photos of actors”, etc.
  • the multiple short videos obtained after splitting the video can be video clips that include male actors in the video.
  • the video will be splitted.
  • the multiple short videos obtained in the end can be video clips including actresses in the video.
  • upload photos of actors is selected as the splitting feature, the user can upload a screenshot of an actor in the video, and multiple videos can be obtained after splitting the video.
  • a short video can be a video clip including the actor.
  • The splitting feature selection area 420 can set the splitting feature to the "others" option by default, and the video processing system 200 can detect the video uploaded by the user to obtain the splitting feature of the video, where the splitting feature can be the most commonly used feature type under the video type, or a feature type historically input by the user; this is not specifically limited in this application.
  • The splitting speed selection area 430 is used for the user to select the splitting speed of the video.
  • The splitting speed selection area 430 in FIG. 4 shows the user splitting speeds such as "0-1 second", "2-5 seconds", "6-10 seconds", "11-15 seconds" and "others".
  • It should be understood that the configuration interface can also show the user more splitting speeds; for example, dragging down the progress bar in the splitting speed selection area 430 in FIG. 4 can display more splitting speeds.
  • The splitting speed selection area 430 can set the splitting speed to the "others" option by default, and the video processing system 200 can detect the video uploaded by the user to obtain the splitting speed of the video, where the splitting speed can be the most commonly used splitting speed under the video type and splitting feature, or a splitting speed historically input by the user; this is not specifically limited.
  • the upload video area 440 is used for users to upload videos, which are long videos to be split.
  • The video may also be a video uploaded by the user to the object storage service (OBS), and the video processing system 200 can download the video from the OBS bucket bound by the user to perform video splitting; this application does not specifically limit this.
  • The control area 450 includes a "Save Configuration" control and a "Start Splitting" control, where the "Save Configuration" control is used to save the parameter configuration input by the user in the video type selection area 410, the splitting feature selection area 420 and the splitting speed selection area 430, and the "Start Splitting" control is used to start splitting the video using the above parameter configuration in response to the user's operation.
  • the video type is "movie and TV drama”
  • the teardown feature is "upload actor photos” (assuming that the uploaded actor photo is actor A)
  • the teardown feature The speed is "2 to 5 seconds”.
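The parameters collected by areas 410-450 of the configuration interface can be represented as a simple parameter object. The class and field names below are illustrative assumptions, not part of this application; `None` stands for "not selected", in which case the system falls back to detection or defaults as described later:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SplitConfig:
    """Hypothetical container for the parameters collected by the
    configuration interface of FIG. 4. None means 'not selected'."""
    video_type: Optional[str] = None     # area 410, e.g. "movies and TV dramas"
    feature: Optional[str] = None        # area 420, e.g. "upload photos of actors"
    feature_photo: Optional[str] = None  # uploaded actor screenshot, if any
    speed: Optional[str] = None          # area 430, e.g. "2-5 seconds"
    video_path: Optional[str] = None     # area 440, the long video to split

# The example configuration described above:
config = SplitConfig(
    video_type="movies and TV dramas",
    feature="upload photos of actors",
    feature_photo="actor_A.png",
    speed="2-5 seconds",
    video_path="drama_episode.mp4",
)
print(config.video_type, config.speed)
```

The "Save Configuration" control would persist such an object, and "Start Splitting" would pass it to the splitting unit.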
  • FIG. 5 is a schematic flowchart of steps of a video processing method in an application scenario provided by this application.
  • the application scenario may be the application scenario shown in FIG. 4 .
  • the method may include the following steps:
  • Step 1: Input the video, where the video is a long video to be split that is input by the user.
  • Step 2: Determine whether the user has selected the video type. If yes, determine the splitting model corresponding to the video type selected by the user. In the application scenario shown in FIG. 4, the video type selected by the user is "movies and TV dramas", so after the video type is determined in step 2 of the flowchart shown in FIG. 5, step 3 is performed.
  • Step 3: Obtain the movie-and-TV-drama splitting model.
  • It should be understood that the video processing system 200 includes multiple splitting models, wherein one splitting model corresponds to one video type.
  • The video type selected by the user is "movies and TV dramas", so step 3 obtains the splitting model of the movie-and-TV-drama type.
  • Step 4: Determine whether the user has selected the splitting feature. If yes, perform step 5; if no, perform step 9 and then step 5. It should be understood that in the application scenario shown in FIG. 4, the splitting feature selected by the user is "character", so the splitting feature determined in step 4 of the flowchart shown in FIG. 5 is "character".
  • Step 5: Determine whether the user has selected the splitting speed. If yes, perform step 6; if no, perform step 10 and then step 6. It should be understood that the splitting speed selected by the user in the application scenario shown in FIG. 4 is "2-5 seconds", so the splitting speed determined in step 5 of the flowchart shown in FIG. 5 is "2-5 seconds".
  • Step 6: Select the corresponding splitting model to split the video.
  • It should be understood that the splitting model selected in step 6 corresponds to the configuration parameters input by the user, where the configuration parameters include "movies and TV dramas" (video type), "character" (splitting feature) and "2-5 seconds" (splitting speed).
  • The video processing system 200 includes a plurality of splitting models, each of which corresponds to one video type and one splitting speed. For example, the splitting models under the movie-and-TV-drama type may include splitting models corresponding to the "0-1 second" and "2-5 seconds" splitting speeds; based on the "2-5 seconds" splitting speed selected by the user, the movie-and-TV-drama splitting model corresponding to that speed can be chosen and used to split the video.
  • When training the splitting model, the sample set used includes sample input data and sample output data, where the sample input data includes known videos and known splitting features.
  • the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features.
  • In this way, the trained splitting model can split videos according to the splitting feature input by the user.
  • For example, if the splitting feature input by the user is "character", multiple short videos containing that character can be output; if the user selects an image of actor A that the user uploaded, the multiple short videos output can include multiple short videos of actor A.
  • Step 7: Detect the video type. It should be understood that if the user does not select a video type in step 2, or selects the unknown type, step 7 can be performed to detect the video type. If the detection succeeds, steps 3 and 4 are performed to determine the splitting model corresponding to the detected type; if the detection fails, steps 8 and 4 are performed.
  • Step 8: Obtain a general model. It is understandable that the categories of some videos are not clear-cut, and it may not be possible to detect the video type of such a video. In this case, a general splitting model can be used for the video.
  • Step 9: The system selects the splitting feature. It should be understood that if the user does not select a splitting feature in step 4, step 9 can be performed: the system selects a splitting feature commonly used under this video type, or a splitting feature historically selected by the user, etc., and then step 5 is performed.
  • Step 10: The system selects the splitting speed. It should be understood that if the user does not select a splitting speed in step 5, step 10 can be performed: the system selects a splitting speed commonly used under the video type and splitting feature, or a splitting speed historically selected by the user, etc., and then step 6 is performed.
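The decision flow of steps 1-10 above (fall back to detection or system defaults whenever the user leaves a parameter unselected) can be sketched as follows. The detection function and the default-lookup tables are placeholders for the model-based components described in the text, not real APIs:

```python
def detect_video_type(video):
    # Placeholder for the video-type detector of step 7;
    # returns None when detection fails, in which case step 8
    # (the general model) applies.
    return None

COMMON_FEATURE = {"movies and TV dramas": "character"}
COMMON_SPEED = {("movies and TV dramas", "character"): "2-5 seconds"}

def choose_splitting_model(video, video_type=None, feature=None, speed=None):
    """Hypothetical sketch of steps 2-10 of FIG. 5, returning the
    resolved (video_type, feature, speed) used to pick the model."""
    # Steps 2/7/8: resolve the video type, else fall back to a general model.
    if video_type is None or video_type == "unknown":
        video_type = detect_video_type(video) or "general"
    # Step 9: the system picks a feature commonly used under this type.
    if feature is None:
        feature = COMMON_FEATURE.get(video_type, "others")
    # Step 10: the system picks a speed commonly used under (type, feature).
    if speed is None:
        speed = COMMON_SPEED.get((video_type, feature), "others")
    # Step 6: the model corresponding to (type, speed) is then selected,
    # with the feature supplied to it as input data.
    return video_type, feature, speed

# User selected everything, as in the FIG. 4 scenario:
print(choose_splitting_model("v.mp4", "movies and TV dramas", "character", "2-5 seconds"))
# User selected only the type; steps 9 and 10 fill in the rest:
print(choose_splitting_model("v.mp4", "movies and TV dramas"))
```

Note how every branch still produces a complete parameter triple, so step 6 can always select a model even when the user configured nothing.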
  • In summary, the video processing method provided by this application obtains the configuration parameters input by the user through the configuration interface, where the configuration parameters at least include the video type of the video.
  • A pre-trained splitting model is selected to split the video input by the user, and multiple short videos are output after splitting.
  • The splitting model corresponds to the video type input by the user, thereby realizing a video splitting model that is universal across multiple scenarios and meets user needs, improving the user experience.
  • FIG. 6 is a schematic structural diagram of a computing device provided by this application.
  • the video processing system 200 described in the embodiments of FIGS. 1 to 5 can be deployed on the computing device 600 shown in FIG. 6 .
  • The computing device 600 includes a processor 601, a storage unit 602, a storage medium 603 and a communication interface 604, wherein the processor 601, the storage unit 602, the storage medium 603 and the communication interface 604 communicate through the bus 605, or communicate through other means such as wireless transmission.
  • The computing device can be a BMS, a virtual machine or a container.
  • A BMS refers to a general physical server, such as an ARM server or an X86 server; a virtual machine refers to a complete computer system that is implemented through network functions virtualization (NFV) technology, simulated by software with complete hardware system functions, and runs in a completely isolated environment.
  • A container refers to a group of processes that are subject to resource constraints and isolated from each other.
  • the computing device can also be an edge computing device, a storage server or a storage array, which is not specifically limited in this application.
  • the processor 601 is composed of at least one general-purpose processor, such as a CPU, an NPU, or a combination of a CPU and a hardware chip.
  • the above-mentioned hardware chip is an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), a programmable logic device (Programmable Logic Device, PLD), or a combination thereof.
  • the above-mentioned PLD is a complex programmable logic device (Complex Programmable Logic Device, CPLD), a field-programmable gate array (Field-Programmable Gate Array, FPGA), a general array logic (Generic Array Logic, GAL) or any combination thereof.
  • the processor 601 executes various types of digital storage instructions, such as software or firmware programs stored in the storage unit 602, which enables the computing device 600 to provide a wide variety of services.
  • the processor 601 includes one or more CPUs, such as CPU0 and CPU1 shown in FIG. 6 .
  • the computing device 600 also includes multiple processors, such as the processor 601 and the processor 606 shown in FIG. 6 .
  • processors can be a single-core processor (single-CPU) or a multi-core processor (multi-CPU).
  • a processor here refers to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
  • the storage unit 602 is used to store program codes, and is controlled and executed by the processor 601 to perform the processing steps of the video processing system 200 in any of the above embodiments in FIGS. 1 to 6 .
  • the program code includes one or more software units.
  • The one or more software units mentioned above can be the acquisition unit and the splitting unit in the embodiment of FIG. 1.
  • The acquisition unit is used to provide a configuration interface to the user, and the splitting unit is used to split the video according to the configuration parameters. For specific implementation methods, refer to the embodiments in FIGS. 1 to 5, which will not be described again here.
  • Storage unit 602 may include read-only memory and random access memory, and provides instructions and data to processor 601. Storage unit 602 may also include non-volatile random access memory. Storage unit 602 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM) or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache.
  • By way of example rather than limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM) and direct rambus random access memory (DR RAM).
  • Storage medium 603 is a carrier for storing data, such as a hard disk, a USB flash drive (universal serial bus, USB), a flash memory (flash), an SD card (secure digital memory card) or a memory stick.
  • The hard disk can be a hard disk drive (HDD), a solid state disk (SSD), a mechanical hard disk, etc., which is not specifically limited in this application.
  • The communication interface 604 can be a wired interface (such as an Ethernet interface), an internal interface (such as a peripheral component interconnect express (PCIe) bus interface), or a wireless interface (such as a cellular network interface or a wireless LAN interface) for communicating with other servers or units.
  • Bus 605 can be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), etc.
  • the bus 605 is divided into an address bus, a data bus, a control bus, etc.
  • bus 605 also includes a power bus, a control bus, a status signal bus, etc.
  • For clarity, however, the various buses are labeled as bus 605 in the figure.
  • FIG. 6 is only a possible implementation manner of the embodiment of the present application.
  • the computing device 600 may also include more or fewer components, which is not limited here.
  • For contents not shown or described in the embodiments of the present application, please refer to the relevant explanations in the embodiments of FIGS. 1 to 5, which will not be described again here.
  • Embodiments of the present application provide a computer storage medium in which instructions are stored; when the instructions are run on a computing device, the computing device is caused to execute the video processing method described in the embodiments of FIGS. 1 to 5 .
  • Embodiments of the present application provide a computer program product containing programs or instructions.
  • When the programs or instructions are run on a computing device, the computing device is caused to execute the video processing method described in the embodiments of FIGS. 1 to 5.
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • the above-described embodiments are implemented in whole or in part in the form of a computer program product.
  • a computer program product includes at least one computer instruction.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center via a wired connection (such as coaxial cable, optical fiber cable, or digital subscriber line (DSL)) or a wireless connection (such as infrared, radio or microwave).
  • The computer-readable storage medium may be any available medium accessible by a computer, or a data storage node, such as a server or data center, that contains at least one available medium.
  • The available medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a high-density digital video disc (DVD)), or a semiconductor medium (for example, an SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present application provides a video processing method and system and a related device. The method comprises the following steps: a video processing system obtains configuration parameters and a video from a user via a configuration interface, the configuration parameters comprising the video type of the video; the video processing system then outputs a plurality of short videos, wherein the plurality of short videos are obtained by splitting the video according to the configuration parameters, so that the plurality of short videos satisfy the various requirements of the user. The types of configuration parameters input by the user are determined by the type of scenario in which splitting is needed, thereby achieving a video splitting model which can be used universally across a plurality of scenarios and which satisfies user requirements, thus enhancing the user experience.

Description

A video processing method, system and related device
This application claims priority to the Chinese patent application filed with the China Patent Office on April 13, 2022, with application number 202210384788.5 and titled "A video processing method, system and related device", the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the field of computers, and in particular to a video processing method, system and related device.
Background
With the rapid growth of short video content, video viewers' patience for watching long videos is gradually decreasing. In order to provide video viewers with more exciting video highlights and to improve the utilization of users' fragmented time, video splitting technology came into being. Video splitting is secondary processing of original video content: the original video content is split into several video clips as needed, so that users can watch the video clips they are interested in on demand. Video splitting technology can deeply mine valuable information points in long videos and help users understand videos better and faster.
However, video splitting technology is usually limited to a specific scenario. For example, video splitting for news videos is usually implemented based on features such as news titles and news shot changes, while video splitting for film and television dramas is usually implemented based on their subtitles. Therefore, current video splitting technology has a single application scenario and low flexibility, which reduces the user experience.
Summary of the Invention
The present application provides a video processing method, system and related device, which are used to solve the problems that video splitting technology has a single application scenario, low flexibility and a poor user experience.
In a first aspect, a video processing method is provided. The method includes the following steps: a video processing system obtains configuration parameters and a video from a user through a configuration interface, where the configuration parameters include the video type of the video; the video processing system outputs multiple short videos, where the multiple short videos are obtained by splitting the video according to the configuration parameters.
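The two steps of the first aspect (obtain configuration parameters and a video through the configuration interface, then output the short videos produced by splitting) can be sketched as a minimal interface. All class, method and file names here are hypothetical stand-ins, not the actual implementation:

```python
class VideoProcessingSystem:
    """Hypothetical sketch of the first-aspect method: configuration
    parameters and a video in, multiple short videos out."""

    def __init__(self):
        # One splitting model per video type, as described in the text;
        # both entries share a stand-in splitter for illustration.
        self.models = {
            "news": self._split,
            "film and television dramas": self._split,
        }

    @staticmethod
    def _split(video, params):
        # Stand-in splitter: a real model would cut `video` into clips
        # matching the configured type, feature and speed.
        return [f"{video}#clip{i}" for i in range(3)]

    def process(self, video, params):
        # params must at least carry the video type (first aspect);
        # unknown types fall through to a general stand-in model.
        model = self.models.get(params["video_type"], self._split)
        return model(video, params)

system = VideoProcessingSystem()
shorts = system.process("long.mp4", {"video_type": "news"})
print(shorts)
```

The configuration interface itself (web page, application page or API) would simply populate `params` before calling `process`.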
In a specific implementation, the video processing system can be deployed on a computing device, which can be a bare metal server (BMS), a virtual machine or a container. A BMS refers to a general physical server, for example, an ARM server or an X86 server; a virtual machine refers to a complete computer system that is implemented through network functions virtualization (NFV) technology, simulated by software with complete hardware system functions, and runs in a completely isolated environment; a container refers to a group of processes that are subject to resource constraints and isolated from each other. The computing device can also be an edge computing device, a storage server or a storage array, which is not specifically limited in this application.
The above configuration interface can be an application page, a web page or an application programming interface (API) through which the user interacts with the video processing system. The video processing system can display the application page or web page on the client's screen, or provide API parameters to the user, and the user can use the API parameters to integrate the video processing system 200 into a third-party system for secondary development.
The above video types may include one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animation and conferences. It should be understood that video types can be divided into more categories according to the user's business scenarios, which are not enumerated here one by one. The user can input whatever video type the user needs to split; for example, if the user needs to split a film or television drama video, the video type can be selected as film and television drama.
By implementing the method described in the first aspect, the configuration parameters input by the user are obtained through the configuration interface, where the configuration parameters at least include the video type of the video; a pre-trained splitting model is selected according to the video type to split the video input by the user, and multiple short videos are output after splitting. The splitting model corresponds to the video type input by the user, so that the above multiple short videos can meet the diverse needs of the user: whatever scenario the user needs splitting for, the user inputs the corresponding configuration parameters. This achieves a video splitting model that is universal across multiple scenarios and meets user needs, improving the user experience.
In a possible implementation, the configuration parameters also include a splitting feature, and the splitting feature includes one or more of scene, character, audio, subtitle, action, optical character recognition (OCR) and appearance.
Scene means splitting the video according to different scenes, and the multiple short videos after splitting can be short videos of the same scene. For example, for a documentary about a school with the splitting feature set to scene, among the multiple short videos after splitting, short video 1 may be a short video whose scene is a classroom, short video 2 a short video whose scene is a dormitory, short video 3 a short video whose scene is a playground, and so on; this application does not make specific limitations.
Character means splitting the video according to different characters, and the multiple short videos after splitting can be short videos of the same character. For example, for a talent show with the splitting feature set to character, short video 1 may be performance clips of contestant A, short video 2 performance clips of contestant B, and so on; this application does not make specific limitations.
Subtitle means splitting the video according to subtitles, which requires combining text recognition technology and determining the split content based on the semantics of the subtitles. For example, for a news video with the splitting feature set to subtitle, short video 1 may be a clip of news item 1, short video 2 a clip of news item 2, and so on; this application does not make specific limitations.
Action means splitting the video according to different actions, and the multiple short videos after splitting can be short videos of the same action. For example, for an evening gala program with the splitting feature set to action, short video 1 may be all dance program clips, short video 2 singing program clips, short video 3 sketch program clips, and so on; this application does not make specific limitations.
OCR means that image text recognition technology needs to be combined to split the video, for example, pictures in some scenes of the video need to be recognized to determine the meaning of the scene, such as billboards, traffic signs and so on; this application does not make specific limitations.
外观指的是按照不同外观对视频进行拆条,拆条后的多个短视频可以是相同外观的短视频,这里的外观可以指的是相同的衣服外观、相同的帽子外观等等,本申请不作具体限定。Appearance refers to splitting the video into strips according to different appearances. The multiple short videos after stripping can be short videos with the same appearance. The appearance here can refer to the same appearance of clothes, the same appearance of hats, etc. This application No specific limitation is made.
It should be understood that the splitting features can be divided into more feature types according to the user's business scenario, which are not enumerated one by one here. The user can input whichever type of splitting feature is needed to split the video. For example, if the user wants all the clips of each actor in a film or television series, the character feature can be input into the configuration interface as the splitting feature; or, if the user wants the dancing clips in a variety-show video, the action feature can be input into the configuration interface as the splitting feature.
It should be understood that each splitting feature can be further subdivided. For example, the splitting feature "action" can be further subdivided into "dancing", "running", "conflict", and so on, and "audio" can be further subdivided into "singing", "quarreling", and so on. Continuing the example above, if the user needs the "dancing" clips in a video, the user can select "action" among the splitting features and then select the "dancing" feature under the "action" category. It should be understood that the above examples are for illustration and are not specifically limited in this application.
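The two-level feature selection described above can be sketched as a small taxonomy lookup. The top-level features come from this application's list of splitting features; the sub-features shown are only the examples mentioned in the text, and the data structure and function name are illustrative assumptions rather than the actual implementation:

```python
# Illustrative two-level splitting-feature taxonomy; sub-feature lists are
# only the examples mentioned in the text, not an exhaustive catalogue.
FEATURE_TAXONOMY = {
    "scene": [],
    "character": [],
    "audio": ["singing", "quarreling"],
    "subtitle": [],
    "action": ["dancing", "running", "conflict"],
    "ocr": [],
    "appearance": [],
}

def resolve_feature(top_level, sub_feature=None):
    """Validate a user's feature selection against the taxonomy and return
    the (top-level, sub-feature) pair that would be passed to the splitter."""
    if top_level not in FEATURE_TAXONOMY:
        raise ValueError(f"unknown splitting feature: {top_level!r}")
    if sub_feature is not None and sub_feature not in FEATURE_TAXONOMY[top_level]:
        raise ValueError(f"unknown sub-feature of {top_level!r}: {sub_feature!r}")
    return (top_level, sub_feature)
```

For instance, a user who wants only dancing clips would select `resolve_feature("action", "dancing")`, mirroring the "action" then "dancing" selection described above.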
By implementing the above implementation and splitting the video according to the splitting feature obtained from the user, the diverse needs of different users under the same video-type scenario can be met and the user experience improved. It should be understood that, for the same video type, different users focus on different things when splitting a video. For variety-show videos, for example, some users only want to watch performance clips of their favorite actors and stars, some only want to watch dancing clips, and some only want to watch singing clips. This application obtains the required splitting feature from the user through the above configuration interface and splits the video according to that feature, which can meet users' diverse needs.
In a possible implementation, the configuration parameters further include a splitting speed.
In a specific implementation, the splitting speed may be a speed value or a speed range. The video processing system can determine the splitting speed according to the speed value or speed range input by the user, and the splitting speed may be one of a set of preset ranges, such as 0-1 s, 1-5 s, 5-10 s, 10-15 s, 15-20 s, 20-30 s, and so on. For example, if the speed value input by the user is 3 s, the splitting speed may be 1-5 s; if the speed range input by the user is 4-8 s, the splitting speed may be 5-10 s.
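The mapping from a user-supplied value or range onto a preset speed bucket can be sketched as follows. The bucket boundaries reproduce the example ranges above; the function name and the choice of snapping a range by its upper end are assumptions for illustration:

```python
# Preset splitting-speed buckets, in seconds, from the example above.
PRESET_BUCKETS = [(0, 1), (1, 5), (5, 10), (10, 15), (15, 20), (20, 30)]

def resolve_speed_bucket(speed):
    """Map a single value like 3 or a range like (4, 8) to the first preset
    bucket containing it; a range is placed by its upper end so the whole
    requested range fits inside the chosen bucket."""
    upper = speed[1] if isinstance(speed, tuple) else speed
    for low, high in PRESET_BUCKETS:
        if low < upper <= high:
            return (low, high)
    return PRESET_BUCKETS[-1]  # clamp anything beyond 30 s to the last bucket

print(resolve_speed_bucket(3))       # a 3 s value falls in the 1-5 s bucket
print(resolve_speed_bucket((4, 8)))  # a 4-8 s range resolves to 5-10 s
```

This reproduces the two worked examples above: an input of 3 s resolves to the 1-5 s bucket, and an input of 4-8 s resolves to the 5-10 s bucket.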
By implementing the above implementation and obtaining the splitting speed input by the user, the video can be split at that speed, further meeting the user's needs and improving the user experience. It should be understood that the faster the video is split, the lower the splitting precision. Some users' needs emphasize splitting speed while others emphasize splitting precision; letting users choose the splitting speed according to their own needs can improve the user experience.
In a possible implementation, the video type includes an unknown type. When the video type is the unknown type, the method may further include the following steps: the video processing system performs type detection on the video to obtain a detected type of the video, splits the video according to the detected type and the configuration parameters, and outputs multiple short videos.
Optionally, when the configuration interface does not obtain a splitting feature input by the user, the video processing system performs feature detection on the video to obtain a splitting feature of the video, and splits the video according to that splitting feature.
Optionally, the configuration interface can display preset splitting speeds for the user to choose from. If the user does not select a splitting speed, the video processing system uses a default splitting speed or the user's historical splitting speed as the splitting speed input by the user, splits the video at that speed, and outputs multiple short videos.
By implementing the above implementation, if the user does not input, or cannot determine, the required video type, splitting feature, or splitting speed, the video processing system can detect the video to obtain its video type, splitting feature, or splitting speed, predicting the user's likely splitting needs and improving the user experience.
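The fallback behaviour for a missing splitting speed described above can be sketched as a small helper. The default value, the preference for the most recent historical choice, and the function name are all assumptions made for illustration:

```python
# Hypothetical default splitting-speed bucket, in seconds.
DEFAULT_SPLITTING_SPEED = (5, 10)

def effective_splitting_speed(user_speed, history):
    """Return the speed to use for splitting.

    history is a list of the user's past splitting speeds, oldest first.
    If the user selected nothing, fall back to the most recent historical
    speed, and failing that to the system default.
    """
    if user_speed is not None:
        return user_speed
    if history:
        return history[-1]  # most recent historical splitting speed
    return DEFAULT_SPLITTING_SPEED
```

A comparable fallback could be applied to the video type and splitting feature, with detection results taking the place of history.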
In a possible implementation, the video processing system includes multiple splitting models, where one splitting model corresponds to one video type. The video processing system obtains the splitting model corresponding to the video type, inputs the video into that splitting model, and outputs the multiple short videos obtained after splitting.
In a specific implementation, the video and the splitting feature are input into the splitting model corresponding to the video type, and the multiple short videos obtained after splitting are output. The splitting model is obtained by training a machine learning model with a sample set, where the sample set includes sample input data and sample output data: the sample input data includes known videos and known splitting features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features.
Optionally, when the video type is the unknown type, if the video processing system successfully performs type detection on the video and obtains the detected type, the video processing system can use the splitting model corresponding to the detected type to split the video and output multiple short videos. If type detection fails and no detected type is obtained, or type detection succeeds but the confidence of the detected type is very low, a generic splitting model can be used to split the video and output multiple short videos, where the generic splitting model may be a splitting model common to multiple video types.
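The selection-with-fallback logic described above can be sketched as follows. The registry contents, the 0.5 confidence threshold, and the string placeholders standing in for trained models are all illustrative assumptions, since the text does not fix concrete values:

```python
# Hypothetical registry mapping video types to their splitting models;
# strings stand in for trained model objects.
SPLITTING_MODELS = {"news": "news_splitter", "variety": "variety_splitter"}
GENERIC_MODEL = "generic_splitter"  # model common to multiple video types
CONFIDENCE_THRESHOLD = 0.5          # assumed cut-off for "very low" confidence

def select_splitting_model(detected_type, confidence):
    """Pick the type-specific model, or fall back to the generic one when
    detection failed, is unreliable, or the type has no dedicated model."""
    if detected_type is None or confidence < CONFIDENCE_THRESHOLD:
        return GENERIC_MODEL
    return SPLITTING_MODELS.get(detected_type, GENERIC_MODEL)
```

For example, a video confidently detected as "news" would be split by the news model, while a failed or low-confidence detection falls through to the generic model.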
In the above implementation, one splitting model corresponds to one video type, which can meet users' needs in different application scenarios. Moreover, the machine learning models for different video types may use the same or different model structures, which can be determined according to the respective video types. For example, video types with similar application scenarios may use similar or identical machine learning model structures, with the sample sets used for training corresponding to the respective video types, thereby reducing the workload of model building and improving the efficiency of preparing the splitting models.
In a possible implementation, the above splitting model can split the video according to different splitting features. For example, assuming the video type selected by the user is "variety show", the corresponding splitting model is the variety-show splitting model. If the splitting feature selected by the user is "character" and the uploaded video is one episode of a talent-show variety program, the video can be split according to the character feature, and the short videos obtained may be all the performance clips of contestant A in the variety show.
Optionally, the above sample set includes sample input data and sample output data, where the sample input data includes known videos and known splitting features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features. The trained splitting model can split a video according to the splitting feature input by the user.
In a specific implementation, the above machine learning model may include, but is not limited to, a CNN, an LSTM, a Yolo model, an SSD model, an RCNN model, or a Fast-RCNN model, which is not specifically limited in this application.
In the above implementation, the splitting model can split the video according to different splitting features, so that the needs of different users for the same video type can all be met. It should be understood that, for the same video type, different users focus on different things when splitting a video. For variety-show videos, for example, some users only want to watch performance clips of their favorite actors and stars, some only want to watch dancing clips, and some only want to watch singing clips. This application obtains the required splitting feature from the user through the above configuration interface and splits the video according to that feature, which can meet users' diverse needs.
In a possible implementation, the splitting model for each video type may include multiple speed splitting models, where one speed splitting model corresponds to one splitting speed. The splitting unit 220 can determine the corresponding splitting model according to the video type obtained through the configuration interface, then determine the speed splitting model corresponding to that splitting model according to the splitting speed obtained through the configuration interface, and then use that speed splitting model to split the video to obtain multiple short videos. In a specific implementation, the structures of the multiple speed splitting models under each video type may be the same or different, which can be determined according to the actual processing situation and is not specifically limited in this application.
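The two-level selection above, video type first and splitting speed second, can be sketched as a nested lookup. The registry contents are placeholders invented for illustration; in practice each entry would be a trained speed splitting model:

```python
# Hypothetical registry: video type -> {speed bucket -> speed splitting model}.
MODEL_REGISTRY = {
    "variety": {(1, 5): "variety_fast", (5, 10): "variety_precise"},
    "news": {(1, 5): "news_fast", (5, 10): "news_precise"},
}

def select_speed_model(video_type, speed_bucket):
    speed_models = MODEL_REGISTRY[video_type]  # splitting model for the type
    return speed_models[speed_bucket]          # its variant for the speed
```

This reflects the speed/precision trade-off noted earlier: a faster bucket selects a faster but less precise variant of the same type's splitting model.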
In the above implementation, the splitting model for each video type includes multiple speed splitting models, with different splitting speeds corresponding to different speed splitting models, further meeting users' needs and improving the user experience.
In a second aspect, a video processing system is provided. The system includes: an acquisition unit, configured to obtain configuration parameters and a video from a user through a configuration interface, where the configuration parameters include the video type of the video; and a splitting unit, configured to output multiple short videos, where the multiple short videos are obtained by splitting the video according to the configuration parameters.
By implementing the system described in the second aspect, the configuration parameters input by the user are obtained through the configuration interface, the configuration parameters including at least the video type of the video; a pre-trained splitting model is selected according to the video type to split the video input by the user, and the multiple short videos obtained after splitting are output, where the splitting model corresponds to the video type input by the user. In this way, the multiple short videos can meet the user's diverse needs: for whatever scenario the user needs splitting, the user inputs the corresponding configuration parameters, thereby realizing a video splitting model that is universal across multiple scenarios and meets user needs, and improving the user experience.
In a possible implementation, the video type includes one or more of film and television drama, variety show, news, documentary, interview, sports, animation, and conference.
In a possible implementation, the configuration parameters further include a splitting feature, and the splitting feature includes one or more of scene, character, audio, subtitle, action, optical character recognition (OCR), and appearance.
In a possible implementation, the configuration parameters further include a splitting speed.
In a possible implementation, the system further includes a detection unit, and the video type includes an unknown type. When the video type is the unknown type, the detection unit is configured to perform type detection on the video to obtain a detected type of the video;
the splitting unit is configured to split the video according to the detected type and the configuration parameters, and output multiple short videos.
In a possible implementation, when the configuration interface does not obtain a splitting feature input by the user, the detection unit is configured to perform feature detection on the video to obtain a splitting feature of the video, so that the video is split according to that splitting feature.
In a possible implementation, the video processing system includes multiple splitting models, where one splitting model corresponds to one video type, and the splitting unit is configured to obtain the splitting model corresponding to the video type, input the video into that splitting model, and output the multiple short videos obtained after splitting.
In a possible implementation, the splitting unit is configured to input the video and the splitting feature into the splitting model corresponding to the video type, and output the multiple short videos obtained after splitting, where the splitting model is obtained by training a machine learning model with a sample set; the sample set includes sample input data and sample output data, where the sample input data includes known videos and known splitting features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features.
In a third aspect, a computing device is provided, including a processor and a memory. The memory is configured to store code, and the processor is configured to execute the code to perform the functions of the modules in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, a computer storage medium is provided. The computer storage medium stores instructions that, when run on a computing device, cause the computing device to perform the methods described in the above aspects.
In a fifth aspect, a program product containing instructions is provided, including a program or instructions that, when run on a computing device, cause the computing device to perform the methods described in the above aspects.
On the basis of the implementations provided in the above aspects, this application can further combine them to provide more implementations.
Description of the drawings
Figure 1 is a schematic architectural diagram of a video processing system provided by this application;
Figure 2 is a schematic diagram of a splitting model in a video processing system provided by this application;
Figure 3 is a schematic flowchart of the steps of a video processing method provided by this application;
Figure 4 is an example diagram of a configuration interface in a video processing system provided by this application;
Figure 5 is a schematic flowchart of the steps of a video processing system in an application scenario provided by this application;
Figure 6 is a schematic structural diagram of a computing device provided by this application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
First, the video splitting technology involved in this application is described.
With the rapid growth of short-video content, video viewers' patience for watching long videos is gradually decreasing. Video splitting technology emerged to provide video viewers with more engaging video highlights and to improve the utilization of users' fragmented time. Video splitting is a secondary processing of the original video content that splits it into several video clips as needed, so that users can watch the clips they are interested in on demand. Video splitting technology can deeply mine the valuable information points in long videos and help users understand videos better and faster.
However, video splitting technology is usually limited to a particular scenario. For example, video splitting for news videos is usually implemented based on features such as news titles and news shot transitions, while video splitting for film and television dramas is usually implemented based on their subtitles. As a result, most video splitting technologies can only work in their corresponding application scenarios, so each has a single application scenario and one video splitting model cannot split multiple types of videos. Consequently, a platform performing video splitting needs to customize a video splitting model for each application scenario, which is costly and inefficient.
Moreover, within a particular scenario, users' needs are diverse. In a news-video scenario, for example, some users need fast processing, some need a large number of video splits, some need precise video splitting, and some need news videos with specific content. Yet the video splitting technology for a particular scenario is usually a generic model for that scenario, which cannot meet users' diverse needs and has low flexibility.
In summary, since video splitting technology can only work in its corresponding single application scenario, and video splitting technology for a single scenario often cannot meet users' diverse needs, how to realize a video splitting model that is universal across multiple scenarios and meets user needs is an urgent problem to be solved.
To solve the above problems, this application provides a video processing system. Figure 1 is a schematic architectural diagram of a video processing system provided by this application. As shown in Figure 1, the architecture includes a client 100, a video processing system 200, and a storage server 300, where communication connections can be established among the client 100, the video processing system 200, and the storage server 300; the connections may be wired or wireless, which is not specifically limited in this application. Moreover, the number of clients 100 and storage servers 300 may each be one or more; Figure 1 takes one client 100 and one storage server 300 as an example, and this application does not specifically limit this.
The client 100 can run on a terminal device held by the user. The terminal device may be a computer, a smartphone, a handheld processing device, a tablet, a mobile notebook, an augmented reality (AR) device, a virtual reality (VR) device, an integrated handheld console, a wearable device, a vehicle-mounted device, a smart conference device, a smart advertising device, a smart home appliance, and so on, which is not specifically limited here.
In a specific implementation, the client 100 may be an application client, a web-based client in a browser, an application (APP) client, or an application programming interface (API), which is not specifically limited in this application.
The video processing system 200 can be deployed on a computing device, which may be a bare metal server (BMS), a virtual machine, or a container. A BMS refers to a general-purpose physical server, for example, an ARM server or an X86 server; a virtual machine refers to a complete computer system with full hardware system functionality, simulated by software using network functions virtualization (NFV) technology and running in a completely isolated environment; a container refers to a group of processes that are subject to resource limits and isolated from one another. The computing device may also be an edge computing device, a storage server, or a storage array, which is not specifically limited in this application.
The storage server 300 may be a server with a storage function. The server may be a physical server, such as an ARM server or an X86 server, or a virtual machine, which is not specifically limited in this application. The storage server 300 may be a storage server of a video platform (such as a TV station, a video website, or a live-streaming platform) or of a public cloud platform, and is used to store videos to be split and the short videos obtained after splitting.
Optionally, the video processing system 200 may also be deployed on the storage server 300; in other words, the storage server 300 has the video splitting function. The video processing system 200 and the client 100 may also both be deployed on the storage server 300, which is not specifically limited in this application. Alternatively, the client 100 may be deployed on the storage server 300 while the video processing system is deployed on another server, or, as shown in Figure 1, the client 100 and the video processing system 200 may be deployed on servers other than the storage server 300, which is not specifically limited in this application.
In this embodiment of the present application, the client 100 can upload a video to the video processing system 200 for video splitting. The video processing system 200 splits the video to obtain multiple short videos and then returns them to the client 100 or stores them in the storage server 300. Of course, the storage server 300 can also send a video to the video processing system 200 for video splitting; the video processing system splits the video to obtain multiple short videos and then returns them to the storage server 300, or returns them to the client 100 for use, which can be determined according to the actual application scenario and is not specifically limited in this application.
Optionally, the video processing system 200 may also be deployed in a public cloud to provide users with a cloud service for video splitting. For example, when purchasing a content delivery network (CDN) service, a user can opt in to a video splitting service, and the public cloud platform can use the video processing system 200 to split some of the videos distributed in the CDN network according to the user's needs. It should be understood that the above example is for illustration and is not specifically limited in this application.
Further, the video processing system 200 can be divided into multiple unit modules. Each unit module may be a software module or a hardware module, or partly a software module and partly a hardware module, which is not specifically limited in this application. Figure 1 shows an exemplary division. As shown in Figure 1, the video processing system 200 may include an acquisition unit 210, a splitting unit 220, and a splitting model 230.
The acquisition unit 210 is configured to obtain configuration parameters and a video from the user through the configuration interface, where the video may be a long video that the user needs to split, such as an episode of a variety show, a documentary, or an interview program.
In a specific implementation, the configuration interface may be an application page, a web page, or an API through which the user interacts with the video processing system 200. The video processing system 200 can display the application page or web page on the screen of the client 100, or provide API parameters to the user, who can use the API parameters to integrate the video processing system 200 into a third-party system for secondary development.
It should be noted that the above user may be a user of the splitting service. For example, a video-website user can input configuration parameters and videos through an application page or web page to use the video website's splitting service to split different types of videos. The above example is for illustration and is not specifically limited in this application.
The above user may also be a developer who integrates the splitting service into a third-party system for secondary development. For example, if the video processing system 200 is deployed in a public cloud, the configuration interface may be the console or API of the public cloud platform. The console may be a web-based service management system through which the user can purchase cloud services and connect to a cloud service instance with the functions of the video processing system 200; the API can be integrated by the user into a third-party system for secondary development. For example, a short-video platform can connect the API of the configuration interface with its internal server used to store long videos, so that long videos uploaded by users can automatically be split through the API. It should be understood that the above example is for illustration and is not specifically limited in this application.
If the video processing system 200 is deployed in a video website, the configuration interface may be the console of the video website. Users can upload videos and input the above configuration parameters through the console, so that the video processing system 200 in the video website splits the videos according to the configuration parameters to obtain multiple short videos. It should be understood that the above example is for illustration and is not specifically limited in this application.
Optionally, the configuration parameters may include the video type of the video, and the video type may include one or more of film and television drama, variety show, news, documentary, interview, sports, animation, and conference. It should be understood that video types may be divided into more categories according to the user's business scenario, which are not enumerated here one by one. The user may input whichever video type needs to be split; for example, if the user needs to split a film or TV drama video, the video type may be selected as film and television drama.
Optionally, the video type includes an unknown type. The unknown type may refer to a video type that the user cannot determine, or may mean that the user did not input a video type, that is, the configuration interface did not obtain a video type. When the video type is unknown, the detection unit 240 may perform type detection on the video to obtain a detected type of the video. The splitting unit 220 of the video processing system 200 may then split the video according to the configuration parameters and the detected type, and output multiple short videos.
Optionally, the configuration parameters may further include a splitting feature, and the splitting feature may include one or more of scene, person, audio, subtitle, action, optical character recognition (OCR), and appearance.
Here, scene means splitting the video by different scenes, so that each resulting short video belongs to one scene. For example, for a documentary about a school with the splitting feature set to scene, short video 1 may be the segments whose scene is a classroom, short video 2 the segments whose scene is a dormitory, short video 3 the segments whose scene is a playground, and so on. This application does not impose specific limitations.
Person means splitting the video by different persons, so that each resulting short video features one person. For example, for a talent-show variety program with the splitting feature set to person, short video 1 may be the performance segments of contestant A, short video 2 the performance segments of contestant B, and so on. This application does not impose specific limitations.
Subtitle means splitting the video according to subtitles, which requires text recognition technology to determine the splitting boundaries from the semantics of the subtitles. For example, for a news video with the splitting feature set to subtitle, short video 1 may be the segment of news item 1, short video 2 the segment of news item 2, and so on. This application does not impose specific limitations.
Action means splitting the video by different actions, so that each resulting short video contains one kind of action. For example, for a gala program with the splitting feature set to action, short video 1 may be all the dance segments, short video 2 the singing segments, short video 3 the sketch segments, and so on. This application does not impose specific limitations.
OCR means splitting the video with the help of image text recognition technology, for example recognizing pictures in certain scenes of the video, such as billboards and traffic signs, to determine the meaning of the scene. This application does not impose specific limitations.
Appearance means splitting the video by different appearances, so that each resulting short video shares one appearance, for example the same clothes or the same hat. This application does not impose specific limitations.
It should be understood that the splitting features may be divided into more feature types according to the user's business scenario, which are not enumerated here one by one. The user may input whichever splitting feature is needed to split the video. For example, if the user wants all the segments of each actor in a drama series, person may be input into the configuration interface as the splitting feature; if the user wants the dance segments of a variety-show video, action may be input into the configuration interface as the splitting feature.
It should be understood that each splitting feature may be further subdivided. For example, the splitting feature "action" may be subdivided into "dancing", "running", "conflict", and so on, and "audio" may be subdivided into "singing", "quarreling", and so on. Continuing the above example, if the user needs the "dancing" segments of a video, the user may select "action" among the splitting features and then select the "dancing" feature under the "action" category. It should be understood that the above example is for illustration and is not specifically limited in this application.
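The two-level feature selection described above can be sketched as a small taxonomy lookup. This is a minimal illustrative sketch: the function name and the particular subcategories are taken from the examples in the text, not from any actual implementation.

```python
# Illustrative two-level splitting-feature taxonomy; the categories and
# subcategories shown are the examples given in the text above.
FEATURE_TAXONOMY = {
    "action": ["dancing", "running", "conflict"],
    "audio": ["singing", "quarreling"],
}

def resolve_feature(category, sub=None):
    """Validate a user's selection, e.g. ('action', 'dancing')."""
    subs = FEATURE_TAXONOMY.get(category)
    if subs is None:
        raise ValueError(f"unknown feature category: {category}")
    if sub is not None and sub not in subs:
        raise ValueError(f"unknown subfeature: {sub}")
    return (category, sub)
```

A user wanting the "dancing" segments would thus select the "action" category first and "dancing" within it.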
In a specific implementation, the configuration interface may present multiple splitting features for the user to choose from. If the user cannot determine a splitting feature, or does not make a selection, that is, the configuration interface does not obtain a splitting feature, the detection unit 240 may analyze the video to obtain a splitting feature for it, where that splitting feature may be the feature type most commonly used for this video type, or the feature type the user has historically input for this video type. This application does not impose specific limitations.
Optionally, the configuration parameters may further include a splitting speed, which may be a speed value or a speed range. The video processing system 200 may determine the splitting speed according to the speed value or speed range input by the user, and the splitting speed may be one of preset ranges, such as 0–1 s, 1–5 s, 5–10 s, 10–15 s, 15–20 s, 20–30 s, and so on. For example, if the speed value input by the user is 3 s, the splitting speed may be 1–5 s; if the speed range input by the user is 4–8 s, the splitting speed may be 5–10 s.
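The mapping from a user-supplied value or range onto a preset splitting-speed range can be sketched as follows. This is a minimal sketch for illustration only: the midpoint rule used to match an input range is an assumption consistent with the 4–8 s example above, not a rule stated by this application.

```python
# Preset splitting-speed ranges, in seconds, as listed above.
PRESET_RANGES = [(0, 1), (1, 5), (5, 10), (10, 15), (15, 20), (20, 30)]

def pick_splitting_speed(lo, hi=None):
    """Return the preset (low, high) range covering the input.

    A single value selects the preset range containing it (3 s -> 1-5 s);
    an input range such as 4-8 s is matched by its midpoint, giving 5-10 s.
    """
    point = lo if hi is None else (lo + hi) / 2.0
    for low, high in PRESET_RANGES:
        if low <= point <= high:
            return (low, high)
    return PRESET_RANGES[-1]  # clamp inputs beyond 30 s
```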
In a specific implementation, the configuration interface may present the preset splitting speeds for the user to choose from. If the user does not select a splitting speed, the detection unit 240 may split the video using a default splitting speed or the user's historical splitting speed. This application does not impose specific limitations.
Optionally, the configuration interface may include a parameter interface and a video interface, where the parameter interface obtains the configuration parameters and the video interface obtains the user's videos. The user may first configure parameters through the parameter interface and then upload multiple videos through the video interface for splitting. For example, the user sets the video type to TV drama, the splitting feature to actor A, and the splitting speed to 1–5 s, and then uploads the 24 episodes of a TV series one by one. The video processing system 200 may split the 24 videos in turn according to the configuration parameters and output the short videos obtained from each episode, where the content of each short video is a performance segment of actor A. It should be understood that the above example is for illustration and is not specifically limited in this application.
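The configure-once, upload-many flow above can be sketched as follows. All names here (SplitConfig, submit, the episode file names) are hypothetical placeholders for illustration; this is not an actual API of the system.

```python
from dataclasses import dataclass

@dataclass
class SplitConfig:
    video_type: str     # e.g. "tv_drama"; may be "unknown"
    feature: str        # e.g. "person:actor_A"
    speed_range: tuple  # preset range in seconds, e.g. (1, 5)

def submit(config, videos):
    """Queue every uploaded video for splitting under one configuration."""
    return [{"video": v, "config": config, "status": "queued"} for v in videos]

# Parameter interface: set the configuration once.
cfg = SplitConfig("tv_drama", "person:actor_A", (1, 5))
# Video interface: upload the 24 episodes against that configuration.
jobs = submit(cfg, [f"episode_{i:02d}.mp4" for i in range(1, 25)])
```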
The splitting unit 220 is configured to output multiple short videos, where the multiple short videos are obtained by splitting the video according to the above configuration parameters. In a specific implementation, the splitting unit 220 may output the multiple short videos to the client 100 of the user, or output them to the storage server 300. This application does not impose specific limitations.
In a specific implementation, the video processing system 200 may include multiple splitting models 230, where one splitting model corresponds to one video type. The splitting unit 220 may determine the corresponding splitting model according to the video type obtained through the configuration interface, split the video using that splitting model, and output multiple short videos. For example, the video types may include film and television drama, variety show, news, and documentary: splitting model 1 corresponds to film and television drama, splitting model 2 to variety show, splitting model 3 to news, and splitting model 4 to documentary.
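The one-model-per-type correspondence in the example above amounts to a simple lookup. This is an illustrative sketch only; the dictionary keys and model names are placeholders for trained splitting models, not real identifiers from the system.

```python
# One splitting model per video type, following the example above.
SPLITTING_MODELS = {
    "drama": "splitting_model_1",
    "variety": "splitting_model_2",
    "news": "splitting_model_3",
    "documentary": "splitting_model_4",
}

def select_model(video_type):
    """Pick the splitting model matching the type from the configuration interface."""
    return SPLITTING_MODELS[video_type]
```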
It should be understood that the above splitting models may be obtained by training machine learning models with sample sets of different video types. For example, training a machine learning model with a sample set of film and television dramas yields a splitting model for the film and television drama type, training with a sample set of news yields a splitting model for the news type, and so on, yielding splitting models for multiple video types.
It should be noted that the model structures used by the machine learning models of different video types may be the same or different, which may be determined according to the respective video types. For example, video types with similar application scenarios may use similar or identical machine learning model structures, with the training sample sets simply corresponding to the respective video types, thereby reducing the workload of model construction and improving the efficiency of preparing the splitting models.
For example, in interview programs and conferences, the persons in the video are usually only the host and the guests (or participants), whose actions do not change much. Splitting such videos usually focuses on subtitles or audio rather than on person actions or scene changes, so the machine learning model structure for these video types may emphasize the extraction and recognition of speech and text features rather than image recognition, and the machine learning models of these two video types may use the same or similar model structures.
As another example, variety shows and film and television dramas usually contain many persons, constantly changing scenes, and varied actions, so the machine learning model structure for these video types may emphasize the extraction and recognition of scene, face, and action features rather than speech and text recognition. It should be understood that the above examples are for illustration and are not specifically limited in this application.
Optionally, the above splitting models may split a video according to different splitting features. For example, suppose the video type selected by the user is variety show, so the corresponding splitting model is the variety-show splitting model. If the splitting feature selected by the user is person and the uploaded video is one episode of a talent-show variety program, the video may be split by person features, and a resulting short video may contain all the performance segments of contestant A in that program. If the splitting feature selected by the user is action, the video may be split by action features, and the resulting short videos may be the segments in which that action appears; for example, if the user sets the action to dancing, the resulting short video may be a compilation of the dance performances in that program. It should be understood that the above examples are for illustration and are not specifically limited in this application.
Optionally, the above sample set includes sample input data and sample output data, where the sample input data includes known videos and known splitting features, and the sample output data includes multiple known short videos obtained by splitting the known videos using the known splitting features. The trained splitting model can then split a video according to the splitting feature input by the user.
It should be understood that, for the same video type, different users have different concerns when splitting videos. For variety-show videos, for example, some users only want to watch the performance segments of their favorite actors and stars, some only want to watch the dance segments, and some only want to watch the singing segments. This application obtains the required splitting feature from the user through the above configuration interface and splits the video according to that splitting feature, which can satisfy the diverse needs of users.
In a specific implementation, the above machine learning models may include, but are not limited to, a convolutional neural network (CNN) model, a long short-term memory (LSTM) model, a you-only-look-once unified real-time object detection (YOLO) model, a single shot multibox detector (SSD) model, a region-based convolutional neural network (RCNN) model, or a fast region-based convolutional neural network (Fast-RCNN) model, which is not specifically limited in this application.
Optionally, the splitting model under each video type may include multiple speed splitting models, where one speed splitting model corresponds to one splitting speed. The splitting unit 220 may determine the corresponding splitting model according to the video type obtained through the configuration interface, determine the speed splitting model corresponding to that splitting model according to the splitting speed obtained through the configuration interface, and then split the video using that speed splitting model to obtain multiple short videos. In a specific implementation, the structures of the multiple speed splitting models under each video type may be the same or different, which may be determined according to the actual processing situation and is not specifically limited in this application.
Illustratively, FIG. 2 is an example diagram of the splitting models stored in the video processing system provided by this application. As shown in FIG. 2, the multiple splitting models 230 in the video processing system 200 shown in FIG. 1 may be splitting model 11, splitting model 12, splitting model 21, and splitting model 22 in FIG. 2, where the video type of splitting model 11 and splitting model 12 is type 1, the video type of splitting model 21 and splitting model 22 is type 2, the splitting speed of splitting model 11 and splitting model 21 is speed 1, and the splitting speed of splitting model 12 and splitting model 22 is speed 2.
Each splitting model corresponds to a different combination of video type and splitting speed, and the corresponding splitting model may be selected for video splitting according to the configuration parameters input by the user. The input data of each splitting model includes the video to be split and a splitting feature, and the output data is the multiple short videos obtained by splitting the video using that splitting feature. For example, inputting splitting feature 1 and a video into splitting model 11 yields multiple short videos of splitting feature 1, inputting splitting feature 2 and a video into splitting model 11 yields multiple short videos of splitting feature 2, and so on, which are not enumerated here. For example, if the video type selected by the user through the configuration interface is video type 1, the splitting feature is splitting feature 2, and the splitting speed is speed 2, the video processing system may select splitting model 12 in FIG. 2 according to the configuration parameters input by the user, and input splitting feature 2 and the video into splitting model 12 to obtain multiple short videos of splitting feature 2. In this way, the multiple short videos finally output are obtained by splitting the video according to the video type together with the splitting feature and splitting speed required by the user, which satisfies the diverse needs of users to the greatest extent and improves the user experience.
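The two-level selection illustrated by FIG. 2 can be sketched as a lookup keyed by video type and splitting speed. The keys and model names below follow the figure's example; they are illustrative placeholders, not identifiers from an actual implementation.

```python
# Model grid from FIG. 2: the video type and splitting speed together
# select one splitting model.
SPEED_MODELS = {
    ("type1", "speed1"): "splitting_model_11",
    ("type1", "speed2"): "splitting_model_12",
    ("type2", "speed1"): "splitting_model_21",
    ("type2", "speed2"): "splitting_model_22",
}

def select_speed_model(video_type, speed):
    """Select the splitting model for a (video type, splitting speed) pair."""
    return SPEED_MODELS[(video_type, speed)]
```

For the worked example above, `select_speed_model("type1", "speed2")` picks splitting model 12, into which the splitting feature and the video are then input.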
It should be understood that FIG. 2 is used for illustration; the splitting models 230 in the video processing system 200 may also include more or fewer video types, splitting speeds, and splitting features, which are not specifically limited in this application.
In summary, the video processing system provided by this application obtains, through the configuration interface, the configuration parameters input by the user, where the configuration parameters include at least the video type of the video, selects a pre-trained splitting model according to the video type to split the video input by the user, and outputs the multiple short videos after splitting, where the splitting model corresponds to the video type input by the user. The multiple short videos can thus satisfy the diverse needs of users: whatever scenario the user needs to split, the user inputs the corresponding configuration parameters, thereby realizing a video splitting model that is universal across multiple scenarios and meets user needs, and improving the user experience.
FIG. 3 is a schematic flowchart of the steps of a video processing method provided by this application. The method may be applied to the video processing system 200 shown in FIG. 1. As shown in FIG. 3, the method may include the following steps:
Step S310: The video processing system 200 obtains configuration parameters and a video from the user through the configuration interface. The video may be a long video that the user needs to split, such as one episode of a TV series, one episode of a variety show, an interview recording, a documentary, and so on.
The video processing system 200 may be deployed on a server or in a public cloud, and the server may be one of a physical server, a virtual machine, a container, or an edge computing device. For the specific deployment manner, reference may be made to the description of the video processing system 200 in the embodiment of FIG. 1, which is not repeated here.
In a specific implementation, the above configuration interface may be an application page, a web page, or an API through which the user interacts with the video processing system 200. The video processing system 200 may display the application page or web page on the screen of the client 100, or provide API parameters to the user, and the user may use the API parameters to integrate the video processing system 200 into a third-party system for secondary development. For details, reference may be made to the description of the configuration interface in the embodiment of FIG. 1, which is not repeated here.
Optionally, the configuration parameters may include a video type, and the video type may include one or more of film and television drama, variety show, news, documentary, interview, sports, animation, and conference. It should be understood that video types may be divided into more categories according to the user's business scenario, which are not enumerated here one by one. The user may input whichever video type needs to be split; for example, if the user needs to split a film or TV drama video, the video type may be selected as film and television drama.
Optionally, the video type includes an unknown type. The unknown type may refer to a video type that the user cannot determine, or may mean that the user did not input a video type, that is, the configuration interface did not obtain a video type. When the video type is unknown, the video processing system 200 may perform type detection on the video to obtain a detected type of the video, and may then split the video according to the configuration parameters and the detected type and output multiple short videos.
Optionally, the configuration parameters may further include a splitting feature, and the splitting feature may include one or more of scene, person, audio, subtitle, action, OCR, and appearance. It should be understood that the splitting features may be divided into more feature types according to the user's business scenario, which are not enumerated here one by one. The user may input whichever splitting feature is needed to split the video. For example, if the user wants all the segments of each actor in a drama series, person may be input into the configuration interface as the splitting feature; if the user wants the dance segments of a variety-show video, action may be input into the configuration interface as the splitting feature.
It should be understood that each splitting feature may be further subdivided. For example, the splitting feature "action" may be subdivided into "dancing", "running", "conflict", and so on, and "audio" may be subdivided into "singing", "quarreling", and so on. Continuing the above example, if the user needs the "dancing" segments of a video, the user may select "action" among the splitting features and then select the "dancing" feature under the "action" category. It should be understood that the above example is for illustration and is not specifically limited in this application.
In a specific implementation, the configuration interface may present multiple splitting features for the user to choose from. If the user cannot determine a splitting feature, or does not make a selection, that is, the configuration interface does not obtain a splitting feature, the video processing system 200 may analyze the video to obtain a splitting feature for it, where that splitting feature may be the feature type most commonly used for this video type, or the feature type the user has historically input for this video type, and then split the video according to that splitting feature. This application does not impose specific limitations.
Optionally, the configuration parameters may further include a splitting speed, which may be a speed value or a speed range. The video processing system 200 may determine the splitting speed according to the speed value or speed range input by the user, and the splitting speed may be one of preset ranges, such as 0–1 s, 1–5 s, 5–10 s, 10–15 s, 15–20 s, 20–30 s, and so on. For example, if the speed value input by the user is 3 s, the splitting speed may be 1–5 s; if the speed range input by the user is 4–8 s, the splitting speed may be 5–10 s.
In a specific implementation, the configuration interface may present the preset splitting speeds for the user to choose from. If the user does not select a splitting speed, the video processing system 200 may split the video using a default splitting speed or the user's historical splitting speed. This application does not impose specific limitations.
步骤S320:视频处理系统200输出多个短视频,其中,上述多个短视频是根据配置参数对视频进行拆条后获得的,视频处理系统200可以将多个短视频输出至客户端100,也可以将多个短视频输出至存储服务器300,本申请不作具体限定,其中,客户端100和存储服务器300的具体描述可参考图1实施例,这里不重复赘述。Step S320: The video processing system 200 outputs multiple short videos, wherein the multiple short videos are obtained by splitting the videos according to the configuration parameters. The video processing system 200 can output the multiple short videos to the client 100, or Multiple short videos can be output to the storage server 300, which is not specifically limited in this application. For specific descriptions of the client 100 and the storage server 300, reference can be made to the embodiment of FIG. 1, and the details will not be repeated here.
具体实现中,视频处理系统200中可包括多个拆条模型,一个拆条模型对应一种视频类型。拆条单元220可以根据配置接口获取的视频类型确定对应的拆条模型,使用该拆条模型对视频进行拆条,输出多个短视频。In specific implementation, the video processing system 200 may include multiple splitting models, and one splitting model corresponds to one video type. The stripping unit 220 can determine the corresponding stripping model according to the video type obtained by the configuration interface, use the stripping model to strip the video, and output multiple short videos.
应理解,上述拆条模型可以是使用不同视频类型的样本集对机器学习模型进行训练后获得的,比如影视剧类型的样本集对机器学习模型进行训练后获得影视剧视频类型的拆条模型,新闻类型的样本集对机器学习模型进行训练后获得新闻视频类型的拆条模型,以此类推,获得多种视频类型的拆条模型。It should be understood that the above-mentioned bar splitting model can be obtained by training the machine learning model using sample sets of different video types. For example, a bar splitting model of the film and television drama video type is obtained after training the machine learning model with a sample set of film and television drama types. After training the machine learning model on the news type sample set, the news video type stripping model is obtained, and by analogy, the stripping models of multiple video types are obtained.
需要说明的是，不同视频类型的机器学习模型采用的模型结构可以相同或者不同，具体可根据各自对应的视频类型确定。比如应用场景类似的视频类型所采用的机器学习模型结构可以是类似的或者相同的，训练时使用的样本集对应各自的视频类型即可，从而减少模型搭建的工作量，提高拆条模型的准备效率。It should be noted that the model structures used by the machine learning models of different video types may be the same or different, and may be determined according to the respective video types. For example, video types with similar application scenarios may use similar or identical machine learning model structures, with only the training sample sets differing by video type, thereby reducing the workload of model construction and improving the efficiency of preparing the splitting models.
具体实现中，在视频类型为未知类型的情况下，若视频处理系统200对视频进行类型检测成功获得了视频的检测类型，视频处理系统200可以使用该检测类型对应的拆条模型对视频进行拆条，输出多个短视频；若视频处理系统200对视频进行类型检测失败，未获得视频的检测类型，或者类型检测成功但是检测类型的置信度很低，此时可使用通用拆条模型对视频进行拆条，输出多个短视频，上述通用拆条模型可以是多种视频类型通用的拆条模型。In a specific implementation, when the video type is unknown, if the video processing system 200 successfully detects the type of the video, the video processing system 200 may split the video using the splitting model corresponding to the detected type and output multiple short videos. If type detection fails and no detected type is obtained, or type detection succeeds but the confidence of the detected type is very low, a generic splitting model may be used to split the video and output multiple short videos, where the generic splitting model may be a splitting model common to multiple video types.
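The fallback logic just described — use the user's type if given, otherwise detect it, and fall back to the generic model on detection failure or low confidence — can be condensed into one selection routine. This is a minimal sketch; the function names, the `"unknown"` sentinel, and the confidence threshold are assumptions for illustration:

```python
def select_splitting_model(models, generic_model, video_type, detect_fn, video,
                           confidence_threshold=0.5):
    """Pick a splitting model for a video.

    models: dict mapping video type -> splitting model.
    generic_model: fallback used when detection fails or confidence is low.
    detect_fn(video): returns (detected_type, confidence), or None on failure.
    """
    if video_type != "unknown":
        return models[video_type]
    detection = detect_fn(video)
    if detection is None:          # type detection failed
        return generic_model
    detected_type, confidence = detection
    if confidence < confidence_threshold or detected_type not in models:
        return generic_model       # low confidence: use the generic model
    return models[detected_type]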
可选地，上述拆条模型可针对不同的拆条特征对视频进行拆条，举例来说，假设用户选择的视频类型为“综艺”，那么对应的拆条模型为综艺拆条模型，若用户选择的拆条特征为“人物”，上传的视频为选秀类综艺节目的一期视频，那么拆条模型可按照人物特征对视频进行拆条，获得的短视频可以是该综艺中选手A全部的演出片段。Optionally, the above splitting model may split the video according to different splitting features. For example, suppose the video type selected by the user is "variety show"; the corresponding splitting model is then a variety-show splitting model. If the splitting feature selected by the user is "person" and the uploaded video is one episode of a talent-show program, the model may split the video by person, and the resulting short videos may be all of contestant A's performance clips in that show.
可选地，上述样本集包括样本输入数据和样本输出数据，其中，样本输入数据包括已知视频和已知拆条特征，样本输出数据包括使用所述已知拆条特征对所述已知视频进行拆条后获得的多个已知短视频，训练好的拆条模型可根据用户输入的拆条特征对视频进行拆条。Optionally, the above sample set includes sample input data and sample output data, where the sample input data includes a known video and a known splitting feature, and the sample output data includes multiple known short videos obtained by splitting the known video using the known splitting feature. The trained splitting model can then split a video according to the splitting feature input by the user.
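The supervised sample structure described above — a (known video, known splitting feature) input paired with the known short clips as output — might be represented as follows. This is a minimal sketch; the field names are illustrative assumptions, not terms from the application:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingSample:
    """One supervised sample: the model learns to map a (video, splitting
    feature) pair to the known list of short clips produced with that feature."""
    video: str                 # identifier or path of a known long video
    feature: str               # known splitting feature, e.g. "person"
    short_clips: List[str]     # known short videos obtained with that feature
```

A sample set is then simply a list of such records, one per (video, feature) pairing used during training.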
应理解，对于同一个视频类型，不同的用户进行视频拆条时的关注点也是不同的，比如综艺类型的视频，有的用户只想看自己喜爱的演员明星的演出片段，有的用户只想看跳舞片段，有的用户只想看演唱片段，本申请通过上述配置接口向用户获取需求的拆条特征，根据该拆条特征对视频进行拆条，可以满足用户多样化需求。It should be understood that, for the same video type, different users have different concerns when splitting a video. For a variety-show video, for example, some users only want to watch the performance clips of their favorite actors and stars, some only want to watch the dancing clips, and some only want to watch the singing clips. This application obtains the required splitting feature from the user through the above configuration interface and splits the video according to that feature, which can meet users' diverse needs.
具体实现中,上述机器学习模型可包括但不限于比如CNN、LSTM、Yolo模型、SSD模型、RCNN模型或Fast-RCNN模型等,本申请不作具体限定。In specific implementation, the above machine learning models may include but are not limited to CNN, LSTM, Yolo model, SSD model, RCNN model or Fast-RCNN model, etc., which are not specifically limited in this application.
可选地，每个视频类型下的拆条模型可包括多种速度拆条模型，其中，一个速度拆条模型对应一个拆条速度，拆条单元220可以根据配置接口获取的视频类型确定对应的拆条模型，然后根据配置接口获取的拆条速度确定该拆条模型对应的速度拆条模型，然后使用该速度拆条模型对视频进行拆条，获得多个短视频。具体实现中，每种视频类型下的多种速度拆条模型的结构可以相同也可以不同，具体可根据实际处理情况决定，本申请不作具体限定。Optionally, the splitting model under each video type may include multiple speed splitting models, where one speed splitting model corresponds to one strip-splitting speed. The splitting unit 220 may determine the corresponding splitting model according to the video type obtained through the configuration interface, then determine the speed splitting model corresponding to that splitting model according to the strip-splitting speed obtained through the configuration interface, and then split the video using that speed splitting model to obtain multiple short videos. In a specific implementation, the structures of the multiple speed splitting models under each video type may be the same or different, which may be decided according to actual processing conditions and is not specifically limited in this application.
示例性地，图2是本申请提供的视频处理系统中存储的拆条模型示例图，如图2所示，图1所示的视频处理系统200包括多种视频的拆条模型230，例如图2中拆条模型11、拆条模型12、拆条模型21和拆条模型22，其中，拆条模型11和拆条模型12的视频类型为类型1，拆条模型21和拆条模型22的视频类型为类型2，拆条模型11和拆条模型21的拆条速度为速度1，拆条模型12和拆条模型22的拆条速度为速度2。Exemplarily, FIG. 2 is an example diagram of the splitting models stored in the video processing system provided by this application. As shown in FIG. 2, the video processing system 200 shown in FIG. 1 includes splitting models 230 for multiple kinds of videos, for example splitting model 11, splitting model 12, splitting model 21 and splitting model 22 in FIG. 2, where the video type of splitting models 11 and 12 is type 1, the video type of splitting models 21 and 22 is type 2, the strip-splitting speed of splitting models 11 and 21 is speed 1, and the strip-splitting speed of splitting models 12 and 22 is speed 2.
其中，每个拆条模型可对应不同的视频类型和拆条速度，可根据用户输入的配置参数选择对应的拆条模型进行视频拆条。每个拆条模型的输入数据包括待拆条的视频以及拆条特征，输出数据是使用该拆条特征对视频进行拆条后获得的多个短视频，比如拆条特征1和视频输入拆条模型11后，获得拆条特征1的多个短视频，拆条特征2和视频输入拆条模型11后，获得拆条特征2的多个短视频，以此类推，这里不一一展开赘述。举例来说，若用户通过配置接口选择的视频类型为视频类型1，拆条特征为拆条特征2，拆条速度为速度2，那么视频处理系统可根据用户输入的配置参数，选择图2中的拆条模型12，将拆条特征2和视频输入上述拆条模型12，即可获得拆条特征2的多个短视频。这样，最终输出的多个短视频是结合视频类型和用户所需要的拆条特征以及拆条速度对视频进行拆条后获得的，最大程度满足了用户的多样化需求，提高用户的使用体验。Each splitting model may correspond to a different video type and strip-splitting speed, and the corresponding splitting model may be selected for video splitting according to the configuration parameters input by the user. The input data of each splitting model includes the video to be split and a splitting feature, and the output data is the multiple short videos obtained by splitting the video using that splitting feature. For example, after splitting feature 1 and the video are input into splitting model 11, multiple short videos for splitting feature 1 are obtained; after splitting feature 2 and the video are input into splitting model 11, multiple short videos for splitting feature 2 are obtained, and so on, which are not enumerated here. For example, if the video type selected by the user through the configuration interface is video type 1, the splitting feature is splitting feature 2, and the strip-splitting speed is speed 2, the video processing system may select splitting model 12 in FIG. 2 according to the configuration parameters input by the user, and input splitting feature 2 and the video into splitting model 12 to obtain multiple short videos for splitting feature 2. In this way, the finally output short videos are obtained by splitting the video in accordance with the video type together with the splitting feature and strip-splitting speed required by the user, which meets the user's diverse needs to the greatest extent and improves the user experience.
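The lookup scheme of Figure 2 — one model per (video type, strip-splitting speed) pair, with the splitting feature passed to the selected model at inference time — can be organized as a small registry. This is an illustrative sketch only; the class and method names are assumptions:

```python
class ModelRegistry:
    """Stores one splitting model per (video_type, speed) pair, as in Figure 2."""

    def __init__(self):
        self._models = {}

    def register(self, video_type, speed, model):
        """Associate a splitting model with a (video type, speed) pair."""
        self._models[(video_type, speed)] = model

    def split(self, video_type, speed, feature, video):
        """Select the model for the configured type and speed, then run it on
        the video with the requested splitting feature."""
        model = self._models[(video_type, speed)]
        return model(video, feature)
```

In the worked example above, the pair (video type 1, speed 2) would select splitting model 12, and passing splitting feature 2 together with the video would yield the short videos for that feature.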
应理解图2用于举例说明,视频处理系统200中的拆条模型230还可包括更多或者更少的视频类型、视频速度以及拆条特征,本申请不作具体限定。It should be understood that FIG. 2 is used for illustration. The stripping model 230 in the video processing system 200 may also include more or less video types, video speeds, and stripping characteristics, which are not specifically limited in this application.
综上可知，本申请提供的视频处理系统，通过配置接口获取用户输入的配置参数，该配置参数至少包括视频的视频类型，根据视频类型选择预先训练好的拆条模型对用户输入的视频进行拆条，输出拆条后的多个短视频，其中，该拆条模型与用户输入的视频类型呈对应关系，使得上述多个短视频可以满足用户的多样化需求，用户需要进行何种场景下的拆条就输入何种配置参数，从而实现一个多场景下通用的且符合用户需求的视频拆条模型，提高用户的使用体验。In summary, the video processing system provided by this application obtains the configuration parameters input by the user through a configuration interface, where the configuration parameters include at least the video type of the video, and selects a pre-trained splitting model according to the video type to split the video input by the user and output multiple short videos. The splitting model corresponds to the video type input by the user, so that the multiple short videos can meet the user's diverse needs: for whatever scenario the user needs to split in, the user simply inputs the corresponding configuration parameters, thereby realizing a video splitting model that is applicable across multiple scenarios, meets user needs, and improves the user experience.
为了使本申请能够被更好地理解,下面结合图4~图5所示的具体的应用场景,对上述步骤S310~步骤S320描述的步骤流程进行举例说明。In order to enable the present application to be better understood, the step process described in the above steps S310 to S320 is illustrated below with reference to the specific application scenarios shown in FIGS. 4 to 5 .
图4示例性地给出了一种配置接口的示例图，该配置接口为网页或者应用程序形态的console，该console可以是公有云平台的console，应理解，图4用于举例说明，本申请提供的方案中，该console也可以是非公有云平台的console，配置接口也可以是API形态，本申请不作具体限定。Figure 4 illustrates an example of a configuration interface. The configuration interface is a console in the form of a web page or an application, and the console may be a console of a public cloud platform. It should be understood that Figure 4 is used for illustration; in the solutions provided by this application, the console may also be the console of a non-public cloud platform, and the configuration interface may also take the form of an API, which is not specifically limited in this application.
如图4所示,该配置接口的网页或者应用程序界面至少包括视频类型选择区域410、拆条特征选择区域420、拆条速度选择区域430、上传视频区域440以及控件区域450。As shown in FIG. 4 , the web page or application program interface of the configuration interface at least includes a video type selection area 410 , a splitting feature selection area 420 , a splitting speed selection area 430 , an upload video area 440 and a control area 450 .
其中，视频类型选择区域410用于供用户选择视频的视频类型，示例性地，图4中的视频类型选择区域410向用户展示了“影视剧”、“新闻”、“综艺节目”、“未知类型”等视频类型，应理解，配置接口还可以向用户展示更多的视频类型，比如向下拖拽图4中视频类型选择区域410中的进度拉条，可以展示更多种类的视频类型。The video type selection area 410 is used for the user to select the video type of the video. For example, the video type selection area 410 in Figure 4 shows the user video types such as "film and TV drama", "news", "variety show" and "unknown type". It should be understood that the configuration interface may also show the user more video types; for example, dragging down the scroll bar in the video type selection area 410 in Figure 4 can display more kinds of video types.
可选地，若用户不在视频类型选择区域410中选择视频类型，视频类型选择区域410可以将视频类型默认设置为“未知类型”选项，或者，用户无法确定视频类型时，也可以在视频类型选择区域410中选择“未知类型”选项，视频处理系统200可以对用户上传的视频进行视频类型检测，根据检测类型和用户输入的配置参数(比如拆条特征和拆条速度)对视频进行拆条。Optionally, if the user does not select a video type in the video type selection area 410, the video type selection area 410 may set the video type to the "unknown type" option by default; alternatively, if the user cannot determine the video type, the user may also select the "unknown type" option in the video type selection area 410. The video processing system 200 may then perform video type detection on the video uploaded by the user and split the video according to the detected type and the configuration parameters input by the user (such as the splitting feature and the strip-splitting speed).
拆条特征选择区域420用于供用户选择视频的拆条特征，示例性地，图4中的拆条特征选择区域420向用户展示了“人物”、“场景”、“字幕”、“其他”等拆条特征，应理解，配置接口还可以向用户展示更多的拆条特征，比如向下拖拽图4中拆条特征选择区域420中的进度拉条，可以展示更多种类的拆条特征。The splitting feature selection area 420 is used for the user to select the splitting feature of the video. For example, the splitting feature selection area 420 in Figure 4 shows the user splitting features such as "person", "scene", "subtitle" and "other". It should be understood that the configuration interface may also show the user more splitting features; for example, dragging down the scroll bar in the splitting feature selection area 420 in Figure 4 can display more kinds of splitting features.
应理解，每个拆条特征还可以进行进一步的细分，例如图4中，拆条特征“人物”还可进一步划分为“男演员”、“女演员”、“上传演员照片”等等，若选择“男演员”作为拆条特征，对视频进行拆条后获得的多个短视频可以是视频中包括男演员的视频片段，若选择“女演员”作为拆条特征，对视频进行拆条后获得的多个短视频可以是视频中包括女演员的视频片段，若选择“上传演员照片”作为拆条特征，用户可以上传视频中某个演员的截图，对视频进行拆条后获得的多个短视频可以是包括该演员的视频片段。应理解，上述举例用于说明，本申请不作具体限定。It should be understood that each splitting feature may be further subdivided. For example, in Figure 4, the splitting feature "person" may be further divided into "male actor", "actress", "upload actor photo", and so on. If "male actor" is selected as the splitting feature, the multiple short videos obtained after splitting may be the clips of the video that include male actors; if "actress" is selected as the splitting feature, the multiple short videos obtained after splitting may be the clips of the video that include actresses; if "upload actor photo" is selected as the splitting feature, the user may upload a screenshot of a certain actor in the video, and the multiple short videos obtained after splitting may be the clips that include that actor. It should be understood that the above examples are for illustration and are not specifically limited in this application.
可选地，若用户不在拆条特征选择区域420中选择拆条特征，拆条特征选择区域420可以将拆条特征默认设置为“其他”选项，视频处理系统200可以对用户上传的视频进行检测，获得该视频的拆条特征，其中，该拆条特征可以是该种视频类型中最常用的特征类型，或者是该用户历史输入的特征类型，本申请不作具体限定。Optionally, if the user does not select a splitting feature in the splitting feature selection area 420, the splitting feature selection area 420 may set the splitting feature to the "other" option by default, and the video processing system 200 may analyze the video uploaded by the user to obtain a splitting feature for the video, where the splitting feature may be the feature type most commonly used for that video type, or a feature type historically input by the user, which is not specifically limited in this application.
拆条速度选择区域430用于供用户选择视频的拆条速度，示例性地，图4中的拆条速度选择区域430向用户展示了“0~1秒”、“2~5秒”、“6~10秒”、“11~15秒”、“其他”等拆条速度，应理解，配置接口还可以向用户展示更多的拆条速度，比如向下拖拽图4中拆条速度选择区域430中的进度拉条，可以展示更多种类的拆条速度。The strip-splitting speed selection area 430 is used for the user to select the strip-splitting speed of the video. For example, the strip-splitting speed selection area 430 in Figure 4 shows the user strip-splitting speeds such as "0~1 seconds", "2~5 seconds", "6~10 seconds", "11~15 seconds" and "other". It should be understood that the configuration interface may also show the user more strip-splitting speeds; for example, dragging down the scroll bar in the strip-splitting speed selection area 430 in Figure 4 can display more kinds of strip-splitting speeds.
可选地，若用户不在拆条速度选择区域430中选择拆条速度，拆条速度选择区域430可以将拆条速度默认设置为“其他”选项，视频处理系统200可以对用户上传的视频进行检测，获得该视频的拆条速度，其中，该拆条速度可以是该种视频类型和拆条特征下最常用的拆条速度，或者是该用户历史输入的拆条速度，本申请不作具体限定。Optionally, if the user does not select a strip-splitting speed in the strip-splitting speed selection area 430, the strip-splitting speed selection area 430 may set the strip-splitting speed to the "other" option by default, and the video processing system 200 may analyze the video uploaded by the user to obtain a strip-splitting speed for the video, where the strip-splitting speed may be the speed most commonly used for that video type and splitting feature, or a strip-splitting speed historically input by the user, which is not specifically limited in this application.
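The default-resolution behavior described for the three selection areas — use the user's explicit choice if given, otherwise a value from the user's history, otherwise the most common value for the video type — can be sketched as one small helper. The function name and the `"other"` sentinel are illustrative assumptions:

```python
def resolve_parameter(user_choice, history, most_common_default):
    """Fill in a missing configuration parameter as described above.

    Precedence: the user's explicit choice, then the user's most recent
    historical choice, then the most common value for this video type.
    A choice of "other" (or None) counts as "not selected".
    """
    if user_choice is not None and user_choice != "other":
        return user_choice
    if history:
        return history[-1]         # most recent historical choice
    return most_common_default
```

The same helper applies uniformly to the splitting feature (area 420) and the strip-splitting speed (area 430), since both fall back the same way.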
上传视频区域440用于供用户上传视频，该视频是待拆条的长视频。可选地，若上述视频处理系统200部署于公有云，该视频是用户上传至对象存储服务(object storage service，OBS)的视频，视频处理系统200可以从用户绑定的OBS桶中下载视频进行视频拆条，本申请不对此进行具体限定。The upload video area 440 is used for the user to upload a video, which is a long video to be split. Optionally, if the above video processing system 200 is deployed on a public cloud, the video may be one that the user has uploaded to an object storage service (OBS), and the video processing system 200 may download the video from the OBS bucket bound to the user and split it, which is not specifically limited in this application.
控件区域450包括“保存配置”控件,以及“开始拆条”控件,其中,“保存配置”用于保存视频类型选择区域410、拆条特征选择区域420、拆条速度选择区域430中用户输入的参数配置,“开始拆条”控件用于响应用户的操作,开始使用上述参数配置对视频进行视频拆条。The control area 450 includes a "Save Configuration" control and a "Start Splitting" control, where the "Save Configuration" is used to save the user input in the video type selection area 410, the splitting feature selection area 420, and the splitting speed selection area 430. Parameter configuration, the "Start splitting" control is used to respond to the user's operation and start splitting the video using the above parameter configuration.
示例性地，图4所示的配置接口中，用户输入的视频参数中，视频类型为“影视剧”，拆条特征为“上传演员照片”(假设上传的演员照片为演员A)，拆条速度为“2~5秒”。那么视频处理系统200在该应用场景下的处理流程可以如图5所示。For example, in the configuration interface shown in Figure 4, among the video parameters input by the user, the video type is "film and TV drama", the splitting feature is "upload actor photo" (assuming the uploaded actor photo is of actor A), and the strip-splitting speed is "2~5 seconds". The processing flow of the video processing system 200 in this application scenario may then be as shown in Figure 5.
图5是本申请提供的一种应用场景下的视频处理方法的步骤流程示意图,该应用场景可以是图4所示的应用场景。如图5所示,该方法可包括以下步骤:FIG. 5 is a schematic flowchart of steps of a video processing method in an application scenario provided by this application. The application scenario may be the application scenario shown in FIG. 4 . As shown in Figure 5, the method may include the following steps:
步骤1.输入视频,其中,该视频是用户输入的待拆条的长视频。Step 1. Enter the video, where the video is a long video input by the user to be split.
步骤2.确定用户是否选择视频类型。在是的情况下确定用户选择的视频类型对应的拆条模型，如图4所示的应用场景中用户选择的视频类型为“影视剧”，因此图5所示的流程图中步骤2确定视频类型后执行步骤3。Step 2. Determine whether the user has selected a video type. If yes, determine the splitting model corresponding to the video type selected by the user. In the application scenario shown in Figure 4, the video type selected by the user is "film and TV drama", so in the flowchart shown in Figure 5, step 3 is performed after the video type is determined in step 2.
在其他应用场景中,若用户未选择视频类型,即否的情况下,可执行步骤7、步骤8和步骤4。In other application scenarios, if the user does not select a video type, that is, if No, steps 7, 8, and 4 can be performed.
步骤3.获取影视剧类型拆条模型,应理解,参考前述内容可知,视频处理系统200包括多个拆条模型,其中,一个拆条模型对应一个视频类型,图4所示的应用场景中用户选择的视频类型为“影视剧”,因此步骤3获取影视剧类型的拆条模型。Step 3. Obtain the film and television drama type stripping model. It should be understood that with reference to the foregoing content, the video processing system 200 includes multiple stripping models, wherein one stripping model corresponds to one video type. In the application scenario shown in Figure 4, the user The selected video type is "film and television drama", so step 3 obtains the strip model of the film and television drama type.
步骤4.确定用户是否选择拆条特征,在是的情况下执行步骤5,在否的情况下执行步骤9和步骤5。应理解,图4所示的应用场景中用户选择的拆条特征为“人物”,因此图5所示的流程图中步骤4确定的拆条特征为“人物”。Step 4. Determine whether the user selects the strip feature. If yes, perform step 5. If no, perform step 9 and step 5. It should be understood that in the application scenario shown in Figure 4, the stripping feature selected by the user is "person", so the stripping feature determined in step 4 of the flowchart shown in Figure 5 is "person".
步骤5.确定用户是否选择拆条速度，在是的情况下，执行步骤6，在否的情况下执行步骤10和步骤6。应理解，图4所示的应用场景中用户选择的拆条速度为“2~5秒”，因此图5所示的流程图中步骤5确定的拆条速度为“2~5秒”。Step 5. Determine whether the user has selected a strip-splitting speed. If yes, perform step 6; if no, perform step 10 and then step 6. It should be understood that in the application scenario shown in Figure 4, the strip-splitting speed selected by the user is "2~5 seconds", so the strip-splitting speed determined in step 5 of the flowchart shown in Figure 5 is "2~5 seconds".
步骤6.选择对应的拆条模型,对视频进行拆条。其中,步骤6选择的拆条模型与用户数据的配置参数对应,该配置参数包括“影视剧”(视频类型)、“人物”(拆条特征)和“2~5秒”(拆条速度)。Step 6. Select the corresponding stripping model to strip the video. Among them, the bar splitting model selected in step 6 corresponds to the configuration parameters of the user data. The configuration parameters include "movie and TV drama" (video type), "character" (bar splitting characteristics) and "2~5 seconds" (bar splitting speed) .
结合图2实施例可知，视频处理系统200包括多个拆条模型，每个拆条模型对应一种视频类型和拆条速度，影视剧类型下的拆条模型可包括“0~1秒”拆条速度对应的影视剧拆条模型，“2~5秒”拆条速度对应的影视剧拆条模型等，这里不一一举例说明。根据拆条速度可选择“2~5秒”拆条速度对应的影视剧拆条模型，使用该拆条模型对视频进行拆条。其中，该“2~5秒”拆条速度对应的影视剧拆条模型在训练过程中，使用的样本集包括样本输入数据和样本输出数据，其中，样本输入数据包括已知视频和已知拆条特征，样本输出数据包括使用所述已知拆条特征对所述已知视频进行拆条后获得的多个已知短视频，训练好的拆条模型可根据用户输入的拆条特征对视频进行拆条。With reference to the embodiment of Figure 2, the video processing system 200 includes multiple splitting models, each corresponding to one video type and one strip-splitting speed. The splitting models under the film-and-TV-drama type may include a film-and-TV-drama splitting model for the "0~1 seconds" strip-splitting speed, a film-and-TV-drama splitting model for the "2~5 seconds" strip-splitting speed, and so on, which are not enumerated here. According to the strip-splitting speed, the film-and-TV-drama splitting model corresponding to the "2~5 seconds" strip-splitting speed may be selected and used to split the video. During training, the sample set used by this model includes sample input data and sample output data, where the sample input data includes a known video and a known splitting feature, and the sample output data includes multiple known short videos obtained by splitting the known video using the known splitting feature. The trained splitting model can split a video according to the splitting feature input by the user.
因此，将用户选择的拆条特征“人物”和用户上传的视频输入至上述“2~5秒”拆条速度对应的影视剧拆条模型之后，可以输出包含该人物的多个短视频，比如用户选择的是自行上传的演员A的图像，那么输出的多个短视频可以是包含演员A的多个短视频。Therefore, after the splitting feature "person" selected by the user and the video uploaded by the user are input into the film-and-TV-drama splitting model corresponding to the "2~5 seconds" strip-splitting speed, multiple short videos containing that person can be output. For example, if the user selects an image of actor A that the user uploaded, the output may be multiple short videos containing actor A.
步骤7.检测视频类型,应理解,若用户在步骤2没有选择视频类型或者用户选择了未知类型,可执行步骤7对视频类型进行检测,在检测成功的情况下,执行步骤3和步骤4,确定检测类型对应的拆条模型,在检测失败的情况下,执行步骤8和步骤4。Step 7. Detect the video type. It should be understood that if the user does not select the video type in step 2 or the user selects an unknown type, step 7 can be performed to detect the video type. If the detection is successful, perform steps 3 and 4. Determine the splitting model corresponding to the detection type. If the detection fails, perform steps 8 and 4.
步骤8.获取通用模型，可以理解的，一些视频的类别并不是非常清晰，可能无法检测出视频的视频类型，此时可使用通用的拆条模型对视频进行拆条。Step 8. Obtain the generic model. Understandably, the categories of some videos are not very clear and the video type may not be detectable; in this case, the generic splitting model may be used to split the video.
步骤9.系统选择拆条特征，应理解，若用户在步骤4没有选择拆条特征，可执行步骤9，由系统检测出该视频类型下常用的拆条特征，或者该用户历史选择的拆条特征等等，然后执行步骤5。Step 9. The system selects a splitting feature. It should be understood that if the user does not select a splitting feature in step 4, step 9 may be performed, in which the system determines a splitting feature commonly used for this video type, or a splitting feature historically selected by the user, and so on; then step 5 is performed.
步骤10.系统选择拆条速度。应理解,若用户在步骤5没有选择拆条速度,可执行步骤10,由系统检测出该视频类型和拆条特征下常用的拆条速度,或者该用户历史选择的拆条速度等等,然后执行步骤6。Step 10. The system selects the strip splitting speed. It should be understood that if the user does not select the stripping speed in step 5, step 10 can be performed, and the system detects the stripping speed commonly used under the video type and stripping characteristics, or the stripping speed selected by the user in history, etc., and then Go to step 6.
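The steps above (steps 1–10 of the Figure 5 flow) can be condensed into one control-flow sketch. All names are illustrative assumptions, and the detection and default-selection helpers are assumed to be supplied by the system:

```python
def process_video(video, cfg, models, generic_model,
                  detect_type, pick_feature, pick_speed):
    """Figure-5 flow: resolve type, feature and speed, then split (step 6).

    cfg: dict with optional 'video_type', 'feature', 'speed' (None = not chosen).
    detect_type(video) -> detected type, or None on failure (steps 2/7);
    pick_feature / pick_speed supply system defaults (steps 9/10).
    """
    video_type = cfg.get("video_type")
    if video_type in (None, "unknown"):                   # steps 2 and 7
        video_type = detect_type(video)
    model = models.get(video_type, generic_model)         # steps 3 and 8
    feature = cfg.get("feature") or pick_feature(video_type)       # steps 4/9
    speed = cfg.get("speed") or pick_speed(video_type, feature)    # steps 5/10
    return model(video, feature, speed)                   # step 6
```

With the Figure 4 configuration ("film and TV drama", "person", "2~5 seconds"), all three parameters are explicit, so only steps 2, 3 and 6 do real work; with an empty configuration, the detection and default branches (steps 7–10) are exercised instead.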
应理解,图5用于举例说明,本申请不作具体限定。It should be understood that Figure 5 is used for illustration and is not specifically limited in this application.
综上可知，本申请提供的视频处理方法，通过配置接口获取用户输入的配置参数，该配置参数至少包括视频的视频类型，根据视频类型选择预先训练好的拆条模型对用户输入的视频进行拆条，输出拆条后的多个短视频，其中，该拆条模型与用户输入的视频类型呈对应关系，从而实现一个多场景下通用的且符合用户需求的视频拆条模型，提高用户的使用体验。In summary, the video processing method provided by this application obtains the configuration parameters input by the user through a configuration interface, where the configuration parameters include at least the video type of the video, selects a pre-trained splitting model according to the video type to split the video input by the user, and outputs multiple short videos after splitting. The splitting model corresponds to the video type input by the user, thereby realizing a video splitting model that is applicable across multiple scenarios and meets user needs, and improving the user experience.
图6是本申请提供的一种计算设备的结构示意图,图1~图5实施例中描述的视频处理系统200可部署于图6所示的计算设备600上。FIG. 6 is a schematic structural diagram of a computing device provided by this application. The video processing system 200 described in the embodiments of FIGS. 1 to 5 can be deployed on the computing device 600 shown in FIG. 6 .
进一步地，计算设备600包括处理器601、存储单元602、存储介质603和通信接口604，其中，处理器601、存储单元602、存储介质603和通信接口604通过总线605进行通信，也可以通过无线传输等其他手段实现通信。Further, the computing device 600 includes a processor 601, a storage unit 602, a storage medium 603 and a communication interface 604, where the processor 601, the storage unit 602, the storage medium 603 and the communication interface 604 communicate through a bus 605, and may also communicate through other means such as wireless transmission.
该计算设备可以是BMS、虚拟机或容器。其中，BMS指的是通用的物理服务器，例如，ARM服务器或者X86服务器；虚拟机指的是NFV技术实现的、通过软件模拟的具有完整硬件系统功能的、运行在一个完全隔离环境中的完整计算机系统；容器指的是一组受到资源限制，彼此间相互隔离的进程。计算设备还可以是边缘计算设备、存储服务器或者存储阵列，本申请不作具体限定。The computing device may be a BMS, a virtual machine or a container. A BMS refers to a general-purpose physical server, for example, an ARM server or an X86 server; a virtual machine refers to a complete computer system implemented with NFV technology, simulated by software, having the functions of a complete hardware system, and running in a completely isolated environment; a container refers to a group of processes that are subject to resource constraints and isolated from one another. The computing device may also be an edge computing device, a storage server or a storage array, which is not specifically limited in this application.
处理器601由至少一个通用处理器构成，例如CPU、NPU或者CPU和硬件芯片的组合。上述硬件芯片是专用集成电路(Application-Specific Integrated Circuit，ASIC)、可编程逻辑器件(Programmable Logic Device，PLD)或其组合。上述PLD是复杂可编程逻辑器件(Complex Programmable Logic Device，CPLD)、现场可编程逻辑门阵列(Field-Programmable Gate Array，FPGA)、通用阵列逻辑(Generic Array Logic，GAL)或其任意组合。处理器601执行各种类型的数字存储指令，例如存储在存储单元602中的软件或者固件程序，它能使计算设备600提供较宽的多种服务。The processor 601 is composed of at least one general-purpose processor, such as a CPU, an NPU, or a combination of a CPU and a hardware chip. The hardware chip is an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD is a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 601 executes various types of digital storage instructions, such as software or firmware programs stored in the storage unit 602, which enable the computing device 600 to provide a wide variety of services.
具体实现中,作为一种实施例,处理器601包括一个或多个CPU,例如图6中所示的CPU0和CPU1。In specific implementation, as an embodiment, the processor 601 includes one or more CPUs, such as CPU0 and CPU1 shown in FIG. 6 .
在具体实现中,作为一种实施例,计算设备600也包括多个处理器,例如图6中所示的处理器601和处理器606。这些处理器中的每一个可以是一个单核处理器(single-CPU),也可以是一个多核处理器(multi-CPU)。这里的处理器指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。In a specific implementation, as an embodiment, the computing device 600 also includes multiple processors, such as the processor 601 and the processor 606 shown in FIG. 6 . Each of these processors can be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). A processor here refers to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
存储单元602用于存储程序代码，并由处理器601来控制执行，以执行上述图1-图6中任一实施例中视频处理系统200的处理步骤。程序代码中包括一个或多个软件单元，上述一个或多个软件单元是图1实施例中的获取单元和拆条单元，其中，获取单元用于向用户提供配置接口，拆条单元用于根据配置参数对视频进行拆条。具体实现方式参考图1~图5实施例，此处不再赘述。The storage unit 602 is used to store program code, and the processor 601 controls its execution to perform the processing steps of the video processing system 200 in any of the embodiments of FIG. 1 to FIG. 6 above. The program code includes one or more software units, namely the acquisition unit and the splitting unit in the embodiment of FIG. 1, where the acquisition unit is configured to provide a configuration interface to the user, and the splitting unit is configured to split the video according to the configuration parameters. For specific implementations, reference may be made to the embodiments of FIG. 1 to FIG. 5; details are not repeated here.
存储单元602包括只读存储器和随机存取存储器，并向处理器601提供指令和数据。存储单元602还包括非易失性随机存取存储器。存储单元602是易失性存储器或非易失性存储器，或包括易失性和非易失性存储器两者。其中，非易失性存储器是只读存储器(read-only memory，ROM)、可编程只读存储器(programmable ROM，PROM)、可擦除可编程只读存储器(erasable PROM，EPROM)、电可擦除可编程只读存储器(electrically EPROM，EEPROM)或闪存。易失性存储器是随机存取存储器(random access memory，RAM)，其用作外部高速缓存。通过示例性但不是限制性说明，许多形式的RAM可用，例如静态随机存取存储器(static RAM，SRAM)、动态随机存取存储器(DRAM)、同步动态随机存取存储器(synchronous DRAM，SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM，DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM，ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM，SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM，DR RAM)。存储单元602还可以是硬盘(hard disk)、U盘(universal serial bus，USB)、闪存(flash)、SD卡(secure digital memory card，SD card)、记忆棒等等，硬盘可以是硬盘驱动器(hard disk drive，HDD)、固态硬盘(solid state disk，SSD)、机械硬盘(mechanical hard disk，HDD)等，本申请不作具体限定。The storage unit 602 includes a read-only memory and a random access memory, and provides instructions and data to the processor 601. The storage unit 602 also includes a non-volatile random access memory. The storage unit 602 may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM) or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM) and direct rambus RAM (DR RAM). The storage unit 602 may also be a hard disk, a USB flash drive, a flash memory, an SD card (secure digital memory card), a memory stick and the like, where the hard disk may be a hard disk drive (HDD), a solid state disk (SSD), a mechanical hard disk, etc., which is not specifically limited in this application.
存储介质603是存储数据的载体，比如硬盘(hard disk)、U盘(universal serial bus,USB)、闪存(flash)、SD卡(secure digital memory card,SD card)、记忆棒等等，硬盘可以是硬盘驱动器(hard disk drive,HDD)、固态硬盘(solid state disk,SSD)、机械硬盘(mechanical hard disk,HDD)等，本申请不作具体限定。The storage medium 603 is a carrier for storing data, such as a hard disk, a USB flash drive, a flash memory, a secure digital memory card (SD card), a memory stick, or the like. The hard disk may be a hard disk drive (HDD), a solid state disk (SSD), a mechanical hard disk, or the like, which is not specifically limited in this application.
通信接口604为内部接口(例如高速串行计算机扩展总线(Peripheral Component Interconnect express,PCIe)总线接口)、有线接口(例如以太网接口)或无线接口(例如蜂窝网络接口或使用无线局域网接口)，用于与其他服务器或单元进行通信。The communication interface 604 is an internal interface (for example, a Peripheral Component Interconnect express (PCIe) bus interface), a wired interface (for example, an Ethernet interface), or a wireless interface (for example, a cellular network interface or a wireless local area network interface), and is used to communicate with other servers or units.
总线605是快捷外围部件互联标准(Peripheral Component Interconnect Express,PCIe)总线，或扩展工业标准结构(extended industry standard architecture,EISA)总线、统一总线(unified bus,Ubus或UB)、计算机快速链接(compute express link,CXL)、缓存一致互联协议(cache coherent interconnect for accelerators,CCIX)等。总线605分为地址总线、数据总线、控制总线等。The bus 605 is a Peripheral Component Interconnect Express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX) bus, or the like. The bus 605 may be divided into an address bus, a data bus, a control bus, and the like.
总线605除包括数据总线之外,还包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都标为总线605。In addition to the data bus, the bus 605 also includes a power bus, a control bus, a status signal bus, etc. However, for the sake of clarity, the various buses are labeled bus 605 in the figure.
需要说明的，图6仅仅是本申请实施例的一种可能的实现方式，实际应用中，计算设备600还可以包括更多或更少的部件，这里不作限制。关于本申请实施例中未示出或未描述的内容，参见前述图1-图5实施例中的相关阐述，这里不再赘述。It should be noted that FIG. 6 is only one possible implementation of the embodiments of this application. In practical applications, the computing device 600 may further include more or fewer components, which is not limited here. For content not shown or described in the embodiments of this application, refer to the related descriptions in the embodiments of FIG. 1 to FIG. 5, which are not repeated here.
本申请实施例提供一种计算机存储介质,该计算机存储介质中存储有指令;当该指令在计算设备上运行时,使得该计算设备执行上述图1~图5实施例描述的视频处理方法。Embodiments of the present application provide a computer storage medium in which instructions are stored; when the instructions are run on a computing device, the computing device is caused to execute the video processing method described in the embodiments of FIGS. 1 to 5 .
本申请实施例提供了一种包含指令的程序产品,包括程序或指令,当该程序或指令在计算设备上运行时,使得该计算设备执行上述图1~图5实施例描述的视频处理方法。Embodiments of the present application provide a program product containing instructions, including programs or instructions. When the program or instructions are run on a computing device, the computing device executes the video processing method described in the embodiments of FIGS. 1 to 5 .
上述实施例，可以全部或部分地通过软件、硬件、固件或其他任意组合来实现。当使用软件实现时，上述实施例可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括至少一个计算机指令。在计算机上加载或执行计算机程序指令时，全部或部分地产生按照本发明实施例的流程或功能。计算机可以为通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一个计算机可读存储介质传输，例如，计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含至少一个可用介质集合的服务器、数据中心等数据存储节点。可用介质可以是磁性介质(例如，软盘、硬盘、磁带)、光介质(例如，高密度数字视频光盘(digital video disc,DVD))、或者半导体介质。半导体介质可以是SSD。The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes at least one computer instruction. When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the present invention are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any available medium accessible to the computer, or a data storage node, such as a server or a data center, that contains at least one available medium. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a high-density digital video disc (DVD)), or a semiconductor medium. The semiconductor medium may be an SSD.
以上，仅为本发明的具体实施方式，但本发明的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本发明揭露的技术范围内，可轻易想到各种等效的修改或替换，这些修改或替换都应涵盖在本发明的保护范围之内。因此，本发明的保护范围应以权利要求的保护范围为准。 The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (17)

  1. 一种视频处理方法,其特征在于,所述方法包括:A video processing method, characterized in that the method includes:
    视频处理系统通过配置接口向用户获取配置参数和视频,所述配置参数包括所述视频的视频类型;The video processing system obtains configuration parameters and videos from the user through a configuration interface, where the configuration parameters include the video type of the video;
    所述视频处理系统输出多个短视频,其中,所述多个短视频是根据所述配置参数对所述视频进行拆条后获得的。The video processing system outputs multiple short videos, wherein the multiple short videos are obtained by splitting the video into strips according to the configuration parameters.
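For illustration only (not part of the claims), the two steps of claim 1 — obtaining configuration parameters through a configuration interface and outputting the short videos obtained by splitting — can be sketched as follows. All names and the segment-based video model here are hypothetical, not taken from the patent:

```python
# Hypothetical sketch of the claimed flow: a configuration interface
# supplies the video type (and optional splitting features), and the
# system splits the video into multiple short videos accordingly.
from dataclasses import dataclass, field

@dataclass
class ConfigParams:
    video_type: str                                # e.g. "news", "sports" (claim 2)
    features: list = field(default_factory=list)   # optional splitting features (claim 3)
    speed: str = "balanced"                        # optional splitting speed (claim 4)

def split_video(segments, config):
    """Split a video (modeled as a list of labeled segments) into short
    videos wherever the configured feature changes between segments."""
    key = config.features[0] if config.features else "scene"
    clips, current = [], [segments[0]]
    for seg in segments[1:]:
        if seg[key] != current[-1][key]:
            clips.append(current)   # feature changed: close current clip
            current = [seg]
        else:
            current.append(seg)
    clips.append(current)
    return clips

video = [{"scene": "studio"}, {"scene": "studio"}, {"scene": "field"}]
clips = split_video(video, ConfigParams(video_type="news", features=["scene"]))
print(len(clips))  # 2 short videos: the studio part and the field part
```

A real implementation would operate on decoded frames and timestamps rather than pre-labeled segments; the sketch only shows how the configuration parameters drive the split.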
  2. 根据权利要求1所述的方法,其特征在于,所述视频类型包括影视剧、综艺、新闻、纪录片、访谈、体育、动漫以及会议中的一种或者多种。The method according to claim 1, wherein the video type includes one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animations and conferences.
  3. 根据权利要求1或2所述的方法，其特征在于，所述配置参数还包括拆条特征，所述拆条特征包括场景、人物、音频、字幕、动作、光学字符识别OCR以及外观中的一种或者多种。The method according to claim 1 or 2, characterized in that the configuration parameters further include splitting features, and the splitting features include one or more of scene, character, audio, subtitle, action, optical character recognition (OCR), and appearance.
  4. 根据权利要求1至3任一权利要求所述的方法，其特征在于，所述配置参数还包括拆条速度。The method according to any one of claims 1 to 3, characterized in that the configuration parameters further include a splitting speed.
  5. 根据权利要求1至4任一权利要求所述的方法,其特征在于,所述视频类型还包括未知类型,在所述视频类型为未知类型的情况下,所述方法还包括:The method according to any one of claims 1 to 4, characterized in that the video type also includes an unknown type. When the video type is an unknown type, the method further includes:
    所述视频处理系统对所述视频进行类型检测,获得所述视频的检测类型;The video processing system performs type detection on the video to obtain the detection type of the video;
    所述视频处理系统输出多个短视频包括:The video processing system outputs multiple short videos including:
    所述视频处理系统根据所述检测类型和所述配置参数对所述视频进行拆条,输出所述多个短视频。The video processing system splits the video into strips according to the detection type and the configuration parameters, and outputs the multiple short videos.
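For illustration only, the fallback of claim 5 — running type detection when the user-supplied type is unknown, then splitting with the detected type — can be sketched as below. `detect_type` is a stand-in for a real video classifier; the metadata-hint lookup is purely hypothetical:

```python
# Illustrative sketch of claim 5: if the configured video type is
# "unknown", detect the type first, then split with the corresponding
# type-specific model (claims 7-8).
def detect_type(video):
    # Stand-in classifier: a real system would classify the video
    # content itself; here we just read a hypothetical metadata hint.
    return video.get("hint", "news")

def process(video, config_type):
    video_type = detect_type(video) if config_type == "unknown" else config_type
    return f"split with {video_type} model"

print(process({"hint": "sports"}, "unknown"))  # split with sports model
print(process({"hint": "sports"}, "drama"))    # split with drama model
```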
  6. 根据权利要求1至3任一权利要求所述的方法,其特征在于,在所述配置接口未获取到所述用户输入的拆条特征的情况下,所述方法还包括:The method according to any one of claims 1 to 3, characterized in that, in the case where the configuration interface does not obtain the bar splitting feature input by the user, the method further includes:
    所述视频处理系统对所述视频进行特征检测,获得所述视频的拆条特征,根据所述拆条特征对所述视频进行拆条。The video processing system performs feature detection on the video, obtains stripping features of the video, and strips the video according to the stripping features.
  7. 根据权利要求1至6任一权利要求所述的方法,其特征在于,所述视频处理系统包括多个拆条模型,其中,一个拆条模型对应一种视频类型;The method according to any one of claims 1 to 6, characterized in that the video processing system includes multiple splitting models, wherein one splitting model corresponds to one video type;
    所述根据所述配置参数对所述视频进行拆条，输出拆条后的多个视频包括：The splitting the video into strips according to the configuration parameters and outputting the multiple split videos includes:
    所述视频处理系统获取所述视频类型对应的拆条模型,将所述视频输入所述视频类型对应的拆条模型,输出拆条后获得的多个短视频。The video processing system obtains the splitting model corresponding to the video type, inputs the video into the splitting model corresponding to the video type, and outputs multiple short videos obtained after splitting.
  8. 根据权利要求7所述的方法,其特征在于,所述将所述视频输入所述视频类型对应的拆条模型获得所述拆条后的多个视频包括:The method according to claim 7, wherein said inputting the video into the splitting model corresponding to the video type to obtain the plurality of split videos includes:
    将所述视频和所述拆条特征输入所述视频类型对应的拆条模型，输出拆条后获得的多个短视频，其中，所述拆条模型是使用样本集对机器学习模型进行训练后获得的，所述样本集包括样本输入数据和样本输出数据，其中，所述样本输入数据包括已知视频和已知拆条特征，所述样本输出数据包括使用所述已知拆条特征对所述已知视频进行拆条后获得的多个已知短视频。The video and the splitting features are input into the splitting model corresponding to the video type, and the multiple short videos obtained after splitting are output, wherein the splitting model is obtained by training a machine learning model using a sample set, the sample set includes sample input data and sample output data, the sample input data includes a known video and known splitting features, and the sample output data includes multiple known short videos obtained by splitting the known video using the known splitting features.
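For illustration only, the one-model-per-type arrangement of claims 7 and 8 — a registry keyed by video type, each model trained on known videos and their known short-video boundaries — can be sketched as follows. The trivial boundary memorizer below stands in for a real trained machine learning model; all names are hypothetical:

```python
# Hypothetical sketch of claims 7-8: one splitting model per video
# type, trained on sample input data (known videos + known splitting
# features) and sample output data (known short-video cut points).
class SplittingModel:
    def __init__(self):
        self.boundaries = []

    def train(self, sample_inputs, sample_outputs):
        # sample_inputs: (known video, known splitting features) pairs
        # sample_outputs: lists of known cut points for each known video
        self.boundaries = sorted({b for cuts in sample_outputs for b in cuts})

    def split(self, video_len):
        # Cut the video at every learned boundary inside its duration.
        cuts = [b for b in self.boundaries if 0 < b < video_len]
        edges = [0] + cuts + [video_len]
        return list(zip(edges, edges[1:]))

models = {"news": SplittingModel()}  # one model per video type (claim 7)
models["news"].train([("known_video_1", ["scene"])], [[30, 60]])
print(models["news"].split(90))  # [(0, 30), (30, 60), (60, 90)]
```

A production system would train a sequence model on frame or audio features instead of memorizing cut points; the sketch only shows the sample-set shape and the per-type dispatch.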
  9. 一种视频处理系统，其特征在于，所述系统包括：A video processing system, characterized in that the system includes:
    获取单元,用于通过配置接口向用户获取配置参数和视频,所述配置参数包括所述视频的视频类型;An acquisition unit, configured to acquire configuration parameters and videos from the user through the configuration interface, where the configuration parameters include the video type of the video;
    拆条单元,用于输出多个短视频,其中,所述多个短视频是根据所述配置参数对所述视频进行拆条后获得的。 A splitting unit is configured to output multiple short videos, wherein the multiple short videos are obtained by splitting the video according to the configuration parameters.
  10. 根据权利要求9所述的系统,其特征在于,所述视频类型包括影视剧、综艺、新闻、纪录片、访谈、体育、动漫以及会议中的一种或者多种。The system according to claim 9, wherein the video type includes one or more of film and television dramas, variety shows, news, documentaries, interviews, sports, animations and conferences.
  11. 根据权利要求9或10所述的系统，其特征在于，所述配置参数还包括拆条特征，所述拆条特征包括场景、人物、音频、字幕、动作、光学字符识别OCR以及外观中的一种或者多种。The system according to claim 9 or 10, characterized in that the configuration parameters further include splitting features, and the splitting features include one or more of scene, character, audio, subtitle, action, optical character recognition (OCR), and appearance.
  12. 根据权利要求9至11任一权利要求所述的系统，其特征在于，所述配置参数还包括拆条速度。The system according to any one of claims 9 to 11, characterized in that the configuration parameters further include a splitting speed.
  13. 根据权利要求9至12任一权利要求所述的系统，其特征在于，所述系统还包括检测单元，所述视频类型包括未知类型，在所述视频类型为未知类型的情况下，所述检测单元，用于对所述视频进行类型检测，获得所述视频的检测类型；The system according to any one of claims 9 to 12, characterized in that the system further includes a detection unit, the video type includes an unknown type, and when the video type is an unknown type, the detection unit is configured to perform type detection on the video to obtain a detection type of the video;
    所述拆条单元,用于根据所述检测类型和所述配置参数对所述视频进行拆条,输出所述多个短视频。The stripping unit is used to strip the video into strips according to the detection type and the configuration parameter, and output the multiple short videos.
  14. 根据权利要求9至13任一权利要求所述的系统，其特征在于，在所述配置接口未获取到所述用户输入的拆条特征的情况下，所述检测单元，用于对所述视频进行特征检测，获得所述视频的拆条特征，根据所述拆条特征对所述视频进行拆条。The system according to any one of claims 9 to 13, characterized in that, when the configuration interface does not obtain the splitting features input by the user, the detection unit is configured to perform feature detection on the video to obtain splitting features of the video, and the video is split according to the splitting features.
  15. 根据权利要求9至14任一权利要求所述的系统,其特征在于,所述视频处理系统包括多个拆条模型,其中,一个拆条模型对应一种视频类型;The system according to any one of claims 9 to 14, characterized in that the video processing system includes a plurality of splitting models, wherein one splitting model corresponds to one video type;
    所述拆条单元,用于获取所述视频类型对应的拆条模型,将所述视频输入所述视频类型对应的拆条模型,输出拆条后获得的多个短视频。The stripping unit is used to obtain a stripping model corresponding to the video type, input the video into the stripping model corresponding to the video type, and output a plurality of short videos obtained after stripping.
  16. 根据权利要求15所述的系统，其特征在于，所述拆条单元，用于将所述视频和所述拆条特征输入所述视频类型对应的拆条模型，输出拆条后获得的多个短视频，其中，所述拆条模型是使用样本集对机器学习模型进行训练后获得的，所述样本集包括样本输入数据和样本输出数据，其中，所述样本输入数据包括已知视频和已知特征，所述样本输出数据包括使用所述已知拆条特征对所述已知视频进行拆条后获得的多个已知短视频。The system according to claim 15, characterized in that the splitting unit is configured to input the video and the splitting features into the splitting model corresponding to the video type, and output the multiple short videos obtained after splitting, wherein the splitting model is obtained by training a machine learning model using a sample set, the sample set includes sample input data and sample output data, the sample input data includes a known video and known features, and the sample output data includes multiple known short videos obtained by splitting the known video using the known splitting features.
  17. 一种计算设备,其特征在于,所述计算设备包括处理器和存储器,所述存储器用于存储代码,所述处理器用于执行所述代码实现如权利要求1至8任一权利要求所述的方法。 A computing device, characterized in that the computing device includes a processor and a memory, the memory is used to store code, and the processor is used to execute the code to implement the method described in any one of claims 1 to 8. method.
PCT/CN2023/081604 2022-04-13 2023-03-15 Video processing method and system, and related device WO2023197814A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210384788.5 2022-04-13
CN202210384788.5A CN116980644A (en) 2022-04-13 2022-04-13 Video processing method, system and related equipment

Publications (1)

Publication Number Publication Date
WO2023197814A1 true WO2023197814A1 (en) 2023-10-19

Family

ID=88328835

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/081604 WO2023197814A1 (en) 2022-04-13 2023-03-15 Video processing method and system, and related device

Country Status (2)

Country Link
CN (1) CN116980644A (en)
WO (1) WO2023197814A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108965920A (en) * 2018-08-08 2018-12-07 北京未来媒体科技股份有限公司 A kind of video content demolition method and device
CN110166828A (en) * 2019-02-19 2019-08-23 腾讯科技(深圳)有限公司 A kind of method for processing video frequency and device
CN111726682A (en) * 2020-06-30 2020-09-29 北京百度网讯科技有限公司 Video clip generation method, device, equipment and computer storage medium
CN112423151A (en) * 2020-11-17 2021-02-26 北京金山云网络技术有限公司 Video strip splitting method, system, device, equipment and storage medium
CN113539304A (en) * 2020-04-21 2021-10-22 华为技术有限公司 Video strip splitting method and device
US20220027663A1 (en) * 2019-11-21 2022-01-27 Samsung Electronics Co., Ltd. Electronic apparatus and controlling method thereof


Also Published As

Publication number Publication date
CN116980644A (en) 2023-10-31

Similar Documents

Publication Publication Date Title
CN107832434B (en) Method and device for generating multimedia play list based on voice interaction
US11061962B2 (en) Recommending and presenting comments relative to video frames
US11417341B2 (en) Method and system for processing comment information
US9460752B2 (en) Multi-source journal content integration systems and methods
CN109408639B (en) Bullet screen classification method, bullet screen classification device, bullet screen classification equipment and storage medium
CN111432235A (en) Live video generation method and device, computer readable medium and electronic equipment
Hu et al. Toward multiscreen social TV with geolocation-aware social sense
US20170235828A1 (en) Text Digest Generation For Searching Multiple Video Streams
US20150100582A1 (en) Association of topic labels with digital content
CN109862100B (en) Method and device for pushing information
CN109255035B (en) Method and device for constructing knowledge graph
CN111314732A (en) Method for determining video label, server and storage medium
WO2023142917A1 (en) Video generation method and apparatus, and device, medium and product
CN111279709A (en) Providing video recommendations
US20200007940A1 (en) Echo bullet screen
CN108470057B (en) Generating and pushing method, device, terminal, server and medium of integrated information
CN109600625A (en) A kind of program searching method, device, equipment and medium
WO2018145572A1 (en) Method and device for implementing vr live streaming, ott service system, and storage medium
US20240121485A1 (en) Method, apparatus, device, medium and program product for obtaining text material
WO2023197814A1 (en) Video processing method and system, and related device
CN113282770A (en) Multimedia recommendation system and method
CN113343069A (en) User information processing method, device, medium and electronic equipment
CN111901629A (en) Method and device for generating and playing video stream
US12010405B2 (en) Generating video summary
KR102615377B1 (en) Method of providing a service to experience broadcasting

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23787462

Country of ref document: EP

Kind code of ref document: A1