CN116419039A - Video stripping method, server and system - Google Patents

Video stripping method, server and system

Info

Publication number: CN116419039A
Authority: CN (China)
Prior art keywords: video, server, structural, short, splitting
Legal status: Pending
Application number: CN202210114425.XA
Other languages: Chinese (zh)
Inventors: 王昱璇, 耿学文, 钟伟才, 田新
Current Assignee: Petal Cloud Technology Co Ltd
Original Assignee: Petal Cloud Technology Co Ltd
Application filed by Petal Cloud Technology Co Ltd

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/8456: Structuring of content, e.g. decomposing content into time segments, by decomposing the content in the time domain
    • H04N 21/23418: Processing of video elementary streams, involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/23439: Processing of video elementary streams, involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements, for generating different versions
    • H04N 21/44008: Processing of video elementary streams (client side), involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/44029: Processing of video elementary streams (client side), involving reformatting operations of video signals for household redistribution, storage or real-time display, for generating different versions
    • H04N 21/47202: End-user interface for requesting content, additional data or services, for requesting content on demand, e.g. video on demand

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Studio Devices (AREA)

Abstract

A video stripping method, a server, and a system relate to the field of video processing. They can split a long video into a plurality of short videos while improving splitting quality, avoiding problems such as split short videos that are too long or too short, sluggish pacing, and the like. The method includes: an electronic device sends a splitting request for a first video to a server; the server identifies each structural unit contained in the first video according to a first structure marking mode corresponding to the first video, where each structural unit includes one or more shots; the server splits the first video into a plurality of first clips according to the structural units contained in the first video and a splitting rule, where each first clip includes one or more structural units; and the server sends the plurality of first clips to the electronic device. When the types of first videos are different, the first structure marking modes corresponding to the first videos are different.

Description

Video stripping method, server and system
This application claims priority to Chinese Patent Application No. 202111657170.3, filed on December 30, 2021 and entitled "Video Splitting Method, Electronic Device, Server, and System", which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to the field of video processing, and in particular, to a video stripping method, server, and system.
Background
Short videos meet mobile users' need to make use of fragmented time and greatly improve users' social activity and stickiness, bringing explosive traffic and outstanding marketing results. Conventional video platforms, by contrast, hold a large number of long-video assets (e.g., movies and TV dramas) but lack such traffic. How to split a long video into multiple independent short videos has therefore become a problem to be solved.
Disclosure of Invention
The video stripping method, server, and system provided in this application can split a long video into a plurality of short videos, improve the splitting quality of videos, and avoid problems such as split short videos that are too long or too short, sluggish pacing, and the like. To achieve this purpose, the embodiments of this application provide the following technical solutions:
in a first aspect, a video stripping method is provided, applied to an electronic device, and the method includes: the electronic equipment sends a splitting request for the first video to a server; the electronic equipment receives a plurality of first short sheets returned by the server, wherein each first short sheet comprises one or more structural units, and each structural unit comprises one or more lenses; the electronic equipment receives adjustment operation of a user for one or more first short sheets; the electronic device splits the first video into a plurality of second short pieces according to the adjustment operation.
It can be understood that when the types of first videos are different, different segments in the first videos have different characteristics. Therefore, identifying the structural units contained in the first video based on the first structure marking mode corresponding to the first video, and splitting the first video into a plurality of first clips based on the structural units corresponding to that marking mode, improves the splitting quality of the first video. Furthermore, the electronic device can accept the user's adjustment of the split first clips to obtain second clips, further meeting the user's video splitting requirements. In addition, the technical solution provided in the embodiments of this application is applicable to splitting scenarios for various types of first videos and has wide application scenarios.
In one possible implementation, that the electronic device splits the first video into a plurality of second clips according to the adjustment operation includes: the electronic device sends the adjustment operation to the server and requests the server to split the first video into a plurality of second clips according to the adjustment operation; and the electronic device receives the plurality of second clips returned by the server.
That is, the electronic device may send the user's adjustment operation to the server, and the server adjusts the first clips split from the first video according to the adjustment operation to obtain the second clips and returns the second clips to the electronic device.
In one possible implementation, the adjustment operation for the one or more first clips includes: an operation of deleting one or more structural units from the one or more first clips; and/or an operation of combining two or more first clips; and/or an operation of modifying one or more structural units in one or more first clips.
That is, the video stripping method provided in the embodiments of this application can support several kinds of user adjustment operations on the split first clips.
In one possible implementation, the splitting request includes the first structure marking mode, where the first structure marking mode is used by the server to mark each shot in the first video as a structural unit corresponding to the first structure marking mode.
In some examples, a user may input the first structure marking mode corresponding to the first video. In other examples, the server may automatically determine the first structure marking mode according to the type of the first video.
In one possible implementation, the first structure marking mode corresponds to the type of the first video.
In one possible implementation, the first structure marking mode is any one of a movie/TV drama type, a variety interview type, a news type, a documentary type, a sports event type, and a concert type.
In one possible implementation, when the first structure marking mode is the movie/TV drama type, the structural units corresponding to the first structure marking mode include one or more of a field, an event, a background/introduction, and details; when the first structure marking mode is the variety interview type, the structural units corresponding to the first structure marking mode include one or more of a game/link, a performance, and a show/content; and when the first structure marking mode is the news type, the structural units corresponding to the first structure marking mode include one or more of a chapter, a lead-in, a main program, and a follow-up interview/comment.
In one possible implementation, when the first structure marking mode is the movie/TV drama type, each first clip includes one or more fields; when the first structure marking mode is the variety interview type, each first clip includes one or more games/links; and when the first structure marking mode is the news type, each first clip includes one or more chapters.
In one possible implementation, before the electronic device receives the adjustment operation of the user for the one or more first clips, the method includes: when the first structure marking mode is the movie/TV drama type, the electronic device displays a first interface, where the first interface includes the one or more fields corresponding to each first clip, and each field includes at least one of an event, a background/introduction, and details; when the first structure marking mode is the variety interview type, the electronic device displays a first interface, where the first interface includes the one or more games/links corresponding to each first clip, and each game/link includes at least one of a performance and a show/content; and when the first structure marking mode is the news type, the electronic device displays a first interface, where the first interface includes the one or more chapters corresponding to each first clip, and each chapter includes at least one of a lead-in, a main program, and a follow-up interview/comment.
In a second aspect, a video stripping method is provided, applied to a server. The method includes: the server receives a splitting request for a first video sent by an electronic device; the server identifies each structural unit contained in the first video according to a first structure marking mode corresponding to the first video, where each structural unit includes one or more shots; the server splits the first video into a plurality of first clips according to the structural units contained in the first video and a splitting rule, where each first clip includes one or more structural units; and the server sends the plurality of first clips to the electronic device.
It can be understood that when the types of first videos are different, different segments in the first videos have different characteristics, and thus the first structure marking modes corresponding to the first videos are different. That is, in the embodiments of this application, different first structure marking modes are configured for different types of first videos, which helps improve the splitting quality of the first video and helps avoid problems such as split first clips that are too long or too short, sluggish pacing, and the like. In addition, the technical solution provided in the embodiments of this application is applicable to splitting scenarios for various types of first videos and has wide application scenarios.
In one possible implementation, that the server identifies each structural unit contained in the first video according to the first structure marking mode corresponding to the first video includes: the server extracts the features corresponding to each shot included in the first video; and, based on the features corresponding to each shot in the first video, marks each shot in the first video as a structural unit corresponding to the first structure marking mode.
In one possible implementation, that the server splits the first video into a plurality of first clips according to the structural units contained in the first video and the splitting rule includes: the server splits the first video into a plurality of first clips according to the structural units in the first video, the degree of association between adjacent largest structural units, and the splitting rule.
It should be noted that the server may also configure different splitting rules for different structure marking modes, that is, configure different splitting rules for different types of long videos (i.e., first videos), which helps further improve the splitting quality of the server.
In one possible implementation, the splitting request includes the first structure marking mode.
In one possible implementation, the first structure marking mode corresponds to the type of the first video.
In one possible implementation, the first structure marking mode is any one of a movie/TV drama type, a variety interview type, a news type, a documentary type, a sports event type, and a concert type.
In one possible implementation, when the first structure marking mode is the movie/TV drama type, the structural units corresponding to the first structure marking mode include one or more of a field, an event, a background/introduction, and details; when the first structure marking mode is the variety interview type, the structural units corresponding to the first structure marking mode include one or more of a game/link, a performance, and a show/content; and when the first structure marking mode is the news type, the structural units corresponding to the first structure marking mode include one or more of a chapter, a lead-in, a main program, and a follow-up interview/comment.
In one possible implementation, among the structural units corresponding to the movie/TV drama type, the field is the largest structural unit; among the structural units corresponding to the variety interview type, the game/link is the largest structural unit; and among the structural units corresponding to the news type, the chapter is the largest structural unit.
In one possible implementation, when the first structure marking mode is the movie/TV drama type, each first clip includes one or more fields; when the first structure marking mode is the variety interview type, each first clip includes one or more games/links; and when the first structure marking mode is the news type, each first clip includes one or more chapters.
In one possible implementation, after the server sends the plurality of first clips to the electronic device, the method further includes: the server receives an adjustment operation of a user for one or more first clips, where the adjustment operation is sent by the electronic device; and the server updates the splitting rule according to the adjustment operation.
In one possible implementation, after the server updates the splitting rule according to the adjustment operation, the method further includes: the server splits the first video into a plurality of second clips according to the structural units contained in the first video and the updated splitting rule, where each second clip includes one or more structural units; and the server sends the plurality of second clips to the electronic device.
In one possible implementation, the adjustment operation for the one or more first clips includes: an operation of deleting one or more structural units from the one or more first clips; and/or an operation of combining two or more first clips; and/or an operation of modifying one or more structural units in one or more first clips.
In a third aspect, an electronic device is provided, including: a processor, a memory, and a display screen, where the memory and the display screen are coupled to the processor, and the memory is configured to store computer program code including computer instructions that, when read from the memory by the processor, cause the electronic device to perform the video stripping method described in the first aspect and any one of its possible implementations.
In a fourth aspect, a server is provided, including: a processor, a memory, and a communication interface, where the memory and the communication interface are coupled to the processor, and the memory is configured to store computer program code including computer instructions that, when read from the memory by the processor, cause the server to perform the video stripping method described in the second aspect and any one of its possible implementations.
In a fifth aspect, a computer-readable storage medium is provided, including computer instructions that, when run on an electronic device, cause the electronic device to perform the video stripping method described in the first aspect and any one of its possible implementations.
In a sixth aspect, a computer-readable storage medium is provided, including computer instructions that, when run on a server, cause the server to perform the video stripping method described in the second aspect and any one of its possible implementations.
In a seventh aspect, a computer program product is provided that, when run on a computer, causes the computer to perform the video stripping method described in the first aspect and any one of its possible implementations, or the video stripping method described in the second aspect and any one of its possible implementations.
In an eighth aspect, an apparatus is provided, where the apparatus is included in an electronic device and has the function of implementing the behavior of the electronic device in any one of the above aspects and possible implementations. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes at least one module or unit corresponding to the above function, such as a communication module or unit, a display module or unit, and a processing module or unit.
In a ninth aspect, an apparatus is provided, where the apparatus is included in a server and has the function of implementing the behavior of the server in any one of the above aspects and possible implementations. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes at least one module or unit corresponding to the above function, such as a communication module or unit and a processing module or unit.
In a tenth aspect, a chip system is provided, including a processor that, when executing instructions, performs the video stripping method described in the first aspect and any one of its possible implementations, or the video stripping method described in the second aspect and any one of its possible implementations.
For the technical effects of the electronic device provided in the third aspect, the server provided in the fourth aspect, the computer-readable storage media provided in the fifth and sixth aspects, the computer program product provided in the seventh aspect, the apparatuses provided in the eighth and ninth aspects, and the chip system provided in the tenth aspect, refer to the description of the technical effects of the first aspect or the second aspect and any one of their possible implementations; details are not repeated here.
Drawings
Fig. 1 is a schematic structural diagram of a video stripping system according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a server according to an embodiment of the present application;
fig. 4 is a flow chart of a video stripping method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of some graphical user interfaces involved in a video stripping method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of another graphical user interface involved in a video stripping method according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of another server according to an embodiment of the present application;
Fig. 8 is a process schematic diagram of a video stripping method according to an embodiment of the present application;
fig. 9 is a flowchart of another video stripping method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a further graphical user interface involved in a video stripping method according to an embodiment of the present application;
FIG. 11A is a schematic diagram of yet another graphical user interface involved in a video stripping method according to an embodiment of the present application;
FIG. 11B is a schematic diagram of yet another graphical user interface involved in a video stripping method according to an embodiment of the present application;
FIG. 11C is a schematic diagram of yet another graphical user interface involved in a video stripping method according to an embodiment of the present application;
FIG. 12 is a schematic diagram of yet another graphical user interface involved in a video stripping method according to an embodiment of the present application;
FIG. 13 is a schematic diagram of yet another graphical user interface involved in a video stripping method according to an embodiment of the present application;
FIG. 14 is a schematic diagram of yet another graphical user interface involved in a video stripping method according to an embodiment of the present application;
FIG. 15 is a schematic diagram of still another graphical user interface involved in a video stripping method according to an embodiment of the present application.
Detailed Description
In the description of the embodiments of this application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: only A exists, both A and B exist, or only B exists. The terms "first", "second", and the like are used for descriptive purposes only and shall not be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more such features. In the description of the embodiments of this application, unless otherwise indicated, "a plurality of" means two or more. In the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, an illustration, or a description. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or more advantageous than other embodiments or designs. Rather, such words are intended to present related concepts in a concrete fashion.
The video splitting scheme provided in the embodiment of the present application may be applied to a video processing system as shown in fig. 1, where the video processing system includes one or more electronic devices 100, and one or more first servers 200.
A client or an application program for providing a human-machine interaction interface may be installed on the electronic device 100. For example, a user may input a first video (i.e., a long video) to be processed, such as a movie/TV drama, a variety interview, news, or a documentary, through the electronic device 100. The electronic device 100 sends the first video input by the user to the first server 200, and the first server 200 splits the first video to obtain a plurality of split second videos (i.e., short videos). In general, using the technical solution provided in the embodiments of this application, the first server 200 first converts the video data of the unstructured first video into structured video data, that is, marks different segments in the first video as structural units of different levels. Then, the first video is split into a plurality of second videos (i.e., short videos) according to a preset splitting rule in combination with the meanings of the structural units of different levels. The specific technical solution will be described in detail below and is not repeated here.
In some embodiments, the user may also select, through the electronic device 100, the structure marking mode to be employed. In one specific implementation, the user may select a structure marking mode corresponding to the type of the first video (e.g., a movie/TV drama type, a variety interview type, a news type, a documentary type, a sports event type, or a concert type). Then, the first server 200 will use the structure marking mode selected by the user to mark different segments in the first video as structural units of different levels. Different structural units are configured for different structure marking modes. For example, the structural units configured in the structure marking mode corresponding to the movie/TV drama type may include fields, events, backgrounds/introductions, details, and the like. That is, if the first video is of the movie/TV drama type, different segments in the first video may be marked as structural units such as fields, events, backgrounds/introductions, and details. For another example, the structural units configured in the structure marking mode corresponding to the variety interview type may include games/links, performances, shows/content, and the like. That is, if the first video is of the variety interview type, different segments in the first video are marked as structural units such as games/links, performances, and shows/content. For another example, the structural units configured in the structure marking mode corresponding to the news type may include chapters, lead-ins, main programs, follow-up interviews/comments, and the like. That is, if the first video is of the news type, different segments in the first video are marked as structural units such as chapters, lead-ins, main programs, and follow-up interviews/comments.
Of course, in other embodiments, the user may not input a structure marking mode. In that case, the first server 200 may perform intelligent analysis on the first video, determine the type of the first video, and automatically determine the corresponding structure marking mode based on the type of the first video.
It can be understood that when the types of first videos are different, different segments in the first videos have different characteristics. In the embodiments of this application, different structural units are configured and different splitting rules are preset for different types of first videos, which can improve the splitting quality of the first video and avoid problems such as split second videos that are too long or too short, sluggish pacing, and the like. In addition, the technical solution provided in the embodiments of this application is applicable to splitting scenarios for various types of first videos and has wide application scenarios.
Moreover, for different types of first videos, the user at most needs to select different structure marking modes, and the first server 200 can use the different structure marking modes to mark the first videos as different structural units and split the first videos based on different splitting rules and the marked structural units. Therefore, in the technical solution provided in the embodiments of this application, a user does not need to configure additional splitting rules and the like for different types of first videos; the user's operation is simple, convenient, and quick, and the interaction experience between the user and the electronic device is good.
In still other embodiments, the first server 200 may perform the splitting process on the first video multiple times until a splitting result satisfactory to the user is obtained. For example, if the user is not satisfied with the first splitting result of the first server 200, one or more second videos in the first splitting result may be adjusted through the electronic device 100 (e.g., deleting a segment corresponding to one or some structural units, or merging segments corresponding to two or more structural units). Then, the first server 200 may update the splitting rule according to the user's adjustment operation and use the updated splitting rule to split the first video again to obtain a second splitting result. If the user is satisfied with the second splitting result, the first server 200 may output the split second videos to the electronic device 100. If the user is still not satisfied with the second splitting result, the second splitting result may again be adjusted through the electronic device 100, and the first server 200 splits again, until a splitting result satisfactory to the user is obtained.
In still other embodiments, the video processing system may further include a second server 300, such as a video server, on which a large amount of video, such as the first video, is stored. When a user inputs the network address of the first video to the first server 200 through the electronic device 100, the first server 200 may acquire the content of the first video from the second server 300 according to that network address.
Referring to fig. 2, fig. 2 shows a schematic diagram of an electronic device 100, the electronic device 100 including one or more processors 110, one or more memories 120, one or more communication interfaces 130, one or more input devices 140, and one or more output devices 150. In a specific implementation, the electronic device 100 may be, for example, a mobile phone, a tablet computer, a personal computer (personal computer, PC), a personal digital assistant (personal digital assistant, PDA), an augmented reality (augmented reality, AR) device, a Virtual Reality (VR) device, an in-vehicle device, a smart screen, etc., and the specific form of the electronic device 100 is not limited in this application.
Wherein the processor 110, the memory 120, the communication interface 130, and the input-output devices are connected by buses. The processor 110 may include a general purpose central processing unit (Central Processing Unit, CPU) (e.g., CPU0 and CPU 1), a microprocessor, an Application-specific integrated circuit (ASIC), a graphics processor (graphics processing unit, GPU), a neural-Network Processor (NPU), or an integrated circuit for controlling program execution in the present Application, etc.
In general, memory 120 may be used to store computer-executable program code that includes instructions. The memory 120 may include a stored program area and a stored data area. The storage program area may store an operating system, application program codes, and the like. In addition, the memory 120 may include a high-speed random access memory, and may also include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The processor 110 performs various functional applications and data processing of the electronic device 100 by executing instructions stored in the memory 120. In one example, the processor 110 may also include multiple CPUs, and the processor 110 may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, or processing cores for processing data (e.g., computer program instructions).
Communication interface 130 may be used to communicate with other devices or communication networks, such as ethernet, wireless local area network (wireless local area networks, WLAN), etc.
The input device 140 is in communication with the processor 110 and may receive user input in a variety of ways. For example, the input device 140 may be a mouse, a keyboard, a touch screen device, a sensing device, or the like.
The output device 150 communicates with the processor 110 and may display information in a variety of ways. For example, the output device 150 may be a display device such as an active-matrix organic light-emitting diode (AMOLED) display, a flexible light-emitting diode (FLED) display, a Mini-LED, Micro-LED, or Micro-OLED display, a quantum dot light-emitting diode (QLED) display, a liquid crystal display (LCD), a light-emitting diode (LED) display, a cathode ray tube (CRT) display, or a projector.
Referring to fig. 3, fig. 3 shows a schematic structure of a first server 200, where the first server 200 includes one or more processors 210, one or more memories 220, and one or more communication interfaces 230. In particular implementations, the first server 200 herein may be at least one of a stand-alone physical server, a plurality of stand-alone physical servers, a cloud server providing cloud computing, a cloud computing platform, and a virtualization center.
Wherein the processor 210, the memory 220 and the communication interface 230 are connected by a bus. The processor 210 may include a general purpose central processing unit (Central Processing Unit, CPU) (e.g., CPU0 and CPU 1), a microprocessor, an Application-specific integrated circuit (ASIC), a graphics processor (graphics processing unit, GPU), a neural-Network Processor (NPU), or an integrated circuit for controlling program execution in the present Application, etc.
In general, memory 220 may be used to store computer-executable program code that includes instructions. The memory 220 may include a stored program area and a stored data area. The storage program area may store an operating system, application program codes, and the like. In addition, the memory 220 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The processor 210 performs various functional applications of the first server 200 and data processing by executing instructions stored in the memory 220. In one example, processor 210 may also include multiple CPUs, and processor 210 may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, or processing cores for processing data (e.g., computer program instructions).
Communication interface 230 may be used to communicate with other devices or communication networks such as ethernet, wireless local area network (wireless local area networks, WLAN), etc.
It should be noted that, for the structure of the second server 300, reference may be made to the structure of the first server 200. It can be understood that the second server 300 may include more or fewer components than the first server 200, combine some components, split some components, or arrange components differently. The embodiments of this application are not limited in this regard.
The following describes in detail the technical solutions provided in the embodiments of the present application with reference to the accompanying drawings.
Fig. 4 is a schematic flow chart of a video stripping method according to an embodiment of the present application, where the method includes:
s400, the electronic device 100 receives a first video input by a user.
Optionally, the electronic device 100 also receives a first structural marking mode input by a user.
S401, the electronic device 100 transmits the first video to the first server 200.
Optionally, the electronic device 100 also sends the first structure marking mode to the first server 200.
In step S400-step S401, a user may input a first video through the electronic device 100. Optionally, the first structural marking mode is input through the electronic device 100.
For example, a client or an application program that provides a human-machine interaction interface may be installed on the electronic device 100, and through that interface a user may input a first video and, optionally, a first structure marking mode, to be sent to the first server 200. In one example, the client or application program may be dedicated to video processing or video splitting, or may be a general-purpose client or application program; for example, it may be a browser, and the user may open a video-splitting web page by inputting a specific network address into the browser to perform the related processing.
Take as an example that the electronic device 100 is a computer on which a client dedicated to video splitting is installed. As shown in (1) of FIG. 5, the computer displays a homepage 501 of the client. The homepage 501 includes a control 502 for selecting the long video to be processed (i.e., the first video) and a control 503 for selecting the structure marking mode. In response to detecting the user operating the control 502, the computer may display an interface 504 as shown in (2) of FIG. 5; the interface 504 displays a menu 505 that the user may use to select a local video on the computer as the first video, for example, selecting video 506 as the first video. Of course, in other embodiments, the user may also select a non-local video (e.g., a video on the Internet) as the first video through the computer. In that case, the user may enter the network address or the like of the non-local video in the control 502. It should be noted that the operation of selecting the first video is described here merely by way of example; in the embodiments of this application, the operation of selecting the first video may be performed in any other manner, which is not specifically limited.
Further, in response to detecting the user operating the control 503, the computer displays an interface 507 as shown in FIG. 6, in which a drop-down menu 508 of the control 503 is expanded, and the user selects the corresponding structure marking mode, such as "movie/TV drama". The user may then operate the start splitting control 510. When the user's operation on the start splitting control 510 is detected, the computer transmits the first video (the video data or the network address of the first video) and the selected structure marking mode to the first server 200, and the first server 200 performs the splitting process.
In a specific embodiment, the data sent by the computer to the first server 200 may use the following data structure: {"video_structure_type": "movie", "content": "xxx.mp4"}, where the content of the field video_structure_type is the first structure marking mode and may be any of movie/TV drama, variety interview, news, documentary, or another structure marking mode, and the content of the field content is the video data or the network address of the first video.
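As an illustration only, the following minimal sketch shows how a client might assemble and send such a request; the endpoint URL, the use of the requests library, and the function name are assumptions made for illustration and are not part of this application.

```python
import requests  # assumed HTTP client; this application does not specify a transport

def send_split_request(content: str, structure_type: str = "movie") -> dict:
    """Send a splitting request shaped like the data structure above.

    content: the video data reference or a network address, e.g. "xxx.mp4";
    structure_type: the first structure marking mode, e.g. "movie",
    "variety_interview", "news", or "documentary".
    """
    payload = {"video_structure_type": structure_type, "content": content}
    # The endpoint below is hypothetical, standing in for the first server 200.
    resp = requests.post("https://first-server.example.com/split",
                         json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Example: request splitting of a local file using the movie/TV drama marking mode.
# result = send_split_request("xxx.mp4", "movie")
```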
Of course, the user may cancel the current splitting task through the cancel control 509. In some examples, the computer may then clear the information, such as the long video to be processed and the structure marking mode, that has been input this time.
S402, the first server 200 performs shot detection on the first video, and divides the first video into shots.
After receiving the first video transmitted by the electronic device 100, or after acquiring the first video from the second server 300 according to the network address of the first video transmitted by the electronic device 100, the first server 200 may perform video segmentation on the first video by using a method such as shot boundary detection (SBD), a machine learning method (e.g., a shot detection method based on K-Means clustering), or a color histogram method (detecting shots by comparing color change values between adjacent image frames), dividing the first video into a plurality of shots, where a shot is the basic data unit in the video stream.
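As a concrete illustration of the color histogram method mentioned above, the following sketch detects shot boundaries by comparing HSV histograms of adjacent frames with OpenCV; the similarity threshold and histogram bin sizes are assumptions, and this is only one of the segmentation methods named in this application.

```python
import cv2  # OpenCV; assumed available as the implementation library

def detect_shot_boundaries(path: str, threshold: float = 0.5) -> list[int]:
    """Return the frame indices at which a new shot is assumed to start,
    based on the histogram correlation between adjacent frames."""
    cap = cv2.VideoCapture(path)
    boundaries = [0]  # the first shot starts at frame 0
    prev_hist = None
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Correlation near 1 means similar frames; a sharp drop suggests a cut.
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                boundaries.append(idx)
        prev_hist = hist
        idx += 1
    cap.release()
    return boundaries
```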
In one specific implementation, as shown in FIG. 7, the first server 200 may include a shot detection module, a video understanding module, a structural unit marking module, a clip recognition module, and a video rendering module. Optionally, the first server 200 may further include a manual correction module. Step S402 may be specifically executed by the shot detection module in the figure. The input of the shot detection module is the first video and, optionally, the first structure marking mode selected by the user. The output of the shot detection module is the video segment corresponding to each shot contained in the first video. In one embodiment, the output of the shot detection module may be a timestamp sequence of shots; for example, the output data is [0, 5, 33, 48, ……] in milliseconds, where each number represents the start time of a shot in the first video, and the end time of the shot is the next number minus a preset duration (e.g., 1 millisecond). That is, the timestamp corresponding to the first shot in the first video is 0 ms to 4 ms; the timestamp corresponding to the second shot is 5 ms to 32 ms; the timestamp corresponding to the third shot is 33 ms to 47 ms; and so on.
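A minimal sketch (an illustrative assumption, not part of this application) of turning the timestamp sequence above into per-shot intervals, using the 1-millisecond offset from the example:

```python
def shots_from_timestamps(starts: list[int], video_end_ms: int) -> list[tuple[int, int]]:
    """Convert the shot detection module's output, e.g. [0, 5, 33, 48, ...],
    into (start_ms, end_ms) intervals: each shot ends 1 ms before the next
    shot starts, and the last shot ends at the end of the video."""
    shots = []
    for i, start in enumerate(starts):
        end = starts[i + 1] - 1 if i + 1 < len(starts) else video_end_ms
        shots.append((start, end))
    return shots

# shots_from_timestamps([0, 5, 33, 48], 60) -> [(0, 4), (5, 32), (33, 47), (48, 60)]
```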
S403, the first server 200 extracts the features of each shot in the first video.
For example, the first server 200 may extract the features of each shot using an image feature extractor. The features of a shot include, but are not limited to, the shot's hue, characters, location, action, audio, dialogue, shot category (e.g., main shot, long shot, close shot), and the like. The image feature extractor may be a convolutional neural network or another neural network; the embodiments of this application do not limit the implementation of the image feature extractor.
In a specific implementation, step S403 may be specifically performed by the video understanding module in FIG. 7. The inputs of the video understanding module are the timestamp sequence of shots output by the shot detection module and the first video, and the output is the feature of each shot, where the feature of a shot may be represented by a 512-dimensional feature vector.
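This application does not name a specific feature extractor; purely as an assumed stand-in that produces one 512-dimensional vector per shot, the sketch below averages ResNet-18 frame embeddings over the frames sampled from a shot (the audio and dialogue features mentioned above are omitted here).

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# ResNet-18 with the classification head replaced by an identity yields a
# 512-dimensional embedding per frame; this backbone is an assumption, not
# something mandated by this application.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def shot_feature(frames) -> torch.Tensor:
    """Average the 512-dimensional frame embeddings over the frames sampled
    from one shot; frames is an iterable of HxWx3 uint8 RGB arrays."""
    feats = [backbone(preprocess(f).unsqueeze(0)) for f in frames]
    return torch.cat(feats).mean(dim=0)  # shape: (512,)
```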
Of course, in some other embodiments, steps S402 and S403 are optional, and the first video and the first structure marking mode input by the electronic device may be directly input into the structural unit marking module for processing.
S404, the first server 200 marks the shots in the first video as structural units with different meanings according to the first structure marking mode and the features of each shot.
In some embodiments, the first server 200 may perform semantic recognition on each shot in the first video and mark the shots in the first video as structural units with different meanings, which facilitates subsequently combining the shots based on the marked structural units of different meanings to form split clips. For example, the first server 200 may employ a classification model to mark each shot in the first video as a structural unit with a different meaning based on the features of each shot. The classification model may be based on, for example, a random forest (RF) algorithm or a support vector machine (SVM) algorithm. Further, the first server 200 may also calculate the degree of association between adjacent largest structural units, so that shot combination can be performed with reference to the degree of association between adjacent largest structural units.
It has been described above that different structure marking modes can be selected when the types of the first video are different. When the selected structure marking mode is different, the structural unit marked for each shot is different. In one specific implementation, when the selected structure marking mode is different, the first server 200 may use different classification models to mark the shots in the first video as different structural units. For example, when the first structure marking mode is movie/TV drama, the first server 200 uses classification model 1 to classify/mark each shot in the first video as a structural unit such as a field, an event, a background/introduction, or details. The field is the largest structural unit. A structural unit marked as a field may include one or more smaller structural units, for example, one or more of the structural units marked as events, the structural units marked as backgrounds/introductions, and the structural units marked as details. For another example, when the first structure marking mode is variety interview, the first server 200 uses classification model 2 to classify/mark the shots in the first video as structural units such as games/links, performances, and shows/content. The game/link is the largest structural unit. For another example, when the first structure marking mode is news, the first server 200 uses classification model 3 to classify/mark the shots in the first video as structural units such as chapters, lead-ins, main programs, and follow-up interviews/comments. The chapter is the largest structural unit.
In a specific implementation, step S404 may be specifically performed by the structural unit marking module in FIG. 7. The structural unit marking module selects the corresponding classification model (e.g., classification model 1 or classification model 2) according to the structure marking mode selected by the user or automatically determined by the first server 200. The features of each shot extracted in step S403 are then input into the corresponding classification model, and the classification model outputs the structural unit corresponding to each shot, that is, marks a structural unit for each shot. Optionally, the degree of association between adjacent largest structural units is also calculated.
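The following sketch shows how the structural unit marking module might dispatch to one classifier per structure marking mode; the random forest and SVM choices echo the algorithms named above, but which model backs which mode, the label sets, and the assumption that the models were trained offline are all illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# One (classifier, label set) pair per structure marking mode; which algorithm
# backs which mode is assumed here for illustration only.
CLASSIFIERS = {
    "movie": (RandomForestClassifier(n_estimators=100),
              ("field", "event", "background/introduction", "details")),
    "variety_interview": (SVC(),
              ("game/link", "performance", "show/content")),
    "news": (RandomForestClassifier(n_estimators=100),
              ("chapter", "lead-in", "main program", "follow-up interview/comment")),
}

def mark_shots(mode: str, shot_features) -> list[str]:
    """Mark each shot (one 512-dimensional feature vector per row of
    shot_features) with a structural unit label; assumes the mode's model
    was already fit offline with the string labels above as targets."""
    model, _labels = CLASSIFIERS[mode]
    return list(model.predict(shot_features))
```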
To better illustrate the process of marking each shot in the first video as a structural unit with a different meaning, the following description takes a movie/TV drama as an example of the first video, with reference to FIG. 8.
(1) of FIG. 8 shows the complete first video. As shown in (2) of FIG. 8, after the shot detection in step S402, a plurality of shots are detected in the first video. Further, the features of each shot are extracted, that is, step S403 is performed. Then, as shown in (3) of FIG. 8, each shot is marked as a structural unit of a different level according to the features of each shot, where the largest structural unit is the field. A structural unit marked as a field may include one or more smaller structural units, for example, one or more of the structural units marked as events, the structural units marked as backgrounds/introductions, and the structural units marked as details. Further, the degree of association between two adjacent fields, that is, the similarity of characters, scenes, audio, tone, and the like between the two adjacent fields, is calculated.
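One plausible reading (an assumption, since this application does not give a formula) of the degree of association between two adjacent fields is a similarity score over their aggregated shot features, for example the cosine similarity below:

```python
import numpy as np

def association_degree(field_a: np.ndarray, field_b: np.ndarray) -> float:
    """Cosine similarity between the mean shot-feature vectors of two
    adjacent fields (each array has shape (n_shots, 512)). A value close
    to 1 suggests the fields share characters, scenes, audio, tone, etc."""
    a, b = field_a.mean(axis=0), field_b.mean(axis=0)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
```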
Table 1 exemplifies the meaning of each structural unit in the structure marking mode corresponding to a movie/TV drama. It should be noted that the meanings of the structural units in Table 1 are merely examples; the embodiments of this application do not specifically limit the structural units configured in the structure marking mode corresponding to a movie/TV drama, nor the specific meaning of each structural unit.
Table 1 (the table content is provided as an image in the original publication)
S405, the first server 200 obtains a first splitting result according to the structural unit marked for each shot, the degree of association between adjacent largest structural units, and a preset splitting rule.
S406, the first server 200 splits the first video into a plurality of second videos according to the first splitting result, where each second video includes one or more segments.
In one embodiment, a splitting rule is preset in the first server 200. The preset splitting rule includes, for example, the duration of the split short videos (e.g., 5 minutes) and the number of largest structural units included in each split short video (e.g., 1). The first server 200 may generate a mask of the region of interest according to the preset splitting rule, and input the shots contained in the first video, the structural unit corresponding to each shot, and the features of each shot into the mask to obtain the shots of interest in the first video. The mask may be a deep neural network or another network; the embodiments of this application do not limit the specific implementation of the mask. It can be understood that using the mask to extract the shots of interest in the first video (or to exclude the shots not of interest) before splitting the first video helps improve the processing efficiency and accuracy of the subsequent video splitting process.
It should be noted that, in other embodiments, when the types of the first video are different, the shots of interest to the user may also be different. The first server 200 may therefore set different splitting rules for different structure marking modes, and different splitting rules may generate different masks. Accordingly, the first server 200 may determine, according to the selected first structure marking mode, the splitting rule corresponding to the first structure marking mode, generate the corresponding mask, and use the mask to extract the shots of interest to the user in the first video.
Next, the first server 200 inputs the shots of interest and the related information of these shots (for example, the features of the shots, the marked structural units, and the degree of association between the largest structural units) into a splitting model for inference to obtain a splitting result, and the splitting result is used to split the first video into a plurality of second videos. The splitting model may adopt a deep-learning classification network or another neural network; the embodiments of this application do not limit the specific implementation of the splitting model.
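Purely as a toy stand-in (this application describes the mask as a neural network, so the simple keep-list filter below is only an assumed simplification), a splitting rule record and a rule-derived filter over marked shots might look as follows:

```python
from dataclasses import dataclass

@dataclass
class SplittingRule:
    target_duration_s: int    # e.g. 5 minutes per split short video
    max_largest_units: int    # e.g. 1 largest structural unit per clip
    units_of_interest: tuple  # structural unit labels the mask should keep

# Hypothetical per-mode rules; the numeric values echo the examples above.
RULES = {
    "movie": SplittingRule(300, 1, ("field", "event", "details")),
    "news": SplittingRule(300, 1, ("chapter", "main program")),
}

def mask_shots(mode: str, marked_shots: list[tuple]) -> list[tuple]:
    """Keep only the shots whose structural unit label the mode's rule marks
    as of interest; marked_shots is a list of (shot_interval, unit_label)."""
    rule = RULES[mode]
    return [shot for shot in marked_shots if shot[1] in rule.units_of_interest]
```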
In a specific implementation, this step S405 may be performed by the short-piece identification module and the video rendering module in fig. 7. The short-piece identification module may comprise a splitting module and a splitting rule module. The splitting rule module selects the corresponding splitting rule according to the selected first structure marking mode, then generates the corresponding mask, and inputs the shots marked with structural units in the first video and the features of each shot into the generated mask to obtain the shots of interest in the first video. Then, the shots of interest in the first video, the features of these shots, the structural units marked for them, the association degrees between the largest structural units, and the like are input into the splitting module for inference, and the splitting module outputs a timestamp sequence of the short pieces (that is, the splitting result). For example, the output timestamp sequence of the short pieces is:
[{id:1, tube:[{id:1, start_time:00:09:01.040, end_time:00:11:31.320}]},
 {id:2, tube:[{id:1, start_time:00:13:49.360, end_time:00:14:14.720}, {id:2, start_time:00:14:49.360, end_time:00:16:14.720}]},
 {id:3, tube:[{id:1, start_time:00:22:05.320, end_time:00:26:34.720}]},
 {id:4, tube:[{id:1, start_time:00:31:19.200, end_time:00:33:52.000}]}, ……]
Here, {id:1, tube:[ ]} marks the first short piece, where {id:1, start_time:00:09:01.040, end_time:00:11:31.320} in tube:[ ] indicates that the first short piece comprises one temporally continuous segment, which may comprise a plurality of shots marked as different structural units.
{id:2, tube:[ ]} marks the second short piece, where {id:1, start_time:00:13:49.360, end_time:00:14:14.720} and {id:2, start_time:00:14:49.360, end_time:00:16:14.720} in tube:[ ] indicate that the second short piece comprises two temporally continuous segments, each of which may comprise a plurality of shots marked as different structural units.
And so on.
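For illustration, the timestamp sequence above can be parsed as follows; this sketch assumes the HH:MM:SS.mmm format of the example output:

```python
# Parse the short-piece timestamp sequence into seconds.
import re

TS = re.compile(r"(\d+):(\d+):(\d+)\.(\d+)")

def ts_to_seconds(ts):
    h, m, s, ms = TS.match(ts).groups()
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

piece = {"id": 2, "tube": [
    {"id": 1, "start_time": "00:13:49.360", "end_time": "00:14:14.720"},
    {"id": 2, "start_time": "00:14:49.360", "end_time": "00:16:14.720"}]}

for seg in piece["tube"]:
    print(seg["id"], ts_to_seconds(seg["start_time"]), ts_to_seconds(seg["end_time"]))
```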
Then, the timestamp sequence of the short pieces is input into the video rendering module. The video rendering module extracts the corresponding shots from the first video according to the timestamp sequence, performs processing such as splicing and rendering on the extracted shots and audio, and finally outputs the short pieces. Each short piece includes one or more shots. The output short pieces are the second videos obtained by splitting; in general, a first video may be split into a plurality of second videos.
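For illustration, the rendering step might be sketched as follows with ffmpeg, cutting each segment of one short piece out of the first video and concatenating the cuts; the embodiment does not prescribe a particular tool, the paths and options are assumptions, and stream copy is not frame-accurate:

```python
# A sketch of the video rendering step using ffmpeg.
import os
import subprocess
import tempfile

def render_short_piece(src, piece, out_path):
    with tempfile.TemporaryDirectory() as tmp:
        seg_files = []
        for seg in piece["tube"]:
            seg_file = os.path.join(tmp, f"seg_{seg['id']}.mp4")
            # Cut one temporally continuous segment out of the first video.
            subprocess.run(["ffmpeg", "-y", "-i", src,
                            "-ss", seg["start_time"], "-to", seg["end_time"],
                            "-c", "copy", seg_file], check=True)
            seg_files.append(seg_file)
        concat_list = os.path.join(tmp, "list.txt")
        with open(concat_list, "w") as f:
            f.writelines(f"file '{p}'\n" for p in seg_files)
        # Splice the segments of the short piece into one output file.
        subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                        "-i", concat_list, "-c", "copy", out_path], check=True)
```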
Continuing with fig. 8, when the shots marked with structural units, as shown in (3) of fig. 8, are processed by the mask generated from the splitting rule, the shots of interest shown in (4) of fig. 8 are screened out. Then, short-piece identification is performed on the shots of interest, resulting in a plurality of short pieces, such as short piece 1 and short piece 2 shown in (5) of fig. 8, where short piece 1 comprises the shots of two fields and short piece 2 comprises the shots of one field. Of course, in other examples, short-piece identification may be performed directly on the shots marked with structural units without the mask processing, which is not limited in the embodiments of the present application.
S407, the first server 200 transmits the second video to the electronic device 100.
In other embodiments, the first server 200 may also send the second videos obtained after splitting to the second server 300, which is not limited in this embodiment of the present application. The plurality of second videos are the short pieces obtained by splitting.
In summary, the embodiment of the present application configures fine-grained structural units for the first video, which facilitates splitting the first video more finely and helps avoid problems such as the split second videos being too long or too short, or having a dragging rhythm. In addition, the embodiment of the present application can automatically configure structural units with different meanings for first videos of different types, and configure different splitting rules for first videos of different types, so that segments with the same theme/content in the first video can be split into one short video, which improves the splitting quality of the first video. In other words, the technical solution provided by the embodiment of the present application is applicable to splitting scenarios of various types of first videos, with high extensibility and a wide range of application scenarios.
In addition, for different types of first videos, the user needs at most to select a different structure marking mode; the first server 200 can then mark the first video with the structural units of that mode and split it based on the corresponding splitting rule and the marked structural units. Therefore, in the technical solution provided by the embodiment of the present application, the user does not need to configure additional splitting rules for different types of first videos; the operation is simple, convenient, and fast, and the interaction experience is good.
In other embodiments, the user may manually adjust the splitting result after the first server 200 splits automatically; the first server 200 may then learn the rule underlying the manual adjustment and re-split the first video based on the learned rule.
Fig. 9 is a flowchart of another video splitting method according to an embodiment of the present application; the flowchart includes steps S400 to S405, steps S901 to S906, and step S407.
For steps S400 to S405 and step S407, refer to the foregoing description; details are not repeated here.
S901, the first server 200 transmits the first splitting result to the electronic device 100.
In one example, the first splitting result includes the timestamp sequence of the short pieces after the first server 200 splits the first video, and the sequence may be used to split the first video into a plurality of short videos.
In addition, the first splitting result returned by the first server 200 to the electronic device 100 further includes the timestamp sequence of each shot in the first video, the structural unit marked for each shot, and the like. In this way, after receiving the first splitting result, the electronic device 100 can present to the user the shots included in the first video, the structural unit marked for each shot, and the short pieces split by the first server 200, so that the user can conveniently review the structural units corresponding to the shots and adjust the short pieces split by the first server 200.
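For illustration, one assumed shape of the first splitting result returned to the electronic device 100 is sketched below, combining the short-piece timestamp sequence with per-shot timestamps, marked units, and field association degrees so the adjustment interface can be drawn; the field names are illustrative, not a defined protocol:

```python
# An assumed payload shape for the first splitting result.
first_splitting_result = {
    "pieces": [{"id": 1, "tube": [{"id": 1, "start_time": "00:09:01.040",
                                   "end_time": "00:11:31.320"}]}],
    "shots": [{"id": 17, "start_time": "00:09:01.040",
               "end_time": "00:09:12.500",
               "unit": "background/introduction", "field_id": 2}],
    "associations": [{"fields": [1, 2], "degree": 0.68},
                     {"fields": [2, 3], "degree": 0.35}],
}
```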
In a specific implementation, this step may be performed by the manual correction module and the short-piece identification module. The manual correction module receives the marking result of the structural units sent by the structural unit marking module and sends it to the electronic device, and the short-piece identification module sends the splitting result (that is, the first splitting result) to the electronic device. The electronic device may present an adjustment interface according to the marking result and the splitting result, such as the interface 901 shown in fig. 10 below.
S902, the electronic device 100 receives an adjustment operation input by the user.
S903, the electronic device 100 transmits the adjustment operation input by the user to the first server 200.
Next, the example of fig. 6 is continued, taking the electronic device 100 being a computer as an example. After the computer receives the splitting result returned by the first server 200, it displays the interface 901 shown in fig. 10, which presents the short pieces corresponding to the splitting result of the first server 200 for the first video. The interface 901 includes a short-piece option 902, a manual clipping option (for the user to manually split the first video, etc.), and a star-clip option (for clips containing a specific star). After selecting the short-piece option 902, the user can view the short pieces obtained by the first server 200 splitting the first video with the method described above. The user may also choose to view all short pieces via option 903, or view only some of them, such as highlight clips. A highlight may contain a specific character, a specific action, a specific dialogue, or specific music, or may have a higher frequency of shot switching, and so on. In this way, the user can manually adjust only the highlight clips in a targeted manner, instead of adjusting every short piece.
The following explanation takes the case where the user views all short pieces via option 903. The splitting result area on the right side of the interface 901 lists the information of all short pieces obtained after the first server 200 splits automatically, for example, a preview image of each short piece, its playing time period, and function controls for processing each short piece, such as a clip/adjust control 906 and a delete control. In one example, when the user selects the second short piece, the progress bar of the video on the left side of the interface 901 may jump directly to the playing time period corresponding to that short piece, and the user may play it directly. In addition, the user can make simple edits to a short piece in the play progress bar 905 below the left side of the interface 901, for example, modifying its start time or end time. In other examples, the user may make further adjustments to a short piece via the clip/adjust control 906 on the right side of the interface 901.
In response to detecting that the user operates the clip/adjust control 906, the computer displays the interface 907 shown in fig. 11A, in which each shot included in the short piece selected by the user (e.g., short piece 2), the structural unit marked for each shot, the association degrees between adjacent largest structural units, and the like are displayed. For example, in interface 907, the start time and end time of the short piece selected by the user are marked with two black inverted triangles. Short piece 2 comprises a structural unit marked as a field (denoted field 2), and field 2 comprises a structural unit marked as background/introduction, a structural unit marked as detail, and a structural unit marked as event. Optionally, the interface 907 also displays related information of other short pieces adjacent to the selected one, such as the shots they include and the structural units marked for each shot. For example, short piece 1 and short piece 3 are adjacent to short piece 2, and the shots contained in short piece 1 and short piece 3, together with the structural units marked for each shot, are also marked in interface 907. Interface 907 further shows the association degree of short piece 1 with short piece 2 (e.g., 0.68) and of short piece 2 with short piece 3 (e.g., 0.35).
Further, the user may select one or more shots of a short piece in interface 907 and adjust them, for example, adjust the timestamps (e.g., start time or end time) of a shot, or delete a shot from the short piece in which it is located. In one example, the user may select the shot 908 (a structural unit marked as detail) in short piece 1 by a left mouse click and then open the shortcut menu 909 by a right mouse click; an adjust-timestamp option control and a delete option control are displayed in the shortcut menu 909.
In response to detecting the user operating the adjust-timestamp option control, the computer displays the interface 919 shown in fig. 11B. Interface 919 displays a timeline 920 of the shot 908, which marks the start time and end time of the shot 908. The user may adjust the start time and/or end time of the shot 908 by manipulating the timeline 920. For example, after the user moves the end time of the shot 908 forward, the computer displays the interface 921 shown in fig. 11C, in which the timestamp of the shot 908 has been adjusted. Of course, the computer may also adopt other ways to adjust the timestamp of the shot 908, which is not limited herein.
In the interface 921 shown in fig. 11C, the user can select the shot 908 again and open the shortcut menu 909. In response to detecting the user operating the delete option control, the computer displays the interface 910 shown in fig. 12. In interface 910, a delete marker is displayed on the shot 908 to indicate that the shot 908 will no longer be included in short piece 1. That is, the user can manually check whether an unimportant shot exists in a short piece and, if so, delete it manually, thereby adjusting the rhythm of the short piece and alleviating problems such as a dragging plot in the split short video.
It should be noted that the above ways of selecting a shot and operating the delete option control are merely examples, and the embodiments of the present application do not limit the specific operation. For example, in the interface 910 shown in fig. 12, the user can also draw a rectangular region 911 by holding the left mouse button; since the entire contents of the shot 913 (marked as a background/introduction structural unit) and the shot 914 (marked as a detail structural unit) fall within the rectangular region 911, it is determined that the user has selected the shots 913 and 914. Then, when a right mouse click is detected, the computer displays the shortcut menu 912. In response to the user operating the delete option control in the shortcut menu 912, the computer displays the interface 915 shown in fig. 13, in which delete markers are displayed on the shots 913 and 914 to indicate that they will no longer be included in short piece 2.
In one example, the association degrees between adjacent largest structural units (e.g., fields) are also shown in the interface 916 shown in fig. 14. The user can then decide whether to merge some short pieces with reference to the association degree between adjacent largest structural units and the content of each short piece. For example, the user may select two or more short pieces (e.g., the structural units marked as field 1 and field 2) by a box-selection operation or by several left mouse clicks, and then expand the shortcut menu 917 with a right mouse click. Further, in response to detecting that the user selects the merge option control in the shortcut menu 917, the computer displays the interface 918 shown in fig. 15, where the shots corresponding to field 1 and field 2 are merged into one short piece, for example, the merged short piece 1. Optionally, the computer may or may not renumber the other short pieces. In other embodiments, the user may split one short piece into two or more short pieces, or split one or more short pieces into separate structural units and recombine them into multiple short pieces, or retain only the segments corresponding to one or some structural units, and so on.
In summary, the adjustment operations performed by the user include, but are not limited to: deleting the segments corresponding to one or more structural units from a short piece, merging two or more short pieces, splitting one or more short pieces, recombining the segments corresponding to the structural units in one or more short pieces, and retaining the segments corresponding to one or more structural units. The computer transmits the adjustment operations performed by the user to the first server 200.
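For illustration, one possible encoding of these adjustment operations as sent to the first server 200 is sketched below; the operation names and identifiers are hypothetical, not a defined protocol:

```python
# A hypothetical encoding of the user's adjustment operations.
adjustments = [
    {"op": "delete_unit", "piece_id": 1, "shot_id": 908},
    {"op": "adjust_timestamp", "shot_id": 908, "end_time": "00:10:58.000"},
    {"op": "merge_pieces", "piece_ids": [1, 2]},
    {"op": "split_piece", "piece_id": 3, "at": "00:24:00.000"},
    {"op": "retain_units", "piece_id": 4, "shot_ids": [21, 22]},
]
```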
That is, when a short piece is too short after the first server 200 splits automatically, or becomes too short after the user manually deletes some shots, the user can lengthen it by merging two or more short pieces. Conversely, when a short piece is too long after automatic splitting, or becomes too long after the user manually retains some shots, the user can shorten it by splitting part of the short pieces or recombining part of the short pieces. In short, the user can adjust the length of the split short videos manually, avoiding short videos that are too long or too short.
After completing all adjustment operations, the user may click the next-step control on the interface 918 shown in fig. 15. In response to detecting the click, the computer transmits the adjustment operations performed by the user to the first server 200.
Of course, in other embodiments, if the result of all the adjustment operations performed by the user is already the final splitting result the user desires, the first server 200 does not need to split again according to those operations; the electronic device 100 may directly perform video rendering locally according to the adjusted splitting result to obtain the final plurality of second videos. In that case, the electronic device 100 may still send the adjustment operations to the first server 200, and the first server 200 trains on them, learns and updates the splitting rule, and applies the updated rule when splitting other videos.
S904, the first server 200 updates the splitting rule according to the adjustment operation of the user.
The first server 200 learns the splitting preference of the user from the user's adjustment operations. In a specific implementation, the first server 200 may use a classification model to extract the features of the shots that were manually deleted, retained, merged, and so on, and update the splitting rule accordingly, so that other shots in the first video with similar features receive similar processing. For example: the user deletes the detail segments with long-shot features (i.e., shots of structural units marked as detail) but retains the detail segments with vocal audio features. As another example: the user merges field 1 and field 2 into one short piece whose duration is within a preset duration (e.g., 5 minutes). As another example: the user splits a short piece containing two shots marked as event structural units into two short pieces, each containing one shot marked as an event structural unit. The first server 200 automatically updates the splitting rule according to the learned splitting preference, so that when the first server 200 re-splits the first video or adjusts the splitting according to the updated rule, the user's splitting preference can be satisfied.
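For illustration, the following Python sketch folds a learned preference into the splitting rule: every manual deletion contributes a "delete similar segments" entry keyed by the deleted shot's unit and view features; the feature and rule encodings are assumptions for illustration only:

```python
# A sketch of updating the splitting rule from the user's adjustments.
def update_rules(splitting_rule, adjustments, shot_features):
    for adj in adjustments:
        if adj["op"] == "delete_unit":
            feats = shot_features[adj["shot_id"]]
            splitting_rule.setdefault("delete_if", []).append(
                {"unit": feats["unit"], "view": feats["view"]})
        # Merge/split/retain operations would analogously adjust duration
        # and grouping preferences.
    return splitting_rule

rule = update_rules({}, [{"op": "delete_unit", "piece_id": 1, "shot_id": 908}],
                    {908: {"unit": "detail", "view": "long_shot"}})
print(rule)  # {'delete_if': [{'unit': 'detail', 'view': 'long_shot'}]}
```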
In a specific implementation, this step S904 may be performed by the manual correction module and the splitting rule module in fig. 7. Specifically, after receiving the user's adjustment operations, the manual correction module automatically learns the user's splitting preference and then sends the learned preference to the splitting rule module, requesting the splitting rule module to update the splitting rule.
S905, the first server 200 splits the first video again according to the structural units marked for the shots, the association degrees between the largest structural units, and the updated splitting rule, to obtain a second splitting result.
It should be noted that the updated splitting rule acts on the entire first video. That is, when the first server 200 splits the first video again according to the updated splitting rule, the resulting second splitting result not only adjusts the short pieces manually adjusted by the user, but may also similarly adjust the other short pieces in the first video (i.e., those not manually adjusted by the user). For example, the user manually deletes a detail segment with long-shot features in short piece 1 and does not manually adjust short piece 3. Based on this manual adjustment, the splitting rule updated by the first server 200 adds a rule of deleting detail segments with long-shot features. If short piece 3 also includes a detail segment with long-shot features, then when the first server 200 splits the first video again according to the updated rule, the detail segment with long-shot features in short piece 3 is deleted automatically.
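For illustration, applying such an updated rule across the whole first video might be sketched as follows, using the rule encoding assumed in the sketch above; the names are assumptions:

```python
# Apply learned delete rules to every short piece, not only the one the
# user touched: the long-shot detail segment deleted by hand in short
# piece 1 is removed from short piece 3 as well.
def apply_delete_rules(pieces, shot_features, splitting_rule):
    rules = splitting_rule.get("delete_if", [])
    def doomed(shot_id):
        f = shot_features[shot_id]
        return any(f["unit"] == r["unit"] and f["view"] == r["view"]
                   for r in rules)
    for piece in pieces:
        piece["shot_ids"] = [s for s in piece["shot_ids"] if not doomed(s)]
    return pieces
```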
In a specific implementation, this step S905 may be specifically performed by the splitting module in fig. 7. Specifically, the splitting module acquires updated splitting rules from the splitting rule module, and performs splitting processing on the first video to obtain a second splitting result.
It should be noted that, in other embodiments, the first server 200 may also return the obtained second splitting result to the electronic device 100, and the electronic device 100 presents the second splitting result to the user. If the user's splitting requirement is satisfied, the first server 200 may perform step S906 and output the split plurality of second videos. If the user's splitting requirement is not satisfied, the user can adjust manually again until the splitting result output by the first server 200 meets the user's requirement.
S906, the first server 200 splits the first video into a plurality of second videos according to the second splitting result.
In this way, the user can manually adjust the splitting result of the first server 200 to improve the splitting quality and meet the user's splitting requirement. Moreover, the user only needs to adjust some of the short pieces in the splitting result; the first server 200 can automatically learn the rule behind the user's adjustment and adjust the splitting result of the entire first video accordingly, which improves the splitting efficiency.
The embodiment of the present application further provides an apparatus, which is included in a server and has the function of implementing the server behavior in any one of the methods of the above embodiments. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes at least one module or unit corresponding to the above function, such as a communication module or unit, a storage module or unit, and a processing module or unit.
The present embodiments also provide a computer storage medium comprising computer instructions which, when run on a server, cause the server to perform a method as in any of the embodiments above.
Embodiments of the present application also provide a computer program product for causing a computer to perform any of the methods of the embodiments described above when the computer program product is run on the computer.
It will be appreciated that the above-described terminal, etc. may comprise hardware structures and/or software modules that perform the respective functions in order to achieve the above-described functions. Those of skill in the art will readily appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The embodiment of the present application may divide the functional modules of the terminal and the like according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above. The specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which are not described herein.
The functional units in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be essentially or a part contributing to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: flash memory, removable hard disk, read-only memory, random access memory, magnetic or optical disk, and the like.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (25)

1. A video stripping method, characterized by being applied to an electronic device, the method comprising:
the electronic equipment sends a splitting request for a first video to a server;
the electronic equipment receives a plurality of first short pieces returned by the server, wherein each first short piece comprises one or more structural units, and each structural unit comprises one or more shots;
the electronic equipment receives an adjustment operation of a user for one or more of the first short pieces;
and the electronic equipment splits the first video into a plurality of second short pieces according to the adjustment operation.
2. The method of claim 1, wherein the electronic device splitting the first video into a plurality of second short pieces according to the adjustment operation comprises:
the electronic equipment sends the adjustment operation to the server, requesting the server to split the first video into a plurality of second short pieces according to the adjustment operation;
and the electronic equipment receives the plurality of second short pieces returned by the server.
3. The method according to claim 1 or 2, wherein the adjustment operation for one or more of the first short pieces comprises:
an operation of deleting one or more structural units from one or more of the first short pieces;
and/or, an operation of merging two or more of the first short pieces;
and/or, an operation of modifying one or more of the structural units in one or more of the first short pieces.
4. The method according to any one of claims 1 to 3, wherein,
the splitting request comprises a first structure marking mode, and the first structure marking mode is used by the server to mark each shot in the first video as a structural unit corresponding to the first structure marking mode.
5. The method of claim 4, wherein,
the first structure marking mode corresponds to a type of the first video.
6. The method according to claim 4 or 5, wherein,
the first structure marking mode is any one of a movie/TV drama type, a variety interview type, a news type, a documentary type, a sports event type, and a concert type.
7. The method according to any one of claims 4 to 6, wherein,
when the first structure marking mode is the movie/TV drama type, the structural units corresponding to the first structure marking mode comprise one or more of a field, an event, a background/introduction, and a detail;
when the first structure marking mode is the variety interview type, the structural units corresponding to the first structure marking mode comprise one or more of a game/link, a play, and a show/content;
when the first structure marking mode is the news type, the structural units corresponding to the first structure marking mode comprise one or more of a chapter, a guide language, a main program, and a later interview/comment.
8. The method of claim 7, wherein,
when the first structure marking mode is the movie/TV drama type, each first short piece comprises one or more fields;
when the first structure marking mode is the variety interview type, each first short piece comprises one or more games/links;
when the first structure marking mode is the news type, each first short piece comprises one or more chapters.
9. The method of claim 8, wherein before the electronic device receives the adjustment operation of the user for one or more of the first short pieces, the method further comprises:
when the first structure marking mode is the movie/TV drama type, the electronic device displays a first interface, wherein the first interface comprises one or more fields corresponding to each first short piece, and each field comprises at least one of an event, a background/introduction, and a detail;
when the first structure marking mode is the variety interview type, the electronic device displays a first interface, wherein the first interface comprises one or more games/links corresponding to each first short piece, and each game/link comprises at least one of a play and a show/content;
when the first structure marking mode is the news type, the electronic device displays a first interface, wherein the first interface comprises one or more chapters corresponding to each first short piece, and each chapter comprises at least one of a guide language, a main program, and a later interview/comment.
10. A video stripping method, applied to a server, comprising:
the server receives a splitting request for a first video sent by electronic equipment;
the server identifies each structural unit contained in the first video according to a first structure marking mode corresponding to the first video, wherein each structural unit comprises one or more shots;
the server splits the first video into a plurality of first short pieces according to each structural unit contained in the first video and a splitting rule, wherein each first short piece comprises one or more structural units;
the server sends the plurality of first short pieces to the electronic device.
11. The method according to claim 10, wherein the server identifying each structural unit contained in the first video according to the first structure marking mode corresponding to the first video comprises:
the server extracts the features corresponding to each shot included in the first video;
and marks each shot in the first video as a structural unit corresponding to the first structure marking mode based on the features corresponding to each shot in the first video.
12. The method according to claim 10 or 11, wherein the server splitting the first video into a plurality of first short pieces according to each structural unit contained in the first video and the splitting rule comprises:
the server splits the first video into a plurality of first short pieces according to the structural units in the first video, the association degrees between the largest structural units, and the splitting rule.
13. The method according to any one of claims 10 to 12, wherein the splitting request includes the first structure marking mode.
14. The method according to any one of claims 10 to 13, wherein,
the first structure marking mode corresponds to the type of the first video.
15. The method according to any one of claims 10 to 14, wherein,
the first structure marking mode is any one of a movie/TV drama type, a variety interview type, a news type, a documentary type, a sports event type, and a concert type.
16. The method according to any one of claims 10 to 15, wherein,
when the first structure marking mode is the movie/TV drama type, the structural units corresponding to the first structure marking mode comprise one or more of a field, an event, a background/introduction, and a detail;
when the first structure marking mode is the variety interview type, the structural units corresponding to the first structure marking mode comprise one or more of a game/link, a play, and a show/content;
when the first structure marking mode is the news type, the structural units corresponding to the first structure marking mode comprise one or more of a chapter, a guide language, a main program, and a later interview/comment.
17. The method according to any one of claims 10 to 16, wherein,
in the structural units corresponding to the movie/TV drama type, the field is the largest structural unit; in the structural units corresponding to the variety interview type, the game/link is the largest structural unit; in the structural units corresponding to the news type, the chapter is the largest structural unit.
18. The method according to claim 16 or 17, wherein,
when the first structure marking mode is the movie/TV drama type, each first short piece comprises one or more fields;
when the first structure marking mode is the variety interview type, each first short piece comprises one or more games/links;
when the first structure marking mode is the news type, each first short piece comprises one or more chapters.
19. The method of any one of claims 10 to 18, wherein after the server sends the plurality of first short pieces to the electronic device, the method further comprises:
the server receives an adjustment operation of a user for one or more of the first short pieces, sent by the electronic equipment;
and the server updates the splitting rule according to the adjustment operation.
20. The method of claim 19, wherein after the server updates the splitting rule according to the adjustment operation, the method further comprises:
the server splits the first video into a plurality of second short pieces according to each structural unit contained in the first video and the updated splitting rule, wherein each second short piece comprises one or more structural units;
the server sends the plurality of second short pieces to the electronic device.
21. The method of claim 20, wherein the adjustment operation for one or more of the first short pieces comprises:
an operation of deleting one or more structural units from one or more of the first short pieces;
and/or, an operation of merging two or more of the first short pieces;
and/or, an operation of modifying one or more of the structural units in one or more of the first short pieces.
22. An electronic device, comprising: a processor, a memory, and a display screen, the memory and the display screen being coupled to the processor, the memory for storing computer program code, the computer program code comprising computer instructions that, when read from the memory by the processor, cause the electronic device to perform the video stripping method of any one of claims 1-9.
23. A server, comprising: a processor, a memory, and a communication interface, the memory being coupled to the processor, the memory for storing computer program code, the computer program code comprising computer instructions that, when read from the memory by the processor, cause the server to perform the video stripping method of any one of claims 10-21.
24. A computer readable storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the video stripping method of any one of claims 1-9.
25. A computer readable storage medium comprising computer instructions which, when run on a server, cause the server to perform the video stripping method of any one of claims 10-21.
CN202210114425.XA 2021-12-30 2022-01-30 Video stripping method, server and system Pending CN116419039A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111657170 2021-12-30
CN2021116571703 2021-12-30

Publications (1)

Publication Number Publication Date
CN116419039A true CN116419039A (en) 2023-07-11

Family

ID=87055283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210114425.XA Pending CN116419039A (en) 2021-12-30 2022-01-30 Video stripping method, server and system

Country Status (1)

Country Link
CN (1) CN116419039A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination