CN111601160A - Method and device for editing video - Google Patents

Method and device for editing video

Info

Publication number
CN111601160A
Authority
CN
China
Prior art keywords
video
scene
feature
processed
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010476849.1A
Other languages
Chinese (zh)
Inventor
孙昊
薛学通
李甫
孟骧龙
迟至真
文石磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010476849.1A priority Critical patent/CN111601160A/en
Publication of CN111601160A publication Critical patent/CN111601160A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/47Detecting features for summarising video content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Embodiments of the present disclosure disclose a method and an apparatus for editing video, relating to the field of artificial-intelligence computer vision. One embodiment of the method comprises: identifying video frames of a video to be processed to obtain scene segments of the video to be processed; determining content information of the video according to the scene segments, and extracting a cover image according to the content information; extracting a feature segment of a set duration from the scene segment; and combining the cover image and the feature segments into a feature video. The embodiment improves the accuracy and effectiveness of obtaining the feature video.

Description

Method and device for editing video
Technical Field
Embodiments of the present disclosure relate to the field of artificial intelligence computer vision, and in particular, to a method and apparatus for editing a video.
Background
With the development of science and technology, smart devices have entered people's work, study and daily life, improving work and study efficiency as well as everyday convenience. Smart devices usually have a video-capture function, so people can record videos related to work and life and upload them to a video server for other users. Moreover, video is rich in information and spreads easily, which has further driven its popularity.
After a video is uploaded to the video server, technicians need to review its content and perform processing such as classification and editing.
Disclosure of Invention
Embodiments of the present disclosure provide a method and an apparatus for clipping video.
In a first aspect, an embodiment of the present disclosure provides a method for clipping video, the method including: identifying a video frame of a video to be processed to obtain a scene segment of the video to be processed; determining content information of the video to be processed according to the scene segment, and extracting a cover image according to the content information; extracting a feature segment of a set duration from the scene segment; and combining the cover image and the feature segment to form a feature video.
In some embodiments, the identifying a video frame of a video to be processed to obtain a scene segment of the video to be processed includes: performing image recognition on a video frame of the video to be processed to obtain scene information corresponding to the video to be processed, wherein the scene information is used for representing a scene where video content is located; and dividing the video to be processed into scene segments according to the scene information.
In some embodiments, the determining content information of the video to be processed according to the scene segment includes: identifying an initial object image in the scene segment; counting the number of occurrences and the appearance duration of the initial object image in the video to be processed, and determining target object information according to the number of occurrences and the appearance duration; querying attribute information of the target object information, and determining content information of the video to be processed according to the attribute information, wherein the attribute information comprises at least one of the following: a name of the target object, a purpose of the target object.
In some embodiments, said extracting a cover image according to the content information comprises: acquiring a feature image in the scene segment, wherein the feature image is used for characterizing the features of the scene segment; and determining a cover image by matching the feature image with the content information.
In some embodiments, the combining the cover image and the feature segment into a feature video comprises: in response to there being a plurality of feature segments, deleting duplicate feature segments.
In some embodiments, the combining the cover image and the feature segment into a feature video comprises: performing transition processing on adjacent feature segments in the feature video, wherein the transition processing comprises at least one of the following: color transition, scene transition, light transition.
In some embodiments, the method further comprises: adding a title to the feature video according to the content information.
In a second aspect, an embodiment of the present disclosure provides an apparatus for clipping video, the apparatus including: a scene segment acquisition unit configured to identify a video frame of a video to be processed and acquire a scene segment of the video to be processed; a cover image acquisition unit configured to determine content information of the video to be processed according to the scene segment, and extract a cover image according to the content information; a feature segment extraction unit configured to extract a feature segment of a set duration from the scene segment; and a feature video acquisition unit configured to combine the cover image and the feature segment into a feature video.
In some embodiments, the scene segment acquisition unit includes: a scene information acquisition subunit configured to perform image recognition on a video frame of the video to be processed to obtain scene information corresponding to the video to be processed, wherein the scene information is used for characterizing the scene in which the video content is located; and a scene segment dividing subunit configured to divide the video to be processed into scene segments according to the scene information.
In some embodiments, the cover image acquisition unit includes: an initial object image identification subunit configured to identify an initial object image in the scene segment; a target object information determination subunit configured to count the number of occurrences and the appearance duration of the initial object image in the video to be processed, and determine target object information according to the number of occurrences and the appearance duration; and a content information determination subunit configured to query attribute information of the target object information and determine content information of the video to be processed according to the attribute information, wherein the attribute information includes at least one of: a name of the target object, a purpose of the target object.
In some embodiments, the cover image acquisition unit includes: a feature image acquisition subunit configured to acquire a feature image in the scene segment, wherein the feature image is used to characterize the features of the scene segment; and a cover image determination subunit configured to determine a cover image by matching the feature image with the content information.
In some embodiments, the feature video acquisition unit comprises: a video deletion subunit configured to, in response to there being a plurality of feature segments, delete duplicate feature segments.
In some embodiments, the feature video acquisition unit comprises: a transition processing subunit configured to perform transition processing on adjacent feature segments in the feature video, the transition processing including at least one of: color transition, scene transition, light transition.
In some embodiments, the apparatus further comprises: a title adding unit configured to add a title to the feature video according to the content information.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; memory having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to perform the method for clipping video of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for clipping video of the first aspect described above.
According to the method and apparatus for editing video provided by embodiments of the present disclosure, video frames of a video to be processed are first identified to obtain scene segments of the video; content information of the video is then determined according to the scene segments, and a cover image is extracted according to the content information, which improves the accuracy with which users select videos; feature segments of a set duration are then extracted from the scene segments, which helps improve the watchability of the feature video; finally, the cover image and the feature segments are combined into a feature video. This improves the accuracy and effectiveness of obtaining the feature video.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic illustration according to a third embodiment of the present application;
FIG. 4 is a schematic illustration according to a fourth embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for editing video according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
Fig. 1 illustrates an exemplary system architecture 100 of a method for clipping video or an apparatus for clipping video to which embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various video client applications, such as a video browser application, a video playing plugin, a video search application, a video downloading tool, a video playing client, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102 and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module; no specific limitation is made here.
The server 105 may be a server that provides various services, such as a server that processes a video to be processed transmitted from the terminal apparatuses 101, 102, 103. The server can analyze and process the received data such as the video to be processed and the like to obtain the characteristic video.
It should be noted that the method for clipping video provided by the embodiment of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for clipping video is generally disposed in the server 105.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (for example, to provide distributed services), or may be implemented as a single software or software module, and is not limited specifically herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for clipping video in accordance with the present disclosure is shown. The method for clipping the video comprises the following steps:
Step 201, identifying video frames of a video to be processed, and acquiring scene segments of the video to be processed.
In the present embodiment, the execution subject of the method for editing video (e.g., the server 105 shown in FIG. 1) may receive the video to be processed from the terminal devices 101, 102, 103 through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connections now known or developed in the future.
Existing video classification and clipping is usually done manually and cannot keep up with large volumes of video. Video reviewers differ in review standards, editing ability and other respects, so video classification and editing are not standardized. Manual review and clipping usually requires watching all of a video's content, which takes a great deal of time, makes video processing inefficient, and makes the key content of the video hard to capture.
To this end, the execution subject may recognize the video frames of the video to be processed to obtain their content. The frame content may include person images, animal images, background images, and the like. The execution subject may then determine scene segments from the frame content. For example, if the frame content includes a basketball player and the background image is a basketball court, the execution subject may determine that the frame corresponds to a playing-basketball scene. The execution subject then divides the video to be processed according to the frame content to obtain the playing-basketball scene segment. In this way, the scene segments of the video to be processed are obtained, which helps the accuracy and effectiveness of obtaining the feature video.
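The disclosure does not prescribe a concrete recognition model for this step. As one illustration only, a minimal sketch that approximates scene changes by a large jump in the color histogram between consecutive frames (the threshold value is an assumption, not part of the disclosure):

```python
import cv2

def scene_boundaries(path: str, threshold: float = 0.5) -> list:
    """Return frame indices at which a new scene segment is assumed to start."""
    cap = cv2.VideoCapture(path)
    boundaries, prev_hist, idx = [0], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Correlation near 1 means similar frames; a sharp drop suggests a cut.
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```

In a real system, a learned scene classifier would replace the histogram heuristic; the sketch only shows where the segment boundaries come from.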
Step 202, determining content information of a video to be processed according to the scene segments, and extracting a cover image according to the content information.
After the scene segments are acquired, the execution subject may determine content information of the video to be processed according to the scene segments, and then, according to the content information, select from the video a frame that can represent the video's content as the cover image. For example, when the scene segment is playing basketball, the corresponding content information may be: basketball video. The execution subject may then select a video frame showing a basketball or a basketball player from the scene segment as the cover image. The cover image visually indicates the video content of the video to be processed, improving the accuracy with which users select videos.
Step 203, extracting a feature segment of a set duration from the scene segment.
To obtain the feature video, the execution subject may extract from each scene segment a feature segment of a set duration. A feature segment represents the representative content of its scene segment. For example, if the scene segment is playing basketball, the corresponding feature segment may be a famous player scoring. The specific content of the feature segments may differ for different scene segments. This helps improve the watchability of the feature video.
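The criterion for "representative" is left open by the disclosure. One simple stand-in, sketched below under that assumption, takes the window of the set duration with the most inter-frame motion:

```python
import numpy as np

def feature_window(gray_frames: np.ndarray, length: int) -> tuple:
    """gray_frames: (N, H, W) grayscale frames of one scene segment, N >= length.

    Returns (start, end) of the length-frame window with the most motion.
    """
    # Mean absolute difference between consecutive frames as a motion score.
    diffs = np.abs(np.diff(gray_frames.astype(np.int16), axis=0)).mean(axis=(1, 2))
    # Sum the scores over every window of (length - 1) transitions.
    motion = np.convolve(diffs, np.ones(length - 1), mode="valid")
    start = int(np.argmax(motion))
    return start, start + length
```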
Step 204, combining the cover image and the feature segments to form a feature video.
After obtaining the cover image and the feature segments, the execution subject may compose the feature video with the cover image as its cover. As described above, the feature video contains a feature segment from each scene segment and can highlight the content of the video to be processed. A feature video containing the main content of the video to be processed is thus obtained from the video itself, which highlights that content, improves the efficiency of obtaining the feature video, and improves its effectiveness.
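As a sketch of this composition step (the codec, frame rate, and one-second cover duration are assumptions, not part of the disclosure):

```python
import cv2

def compose_feature_video(cover, clips, out_path="feature.mp4", fps=25):
    """cover: a BGR image; clips: lists of BGR frames, all of the same size."""
    h, w = cover.shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (w, h))
    for _ in range(fps):              # show the cover for roughly one second
        writer.write(cover)
    for clip in clips:                # then play the feature segments in order
        for frame in clip:
            writer.write(frame)
    writer.release()
```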
With continued reference to FIG. 3, a flow 300 of one embodiment of a method for clipping video in accordance with the present disclosure is shown. The method for clipping the video comprises the following steps:
step 301, performing image recognition on a video frame of the video to be processed to obtain scene information corresponding to the video to be processed.
After obtaining the video to be processed, the execution subject may perform image recognition on its video frames to obtain the frame content, which may include person images, animal images, background images, and the like. The frame content is then analyzed to determine scene information, which can be used to characterize the scene in which the video content is located. For example, if the frame content includes a basketball player and the background image is a basketball court, the execution subject may determine that the frame corresponds to a playing-basketball scene, and the corresponding scene information may be: basketball.
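The mapping from recognized frame content to scene information is not fixed by the disclosure. The hand-written rules below are a hypothetical stand-in for whatever recognition model is used in practice:

```python
# Hypothetical rules: a scene label applies when all required objects appear.
SCENE_RULES = {
    frozenset({"basketball player", "basketball court"}): "basketball",
    frozenset({"football player", "football pitch"}): "football",
}

def scene_of(frame_objects: set) -> str:
    """Map the objects recognized in one frame to a scene label."""
    for required, label in SCENE_RULES.items():
        if required <= frame_objects:
            return label
    return "unknown"

print(scene_of({"basketball player", "basketball court", "audience"}))
# basketball
```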
Step 302, dividing the video to be processed into scene segments according to the scene information.
After the scene information is determined, the execution subject can divide the content of the video to be processed according to the scene information to obtain scene segments. In practice, the video to be processed may contain one scene segment or several. Dividing the video to be processed into scene segments in this way improves the accuracy of identifying its content.
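Once every frame carries a scene label, the division itself reduces to grouping consecutive frames with the same label, as in this sketch:

```python
from itertools import groupby

def split_into_scene_segments(frame_labels: list) -> list:
    """Return (label, start_frame, end_frame_exclusive) triples."""
    segments, start = [], 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments

# Example: two scene segments are recovered from five labelled frames.
print(split_into_scene_segments(["basketball"] * 3 + ["interview"] * 2))
# [('basketball', 0, 3), ('interview', 3, 5)]
```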
Step 303, determining content information of the video to be processed according to the scene segment, and extracting a cover image according to the content information.
In some optional implementations of this embodiment, the determining of content information of the video to be processed according to the scene segment may include the following steps:
In the first step, an initial object image in the scene segment is identified.
The execution subject may perform image recognition on the video frames contained in the scene segment to determine at least one initial object image contained in each frame. Taking the playing-basketball scene segment as an example again, the initial object images recognized in its video frames may include: basketball player images, basketball court images, basketball rim images, team identification images, audience images, and the like. The initial object images may differ for different scene segments.
In the second step, the number of occurrences and the appearance duration of the initial object image in the video to be processed are counted, and the target object information is determined according to them.
Generally, the main content of a video appears most often and for the longest time. The execution subject may therefore count the number of occurrences and the appearance duration of each initial object image in the video to be processed, and then determine the target object information from them. For example, when the scene segment is playing basketball and a certain player appears most often among the initial object images and also for the longest time, the execution subject may determine the target object information to be that player's name. In practice, the initial object image that appears most often may not be the one that appears for the longest time. In that case, the execution subject may still mark that initial object image as a target object, and the initial object image with the longest appearance duration is also marked as a target object. That is, there may be a plurality of target objects, with a corresponding plurality of target object information; a sketch covering this step and the next is given after the third step.
In the third step, attribute information of the target object information is queried, and the content information of the video to be processed is determined according to the attribute information.
The target object information is usually a direct description of the target object itself, while in many cases the video explains attribute information of the target object. The attribute information may include at least one of: a name of the target object, a purpose of the target object, an occupation of the target object. The attribute information may differ for different target objects. For example, when the target object is a person, the attribute information may include the target object's name and occupation but not a purpose; when the target object is a device (for example, a mobile phone), it may include the name and purpose but not an occupation. Accordingly, the execution subject can determine the content information of the video to be processed from the attribute information. For example, if the target object information is the name of a basketball player, the attribute information found for that name may include the target object's occupation: basketball player. The execution subject may then set the content information of the video to be processed to: basketball video.
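The disclosure does not fix how the object recognition or the attribute query is implemented. A minimal sketch of the second and third steps, in which the per-frame detection lists and the attribute table are assumed inputs for illustration:

```python
from collections import Counter

def target_objects(detections_per_frame: list) -> set:
    """Second step: mark as targets both the object with the most appearance
    episodes and the object with the longest total on-screen time."""
    episodes, frames = Counter(), Counter()
    prev = set()
    for objs in detections_per_frame:
        cur = set(objs)
        frames.update(cur)            # frames on screen ~ appearance duration
        episodes.update(cur - prev)   # a new episode starts on reappearance
        prev = cur
    return {max(episodes, key=episodes.get), max(frames, key=frames.get)}

# Third step: a hypothetical attribute table keyed by target object information.
ATTRIBUTES = {"player A": {"name": "player A", "occupation": "basketball player"}}

def content_info(targets: set) -> str:
    for t in targets:
        if ATTRIBUTES.get(t, {}).get("occupation") == "basketball player":
            return "basketball video"
    return "general video"

print(target_objects([["player A", "basketball"], ["player A"], ["basketball"]]))
# e.g. {'player A', 'basketball'} -- the two criteria may name different objects
```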
In some optional implementations of this embodiment, the extracting of the cover image according to the content information may include the following:
In the first step, a feature image in the scene segment is acquired.
For example, when the scene segment is playing basketball, the feature images the execution subject can acquire may include an image of a player shooting, an image of the basketball court, an image of a player, and the like. When the scene segment is football, the feature images may include an image of a player shooting, an image of a player dribbling, an image of the football pitch, and the like. That is, the feature image can be used to characterize the features of the scene segment.
In the second step, a cover image is determined by matching the feature image with the content information.
After obtaining the feature images, the execution subject may match them against the content information to determine the cover image. For example, suppose the content information is basketball video and the feature images include an image of a player shooting, an image of the basketball court, and an image of a player. Since the image of the player shooting contains both a basketball player and a basketball, its correlation with the basketball video is considered highest, and it can be determined as the cover image.
Step 304, extracting a feature segment of a set duration from the scene segment.
The content of step 304 is the same as that of step 203, and is not described in detail here.
Step 305, in response to there being a plurality of feature segments, deleting duplicate feature segments.
The video frames of the video to be processed may contain several different scene segments, and a scene segment may appear repeatedly in the video. The same scene segment can be represented by one feature segment. Therefore, to avoid repeated feature segments and improve the effectiveness of the feature video, the execution subject may delete duplicate feature segments, keeping only one instance of each. Meanwhile, to prevent the file from becoming too large and consuming too many resources, the execution subject may also adjust the duration of each feature segment so that the playing time of the resulting feature video falls within a set range and the video file stays within a set size.
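A minimal de-duplication sketch, under the assumption that two feature segments are "repeated" when a coarse signature of their first frames matches (a real system would use a more robust fingerprint, which the disclosure leaves open):

```python
import cv2

def dedupe_segments(clips: list) -> list:
    """clips: lists of BGR frames; keeps the first instance of each duplicate."""
    seen, unique = set(), []
    for clip in clips:
        # An 8x8 thumbnail of the first frame serves as a cheap signature.
        signature = cv2.resize(clip[0], (8, 8),
                               interpolation=cv2.INTER_AREA).tobytes()
        if signature not in seen:
            seen.add(signature)
            unique.append(clip)
    return unique
```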
Step 306, performing transition processing on adjacent feature segments in the feature video.
Different feature segments correspond to different scenes and accordingly differ in color, brightness and so on. To improve the playback effect of the video, the execution subject may perform transition processing on adjacent feature segments so that they have the same or similar visual characteristics during playback. The transition processing may include at least one of the following: color transition, scene transition, light transition.
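The disclosure names the transition types but not the blending method. A color/light crossfade between the tail of one segment and the head of the next might look like this (assuming both clips have at least n same-size frames):

```python
import cv2

def crossfade(prev_clip: list, next_clip: list, n: int = 12) -> list:
    """Blend the last n frames of prev_clip into the first n of next_clip."""
    blended = []
    for i in range(n):
        alpha = (i + 1) / (n + 1)                 # ramps from ~0 to ~1
        blended.append(cv2.addWeighted(prev_clip[-n + i], 1.0 - alpha,
                                       next_clip[i], alpha, 0.0))
    return prev_clip[:-n] + blended + next_clip[n:]
```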
With continued reference to FIG. 4, a flow 400 of one embodiment of a method for clipping video in accordance with the present disclosure is shown. The method for clipping the video comprises the following steps:
step 401, identifying a video frame of a video to be processed, and acquiring a scene segment of the video to be processed.
The content of step 401 is the same as that of step 201, and is not described in detail here.
Step 402, determining content information of a video to be processed according to the scene segment, and extracting a cover image according to the content information.
The content of step 402 is the same as that of step 202, and is not described in detail here.
Step 403, extracting a feature segment of a set duration from the scene segment.
The content of step 403 is the same as that of step 203, and is not described in detail here.
Step 404, combining the cover image and the feature segments to form a feature video.
The content of step 404 is the same as that of step 204, and is not described in detail here.
Step 405, adding a title to the feature video according to the content information.
After the feature video is obtained, to further improve its readability, the execution subject may also add a title to the feature video according to the content information, so that users can grasp its content from the text. For example, if the content information is basketball video and the corresponding target object information is player A, the title may be: "Basketball highlights of player A", or the like.
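As a sketch, with the template wording being an assumption rather than part of the disclosure:

```python
def make_title(content_info: str, target_object: str) -> str:
    """Template a title from the content information and target object info."""
    if content_info == "basketball video":
        return f"Basketball highlights of {target_object}"
    return f"{content_info}: {target_object}"

print(make_title("basketball video", "player A"))
# Basketball highlights of player A
```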
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for clipping a video, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable in various electronic devices.
As shown in FIG. 5, the apparatus 500 for clipping video of the present embodiment may include: a scene segment acquisition unit 501, a cover image acquisition unit 502, a feature segment extraction unit 503, and a feature video acquisition unit 504. The scene segment acquisition unit 501 is configured to identify video frames of a video to be processed and acquire scene segments of the video to be processed; the cover image acquisition unit 502 is configured to determine content information of the video to be processed according to the scene segments and extract a cover image according to the content information; the feature segment extraction unit 503 is configured to extract a feature segment of a set duration from the scene segment; and the feature video acquisition unit 504 is configured to combine the cover image and the feature segment into a feature video.
In some optional implementations of this embodiment, the scene segment acquisition unit 501 may include: a scene information acquisition subunit (not shown in the figure) and a scene segment dividing subunit (not shown in the figure). The scene information acquisition subunit is configured to perform image recognition on video frames of the video to be processed to obtain scene information corresponding to the video to be processed, wherein the scene information is used to characterize the scene in which the video content is located; the scene segment dividing subunit is configured to divide the video to be processed into scene segments according to the scene information.
In some optional implementations of this embodiment, the cover image acquisition unit 502 may include: an initial object image identification subunit (not shown in the figure), a target object information determination subunit (not shown in the figure), and a content information determination subunit (not shown in the figure). The initial object image identification subunit is configured to identify an initial object image in the scene segment; the target object information determination subunit is configured to count the number of occurrences and the appearance duration of the initial object image in the video to be processed and determine target object information according to the number of occurrences and the appearance duration; the content information determination subunit is configured to query attribute information of the target object information and determine content information of the video to be processed according to the attribute information, wherein the attribute information includes at least one of: a name of the target object, a purpose of the target object.
In some optional implementations of this embodiment, the cover image acquisition unit 502 may include: a feature image acquisition subunit (not shown in the figure) and a cover image determination subunit (not shown in the figure). The feature image acquisition subunit is configured to acquire a feature image in the scene segment, wherein the feature image is used to characterize the features of the scene segment; the cover image determination subunit is configured to determine a cover image by matching the feature image with the content information.
In some optional implementations of this embodiment, the feature video acquisition unit 504 may include: a video deletion subunit (not shown in the figure) configured to, in response to there being a plurality of feature segments, delete duplicate feature segments.
In some optional implementations of this embodiment, the feature video obtaining unit 504 may include: a transition processing subunit (not shown in the figure) configured to perform transition processing on adjacent feature segments in the feature video, the transition processing including at least one of: color transition, scene transition, light transition.
In some optional implementations of this embodiment, the apparatus 500 for clipping a video may further include: a title adding unit (not shown in the figure) configured to add a title to the feature video according to the content information.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
FIG. 6 is a block diagram of an electronic device for the method of clipping video according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The components shown here, their connections and relationships, and their functions are meant as examples only and are not meant to limit the implementations of the present application described and/or claimed herein.
As shown in FIG. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). FIG. 6 takes one processor 601 as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method for clipping video provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the method for clipping video provided by the present application.
The memory 602, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for clipping video in the embodiments of the present application (for example, the scene segment acquisition unit 501, the cover image acquisition unit 502, the feature segment extraction unit 503, and the feature video acquisition unit 504 shown in FIG. 5). The processor 601 executes various functional applications of the server and performs data processing by running the non-transitory software programs, instructions, and modules stored in the memory 602, thereby implementing the method for clipping video in the above method embodiments.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device for clipping a video, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 may optionally include memory located remotely from the processor 601, which may be connected to an electronic device for clipping video over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the method of clipping a video may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for clipping video; examples include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, video frames of a video to be processed are first identified to obtain scene segments of the video; content information of the video is then determined according to the scene segments, and a cover image is extracted according to the content information, which improves the accuracy with which users select videos; feature segments of a set duration are then extracted from the scene segments, which helps improve the watchability of the feature video; finally, the cover image and the feature segments are combined into a feature video. This improves the accuracy and effectiveness of obtaining the feature video.
It should be understood that the flows shown above may be used in various forms, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, which is not limited herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (16)

1. A method for clipping a video, comprising:
identifying a video frame of a video to be processed to obtain a scene segment of the video to be processed;
determining content information of the video to be processed according to the scene segment, and extracting a cover image according to the content information;
extracting a feature segment of a set duration from the scene segment;
and combining the cover image and the feature segment to form a feature video.
2. The method of claim 1, wherein the identifying the video frame of the video to be processed and obtaining the scene segment of the video to be processed comprises:
performing image recognition on a video frame of the video to be processed to obtain scene information corresponding to the video to be processed, wherein the scene information is used for representing a scene where video content is located;
and dividing the video to be processed into scene segments according to the scene information.
3. The method of claim 2, wherein the determining content information of the video to be processed according to the scene segment comprises:
identifying an initial object image in the scene segment;
counting the number of occurrences and the appearance duration of the initial object image in the video to be processed, and determining target object information according to the number of occurrences and the appearance duration;
querying attribute information of the target object information, and determining content information of the video to be processed according to the attribute information, wherein the attribute information comprises at least one of the following: a name of the target object, a purpose of the target object.
4. The method of claim 1, wherein said extracting a cover image from the content information comprises:
acquiring a feature image in the scene segment, wherein the feature image is used for characterizing the features of the scene segment;
determining a cover image by matching the feature image with the content information.
5. The method of claim 1, wherein the combining the cover image and the feature segment into a feature video comprises:
in response to there being a plurality of feature segments, deleting duplicate feature segments.
6. The method of claim 1, wherein the combining the cover image and the feature segment into a feature video comprises:
performing transition processing on adjacent feature segments in the feature video, wherein the transition processing comprises at least one of the following steps: color transition, scene transition, light transition.
7. The method of any of claims 1 to 6, wherein the method further comprises:
and adding a title to the characteristic video according to the content information.
8. An apparatus for clipping video, comprising:
the scene segment acquiring unit is configured to identify a video frame of a video to be processed and acquire a scene segment of the video to be processed;
a cover image acquisition unit configured to determine content information of a video to be processed according to the scene segment, and extract a cover image according to the content information;
a feature segment extraction unit configured to extract a feature segment of a set duration from the scene segment;
a feature video acquisition unit configured to combine the cover image and the feature segment into a feature video.
9. The apparatus of claim 8, wherein the scene-slice acquiring unit comprises:
the scene information acquisition subunit is configured to perform image recognition on a video frame of the video to be processed to obtain scene information corresponding to the video to be processed, wherein the scene information is used for representing a scene where video content is located;
and a scene segment dividing subunit configured to divide the video to be processed into scene segments according to the scene information.
10. The apparatus of claim 9, wherein the cover image acquisition unit comprises:
an initial object image identification subunit configured to identify an initial object image in the scene segment;
a target object information determining subunit configured to count the number of occurrences and the appearance duration of the initial object image in the video to be processed, and determine target object information according to the number of occurrences and the appearance duration;
a content information determining subunit configured to query attribute information of the target object information, and determine content information of the video to be processed according to the attribute information, wherein the attribute information includes at least one of: name of the target object, purpose of the target object.
11. The apparatus of claim 8, wherein the cover image acquisition unit comprises:
a feature image acquisition subunit configured to acquire a feature image in the scene segment, wherein the feature image is used to characterize the features of the scene segment;
a cover image determination subunit configured to determine a cover image by matching the feature image with the content information.
12. The apparatus of claim 8, wherein the feature video acquisition unit comprises:
a video deletion subunit configured to, in response to there being a plurality of feature segments, delete duplicate feature segments.
13. The apparatus of claim 8, wherein the feature video acquisition unit comprises:
a transition processing subunit configured to perform transition processing on adjacent feature segments in the feature video, the transition processing including at least one of: color transition, scene transition, light transition.
14. The apparatus of any one of claims 8 to 13, wherein the apparatus further comprises:
a title adding unit configured to add a title to the feature video according to the content information.
15. An electronic device, comprising:
one or more processors;
a memory having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.
16. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202010476849.1A 2020-05-29 2020-05-29 Method and device for editing video Pending CN111601160A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010476849.1A CN111601160A (en) 2020-05-29 2020-05-29 Method and device for editing video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010476849.1A CN111601160A (en) 2020-05-29 2020-05-29 Method and device for editing video

Publications (1)

Publication Number Publication Date
CN111601160A true CN111601160A (en) 2020-08-28

Family

ID=72191453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010476849.1A Pending CN111601160A (en) 2020-05-29 2020-05-29 Method and device for editing video

Country Status (1)

Country Link
CN (1) CN111601160A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090153648A1 (en) * 2007-12-13 2009-06-18 Apple Inc. Three-dimensional movie browser or editor
CN104284241A (en) * 2014-09-22 2015-01-14 北京奇艺世纪科技有限公司 Video editing method and device
CN106803987A (en) * 2015-11-26 2017-06-06 腾讯科技(深圳)有限公司 The acquisition methods of video data, device and system
CN108259990A (en) * 2018-01-26 2018-07-06 腾讯科技(深圳)有限公司 A kind of method and device of video clipping
CN108419145A (en) * 2018-05-04 2018-08-17 腾讯科技(深圳)有限公司 The generation method and device and computer readable storage medium of a kind of video frequency abstract
CN110263213A (en) * 2019-05-22 2019-09-20 腾讯科技(深圳)有限公司 Video pushing method, device, computer equipment and storage medium
CN110399848A (en) * 2019-07-30 2019-11-01 北京字节跳动网络技术有限公司 Video cover generation method, device and electronic equipment
CN110602554A (en) * 2019-08-16 2019-12-20 华为技术有限公司 Cover image determining method, device and equipment
CN110798735A (en) * 2019-08-28 2020-02-14 腾讯科技(深圳)有限公司 Video processing method and device and electronic equipment
CN111143613A (en) * 2019-12-30 2020-05-12 携程计算机技术(上海)有限公司 Method, system, electronic device and storage medium for selecting video cover
CN111107392A (en) * 2019-12-31 2020-05-05 北京百度网讯科技有限公司 Video processing method and device and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113301430A (en) * 2021-07-27 2021-08-24 腾讯科技(深圳)有限公司 Video clipping method, video clipping device, electronic equipment and storage medium
CN113301430B (en) * 2021-07-27 2021-12-07 腾讯科技(深圳)有限公司 Video clipping method, video clipping device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11729252B2 (en) Content collection navigation and autoforwarding
CN107633066B (en) Information display method and device, electronic equipment and storage medium
CN111460285B (en) Information processing method, apparatus, electronic device and storage medium
CN108228792B (en) Picture retrieval method, electronic device and storage medium
US11194863B2 (en) Searching method and apparatus, device and non-volatile computer storage medium
CN111225236B (en) Method and device for generating video cover, electronic equipment and computer-readable storage medium
CN111935503A (en) Short video generation method and device, electronic equipment and storage medium
CN111158924B (en) Content sharing method and device, electronic equipment and readable storage medium
CN113779381B (en) Resource recommendation method, device, electronic equipment and storage medium
CN113190695B (en) Multimedia data searching method and device, computer equipment and medium
CN111984825A (en) Method and apparatus for searching video
CN111444819B (en) Cut frame determining method, network training method, device, equipment and storage medium
CN110532404B (en) Source multimedia determining method, device, equipment and storage medium
WO2019007017A1 (en) Method for displaying and providing video result item, client, and server
CN111601160A (en) Method and device for editing video
CN111797801B (en) Method and apparatus for video scene analysis
CN111949820B (en) Video associated interest point processing method and device and electronic equipment
CN113111216B (en) Advertisement recommendation method, device, equipment and storage medium
US20230180917A1 (en) Method for processing makeup and electronic device
CN112182301A (en) Method and device for extracting video clip
CN111683280A (en) Video processing method and device and electronic equipment
CN111611415B (en) Picture display method, server, electronic device and storage medium
CN110795178B (en) Application sign-in method and device and electronic equipment
CN113139093A (en) Video search method and apparatus, computer device, and medium
CN112446237A (en) Video classification method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200828)