CN113286173B - Video editing method and device - Google Patents


Info

Publication number
CN113286173B
Authority
CN
China
Prior art keywords
video
processed
target
text
determining
Prior art date
Legal status
Active
Application number
CN202110547114.8A
Other languages
Chinese (zh)
Other versions
CN113286173A (en)
Inventor
陈祥雨
刘伟科
李同猛
韩卫召
沈俊杰
Current Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date
Application filed by Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202110547114.8A
Publication of CN113286173A
Application granted
Publication of CN113286173B
Status: Active


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N21/2343 Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234336 Reformatting by media transcoding, e.g. video is transformed into a slideshow of still pictures or audio is converted into text
    • H04N21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/47815 Electronic shopping

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a video editing method and device, relating to the field of computer technology. An implementation of the method comprises the following steps: acquiring a video to be processed; determining a target video segment from the video to be processed according to one or more preset object identifiers, where the target video segment contains demonstration content for the object corresponding to each identifier; and clipping the video to be processed according to the position of the target video segment within it, so as to obtain the target video segment. This implementation can improve video-processing efficiency and greatly reduce time and labor costs.

Description

Video editing method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a video editing method and apparatus.
Background
Video processing refers to operations such as playing, recording, editing, converting, compressing, and detecting video by means of hardware or software, so as to serve various purposes.
Existing video editing is generally performed manually: the start time of a required video segment is determined by watching the video, and the clipping is then done with editing software.
This manual approach has low processing efficiency; when facing large volumes of video data in particular, the labor cost becomes extremely high, so that even basic video-processing requirements cannot be met.
Disclosure of Invention
In view of this, embodiments of the present invention provide a video editing method and apparatus that can improve video-processing efficiency; even massive amounts of video data can be processed rapidly, greatly reducing time and labor costs.
To achieve the above object, according to one aspect of an embodiment of the present invention, there is provided a video editing method including:
acquiring a video to be processed;
determining a target video segment from the video to be processed according to one or more preset object identifiers, wherein the target video segment comprises demonstration contents of objects corresponding to the object identifiers;
and editing the video to be processed according to the position of the target video segment in the video to be processed so as to obtain the target video segment.
Optionally, determining the target video segment from the video to be processed according to one or more preset object identifiers includes:
extracting audio to be processed from the video to be processed;
Converting the audio to be processed into a text to be processed;
determining a target vocabulary corresponding to the object identifier in the text to be processed;
and determining the target video segment according to the position of the target vocabulary in the text to be processed and the time corresponding relation between the text to be processed and the video to be processed.
Optionally, when there are a plurality of target vocabularies,
the time interval between every two target vocabularies is determined according to the correspondence, and when the time interval is smaller than a preset time-interval threshold, the target vocabularies are determined to correspond to the same target video segment.
Optionally, when the same target video segment corresponds to a plurality of target vocabularies, then for that same target video segment:
according to the arrangement sequence of the target words in the text to be processed, determining a first target word appearing first and a second target word appearing last from the target words;
and determining the target video segment according to the positions of the first target vocabulary and the second target vocabulary in the text to be processed and the time corresponding relation between the text to be processed and the video to be processed.
Optionally, determining a first moment when the first target vocabulary appears in the to-be-processed video and a second moment when the second target vocabulary appears in the to-be-processed video according to the positions of the first target vocabulary and the second target vocabulary in the to-be-processed text and the time corresponding relation between the to-be-processed text and the to-be-processed video;
and taking the first moment as the starting moment of the target video segment and the second moment as the ending moment of the target video segment.
Optionally, determining a first moment when the first target vocabulary appears in the to-be-processed video and a second moment when the second target vocabulary appears in the to-be-processed video according to the positions of the first target vocabulary and the second target vocabulary in the to-be-processed text and the time corresponding relation between the to-be-processed text and the to-be-processed video;
determining the starting time of the target video clip according to the first time and a preset first time interval;
and determining the ending time of the target video segment according to the second time and a preset second time interval.
Optionally, the editing the video to be processed according to the position of the target video segment in the video to be processed includes:
and editing the video to be processed according to the starting time and the ending time of the target video segment.
Optionally, the editing the video to be processed according to the position of the target video segment in the video to be processed includes: determining a clipping starting point of the video to be processed according to the starting time of the target video segment and a preset third time interval;
determining a clipping ending point of the video to be processed according to the ending time of the target video segment and a preset fourth time interval;
and editing the video to be processed according to the editing starting point and the editing ending point of the video to be processed.
Optionally, the video to be processed is a live video, and the method further includes:
and generating a detail page of the object according to the target video segment.
Optionally, when the same item identifier corresponds to a plurality of the target video clips,
the plurality of target video clips corresponding to that item identifier are ranked according to the feedback data corresponding to each clip;
and the target video clip with the strongest positive feedback is selected from the plurality of target video clips according to the ranking result, and the detail page is generated from the selected clip.
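The selection step above can be sketched as follows. This is a minimal illustration in Python; the patent does not specify which feedback signals are used, so the `likes`/`dislikes` fields and the score formula are assumptions for illustration only.

```python
# Hypothetical sketch: pick the clip with the strongest positive feedback
# for one item identifier. The feedback metric (likes minus dislikes) is
# illustrative, not taken from the patent.

def best_clip(clips):
    """Return the clip whose feedback score (likes - dislikes) is highest."""
    return max(clips, key=lambda c: c["likes"] - c["dislikes"])

clips = [
    {"id": "clip-1", "likes": 120, "dislikes": 30},
    {"id": "clip-2", "likes": 200, "dislikes": 10},
    {"id": "clip-3", "likes": 80, "dislikes": 5},
]
chosen = best_clip(clips)  # the clip used to generate the detail page
```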
According to still another aspect of an embodiment of the present invention, there is provided a video editing apparatus including:
the acquisition module is used for acquiring the video to be processed;
the video processing module is used for determining a target video segment from the video to be processed according to one or more preset article identifiers, wherein the target video segment comprises demonstration contents of articles corresponding to the article identifiers;
and the video clipping module is used for clipping the video to be processed according to the position of the target video fragment in the video to be processed so as to obtain the target video fragment.
According to another aspect of an embodiment of the present invention, there is provided a video editing electronic device including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the video editing method provided by the present invention.
According to still another aspect of an embodiment of the present invention, there is provided a computer readable medium having stored thereon a computer program which when executed by a processor implements the video editing method provided by the present invention.
One embodiment of the above invention has the following advantages or benefits: by extracting the audio from the video, recognizing the audio to obtain text, searching the text for keywords to determine the start time of the target video segment, and then clipping the video to obtain that segment, it solves the technical problems that the existing video editing approach has low processing efficiency and extremely high labor cost and cannot meet basic video-processing requirements, achieving the technical effect of improving video-processing efficiency while reducing time and labor costs.
Further effects of the above optional implementations are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 illustrates an exemplary system architecture diagram suitable for application to a video editing method or video editing apparatus of an embodiment of the present invention;
FIG. 2 is a schematic diagram of the main flow of a video editing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a method of determining a target video clip according to one embodiment of the invention;
FIG. 4 is a schematic diagram of a video segmentation method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a method of determining a target video clip according to another embodiment of the present invention;
FIG. 6 is a schematic diagram of the main modules of a video editing apparatus according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FFmpeg: an open-source program that can record, convert, and stream digital audio and video. FFmpeg is licensed under the LGPL or GPL and includes the audio/video codec library libavcodec.
Speech-to-Text: google's AI technology provides a supporting API (Application Programming Interface, application program interface) that can convert speech into text.
FIG. 1 illustrates an exemplary system architecture to which the video editing method or the video editing apparatus of an embodiment of the present invention may be applied.
as shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a video-type application, a shopping-type application, a web browser application, a search-type application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server providing support for video-type websites browsed by users using the terminal devices 101, 102, 103. The background management server may perform analysis and other processing on the received data such as the video clip request, and feed back the processing result (e.g., a plurality of video clips) to the terminal devices 101, 102, 103.
It should be noted that, the video editing method provided in the embodiment of the present invention is generally executed by the server 105, and accordingly, the video editing apparatus is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 is a schematic diagram of main flow of a video editing method according to an embodiment of the present invention, and as shown in fig. 2, the video editing method of the present invention includes:
step S201, a video to be processed is acquired.
Livestream selling, as an emerging form of shopping, is highly popular with consumers and has greatly promoted the growth and development of the e-commerce industry. A single livestream often covers more than ten or even dozens of products, and although the sales, review, and attention data of some products are excellent, these data are not put to reasonable, deeper use beyond the single broadcast, and the corresponding live video becomes redundant data.
To improve the utilization of live videos, high-quality short videos of individual skus can be recommended to merchants, so that merchants can place them on product detail pages to improve the detail-page conversion rate.
In the embodiment of the invention, when video clipping is performed, the video to be processed is loaded from a designated video storage location on the server.
In the embodiment of the invention, FFmpeg is invoked through its API to load the video. When Java is used, the video file can be referenced with the File class, for example: File f = new File(path); where path is the storage path of the video to be processed.
In the embodiment of the invention, the storage space is allocated for the video in advance, the video is stored in the designated video storage position, and different storage spaces can be allocated according to different categories of the video, so that the video can be selected from the storage spaces according to the requirement in the later period. The video to be processed can be a live video, and a storage space is allocated for the live video in advance, so that after live video is completed each time, the system can store the live video in a designated live video storage area, and the live video can be generated by means of point burying or recording and the like.
In the embodiment of the invention, the audio to be processed can be extracted from the video to be processed through the audio capturing technology of the multimedia video processing tool FFmpeg, and the FFmpeg has the functions of video acquisition, video format conversion, video screenshot, video watermarking, video audio processing (audio extraction, video cutting, image audio synthesis) and the like, and can be used for extracting the audio in the video, extracting the image in the video, editing the video and the like.
Step S202, determining a target video segment from the video to be processed according to one or more preset object identifiers, wherein the target video segment comprises the demonstration content of the object corresponding to the object identifier.
In the embodiment of the invention, the acquired video to be processed can comprise demonstration contents of a plurality of articles, and in order to acquire the demonstration contents of the target articles, the target video segments are determined from the video to be processed according to one or more preset article identifiers, and the target video segments comprise the demonstration contents of the articles corresponding to the article identifiers.
In the embodiment of the invention, one target object may preset one or more object identifiers, and the object identifiers may be objects sku, or related information of the objects, etc.
As shown in fig. 3, a method for determining a video clip according to an embodiment of the present invention includes:
step S301, extracting audio to be processed from the video to be processed.
In the embodiment of the invention, the storage space is allocated for the audio in advance, the audio is stored in the designated audio storage position, and different audio storage spaces can be allocated according to different categories of video, so that the audio can be selected from the audio storage spaces according to the requirement in the later period. The audio to be processed can be live audio extracted from live video, a storage space is allocated for the live audio in advance, and the extracted live audio is stored in a designated live audio storage area.
In the embodiment of the invention, the audio to be processed is extracted from the video to be processed by calling FFmpeg; for example, the extraction command is: ffmpeg -i xxx.mp4 -vn -y -acodec copy xxx.m4a, wherein,
-i: specifies the input file name;
-vn: removes the video stream;
-y: overwrites an existing output file of the same name without asking for confirmation;
-acodec: sets the audio codec; with the value copy, the audio stream is copied without re-encoding (if not set, the same codec as the input stream is used).
Step S302, the audio to be processed is converted into text to be processed.
In the embodiment of the invention, the audio to be processed is acquired from the appointed live audio storage position, and the audio to be processed is converted into the text to be processed.
In the embodiment of the present invention, as shown in fig. 4, the acquired audio to be processed may be cut into segments, and the segments are then converted into texts respectively. For example, the acquired audio to be processed is divided into segments of 5 s to 10 s each.
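The segmentation step can be sketched as below: a minimal Python illustration (function and parameter names are ours) that records each segment's start and end time, so recognized text can later be mapped back to video time.

```python
# Sketch: split an audio track of `duration` seconds into fixed-length
# segments (5-10 s per the text), keeping (start, end) for each segment.

def segment_bounds(duration, seg_len=5.0):
    bounds = []
    t = 0.0
    while t < duration:
        bounds.append((t, min(t + seg_len, duration)))
        t += seg_len
    return bounds
```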
In the embodiment of the invention, the Speech-to-Text technology is invoked through its API to recognize the audio to be processed, converting it into the text to be processed; the text and audio are stored in correspondence with each other. Further, when the audio to be processed has been cut into segments, each segment is recognized by calling the Speech-to-Text API, yielding a recognized text for each segment.
In the embodiment of the invention, speech recognition can be performed with Speech-to-Text in synchronous or asynchronous mode. In synchronous recognition, the audio data is sent to the Speech-to-Text API and a recognition result is returned once all of the audio has been recognized; this mode is applicable to audio with a duration of no more than 1 minute. In asynchronous recognition, the audio data is sent to the Speech-to-Text API and the recognition result is obtained by periodic polling; this mode is applicable to audio with a duration of no more than 480 minutes.
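The choice between the two modes can be expressed as a simple duration check. This is a sketch only: the duration limits follow the text above, while the function and constant names are assumptions of ours, not part of the Speech-to-Text API.

```python
# Sketch: pick a recognition mode from the audio duration, following the
# limits stated in the text (1 minute synchronous, 480 minutes asynchronous).

SYNC_LIMIT_S = 60           # synchronous: up to 1 minute
ASYNC_LIMIT_S = 480 * 60    # asynchronous: up to 480 minutes

def recognition_mode(duration_s):
    if duration_s <= SYNC_LIMIT_S:
        return "synchronous"
    if duration_s <= ASYNC_LIMIT_S:
        return "asynchronous"
    raise ValueError("audio too long for one request; segment it first")
```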
Step S303, determining a target vocabulary corresponding to the object identification in the text to be processed.
In the embodiment of the invention, after the text to be processed is acquired, searching is carried out in the text to be processed according to a preset target vocabulary, the target vocabulary corresponding to the object identifier in the text to be processed is determined, and the position of the target vocabulary in the text to be processed is acquired; the target vocabulary can be search keywords and close meaning words, synonyms and the like determined according to the object identification.
In the embodiment of the invention, an accurate target vocabulary library can be constructed according to the target vocabulary corresponding to the object identifier. For example, when constructing a vocabulary library, for similar items skus, it is necessary to analyze differences between similar skus, distinguish different target vocabularies of the similar skus, and further construct an accurate target vocabulary library to match with the text to be processed.
Further, in order to match the item sku accurately, word segmentation of the text to be processed is performed with a bidirectional maximum matching algorithm when searching for a target vocabulary in the text. Bidirectional maximum matching segments the text with both the forward maximum matching method (FMM) and the backward maximum matching method (BMM), and outputs the segmentation chosen on the principle that larger-granularity words are better and that fewer out-of-dictionary words and single-character words are better.
The forward maximum matching method (FMM) takes words from left to right: the candidate starts at the length of the longest word in the dictionary, and one character is trimmed from the right each time until the candidate is in the dictionary or only a single character remains. The backward maximum matching method (BMM) takes words from right to left, with the rest of the logic the same as the forward method.
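The FMM/BMM procedure above can be sketched as follows. This is an illustrative Python toy, not the patent's implementation: Latin letters stand in for Chinese characters, and the tie-break rule (fewer words, then fewer out-of-dictionary single characters) is our reading of the stated principle.

```python
# Toy bidirectional maximum matching over a small dictionary.

def fmm(text, dictionary, max_len):
    """Forward maximum matching: take the longest dictionary word from the left."""
    out, i = [], 0
    while i < len(text):
        for l in range(min(max_len, len(text) - i), 0, -1):
            if l == 1 or text[i:i + l] in dictionary:
                out.append(text[i:i + l])
                i += l
                break
    return out

def bmm(text, dictionary, max_len):
    """Backward maximum matching: take the longest dictionary word from the right."""
    out, j = [], len(text)
    while j > 0:
        for l in range(min(max_len, j), 0, -1):
            if l == 1 or text[j - l:j] in dictionary:
                out.insert(0, text[j - l:j])
                j -= l
                break
    return out

def bidirectional_mm(text, dictionary):
    """Prefer the segmentation with fewer words, then fewer OOV single characters."""
    max_len = max(map(len, dictionary))
    f, b = fmm(text, dictionary, max_len), bmm(text, dictionary, max_len)
    def score(seg):
        oov_singles = sum(1 for w in seg if len(w) == 1 and w not in dictionary)
        return (len(seg), oov_singles)
    return min((f, b), key=score)
```

For the text "abcd" with dictionary {"ab", "abc", "bcd", "d"}, FMM yields ["abc", "d"] and BMM yields ["a", "bcd"]; both have two words, but FMM has no out-of-dictionary single character, so its result is chosen.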
Step S304, determining a target video segment according to the position of the target vocabulary in the text to be processed and the time corresponding relation between the text to be processed and the video to be processed.
In the embodiment of the invention, after the position of the target vocabulary in the text to be processed is determined, the time when the target vocabulary appears in the video to be processed can be determined according to the time corresponding relation between the text to be processed and the video to be processed. And determining the starting time and the ending time of the article demonstration corresponding to the article identification according to the starting time and the ending time of the target vocabulary corresponding to the article identification in the video to be processed, and further determining the target video segment corresponding to the demonstration content of the article.
In the embodiment of the invention, when the acquired audio to be processed is segmented and sheared, the starting time and the ending time of the segment can be marked for each segment, the segments are respectively converted into the texts, the target vocabulary corresponding to the object identifier in each text is determined, and the position of the target vocabulary in each text is acquired. After the position of the target vocabulary in each text is determined, the occurrence time of the target vocabulary in the video to be processed can be determined according to the time corresponding relation between each text and the corresponding segment.
In the embodiment of the invention, demonstration content for several similar items may appear in one video to be processed. To obtain the target video segment for exactly the target item, and to avoid interference from demonstrations of similar items appearing in other time periods of the video, when there are a plurality of target vocabularies the time interval between every two target vocabularies is determined according to the time correspondence between the text to be processed and the video to be processed. When the interval is smaller than a preset time-interval threshold, the target vocabularies are determined to correspond to the same target video segment; when it is not smaller than the threshold, they are determined to correspond to different target video segments. The preset threshold can be set as needed, for example as the average duration of a single item demonstration in the live video, e.g. 5 minutes.
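The grouping rule above can be sketched in a few lines. This is a minimal Python illustration under our own naming; the occurrence times and the 300 s (5-minute) threshold are example values.

```python
# Sketch: group occurrence times (seconds) of the target vocabulary into
# clips; consecutive hits closer together than the threshold belong to
# the same target video segment.

def group_occurrences(times, gap_threshold=300.0):
    groups = []
    for t in sorted(times):
        if groups and t - groups[-1][-1] < gap_threshold:
            groups[-1].append(t)   # same item demonstration
        else:
            groups.append([t])     # a new demonstration starts
    return groups
```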
In the embodiment of the invention, when the same target video segment corresponds to a plurality of target words, a first target word appearing first and a second target word appearing last are determined from the plurality of target words according to the arrangement sequence of the plurality of target words in the text to be processed aiming at the same target video segment. And determining a target video segment according to the positions of the first target vocabulary and the second target vocabulary in the text to be processed and the time corresponding relation between the text to be processed and the video to be processed.
In the embodiment of the invention, determining the target video segment according to the positions of the first target vocabulary and the second target vocabulary in the text to be processed and the time corresponding relation between the text to be processed and the video to be processed comprises the following steps:
and determining a first moment of the first target vocabulary in the video to be processed according to the position of the first target vocabulary in the text to be processed and the time corresponding relation between the text to be processed and the video to be processed, and taking the first moment as the starting moment of the target video segment. And determining a second moment of occurrence of the second target vocabulary in the video to be processed according to the position of the second target vocabulary in the text to be processed and the time corresponding relation between the text to be processed and the video to be processed, and taking the second moment as the ending moment of the target video segment.
Alternatively,
according to the position of the first target vocabulary in the text to be processed and the time corresponding relation between the text to be processed and the video to be processed, determining a first moment when the first target vocabulary appears in the video to be processed, and according to the first moment and a preset first time interval, determining the starting moment of the target video segment. Determining a second moment when the second target vocabulary appears in the to-be-processed video according to the position of the second target vocabulary in the to-be-processed text and the time corresponding relation between the to-be-processed text and the to-be-processed video, and determining the ending moment of the target video segment according to the second moment and a preset second time interval. The preset first time interval and the preset second time interval can be set according to requirements, and can be a shorter time interval, such as 3s, 5s and 8s.
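The second variant above (widening the segment by the preset first and second time intervals) can be sketched as follows; the function name and the 5 s defaults are assumptions for illustration.

```python
def segment_bounds(first_time, second_time, first_interval=5.0, second_interval=5.0):
    """Compute the target video segment's start and end moments.

    first_time: moment (seconds) the first target word appears in the video;
    second_time: moment the second target word appears. The preset first and
    second intervals widen the segment; the start is clamped at 0.
    """
    start = max(0.0, first_time - first_interval)
    end = second_time + second_interval
    return start, end
```

Passing `first_interval=0.0, second_interval=0.0` reduces this to the first variant, where the occurrence moments themselves are the segment boundaries.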
In the embodiment of the invention, the audio to be processed is extracted from the video to be processed; converting the audio to be processed into a text to be processed; determining a target vocabulary corresponding to the object identifier in the text to be processed; determining the target video segment according to the position of the target vocabulary in the text to be processed and the time corresponding relation between the text to be processed and the video to be processed, and the like, so that the video processing efficiency can be improved, and the time cost and the labor cost can be reduced.
As shown in fig. 5, a method for determining a video clip according to another embodiment of the present invention includes:
step S501, extracting a to-be-processed image from a to-be-processed video.
In the embodiment of the invention, the images to be processed in the video to be processed can be extracted according to frames, so that multi-frame images to be processed corresponding to the video to be processed are obtained.
Step S502, determining a target vocabulary corresponding to the object identification in the image to be processed.
In the embodiment of the invention, after a plurality of frames of images to be processed are acquired, the characters in the images are acquired by carrying out image recognition on each frame of images to be processed, and the target vocabulary corresponding to the object identification in the images to be processed is determined.
Step S503, determining a target video segment according to the image to be processed corresponding to the target vocabulary and the time corresponding relation between the image to be processed and the video to be processed.
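Steps S501 to S503 can be illustrated with a small sketch. The per-frame recognized text is assumed to already exist as a dict produced by an image-recognition (OCR) step, which is outside this sketch; the frame-index-to-time mapping simply divides by an assumed frame rate.

```python
def find_target_frames(frame_texts, target_words, fps=25.0):
    """Return sorted times (seconds) of frames whose recognized text
    contains any target word.

    frame_texts: {frame_index: recognized_text} from an OCR step (assumed);
    fps: frame rate giving the time correspondence between the images to be
    processed and the video to be processed.
    """
    times = [idx / fps
             for idx, text in frame_texts.items()
             if any(w in text for w in target_words)]
    return sorted(times)
```

The returned times can then be grouped into target video segments in the same way as the audio-based occurrence times.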
Step S203, clipping the video to be processed according to the position of the target video segment in the video to be processed, so as to obtain the target video segment.
In the embodiment of the invention, after the position of the target video segment in the video to be processed is determined, the video to be processed is clipped according to the starting time and the ending time of the target video segment.
Alternatively,
after the position of the target video segment in the video to be processed is determined, determining the clipping starting point of the video to be processed according to the starting time of the target video segment and a preset third time interval; and determining the clipping ending point of the video to be processed according to the ending time of the target video segment and a preset fourth time interval. And editing the video to be processed according to the editing starting point and the editing ending point of the video to be processed. The preset third time interval and the preset fourth time interval may be set as required, and may be a shorter time interval, for example, 3s, 5s, and 8s.
In the embodiment of the invention, the video to be processed can be clipped by calling the FFmpeg API. For example, the clipping command is: "ffmpeg -ss xx:xx:xx -i xxx.mp4 -c copy -t xx output.mp4", wherein,
-ss: indicates the clip start time;
-c copy: indicates that the streams are copied directly without re-encoding;
-t: indicates the clip duration.
Specifically, "ffmpeg -ss 00:00:00 -i zb1.mp4 -c copy -t 600 output.mp4" means:
the video named "zb1.mp4" is clipped starting from 00:00:00, the clip duration is 600 s, and the file name of the target video clip is "output.mp4".
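The FFmpeg invocation above can be wrapped programmatically. The sketch below builds the same stream-copy command; the helper names are assumptions, and the `ffmpeg` binary must be installed on the system for the final call to succeed.

```python
import subprocess

def build_clip_cmd(src, start, duration_s, dst):
    # Mirrors: ffmpeg -ss <start> -i <src> -c copy -t <duration> <dst>
    return ["ffmpeg", "-ss", start, "-i", src,
            "-c", "copy", "-t", str(duration_s), dst]

def clip_video(src, start, duration_s, dst):
    # -c copy stream-copies without re-encoding, so clipping is fast;
    # check=True raises if ffmpeg exits with a non-zero status.
    subprocess.run(build_clip_cmd(src, start, duration_s, dst), check=True)
```

For example, `clip_video("zb1.mp4", "00:00:00", 600, "output.mp4")` reproduces the command shown in the text.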
The time range of the target video clip is determined through video/audio extraction and audio recognition, matching, and positioning, and clipping is performed according to the determined time range. This solves the technical problems that manual clipping is costly, time-consuming, and inefficient, and that batches of videos cannot be processed in a short time, thereby improving video clipping efficiency and reducing the time cost and labor cost of video clipping.
In the embodiment of the invention, the first moment is taken as the starting moment of the target video segment, the second moment is taken as the ending moment of the target video segment, and the video to be processed is clipped according to the starting moment and the ending moment of the target video segment.
Alternatively,
the starting time of the target video clip can be determined according to the first time and a preset first time interval; and determining the ending time of the target video segment according to the second time and the preset second time interval, and editing the video to be processed according to the starting time and the ending time of the target video segment.
Alternatively,
the first time may be taken as a start time of the target video clip and the second time may be taken as an end time of the target video clip; and determining a clipping starting point of the video to be processed according to the starting time of the target video segment and a preset third time interval, and determining a clipping ending point of the video to be processed according to the ending time of the target video segment and a preset fourth time interval. And editing the video to be processed according to the editing starting point and the editing ending point of the video to be processed.
Alternatively,
the starting time of the target video clip can be determined according to the first time and a preset first time interval; determining the ending time of the target video segment according to the second time and a preset second time interval; and determining a clipping starting point of the video to be processed according to the starting time of the target video segment and a preset third time interval, and determining a clipping ending point of the video to be processed according to the ending time of the target video segment and a preset fourth time interval. And editing the video to be processed according to the editing starting point and the editing ending point of the video to be processed.
Alternatively,
the starting time of the target video clip can be determined according to the first time or the first time and a preset first time interval; and determining the ending time of the target video segment according to the second time or the second time and a preset second time interval.
And determining a clipping starting point of the video to be processed according to the starting time or the starting time and a preset third time interval of the target video segment, and determining a clipping ending point of the video to be processed according to the ending time or the ending time and a preset fourth time interval of the target video segment. And editing the video to be processed according to the editing starting point and the editing ending point of the video to be processed.
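Combining both padding levels described above (the preset first/second intervals around the word occurrences, and the preset third/fourth intervals around the segment) gives clip points like the following; the names, the 5 s and 3 s defaults, and the optional clamping to the video length are assumptions for illustration.

```python
def clip_points(t1, t2, seg_pad=(5.0, 5.0), clip_pad=(3.0, 3.0), video_len=None):
    """Compute the clip start and end points of the video to be processed.

    t1/t2: moments (seconds) the first/second target words appear;
    seg_pad: preset first and second intervals; clip_pad: preset third
    and fourth intervals.
    """
    start = max(0.0, t1 - seg_pad[0] - clip_pad[0])  # clipping start point
    end = t2 + seg_pad[1] + clip_pad[1]              # clipping end point
    if video_len is not None:
        end = min(end, video_len)                    # do not run past the video
    return start, end
```

Setting either padding pair to `(0.0, 0.0)` recovers the variants that use only one level of intervals.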
In the embodiment of the invention, a detail page of the article corresponding to the target video clip can be generated according to the target video clip. The target video clip can be converted into a video link and provided to the merchant detail page; a video playing tag is arranged on the merchant detail page for loading the video, and a user can click to watch the video after entering the merchant detail page.
Further, when the same article identifier corresponds to a plurality of target video clips, the plurality of target video clips corresponding to the same article identifier can be sorted according to feedback data corresponding to the target video clips; the target video clip with the strongest positive feedback is selected from the target video clips according to the sorting result, and a detail page is generated according to the selected target video clip. The feedback data can be the sales volume, number of concurrent viewers, number of followers, or evaluation data of the target video clips; the target video clip with the highest sales volume, the largest number of concurrent viewers, the largest number of followers, or the best evaluation is determined through screening, and the detail page is generated according to the target video clip with the strongest positive feedback. Illustratively, the sorting algorithm may be bubble sort, heap sort, insertion sort, or the like.
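The selection step above can be sketched as follows. The dict keys are assumptions for illustration; Python's built-in sort stands in for the bubble/heap/insertion sorts named in the text, since only the ordering matters.

```python
def best_clip(clips, key="positive_feedback"):
    """Pick the target video clip with the strongest positive feedback.

    clips: list of dicts with assumed keys 'clip' and a feedback metric
    (e.g. sales volume, concurrent viewers, followers, or rating).
    """
    ranked = sorted(clips, key=lambda c: c[key], reverse=True)
    return ranked[0]["clip"]
```

The clip returned is the one to place on the merchant detail page.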
By generating the detail page from the target video clip, the method solves the problems that existing detail-page videos must be separately scripted and produced at extremely high time and labor cost, and that live-streaming sales videos have a low utilization rate; live videos can be fully utilized, and the merchant detail page conversion rate is improved. Further, by analyzing data related to existing videos, the short video with the best data for the same SKU can be recommended to the merchant, so that the merchant can place the optimal short video on the merchant detail page, improving the merchant detail page conversion rate.
In the embodiment of the invention, the video to be processed is acquired; determining a target video segment from the video to be processed according to one or more preset object identifiers, wherein the target video segment comprises demonstration contents of objects corresponding to the object identifiers; editing the video to be processed according to the position of the target video segment in the video to be processed to obtain the target video segment, and the like, so that the video processing efficiency can be improved, and the time cost and the labor cost are greatly reduced.
Further, the utilization rate of live videos can be improved, and the merchant detail page conversion rate is improved.
Fig. 6 is a schematic diagram of main modules of a video editing apparatus according to an embodiment of the present invention, and as shown in fig. 6, a video editing apparatus 600 of the present invention includes:
The acquiring module 601 is configured to acquire a video to be processed.
In the embodiment of the present invention, when video editing is performed, the obtaining module 601 obtains a video to be processed from a designated video storage location of a server. Specifically, the video is loaded from a designated video storage location of the server.
In the embodiment of the invention, storage space is allocated for videos in advance, and the videos are stored in designated video storage locations; different storage spaces can be allocated according to different categories of videos, so that videos can later be selected from the storage spaces as required. The video to be processed can be a live video: storage space is allocated for live videos in advance, so that after each live broadcast is completed, the system stores the live video in a designated live video storage area. The live video can be generated by means such as event tracking (burying points) or screen recording.
The video processing module 602 is configured to determine, according to one or more preset item identifiers, a target video segment from the video to be processed, where the target video segment includes presentation contents of an item corresponding to the item identifier.
In the embodiment of the present invention, the video to be processed acquired by the acquiring module 601 may include presentation contents of a plurality of items, and in order to acquire the presentation contents of a target item, the video processing module 602 determines a target video clip from the video to be processed according to one or more preset item identifiers, where the target video clip includes the presentation contents of the item corresponding to the item identifier.
And a video clipping module 603, configured to clip the video to be processed according to the position of the target video segment in the video to be processed, so as to obtain the target video segment.
In the embodiment of the present invention, after the video processing module 602 determines the position of the target video segment in the video to be processed, the video clipping module 603 clips the video to be processed according to the start time and the end time of the target video segment.
Alternatively,
after the video processing module 602 determines the position of the target video segment in the video to be processed, determining a clipping start point of the video to be processed according to the start time of the target video segment and a preset third time interval; and determining the clipping ending point of the video to be processed according to the ending time of the target video segment and a preset fourth time interval. The video clipping module 603 clips the video to be processed according to the clipping start point and the clipping end point of the video to be processed. The preset third time interval and the preset fourth time interval may be set as required, and may be a shorter time interval, for example, 3s, 5s, and 8s.
In the embodiment of the invention, the video processing efficiency can be improved through the acquisition module, the video processing module, the video editing module and other modules, and the time cost and the labor cost are greatly reduced.
Further, the utilization rate of live videos can be improved, and the merchant detail page conversion rate is improved.
Fig. 7 is a schematic structural diagram of a computer system suitable for use in implementing a terminal device or a server according to an embodiment of the present invention, and as shown in fig. 7, a computer system 700 of a terminal device or a server according to an embodiment of the present invention includes:
a Central Processing Unit (CPU) 701, which can execute various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. Various programs and data required for the operation of the system 700 are also stored in the RAM 703. The CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as necessary, so that a computer program read therefrom is installed into the storage section 708 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 701.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, for example, as: a processor includes an acquisition module, a video processing module, and a video clip module. The names of these modules do not constitute a limitation on the module itself in some cases, and for example, the video editing module may also be described as "a module that clips a video to be processed according to the position of a target video clip in the video to be processed".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be present alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to include: acquiring a video to be processed; determining a target video segment from the video to be processed according to one or more preset object identifiers, wherein the target video segment comprises demonstration contents of objects corresponding to the object identifiers; and editing the video to be processed according to the position of the target video segment in the video to be processed so as to obtain the target video segment.
According to the technical scheme provided by the embodiment of the invention, the video processing efficiency can be improved, and even massive video data can be rapidly processed, so that the time cost and the labor cost are greatly reduced.
Further, the utilization rate of live videos can be improved, and the merchant detail page conversion rate is improved.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (11)

1. A video editing method, comprising:
acquiring a video to be processed;
extracting audio to be processed from the video to be processed; converting the audio to be processed into a text to be processed; according to one or more preset article identifiers, determining a target vocabulary corresponding to the article identifiers in the text to be processed, wherein the determining comprises: after the text to be processed is acquired, searching in the text to be processed according to a preset target vocabulary, determining the target vocabulary corresponding to the article identifier in the text to be processed, and acquiring the position of the target vocabulary in the text to be processed; the target vocabulary can be search keywords, near-synonyms, and synonyms determined according to the article identifiers, and a target vocabulary library is constructed according to the target vocabulary corresponding to the article identifiers;
determining a target video segment according to the position of the target vocabulary in the text to be processed and the time corresponding relation between the text to be processed and the video to be processed, wherein the target video segment comprises the demonstration content of the object corresponding to the object identifier; when the number of the target words is multiple, determining the time interval between every two target words according to the corresponding relation, and when the time interval is smaller than a preset time interval threshold value, determining that the target words correspond to the same target video segment;
Editing the video to be processed according to the position of the target video segment in the video to be processed to obtain the target video segment, wherein the method comprises the following steps: when the same target video segment corresponds to a plurality of target vocabularies, aiming at the same target video segment: and determining a first target vocabulary appearing first and a second target vocabulary appearing last from the plurality of target vocabularies, and determining the target video segment according to the positions of the first target vocabulary and the second target vocabulary in the text to be processed and the time corresponding relation between the text to be processed and the video to be processed.
2. The method as recited in claim 1, further comprising:
and determining the first target vocabulary appearing first and the second target vocabulary appearing last from the target vocabularies according to the arrangement sequence of the target vocabularies in the text to be processed.
3. The method of claim 2, wherein:
determining a first moment when the first target vocabulary appears in the to-be-processed video and a second moment when the second target vocabulary appears in the to-be-processed video according to the positions of the first target vocabulary and the second target vocabulary in the to-be-processed text and the time corresponding relation between the to-be-processed text and the to-be-processed video;
And taking the first moment as the starting moment of the target video segment and the second moment as the ending moment of the target video segment.
4. The method of claim 2, wherein:
determining a first moment when the first target vocabulary appears in the to-be-processed video and a second moment when the second target vocabulary appears in the to-be-processed video according to the positions of the first target vocabulary and the second target vocabulary in the to-be-processed text and the time corresponding relation between the to-be-processed text and the to-be-processed video;
determining the starting time of the target video clip according to the first time and a preset first time interval;
and determining the ending time of the target video segment according to the second time and a preset second time interval.
5. The method according to claim 3 or 4, wherein the editing the video to be processed according to the position of the target video clip in the video to be processed comprises:
and editing the video to be processed according to the starting time and the ending time of the target video segment.
6. The method according to claim 3 or 4, wherein the editing the video to be processed according to the position of the target video clip in the video to be processed comprises:
determining a clipping starting point of the video to be processed according to the starting time of the target video segment and a preset third time interval;
determining a clipping ending point of the video to be processed according to the ending time of the target video segment and a preset fourth time interval;
and editing the video to be processed according to the editing starting point and the editing ending point of the video to be processed.
7. The method as recited in claim 1, further comprising:
and generating a detail page of the object according to the target video segment.
8. The method of claim 7, wherein when the same item identification corresponds to a plurality of the target video clips,
sequencing a plurality of target video clips corresponding to the same article identifier according to feedback data corresponding to the target video clips;
and selecting the target video segment with the strongest positive feedback from the target video segments according to the sorting result, and generating the detail page according to the selected target video segment.
9. A video editing apparatus, comprising:
the acquisition module is used for acquiring the video to be processed;
the video processing module is used for extracting audio to be processed from the video to be processed according to one or more preset article identifiers; converting the audio to be processed into a text to be processed; determining a target vocabulary corresponding to the article identifier in the text to be processed, wherein the determining comprises: after the text to be processed is acquired, searching in the text to be processed according to a preset target vocabulary, determining the target vocabulary corresponding to the article identifier in the text to be processed, and acquiring the position of the target vocabulary in the text to be processed; the target vocabulary can be search keywords, near-synonyms, and synonyms determined according to the article identifiers, and a target vocabulary library is constructed according to the target vocabulary corresponding to the article identifiers;
determining a target video segment according to the position of the target vocabulary in the text to be processed and the time corresponding relation between the text to be processed and the video to be processed, wherein the target video segment comprises the demonstration content of the object corresponding to the object identifier; when the number of the target words is multiple, determining the time interval between every two target words according to the corresponding relation, and when the time interval is smaller than a preset time interval threshold value, determining that the target words correspond to the same target video segment;
A video clipping module, configured to clip the video to be processed according to the position of the target video segment in the video to be processed, so as to obtain the target video segment, where the video clipping module includes: when the same target video segment corresponds to a plurality of target vocabularies, aiming at the same target video segment: and determining a first target vocabulary appearing first and a second target vocabulary appearing last from the plurality of target vocabularies, and determining the target video segment according to the positions of the first target vocabulary and the second target vocabulary in the text to be processed and the time corresponding relation between the text to be processed and the video to be processed.
10. A video clip electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-8.
11. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-8.
CN202110547114.8A 2021-05-19 2021-05-19 Video editing method and device Active CN113286173B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110547114.8A CN113286173B (en) 2021-05-19 2021-05-19 Video editing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110547114.8A CN113286173B (en) 2021-05-19 2021-05-19 Video editing method and device

Publications (2)

Publication Number Publication Date
CN113286173A CN113286173A (en) 2021-08-20
CN113286173B true CN113286173B (en) 2023-08-04

Family

ID=77280080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110547114.8A Active CN113286173B (en) 2021-05-19 2021-05-19 Video editing method and device

Country Status (1)

Country Link
CN (1) CN113286173B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114401440A (en) * 2021-12-14 2022-04-26 北京达佳互联信息技术有限公司 Video clip and clip model generation method, device, apparatus, program, and medium
CN114268848A (en) * 2021-12-17 2022-04-01 北京达佳互联信息技术有限公司 Video generation method and device, electronic equipment and storage medium
CN114501058A (en) * 2021-12-24 2022-05-13 北京达佳互联信息技术有限公司 Video generation method and device, electronic equipment and storage medium
CN114999530A (en) * 2022-05-18 2022-09-02 北京飞象星球科技有限公司 Audio and video editing method and device
CN115134660A (en) * 2022-06-27 2022-09-30 中国平安人寿保险股份有限公司 Video editing method and device, computer equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN109194978A (en) * 2018-10-15 2019-01-11 广州虎牙信息科技有限公司 Live video clipping method, device and electronic equipment
CN109743589A (en) * 2018-12-26 2019-05-10 百度在线网络技术(北京)有限公司 Article generation method and device
CN110392281A (en) * 2018-04-20 2019-10-29 腾讯科技(深圳)有限公司 Image synthesizing method, device, computer equipment and storage medium
CN110401878A (en) * 2019-07-08 2019-11-01 天脉聚源(杭州)传媒科技有限公司 Video clipping method, system and storage medium
CN112055225A (en) * 2019-06-06 2020-12-08 阿里巴巴集团控股有限公司 Live broadcast video interception, commodity information generation and object information generation methods and devices

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20120323897A1 (en) * 2011-06-14 2012-12-20 Microsoft Corporation Query-dependent audio/video clip search result previews

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN110392281A (en) * 2018-04-20 2019-10-29 腾讯科技(深圳)有限公司 Image synthesizing method, device, computer equipment and storage medium
CN109194978A (en) * 2018-10-15 2019-01-11 广州虎牙信息科技有限公司 Live video clipping method, device and electronic equipment
CN109743589A (en) * 2018-12-26 2019-05-10 百度在线网络技术(北京)有限公司 Article generation method and device
CN112055225A (en) * 2019-06-06 2020-12-08 阿里巴巴集团控股有限公司 Live broadcast video interception, commodity information generation and object information generation methods and devices
CN110401878A (en) * 2019-07-08 2019-11-01 天脉聚源(杭州)传媒科技有限公司 Video clipping method, system and storage medium

Also Published As

Publication number Publication date
CN113286173A (en) 2021-08-20

Similar Documents

Publication Publication Date Title
CN113286173B (en) Video editing method and device
CN108804450B (en) Information pushing method and device
US20200322570A1 (en) Method and apparatus for aligning paragraph and video
CN109446442B (en) Method and apparatus for processing information
US10664516B2 (en) Search system, method and apparatus
KR102573518B1 (en) Information processing method and device, electronic device, computer-readable medium and computer program stored in medium
CN110321544B (en) Method and device for generating information
US11706172B2 (en) Method and device for sending information
CN109413056B (en) Method and apparatus for processing information
CN112559800A (en) Method, apparatus, electronic device, medium, and product for processing video
CN115801980A (en) Video generation method and device
CN111753056B (en) Information pushing method and device, computing equipment and computer readable storage medium
CN112433713A (en) Application program design graph processing method and device
CN111782850A (en) Object searching method and device based on hand drawing
CN110705271A (en) System and method for providing natural language processing service
CN107168627B (en) Text editing method and device for touch screen
CN106896936B (en) Vocabulary pushing method and device
US20210407166A1 (en) Meme package generation method, electronic device, and medium
CN115495658A (en) Data processing method and device
CN112652329B (en) Text realignment method and device, electronic equipment and storage medium
CN109299223B (en) Method and device for inquiring instruction
CN113923479A (en) Audio and video editing method and device
US20200321026A1 (en) Method and apparatus for generating video
CN113076254A (en) Test case set generation method and device
CN113221554A (en) Text processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant