CN112135201B - Video production method and related device - Google Patents

Video production method and related device

Info

Publication number
CN112135201B
CN112135201B (application CN202010890456.5A)
Authority
CN
China
Prior art keywords
product
video
information
sub
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010890456.5A
Other languages
Chinese (zh)
Other versions
CN112135201A
Inventor
许雷
吴磊
王元吉
于志兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202010890456.5A priority Critical patent/CN112135201B/en
Publication of CN112135201A publication Critical patent/CN112135201A/en
Application granted granted Critical
Publication of CN112135201B publication Critical patent/CN112135201B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/812Monomedia components thereof involving advertisement data

Abstract

An embodiment of the present application provides a video production method and a related apparatus. The method includes: segmenting a first product video to obtain a first sub-video, where the first sub-video includes an introduction segment of a first product; acquiring target position information of the product information of the first product in the first sub-video; replacing the product information of the first product in the first sub-video with the product information of a second product according to the target position information to obtain a second sub-video; and splicing the second sub-video with a third sub-video to obtain a second product video, where the third sub-video includes an introduction segment of the second product. In this way, a short video of the second product can be obtained, which improves video production efficiency and reduces video production cost.

Description

Video production method and related device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a video production method and a related apparatus.
Background
Short video is currently a common creative format for advertisers' promotions; because it performs well and converts at a high rate, it is favored by advertisers. However, compared with traditional picture-based creatives, producing a short video requires more participants, such as actors, voice-over artists, and post-production professionals, so the production cost of a short video is higher.
Disclosure of Invention
The embodiment of the application provides a video production method and a related device.
A first aspect of an embodiment of the present application provides a video production method, including:
performing segmentation processing on a first product video to obtain a first sub-video, where the first sub-video includes an introduction segment of a first product;
acquiring target position information of the product information of the first product in the first sub-video;
replacing the product information of the first product in the first sub-video with the product information of a second product according to the target position information to obtain a second sub-video; and
splicing the second sub-video and a third sub-video to obtain a second product video, where the third sub-video includes an introduction segment of the second product.
In this example, a first sub-video is obtained by segmenting a first product video; the target position information of the product information of a first product in the first sub-video is obtained; the product information of the first product is replaced with the product information of a second product according to the target position information to obtain a second sub-video; and the second sub-video and a third sub-video are spliced to obtain a second product video. In this way, the second product video can be obtained, improving the video production efficiency for the second product and reducing the video production cost.
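The four claimed steps can be sketched as a toy pipeline. This is an illustrative assumption, not the patent's actual implementation: videos are modeled as lists of frame dictionaries, the segment labels are hypothetical, and `third_sub_video` is supplied ready-made.

```python
def make_second_product_video(first_product_video, third_sub_video,
                              first_info, second_info):
    """Toy sketch of the claimed method; frames are dicts, not real video."""
    # Step 1: segmentation - keep the introduction segment as the first sub-video.
    first_sub = [f for f in first_product_video if f["segment"] == "intro"]
    # Step 2: target position information - frame indices showing the first product.
    positions = [i for i, f in enumerate(first_sub)
                 if f.get("product") == first_info]
    # Step 3: replacement - swap in the second product's information.
    second_sub = [dict(f) for f in first_sub]
    for i in positions:
        second_sub[i]["product"] = second_info
    # Step 4: splicing - concatenate with the third sub-video.
    return second_sub + third_sub_video
```

In a real system each step would be a model-backed module (scene segmentation, recognition, editing, encoding); the sketch only fixes the data flow between them.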
With reference to the first aspect, in one possible implementation manner, the obtaining of the target location information of the product information of the first product in the first sub-video includes:
performing recognition processing on a plurality of image frames in the first sub-video through a video recognition model to obtain the target position information of the product information of the first product in the first sub-video, where the video recognition model includes at least one of a text recognition model, an audio recognition model, a mouth shape recognition model, and a logo recognition model.
With reference to the first aspect, in one possible implementation manner, the performing recognition processing on the plurality of image frames in the first sub-video through the video recognition model to obtain the target position information of the product information of the first product in the first sub-video includes at least one of the following:
performing logo recognition processing on at least one first image frame in the first sub-video through a logo recognition model to obtain first position information of the first product's identifier in the at least one first image frame;
performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product's name in the at least one second image frame;
performing audio recognition processing on the audio segment included in the first sub-video through an audio recognition model to obtain third position information of the first product's audio information in the audio segment; and
performing mouth shape recognition processing on the plurality of image frames in the first sub-video through a mouth shape recognition model to obtain at least one third image frame corresponding to the first product's audio information in the first sub-video, and determining the third position information of the first product's audio information in the audio segment based on the at least one third image frame.
In this example, the product information includes a first product identifier, a first product name, and the audio information of the first product. The product information of the first product can be obtained quickly and accurately by locating the first product identifier through image recognition, the first product name through character recognition, and the first product's audio through audio recognition.
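One way to read this implementation is as a set of independent recognizers whose outputs are merged into a single target-position record. A minimal sketch under that assumption, where each recognizer is a callable returning a list of positions (the recognizer names and record fields are illustrative, not the patent's):

```python
def locate_product_info(frames, audio_clip, recognizers):
    """Merge per-modality recognizer outputs into target position information."""
    return {
        # First position information: where the product logo appears.
        "logo_positions": recognizers["logo"](frames),
        # Second position information: where the product name appears as text.
        "name_positions": recognizers["text"](frames),
        # Third position information: where the product is mentioned in audio.
        "audio_positions": recognizers["audio"](audio_clip),
    }
```

Because the modalities are independent, any subset of recognizers can be plugged in, matching the claim's "at least one of" wording.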
With reference to the first aspect, in one possible implementation manner, the replacing the product information of the first product in the first sub-video with the product information of the second product according to the target position information to obtain the second sub-video includes:
acquiring the audio track of the first product's audio information from the first sub-video, where the audio track includes a human voice track and a background music track;
separating the audio track to obtain a first vocal track and the background music track;
replacing the first product's audio information in the first vocal track with the second product's audio information according to the third position information to obtain a second vocal track;
synthesizing the second vocal track and the background music track to obtain the audio track of the second product's audio information; and
determining the second sub-video based on the audio track of the second product's audio information.
In this example, the second sub-video is obtained by replacing the first vocal track, so the product information of the first product can be replaced quickly, which improves the efficiency of producing the second sub-video.
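The separate-replace-resynthesize flow can be sketched with audio modeled as sample lists, where `span` plays the role of the third position information. The source separation itself (a vocal/music separation model in practice) is assumed to have already happened, so the function receives pre-split tracks:

```python
def replace_product_audio(vocal_track, music_track, span, new_product_audio):
    """Replace the first product's audio in the vocal track, then remix."""
    start, end = span  # third position information: where the product is spoken
    # Second vocal track: splice the second product's audio into the located gap.
    new_vocals = vocal_track[:start] + new_product_audio + vocal_track[end:]
    # Re-synthesize: the background music track is carried over untouched.
    return new_vocals, music_track
```

Keeping the music track untouched is what lets the replacement stay seamless: only the located vocal span changes.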
With reference to the first aspect, in one possible implementation manner, the method further includes:
performing voice recognition on the first vocal track to obtain first text information of the first product's audio information;
determining second text information of the second product's audio information according to the first text information and the product information of the second product;
performing voice conversion on the second text information to obtain reference product audio information; and
processing the reference product audio information to obtain the second product's audio information, where the timbre of the second product's audio information is the same as that of the first product's audio information.
In this example, the text information of the second product's audio information is determined according to the first text information and the product information of the second product, which improves the degree of fit between the second product video and the first product video and the accuracy of the produced second product video.
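The ASR-to-TTS loop above can be sketched as follows. Here `asr`, `tts`, and `convert_timbre` stand in for real speech models and are assumptions, as is the simple name substitution used to derive the second text information:

```python
def build_second_product_audio(first_vocal_audio, first_name, second_name,
                               asr, tts, convert_timbre):
    # First text information: transcript of the first vocal track.
    first_text = asr(first_vocal_audio)
    # Second text information: derived from the first text and the new product.
    second_text = first_text.replace(first_name, second_name)
    # Reference product audio information via text-to-speech.
    reference_audio = tts(second_text)
    # Timbre conversion so the new audio matches the original speaker.
    return convert_timbre(reference_audio, like=first_vocal_audio)
```

The timbre-matching step is what keeps the spliced audio consistent with the rest of the first sub-video's narration.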
With reference to the first aspect, in one possible implementation manner, the method further includes:
obtaining at least one candidate product video according to the product information of the second product;
and determining a first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product.
With reference to the first aspect, in one possible implementation manner, the acquiring at least one candidate product video according to the product information of the second product includes:
determining at least one product keyword from the product information of the second product;
determining the type of the product video according to at least one product keyword;
and acquiring at least one candidate product video according to the type of the product video.
In this example, product keywords are determined according to the product information of the second product, at least one candidate product video is determined according to the product keywords, and the first product video is determined from the at least one candidate product video, so the first product video can be acquired intelligently, which improves the efficiency of video replacement.
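The retrieval step can be sketched as keyword-overlap ranking over a typed video library. The field names (`keywords`, `category`, `type`) and the ranking rule are illustrative assumptions, not the patent's specification:

```python
def select_first_product_video(second_product, video_library):
    # At least one product keyword from the second product's information.
    keywords = set(second_product["keywords"])
    # Product-video type derived from the product information (toy rule:
    # use the category field directly).
    video_type = second_product["category"]
    # Candidate product videos of that type, ranked by shared keywords;
    # the best match becomes the first product video.
    candidates = [v for v in video_library if v["type"] == video_type]
    return max(candidates,
               key=lambda v: len(keywords & set(v["keywords"])),
               default=None)
```

Returning `None` when no candidate matches leaves the caller free to fall back to manual selection.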
A second aspect of an embodiment of the present application provides a video production apparatus, including:
a processing unit, configured to perform segmentation processing on a first product video to obtain a first sub-video, where the first sub-video includes an introduction segment of a first product;
an acquisition unit, configured to acquire target position information of the product information of the first product in the first sub-video;
a replacing unit, configured to replace the product information of the first product in the first sub-video with the product information of a second product according to the target position information to obtain a second sub-video; and
a splicing unit, configured to splice the second sub-video and a third sub-video to obtain a second product video, where the third sub-video includes an introduction segment of the second product.
With reference to the second aspect, in one possible implementation manner, the product information includes at least one of a first product identifier, a first product name, and audio information of the first product, and the obtaining unit is configured to:
perform recognition processing on a plurality of image frames in the first sub-video through a video recognition model to obtain the target position information of the product information of the first product in the first sub-video, where the video recognition model includes at least one of a text recognition model, an audio recognition model, a mouth shape recognition model, and a logo recognition model.
With reference to the second aspect, in a possible implementation manner, in terms of obtaining target position information of the product information of the first product in the first sub-video by performing recognition processing on a plurality of image frames in the first sub-video through a video recognition model, the obtaining unit is specifically configured to perform at least one of the following:
performing logo recognition processing on at least one first image frame in the first sub-video through a logo recognition model to obtain first position information of the first product's identifier in the at least one first image frame;
performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product's name in the at least one second image frame;
performing audio recognition processing on the audio segment included in the first sub-video through an audio recognition model to obtain third position information of the first product's audio information in the audio segment; and
performing mouth shape recognition processing on the plurality of image frames in the first sub-video through a mouth shape recognition model to obtain at least one third image frame corresponding to the first product's audio information in the first sub-video, and determining the third position information of the first product's audio information in the audio segment based on the at least one third image frame.
With reference to the second aspect, in one possible implementation manner, the product information includes audio information of the first product, and the replacement unit is configured to:
acquire the audio track of the first product's audio information from the first sub-video, where the audio track includes a human voice track and a background music track;
separate the audio track to obtain a first vocal track and the background music track;
replace the first product's audio information in the first vocal track with the second product's audio information according to the third position information to obtain a second vocal track;
synthesize the second vocal track and the background music track to obtain the audio track of the second product's audio information; and
determine the second sub-video based on the audio track of the second product's audio information.
With reference to the second aspect, in one possible implementation manner, the apparatus is further configured to:
perform voice recognition on the first vocal track to obtain first text information of the first product's audio information;
determine second text information of the second product's audio information according to the first text information and the product information of the second product;
perform voice conversion on the second text information to obtain reference product audio information; and
process the reference product audio information to obtain the second product's audio information, where the timbre of the second product's audio information is the same as that of the first product's audio information.
With reference to the second aspect, in one possible implementation manner, the apparatus is further configured to:
obtaining at least one candidate product video according to the product information of the second product;
and determining a first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product.
With reference to the second aspect, in one possible implementation manner, the apparatus is further configured to:
determine at least one product keyword from the product information of the second product;
determining the type of the product video according to at least one product keyword;
and acquiring at least one candidate product video according to the type of the product video.
A third aspect of embodiments of the present application provides a terminal, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store computer program instructions, and the processor is configured to call the computer program instructions to execute the step instructions in the first aspect of embodiments of the present application.
A fourth aspect of embodiments of the present application provides a computer readable storage medium having stored thereon computer program instructions, which, when executed by a processor, cause the processor to perform some or all of the steps as described in the first aspect of embodiments of the present application.
A fifth aspect of embodiments of the present application provides a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps as described in the first aspect of embodiments of the present application. The computer program product may be a software installation package.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative effort.
Fig. 1 is a schematic diagram of a video production method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a video production method according to an embodiment of the present application;
fig. 3 is a schematic flow chart of another video production method provided in the embodiment of the present application;
fig. 4 is a schematic flow chart of another video production method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a video production apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
To better understand the video production method provided in the embodiments of the present application, the scene to which it applies is briefly described first. As shown in fig. 1, when a user needs to generate a product video quickly, the user may input the product information of the product to be displayed (a second product), and the product information in a related product video (a first product video) may be replaced to generate the product video of the second product quickly. Specifically, a first product video is obtained; the first product video is a video for introducing or publicizing a first product and may be obtained by shooting or by video processing techniques. The first product video is processed in a segmented manner to obtain a first sub-video and a fourth sub-video; target position information of the product information of the first product is obtained from the first sub-video through image recognition, character recognition, and the like; the product information of the first product is replaced with the product information of the second product according to the target position information to obtain a second sub-video; and the second sub-video and a third sub-video are spliced to obtain a second product video. The third sub-video and the fourth sub-video are videos of the same category, namely videos that introduce a product in detail, such as its core selling points. Therefore, compared with the existing scheme in which a professional must process the product information in the first sub-video when replacing it, the present application replaces the product information in the first sub-video of the product video and splices the videos to obtain the second product video, which improves the efficiency of producing the second product video.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a video production method according to an embodiment of the present disclosure. As shown in fig. 2, the method includes:
201. Perform segmentation processing on the first product video to obtain a first sub-video, where the first sub-video includes an introduction segment of the first product.
After the first product video is segmented, the first sub-video alone, or the first sub-video together with other sub-videos, may be obtained. For example, the first product video may be segmented to obtain the first sub-video and a fourth sub-video corresponding to the first product, where the fourth sub-video is used to introduce the core selling points of the first product or to introduce the first product in detail; a detailed introduction may be understood as introducing most or specific features of the product.
The introduction segment of the first product included in the first sub-video may be a skit or a talking-head video introducing the first product; specifically, it may be understood as a video in which a professional promotes the first product or other users introduce the first product, and such a video is usually generated by recording. For example, it may be a video recorded by broadcaster A introducing or advertising the first product. Of course, the first product video may also be divided into three segments, for example, a first sub-video, a fourth sub-video, and a video end frame. In some embodiments, the first sub-video may not include the real object of the product, where the real object can be understood as follows: when the product is a physical product, content such as the appearance of the physical product may be omitted; when the product is a virtual product (software, etc.), the interface of the virtual product may be omitted. The fourth sub-video may then include the real object of the product.
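The segmentation into an introduction segment, a detail segment, and end frames can be sketched by grouping consecutive frames by a segment label. In practice the labels would come from a scene-detection model, which is assumed away here:

```python
from itertools import groupby

def segment_product_video(frames):
    """Group consecutive frames into sub-videos by their segment label."""
    # groupby only merges adjacent frames, so the sub-videos keep their
    # original temporal order (intro, detail, end frame, ...).
    return [(label, list(group))
            for label, group in groupby(frames, key=lambda f: f["segment"])]
```

The first group then serves as the first sub-video, while later groups correspond to the fourth sub-video and the video end frame.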
202. Acquire target position information of the product information of the first product in the first sub-video.
In some embodiments, the product information of the first product includes at least one of a first product identification, a first product name, and audio information of the first product. The first product identification may be a logo icon of the first product, etc. The target position information may be acquired by an image recognition method, a character recognition method, a voice recognition method, or the like.
The target location information may include location information of the first product identification in a part or all of image frames of the first sub video, location information of the first product name in each image frame of the first sub video, and an audio location of the first product audio in the first sub video.
203. Replace the product information of the first product in the first sub-video with the product information of the second product according to the target position information to obtain a second sub-video.
The second sub-video can be obtained after replacing the first product identifier, the first product name and the audio information of the first product in the first sub-video with the second product identifier, the second product name and the audio information of the second product respectively.
When the replacement is performed, parallel replacement may be used, or serial replacement may be used, or both may be combined. The parallel replacement can be understood as three product information replacements performed in parallel, and the serial replacement can be understood as three product information replacements performed in sequence.
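Since the identifier, name, and audio replacements touch independent components of the sub-video, the parallel and serial modes described above can both be expressed with the standard library. The component names and replacer interface are assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def replace_product_info(sub_video, replacers, parallel=True):
    """Apply one replacer per independent component (e.g. logo, name, audio)."""
    items = list(replacers.items())
    if parallel:
        # Parallel replacement: the substitutions run concurrently.
        with ThreadPoolExecutor(max_workers=len(items)) as pool:
            updated = pool.map(lambda kv: (kv[0], kv[1](sub_video[kv[0]])),
                               items)
            updates = dict(updated)
    else:
        # Serial replacement: substitutions performed in sequence.
        updates = {key: fn(sub_video[key]) for key, fn in items}
    return {**sub_video, **updates}
```

Because each replacer reads and writes only its own component, the two modes produce identical results; parallel execution simply overlaps the slow model-backed steps.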
204. Splice the second sub-video and a third sub-video to obtain a second product video, where the third sub-video includes an introduction segment of the second product.
The spliced second product video may further include a video end frame, which is typically used to display a purchase link for the product, a two-dimensional code linking to detailed product information, and the like.
In this example, the first product video is segmented to obtain a first sub-video; the target position of the product information of the first product in the first sub-video is obtained; the product information of the first product is replaced with the product information of a second product according to the target position information to obtain a second sub-video; and the second sub-video and the third sub-video are spliced to obtain a second product video. Compared with the prior art, in which the replacement must be performed by professionals, the product information in the first sub-video can be replaced automatically and the videos spliced to obtain the second product video, which improves the efficiency of producing the second product video.
In one possible implementation, the product information includes at least one of a first product identifier, a first product name, and audio information of the first product, and a possible method of obtaining target location information of the product information of the first product in the first sub-video includes:
and identifying a plurality of image frames in the first sub-video through a video identification model to obtain the target position information of the product information of the first product in the first sub-video, wherein the video identification model comprises at least one of a text identification model, an audio identification model, a mouth shape identification model and an identification model.
The position information corresponding to the product information can be respectively determined through the video recognition model. The mouth shape recognition model can recognize the mouth shape of a user through image frames in the first sub-video, determine the image frames where the audio information is located according to the mouth shape of the user, and determine the position information corresponding to the audio information. The text recognition model, the audio recognition model, the mouth shape recognition model and the identification recognition model can be pre-trained models used for identifying product identification, product name and audio information of products.
The identification recognition model may be a logo recognition model, particularly for recognizing product identifications (logos, etc.).
In this example, the product information includes at least one of the first product identifier, the first product name, and the audio information of the first product, and the target location information is recognized according to the corresponding text recognition model, the audio recognition model, the mouth shape recognition model, and the identifier recognition model, so that the product information of the first product can be quickly and accurately obtained.
In one possible implementation manner, a possible method for obtaining target position information of product information of a first product in a first sub video by performing recognition processing on a plurality of image frames in the first sub video through a video recognition model includes at least one of the following steps:
a1, performing identification processing on at least one first image frame in the first sub-video through an identification model to obtain first position information of a first product identification in the at least one first image frame;
a2, performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product name in the at least one second image frame;
a3, carrying out recognition processing on an audio clip included in the first sub-video through an audio recognition model to obtain third position information of the audio information of the first product in the audio clip;
a4, carrying out mouth shape recognition processing on a plurality of image frames in the first sub-video through the mouth shape recognition model to obtain at least one third image frame corresponding to the audio information of the first product in the first sub-video, and determining third position information of the audio information of the first product in the audio clip based on the at least one third image frame.
The first position information may include a frame number of the first image frame, a frame identifier, etc., and position information of the first product identifier in the first image frame.
The second position information may include a frame number, a frame identifier, etc. of the second image frame, and position information of the first product name in the second image frame.
The mouth shape of the user can be distinguished through the mouth shape recognition model, so that at least one third image frame related to the audio of the user is determined, and third position information of the audio information of the first product in the audio clip is specifically recognized in the third image frame.
Steps A3 and A4 may be executed in parallel, in which case the position information obtained first is determined as the third position information and acquisition of the third position information then stops; alternatively, one of the two schemes may be adopted on its own to acquire the third position information; or the position information acquired in steps A3 and A4 may both be processed to obtain the final third position information, the processing being to select the more accurate parts of the two results as parts of the final third position information.
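The "take whichever of A3/A4 finishes first" strategy can be sketched with Python's `concurrent.futures`. The two recognizer functions, their return values, and their timings below are hypothetical placeholders for the audio recognition model and the mouth shape recognition model.

```python
import concurrent.futures as cf
import time

# Hypothetical recognizers: each returns the position of the first
# product's audio information within the audio clip (start, end seconds).
def audio_recognition(clip):
    time.sleep(0.05)          # stands in for model inference time (A3)
    return (3.0, 7.5)

def mouth_shape_recognition(frames):
    time.sleep(0.01)          # stands in for model inference time (A4)
    return (3.1, 7.4)

with cf.ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(audio_recognition, "clip"),
               pool.submit(mouth_shape_recognition, "frames")]
    done, pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    third_position = next(iter(done)).result()  # result obtained first wins
    for f in pending:
        f.cancel()                              # stop acquiring the other one

print(third_position)
```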
In one possible implementation manner, the product information includes audio information of a first product, and a possible method for replacing information of the first product in the first sub-video with information of a second product according to the target position information to obtain a second sub-video includes:
b1, acquiring an audio track of the audio information of the first product from the first sub-video, wherein the audio track comprises a human voice track and a background music track;
b2, processing the audio track to obtain a first person sound track and a background music track;
b3, replacing the audio information of the first product in the first voice track with the audio information of the second product according to the third position information to obtain a second voice track;
b4, synthesizing the second human voice track and the background music track to obtain an audio track of the audio information of the second product;
b5, determining the second sub-video based on the audio track of the second product audio information.
The human voice track can be understood as the voice track of the user introducing the product, and the background music track as the track of the background music in the video. The format of the audio information of the second product is the same as that of the audio information of the first product; this can be understood as meaning that the two have the same length, and that the speech pauses and the pitch of the user's speech in the audio are also the same.
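Steps B1–B5 can be sketched minimally on toy sample lists. The source-separation step (B1–B2) is assumed to have already produced the two tracks; every sample value, position, and length below is an illustrative assumption.

```python
# Toy mono tracks as sample lists (already separated, per steps B1-B2).
voice_track = [0, 0, 5, 5, 5, 0, 0, 0]   # first product's audio at samples 2..4
background = [1] * 8                      # background music track
third_position = (2, 5)                   # recognized span of the first audio

second_product_audio = [7, 7, 7]          # same length as the replaced span

# B3: splice the second product's audio into the voice track at the position.
start, end = third_position
second_voice = voice_track[:start] + second_product_audio + voice_track[end:]

# B4: synthesize (mix) the new voice track with the background music track.
mixed = [v + b for v, b in zip(second_voice, background)]
print(mixed)
```

In a real pipeline the splice would preserve sample rate and track length, mirroring the "same format" constraint described above.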
When the product information of the second product is adopted to replace the information of the first product in the first sub-video, the replacement of the first product identification and the first product name is also included. The method for replacing the first product identifier may be: and according to the target position information, covering the first product identification in the image frame of the first sub video by adopting the second product identification so as to finish the replacement of the product information in the image frame. The method for replacing the first product name refers to a method for replacing the first product identifier, and is not described herein again.
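The overlay replacement of the identifier can be sketched on a toy pixel grid; the region coordinates stand in for the recognized first position information, and the pixel labels (0 = background, 1 = first logo, 2 = second logo) are purely illustrative.

```python
# Toy 6x8 "image frame" as a grid of pixel labels.
frame = [[0] * 8 for _ in range(6)]
y0, y1, x0, x1 = 2, 5, 3, 7           # hypothetical first position information
for y in range(y0, y1):
    for x in range(x0, x1):
        frame[y][x] = 1               # pixels of the first product identifier

# Replacement: cover the first identifier with the second product's
# identifier at exactly the recognized position.
for y in range(y0, y1):
    for x in range(x0, x1):
        frame[y][x] = 2
```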
In this example, the second sub-video is obtained by replacing the first human voice track, so the product information of the first product can be replaced quickly, improving the efficiency of producing the second sub-video.
In a possible implementation manner, the audio information of the second product may be further subjected to speech synthesis, and the method specifically includes:
c1, carrying out voice recognition on the first person sound track to obtain first text information of the audio information of the first product;
c2, determining second text information of the audio information of the second product according to the first text information and the product information of the second product;
c3, performing voice conversion on the second text information to obtain reference audio information;
and C4, processing the reference audio information to obtain the audio information of the second product, wherein the tone color of the audio information of the second product is the same as that of the audio information of the first product.
The product information of the first product in the first text information can be replaced by the product information of the second product to obtain the second text information.
Alternatively, a keyword of the first product may be obtained from the first text information; a keyword format of the second product is determined according to that keyword; a keyword of the second product is determined according to the product information of the second product and the keyword format; and the keyword of the first product is replaced with the keyword of the second product to obtain the second text information. The keyword format of the second product is the same as that of the first product: for example, if the first product has N keywords, the second product also has N keywords; likewise, if a keyword phrase of the first product contains two words, the corresponding keyword phrase of the second product also contains two words.
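The keyword replacement can be sketched minimally, assuming hypothetical keyword lists that already obey the same-format constraint (same keyword count, same phrase length); all product names and text are invented.

```python
# First text information transcribed from the first human voice track.
first_text = "Tower Rush is a fun tower defense game, try Tower Rush today"
first_keywords = ["Tower Rush"]      # N = 1 keyword, two-word phrase
second_keywords = ["Castle Clash"]   # same count, same phrase length

# Replace each first-product keyword with the matching second-product one.
second_text = first_text
for old, new in zip(first_keywords, second_keywords):
    second_text = second_text.replace(old, new)

print(second_text)
```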
In this example, the text information of the audio information of the second product is determined according to the first text information and the product information of the second product, which can improve the degree of fit between the second product video and the first product video and the accuracy of producing the second product video.
In one possible implementation manner, a first product video may be further acquired, and the method for acquiring the first product video includes:
d1, acquiring at least one candidate product video according to the product information of the second product;
d2, determining the first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product.
Product keywords and the like can be determined according to the product information of the second product, and at least one candidate product video is obtained according to the product keywords and the like, which specifically includes: the at least one candidate product video may be obtained from a database, or may be obtained in other manners, for example, after searching through the internet, the at least one candidate product video may be obtained, or the at least one candidate product video may be obtained from a cloud server.
The similarity between the product information in the at least one candidate product video and the product information of the second product can be obtained, and the first product video determined according to the similarity; specifically, the candidate product video with the highest similarity may be determined as the first product video. If every similarity is lower than a preset similarity threshold, new candidate product videos may be acquired, and the first product video determined from the newly acquired candidate product videos. Alternatively, the coincidence degree between the product information of the third product (the product featured in a candidate product video) and the product information of the second product may be determined, and the candidate product video whose product information has the highest coincidence degree determined as the first product video. The coincidence degree can be understood as the degree of character overlap between the texts of the two pieces of product information, that is, the proportion of exactly identical content.
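The coincidence-degree selection can be sketched as follows. The product-information strings and video names are invented for illustration, and the character-overlap measure is one simple reading of "the proportion of exactly identical content", not the patent's mandated metric.

```python
def coincidence_degree(a, b):
    # Proportion of positions whose characters are exactly the same,
    # measured over the longer of the two product-information strings.
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / max(len(a), len(b))

second_info = "brand-X tower defense game"
candidates = {
    "video_1": "brand-Y tower defense game",
    "video_2": "brand-Z racing game",
}

# Pick the candidate whose product information overlaps the most.
first_product_video = max(
    candidates, key=lambda v: coincidence_degree(candidates[v], second_info))
print(first_product_video)
```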
In this example, the key information is determined according to the product information of the second product, the at least one candidate product video is determined according to the key information, and the first product video is determined from the at least one candidate product video, so that the first product video can be intelligently acquired, and the efficiency of video replacement is improved.
In one possible implementation manner, the method for acquiring at least one candidate product video according to the product information of the second product includes:
f1, determining at least one product key word from the product information of the second product;
f2, determining the type of the product video according to at least one product keyword;
f3, obtaining at least one candidate product video according to the type of the product video.
The product keyword may be a product category, a product function, product characteristic information, and the like. For example, if the product is a game, the product characteristic information may be the game type, such as a mobile game or a PC client game, or, more specifically, a battle game or a tower defense game among mobile games.
The type of the video is determined according to the key information, and may be determined through a preset correspondence; for example, if the product characteristic information indicates a battle game, the type may be a battle type, and so on.
The at least one candidate product video can be obtained from the database according to the type of the product video, or obtained in other manners: for example, by searching the internet, from a cloud server, from a third-party platform, and the like.
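Steps F1–F3 can be sketched with a preset keyword-to-type correspondence and a toy in-memory database standing in for the video store; every mapping, keyword, and video name here is an assumption for illustration.

```python
# Preset correspondence from product keywords to a product-video type,
# and a toy in-memory "database" of candidate videos indexed by type.
keyword_to_type = {"battle game": "battle", "tower defense": "strategy"}
video_db = {
    "battle": ["candidate_1", "candidate_2"],
    "strategy": ["candidate_3"],
}

product_keywords = ["battle game", "mobile game"]  # F1: from second product

# F2: the first keyword with a preset mapping determines the video type.
video_type = next(
    keyword_to_type[k] for k in product_keywords if k in keyword_to_type)

# F3: fetch candidate product videos of that type from the database.
candidates = video_db.get(video_type, [])
print(video_type, candidates)
```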
In this example, the type of the product video is determined by the product keyword in the product information of the second product, and at least one candidate product video is determined according to the type, so that the accuracy of candidate product identification and acquisition can be improved.
Referring to fig. 3, fig. 3 is a schematic flow chart of another video production method according to an embodiment of the present application. As shown in fig. 3, the video production method includes:
301. performing segmentation processing on the first product video to obtain a first sub-video, wherein the first sub-video comprises an introduction fragment of the first product;
the product information of the first product includes at least one of a first product identification, a first product name, and audio information of the first product.
302. Identifying a plurality of image frames in the first sub-video through a video identification model to obtain target position information of product information of a first product in the first sub-video, wherein the video identification model comprises at least one of a text identification model, an audio identification model, a mouth shape identification model and an identification model;
the target location information may be acquired by at least one of:
performing identification processing on at least one first image frame in the first sub-video through an identification model to obtain first position information of a first product identification in the at least one first image frame;
performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product name in the at least one second image frame;
identifying an audio clip included in the first sub-video through an audio identification model to obtain third position information of the audio information of the first product in the audio clip;
and carrying out mouth shape recognition processing on a plurality of image frames in the first sub-video through a mouth shape recognition model to obtain at least one third image frame corresponding to the audio information of the first product in the first sub-video, and determining third position information of the audio information of the first product in the audio clip based on the at least one third image frame.
303. Replacing the product information of the first product in the first sub-video with the product information of the second product according to the target position information to obtain a second sub-video;
304. and splicing the second sub-video and the third sub-video to obtain a second product video, wherein the third sub-video comprises an introduction fragment of the second product.
The introduction fragment of the first product is used to introduce the core selling points of the first product, and may also introduce the first product in detail.
In this example, the product information includes a first product identifier, a first product name, and audio information of a first product, and the product information of the first product can be quickly and accurately obtained by obtaining the first product identifier according to image recognition, obtaining the first product name through character recognition, and obtaining the first product audio through an audio recognition method.
Referring to fig. 4, fig. 4 is a schematic flow chart of another video production method according to an embodiment of the present application. As shown in fig. 4, the video production method includes:
401. determining at least one product key from the product information of the second product;
the key information may be, for example, a keyword, product characteristic information, and the like. For example, if the product is a game, the product characteristic information may be the game type, such as a mobile game or a PC client game, or, more specifically, a battle game or a tower defense game among mobile games.
402. Determining the type of the product video according to at least one product keyword;
403. obtaining at least one candidate product video according to the type of the product video;
404. determining a first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product;
similarity between the product information in the at least one candidate product video and the product information of the second product can be obtained, and the first product video determined according to the similarity; specifically, the candidate product video with the highest similarity may be determined as the first product video. If every similarity is lower than a preset similarity threshold, new candidate product videos may be acquired, and the first product video determined from the newly acquired candidate product videos. Alternatively, the coincidence degree between the product information of the third product (the product featured in a candidate product video) and the product information of the second product may be determined, and the candidate product video whose product information has the highest coincidence degree determined as the first product video. The coincidence degree can be understood as the degree of character overlap between the texts of the two pieces of product information, that is, the proportion of exactly identical content.
405. Performing segmentation processing on the first product video to obtain a first sub-video;
406. acquiring target position information of product information of a first product in a first sub-video;
407. replacing the product information of the first product in the first sub-video with the product information of the second product according to the target position information to obtain a second sub-video;
408. and splicing the second sub-video and the third sub-video to obtain a second product video, wherein the third sub-video comprises an introduction fragment of the second product.
In this example, the type of the product video is determined by the product keyword in the product information of the second product, and at least one candidate product video is determined according to the type, so that the accuracy of candidate product identification and acquisition can be improved.
In accordance with the foregoing embodiments, please refer to fig. 5, which is a schematic structural diagram of a terminal according to an embodiment of the present application. As shown in the figure, the terminal includes a processor, an input device, an output device, and a memory, which are connected to one another, where the memory is used to store a computer program including program instructions, and the processor is configured to call the program instructions to perform the following steps:
performing segmentation processing on the first product video to obtain a first sub-video, wherein the first sub-video comprises an introduction fragment of the first product;
acquiring target position information of product information of a first product in a first sub-video;
replacing the product information of the first product in the first sub-video with the product information of the second product according to the target position information to obtain a second sub-video;
and splicing the second sub-video and the third sub-video to obtain a second product video, wherein the third sub-video comprises an introduction fragment of the second product.
The above description has introduced the solution of the embodiments of the present application mainly from the perspective of the method-side implementation process. It is understood that, in order to implement the above functions, the terminal includes corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative units and algorithm steps described in connection with the embodiments provided herein can be implemented in hardware, or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the terminal may be divided into the functional units according to the above method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
In accordance with the above, please refer to fig. 6, fig. 6 is a schematic structural diagram of a video production apparatus according to an embodiment of the present disclosure. As shown in fig. 6, the apparatus includes:
a processing unit 601, configured to perform segmentation processing on a first product video to obtain a first sub-video, where the first sub-video includes an introduction section of a first product;
an obtaining unit 602, configured to obtain target position information of product information of a first product in a first sub-video;
a replacing unit 603, configured to replace product information of a first product in the first sub-video with product information of a second product according to the target position information, so as to obtain a second sub-video;
the splicing unit 604 is configured to splice the second sub-video and the third sub-video to obtain a second product video, where the third sub-video includes an introduction section of the second product.
In a possible implementation manner, the product information includes at least one of a first product identifier, a first product name, and audio information of the first product, and the obtaining unit 602 is configured to:
and identifying a plurality of image frames in the first sub-video through a video identification model to obtain the target position information of the product information of the first product in the first sub-video, wherein the video identification model comprises at least one of a text identification model, an audio identification model, a mouth shape identification model and an identification model.
In one possible implementation manner, in terms of obtaining the target position information of the product information of the first product in the first sub video by performing recognition processing on a plurality of image frames in the first sub video through a video recognition model, the obtaining unit 602 is specifically configured to perform at least one of the following:
performing identification processing on at least one first image frame in the first sub-video through an identification model to obtain first position information of a first product identification in the at least one first image frame;
performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product name in the at least one second image frame;
identifying the audio clip included in the first sub-video through an audio identification model to obtain third position information of the audio information of the first product in the audio clip;
and carrying out mouth shape recognition processing on a plurality of image frames in the first sub-video through a mouth shape recognition model to obtain at least one third image frame corresponding to the audio information of the first product in the first sub-video, and determining third position information of the audio information of the first product in the audio clip based on the at least one third image frame.
In one possible implementation, the product information includes audio information of the first product, and the replacing unit 603 is configured to:
acquiring an audio track of audio information of a first product from the first sub-video, wherein the audio track comprises a human voice track and a background music track;
processing the audio track to obtain a first person audio track and a background music track;
replacing the audio information of the first product in the first voice track with the audio information of the second product according to the third position information to obtain a second voice track;
synthesizing the second voice track and the background music track to obtain an audio track of the audio information of the second product;
the second sub-video is determined based on the audio track of the second product audio information.
In one possible implementation, the apparatus is further configured to:
performing voice recognition on the first person sound track to obtain first text information of the audio information of the first product;
determining second text information of the audio information of the second product according to the first text information and the product information of the second product;
performing voice conversion on the second text information to obtain reference audio information;
and processing the reference audio information to obtain the audio information of the second product, wherein the tone color of the audio information of the second product is the same as that of the audio information of the first product.
In one possible implementation, the apparatus is further configured to:
obtaining at least one candidate product video according to the product information of the second product;
and determining a first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product.
In one possible implementation, the apparatus is further configured to:
determining at least one product key from the product information of the second product;
determining the type of the product video according to at least one product keyword;
and acquiring at least one candidate product video according to the type of the product video.
Embodiments of the present application also provide a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the video production methods as described in the above method embodiments.
Embodiments of the present application further provide a computer program product, wherein the computer program product includes a computer readable storage medium storing computer program instructions which, when executed by a processor, cause the processor to perform part or all of the steps of any one of the video production methods described in the above method embodiments.
It should be noted that for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as a stand-alone product, it may be stored in a computer-readable memory. Based on this understanding, the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, or a magnetic or optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be performed by related hardware instructed by a program. The program may be stored in a computer-readable memory, which may include a flash drive, read-only memory, random access memory, or a magnetic or optical disk.
The foregoing describes the embodiments of the present application in detail. Specific examples are used herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is only intended to help understand the method and core concept of the present application. Meanwhile, a person skilled in the art may, in accordance with the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (14)

1. A method of video production, the method comprising:
performing segmentation processing on a first product video to obtain a first sub-video, wherein the first sub-video comprises an introduction segment of a first product;
acquiring target position information of product information of the first product in the first sub-video;
replacing the product information of the first product in the first sub-video with product information of a second product according to the target position information to obtain a second sub-video;
splicing the second sub-video and a third sub-video to obtain a second product video, wherein the third sub-video comprises an introduction segment of the second product;
the product information of the first product includes audio information of the first product, and the replacing, according to the target position information, the product information of the first product in the first sub-video with the product information of the second product to obtain a second sub-video includes:
acquiring an audio track of the audio information of the first product from the first sub-video, wherein the audio track comprises a human voice track and a background music track;
processing the audio track to obtain a first human voice track and a background music track;
replacing the audio information of the first product in the first human voice track with the audio information of the second product according to third position information of the audio information of the first product in an audio clip included in the first sub-video, to obtain a second human voice track;
synthesizing the second human voice track and the background music track to obtain an audio track of the audio information of the second product;
and determining the second sub-video according to the audio track of the audio information of the second product.
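As a minimal illustration of the audio-replacement steps above (a sketch, not part of the claim language): the separation of the audio track into a human voice track and a background music track is assumed to be done by some external source-separation model, so the code below takes the two stems as plain sample lists; the third position information is a hypothetical sample range, and all values are made up.

```python
def replace_vocal_segment(vocal, replacement, start, end):
    """Replace samples [start, end) of the human voice track with new audio."""
    return vocal[:start] + replacement + vocal[end:]

def mix(vocal, background):
    """Synthesize the voice track and background-music track sample by sample,
    zero-padding the shorter stem so lengths match."""
    n = max(len(vocal), len(background))
    v = vocal + [0] * (n - len(vocal))
    b = background + [0] * (n - len(background))
    return [x + y for x, y in zip(v, b)]

# Toy stems: the "third position information" says the first product's
# audio occupies samples 2..4 of the first human voice track.
vocal_1 = [1, 1, 9, 9, 1, 1]        # first human voice track
background = [5, 5, 5, 5, 5, 5]     # background music track
audio_2 = [7, 7, 7]                 # second product's audio (one sample longer)

vocal_2 = replace_vocal_segment(vocal_1, audio_2, 2, 4)  # second human voice track
track_2 = mix(vocal_2, background)                       # new audio track
print(vocal_2)  # [1, 1, 7, 7, 7, 1, 1]
print(track_2)  # [6, 6, 12, 12, 12, 6, 1]
```

Note that the replacement segment may differ in length from the original, so the background stem is padded before remixing rather than truncated.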
2. The method of claim 1, wherein the product information of the first product comprises at least one of a first product identifier, a first product name and audio information of the first product, and the obtaining of the target position information of the product information of the first product in the first sub-video comprises:
and identifying a plurality of image frames in the first sub-video through a video recognition model to obtain the target position information of the product information of the first product in the first sub-video, wherein the video recognition model comprises at least one of a text recognition model, an audio recognition model, a mouth shape recognition model, and an identifier recognition model.
3. The method of claim 2,
the identifying of the plurality of image frames in the first sub-video through the video recognition model to obtain the target position information of the product information of the first product in the first sub-video comprises at least one of the following:
performing identifier recognition processing on at least one first image frame in the first sub-video through an identifier recognition model to obtain first position information of the first product identifier in the at least one first image frame;
performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product name in the at least one second image frame;
identifying the audio clip through an audio recognition model to obtain the third position information;
performing mouth shape recognition processing on the plurality of image frames in the first sub-video through a mouth shape recognition model to obtain at least one third image frame corresponding to the audio information of the first product in the first sub-video, and determining third position information of the audio information of the first product in the audio clip based on the at least one third image frame.
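The second position information in the text-recognition branch above can be illustrated with a toy scan over per-frame OCR output; the `(text, bounding_box)` format and every value below are hypothetical, standing in for whatever the text recognition model actually emits.

```python
def locate_product_name(frames_ocr, product_name):
    """Scan per-frame OCR detections for the first product's name and return
    (frame_index, bounding_box) hits -- the 'second position information'."""
    hits = []
    for idx, detections in enumerate(frames_ocr):
        for text, box in detections:
            if product_name in text:
                hits.append((idx, box))
    return hits

# Hypothetical OCR output: each frame is a list of (text, bounding-box) pairs.
frames = [
    [("Buy SuperPhone X now", (10, 20, 200, 40))],
    [("Limited offer", (10, 20, 120, 40))],
    [("SuperPhone X", (50, 60, 180, 90)), ("$499", (50, 100, 90, 120))],
]
positions = locate_product_name(frames, "SuperPhone X")
print(positions)  # [(0, (10, 20, 200, 40)), (2, (50, 60, 180, 90))]
```

The replacement step would then overwrite exactly these boxes in exactly these frames with the second product's name.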
4. The method of claim 1, further comprising:
performing speech recognition on the first human voice track to obtain first text information of the audio information of the first product;
determining second text information of audio information of a second product according to the first text information and the product information of the second product;
performing text-to-speech conversion on the second text information to obtain reference audio information;
and processing the reference audio information to obtain the audio information of the second product, wherein the timbre of the audio information of the second product is the same as the timbre of the audio information of the first product.
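Of the four steps in claim 4, the speech recognition, speech synthesis, and timbre-matching stages are assumed to be external models; the middle step, deriving the second text information from the first text information and the second product's information, can be sketched as plain string substitution. All product names and figures here are made up.

```python
def rewrite_transcript(first_text, first_info, second_info):
    """Build the second product's script by substituting each of the first
    product's attributes in the transcript with the second product's."""
    text = first_text
    for key, old_value in first_info.items():
        new_value = second_info.get(key)
        if new_value is not None:
            text = text.replace(old_value, new_value)
    return text

# Hypothetical transcript and product attributes.
first_text = "The AquaPure 100 filters 2 liters per minute."
first_info = {"name": "AquaPure 100", "rate": "2 liters per minute"}
second_info = {"name": "AquaPure 200", "rate": "3 liters per minute"}

second_text = rewrite_transcript(first_text, first_info, second_info)
print(second_text)  # The AquaPure 200 filters 3 liters per minute.
```

The resulting second text information would then be fed to speech synthesis, and the output converted so its timbre matches the original speaker's.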
5. The method according to any one of claims 1-4, further comprising:
obtaining at least one candidate product video according to the product information of the second product;
and determining the first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product.
6. The method of claim 5, wherein obtaining at least one candidate product video based on the product information of the second product comprises:
determining at least one product keyword from the product information of the second product;
determining the type of the product video according to the at least one product keyword;
and acquiring the at least one candidate product video according to the type of the product video.
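Claim 6's retrieval chain (keywords → video type → candidate videos) can be sketched with toy data; the stop-word list, keyword-to-type index, and video library below are all hypothetical.

```python
def product_keywords(product_info):
    """Extract candidate keywords from the second product's information."""
    stop = {"the", "a", "an", "with", "and", "for"}
    return [w for w in product_info.lower().split() if w not in stop]

def video_type(keywords, type_index):
    """Map the keywords to a product-video type via a keyword->type index."""
    for kw in keywords:
        if kw in type_index:
            return type_index[kw]
    return "generic"

def candidate_videos(vtype, library):
    """Select videos of the matching type as candidate first product videos."""
    return [vid for vid, t in library.items() if t == vtype]

type_index = {"headphones": "electronics", "shampoo": "personal-care"}
library = {"vid_a": "electronics", "vid_b": "personal-care", "vid_c": "electronics"}

kws = product_keywords("Wireless headphones with noise cancelling")
vtype = video_type(kws, type_index)
print(vtype)                             # electronics
print(candidate_videos(vtype, library))  # ['vid_a', 'vid_c']
```

The first product video would then be chosen from these candidates by matching their product information against the second product's, as in claim 5.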
7. A video production apparatus, characterized in that the apparatus comprises:
the processing unit is configured to perform segmentation processing on a first product video to obtain a first sub-video, wherein the first sub-video comprises an introduction segment of a first product;
the acquisition unit is configured to acquire target position information of product information of the first product in the first sub-video;
the replacing unit is configured to replace the product information of the first product in the first sub-video with product information of a second product according to the target position information, so as to obtain a second sub-video;
the splicing unit is configured to splice the second sub-video and a third sub-video to obtain a second product video, wherein the third sub-video comprises an introduction segment of the second product;
the product information of the first product comprises audio information of the first product, and the replacing unit is configured to:
acquiring an audio track of the audio information of the first product from the first sub-video, wherein the audio track comprises a human voice track and a background music track;
processing the audio track to obtain a first human voice track and a background music track;
replacing the audio information of the first product in the first human voice track with the audio information of the second product according to third position information of the audio information of the first product in an audio clip included in the first sub-video, to obtain a second human voice track;
synthesizing the second human voice track and the background music track to obtain an audio track of the audio information of the second product;
and determining the second sub-video according to the audio track of the audio information of the second product.
8. The apparatus of claim 7, wherein the product information of the first product comprises at least one of a first product identifier, a first product name, and audio information of the first product, and wherein the acquisition unit is configured to:
identify a plurality of image frames in the first sub-video through a video recognition model to obtain the target position information of the product information of the first product in the first sub-video, wherein the video recognition model comprises at least one of a text recognition model, an audio recognition model, a mouth shape recognition model, and an identifier recognition model.
9. The apparatus according to claim 8, wherein, in terms of identifying the plurality of image frames in the first sub-video through the video recognition model to obtain the target position information of the product information of the first product in the first sub-video, the acquisition unit is specifically configured to perform at least one of the following:
performing identifier recognition processing on at least one first image frame in the first sub-video through an identifier recognition model to obtain first position information of the first product identifier in the at least one first image frame;
performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product name in the at least one second image frame;
identifying the audio clip through an audio recognition model to obtain the third position information;
performing mouth shape recognition processing on the image frames in the first sub-video through a mouth shape recognition model to obtain at least one third image frame corresponding to the audio information of the first product in the first sub-video, and determining third position information of the audio information of the first product in the audio clip based on the at least one third image frame.
10. The apparatus of claim 7, wherein the apparatus is further configured to:
performing speech recognition on the first human voice track to obtain first text information of the audio information of the first product;
determining second text information of audio information of a second product according to the first text information and the product information of the second product;
performing text-to-speech conversion on the second text information to obtain reference audio information;
and processing the reference audio information to obtain the audio information of the second product, wherein the timbre of the audio information of the second product is the same as the timbre of the audio information of the first product.
11. The apparatus of any of claims 7-10, further configured to:
obtaining at least one candidate product video according to the product information of the second product;
and determining the first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product.
12. The apparatus of claim 11, wherein the apparatus is further configured to:
determining at least one product keyword from the product information of the second product;
determining the type of the product video according to the at least one product keyword;
and acquiring the at least one candidate product video according to the type of the product video.
13. A terminal, comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is configured to store computer program instructions, and the processor is configured to invoke the computer program instructions to perform the method of any of claims 1-6.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer program instructions that, when executed by a processor, cause the processor to perform the method of any of claims 1-6.
CN202010890456.5A 2020-08-29 2020-08-29 Video production method and related device Active CN112135201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010890456.5A CN112135201B (en) 2020-08-29 2020-08-29 Video production method and related device

Publications (2)

Publication Number Publication Date
CN112135201A CN112135201A (en) 2020-12-25
CN112135201B true CN112135201B (en) 2022-08-26

Family

ID=73848355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010890456.5A Active CN112135201B (en) 2020-08-29 2020-08-29 Video production method and related device

Country Status (1)

Country Link
CN (1) CN112135201B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115022732B (en) * 2022-05-25 2023-11-03 阿里巴巴(中国)有限公司 Video generation method, device, equipment and medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN110225265A (en) * 2019-06-21 2019-09-10 深圳市奥拓电子股份有限公司 Advertisement replacement method, system and storage medium during video transmission
CN110430339A (en) * 2019-07-19 2019-11-08 长沙理工大学 Altering detecting method and system in digital video frame
CN110691276A (en) * 2019-11-06 2020-01-14 北京字节跳动网络技术有限公司 Method and device for splicing multimedia segments, mobile terminal and storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CA2838157A1 (en) * 2011-06-06 2012-12-13 Webtuner Corp. System and method for enhancing and extending video advertisements
US20130259312A1 (en) * 2011-09-08 2013-10-03 Kenton M. Lyons Eye Gaze Based Location Selection for Audio Visual Playback
CN109034032B (en) * 2018-07-17 2022-01-11 北京世纪好未来教育科技有限公司 Image processing method, apparatus, device and medium
US20200213644A1 (en) * 2019-01-02 2020-07-02 International Business Machines Corporation Advertisement insertion in videos



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant