CN112135201B - Video production method and related device - Google Patents

Video production method and related device

Info

Publication number
CN112135201B
CN112135201B (application CN202010890456.5A)
Authority
CN
China
Prior art keywords
product
video
information
sub
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010890456.5A
Other languages
Chinese (zh)
Other versions
CN112135201A
Inventor
许雷
吴磊
王元吉
于志兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202010890456.5A priority Critical patent/CN112135201B/en
Publication of CN112135201A publication Critical patent/CN112135201A/en
Application granted granted Critical
Publication of CN112135201B publication Critical patent/CN112135201B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/812Monomedia components thereof involving advertisement data

Abstract

An embodiment of the present application provides a video production method and a related apparatus. The method includes: segmenting a first product video to obtain a first sub-video, where the first sub-video includes an introduction segment of a first product; acquiring target position information of the product information of the first product in the first sub-video; replacing the product information of the first product in the first sub-video with the product information of a second product according to the target position information to obtain a second sub-video; and splicing the second sub-video with a third sub-video to obtain a second product video, where the third sub-video includes an introduction segment of the second product. In this way, a short video of the second product can be obtained, which improves video production efficiency and reduces video production cost.

Description

Video production method and related device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a video production method and a related apparatus.
Background
Short video is currently a common creative format for advertisers' promotions; because it performs well and converts at a high rate, it is favored by advertisers. However, compared with traditional picture-based creatives, producing a short video requires more participants, such as actors, voice-over artists, and post-production professionals, so the production cost of a short video is higher.
Disclosure of Invention
The embodiment of the application provides a video production method and a related device.
A first aspect of an embodiment of the present application provides a video production method, including:
performing segmentation processing on a first product video to obtain a first sub-video, where the first sub-video includes an introduction segment of a first product;
acquiring target position information of the product information of the first product in the first sub-video;
replacing the product information of the first product in the first sub-video with the product information of a second product according to the target position information to obtain a second sub-video; and
splicing the second sub-video and a third sub-video to obtain a second product video, where the third sub-video includes an introduction segment of the second product.
In this example, a first sub-video is obtained by segmenting a first product video; the target position information of the product information of a first product in the first sub-video is obtained; the product information of the first product is replaced with the product information of a second product according to the target position information to obtain a second sub-video; and the second sub-video and a third sub-video are spliced to obtain a second product video. In this way, the second product video can be obtained, improving the video production efficiency for the second product and reducing the video production cost.
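The four claimed steps can be sketched as a toy pipeline. This is an illustrative assumption, not the patent's actual implementation: videos are modeled as lists of frame dictionaries, the segment labels are hypothetical, and `third_sub_video` is supplied ready-made.

```python
def make_second_product_video(first_product_video, third_sub_video,
                              first_info, second_info):
    """Toy sketch of the claimed method; frames are dicts, not real video."""
    # Step 1: segmentation - keep the introduction segment as the first sub-video.
    first_sub = [f for f in first_product_video if f["segment"] == "intro"]
    # Step 2: target position information - frame indices showing the first product.
    positions = [i for i, f in enumerate(first_sub)
                 if f.get("product") == first_info]
    # Step 3: replacement - swap in the second product's information.
    second_sub = [dict(f) for f in first_sub]
    for i in positions:
        second_sub[i]["product"] = second_info
    # Step 4: splicing - concatenate with the third sub-video.
    return second_sub + third_sub_video
```

In a real system each step would be a model-backed module (scene segmentation, recognition, editing, encoding); the sketch only fixes the data flow between them.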
With reference to the first aspect, in one possible implementation manner, the obtaining of the target location information of the product information of the first product in the first sub-video includes:
performing recognition processing on a plurality of image frames in the first sub-video through a video recognition model to obtain the target position information of the product information of the first product in the first sub-video, where the video recognition model includes at least one of a text recognition model, an audio recognition model, a mouth shape recognition model, and a logo recognition model.
With reference to the first aspect, in one possible implementation manner, the performing recognition processing on the plurality of image frames in the first sub-video through the video recognition model to obtain the target position information of the product information of the first product in the first sub-video includes at least one of the following:
performing logo recognition processing on at least one first image frame in the first sub-video through a logo recognition model to obtain first position information of the first product's identifier in the at least one first image frame;
performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product's name in the at least one second image frame;
performing audio recognition processing on the audio segment included in the first sub-video through an audio recognition model to obtain third position information of the first product's audio information in the audio segment; and
performing mouth shape recognition processing on the plurality of image frames in the first sub-video through a mouth shape recognition model to obtain at least one third image frame corresponding to the first product's audio information in the first sub-video, and determining the third position information of the first product's audio information in the audio segment based on the at least one third image frame.
In this example, the product information includes a first product identifier, a first product name, and the audio information of the first product. The product information of the first product can be obtained quickly and accurately by locating the first product identifier through image recognition, the first product name through character recognition, and the first product's audio through audio recognition.
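One way to read this implementation is as a set of independent recognizers whose outputs are merged into a single target-position record. A minimal sketch under that assumption, where each recognizer is a callable returning a list of positions (the recognizer names and record fields are illustrative, not the patent's):

```python
def locate_product_info(frames, audio_clip, recognizers):
    """Merge per-modality recognizer outputs into target position information."""
    return {
        # First position information: where the product logo appears.
        "logo_positions": recognizers["logo"](frames),
        # Second position information: where the product name appears as text.
        "name_positions": recognizers["text"](frames),
        # Third position information: where the product is mentioned in audio.
        "audio_positions": recognizers["audio"](audio_clip),
    }
```

Because the modalities are independent, any subset of recognizers can be plugged in, matching the claim's "at least one of" wording.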
With reference to the first aspect, in one possible implementation manner, the replacing the product information of the first product in the first sub-video with the product information of the second product according to the target position information to obtain the second sub-video includes:
acquiring the audio track of the first product's audio information from the first sub-video, where the audio track includes a human voice track and a background music track;
separating the audio track to obtain a first vocal track and the background music track;
replacing the first product's audio information in the first vocal track with the second product's audio information according to the third position information to obtain a second vocal track;
synthesizing the second vocal track and the background music track to obtain the audio track of the second product's audio information; and
determining the second sub-video based on the audio track of the second product's audio information.
In this example, the second sub-video is obtained by replacing the first vocal track, so the product information of the first product can be replaced quickly, which improves the efficiency of producing the second sub-video.
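The separate-replace-resynthesize flow can be sketched with audio modeled as sample lists, where `span` plays the role of the third position information. The source separation itself (a vocal/music separation model in practice) is assumed to have already happened, so the function receives pre-split tracks:

```python
def replace_product_audio(vocal_track, music_track, span, new_product_audio):
    """Replace the first product's audio in the vocal track, then remix."""
    start, end = span  # third position information: where the product is spoken
    # Second vocal track: splice the second product's audio into the located gap.
    new_vocals = vocal_track[:start] + new_product_audio + vocal_track[end:]
    # Re-synthesize: the background music track is carried over untouched.
    return new_vocals, music_track
```

Keeping the music track untouched is what lets the replacement stay seamless: only the located vocal span changes.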
With reference to the first aspect, in one possible implementation manner, the method further includes:
performing voice recognition on the first vocal track to obtain first text information of the first product's audio information;
determining second text information of the second product's audio information according to the first text information and the product information of the second product;
performing voice conversion on the second text information to obtain reference product audio information; and
processing the reference product audio information to obtain the second product's audio information, where the timbre of the second product's audio information is the same as that of the first product's audio information.
In this example, the text information of the second product's audio information is determined according to the first text information and the product information of the second product, which improves the degree of fit between the second product video and the first product video and the accuracy of the produced second product video.
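The ASR-to-TTS loop above can be sketched as follows. Here `asr`, `tts`, and `convert_timbre` stand in for real speech models and are assumptions, as is the simple name substitution used to derive the second text information:

```python
def build_second_product_audio(first_vocal_audio, first_name, second_name,
                               asr, tts, convert_timbre):
    # First text information: transcript of the first vocal track.
    first_text = asr(first_vocal_audio)
    # Second text information: derived from the first text and the new product.
    second_text = first_text.replace(first_name, second_name)
    # Reference product audio information via text-to-speech.
    reference_audio = tts(second_text)
    # Timbre conversion so the new audio matches the original speaker.
    return convert_timbre(reference_audio, like=first_vocal_audio)
```

The timbre-matching step is what keeps the spliced audio consistent with the rest of the first sub-video's narration.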
With reference to the first aspect, in one possible implementation manner, the method further includes:
obtaining at least one candidate product video according to the product information of the second product;
and determining a first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product.
With reference to the first aspect, in one possible implementation manner, the acquiring at least one candidate product video according to the product information of the second product includes:
determining at least one product keyword from the product information of the second product;
determining the type of the product video according to at least one product keyword;
and acquiring at least one candidate product video according to the type of the product video.
In this example, product keywords are determined according to the product information of the second product, at least one candidate product video is determined according to the product keywords, and the first product video is determined from the at least one candidate product video, so the first product video can be acquired intelligently, which improves the efficiency of video replacement.
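The retrieval step can be sketched as keyword-overlap ranking over a typed video library. The field names (`keywords`, `category`, `type`) and the ranking rule are illustrative assumptions, not the patent's specification:

```python
def select_first_product_video(second_product, video_library):
    # At least one product keyword from the second product's information.
    keywords = set(second_product["keywords"])
    # Product-video type derived from the product information (toy rule:
    # use the category field directly).
    video_type = second_product["category"]
    # Candidate product videos of that type, ranked by shared keywords;
    # the best match becomes the first product video.
    candidates = [v for v in video_library if v["type"] == video_type]
    return max(candidates,
               key=lambda v: len(keywords & set(v["keywords"])),
               default=None)
```

Returning `None` when no candidate matches leaves the caller free to fall back to manual selection.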
A second aspect of an embodiment of the present application provides a video production apparatus, including:
a processing unit, configured to perform segmentation processing on a first product video to obtain a first sub-video, where the first sub-video includes an introduction segment of a first product;
an acquisition unit, configured to acquire target position information of the product information of the first product in the first sub-video;
a replacing unit, configured to replace the product information of the first product in the first sub-video with the product information of a second product according to the target position information to obtain a second sub-video; and
a splicing unit, configured to splice the second sub-video and a third sub-video to obtain a second product video, where the third sub-video includes an introduction segment of the second product.
With reference to the second aspect, in one possible implementation manner, the product information includes at least one of a first product identifier, a first product name, and audio information of the first product, and the obtaining unit is configured to:
perform recognition processing on a plurality of image frames in the first sub-video through a video recognition model to obtain the target position information of the product information of the first product in the first sub-video, where the video recognition model includes at least one of a text recognition model, an audio recognition model, a mouth shape recognition model, and a logo recognition model.
With reference to the second aspect, in a possible implementation manner, in terms of obtaining target position information of the product information of the first product in the first sub-video by performing recognition processing on a plurality of image frames in the first sub-video through a video recognition model, the obtaining unit is specifically configured to perform at least one of the following:
performing logo recognition processing on at least one first image frame in the first sub-video through a logo recognition model to obtain first position information of the first product's identifier in the at least one first image frame;
performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product's name in the at least one second image frame;
performing audio recognition processing on the audio segment included in the first sub-video through an audio recognition model to obtain third position information of the first product's audio information in the audio segment; and
performing mouth shape recognition processing on the plurality of image frames in the first sub-video through a mouth shape recognition model to obtain at least one third image frame corresponding to the first product's audio information in the first sub-video, and determining the third position information of the first product's audio information in the audio segment based on the at least one third image frame.
With reference to the second aspect, in one possible implementation manner, the product information includes audio information of the first product, and the replacement unit is configured to:
acquire the audio track of the first product's audio information from the first sub-video, where the audio track includes a human voice track and a background music track;
separate the audio track to obtain a first vocal track and the background music track;
replace the first product's audio information in the first vocal track with the second product's audio information according to the third position information to obtain a second vocal track;
synthesize the second vocal track and the background music track to obtain the audio track of the second product's audio information; and
determine the second sub-video based on the audio track of the second product's audio information.
With reference to the second aspect, in one possible implementation manner, the apparatus is further configured to:
perform voice recognition on the first vocal track to obtain first text information of the first product's audio information;
determine second text information of the second product's audio information according to the first text information and the product information of the second product;
perform voice conversion on the second text information to obtain reference product audio information; and
process the reference product audio information to obtain the second product's audio information, where the timbre of the second product's audio information is the same as that of the first product's audio information.
With reference to the second aspect, in one possible implementation manner, the apparatus is further configured to:
obtaining at least one candidate product video according to the product information of the second product;
and determining a first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product.
With reference to the second aspect, in one possible implementation manner, the apparatus is further configured to:
determine at least one product keyword from the product information of the second product;
determining the type of the product video according to at least one product keyword;
and acquiring at least one candidate product video according to the type of the product video.
A third aspect of embodiments of the present application provides a terminal, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store computer program instructions, and the processor is configured to call the computer program instructions to execute the step instructions in the first aspect of embodiments of the present application.
A fourth aspect of embodiments of the present application provides a computer readable storage medium having stored thereon computer program instructions, which, when executed by a processor, cause the processor to perform some or all of the steps as described in the first aspect of embodiments of the present application.
A fifth aspect of embodiments of the present application provides a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps as described in the first aspect of embodiments of the present application. The computer program product may be a software installation package.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative effort.
Fig. 1 is a schematic diagram of a video production method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a video production method according to an embodiment of the present application;
fig. 3 is a schematic flow chart of another video production method provided in the embodiment of the present application;
fig. 4 is a schematic flow chart of another video production method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a video production apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
To better understand the video production method provided in the embodiments of the present application, the scene to which it applies is briefly described first. As shown in fig. 1, when a user needs to generate a product video quickly, the user may input the product information of the product to be displayed (a second product), and the product information in a related product video (a first product video) may be replaced to generate the product video of the second product quickly. Specifically, a first product video is obtained; the first product video is a video for introducing or publicizing a first product and may be obtained by shooting or by video processing techniques. The first product video is processed in a segmented manner to obtain a first sub-video and a fourth sub-video; target position information of the product information of the first product is obtained from the first sub-video through image recognition, character recognition, and the like; the product information of the first product is replaced with the product information of the second product according to the target position information to obtain a second sub-video; and the second sub-video and a third sub-video are spliced to obtain a second product video. The third sub-video and the fourth sub-video are videos of the same category, namely videos that introduce a product in detail, such as its core selling points. Therefore, compared with the existing scheme in which a professional must process the product information in the first sub-video when replacing it, the present application replaces the product information in the first sub-video of the product video and splices the videos to obtain the second product video, which improves the efficiency of producing the second product video.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a video production method according to an embodiment of the present disclosure. As shown in fig. 2, the method includes:
201. Perform segmentation processing on the first product video to obtain a first sub-video, where the first sub-video includes an introduction segment of the first product.
After the first product video is segmented, the first sub-video alone, or the first sub-video together with other sub-videos, may be obtained. For example, the first product video may be segmented to obtain the first sub-video and a fourth sub-video corresponding to the first product, where the fourth sub-video is used to introduce the core selling points of the first product or to introduce the first product in detail; a detailed introduction may be understood as introducing most or specific features of the product.
The introduction segment of the first product included in the first sub-video may be a skit or a talking-head video introducing the first product; specifically, it may be understood as a video in which a professional promotes the first product or other users introduce the first product, and such a video is usually generated by recording. For example, it may be a video recorded by broadcaster A introducing or advertising the first product. Of course, the first product video may also be divided into three segments, for example, a first sub-video, a fourth sub-video, and a video end frame. In some embodiments, the first sub-video may not include the real object of the product, where the real object can be understood as follows: when the product is a physical product, content such as the appearance of the physical product may be omitted; when the product is a virtual product (software, etc.), the interface of the virtual product may be omitted. The fourth sub-video may then include the real object of the product.
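The segmentation into an introduction segment, a detail segment, and end frames can be sketched by grouping consecutive frames by a segment label. In practice the labels would come from a scene-detection model, which is assumed away here:

```python
from itertools import groupby

def segment_product_video(frames):
    """Group consecutive frames into sub-videos by their segment label."""
    # groupby only merges adjacent frames, so the sub-videos keep their
    # original temporal order (intro, detail, end frame, ...).
    return [(label, list(group))
            for label, group in groupby(frames, key=lambda f: f["segment"])]
```

The first group then serves as the first sub-video, while later groups correspond to the fourth sub-video and the video end frame.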
202. Acquire target position information of the product information of the first product in the first sub-video.
In some embodiments, the product information of the first product includes at least one of a first product identification, a first product name, and audio information of the first product. The first product identification may be a logo icon of the first product, etc. The target position information may be acquired by an image recognition method, a character recognition method, a voice recognition method, or the like.
The target location information may include location information of the first product identification in a part or all of image frames of the first sub video, location information of the first product name in each image frame of the first sub video, and an audio location of the first product audio in the first sub video.
203. Replace the product information of the first product in the first sub-video with the product information of the second product according to the target position information to obtain a second sub-video.
The second sub-video can be obtained after replacing the first product identifier, the first product name and the audio information of the first product in the first sub-video with the second product identifier, the second product name and the audio information of the second product respectively.
When the replacement is performed, parallel replacement may be used, or serial replacement may be used, or both may be combined. The parallel replacement can be understood as three product information replacements performed in parallel, and the serial replacement can be understood as three product information replacements performed in sequence.
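Since the identifier, name, and audio replacements touch independent components of the sub-video, the parallel and serial modes described above can both be expressed with the standard library. The component names and replacer interface are assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def replace_product_info(sub_video, replacers, parallel=True):
    """Apply one replacer per independent component (e.g. logo, name, audio)."""
    items = list(replacers.items())
    if parallel:
        # Parallel replacement: the substitutions run concurrently.
        with ThreadPoolExecutor(max_workers=len(items)) as pool:
            updated = pool.map(lambda kv: (kv[0], kv[1](sub_video[kv[0]])),
                               items)
            updates = dict(updated)
    else:
        # Serial replacement: substitutions performed in sequence.
        updates = {key: fn(sub_video[key]) for key, fn in items}
    return {**sub_video, **updates}
```

Because each replacer reads and writes only its own component, the two modes produce identical results; parallel execution simply overlaps the slow model-backed steps.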
204. Splice the second sub-video and a third sub-video to obtain a second product video, where the third sub-video includes an introduction segment of the second product.
The spliced second product video may further include a video end frame, which is typically used to display a purchase link for the product, a two-dimensional code linking to detailed product information, and the like.
In this example, the first product video is segmented to obtain a first sub-video; the target position of the product information of the first product in the first sub-video is obtained; the product information of the first product is replaced with the product information of a second product according to the target position information to obtain a second sub-video; and the second sub-video and the third sub-video are spliced to obtain a second product video. Compared with the prior art, in which the replacement must be performed by professionals, the product information in the first sub-video can be replaced automatically and the videos spliced to obtain the second product video, which improves the efficiency of producing the second product video.
In one possible implementation, the product information includes at least one of a first product identifier, a first product name, and audio information of the first product, and a possible method of obtaining target location information of the product information of the first product in the first sub-video includes:
and identifying a plurality of image frames in the first sub-video through a video identification model to obtain the target position information of the product information of the first product in the first sub-video, wherein the video identification model comprises at least one of a text identification model, an audio identification model, a mouth shape identification model and an identification model.
The position information corresponding to the product information can be respectively determined through the video recognition model. The mouth shape recognition model can recognize the mouth shape of a user through image frames in the first sub-video, determine the image frames where the audio information is located according to the mouth shape of the user, and determine the position information corresponding to the audio information. The text recognition model, the audio recognition model, the mouth shape recognition model and the identification recognition model can be pre-trained models used for identifying product identification, product name and audio information of products.
The identification recognition model may be a logo recognition model, particularly for recognizing product identifications (logos, etc.).
In this example, the product information includes at least one of the first product identifier, the first product name, and the audio information of the first product, and the target location information is recognized according to the corresponding text recognition model, the audio recognition model, the mouth shape recognition model, and the identifier recognition model, so that the product information of the first product can be quickly and accurately obtained.
In one possible implementation manner, a possible method for obtaining target position information of product information of a first product in a first sub video by performing recognition processing on a plurality of image frames in the first sub video through a video recognition model includes at least one of the following steps:
a1, performing identification processing on at least one first image frame in the first sub-video through an identification model to obtain first position information of a first product identification in the at least one first image frame;
a2, performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product name in the at least one second image frame;
a3, carrying out recognition processing on an audio clip included in the first sub-video through an audio recognition model to obtain third position information of the audio information of the first product in the audio clip;
a4, carrying out mouth shape recognition processing on a plurality of image frames in the first sub-video through the mouth shape recognition model to obtain at least one third image frame corresponding to the audio information of the first product in the first sub-video, and determining third position information of the audio information of the first product in the audio clip based on the at least one third image frame.
The first position information may include a frame number of the first image frame, a frame identifier, etc., and position information of the first product identifier in the first image frame.
The second position information may include a frame number, a frame identifier, etc. of the second image frame, and position information of the first product name in the second image frame.
The mouth shape of the user can be distinguished through the mouth shape recognition model, so that at least one third image frame related to the audio of the user is determined, and third position information of the audio information of the first product in the audio clip is specifically recognized in the third image frame.
Steps A3 and A4 may be executed in parallel, in which case the position information obtained first is determined as the third position information and acquisition of the third position information then stops; alternatively, one of the two schemes may be adopted on its own to acquire the third position information; or the position information acquired in steps A3 and A4 may both be processed to obtain the final third position information, the processing being to select the more accurate parts of the two results as parts of the final third position information.
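The "take whichever of A3/A4 finishes first" strategy can be sketched with Python's `concurrent.futures`. The two recognizer functions, their return values, and their timings below are hypothetical placeholders for the audio recognition model and the mouth shape recognition model.

```python
import concurrent.futures as cf
import time

# Hypothetical recognizers: each returns the position of the first
# product's audio information within the audio clip (start, end seconds).
def audio_recognition(clip):
    time.sleep(0.05)          # stands in for model inference time (A3)
    return (3.0, 7.5)

def mouth_shape_recognition(frames):
    time.sleep(0.01)          # stands in for model inference time (A4)
    return (3.1, 7.4)

with cf.ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(audio_recognition, "clip"),
               pool.submit(mouth_shape_recognition, "frames")]
    done, pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    third_position = next(iter(done)).result()  # result obtained first wins
    for f in pending:
        f.cancel()                              # stop acquiring the other one

print(third_position)
```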
In one possible implementation manner, the product information includes audio information of a first product, and a possible method for replacing information of the first product in the first sub-video with information of a second product according to the target position information to obtain a second sub-video includes:
b1, acquiring an audio track of the audio information of the first product from the first sub-video, wherein the audio track comprises a human voice track and a background music track;
b2, processing the audio track to obtain a first person sound track and a background music track;
b3, replacing the audio information of the first product in the first voice track with the audio information of the second product according to the third position information to obtain a second voice track;
b4, synthesizing the second human voice track and the background music track to obtain an audio track of the audio information of the second product;
b5, determining the second sub-video based on the audio track of the second product audio information.
The human voice track can be understood as the voice track of the user introducing the product, and the background music track as the track of the background music in the video. The format of the audio information of the second product is the same as that of the audio information of the first product; this can be understood as meaning that the two have the same length, and that the speech pauses and the pitch of the user's speech in the audio are also the same.
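Steps B1–B5 can be sketched minimally on toy sample lists. The source-separation step (B1–B2) is assumed to have already produced the two tracks; every sample value, position, and length below is an illustrative assumption.

```python
# Toy mono tracks as sample lists (already separated, per steps B1-B2).
voice_track = [0, 0, 5, 5, 5, 0, 0, 0]   # first product's audio at samples 2..4
background = [1] * 8                      # background music track
third_position = (2, 5)                   # recognized span of the first audio

second_product_audio = [7, 7, 7]          # same length as the replaced span

# B3: splice the second product's audio into the voice track at the position.
start, end = third_position
second_voice = voice_track[:start] + second_product_audio + voice_track[end:]

# B4: synthesize (mix) the new voice track with the background music track.
mixed = [v + b for v, b in zip(second_voice, background)]
print(mixed)
```

In a real pipeline the splice would preserve sample rate and track length, mirroring the "same format" constraint described above.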
When the product information of the second product is adopted to replace the information of the first product in the first sub-video, the replacement of the first product identification and the first product name is also included. The method for replacing the first product identifier may be: and according to the target position information, covering the first product identification in the image frame of the first sub video by adopting the second product identification so as to finish the replacement of the product information in the image frame. The method for replacing the first product name refers to a method for replacing the first product identifier, and is not described herein again.
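The overlay replacement of the identifier can be sketched on a toy pixel grid; the region coordinates stand in for the recognized first position information, and the pixel labels (0 = background, 1 = first logo, 2 = second logo) are purely illustrative.

```python
# Toy 6x8 "image frame" as a grid of pixel labels.
frame = [[0] * 8 for _ in range(6)]
y0, y1, x0, x1 = 2, 5, 3, 7           # hypothetical first position information
for y in range(y0, y1):
    for x in range(x0, x1):
        frame[y][x] = 1               # pixels of the first product identifier

# Replacement: cover the first identifier with the second product's
# identifier at exactly the recognized position.
for y in range(y0, y1):
    for x in range(x0, x1):
        frame[y][x] = 2
```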
In this example, the second sub-video is obtained by replacing the first human voice track, so the product information of the first product can be replaced quickly, improving the efficiency of producing the second sub-video.
In a possible implementation manner, the audio information of the second product may be further subjected to speech synthesis, and the method specifically includes:
c1, carrying out voice recognition on the first person sound track to obtain first text information of the audio information of the first product;
c2, determining second text information of the audio information of the second product according to the first text information and the product information of the second product;
c3, performing voice conversion on the second text information to obtain reference audio information;
and C4, processing the reference audio information to obtain the audio information of the second product, wherein the tone color of the audio information of the second product is the same as that of the audio information of the first product.
The product information of the first product in the first text information can be replaced by the product information of the second product to obtain the second text information.
Alternatively, a keyword of the first product may be obtained from the first text information; a keyword format of the second product is determined according to that keyword; a keyword of the second product is determined according to the product information of the second product and the keyword format; and the keyword of the first product is replaced with the keyword of the second product to obtain the second text information. The keyword format of the second product is the same as that of the first product: for example, if the first product has N keywords, the second product also has N keywords; likewise, if a keyword phrase of the first product contains two words, the corresponding keyword phrase of the second product also contains two words.
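The keyword replacement can be sketched minimally, assuming hypothetical keyword lists that already obey the same-format constraint (same keyword count, same phrase length); all product names and text are invented.

```python
# First text information transcribed from the first human voice track.
first_text = "Tower Rush is a fun tower defense game, try Tower Rush today"
first_keywords = ["Tower Rush"]      # N = 1 keyword, two-word phrase
second_keywords = ["Castle Clash"]   # same count, same phrase length

# Replace each first-product keyword with the matching second-product one.
second_text = first_text
for old, new in zip(first_keywords, second_keywords):
    second_text = second_text.replace(old, new)

print(second_text)
```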
In this example, the text information of the audio information of the second product is determined according to the first text information and the product information of the second product, which can improve the degree of fit between the second product video and the first product video and the accuracy of producing the second product video.
In one possible implementation manner, a first product video may be further acquired, and the method for acquiring the first product video includes:
d1, acquiring at least one candidate product video according to the product information of the second product;
d2, determining the first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product.
Product keywords and the like can be determined according to the product information of the second product, and at least one candidate product video is obtained according to the product keywords and the like, which specifically includes: the at least one candidate product video may be obtained from a database, or may be obtained in other manners, for example, after searching through the internet, the at least one candidate product video may be obtained, or the at least one candidate product video may be obtained from a cloud server.
The similarity between the product information in the at least one candidate product video and the product information of the second product can be obtained, and the first product video determined according to the similarity; specifically, the candidate product video with the highest similarity may be determined as the first product video. If every similarity is lower than a preset similarity threshold, new candidate product videos may be acquired, and the first product video determined from the newly acquired candidate product videos. Alternatively, the coincidence degree between the product information of the third product (the product featured in a candidate product video) and the product information of the second product may be determined, and the candidate product video whose product information has the highest coincidence degree determined as the first product video. The coincidence degree can be understood as the degree of character overlap between the texts of the two pieces of product information, that is, the proportion of exactly identical content.
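The coincidence-degree selection can be sketched as follows. The product-information strings and video names are invented for illustration, and the character-overlap measure is one simple reading of "the proportion of exactly identical content", not the patent's mandated metric.

```python
def coincidence_degree(a, b):
    # Proportion of positions whose characters are exactly the same,
    # measured over the longer of the two product-information strings.
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / max(len(a), len(b))

second_info = "brand-X tower defense game"
candidates = {
    "video_1": "brand-Y tower defense game",
    "video_2": "brand-Z racing game",
}

# Pick the candidate whose product information overlaps the most.
first_product_video = max(
    candidates, key=lambda v: coincidence_degree(candidates[v], second_info))
print(first_product_video)
```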
In this example, the key information is determined according to the product information of the second product, the at least one candidate product video is determined according to the key information, and the first product video is determined from the at least one candidate product video, so that the first product video can be intelligently acquired, and the efficiency of video replacement is improved.
In one possible implementation manner, the method for acquiring at least one candidate product video according to the product information of the second product includes:
f1, determining at least one product key word from the product information of the second product;
f2, determining the type of the product video according to at least one product keyword;
f3, obtaining at least one candidate product video according to the type of the product video.
The product keyword may be a product category, a product function, product characteristic information, and the like. For example, if the product is a game, the product characteristic information may be the game type, such as a mobile game or a PC client game, or, more specifically, a battle game or a tower defense game among mobile games.
The type of the video is determined according to the key information, and may be determined through a preset correspondence; for example, if the product characteristic information indicates a battle game, the type may be a battle type, and so on.
The at least one candidate product video can be obtained from the database according to the type of the product video, or obtained in other manners: for example, by searching the internet, from a cloud server, from a third-party platform, and the like.
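Steps F1–F3 can be sketched with a preset keyword-to-type correspondence and a toy in-memory database standing in for the video store; every mapping, keyword, and video name here is an assumption for illustration.

```python
# Preset correspondence from product keywords to a product-video type,
# and a toy in-memory "database" of candidate videos indexed by type.
keyword_to_type = {"battle game": "battle", "tower defense": "strategy"}
video_db = {
    "battle": ["candidate_1", "candidate_2"],
    "strategy": ["candidate_3"],
}

product_keywords = ["battle game", "mobile game"]  # F1: from second product

# F2: the first keyword with a preset mapping determines the video type.
video_type = next(
    keyword_to_type[k] for k in product_keywords if k in keyword_to_type)

# F3: fetch candidate product videos of that type from the database.
candidates = video_db.get(video_type, [])
print(video_type, candidates)
```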
In this example, the type of the product video is determined by the product keyword in the product information of the second product, and at least one candidate product video is determined according to the type, so that the accuracy of candidate product identification and acquisition can be improved.
Referring to fig. 3, fig. 3 is a schematic flow chart of another video production method according to an embodiment of the present application. As shown in fig. 3, the video production method includes:
301. performing segmentation processing on the first product video to obtain a first sub-video, wherein the first sub-video comprises an introduction fragment of the first product;
the product information of the first product includes at least one of a first product identification, a first product name, and audio information of the first product.
302. Identifying a plurality of image frames in the first sub-video through a video identification model to obtain target position information of product information of a first product in the first sub-video, wherein the video identification model comprises at least one of a text identification model, an audio identification model, a mouth shape identification model and an identification model;
the target location information may be acquired by at least one of:
performing identification processing on at least one first image frame in the first sub-video through an identification model to obtain first position information of a first product identification in the at least one first image frame;
performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product name in the at least one second image frame;
identifying an audio clip included in the first sub-video through an audio identification model to obtain third position information of the audio information of the first product in the audio clip;
and carrying out mouth shape recognition processing on a plurality of image frames in the first sub-video through a mouth shape recognition model to obtain at least one third image frame corresponding to the audio information of the first product in the first sub-video, and determining third position information of the audio information of the first product in the audio clip based on the at least one third image frame.
303. Replacing the product information of the first product in the first sub-video with the product information of the second product according to the target position information to obtain a second sub-video;
304. and splicing the second sub-video and the third sub-video to obtain a second product video, wherein the third sub-video comprises an introduction fragment of the second product.
The introduction fragment of the first product is used to introduce the core selling points of the first product, and may also introduce the first product in detail.
In this example, the product information includes a first product identifier, a first product name, and audio information of a first product, and the product information of the first product can be quickly and accurately obtained by obtaining the first product identifier according to image recognition, obtaining the first product name through character recognition, and obtaining the first product audio through an audio recognition method.
Referring to fig. 4, fig. 4 is a schematic flow chart of another video production method according to an embodiment of the present application. As shown in fig. 4, the video production method includes:
401. determining at least one product key from the product information of the second product;
the key information may be, for example, a keyword, product characteristic information, and the like. For example, if the product is a game, the product characteristic information may be the game type, such as a mobile game or a PC client game, or, more specifically, a battle game or a tower defense game among mobile games.
402. Determining the type of the product video according to at least one product keyword;
403. obtaining at least one candidate product video according to the type of the product video;
404. determining a first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product;
similarity between the product information in the at least one candidate product video and the product information of the second product can be obtained, and the first product video determined according to the similarity; specifically, the candidate product video with the highest similarity may be determined as the first product video. If every similarity is lower than a preset similarity threshold, new candidate product videos may be acquired, and the first product video determined from the newly acquired candidate product videos. Alternatively, the coincidence degree between the product information of the third product (the product featured in a candidate product video) and the product information of the second product may be determined, and the candidate product video whose product information has the highest coincidence degree determined as the first product video. The coincidence degree can be understood as the degree of character overlap between the texts of the two pieces of product information, that is, the proportion of exactly identical content.
405. Performing segmentation processing on the first product video to obtain a first sub-video;
406. acquiring target position information of product information of a first product in a first sub-video;
407. replacing the product information of the first product in the first sub-video with the product information of the second product according to the target position information to obtain a second sub-video;
408. and splicing the second sub-video and the third sub-video to obtain a second product video, wherein the third sub-video comprises an introduction fragment of the second product.
In this example, the type of the product video is determined by the product keyword in the product information of the second product, and at least one candidate product video is determined according to the type, so that the accuracy of candidate product identification and acquisition can be improved.
In accordance with the foregoing embodiments, please refer to fig. 5, which is a schematic structural diagram of a terminal according to an embodiment of the present application. As shown in the figure, the terminal includes a processor, an input device, an output device, and a memory, which are connected to one another, where the memory is used to store a computer program including program instructions, and the processor is configured to call the program instructions to perform the following steps:
performing segmentation processing on the first product video to obtain a first sub-video, wherein the first sub-video comprises an introduction fragment of the first product;
acquiring target position information of product information of a first product in a first sub-video;
replacing the product information of the first product in the first sub-video with the product information of the second product according to the target position information to obtain a second sub-video;
and splicing the second sub-video and the third sub-video to obtain a second product video, wherein the third sub-video comprises an introduction fragment of the second product.
The above description has introduced the solution of the embodiments of the present application mainly from the perspective of the method-side implementation process. It is understood that, in order to implement the above functions, the terminal includes corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative units and algorithm steps described in connection with the embodiments provided herein can be implemented in hardware, or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the terminal may be divided into the functional units according to the above method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
In accordance with the above, please refer to fig. 6, fig. 6 is a schematic structural diagram of a video production apparatus according to an embodiment of the present disclosure. As shown in fig. 6, the apparatus includes:
a processing unit 601, configured to perform segmentation processing on a first product video to obtain a first sub-video, where the first sub-video includes an introduction section of a first product;
an obtaining unit 602, configured to obtain target position information of product information of a first product in a first sub-video;
a replacing unit 603, configured to replace product information of a first product in the first sub-video with product information of a second product according to the target position information, so as to obtain a second sub-video;
the splicing unit 604 is configured to splice the second sub-video and the third sub-video to obtain a second product video, where the third sub-video includes an introduction section of the second product.
In a possible implementation manner, the product information includes at least one of a first product identifier, a first product name, and audio information of the first product, and the obtaining unit 602 is configured to:
and identifying a plurality of image frames in the first sub-video through a video identification model to obtain the target position information of the product information of the first product in the first sub-video, wherein the video identification model comprises at least one of a text identification model, an audio identification model, a mouth shape identification model and an identification model.
In one possible implementation manner, in terms of obtaining the target position information of the product information of the first product in the first sub video by performing recognition processing on a plurality of image frames in the first sub video through a video recognition model, the obtaining unit 602 is specifically configured to perform at least one of the following:
performing identification processing on at least one first image frame in the first sub-video through an identification model to obtain first position information of a first product identification in the at least one first image frame;
performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product name in the at least one second image frame;
identifying the audio clip included in the first sub-video through an audio identification model to obtain third position information of the audio information of the first product in the audio clip;
and carrying out mouth shape recognition processing on a plurality of image frames in the first sub-video through a mouth shape recognition model to obtain at least one third image frame corresponding to the audio information of the first product in the first sub-video, and determining third position information of the audio information of the first product in the audio clip based on the at least one third image frame.
In one possible implementation, the product information includes audio information of the first product, and the replacing unit 603 is configured to:
acquiring an audio track of audio information of a first product from the first sub-video, wherein the audio track comprises a human voice track and a background music track;
processing the audio track to obtain a first person audio track and a background music track;
replacing the audio information of the first product in the first voice track with the audio information of the second product according to the third position information to obtain a second voice track;
synthesizing the second voice track and the background music track to obtain an audio track of the audio information of the second product;
the second sub-video is determined based on the audio track of the second product audio information.
In one possible implementation, the apparatus is further configured to:
performing voice recognition on the first person sound track to obtain first text information of the audio information of the first product;
determining second text information of the audio information of the second product according to the first text information and the product information of the second product;
performing voice conversion on the second text information to obtain reference audio information;
and processing the reference audio information to obtain the audio information of the second product, wherein the tone color of the audio information of the second product is the same as that of the audio information of the first product.
In one possible implementation, the apparatus is further configured to:
obtaining at least one candidate product video according to the product information of the second product;
and determining a first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product.
In one possible implementation, the apparatus is further configured to:
determining at least one product key from the product information of the second product;
determining the type of the product video according to at least one product keyword;
and acquiring at least one candidate product video according to the type of the product video.
Embodiments of the present application also provide a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the video production methods as described in the above method embodiments.
Embodiments of the present application further provide a computer program product, wherein the computer program product includes a computer readable storage medium storing computer program instructions which, when executed by a processor, cause the processor to perform part or all of the steps of any one of the video production methods described in the above method embodiments.
It should be noted that for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as a stand-alone product, it may be stored in a computer-readable memory. Based on this understanding, the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, or a magnetic or optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be performed by related hardware instructed by a program. The program may be stored in a computer-readable memory, which may include a flash drive, read-only memory, random access memory, or a magnetic or optical disk.
The foregoing describes the embodiments of the present application in detail. Specific examples are used herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is only intended to help understand the method and core concept of the present application. Meanwhile, a person skilled in the art may, in accordance with the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (14)

1. A method of video production, the method comprising:
performing segmentation processing on a first product video to obtain a first sub-video, wherein the first sub-video comprises an introduction segment of a first product;
acquiring target position information of product information of the first product in the first sub-video;
replacing the product information of the first product in the first sub-video with product information of a second product according to the target position information to obtain a second sub-video;
splicing the second sub-video and a third sub-video to obtain a second product video, wherein the third sub-video comprises an introduction segment of the second product;
the product information of the first product includes audio information of the first product, and the replacing, according to the target position information, the product information of the first product in the first sub-video with the product information of the second product to obtain a second sub-video includes:
acquiring an audio track of the audio information of the first product from the first sub-video, wherein the audio track comprises a human voice track and a background music track;
processing the audio track to obtain a first human voice track and a background music track;
replacing the audio information of the first product in the first human voice track with the audio information of the second product according to third position information of the audio information of the first product in an audio clip included in the first sub-video, to obtain a second human voice track;
synthesizing the second human voice track and the background music track to obtain an audio track of the audio information of the second product;
and determining the second sub-video according to the audio track of the audio information of the second product.
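As a minimal illustration of the audio-replacement steps above (a sketch, not part of the claim language): the separation of the audio track into a human voice track and a background music track is assumed to be done by some external source-separation model, so the code below takes the two stems as plain sample lists; the third position information is a hypothetical sample range, and all values are made up.

```python
def replace_vocal_segment(vocal, replacement, start, end):
    """Replace samples [start, end) of the human voice track with new audio."""
    return vocal[:start] + replacement + vocal[end:]

def mix(vocal, background):
    """Synthesize the voice track and background-music track sample by sample,
    zero-padding the shorter stem so lengths match."""
    n = max(len(vocal), len(background))
    v = vocal + [0] * (n - len(vocal))
    b = background + [0] * (n - len(background))
    return [x + y for x, y in zip(v, b)]

# Toy stems: the "third position information" says the first product's
# audio occupies samples 2..4 of the first human voice track.
vocal_1 = [1, 1, 9, 9, 1, 1]        # first human voice track
background = [5, 5, 5, 5, 5, 5]     # background music track
audio_2 = [7, 7, 7]                 # second product's audio (one sample longer)

vocal_2 = replace_vocal_segment(vocal_1, audio_2, 2, 4)  # second human voice track
track_2 = mix(vocal_2, background)                       # new audio track
print(vocal_2)  # [1, 1, 7, 7, 7, 1, 1]
print(track_2)  # [6, 6, 12, 12, 12, 6, 1]
```

Note that the replacement segment may differ in length from the original, so the background stem is padded before remixing rather than truncated.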
2. The method of claim 1, wherein the product information of the first product comprises at least one of a first product identifier, a first product name and audio information of the first product, and the obtaining of the target position information of the product information of the first product in the first sub-video comprises:
and identifying a plurality of image frames in the first sub-video through a video recognition model to obtain the target position information of the product information of the first product in the first sub-video, wherein the video recognition model comprises at least one of a text recognition model, an audio recognition model, a mouth shape recognition model, and an identifier recognition model.
3. The method of claim 2,
the identifying of the plurality of image frames in the first sub-video through the video recognition model to obtain the target position information of the product information of the first product in the first sub-video comprises at least one of the following:
performing identifier recognition processing on at least one first image frame in the first sub-video through an identifier recognition model to obtain first position information of the first product identifier in the at least one first image frame;
performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product name in the at least one second image frame;
identifying the audio clip through an audio recognition model to obtain the third position information;
performing mouth shape recognition processing on the plurality of image frames in the first sub-video through a mouth shape recognition model to obtain at least one third image frame corresponding to the audio information of the first product in the first sub-video, and determining third position information of the audio information of the first product in the audio clip based on the at least one third image frame.
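The second position information in the text-recognition branch above can be illustrated with a toy scan over per-frame OCR output; the `(text, bounding_box)` format and every value below are hypothetical, standing in for whatever the text recognition model actually emits.

```python
def locate_product_name(frames_ocr, product_name):
    """Scan per-frame OCR detections for the first product's name and return
    (frame_index, bounding_box) hits -- the 'second position information'."""
    hits = []
    for idx, detections in enumerate(frames_ocr):
        for text, box in detections:
            if product_name in text:
                hits.append((idx, box))
    return hits

# Hypothetical OCR output: each frame is a list of (text, bounding-box) pairs.
frames = [
    [("Buy SuperPhone X now", (10, 20, 200, 40))],
    [("Limited offer", (10, 20, 120, 40))],
    [("SuperPhone X", (50, 60, 180, 90)), ("$499", (50, 100, 90, 120))],
]
positions = locate_product_name(frames, "SuperPhone X")
print(positions)  # [(0, (10, 20, 200, 40)), (2, (50, 60, 180, 90))]
```

The replacement step would then overwrite exactly these boxes in exactly these frames with the second product's name.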
4. The method of claim 1, further comprising:
performing speech recognition on the first human voice track to obtain first text information of the audio information of the first product;
determining second text information of audio information of a second product according to the first text information and the product information of the second product;
performing text-to-speech conversion on the second text information to obtain reference audio information;
and processing the reference audio information to obtain the audio information of the second product, wherein the timbre of the audio information of the second product is the same as the timbre of the audio information of the first product.
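Of the four steps in claim 4, the speech recognition, speech synthesis, and timbre-matching stages are assumed to be external models; the middle step, deriving the second text information from the first text information and the second product's information, can be sketched as plain string substitution. All product names and figures here are made up.

```python
def rewrite_transcript(first_text, first_info, second_info):
    """Build the second product's script by substituting each of the first
    product's attributes in the transcript with the second product's."""
    text = first_text
    for key, old_value in first_info.items():
        new_value = second_info.get(key)
        if new_value is not None:
            text = text.replace(old_value, new_value)
    return text

# Hypothetical transcript and product attributes.
first_text = "The AquaPure 100 filters 2 liters per minute."
first_info = {"name": "AquaPure 100", "rate": "2 liters per minute"}
second_info = {"name": "AquaPure 200", "rate": "3 liters per minute"}

second_text = rewrite_transcript(first_text, first_info, second_info)
print(second_text)  # The AquaPure 200 filters 3 liters per minute.
```

The resulting second text information would then be fed to speech synthesis, and the output converted so its timbre matches the original speaker's.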
5. The method according to any one of claims 1-4, further comprising:
obtaining at least one candidate product video according to the product information of the second product;
and determining the first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product.
6. The method of claim 5, wherein obtaining at least one candidate product video based on the product information of the second product comprises:
determining at least one product keyword from the product information of the second product;
determining the type of the product video according to the at least one product keyword;
and acquiring the at least one candidate product video according to the type of the product video.
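Claim 6's retrieval chain (keywords → video type → candidate videos) can be sketched with toy data; the stop-word list, keyword-to-type index, and video library below are all hypothetical.

```python
def product_keywords(product_info):
    """Extract candidate keywords from the second product's information."""
    stop = {"the", "a", "an", "with", "and", "for"}
    return [w for w in product_info.lower().split() if w not in stop]

def video_type(keywords, type_index):
    """Map the keywords to a product-video type via a keyword->type index."""
    for kw in keywords:
        if kw in type_index:
            return type_index[kw]
    return "generic"

def candidate_videos(vtype, library):
    """Select videos of the matching type as candidate first product videos."""
    return [vid for vid, t in library.items() if t == vtype]

type_index = {"headphones": "electronics", "shampoo": "personal-care"}
library = {"vid_a": "electronics", "vid_b": "personal-care", "vid_c": "electronics"}

kws = product_keywords("Wireless headphones with noise cancelling")
vtype = video_type(kws, type_index)
print(vtype)                             # electronics
print(candidate_videos(vtype, library))  # ['vid_a', 'vid_c']
```

The first product video would then be chosen from these candidates by matching their product information against the second product's, as in claim 5.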
7. A video production apparatus, characterized in that the apparatus comprises:
the processing unit is configured to perform segmentation processing on a first product video to obtain a first sub-video, wherein the first sub-video comprises an introduction segment of a first product;
the acquisition unit is configured to acquire target position information of product information of the first product in the first sub-video;
the replacing unit is configured to replace the product information of the first product in the first sub-video with product information of a second product according to the target position information, so as to obtain a second sub-video;
the splicing unit is configured to splice the second sub-video and a third sub-video to obtain a second product video, wherein the third sub-video comprises an introduction segment of the second product;
the product information of the first product comprises audio information of the first product, and the replacing unit is configured to:
acquiring an audio track of the audio information of the first product from the first sub-video, wherein the audio track comprises a human voice track and a background music track;
processing the audio track to obtain a first human voice track and a background music track;
replacing the audio information of the first product in the first human voice track with the audio information of the second product according to third position information of the audio information of the first product in an audio clip included in the first sub-video, to obtain a second human voice track;
synthesizing the second human voice track and the background music track to obtain an audio track of the audio information of the second product;
and determining the second sub-video according to the audio track of the audio information of the second product.
8. The apparatus of claim 7, wherein the product information of the first product comprises at least one of a first product identifier, a first product name, and audio information of the first product, and wherein the acquisition unit is configured to:
identify a plurality of image frames in the first sub-video through a video recognition model to obtain the target position information of the product information of the first product in the first sub-video, wherein the video recognition model comprises at least one of a text recognition model, an audio recognition model, a mouth shape recognition model, and an identifier recognition model.
9. The apparatus according to claim 8, wherein, in terms of identifying the plurality of image frames in the first sub-video through the video recognition model to obtain the target position information of the product information of the first product in the first sub-video, the acquisition unit is specifically configured to perform at least one of the following:
performing identifier recognition processing on at least one first image frame in the first sub-video through an identifier recognition model to obtain first position information of the first product identifier in the at least one first image frame;
performing text recognition processing on at least one second image frame in the first sub-video through a text recognition model to obtain second position information of the first product name in the at least one second image frame;
identifying the audio clip through an audio recognition model to obtain the third position information;
performing mouth shape recognition processing on the image frames in the first sub-video through a mouth shape recognition model to obtain at least one third image frame corresponding to the audio information of the first product in the first sub-video, and determining third position information of the audio information of the first product in the audio clip based on the at least one third image frame.
10. The apparatus of claim 7, wherein the apparatus is further configured to:
performing speech recognition on the first human voice track to obtain first text information of the audio information of the first product;
determining second text information of audio information of a second product according to the first text information and the product information of the second product;
performing text-to-speech conversion on the second text information to obtain reference audio information;
and processing the reference audio information to obtain the audio information of the second product, wherein the timbre of the audio information of the second product is the same as the timbre of the audio information of the first product.
11. The apparatus of any of claims 7-10, further configured to:
obtaining at least one candidate product video according to the product information of the second product;
and determining the first product video from the at least one candidate product video according to the product information in the at least one candidate product video and the product information of the second product.
12. The apparatus of claim 11, wherein the apparatus is further configured to:
determining at least one product keyword from the product information of the second product;
determining the type of the product video according to the at least one product keyword;
and acquiring the at least one candidate product video according to the type of the product video.
13. A terminal, comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is configured to store computer program instructions, and the processor is configured to invoke the computer program instructions to perform the method of any of claims 1-6.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer program instructions that, when executed by a processor, cause the processor to perform the method of any of claims 1-6.
CN202010890456.5A 2020-08-29 2020-08-29 Video production method and related device Active CN112135201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010890456.5A CN112135201B (en) 2020-08-29 2020-08-29 Video production method and related device

Publications (2)

Publication Number Publication Date
CN112135201A CN112135201A (en) 2020-12-25
CN112135201B true CN112135201B (en) 2022-08-26

Family

ID=73848355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010890456.5A Active CN112135201B (en) 2020-08-29 2020-08-29 Video production method and related device

Country Status (1)

Country Link
CN (1) CN112135201B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115022732B (en) * 2022-05-25 2023-11-03 阿里巴巴(中国)有限公司 Video generation method, device, equipment and medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN110225265A (en) * 2019-06-21 2019-09-10 深圳市奥拓电子股份有限公司 Advertisement replacement method, system and storage medium during video transmission
CN110430339A (en) * 2019-07-19 2019-11-08 长沙理工大学 Altering detecting method and system in digital video frame
CN110691276A (en) * 2019-11-06 2020-01-14 北京字节跳动网络技术有限公司 Method and device for splicing multimedia segments, mobile terminal and storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CA2838157A1 (en) * 2011-06-06 2012-12-13 Webtuner Corp. System and method for enhancing and extending video advertisements
US20130259312A1 (en) * 2011-09-08 2013-10-03 Kenton M. Lyons Eye Gaze Based Location Selection for Audio Visual Playback
CN109034032B (en) * 2018-07-17 2022-01-11 北京世纪好未来教育科技有限公司 Image processing method, apparatus, device and medium
US20200213644A1 (en) * 2019-01-02 2020-07-02 International Business Machines Corporation Advertisement insertion in videos



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant