CN114039958A

CN114039958A - Multimedia processing method and device

Info

Publication number: CN114039958A
Application number: CN202111315205.5A
Authority: CN
Inventors: 喻昱
Original assignee: Hunan Happly Sunshine Interactive Entertainment Media Co Ltd
Current assignee: Hunan Happly Sunshine Interactive Entertainment Media Co Ltd
Priority date: 2021-11-08
Filing date: 2021-11-08
Publication date: 2022-02-11

Abstract

The invention discloses a multimedia processing method and apparatus. The method comprises: obtaining a target video file, wherein the target video file comprises video content and at least one role tag, and different role tags correspond to different characters in the video content; identifying a character avatar to be replaced in the video content based on the role tag selected by the user; and, while the video content is being output, acquiring a dynamic video of the user in real time and controlling the video of the avatar area in the dynamic video to be output at the position of the character avatar to be replaced in the video content. With this scheme, the face of a designated character in the video content can be replaced dynamically, and the replacement face is itself a dynamic video. This removes the limitation of replacement based on static pictures, provides a more entertaining application, improves the realism of the replaced video content, and improves the user experience.

Description

Multimedia processing method and device

Technical Field

The present invention relates to the multimedia field, and more particularly, to a multimedia processing method and apparatus.

Background

Currently, users of many platforms enjoy a new form of entertainment in which the face image of a person in an image or video is replaced with the user's own face image, so that the user becomes the protagonist of the processed image or video. This technique of replacing the face image may be called face replacement.

Current face replacement approaches include the following. In the first approach, a feature point set of a first face region image and a corresponding "sectional region image" are extracted; a feature point set of a second face region image and a corresponding "mapping region image" are calculated; the "mapping region image" of the second face region image is adjusted according to the parameters of the feature point set of the first face region image to obtain a "replacement mapping region image"; and the face is then replaced using the "replacement mapping region image". In the second approach, key parts of a first face and a second face are identified, a motion vector field from the key parts of the second face region to the key parts of the first face region is calculated, and the face is replaced based on the calculated motion vector field.

However, current face replacement techniques operate on static pictures, and the resulting image or video content falls short of user expectations for realism and entertainment value.

Disclosure of Invention

In view of this, the present invention provides the following technical solutions:

a multimedia processing method, comprising:

obtaining a target video file, wherein the target video file comprises video content and at least one role tag, and different role tags correspond to different character roles in the video content;

identifying a character avatar to be replaced in the video content based on the role tag selected by the user;

and in the video content output process, acquiring the dynamic video of the user in real time, and controlling the video of the head portrait area in the dynamic video to be output at the position of the head portrait of the character to be replaced in the video content.

Optionally, the controlling the video of the avatar area in the dynamic video to be output at the position where the avatar of the character to be replaced is located in the video content includes:

erasing the content at the position of the character avatar to be replaced, to obtain a to-be-supplemented video that does not contain the avatar to be replaced;

filling the video of the avatar area in the dynamic video into a blank area in the to-be-supplemented video, wherein the blank area corresponds to the position of the character avatar to be replaced.

Optionally, the controlling the video of the avatar area in the dynamic video to be output at the position where the avatar of the character to be replaced is located in the video content includes:

marking the identified character head image to be replaced with a first label;

marking the video of the head portrait region in the dynamic video with a second label;

and replacing the content corresponding to the first label in the video content with the content corresponding to the second label.

Optionally, before the controlling the video of the avatar area in the dynamic video to be output at the position of the character avatar to be replaced in the video content, the method further includes:

and storing the target video file.

Optionally, the identifying the character avatar to be replaced in the video content based on the character tag selected by the user includes:

identifying at least two character avatars to be replaced in the video content based on at least two role tags selected by a user;

in that case, the acquiring the dynamic video of the user in real time in the video content output process, and controlling the video of the avatar area in the dynamic video to be output at the position of the character avatar to be replaced in the video content, includes:

while the video content is being output, acquiring dynamic videos of at least two users in real time through at least two different image acquisition devices in one-to-one correspondence with the users, and controlling the videos of at least two avatar areas in the dynamic videos to be output respectively at the positions of the at least two character avatars to be replaced in the video content.

A multimedia processing apparatus comprising:

a file determining module, configured to obtain a target video file, wherein the target video file comprises video content and at least one role tag, and different role tags correspond to different character roles in the video content;

the content identification module is used for identifying the character head portrait to be replaced in the video content based on the character label selected by the user;

and the video processing module is used for acquiring the dynamic video of the user in real time in the video content output process and controlling the video of the head portrait area in the dynamic video to be output at the position of the head portrait of the character to be replaced in the video content.

Optionally, the video processing module includes:

an image processing module, configured to erase the content at the position of the character avatar to be replaced, to obtain a to-be-supplemented video that does not contain the avatar to be replaced;

and a filling processing module, configured to fill the video of the avatar area in the dynamic video into a blank area in the to-be-supplemented video, wherein the blank area corresponds to the position of the character avatar to be replaced.

Optionally, the video processing module includes:

the content marking module is used for marking the identified character head portrait to be replaced with a first label and marking the video of the head portrait area in the dynamic video with a second label;

and the replacement processing module is used for replacing the content corresponding to the first label in the video content with the content corresponding to the second label.

Optionally, the apparatus further includes:

and the storage module is used for storing the target video file.

Optionally, the content identification module is specifically configured to identify at least two character avatars to be replaced in the video content based on at least two role tags selected by a user;

the video processing module is specifically configured to: while the video content is being output, acquire dynamic videos of at least two users in real time through at least two different image acquisition devices in one-to-one correspondence with the users, and control the videos of at least two avatar areas in the dynamic videos to be output respectively at the positions of the at least two character avatars to be replaced in the video content.

As can be seen from the foregoing technical solutions, compared with the prior art, an embodiment of the present invention discloses a multimedia processing method and apparatus. The method includes: obtaining a target video file, wherein the target video file comprises video content and at least one role tag, and different role tags correspond to different characters in the video content; identifying a character avatar to be replaced in the video content based on the role tag selected by the user; and, while the video content is being output, acquiring a dynamic video of the user in real time and controlling the video of the avatar area in the dynamic video to be output at the position of the character avatar to be replaced in the video content. With this scheme, the face of a designated character in the video content can be replaced dynamically, and the replacement face is itself a dynamic video. This removes the limitation of replacement based on static pictures, provides a more entertaining application, improves the realism of the replaced video content, and improves the user experience.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a multimedia processing method according to an embodiment of the present invention;

FIG. 2 is a flowchart for controlling face replacement according to an embodiment of the present invention;

FIG. 3 is a flowchart of another method for controlling face replacement according to an embodiment of the present invention;

FIG. 4 is a flowchart of another multimedia processing method according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a multimedia processing apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a video processing module according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of another video processing module according to an embodiment of the disclosure.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiments of the present application can be applied to an electronic device. The application does not limit the product form of the electronic device, which may include, but is not limited to, a smartphone, a tablet computer, a wearable device, a personal computer (PC), a netbook, and the like, and can be chosen according to application requirements.

Fig. 1 is a flowchart of a multimedia processing method according to an embodiment of the present invention, and referring to fig. 1, the multimedia processing method may include:

Step 101: Obtain a target video file, wherein the target video file comprises video content and at least one role tag, and different role tags correspond to different characters in the video content.

The target video file is a video file selected by the user. In a specific implementation, the user can enter keywords to find the desired target video file through the search function of a video application or platform, or select it by browsing the interface.

The target video file can be, but is not limited to, a film or television work, a short video, and the like. Each role in the video content corresponds to a unique role tag, so that different characters can be identified and distinguished. A role tag may be, for example, a serial-number label, a text label, or a role name, and the characters in the video content can be located based on the role tags.
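
The relationship between a target video file and its role tags can be pictured with a small data structure. The following is a minimal sketch only: the names `RoleTag`, `TargetVideoFile`, `add_role`, and `select_role`, and the tag values, are assumptions made for illustration and do not appear in the source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RoleTag:
    """Hypothetical role tag: a unique identifier plus a display name."""
    tag_id: str      # e.g. a serial-number label such as "R01"
    role_name: str   # e.g. the character's name


@dataclass
class TargetVideoFile:
    """Hypothetical container pairing video content with its role tags."""
    title: str
    frames: list = field(default_factory=list)      # decoded frames, format left open
    role_tags: dict = field(default_factory=dict)   # tag_id -> RoleTag

    def add_role(self, tag: RoleTag) -> None:
        # Each character maps to exactly one unique role tag.
        if tag.tag_id in self.role_tags:
            raise ValueError(f"duplicate role tag: {tag.tag_id}")
        self.role_tags[tag.tag_id] = tag

    def select_role(self, tag_id: str) -> RoleTag:
        # Look up the role the user picked; raises KeyError if the tag is unknown.
        return self.role_tags[tag_id]


video = TargetVideoFile(title="example drama")
video.add_role(RoleTag("R01", "Role A"))
video.add_role(RoleTag("R02", "Role B"))
selected = video.select_role("R01")
```

Rejecting duplicate tag identifiers is one simple way to keep the one-tag-per-role property described above.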

Step 102: Identify the character avatar to be replaced in the video content based on the role tag selected by the user.

In this application, the character avatar to be replaced refers specifically to the face of the character to be replaced. In the subsequent replacement, only the face area is replaced, so the head shape and accessories of the character are unchanged before and after replacement and the harmony of the overall picture is not affected.

The user can pick a character from the video content that the user is interested in or wants to play, and then select the role tag corresponding to that character. The system then performs face recognition on the persons appearing in the video content according to the selected role tag, identifies the parts of the video content that contain the character, and locates the character's face, which facilitates subsequent processing of that face.
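
Once a recognizer has attributed faces to role tags, locating the avatar to be replaced amounts to a per-frame lookup. The sketch below assumes a hypothetical recognizer output (`detections`, one list of `(role_tag, box)` pairs per frame); the source does not specify the recognizer or its output format.

```python
def locate_avatar(detections, selected_tag):
    """Return {frame_index: box} for frames in which the selected role appears.

    `detections` is a hypothetical recognizer output: one list per frame,
    each entry a (role_tag, box) pair with box = (top, left, height, width).
    """
    positions = {}
    for frame_index, faces in enumerate(detections):
        for role_tag, box in faces:
            if role_tag == selected_tag:
                positions[frame_index] = box
                break  # one avatar per role per frame is assumed
    return positions


detections = [
    [("R01", (10, 20, 32, 32)), ("R02", (12, 90, 30, 30))],  # frame 0
    [("R02", (12, 92, 30, 30))],                              # frame 1: R01 absent
    [("R01", (11, 24, 32, 32))],                              # frame 2
]
positions = locate_avatar(detections, "R01")
```

Frames in which the selected role does not appear are simply absent from the result, so they can be output unchanged.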

Step 103: While the video content is being output, acquire the dynamic video of the user in real time, and control the video of the avatar area in the dynamic video to be output at the position of the character avatar to be replaced in the video content.

Because the application replaces the face with a dynamic video, the user's face video must be substituted and output in real time while the video content is being output. In a specific implementation, the front or rear camera can be turned on during output of the video content to record the user's dynamic video in real time. Every expression of the user is captured in real time, and the user can perform the expressions and voice according to the plot and the protagonist's lines.

The video of the face area in the user's dynamic video recorded in real time is output at the position of the character avatar to be replaced in the video content, thereby replacing the face. With the technical solution disclosed in this embodiment, the face of a designated character can be identified in a film or TV series and then replaced with the user's face, so that the user sees their own face in that role while watching, which gives the user the experience of being the protagonist.
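
The real-time aspect of step 103 can be sketched as a playback loop that fetches the user's current face at the moment each frame is output. This is an illustrative outline only: `play_with_replacement`, `capture_face`, and `replace` are hypothetical names, and the camera and compositing logic are passed in as callables rather than implemented.

```python
def play_with_replacement(video_frames, avatar_positions, capture_face, replace):
    """Yield output frames, substituting the user's face while playing.

    capture_face() returns the most recent face image from the camera and
    replace(frame, box, face) returns the frame with the face pasted in;
    both are hypothetical callables supplied by the caller.
    """
    for index, frame in enumerate(video_frames):
        box = avatar_positions.get(index)
        if box is None:
            yield frame            # the role is not on screen: output unchanged
            continue
        face = capture_face()      # captured at output time, not pre-recorded
        yield replace(frame, box, face)


# Toy stand-ins: frames are strings and the "camera" returns a counter.
captured = iter(range(100))
output = list(play_with_replacement(
    ["f0", "f1", "f2"],
    {0: "box0", 2: "box2"},
    capture_face=lambda: next(captured),
    replace=lambda frame, box, face: f"{frame}+{box}+face{face}",
))
```

Using a generator keeps capture and output interleaved frame by frame, which mirrors the requirement that replacement happen during output rather than as a separate offline pass.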

The multimedia processing method of this embodiment can dynamically replace the face of a designated character in the video content, and the replacement face is itself a dynamic video. This removes the limitation of static-picture replacement, provides a more entertaining application, improves the realism of the replaced video content, and improves the user experience.

In one implementation (see FIG. 2), the controlling of the video of the avatar area in the dynamic video to be output at the position of the character avatar to be replaced in the video content may include:

Step 201: Erase the content at the position of the character avatar to be replaced, to obtain a to-be-supplemented video that does not contain the avatar to be replaced.

Step 202: Fill the video of the avatar area in the dynamic video into a blank area in the to-be-supplemented video, wherein the blank area corresponds to the position of the character avatar to be replaced.

In this implementation, after the character avatar to be replaced is identified in the video content, the content at the corresponding position is directly erased or cut out to form a blank area. The video of the user's face area is then filled directly into the blank area, so that the user's face area is spliced with the rest of the video content (everything outside the position of the character avatar to be replaced) and the result is output dynamically.
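
The erase-then-fill procedure of steps 201 and 202 can be illustrated on a single frame. The sketch below is a simplified assumption, not the patented implementation: frames are plain 2D lists of pixel values, the avatar position is a rectangular box, and the user's face is scaled to fit with nearest-neighbour sampling. The helper names are hypothetical.

```python
BLANK = None  # marker for an erased pixel


def erase_region(frame, box):
    """Step 201: blank out the avatar region; box = (top, left, height, width)."""
    top, left, height, width = box
    erased = [row[:] for row in frame]
    for y in range(top, top + height):
        for x in range(left, left + width):
            erased[y][x] = BLANK
    return erased


def resize_nearest(image, height, width):
    """Scale a 2D image to height x width by nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [[image[y * src_h // height][x * src_w // width]
             for x in range(width)] for y in range(height)]


def fill_region(erased, box, user_face):
    """Step 202: fill the blank area with the user's face region."""
    top, left, height, width = box
    scaled = resize_nearest(user_face, height, width)
    filled = [row[:] for row in erased]
    for dy in range(height):
        for dx in range(width):
            if filled[top + dy][left + dx] is BLANK:
                filled[top + dy][left + dx] = scaled[dy][dx]
    return filled


frame = [[0] * 4 for _ in range(4)]
box = (1, 1, 2, 2)
erased = erase_region(frame, box)
filled = fill_region(erased, box, user_face=[[7, 8], [9, 6]])
```

Both helpers return new frames and leave their inputs untouched, so the original frame stays available if the replacement needs to be redone.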

FIG. 3 is a flowchart of another way of controlling face replacement according to an embodiment of the present invention. Referring to FIG. 3, in another implementation, the controlling of the video of the avatar area in the dynamic video to be output at the position of the character avatar to be replaced in the video content may include:

Step 301: Mark the identified character avatar to be replaced with a first label.

After the character avatar to be replaced is identified in the video content, it is marked directly with the first label, so that subsequent processing of that avatar can be performed directly on the basis of the first label, which improves processing efficiency.

Step 302: Mark the video of the avatar area in the dynamic video with a second label.

Similarly, the video of the avatar area in the dynamic video recorded by the user is marked with the second label, so that subsequent processing can conveniently be performed on the basis of that label.

Step 303: Replace the content corresponding to the first label in the video content with the content corresponding to the second label.

After both the avatar to be replaced and the replacement avatar video have been marked, the content corresponding to the first label can be replaced with the content corresponding to the second label directly on the basis of the two labels; that is, the video corresponding to the character avatar to be replaced is replaced with the video of the face area recorded by the user.
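
The label-based variant of steps 301 to 303 can be sketched as a lookup keyed by the two labels. The code below is an assumption-laden illustration: the labels are stored in a plain dictionary mapping each label to a rectangular box, the names `crop` and `replace_by_label` are hypothetical, and the two regions are required to have the same size to keep the example short.

```python
def crop(frame, box):
    """Cut the box = (top, left, height, width) region out of a 2D frame."""
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]


def replace_by_label(video_frame, user_frame, labels, first="a", second="b"):
    """Replace the region marked `first` in the video frame with the region
    marked `second` in the user's frame."""
    target_box = labels[first]    # step 301: avatar to be replaced
    source_box = labels[second]   # step 302: user's face region
    if target_box[2:] != source_box[2:]:
        raise ValueError("labelled regions must have the same size in this sketch")
    patch = crop(user_frame, source_box)
    top, left, height, width = target_box
    result = [row[:] for row in video_frame]
    for dy in range(height):      # step 303: swap content by label
        result[top + dy][left:left + width] = patch[dy]
    return result


video_frame = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
user_frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
labels = {"a": (0, 0, 2, 2), "b": (1, 1, 2, 2)}
result = replace_by_label(video_frame, user_frame, labels)
```

Because the replacement step only consults the labels, the identification stage and the capture stage can each attach their label independently before the swap is performed.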

In this embodiment, a real-time video of the user is recorded and used to replace a character's face in a TV series: the user records through the camera, the recorded real-time video replaces the character avatar in the series, and dynamic replacement between dynamic videos is achieved.

In another implementation, before the controlling of the video of the avatar area in the dynamic video to be output at the position of the character avatar to be replaced in the video content, the method may further include: storing the target video file. This step stores the original, unprocessed video resource in a database so that it can provide the necessary video resource support later.

Fig. 4 is a flowchart of another multimedia processing method according to an embodiment of the present invention. Referring to fig. 4, in another implementation, a multimedia processing method may include:

Step 401: Obtain a target video file, wherein the target video file comprises video content and at least one role tag, and different role tags correspond to different characters in the video content.

Step 402: Identify at least two character avatars to be replaced in the video content based on at least two role tags selected by a user.

In this implementation, different users can simultaneously replace the faces of different characters in the same video content, which makes the method more entertaining. For example, two friends can replace the faces of role A and role B in a film or television work and act opposite each other. In implementation, the character assigned to each user must of course be determined in advance, to ensure that the face of role A is always replaced by the face of the first user and the face of role B is always replaced by the face of the second user.

Step 403: While the video content is being output, acquire dynamic videos of at least two users in real time through at least two different image acquisition devices in one-to-one correspondence with the users, and control the videos of at least two avatar areas in the dynamic videos to be output respectively at the positions of the at least two character avatars to be replaced in the video content.

In this implementation, the two users can capture their own face videos through two connected electronic devices, so that each user's face video replaces the face of the corresponding role in the video content in real time.
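
The requirement that each role be bound in advance to one user's capture device can be sketched as a one-to-one assignment that is validated before playback. The function names, device identifiers, and data shapes below are hypothetical and chosen only for illustration.

```python
def assign_devices(role_tags, device_ids):
    """Bind each selected role tag to exactly one capture device."""
    if len(role_tags) < 2:
        raise ValueError("at least two role tags are required")
    if len(role_tags) != len(device_ids):
        raise ValueError("each role needs exactly one capture device")
    if len(set(role_tags)) != len(role_tags) or len(set(device_ids)) != len(device_ids):
        raise ValueError("role tags and devices must be distinct")
    return dict(zip(role_tags, device_ids))


def faces_for_frame(assignment, avatar_boxes, latest_faces):
    """Return {role_tag: (box, face)} for the roles visible in this frame."""
    return {
        role: (avatar_boxes[role], latest_faces[device])
        for role, device in assignment.items()
        if role in avatar_boxes
    }


assignment = assign_devices(["A", "B"], ["phone-1", "phone-2"])
plan = faces_for_frame(
    assignment,
    avatar_boxes={"A": (10, 20, 32, 32)},   # only role A is on screen
    latest_faces={"phone-1": "face-of-user-1", "phone-2": "face-of-user-2"},
)
```

Fixing the assignment before playback is what guarantees that role A always receives the first user's face and role B the second user's, as the text requires.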

This multimedia processing method allows multiple users to take part in face replacement on the same video file at the same time, which makes the method more entertaining and can attract more users.

In one specific implementation, the multimedia processing method is implemented by two parts, a server and a client. The client includes a front and rear camera video recording module, a dynamic face recognition module, a video positioning function module, a video replacement function module, and a video playing function module; the server includes a label identification module, a video resource storage module, and a keyword matching module.

First, the user selects a video resource at the client by searching for the video name; the corresponding video resource is found in the video resource storage database according to the searched name label.

Replaceable character avatars in the series are then identified according to the replaceable role tags, and are cut out and erased after identification. Every occurrence of the character avatar throughout the series must be identified so that it can be replaced.

After identification is complete, the identified avatar is cut out and erased to remove the original character's avatar, or it is left in place; in either case the original character's avatar is recorded as a, providing a marker for the subsequent replacement.

Next, camera permission is granted, and the front or rear camera is turned on while the video plays to record the user's dynamic video in real time. Every expression of the user is captured in real time, and the user can perform the expressions and voice according to the plot and the protagonist's lines. This video is recorded as b.

The video replacement function module then replaces the a-labeled video with the b-labeled video, that is, the protagonist's avatar is replaced with the video recorded by the user, achieving dynamic replacement between videos.

While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present invention is not limited by the illustrated ordering of acts, as some steps may occur in other orders or concurrently with other steps in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

The method is described in detail in the embodiments disclosed above, and the method of the present invention can be implemented by various types of apparatuses, so that the present invention also discloses an apparatus, and the following detailed description will be given of specific embodiments.

Fig. 5 is a schematic structural diagram of a multimedia processing apparatus according to an embodiment of the present invention, and referring to fig. 5, the multimedia processing apparatus 50 may include:

a file determining module 501, configured to obtain a target video file, where the target video file includes video content and at least one role tag, and different role tags correspond to different personas in the video content.

A content identification module 502, configured to identify a character avatar to be replaced in the video content based on the character tag selected by the user.

The video processing module 503 is configured to acquire a dynamic video of the user in real time during the video content output process, and control the video of the avatar area in the dynamic video to be output at the position of the avatar to be replaced in the video content.

The multimedia processing apparatus of this embodiment can dynamically replace the face of a designated character in the video content, and the replacement face is itself a dynamic video. This removes the limitation of static-picture replacement, provides a more entertaining application, improves the realism of the replaced video content, and improves the user experience.

FIG. 6 is a schematic structural diagram of a video processing module. In one implementation, the video processing module 503 may include: an image processing module 601, configured to erase the content at the position of the character avatar to be replaced, so as to obtain a to-be-supplemented video that does not contain the avatar to be replaced; and a filling processing module 602, configured to fill the video of the avatar area in the dynamic video into a blank area in the to-be-supplemented video, where the blank area corresponds to the position of the character avatar to be replaced.

Fig. 7 is a schematic structural diagram of another video processing module. In one implementation, the video processing module 503 may include: a content marking module 701, configured to mark the identified character head image to be replaced with a first tag, and mark a video in a head image area in the dynamic video with a second tag; a replacement processing module 702, configured to replace, in the video content, the content corresponding to the first tag with the content corresponding to the second tag. In one implementation, the multimedia processing apparatus may further include: and the storage module is used for storing the target video file.

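In one implementation, the content identification module is specifically configured to identify at least two character avatars to be replaced in the video content based on at least two role tags selected by a user; the video processing module is specifically configured to: while the video content is being output, acquire dynamic videos of at least two users in real time through at least two different image acquisition devices in one-to-one correspondence with the users, and control the videos of at least two avatar areas in the dynamic videos to be output respectively at the positions of the at least two character avatars to be replaced in the video content.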

For specific implementation and other possible implementations of the multimedia processing apparatus and each module thereof, reference may be made to the content descriptions of corresponding parts in the method embodiments, and no repeated description is given here.

The multimedia processing apparatus in any of the above embodiments includes a processor and a memory, the file determination module, the content identification module, the video processing module, the image processing module, the filling processing module, the content marking module, the replacement processing module, the storage module, and the like in the above embodiments are all stored in the memory as program modules, and the processor executes the program modules stored in the memory to implement corresponding functions.

The processor includes one or more kernels; a kernel calls the corresponding program module from the memory, and the processing of the return visit data is realized by adjusting the kernel parameters.

The memory may include forms of computer-readable media such as volatile memory, Random Access Memory (RAM), and/or non-volatile memory such as Read Only Memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.

An embodiment of the present invention provides a storage medium on which a program is stored, which when executed by a processor implements the multimedia processing method described in the above embodiments.

The embodiment of the invention provides a processor, wherein the processor is used for running a program, and the program executes the multimedia processing method in the embodiment when running.

Further, the present embodiment provides an electronic device, which includes a processor and a memory. Wherein the memory is used for storing executable instructions of the processor, and the processor is configured to execute the multimedia processing method described in the above embodiments via executing the executable instructions.

The embodiments in this description are presented progressively: each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, it is described only briefly; for relevant details, refer to the description of the method.

It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for multimedia processing, comprising:

2. The method according to claim 1, wherein the controlling of the video of the avatar area in the dynamic video to be output at the position of the avatar to be replaced in the video content comprises:

3. The method according to claim 1, wherein the controlling of the video of the avatar area in the dynamic video to be output at the position of the avatar to be replaced in the video content comprises:

marking the identified character head image to be replaced with a first label;

4. The method according to claim 1, further comprising, before the video for controlling the avatar area in the dynamic video is output at the position of the avatar to be replaced in the video content, the steps of:

and storing the target video file.

5. The method of claim 1, wherein the identifying the character avatar to be replaced in the video content based on the user-selected character tag comprises:

6. A multimedia processing apparatus, comprising:

7. The multimedia processing apparatus of claim 6, wherein the video processing module comprises:

8. The multimedia processing apparatus of claim 6, wherein the video processing module comprises:

9. The multimedia processing apparatus according to claim 6, further comprising:

and the storage module is used for storing the target video file.

10. The multimedia processing apparatus of claim 6, wherein the content identification module is specifically configured to: identifying at least two character images to be replaced in the video content based on at least two character tags selected by a user;