CN111669647A - Real-time video processing method, device, equipment and storage medium


Info

Publication number
CN111669647A
CN111669647A
Authority
CN
China
Prior art keywords
frame
video
style conversion
face
real
Legal status
Granted
Application number
CN202010537321.0A
Other languages
Chinese (zh)
Other versions
CN111669647B (en)
Inventor
李鑫
李甫
林天威
何栋梁
张赫男
孙昊
文石磊
丁二锐
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010537321.0A
Publication of CN111669647A
Application granted
Publication of CN111669647B
Active legal status
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a real-time video processing method, apparatus, device, and storage medium, relating to the technical fields of digital image processing and deep learning. The specific implementation scheme is as follows: add video frames to a video frame set and obtain a current processing frame from the set; input the current processing frame into a face style conversion model and obtain the first-type style conversion frame output by the model; taking the current processing frame and the first-type style conversion frame as starting points, generate the corresponding second-type style conversion frames according to the positional relationship between the face key points in each of a set number of subsequent video frames in the set and those in its preceding video frame; and after obtaining a new current processing frame from the set, return to the operation of inputting the current processing frame into the face style conversion model, taking the first-type and second-type style conversion frames as the real-time video processing result. The technical scheme of the embodiments of the application can generate a matched style face in real time based on the real face in a video.

Description

Real-time video processing method, device, equipment and storage medium
Technical Field
The embodiments of the application relate to image processing and deep learning technologies, in particular to digital image processing technology, and specifically to a real-time video processing method, apparatus, device, and storage medium.
Background
With the continuous improvement of living standards, users' entertainment demands have become increasingly diversified, and transforming the real faces in videos into cartoon-style faces has attracted growing attention and affection from users.
In the prior art, a matched style face is generated offline from the real face in a video, or a pre-generated fixed style face is driven in real time by the real face in the video to produce a style face whose expression is consistent with that of the real face. Neither method can generate, in real time, a style face matched with the real face in the video.
Disclosure of Invention
The embodiments of the application provide a real-time video processing method, apparatus, device, and storage medium, which realize real-time generation of a style face matched with the real face in a video.
In a first aspect, an embodiment of the present application provides a method for processing a real-time video, including:
adding a video frame acquired in real time into a video frame set, and acquiring a current processing frame from the video frame set, wherein the video frame comprises a real human face;
inputting the current processing frame into a face style conversion model, and acquiring a first type of style conversion frame output by the face style conversion model, wherein the style conversion frame comprises a style face;
taking the current processing frame and the first-type style conversion frame as starting points, generating a set number of second-type style conversion frames according to the positional relationship between the face key points in each of a set number of subsequent video frames in the video frame set and those in its preceding video frame;
and after acquiring a new current processing frame from the video frame set, returning to execute the operation of inputting the current processing frame into the face style conversion model, and taking the first type style conversion frame and the second type style conversion frame as real-time video processing results.
In a second aspect, an embodiment of the present application further provides a device for processing a real-time video, including:
the acquisition module is used for adding the video frames acquired in real time into the video frame set and acquiring the current processing frame from the video frame set, wherein the video frames comprise real faces;
the first conversion module is used for inputting the current processing frame into the face style conversion model and acquiring a first type of style conversion frame output by the face style conversion model, wherein the style conversion frame comprises a style face;
the second transformation module is used for taking the current processing frame and the first-type style conversion frame as starting points and generating a set number of second-type style conversion frames according to the positional relationship between the face key points in each of a set number of subsequent video frames in the video frame set and those in its preceding video frame;
and the loop module is used for returning to and executing the operation of inputting the current processing frame into the face style conversion model after acquiring a new current processing frame from the video frame set, and taking the first-type and second-type style conversion frames as the real-time video processing result.
In a third aspect, an embodiment of the present application further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of processing real-time video as provided in any of the embodiments of the present application.
In a fourth aspect, the embodiments of the present application further provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the processing method of real-time video provided in any of the embodiments of the present application.
According to the technical scheme of the embodiments of the application, video frames acquired in real time are added to a video frame set, and a current processing frame containing a real face is obtained from the set; the current processing frame is input into a face style conversion model, and the first-type style conversion frame output by the model, containing a style face, is obtained; taking the current processing frame and the first-type style conversion frame as starting points, a set number of second-type style conversion frames are generated according to the positional relationship between the face key points in each of a set number of subsequent video frames in the set and those in its preceding video frame; and after a new current processing frame is obtained from the set, the operation of inputting the current processing frame into the face style conversion model is executed again, and the first-type and second-type style conversion frames are taken as the real-time video processing result. A style face matched with the real face is thereby generated in real time from the real face in the video, solving the prior-art problem that such a style face cannot be generated in real time.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a flow chart of a method for processing real-time video according to a first embodiment of the present application;
fig. 2a is a flow chart of a method for processing real-time video according to a second embodiment of the present application;
fig. 2b is a schematic diagram of a real-time video processing procedure according to the second embodiment of the present application;
fig. 3 is a schematic structural diagram of a real-time video processing apparatus according to a third embodiment of the present application;
fig. 4 is a block diagram of an electronic device for implementing a method for processing real-time video according to an embodiment of the present application.
Detailed Description
The following describes exemplary embodiments of the present application with reference to the accompanying drawings, including various details of the embodiments to facilitate understanding; these should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted in the following for clarity and conciseness.
First embodiment
Fig. 1 is a flowchart of a real-time video processing method in the first embodiment of the present application. The technical solution of this embodiment is suitable for generating, in real time, a style face matched with the real face in a video. The method may be executed by a real-time video processing apparatus, which may be implemented in software and/or hardware and is generally integrated in an electronic device, for example a terminal device. The method of this embodiment specifically includes the following steps:
and step 110, adding the video frames acquired in real time into a video frame set, and acquiring a current processing frame from the video frame set.
Wherein, the video frame comprises a real face.
In this embodiment, the video frame may be a certain frame of image acquired in the process of shooting a video by the terminal device, and exemplarily, the video frame may be a frame of image acquired in real time in the process of recording the video, or may be a frame of image acquired in real time in the process of live broadcasting the video. The video frame set is a set for storing video frames acquired in real time, and usually all video frames of a video are put into one video frame set so as to distinguish the video frames of different videos through different video frame sets.
Optionally, adding the video frames acquired in real time to the video frame set may include: and in the video recording process, responding to a face style conversion request, and adding the video frames acquired in real time into a video frame set.
In this optional embodiment, the video recording process refers to a process in which video is only shot and the shot content is not played in real time. During video recording, if the user clicks the face style conversion option on the current shooting page, a face style conversion request is triggered and responded to; the video frames collected in real time after the click are then added to the video frame set, or alternatively all video frames collected in real time since recording began are added to the set.
Optionally, adding the video frames acquired in real time to the video frame set may include: and in the live video broadcasting process, responding to a face style conversion request, and adding the video frames acquired in real time into a video frame set.
In this optional embodiment, the live video broadcast process refers to a process in which the shot content is played in real time while the video is being shot. During the live broadcast, if the user clicks the face style conversion option on the current live broadcast page, a face style conversion request is triggered and responded to; the video frames collected in real time after the click are then added to the video frame set, or alternatively all video frames collected in real time since the broadcast began are added to the set.
In this embodiment, after the video frames acquired in real time are added to the video frame set, the earliest-acquired video frame that has not yet undergone face style conversion is taken from the set, in order of acquisition time, as the current processing frame.
And 120, inputting the current processing frame into a face style conversion model, and acquiring a first type of style conversion frame output by the face style conversion model.
Wherein the style conversion frame comprises a style face.
In this embodiment, the face style conversion model is configured to generate, from the real face in an input video frame, a style face matched with it, and to output the first-type style conversion frame containing that style face. The first-type style conversion frame is the video frame obtained by replacing the real face in the current processing frame with the style face. Each style face matches one unique real face, that is, each real face has its own dedicated style face, and the style face is consistent with the real face in face size and in the facial features it contains.
And step 130, taking the current processing frame and the first-type style conversion frame as starting points, generating a set number of second-type style conversion frames according to the positional relationship between the face key points in each of a set number of subsequent video frames in the video frame set and those in its preceding video frame.
In this embodiment, if the face style conversion model alone were used to perform style conversion on the real faces in all video frames acquired in real time, the computation would be heavy and time-consuming, and a real-time effect could not be achieved. Therefore, style conversion of the real faces based on the face style conversion model is combined with style conversion based on the positional variation of the face key points between two adjacent video frames, and the two approaches are used alternately to generate, in real time, style faces matched with the real faces in the video frames.
In this embodiment, after the first-type style conversion frame matched with the current processing frame is obtained through the face style conversion model, a set number of second-type style conversion frames, matched respectively with the set number of video frames that follow the current processing frame in the video frame set, are generated according to the positional variation of the face key points between each pair of adjacent video frames. The above process is then repeated to generate the first-type or second-type style conversion frame corresponding to each subsequent video frame in the set.
Optionally, taking the current processing frame and the first-type style conversion frame as starting points and generating the set number of second-type style conversion frames according to the positional relationship between the face key points in the set number of subsequent video frames and their preceding video frames may include: taking the current processing frame as the processing starting point frame, and acquiring the video frame following the processing starting point frame in the video frame set; generating a face key point transformation matrix according to the image positions of the face key points in the following video frame and in the processing starting point frame; generating the second-type style conversion frame of the following video frame according to the face key point transformation matrix and the first-type or second-type style conversion frame matched with the processing starting point frame; and taking the following video frame as the new processing starting point frame and returning to the operation of acquiring the video frame following the processing starting point frame in the video frame set, until the number of following video frames processed reaches the set number.
This optional embodiment provides a specific manner of generating the set number of second-type style conversion frames, taking the current processing frame and the first-type style conversion frame as starting points, according to the positional relationship between the face key points in the subsequent video frames and their preceding frames. The specific process is as follows:
First, the current processing frame is taken as the processing starting point frame, and the video frame following it is acquired from the video frame set. The coordinate variation of each face key point between the two adjacent frames is calculated from the position coordinates of the face key points in the following video frame and in the processing starting point frame, yielding a face key point transformation matrix. The coordinates of the face key points of the style face in the first-type style conversion frame matched with the processing starting point frame are then adjusted according to this matrix, generating the second-type style conversion frame of the following video frame. Next, it is judged whether the number of consecutively generated second-type style conversion frames equals the set number. If so, the operation stops for the video frames after the following video frame in the set; if not, the following video frame is taken as the new processing starting point frame, the operation of acquiring the next video frame from the set is executed again, and the coordinates of the face key points of the style face in the second-type style conversion frame matched with the processing starting point frame are adjusted according to the transformation matrix of the two adjacent frames, generating the second-type style conversion frame of that next video frame.
It should be noted that the set number in the embodiments of the present application may be 1, 2, or another value set as required. When a second-type style conversion frame is generated from the face key point transformation matrix, only the positions of the face key points of the style face in the style conversion frame matched with the previous video frame need to be adjusted, which requires little computation and time; therefore, the larger the set number, the faster face style conversion is performed on the video frames in the set. However, to keep the style face highly matched with the real face, the set number should not be too large.
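As an illustration of this step, the following Python sketch fits a transformation from the key points of the processing starting point frame to those of the following frame and applies it to the matched style conversion frame. It is a minimal sketch under stated assumptions: OpenCV's partial-affine fit stands in for the patent's unspecified face key point transformation matrix, and warping the whole style frame approximates the per-key-point coordinate adjustment described above.

```python
import cv2
import numpy as np

def keypoint_transform_matrix(prev_pts: np.ndarray, next_pts: np.ndarray) -> np.ndarray:
    """Estimate a 2x3 matrix mapping the face key points of the processing
    starting point frame (prev_pts) onto those of the following frame
    (next_pts); both are (N, 2) float32 arrays of image coordinates."""
    matrix, _inliers = cv2.estimateAffinePartial2D(prev_pts, next_pts)
    return matrix

def second_type_style_frame(matched_style_frame: np.ndarray,
                            matrix: np.ndarray) -> np.ndarray:
    """Generate the second-type style conversion frame of the following
    video frame by moving the style face of the matched style conversion
    frame according to the key point transformation matrix."""
    h, w = matched_style_frame.shape[:2]
    return cv2.warpAffine(matched_style_frame, matrix, (w, h))
```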
And step 140, after acquiring a new current processing frame from the video frame set, returning to execute the operation of inputting the current processing frame into the face style conversion model, and taking the first type style conversion frame and the second type style conversion frame as real-time video processing results.
In this embodiment, after the number of second-type style conversion frames generated consecutively from the face key point transformation matrices of adjacent video frames reaches the set number, a new current processing frame is acquired from the video frame set and the operation of inputting the current processing frame into the face style conversion model is executed again: the first-type style conversion frame matched with the new current processing frame is obtained through the model, and then a set number of second-type style conversion frames, matched respectively with the set number of video frames following it in the set, are generated from the key point transformation matrices of adjacent frames.
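Reusing the two helpers sketched above, the alternation described in steps 110 through 140 can be summarized as follows. This is a schematic sketch, not the claimed implementation: `detect_landmarks` and the value of `SET_NUMBER` are assumptions for illustration.

```python
from collections import deque

SET_NUMBER = 2  # assumed value; the patent leaves the set number configurable

def process_stream(frames, style_model, detect_landmarks):
    """frames: iterable of real-time video frames containing a real face.
    style_model: callable returning the first-type style conversion frame.
    detect_landmarks: callable returning (N, 2) face key point coordinates.
    Yields first-type and second-type style conversion frames in order."""
    frame_set = deque(frames)  # the "video frame set"
    while frame_set:
        current = frame_set.popleft()        # current processing frame
        styled = style_model(current)        # first-type style conversion frame
        yield styled
        prev_frame, prev_styled = current, styled
        for _ in range(SET_NUMBER):          # set number of second-type frames
            if not frame_set:
                break
            following = frame_set.popleft()
            m = keypoint_transform_matrix(detect_landmarks(prev_frame),
                                          detect_landmarks(following))
            styled = second_type_style_frame(prev_styled, m)
            yield styled
            prev_frame, prev_styled = following, styled
```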
Optionally, taking the first-type and second-type style conversion frames as the real-time video processing result may include: recording and playing the first-type and second-type style conversion frames in real time, and generating a recorded video.
In this optional embodiment, if the video frames included in the video frame set are acquired in real time during the video recording process, the first type of style conversion frames generated according to the face style conversion model and the second type of style conversion frames generated according to the face key point conversion matrix are recorded and played in real time, and a recorded video is generated.
Optionally, taking the first-type and second-type style conversion frames as the real-time video processing result may include: generating a live video stream according to the first-type and second-type style conversion frames, and sending the live video stream to a live broadcast server for live video broadcast.
In this optional embodiment, if the video frames included in the video frame set are acquired in real time during the live video broadcast process, a live video stream is generated according to the first type of style conversion frame generated by the face style conversion model and the second type of style conversion frame generated by the face key point conversion matrix, and the live video stream is sent to a live broadcast server for live video broadcast.
According to the technical scheme of this embodiment, video frames acquired in real time are added to a video frame set, and a current processing frame containing a real face is obtained from the set; the current processing frame is input into a face style conversion model, and the first-type style conversion frame output by the model, containing a style face, is obtained; taking the current processing frame and the first-type style conversion frame as starting points, a set number of second-type style conversion frames are generated according to the positional relationship between the face key points in each of a set number of subsequent video frames in the set and those in its preceding video frame; and after a new current processing frame is obtained from the set, the operation of inputting the current processing frame into the face style conversion model is executed again, and the first-type and second-type style conversion frames are taken as the real-time video processing result. A style face matched with the real face is thereby generated in real time from the real face in the video, solving the prior-art problem that such a style face cannot be generated in real time.
Second embodiment
Fig. 2a is a flowchart of a real-time video processing method in the second embodiment of the present application. This embodiment is further detailed on the basis of the above embodiment, and provides specific steps for determining the face style conversion model before the current processing frame is input into it, for recording and playing the first-type and second-type style conversion frames in real time, and for generating a live video stream from the first-type and second-type style conversion frames and sending it to a live broadcast server. The real-time video processing method of the second embodiment is described below with reference to fig. 2a, and includes the following steps:
and step 210, determining a face style conversion model.
In this embodiment, a preset machine learning model is subjected to model training to generate a face style conversion model that can generate a style face matched with a real face according to the real face in an input video frame, and output a first type of style conversion frame including the style face.
Optionally, determining the face style conversion model may include: acquiring a training sample set, the training sample set including a plurality of sample image pairs, each sample image pair including an original image and a transformed image; and training a set machine learning model with the sample image pairs in the training sample set to obtain the face style conversion model, where the original image contains a real face and the transformed image contains a style face matched with the real face.
In this optional embodiment, a number of high-quality sample image pairs are obtained to form the training sample set. Each sample image pair includes an original image and a transformed image: the original image contains a real face, such as the face in the current processing frame in fig. 2b, and the transformed image contains a style face matched with that real face, such as the face in the first-type style conversion frame in fig. 2b. The set machine learning model is then trained with the sample image pairs in the training sample set, so that it learns to generate, from the real face in an input original image, the style face matched with it in the transformed image; the trained machine learning model is the face style conversion model.
Optionally, the machine learning model includes: a generative adversarial network.
In this alternative embodiment, the generative adversarial network includes a generator and a discriminator. The generator is configured to generate a style face matched with the real face in an original image sample, and the discriminator is configured to judge whether the style face generated by the generator is the style face in the corresponding transformed image sample. The network is trained until the style face generated by the generator is indistinguishable from the style face in the transformed image sample, i.e., the discriminator can no longer tell them apart; at that point the adversarial game between generator and discriminator has converged, and the generative adversarial network is the face style conversion model.
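For concreteness, one paired adversarial training step in this spirit is sketched below in PyTorch. The conditional discriminator, the BCE-plus-L1 losses, and the weight 100.0 are assumptions borrowed from common image-to-image GAN practice; the patent does not prescribe an architecture or loss.

```python
import torch
import torch.nn as nn

def train_step(generator, discriminator, g_opt, d_opt, real_img, style_img):
    """One paired training step: real_img is the original image (real face),
    style_img is the transformed image (matched style face)."""
    bce = nn.BCEWithLogitsLoss()
    l1 = nn.L1Loss()

    # Discriminator: real (original, style) pairs -> 1, generated pairs -> 0
    fake = generator(real_img)
    d_real = discriminator(real_img, style_img)
    d_fake = discriminator(real_img, fake.detach())
    d_loss = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: fool the discriminator and stay close to the target style face
    d_fake = discriminator(real_img, fake)
    g_loss = bce(d_fake, torch.ones_like(d_fake)) + 100.0 * l1(fake, style_img)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```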
In this embodiment, the style face in the first-type style conversion frame output by the face style conversion model matches the real face closely, being consistent with it in face size and facial features, so the real face can be vividly converted into a style face, for example a cartoon-style 2D (anime) face, which attracts users' attention and adds interest.
And step 220, adding the video frames acquired in real time into the video frame set, and acquiring the current processing frame from the video frame set.
As can be seen from the explanation of step 110 in the first embodiment, adding the video frames collected in real time to the video frame set may refer to adding the video frames collected in real time to the video frame set in response to the face style conversion request during the video recording process, or may refer to adding the video frames collected in real time to the video frame set in response to the face style conversion request during the video live broadcast process.
And step 230, inputting the current processing frame into the face style conversion model, and acquiring a first type style conversion frame output by the face style conversion model.
Exemplarily, as shown in fig. 2b, the current processing frame is input into the face style conversion model, a 2D anime-style face matched with the real face in the current processing frame is generated by the trained model, and the first-type style conversion frame containing that face is obtained.
And step 240, taking the current processing frame as a processing starting point frame, and acquiring a video frame subsequent to the processing starting point frame in the video frame set.
And step 250, generating a face key point transformation matrix according to the image positions of each face key point in the subsequent video frame and the processing starting point frame in the corresponding video frame.
For example, as shown in fig. 2b, based on the position coordinates of each face key point in the subsequent video frame in the video frame and the position coordinates of each face key point in the processing start frame in the video frame, the coordinate variation of each face key point in the subsequent video frame with respect to each face key point in the processing start frame, for example, the coordinate variation of each key point of the nose, the coordinate variation of each key point of the face contour, and the like, is calculated, and the face key point transformation matrix is generated.
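In equation form, writing $p_i^{(t)}$ and $p_i^{(t+1)}$ for the image coordinates of the $i$-th face key point in the processing starting point frame and the subsequent frame, one reading of this step (assuming an affine model, which the patent does not mandate) is the least-squares fit:

```latex
M^{*} \;=\; \arg\min_{M \in \mathbb{R}^{2\times 3}} \;\sum_{i=1}^{N}
\left\lVert\, M \begin{pmatrix} p_i^{(t)} \\ 1 \end{pmatrix} - p_i^{(t+1)} \right\rVert_2^{2}
```

The fitted $M^{*}$ is then applied to the style-face key points of the style conversion frame matched with the processing starting point frame.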
And step 260, generating a second-class style conversion frame of the next video frame according to the face key point conversion matrix and the first-class style conversion frame or the second-class style conversion frame matched with the processing starting point frame.
In this embodiment, coordinates of each face key point of the style face in the first type of style conversion frame matched with the processing start point frame are correspondingly adjusted according to the coordinate variation of each face key point in the face key point conversion matrix, so as to generate a second type of style conversion frame of the subsequent video frame.
Step 270, determining whether the number of the second-type style conversion frames generated continuously is equal to the set number, if yes, executing step 290, otherwise, executing step 280.
Step 280, the next video frame is taken as the processing starting point frame, and the step 240 is executed again.
Step 290, obtaining a new current processing frame from the video frame set, and returning to execute step 230.
And step 2110, taking the first-class style transformation frame and the second-class style transformation frame as a real-time video processing result.
In this embodiment, after a set number of second-type style conversion frames are continuously generated according to the face keypoint conversion matrix, a new current processing frame is obtained from the video frame set, and the process returns to execute step 230, and the above process is repeated to generate a first-type style conversion frame or a second-type style conversion frame matched with the unprocessed video frames in the video frame set.
In this embodiment, step 290 and step 2110 may be executed in parallel by two threads: in one thread, a new current processing frame is obtained from the video frame set and the first-type style conversion frame matched with it is obtained through the face style conversion model; meanwhile, whenever the first-type and second-type style conversion frames already obtained satisfy the processing condition, they can be processed in the other thread.
In this embodiment, taking the first-type style transformation frame and the second-type style transformation frame as the real-time video processing result may include: and recording and playing the first type style conversion frame and the second type style conversion frame in real time to generate a recorded video, or generating a live broadcast video stream according to the first type style conversion frame and the second type style conversion frame, and sending the live broadcast video stream to a live broadcast server for live broadcast.
Optionally, recording and playing the first-type and second-type style conversion frames in real time may include: storing the first-type and second-type style conversion frames generated in real time in a set buffer queue in sequence; and when the buffer queue meets a preset hard delay condition, fetching the first-type or second-type style conversion frames from the buffer queue in sequence for real-time recording and playing.
In this optional embodiment, in a video recording scenario, the first-type and second-type style conversion frames generated in real time may be stored in the set buffer queue in order of generation time. When the number of style conversion frames in the buffer queue reaches a preset number, for example 20, or the time since style conversion frames were last fetched from the queue for playing reaches a preset interval, for example 3 seconds, the first-type or second-type style conversion frames are fetched from the queue in sequence for real-time recording and playing.
Optionally, generating a live video stream according to the first-type and second-type style conversion frames and sending it to the live broadcast server may include: storing the first-type and second-type style conversion frames generated in real time in a set buffer queue in sequence; and when the buffer queue meets a preset hard delay condition, fetching the first-type or second-type style conversion frames from the buffer queue in sequence to generate a live video stream, and sending the live video stream to the live broadcast server.
In this optional embodiment, in a live video broadcast scenario, the first-type and second-type style conversion frames generated in real time may be stored in the set buffer queue in order of generation time. When the number of style conversion frames in the buffer queue reaches a preset number, for example 20, or the time since style conversion frames were last fetched from the queue reaches a preset interval, for example 3 seconds, the first-type or second-type style conversion frames are fetched from the queue in sequence to generate a live video stream, which is sent to the live broadcast server.
In this embodiment, storing style conversion frames into the buffer queue and fetching them from the queue for playing may be executed in parallel by two threads: as first-type or second-type style conversion frames are obtained, one thread stores them in the buffer queue in sequence, while, whenever the number of style conversion frames in the queue reaches the preset number or the interval since the last fetch equals the preset time, the other thread fetches style conversion frames from the queue in sequence.
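A possible shape for such a buffer, assuming Python threading and the example thresholds above (20 frames or a 3-second interval); this is a sketch of the hard delay condition, not the patent's mandated design:

```python
import threading
import time
from collections import deque

class HardDelayBuffer:
    """Buffer queue that releases style conversion frames either when
    enough frames have accumulated or enough time has passed."""
    def __init__(self, min_frames=20, max_interval=3.0):
        self.frames = deque()
        self.lock = threading.Lock()
        self.min_frames = min_frames
        self.max_interval = max_interval
        self.last_fetch = time.monotonic()

    def put(self, frame):
        # Producer thread: store frames in order of generation time.
        with self.lock:
            self.frames.append(frame)

    def _hard_delay_met(self):
        return (len(self.frames) >= self.min_frames or
                time.monotonic() - self.last_fetch >= self.max_interval)

    def fetch(self):
        # Consumer thread: drain frames for recording/playback or streaming
        # once the hard delay condition is met; otherwise return nothing.
        with self.lock:
            if not self._hard_delay_met():
                return []
            self.last_fetch = time.monotonic()
            out = list(self.frames)
            self.frames.clear()
            return out
```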
According to the technical scheme of this embodiment, a face style conversion model capable of generating style faces highly matched with real faces is obtained by training a preset machine learning model; style conversion based on this model is then combined with style conversion based on the face key point transformation matrix, and the two are used alternately to generate, in real time, style faces matched with the real faces in the video frames, solving the prior-art problem that a style face matched with the real face in a video cannot be generated in real time.
Third embodiment
Fig. 3 is a schematic structural diagram of a real-time video processing apparatus in the third embodiment of the present application. The apparatus includes: an acquisition module 310, a first transformation module 320, a second transformation module 330, and a loop module 340.
An obtaining module 310, configured to add a video frame acquired in real time to a video frame set, and obtain a current processing frame from the video frame set, where the video frame includes a real face;
the first transformation module 320 is configured to input the current processed frame into a face style transformation model, and obtain a first type of style transformation frame output by the face style transformation model, where the style transformation frame includes a style face;
a second transformation module 330, configured to take the current processing frame and the first-type style conversion frame as starting points and generate a set number of second-type style conversion frames according to the positional relationship between the face key points in each of a set number of subsequent video frames in the video frame set and those in its preceding video frame;
and the loop module 340 is configured to, after acquiring a new current processing frame from the video frame set, return to the operation of inputting the current processing frame into the face style conversion model, and take the first-type and second-type style conversion frames as the real-time video processing result.
According to the technical scheme of this embodiment, video frames acquired in real time are added to a video frame set, and a current processing frame containing a real face is obtained from the set; the current processing frame is input into a face style conversion model, and the first-type style conversion frame output by the model, containing a style face, is obtained; taking the current processing frame and the first-type style conversion frame as starting points, a set number of second-type style conversion frames are generated according to the positional relationship between the face key points in each of a set number of subsequent video frames in the set and those in its preceding video frame; and after a new current processing frame is obtained from the set, the operation of inputting the current processing frame into the face style conversion model is executed again, and the first-type and second-type style conversion frames are taken as the real-time video processing result. A style face matched with the real face is thereby generated in real time from the real face in the video, solving the prior-art problem that such a style face cannot be generated in real time.
Optionally, the second transformation module 330 includes:
the subsequent video frame acquisition unit is used for taking the current processing frame as a processing starting point frame and acquiring a subsequent video frame of the processing starting point frame in the video frame set;
the transformation matrix generating unit is used for generating a face key point transformation matrix according to the image positions of each face key point in the next video frame and the processing starting point frame in the corresponding video frame;
the second-class style conversion frame generating unit is used for generating a second-class style conversion frame of the next video frame according to the face key point conversion matrix and the first-class style conversion frame or the second-class style conversion frame matched with the processing starting point frame;
and the cyclic operation unit is used for taking the next video frame as a processing starting point frame, returning and executing the operation of acquiring the next video frame of the processing starting point frame in the video frame set until the processing quantity of the next video frame reaches the set quantity.
Optionally, the apparatus further includes a model training module, configured to: before the current processing frame is input into the face style conversion model, acquire a training sample set, the training sample set including a plurality of sample image pairs, each sample image pair including an original image and a transformed image; and train a set machine learning model with the sample image pairs in the training sample set to obtain the face style conversion model;
wherein the original image contains a real face, and the transformed image contains a style face matched with the real face.
Optionally, the machine learning model includes: a generative adversarial network.
Optionally, the obtaining module 310 includes:
the recording acquisition unit is used for responding to a face style conversion request in the video recording process and adding the video frames acquired in real time into a video frame set;
wherein, the circulation module 340 includes:
and the recording and playing unit is used for recording and playing the first type style conversion frame and the second type style conversion frame in real time and generating a recorded video.
Optionally, the recording and playing unit is specifically configured to:
sequentially storing the first-class style conversion frame and the second-class style conversion frame which are generated in real time in a set buffer queue;
and when the buffer queue meets the preset hard delay condition, sequentially fetching the first-type or second-type style conversion frames from the buffer queue for real-time recording and playing.
Optionally, the obtaining module 310 includes:
the live broadcast acquisition unit is used for responding to a face style conversion request in the live broadcast process of the video and adding the video frames acquired in real time into a video frame set;
wherein, the circulation module 340 includes:
and the live broadcast playing unit is used for generating a live broadcast video stream according to the first type style conversion frame and the second type style conversion frame and sending the live broadcast video stream to a live broadcast server so as to carry out live video broadcast.
Optionally, the live broadcast playing unit is specifically configured to:
sequentially storing the first-class style conversion frame and the second-class style conversion frame which are generated in real time in a set buffer queue;
and when the cache queue meets a preset hard delay condition, sequentially acquiring a first type of style conversion frame or a second type of style conversion frame from the cache queue to generate a live broadcast video stream, and sending the live broadcast video stream to a live broadcast server.
The real-time video processing device provided by the embodiment of the application can execute the real-time video processing method provided by any embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method.
Fourth embodiment
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 4, the electronic device includes: one or more processors 401, a memory 402, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, as desired, along with multiple memories. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 4, one processor 401 is taken as an example.
Memory 402 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the method for processing real-time video provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the processing method of real-time video provided by the present application.
The memory 402, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the acquisition module 310, the first transformation module 320, the second transformation module 330, and the loop module 340 shown in fig. 3) corresponding to the real-time video processing method in the embodiment of the present application. The processor 401 executes various functional applications of the server and data processing, i.e., implements the processing method of the real-time video in the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 402.
The memory 402 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the processing electronics of the real-time video, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 402 optionally includes memory located remotely from processor 401, which may be connected to real-time video processing electronics over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the processing method of the real-time video may further include: an input device 403 and an output device 404. The processor 401, the memory 402, the input device 403 and the output device 404 may be connected by a bus or other means, and fig. 4 illustrates an example of a connection by a bus.
The input device 403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the processing electronics of the real-time video, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 404 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiments of the application, video frames acquired in real time are added to a video frame set, and a current processing frame containing a real face is obtained from the set; the current processing frame is input into a face style conversion model, and the first-type style conversion frame output by the model, containing a style face, is obtained; taking the current processing frame and the first-type style conversion frame as starting points, a set number of second-type style conversion frames are generated according to the positional relationship between the face key points in each of a set number of subsequent video frames in the set and those in its preceding video frame; and after a new current processing frame is obtained from the set, the operation of inputting the current processing frame into the face style conversion model is executed again, and the first-type and second-type style conversion frames are taken as the real-time video processing result. A style face matched with the real face is thereby generated in real time from the real face in the video, solving the prior-art problem that such a style face cannot be generated in real time.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; the present application is not limited in this respect, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (18)

1. A method of processing real-time video, comprising:
adding video frames acquired in real time to a video frame set, and acquiring a current processing frame from the video frame set, wherein the video frames comprise a real face;
inputting the current processing frame into a face style conversion model, and acquiring a first-type style conversion frame output by the face style conversion model, wherein the style conversion frame comprises a style face;
generating a set number of second-type style conversion frames, with the current processing frame and the first-type style conversion frame as starting points, according to the positional relationship between the face key points in each of a set number of subsequent video frames in the video frame set and those in its preceding video frame;
and after acquiring a new current processing frame from the video frame set, returning to the operation of inputting the current processing frame into the face style conversion model, and taking the first-type style conversion frames and the second-type style conversion frames as the real-time video processing result.
2. The method of claim 1, wherein generating a set number of second-type style conversion frames, with the current processing frame and the first-type style conversion frame as starting points, according to the positional relationship between the face key points in a set number of subsequent video frames and those in the preceding video frame in the video frame set, comprises:
taking the current processing frame as a processing start frame, and acquiring the subsequent video frame of the processing start frame in the video frame set;
generating a face key point transformation matrix according to the image positions of the face key points in the subsequent video frame and in the processing start frame;
generating a second-type style conversion frame for the subsequent video frame according to the face key point transformation matrix and the first-type or second-type style conversion frame matched with the processing start frame;
and taking the subsequent video frame as the new processing start frame, and returning to the operation of acquiring the subsequent video frame of the processing start frame in the video frame set, until the number of processed subsequent video frames reaches the set number.
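As an illustrative (non-limiting) example of claim 2, the sketch below estimates the face key point transformation matrix with OpenCV and applies it to the matched style conversion frame. `cv2.estimateAffinePartial2D` and `cv2.warpAffine` are standard OpenCV calls; the routine as a whole, and the use of an affine model in particular, are assumptions for illustration rather than the exact algorithm of this application.

```python
import cv2
import numpy as np

def warp_style_frame(start_keypoints, next_keypoints, style_frame):
    """Produce a second-type style conversion frame for the subsequent video
    frame from the style conversion frame matched with the processing start
    frame. The key point arrays (N x 2 image positions) would come from a
    face landmark detector, which is left abstract here."""
    # Face key point transformation matrix: a 2x3 affine estimated by least
    # squares, mapping the start frame's key points onto the subsequent frame's.
    matrix, _ = cv2.estimateAffinePartial2D(
        np.asarray(start_keypoints, dtype=np.float32),
        np.asarray(next_keypoints, dtype=np.float32),
    )
    height, width = style_frame.shape[:2]
    # Warp the existing style frame instead of rerunning the conversion model.
    return cv2.warpAffine(style_frame, matrix, (width, height))
```

Warping an already-stylized frame is far cheaper than a model inference, which is why the subsequent frames can be handled this way in real time.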
3. The method of claim 1, further comprising, before inputting the current processing frame into the face style conversion model:
acquiring a training sample set, wherein the training sample set comprises a plurality of sample image pairs, and each sample image pair comprises an original image and a transformed image;
and training a set machine learning model by using each sample image pair in the training sample set, to obtain the face style conversion model;
wherein the original image comprises a real face, and the transformed image comprises a style face matched with the real face.
4. The method of claim 3, wherein the machine learning model comprises: a generative adversarial network.
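As an illustrative (non-limiting) example of claims 3 and 4, a paired image-to-image GAN can be trained on such sample image pairs. The PyTorch sketch below follows the common pix2pix-style recipe; the conditional discriminator (which sees the original and the style image concatenated on the channel axis) and the L1 weight of 100 are conventions borrowed from that literature, not details taken from this application.

```python
import torch
import torch.nn as nn

def train_step(generator, discriminator, g_opt, d_opt, original, transformed):
    """One training step on a batch of sample image pairs: `original` holds
    real faces, `transformed` holds the matching style faces."""
    adv_loss, l1_loss = nn.BCEWithLogitsLoss(), nn.L1Loss()

    # Discriminator: real pair should score high, generated pair low.
    fake = generator(original)
    d_real = discriminator(torch.cat([original, transformed], dim=1))
    d_fake = discriminator(torch.cat([original, fake.detach()], dim=1))
    d_loss = (adv_loss(d_real, torch.ones_like(d_real)) +
              adv_loss(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: fool the discriminator while staying close to the style face.
    d_fake = discriminator(torch.cat([original, fake], dim=1))
    g_loss = (adv_loss(d_fake, torch.ones_like(d_fake)) +
              100.0 * l1_loss(fake, transformed))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```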
5. The method of any one of claims 1-4, wherein adding the video frames acquired in real time to the video frame set comprises:
in the video recording process, in response to a face style conversion request, adding the video frames acquired in real time to the video frame set;
and wherein taking the first-type style conversion frames and the second-type style conversion frames as the real-time video processing result comprises:
recording and playing the first-type style conversion frames and the second-type style conversion frames in real time, and generating a recorded video.
6. The method of claim 5, wherein recording and playing the first-type style conversion frames and the second-type style conversion frames in real time comprises:
sequentially storing the first-type style conversion frames and the second-type style conversion frames generated in real time in a set buffer queue;
and when the buffer queue meets a preset hard delay condition, sequentially acquiring the first-type or second-type style conversion frames from the buffer queue for real-time recording and playing.
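As an illustrative (non-limiting) reading of claim 6, the sketch below implements the buffer queue with Python's thread-safe `queue.Queue`, and interprets the "preset hard delay condition" as a minimum number of buffered frames before playback begins; that interpretation, the 30-frame threshold, and the `render` callback are all assumptions for illustration.

```python
import queue
import time

HARD_DELAY_FRAMES = 30  # assumed reading of the "preset hard delay condition"
buffer_queue = queue.Queue()

def store(style_frame):
    """Producer side: first-type and second-type style conversion frames are
    stored sequentially, in generation order."""
    buffer_queue.put(style_frame)

def play(render):
    """Consumer side: start draining only once the delay condition is met, so
    playback does not starve while the model catches up. `render` is an
    assumed display/record callback."""
    while buffer_queue.qsize() < HARD_DELAY_FRAMES:
        time.sleep(0.01)  # wait for the buffer to fill
    while True:
        render(buffer_queue.get())  # sequential acquisition for record/play
```

The fixed startup delay trades a small constant latency for smooth playback: bursts where model inference runs slow are absorbed by the frames already queued.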
7. The method of any one of claims 1-4, wherein adding the video frames acquired in real time to the video frame set comprises:
in the video live broadcast process, in response to a face style conversion request, adding the video frames acquired in real time to the video frame set;
and wherein taking the first-type style conversion frames and the second-type style conversion frames as the real-time video processing result comprises:
generating a live video stream according to the first-type style conversion frames and the second-type style conversion frames, and sending the live video stream to a live broadcast server for live video broadcast.
8. The method of claim 7, wherein generating a live video stream according to the first-type style conversion frames and the second-type style conversion frames and sending the live video stream to a live broadcast server comprises:
sequentially storing the first-type style conversion frames and the second-type style conversion frames generated in real time in a set buffer queue;
and when the buffer queue meets a preset hard delay condition, sequentially acquiring the first-type or second-type style conversion frames from the buffer queue to generate the live video stream, and sending the live video stream to the live broadcast server.
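As an illustrative (non-limiting) example of the streaming half of claim 8, frames drained from the buffer queue could be encoded and pushed to a live server by piping them into an ffmpeg subprocess. The RTMP URL, frame geometry, and codec choices below are placeholders assumed for illustration; the ffmpeg options themselves (`-f rawvideo`, `-pix_fmt bgr24`, `-c:v libx264`, `-f flv`) are standard.

```python
import subprocess

WIDTH, HEIGHT, FPS = 1280, 720, 25            # assumed frame geometry
RTMP_URL = "rtmp://live.example.com/app/key"  # placeholder live server address

# Raw BGR frames in on stdin; H.264 video in an FLV container out to RTMP.
encoder = subprocess.Popen(
    ["ffmpeg", "-f", "rawvideo", "-pix_fmt", "bgr24",
     "-s", f"{WIDTH}x{HEIGHT}", "-r", str(FPS), "-i", "-",
     "-c:v", "libx264", "-f", "flv", RTMP_URL],
    stdin=subprocess.PIPE,
)

def push(style_frame):
    """Send one style conversion frame (HEIGHT x WIDTH x 3 uint8 array)."""
    encoder.stdin.write(style_frame.tobytes())
```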
9. A device for processing real-time video, comprising:
an acquisition module, configured to add video frames acquired in real time to a video frame set and to acquire a current processing frame from the video frame set, wherein the video frames comprise a real face;
a first conversion module, configured to input the current processing frame into a face style conversion model and to acquire a first-type style conversion frame output by the face style conversion model, wherein the style conversion frame comprises a style face;
a second conversion module, configured to generate a set number of second-type style conversion frames, with the current processing frame and the first-type style conversion frame as starting points, according to the positional relationship between the face key points in a set number of subsequent video frames and those in the preceding video frame in the video frame set;
and a loop module, configured to return to the operation of inputting the current processing frame into the face style conversion model after a new current processing frame is acquired from the video frame set, and to take the first-type style conversion frames and the second-type style conversion frames as the real-time video processing result.
10. The device of claim 9, wherein the second conversion module comprises:
a subsequent video frame acquisition unit, configured to take the current processing frame as a processing start frame and to acquire the subsequent video frame of the processing start frame in the video frame set;
a transformation matrix generation unit, configured to generate a face key point transformation matrix according to the image positions of the face key points in the subsequent video frame and in the processing start frame;
a second-type style conversion frame generation unit, configured to generate a second-type style conversion frame for the subsequent video frame according to the face key point transformation matrix and the first-type or second-type style conversion frame matched with the processing start frame;
and a loop operation unit, configured to take the subsequent video frame as the new processing start frame and to return to the operation of acquiring the subsequent video frame of the processing start frame in the video frame set, until the number of processed subsequent video frames reaches the set number.
11. The device of claim 9, further comprising:
a model training module, configured to acquire a training sample set before the current processing frame is input into the face style conversion model, wherein the training sample set comprises a plurality of sample image pairs, and each sample image pair comprises an original image and a transformed image;
and configured to train a set machine learning model by using each sample image pair in the training sample set, to obtain the face style conversion model;
wherein the original image comprises a real face, and the transformed image comprises a style face matched with the real face.
12. The device of claim 11, wherein the machine learning model comprises: a generative adversarial network.
13. The device of any one of claims 9-12, wherein the acquisition module comprises:
a recording acquisition unit, configured to add, in the video recording process and in response to a face style conversion request, the video frames acquired in real time to the video frame set;
and wherein the loop module comprises:
a recording and playing unit, configured to record and play the first-type style conversion frames and the second-type style conversion frames in real time and to generate a recorded video.
14. The device of claim 13, wherein the recording and playing unit is specifically configured to:
sequentially store the first-type style conversion frames and the second-type style conversion frames generated in real time in a set buffer queue;
and, when the buffer queue meets a preset hard delay condition, sequentially acquire the first-type or second-type style conversion frames from the buffer queue for real-time recording and playing.
15. The device of any one of claims 9-12, wherein the acquisition module comprises:
a live broadcast acquisition unit, configured to add, in the video live broadcast process and in response to a face style conversion request, the video frames acquired in real time to the video frame set;
and wherein the loop module comprises:
a live broadcast playing unit, configured to generate a live video stream according to the first-type style conversion frames and the second-type style conversion frames and to send the live video stream to a live broadcast server for live video broadcast.
16. The device of claim 15, wherein the live broadcast playing unit is specifically configured to:
sequentially store the first-type style conversion frames and the second-type style conversion frames generated in real time in a set buffer queue;
and, when the buffer queue meets a preset hard delay condition, sequentially acquire the first-type or second-type style conversion frames from the buffer queue to generate the live video stream, and send the live video stream to the live broadcast server.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.
CN202010537321.0A 2020-06-12 2020-06-12 Real-time video processing method, device and equipment and storage medium Active CN111669647B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010537321.0A CN111669647B (en) 2020-06-12 2020-06-12 Real-time video processing method, device and equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111669647A true CN111669647A (en) 2020-09-15
CN111669647B CN111669647B (en) 2022-11-25

Family

ID=72387518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010537321.0A Active CN111669647B (en) 2020-06-12 2020-06-12 Real-time video processing method, device and equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111669647B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106920212A (en) * 2015-12-24 2017-07-04 掌赢信息科技(上海)有限公司 A kind of method and electronic equipment for sending stylized video
CN108564127A (en) * 2018-04-19 2018-09-21 腾讯科技(深圳)有限公司 Image conversion method, device, computer equipment and storage medium
US20190340419A1 (en) * 2018-05-03 2019-11-07 Adobe Inc. Generation of Parameterized Avatars
CN109147017A (en) * 2018-08-28 2019-01-04 百度在线网络技术(北京)有限公司 Dynamic image generation method, device, equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082300A (en) * 2022-07-22 2022-09-20 中国科学技术大学 Training method of image generation model, image generation method and device
CN115082300B (en) * 2022-07-22 2022-12-30 中国科学技术大学 Training method of image generation model, image generation method and device
CN116112761A (en) * 2023-04-12 2023-05-12 海马云(天津)信息技术有限公司 Method and device for generating virtual image video, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111669647B (en) 2022-11-25

Similar Documents

Publication Publication Date Title
CN112233210B (en) Method, apparatus, device and computer storage medium for generating virtual character video
CN112328816B (en) Media information display method and device, electronic equipment and storage medium
CN111225236B (en) Method and device for generating video cover, electronic equipment and computer-readable storage medium
CN112102448B (en) Virtual object image display method, device, electronic equipment and storage medium
CN110806865A (en) Animation generation method, device, equipment and computer readable storage medium
CN111738910A (en) Image processing method and device, electronic equipment and storage medium
CN111246257B (en) Video recommendation method, device, equipment and storage medium
CN111586459B (en) Method and device for controlling video playing, electronic equipment and storage medium
CN111860167A (en) Face fusion model acquisition and face fusion method, device and storage medium
CN111669647B (en) Real-time video processing method, device and equipment and storage medium
CN110636366A (en) Video playing processing method and device, electronic equipment and medium
CN111832613B (en) Model training method and device, electronic equipment and storage medium
CN111680517A (en) Method, apparatus, device and storage medium for training a model
CN111539897A (en) Method and apparatus for generating image conversion model
CN111935502A (en) Video processing method, video processing device, electronic equipment and storage medium
CN111524123A (en) Method and apparatus for processing image
CN112435313A (en) Method and device for playing frame animation, electronic equipment and readable storage medium
CN111918073B (en) Live broadcast room management method and device
CN111970560A (en) Video acquisition method and device, electronic equipment and storage medium
CN112329919A (en) Model training method and device
CN112382292A (en) Voice-based control method and device
CN111638787A (en) Method and device for displaying information
CN111428489A (en) Comment generation method and device, electronic equipment and storage medium
CN110493614A (en) Video control method, device and equipment
CN113542802B (en) Video transition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant