CN113727187B - Animation video processing method and device based on skeleton migration and related equipment - Google Patents

Publication number
CN113727187B
Authority
CN
China
Legal status: Active (assumption; not a legal conclusion)
Application number
CN202111011701.1A
Other languages
Chinese (zh)
Other versions
CN113727187A (en
Inventor
马亿凯
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Publication of CN113727187A
Application granted
Publication of CN113727187B

Classifications

    • H04N 21/44008 — analysing video streams, e.g. detecting features or characteristics in the video stream
    • G06T 13/205 — 3D [Three Dimensional] animation driven by audio data
    • G06T 13/40 — 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G10L 15/26 — speech to text systems
    • H04N 21/4394 — analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N 21/44016 — splicing one content stream with another, e.g. for substituting a video clip
    • H04N 21/47205 — end-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally


Abstract

The application relates to artificial intelligence technology and provides an animation video processing method and apparatus based on skeleton migration, a computer device, and a storage medium. The method includes the following steps: cutting an initial video to obtain an initial image video and an initial background video; extracting an image skeleton from the initial image video to obtain a skeleton video stream, and parsing the skeleton video stream to obtain conversion action instructions; loading the conversion action instructions into a target cartoon image to obtain an image action video corresponding to the target cartoon image; converting initial audio information into an initial text and modifying the initial text to obtain a target text; generating target audio information according to a preset voice packet and the target text; and fusing the target audio information, the image action video, and the initial background video to obtain a target video. The method and apparatus can reduce the production cost of animation videos, improve the efficiency of video production, and promote the rapid development of smart cities.

Description

Animation video processing method and device based on skeleton migration and related equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for processing an animation video based on skeleton migration, a computer device, and a medium.
Background
In the era of short-video prevalence, users' daily online time has gradually shifted from WeChat official accounts to short-video platforms. To reach customers, large insurance companies organize agents to shoot videos such as explanations of insurance terms, promotion of insurance concepts, and introductions of insurance products. However, every step of short-video production, from script writing and script review to shooting and editing, requires considerable manpower, so production costs are high and video production efficiency is low. Moreover, once an agent leaves the company, the person in the video may no longer represent the original company, and continuing to use the material carries great risk.
In the process of implementing the present application, the applicant found the following technical problems in the prior art: the industry currently offers face-swapping solutions that replace the face of the person in an original video, but, limited by shooting light, angle, and other factors, the synthesis effect of face fusion is not ideal; there are also solutions that produce animation videos with customized animated characters, but such videos must be drawn frame by frame and are not highly reusable across scenes, which makes video production very costly.
Therefore, it is necessary to provide a method for processing an animation video based on skeleton migration, which can reduce the production cost of the animation video and improve the efficiency of video production.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an animation video processing method and apparatus based on skeleton migration, a computer device, and a medium, which can reduce the production cost of animation videos and improve the efficiency of video production.
In a first aspect, an embodiment of the present application provides an animation video processing method based on skeleton migration, where the animation video processing method based on skeleton migration includes:
acquiring an initial video, and cutting the initial video to obtain an initial image video and an initial background video, wherein the initial image video comprises a plurality of image frameworks of an initial image;
extracting an image skeleton from the initial image video to obtain a skeleton video stream, and analyzing the skeleton video stream to obtain a conversion action instruction;
acquiring a target cartoon image, and loading the conversion action instruction into the target cartoon image to obtain an image action video corresponding to the target cartoon image;
identifying initial audio information in the initial video, converting the initial audio information into an initial text, and modifying the initial text to obtain a target text;
acquiring a preset voice packet, and generating target audio information according to the preset voice packet and the target text;
and fusing the target audio information, the image action video and the initial background video to obtain a target video.
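The six claimed steps can be sketched end to end. Everything below is an illustrative stand-in rather than the disclosed implementation: the data layout, function names, and string-based "synthesis" are assumptions chosen only to show how the stages chain together.

```python
# Illustrative end-to-end sketch of the six claimed steps.
# All helpers are simplified stand-ins; a real implementation would use
# computer-vision, speech-recognition, and speech-synthesis models.

def cut_video(video):
    """Step 1: split each frame into a foreground (image) part and a background part."""
    fg = [f["foreground"] for f in video["frames"]]
    bg = [f["background"] for f in video["frames"]]
    return fg, bg

def extract_skeletons(image_frames):
    """Step 2a: one image skeleton (list of key points) per foreground frame."""
    return [f["skeleton"] for f in image_frames]

def parse_actions(skeletons):
    """Step 2b: turn each skeleton into a timestamped conversion action instruction."""
    return [{"time": i, "skeleton": s} for i, s in enumerate(skeletons)]

def load_actions(cartoon, actions):
    """Step 3: drive the target cartoon image with the action instructions."""
    return [{"cartoon": cartoon, **a} for a in actions]

def rewrite_text(initial_text, preset):
    """Step 4: modify the recognised initial text with preset replacements."""
    for old, new in preset.items():
        initial_text = initial_text.replace(old, new)
    return initial_text

def synthesize(voice_pack, text):
    """Step 5: generate target audio from a preset voice packet (stand-in)."""
    return f"{voice_pack}:{text}"

def fuse(audio, action_video, background):
    """Step 6: fuse target audio, image action video, and initial background."""
    return {"audio": audio, "video": list(zip(action_video, background))}

clip = {
    "frames": [{"foreground": {"skeleton": ["head", "torso"]}, "background": "bg0"}],
    "audio": "hello insurance",
}
fg, bg = cut_video(clip)
video = load_actions("mascot", parse_actions(extract_skeletons(fg)))
text = rewrite_text(clip["audio"], {"hello": "hi"})
target = fuse(synthesize("voice-A", text), video, bg)
```

The sketch exists only to make the data flow of the claim concrete; each stand-in would be replaced by the corresponding model or algorithm described in the embodiments below.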
Further, in the above animation video processing method based on skeleton migration provided in an embodiment of the present application, the cutting the initial video to obtain an initial image video includes:
calling the trained image cutting algorithm to cut the initial video to obtain a plurality of frames of foreground region images, wherein the foreground region image of each frame comprises an image skeleton of an initial image;
acquiring a time stamp of each frame of the foreground area image;
and combining the foreground area images of each frame according to the sequence of the time stamps to obtain an initial image video.
Further, in the above animation video processing method based on skeleton migration provided in an embodiment of the present application, the extracting an image skeleton from the initial image video to obtain a skeleton video stream includes:
extracting an initial image picture from the initial image video according to a preset frame rate, and converting the initial image picture into a binary image;
selecting edge pixel points of the binary image and adjacent pixel points of the edge pixel points, and calculating distance values between the edge pixel points and the adjacent pixel points;
detecting whether the distance value is smaller than a preset distance value threshold value;
when the detection result shows that the distance value is smaller than the preset distance value threshold, determining and deleting target adjacent pixel points corresponding to the target distance value smaller than the preset distance threshold;
and repeating the steps to obtain key pixel points, connecting the key pixel points into lines to form a skeleton, and arranging and combining the skeleton according to the sequence of the timestamps to obtain the skeleton video stream.
Further, in the animation video processing method based on skeleton migration provided in an embodiment of the present application, the analyzing the skeleton video stream to obtain a conversion action instruction includes:
obtaining a skeleton contained in the skeleton video stream according to a preset time interval;
determining timestamp information of a skeleton in the skeleton video stream, node information contained in the skeleton and position information corresponding to the node information;
and storing the timestamp information, the node information and the position information according to a preset data format to obtain a conversion action instruction.
Further, in the above animation video processing method based on skeleton migration provided in an embodiment of the present application, the loading the conversion action instruction into the target avatar to obtain an image action video corresponding to the target avatar includes:
acquiring a first basic structure of the target cartoon image and a second basic structure of the initial image;
determining a mapping relationship between the first infrastructure and the second infrastructure;
acquiring the conversion action instruction corresponding to the second infrastructure;
and establishing the association between the first basic structure and the conversion action command according to the mapping relation to obtain an image action video corresponding to the target cartoon image.
Further, in the animation video processing method based on skeleton migration provided in the embodiment of the present application, the converting the initial audio information into an initial text, and modifying the initial text to obtain a target text includes:
calling a pre-trained background music denoising model to perform background music denoising processing on the initial audio information to obtain intermediate audio information;
converting the intermediate audio information into an initial text through an ASR model;
acquiring a preset text;
and modifying the initial text by using the preset text to obtain a target text.
Further, in the above animation video processing method based on skeleton migration provided in an embodiment of the present application, the fusing the target audio information, the avatar motion video, and the initial background video to obtain the target video includes:
acquiring coding information corresponding to the initial video;
determining a first position of a first preset keyword, a second position of a second preset keyword and a third position of a third preset keyword contained in the coded information;
respectively acquiring the content corresponding to the first position as initial audio information, the content corresponding to the second position as initial image video and the content corresponding to the third position as initial background video;
replacing the initial audio information with the target audio information and replacing the initial image video with the image action video respectively to obtain target coding information;
and executing the target coding information to obtain a target video.
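A minimal sketch of this fusion step, modelling the "coding information" as an ordered list of (keyword, content) tracks. The keyword names (`audio_track`, `image_track`, `background_track`) are assumptions for illustration; the patent does not disclose the actual preset keywords.

```python
# Hedged sketch: replace the contents found at the first and second preset
# keywords (audio and image video) while keeping the third (background) intact.

def fuse_by_keywords(coding_info, target_audio, action_video):
    """Return target coding information with audio and image tracks replaced."""
    replacements = {"audio_track": target_audio, "image_track": action_video}
    return [(kw, replacements.get(kw, content)) for kw, content in coding_info]

coding_info = [
    ("audio_track", "initial_audio"),    # content at the first preset keyword
    ("image_track", "initial_image"),    # content at the second preset keyword
    ("background_track", "initial_bg"),  # content at the third preset keyword
]
target_info = fuse_by_keywords(coding_info, "target_audio", "action_video")
```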
The second aspect of the embodiments of the present application further provides an animation video processing apparatus based on skeleton migration, where the animation video processing apparatus based on skeleton migration includes:
the video cutting module is used for acquiring an initial video and cutting the initial video to obtain an initial image video and an initial background video, wherein the initial image video comprises a plurality of image frameworks of an initial image;
the instruction conversion module is used for extracting an image framework from the initial image video to obtain a framework video stream, and analyzing the framework video stream to obtain a conversion action instruction;
the instruction loading module is used for acquiring a target cartoon image and loading the converted action instruction into the target cartoon image to obtain an image action video corresponding to the target cartoon image;
the text conversion module is used for identifying initial audio information in the initial video, converting the initial audio information into an initial text, and modifying the initial text to obtain a target text;
the audio generation module is used for acquiring a preset voice packet and generating target audio information according to the preset voice packet and the target text;
and the video fusion module is used for fusing the target audio information, the image action video and the initial background video to obtain a target video.
The third aspect of the embodiments of the present application further provides a computer device, where the computer device includes a processor, and the processor is configured to implement the animation video processing method based on skeleton migration as described in any one of the above when executing the computer program stored in the memory.
The fourth aspect of the embodiments of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for processing an animation video based on skeleton migration is implemented as any one of the above methods.
According to the animation video processing method and apparatus based on skeleton migration, the computer device, and the computer-readable storage medium, the AI image processing capabilities of background cutting, skeleton extraction, and skeleton migration are used to determine the action instructions corresponding to the initial image in the initial video, and the initial image is then replaced with a target cartoon image to obtain an image action video, which reduces the production cost of action videos and improves video production efficiency. In addition, the initial audio information in the initial video is recognized and converted into an initial text, the text is modified to obtain a target text, the target text is converted into target audio information using a preset voice packet, and the target audio information, the image action video, and the initial background video are fused to obtain the target video; in this way, a large number of animation videos can be produced in batches at low cost, improving video production efficiency and saving a large amount of operational manpower. The present application can be applied to functional modules of smart cities, such as smart government and smart transportation, for example a skeleton-migration-based animation video processing module for smart government, and can promote the rapid development of smart cities.
Drawings
Fig. 1 is a flowchart of an animation video processing method based on skeleton migration according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a skeleton video stream according to an embodiment of the present application.
Fig. 3 is a block diagram of an animation video processing apparatus based on skeleton migration according to a second embodiment of the present application.
Fig. 4 is a schematic structural diagram of a computer device provided in the third embodiment of the present application.
The following detailed description will further illustrate the present application in conjunction with the above-described figures.
Detailed Description
In order that the above objects, features and advantages of the present application can be more clearly understood, a detailed description of the present application will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, and the described embodiments are a part, but not all, of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The animation video processing method based on skeleton migration provided by the embodiments of the present application is executed by a computer device, and correspondingly, the animation video processing apparatus based on skeleton migration runs in the computer device. Fig. 1 is a flowchart of the animation video processing method based on skeleton migration according to the first embodiment of the present application. As shown in Fig. 1, the method may include the following steps; according to different requirements, the order of the steps in the flowchart may be changed, and some of the steps may be omitted:
s11, obtaining an initial video, and cutting the initial video to obtain an initial image video and an initial background video, wherein the initial image video comprises a plurality of image frameworks of an initial image.
In at least one embodiment of the present application, the initial video may be a short video, suitable for the relevant subject, collected by system personnel from related channels (e.g., Douyin, Huoshan, or Weishi), and the initial video or its video address is saved in a preset database; considering the reliability and privacy of data storage, the preset database may be a target node in a blockchain. The initial image video is a video containing an initial image and comprises a plurality of image skeletons of the initial image; the initial image may be a human figure or an animated figure. The initial background video refers to a video containing the background portion.
The initial video includes a foreground region and a background region, in an embodiment, the foreground region is a portion including an initial image, and the background region is a portion which remains still for a longer time and has less color change. The method and the device can cut the foreground area and the background area of the initial video, and take the video containing the foreground area as the initial image video.
Optionally, before the cutting the initial video to obtain the initial avatar video, the method further includes:
acquiring a frame image corresponding to the initial video, and dividing the frame image into an initial foreground area and an initial background area;
down-sampling the frame image to obtain a first image, and defining pixels belonging to an initial foreground area in the first image as an initial foreground area;
establishing an initial background Gaussian mixture model by using the pixels in the initial background area, and establishing an initial foreground Gaussian mixture model by using the pixels in the initial foreground area;
inputting the first image into a graph-cut segmentation, and dividing a new foreground area and a new background area;
and optimizing the parameters of the foreground Gaussian mixture model by using the pixels in the new foreground region, and optimizing the parameters of the background Gaussian mixture model by using the pixels in the new background region until the graph cutting process is converged.
In this case, the position of the animated image in the foreground remains substantially fixed. To compare changes between consecutive frames of the video, an embodiment of the present application employs a Gaussian-mixture-based background/foreground segmentation algorithm, which models each pixel with a mixture of K Gaussian distributions (K is 3 to 5). The mixture weights represent the proportion of time those colors remain in the scene. The segmentation boundary between foreground and background is determined by adjusting a threshold.
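A full K-Gaussian mixture is beyond a short sketch, but the classification idea can be shown with one running Gaussian per pixel (the K = 1 special case of the model described above). The learning rate and threshold values below are illustrative assumptions.

```python
# Simplified per-pixel background model: one running Gaussian per pixel.
# The embodiment uses a mixture of K Gaussians (K = 3..5); this K = 1 sketch
# shows the same classification rule: a pixel is foreground when it deviates
# from its background mean by more than k standard deviations.

class PixelBackgroundModel:
    def __init__(self, width, height, alpha=0.05, k=2.5):
        self.alpha = alpha  # learning rate for the running mean/variance
        self.k = k          # foreground threshold, in standard deviations
        self.mean = [[0.0] * width for _ in range(height)]
        self.var = [[225.0] * width for _ in range(height)]  # initial variance

    def apply(self, frame):
        """Return a foreground mask (1 = foreground) and update the model."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, value in enumerate(row):
                mean, var = self.mean[y][x], self.var[y][x]
                diff = value - mean
                is_fg = diff * diff > (self.k ** 2) * var
                mask_row.append(1 if is_fg else 0)
                if not is_fg:  # only background pixels update the model
                    self.mean[y][x] = mean + self.alpha * diff
                    self.var[y][x] = var + self.alpha * (diff * diff - var)
            mask.append(mask_row)
        return mask

model = PixelBackgroundModel(width=2, height=1)
for _ in range(10):                 # learn a static dark background
    mask = model.apply([[0, 0]])
mask = model.apply([[0, 200]])      # a bright blob appears at one pixel
```

In practice a library implementation such as OpenCV's MOG2 background subtractor would replace this sketch; the point is only that the mixture weights and threshold jointly decide the foreground/background split.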
Optionally, the cutting the initial video to obtain the initial avatar video includes:
calling a trained image cutting algorithm to cut the initial video to obtain a plurality of frames of foreground region images, wherein the foreground region image of each frame comprises an image skeleton of an initial image;
acquiring a time stamp of each frame of the foreground area image;
and combining the foreground area images of each frame according to the sequence of the time stamps to obtain an initial image video.
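The recombination step above amounts to ordering the foreground frames by their timestamps. A minimal sketch (frame payloads are placeholders):

```python
# Combine per-frame foreground images into the initial image video by
# sorting on each frame's timestamp, as described in the claim.

def combine_by_timestamp(frames):
    """frames: list of (timestamp, foreground_image) in arbitrary order."""
    return [img for _, img in sorted(frames, key=lambda f: f[0])]

video = combine_by_timestamp([(2.0, "f2"), (0.0, "f0"), (1.0, "f1")])
```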
And S12, extracting an image skeleton from the initial image video to obtain a skeleton video stream, and analyzing the skeleton video stream to obtain a conversion action instruction.
In at least one embodiment of the present application, the initial image video refers to a video containing the initial image, the initial image may perform certain instruction actions, and the skeleton video stream refers to a video composed of a plurality of image skeletons. An image skeleton comprises a plurality of key pixel points, which are connected by lines to form the skeleton. Referring to fig. 2, fig. 2 is a schematic diagram of the image skeleton corresponding to the initial image executing a corresponding instruction action at a certain timestamp; the image skeleton shown in fig. 2 is composed of 14 key pixel points connected by lines. The initial image video comprises a plurality of image skeletons, each corresponding to a unique timestamp, and the image skeletons are arranged and combined in timestamp order to obtain the skeleton video stream.
Optionally, extracting an avatar skeleton from the initial avatar video to obtain a skeleton video stream includes:
extracting an initial image picture from the initial image video according to a preset frame rate, and converting the initial image picture into a binary image;
selecting edge pixel points of the binary image and adjacent pixel points of the edge pixel points, and calculating distance values between the edge pixel points and the adjacent pixel points;
detecting whether the distance value is smaller than a preset distance value threshold value or not;
when the detection result is that the distance value is smaller than the preset distance value threshold, determining and deleting target adjacent pixel points corresponding to the target distance value smaller than the preset distance threshold;
and repeating the steps to obtain key pixel points, connecting the key pixel points into lines to form a skeleton, and arranging and combining the skeleton according to the sequence of the timestamps to obtain the skeleton video stream.
In one embodiment, the K3M thinning algorithm is adopted, and skeleton extraction of the character image is realized by iteratively eroding edges. The iterative idea of the algorithm is to convert the image into a binary image, check each pixel against its neighboring pixels starting from the edge, delete similar pixels while retaining only distinctive feature points, gradually thin the object down to its key points, and connect the key points into lines to form the skeleton.
The preset distance value threshold is a preset value used for evaluating the distance between the pixel points, and it can be understood that when the distance value between two pixel points is smaller than the preset distance value threshold, the two pixel points are determined to be close; and when the distance value between the two pixel points is larger than the preset distance value threshold, determining that the two pixel points are not similar. Each frame of initial image picture in the initial image video corresponds to a framework, the frameworks contain timestamp information, the timestamp information contained by different frameworks is inquired, and the frameworks are arranged and combined according to the sequence of the timestamps to obtain a framework video stream.
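The embodiment names K3M, whose full neighbour-pattern lookup tables are too long for a sketch; the closely related classical Zhang-Suen thinning algorithm illustrates the same iterative-erosion idea (delete boundary pixels whose removal keeps the shape connected, until only a one-pixel-wide skeleton of key points remains). This is a substitute illustration, not the K3M procedure itself.

```python
# Zhang-Suen thinning: iteratively erode boundary pixels of a binary shape
# until a one-pixel-wide skeleton remains. Shown here in place of K3M, which
# follows the same iterative-erosion scheme with different neighbour tests.

def zhang_suen_thin(foreground):
    """foreground: set of (row, col) foreground pixels; returns a thinned set."""
    pixels = set(foreground)

    def neighbours(r, c):
        # P2..P9, clockwise starting from the pixel directly above
        return [(r - 1, c), (r - 1, c + 1), (r, c + 1), (r + 1, c + 1),
                (r + 1, c), (r + 1, c - 1), (r, c - 1), (r - 1, c - 1)]

    def step(first_pass):
        to_delete = set()
        for r, c in pixels:
            n = [1 if p in pixels else 0 for p in neighbours(r, c)]
            b = sum(n)                       # number of foreground neighbours
            if not 2 <= b <= 6:
                continue                     # endpoint or interior pixel: keep
            # A(P1): number of 0 -> 1 transitions around the pixel
            a = sum(n[i] == 0 and n[(i + 1) % 8] == 1 for i in range(8))
            if a != 1:
                continue                     # deleting would split the shape
            p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
            if first_pass:
                if p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0:
                    to_delete.add((r, c))    # south/east boundary pixel
            else:
                if p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0:
                    to_delete.add((r, c))    # north/west boundary pixel
        pixels.difference_update(to_delete)  # delete simultaneously per pass
        return bool(to_delete)

    while step(True) | step(False):          # repeat until nothing changes
        pass
    return pixels

bar = {(r, c) for r in range(3) for c in range(10)}  # a thick horizontal bar
skeleton = zhang_suen_thin(bar)
```

Applied to the 3 x 10 bar, the surviving pixels form a thin line of key points, which would then be connected and timestamp-ordered into the skeleton video stream as described above.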
Optionally, the parsing the skeleton video stream to obtain a conversion action instruction includes:
obtaining a skeleton contained in the skeleton video stream according to a preset time interval;
determining timestamp information of a skeleton in the skeleton video stream, node information contained in the skeleton and position information corresponding to the node information;
and storing the timestamp information, the node information and the position information according to a preset data format to obtain a conversion action instruction.
The preset time interval is a preset interval for extracting a skeleton from the skeleton video stream; for example, the preset time interval may be 1.5 seconds, which is not limited herein. Each skeleton comprises timestamp information, node information, and position information corresponding to the node information. The preset data format is a preset format for arranging the timestamp, the node information, and the position information; for example, the preset data format may be { time + skeleton { node 1, node 2, node 3, …, node n } + position }.
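The { time + skeleton { node 1, ..., node n } + position } layout described above can be sketched as a small serialisable record; the field and node names are assumptions for illustration.

```python
# One "conversion action instruction" per sampled skeleton, following the
# { time + skeleton { node 1, ..., node n } + position } layout. Field names
# ("time", "skeleton", "position") are illustrative assumptions.
import json

def build_action_instruction(timestamp, nodes):
    """nodes: mapping from node name to its (x, y) position in the frame."""
    return {
        "time": timestamp,
        "skeleton": sorted(nodes),  # node information contained in the skeleton
        "position": {name: list(p) for name, p in nodes.items()},
    }

instr = build_action_instruction(1.5, {"head": (120, 40), "neck": (120, 80)})
payload = json.dumps(instr)  # serialisable, so it can be stored or streamed
```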
S13, obtaining a target cartoon image, and loading the conversion action instruction into the target cartoon image to obtain an image action video corresponding to the target cartoon image.
In at least one embodiment of the present application, the target cartoon image may be a 2D cartoon image made by system personnel according to their own brand positioning, and the target cartoon image needs to satisfy T-pose skeleton features; for example, it needs to have a head, four limbs, a trunk and other basic structures. The image action video is a video in which the initial image in the initial video is replaced by the target cartoon image, and the target cartoon image in the video can perform the actions of the initial image. In an embodiment, the initial image is likewise provided with basic structures such as a head, four limbs and a trunk, and a mapping relationship exists between the basic structures of the initial image and the target cartoon image.
Optionally, the loading the conversion action instruction into the target cartoon image to obtain an image action video corresponding to the target cartoon image includes:
acquiring a first basic structure of the target cartoon image and a second basic structure of the initial image;
determining a mapping relationship between the first basic structure and the second basic structure;
acquiring the conversion action instruction corresponding to the second basic structure;
and establishing the association between the first basic structure and the conversion action instruction according to the mapping relation to obtain an image action video corresponding to the target cartoon image.
The first basic structure comprises structural nodes such as a head, four limbs and a trunk, and the second basic structure likewise comprises structural nodes such as a head, four limbs and a trunk; a mapping relationship exists between structural nodes of the first basic structure and structural nodes of the second basic structure at corresponding positions. For each structural node, the corresponding conversion action instruction controls the structural node in the second basic structure, so that the initial image can complete the corresponding action. By establishing the association between the first basic structure and the conversion action instruction through the mapping relationship, the conversion action instruction can control the related structural nodes in the first basic structure, so that the target cartoon image finally completes the corresponding action, thereby obtaining the image action video corresponding to the target cartoon image.
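The association step can be sketched as a simple node-name remapping; the `mapping` keys and the instruction layout below are illustrative assumptions, not the claimed implementation:

```python
def retarget_instructions(mapping, instructions):
    """Rewrite each conversion action instruction from nodes of the second
    basic structure (initial image) to the mapped nodes of the first basic
    structure (target cartoon image); unmapped nodes are dropped."""
    out = []
    for ins in instructions:
        out.append({
            "time": ins["time"],
            "position": {mapping[node]: pos
                         for node, pos in ins["position"].items()
                         if node in mapping},
        })
    return out
```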
And S14, identifying initial audio information in the initial video, converting the initial audio information into an initial text, and modifying the initial text to obtain a target text.
In at least one embodiment of the present application, the initial video further includes initial audio information, and the initial audio information is converted into an initial text, so that the initial text is modified conveniently to obtain a target text.
Optionally, the identifying the initial audio information in the initial video includes:
acquiring coding information corresponding to the initial video;
detecting whether the coded information contains preset keywords or not;
and when the detection result shows that the coded information contains the preset keyword, determining the position of the preset keyword in the coded information, and determining the information at the position as initial audio information.
For the initial video, there is corresponding coding information; the coding information comprises video coding information and audio coding information, which can be distinguished by adding preset keywords. The preset keyword is a keyword preset for distinguishing the audio information.
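A minimal sketch of the keyword lookup, assuming a hypothetical `key=value;` layout for the coding information (the actual coding format is not specified here):

```python
def extract_by_keyword(encoded, keyword):
    """Return the content at the position of `keyword` in the encoding
    information, or None when the keyword is absent.
    Assumes an illustrative `key=value;` layout."""
    for part in encoded.split(";"):
        key, _, value = part.partition("=")
        if key.strip() == keyword:
            return value.strip()
    return None
```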
Optionally, the converting the initial audio information into an initial text, and modifying the initial text to obtain a target text includes:
calling a pre-trained background music denoising model to perform background music denoising processing on the initial audio information to obtain intermediate audio information;
converting the intermediate audio information into an initial text through an ASR model;
acquiring a preset text;
and modifying the initial text by using a preset text to obtain a target text.
In the present application, an ASR model is used to recognize the audio in the video and convert it into text; recognizing audio with an ASR model to obtain text belongs to the prior art and is not repeated herein. Considering that background music in some videos may affect recognition accuracy, a background music denoising model is also added to ensure that the recognized text output is coherent and readable. The preset text may be a text for modifying the initial text; the preset text comprises a modification position and modification content, the position in the initial text is determined according to the modification position, and the content at that position is replaced with the modification content, thereby obtaining the target text.
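Assuming the modification position is represented as a `(start, end)` character span (an illustrative choice; the specification does not fix the representation), the replacement step might look like:

```python
def modify_text(initial_text, preset):
    """Apply a preset text to the initial text: each entry carries a
    modification position (start, end) and the modification content.
    Entries are applied right-to-left so earlier offsets stay valid."""
    text = initial_text
    for (start, end), content in sorted(preset, reverse=True):
        text = text[:start] + content + text[end:]
    return text
```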
S15, acquiring a preset voice packet, and generating target audio information according to the preset voice packet and the target text.
In at least one embodiment of the present application, the preset voice packet is a voice packet preset by the system and containing different timbre information, and the target audio information is audio information formed by reading the target text according to a timbre in the preset voice packet. An embodiment of the present application provides a TTS model, and a new video audio track is generated through the TTS model.
And S16, fusing the target audio information, the image action video and the initial background video to obtain a target video.
In at least one embodiment of the present application, the target audio information, the image action video and the initial background video are fused to obtain the target video.
Optionally, the fusing the target audio information, the image action video and the initial background video to obtain the target video includes:
acquiring coding information corresponding to the initial video;
determining a first position of a first preset keyword, a second position of a second preset keyword and a third position of a third preset keyword contained in the coded information;
respectively acquiring the content corresponding to the first position as initial audio information, the content corresponding to the second position as initial image video and the content corresponding to the third position as initial background video;
replacing the initial audio information with the target audio information and replacing the initial image video with the image action video to obtain target coding information;
and executing the target coding information to obtain a target video.

The first preset keyword, the second preset keyword and the third preset keyword are preset keywords for identifying the initial audio information, the initial image video and the initial background video. In the coding information, the initial audio information, the initial image video and the initial background video may be written in the form of video links, so that when a video replacement operation is performed, it may be carried out by replacing the video links to obtain the target coding information, and the target coding information is processed to obtain the target video.
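Under the same illustrative `key=value;` layout assumption for the coding information (the keys `audio`, `avatar` and `background` are hypothetical names), the link-replacement fusion might be sketched as:

```python
def fuse_encoding(encoded, target_audio, avatar_video):
    """Rebuild the coding information, replacing the initial audio link and
    the initial image video link while keeping the initial background link."""
    replacements = {"audio": target_audio, "avatar": avatar_video}
    parts = []
    for part in encoded.split(";"):
        key, _, value = part.partition("=")
        # keep the original value unless this key is being replaced
        parts.append(f"{key}={replacements.get(key, value)}")
    return ";".join(parts)
```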
According to the animation video processing method based on skeleton migration provided by the embodiment of the present application, through the AI image processing capabilities of background cutting, skeleton extraction and skeleton migration, the action instructions corresponding to the initial image in the initial video are determined, and the initial image is then replaced with the target cartoon image to obtain the image action video, which can reduce the production cost of action videos and improve video production efficiency. In addition, the present application identifies the initial audio information in the initial video, converts the initial audio information into the initial text, modifies the text to obtain the target text, converts the target text into the target audio information by means of the preset voice packet, and fuses the target audio information, the image action video and the initial background video to obtain the target video, so that a large number of animation videos can be produced in batches at low cost, the video production efficiency is improved, and a large amount of operational manpower is saved. The present application can be applied to various functional modules of smart cities such as smart government affairs and smart transportation, for example, a skeleton-migration-based animation video processing module for smart government affairs, and can promote the rapid development of smart cities.
Fig. 3 is a block diagram of a motion picture video processing apparatus based on skeleton migration according to the second embodiment of the present application.
In some embodiments, the animation video processing device 20 based on skeleton migration may include a plurality of functional modules composed of computer program segments. The computer program of each program segment in the animation video processing device 20 based on skeleton migration may be stored in a memory of a computer device and executed by at least one processor to perform the functions of animation video processing based on skeleton migration (described in detail with reference to fig. 1).
In this embodiment, the animation video processing device 20 based on skeleton migration may be divided into a plurality of functional modules according to the functions performed by the device. The functional modules may include: a video cutting module 201, an instruction conversion module 202, an instruction loading module 203, a text conversion module 204, an audio generation module 205 and a video fusion module 206. A module referred to herein is a series of computer program segments that can be executed by at least one processor, can perform a fixed function, and are stored in a memory. In the present embodiment, the functions of the modules will be described in detail in the following embodiments.
The video cutting module 201 may be configured to obtain an initial video, and cut the initial video to obtain an initial image video and an initial background video, where the initial image video includes a plurality of image skeletons of an initial image.
In at least one embodiment of the present application, the initial video may be a short video, suitable for the operator's subject matter, collected by system personnel from related channels (e.g., Douyin, Huoshan Video, Weishi, etc.), and the initial video or the video address corresponding to the initial video is saved in a preset database; in consideration of the reliability and privacy of data storage, the preset database may be a target node in a blockchain. The initial image video is a video containing an initial image and comprises a plurality of image skeletons of the initial image, where the initial image may be a human image or an animation image. The initial background video refers to a video that includes the background portion.
The initial video comprises a foreground part and a background part. In one embodiment, the foreground part is the part containing the initial image, and the background part is the part that remains still for a longer time and has little color change. The present application obtains the initial image video by cutting the initial video into its foreground part and background part.
Optionally, before the cutting the initial video to obtain the initial avatar video, the method further includes:
acquiring a frame image corresponding to the initial video, and dividing the frame image into an initial foreground area and an initial background area;
down-sampling the frame image to obtain a first image, and defining pixels belonging to an initial foreground area in the first image as an initial foreground area;
establishing an initial background Gaussian mixture model by using the pixels in the initial background area, and establishing an initial foreground Gaussian mixture model by using the pixels in the initial foreground area;
inputting the first image into graph cut segmentation, and dividing a new foreground area and a new background area;
and optimizing the parameters of the foreground Gaussian mixture model by using the pixels in the new foreground region, and optimizing the parameters of the background Gaussian mixture model by using the pixels in the new background region until the graph cutting process is converged.
Because the position of the animated image in the foreground remains substantially fixed, it suffices to compare the changes between preceding and following frames of the video. An embodiment of the present application employs a background/foreground segmentation algorithm based on a Gaussian mixture, which models each pixel with a mixture of K Gaussian distributions (K is 3 to 5). The weights of the mixture represent the proportion of time those colors remain in the scene. The segmentation boundary between foreground and background is determined by adjusting a threshold.
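The per-pixel decision can be sketched as follows; the component representation and the `match_k` / `weight_threshold` parameters are illustrative assumptions in the spirit of mixture-of-Gaussians background subtraction, not the exact claimed algorithm:

```python
def is_background(value, mixture, match_k=2.5, weight_threshold=0.7):
    """Decide whether a pixel value belongs to the background under a
    per-pixel mixture of K Gaussians.  Components are examined in order of
    decreasing weight; a value within match_k standard deviations of one of
    the high-weight components is classified as background."""
    components = sorted(mixture, key=lambda c: c["weight"], reverse=True)
    cumulative = 0.0
    for c in components:
        if cumulative >= weight_threshold:
            break  # remaining low-weight components model the foreground
        if abs(value - c["mean"]) <= match_k * c["sigma"]:
            return True
        cumulative += c["weight"]
    return False
```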
Optionally, the cutting the initial video to obtain an initial image video includes:
calling the trained image cutting algorithm to cut the initial video to obtain a plurality of frames of foreground area images;
acquiring a timestamp of each frame of the foreground area image;
and combining the foreground area images of each frame according to the sequence of the time stamps to obtain an initial image video.
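The recombination step above is essentially a sort by timestamp; a minimal sketch, with the `(timestamp, image)` frame representation assumed for illustration:

```python
def combine_frames(frames):
    """Combine per-frame foreground area images into the initial image video
    by arranging them in timestamp order; each frame is (timestamp, image)."""
    return [image for _, image in sorted(frames, key=lambda f: f[0])]
```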
The instruction conversion module 202 may be configured to extract an image skeleton from the initial image video to obtain a skeleton video stream, and analyze the skeleton video stream to obtain a conversion action instruction.
In at least one embodiment of the present application, the initial character video refers to a video including an initial character, the initial character performs some command actions, and the skeleton video stream refers to a video composed of a plurality of character skeletons.
Optionally, the extracting an image skeleton from the initial image video to obtain a skeleton video stream includes:
extracting an initial image picture from the initial image video according to a preset frame rate, and converting the initial image picture into a binary image;
selecting edge pixel points of the binary image and adjacent pixel points of the edge pixel points, and calculating distance values between the edge pixel points and the adjacent pixel points;
detecting whether the distance value is smaller than a preset distance value threshold value or not;
when the detection result shows that the distance value is smaller than the preset distance value threshold, determining and deleting target adjacent pixel points corresponding to the target distance value smaller than the preset distance threshold;
and repeating the steps to obtain key pixel points, connecting the key pixel points into lines to form a skeleton, and arranging and combining the skeleton according to the sequence of the timestamps to obtain the skeleton video stream.
In one embodiment, the K3M algorithm (Khalid Saeed et al.) is adopted, and skeleton extraction on human images is realized by iteratively eroding edges. The iterative idea of the algorithm is as follows: the image is converted into a binary image; starting from the edge, each pixel point is checked together with its 9 adjacent pixel points; similar pixel points are deleted and only distinct feature points are retained, so that the key points of the object are gradually thinned; and the key points are connected into lines to form the skeleton.
The preset distance value threshold is a preset value used for evaluating the distance between pixel points. It can be understood that when the distance value between two pixel points is smaller than the preset distance value threshold, the two pixel points are determined to be close; and when the distance value between two pixel points is larger than the preset distance value threshold, the two pixel points are determined not to be close. Each frame of initial image picture in the initial image video corresponds to a skeleton, and each skeleton contains timestamp information; the timestamp information contained in the different skeletons is queried, and the skeletons are arranged and combined in timestamp order to obtain the skeleton video stream.
Optionally, the parsing the skeleton video stream to obtain a conversion action instruction includes:
obtaining a skeleton contained in the skeleton video stream according to a preset time interval;
determining timestamp information of a skeleton in the skeleton video stream, node information contained in the skeleton and position information corresponding to the node information;
and storing the timestamp information, the node information and the position information according to a preset data format to obtain a conversion action instruction.
The preset time interval is a preset time interval for extracting a skeleton from the skeleton video stream; for example, the preset time interval may be 1.5 seconds, which is not limited herein. Each skeleton comprises timestamp information, node information and position information corresponding to the node information. The preset data format is a preset format for arranging the timestamp, the node information and the position information; for example, the preset data format may be {time + skeleton{node 1, node 2, node 3, …, node n} + position}.
The instruction loading module 203 can be used for acquiring a target cartoon image and loading the conversion action instruction into the target cartoon image to obtain an image action video corresponding to the target cartoon image.
In at least one embodiment of the present application, the target cartoon image may be a 2D cartoon image made by system personnel according to their own brand positioning, and the target cartoon image needs to satisfy T-pose skeleton features; for example, it needs to have a head, four limbs, a trunk and other basic structures. The image action video is a video in which the initial image in the initial video is replaced by the target cartoon image, and the target cartoon image in the video can perform the actions of the initial image. In an embodiment, the initial image is likewise provided with basic structures such as a head, four limbs and a trunk, and a mapping relationship exists between the basic structures of the initial image and the target cartoon image.
Optionally, the loading the conversion action instruction into the target cartoon image to obtain an image action video corresponding to the target cartoon image includes:
acquiring a first basic structure of the target cartoon image and a second basic structure of the initial image;
determining a mapping relationship between the first basic structure and the second basic structure;
acquiring the conversion action instruction corresponding to the second basic structure;
and establishing the association between the first basic structure and the conversion action command according to the mapping relation to obtain an image action video corresponding to the target cartoon image.
The first basic structure comprises structural nodes such as a head, four limbs and a trunk, and the second basic structure likewise comprises structural nodes such as a head, four limbs and a trunk; a mapping relationship exists between structural nodes of the first basic structure and structural nodes of the second basic structure at corresponding positions. For each structural node, the corresponding conversion action instruction controls the structural node in the second basic structure, so that the initial image can complete the corresponding action. By establishing the association between the first basic structure and the conversion action instruction through the mapping relationship, the conversion action instruction can control the related structural nodes in the first basic structure, so that the target cartoon image finally completes the corresponding action, thereby obtaining the image action video corresponding to the target cartoon image.
The text conversion module 204 may be configured to identify initial audio information in the initial video, convert the initial audio information into an initial text, and modify the initial text to obtain a target text.
In at least one embodiment of the present application, the initial video further includes initial audio information, and the initial audio information is converted into an initial text, so that the initial text is modified conveniently to obtain a target text.
Optionally, the identifying the initial audio information in the initial video includes:
acquiring coding information corresponding to the initial video;
detecting whether the coded information contains a preset keyword or not;
and when the detection result shows that the coded information contains the preset keyword, determining the position of the preset keyword in the coded information, and determining the information at the position as initial audio information.
For the initial video, there is corresponding coding information; the coding information comprises video coding information and audio coding information, which can be distinguished by adding preset keywords. The preset keyword is a keyword preset for distinguishing the audio information.
Optionally, the converting the initial audio information into an initial text, and modifying the initial text to obtain a target text includes:
calling a pre-trained background music denoising model to perform background music denoising processing on the initial audio information to obtain intermediate audio information;
converting the intermediate audio information into an initial text through an ASR model;
acquiring a preset text;
and modifying the initial text by using a preset text to obtain a target text.
In the present application, an ASR model is used to recognize the audio in the video and convert it into text; recognizing audio with an ASR model to obtain text belongs to the prior art and is not repeated herein. Considering that background music in some videos may affect recognition accuracy, a background music denoising model is also added to ensure that the recognized text output is coherent and readable. The preset text may be a text for modifying the initial text; the preset text comprises a modification position and modification content, the position in the initial text is determined according to the modification position, and the content at that position is replaced with the modification content, thereby obtaining the target text.
The audio generating module 205 may be configured to obtain a preset voice packet, and generate target audio information according to the preset voice packet and the target text.
In at least one embodiment of the present application, the preset voice packet is a voice packet that is preset by a system and contains different timbre information, and the target audio information is audio information formed by reading the target text according to the timbre in the preset voice packet. An embodiment of the application provides a TTS model, and a new video audio track is generated through the TTS model.
The video fusion module 206 may be configured to fuse the target audio information, the image action video and the initial background video to obtain a target video.
In at least one embodiment of the present application, the target audio information, the image action video and the initial background video are fused to obtain the target video.
Optionally, the fusing the target audio information, the image action video and the initial background video to obtain the target video includes:
acquiring coding information corresponding to the initial video;
determining a first position of a first preset keyword, a second position of a second preset keyword and a third position of a third preset keyword contained in the coded information;
respectively acquiring the content at the first position as initial audio information, the content at the second position as the initial image video, and the content at the third position as the initial background video;
replacing the initial audio information with the target audio information and replacing the initial image video with the image action video to obtain target coding information;
and executing the target coding information to obtain a target video.
The first preset keyword, the second preset keyword and the third preset keyword are preset keywords for identifying the initial audio information, the initial image video and the initial background video. In the coding information, the initial audio information, the initial image video and the initial background video may be written in the form of video links, so that when a video replacement operation is performed, it may be carried out by replacing the video links to obtain the target coding information, and the target coding information is processed to obtain the target video.
Fig. 4 is a schematic structural diagram of a computer device according to a third embodiment of the present application. In the preferred embodiment of the present application, the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
It will be appreciated by those skilled in the art that the structure of the computer device shown in fig. 4 does not constitute a limitation of the embodiments of the present application; it may be a bus-type or star-type structure, and the computer device 3 may include more or less hardware or software than illustrated, or a different arrangement of components.
In some embodiments, the computer device 3 is a device capable of automatically performing numerical calculation and/or information processing according to instructions set in advance or stored in advance, and the hardware thereof includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device and the like. The computer device 3 may also include a client device, which includes, but is not limited to, any electronic product capable of interacting with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, and the like.
It should be noted that the computer device 3 is only an example, and other existing or future electronic products that can be adapted to the present application are also included within the scope of protection of the present application and are incorporated herein by reference.
In some embodiments, the memory 31 stores a computer program which, when executed by the at least one processor 32, implements all or part of the steps of the skeleton-migration-based animation video processing method as described. The memory 31 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium capable of carrying or storing data.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The blockchain referred to in the present application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated with one another using cryptographic methods, where each data block contains information on a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In some embodiments, the at least one processor 32 is a Control Unit (Control Unit) of the computer device 3, connects various components of the entire computer device 3 by using various interfaces and lines, and executes various functions and processes data of the computer device 3 by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31. For example, the at least one processor 32, when executing the computer program stored in the memory, implements all or part of the steps of the animation video processing method based on skeleton migration described in the embodiments of the present application; or realize all or part of the functions of the animation video processing device based on the skeleton migration. The at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips.
In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the computer device 3 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The computer device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional module.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description; all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the term "comprising" does not exclude other elements, and the singular does not exclude the plural. A plurality of units or means recited in the specification may also be implemented by one unit or means through software or hardware. The terms "first", "second", and the like are used to denote names and do not imply any particular order.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present application without departing from their spirit and scope.

Claims (10)

1. An animation video processing method based on skeleton migration is characterized by comprising the following steps:
acquiring an initial video, and cutting the initial video to obtain an initial image video and an initial background video, wherein the initial image video comprises a plurality of image skeletons of an initial image;
extracting an image skeleton from the initial image video to obtain a skeleton video stream, and analyzing the skeleton video stream to obtain a conversion action instruction;
acquiring a target cartoon image, and loading the conversion action instruction into the target cartoon image to obtain an image action video corresponding to the target cartoon image;
identifying initial audio information in the initial video, converting the initial audio information into an initial text, and modifying the initial text to obtain a target text;
acquiring a preset voice packet, and generating target audio information according to the preset voice packet and the target text;
and fusing the target audio information, the image action video and the initial background video to obtain a target video.
2. The method for processing animated video based on skeleton migration according to claim 1, wherein said cutting the initial video to obtain an initial image video comprises:
calling a trained image cutting algorithm to cut the initial video to obtain a plurality of frames of foreground area images, wherein the foreground area image of each frame comprises an image skeleton of an initial image;
acquiring a timestamp of each frame of the foreground area image;
and combining the foreground area images of each frame according to the sequence of the time stamps to obtain an initial image video.
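The recombination step of claim 2 — acquiring each foreground frame's timestamp and combining the frames in timestamp order — can be sketched as follows. This is a non-authoritative illustration; the `ForegroundFrame` type and its field names are hypothetical, not part of the claim:

```python
from dataclasses import dataclass, field

@dataclass
class ForegroundFrame:
    # One cut-out foreground frame containing the initial image's skeleton.
    timestamp_ms: int                            # timestamp acquired per frame
    pixels: list = field(default_factory=list)   # placeholder for image data

def assemble_initial_image_video(frames):
    """Combine foreground frames according to the sequence of timestamps."""
    return sorted(frames, key=lambda f: f.timestamp_ms)

# Frames may arrive unordered, e.g. after cutting frames in parallel.
ordered = assemble_initial_image_video(
    [ForegroundFrame(66), ForegroundFrame(0), ForegroundFrame(33)]
)
```

Sorting by the acquired timestamp rather than by arrival order is what makes the reassembled initial image video play back correctly.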
3. The method of claim 1, wherein the extracting of the image skeleton from the initial image video to obtain the skeleton video stream comprises:
step 1, extracting an initial image picture from the initial image video according to a preset frame rate, and converting the initial image picture into a binary image;
step 2, selecting edge pixel points of the binary image and adjacent pixel points of the edge pixel points, and calculating distance values between the edge pixel points and the adjacent pixel points;
step 3, detecting whether the distance value is smaller than a preset distance value threshold value;
step 4, when the detection result shows that the distance value is smaller than the preset distance value threshold, determining and deleting target adjacent pixel points corresponding to the target distance value smaller than the preset distance value threshold;
step 5, repeating steps 2 to 4 to obtain key pixel points, connecting the key pixel points into lines to form a skeleton, and arranging and combining the skeletons according to the sequence of the timestamps to obtain a skeleton video stream.
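The iterative deletion of near-edge pixels in claim 3 is, in effect, a morphological thinning pass over the binary image. As a stand-in for the patent's distance-threshold rule, the sketch below uses the classic Zhang-Suen thinning algorithm (a substitute technique, not the claimed one) on a binarized frame, in plain Python with no external libraries:

```python
def zhang_suen_thin(img):
    """Thin a binary image (list of 0/1 rows) toward a one-pixel skeleton."""
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]

    def neighbours(y, x):
        # P2..P9, clockwise starting from the pixel directly above (y-1, x).
        return [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
                img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):              # the algorithm's two sub-passes
            to_clear = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if img[y][x] != 1:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)                                  # foreground count
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1   # 0->1 transitions
                            for i in range(8))
                    if step == 0:
                        ok = p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0
                    else:
                        ok = p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((y, x))
            for y, x in to_clear:        # delete after scanning, not during
                img[y][x] = 0
            changed = changed or bool(to_clear)
    return img

# A solid blob standing in for one binarized initial image picture.
frame = [[0] * 14 for _ in range(9)]
for y in range(2, 7):
    for x in range(2, 12):
        frame[y][x] = 1
skeleton = zhang_suen_thin(frame)
```

The surviving foreground pixels play the role of the claim's "key pixel points": connected, they form the frame's skeleton, and stacking one skeleton per frame in timestamp order yields the skeleton video stream.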
4. The method according to claim 1, wherein the parsing the skeleton video stream to obtain a conversion action instruction comprises:
obtaining a skeleton contained in the skeleton video stream according to a preset time interval;
determining timestamp information of a skeleton in the skeleton video stream, node information contained in the skeleton and position information corresponding to the node information;
and storing the timestamp information, the node information and the position information according to a preset data format to obtain a conversion action instruction.
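Claim 4 leaves the "preset data format" open; one plausible shape for a conversion action instruction record — timestamp plus per-node name and position — is sketched below. All field names here are hypothetical assumptions:

```python
import json

def build_conversion_action_instruction(timestamp_ms, node_positions):
    """Store timestamp, node, and position information in one record.

    node_positions: dict mapping a node name to its (x, y) position.
    """
    return {
        "timestamp_ms": timestamp_ms,
        "nodes": [{"name": name, "x": x, "y": y}
                  for name, (x, y) in node_positions.items()],
    }

instruction = build_conversion_action_instruction(
    40, {"head": (120, 30), "left_wrist": (90, 160)}
)
payload = json.dumps(instruction)   # serialised conversion action instruction
```

Serialising each skeleton sample this way keeps the instruction stream independent of any particular character model, which is what later lets claim 5 retarget it onto the cartoon image.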
5. The method for processing animation video based on skeleton migration according to claim 1, wherein the loading the conversion action instruction into the target cartoon image to obtain the image action video corresponding to the target cartoon image comprises:
acquiring a first basic structure of the target cartoon image and a second basic structure of the initial image;
determining a mapping relationship between the first infrastructure and the second infrastructure;
acquiring the conversion action instruction corresponding to the second infrastructure;
and establishing the association between the first basic structure and the conversion action instruction according to the mapping relation to obtain an image action video corresponding to the target cartoon image.
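The association step of claim 5 amounts to re-keying each node of an instruction through the mapping between the two basic structures. A minimal sketch, assuming a name-to-name bone mapping and the hypothetical record fields shown inline (none of these names come from the claim):

```python
# Hypothetical mapping from the initial image's skeleton (second basic
# structure) to the target cartoon image's bones (first basic structure).
BONE_MAP = {"head": "skull", "left_wrist": "l_hand", "right_wrist": "r_hand"}

def retarget_instruction(instruction, bone_map):
    """Associate the first basic structure with a conversion action
    instruction by renaming each node via the mapping; nodes with no
    counterpart in the target skeleton are dropped."""
    return {
        "timestamp_ms": instruction["timestamp_ms"],
        "nodes": [{**node, "name": bone_map[node["name"]]}
                  for node in instruction["nodes"]
                  if node["name"] in bone_map],
    }

sample = {"timestamp_ms": 0,
          "nodes": [{"name": "head", "x": 120, "y": 30},
                    {"name": "hip", "x": 100, "y": 200}]}
retargeted = retarget_instruction(sample, BONE_MAP)
```

Applying the retargeted instructions frame by frame drives the cartoon image through the initial image's motions, producing the image action video.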
6. The method of claim 1, wherein the converting the initial audio information into an initial text and modifying the initial text to obtain a target text comprises:
calling a pre-trained background music denoising model to perform background music denoising processing on the initial audio information to obtain intermediate audio information;
converting the intermediate audio information into an initial text through an ASR model;
acquiring a preset text;
and modifying the initial text by using a preset text to obtain a target text.
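Claim 6 does not fix how the preset text modifies the initial text; one simple reading is a phrase-substitution table, sketched below (the substitution rule itself is an assumption):

```python
def modify_transcript(initial_text, preset_phrases):
    """Modify the ASR transcript using a preset phrase table: each source
    phrase found in the initial text is replaced by its preset counterpart."""
    for source_phrase, preset_phrase in preset_phrases.items():
        initial_text = initial_text.replace(source_phrase, preset_phrase)
    return initial_text

target_text = modify_transcript(
    "hello world, this is the initial transcript",
    {"hello world": "greetings everyone"},
)
```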
7. The method of claim 1, wherein the fusing the target audio information, the visual action video and the initial background video to obtain the target video comprises:
acquiring coding information corresponding to the initial video;
determining a first position containing a first preset keyword, a second position containing a second preset keyword and a third position containing a third preset keyword in the coded information;
respectively acquiring the content corresponding to the first position as initial audio information, the content corresponding to the second position as an initial image video and the content corresponding to the third position as an initial background video;
replacing the initial audio information with the target audio information and replacing the initial image video with the image action video respectively to obtain target coding information;
and executing the target coding information to obtain a target video.
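The replacement logic of claim 7 — locate three keyword-marked positions in the coding information, then swap in the new audio and foreground while keeping the background — can be sketched with a plain dictionary. The track keys are hypothetical stand-ins for the claim's keyword-located positions:

```python
def fuse_target_video(encoding_info, target_audio, image_action_video):
    """Produce target coding information by replacing the audio and
    foreground entries while leaving the background entry untouched."""
    fused = dict(encoding_info)                      # keep the original intact
    fused["audio_track"] = target_audio              # first position: audio
    fused["foreground_track"] = image_action_video   # second position: image
    return fused                                     # third position unchanged

encoding_info = {"audio_track": "initial_audio",
                 "foreground_track": "initial_image_video",
                 "background_track": "initial_background_video"}
target = fuse_target_video(encoding_info, "target_audio", "image_action_video")
```

Executing the resulting coding information then renders the target video with the new voice and cartoon image composited over the original background.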
8. An animation video processing device based on skeleton migration, characterized in that the animation video processing device based on skeleton migration comprises:
the video cutting module is used for obtaining an initial video and cutting the initial video to obtain an initial image video and an initial background video, wherein the initial image video comprises a plurality of image frameworks of an initial image;
the instruction conversion module is used for extracting an image skeleton from the initial image video to obtain a skeleton video stream, and analyzing the skeleton video stream to obtain a conversion action instruction;
the instruction loading module is used for acquiring a target cartoon image and loading the conversion action instruction into the target cartoon image to obtain an image action video corresponding to the target cartoon image;
the text conversion module is used for identifying initial audio information in the initial video, converting the initial audio information into an initial text, and modifying the initial text to obtain a target text;
the audio generation module is used for acquiring a preset voice packet and generating target audio information according to the preset voice packet and the target text;
and the video fusion module is used for fusing the target audio information, the image action video and the initial background video to obtain a target video.
9. A computer device, characterized in that the computer device comprises a processor and a memory, the processor being configured to implement the method for processing animated video based on skeletal migration according to any of claims 1 to 7 when executing a computer program stored in the memory.
10. A computer-readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, implementing the method for processing animation video based on skeleton migration according to any one of claims 1 to 7.
CN202111011701.1A 2021-08-31 2021-08-31 Animation video processing method and device based on skeleton migration and related equipment Active CN113727187B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111011701.1A CN113727187B (en) 2021-08-31 2021-08-31 Animation video processing method and device based on skeleton migration and related equipment


Publications (2)

Publication Number Publication Date
CN113727187A (en) 2021-11-30
CN113727187B (en) 2022-10-11

Family

ID=78679641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111011701.1A Active CN113727187B (en) 2021-08-31 2021-08-31 Animation video processing method and device based on skeleton migration and related equipment

Country Status (1)

Country Link
CN (1) CN113727187B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114554267B (en) * 2022-02-22 2024-04-02 上海艾融软件股份有限公司 Audio and video synchronization method and device based on digital twin technology

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109819313A (en) * 2019-01-10 2019-05-28 腾讯科技(深圳)有限公司 Method for processing video frequency, device and storage medium
CN111681302A (en) * 2020-04-22 2020-09-18 北京奇艺世纪科技有限公司 Method and device for generating 3D virtual image, electronic equipment and storage medium
WO2021143103A1 (en) * 2020-01-13 2021-07-22 平安国际智慧城市科技股份有限公司 Video data processing method, apparatus and device, and computer-readable storage medium
CN113313794A (en) * 2021-05-19 2021-08-27 深圳市慧鲤科技有限公司 Animation migration method and device, equipment and storage medium


Also Published As

Publication number Publication date
CN113727187A (en) 2021-11-30

Similar Documents

Publication Publication Date Title
CN108010112B (en) Animation processing method, device and storage medium
CN111476871B (en) Method and device for generating video
CN111681681A (en) Voice emotion recognition method and device, electronic equipment and storage medium
CN104995662A (en) Avatar-based transfer protocols, icon generation and doll animation
EP4235491A1 (en) Method and apparatus for obtaining virtual image, computer device, computer-readable storage medium, and computer program product
CN113096242A (en) Virtual anchor generation method and device, electronic equipment and storage medium
CN113870395A (en) Animation video generation method, device, equipment and storage medium
CN114866807A (en) Avatar video generation method and device, electronic equipment and readable storage medium
CN116309992A (en) Intelligent meta-universe live person generation method, equipment and storage medium
CN113727187B (en) Animation video processing method and device based on skeleton migration and related equipment
JP2022133409A (en) Virtual object lip driving method, model training method, related apparatus, and electronic device
CN115049513A (en) Intelligent social contact construction method, device and equipment and readable storage medium
CN112686232B (en) Teaching evaluation method and device based on micro expression recognition, electronic equipment and medium
CN111737780B (en) Online model editing method and online model editing system
CN116939288A (en) Video generation method and device and computer equipment
US20210166073A1 (en) Image generation method and computing device
CN116761013A (en) Digital human face image changing method, device, equipment and storage medium
CN115393532B (en) Face binding method, device, equipment and storage medium
CN108958571B (en) Three-dimensional session data display method and device, storage medium and computer equipment
CN116962807A (en) Video rendering method, device, equipment and storage medium
CN112289321B (en) Explanation synchronization video highlight processing method and device, computer equipment and medium
CN114898019A (en) Animation fusion method and device
Kondratiuk et al. Dactyl alphabet modeling and recognition using cross platform software
CN114422862A (en) Service video generation method, device, equipment, storage medium and program product
CN112101191A (en) Expression recognition method, device, equipment and medium based on frame attention network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant