WO2020113733A1

WO2020113733A1 - Animation generation method and apparatus, electronic device, and computer-readable storage medium

Info

Publication number: WO2020113733A1
Application number: PCT/CN2018/125392
Authority: WO
Inventors: 都之夏
Original assignee: 北京微播视界科技有限公司
Priority date: 2018-12-07
Filing date: 2018-12-29
Publication date: 2020-06-11
Also published as: CN109615682A

Abstract

An animation generation method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: determining, by means of a predetermined speech recognition method, a music element feature of a target music (S101); determining a plurality of animation composition images, and determining, according to the music element feature of the target music, animation playback effects matching respective animation composition images (S102); generating, according to the animation playback effects and the plurality of animation composition images, a target animation (S103); and synthesizing the target music and the target animation, such that the target music and the target animation can be played correspondingly and synchronously (S104). The animation playback effect of each image is determined on the basis of the music element feature of the target music, thereby preventing a feeling of discomfort caused by incongruities between the playback effects of the images and the music element feature, and improving animation viewing experience for users.

Description

Animation generating method, device, electronic equipment and computer readable storage medium

Cross-reference of related applications

This disclosure claims priority to Chinese patent application No. 201811496521.5, filed with the State Intellectual Property Office of China on December 7, 2018, the entire contents of which are incorporated herein by reference.

Technical field

The present disclosure relates to the technical field of image processing, and in particular to an animation generation method, an animation generation device, an electronic device, and a computer-readable storage medium.

Background

Taking pictures to record daily life has become an important habit for many people. With the development of animation processing technology, people can select a number of pictures and corresponding music and directly generate a multimedia animation from them, and can then share the generated animation with other users through a social network platform to show their lives.

Currently, the playback mode of the pictures in a multimedia animation generated from selected pictures and music is determined randomly; that is, it is unrelated to the selected music. In an animation generated according to the prior art, the characteristics of the music (such as rhythm, beat, and melody) do not affect how the pictures are played, so the music and the picture playback are uncorrelated, which lowers the viewing experience. For example, the rhythm of the music in a certain animation segment may be relatively soothing while the corresponding picture transitions are very fast; such abrupt picture playback may cause discomfort to the viewer. The prior art therefore has the problem that the playback mode of the pictures in an animation generated from pictures and music is unrelated to the music, resulting in a poor viewing experience.

Summary of the invention

In a first aspect, the present disclosure provides an animation generation method, the method including:

determining the music element characteristics of target music through a predetermined voice recognition method;

determining a plurality of animation composition pictures, and determining, according to the music element characteristics of the target music, an animation playback effect matching each animation composition picture;

generating a target animation according to the animation playback effects and the plurality of animation composition pictures; and

synthesizing the target music and the target animation, so that the target music and the target animation can be played and displayed synchronously.

In a second aspect, the present disclosure provides an animation generating device, which includes:

The first determining module is used to determine the music element characteristics of the target music through a predetermined voice recognition method;

The second determination module is used to determine a plurality of animation composition pictures, and determine the animation playback effect matching each animation composition picture according to the music element characteristics of the target music determined by the first determination module;

The animation generation module is used to generate a target animation according to the multiple animation composition pictures determined by the second determination module and the animation playback effects matching each animation composition picture;

The synthesis processing module is used for synthesizing the target music and the target animation generated by the animation generating module, so that the target music and the target animation can be synchronously played and displayed accordingly.

In a third aspect, the present disclosure provides an electronic device including:

a processor; and

a memory that stores at least one application program which, when executed by the processor, causes the electronic device to execute the animation generation method according to the first aspect.

According to a fourth aspect, the present disclosure provides a computer-readable storage medium for storing computer instructions, which when executed on a computer, causes the computer to execute the animation generation method according to the first aspect.

In the solution of the present disclosure, the animation playback effect is determined according to the music element characteristics of the target music, so that each animation composition picture has a playback effect matching those characteristics. This avoids the discomfort caused when the picture playback effect is unrelated to the music element characteristics (for example, when the tempo of the target music is soothing but the corresponding pictures transition extremely fast), thereby improving the user's animation viewing experience.

Additional aspects and advantages of the present disclosure will be partially given in the following description, which will become apparent from the following description or be learned through the practice of the present disclosure.

Brief description of the drawings

The above and/or additional aspects and advantages of the present disclosure will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic flowchart of an animation generation method according to an embodiment of the present disclosure;

FIG. 2 is a schematic structural diagram of an animation generation device according to an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of another animation generating device according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed description

The embodiments of the present disclosure are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals indicate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, and are only used to explain the present disclosure, and cannot be construed as limiting the present disclosure.

Those skilled in the art can understand that, unless specifically stated otherwise, the singular forms "a", "an", and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present disclosure refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The expression "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

To make the objectives, technical solutions, and advantages of the present disclosure more clear, the embodiments of the present disclosure will be further described in detail below in conjunction with the accompanying drawings.

The technical solutions of the present disclosure and how the technical solutions of the present disclosure solve the above technical problems will be described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present disclosure will be described below with reference to the drawings.

An embodiment of the present disclosure provides an animation generation method. As shown in FIG. 1, the method may include steps S101 to S104.

Step S101: Determine the music element characteristics of the target music through a predetermined voice recognition method.

For this embodiment, music recognition is an interdisciplinary research field involving music knowledge and signal processing technology. Music recognition includes analyzing the target music to obtain its music element characteristics. The target music may be a music file in WAV (Waveform Audio File Format) format; a WAV file is a waveform file that stores lossless music. The target music may also be a music file in MIDI (Musical Instrument Digital Interface) format. Unlike a WAV file, a MIDI file does not sample the music but records each note of the music as a number, so the file is much smaller than a WAV file.

For this embodiment, the target music may be input by playing or humming, or may be obtained by searching a local music library or downloading over a network. If the target music is a file in a compressed format such as MP3 or WMA, it can first be decoded (i.e., decompressed) into a file in a format such as WAV.
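Once the target music is available as a WAV file, its PCM samples can be read for later analysis. The following is a minimal sketch using Python's standard `wave` module, assuming 16-bit mono PCM; the helper names `write_test_tone` and `read_pcm16_mono` are illustrative and do not come from the disclosure. Decoding MP3 or WMA requires an external decoder and is not shown.

```python
import io
import math
import struct
import wave


def write_test_tone(buf, freq_hz=440.0, rate=8000, n_frames=800):
    """Write a short 16-bit mono sine tone into a file-like object (test data only)."""
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(frames)


def read_pcm16_mono(buf):
    """Return (sample_rate, samples) with samples scaled to [-1.0, 1.0)."""
    with wave.open(buf, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return rate, [s / 32768.0 for s in ints]


# Round trip through an in-memory buffer instead of a file on disk.
buf = io.BytesIO()
write_test_tone(buf)
buf.seek(0)
rate, samples = read_pcm16_mono(buf)
```

The list of floating-point samples returned here is the kind of input that the feature extraction described in the following steps would operate on.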

For this embodiment, the music element characteristics of the target music are determined by a predetermined voice recognition method. The predetermined voice recognition method may be a method based on time-frequency analysis, a method based on time-domain analysis, a method based on frequency-domain analysis, or another suitable method, which is not limited here.

Step S102: Determine a plurality of animation composition pictures, and determine an animation playback effect matching each animation composition picture according to the music element characteristics of the target music.

For this embodiment, multiple animation composition pictures are determined, where the multiple animation composition pictures may be manually selected by the user from the picture library, or may be automatically determined from the picture library through a corresponding picture determination method.

For this embodiment, the animation playback effect matching each animation composition picture may be determined according to the music element characteristics of the target music, based on a predetermined rule for matching music element characteristics to animation composition picture playback effects.

Step S103: Generate a target animation according to multiple animation composition pictures and animation playback effects matching each animation composition picture.

Specifically, each animation composition picture can be processed according to its matching animation playback effect, so that each picture is played with the determined effect, and the processed animation composition pictures are then fused to generate the target animation.
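As one illustration of applying a playback effect and fusing the processed pictures, the sketch below builds a crossfade transition between consecutive pictures. The disclosure does not specify how the fusion is done; the linear alpha blend, the representation of pictures as 2-D lists of grayscale values, and the function names are all assumptions made for this example.

```python
def crossfade_frames(pic_a, pic_b, n_frames):
    """Return n_frames frames that blend pic_a into pic_b linearly.

    Pictures are equally sized 2-D lists of grayscale values (0-255).
    """
    if n_frames < 2:
        raise ValueError("need at least two frames for a transition")
    frames = []
    for k in range(n_frames):
        alpha = k / (n_frames - 1)   # 0.0 on the first frame, 1.0 on the last
        frames.append([
            [round((1 - alpha) * a + alpha * b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(pic_a, pic_b)
        ])
    return frames


def build_animation(pictures, frames_per_transition):
    """Concatenate the crossfades between consecutive pictures into one frame list."""
    animation = []
    pairs = zip(pictures, pictures[1:])
    for (pic_a, pic_b), n in zip(pairs, frames_per_transition):
        animation.extend(crossfade_frames(pic_a, pic_b, n))
    return animation
```

A slower piece of music could be given a larger frame count per transition and a faster piece a smaller one, which is how the matched playback effect would show up in the generated frames.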

Step S104: Synthesizing the target music and the target animation so that the target music and the target animation can be synchronously played and displayed accordingly.

Specifically, the target music and the target animation can be synthesized based on time state information, so that they can be played and displayed synchronously. Here, synchronous playing and displaying may mean that each animation composition picture is played and displayed with the playback effect that corresponds to the music element characteristics of the target music at the corresponding time point or in the corresponding time period.
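Synchronization based on time state information can be pictured as a timeline that pairs each time period of the music with one picture and its playback effect. The sketch below is one possible arrangement, not the disclosed implementation; the `Cue` record, the function names, and the picture and effect identifiers are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Cue:
    start_s: float   # time at which this period starts, in seconds
    picture: str     # picture identifier (hypothetical)
    effect: str      # playback effect chosen for this period (hypothetical)


def build_timeline(period_starts, pictures, effects):
    """Pair each music period with one picture and its playback effect."""
    if not (len(period_starts) == len(pictures) == len(effects)):
        raise ValueError("one picture and one effect are needed per period")
    if list(period_starts) != sorted(period_starts):
        raise ValueError("period start times must be ascending")
    return [Cue(s, p, e) for s, p, e in zip(period_starts, pictures, effects)]


def cue_at(timeline, t_s):
    """Return the cue that is active at playback time t_s (seconds into the music)."""
    starts = [c.start_s for c in timeline]
    i = bisect_right(starts, t_s) - 1
    return timeline[max(i, 0)]
```

During playback, a player would call `cue_at` with the current audio position, so the displayed picture and effect always follow the music clock rather than a separate animation clock.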

In the embodiment of the present disclosure, the animation playback effect is determined according to the music element characteristics of the target music, so that each animation composition picture has a playback effect matching those characteristics. This avoids the discomfort caused when the picture playback effect is unrelated to the music element characteristics (for example, when the tempo of the target music is soothing but the corresponding pictures transition extremely fast), thereby improving the user's animation viewing experience.

An embodiment of the present disclosure provides a possible implementation manner. Specifically, step S101 may include step S1011 to step S1013.

Step S1011 (not shown in the figure): extract the audio information of the target music.

Specifically, the audio information of the target music is extracted through a corresponding audio information extraction technology. The extracted audio information may contain considerable noise, for example from interference by electrical equipment during recording, from other objects, or from power-frequency signals, so some noise is unavoidable. The extracted audio information can therefore be preprocessed to remove the influence of the noise, and the audio data can be compressed to reduce the amount of computation.

Step S1012 (not shown in the figure): perform acoustic feature extraction on the extracted audio information to obtain corresponding acoustic feature information.

Specifically, the audio information can be processed through a corresponding filter to obtain the acoustic feature information of the target music. For example, the audio information can be processed through a Gaussian low-pass FIR filter to obtain the envelope of the target music signal in PCM format; peak detection combining frequency-domain analysis and time-domain analysis can then be performed to determine the positions of the notes of the target music, and the acoustic feature information of each note can be obtained. The acoustic feature information includes, but is not limited to, the zero-crossing rate and the short-term energy.
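The envelope extraction and the two named acoustic features can be sketched as follows. This is a simplified, time-domain-only illustration: the text describes peak detection that combines frequency-domain and time-domain analysis, which is not reproduced here. The kernel length, sigma, threshold, and all function names are assumptions for the example.

```python
import math


def gaussian_kernel(length, sigma):
    """Normalised Gaussian low-pass FIR coefficients (use an odd length)."""
    half = length // 2
    taps = [math.exp(-0.5 * ((i - half) / sigma) ** 2) for i in range(length)]
    total = sum(taps)
    return [t / total for t in taps]


def envelope(samples, kernel):
    """Rectify the signal, then smooth it with the FIR kernel (same length out)."""
    half = len(kernel) // 2
    rect = [abs(s) for s in samples]
    out = []
    for n in range(len(rect)):
        acc = 0.0
        for k, h in enumerate(kernel):
            j = n + k - half
            if 0 <= j < len(rect):   # samples outside the signal count as zero
                acc += h * rect[j]
        out.append(acc)
    return out


def find_peaks(env, threshold):
    """Indices of local maxima of the envelope that exceed the threshold."""
    return [
        i for i in range(1, len(env) - 1)
        if env[i] > threshold and env[i] > env[i - 1] and env[i] >= env[i + 1]
    ]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def short_term_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / max(len(frame), 1)
```

In this sketch each detected peak stands in for a note position; the zero-crossing rate and short-term energy would then be computed on a short frame of samples around each peak.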

Step S1013 (not shown in the figure): determine the music element characteristics of the target music based on the extracted acoustic characteristic information.

For this embodiment, through corresponding analysis of the extracted acoustic feature information, the music element features of the target music are obtained.

Further, the characteristics of music elements include but are not limited to at least one of the following:

Intensity, pitch, length, beat, rhythm, tempo and melody.

For this embodiment, the music element characteristics include, but are not limited to, intensity, pitch, length, beat, rhythm, tempo, and melody.

Pitch is determined by the vibration frequency of the sounding object: the higher the frequency, the higher the sound, and vice versa. Length is determined by how long the object vibrates: the longer the vibration, the longer the sound. Intensity refers to the loudness of the sound and is determined by the vibration amplitude: the larger the amplitude, the stronger the sound. Rhythm refers to the organization of long and short sounds with strong and weak accents; it is related to both the length and the intensity of the sounds. Beat refers to time segments of equal duration with strong and weak accents that repeat in a certain order. Tempo is a measure of how fast the beats of the music are; for example, 144 BPM means 144 beats per minute. Melody usually refers to an organized, rhythmic sequence of musical sounds formed through artistic conception; it is an organic combination of many basic elements of music, such as rhythm, beat, intensity, and length. Melody takes three forms: descending, horizontal, and ascending.
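Two of these definitions lend themselves to a small worked example: tempo in beats per minute and the three melody forms. The sketch below assumes beat times and note pitches have already been extracted; estimating tempo from the median inter-beat interval and classifying the melody form by comparing the first and last pitch are simplifying assumptions, not methods stated in the disclosure.

```python
from statistics import median


def tempo_bpm(beat_times_s):
    """Estimate tempo in beats per minute from ascending beat times (seconds)."""
    if len(beat_times_s) < 2:
        raise ValueError("need at least two beats")
    intervals = [b - a for a, b in zip(beat_times_s, beat_times_s[1:])]
    return 60.0 / median(intervals)


def melody_form(pitches_hz, tolerance_hz=1.0):
    """Classify a note sequence as 'ascending', 'descending' or 'horizontal'."""
    delta = pitches_hz[-1] - pitches_hz[0]
    if delta > tolerance_hz:
        return "ascending"
    if delta < -tolerance_hz:
        return "descending"
    return "horizontal"
```

For instance, beats spaced 60/144 of a second apart give a tempo of about 144 BPM, matching the example in the text.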

From the point of view of the organizational structure of music, music is composed of music segments, music segments are composed of music bars, and music bars are composed of notes, which are the most basic elements of music. To extract the music element characteristics of the target music, the note characteristics of the target music can be determined first, including basic characteristics such as pitch, intensity, and length; more complex music element characteristics, such as beat, rhythm, tempo, and melody, can then be obtained by analyzing the note characteristics.

For the embodiments of the present disclosure, the music element characteristics include intensity, pitch, length, beat, rhythm, tempo, and melody. Such diverse music element characteristics provide a basis for improving the correlation between the playback effect of the animation composition pictures and the target music.

The embodiments of the present disclosure provide a possible implementation manner. Specifically, step S102 may include step S1021 and step S1022.

Step S1021 (not shown in the figure): determine the music type of the target music according to the music element characteristics of the target music.

Specifically, a feature vector of the target music can be obtained based on its music element characteristics, and the music type of the target music can then be obtained through a pre-trained neural network model. The music type may be the music type of a certain segment of the target music or the music type of the target music as a whole. Music types include, but are not limited to, gentle, intense, and calm.
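The flow from music element characteristics to a feature vector to a music type can be sketched with a single dense layer followed by a softmax. This is only a stand-in for the pre-trained neural network model mentioned above: the choice of features, their scaling, and especially the weights below are placeholders invented for illustration, not trained values or anything given in the disclosure.

```python
import math

MUSIC_TYPES = ["gentle", "intense", "calm"]   # types named in the text


def feature_vector(tempo_bpm, mean_intensity, mean_pitch_hz):
    """Scale a few music element characteristics to roughly [0, 1] (assumed scaling)."""
    return [tempo_bpm / 200.0, mean_intensity, mean_pitch_hz / 1000.0]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """One dense layer plus softmax; returns (label, probabilities)."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return MUSIC_TYPES[best], probs


# Placeholder parameters, NOT trained values; a real model would be loaded instead.
PLACEHOLDER_W = [[-1.0, -1.0, 0.5], [3.0, 3.0, 0.0], [-2.0, -3.0, -0.5]]
PLACEHOLDER_B = [1.0, -2.0, 1.5]
```

The same `classify` call could be applied to the features of a single segment or to features averaged over the whole piece, corresponding to the two granularities described in the text.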

Step S1022 (not shown in the figure): based on the music type of the target music, determine a plurality of animation composition pictures matching the target music.

Specifically, according to different music types of the target music, a plurality of animation composition pictures matching the target music are determined respectively.

For the embodiments of the present disclosure, multiple animation composition pictures matching the target music are determined according to the music type of the target music, thereby improving the relevance of the animation composition pictures and the target music.

Yet another embodiment of the present disclosure provides a possible implementation manner. Specifically, step S1022 may include step S10221 and step S10222.

Step S10221 (not shown in the figure): determine the picture scene type that matches the music type of the target music;

For this embodiment, the picture scene type that matches the music type of the target music is determined. For example, gentle music can be matched with tourist scenery scenes, and intense music can be matched with scenes such as sports competitions and rock concerts.

Step S10222 (not shown in the figure): determine a plurality of animation composition pictures that conform to the picture scene type.

Specifically, a plurality of animation composition pictures matching the scene type can be determined, based on the scene type tags of the pictures, from a picture library that has been scene-classified in advance; alternatively, multiple candidate pictures can be recognized through a corresponding picture recognition method to determine a plurality of animation composition pictures that match the picture scene type.
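The tag-based variant of steps S10221 and S10222 can be sketched as a lookup followed by a filter. The mapping below follows the examples given in the text (gentle music with scenery, intense music with sports and rock concerts); the tag strings, the library layout, and the function name are assumptions for illustration.

```python
# Assumed mapping from music type to picture scene types, following the
# examples in the text; the tag strings themselves are hypothetical.
SCENES_FOR_MUSIC_TYPE = {
    "gentle": {"tourist_scenery"},
    "intense": {"sports_competition", "rock_concert"},
}


def select_pictures(library, music_type, limit=None):
    """Return the pictures whose scene tag matches the music type.

    `library` is a list of (picture_id, scene_tag) pairs taken from a
    picture library that was scene-classified in advance.
    """
    wanted = SCENES_FOR_MUSIC_TYPE.get(music_type, set())
    chosen = [pid for pid, tag in library if tag in wanted]
    return chosen if limit is None else chosen[:limit]
```

An unknown music type yields an empty selection here; a real system would need a fallback, such as letting the user pick pictures manually as described for step S102.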

For the embodiment of the present disclosure, multiple animation composition pictures are determined according to the picture scene type that matches the music type of the target music, which solves the problem of how to determine the animation composition picture according to the type of the target music.

An embodiment of the present disclosure provides a possible implementation manner, and step S102 may include step S1023 and step S1024.

Step S1023 (not shown in the figure): perform segmentation processing on the target music according to the music element characteristics of the target music to obtain multiple music fragments.

For this embodiment, the target music is segmented according to its music element characteristics to obtain multiple music fragments. A music fragment may correspond to one or more music bars of the target music, or to a music segment of the target music. The division into music bars can be obtained from the strong and weak characteristics of the notes, and the division into music segments can be based on the similarity between the divided music bars.
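Dividing music into bars from strong and weak beats can be sketched as follows, assuming each beat is available as a (time, intensity) pair and that a beat at or above a threshold counts as strong. Grouping a fixed number of bars into a fragment is a simplification; the similarity-based grouping mentioned in the text is not shown. The function names and the threshold are hypothetical.

```python
def split_into_bars(beats, strong_threshold):
    """Group (time_s, intensity) beats into bars; each strong beat opens a new bar."""
    bars = []
    for time_s, intensity in beats:
        if intensity >= strong_threshold or not bars:
            bars.append([])
        bars[-1].append((time_s, intensity))
    return bars


def group_bars(bars, bars_per_fragment):
    """Merge consecutive bars into music fragments of a fixed number of bars."""
    return [
        [beat for bar in bars[i:i + bars_per_fragment] for beat in bar]
        for i in range(0, len(bars), bars_per_fragment)
    ]
```

With a strong-weak-weak pattern, for example, `split_into_bars` yields bars of three beats each, and `group_bars` then merges them into fragments that can each be assigned their own playback effect.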

Step S1024 (not shown in the figure): according to the music element characteristics corresponding to each music fragment, determine the animation playback effect of the animation composition pictures corresponding to that music fragment, where the animation playback effect includes at least one of a transition mode, an animation special effect, and a filter mode.

For this embodiment, the animation playback effect of each animation composition picture corresponding to a music fragment is determined according to the music element characteristics of that music fragment. The animation playback effect can be determined from a single characteristic or from a combination of characteristics; for example, the transition mode of an animation composition picture can be determined according to rhythm and tempo, and the filter mode can be determined according to melody.
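A rule of this kind can be sketched as a small function that picks a transition from the tempo and a filter from the melody form. The thresholds, effect names, and dictionary keys below are illustrative assumptions; the disclosure does not give concrete rules, and rhythm is left out of this example for brevity.

```python
def choose_effects(tempo_bpm, melody_form):
    """Pick a transition mode from tempo and a filter mode from the melody form.

    Thresholds and effect names are illustrative assumptions.
    """
    if tempo_bpm >= 120:
        transition = "hard_cut"
    elif tempo_bpm >= 80:
        transition = "slide"
    else:
        transition = "slow_fade"
    filter_mode = {"ascending": "warm", "descending": "cool"}.get(melody_form, "neutral")
    return {"transition": transition, "filter": filter_mode}


def effects_per_fragment(fragments):
    """Map each fragment's characteristics to its playback effects."""
    return [choose_effects(f["tempo_bpm"], f["melody_form"]) for f in fragments]
```

Because the rule is applied per fragment, a piece that starts slowly and speeds up would move from slow fades to hard cuts as the music progresses.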

For the embodiment of the present disclosure, the target music is segmented according to its music element characteristics to obtain a plurality of music fragments, and the animation playback effect of the animation composition pictures corresponding to each music fragment is then determined according to the music element characteristics of that fragment. This solves the problem of how to determine the animation playback effect of an animation composition picture according to music element characteristics.

FIG. 2 is an animation generation device provided in an embodiment of the present disclosure. The device 20 includes: a first determination module 201, a second determination module 202, an animation generation module 203, and a synthesis processing module 204, where:

The first determination module 201 is used to determine the music element characteristics of the target music through a predetermined voice recognition method;

The second determination module 202 is used to determine a plurality of animation composition pictures, and determine the animation playback effect matching each animation composition picture according to the music element characteristics of the target music determined by the first determination module 201;

The animation generation module 203 is used to generate a target animation according to the plurality of animation composition pictures determined by the second determination module 202 and the animation playback effect matching each animation composition picture;

The synthesis processing module 204 is used for synthesizing the target music and the target animation generated by the animation generating module 203, so that the target music and the target animation can be synchronously played and displayed accordingly.

The device of this embodiment can execute an animation generation method provided in the above embodiments of the present disclosure, and its implementation principles are similar, and will not be repeated here.
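The way the four modules hand data to one another can be sketched as a class that chains four callables in the order of modules 201 to 204. The class name, attribute names, and argument shapes are hypothetical; the sketch only mirrors the data flow described above and leaves the content of each module open.

```python
class AnimationGenerator:
    """Chains four callables in the order of modules 201 to 204."""

    def __init__(self, first_determination, second_determination,
                 animation_generation, synthesis_processing):
        self.first_determination = first_determination      # music -> characteristics
        self.second_determination = second_determination    # characteristics, library -> (pictures, effects)
        self.animation_generation = animation_generation    # pictures, effects -> animation
        self.synthesis_processing = synthesis_processing    # music, animation -> synchronized result

    def run(self, target_music, picture_library):
        """Run steps S101 to S104 in sequence and return the synthesized result."""
        characteristics = self.first_determination(target_music)
        pictures, effects = self.second_determination(characteristics, picture_library)
        animation = self.animation_generation(pictures, effects)
        return self.synthesis_processing(target_music, animation)
```

Passing the modules in as callables keeps each one replaceable, for example swapping a tag-based picture selector for a recognition-based one without touching the rest of the chain.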

An embodiment of the present disclosure provides a possible animation generating apparatus. As shown in FIG. 3, the apparatus 30 of this embodiment may include: a first determining module 301, a second determining module 302, an animation generating module 303, and a synthesis processing module 304.

The first determination module 301 is used to determine the music element characteristics of the target music through a predetermined voice recognition method.

The functions of the first determining module 301 in FIG. 3 and the first determining module 201 in FIG. 2 are the same or similar.

The second determination module 302 is used to determine a plurality of animation composition pictures, and determine an animation playing effect matching each animation composition picture according to the music element characteristics of the target music determined by the first determination module 301.

The functions of the second determination module 302 in FIG. 3 and the second determination module 202 in FIG. 2 are the same or similar.

The animation generation module 303 is used to generate a target animation according to the plurality of animation component pictures determined by the second determination module 302 and the animation playback effect matching each animation component picture.

The functions of the animation generating module 303 in FIG. 3 and the animation generating module 203 in FIG. 2 are the same or similar.

The synthesis processing module 304 is used for synthesizing the target music and the target animation generated by the animation generating module 303, so that the target music and the target animation can be synchronously played and displayed accordingly.

The functions of the synthesis processing module 304 in FIG. 3 and the synthesis processing module 204 in FIG. 2 are the same or similar.

According to an embodiment of the present disclosure, the first determination module 301 may include a first extraction unit 3011, a second extraction unit 3012, and a first determination unit 3013, where:

The first extraction unit 3011 is used to extract audio information of the target music;

The second extraction unit 3012 is used to perform acoustic feature extraction on the audio information extracted by the first extraction unit 3011 to obtain corresponding acoustic feature information;

The first determination unit 3013 is used to determine the music element characteristics of the target music based on the acoustic characteristic information extracted by the second extraction unit 3012.

Further, the music element characteristics include at least one of the following:

Intensity, pitch, length, beat, rhythm, tempo and melody.

According to an embodiment of the present disclosure, the second determination module 302 may include a second determination unit 3021 and a third determination unit 3022, where:

The second determining unit 3021 is used to determine the music type of the target music according to the music element characteristics of the target music;

The third determination unit 3022 is used to determine a plurality of animation composition pictures matching the target music based on the music type of the target music determined by the second determination unit 3021.

According to an embodiment of the present disclosure, the third determining unit 3022 may be further configured to determine a picture scene type that matches the music type of the target music, and determine a plurality of animation constituent pictures that conform to the picture scene type.

According to an embodiment of the present disclosure, the second determination module 302 may further include a segment processing unit 3023 and a fourth determination unit 3024, where:

The segmentation processing unit 3023 is used to segment the target music according to the characteristics of the music elements of the target music to obtain multiple music segments;

The fourth determining unit 3024 is used to determine, according to the music element characteristics corresponding to each music segment obtained by the segmentation processing unit 3023, the animation playing effect of each animation composition picture corresponding to that music segment, where the animation playing effect includes at least one of a transition mode, an animation special effect, and a filter mode.

The animation generating device provided by the embodiment of the present disclosure is applicable to the method shown in the above embodiment, and will not be repeated here.

In an alternative embodiment, an electronic device is provided. FIG. 4 shows a schematic structural diagram of an electronic device (e.g., a terminal device or a server) 40 suitable for implementing the embodiments of the present disclosure. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (for example, car navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 4 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 4, the electronic device 40 may include a processing device (for example, a central processing unit, a graphics processor, etc.) 401, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data necessary for the operation of the electronic device 40. The processing device 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.

Generally, the following devices can be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 407 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 408 including, for example, a magnetic tape and a hard disk; and a communication device 409. The communication device 409 may allow the electronic device 40 to perform wireless or wired communication with other devices to exchange data. Although FIG. 4 shows an electronic device 40 having various devices, it should be understood that it is not required to implement or have all the devices shown. More or fewer devices may be implemented or provided instead.

This embodiment provides an electronic device applicable to the foregoing method embodiments, and details are not described herein again.

In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 409, or installed from the storage device 408 or from the ROM 402. When the computer program is executed by the processing device 401, the above-mentioned functions defined in the method of the embodiments of the present disclosure are performed.

It should be noted that the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, the computer-readable signal medium may include a data signal that is propagated in baseband or as part of a carrier wave and that carries computer-readable program code. Such a propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable signal medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: electric wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.

The computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is caused to execute the animation generation method shown in the above method embodiments.
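As a rough illustration of what such a program might look like, the following Python sketch strings together the four steps of the method (S101 to S104). It is only a sketch under stated assumptions: the names `MusicFeatures`, `Frame`, `detect_features`, `pick_effect`, `build_animation`, and `synthesize`, the tempo thresholds, the effect labels, and the file names are all hypothetical and do not come from the disclosure, and the recognition step is stubbed with fixed values rather than performed on real audio.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MusicFeatures:
    """Music element features of one music segment (hypothetical structure)."""
    start_s: float    # segment start time, in seconds
    end_s: float      # segment end time, in seconds
    tempo_bpm: float  # tempo of the segment, in beats per minute


@dataclass
class Frame:
    """One animation composition picture with its matched playback effect."""
    picture: str
    effect: str
    start_s: float
    end_s: float


def detect_features(music: str) -> List[MusicFeatures]:
    """S101 (stub): a real system would analyze the audio of `music`."""
    # Fixed values stand in for the output of the recognition step.
    return [
        MusicFeatures(0.0, 4.0, 70.0),
        MusicFeatures(4.0, 8.0, 128.0),
    ]


def pick_effect(features: MusicFeatures) -> str:
    """S102: map a segment's tempo to a playback effect (assumed thresholds)."""
    if features.tempo_bpm < 90:
        return "slow_fade"
    if features.tempo_bpm < 140:
        return "slide"
    return "quick_cut"


def build_animation(pictures: List[str],
                    segments: List[MusicFeatures]) -> List[Frame]:
    """S103: pair each picture with a segment and that segment's effect."""
    return [
        Frame(pic, pick_effect(seg), seg.start_s, seg.end_s)
        for pic, seg in zip(pictures, segments)
    ]


def synthesize(music: str, animation: List[Frame]) -> dict:
    """S104: bundle music and animation on one shared timeline."""
    return {"music": music, "timeline": animation}


segments = detect_features("song.mp3")
animation = build_animation(["beach.png", "city.png"], segments)
result = synthesize("song.mp3", animation)
```

Because every frame carries the start and end time of the music segment it was matched to, a player could in principle step through `result["timeline"]` against the audio clock to keep the two in sync; how the disclosure actually achieves synchronization is not specified at this level of detail.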

The computer program code for performing the operations of the present disclosure can be written in one or more programming languages or a combination thereof. These programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In situations involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

This embodiment provides a computer-readable storage medium suitable for the foregoing method embodiments, and details are not described herein again.

The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
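The point about execution order can be illustrated with a minimal Python sketch. It assumes two flowchart blocks that do not depend on each other's output; `block_a`, `block_b`, and their return values are hypothetical placeholders, not steps named in the disclosure.

```python
from concurrent.futures import ThreadPoolExecutor


def block_a() -> str:
    """First flowchart block (hypothetical): independent of block_b."""
    return "features"


def block_b() -> str:
    """Second flowchart block (hypothetical): independent of block_a."""
    return "pictures"


# Sequential execution, in the order drawn in the flowchart.
sequential = (block_a(), block_b())

# Parallel execution: both blocks are submitted before either result is read.
with ThreadPoolExecutor(max_workers=2) as pool:
    future_a = pool.submit(block_a)
    future_b = pool.submit(block_b)
    parallel = (future_a.result(), future_b.result())
```

Since neither block reads the other's result, both schedules yield the same pair of values; blocks with a data dependency would have to keep their drawn order.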

The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, under certain circumstances, constitute a limitation on the unit itself.

The above description covers only the preferred embodiments of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of this disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, a technical solution formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) this disclosure.

Claims

1. An animation generation method, comprising:

determining music element characteristics of target music through a predetermined speech recognition method;

determining a plurality of animation composition pictures, and determining, according to the music element characteristics of the target music, an animation playback effect matching each animation composition picture;

generating a target animation according to the plurality of animation composition pictures and the animation playback effect matching each animation composition picture; and

synthesizing the target music and the target animation, so that the target music and the target animation can be correspondingly and synchronously played and displayed.

2. The method according to claim 1, wherein determining the music element characteristics of the target music comprises:

extracting audio information of the target music;

performing acoustic feature extraction on the extracted audio information to obtain corresponding acoustic feature information; and

determining the music element characteristics of the target music based on the obtained acoustic feature information.

3. The method according to claim 1 or 2, wherein the music element characteristics include at least one of the following:

pitch, pitch, pitch, beat, rhythm, tempo and melody.

4. The method according to claim 1, wherein determining the plurality of animation composition pictures comprises:

determining the music type of the target music according to the music element characteristics of the target music; and

determining, based on the music type of the target music, a plurality of animation composition pictures matching the target music.

5. The method according to claim 4, wherein determining the plurality of animation composition pictures matching the target music comprises:

determining a picture scene type that matches the music type of the target music; and

determining a plurality of animation composition pictures conforming to the picture scene type.

6. The method according to claim 1, wherein determining the animation playback effect matching each animation composition picture comprises:

segmenting the target music according to the music element characteristics of the target music to obtain a plurality of music segments; and

determining, according to the music element characteristics corresponding to the respective music segments, the animation playback effect of each animation corresponding to each music segment, wherein the animation playback effect includes at least one of a transition mode, an animation special effect, and a filter mode.

7. An animation generating device, comprising:

a first determining module, configured to determine music element characteristics of target music through a predetermined speech recognition method;

a second determining module, configured to determine a plurality of animation composition pictures, and to determine, according to the music element characteristics of the target music determined by the first determining module, an animation playback effect matching each animation composition picture;

an animation generation module, configured to generate a target animation according to the plurality of animation composition pictures determined by the second determining module and the animation playback effect matching each animation composition picture; and

a synthesis processing module, configured to synthesize the target music and the target animation generated by the animation generation module, so that the target music and the target animation can be correspondingly and synchronously played and displayed.

8. The device according to claim 7, wherein the first determining module comprises:

a first extraction unit, configured to extract audio information of the target music;

a second extraction unit, configured to perform acoustic feature extraction on the audio information extracted by the first extraction unit to obtain corresponding acoustic feature information; and

a first determining unit, configured to determine the music element characteristics of the target music based on the acoustic feature information obtained by the second extraction unit.

9. An electronic device, comprising:

a processor; and

a memory storing at least one application program which, when executed by the processor, causes the electronic device to execute the animation generation method according to any one of claims 1 to 6.

10. A computer-readable storage medium storing computer instructions which, when executed on a computer, cause the computer to execute the animation generation method according to any one of claims 1 to 6.