CN108492817B - Song data processing method based on virtual idol and singing interaction system

Info

Publication number: CN108492817B
Authority: CN (China)
Prior art keywords: song, information, data, lyric, music
Application number: CN201810142242.2A
Other languages: Chinese (zh)
Other versions: CN108492817A
Inventor: 陆羽皓
Current Assignee: Beijing Virtual Point Technology Co Ltd
Original Assignee: Beijing Guangnian Wuxian Technology Co Ltd
Application filed by Beijing Guangnian Wuxian Technology Co Ltd
Priority to: CN201810142242.2A
Publication of: CN108492817A, CN108492817B (application granted)
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
        • G10L 13/00 Speech synthesis; Text to speech systems
        • G10L 13/02 Methods for producing synthetic speech; Speech synthesisers
        • G10L 13/033 Voice editing, e.g. manipulating the voice of the synthesiser
        • G10L 13/0335 Pitch control
        • G10L 15/00 Speech recognition
        • G10L 15/26 Speech to text systems
        • G10L 21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        • G10L 21/003 Changing voice quality, e.g. pitch or formants
        • G10L 21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
        • G10L 21/013 Adapting to target pitch
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
        • G10H 2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
        • G10H 2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
        • G10H 2210/101 Music composition or musical creation; Tools or processes therefor
        • G10H 2210/145 Composing rules, e.g. harmonic or musical rules, for use in automatic composition; Rule generation algorithms therefor
        • G10H 2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
        • G10H 2250/295 Noise generation, its use, control or rejection for music processing
        • G10H 2250/315 Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
        • G10H 2250/455 Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis

Abstract

The invention discloses a song data processing method based on a virtual idol, which comprises the following steps: obtaining multi-modal data, extracting singing audio from the multi-modal data, converting the singing audio into a song file, and generating music score information and lyric information corresponding to the song file through a lyric-score separation operation; editing the music score information and the lyric information according to a music processing model to generate music score creation information and lyric creation information; and performing speech synthesis on the music score creation information and the lyric creation information based on the sound ray (vocal timbre) of the virtual idol to generate and output a target song file. With the application, a self-created target song can be performed on the imaging device under the control of the mobile device, so that the interactive object is assisted in creating and singing its own songs and its creative experience is improved.

Description

Song data processing method based on virtual idol and singing interaction system
Technical Field
The invention relates to the field of intelligent robots, in particular to a song data processing method and a singing interaction system based on virtual idols.
Background
With the continuous development of the field of artificial intelligence, research on robots has gradually expanded beyond the industrial field into the fields of entertainment, medical treatment, health care, home, service, and the like.
In the entertainment field, most applications of virtual robots are limited to playing existing songs on demand; they cannot generate new songs from music scores and lyrics. There is therefore a need for a new interactive capability of the robot that assists user musicians in creating target songs and improves the user experience.
Disclosure of Invention
One of the technical problems to be solved by the present invention is to provide a virtual idol-based song data processing method, which comprises the following steps: a score extraction step of acquiring multi-modal data, extracting singing audio from the multi-modal data, converting the singing audio into a song file, performing a lyric-score separation operation on the song file, and generating music score information and lyric information corresponding to the song file; a lyric and score editing step of editing the music score information and the lyric information according to a music processing model to generate music score creation information and lyric creation information; and a speech synthesis step of performing speech synthesis on the music score creation information and the lyric creation information based on the sound ray of the virtual idol to generate and output a target song file.
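For illustration only, the Python sketch below outlines the three steps above as one pipeline. All function names and the toy placeholder bodies are assumptions made so the sketch runs end to end; they are not the algorithms of the claimed method, which are detailed later in the description.

# Illustrative sketch of the claimed three-step flow (score extraction,
# lyric and score editing, speech synthesis); helper bodies are placeholders.

def score_extraction(multimodal_data: dict) -> tuple[list, list]:
    """Extract singing audio and split it into score info and lyric info."""
    audio = multimodal_data.get("audio", [])                    # singing audio
    score_info = [{"note": "Do", "dur": 0.5} for _ in audio]    # placeholder
    lyric_info = [{"syllable": "la"} for _ in audio]            # placeholder
    return score_info, lyric_info

def lyric_and_score_editing(score_info: list, lyric_info: list) -> tuple[list, list]:
    """Apply a (style-matched) music processing model to produce creation info."""
    score_creation = [dict(seg, bar=False) for seg in score_info]   # placeholder edit
    lyric_creation = [dict(seg, tone=1) for seg in lyric_info]      # placeholder edit
    return score_creation, lyric_creation

def speech_synthesis(score_creation: list, lyric_creation: list, sound_ray: str) -> dict:
    """Render the creation information with the virtual idol's sound ray."""
    return {"sound_ray": sound_ray, "fragments": list(zip(score_creation, lyric_creation))}

if __name__ == "__main__":
    data = {"audio": [0.1, 0.2, 0.3]}                 # stand-in multi-modal data
    score, lyric = score_extraction(data)
    score_c, lyric_c = lyric_and_score_editing(score, lyric)
    print(speech_synthesis(score_c, lyric_c, sound_ray="idol_default"))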
In one embodiment, the lyric and score editing step further comprises: determining the current song style based on the music score information; calling a music processing model matching the style; and editing the music score information and the lyric information according to the music processing model matching the style to generate the music score creation information and the lyric creation information.
In one embodiment, the lyric creation information comprises lyric fragment data, wherein the lyric fragment data is configured with a lyric fragment code, a lyric fragment start-stop time interval, a final/initial identifier, a tone code, a sentence-break identifier and final/initial content; the music score creation information comprises music score fragment data, wherein the music score fragment data is configured with a music score fragment code, a music score fragment start-stop time interval, tone segment data and a bar identifier.
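As a concrete illustration of these data configurations, the following Python sketch defines the two fragment records as dataclasses. The field names and types are assumptions chosen to mirror the items listed above, not a prescribed format.

from dataclasses import dataclass

@dataclass
class LyricFragment:
    code: int                  # lyric fragment code (natural order in the song)
    start: str                 # fragment start time, e.g. "00:01:01"
    stop: str                  # fragment stop time, e.g. "00:01:56"
    is_final: bool             # True: final (vowel) identifier, False: initial identifier
    tone_code: int             # 0 neutral tone, 1-4 for the four tones
    sentence_break: bool       # sentence-break identifier
    content: str               # final/initial content, e.g. "ong" or "zh"

@dataclass
class ScoreFragment:
    code: int                  # music score fragment code (natural order in the song)
    start: str                 # fragment start time
    stop: str                  # fragment stop time
    tone_data: object          # tone (pitch) segment data, e.g. audio of the C-key "So"
    bar_mark: bool             # bar identifier: True if this is the last note of the bar

# Example instance mirroring the "ong" fragment discussed later in the description.
ong = LyricFragment(code=5, start="00:01:01", stop="00:01:56",
                    is_final=True, tone_code=2, sentence_break=False, content="ong")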
In one embodiment, the score extraction step further comprises: performing singing-voice separation on the obtained singing audio, removing the background sound data and retaining the vocal singing data; and further decomposing, editing and collating the vocal singing data to generate the music score information and the lyric information.
In one embodiment, the speech synthesis step further comprises: determining, based on the current song style, a sound ray matching the song style; and substituting the lyric creation information, the music score creation information, the sound ray matching the song style and the background sound data into a preset speech synthesis system to generate the target song file.
In one embodiment, the music processing model is constructed by training on a pinyin spelling library, a song note library and a composition database, in combination with a plurality of composition-habit feature databases.
According to another aspect of the embodiments of the present invention, there is also provided a virtual idol-based song data processing system, comprising the following modules: a score extraction module for acquiring multi-modal data, extracting singing audio from the multi-modal data, converting the singing audio into a song file, performing a lyric-score separation operation on the song file, and generating music score information and lyric information corresponding to the song file; a lyric and score editing module for editing the music score information and the lyric information according to a music processing model to generate music score creation information and lyric creation information; and a speech synthesis module for performing speech synthesis on the music score creation information and the lyric creation information based on the sound ray of the virtual idol, generating a target song file and outputting the target song file.
In one embodiment, the lyric and score editing module further comprises: a song style recognition unit for determining the current song style based on the music score information and generating corresponding information; a creation model selection unit for calling a music processing model matching the style; and an editing unit for editing the music score information and the lyric information according to the music processing model matching the style to generate the music score creation information and the lyric creation information.
In one embodiment, the score extraction module further comprises: a singing voice separation unit for performing singing-voice separation on the obtained singing audio, removing the background sound data and retaining the vocal singing data; and a lyric and score decomposition unit for further decomposing, editing and collating the vocal singing data to generate the music score information and the lyric information.
In one embodiment, the speech synthesis module further comprises: a sound ray selection unit for determining a sound ray matching the song style based on the current song style; and a song synthesis unit for substituting the lyric creation information, the music score creation information, the sound ray matching the song style and the background sound data into a preset speech synthesis system to generate the target song file.
In one embodiment, the music processing model is constructed by training on a pinyin spelling library, a song note library and a composition database, in combination with a plurality of composition-habit feature databases.
According to another aspect of embodiments of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the song data processing method described above.
According to another aspect of the embodiments of the present invention, there is also provided a song processing and singing interactive system based on virtual idol, the interactive system including: a cloud server provided with the computer-readable storage medium; the mobile device receives and plays the target song file output by the cloud server, generates imaging control information based on the target song file, and controls the target song file and the imaging control information to be output synchronously; and the imaging equipment receives the imaging control information sent by the output equipment, and displays a virtual idol based on the imaging control information, wherein the virtual idol displays a specific image characteristic matched with the imaging control information.
Compared with the prior art, one or more embodiments in the above scheme can have the following advantages or beneficial effects:
according to the embodiments of the invention, after the singing audio in the multi-modal data is obtained through the mobile device, the song style can be recognized, the music score information and the lyric information in the singing audio are edited according to the style information and a music processing model customized through a machine learning method, and the self-created target song is then performed on the imaging device under the control of the mobile device using a sound ray matched to the style information. The virtual robot can thereby assist the interactive object that provided the song in its creation and improve the creative experience of the interactive object.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure and/or process particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is an imaging scenario application diagram of a virtual idol-based song processing and singing interaction system according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a song processing and singing interactive system based on a virtual idol according to an embodiment of the present application.
Fig. 3 is a block diagram of an output device 22 in a virtual idol-based song processing and singing interactive system according to an embodiment of the present application.
Fig. 4 is a flowchart illustrating an implementation of the output device 22 in the virtual idol-based song processing and singing interactive system according to the embodiment of the present application.
Fig. 5 is a schematic structural diagram of a song data processing system 23 based on virtual idol according to an embodiment of the present application.
Fig. 6 is a flowchart illustrating an implementation of the score extraction module 231 in the virtual idol-based song data processing system 23 according to an embodiment of the present application.
Fig. 7 is a flowchart illustrating an implementation of the lyric and score editing module 232 in the virtual idol-based song data processing system 23 according to an embodiment of the present application.
Fig. 8 is a flowchart illustrating an implementation of the speech synthesis module 233 in the song data processing system 23 based on virtual idol according to an embodiment of the present application.
Fig. 9 is a flowchart illustrating steps of a method for processing song data based on virtual idols according to an embodiment of the present application.
Detailed Description
The following detailed description of the embodiments of the present invention will be provided with reference to the accompanying drawings and examples, so that how to apply the technical means to solve the technical problems and achieve the corresponding technical effects can be fully understood and implemented. The embodiments and the features of the embodiments can be combined without conflict, and the technical solutions formed are all within the scope of the present invention.
Additionally, the steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
The embodiments of the application are directed to a virtual idol that can carry out multi-modal data interaction with an interactive object online and/or offline in an entertainment scene. The virtual idol has specific image characteristics, runs in a mobile device, and is controlled by the mobile device to be displayed through an imaging device. The mobile device may configure the virtual idol with social attributes, personality attributes, character skills, and the like. Specifically, the mobile device can be connected with a cloud server, so that the virtual idol supports multi-modal man-machine interaction and possesses Artificial Intelligence (AI) capabilities such as natural language understanding, visual perception, touch perception, language and voice output, and emotion and action output; it can also control display functions of the imaging device, including controlling the display of scene accessories (e.g., controlling flowers, trees, etc. in a scene) and displaying lights, effects, particles or rays, which may be presented by the imaging device. The social attributes may include attributes such as appearance, name, apparel, decoration, gender, native place, age, family relationship, occupation, position, religious belief, emotional state, educational background, and the like; the personality attributes may include character, temperament, and the like; the character skills may include singing, dancing, storytelling, training, and the like, and character skill display is not limited to body, expression, head and/or mouth skill display.
It should be noted that the social attributes, personality attributes, character skills, and the like of the virtual idol can guide the analysis of the multi-modal interaction data, so that the multi-modal output data obtained by decision making is more inclined toward, or better suited to, the virtual idol. Meanwhile, the virtual idol can be projected onto the imaging device in cooperation with the mobile device and can give performances, such as singing and dancing, in the scenes displayed by the imaging device.
In the present application, the virtual idol has song editing and processing capability: it can obtain song audio information from the acquired multi-modal data, perform song creation on that information in a manner that imitates the lyric-writing and composing habits of musicians, synthesize a song conforming to the robot's sound ray, perform the created song with the specific image of the virtual idol as the carrier, and control the imaging device to complete the display of that specific image.
The following describes embodiments of the present invention.
Fig. 1 is an imaging scenario application diagram of a virtual idol-based song processing and singing interaction system according to an embodiment of the present application. As shown in fig. 1, a virtual idol runs in the mobile device 101 and is rendered by projection through the imaging device 102. The cloud server 103 is provided with a computer-readable storage medium and is interconnected with the mobile device 101 through the internet to provide data analysis, processing and storage support for the data received by the mobile device 101. The physical positions of the mobile device 101 and the imaging device 102 are aligned with reference to each other, so that signal interconnection between the mobile device 101 and the imaging device 102 is realized. The mobile device 101 receives and plays the target song file output by the cloud server 103, generates imaging control information based on the target song file, and controls the target song file and the imaging control information to be output synchronously, so as to project and display the virtual idol running on the mobile device 101. The imaging device 102 receives the imaging control information sent by the mobile device 101 and displays the virtual idol based on the imaging control information, wherein the virtual idol displays specific image characteristics matching the imaging control information. The imaging device 102 may be a holographic projection device; such a device provides a basic carrier for projection imaging, can present the pictures or text shown on the screen of the mobile device, and can also collect visual, infrared and/or Bluetooth signals to assist the interaction of the mobile device. It should be noted that the present application does not limit the device types of the mobile device 101 and the imaging device 102: the mobile device 101 may be a smartphone, an iPad, a tablet computer, and the like, and those skilled in the art may select the device type according to the actual situation.
Fig. 2 is a schematic structural diagram of a virtual idol-based song processing and singing interactive system according to an embodiment of the present application. As shown in fig. 2, the interactive system includes: an input device 21, an output device 22, a song data processing system 23, and an imaging device 102. The input device 21 and the output device 22 are built into the mobile device 101. It should be noted that the song data processing system 23 is built into the cloud server 103 and, relying on the strong storage and data processing capability of the cloud brain, completes the function of automatically creating and singing songs from the audio information output by the interactive object. The components and functions of the various parts of the interactive system are described in detail below.
First, the input device 21 can acquire and transmit the multi-modal data output by the interactive object. Specifically, the input device 21 may be a physical hardware device such as a microphone, a front camera or a rear camera installed on the mobile device 101, or it may be a network channel or a local channel, which is not specifically limited in this application. When the input device 21 is a physical hardware device, the device may convert the multi-modal data output by the interactive object into a format readable by the song data processing system 23 and then send the multi-modal data to the song data processing system 23; in this case the multi-modal data may be video or audio information of the user singing, and the application is not limited in this respect. For example, the user records a performance through the front camera, and the camera driver software in the input device 21 converts the completed performance into multi-modal data in video format and outputs it. When the input device 21 is a network channel or a local channel, the input device 21 may directly send the obtained multi-modal data to the song data processing system 23; in this case the interactive object may be a network platform or the like, which is not particularly limited in this application. For example, the input device 21 may directly acquire performance data played by a network platform, or the user may directly load multi-modal data into the input device 21 through a local channel.
Next, the output device 22 will be described in detail. Fig. 3 is a block diagram of the output device 22 in a virtual idol-based song processing and singing interactive system according to an embodiment of the present application. The output device 22 is capable of receiving and playing the created target song file output by the song data processing system 23, generating imaging control information based on the target song file, and completing the synchronized output of the target song file and the imaging control information. As shown in fig. 3, the output device 22 includes: a lyric text decomposition module 221, a mouth-shape animation storage module 222, a specific image storage module 223, a performance image generation module 224, and a synchronous output module 225.
Fig. 4 is a flowchart illustrating an implementation of the output device 22 in the virtual idol-based song processing and singing interactive system according to the embodiment of the present application. The functions of the respective blocks in the output device 22 will be described in detail below with reference to fig. 3 and 4.
The lyric text decomposition module 221 receives the lyric creation information transmitted by the song data processing system 23, parses the lyric creation information into lyric fragment data in units of pinyin syllables, and further decomposes each fragment into the lyric fragment code, the lyric fragment start-stop time interval, the initial/final identifier, the tone code, the sentence-break identifier, the final/initial content and the like of the current lyric fragment. The obtained final/initial content, or the code corresponding to that content, is then sent to the mouth-shape animation storage module 222.
The following points explain the specific flow in the lyric text decomposition module 221. The lyric creation information comprises lyric fragment data, which are coded in natural order according to the position of each fragment's pinyin in the lyrics of the song. Since the lyric fragment data are edited in units of pinyin syllables, a complete word may comprise several lyric fragment data. In tone coding, the neutral tone and the first, second, third and fourth tones can each be given a corresponding code. The encoding rule for the lyric creation information is not specifically limited in the present application, and those skilled in the art can adjust and define it according to the actual situation.
(Example) Suppose the first sentence of the lyrics is about snowflakes and contains the word "velvet" (pinyin "rong"). That word comprises two lyric fragment data, and the second fragment data contains the following information: lyric fragment code 5, start-stop time interval 00:01:01 to 00:01:56, final identifier set, second tone with tone code 2, and content information "ong".
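A minimal Python sketch of how a pinyin syllable can be split into the initial/final fragments described above follows. The abbreviated initials table, the half-and-half time split and the record layout are assumptions; a production system would use a full pinyin lexicon and the calibrated time courses.

# Split a pinyin syllable into an initial fragment and a final fragment,
# producing records shaped like the lyric fragment data described above.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

def split_syllable(syllable: str, tone: int, code_start: int, t0: float, t1: float):
    """Return the fragment records for one syllable; both fragments share its time span."""
    initial = next((i for i in INITIALS if syllable.startswith(i)), "")
    final = syllable[len(initial):]
    mid = (t0 + t1) / 2.0                       # naive half/half time split (assumption)
    frags = []
    if initial:
        frags.append({"code": code_start, "start": t0, "stop": mid,
                      "is_final": False, "tone": tone, "content": initial})
    frags.append({"code": code_start + len(frags), "start": mid if initial else t0,
                  "stop": t1, "is_final": True, "tone": tone, "content": final})
    return frags

# e.g. the syllable "rong" (second tone) yields an "r" fragment and an "ong" fragment
print(split_syllable("rong", tone=2, code_start=4, t0=61.0, t1=61.56))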
The mouth-shape animation storage module 222 stores mouth-shape animation data for each initial and final; the animation data consists of three-dimensional position information of each pixel point. When the module 222 receives the final/initial content of a fragment, or the code representing that content, from the lyric text decomposition module 221, it looks up the corresponding mouth-shape animation data in its database and sends it to the performance image generation module 224. It should be noted that the application describes mouth-shape animation data using three-dimensional position information merely as an example; the application does not limit the form of the mouth-shape animation data, nor of the specific image data and performance image data described below.
The specific image storage module 223 stores the specific images of preset virtual idols and can randomly select any one of them to send to the performance image generation module 224. In addition, the module 223 can also receive an image instruction from the user and send the specific image information corresponding to that instruction to the performance image generation module 224. It should be noted that the form of the specific image retrieval instruction is not specifically limited: the module may be set always to output the same specific image information, the specific image information may be selected randomly or individually by the user, or the module may switch from either of the former two modes to individual selection.
The performance image generation module 224 is described next. The module 224 receives, in real time, the mouth-shape animation data sent by the mouth-shape animation storage module 222 and the specific image information from the specific image storage module 223, replaces the information representing mouth motion in the specific image information with the mouth-shape animation data, and generates the real-time performance image corresponding to the content of the current lyric fragment, i.e. the current imaging control information.
The synchronous output module 225 receives the target song file sent by the song data processing system 23 and obtains, in real time, the imaging control information sent by the performance image generation module 224, together with the lyric fragment code and lyric fragment start-stop time interval of the current fragment parsed by the lyric text decomposition module 221; it integrates this information so that the imaging control information and the corresponding fragment of the target song file are output at the same time. The target song file is played through an audio output device, such as a loudspeaker or built-in speaker, configured on the mobile device 101; the imaging control information is displayed on the screen of the mobile device 101 or, after the mobile device 101 and the imaging device 102 are connected, is sent directly to the imaging device 102, thereby realizing the auxiliary interaction function of the imaging device 102 for the mobile device 101.
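The following Python sketch illustrates the timing idea only: imaging control frames are emitted at the start offsets taken from the lyric fragment start-stop intervals while the song plays. The offsets and payload strings are illustrative assumptions, not data from the specification.

import time

# Each entry pairs a lyric fragment's start offset (seconds) with the imaging
# control information to send when that fragment begins in the song.
schedule = [
    (0.0, "mouth-shape frame for 'xue'"),
    (0.6, "mouth-shape frame for 'hua'"),
    (1.2, "mouth-shape frame for 'rong'"),
]

def play_synchronized(schedule, song_started_at):
    """Emit each imaging control frame when its fragment starts in the song."""
    for offset, control_info in schedule:
        delay = song_started_at + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)                 # wait until the fragment's start time
        print(f"t+{offset:>4.1f}s  send to imaging device: {control_info}")

# Start the (imaginary) song playback and drive the imaging output in step with it.
play_synchronized(schedule, song_started_at=time.monotonic())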
Referring again to fig. 2, the imaging device 102 is explained next. The imaging device 102 receives the imaging control information transmitted by the output device and performs display based on the imaging control information. Specifically, if the imaging device 102 is a holographic projection device, it presents the specific image information shown on the screen of the mobile device 101 according to the imaging control information. If the imaging device 102 is connected with the mobile device 101 in a wireless, Bluetooth or infrared manner, it directly receives, in real time, the imaging control information generated by the mobile device 101, converts this information into three-dimensional position information of the specific image using a preset stereo imaging model, and presents it, so as to realize the capability of assisting the interaction of the mobile device 101.
Finally, the virtual idol-based song data processing system 23 is explained in detail.
Fig. 5 is a schematic structural diagram of the virtual idol-based song data processing system 23 according to an embodiment of the present application. As shown in fig. 5, the system 23 includes a score extraction module 231, a lyric and score editing module 232, and a speech synthesis module 233. The functions and workflow of each module in the song data processing system 23 are described below.
The score extraction module 231 acquires the multi-modal data, extracts the singing audio from the multi-modal data, converts the singing audio into a song file, and generates music score information and lyric information corresponding to the song file through a lyric-score separation operation. By function, the module 231 further includes an audio extraction unit 2311, a singing voice separation unit 2312, and a lyric and score decomposition unit 2313.
Specifically, fig. 6 is a flowchart illustrating an implementation of the score extraction module 231 in the virtual idol-based song data processing system 23 according to an embodiment of the present application. As shown in fig. 6, and referring again to fig. 5, the audio extraction unit 2311 first receives and identifies the multi-modal data acquired from the input device 21; if the multi-modal data includes several modalities such as audio, video or text, it extracts the corresponding audio information and sends it to the singing voice separation unit 2312; if the obtained multi-modal data is only audio data, it is directly forwarded to the singing voice separation unit 2312 as the singing audio.
Then, the singing voice separation unit 2312 separates the obtained singing audio using a common singing voice separation method: the vocal singing data is retained through voice recognition technology, the background sound data is removed using existing vocal elimination technology so that background sound and singing voice are distinguished, and the vocal singing data is finally sent to the lyric and score decomposition unit 2313. In this example, to improve the accuracy of fundamental-pitch extraction, an open-loop/closed-loop approach is first used to determine the approximate pitch range, which is then adjusted and refined; the singing voice is then distinguished from the background sound using an energy comparison method, and pitch estimates extracted only during background-only periods are removed, completing the separation of the vocal singing data. Depending on the content of the singing audio, the background sound data may include accompaniment, harmony and similar information. It should be noted that the specific method for extracting the vocal singing data is not particularly limited in this application; methods such as extracting vocal features with voice recognition technology to build a corresponding singing recognition model may be adopted, and vocal enhancement technology, high-pitch recognition technology and the like may also be utilized.
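The sketch below illustrates only the energy-comparison idea: frames whose short-time energy clearly exceeds the level of the quieter background-only frames are treated as containing singing voice, and pitch estimates from the other frames are discarded. The synthetic signal, the threshold rule and the stand-in pitch track are assumptions, not the actual separation algorithm of the unit.

import numpy as np

sr, frame = 16000, 400                      # sample rate, 25 ms frames
t = np.arange(sr * 2) / sr
background = 0.05 * np.sin(2 * np.pi * 110 * t)          # quiet accompaniment
voice = 0.5 * np.sin(2 * np.pi * 220 * t) * (t > 1.0)    # "vocals" start at 1 s
signal = background + voice

frames = signal[: len(signal) // frame * frame].reshape(-1, frame)
energy = (frames ** 2).mean(axis=1)
threshold = 10.0 * np.percentile(energy, 25)   # assumed margin over background-only frames
vocal_active = energy > threshold

pitch_track = np.full(len(frames), 220.0)      # stand-in per-frame pitch estimates
pitch_track[~vocal_active] = np.nan            # drop pitch found in background-only frames

print(f"{vocal_active.sum()} of {len(frames)} frames kept as vocal singing data")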
Then, after acquiring the vocal singing data, the lyric and score decomposition unit 2313 further decomposes, edits and collates the vocal singing data to generate the music score information and the lyric information. Specifically, the lyric information is generated by extracting the lyric portion of the vocal singing data using voice recognition technology in combination with a preliminary lyric recognition model constructed from a semantic understanding database. In training the preliminary lyric recognition model, a large amount of historical vocal singing data is filtered and denoised, and the mapping relation between the speech acoustic parameters and the recognized text information is extracted, so as to obtain the training model of the preliminary lyric recognition model; the vocal singing data obtained by the module 2313 is then used as input for recognition, feature matching is performed with the training model, and the corresponding lyric information is finally parsed out. After the recognized text information is obtained through voice recognition technology, natural language processing technology is used, according to the semantic understanding database, to correct the preliminarily generated lyric information so that it conforms to linguistic logic, or to logically correct words that were not clearly recognized, and the result is finally used as a data basis for training the preliminary lyric recognition model. In addition, the music score information is generated by extracting the score portion of the vocal singing data using voice recognition technology in combination with a preliminary score recognition model constructed from a song note library. In training the preliminary score recognition model, a large amount of historical vocal singing data is filtered and denoised, and the mapping relation between the speech acoustic frequency parameters and the recognized note information is extracted, so as to obtain the training model of the preliminary score recognition model; the vocal singing data obtained by the module 2313 is used as input for recognition, and feature matching is performed against the preliminary score recognition training model to obtain the corresponding music score information. After the recognized note information is obtained through voice recognition technology, notes that were not clearly recognized in the preliminarily generated music score information are supplemented according to the song note library, and the supplemented result is finally used as a data basis for training the preliminary score recognition model.
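One small, self-contained piece of score recognition can be illustrated as follows: quantizing extracted fundamental frequencies to note names and collapsing a per-frame pitch track into (note, duration) pairs. Equal temperament with A4 = 440 Hz and the frame length are assumptions; the actual preliminary score recognition model described above is not reproduced here.

import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq_hz: float) -> str:
    """Quantize a fundamental frequency to the nearest equal-tempered note name."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))   # MIDI number, A4 = 69
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

def pitches_to_score(pitch_track, frame_seconds: float):
    """Collapse a per-frame pitch track into (note, duration) pairs."""
    score, current, count = [], None, 0
    for f0 in pitch_track:
        note = freq_to_note(f0) if f0 else None
        if note == current:
            count += 1
        else:
            if current is not None:
                score.append((current, count * frame_seconds))
            current, count = note, 1
    if current is not None:
        score.append((current, count * frame_seconds))
    return score

# e.g. a short pitch track (Hz) spanning C4 and G4, 25 ms frames
track = [261.6] * 20 + [392.0] * 20
print(pitches_to_score(track, frame_seconds=0.025))   # [('C4', 0.5), ('G4', 0.5)]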
Referring again to fig. 5, the description continues with the lyric and score editing module 232 in the song data processing system 23. The lyric and score editing module 232 is built into the cloud server 103 and can edit the music score information and the lyric information acquired from the score extraction module 231 according to the music processing model to generate the music score creation information and the lyric creation information. The module further includes: a song style recognition unit 2321, a creation model selection unit 2322, and an editing unit 2323.
Specifically, fig. 7 is a flowchart illustrating an implementation of the lyric and score editing module 232 in the virtual idol-based song data processing system 23 according to an embodiment of the present application. The functions and workflow of each unit in the lyric and score editing module 232 (refer to fig. 5) are described below with reference to fig. 7.
The song style recognition unit 2321 determines the current song style through an established music style recognition model based on the acquired music score information, generates corresponding song style information, and then sends the generated style information to the creation model selection unit 2322. When the music score information is input, the style of the music is judged according to the different feature values that the various music styles take for the same features, and the code of the corresponding song style is finally output. In this example, the style type is determined by a binary tree construction method. First, the style types are classified into pop, folk, classical, jazz, ballad, rock, R&B, rap and the like; second, aspects such as rhythm, note-type distribution and pitch (audio) distribution are taken as sub-features, and the specific sub-feature information is summarized for the style sub-features of each type; then, a weight expression is established for each feature node in the binary tree; finally, the weight analysis results of all nodes are integrated to determine the final song style type. It should be noted that the method of determining the genre of a song is not particularly limited in this application, and those skilled in the art can select a method according to the actual situation, provided the function of the unit 2321 is fulfilled.
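A toy Python sketch of the weighted-node idea follows: a few sub-features are extracted from note durations and combined with per-style weights to pick the highest-scoring style. The styles, feature extractors and weights are illustrative assumptions and do not represent the trained recognition model.

def rhythm_density(durations):          # notes per second as a crude rhythm feature
    return len(durations) / max(sum(durations), 1e-6)

def short_note_ratio(durations):        # share of notes at or below an eighth note
    return sum(d <= 0.25 for d in durations) / max(len(durations), 1)

STYLE_WEIGHTS = {                        # per-style weights for (rhythm, short-note) nodes
    "rock":   (0.3, 0.2),
    "r&b":    (0.6, 0.9),
    "ballad": (0.1, 0.1),
}

def recognize_style(note_durations):
    feats = (rhythm_density(note_durations), short_note_ratio(note_durations))
    scores = {style: sum(w * f for w, f in zip(weights, feats))
              for style, weights in STYLE_WEIGHTS.items()}
    return max(scores, key=scores.get), scores

style, scores = recognize_style([0.125, 0.125, 0.25, 0.125, 0.25, 0.125])
print(style, scores)      # a fast, short-note passage scores highest for "r&b" here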
The creation model selection unit 2322 stores music processing models constructed for the respective song styles, and the unit 2322 calls the music processing model matching the current style according to the obtained song style. The music processing model is constructed by training on a pinyin spelling library, a song note library and a composition database, in combination with a plurality of composition-habit feature databases. Specifically, the music processing model takes the lyric data and score data in a large amount of historical music of different song styles as the basic data for training, and takes the lyric creation information and music score creation information generated for that historical data in the data format specified in this application as the training target data; according to the pinyin spelling library, the song note library, the composition database (recording the theoretical rules of lyric writing and composing) and the composition-habit feature databases established for the various song styles and stored in the cloud server 103, the mapping relation from the basic data to the target data is trained by a machine learning method, so that a training model that conforms to the customary rules of lyric writing and composing, namely the music processing model, is established. It should be noted that the final form of the music processing model is not specifically limited in this application; besides the mapping model trained as above, it may also be a labelled document template obtained through the training process, a trained mapping relation, or the like. The pinyin spelling library contains the initials and finals of all pinyin spellings and encodes each spelling segment; the song note library includes all note types and their duration information, rest types and their duration information, and the like, and encodes each note segment.
The composition-habit feature databases contain a large number of lyric-writing habit features, composing habit features and lyric-music matching habit features for each song style. (First example) If the current song style is rock, then because this style has few vocal inflections, the time course of each initial/final in a lyric fragment is relatively short and the time-course curve of the whole lyric is relatively stable; and because this style tends to elongate the tail sound at the end of a sentence, a longer lyric-fragment time course is usually seen at the sentence-break identifiers, and such a fragment corresponds to at most two music score fragments. (Second example) If the current song style is judged to be R&B, then because this style has many vocal inflections, the time interval of each note in a score fragment is short, the number of fragments in the whole score is large, and one lyric fragment often corresponds to a plurality of score fragments.
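The sketch below shows, under assumed data, how a style-dependent habit rule might adjust lyric fragment time courses, for instance stretching sentence-ending fragments for the rock style to imitate the elongated tail sound. The stretch factors and field names are assumptions for illustration only.

STYLE_HABITS = {
    "rock": {"sentence_end_stretch": 1.8},
    "r&b":  {"sentence_end_stretch": 1.1},
}

def apply_style_habits(fragments, style):
    """Re-time lyric fragments according to a per-style habit rule (toy version)."""
    stretch = STYLE_HABITS.get(style, {}).get("sentence_end_stretch", 1.0)
    out, t = [], 0.0
    for frag in fragments:
        dur = frag["dur"] * (stretch if frag["sentence_break"] else 1.0)
        out.append({**frag, "start": round(t, 3), "stop": round(t + dur, 3)})
        t += dur
    return out

frags = [
    {"content": "zh",  "dur": 0.2, "sentence_break": False},
    {"content": "ong", "dur": 0.4, "sentence_break": True},   # end of the phrase
]
print(apply_style_habits(frags, "rock"))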
Referring again to fig. 5, the editing unit 2323 can edit the music score information and the lyric information according to the music processing model matching the currently generated song style to generate the music score creation information and the lyric creation information. The lyric creation information comprises a plurality of lyric fragment data, each configured with a lyric fragment code, a lyric fragment start-stop time interval, a final/initial identifier, a tone code, a sentence-break identifier and final/initial content. Since the lyric fragment data has already been described for the lyric text decomposition module 221 of the song processing and singing interactive system, further description is omitted here.
Specifically, in processing the input lyric information, the music processing model first converts the lyric information into pinyin. The pinyin-converted lyric information is then fragmented by initials/finals: for example, if the pinyin of a Chinese character is "zhong", the fragmentation produces two lyric fragments, "zh" and "ong"; the final/initial of each fragment is then labelled and each fragment is encoded. Next, the fragmented lyric information is processed theoretically according to the theoretical rules of lyric writing and composing, the start and stop times of each lyric fragment are calibrated, and the sentence-break identifiers of the lyric phrases are marked (for example, ensuring that a lyric phrase such as "spring breeze and blossoming fragrance" remains continuous in the music). Finally, the lyric information is stylized: according to the song style information obtained from the song style recognition unit 2321, the time-course data in the lyric fragments are finally adjusted according to the lyric-writing habits, composing habits and common lyric-music matching habits for that song style, so that the finally generated lyric creation data conforms to the lyric-writing habits of the corresponding song style.
In addition, the music score creation information includes a plurality of music score fragment data, and each music score fragment data is configured with a music score fragment code, a music score fragment start-stop time interval, tone segment data and a bar identifier. The music score fragment data takes a single note as the minimum unit of a fragment; each fragment is coded in natural order according to the position of the note in the whole song; the start-stop time interval of each note is edited according to the duration corresponding to its note type, such as a whole note, half note, quarter note, eighth note or sixteenth note; and if the current note fragment is the last note of a bar, a valid bar identifier is marked.
(Example)
In the song "Little Star", the score corresponding to the first line of the lyrics ("twinkle, twinkle, little star") is "Do Do So So La La So". This score is divided into seven music score fragment data, and the fourth note fragment "So" contains the following information: the music score fragment code is "4", the fragment start-stop time interval is the time interval corresponding to a quarter note, the tone segment data is the audio data of the C-key "So", and the bar identifier is valid. The fifth note fragment "La" contains the following information: the music score fragment code is "5", the fragment start-stop time interval is the time interval corresponding to a quarter note, the tone segment data is the audio data of the C-key "La", and the bar identifier is invalid.
Specifically, in processing the input music score information, the music processing model first performs note segmentation on the score information in units of the smallest note, encodes each score fragment, and loads the tone segment data of each note. The fragmented score information is then processed theoretically according to the theoretical rules of lyric writing and composing, the start-stop time interval of each score fragment is calibrated, and the bar identifiers are marked. Finally, the score information is stylized: according to the song style obtained from the song style recognition unit 2321, the lyric-writing and composing habit features for that style are called and the time-course data in the fragments are finally adjusted, so that the finally generated music score creation information reflects the lyric-writing habits, composing habits and common lyric-music matching habits of the current song style.
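A minimal Python sketch of the note segmentation and encoding step follows: a note sequence is mapped to score fragment data with natural-order codes, start-stop intervals derived from the note type, and a bar identifier on the last note of each bar. A 4/4 meter, a 0.5 s quarter note, and the half-note ending of the example melody are assumptions for illustration.

from fractions import Fraction

QUARTER_SECONDS = 0.5
BAR_LENGTH = Fraction(1)                       # one whole note per 4/4 bar

def segment_score(notes):
    """notes: list of (pitch_name, note_value), note_value as a Fraction of a whole note."""
    fragments, t, filled = [], 0.0, Fraction(0)
    for code, (pitch, value) in enumerate(notes, start=1):
        dur = float(value) * 4 * QUARTER_SECONDS
        filled += value
        fragments.append({
            "code": code,
            "start": round(t, 3),
            "stop": round(t + dur, 3),
            "tone_data": pitch,                               # stand-in for audio tone data
            "bar_mark": filled % BAR_LENGTH == 0,             # valid on the last note of a bar
        })
        t += dur
    return fragments

# "Do Do So So | La La So" as in the "Little Star" example (last note assumed a half note)
melody = [("Do", Fraction(1, 4)), ("Do", Fraction(1, 4)), ("So", Fraction(1, 4)),
          ("So", Fraction(1, 4)), ("La", Fraction(1, 4)), ("La", Fraction(1, 4)),
          ("So", Fraction(1, 2))]
for frag in segment_score(melody):
    print(frag)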
Referring again to fig. 5, the speech synthesis module 233 in the song data processing system 23 is explained next. The speech synthesis module 233 performs speech synthesis on the music score creation information and the lyric creation information acquired from the lyric and score editing module 232, based on the sound ray of the virtual idol, generates the target song file and outputs it. The module further includes a sound ray selection unit 2331 and a song synthesis unit 2332. It should be noted that, in this example, the speech synthesis module 233 is embedded in the mobile device 101 and can realize an offline speech synthesis function; however, the application does not specifically limit the position of the module 233, which may instead be configured in the cloud server 103 to realize an online real-time speech synthesis function, and the sound ray selection unit 2331 may also be configured in the cloud server 103 while the song synthesis unit 2332 is built into the mobile device 101.
Fig. 8 is a flowchart illustrating an implementation of the speech synthesis module 233 in the song data processing system 23 based on virtual idol according to an embodiment of the present application. As shown in fig. 8, the functions and the workflow of each unit (refer to fig. 5) in the speech synthesis module 233 will be described below.
First, the sound ray selection unit 2331 stores virtual idol sound rays for different song styles; based on the current song style obtained from the lyric and score editing module 232, it determines the virtual idol sound ray matching that song style so as to obtain the corresponding synthesis effect data for the sound ray.
The song synthesis unit 2332 can substitute the lyric creation information, the music score creation information, the sound ray matching the song style, the background sound data and the like into the preset speech synthesis system to generate the target song file. Specifically, the unit 2332 first parses and pre-processes the acquired information to obtain the information required by each link of the speech synthesis system. The pre-processing at least comprises: parsing the music score creation information, obtaining the valid bar identifiers, the note tone data and the corresponding time-course data in each score fragment, and loading them into the speech synthesis system in the coding order of the fragments; and parsing the lyric creation information, obtaining the final/initial identifier, the sentence-break identifier, the final/initial content and the corresponding time course of each lyric fragment, further distinguishing the initial time course from the final time course, and likewise loading them into the speech synthesis system in the coding order of the fragments. The speech synthesis process then completes the following operations: the lyric information is acquired from the score extraction module 231 and used as the original text input of the speech synthesis system, and prosody information for the original lyric information is generated through text front-end analysis; the final/initial content to be loaded and its corresponding final/initial identifier are acquired in the coding order of the lyric fragments, and this information together with the prosody information is substituted into the time-course model, so that after the initial and final time courses have each been loaded, initial and final time courses with prosody marks are obtained; the final/initial content and its corresponding final/initial identifier, together with the initial and final time courses with prosody marks, are processed by the preset acoustic model to output the SP information and AP information for the current song, which are then combined with the bar identifiers, note tone data and corresponding time-course data loaded in the coding order of the score fragments, and with the virtual idol sound ray matching the current song style information obtained from the sound ray selection unit 2331, to perform the final effect synthesis and output the corresponding created singing voice; finally, the track-combining operation of the created singing voice and the background sound data obtained from the score extraction module 231 is completed to generate the final target song file.
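The toy Python sketch below illustrates only the ordering and track-combining shape of this step: each score fragment is rendered as a sine tone at its pitch and duration, the fragments are concatenated in code order, and the result is mixed with a background sound track. The real system uses the time-course model, the acoustic model and the virtual idol's sound ray, none of which is reproduced here.

import numpy as np

SR = 16000
PITCH_HZ = {"Do": 261.6, "Re": 293.7, "Mi": 329.6, "Fa": 349.2,
            "So": 392.0, "La": 440.0, "Ti": 493.9}

def render_fragments(fragments):
    """fragments: dicts with 'code', 'start'/'stop' in seconds and 'tone_data' pitch names."""
    chunks = []
    for frag in sorted(fragments, key=lambda f: f["code"]):
        dur = frag["stop"] - frag["start"]
        t = np.arange(int(dur * SR)) / SR
        chunks.append(0.4 * np.sin(2 * np.pi * PITCH_HZ[frag["tone_data"]] * t))
    return np.concatenate(chunks)

def combine_tracks(singing, background):
    """Mix the synthesized singing voice with background sound data of matching length."""
    n = min(len(singing), len(background))
    return singing[:n] + background[:n]

fragments = [{"code": i + 1, "start": i * 0.5, "stop": (i + 1) * 0.5, "tone_data": p}
             for i, p in enumerate(["Do", "Do", "So", "So"])]
background = 0.1 * np.random.default_rng(0).standard_normal(SR * 2)   # stand-in accompaniment
target_song = combine_tracks(render_fragments(fragments), background)
print(target_song.shape)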
It should be noted that the manner of obtaining the preset speech synthesis system is not specifically limited; a machine learning method may be adopted to generate the corresponding time-course model, acoustic model and effect synthesis model, thereby constructing the speech synthesis system and performing the speech synthesis operation.
In addition, the application also provides a song data processing method based on the virtual idol. Fig. 9 is a flowchart illustrating steps of a method for processing song data based on virtual idols according to an embodiment of the present application. As shown in fig. 9, the following description will be given of this processing method.
In step S910 (a music score gathering step), the music score gathering module 231 acquires the multimodal data, extracts a singing song audio from the multimodal data, converts the singing song audio into a song file, and generates music score information and lyric information corresponding to the song file through a word-spectrum separation operation. Specifically, the audio extraction unit 2311 may acquire multimodal data, extract a singing song audio from the multimodal data, and send the singing song audio to the singing sound separation unit 2312 or directly forward the acquired audio information as the singing song audio to the singing sound separation unit 2312. The singing voice separation unit 2312 separates the obtained singing song audio, voice singing data are reserved through a voice recognition technology, background voice data are separated (removed), and finally the voice singing data are sent to the word song decomposition unit 2313. After the voice singing data is acquired by the word and song decomposition unit 2313, the voice singing data is further decomposed, and the music score information and the lyric information are generated through editing and sorting.
After receiving the music score information and the lyric information sent by the music score editing module 231, the vocabulary editing module 232 proceeds to step S920 (a vocabulary editing step). Specifically, first, the song style recognition unit 2321 determines the current song style through the established music style recognition model according to the acquired music score information, and then sends the information related to the song style to the creation model selection unit 2322. The creation model selecting unit 2322 stores music processing models constructed for respective song genres, and the unit 2322 calls a music processing model matching the current genre according to the obtained song genres. Finally, the score information and the lyric information obtained from the lyric decomposition unit 2313 are edited by the editing unit 2323 according to a music processing model matching the currently generated song style information to generate corresponding score composition information and lyric composition information, proceeding to step S930 (speech synthesis step).
In step S930 (the speech synthesis step), the sound ray selection unit 2331 in the speech synthesis module 233 determines, based on the current song style, the virtual idol sound ray matching that style and obtains the corresponding synthesis effect data for the sound ray. The song synthesis unit 2332 then substitutes the lyric composition information, the score composition information, the sound ray matching the song style, the synthesis effect data, the background sound data, and the like into the preset speech synthesis system to generate the target song file.
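The sound ray selection of unit 2331 can be pictured as a simple lookup from song style to a timbre preset plus its synthesis effect data, as in the assumption-laden sketch below; the preset names and effect parameters are invented purely for illustration.

```python
# Sketch of sound-ray selection (unit 2331): style -> timbre preset + effect data.
SOUND_RAYS = {
    "ballad": {"voice_bank": "idol_soft",   "effects": {"reverb": 0.4, "breathiness": 0.6}},
    "pop":    {"voice_bank": "idol_bright", "effects": {"reverb": 0.2, "breathiness": 0.3}},
    "folk":   {"voice_bank": "idol_warm",   "effects": {"reverb": 0.3, "breathiness": 0.5}},
}

def select_sound_ray(song_style):
    preset = SOUND_RAYS.get(song_style, SOUND_RAYS["pop"])   # fall back to a default ray
    return preset["voice_bank"], preset["effects"]

# The selected ray and its effect data would then be handed to the song synthesis
# unit 2332 together with the composition information and background sound data.
```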
It should be noted that the above virtual idol-based song data processing method can be stored as a computer program module on a computer-readable medium in the cloud server 103; when the program is executed by a processor, it performs the function of autonomously composing and singing a song for the audio information output by the interaction object.
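As a deployment illustration only, the sketch below wraps such a pipeline in a minimal Flask endpoint of the kind the cloud server 103 might expose. The route, the request fields, and the run_song_pipeline stub are hypothetical and stand in for the steps described above.

```python
# Minimal, hypothetical cloud-server entry point for the processing pipeline.
from flask import Flask, request, send_file

app = Flask(__name__)

def run_song_pipeline(input_wav):
    # Placeholder for steps S910-S930; a real deployment would call the
    # score-extraction, editing, and synthesis stages sketched earlier.
    return input_wav

@app.route("/compose", methods=["POST"])
def compose():
    # The interaction object's audio arrives as part of the multimodal data;
    # only the audio part is used in this sketch.
    audio = request.files["audio"]
    audio.save("/tmp/input.wav")
    target_path = run_song_pipeline("/tmp/input.wav")
    return send_file(target_path, mimetype="audio/wav")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```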
It is to be understood that the disclosed embodiments of the invention are not limited to the particular structures or process steps disclosed herein, but extend to equivalents thereof as would be understood by those skilled in the relevant art. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrase "one embodiment" or "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A virtual idol-based song data processing method, characterized by comprising the following steps:
a score extraction step: acquiring multimodal data, extracting singing song audio from the multimodal data, converting the singing song audio into a song file, performing a word-spectrum separation operation, and generating music score information and lyric information corresponding to the song file;
a lyric and music editing step: determining the current song style from the acquired music score information through an established music style recognition model, calling a music processing model matching the current style according to the determined song style, and editing the music score information and the lyric information according to the music processing model to generate score composition information and lyric composition information, wherein the score composition information comprises score fragment data configured with score fragment codes, score fragment start-stop time intervals, pitch fragment data, and bar identifiers; and the lyric composition information comprises lyric fragment data configured with lyric fragment codes, lyric fragment start-stop time intervals, final identifiers, initial identifiers, tone codes, sentence-break identifiers, final data, and initial data;
a speech synthesis step: determining, based on the current song style, a sound ray matching the current song style, substituting the score composition information, the lyric composition information, the sound ray matching the song style, and background music into a preset speech synthesis system for speech synthesis based on the sound ray of the virtual idol, and generating and outputting a target song file;
a performance image generation step: receiving mouth-shape animation data corresponding to the content of the lyric fragment data, or a code representing that fragment information, together with information on a specific image of a randomly selected virtual idol, and generating a real-time performance image corresponding to the current lyric fragment content as the current imaging control information;
and a synchronous output step: integrating the target song file, the imaging control information, and the lyric fragment code and lyric fragment start-stop time interval of the current fragment, so that the imaging control information and the corresponding fragment of the target song file are output simultaneously.
2. The method according to claim 1, wherein the score extraction step further comprises:
performing singing voice separation on the obtained singing song audio, removing the background sound data and retaining the vocal singing data;
and further decomposing the vocal singing data, then editing and sorting them to generate the music score information and the lyric information.
3. The method according to claim 1, wherein the music processing model is constructed by training, through a machine learning method, a mapping relationship from basic data to target data on the basis of a word spelling library, a song note library, a composition database, and a combined multi-category composition habit feature database established for the various song styles.
4. A virtual idol-based song data processing system, characterized by comprising the following modules:
a score extraction module, configured to acquire multimodal data, extract singing song audio from the multimodal data, convert the singing song audio into a song file, perform a word-spectrum separation operation, and generate music score information and lyric information corresponding to the song file;
a lyric and music editing module, which determines the current song style from the acquired music score information through an established music style recognition model, calls a music processing model matching the current style according to the determined song style, and edits the music score information and the lyric information according to the music processing model to generate score composition information and lyric composition information, wherein the score composition information comprises score fragment data configured with score fragment codes, score fragment start-stop time intervals, pitch fragment data, and bar identifiers; and the lyric composition information comprises lyric fragment data configured with lyric fragment codes, lyric fragment start-stop time intervals, final identifiers, initial identifiers, tone codes, sentence-break identifiers, final data, and initial data;
a speech synthesis module, which determines, based on the current song style, a sound ray matching the song style, substitutes the score composition information, the lyric composition information, the sound ray matching the song style, and background music into a preset speech synthesis system for speech synthesis based on the sound ray of the virtual idol, and generates and outputs a target song file;
a performance image generation module, which receives mouth-shape animation data corresponding to the content of the lyric fragment data, or a code representing that fragment information, together with information on a specific image of a randomly selected virtual idol, and generates a real-time performance image corresponding to the current lyric fragment content as the current imaging control information;
and a synchronous output module, which integrates the target song file, the imaging control information, and the lyric fragment code and lyric fragment start-stop time interval of the current fragment, so that the imaging control information and the corresponding fragment of the target song file are output simultaneously.
5. The system according to claim 4, wherein the score extraction module further comprises:
a singing voice separation unit, configured to perform singing voice separation on the obtained singing song audio, remove the background sound data, and retain the vocal singing data;
and a word and song decomposition unit, configured to further decompose the vocal singing data, then edit and sort them to generate the music score information and the lyric information.
6. The system according to claim 4, wherein the music processing model is constructed by training, through a machine learning method, a mapping relationship from basic data to target data on the basis of a word spelling library, a song note library, a composition database, and a composition habit feature database established for the various song styles.
7. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the virtual idol-based song data processing method according to any one of claims 1 to 3.
8. A virtual idol-based song processing and singing interaction system, characterized in that the interaction system is provided with:
a cloud server having the computer-readable storage medium according to claim 7;
a mobile device, which receives and plays the target song file output by the cloud server, generates imaging control information based on the target song file, and controls the target song file and the imaging control information to be output synchronously;
and an imaging device, which receives the imaging control information sent by the mobile device and displays a virtual idol based on the imaging control information, wherein the virtual idol presents specific image features matching the imaging control information.
CN201810142242.2A 2018-02-11 2018-02-11 Song data processing method based on virtual idol and singing interaction system Active CN108492817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810142242.2A CN108492817B (en) 2018-02-11 2018-02-11 Song data processing method based on virtual idol and singing interaction system

Publications (2)

Publication Number Publication Date
CN108492817A CN108492817A (en) 2018-09-04
CN108492817B true CN108492817B (en) 2020-11-10

Family

ID=63340216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810142242.2A Active CN108492817B (en) 2018-02-11 2018-02-11 Song data processing method based on virtual idol and singing interaction system

Country Status (1)

Country Link
CN (1) CN108492817B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109471951B (en) * 2018-09-19 2023-06-02 平安科技(深圳)有限公司 Lyric generating method, device, equipment and storage medium based on neural network
CN109215626A (en) * 2018-10-26 2019-01-15 广东电网有限责任公司 A method of automatically writing lyrics and composing music
CN109829482B (en) * 2019-01-04 2023-10-27 平安科技(深圳)有限公司 Song training data processing method and device and computer readable storage medium
CN109817191B (en) * 2019-01-04 2023-06-06 平安科技(深圳)有限公司 Tremolo modeling method, device, computer equipment and storage medium
CN110136678B (en) * 2019-04-26 2022-06-03 北京奇艺世纪科技有限公司 Music editing method and device and electronic equipment
CN110570876B (en) * 2019-07-30 2024-03-15 平安科技(深圳)有限公司 Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium
CN112420008A (en) * 2019-08-22 2021-02-26 北京峰趣互联网信息服务有限公司 Method and device for recording songs, electronic equipment and storage medium
CN112417201A (en) * 2019-08-22 2021-02-26 北京峰趣互联网信息服务有限公司 Audio information pushing method and system, electronic equipment and computer readable medium
CN111326131B (en) * 2020-03-03 2023-06-02 北京香侬慧语科技有限责任公司 Song conversion method, device, equipment and medium
CN111445897B (en) * 2020-03-23 2023-04-14 北京字节跳动网络技术有限公司 Song generation method and device, readable medium and electronic equipment
CN111798821B (en) * 2020-06-29 2022-06-14 北京字节跳动网络技术有限公司 Sound conversion method, device, readable storage medium and electronic equipment
CN113409747B (en) * 2021-05-28 2023-08-29 北京达佳互联信息技术有限公司 Song generation method and device, electronic equipment and storage medium
CN113539217A (en) * 2021-06-29 2021-10-22 广州酷狗计算机科技有限公司 Lyric creation navigation method and device, equipment, medium and product thereof
CN113808555A (en) * 2021-09-17 2021-12-17 广州酷狗计算机科技有限公司 Song synthesis method and device, equipment, medium and product thereof
CN113836344A (en) * 2021-09-30 2021-12-24 广州艾美网络科技有限公司 Personalized song file generation method and device and music singing equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101313477A (en) * 2005-12-21 2008-11-26 Lg电子株式会社 Music generating device and operating method thereof
CN101414322A (en) * 2007-10-16 2009-04-22 盛趣信息技术(上海)有限公司 Exhibition method and system for virtual role
US8687005B2 (en) * 2007-06-26 2014-04-01 Samsung Electronics Co., Ltd. Apparatus and method for synchronizing and sharing virtual character
CN103839559A (en) * 2012-11-20 2014-06-04 华为技术有限公司 Audio file manufacturing method and terminal equipment
CN105740394A (en) * 2016-01-27 2016-07-06 广州酷狗计算机科技有限公司 Music generation method, terminal, and server
CN106448630A (en) * 2016-09-09 2017-02-22 腾讯科技(深圳)有限公司 Method and device for generating digital music file of song
CN106652984A (en) * 2016-10-11 2017-05-10 张文铂 Automatic song creation method via computer
CN106898341A (en) * 2017-01-04 2017-06-27 清华大学 A kind of individualized music generation method and device based on common semantic space

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7977562B2 (en) * 2008-06-20 2011-07-12 Microsoft Corporation Synthesized singing voice waveform generator
KR101274961B1 (en) * 2011-04-28 2013-06-13 (주)티젠스 music contents production system using client device.

Similar Documents

Publication Publication Date Title
CN108492817B (en) Song data processing method based on virtual idol and singing interaction system
CN108962217B (en) Speech synthesis method and related equipment
CN108806655B (en) Automatic generation of songs
CN108806656B (en) Automatic generation of songs
CN103218842B (en) A kind of voice synchronous drives the method for the three-dimensional face shape of the mouth as one speaks and facial pose animation
CN106653052A (en) Virtual human face animation generation method and device
CN102568023A (en) Real-time animation for an expressive avatar
CN111145777A (en) Virtual image display method and device, electronic equipment and storage medium
JP2003530654A (en) Animating characters
CN109326280B (en) Singing synthesis method and device and electronic equipment
JP2022518721A (en) Real-time generation of utterance animation
CN105390133A (en) Tibetan TTVS system realization method
CN116863038A (en) Method for generating digital human voice and facial animation by text
CN112184859B (en) End-to-end virtual object animation generation method and device, storage medium and terminal
KR101089184B1 (en) Method and system for providing a speech and expression of emotion in 3D charactor
WO2022242706A1 (en) Multimodal based reactive response generation
CN113538636B (en) Virtual object control method and device, electronic equipment and medium
CN112750187A (en) Animation generation method, device and equipment and computer readable storage medium
CN115497448A (en) Method and device for synthesizing voice animation, electronic equipment and storage medium
US20150187112A1 (en) System and Method for Automatic Generation of Animation
CN106292424A (en) Music data processing method and device for anthropomorphic robot
CN116958342A (en) Method for generating actions of virtual image, method and device for constructing action library
CN108922505B (en) Information processing method and device
JP6222465B2 (en) Animation generating apparatus, animation generating method and program
CN116129868A (en) Method and system for generating structured photo

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230919

Address after: 100000 6198, Floor 6, Building 4, Yard 49, Badachu Road, Shijingshan District, Beijing

Patentee after: Beijing Virtual Dynamic Technology Co.,Ltd.

Address before: 100000 Fourth Floor Ivy League Youth Venture Studio No. 193, Yuquan Building, No. 3 Shijingshan Road, Shijingshan District, Beijing

Patentee before: Beijing Guangnian Infinite Technology Co.,Ltd.