CN109147757A - Song synthesis method and device - Google Patents


Info

Publication number: CN109147757A
Authority: CN (China)
Prior art keywords: word, song, user, adjusted, fundamental frequency
Legal status: Granted (per Google Patents, an assumption rather than a legal conclusion)
Application number: CN201811056146.2A
Other languages: Chinese (zh)
Other versions: CN109147757B (en)
Inventor: 劳振锋 (Lao Zhenfeng)
Current assignee: Guangzhou Kugou Computer Technology Co Ltd
Original assignee: Guangzhou Kugou Computer Technology Co Ltd
Application filed by Guangzhou Kugou Computer Technology Co Ltd
Priority to CN201811056146.2A (patent CN109147757B)
Publication of CN109147757A
Application granted
Publication of CN109147757B
Current legal status: Active


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser

Abstract

The invention discloses a song synthesis method and device in the field of speech synthesis technology. The method includes the following steps:

- when user speech is acquired, extracting the fundamental frequency, envelope and consonant information of each word in the user speech;
- adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song, where the pitch frequency of a word in the song is the frequency corresponding to that word's pitch in the song;
- synthesizing the adjusted fundamental frequencies with the envelope and consonant information of each word in the user speech to obtain synthesized audio;
- adjusting the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain a synthesized user song.

Because the invention synthesizes the user song from the user's own envelope and consonant information, it retains the user's original timbre. The synthesized song therefore sounds closer to the user's voice.

Description

Song synthesis method and device
Technical field
The present invention relates to the field of speech synthesis technology, and in particular to a song synthesis method and device.
Background
With the development of speech synthesis technology, it is increasingly applied in daily life. For example, some users sing out of tune but would like to read the lyrics aloud and have their own rendition of a song generated. Speech synthesis technology can make this possible.
At present, the related art generally proceeds as follows:
- recognize the user's speech;
- find the corresponding built-in song in a speech synthesis database;
- extract the timbre of that song;
- use a pre-established conversion model to change the song's timbre into the user's timbre, obtaining the synthesized user song.
The conversion model converts the timbre of a built-in song in the speech synthesis database into the user's timbre.
This technique synthesizes the user song from timbre built into the speech synthesis database, so it cannot retain the user's original timbre. As a result, the synthesized user song sounds different from the user's voice.
Summary of the invention
Embodiments of the present invention provide a song synthesis method and device. They address the problem in the related art that the synthesized user song sounds considerably different from the user's voice. The technical solution is as follows:
In a first aspect, a song synthesis method is provided, comprising:
when user speech is acquired, extracting the fundamental frequency, envelope and consonant information of each word in the user speech;
adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song, where the pitch frequency of each word in the song is the frequency corresponding to the pitch of that word in the song;
synthesizing the adjusted fundamental frequencies with the envelope and consonant information of each word in the user speech to obtain synthesized audio; and
adjusting the duration of each word in the synthesized audio according to the duration of each word in the song to obtain a synthesized user song.
In one possible implementation, adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song comprises:
adjusting the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song, according to the pitch frequency of each word in the song.
In one possible implementation, adjusting the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song comprises:
for each word in the song that has multiple pitch frequencies, adjusting the fundamental frequency of that word in the user speech according to the order and proportions of those pitch frequencies.
In one possible implementation, extracting the fundamental frequency, envelope and consonant information of each word in the user speech comprises:
extracting the fundamental frequency, envelope and consonant information of each word in the user speech using a feature extraction algorithm. A preset number of fundamental frequency values is extracted for each word, and the preset number is determined by the extraction frequency.
In one possible implementation, adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song comprises:
for each word in the user speech, adjusting the preset number of fundamental frequency values of the word to the pitch frequency of that word in the song.
In one possible implementation, adjusting the duration of each word in the synthesized audio according to the duration of each word in the song to obtain the synthesized user song comprises:
adjusting the duration of each word in the synthesized audio to the duration of the corresponding word in the song, according to the duration of each word in the song, to obtain the synthesized user song.
In a second aspect, a song synthesis device is provided, comprising:
an extraction module, configured to extract, when user speech is acquired, the fundamental frequency, envelope and consonant information of each word in the user speech;
an adjustment module, configured to adjust the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song, where the pitch frequency of each word in the song is the frequency corresponding to the pitch of that word in the song;
a synthesis module, configured to synthesize the adjusted fundamental frequencies with the envelope and consonant information of each word in the user speech to obtain synthesized audio;
the adjustment module being further configured to adjust the duration of each word in the synthesized audio according to the duration of each word in the song to obtain a synthesized user song.
In one possible implementation, the adjustment module is configured to adjust the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song, according to the pitch frequency of each word in the song.
In one possible implementation, the adjustment module handles each word in the song that has multiple pitch frequencies. For such a word, it adjusts the fundamental frequency of that word in the user speech according to the order and proportions of those pitch frequencies.
In one possible implementation, the extraction module is configured to extract the fundamental frequency, envelope and consonant information of each word in the user speech using a feature extraction algorithm. A preset number of fundamental frequency values is extracted for each word, and the preset number is determined by the extraction frequency.
In one possible implementation, the adjustment module is configured to adjust, for each word in the user speech, the preset number of fundamental frequency values of the word to the pitch frequency of that word in the song.
In one possible implementation, the adjustment module is configured to adjust the duration of each word in the synthesized audio to the duration of the corresponding word in the song, according to the duration of each word in the song, to obtain the synthesized user song.
In a third aspect, a computer device is provided, comprising a processor and a memory. The memory is configured to store a computer program. The processor is configured to execute that program to implement the method steps of any implementation of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided. It stores a computer program that, when executed by a processor, implements the method steps of any implementation of the first aspect.
The technical solutions provided by the embodiments of the present invention bring at least the following beneficial effects:
First, the fundamental frequency of each word spoken by the user is adjusted according to the pitch frequency of each word in the song. The adjusted fundamental frequencies are then synthesized with the user's original envelope and consonant information into audio. Finally, the duration of each spoken word is adjusted according to the duration of each word in the song, yielding the user song. Because this scheme synthesizes the user song from the user's own envelope and consonant information, it retains the user's original timbre. The synthesized song therefore sounds closer to the user's voice.
Brief description of the drawings
The drawings used to describe the embodiments are briefly introduced below, to explain the technical solutions more clearly. These drawings show only some embodiments of the present invention. Those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a song synthesis method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a song synthesis method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the correspondence between pitch and frequency according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a song synthesis device according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device 500 according to an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a song synthesis method according to an embodiment of the present invention. Referring to Fig. 1, the method comprises:
101. When user speech is acquired, extract the fundamental frequency, envelope and consonant information of each word in the user speech.
102. Adjust the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song. The pitch frequency of each word in the song is the frequency corresponding to the pitch of that word in the song.
103. Synthesize the adjusted fundamental frequencies with the envelope and consonant information of each word in the user speech to obtain synthesized audio.
104. Adjust the duration of each word in the synthesized audio according to the duration of each word in the song to obtain a synthesized user song.
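Steps 101 to 104 can be read as a per-word data flow. The sketch below illustrates only that flow and is not the patent's implementation. The `WordFeatures` and `SongWord` containers are hypothetical, and so are the injected `synthesize` and `stretch` callables. Those callables stand in for a real vocoder and time-stretcher, such as the world tool and SoundTouch named in the second embodiment. For simplicity the sketch also synthesizes and stretches word by word, whereas the method synthesizes the audio first and then adjusts each word's duration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class WordFeatures:
    """Hypothetical per-word features from step 101."""
    f0: List[float]          # one F0 value per analysis frame, in Hz
    envelope: List[object]   # per-frame spectral envelope (left unchanged)
    consonant: List[object]  # per-frame consonant information (left unchanged)


@dataclass
class SongWord:
    """Hypothetical score entry for the corresponding word in the song."""
    pitch_hz: float          # pitch frequency of the word
    duration_s: float        # duration of the word in the song, in seconds


Synthesizer = Callable[[List[float], List[object], List[object]], List[float]]
Stretcher = Callable[[List[float], float], List[float]]


def synthesize_user_song(words: Sequence[WordFeatures],
                         score: Sequence[SongWord],
                         synthesize: Synthesizer,
                         stretch: Stretcher) -> List[float]:
    """Apply steps 102-104 word by word and concatenate the samples."""
    song: List[float] = []
    for feats, note in zip(words, score):
        # Step 102: every F0 frame of the word takes the song's pitch frequency.
        new_f0 = [note.pitch_hz] * len(feats.f0)
        # Step 103: resynthesize with the user's own envelope and consonant info.
        audio = synthesize(new_f0, feats.envelope, feats.consonant)
        # Step 104: fit the word to its duration in the song.
        song.extend(stretch(audio, note.duration_s))
    return song
```

With trivial stand-ins you can check the data flow without any audio library. For example, use a "synthesizer" that returns the F0 values as samples, and a "stretcher" that repeats the first sample ten times per second of target duration.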
The method provided by this embodiment works in three stages:
- the fundamental frequency of each word spoken by the user is adjusted according to the pitch frequency of each word in the song;
- the adjusted fundamental frequencies are synthesized with the user's original envelope and consonant information into audio;
- the duration of each spoken word is adjusted according to the duration of each word in the song, yielding the user song.
Because the user song is synthesized from the user's own envelope and consonant information, the user's original timbre is retained. The synthesized song therefore sounds closer to the user's voice.
In one possible implementation, adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song comprises:
adjusting the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song, according to the pitch frequency of each word in the song.
In one possible implementation, adjusting the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song comprises:
for each word in the song that has multiple pitch frequencies, adjusting the fundamental frequency of that word in the user speech according to the order and proportions of those pitch frequencies.
In one possible implementation, extracting the fundamental frequency, envelope and consonant information of each word in the user speech comprises:
extracting the fundamental frequency, envelope and consonant information of each word in the user speech using a feature extraction algorithm. A preset number of fundamental frequency values is extracted for each word, and the preset number is determined by the extraction frequency.
In one possible implementation, adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song comprises:
for each word in the user speech, adjusting the preset number of fundamental frequency values of the word to the pitch frequency of that word in the song.
In one possible implementation, adjusting the duration of each word in the synthesized audio according to the duration of each word in the song to obtain the synthesized user song comprises:
adjusting the duration of each word in the synthesized audio to the duration of the corresponding word in the song, according to the duration of each word in the song, to obtain the synthesized user song.
All of the above optional technical solutions may be combined in any way to form optional embodiments of the present invention, which are not described one by one here.
Fig. 2 is a flowchart of a song synthesis method according to an embodiment of the present invention. The method is executed by an electronic device. Referring to Fig. 2, the method comprises:
201. When user speech is acquired, extract the fundamental frequency, envelope and consonant information of each word in the user speech.
In this embodiment, the user can input speech to the electronic device. For example, a specified application with a song synthesis function may be installed on the device. When users want to synthesize their own song, a corresponding operation makes the device display the application's voice input interface. While this interface is displayed, the user speaks to the device, for example by reading the lyrics of a song aloud, and the device captures the user speech.
The electronic device then extracts features of the user speech, namely the fundamental frequency, envelope and consonant information of each word. In one possible implementation, it extracts these features using a feature extraction algorithm. A preset number of fundamental frequency values is extracted for each word, and the preset number is determined by the extraction frequency.
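The "preset number" of F0 values per word follows from the extraction frequency. At a fixed frame period, a word lasting d seconds yields roughly d divided by the period frames. The sketch below makes three assumptions the patent does not state: a 5 ms frame period (a common default for WORLD-style analysis), known word boundaries, and a partial last frame counting as a full frame.

```python
import math


def frames_per_word(duration_s: float, frame_period_ms: float = 5.0) -> int:
    """Number of F0 values extracted for a word lasting duration_s seconds.

    Assumes one F0 value per analysis frame and counts a partial final
    frame as a full one (hence the ceiling).
    """
    if duration_s <= 0.0:
        return 0
    frames = duration_s * 1000.0 / frame_period_ms
    # Round away tiny floating-point noise before taking the ceiling.
    return math.ceil(round(frames, 6))
```

At 5 ms per frame this is 200 F0 values per second, so a 0.5 s word gets 100 values. Halving the extraction frequency (a 10 ms period) halves that number.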
For example, the electronic device can extract features from the user speech using the feature extraction algorithms of the WORLD tool. These may include a fundamental frequency extraction algorithm, an envelope extraction algorithm and a consonant extraction algorithm. Applying each algorithm to the user speech yields the corresponding feature:
- the fundamental frequency extraction algorithm yields the fundamental frequency information;
- the envelope extraction algorithm yields the envelope information;
- the consonant extraction algorithm yields the consonant information.
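WORLD's own extractors are not reproduced here. Its F0 estimators (DIO or Harvest) and its envelope and aperiodicity estimators (CheapTrick and D4C) are considerably more elaborate. To make the F0 step concrete, the sketch below swaps in a much simpler technique: time-domain autocorrelation, producing one F0 value per frame. It returns 0 for frames judged unvoiced. The frame length, search range and voicing threshold are illustrative assumptions, not values from the patent.

```python
import math
from typing import List


def autocorr_f0(frame: List[float], fs: int, fmin: float = 60.0,
                fmax: float = 800.0, voicing_threshold: float = 0.3) -> float:
    """Estimate one frame's F0 by picking the strongest autocorrelation lag.

    Returns 0.0 for silent or weakly periodic (unvoiced) frames.
    """
    n = len(frame)
    energy = sum(s * s for s in frame)
    if energy == 0.0:
        return 0.0
    min_lag = max(1, int(fs / fmax))      # shortest period searched
    max_lag = min(n - 1, int(fs / fmin))  # longest period searched
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    # The normalized peak height serves as a crude voicing decision.
    if best_lag == 0 or best_r / energy < voicing_threshold:
        return 0.0
    return fs / best_lag


def f0_track(signal: List[float], fs: int, frame_period_ms: float = 5.0,
             frame_len: int = 1024) -> List[float]:
    """One F0 estimate every frame_period_ms milliseconds."""
    hop = max(1, int(fs * frame_period_ms / 1000.0))
    last_start = max(1, len(signal) - frame_len + 1)
    return [autocorr_f0(signal[s:s + frame_len], fs)
            for s in range(0, last_start, hop)]
```

The estimate is quantized to fs / lag. A 220 Hz tone sampled at 8 kHz therefore comes out at about 222 Hz. A real system would refine the peak, which WORLD does with its StoneMask step.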
202. According to the pitch frequency of each word in the song, adjust the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song.
Here, the pitch frequency of each word in the song is the frequency corresponding to that word's pitch in the song. Fig. 3 is a schematic diagram of the correspondence between pitch and frequency according to an embodiment of the present invention. Referring to Fig. 3, the pitch of each word in the song can be converted to a frequency according to the rules of twelve-tone equal temperament. In Fig. 3, the pitches in the first column correspond to the frequencies in the fourth column. The electronic device converts the pitch of each word in the song to the corresponding frequency according to this correspondence, obtaining the pitch frequency of each word.
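Fig. 3 is not reproduced here, but the twelve-tone equal temperament rule it tabulates is standard: each semitone multiplies frequency by 2^(1/12). The sketch below assumes the common A4 = 440 Hz reference and MIDI-style note numbering (A4 = 69, C4 = 60). The patent does not specify either convention.

```python
import math

A4_HZ = 440.0  # assumed tuning reference
A4_MIDI = 69   # MIDI note number of A4

NOTE_OFFSETS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}


def note_to_midi(name: str, octave: int) -> int:
    """Scientific pitch notation to MIDI number (C4 -> 60)."""
    return 12 * (octave + 1) + NOTE_OFFSETS[name]


def midi_to_hz(midi: float) -> float:
    """Equal temperament: one semitone is a factor of 2 ** (1 / 12)."""
    return A4_HZ * 2.0 ** ((midi - A4_MIDI) / 12.0)


def hz_to_midi(freq: float) -> float:
    """Inverse mapping, useful for checking how far a sung F0 is off pitch."""
    return A4_MIDI + 12.0 * math.log2(freq / A4_HZ)
```

For instance, a word notated as C4 gets a pitch frequency of about 261.63 Hz, and one octave above A4 maps to 880 Hz (MIDI 81).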
In this embodiment, for each word in the user speech, the electronic device can adjust the word's fundamental frequency to the pitch frequency of the corresponding word in the song. The corresponding word in the song is normally the same word. Alternatively, it may simply be a word with the same pronunciation; this embodiment does not limit this.
Step 201 extracts a preset number of fundamental frequency values for each word. Accordingly, for each word in the user speech, the electronic device can adjust that preset number of values to the pitch frequency of the corresponding word in the song.
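Once each word owns a fixed run of F0 frames, step 202 amounts to overwriting those frames. In this sketch, word boundaries arrive as hypothetical (start, end) frame spans. The sketch also keeps frames whose F0 is 0 at 0, so that unvoiced sounds stay unvoiced; 0 is the usual unvoiced marker in WORLD-style analysis. That unvoiced-frame choice is an assumption, not something the patent states.

```python
from typing import List, Sequence, Tuple


def retune_utterance(f0: Sequence[float],
                     word_spans: Sequence[Tuple[int, int]],
                     pitches_hz: Sequence[float],
                     keep_unvoiced: bool = True) -> List[float]:
    """Set every F0 frame of each word to that word's pitch frequency.

    word_spans[i] = (start, end) frame indices of word i, end exclusive.
    The input sequence is not modified; a new list is returned.
    """
    out = list(f0)
    for (start, end), pitch in zip(word_spans, pitches_hz):
        for i in range(start, end):
            if keep_unvoiced and out[i] <= 0.0:
                continue  # leave unvoiced frames (F0 == 0) untouched
            out[i] = pitch
    return out
```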
In one possible implementation, a word in the song may have multiple pitch frequencies. In that case, the fundamental frequency of that word in the user speech is adjusted according to the order and proportions of those pitch frequencies.

Take word A in the song as an example. Suppose word A has three frequencies, sung in the order frequency 1 -> frequency 2 -> frequency 3:
- frequency 1 comes first and occupies 50%;
- frequency 2 is in the middle and occupies 30%;
- frequency 3 comes last and occupies 20%.

The electronic device then adjusts the fundamental frequency values of word A in the user speech accordingly. The first 50% become frequency 1, the middle 30% become frequency 2, and the last 20% become frequency 3.
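The word A example can be sketched as follows. The patent does not say how a 50/30/20 split is rounded when the frame count is not divisible. Rounding the cumulative ratios is an assumption made here so that every frame is assigned exactly once and the notes keep their sung order.

```python
from typing import List, Sequence, Tuple


def split_by_ratio(n_frames: int,
                   notes: Sequence[Tuple[float, float]]) -> List[float]:
    """Assign each of a word's F0 frames to one of its pitch frequencies.

    notes: non-empty (pitch_hz, ratio) pairs in the order they are sung.
    Ratios are normalized by their sum. Returns one target per frame.
    """
    total = sum(ratio for _, ratio in notes)
    targets: List[float] = []
    cumulative = 0.0
    for pitch, ratio in notes:
        cumulative += ratio / total
        # Frame index where this note ends, from its cumulative share.
        end = round(cumulative * n_frames)
        targets.extend([pitch] * (end - len(targets)))
    # Guard against a floating-point shortfall on the final boundary.
    targets.extend([notes[-1][0]] * (n_frames - len(targets)))
    return targets
```

For a word with 10 F0 frames and the 50/30/20 split above, this yields 5, 3 and 2 frames, matching the example. The resulting per-frame targets could then replace the single pitch value used when a word has only one note.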
It should be noted that step 202 is one possible implementation of adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song. The fundamental frequency determines pitch. Adjusting each word's fundamental frequency to the song's pitch frequency therefore gives each word in the user speech the same pitch as the corresponding word in the song.
203. Synthesize the adjusted fundamental frequencies with the envelope and consonant information of each word in the user speech to obtain synthesized audio.
In this embodiment, after adjusting the fundamental frequency of each word in the user speech, the electronic device synthesizes the adjusted fundamental frequencies with the user's original envelope and consonant information into audio.
In one possible implementation, the electronic device performs the synthesis with the WORLD tool. For example, the WORLD tool may include a speech synthesis algorithm. With it, the electronic device synthesizes the adjusted fundamental frequencies and the envelope and consonant information of the words in the user speech into audio. The audio is synthesized from the user's original envelope and consonant information, and the envelope determines the user's timbre. This way of synthesizing audio therefore retains the user's original timbre.
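WORLD's synthesizer is not reproduced here. To illustrate why reusing the user's envelope preserves timbre, the sketch below swaps in a simple harmonic (sinusoidal) synthesizer. The pitch comes from the adjusted F0, while each harmonic's amplitude is read from the user's spectral envelope at that harmonic's frequency. Three simplifications are assumptions:
- the envelope is modeled as one Python function per frame;
- frames with F0 = 0 are rendered as silence;
- the consonant information is ignored, where a WORLD-style vocoder would use its aperiodicity parameter to mix noise into the excitation.

```python
import math
from typing import Callable, List, Sequence


def harmonic_synth(f0: Sequence[float],
                   envelope: Sequence[Callable[[float], float]],
                   fs: int = 16000,
                   frame_period_ms: float = 5.0) -> List[float]:
    """Resynthesize audio from a per-frame F0 track and spectral envelope.

    envelope[i](freq) gives the linear amplitude of frame i at freq (Hz).
    Frames with F0 == 0 are treated as unvoiced and rendered as silence.
    """
    hop = int(fs * frame_period_ms / 1000.0)
    out: List[float] = []
    phase = 0.0  # running phase of the fundamental, continuous across frames
    for frame_f0, env in zip(f0, envelope):
        if frame_f0 <= 0.0:
            out.extend([0.0] * hop)
            continue
        n_harm = int((fs / 2) // frame_f0)  # harmonics up to Nyquist
        # Timbre: harmonic amplitudes sampled from the user's envelope.
        amps = [env(k * frame_f0) for k in range(1, n_harm + 1)]
        step = 2.0 * math.pi * frame_f0 / fs
        for _ in range(hop):
            phase += step
            out.append(sum(a * math.sin(k * phase)
                           for k, a in enumerate(amps, start=1)))
        phase %= 2.0 * math.pi  # wrap without breaking harmonic continuity
    return out
```

Changing F0 moves the harmonics, but their amplitudes are still looked up from the same envelope. That separation is what lets the pitch follow the song while the timbre stays the user's.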
204. According to the duration of each word in the song, adjust the duration of each word in the synthesized audio to the duration of the corresponding word in the song to obtain a synthesized user song.
In this embodiment, the synthesized audio from step 203 matches the song only in pitch. To obtain the user song, the electronic device adjusts the duration of each word in the synthesized audio to that of the corresponding word in the song. It does this with a time-stretching algorithm, such as the SoundTouch time-stretching algorithm. Once the durations are adjusted, the synthesized user song is obtained; in other words, the user's speaking voice has been turned into singing.
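SoundTouch's API and algorithm are not reproduced here. Its time-domain stretcher is generally described as WSOLA-like: it searches for the best-matching overlap position. The sketch below uses plain overlap-add (OLA), a simpler relative of that technique. It does no similarity search, so periodic sounds may warble. Analysis frames are taken at positions scaled by the input/output length ratio and overlap-added with a Hann window at a fixed hop. The frame size and the per-word wrapper `stretch_words` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def ola_stretch(x: Sequence[float], target_len: int,
                frame: int = 256) -> List[float]:
    """Time-stretch x to target_len samples with plain overlap-add (OLA)."""
    if target_len <= 0 or not x:
        return [0.0] * max(target_len, 0)
    hop = frame // 2  # fixed synthesis hop (50% overlap)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame)
              for n in range(frame)]
    out = [0.0] * (target_len + frame)
    norm = [0.0] * (target_len + frame)
    scale = len(x) / target_len  # input samples per output sample
    last_start = max(0, len(x) - frame)
    for out_pos in range(0, target_len, hop):
        in_pos = min(int(round(out_pos * scale)), last_start)
        for n in range(frame):
            src = in_pos + n
            sample = x[src] if src < len(x) else 0.0
            out[out_pos + n] += window[n] * sample
            norm[out_pos + n] += window[n]
    # Divide by the summed window so overlapping frames keep unit gain.
    return [o / w if w > 1e-8 else 0.0
            for o, w in zip(out[:target_len], norm[:target_len])]


def stretch_words(words: Sequence[Sequence[float]],
                  durations_s: Sequence[float], fs: int) -> List[float]:
    """Stretch each word's samples to its song duration and concatenate."""
    song: List[float] = []
    for audio, dur in zip(words, durations_s):
        song.extend(ola_stretch(audio, int(round(dur * fs))))
    return song
```

For example, two 800-sample words at 8 kHz with song durations of 0.1 s and 0.05 s come out as 800 and 400 samples. The first keeps its length and the second is compressed by half.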
It should be noted that step 204 is one possible implementation of adjusting the duration of each word in the synthesized audio according to the duration of each word in the song. Adjusting each word's duration to the duration of the word in the song turns the synthesized audio into the user song. Because the user song is synthesized from the user's own envelope and consonant information, the user's original timbre is retained and the song sounds closer to the user's voice.
In summary, the technical solution proceeds in four stages:
- extract the fundamental frequency, envelope and consonant information of each word spoken by the user;
- change the fundamental frequency of each spoken word to the pitch frequency of the corresponding word in the song;
- synthesize the result with the original envelope and consonant information;
- time-stretch each spoken word according to the duration of each word in the song.
Because the envelope and consonant information are the user's own, the timbre is closer to the user's voice and the synthesized song sounds more natural.
Fig. 4 is a schematic structural diagram of a song synthesis device according to an embodiment of the present invention. Referring to Fig. 4, the device comprises:
an extraction module 401, configured to extract, when user speech is acquired, the fundamental frequency, envelope and consonant information of each word in the user speech;
an adjustment module 402, configured to adjust the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song, where the pitch frequency of each word in the song is the frequency corresponding to the pitch of that word in the song;
a synthesis module 403, configured to synthesize the adjusted fundamental frequencies with the envelope and consonant information of each word in the user speech to obtain synthesized audio;
the adjustment module 402 being further configured to adjust the duration of each word in the synthesized audio according to the duration of each word in the song to obtain a synthesized user song.
In one possible implementation, the adjustment module 402 is configured to adjust the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song, according to the pitch frequency of each word in the song.
In one possible implementation, the adjustment module 402 handles each word in the song that has multiple pitch frequencies. For such a word, it adjusts the fundamental frequency of that word in the user speech according to the order and proportions of those pitch frequencies.
In one possible implementation, the extraction module 401 is configured to extract the fundamental frequency, envelope and consonant information of each word in the user speech using a feature extraction algorithm. A preset number of fundamental frequency values is extracted for each word, and the preset number is determined by the extraction frequency.
In one possible implementation, the adjustment module 402 is configured to adjust, for each word in the user speech, the preset number of fundamental frequency values of the word to the pitch frequency of that word in the song.
In one possible implementation, the adjustment module 402 is configured to adjust the duration of each word in the synthesized audio to the duration of the corresponding word in the song, according to the duration of each word in the song, to obtain the synthesized user song.
In this embodiment, the device works in three stages:
- it adjusts the fundamental frequency of each word spoken by the user according to the pitch frequency of each word in the song;
- it synthesizes the adjusted fundamental frequencies with the user's original envelope and consonant information into audio;
- it adjusts the duration of each spoken word according to the duration of each word in the song, yielding the user song.
Because the user song is synthesized from the user's own envelope and consonant information, the user's original timbre is retained and the song sounds closer to the user's voice.
It should be noted that the division into the functional modules above, when the device performs song synthesis, is only an example. In practical applications, the functions may be assigned to different functional modules as needed. That is, the device's internal structure may be divided into different modules to complete all or part of the functions described above. In addition, the song synthesis device and the song synthesis method embodiments share the same concept. For the specific implementation process, see the method embodiments, which are not repeated here.
Fig. 5 is the structural schematic diagram of a kind of electronic equipment 500 provided in an embodiment of the present invention.The electronic equipment 500 can be with Be: smart phone, tablet computer, MP3 player (Moving Picture Experts Group Audio Layer III, Dynamic image expert's compression standard audio level 3), MP4 (Moving Picture Experts Group Audio Layer IV, dynamic image expert's compression standard audio level 4) player, laptop or desktop computer.Electronic equipment 500 may be used also Other titles such as user equipment, portable electronic device, electronic equipment on knee, table type electronic equipment can be referred to as.
In general, electronic equipment 500 includes: processor 501 and memory 502.
Processor 501 may include one or more processing cores, such as 4 core processors, 8 core processors etc..Place Reason device 501 can use DSP (Digital Signal Processing, Digital Signal Processing), FPGA (Field- Programmable Gate Array, field programmable gate array), PLA (Programmable Logic Array, may be programmed Logic array) at least one of example, in hardware realize.Processor 501 also may include primary processor and coprocessor, master Processor is the processor for being handled data in the awake state, also referred to as CPU (Central Processing Unit, central processing unit);Coprocessor is the low power processor for being handled data in the standby state.? In some embodiments, processor 501 can be integrated with GPU (Graphics Processing Unit, image processor), GPU is used to be responsible for the rendering and drafting of content to be shown needed for display screen.In some embodiments, processor 501 can also be wrapped AI (Artificial Intelligence, artificial intelligence) processor is included, the AI processor is for handling related machine learning Calculating operation.
Memory 502 may include one or more computer readable storage mediums, which can To be non-transient.Memory 502 may also include high-speed random access memory and nonvolatile memory, such as one Or multiple disk storage equipments, flash memory device.In some embodiments, the non-transient computer in memory 502 can Storage medium is read for storing at least one instruction, at least one instruction performed by processor 501 for realizing this Shen Please in embodiment of the method provide song synthetic method.
In some embodiments, electronic equipment 500 is also optional includes: peripheral device interface 503 and at least one periphery Equipment.It can be connected by bus or signal wire between processor 501, memory 502 and peripheral device interface 503.It is each outer Peripheral equipment can be connected by bus, signal wire or circuit board with peripheral device interface 503.Specifically, peripheral equipment includes: to penetrate At least one of frequency circuit 504, display screen 505, camera 506, voicefrequency circuit 507, positioning component 508 and power supply 509.
Peripheral device interface 503 can be used for I/O (Input/Output, input/output) is relevant outside at least one Peripheral equipment is connected to processor 501 and memory 502.In some embodiments, processor 501, memory 502 and peripheral equipment Interface 503 is integrated on same chip or circuit board;In some other embodiments, processor 501, memory 502 and outer Any one or two in peripheral equipment interface 503 can realize on individual chip or circuit board, the present embodiment to this not It is limited.
Radio frequency circuit 504 receives and transmits RF (Radio Frequency) signals, also called electromagnetic signals. It communicates with communication networks and other communication devices through electromagnetic signals, converting electrical signals into electromagnetic signals for transmission, or converting received electromagnetic signals into electrical signals. Optionally, radio frequency circuit 504 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. Radio frequency circuit 504 may communicate with other electronic devices through at least one wireless communication protocol, including but not limited to metropolitan area networks, mobile communication networks of each generation (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks. In some embodiments, radio frequency circuit 504 may also include NFC (Near Field Communication)-related circuitry, which is not limited in this application.
Display screen 505 displays a UI (User Interface), which may include graphics, text, icons, video and any combination thereof. When display screen 505 is a touch display screen, it can also capture touch signals on or above its surface. Such a touch signal may be input to processor 501 as a control signal for processing. In this case, display screen 505 may also provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 505, arranged on the front panel of electronic device 500; in other embodiments, there are at least two display screens 505, arranged on different surfaces of electronic device 500 or in a folding design; in still other embodiments, display screen 505 may be a flexible display arranged on a curved or folded surface of electronic device 500. Display screen 505 may even be given a non-rectangular, irregular shape, that is, a special-shaped screen. Display screen 505 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
Camera assembly 506 captures images or video. Optionally, camera assembly 506 includes a front camera and a rear camera. Typically, the front camera is arranged on the front panel of the electronic device and the rear camera on its back. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera can be fused for background blurring, the main camera and the wide-angle camera can be fused for panoramic shooting and VR (Virtual Reality) shooting, or other fused shooting functions can be realized. In some embodiments, camera assembly 506 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.
Audio circuit 507 may include a microphone and a speaker. The microphone captures sound waves from the user and the environment, converts them into electrical signals, and inputs them to processor 501 for processing or to radio frequency circuit 504 for voice communication. For stereo capture or noise reduction, there may be multiple microphones arranged at different parts of electronic device 500. The microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from processor 501 or radio frequency circuit 504 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into inaudible sound waves for purposes such as ranging. In some embodiments, audio circuit 507 may also include a headphone jack.
Positioning component 508 determines the current geographic location of electronic device 500 for navigation or LBS (Location Based Service). Positioning component 508 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia or the Galileo system of the European Union.
Power supply 509 supplies power to the components of electronic device 500. Power supply 509 may be alternating current, direct current, a disposable battery or a rechargeable battery. When power supply 509 includes a rechargeable battery, the battery may support wired or wireless charging, and may also support fast-charging technology.
In some embodiments, electronic device 500 further includes one or more sensors 510, including but not limited to: acceleration sensor 511, gyroscope sensor 512, pressure sensor 513, fingerprint sensor 514, optical sensor 515 and proximity sensor 516.
Acceleration sensor 511 can detect the magnitude of acceleration along the three axes of a coordinate system established relative to electronic device 500. For example, it can detect the components of gravitational acceleration along the three axes. Based on the gravity signal captured by acceleration sensor 511, processor 501 can control touch display screen 505 to display the user interface in landscape or portrait view. Acceleration sensor 511 can also be used to collect motion data for games or for the user.
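As a rough illustration of choosing landscape or portrait view from the gravity components, here is a minimal Python sketch. The axis convention (x along the screen's short edge, y along its long edge, z out of the screen), the function name and the rule of keeping the current orientation when the device lies flat are assumptions for illustration, not details from this document; real platforms typically add angle thresholds and hysteresis.

```python
def choose_orientation(gx: float, gy: float, gz: float, current: str = "portrait") -> str:
    """Pick 'portrait' or 'landscape' from gravity components in m/s^2.

    Assumed axes (hypothetical): x = screen short edge, y = screen long edge,
    z = out of the screen.
    """
    # Lying flat: gravity is mostly along z, so the tilt gives no clear
    # answer; keep whatever orientation is currently shown.
    if abs(gz) > max(abs(gx), abs(gy)):
        return current
    # Gravity mainly along the long edge means the device is held upright.
    return "portrait" if abs(gy) >= abs(gx) else "landscape"
```

For example, `choose_orientation(9.6, 0.4, 1.0)` returns `"landscape"`, because gravity falls mostly along the short edge.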
Gyroscope sensor 512 can detect the body orientation and rotation angle of electronic device 500, and can cooperate with acceleration sensor 511 to capture the user's 3D motion of electronic device 500. Based on the data collected by gyroscope sensor 512, processor 501 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control and inertial navigation.
Pressure sensor 513 may be arranged on a side frame of electronic device 500 and/or beneath touch display screen 505. When pressure sensor 513 is arranged on the side frame, it can detect the user's grip on electronic device 500, and processor 501 performs left/right-hand recognition or shortcut operations based on the grip signal it captures. When pressure sensor 513 is arranged beneath touch display screen 505, processor 501 controls the operable controls on the UI according to the user's pressure operations on touch display screen 505. The operable controls include at least one of a button control, a scroll bar control, an icon control and a menu control.
Fingerprint sensor 514 captures the user's fingerprint, and the user's identity is recognized from the captured fingerprint either by processor 501 or by fingerprint sensor 514 itself. When the user's identity is recognized as trusted, processor 501 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments and changing settings. Fingerprint sensor 514 may be arranged on the front, back or side of electronic device 500. When electronic device 500 has a physical button or a manufacturer logo, fingerprint sensor 514 may be integrated with the physical button or the manufacturer logo.
Optical sensor 515 captures ambient light intensity. In one embodiment, processor 501 controls the display brightness of touch display screen 505 according to the ambient light intensity captured by optical sensor 515: when the ambient light is strong, the display brightness is raised; when it is weak, the display brightness is lowered. In another embodiment, processor 501 can also dynamically adjust the shooting parameters of camera assembly 506 according to the ambient light intensity captured by optical sensor 515.
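The brightness rule above only requires that display brightness rise with ambient light. A minimal Python sketch of one such monotonic mapping follows; the logarithmic curve, the 0 to 10,000 lux range and the 10 to 255 brightness levels are assumed values chosen for illustration, not values from this document.

```python
import math


def display_brightness(lux: float, min_level: int = 10, max_level: int = 255,
                       max_lux: float = 10_000.0) -> int:
    """Map ambient light (lux) to a display brightness level.

    Uses a logarithmic curve (an assumption) so that a change in dim light
    matters more than the same absolute change in bright light.
    """
    lux = min(max(lux, 0.0), max_lux)  # clamp to the supported range
    fraction = math.log1p(lux) / math.log1p(max_lux)  # 0.0 (dark) .. 1.0 (bright)
    return round(min_level + fraction * (max_level - min_level))
```

A logarithmic curve is one common choice because perceived brightness is roughly logarithmic in luminance; a lookup table or piecewise-linear curve would satisfy the same rule.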
Proximity sensor 516, also called a distance sensor, is usually arranged on the front panel of electronic device 500 and measures the distance between the user and the front of electronic device 500. In one embodiment, when proximity sensor 516 detects that this distance is gradually decreasing, processor 501 switches touch display screen 505 from the screen-on state to the screen-off state; when proximity sensor 516 detects that the distance is gradually increasing, processor 501 switches touch display screen 505 from the screen-off state back to the screen-on state.
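The screen on/off behaviour above depends on the trend of the measured distance rather than on a single reading. A minimal Python sketch, assuming distance readings in centimetres and a hypothetical 0.5 cm dead band so that sensor noise does not toggle the screen, could look like this:

```python
from typing import Iterable, List, Optional


def screen_states(distances_cm: Iterable[float], current: str = "on",
                  dead_band_cm: float = 0.5) -> List[str]:
    """Return the screen state ('on'/'off') after each proximity reading.

    A distance that has fallen by at least the dead band (user approaching,
    e.g. phone raised to the ear) turns the screen off; one that has risen
    by at least the dead band turns it back on.
    """
    states: List[str] = []
    reference: Optional[float] = None  # distance at the last registered change
    for d in distances_cm:
        if reference is None:
            reference = d
        elif d <= reference - dead_band_cm:
            current, reference = "off", d
        elif d >= reference + dead_band_cm:
            current, reference = "on", d
        states.append(current)
    return states
```

Comparing against the distance at the last registered change, rather than the previous sample, lets a slow approach accumulate until it crosses the dead band.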
Those skilled in the art will understand that the structure shown in Fig. 5 does not limit electronic device 500, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
In an exemplary embodiment, a computer-readable storage medium storing a computer program is also provided, for example a memory storing a computer program; when executed by a processor, the computer program implements the song synthesis method of the above embodiments. For example, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), magnetic tape, a floppy disk, an optical data storage device, and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (14)

1. A song synthesis method, characterized in that the method comprises:
when user speech is acquired, extracting the fundamental frequency, envelope and consonant information of each word in the user speech;
adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song, where the pitch frequency of each word in the song is the frequency corresponding to that word's pitch;
performing synthesis on the adjusted fundamental frequency and the envelope and consonant information of each word in the user speech to obtain synthesized audio;
adjusting the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain a synthesized user song.
2. The method according to claim 1, characterized in that adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song comprises:
adjusting, according to the pitch frequency of each word in the song, the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song.
3. The method according to claim 2, characterized in that adjusting, according to the pitch frequency of each word in the song, the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song comprises:
for each word in the song, when the word has multiple pitch frequencies, adjusting the fundamental frequency of that word in the user speech according to the order and proportions of the multiple pitch frequencies.
4. The method according to claim 1, characterized in that extracting the fundamental frequency, envelope and consonant information of each word in the user speech comprises:
extracting, by a feature extraction algorithm, the fundamental frequency, envelope and consonant information of each word in the user speech, where a preset number of fundamental frequency values is extracted for each word and the preset number is determined according to the extraction frequency.
5. The method according to claim 4, characterized in that adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song comprises:
for each word in the user speech, adjusting the preset number of fundamental frequency values of the word to the pitch frequency of that word in the song.
6. The method according to claim 1, characterized in that adjusting the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain the synthesized user song, comprises:
adjusting, according to the duration of each word in the song, the duration of each word in the synthesized audio to the duration of the corresponding word in the song, to obtain the synthesized user song.
7. A song synthesis device, characterized in that the device comprises:
an extraction module, configured to, when user speech is acquired, extract the fundamental frequency, envelope and consonant information of each word in the user speech;
an adjustment module, configured to adjust the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song, where the pitch frequency of each word in the song is the frequency corresponding to that word's pitch;
a synthesis module, configured to perform synthesis on the adjusted fundamental frequency and the envelope and consonant information of each word in the user speech to obtain synthesized audio;
the adjustment module being further configured to adjust the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain a synthesized user song.
8. The device according to claim 7, characterized in that the adjustment module is configured to adjust, according to the pitch frequency of each word in the song, the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song.
9. The device according to claim 8, characterized in that the adjustment module is configured to, for each word in the song, when the word has multiple pitch frequencies, adjust the fundamental frequency of that word in the user speech according to the order and proportions of the multiple pitch frequencies.
10. The device according to claim 7, characterized in that the extraction module is configured to extract, by a feature extraction algorithm, the fundamental frequency, envelope and consonant information of each word in the user speech, where a preset number of fundamental frequency values is extracted for each word and the preset number is determined according to the extraction frequency.
11. The device according to claim 10, characterized in that the adjustment module is configured to, for each word in the user speech, adjust the preset number of fundamental frequency values of the word to the pitch frequency of that word in the song.
12. The device according to claim 7, characterized in that the adjustment module is configured to adjust, according to the duration of each word in the song, the duration of each word in the synthesized audio to the duration of the corresponding word in the song, to obtain the synthesized user song.
13. An electronic device, characterized by comprising a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory to implement the method steps of any one of claims 1-6.
14. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and when executed by a processor the computer program implements the method steps of any one of claims 1-6.
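The pitch and duration steps of claims 1 to 6 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes song pitches are given as MIDI note numbers converted with 12-tone equal temperament (A4 = 440 Hz), that F0 is extracted at a fixed frame period (a hypothetical 5 ms) with unvoiced frames marked as 0, and it reads "preset number of fundamental frequency values" as the word's duration divided by that period. All function names are hypothetical. Feature extraction and resynthesis are omitted, and the duration step stretches per-frame parameters before resynthesis rather than the synthesized audio that claim 6 adjusts; stretching the audio itself would need a pitch-preserving time-scale modification (for example WSOLA or a phase vocoder), since plain resampling would also shift the pitch.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

FRAME_PERIOD_S = 0.005  # assumed F0 extraction period (200 values per second)


def note_to_hz(midi_note: float) -> float:
    """Pitch frequency of a note in 12-tone equal temperament, A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def frames_per_word(duration_s: float, frame_period_s: float = FRAME_PERIOD_S) -> int:
    """Number of F0 values covering a word at the assumed extraction period."""
    return max(1, round(duration_s / frame_period_s))


def retune_word_f0(f0: Sequence[float], pitches_hz: Sequence[float],
                   proportions: Optional[Sequence[float]] = None) -> List[float]:
    """Replace one word's F0 values with the song's pitch frequencies.

    With one pitch, every voiced frame gets that pitch. With several, the
    frames are assigned to the pitches in order, split according to
    `proportions` (equal shares if omitted). Unvoiced frames (F0 == 0) stay 0.
    """
    if proportions is None:
        proportions = [1.0] * len(pitches_hz)
    if not pitches_hz or len(proportions) != len(pitches_hz):
        raise ValueError("need at least one pitch and one proportion per pitch")
    n, total = len(f0), float(sum(proportions))
    # Frame index at which each pitch segment ends.
    ends, running = [], 0.0
    for p in proportions:
        running += p
        ends.append(round(running / total * n))
    ends[-1] = n  # floating-point rounding must not drop the last frames
    target, start = [0.0] * n, 0
    for hz, end in zip(pitches_hz, ends):
        for i in range(start, end):
            target[i] = float(hz)
        start = end
    return [t if v > 0 else 0.0 for v, t in zip(f0, target)]


def resample_frames(frames: Sequence[T], target_len: int) -> List[T]:
    """Stretch or compress a per-frame track to target_len frames.

    Nearest-neighbour picking keeps voiced/unvoiced boundaries and constant
    note pitches intact, and works for F0 values or whole envelope frames.
    """
    n = len(frames)
    if n == 0 or target_len < 1:
        raise ValueError("need at least one source frame and one target frame")
    if target_len == 1:
        return [frames[0]]
    step = (n - 1) / (target_len - 1)
    return [frames[int(j * step + 0.5)] for j in range(target_len)]


if __name__ == "__main__":
    # Hypothetical data: one spoken word with six F0 values, 0 = unvoiced.
    spoken_f0 = [100.0, 110.0, 0.0, 120.0, 130.0, 125.0]
    # The song sings this word on A3 then B3, with B3 held twice as long.
    sung_f0 = retune_word_f0(spoken_f0, [note_to_hz(57), note_to_hz(59)], [1, 2])
    # The song holds the word for 50 ms, i.e. 10 frames at the assumed period.
    sung_f0 = resample_frames(sung_f0, frames_per_word(0.05))
    print(sung_f0)
```

The envelope and consonant information of the word would be passed through `resample_frames` with the same target length so that all parameter tracks stay aligned before resynthesis.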
CN201811056146.2A 2018-09-11 2018-09-11 Singing voice synthesis method and device Active CN109147757B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811056146.2A CN109147757B (en) 2018-09-11 2018-09-11 Singing voice synthesis method and device

Publications (2)

Publication Number Publication Date
CN109147757A (en) 2019-01-04
CN109147757B CN109147757B (en) 2021-07-02

Family

ID=64824403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811056146.2A Active CN109147757B (en) 2018-09-11 2018-09-11 Singing voice synthesis method and device

Country Status (1)

Country Link
CN (1) CN109147757B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1533120A * 2003-03-21 2004-09-29 Audio frequency device
CN1581290A (en) * 2003-08-06 2005-02-16 雅马哈株式会社 Singing voice synthesizing method
CN1703735A (en) * 2002-07-29 2005-11-30 埃森图斯有限责任公司 System and method for musical sonification of data
US6992245B2 (en) * 2002-02-27 2006-01-31 Yamaha Corporation Singing voice synthesizing method
CN101727902A (en) * 2008-10-29 2010-06-09 中国科学院自动化研究所 Method for estimating tone
CN104464725A (en) * 2014-12-30 2015-03-25 福建星网视易信息系统有限公司 Method and device for singing imitation
CN106898340A (en) * 2017-03-30 2017-06-27 腾讯音乐娱乐(深圳)有限公司 The synthetic method and terminal of a kind of song
CN106971703A (en) * 2017-03-17 2017-07-21 西北师范大学 A kind of song synthetic method and device based on HMM

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110148394A (en) * 2019-04-26 2019-08-20 平安科技(深圳)有限公司 Song synthetic method, device, computer equipment and storage medium
CN110148394B (en) * 2019-04-26 2024-03-01 平安科技(深圳)有限公司 Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium
CN112417201A (en) * 2019-08-22 2021-02-26 北京峰趣互联网信息服务有限公司 Audio information pushing method and system, electronic equipment and computer readable medium
CN110600034A (en) * 2019-09-12 2019-12-20 广州酷狗计算机科技有限公司 Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium
CN110600034B (en) * 2019-09-12 2021-12-03 广州酷狗计算机科技有限公司 Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium
CN112951198A (en) * 2019-11-22 2021-06-11 微软技术许可有限责任公司 Singing voice synthesis
CN111091807A (en) * 2019-12-26 2020-05-01 广州酷狗计算机科技有限公司 Speech synthesis method, speech synthesis device, computer equipment and storage medium
CN111402842A (en) * 2020-03-20 2020-07-10 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for generating audio
CN111402842B (en) * 2020-03-20 2021-11-19 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for generating audio
CN111681637A (en) * 2020-04-28 2020-09-18 平安科技(深圳)有限公司 Song synthesis method, device, equipment and storage medium
CN111681637B (en) * 2020-04-28 2024-03-22 平安科技(深圳)有限公司 Song synthesis method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN109147757B (en) 2021-07-02

Similar Documents

Publication Publication Date Title
CN109147757A (en) Song synthetic method and device
CN108538302B (en) Method and apparatus for synthesizing audio
CN108401124A (en) The method and apparatus of video record
CN109033335A (en) Audio recording method, apparatus, terminal and storage medium
CN108008930A (en) The method and apparatus for determining K song score values
CN108965757A (en) video recording method, device, terminal and storage medium
CN110956971B (en) Audio processing method, device, terminal and storage medium
CN109192218B (en) Method and apparatus for audio processing
CN110491358A (en) Carry out method, apparatus, equipment, system and the storage medium of audio recording
CN110688082B (en) Method, device, equipment and storage medium for determining adjustment proportion information of volume
EP3618055B1 (en) Audio mixing method and terminal, and storage medium
CN109994127A (en) Audio-frequency detection, device, electronic equipment and storage medium
CN109635133A (en) Visualize audio frequency playing method, device, electronic equipment and storage medium
CN108320756A (en) It is a kind of detection audio whether be absolute music audio method and apparatus
CN109065068B (en) Audio processing method, device and storage medium
CN108922562A (en) Sing evaluation result display methods and device
CN107958672A (en) The method and apparatus for obtaining pitch waveform data
CN109192223A (en) The method and apparatus of audio alignment
CN111081277B (en) Audio evaluation method, device, equipment and storage medium
CN110867194B (en) Audio scoring method, device, equipment and storage medium
CN109003621A (en) A kind of audio-frequency processing method, device and storage medium
CN109243479A (en) Acoustic signal processing method, device, electronic equipment and storage medium
CN109147809A (en) Acoustic signal processing method, device, terminal and storage medium
CN109273008A (en) Processing method, device, computer storage medium and the terminal of voice document
CN112086102B (en) Method, apparatus, device and storage medium for expanding audio frequency band

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant