CN109147757A - Song synthesis method and device - Google Patents
- Publication number: CN109147757A
- Application number: CN201811056146.2A
- Authority: CN (China)
- Prior art keywords: word, song, user, adjusted, fundamental frequency
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L13/02: Methods for producing synthetic speech; Speech synthesisers (G: PHYSICS > G10: MUSICAL INSTRUMENTS; ACOUSTICS > G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING > G10L13/00: Speech synthesis; Text to speech systems)
- G10L13/033: Voice editing, e.g. manipulating the voice of the synthesiser (under G10L13/02)
Abstract
The invention discloses a song synthesis method and device, belonging to the field of speech synthesis technology. The method includes: when user speech is obtained, extracting the fundamental frequency, envelope, and consonant information of each word in the user speech; adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song, where the pitch frequency of each word in the song is the frequency corresponding to that word's pitch in the song; synthesizing the adjusted fundamental frequency with the envelope and consonant information of each word in the user speech to obtain synthesized audio; and adjusting the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain a synthesized user song. Because the present invention synthesizes the user song from the user's own envelope and auxiliary information, it retains the user's original timbre, so the synthesized user song sounds closer to the user's voice.
Description
Technical field
The present invention relates to the field of speech synthesis technology, and in particular to a song synthesis method and device.
Background art
With the development of speech synthesis technology, it is gradually being applied in daily life. For example, some users sing out of tune but would like to read the lyrics aloud and then generate their own song; this can be achieved with speech synthesis technology.
Currently, the related art generally first recognizes the user's speech, finds the corresponding built-in song in a speech synthesis database, extracts the timbre of that song, and then uses a pre-established conversion model to convert the song's timbre into the user's timbre, obtaining the synthesized user song. The conversion model converts the timbre of a built-in song in the speech synthesis database into the user's timbre.
Because the above technique synthesizes the user song from a timbre built into the speech synthesis database, it cannot retain the user's original timbre, and the synthesized user song sounds different from the user's voice.
Summary of the invention
Embodiments of the present invention provide a song synthesis method and device, which can solve the problem in the related art that the synthesized user song differs considerably from the user's voice. The technical solution is as follows:
In a first aspect, a song synthesis method is provided, comprising:
when user speech is obtained, extracting the fundamental frequency, envelope, and consonant information of each word in the user speech;
adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song, where the pitch frequency of each word in the song is the frequency corresponding to the pitch of that word in the song;
synthesizing the adjusted fundamental frequency with the envelope and consonant information of each word in the user speech to obtain synthesized audio; and
adjusting the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain a synthesized user song.
In a possible implementation, adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song comprises:
adjusting, according to the pitch frequency of each word in the song, the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song.
In a possible implementation, adjusting the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song comprises:
for each word in the song, when the word has multiple pitch frequencies, adjusting the fundamental frequency of that word in the user speech according to the order and proportions of the multiple pitch frequencies.
In a possible implementation, extracting the fundamental frequency, envelope, and consonant information of each word in the user speech comprises:
extracting, by a feature extraction algorithm, the fundamental frequency, envelope, and consonant information of each word in the user speech, where a preset number of fundamental frequency values is extracted for each word and the preset number is determined by the extraction frequency.
In a possible implementation, adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song comprises:
for each word in the user speech, adjusting the preset number of fundamental frequency values of the word to the pitch frequency of that word in the song.
In a possible implementation, adjusting the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain the synthesized user song, comprises:
adjusting, according to the duration of each word in the song, the duration of each word in the synthesized audio to the duration of the corresponding word in the song, to obtain the synthesized user song.
In a second aspect, a song synthesis device is provided, comprising:
an extraction module, configured to extract, when user speech is obtained, the fundamental frequency, envelope, and consonant information of each word in the user speech;
an adjustment module, configured to adjust the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song, where the pitch frequency of each word in the song is the frequency corresponding to the pitch of that word in the song; and
a synthesis module, configured to synthesize the adjusted fundamental frequency with the envelope and consonant information of each word in the user speech to obtain synthesized audio;
wherein the adjustment module is further configured to adjust the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain a synthesized user song.
In a possible implementation, the adjustment module is configured to adjust, according to the pitch frequency of each word in the song, the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song.
In a possible implementation, the adjustment module is configured to, for each word in the song, when the word has multiple pitch frequencies, adjust the fundamental frequency of that word in the user speech according to the order and proportions of the multiple pitch frequencies.
In a possible implementation, the extraction module is configured to extract, by a feature extraction algorithm, the fundamental frequency, envelope, and consonant information of each word in the user speech, where a preset number of fundamental frequency values is extracted for each word and the preset number is determined by the extraction frequency.
In a possible implementation, the adjustment module is configured to, for each word in the user speech, adjust the preset number of fundamental frequency values of the word to the pitch frequency of that word in the song.
In a possible implementation, the adjustment module is configured to adjust, according to the duration of each word in the song, the duration of each word in the synthesized audio to the duration of the corresponding word in the song, to obtain the synthesized user song.
In a third aspect, a computer device is provided, including a processor and a memory. The memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory to implement the method steps of any implementation of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored; when the computer program is executed by a processor, the method steps of any implementation of the first aspect are implemented.
The beneficial effects of the technical solutions provided by the embodiments of the present invention include at least the following:
The fundamental frequency of each word spoken by the user is adjusted according to the pitch frequency of each word in the song; the adjusted fundamental frequency is then synthesized with the user's original envelope and auxiliary information into audio, and the duration of each spoken word is adjusted according to the duration of each word in the song, to synthesize the user song. Because this solution synthesizes the user song from the user's original envelope and auxiliary information, it retains the user's original timbre, so the synthesized user song sounds closer to the user's voice.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a song synthesis method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a song synthesis method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the correspondence between pitch and frequency according to an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of a song synthesis device according to an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of an electronic device 500 according to an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
Fig. 1 is a flowchart of a song synthesis method according to an embodiment of the present invention. Referring to Fig. 1, the method comprises:
101. When user speech is obtained, extract the fundamental frequency, envelope, and consonant information of each word in the user speech.
102. Adjust the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song, where the pitch frequency of each word in the song is the frequency corresponding to the pitch of that word in the song.
103. Synthesize the adjusted fundamental frequency with the envelope and consonant information of each word in the user speech to obtain synthesized audio.
104. Adjust the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain a synthesized user song.
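As a rough illustration of how steps 101 to 104 fit together, the following Python sketch wires them into one function. It is a minimal sketch, not the patented implementation: `SongWord`, `synthesize_user_song`, and the injected callables `extract`, `vocode`, and `stretch` are hypothetical names, and it assumes the spoken words are already aligned one-to-one with the song's words.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SongWord:
    pitch_hz: float    # pitch frequency of the word in the song
    duration_s: float  # duration of the word in the song


def synthesize_user_song(
    speech,
    song_words: Sequence[SongWord],
    extract: Callable,   # hypothetical step 101: speech -> (per-word F0 lists, envelopes, consonant info)
    vocode: Callable,    # hypothetical step 103: (F0, envelopes, consonant info) -> per-word audio segments
    stretch: Callable,   # hypothetical step 104 helper: (segment, length ratio) -> stretched segment
    fs: int = 16000,
) -> List[float]:
    f0_words, envelopes, consonants = extract(speech)
    if len(f0_words) != len(song_words):
        raise ValueError("spoken words must align one-to-one with song words")
    # Step 102: every F0 value of a word becomes that word's pitch frequency in the song.
    adjusted = [[w.pitch_hz] * len(f0) for f0, w in zip(f0_words, song_words)]
    # Step 103: recombine the adjusted F0 with the user's own envelope and consonant information.
    segments = vocode(adjusted, envelopes, consonants)
    # Step 104: stretch each word's audio to the song's duration for that word.
    song: List[float] = []
    for seg, w in zip(segments, song_words):
        ratio = w.duration_s * fs / len(seg)  # target length / current length
        song.extend(stretch(seg, ratio))
    return song
```

Injecting the three steps as callables keeps the orchestration testable with stubs and lets a real vocoder or time-stretcher be swapped in later.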
In the method provided by this embodiment of the present invention, the fundamental frequency of each word spoken by the user is adjusted according to the pitch frequency of each word in the song; the adjusted fundamental frequency is then synthesized with the user's original envelope and auxiliary information into audio, and the duration of each spoken word is adjusted according to the duration of each word in the song, to synthesize the user song. Because this solution synthesizes the user song from the user's original envelope and auxiliary information, it retains the user's original timbre, so the synthesized user song sounds closer to the user's voice.
In a possible implementation, adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song comprises:
adjusting, according to the pitch frequency of each word in the song, the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song.
In a possible implementation, adjusting the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song comprises:
for each word in the song, when the word has multiple pitch frequencies, adjusting the fundamental frequency of that word in the user speech according to the order and proportions of the multiple pitch frequencies.
In a possible implementation, extracting the fundamental frequency, envelope, and consonant information of each word in the user speech comprises:
extracting, by a feature extraction algorithm, the fundamental frequency, envelope, and consonant information of each word in the user speech, where a preset number of fundamental frequency values is extracted for each word and the preset number is determined by the extraction frequency.
In a possible implementation, adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song comprises:
for each word in the user speech, adjusting the preset number of fundamental frequency values of the word to the pitch frequency of that word in the song.
In a possible implementation, adjusting the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain the synthesized user song, comprises:
adjusting, according to the duration of each word in the song, the duration of each word in the synthesized audio to the duration of the corresponding word in the song, to obtain the synthesized user song.
All of the above optional technical solutions may be combined in any manner to form optional embodiments of the present invention, which are not described one by one here.
Fig. 2 is a flowchart of a song synthesis method according to an embodiment of the present invention. The method is performed by an electronic device. Referring to Fig. 2, the method comprises:
201. When user speech is obtained, extract the fundamental frequency, envelope, and consonant information of each word in the user speech.
In this embodiment of the present invention, the user can input user speech into the electronic device. For example, a designated application with a song synthesis function may be installed on the electronic device. When the user wants to synthesize his or her own song, the user can, through a corresponding operation, trigger the electronic device to display the voice input interface of the designated application. While the voice input interface is displayed, the user can speak to the electronic device, for example by reading aloud the lyrics of a song, so that the electronic device collects the user speech.
Further, the electronic device can extract features of the user speech, including the fundamental frequency, envelope, and consonant information of each word in the user speech. In a possible implementation, the electronic device can extract these features by a feature extraction algorithm, where a preset number of fundamental frequency values is extracted for each word and the preset number is determined by the extraction frequency.
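The relationship between the extraction frequency and the preset number can be made concrete with a little arithmetic: a word lasting d seconds, analysed at r frames per second, yields about d × r fundamental frequency values. The sketch below assumes a 5 ms frame period (200 values per second), a common default for WORLD-style analysis but not a value stated in this document; `word_f0_counts` and the word-boundary list are hypothetical.

```python
from typing import List, Sequence, Tuple


def word_f0_counts(word_spans_s: Sequence[Tuple[float, float]],
                   frame_period_ms: float = 5.0) -> List[int]:
    """Number of F0 values per word, given (start, end) times in seconds."""
    rate_hz = 1000.0 / frame_period_ms  # extraction frequency in frames per second
    return [int(round((end - start) * rate_hz)) for start, end in word_spans_s]
```

For example, a 0.4 s word followed by a 0.25 s word gives `[80, 50]` at the assumed 5 ms frame period; halving the frame period would double both counts.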
For example, the electronic device can perform feature extraction on the user speech using the feature extraction algorithms included in the WORLD tool. These may include a fundamental frequency extraction algorithm, an envelope extraction algorithm, and a consonant extraction algorithm, and running each algorithm on the user speech yields the corresponding feature: the fundamental frequency extraction algorithm yields the fundamental frequency information of the user speech, the envelope extraction algorithm yields its envelope information, and the consonant extraction algorithm yields its consonant information.
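To show what a fundamental frequency extraction algorithm does for a single analysis frame, here is a deliberately simple autocorrelation estimator. This is a swapped-in textbook technique, not WORLD's own F0 estimators (such as DIO or Harvest), which are considerably more robust. `estimate_f0_autocorr` is a hypothetical name, and the 60 to 500 Hz search range is an assumption.

```python
import math
from typing import Sequence


def estimate_f0_autocorr(frame: Sequence[float], fs: int,
                         f0_min: float = 60.0, f0_max: float = 500.0) -> float:
    """Return an F0 estimate in Hz for one frame, or 0.0 for a silent frame."""
    if sum(v * v for v in frame) == 0.0:
        return 0.0  # no energy: treat as unvoiced
    min_lag = max(1, int(fs / f0_max))             # shortest period considered
    max_lag = min(len(frame) - 1, int(fs / f0_min))  # longest period considered
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: peaks when the lag matches the period.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag
```

A 1024-sample frame of a 200 Hz sine at 16 kHz has its strongest peak at a lag of 80 samples, so the estimate is about 200 Hz. The sketch has no voicing threshold, so noisy unvoiced frames would still return some frequency.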
202. According to the pitch frequency of each word in the song, adjust the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song.
The pitch frequency of each word in the song is the frequency corresponding to the pitch of that word in the song. Referring to Fig. 3, which is a schematic diagram of the correspondence between pitch and frequency according to an embodiment of the present invention, the pitch of each word in the song can be converted into a corresponding frequency according to the rules of twelve-tone equal temperament; for example, the pitches in the first column of Fig. 3 correspond to the frequencies in the fourth column. The electronic device can convert the pitch of each word in the song into the corresponding frequency according to this correspondence, thereby obtaining the pitch frequency of each word.
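In twelve-tone equal temperament, each semitone multiplies the frequency by 2^(1/12). The table in Fig. 3 is not reproduced here, so the sketch below assumes standard A4 = 440 Hz tuning, scientific pitch notation, and MIDI note numbering (A4 = 69), giving f = 440 × 2^((m − 69)/12). The helper names are hypothetical.

```python
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name: str) -> int:
    """Parse scientific pitch notation, e.g. 'A4' -> 69, 'C#5' -> 73, 'Bb3' -> 58."""
    letter, rest = name[0].upper(), name[1:]
    accidental = 0
    if rest and rest[0] in "#b":
        accidental = 1 if rest[0] == "#" else -1  # sharp raises, flat lowers a semitone
        rest = rest[1:]
    octave = int(rest)
    return 12 * (octave + 1) + NOTE_OFFSETS[letter] + accidental


def midi_to_frequency(midi: int, a4_hz: float = 440.0) -> float:
    """Twelve-tone equal temperament relative to A4."""
    return a4_hz * 2 ** ((midi - 69) / 12)


def note_to_frequency(name: str, a4_hz: float = 440.0) -> float:
    return midi_to_frequency(note_to_midi(name), a4_hz)
```

With these assumptions, C4 maps to roughly 261.63 Hz and A5 to exactly 880 Hz, one octave (12 semitones) above A4.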
In this embodiment of the present invention, for each word in the user speech, the electronic device can adjust the fundamental frequency of the word to the pitch frequency of the corresponding word in the song. For each word in the user speech, the corresponding word in the song is the same word. Of course, the corresponding word may instead merely have the same pronunciation as that word; this is not limited in the embodiments of the present invention.
Where a preset number of fundamental frequency values is extracted for each word in step 201, the electronic device can, for each word in the user speech, adjust the preset number of fundamental frequency values of the word to the pitch frequency of the corresponding word in the song.
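A minimal sketch of this per-word adjustment: each word's list of F0 values is replaced by the song's pitch frequency for that word. The `keep_unvoiced` option is an assumption added here, not something the document specifies: frames with F0 = 0 (a common unvoiced marker in WORLD-style analysis) are left at zero so consonants are not forced into voicing. `adjust_word_f0` is a hypothetical name.

```python
from typing import List, Sequence


def adjust_word_f0(f0_words: Sequence[Sequence[float]],
                   pitch_hz: Sequence[float],
                   keep_unvoiced: bool = True) -> List[List[float]]:
    """Set every F0 value of each spoken word to that word's pitch frequency in the song."""
    if len(f0_words) != len(pitch_hz):
        raise ValueError("each spoken word needs exactly one matching song word")
    adjusted = []
    for frames, target in zip(f0_words, pitch_hz):
        adjusted.append([
            0.0 if (keep_unvoiced and f0 <= 0.0) else target  # optionally keep unvoiced frames
            for f0 in frames
        ])
    return adjusted
```

Because the pitch frequency is applied to every frame, the speech's natural intonation within a word is flattened to a single sung note.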
In a possible implementation, for each word in the song, when the word has multiple pitch frequencies, the fundamental frequency of that word in the user speech is adjusted according to the order and proportions of the multiple pitch frequencies. Take word A in the song as an example, and assume word A has three frequencies, frequency 1, frequency 2, and frequency 3, in the order frequency 1 -> frequency 2 -> frequency 3, where frequency 1 comes first and accounts for 50%, frequency 2 is in the middle and accounts for 30%, and frequency 3 comes last and accounts for 20%. The electronic device can then adjust the first 50% of the fundamental frequency values of word A in the user speech to frequency 1, the middle 30% to frequency 2, and the last 20% to frequency 3.
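The word A example can be sketched by splitting a word's F0 frames into consecutive runs whose lengths follow the given proportions. Rounding cumulative boundaries (rather than each share separately) guarantees the runs always add up to the word's frame count. The function names are hypothetical, and only the number of frames in `frames` is used.

```python
from typing import List, Sequence


def split_counts(n: int, ratios: Sequence[float]) -> List[int]:
    """Split n frames into consecutive runs proportional to ratios (sum is always n)."""
    total = float(sum(ratios))
    bounds, acc = [], 0.0
    for r in ratios:
        acc += r
        bounds.append(int(round(n * acc / total)))  # cumulative boundary
    bounds[-1] = n  # absorb any rounding drift in the last run
    counts, prev = [], 0
    for b in bounds:
        counts.append(b - prev)
        prev = b
    return counts


def adjust_multi_pitch(frames: Sequence[float], pitches_hz: Sequence[float],
                       ratios: Sequence[float]) -> List[float]:
    """Assign each pitch, in order, to its share of the word's F0 frames."""
    out: List[float] = []
    for hz, count in zip(pitches_hz, split_counts(len(frames), ratios)):
        out.extend([hz] * count)
    return out
```

For an 80-frame word with proportions 50% / 30% / 20%, this gives runs of 40, 24, and 16 frames.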
It should be noted that step 202 is one possible implementation of adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song. Since the fundamental frequency determines the pitch, adjusting the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song makes the pitch of each word in the user speech the same as the pitch of the corresponding word in the song.
203. Synthesize the adjusted fundamental frequency with the envelope and consonant information of each word in the user speech to obtain synthesized audio.
In this embodiment of the present invention, after adjusting the fundamental frequency of each word in the user speech, the electronic device can synthesize the adjusted fundamental frequency with the user's original envelope and consonant information into audio.
In a possible implementation, the electronic device can perform the synthesis step using the WORLD tool. For example, the WORLD tool may include a speech synthesis algorithm; correspondingly, the electronic device can use this algorithm to synthesize the adjusted fundamental frequency with the envelope and consonant information of the words in the user speech into audio. Because the audio is synthesized from the user's original envelope and auxiliary information, and the envelope determines the user's timbre, this synthesis approach retains the user's original timbre.
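Why the envelope carries timbre while F0 carries pitch can be seen in a toy harmonic synthesizer: each frame places harmonics at multiples of the adjusted F0, and each harmonic's amplitude is read from the user's spectral envelope at that frequency. This is only an illustration, not WORLD's synthesis algorithm. It assumes the envelope is given as linear amplitudes on evenly spaced bins from 0 Hz to the Nyquist frequency, and it omits the consonant (aperiodic) component entirely, emitting silence for frames with F0 = 0. `synthesize_harmonic` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence


def synthesize_harmonic(f0_frames: Sequence[float],
                        envelope_frames: Sequence[Sequence[float]],
                        fs: int = 16000, hop: int = 80) -> List[float]:
    """Toy source-filter synthesis: harmonics of F0 weighted by the spectral envelope."""
    n_bins = len(envelope_frames[0])
    nyquist = fs / 2
    phases: Dict[int, float] = {}  # harmonic index -> running phase, for continuity across frames
    out: List[float] = []
    for f0, env in zip(f0_frames, envelope_frames):
        if f0 <= 0:
            out.extend([0.0] * hop)  # unvoiced frame: consonant component omitted in this sketch
            phases.clear()
            continue
        amps = []
        for k in range(1, int(nyquist // f0) + 1):
            b = round(k * f0 / nyquist * (n_bins - 1))  # envelope bin nearest to harmonic k
            amps.append(env[min(b, n_bins - 1)])
        for _ in range(hop):
            s = 0.0
            for k, a in enumerate(amps, start=1):
                ph = phases.get(k, 0.0)
                s += a * math.sin(ph)
                phases[k] = (ph + 2 * math.pi * k * f0 / fs) % (2 * math.pi)
            out.append(s)
    return out
```

Changing `f0_frames` moves the harmonics (the pitch) while the same `envelope_frames` keep the same spectral shape, which is the property the method relies on to preserve the user's timbre.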
204. According to the duration of each word in the song, adjust the duration of each word in the synthesized audio to the duration of the corresponding word in the song, to obtain the synthesized user song.
In this embodiment of the present invention, the synthesized audio obtained in step 203 only matches the song in the pitch of each word. To obtain the user song, the electronic device can, for each word in the synthesized audio, adjust the duration of the word to the duration of the corresponding word in the song using a time-stretching algorithm, such as the SoundTouch time-stretching algorithm. After the electronic device adjusts the duration of each word in the synthesized audio, the synthesized user song is obtained; that is, the user's speaking voice has been turned into a song.
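Time stretching changes duration without changing pitch. For one word, the stretch ratio is the target duration divided by the current duration (a tempo-style setting would be the inverse). The sketch below uses plain overlap-add (OLA) with a Hann window as a stand-in: windowed frames are read from the input at an analysis hop of `synth_hop / ratio` and written at a fixed synthesis hop. SoundTouch, by contrast, additionally searches for the best-matching overlap position (a WSOLA-style approach), which avoids the phase-cancellation artifacts plain OLA can produce. `ola_time_stretch` and its frame and hop sizes are assumptions.

```python
import math
from typing import List, Sequence


def ola_time_stretch(x: Sequence[float], ratio: float,
                     frame: int = 512, synth_hop: int = 128) -> List[float]:
    """Stretch x to about ratio * len(x) samples with plain windowed overlap-add."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    analysis_hop = synth_hop / ratio
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]  # periodic Hann
    out_len = int(round(len(x) * ratio))
    out = [0.0] * (out_len + frame)
    norm = [0.0] * (out_len + frame)  # running sum of window weights, for normalization
    m = 0
    while m * synth_hop < out_len:
        a = int(round(m * analysis_hop))  # read position in the input
        s = m * synth_hop                 # write position in the output
        for n in range(frame):
            src = x[a + n] if a + n < len(x) else 0.0
            out[s + n] += window[n] * src
            norm[s + n] += window[n]
        m += 1
    return [out[i] / norm[i] if norm[i] > 1e-8 else 0.0 for i in range(out_len)]
```

For example, if a word lasts 0.4 s in the synthesized audio and 0.6 s in the song, the ratio is 1.5; with a ratio of 1.0, the function simply reconstructs its input.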
It should be noted that step 204 is one possible implementation of adjusting the duration of each word in the synthesized audio according to the duration of each word in the song to obtain the synthesized user song. Adjusting the duration of each word in the synthesized audio to the duration of the corresponding word in the song turns the synthesized audio into the user song. Because the user song is synthesized from the user's original envelope and auxiliary information, the user's original timbre is retained, and the synthesized user song sounds closer to the user's voice.
The above technical solution first extracts the fundamental frequency, envelope, and consonant information of each word spoken by the user; then, according to the pitch frequency of each word in the song, changes the frequency of each spoken word to the pitch frequency of the corresponding word in the song; next synthesizes the result with the user's original consonant information and envelope; and finally time-stretches each spoken word according to the duration of each word in the song. Because the envelope and consonant information used in this solution are the user's own, the timbre is closer to the user's voice and the synthesized song sounds more natural.
In the method provided by this embodiment of the present invention, the fundamental frequency of each word spoken by the user is adjusted according to the pitch frequency of each word in the song; the adjusted fundamental frequency is then synthesized with the user's original envelope and auxiliary information into audio, and the duration of each spoken word is adjusted according to the duration of each word in the song, to synthesize the user song. Because this solution synthesizes the user song from the user's original envelope and auxiliary information, it retains the user's original timbre, so the synthesized user song sounds closer to the user's voice.
Fig. 4 is a structural schematic diagram of a song synthesis device according to an embodiment of the present invention. Referring to Fig. 4, the device comprises:
an extraction module 401, configured to extract, when user speech is obtained, the fundamental frequency, envelope, and consonant information of each word in the user speech;
an adjustment module 402, configured to adjust the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song, where the pitch frequency of each word in the song is the frequency corresponding to the pitch of that word in the song; and
a synthesis module 403, configured to synthesize the adjusted fundamental frequency with the envelope and consonant information of each word in the user speech to obtain synthesized audio;
wherein the adjustment module 402 is further configured to adjust the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain a synthesized user song.
In a possible implementation, the adjustment module 402 is configured to adjust, according to the pitch frequency of each word in the song, the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song.
In a possible implementation, the adjustment module 402 is configured to, for each word in the song, when the word has multiple pitch frequencies, adjust the fundamental frequency of that word in the user speech according to the order and proportions of the multiple pitch frequencies.
In a possible implementation, the extraction module 401 is configured to extract, by a feature extraction algorithm, the fundamental frequency, envelope, and consonant information of each word in the user speech, where a preset number of fundamental frequency values is extracted for each word and the preset number is determined by the extraction frequency.
In a possible implementation, the adjustment module 402 is configured to, for each word in the user speech, adjust the preset number of fundamental frequency values of the word to the pitch frequency of that word in the song.
In a possible implementation, the adjustment module 402 is configured to adjust, according to the duration of each word in the song, the duration of each word in the synthesized audio to the duration of the corresponding word in the song, to obtain the synthesized user song.
In this embodiment of the present invention, the fundamental frequency of each word spoken by the user is adjusted according to the pitch frequency of each word in the song; the adjusted fundamental frequency is then synthesized with the user's original envelope and auxiliary information into audio, and the duration of each spoken word is adjusted according to the duration of each word in the song, to synthesize the user song. Because this solution synthesizes the user song from the user's original envelope and auxiliary information, it retains the user's original timbre, so the synthesized user song sounds closer to the user's voice.
It should be understood that the division of the song synthesis device in the above embodiment into the functional modules described is merely an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or some of the functions described above. In addition, the song synthesis device provided in the above embodiment and the song synthesis method embodiment belong to the same concept; for the specific implementation process, refer to the method embodiment, which is not repeated here.
Fig. 5 is a structural schematic diagram of an electronic device 500 according to an embodiment of the present invention. The electronic device 500 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer. The electronic device 500 may also be referred to by other names, such as user equipment, portable electronic device, laptop electronic device, or desktop electronic device.
Generally, the electronic device 500 includes a processor 501 and a memory 502.
The processor 501 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 501 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, the processor 501 may further include an AI (Artificial Intelligence) processor for computing operations related to machine learning.
The memory 502 may include one or more computer-readable storage media, which may be non-transitory. The memory 502 may further include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 502 stores at least one instruction, and the at least one instruction is executed by the processor 501 to implement the song synthesis method provided by the method embodiments of this application.
In some embodiments, the electronic device 500 may optionally further include a peripheral device interface 503 and at least one peripheral device. The processor 501, the memory 502, and the peripheral device interface 503 may be connected by a bus or signal line, and each peripheral device may be connected to the peripheral device interface 503 by a bus, signal line, or circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 504, a display screen 505, a camera 506, an audio circuit 507, a positioning component 508, and a power supply 509.
The peripheral device interface 503 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 501 and the memory 502. In some embodiments, the processor 501, the memory 502, and the peripheral device interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral device interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 504 is configured to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 504 communicates with communication networks and other communication devices through electromagnetic signals: it converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 504 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 504 can communicate with other electronic devices through at least one wireless communication protocol, including but not limited to metropolitan area networks, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 504 may further include NFC (Near Field Communication)-related circuitry, which is not limited in this application.
The display screen 505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video and any combination thereof. When the display screen 505 is a touch display screen, it can also collect touch signals on or above its surface. A touch signal may be input to the processor 501 as a control signal for processing. In this case, the display screen 505 may also provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 505, arranged on the front panel of the electronic device 500; in other embodiments, there are at least two display screens 505, arranged on different surfaces of the electronic device 500 or in a folding design; in still other embodiments, the display screen 505 may be a flexible display screen arranged on a curved or folding surface of the electronic device 500. The display screen 505 may even be given a non-rectangular, irregular shape, that is, a shaped screen. The display screen 505 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 506 is used to capture images or video. Optionally, the camera assembly 506 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the electronic device and the rear camera on its back. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera can be fused for background blurring, the main camera and the wide-angle camera can be fused for panoramic and VR (Virtual Reality) shooting, or other fused shooting functions can be provided. In some embodiments, the camera assembly 506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 507 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment, converts them into electrical signals, and inputs them to the processor 501 for processing or to the radio frequency circuit 504 for voice communication. For stereo capture or noise reduction, there may be multiple microphones arranged at different parts of the electronic device 500. The microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into inaudible sound waves for purposes such as ranging. In some embodiments, the audio circuit 507 may also include a headphone jack.
The positioning component 508 is used to determine the current geographic location of the electronic device 500 for navigation or LBS (Location Based Service). The positioning component 508 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 509 is used to supply power to the components in the electronic device 500. The power supply 509 may be an AC supply, a DC supply, a disposable battery or a rechargeable battery. When the power supply 509 includes a rechargeable battery, the battery may support wired or wireless charging, and may also support fast-charging technology.
In some embodiments, the electronic device 500 further includes one or more sensors 510, including but not limited to an acceleration sensor 511, a gyroscope sensor 512, a pressure sensor 513, a fingerprint sensor 514, an optical sensor 515 and a proximity sensor 516.
The acceleration sensor 511 can detect the magnitude of acceleration along the three axes of a coordinate system established with respect to the electronic device 500. For example, the acceleration sensor 511 may detect the components of gravitational acceleration along the three axes. Based on the gravity signal collected by the acceleration sensor 511, the processor 501 may control the touch display screen 505 to display the user interface in landscape or portrait view. The acceleration sensor 511 may also be used to collect motion data for games or for the user.
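A minimal sketch of the landscape/portrait decision described above, assuming the sensor reports gravity components in m/s² along an x axis across the screen and a y axis along it. The function name, the axis convention and the 1.0 m/s² hysteresis margin are illustrative assumptions, not details given in this description.

```python
from typing import Literal

Orientation = Literal["portrait", "landscape"]


def choose_orientation(gx: float, gy: float, current: Orientation = "portrait",
                       margin: float = 1.0) -> Orientation:
    """Choose the UI orientation from the gravity components (m/s^2) measured
    along the device's x axis (across the screen) and y axis (along the screen).
    Axis convention and margin are assumptions for illustration."""
    if abs(gy) > abs(gx) + margin:   # gravity mostly along the long edge: upright
        return "portrait"
    if abs(gx) > abs(gy) + margin:   # gravity mostly along the short edge: on its side
        return "landscape"
    return current                   # near 45 degrees or lying flat: keep the current view
```

The margin keeps the current view when the device is held near 45 degrees or lies flat, so the UI does not flip back and forth on small movements.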
The gyroscope sensor 512 can detect the body orientation and rotation angle of the electronic device 500 and, together with the acceleration sensor 511, capture the user's 3D motion of the device. Based on the data collected by the gyroscope sensor 512, the processor 501 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt), image stabilization during shooting, game control and inertial navigation.
The pressure sensor 513 may be arranged on a side frame of the electronic device 500 and/or under the touch display screen 505. When the pressure sensor 513 is arranged on the side frame, it can detect how the user grips the electronic device 500, and the processor 501 can recognize the left or right hand or perform shortcut operations based on the grip signal. When the pressure sensor 513 is arranged under the touch display screen 505, the processor 501 controls the operable controls on the UI according to the pressure the user applies to the touch display screen 505. The operable controls include at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 514 is used to collect the user's fingerprint; either the processor 501 or the fingerprint sensor 514 itself identifies the user from the collected fingerprint. When the user's identity is recognized as trusted, the processor 501 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments and changing settings. The fingerprint sensor 514 may be arranged on the front, back or side of the electronic device 500. When the electronic device 500 has a physical button or a manufacturer logo, the fingerprint sensor 514 may be integrated with it.
The optical sensor 515 is used to collect ambient light intensity. In one embodiment, the processor 501 controls the display brightness of the touch display screen 505 according to the ambient light intensity collected by the optical sensor 515: when the ambient light is strong, the display brightness is increased; when it is weak, the display brightness is decreased. In another embodiment, the processor 501 may also dynamically adjust the shooting parameters of the camera assembly 506 according to the ambient light intensity collected by the optical sensor 515.
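The brightness rule above can be sketched as a monotonic mapping from measured lux to a backlight level. The logarithmic curve, the 1 to 10 000 lux input range and the 10 to 255 output range are assumptions chosen for illustration; the text above only states that brighter surroundings raise the display brightness and darker ones lower it.

```python
import math


def brightness_for_lux(lux: float, min_level: int = 10, max_level: int = 255,
                       min_lux: float = 1.0, max_lux: float = 10_000.0) -> int:
    """Map ambient light (lux) to a backlight level: more light, higher level.
    All ranges and the log curve are illustrative assumptions."""
    lux = min(max(lux, min_lux), max_lux)  # clamp into the supported range
    # Assume perceived brightness is roughly logarithmic, so interpolate on log10(lux).
    t = math.log10(lux / min_lux) / math.log10(max_lux / min_lux)  # t in [0, 1]
    return round(min_level + t * (max_level - min_level))
```

A log scale spends more of the output range on dim environments, where small changes in lux are most noticeable; a lookup table or a linear ramp would satisfy the same "brighter in, brighter out" rule.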
The proximity sensor 516, also called a distance sensor, is generally arranged on the front panel of the electronic device 500 and measures the distance between the user and the front of the electronic device 500. In one embodiment, when the proximity sensor 516 detects that this distance is gradually decreasing, the processor 501 switches the touch display screen 505 from the screen-on state to the screen-off state; when the proximity sensor 516 detects that the distance is gradually increasing, the processor 501 switches the touch display screen 505 from the screen-off state back to the screen-on state.
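A sketch of the proximity rule as a small state machine, assuming distance readings in centimetres. The class name and the 0.5 cm jitter threshold are hypothetical; the text above only specifies the direction of change (approaching turns the screen off, moving away turns it back on).

```python
from typing import Optional


class ProximityScreenController:
    """Screen on/off from successive proximity readings: approaching the front
    panel turns the screen off, moving away turns it back on."""

    def __init__(self, min_change_cm: float = 0.5) -> None:
        self.screen_on = True
        self.min_change_cm = min_change_cm  # hypothetical: ignore per-step jitter below this
        self._last: Optional[float] = None

    def update(self, distance_cm: float) -> bool:
        """Feed one distance reading (cm) and return whether the screen is on."""
        if self._last is not None:
            delta = distance_cm - self._last
            if delta <= -self.min_change_cm:   # getting closer: screen on -> off
                self.screen_on = False
            elif delta >= self.min_change_cm:  # moving away: screen off -> on
                self.screen_on = True
        self._last = distance_cm
        return self.screen_on
```

Comparing consecutive readings mirrors the "gradually decreasing / increasing" wording; a device that only reports near/far would instead switch on a fixed distance threshold.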
Those skilled in the art will understand that the structure shown in Fig. 5 does not limit the electronic device 500, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
In an exemplary embodiment, a computer-readable storage medium storing a computer program is also provided, for example a memory storing a computer program; when executed by a processor, the computer program implements the song synthesis method of the above embodiments. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), magnetic tape, a floppy disk, an optical data storage device, and so on.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (14)
1. A song synthesis method, characterized in that the method comprises:
when user speech is obtained, extracting the fundamental frequency, envelope and consonant information of each word in the user speech;
adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song, the pitch frequency of each word in the song being the frequency corresponding to the pitch of that word in the song;
performing synthesis on the adjusted fundamental frequency and on the envelope and consonant information of each word in the user speech, to obtain synthesized audio;
adjusting the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain a synthesized user song.
2. The method according to claim 1, wherein adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song comprises:
adjusting, according to the pitch frequency of each word in the song, the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song.
3. The method according to claim 2, wherein adjusting, according to the pitch frequency of each word in the song, the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song comprises:
for each word in the song, when the word has multiple pitch frequencies, adjusting the fundamental frequency of that word in the user speech according to the order and proportions of the multiple pitch frequencies.
4. The method according to claim 1, wherein extracting the fundamental frequency, envelope and consonant information of each word in the user speech comprises:
extracting, by a feature extraction algorithm, the fundamental frequency, envelope and consonant information of each word in the user speech, a preset number of fundamental frequency values being extracted for each word, the preset number being determined according to the extraction frequency.
5. The method according to claim 4, wherein adjusting the fundamental frequency of each word in the user speech according to the pitch frequency of each word in the song comprises:
for each word in the user speech, adjusting the preset number of fundamental frequency values of the word to the pitch frequency of that word in the song.
6. The method according to claim 1, wherein adjusting the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain the synthesized user song, comprises:
adjusting, according to the duration of each word in the song, the duration of each word in the synthesized audio to the duration of the corresponding word in the song, to obtain the synthesized user song.
7. A song synthesis device, characterized in that the device comprises:
an extraction module, configured to extract, when user speech is obtained, the fundamental frequency, envelope and consonant information of each word in the user speech;
an adjustment module, configured to adjust the fundamental frequency of each word in the user speech according to the pitch frequency of each word in a song, the pitch frequency of each word in the song being the frequency corresponding to the pitch of that word in the song;
a synthesis module, configured to perform synthesis on the adjusted fundamental frequency and on the envelope and consonant information of each word in the user speech, to obtain synthesized audio;
the adjustment module being further configured to adjust the duration of each word in the synthesized audio according to the duration of each word in the song, to obtain a synthesized user song.
8. The device according to claim 7, wherein the adjustment module is configured to adjust, according to the pitch frequency of each word in the song, the fundamental frequency of each word in the user speech to the pitch frequency of the corresponding word in the song.
9. The device according to claim 8, wherein the adjustment module is configured to, for each word in the song, when the word has multiple pitch frequencies, adjust the fundamental frequency of that word in the user speech according to the order and proportions of the multiple pitch frequencies.
10. The device according to claim 7, wherein the extraction module is configured to extract, by a feature extraction algorithm, the fundamental frequency, envelope and consonant information of each word in the user speech, a preset number of fundamental frequency values being extracted for each word, the preset number being determined according to the extraction frequency.
11. The device according to claim 10, wherein the adjustment module is configured to, for each word in the user speech, adjust the preset number of fundamental frequency values of the word to the pitch frequency of that word in the song.
12. The device according to claim 7, wherein the adjustment module is configured to adjust, according to the duration of each word in the song, the duration of each word in the synthesized audio to the duration of the corresponding word in the song, to obtain the synthesized user song.
13. An electronic device, characterized by comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory to implement the method steps of any one of claims 1-6.
14. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811056146.2A CN109147757B (en) | 2018-09-11 | 2018-09-11 | Singing voice synthesis method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109147757A (en) | 2019-01-04 |
CN109147757B CN109147757B (en) | 2021-07-02 |
Family
ID=64824403
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811056146.2A Active CN109147757B (en) | 2018-09-11 | 2018-09-11 | Singing voice synthesis method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109147757B (en) |
- 2018-09-11: CN201811056146.2A (CN109147757B) filed in CN; status: Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6992245B2 (en) * | 2002-02-27 | 2006-01-31 | Yamaha Corporation | Singing voice synthesizing method |
CN1703735A (en) * | 2002-07-29 | 2005-11-30 | 埃森图斯有限责任公司 | System and method for musical sonification of data |
CN1533120A (en) * | 2003-03-21 | 2004-09-29 | | Audio frequency device |
CN1581290A (en) * | 2003-08-06 | 2005-02-16 | 雅马哈株式会社 | Singing voice synthesizing method |
CN101727902A (en) * | 2008-10-29 | 2010-06-09 | 中国科学院自动化研究所 | Method for estimating tone |
CN104464725A (en) * | 2014-12-30 | 2015-03-25 | 福建星网视易信息系统有限公司 | Method and device for singing imitation |
CN106971703A (en) * | 2017-03-17 | 2017-07-21 | 西北师范大学 | A kind of song synthetic method and device based on HMM |
CN106898340A (en) * | 2017-03-30 | 2017-06-27 | 腾讯音乐娱乐(深圳)有限公司 | The synthetic method and terminal of a kind of song |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110148394A (en) * | 2019-04-26 | 2019-08-20 | 平安科技(深圳)有限公司 | Song synthetic method, device, computer equipment and storage medium |
CN110148394B (en) * | 2019-04-26 | 2024-03-01 | 平安科技(深圳)有限公司 | Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium |
CN112417201A (en) * | 2019-08-22 | 2021-02-26 | 北京峰趣互联网信息服务有限公司 | Audio information pushing method and system, electronic equipment and computer readable medium |
CN110600034A (en) * | 2019-09-12 | 2019-12-20 | 广州酷狗计算机科技有限公司 | Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium |
CN110600034B (en) * | 2019-09-12 | 2021-12-03 | 广州酷狗计算机科技有限公司 | Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium |
CN112951198A (en) * | 2019-11-22 | 2021-06-11 | 微软技术许可有限责任公司 | Singing voice synthesis |
CN111091807A (en) * | 2019-12-26 | 2020-05-01 | 广州酷狗计算机科技有限公司 | Speech synthesis method, speech synthesis device, computer equipment and storage medium |
CN111402842A (en) * | 2020-03-20 | 2020-07-10 | 北京字节跳动网络技术有限公司 | Method, apparatus, device and medium for generating audio |
CN111402842B (en) * | 2020-03-20 | 2021-11-19 | 北京字节跳动网络技术有限公司 | Method, apparatus, device and medium for generating audio |
CN111681637A (en) * | 2020-04-28 | 2020-09-18 | 平安科技(深圳)有限公司 | Song synthesis method, device, equipment and storage medium |
CN111681637B (en) * | 2020-04-28 | 2024-03-22 | 平安科技(深圳)有限公司 | Song synthesis method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109147757B (en) | 2021-07-02 |
Similar Documents
Publication | Title |
---|---|
CN109147757A (en) | Song synthetic method and device |
CN108538302B (en) | Method and apparatus for synthesizing audio |
CN108401124A (en) | The method and apparatus of video record |
CN109033335A (en) | Audio recording method, apparatus, terminal and storage medium |
CN108008930A (en) | The method and apparatus for determining K song score values |
CN108965757A (en) | video recording method, device, terminal and storage medium |
CN110956971B (en) | Audio processing method, device, terminal and storage medium |
CN109192218B (en) | Method and apparatus for audio processing |
CN110491358A (en) | Carry out method, apparatus, equipment, system and the storage medium of audio recording |
CN110688082B (en) | Method, device, equipment and storage medium for determining adjustment proportion information of volume |
EP3618055B1 (en) | Audio mixing method and terminal, and storage medium |
CN109994127A (en) | Audio-frequency detection, device, electronic equipment and storage medium |
CN109635133A (en) | Visualize audio frequency playing method, device, electronic equipment and storage medium |
CN108320756A (en) | It is a kind of detection audio whether be absolute music audio method and apparatus |
CN109065068B (en) | Audio processing method, device and storage medium |
CN108922562A (en) | Sing evaluation result display methods and device |
CN107958672A (en) | The method and apparatus for obtaining pitch waveform data |
CN109192223A (en) | The method and apparatus of audio alignment |
CN111081277B (en) | Audio evaluation method, device, equipment and storage medium |
CN110867194B (en) | Audio scoring method, device, equipment and storage medium |
CN109003621A (en) | A kind of audio-frequency processing method, device and storage medium |
CN109243479A (en) | Acoustic signal processing method, device, electronic equipment and storage medium |
CN109147809A (en) | Acoustic signal processing method, device, terminal and storage medium |
CN109273008A (en) | Processing method, device, computer storage medium and the terminal of voice document |
CN112086102B (en) | Method, apparatus, device and storage medium for expanding audio frequency band |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |