CN110164414A - Speech processing method, apparatus and smart device - Google Patents
Speech processing method, apparatus and smart device
- Publication number
- CN110164414A CN110164414A CN201811452305.0A CN201811452305A CN110164414A CN 110164414 A CN110164414 A CN 110164414A CN 201811452305 A CN201811452305 A CN 201811452305A CN 110164414 A CN110164414 A CN 110164414A
- Authority
- CN
- China
- Prior art keywords
- voice
- verification
- user
- user speech
- tone color
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 72
- 238000012545 processing Methods 0.000 title claims abstract description 52
- 238000012795 verification Methods 0.000 claims abstract description 273
- 238000006243 chemical reaction Methods 0.000 claims description 40
- 230000015654 memory Effects 0.000 claims description 22
- 238000004590 computer program Methods 0.000 claims description 12
- 238000012937 correction Methods 0.000 claims description 8
- 238000004519 manufacturing process Methods 0.000 claims description 7
- 238000012360 testing method Methods 0.000 claims description 4
- 238000010586 diagram Methods 0.000 description 9
- 238000012546 transfer Methods 0.000 description 6
- 210000001260 vocal cord Anatomy 0.000 description 5
- 239000000203 mixture Substances 0.000 description 4
- 210000000056 organ Anatomy 0.000 description 3
- 230000009466 transformation Effects 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 210000004072 lung Anatomy 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000015572 biosynthetic process Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000010835 comparative analysis Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 210000004704 glottis Anatomy 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 210000005036 nerve Anatomy 0.000 description 1
- 230000000737 periodic effect Effects 0.000 description 1
- 238000013139 quantization Methods 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/06—Foreign languages
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/04—Electrically-operated educational appliances with audible presentation of the material to be studied
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- User Interface Of Digital Computer (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The embodiments of the invention disclose a speech processing method, apparatus and smart device. The method includes: after obtaining user speech information, extracting the timbre parameters and the speech content it contains; looking up a first verification speech that matches the speech content contained in the speech information, and obtaining the timbre parameters of the first verification speech. Further, a reference timbre frequency is determined based on the timbre parameters of the user speech information and the timbre parameters of the first verification speech, and a second verification speech matching the speech content contained in the user speech information is generated according to the reference timbre frequency. With the embodiments of the invention, a verification speech can be generated according to the user's timbre parameters, so that the user can practise spoken language more intuitively.
Description
Technical field
The present invention relates to the field of speech processing technology, and in particular to a speech processing method, apparatus and smart device.
Background technique
In today's world, with economic globalization and the implementation of opening-up policies, international exchanges are increasingly frequent, which has motivated users to learn foreign languages. To converse fluently with foreigners, one must improve one's spoken foreign-language skills.
A common method of oral practice today is for a professional, such as a teacher, to record a spoken model reading of a passage of foreign-language content; the user then practises reading that passage aloud. The user's practice reading is compared with the model reading, and a visual comparison curve is generated for the differences between the two, so that the user can identify the differences from the curve and practise accordingly. It has been found in practice that this approach is not very helpful to foreign-language learners. How to provide users with a more intuitive speech model has therefore become a hot research topic.
Summary of the invention
The embodiments of the present invention provide a speech processing method, apparatus and smart device, which can generate a verification speech according to the user's timbre parameters, so that the user can practise spoken language more intuitively.
In one aspect, an embodiment of the invention provides a speech processing method, comprising:
obtaining user speech information, and obtaining timbre parameters of the user speech information;
looking up a first verification speech matching the speech content contained in the user speech information, and obtaining timbre parameters of the first verification speech;
determining a reference timbre frequency based on the timbre parameters of the user speech information and the timbre parameters of the first verification speech;
generating, based on the reference timbre frequency, a second verification speech matching the speech content contained in the user speech information.
In another aspect, an embodiment of the invention provides a speech processing apparatus, comprising:
an obtaining unit configured to obtain user speech information and obtain timbre parameters of the user speech information;
a processing unit configured to look up a first verification speech matching the speech content contained in the user speech information;
the obtaining unit being further configured to obtain timbre parameters of the first verification speech;
the processing unit being further configured to determine a reference timbre frequency based on the timbre parameters of the user speech information and the timbre parameters of the first verification speech;
the processing unit being further configured to generate, based on the reference timbre frequency, a second verification speech matching the speech content contained in the user speech information.
In a further aspect, an embodiment of the invention provides a smart device comprising a processor and a memory, the memory being configured to store a computer program comprising program instructions, and the processor being configured to invoke the program instructions to perform the above speech processing method.
Correspondingly, an embodiment of the invention also provides a computer storage medium storing computer program instructions which, when executed by a processor, perform the above speech processing method.
In the embodiments of the present invention, after the user speech information and its corresponding first verification speech are obtained, a reference timbre frequency can be determined from the timbre parameters of the user speech information and the timbre parameters of the first verification speech; further, a second verification speech matching the user speech information is generated based on the reference timbre frequency. A verification speech approximating the user's timbre is thus generated, so that the user can correct mispronunciations accurately, improving the efficiency of spoken-language practice.
Detailed description of the invention
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the invention; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is an architecture diagram of a speech processing system provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of a speech processing method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of correction prompt information provided by an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of a speech processing apparatus provided by an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of a smart device provided by an embodiment of the present invention.
Specific embodiment
Research on foreign-language oral practice has found that, with the development of the Internet, oral practice has shifted from offline face-to-face teaching to online oral practice. Offline face-to-face teaching means that the user arrives at a designated place at a designated time and follows a professional in oral practice; under this mode, the learning time is set by the spoken-language professional according to his or her own teaching hours and the time available to most practitioners, and users cannot freely choose when to study. Under the online oral practice mode, users can practise at any time, for example by logging on to an oral-practice website or downloading oral-practice videos through a smart device such as a mobile phone; under this mode, users can schedule practice time according to their own arrangements.
In one embodiment, the online oral practice mode may be as follows: the smart device displays to the user, on a user interface, a passage of spoken foreign-language content; when the user's speech information for this passage is collected, the smart device looks up the verification speech (which may also be called the standard speech) matching the passage, compares the verification speech with the user's speech information, and outputs a comparison result (for example, that the letter e in "hello" is mispronounced, together with the correct pronunciation), so that the user can correct his or her pronunciation according to the comparison result. In other embodiments, after finding the verification speech matching the speech content, the smart device may also play the verification speech repeatedly, so that the user can correct the pronunciation by listening to it. In both of the above embodiments, a verification speech whose timbre is close to the user's own makes it easier for the user to learn intuitively and compare against the standard pronunciation.
The following describes in detail how the embodiments of the present invention generate a second verification speech approximating the user's timbre, so that the user can practise speaking against a standard pronunciation with the same timbre as his or her own.
Referring to Fig. 1, a speech processing system provided by an embodiment of the present invention may include a speech acquisition module 101, a verification speech query module 102, a similarity scoring module 103 and a timbre adjustment module 104.
In one embodiment, for a given oral practice task, the speech acquisition module 101 collects the user speech information; it may do so through a sound sensor such as a microphone. The user speech information contains speech content. When it is detected that the speech acquisition module 101 has collected user speech information, the verification speech query module 102 queries the first verification speech matching the speech content of the user speech information.
Optionally, a first verification speech set may be pre-stored in the speech processing system, the set containing at least one first verification speech, each of which contains speech content. The verification speech query module 102 may query the first verification speech matching the speech content of the user speech information as follows: obtain the speech content contained in the user speech information; search the first verification speech set, according to that speech content, for a target first verification speech whose speech content matches the speech content contained in the user speech information; and take the target first verification speech as the first verification speech matching the speech content contained in the user speech information.
In one embodiment, after the user speech information and the first verification speech are obtained, the similarity scoring module 103 scores the similarity of the user speech information against the first verification speech, to determine the difference between the user's pronunciation and the pronunciation of the first verification speech. If the similarity score is below a similarity threshold, the user's pronunciation accuracy is low, and the timbre adjustment module 104 needs to adjust the timbre of the first verification speech so that the user can practise speaking against a verification speech with the same timbre as his or her own; if the similarity score is above the similarity threshold, the user's pronunciation accuracy is high, and the timbre adjustment module 104 need not adjust the timbre of the first verification speech.
In one embodiment, the timbre adjustment module 104 may adjust the timbre of the first verification speech to the user's timbre; the first verification speech after timbre adjustment serves as the second verification speech. In one embodiment, the speech processing system shown in Fig. 1 may further include a speech playback module 105 for playing the first verification speech or the second verification speech, so that the user practises pronunciation according to the first or the second verification speech.
In summary, in the speech processing system shown in Fig. 1, for a given oral practice task, the speech acquisition module 101 first collects the user speech information, and the verification speech query module 102 looks up the first verification speech corresponding to the user speech information; further, the similarity scoring module 103 scores the similarity of the user speech information against the first verification speech, obtaining a similarity score. When the similarity score is below the similarity threshold, the timbre adjustment module 104 adjusts the timbre of the first verification speech to obtain the second verification speech; finally, the speech playback module 105 can play the second verification speech, so that the user practises pronunciation against a verification speech with the same timbre as his or her own.
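The module flow summarized above can be sketched as follows. This is a minimal illustration with placeholder data structures and a toy word-overlap similarity score; the function names and the dictionary representation of a speech are assumptions, not the patent's implementation, which operates on real audio signals.

```python
# Sketch of the Fig. 1 pipeline: modules 101 (acquisition), 102 (query),
# 103 (similarity scoring) and 104 (timbre adjustment).

def acquire_user_speech():                            # module 101 (stub)
    return {"content": "where are you from",
            "timbre": {"mean": 180.0, "var": 25.0}}

def query_first_verification(content, speech_set):    # module 102
    # Look up the stored verification speech whose content matches.
    return next(s for s in speech_set if s["content"] == content)

def similarity_score(user, verification):             # module 103 (toy metric)
    u = set(user["content"].split())
    v = set(verification["content"].split())
    return len(u & v) / len(u | v)

def adjust_timbre(verification, user_timbre):         # module 104
    # Replace the verification speech's timbre with the user's timbre.
    second = dict(verification)
    second["timbre"] = user_timbre
    return second

def process(speech_set, threshold=0.8):
    user = acquire_user_speech()
    first = query_first_verification(user["content"], speech_set)
    if similarity_score(user, first) < threshold:
        return adjust_timbre(first, user["timbre"])   # second verification speech
    return first                                      # no adjustment needed
```

In the real system, module 105 would then play back the resulting verification speech.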
Referring to Fig. 2, which is a flow diagram of a speech processing method provided by an embodiment of the present invention, the speech processing method shown in Fig. 2 can be applied in the speech processing system shown in Fig. 1 and can be performed by a smart device, specifically by the processor of the smart device. In one embodiment, the smart device may include one or more of a mobile phone, a tablet computer, a notebook computer and similar devices.
In one embodiment, the speech processing method shown in Fig. 2 mainly adjusts the first verification speech according to the user speech information, so as to generate a second verification speech approximating the user's timbre, allowing the user to learn and compare against the standard pronunciation more intuitively. First, in step S201, the smart device obtains user speech information and obtains the timbre parameters of the user speech information. In one embodiment, the user speech information contains speech content: the user speech information is the audio of the user reading a certain passage of speech content aloud. If the user pronounces every word correctly, the speech content contained in the user speech information is that passage. For example, if the speech content is "welcome to QQ", the user speech information is the audio containing that content, and the speech content contained in the user speech information is "welcome to QQ".
In one embodiment, the timbre parameters are parameters that affect timbre. Timbre is determined mainly by the fundamental tone, the overtones and change parameters. For most vibrating objects, not only the whole object vibrates: its various parts also vibrate simultaneously, forming a complex vibration. The sound produced by the vibration of the whole body may be regarded as the fundamental tone, and the sounds produced by the vibrations of the individual parts may be regarded as overtones. For a person, multiple parts likewise vibrate during phonation, so the fundamental tone and overtones can also be distinguished; synthesizing the overtones with the fundamental tone according to the change parameters determines the timbre of the sounding body.
In the embodiments of the present invention, in the process of generating the second verification speech approximating the user's timbre from the first verification speech, the change parameters can remain the same as those of the first verification speech. The timbre parameters described in the embodiments of the present invention are determined mainly from the fundamental frequency and the overtone frequencies of the corresponding speech: the timbre parameters may include only the fundamental-tone parameters of the corresponding speech, or may include both the fundamental-tone parameters and the overtone parameters of the corresponding speech.
In one embodiment, the user speech information obtained by the smart device may be speech information input by the user in real time; the smart device can obtain it in real time through a sound sensor such as a microphone. For example, the smart device displays an oral-practice window to the user on a user interface, in which several passages of speech content available for oral practice are shown; when the user's selection of a passage is detected, the smart device outputs prompt information prompting the user to read that passage aloud; and when it is detected that the user has started reading, the smart device records the user's reading through the microphone as the user speech information.
In other embodiments, the user speech information obtained by the smart device may also be historical speech information input by the user. For example, the user previously recorded his or her own readings during practice and stored each recording in the smart device; when the user wants to correct pronunciation problems through the smart device, one of the stored recordings can be selected and input into the smart device as the user speech information.
In one embodiment, the timbre parameters of the user speech information include a mean and/or a variance: they may be the mean and variance of the fundamental tone of the user speech information, or the mean and variance of the fundamental tone together with the mean and variance of the overtones of the user speech information. The timbre parameters of the user speech information can be determined from the timbre frequencies of the user speech information over certain time periods. A sound is generally a combination of vibrations of different frequencies and amplitudes emitted by the sounding body. Among these vibrations there is one of lowest frequency, and the sound it produces is the fundamental tone; the fundamental frequency refers to the quasi-periodic excitation pulse train produced by the relaxation oscillation of the vocal cords when airflow passes through the glottis during voiced sounds. The fundamental frequency is related to the length, thickness, toughness and stiffness of the vocal cords and to pronunciation habits, so every person's fundamental frequency is different. The overtone frequencies refer to the quasi-periodic excitation pulse trains produced during phonation by the vibration of organs such as the mouth, throat and lungs; since these organs differ from person to person, everyone's overtone frequencies also differ. Based on these person-to-person differences in timbre frequency, that is, in fundamental frequency and overtone frequency, the embodiments of the present invention can adjust the timbre frequency of the first verification speech to a reference timbre frequency related to the user's timbre frequency, thereby generating a second verification speech approximating the user's timbre.
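The mean-and-variance timbre parameters described above can be sketched as follows, assuming per-frame fundamental-frequency estimates are already available (in practice they would come from a pitch tracker; the zero-marks-unvoiced convention is an assumption of this sketch).

```python
# Sketch: timbre parameters as the mean and variance of the fundamental
# frequency over the voiced frames of a speech segment.

def timbre_parameters(f0_frames):
    """f0_frames: per-frame f0 estimates in Hz; 0.0 marks unvoiced frames."""
    voiced = [f for f in f0_frames if f > 0]
    mean = sum(voiced) / len(voiced)
    var = sum((f - mean) ** 2 for f in voiced) / len(voiced)
    return {"mean": mean, "var": var}
```

The same computation applied to overtone-frequency tracks would yield the overtone parameters.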
In one embodiment, the second verification speech approximating the user's timbre may be generated by the smart device adjusting the timbre word by word for each word in the speech content contained in the first verification speech. For example, if the speech content contained in the first verification speech is "where are you from?", the smart device can adjust the timbre separately for the four words where, are, you and from. The smart device may obtain the timbre parameters of the user speech information as follows: after obtaining the user speech information, determine its duration and the duration of each word in the speech content it contains (the duration of each word can be understood as a time period corresponding to that word, the user speech information containing multiple time periods); for a target word among the words, the smart device obtains the target time period and the timbre frequency of the target word, and computes the timbre parameters within the target time period from the target time period and the timbre frequency of the target word. Proceeding in this way, the timbre parameters of the user speech information in each time period can be computed. Specifically, when the timbre frequency includes the fundamental frequency, the fundamental-tone parameters within the target time period are computed from the target time period and fundamental frequency of the target word; when the timbre frequency includes overtone frequencies, the overtone parameters within the target time period are computed from the target time period and overtone frequencies of the target word.
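The per-word computation above can be sketched as follows, where each word's time period selects its own frames and the timbre parameters are computed separately per period. The frame times and word boundaries are illustrative assumptions; real word boundaries would come from forced alignment or speech recognition.

```python
# Sketch: per-word timbre parameters, one (mean, var) pair per time period.

def word_timbre_parameters(frames, word_periods):
    """frames: list of (time_sec, f0_hz) pairs, f0 == 0 meaning unvoiced;
    word_periods: {word: (start_sec, end_sec)} time period per word."""
    params = {}
    for word, (start, end) in word_periods.items():
        f0 = [f for t, f in frames if start <= t < end and f > 0]
        mean = sum(f0) / len(f0)
        var = sum((f - mean) ** 2 for f in f0) / len(f0)
        params[word] = {"mean": mean, "var": var}
    return params
```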
In one embodiment, after obtaining the user speech information, the smart device, in step S202, looks up the first verification speech matching the speech content contained in the user speech information, and obtains the timbre parameters of the first verification speech. The timbre parameters of the first verification speech include a mean and/or a variance: they may be the mean and variance of the fundamental tone of the first verification speech, or the mean and variance of its fundamental tone together with the mean and variance of its overtones. The first verification speech is the standard speech corresponding to the user speech information; the standard speech, also called the model reading, can be recorded by a professional such as a professional teacher, and its pronunciation is entirely correct. In other words, for a passage of speech content used for oral practice, the audio obtained when the user reads the passage aloud is the user speech information, and the audio obtained when a professional reads it aloud is the first verification speech.
In one embodiment, the lookup in step S202 of the first verification speech matching the speech content contained in the user speech information may be implemented by determining the first verification speech from a first verification speech set according to the speech content contained in the user speech information. Specifically, looking up the first verification speech matching the speech content contained in the user speech information comprises: obtaining the speech content contained in the user speech information; searching the first verification speech set, according to that speech content, for a target first verification speech whose speech content matches the speech content contained in the user speech information; and taking the target first verification speech as the first verification speech matching the speech content contained in the user speech information.
Multiple passages of speech content for oral practice and their corresponding first verification speeches can be stored in association in the smart device, the first verification speeches forming the first verification speech set. After obtaining user speech information, the smart device can first obtain the speech content it contains, then search the stored passages for the target speech content matching that content, and finally, according to the association between speech content and first verification speech, look up the target first verification speech corresponding to the target speech content and take it as the first verification speech matching the speech content contained in the user speech information.
In one embodiment, the smart device obtains the speech content contained in the user speech information from the user speech information; because the user's pronunciation may be inaccurate, the speech content so obtained may not be exactly identical to the speech content stored in the smart device. For example, the stored speech content is "I go to bed round 11:00 at night"; when reading aloud, the user pronounces the word night inaccurately, turning the consonant /n/ into /l/, so the speech content the smart device obtains from the user speech information may be "I go to bed round 11:00 at light". Therefore, in one embodiment, to solve this problem, the smart device can preset a matching threshold: if the match between the obtained speech content and a pre-stored target speech content exceeds the matching threshold, the target speech content is determined to match the speech content of the user speech information. It should be noted that the above is one solution to this problem listed by the invention, and the specific method is not limited by the embodiments of the present invention.
In one embodiment, the matching threshold set by the smart device may apply to the words contained in the speech content: for example, if the matching threshold is set to 80%, and 80% or more of the words contained in the two passages of speech content are the same, the two passages can be determined to match. Alternatively, the matching threshold may apply to the meaning of the speech content: if the similarity in meaning between the two passages exceeds 80%, the two passages are determined to match. It should be noted that the above are some possible ways of setting the matching threshold listed in the embodiments of the present invention; the specific setting can be determined according to actual needs and is not specifically limited by the embodiments of the present invention.
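The word-based matching threshold can be sketched as follows. The position-by-position comparison is an assumption, since the patent leaves the exact similarity measure open; it suffices to show why the /n/-to-/l/ example above still matches at an 80% threshold.

```python
# Sketch: two passages of speech content match if the fraction of
# identical words reaches the matching threshold (80% by default).

def contents_match(recognized, stored, threshold=0.8):
    rec = recognized.lower().split()
    sto = stored.lower().split()
    shared = sum(1 for r, s in zip(rec, sto) if r == s)
    return shared / max(len(rec), len(sto)) >= threshold
```

For the "at light" / "at night" example, seven of eight words agree (87.5%), so the stored passage is still selected despite the mispronunciation.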
In one embodiment, the timbre parameters of the first verification speech can be obtained in the same way as described above for obtaining the timbre parameters of the user speech information, which is not repeated here.
In one embodiment, after the smart device has obtained the timbre parameters of the user speech information and the timbre parameters of the first verification speech, in step S203 it determines the reference timbre frequency based on the timbre parameters of the user speech information and the timbre parameters of the first verification speech. The reference timbre frequency is the timbre frequency obtained after adjusting the timbre frequency of the first verification speech according to the timbre parameters of the user speech information; it can also be understood as the timbre frequency needed to generate the second verification speech approximating the user's timbre.
In one embodiment, step S203 may be implemented by determining the conversion relationship between the timbre frequency of the first verification voice and the reference timbre frequency, then determining the conversion coefficient and correction parameter required by that relationship according to the timbre parameters in the user speech information, and finally calculating the reference timbre frequency from the conversion relationship, the conversion coefficient, and the correction parameter. Specifically, determining the reference timbre frequency based on the timbre parameters of the user speech information and of the first verification voice in step S203 may include: determining the timbre frequency of the first verification voice; determining a conversion coefficient and a correction parameter according to the timbre parameters of the user speech information and of the first verification voice; and determining the reference timbre frequency according to the timbre frequency of the first verification voice, the conversion coefficient, the correction parameter, and a timbre frequency conversion rule.
In one embodiment, the timbre frequency of the first verification voice may include the fundamental frequency of the first verification voice, or may include both the fundamental frequency and the overtone frequency of the first verification voice. When the timbre frequency of the first verification voice includes its fundamental frequency, the fundamental frequency is determined as follows: determine the pitch period of the first verification voice, then take the reciprocal of the pitch period to obtain the fundamental frequency. For example, if the pitch period of the first verification voice is 4 ms, its fundamental frequency is 1/4 ms = 250 Hz. When the timbre frequency of the first verification voice includes its overtone frequency, the overtone frequency may be determined by a similar method. For the fundamental frequency, the pitch period refers to the time required for one periodic opening and closing of the vocal cords of the sound-producing body. In one embodiment, common methods for determining the pitch period include time-domain methods, frequency-domain methods, and hybrid methods. A time-domain method estimates the pitch period directly from the speech waveform; a frequency-domain method transforms the speech signal to the frequency domain to estimate the pitch period, a common example being the cepstrum method; a hybrid method first extracts vocal-tract model parameters, then filters the speech signal with those parameters to obtain the excitation (source) sequence, and finally uses the autocorrelation method or the average magnitude difference function to obtain the pitch period. The above lists only some of the methods for determining the pitch period of the first verification voice; the smart device may select one or more of the above methods according to actual needs to determine the pitch period of the first verification voice.
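The reciprocal relation between pitch period and fundamental frequency, together with a time-domain (autocorrelation) pitch estimate, can be sketched as follows; the sampling rate, frame length, and synthetic 250 Hz frame are illustrative assumptions, not values from the patent:

```python
import numpy as np

def pitch_period_autocorr(frame: np.ndarray, fs: int,
                          fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Time-domain method: estimate the pitch period (seconds) of one
    voiced frame via autocorrelation, searching lags between fs/fmax
    and fs/fmin (the plausible pitch range)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return lag / fs

fs = 8000
t = np.arange(int(0.04 * fs)) / fs       # one 40 ms analysis frame
frame = np.sin(2 * np.pi * 250 * t)      # synthetic voiced frame at 250 Hz

period = pitch_period_autocorr(frame, fs)  # expected ~4 ms
f0 = 1.0 / period                          # reciprocal gives F0, ~250 Hz
print(round(period * 1000, 2), round(f0, 1))
```

A frequency-domain (cepstrum) variant would instead locate the peak of the real cepstrum in the same lag range.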
In one embodiment, the timbre frequency conversion rule may be determined by the smart device according to the voice conversion method it uses. Assume the voice conversion method used by the smart device is based on a Gaussian mixture model; taking the fundamental frequency as an example, the timbre frequency conversion rule is expressed by the following formula:

F′0(t) = a · F0(t) + b, Formula 1;

where F′0(t) denotes the reference fundamental frequency, F0(t) denotes the fundamental frequency of the first verification voice, a denotes the conversion coefficient, and b denotes the correction parameter; here the conversion coefficient and correction parameter are determined based on the pitch parameters of the user speech information and of the first verification voice. The voice conversion method used by the smart device may also be based on vector-quantization codebook mapping, an artificial neural network model, a hidden Markov model, and so on; the timbre frequency conversion rules corresponding to these various voice conversion methods are not listed one by one in the embodiments of the present invention. In other embodiments, the timbre frequency conversion rule may also be set by the smart device according to other preset conditions.
The timbre conversion rules used for the fundamental frequency and for the overtone frequency within the timbre frequency may be the same or different. That is, the timbre conversion rule for the overtone frequency may also use the calculation of Formula 1, in which case F′0(t) denotes the reference overtone frequency, F0(t) denotes the overtone frequency of the first verification voice, a denotes the conversion coefficient, and b denotes the correction parameter; here the conversion coefficient and correction parameter are determined based on the overtone parameters of the user speech information and of the first verification voice.
In one embodiment, determining the reference timbre frequency according to the timbre frequency of the first verification voice, the conversion coefficient, the correction parameter, and the timbre frequency conversion rule may be implemented as follows: substitute the timbre frequency of the first verification voice, the conversion coefficient, and the correction parameter into the timbre frequency conversion rule and calculate; the resulting value is the reference timbre frequency.
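As a minimal sketch, Formula 1 can be applied frame by frame once the conversion coefficient and correction parameter are known; the F0 values, `a`, and `b` below are illustrative assumptions rather than values from the patent:

```python
def convert_f0(f0_track, a, b):
    """Formula 1: the reference F0 at each frame is a linear map of the
    first verification voice's F0, i.e. F0'(t) = a * F0(t) + b."""
    return [a * f + b for f in f0_track]

# Hypothetical frame-wise F0 (Hz) of the first verification voice,
# with an illustrative conversion coefficient and correction parameter.
reference_f0 = convert_f0([120.0, 125.0, 118.0], a=1.5, b=-20.0)
print(reference_f0)  # [160.0, 167.5, 157.0]
```

The same function would serve for the overtone frequency, with `a` and `b` computed from the overtone parameters instead.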
In one embodiment, the timbre parameters in the user speech information include a first mean and a first variance determined from the timbre frequency of a target time period in the user speech information. The timbre parameters include pitch parameters, or pitch parameters and overtone parameters; correspondingly, the first mean may include a first fundamental mean, or a first fundamental mean and a first overtone mean, and the first variance may include a first fundamental variance, or a first fundamental variance and a first overtone variance. Specifically, if the timbre parameters in the user speech information include pitch parameters, they include a first fundamental mean and a first fundamental variance determined from the fundamental frequency of the target time period in the user speech information; if the timbre parameters include both pitch parameters and overtone parameters, they include a first fundamental mean, a first fundamental variance, a first overtone mean, and a first overtone variance determined respectively from the fundamental frequency and the overtone frequency of the target time period in the user speech information.
Similarly, the timbre parameters of the first verification voice include a second mean and a second variance determined from the timbre frequency of the target time period of the first verification voice. That is, the timbre parameters of the first verification voice may include a second fundamental mean and a second fundamental variance determined from the fundamental frequency of the target time period of the first verification voice, or may include a second fundamental mean, a second fundamental variance, a second overtone mean, and a second overtone variance determined respectively from the fundamental frequency and the overtone frequency of that target time period.
Then, determining the conversion coefficient and the correction parameter according to the timbre parameters of the user speech information and of the first verification voice may be implemented as follows: determine the conversion coefficient based on the first variance, the second variance, and a preset conversion-coefficient determination rule; determine the correction parameter based on the first mean, the second mean, and a preset correction-parameter determination rule. The preset correction-parameter determination rule and the preset conversion-coefficient determination rule may be determined by the smart device according to the voice conversion method it uses.
For example, assume the voice conversion method used by the smart device is based on a Gaussian mixture model, in which the timbre frequency conversion rule is expressed by Formula 1. Taking timbre parameters that include pitch parameters as an example, the conversion-coefficient determination rule set by the smart device can be expressed as:

a = σt / σs, Formula 2;

where σt² denotes the first fundamental variance included in the pitch parameters of the user speech information, and σs² denotes the second fundamental variance included in the pitch parameters of the first verification voice. The correction-parameter determination rule set by the smart device can be the following formula:

b = μt − a · μs, Formula 3;

where b denotes the correction parameter, μt denotes the first fundamental mean included in the pitch parameters of the user speech information, and μs denotes the second fundamental mean included in the pitch parameters of the first verification voice.
When the timbre parameters include both pitch parameters and overtone parameters, the conversion-coefficient determination rule and the correction-parameter determination rule used by the smart device may be the same or different. That is, if the timbre parameters include overtone parameters, the smart device may also use Formula 2 to calculate the conversion coefficient and Formula 3 to calculate the correction parameter, in which case σt² denotes the first overtone variance included in the overtone parameters of the user speech information, σs² denotes the second overtone variance included in the overtone parameters of the first verification voice, μt denotes the first overtone mean included in the overtone parameters of the user speech information, and μs denotes the second overtone mean included in the overtone parameters of the first verification voice.
In one embodiment, determining the conversion coefficient based on the first variance, the second variance, and the preset conversion-coefficient determination rule works as follows: substitute the first variance and the second variance into the preset conversion-coefficient determination rule and calculate; the result is the conversion coefficient. Similarly, determining the correction parameter based on the first mean, the second mean, and the preset correction-parameter determination rule works as follows: substitute the first mean and the second mean into the preset correction-parameter determination rule and calculate; the result is the correction parameter.
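The conversion-coefficient and correction-parameter determination rules (Formulas 2 and 3) can be sketched as follows, assuming frame-wise F0 tracks as the pitch parameters; the sample values are illustrative, not from the patent:

```python
import statistics

def conversion_params(user_f0, verif_f0):
    """Formulas 2 and 3 of the GMM-style linear F0 conversion: the
    conversion coefficient is the ratio of standard deviations
    (a = sigma_t / sigma_s) and the correction parameter is
    b = mu_t - a * mu_s, where t denotes the user (target) voice and
    s the first verification (source) voice."""
    mu_t, sigma_t = statistics.fmean(user_f0), statistics.pstdev(user_f0)
    mu_s, sigma_s = statistics.fmean(verif_f0), statistics.pstdev(verif_f0)
    a = sigma_t / sigma_s
    b = mu_t - a * mu_s
    return a, b

# Illustrative frame-wise F0 tracks (Hz) for a user voice and a
# first verification voice over the same target time period.
a, b = conversion_params([210.0, 220.0, 230.0], [110.0, 120.0, 130.0])
print(a, b)  # 1.0 100.0
```

With these parameters, Formula 1 maps the verification voice's F0 range onto the user's: the spread is preserved (a = 1.0) and the whole track is shifted up by 100 Hz toward the user's mean.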
In one embodiment, after executing step S202 and before executing step S203, the smart device may also: perform similarity scoring on the user speech information based on the first verification voice to obtain a similarity score result; and, if the similarity score result satisfies a timbre adjustment condition, execute the step of determining the reference timbre frequency based on the timbre parameters of the user speech information and of the first verification voice. That is, after obtaining the user speech information and the first verification voice, the smart device may first determine the difference between the user speech information and the first verification voice, and judge from that difference whether timbre adjustment of the first verification voice is needed.
In one embodiment, performing similarity scoring on the user speech information based on the first verification voice may mean scoring the pronunciation similarity between the user speech information and the first verification voice. If the similarity score between the user speech information and the first verification voice is high, the user's pronunciation is highly similar to that of the first verification voice, i.e., the user's pronunciation is fairly accurate; the user may be able to correct the erroneous parts of his or her pronunciation from the first verification voice alone, so the smart device may skip step S203, saving power. If the similarity score between the user speech information and the first verification voice is low, the user's pronunciation differs substantially from that of the first verification voice, i.e., the user's pronunciation accuracy is low and the user may find it difficult to correct pronunciation errors from the first verification voice alone; in this case the smart device needs to execute step S203.
In other embodiments, performing similarity scoring on the user speech information based on the first verification voice may instead mean scoring the timbre similarity between the user speech information and the first verification voice. If the similarity score between the user speech information and the first verification voice is high, the user's timbre is close to that of the first verification voice; when the first verification voice is played the user hears something like his or her own voice and can practice speaking from the first verification voice directly, so the smart device may skip step S203. If the similarity score between the user speech information and the first verification voice is low, the user's timbre differs considerably from that of the first verification voice, and the smart device needs to execute step S203.
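The gating decision described above can be sketched as a simple threshold check; the threshold value and function name are illustrative assumptions:

```python
def needs_timbre_adjustment(similarity_score: float,
                            threshold: float = 0.6) -> bool:
    """Gate step S203: when the pronunciation or timbre similarity
    between the user speech and the first verification voice is already
    high, skip generating the second verification voice to save power."""
    return similarity_score < threshold

# Hypothetical scores: a close match skips S203, a distant one runs it.
print(needs_timbre_adjustment(0.85))  # False -> play first verification voice
print(needs_timbre_adjustment(0.30))  # True  -> execute steps S203/S204
```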
In one embodiment, after the reference timbre frequency is determined, in step S204 the smart device generates, based on the reference timbre frequency, a second verification voice matching the voice content included in the user speech information. The smart device may generate the second verification voice based on the first verification voice and the reference timbre frequency; step S204 may be implemented as follows: adjust the timbre frequency of the corresponding time period of the first verification voice based on the reference timbre frequency to obtain the second verification voice. Specifically, if the reference timbre frequency includes a reference fundamental frequency, the smart device adjusts the fundamental frequency in the corresponding time period of the first verification voice based on the reference fundamental frequency; if the reference timbre frequency includes a reference overtone frequency, the smart device adjusts the overtone frequency in the corresponding time period of the first verification voice based on the reference overtone frequency.
Factors such as the length, thickness, toughness, and stiffness of the vocal cords and pronunciation habits differ between users, so the pitch parameters of different users also differ. Likewise, the vocal cords of different sound-producing bodies differ, and the resulting vibrations of organs such as the mouth, throat, and lungs also differ, so the overtone parameters of different sound-producing bodies are not identical either. The reference timbre frequency is determined from the timbre parameters of the user speech information and therefore reflects the user's voice. Adjusting the timbre frequency of the corresponding time period of the first verification voice based on the reference timbre frequency can be understood as keeping the voice content of the first verification voice unchanged while shifting its timbre frequency to the reference timbre frequency, so that the voice content included in the first verification voice is played back in the user's voice (the user's timbre). This helps the user listen to a verification voice in his or her own timbre and study and imitate the standard pronunciation more intuitively.
In one embodiment, after generating the second verification voice, the smart device further plays the second verification voice so that the user can correct his or her speech based on it. The second verification voice generated by the smart device through steps S201-S204 is a verification voice with the same timbre as the user; by playing it, the user can hear how a voice in his or her own timbre should pronounce the content, which helps the user correct pronunciation and improves the efficiency of spoken-language practice. In one embodiment, before playing the second verification voice, the smart device may output play prompt information asking the user whether to play the second verification voice, and play the second verification voice when a user confirmation operation is detected; this lets the user choose whether to play it according to his or her own needs and improves the user experience.
In one embodiment, after generating the second verification voice matching the voice content included in the user speech information, the method further includes: obtaining difference information between the second verification voice and the user speech information; and generating correction prompt information based on the difference information so that the user can correct his or her speech based on the correction prompt information. The difference information may include pronunciation differences: the smart device compares and analyzes the pronunciation differences between the user speech information and the second verification voice and generates correction prompt information accordingly. The correction prompt information may include the word to be corrected, the user's pronunciation, and the verification pronunciation, where the word to be corrected is any word included in the voice content, the user's pronunciation is how the user pronounced the word to be corrected, and the verification pronunciation is the standard pronunciation of the word to be corrected.
For example, referring to Fig. 3, an embodiment of the present invention provides a schematic diagram of generating correction prompt information from the difference information between the second verification voice and the user speech information. As shown in Fig. 3, the smart device may display a spoken-language practice window through a display device; the user selects the voice content to practice through the prompts in the spoken-language practice window and inputs user speech information 301 as prompted. After obtaining the user speech information, the smart device generates the second verification voice 302 through steps S201-S204 and may show the second verification voice 302 to the user so that the user can choose whether to play it. In addition, by analyzing the difference information between the second verification voice and the user speech information, the smart device generates correction prompt information.
Assume the voice content selected by the user is "Are you staying at home or going out this weekend?". When analyzing the second verification voice and the user speech, the smart device finds that in the user speech information 301 the letter t in the word "staying" is pronounced as the voiceless consonant /t/; since /t/ after /s/ in a stressed syllable should be voiced to the voiced consonant /d/, in the second verification voice 302 the letter t in "staying" is pronounced as /d/. Based on this difference information, the correction prompt information 303 generated by the smart device may be: correct the pronunciation of "t" in "staying" from /t/ to /d/.
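The structure of a correction prompt (word to be corrected, user pronunciation, verification pronunciation) can be sketched as below, reusing the "staying" example from Fig. 3; the class and method names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class CorrectionPrompt:
    word: str            # the word to be corrected
    user_phoneme: str    # how the user actually pronounced it
    target_phoneme: str  # the standard (verification) pronunciation

    def message(self) -> str:
        """Render the prompt shown to the user in the practice window."""
        return (f'{self.word} "{self.user_phoneme}" should be '
                f'corrected to "{self.target_phoneme}"')

# The Fig. 3 example: /t/ after /s/ in a stressed syllable is voiced,
# so the prompt asks the user to correct /t/ to /d/ in "staying".
prompt = CorrectionPrompt(word="staying", user_phoneme="/t/", target_phoneme="/d/")
print(prompt.message())  # staying "/t/" should be corrected to "/d/"
```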
In one embodiment, after generating the second verification voice matching the voice content included in the user speech information based on the reference timbre frequency, the method further includes: generating a user speech curve from the user speech information; generating a verification voice curve from the second verification voice; and displaying the user speech curve and the verification voice curve on a user interface so that the user can correct his or her speech based on the user speech curve and the verification voice curve. The smart device may plot the user speech curve and the verification voice curve from the user speech information and the second verification voice respectively, and by comparing the two curves the user can find the problems in his or her reading.
For example, following the assumptions in Fig. 3, after the smart device generates the second verification voice through steps S201-S204, it generates a user speech curve based on the user speech information and a verification voice curve based on the second verification voice and displays them on the user interface. The user can intuitively see from the curves the differences between his or her reading and the verification voice, which helps the user correct pronunciation.
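Building a displayable curve from frame-wise F0 values can be sketched as follows; the hop size, the treatment of unvoiced frames, and the sample F0 tracks are illustrative assumptions:

```python
import numpy as np

def pitch_curve(f0_frames, hop_s=0.01):
    """Build a displayable (time, F0) curve from frame-wise F0 values;
    unvoiced frames (F0 == 0) become NaN so the UI draws a gap there."""
    t = np.arange(len(f0_frames)) * hop_s
    f0 = np.array(f0_frames, dtype=float)
    f0[f0 == 0] = np.nan
    return t, f0

# Hypothetical frame F0 tracks (Hz) for the user speech and the second
# verification voice; the user interface would overlay the two curves.
user_t, user_f0 = pitch_curve([220, 230, 0, 210])
verif_t, verif_f0 = pitch_curve([215, 228, 0, 212])
print(np.isnan(user_f0).tolist())  # [False, False, True, False]
```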
In conclusion the embodiment of the present invention get user speech information and its it is corresponding first verification voice after,
It can be determined according to the tamber parameter in the tamber parameter and the first verification voice in user speech information and refer to tone color frequency, into one
Step, the second verification voice to match with user speech information is generated with reference to tone color frequency based on this, it is approximate to realize generation
The verification voice of user's tone color improves the efficiency of user's spoken language exercise in order to which user accurately corrects incorrect pronunciations.
Based on the description of the above method embodiments, in one embodiment, an embodiment of the present invention further provides a speech processing apparatus whose structure is shown in Fig. 4. As shown in Fig. 4, the speech processing apparatus in the embodiment of the present invention includes an acquiring unit 401 and a processing unit 402; in the embodiments of the present invention, the speech processing apparatus may be arranged in a smart device that needs to process speech.
In one embodiment, the acquiring unit 401 is configured to: obtain user speech information, and obtain the timbre parameters in the user speech information. The processing unit 402 is configured to: find a first verification voice matching the voice content included in the user speech information. The acquiring unit 401 is further configured to: obtain the timbre parameters of the first verification voice. The processing unit 402 is further configured to: determine a reference timbre frequency based on the timbre parameters of the user speech information and of the first verification voice. The processing unit 402 is further configured to: generate, based on the reference timbre frequency, a second verification voice matching the voice content included in the user speech information.
In one embodiment, the timbre parameters include pitch parameters and overtone parameters, and the reference timbre frequency includes a reference fundamental frequency and a reference overtone frequency.
In one embodiment, the processing unit 402 is further configured to: perform similarity scoring on the user speech information based on the first verification voice to obtain a similarity score result; and, if the similarity score result satisfies the timbre adjustment condition, determine the reference timbre frequency based on the timbre parameters of the user speech information and of the first verification voice.
In one embodiment, the processing unit 402 determines the reference timbre frequency based on the timbre parameters of the user speech information and of the first verification voice as follows: determine the timbre frequency of the first verification voice; determine a conversion coefficient and a correction parameter according to the timbre parameters of the user speech information and of the first verification voice; and determine the reference timbre frequency according to the timbre frequency of the first verification voice, the conversion coefficient, the correction parameter, and the timbre frequency conversion rule.
In one embodiment, the timbre parameters in the user speech information include a first mean and a first variance determined from the timbre frequency of a target time period in the user speech information, and the timbre parameters of the first verification voice include a second mean and a second variance determined from the timbre frequency of the target time period of the first verification voice. The processing unit 402 determines the conversion coefficient and the correction parameter according to the timbre parameters of the user speech information and of the first verification voice as follows: determine the conversion coefficient based on the first variance, the second variance, and the preset conversion-coefficient determination rule; determine the correction parameter based on the first mean, the second mean, and the preset correction-parameter determination rule.
In one embodiment, the processing unit 402 finds the first verification voice matching the voice content included in the user speech information as follows: obtain the voice content included in the user speech information; search a first-verification-voice set, according to that voice content, for a target first verification voice whose included voice content matches the voice content included in the user speech information; and use the target first verification voice as the first verification voice matching the voice content included in the user speech information.
In one embodiment, the processing unit 402 is further configured to play the second verification voice so that the user can correct his or her speech based on the second verification voice.
In one embodiment, the processing unit 402 is further configured to: obtain the difference information between the second verification voice and the user speech information; and generate correction prompt information based on the difference information so that the user can correct his or her speech based on the correction prompt information.
In one embodiment, the processing unit 402 is further configured to: generate a user speech curve from the user speech information; generate a verification voice curve from the second verification voice; and display the user speech curve and the verification voice curve on a user interface so that the user can correct his or her speech based on the user speech curve and the verification voice curve.
In one embodiment, the processing unit 402 generates the second verification voice matching the voice content included in the user speech information based on the reference timbre frequency as follows: adjust the timbre frequency of the corresponding time period of the first verification voice based on the reference timbre frequency to obtain the second verification voice.
In the embodiment of the present invention, the acquiring unit 401 obtains the user speech information and the timbre parameters it includes; the processing unit 402 finds the first verification voice corresponding to the user speech information and the timbre parameters it includes; further, the processing unit 402 determines a reference timbre frequency based on the timbre parameters included in the user speech information and in the first verification voice, and then generates, based on the reference timbre frequency, a second verification voice matching the user speech information. This realizes generating a verification voice approximating the user's timbre, helping the user correct wrong pronunciations accurately and improving the efficiency of the user's spoken-language practice.
Referring to Fig. 5, a schematic structural diagram of a smart device provided by an embodiment of the present invention. The smart device shown in Fig. 5 includes one or more processors 501 and one or more memories 502; the processor 501 and the memory 502 are connected through a bus 503. The memory 502 is used to store a computer program including program instructions, and the processor 501 is used to execute the program instructions stored in the memory 502.
The memory 502 may include volatile memory, such as random-access memory (RAM); the memory 502 may also include non-volatile memory, such as flash memory or a solid-state drive (SSD); the memory 502 may also include a combination of the above kinds of memory.
The processor 501 may be a central processing unit (CPU). The processor 501 may further include a hardware chip, which may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like; the PLD may be a field-programmable gate array (FPGA), generic array logic (GAL), or the like. The processor 501 may also be a combination of the above structures.
In the embodiment of the present invention, the memory 502 is used to store a computer program including program instructions, and the processor 501 is used to execute the program instructions stored in the memory 502 to realize the steps of the related methods in the above speech processing method embodiments.
In one embodiment, the processor 501 is configured to call the program instructions to: obtain user speech information, and obtain timbre parameters in the user speech information; look up a first verification voice matching the speech content included in the user speech information, and obtain timbre parameters of the first verification voice; determine a reference timbre frequency based on the timbre parameters in the user speech information and the timbre parameters of the first verification voice; and generate, based on the reference timbre frequency, a second verification voice matching the speech content included in the user speech information.
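Read together, the four configured operations form a pipeline. The sketch below is a hypothetical end-to-end skeleton with stubbed stages; none of the function names or bodies are prescribed by the patent, and the stubs only stand in for the real signal processing:

```python
def process_user_speech(user_audio, verification_set):
    """Skeleton of the claimed flow: extract timbre, match content,
    derive a reference timbre frequency, synthesize the second voice.
    Each stage is an illustrative stub, not the patented algorithm."""
    user_timbre = extract_timbre(user_audio)                     # step 1
    first_voice = match_content(user_audio, verification_set)    # step 2
    verif_timbre = extract_timbre(first_voice)
    ref_freq = derive_reference_freq(user_timbre, verif_timbre)  # step 3
    return synthesize(first_voice, ref_freq)                     # step 4

# Illustrative stub implementations so the skeleton runs end to end.
def extract_timbre(audio):
    return {"mean_f0": sum(audio) / len(audio)}

def match_content(audio, vset):
    return vset["matched"]

def derive_reference_freq(u, v):
    return (u["mean_f0"] + v["mean_f0"]) / 2

def synthesize(voice, ref_freq):
    return {"source": voice, "f0": ref_freq}
```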
In one embodiment, the timbre parameters include a fundamental-frequency (pitch) parameter and an overtone parameter, and the reference timbre frequency includes a reference fundamental frequency and a reference overtone frequency.
In one embodiment, before the processor 501 determines the reference timbre frequency based on the timbre parameters in the user speech information and the timbre parameters of the first verification voice, the processor 501 is further configured to call the program instructions to: score the similarity of the user speech information against the first verification voice to obtain a similarity score result; and if the similarity score result meets a timbre adjustment condition, execute the step of determining the reference timbre frequency based on the timbre parameters in the user speech information and the timbre parameters of the first verification voice.
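A minimal sketch of this gating logic, assuming a similarity score in [0, 1] and a hypothetical threshold of 0.6 as the timbre adjustment condition. The patent does not specify a concrete scoring function; text similarity on recognized transcripts is an illustrative stand-in for acoustic comparison:

```python
from difflib import SequenceMatcher

def meets_adjustment_condition(user_text: str, verification_text: str,
                               threshold: float = 0.6) -> bool:
    """Gate the timbre conversion on a similarity score result.

    SequenceMatcher on recognized text is a stand-in scorer; a real
    system would compare acoustic features, not transcripts.
    """
    score = SequenceMatcher(None, user_text, verification_text).ratio()
    return score >= threshold
```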
In one embodiment, when the processor 501 determines the reference timbre frequency based on the timbre parameters in the user speech information and the timbre parameters of the first verification voice, the processor 501 is specifically configured to: determine the timbre frequency of the first verification voice; determine a conversion coefficient and a correction parameter according to the timbre parameters in the user speech information and the timbre parameters of the first verification voice; and determine the reference timbre frequency according to the timbre frequency of the first verification voice, the conversion coefficient, the correction parameter, and a timbre frequency conversion rule.
In one embodiment, the timbre parameters in the user speech information include a first mean and a first variance determined from the timbre frequency of a target time period in the user speech information, and the timbre parameters of the first verification voice include a second mean and a second variance determined from the timbre frequency of the target time period of the first verification voice. When the processor 501 determines the conversion coefficient and the correction parameter according to the timbre parameters in the user speech information and the timbre parameters of the first verification voice, the processor 501 is specifically configured to: determine the conversion coefficient based on the first variance, the second variance, and a preset conversion-coefficient determination rule; and determine the correction parameter based on the first mean, the second mean, and a preset correction-parameter determination rule.
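One common instantiation of such mean/variance rules is Gaussian normalization of the fundamental-frequency contour. This is a hedged illustration, since the patent does not fix the exact determination rules: here the conversion coefficient is assumed to be the ratio of standard deviations and the correction parameter an offset that aligns the means, which is one plausible reading of this embodiment, not the only one.

```python
import statistics

def conversion_params(user_f0, verif_f0):
    """Derive a scale (conversion coefficient) and offset (correction
    parameter) mapping verification-voice F0 toward the user's range.

    Assumes the classic Gaussian-normalization rule:
        f_ref = a * f_verif + b
        a = sigma_user / sigma_verif
        b = mu_user - a * mu_verif
    """
    mu_u, sd_u = statistics.mean(user_f0), statistics.pstdev(user_f0)
    mu_v, sd_v = statistics.mean(verif_f0), statistics.pstdev(verif_f0)
    a = sd_u / sd_v            # conversion coefficient (variance rule)
    b = mu_u - a * mu_v        # correction parameter (mean rule)
    return a, b

def to_reference_frequency(f, a, b):
    """Apply the timbre frequency conversion rule to one frequency."""
    return a * f + b
```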
In one embodiment, when the processor 501 looks up the first verification voice matching the speech content included in the user speech information, the processor 501 is specifically configured to: obtain the speech content included in the user speech information; look up, from a first-verification-voice set according to the speech content included in the user speech information, a target first verification voice whose speech content matches the speech content included in the user speech information; and take the target first verification voice as the first verification voice matching the speech content included in the user speech information.
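A minimal sketch of the lookup, assuming the first-verification-voice set is keyed by recognized text. The patent does not specify the matching mechanism; exact matching on normalized transcripts is an illustrative simplification:

```python
def lookup_first_voice(user_text, verification_set):
    """Return the target first verification voice whose speech content
    matches the user's recognized speech content, or None if no entry
    in the set matches.

    verification_set: dict mapping transcript -> audio reference.
    """
    normalized = user_text.strip().lower()
    return verification_set.get(normalized)
```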
In one embodiment, after the processor 501 generates, based on the reference timbre frequency, the second verification voice matching the speech content included in the user speech information, the processor 501 is further configured to call the program instructions to: play the second verification voice, so that the user can correct the user speech based on the second verification voice.
In one embodiment, after the processor 501 generates, based on the reference timbre frequency, the second verification voice matching the speech content included in the user speech information, the processor 501 is further configured to call the program instructions to: obtain difference information between the second verification voice and the user speech information; and generate a correction prompt based on the difference information, so that the user can correct the user speech based on the correction prompt.
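Frame-wise pitch differences are one plausible form of such difference information. The sketch below is illustrative only: the tolerance value and the prompt wording are invented, and a real system might also compare duration, energy, or formants.

```python
def correction_prompts(user_f0, verif_f0, tolerance_hz=15.0):
    """Compare per-frame F0 contours and emit human-readable prompts.

    Frames where the user deviates by more than `tolerance_hz` from the
    second verification voice produce a 'raise'/'lower' pitch hint.
    """
    prompts = []
    for i, (u, v) in enumerate(zip(user_f0, verif_f0)):
        diff = u - v
        if abs(diff) > tolerance_hz:
            direction = "lower" if diff > 0 else "raise"
            prompts.append(f"frame {i}: {direction} pitch by {abs(diff):.0f} Hz")
    return prompts
```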
In one embodiment, after the processor 501 generates, based on the reference timbre frequency, the second verification voice matching the speech content included in the user speech information, the processor 501 is further configured to call the program instructions to: generate a user speech curve according to the user speech information; generate a verification voice curve according to the second verification voice; and display the user speech curve and the verification voice curve on a user interface, so that the user can correct the user speech based on the user speech curve and the verification voice curve.
In one embodiment, when the processor 501 generates, based on the reference timbre frequency, the second verification voice matching the speech content included in the user speech information, the processor 501 is specifically configured to: adjust the timbre frequency of the corresponding time period of the first verification voice based on the reference timbre frequency, to obtain the second verification voice.
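As an illustrative sketch of substituting the reference timbre frequency into a time period: a real system would use pitch-synchronous overlap-add or phase-vocoder processing on the first verification voice, whereas the pure-tone model below (duration, sample rate, and function name all invented) only demonstrates regenerating a segment at the target frequency.

```python
import math

def resynthesize_segment(reference_freq, duration_s=0.1, sample_rate=16000):
    """Regenerate one time period as a pure tone at the reference
    timbre frequency -- a toy stand-in for real pitch modification
    of the first verification voice."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * reference_freq * i / sample_rate)
            for i in range(n)]
```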
Those of ordinary skill in the art will appreciate that all or part of the processes in the above embodiment methods can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above disclosure is only a part of the embodiments of the present invention and certainly cannot be used to limit the scope of the claims of the present invention. Therefore, equivalent changes made in accordance with the claims of the present invention still fall within the scope of the present invention.
Claims (13)
1. A speech processing method, comprising:
obtaining user speech information, and obtaining timbre parameters in the user speech information;
looking up a first verification voice matching the speech content included in the user speech information, and obtaining timbre parameters of the first verification voice;
determining a reference timbre frequency based on the timbre parameters in the user speech information and the timbre parameters of the first verification voice; and
generating, based on the reference timbre frequency, a second verification voice matching the speech content included in the user speech information.
2. The method of claim 1, wherein the timbre parameters include a fundamental-frequency (pitch) parameter and an overtone parameter, and the reference timbre frequency includes a reference fundamental frequency and a reference overtone frequency.
3. The method of claim 1, wherein before determining the reference timbre frequency based on the timbre parameters in the user speech information and the timbre parameters of the first verification voice, the method further comprises:
scoring the similarity of the user speech information against the first verification voice to obtain a similarity score result; and
if the similarity score result meets a timbre adjustment condition, triggering execution of the step of determining the reference timbre frequency based on the timbre parameters in the user speech information and the timbre parameters of the first verification voice.
4. The method of claim 1, wherein determining the reference timbre frequency based on the timbre parameters in the user speech information and the timbre parameters of the first verification voice comprises:
determining the timbre frequency of the first verification voice;
determining a conversion coefficient and a correction parameter according to the timbre parameters in the user speech information and the timbre parameters of the first verification voice; and
determining the reference timbre frequency according to the timbre frequency of the first verification voice, the conversion coefficient, the correction parameter, and a timbre frequency conversion rule.
5. The method of claim 4, wherein the timbre parameters in the user speech information include a first mean and a first variance determined from the timbre frequency of a target time period in the user speech information, and the timbre parameters of the first verification voice include a second mean and a second variance determined from the timbre frequency of the target time period of the first verification voice; and
wherein determining the conversion coefficient and the correction parameter according to the timbre parameters in the user speech information and the timbre parameters of the first verification voice comprises:
determining the conversion coefficient based on the first variance, the second variance, and a preset conversion-coefficient determination rule; and
determining the correction parameter based on the first mean, the second mean, and a preset correction-parameter determination rule.
6. The method of claim 1, wherein looking up the first verification voice matching the speech content included in the user speech information comprises:
obtaining the speech content included in the user speech information;
looking up, from a first-verification-voice set according to the speech content included in the user speech information, a target first verification voice whose speech content matches the speech content included in the user speech information; and
taking the target first verification voice as the first verification voice matching the speech content included in the user speech information.
7. The method of claim 1, wherein after generating, based on the reference timbre frequency, the second verification voice matching the speech content included in the user speech information, the method further comprises:
playing the second verification voice, so that the user can correct the user speech based on the second verification voice.
8. The method of claim 1, wherein after generating, based on the reference timbre frequency, the second verification voice matching the speech content included in the user speech information, the method further comprises:
obtaining difference information between the second verification voice and the user speech information; and
generating a correction prompt based on the difference information, so that the user can correct the user speech based on the correction prompt.
9. The method of claim 1, wherein after generating, based on the reference timbre frequency, the second verification voice matching the speech content included in the user speech information, the method further comprises:
generating a user speech curve according to the user speech information;
generating a verification voice curve according to the second verification voice; and
displaying the user speech curve and the verification voice curve on a user interface, so that the user can correct the user speech based on the user speech curve and the verification voice curve.
10. The method of claim 1, wherein generating, based on the reference timbre frequency, the second verification voice matching the speech content included in the user speech information comprises:
adjusting the timbre frequency of the corresponding time period of the first verification voice based on the reference timbre frequency, to obtain the second verification voice.
11. A speech processing apparatus, comprising:
an acquiring unit, configured to obtain user speech information and obtain timbre parameters in the user speech information; and
a processing unit, configured to look up a first verification voice matching the speech content included in the user speech information;
wherein the acquiring unit is further configured to obtain timbre parameters of the first verification voice;
the processing unit is further configured to determine a reference timbre frequency based on the timbre parameters in the user speech information and the timbre parameters of the first verification voice; and
the processing unit is further configured to generate, based on the reference timbre frequency, a second verification voice matching the speech content included in the user speech information.
12. A smart device, comprising a processor and a memory, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to call the program instructions to execute the speech processing method of any one of claims 1-10.
13. A computer storage medium storing computer program instructions that, when executed by a processor, cause the processor to perform the speech processing method of any one of claims 1-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811452305.0A CN110164414B (en) | 2018-11-30 | 2018-11-30 | Voice processing method and device and intelligent equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811452305.0A CN110164414B (en) | 2018-11-30 | 2018-11-30 | Voice processing method and device and intelligent equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110164414A true CN110164414A (en) | 2019-08-23 |
CN110164414B CN110164414B (en) | 2023-02-14 |
Family
ID=67645231
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811452305.0A Active CN110164414B (en) | 2018-11-30 | 2018-11-30 | Voice processing method and device and intelligent equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110164414B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111916106A (en) * | 2020-08-17 | 2020-11-10 | 牡丹江医学院 | Method for improving pronunciation quality in English teaching |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI277947B (en) * | 2005-09-14 | 2007-04-01 | Delta Electronics Inc | Interactive speech correcting method |
CN101359473A (en) * | 2007-07-30 | 2009-02-04 | 国际商业机器公司 | Auto speech conversion method and apparatus |
CN103531205A (en) * | 2013-10-09 | 2014-01-22 | 常州工学院 | Asymmetrical voice conversion method based on deep neural network feature mapping |
CN103594087A (en) * | 2013-11-08 | 2014-02-19 | 安徽科大讯飞信息科技股份有限公司 | Method and system for improving oral evaluation performance |
US20140170613A1 (en) * | 2011-05-10 | 2014-06-19 | Cooori Ehf | Language Learning System Adapted to Personalize Language Learning to Individual Users |
CN103886859A (en) * | 2014-02-14 | 2014-06-25 | 河海大学常州校区 | Voice conversion method based on one-to-many codebook mapping |
CN104123933A (en) * | 2014-08-01 | 2014-10-29 | 中国科学院自动化研究所 | Self-adaptive non-parallel training based voice conversion method |
CN105786801A (en) * | 2014-12-22 | 2016-07-20 | 中兴通讯股份有限公司 | Speech translation method, communication method and related device |
CN105845125A (en) * | 2016-05-18 | 2016-08-10 | 百度在线网络技术(北京)有限公司 | Speech synthesis method and speech synthesis device |
CN106128477A (en) * | 2016-06-23 | 2016-11-16 | 南阳理工学院 | A kind of spoken identification correction system |
CN107424450A (en) * | 2017-08-07 | 2017-12-01 | 英华达(南京)科技有限公司 | Pronunciation correction system and method |
CN108198566A (en) * | 2018-01-24 | 2018-06-22 | 咪咕文化科技有限公司 | Information processing method and device, electronic device and storage medium |
CN108510995A (en) * | 2018-02-06 | 2018-09-07 | 杭州电子科技大学 | Identity information hidden method towards voice communication |
CN108806719A (en) * | 2018-06-19 | 2018-11-13 | 合肥凌极西雅电子科技有限公司 | Interacting language learning system and its method |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI277947B (en) * | 2005-09-14 | 2007-04-01 | Delta Electronics Inc | Interactive speech correcting method |
CN101359473A (en) * | 2007-07-30 | 2009-02-04 | 国际商业机器公司 | Auto speech conversion method and apparatus |
US20140170613A1 (en) * | 2011-05-10 | 2014-06-19 | Cooori Ehf | Language Learning System Adapted to Personalize Language Learning to Individual Users |
CN103531205A (en) * | 2013-10-09 | 2014-01-22 | 常州工学院 | Asymmetrical voice conversion method based on deep neural network feature mapping |
CN103594087A (en) * | 2013-11-08 | 2014-02-19 | 安徽科大讯飞信息科技股份有限公司 | Method and system for improving oral evaluation performance |
CN103886859A (en) * | 2014-02-14 | 2014-06-25 | 河海大学常州校区 | Voice conversion method based on one-to-many codebook mapping |
CN104123933A (en) * | 2014-08-01 | 2014-10-29 | 中国科学院自动化研究所 | Self-adaptive non-parallel training based voice conversion method |
CN105786801A (en) * | 2014-12-22 | 2016-07-20 | 中兴通讯股份有限公司 | Speech translation method, communication method and related device |
CN105845125A (en) * | 2016-05-18 | 2016-08-10 | 百度在线网络技术(北京)有限公司 | Speech synthesis method and speech synthesis device |
WO2017197809A1 (en) * | 2016-05-18 | 2017-11-23 | 百度在线网络技术(北京)有限公司 | Speech synthesis method and speech synthesis device |
CN106128477A (en) * | 2016-06-23 | 2016-11-16 | 南阳理工学院 | A kind of spoken identification correction system |
CN107424450A (en) * | 2017-08-07 | 2017-12-01 | 英华达(南京)科技有限公司 | Pronunciation correction system and method |
CN108198566A (en) * | 2018-01-24 | 2018-06-22 | 咪咕文化科技有限公司 | Information processing method and device, electronic device and storage medium |
CN108510995A (en) * | 2018-02-06 | 2018-09-07 | 杭州电子科技大学 | Identity information hidden method towards voice communication |
CN108806719A (en) * | 2018-06-19 | 2018-11-13 | 合肥凌极西雅电子科技有限公司 | Interacting language learning system and its method |
Non-Patent Citations (2)
Title |
---|
HSIEN-CHENG LIAO et al.: "pronouciation correction tone", SYSTEM *
ZHAO Bo et al.: "English spoken-language teaching system based on speech recognition technology", 《计算机应用》 (Journal of Computer Applications) *
Also Published As
Publication number | Publication date |
---|---|
CN110164414B (en) | 2023-02-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109949783B (en) | Song synthesis method and system | |
CN110148427A (en) | Audio-frequency processing method, device, system, storage medium, terminal and server | |
CN106531185B (en) | voice evaluation method and system based on voice similarity | |
US8392190B2 (en) | Systems and methods for assessment of non-native spontaneous speech | |
US7299188B2 (en) | Method and apparatus for providing an interactive language tutor | |
US20190130894A1 (en) | Text-based insertion and replacement in audio narration | |
US10134300B2 (en) | System and method for computer-assisted instruction of a music language | |
US11335324B2 (en) | Synthesized data augmentation using voice conversion and speech recognition models | |
US8447603B2 (en) | Rating speech naturalness of speech utterances based on a plurality of human testers | |
US20110123965A1 (en) | Speech Processing and Learning | |
Peabody | Methods for pronunciation assessment in computer aided language learning | |
CN104537926B (en) | Listen barrier childrenese training accessory system and method | |
Ahsiah et al. | Tajweed checking system to support recitation | |
CN110246489A (en) | Audio recognition method and system for children | |
CN110598208A (en) | AI/ML enhanced pronunciation course design and personalized exercise planning method | |
CN110164414A (en) | Method of speech processing, device and smart machine | |
JP2017167526A (en) | Multiple stream spectrum expression for synthesis of statistical parametric voice | |
Badenhorst et al. | The limitations of data perturbation for ASR of learner data in under-resourced languages | |
van Doremalen | Developing automatic speech recognition-enabled language learning applications: from theory to practice | |
Mendes et al. | Speaker identification using phonetic segmentation and normalized relative delays of source harmonics | |
CN111179902B (en) | Speech synthesis method, equipment and medium for simulating resonance cavity based on Gaussian model | |
CN112967538B (en) | English pronunciation information acquisition system | |
Raitio | Voice source modelling techniques for statistical parametric speech synthesis | |
Oyo et al. | A preliminary speech learning tool for improvement of African English accents | |
Danis | Developing successful speakers for an automatic speech recognition system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||