CN113345467B - Spoken language pronunciation evaluation method, device, medium and equipment - Google Patents
Spoken language pronunciation evaluation method, device, medium and equipment
- Publication number
- CN113345467B (application CN202110545441.XA)
- Authority
- CN
- China
- Prior art keywords
- evaluated
- acoustic
- disturbance
- acoustic feature
- frequency band
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a spoken language pronunciation evaluation method, device, medium and equipment. The method comprises the following steps: acquiring audio to be evaluated and text to be evaluated from the spoken language to be evaluated; extracting a first acoustic feature from the audio to be evaluated, and generating a second acoustic feature by applying a frequency disturbance to the first acoustic feature; generating a phoneme sequence from the text to be evaluated, and combining the phoneme sequence with an HMM model to generate a decoding network; and inputting the second acoustic feature into the decoding network to obtain acoustic information, and performing the GOP scoring calculation by using the acoustic information. The second acoustic feature is obtained by performing pre-emphasis, windowing and framing, and frequency-domain random disturbance on the audio features, so that the signal distortion caused by front-end signal processing is simulated and the extraction performance of the audio features in an actual noisy environment is improved. A decoding network is constructed from the text to be evaluated and word pronunciations are generated in combination with the context, improving the pronunciation evaluation accuracy under special pronunciation phenomena and thereby ensuring the accuracy of spoken language pronunciation evaluation.
Description
Technical Field
The invention belongs to the field of language identification, and particularly relates to a spoken language pronunciation evaluation method, device, medium and equipment.
Background
Computer-aided pronunciation scoring is an automatic method for evaluating pronunciation level, from which language learners can obtain real-time feedback on their pronunciation accuracy.
Mainstream computer-aided pronunciation scoring systems are based on an automatic speech recognition framework and generally comprise three parts: an acoustic model, decoding, and GOP scoring. The basic idea is to compute acoustic information such as the phoneme likelihoods, posterior probabilities, and durations of the audio to be evaluated in a decoding network, and then use this acoustic information to calculate the GOP score.
However, in practical applications, the current methods have the following drawbacks:
(1) The acoustic model is usually trained on standard audio recorded in quiet scenes, so the pronunciation evaluation technology is usually limited to quiet environments. In a complex human-voice environment such as a noisy classroom, the audio output by front-end signal processing is usually sent directly to the evaluation module; the speech distortion introduced by the front-end signal processing then causes serious degradation of evaluation performance, making the technology difficult to use in an actual English classroom.
(2) The decoding network is constructed based on a pronunciation dictionary to obtain the phoneme sequence of each word. In practice, however, the reasonable phoneme sequence of the same word may differ across contexts, which can lead to erroneous judgments under special pronunciation phenomena such as liaison or loss of plosion.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a spoken language pronunciation evaluation method, device, medium and equipment, which perform data amplification on the audio features to simulate the signal distortion caused by front-end signal processing, improving the extraction performance of the audio features in an actual noisy environment, and which obtain more reasonable phoneme information through a context-aware decoding network constructed from the text to be evaluated, thereby improving pronunciation evaluation accuracy.
In order to achieve the above objective, an embodiment of the present invention provides a spoken language pronunciation evaluation method, comprising the following steps:
acquiring audio to be evaluated and text to be evaluated from a spoken language to be evaluated;
extracting a first acoustic feature from the audio to be evaluated, and generating a second acoustic feature after frequency disturbance of the first acoustic feature;
generating a phoneme sequence from the text to be evaluated, and then combining the phoneme sequence with an HMM model to generate a decoding network;
and inputting the second acoustic characteristics into a decoding network to obtain acoustic information, and performing GOP scoring calculation by using the acoustic information.
Further, the method of extracting the first acoustic feature from the audio to be evaluated comprises: performing pre-emphasis, windowing and framing on the audio to be evaluated, and using the output of a mel-spectrum filter bank as the first acoustic feature.
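As a sketch, the pre-emphasis, windowing/framing, and mel filter-bank pipeline described above can be written as follows in Python. The specific parameter values (0.97 pre-emphasis coefficient, 25 ms frames, 10 ms hop, 40 mel bands, 512-point FFT) are common defaults and are not specified by the patent.

```python
import numpy as np

def first_acoustic_feature(signal, sr=16000, pre_emph=0.97,
                           frame_ms=25, hop_ms=10, n_mels=40, n_fft=512):
    # Pre-emphasis: boost high frequencies relative to low ones.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    frame_len = int(sr * frame_ms / 1000)   # 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)           # 160 samples at 16 kHz
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([emphasized[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Power spectrum of each windowed frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular mel filter bank spanning 0 .. sr/2.
    mel_max = 2595.0 * np.log10(1.0 + (sr / 2) / 700.0)
    hz_pts = 700.0 * (10.0 ** (np.linspace(0, mel_max, n_mels + 2) / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    # Log filter-bank energies: one feature vector per frame.
    return np.log(power @ fbank.T + 1e-10)
```

The returned matrix (one row per frame, one column per mel band) plays the role of the first acoustic feature to which the frequency disturbance is then applied.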
Further, the method for generating the second acoustic feature after frequency disturbance of the first acoustic feature is as follows:
S100, randomly generating a starting frequency band number from a uniform distribution according to the set starting disturbance coefficient: i ~ uniform(0, ratio1 × F), wherein i is the generated starting frequency band number, F is the maximum frequency band index of the input feature, and ratio1 is the starting disturbance coefficient;
S101, randomly generating a disturbance frequency bandwidth from a uniform distribution according to the set frequency disturbance coefficient: K ~ uniform(0, ratio2 × F), wherein K is the generated disturbance frequency bandwidth, F is the maximum frequency band index of the input feature, and ratio2 is the frequency band disturbance coefficient;
S102, weighting the selected [i, i+K] frequency bands to generate the second acoustic feature.
Further, the step of generating a phoneme sequence from the text to be evaluated is as follows:
S200, dividing the text to be evaluated into sense groups by using a pre-trained spoken-language position prediction model to obtain the sense-group boundaries; S201, within each sense group, giving the phoneme sequence of the whole group in combination with the predicted pronunciation rules such as liaison and loss of plosion, while recording the correspondence between the phoneme sequence and the words for subsequent word-level score output; wherein,
when modeling with position-dependent phonemes, for adjacent words connected by liaison or loss of plosion, the generated phoneme sequence uses the middle-phoneme form for all phonemes except the head and tail phonemes.
Further, the acoustic information obtained by inputting the second acoustic feature into the decoding network includes a phoneme likelihood, a posterior probability, and a duration.
An embodiment of the present invention provides a spoken language pronunciation evaluation device, including:
the acquisition module is configured to acquire audio to be evaluated and text to be evaluated from the spoken language to be evaluated;
the feature extraction module is configured to extract a first acoustic feature from the audio to be evaluated, and generate a second acoustic feature after frequency disturbance of the first acoustic feature;
the decoding network module is configured to generate a phoneme sequence from the text to be evaluated, and then combine the phoneme sequence with an HMM model to generate a decoding network;
and the GOP scoring module is configured to input the second acoustic feature into the decoding network to obtain acoustic information, and then perform the score calculation by using the acoustic information.
An embodiment of the present invention provides a computer-readable storage medium storing program code which, when executed by a processor, implements the steps of the spoken utterance evaluation method described above.
An embodiment of the present invention provides an electronic device including a processor and a storage medium storing program code which, when executed by the processor, implements the steps of a spoken utterance evaluation method as described above.
Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:
According to the spoken language pronunciation evaluation method, device, medium and equipment, the second acoustic feature is obtained by performing pre-emphasis, windowing and framing, and frequency-domain random disturbance on the audio features, so that the signal distortion caused by front-end signal processing is simulated and the extraction performance of the audio features in an actual noisy environment is improved; in addition, a decoding network is constructed from the text to be evaluated and word pronunciations are generated in combination with the context, improving the pronunciation evaluation accuracy under special pronunciation phenomena and thereby ensuring the accuracy of spoken language pronunciation evaluation.
Drawings
The technical scheme of the invention is further described below with reference to the accompanying drawings:
FIG. 1 is a flow chart of a spoken utterance evaluation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of sense-group division boundaries in a spoken utterance evaluation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a phoneme sequence in a spoken utterance evaluation method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a spoken language pronunciation evaluation device according to an embodiment of the present invention.
Detailed Description
The invention will be described in further detail with reference to the accompanying drawings and specific examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limiting the invention.
Referring to fig. 1, a spoken language pronunciation evaluation method according to an embodiment of the invention includes the following steps:
s001, acquiring audio to be evaluated and text to be evaluated from the spoken language to be evaluated.
After the audio to be evaluated and the text to be evaluated are obtained, S002 or S003 may be executed, and in this embodiment, the step S002 is executed first: extracting a first acoustic feature from the audio to be evaluated, and generating a second acoustic feature after frequency disturbance of the first acoustic feature.
Specifically, after the audio to be evaluated is extracted, pre-emphasis is firstly carried out on the audio to be evaluated, windowing and framing are carried out, and the output of a Mel frequency spectrum filter bank is used as a first acoustic feature; then, frequency domain random disturbance is carried out on the first acoustic feature to generate a second acoustic feature, and the operation method is as follows:
S100, randomly generating a starting frequency band number from a uniform distribution according to the set starting disturbance coefficient: i ~ uniform(0, ratio1 × F), wherein i is the generated starting frequency band number, F is the maximum frequency band index of the input feature, and ratio1 is the starting disturbance coefficient.
S101, randomly generating a disturbance frequency bandwidth from a uniform distribution according to the set frequency disturbance coefficient: K ~ uniform(0, ratio2 × F), wherein K is the generated disturbance frequency bandwidth, F is the maximum frequency band index of the input feature, and ratio2 is the frequency band disturbance coefficient.
S102, weighting the selected [i, i+K] frequency bands to generate the second acoustic feature; the selected frequency bands may use a uniform weighting coefficient, or randomly generated weighting coefficients.
In addition, on the basis of the above steps, all frequency bands may be divided into several blocks, with each block using its own starting disturbance coefficient, frequency band disturbance coefficient and weighting parameters, so that the disturbance is applied at a finer granularity across the frequency bands.
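Steps S100-S102, including the optional block-wise variant, can be sketched as follows. The value ranges for ratio1/ratio2 and the weight interval (0.5-1.5) are illustrative assumptions; the patent only requires uniform sampling of the start band and bandwidth, and either a uniform or a randomly generated weighting coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)

def frequency_perturb(feat, ratio1=0.3, ratio2=0.3, n_blocks=1,
                      random_weights=True):
    """Frequency-domain random disturbance of a (frames x bands) feature
    matrix: pick i ~ uniform(0, ratio1*F) and K ~ uniform(0, ratio2*F),
    then weight bands [i, i+K] (per block when n_blocks > 1)."""
    F = feat.shape[1]                      # number of frequency bands
    out = feat.copy()
    for block in np.array_split(np.arange(F), n_blocks):
        Fb = len(block)
        i = rng.integers(0, max(int(ratio1 * Fb), 1))   # S100: start band
        K = rng.integers(0, max(int(ratio2 * Fb), 1))   # S101: bandwidth
        lo, hi = block[0] + i, min(block[0] + i + K + 1, block[-1] + 1)
        if random_weights:
            w = rng.uniform(0.5, 1.5, size=hi - lo)     # per-band random weights
        else:
            w = 0.8                                     # one uniform weight
        out[:, lo:hi] *= w                              # S102: weight the bands
    return out
```

Applied to the first acoustic feature during training or evaluation, this yields the second acoustic feature that simulates front-end distortion.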
In S002, the conventional quiet-scene data are augmented through the above steps; that is, the audio features are pre-emphasized, windowed and framed, and subjected to frequency-domain random disturbance to obtain the second acoustic feature, so as to simulate the signal distortion caused by front-end signal processing and improve the extraction performance of the audio features in an actual noisy environment.
Then, S003: generating a phoneme sequence from the text to be evaluated, and combining the phoneme sequence with an HMM model to generate a decoding network.
Specifically, the steps of generating a phoneme sequence from the text to be evaluated are as follows:
S200, dividing the text to be evaluated into sense groups by using a pre-trained spoken-language position prediction model to obtain the sense-group boundaries, as shown in FIG. 2. S201, within each sense group, giving the phoneme sequence of the whole group in combination with the predicted pronunciation rules such as liaison and loss of plosion, while recording the correspondence between the phoneme sequence and the words for subsequent word-level score output.
When modeling with position-dependent phonemes, for adjacent words connected by liaison or loss of plosion, all generated phonemes except the head and tail phonemes use the middle-phoneme form; see FIG. 3, in which p_b denotes a starting phoneme, p_i an intermediate phoneme, and p_e an ending phoneme.
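The position-dependent labelling described above can be sketched as follows. The `lexicon` mapping and the `linked` list of word boundaries joined by liaison or loss of plosion are hypothetical inputs here, since the patent obtains them from a pronunciation dictionary and a trained prediction model.

```python
def group_phoneme_sequence(words, lexicon, linked):
    """Tag the phonemes of one sense group with position markers
    (p_b = begin, p_i = middle, p_e = end). `linked` lists the indices j
    of word boundaries (between words[j] and words[j+1]) joined by
    liaison or loss of plosion; at those boundaries the word-final and
    word-initial phonemes are retagged as middle phonemes."""
    tagged, mapping = [], []
    for w in words:
        ph = lexicon[w]
        start = len(tagged)
        # Default word-position tags: begin / middle / end.
        tags = (["_b"] + ["_i"] * (len(ph) - 2) + ["_e"]) if len(ph) > 1 else ["_b"]
        tagged.extend(p + t for p, t in zip(ph, tags))
        # Record phoneme indices per word for word-level score output.
        mapping.append((w, list(range(start, len(tagged)))))
    for j in linked:
        end_idx = mapping[j][1][-1]        # last phoneme of word j
        begin_idx = mapping[j + 1][1][0]   # first phoneme of word j+1
        tagged[end_idx] = tagged[end_idx][:-2] + "_i"
        tagged[begin_idx] = tagged[begin_idx][:-2] + "_i"
    return tagged, mapping
```

For example, with a linked boundary between "sit" and "down", the final /t/ and initial /d/ are emitted as middle phonemes rather than an end and a begin phoneme, matching the loss-of-plosion case in FIG. 3.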
Finally, S004: inputting the second acoustic features into the decoding network to obtain acoustic information, and then performing the GOP scoring calculation with that information. In this embodiment, the acoustic information obtained from the decoding network comprises phoneme likelihood, posterior probability, and duration information, and the GOP calculation uses this acoustic information to produce the final score.
In S004, the decoding network is constructed from the text to be evaluated and word pronunciations are generated in combination with the context, improving the pronunciation evaluation accuracy under special pronunciation phenomena and thus ensuring the accuracy of spoken language pronunciation evaluation.
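The patent does not give an explicit GOP formula, so the following sketch uses a common definition of GOP: the average log posterior of the aligned phoneme over its duration. The input formats (a frame-by-phoneme posterior matrix and a forced-alignment list) are assumptions about what the decoding network produces.

```python
import numpy as np

def gop_scores(frame_posteriors, alignment):
    """frame_posteriors: (n_frames, n_phones) posteriors from the acoustic
    model; alignment: list of (label, phone_id, start_frame, end_frame)
    tuples from the decoding network's forced alignment."""
    scores = []
    for label, pid, start, end in alignment:
        # Average log posterior of the aligned phoneme over its frames;
        # the small epsilon guards against log(0).
        avg = float(np.mean(np.log(frame_posteriors[start:end, pid] + 1e-10)))
        scores.append((label, avg))
    return scores
```

Together with the phoneme-to-word correspondence recorded in S201, these per-phoneme scores can be aggregated into the word-level scores mentioned in the embodiment.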
The invention also provides a spoken language pronunciation evaluation device, which comprises:
and the acquisition module is configured to acquire the audio to be evaluated and the text to be evaluated from the spoken language to be evaluated.
The feature extraction module is configured to extract a first acoustic feature from the audio to be evaluated, and generate a second acoustic feature after frequency disturbance of the first acoustic feature.
And the decoding network module is configured to generate a phoneme sequence from the text to be evaluated, and then combine the phoneme sequence with an HMM model to generate a decoding network.
And the GOP scoring module is configured to input the second acoustic feature into the decoding network to obtain acoustic information, and then perform the score calculation by using the acoustic information.
The invention also discloses a computer readable storage medium, on which a computer program (i.e. a program product) is stored which, when being executed by a processor, carries out the steps described in the above-mentioned method embodiments.
For example, audio to be evaluated and text to be evaluated are obtained from a spoken language to be evaluated; extracting a first acoustic feature from the audio to be evaluated, and generating a second acoustic feature after frequency disturbance of the first acoustic feature; generating a phoneme sequence from the text to be evaluated, and then combining the phoneme sequence with an HMM model to generate a decoding network; inputting the second acoustic characteristics into a decoding network to obtain acoustic information, and performing GOP scoring calculation by using the acoustic information; the specific implementation of each step is not repeated here.
Next, the invention also discloses an electronic device, which can be a computer system or a server; components of an electronic device may include, but are not limited to: one or more processors or processing units, a system memory, and a bus that connects the different system components (including the system memory and the processing units).
The foregoing is merely a specific application example of the present invention, and the protection scope of the present invention is not limited in any way. All technical schemes formed by equivalent transformation or equivalent substitution fall within the protection scope of the invention.
Claims (7)
1. A spoken language pronunciation evaluation method is characterized by comprising the following steps:
acquiring audio to be evaluated and text to be evaluated from a spoken language to be evaluated;
extracting a first acoustic feature from the audio to be evaluated, and generating a second acoustic feature after frequency disturbance of the first acoustic feature, wherein the method for generating the second acoustic feature is as follows:
S100, randomly generating a starting frequency band number from a uniform distribution according to the set starting disturbance coefficient: i ~ uniform(0, ratio1 × F); wherein i is the generated starting frequency band number, F is the maximum frequency band index of the input feature, and ratio1 is the starting disturbance coefficient;
S101, randomly generating a disturbance frequency bandwidth from a uniform distribution according to the set frequency disturbance coefficient: K ~ uniform(0, ratio2 × F); wherein K is the generated disturbance frequency bandwidth, F is the maximum frequency band index of the input feature, and ratio2 is the frequency band disturbance coefficient;
S102, weighting the selected [i, i+K] frequency bands to generate the second acoustic feature;
generating a phoneme sequence from the text to be evaluated, and then combining the phoneme sequence with an HMM model to generate a decoding network;
and inputting the second acoustic characteristics into a decoding network to obtain acoustic information, and performing GOP scoring calculation by using the acoustic information.
2. The spoken utterance evaluation method of claim 1, characterized in that the method of extracting the first acoustic feature from the audio to be evaluated comprises: performing pre-emphasis, windowing and framing on the audio to be evaluated, and using the output of a mel-spectrum filter bank as the first acoustic feature.
3. The spoken utterance evaluation method of claim 1, wherein the steps of generating the phoneme sequence from the text to be evaluated are as follows:
S200, dividing the text to be evaluated into sense groups by using a pre-trained spoken-language position prediction model to obtain the sense-group boundaries;
S201, within each sense group, giving the phoneme sequence of the whole group in combination with the predicted pronunciation rules such as liaison and loss of plosion, while recording the correspondence between the phoneme sequence and the words for subsequent word-level score output; wherein,
when modeling with position-dependent phonemes, for adjacent words connected by liaison or loss of plosion, the generated phoneme sequence uses the middle-phoneme form for all phonemes except the head and tail phonemes.
4. The spoken utterance evaluation method of claim 1, wherein the acoustic information obtained by inputting the second acoustic feature into the decoding network comprises a phoneme likelihood, a posterior probability, and a duration.
5. A spoken utterance evaluation device, comprising:
the acquisition module is configured to acquire audio to be evaluated and text to be evaluated from the spoken language to be evaluated;
the feature extraction module is configured to extract a first acoustic feature from the audio to be evaluated and to generate a second acoustic feature after frequency disturbance of the first acoustic feature, wherein the method for generating the second acoustic feature is as follows:
S100, randomly generating a starting frequency band number from a uniform distribution according to the set starting disturbance coefficient: i ~ uniform(0, ratio1 × F); wherein i is the generated starting frequency band number, F is the maximum frequency band index of the input feature, and ratio1 is the starting disturbance coefficient;
S101, randomly generating a disturbance frequency bandwidth from a uniform distribution according to the set frequency disturbance coefficient: K ~ uniform(0, ratio2 × F); wherein K is the generated disturbance frequency bandwidth, F is the maximum frequency band index of the input feature, and ratio2 is the frequency band disturbance coefficient;
S102, weighting the selected [i, i+K] frequency bands to generate the second acoustic feature;
the decoding network module is configured to generate a phoneme sequence from the text to be evaluated, and then combine the phoneme sequence with an HMM model to generate a decoding network;
and the GOP scoring module is configured to input the second acoustic characteristics into a decoding network to obtain acoustic information, and then score calculation is performed by utilizing the acoustic information.
6. A computer readable storage medium storing program code which, when executed by a processor, implements the method of one of claims 1-4.
7. An electronic device comprising a processor and a storage medium storing program code which, when executed by the processor, implements the method of one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110545441.XA CN113345467B (en) | 2021-05-19 | 2021-05-19 | Spoken language pronunciation evaluation method, device, medium and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110545441.XA CN113345467B (en) | 2021-05-19 | 2021-05-19 | Spoken language pronunciation evaluation method, device, medium and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113345467A CN113345467A (en) | 2021-09-03 |
CN113345467B true CN113345467B (en) | 2023-10-20 |
Family
ID=77469439
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110545441.XA Active CN113345467B (en) | 2021-05-19 | 2021-05-19 | Spoken language pronunciation evaluation method, device, medium and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113345467B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102044248A (en) * | 2009-10-10 | 2011-05-04 | 北京理工大学 | Objective evaluating method for audio quality of streaming media |
CN102044247A (en) * | 2009-10-10 | 2011-05-04 | 北京理工大学 | Objective evaluation method for VoIP speech |
CN106297828A (en) * | 2016-08-12 | 2017-01-04 | 苏州驰声信息科技有限公司 | The detection method of a kind of mistake utterance detection based on degree of depth study and device |
CN110033784A (en) * | 2019-04-10 | 2019-07-19 | 北京达佳互联信息技术有限公司 | A kind of detection method of audio quality, device, electronic equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070239444A1 (en) * | 2006-03-29 | 2007-10-11 | Motorola, Inc. | Voice signal perturbation for speech recognition |
-
2021
- 2021-05-19 CN CN202110545441.XA patent/CN113345467B/en active Active
Non-Patent Citations (1)
Title |
---|
No-reference objective assessment of network audio quality; Yang Jiajun; Information Science and Technology Series (Issue 03); pp. 1-92 *
Also Published As
Publication number | Publication date |
---|---|
CN113345467A (en) | 2021-09-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||