CN113345467B - Spoken language pronunciation evaluation method, device, medium and equipment - Google Patents

Spoken language pronunciation evaluation method, device, medium and equipment

Info

Publication number
CN113345467B
CN113345467B
Authority
CN
China
Prior art keywords
evaluated
acoustic
disturbance
acoustic feature
frequency band
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110545441.XA
Other languages
Chinese (zh)
Other versions
CN113345467A (en)
Inventor
王佳珺
杨悦
唐浩元
王欢良
代大明
张李
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Qdreamer Network Technology Co ltd
Original Assignee
Suzhou Qdreamer Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Qdreamer Network Technology Co ltd filed Critical Suzhou Qdreamer Network Technology Co ltd
Priority to CN202110545441.XA priority Critical patent/CN113345467B/en
Publication of CN113345467A publication Critical patent/CN113345467A/en
Application granted granted Critical
Publication of CN113345467B publication Critical patent/CN113345467B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 - Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a spoken language pronunciation evaluation method, device, medium and equipment. The method comprises the following steps: acquiring audio to be evaluated and text to be evaluated from the spoken language to be evaluated; extracting a first acoustic feature from the audio to be evaluated and generating a second acoustic feature by applying frequency disturbance to the first acoustic feature; generating a phoneme sequence from the text to be evaluated and combining the phoneme sequence with an HMM model to generate a decoding network; and inputting the second acoustic feature into the decoding network to obtain acoustic information, which is then used for GOP score calculation. Because the second acoustic feature is obtained by pre-emphasis, windowing and framing, and random frequency-domain disturbance of the audio features, the signal distortion caused by front-end signal processing is simulated, and feature extraction in actual noisy environments is improved. Because the decoding network is constructed from the text to be evaluated and word pronunciations are generated in context, evaluation accuracy under special pronunciation phenomena is improved, ensuring accurate spoken language pronunciation evaluation.

Description

Spoken language pronunciation evaluation method, device, medium and equipment
Technical Field
The invention belongs to the field of language identification, and particularly relates to a spoken language pronunciation evaluation method, device, medium and equipment.
Background
Computer-aided pronunciation scoring is an automatic pronunciation level evaluation method through which a language learner can obtain real-time feedback on pronunciation accuracy.
Mainstream computer-aided pronunciation scoring systems are based on an automatic speech recognition framework and generally comprise three parts: an acoustic model, decoding, and GOP scoring. The basic idea is to compute, in a decoding network, acoustic information such as phoneme likelihoods, posterior probabilities and durations for the audio to be evaluated, and then use this acoustic information to compute a GOP score.
However, in practical applications, the current methods have the following drawbacks:
(1) The acoustic model is usually trained on standard audio recorded in quiet scenes, so the pronunciation evaluation technique is usually limited to quiet environments. In a complex human-voice environment such as a noisy classroom, the audio produced by front-end signal processing is usually fed directly to the evaluation module, and the speech distortion introduced by that processing then severely degrades evaluation performance, making the technique difficult to use in an actual English classroom.
(2) The decoding network is constructed based on a pronunciation dictionary that gives the phoneme sequence of each word. In practice, however, the reasonable phoneme sequence of the same word may differ across contexts, which can lead to misjudgments under special pronunciation phenomena such as linking or loss of plosion.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a spoken language pronunciation evaluation method, device, medium and equipment. Data augmentation of the audio features simulates the signal distortion caused by front-end signal processing, which improves feature extraction in actual noisy environments; and a context-aware decoding network constructed from the text to be evaluated yields more reasonable phoneme information, which improves pronunciation evaluation accuracy.
In order to achieve the above objective, an embodiment of the present invention provides a spoken language pronunciation evaluation method, including the following steps:
acquiring audio to be evaluated and text to be evaluated from a spoken language to be evaluated;
extracting a first acoustic feature from the audio to be evaluated, and generating a second acoustic feature after frequency disturbance of the first acoustic feature;
generating a phoneme sequence from the text to be evaluated, and then combining the phoneme sequence with an HMM model to generate a decoding network;
and inputting the second acoustic feature into the decoding network to obtain acoustic information, and performing GOP score calculation using the acoustic information.
Further, the method for extracting the first acoustic feature from the audio to be evaluated comprises: performing pre-emphasis on the audio to be evaluated, then windowing and framing, and using the output of a Mel-spectrum filter bank as the first acoustic feature.
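By way of illustration only, the following minimal Python sketch extracts such a first acoustic feature. The sampling rate, FFT and hop sizes, Hamming window and 40 Mel bands are assumptions for the example, since the embodiment fixes none of them, and librosa is used purely for convenience.

    import numpy as np
    import librosa

    def first_acoustic_feature(wav, sr=16000, n_fft=512, hop=160,
                               n_mels=40, preemph=0.97):
        """Pre-emphasis -> windowed framing -> Mel filter-bank output."""
        # Pre-emphasis: y[t] = x[t] - preemph * x[t-1]
        wav = np.append(wav[0], wav[1:] - preemph * wav[:-1])
        # Windowing and framing via the STFT (Hamming window assumed)
        power = np.abs(librosa.stft(wav, n_fft=n_fft, hop_length=hop,
                                    window="hamming")) ** 2
        # Mel filter-bank output, log-compressed; returns (n_frames, F)
        mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
        return np.log(mel_fb @ power + 1e-10).T

The returned matrix, with F Mel bands per frame, is the input that the frequency disturbance below operates on.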
Further, the method for generating the second acoustic feature by frequency disturbance of the first acoustic feature is as follows:
S100, according to the set starting disturbance coefficient, randomly generate a starting band number from a uniform distribution:
i ~ uniform(0, ratio1 × F), where
i is the generated starting band number, F is the highest band number of the input feature, and ratio1 is the starting disturbance coefficient;
S101, according to the set band disturbance coefficient, randomly generate a disturbance bandwidth from a uniform distribution:
K ~ uniform(0, ratio2 × F), where
K is the generated disturbance bandwidth, F is the highest band number of the input feature, and ratio2 is the band disturbance coefficient;
S102, weight the selected bands [i, i+K] to generate the second acoustic feature.
Further, the step of generating a phoneme sequence from the text to be evaluated is as follows:
s200, performing intent division on a text to be evaluated by using a pre-trained spoken language position prediction model to obtain an intent division boundary; s201, in each intention group, a phoneme sequence of the whole intention group is given by combining predicted pronunciation rules such as continuous reading/blasting/breakdown losing and the like; simultaneously recording the corresponding relation between the phoneme sequence and the word so as to be used for outputting the score of the subsequent word; wherein,,
when modeling the position-dependent phonemes, adjacent words connected by the explosion/breakdown phenomenon are read/lost, and the generated phoneme sequence uses the form of the middle phonemes except the head and tail phonemes.
Further, the acoustic information obtained by inputting the second acoustic feature into the decoding network includes phoneme likelihoods, posterior probabilities, and durations.
An embodiment of the present invention provides a spoken language pronunciation evaluation device, including:
the acquisition module is configured to acquire audio to be evaluated and text to be evaluated from the spoken language to be evaluated;
the feature extraction module is configured to extract a first acoustic feature from the audio to be evaluated, and generate a second acoustic feature after frequency disturbance of the first acoustic feature;
the decoding network module is configured to generate a phoneme sequence from the text to be evaluated, and then combine the phoneme sequence with an HMM model to generate a decoding network;
and the GOD scoring module is configured to input the second acoustic characteristics into a decoding network to obtain acoustic information, and then score calculation is performed by utilizing the acoustic information.
An embodiment of the present invention provides a computer-readable storage medium storing program code which, when executed by a processor, implements the steps of the spoken language pronunciation evaluation method described above.
An embodiment of the present invention provides an electronic device including a processor and a storage medium storing program code which, when executed by the processor, implements the steps of the spoken language pronunciation evaluation method described above.
Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:
according to the spoken language pronunciation evaluation method, device, medium and equipment, the second acoustic characteristic is obtained after pre-emphasis, windowing and framing and frequency domain random disturbance are carried out on the audio characteristic, so that signal distortion caused by front-end signal processing is simulated, and the extraction performance of the audio characteristic in an actual noisy environment is improved; and a decoding network is constructed through the text to be evaluated, word pronunciation generation is performed by combining the context, and the pronunciation evaluation accuracy under the specific pronunciation phenomenon is improved, so that the accuracy of spoken language pronunciation evaluation is ensured.
Drawings
The technical scheme of the invention is further described below with reference to the accompanying drawings:
FIG. 1 is a flow chart of the spoken language pronunciation evaluation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of intent-group division boundaries in the spoken language pronunciation evaluation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a phoneme sequence in the spoken language pronunciation evaluation method according to an embodiment of the present invention;
FIG. 4 shows the spoken language pronunciation evaluation device according to an embodiment of the present invention.
Detailed Description
The invention will be described in further detail with reference to the accompanying drawings and specific examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limiting the invention.
Referring to fig. 1, a spoken language pronunciation evaluation method according to an embodiment of the invention includes the following steps:
s001, acquiring audio to be evaluated and text to be evaluated from the spoken language to be evaluated.
After the audio to be evaluated and the text to be evaluated are obtained, S002 or S003 may be executed, and in this embodiment, the step S002 is executed first: extracting a first acoustic feature from the audio to be evaluated, and generating a second acoustic feature after frequency disturbance of the first acoustic feature.
Specifically, after the audio to be evaluated is obtained, pre-emphasis is first applied to it, followed by windowing and framing, and the output of a Mel-spectrum filter bank is used as the first acoustic feature. Then random frequency-domain disturbance is applied to the first acoustic feature to generate the second acoustic feature, as follows:
s100, randomly generating a starting frequency band number in uniform distribution according to the set starting disturbance coefficient
i to uniform (0, ratio 1. F), wherein,
i is the generated initial frequency band number, F is the maximum frequency band corresponding number of the input characteristic, and ratio1 is the initial disturbance coefficient.
S101, randomly generating disturbance frequency bandwidth in uniform distribution according to the set frequency disturbance coefficient
K-unitorm (0, ratio 2. Times. F), wherein,
k is the generated disturbance frequency bandwidth, F is the maximum frequency band corresponding number of the input characteristic, and ratio2 is the frequency band disturbance coefficient.
S102, weighting the selected [ i, i+K ] frequency band to generate a second acoustic feature; the frequency bands may be selected to have uniform weighting coefficients, or randomly generated weighting coefficients may be used.
In addition, on the basis of the above steps, all frequency bands may be divided into several blocks, each block using its own starting disturbance coefficient, band disturbance coefficient and weighting parameters, which refines the disturbance at the band level.
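Building on the frequency_perturb sketch above, a block-wise variant under the same assumptions might look as follows; the number of blocks and the per-block coefficient range are illustrative, not values fixed by the embodiment.

    def block_frequency_perturb(feat, n_blocks=4, rng=None):
        """Disturb each block of bands with its own randomly drawn coefficients."""
        rng = rng or np.random.default_rng()
        out = feat.copy()
        edges = np.linspace(0, feat.shape[1], n_blocks + 1, dtype=int)
        for lo, hi in zip(edges[:-1], edges[1:]):
            r1, r2 = rng.uniform(0.1, 0.3, size=2)  # per-block ratio1 / ratio2
            out[:, lo:hi] = frequency_perturb(out[:, lo:hi], r1, r2, rng)
        return out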
In S002, conventional quiet-scene data are augmented through the above steps: the audio features are pre-emphasized, windowed and framed, and randomly disturbed in the frequency domain to obtain the second acoustic feature, which simulates the signal distortion caused by front-end signal processing and improves feature extraction in actual noisy environments.
Then, S003: and generating a decoding network by combining the phoneme sequence with an HMM model from the phoneme sequence generated in the text to be evaluated.
Specifically, the steps of generating a phoneme sequence from the text to be evaluated are as follows:
S200, perform intent-group division on the text to be evaluated using the pre-trained spoken-language position prediction model to obtain the intent-group boundaries, referring to FIG. 2. S201, within each intent group, give the phoneme sequence of the whole group by combining the predicted pronunciation rules such as linking, plosion and loss of plosion, while recording the correspondence between the phoneme sequence and the words for later per-word score output.
When position-dependent phonemes are used for modeling, for adjacent words joined by linking or loss of plosion, all phonemes of the generated sequence except the first and last use the word-internal phoneme form; referring to FIG. 3, p_b denotes a starting phoneme, p_i an intermediate phoneme, and p_e an ending phoneme.
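A minimal sketch of this labeling rule is given below. Only the p_b/p_i/p_e labels come from FIG. 3; the function name, the data layout, and the upstream linking predictions are assumptions for the example.

    def intent_group_phonemes(words, linked):
        """Label one intent group with position-dependent phonemes.

        words:  one phoneme list per word, e.g. [["G","UH","D"], ["M","AO","R","N","IH","NG"]]
        linked: linked[j] is True when words j and j+1 are joined by a
                predicted linking / loss-of-plosion rule.
        """
        seq, spans = [], []
        for j, phones in enumerate(words):
            start = len(seq)
            for k, p in enumerate(phones):
                head = k == 0 and not (j > 0 and linked[j - 1])
                tail = k == len(phones) - 1 and not (j < len(words) - 1 and linked[j])
                if head:                   # a one-phoneme word keeps the starting form here
                    seq.append(p + "_b")
                elif tail:
                    seq.append(p + "_e")
                else:
                    seq.append(p + "_i")   # word-internal form at linked junctions
            spans.append((start, len(seq)))  # phoneme-to-word map for word scores
        return seq, spans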
Finally, S004: input the second acoustic feature into the decoding network to obtain acoustic information, then perform GOP score calculation with it. In this embodiment, the acoustic information obtained from the decoding network comprises phoneme likelihood, posterior probability and duration information, from which the GOP computes the final score.
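The embodiment names GOP scoring but does not spell out the formula. One common posterior-based instantiation, sketched here as an assumption, averages the log posterior of each canonical phoneme over its aligned frames; the per-phoneme values can then be pooled over the recorded phoneme-to-word spans to give word scores.

    def gop_scores(post, segments):
        """post:     (n_frames, n_phones) frame posteriors from the decoder
        segments: (phone_id, t_start, t_end) triples from the alignment"""
        return [float(np.mean(np.log(post[t0:t1, pid] + 1e-10)))
                for pid, t0, t1 in segments]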
In S004, the decoding network is constructed from the text to be evaluated and word pronunciations are generated in context, which improves evaluation accuracy under special pronunciation phenomena and thus ensures accurate spoken language pronunciation evaluation.
The invention also provides a spoken language pronunciation evaluation device, which comprises:
and the acquisition module is configured to acquire the audio to be evaluated and the text to be evaluated from the spoken language to be evaluated.
The feature extraction module is configured to extract a first acoustic feature from the audio to be evaluated, and generate a second acoustic feature after frequency disturbance of the first acoustic feature.
And the decoding network module is configured to generate a phoneme sequence from the text to be evaluated, and then combine the phoneme sequence with an HMM model to generate a decoding network.
And the GOP scoring module is configured to input the second acoustic feature into the decoding network to obtain acoustic information, and then perform GOP score calculation by utilizing the acoustic information.
The invention also discloses a computer-readable storage medium on which a computer program (i.e., a program product) is stored which, when executed by a processor, carries out the steps described in the above method embodiments.
For example, audio to be evaluated and text to be evaluated are obtained from a spoken language to be evaluated; extracting a first acoustic feature from the audio to be evaluated, and generating a second acoustic feature after frequency disturbance of the first acoustic feature; generating a phoneme sequence from the text to be evaluated, and then combining the phoneme sequence with an HMM model to generate a decoding network; inputting the second acoustic characteristics into a decoding network to obtain acoustic information, and performing GOP scoring calculation by using the acoustic information; the specific implementation of each step is not repeated here.
Next, the invention also discloses an electronic device, which can be a computer system or a server; components of an electronic device may include, but are not limited to: one or more processors or processing units, a system memory, and a bus that connects the different system components (including the system memory and the processing units).
The foregoing is merely a specific application example of the present invention, and the protection scope of the present invention is not limited in any way. All technical schemes formed by equivalent transformation or equivalent substitution fall within the protection scope of the invention.

Claims (7)

1. A spoken language pronunciation evaluation method is characterized by comprising the following steps:
acquiring audio to be evaluated and text to be evaluated from a spoken language to be evaluated;
extracting a first acoustic feature from the audio to be evaluated, and generating a second acoustic feature after frequency disturbance of the first acoustic feature, wherein the method for generating the second acoustic feature after frequency disturbance of the first acoustic feature is as follows:
s100, randomly generating an initial frequency band number i-unitorm (0, ratio 1. Times.F) in an even distribution mode according to the set initial disturbance coefficient; wherein,,
i is the generated initial frequency band number, F is the maximum frequency band corresponding number of the input characteristic, and ratio1 is the initial disturbance coefficient;
s101, randomly generating disturbance frequency bandwidth in uniform distribution according to the set frequency disturbance coefficient
K-unitorm (0, ratio 2. Times.F); wherein,,
k is the generated disturbance frequency bandwidth, F is the maximum frequency band corresponding number of the input characteristic, and ratio2 is the frequency band disturbance coefficient;
s102, weighting the selected [ i, i+K ] frequency band to generate a second acoustic feature;
generating a phoneme sequence from the text to be evaluated, and then combining the phoneme sequence with an HMM model to generate a decoding network;
and inputting the second acoustic feature into the decoding network to obtain acoustic information, and performing GOP score calculation by using the acoustic information.
2. The spoken language pronunciation evaluation method of claim 1, wherein the method for extracting the first acoustic feature from the audio to be evaluated comprises: performing pre-emphasis on the audio to be evaluated, then windowing and framing, and using the output of a Mel-spectrum filter bank as the first acoustic feature.
3. The spoken language pronunciation evaluation method of claim 1, wherein the steps of generating the phoneme sequence from the text to be evaluated are as follows:
S200, performing intent-group division on the text to be evaluated by using a pre-trained spoken-language position prediction model to obtain the intent-group boundaries;
S201, within each intent group, giving the phoneme sequence of the whole group by combining the predicted pronunciation rules such as linking, plosion and loss of plosion, while recording the correspondence between the phoneme sequence and the words for later per-word score output; wherein,
when position-dependent phonemes are used for modeling, for adjacent words joined by linking or loss of plosion, all phonemes of the generated sequence except the first and last use the word-internal phoneme form.
4. The spoken language pronunciation evaluation method of claim 1, wherein the acoustic information obtained by inputting the second acoustic feature into the decoding network includes phoneme likelihoods, posterior probabilities, and durations.
5. A spoken language pronunciation evaluation device, comprising:
the acquisition module is configured to acquire audio to be evaluated and text to be evaluated from the spoken language to be evaluated;
the feature extraction module is configured to extract a first acoustic feature from the audio to be evaluated and to generate a second acoustic feature after frequency disturbance of the first acoustic feature, wherein the method for generating the second acoustic feature is as follows:
S100, according to the set starting disturbance coefficient, randomly generating a starting band number from a uniform distribution, i ~ uniform(0, ratio1 × F); wherein,
i is the generated starting band number, F is the highest band number of the input feature, and ratio1 is the starting disturbance coefficient;
S101, according to the set band disturbance coefficient, randomly generating a disturbance bandwidth from a uniform distribution, K ~ uniform(0, ratio2 × F); wherein,
K is the generated disturbance bandwidth, F is the highest band number of the input feature, and ratio2 is the band disturbance coefficient;
S102, weighting the selected bands [i, i+K] to generate the second acoustic feature;
the decoding network module is configured to generate a phoneme sequence from the text to be evaluated, and then combine the phoneme sequence with an HMM model to generate a decoding network;
and the GOP scoring module is configured to input the second acoustic characteristics into a decoding network to obtain acoustic information, and then score calculation is performed by utilizing the acoustic information.
6. A computer readable storage medium storing program code which, when executed by a processor, implements the method of one of claims 1-4.
7. An electronic device comprising a processor and a storage medium storing program code which, when executed by the processor, implements the method of one of claims 1-4.
CN202110545441.XA 2021-05-19 2021-05-19 Spoken language pronunciation evaluation method, device, medium and equipment Active CN113345467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110545441.XA CN113345467B (en) 2021-05-19 2021-05-19 Spoken language pronunciation evaluation method, device, medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110545441.XA CN113345467B (en) 2021-05-19 2021-05-19 Spoken language pronunciation evaluation method, device, medium and equipment

Publications (2)

Publication Number Publication Date
CN113345467A CN113345467A (en) 2021-09-03
CN113345467B true CN113345467B (en) 2023-10-20

Family

ID=77469439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110545441.XA Active CN113345467B (en) 2021-05-19 2021-05-19 Spoken language pronunciation evaluation method, device, medium and equipment

Country Status (1)

Country Link
CN (1) CN113345467B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070239444A1 (en) * 2006-03-29 2007-10-11 Motorola, Inc. Voice signal perturbation for speech recognition

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102044248A (en) * 2009-10-10 2011-05-04 北京理工大学 Objective evaluating method for audio quality of streaming media
CN102044247A (en) * 2009-10-10 2011-05-04 北京理工大学 Objective evaluation method for VoIP speech
CN106297828A (en) * 2016-08-12 2017-01-04 苏州驰声信息科技有限公司 The detection method of a kind of mistake utterance detection based on degree of depth study and device
CN110033784A (en) * 2019-04-10 2019-07-19 北京达佳互联信息技术有限公司 A kind of detection method of audio quality, device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
No-reference objective evaluation of network audio quality; 杨佳俊 (Yang Jiajun); Information Science and Technology Series (Issue 03); pp. 1-92 *

Also Published As

Publication number Publication date
CN113345467A (en) 2021-09-03

Similar Documents

Publication Publication Date Title
CN107492382B (en) Voiceprint information extraction method and device based on neural network
CN105976812B (en) A kind of audio recognition method and its equipment
CN110782872A (en) Language identification method and device based on deep convolutional recurrent neural network
EP4018437B1 (en) Optimizing a keyword spotting system
CN110827801A (en) Automatic voice recognition method and system based on artificial intelligence
CN108766415B (en) Voice evaluation method
JP2008152262A (en) Method and apparatus for transforming speech feature vector
CN112233698B (en) Character emotion recognition method, device, terminal equipment and storage medium
CN112397056B (en) Voice evaluation method and computer storage medium
US20230070000A1 (en) Speech recognition method and apparatus, device, storage medium, and program product
CN111667834B (en) Hearing-aid equipment and hearing-aid method
CN109300339A (en) A kind of exercising method and system of Oral English Practice
CN106653002A (en) Literal live broadcasting method and platform
CN114708854A (en) Voice recognition method and device, electronic equipment and storage medium
CN106297769B (en) A kind of distinctive feature extracting method applied to languages identification
KR100682909B1 (en) Method and apparatus for recognizing speech
Nagano et al. Data augmentation based on vowel stretch for improving children's speech recognition
KR102167157B1 (en) Voice recognition considering utterance variation
Chauhan et al. Emotion recognition using LP residual
CN111326170A (en) Method and device for converting ear voice into normal voice by combining time-frequency domain expansion convolution
US20240013775A1 (en) Patched multi-condition training for robust speech recognition
JP2011107314A (en) Speech recognition device, speech recognition method and speech recognition program
CN113345467B (en) Spoken language pronunciation evaluation method, device, medium and equipment
CN111640423A (en) Word boundary estimation method and device and electronic equipment
CN114067807A (en) Audio data processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant