CN110176249A - Method and device for assessing spoken-language pronunciation - Google Patents
Method and device for assessing spoken-language pronunciation
- Publication number
- CN110176249A CN110176249A CN201910266722.4A CN201910266722A CN110176249A CN 110176249 A CN110176249 A CN 110176249A CN 201910266722 A CN201910266722 A CN 201910266722A CN 110176249 A CN110176249 A CN 110176249A
- Authority
- CN
- China
- Prior art keywords
- word
- pronunciation
- consonant
- feature
- gop
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/26—Speech to text systems
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Probability & Statistics with Applications (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The present invention relates to a method and device for assessing spoken-language pronunciation. The method comprises: acquiring the text content of an examinee's spoken answer; extracting the pronunciation features of each word in the text content; determining, based on the pronunciation features of each word, whether each word is pronounced correctly; and assessing the examinee's spoken pronunciation based on the number of correctly pronounced words. For an examinee's spoken answer, the invention can thus determine the number of correctly pronounced words from each word's pronunciation features, gauge how well the examinee pronounces overall, and help the examinee improve spoken proficiency in a targeted way.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for assessing spoken-language pronunciation.
Background
As an important medium of interpersonal communication, spoken language occupies an extremely important place in daily life. With continued social and economic development and the trend toward economic globalization, people place ever higher demands on the efficiency, objectivity, fairness, and scale of language learning and language assessment. Open-ended question types in speaking tests, such as oral composition, story retelling, and picture description, are important for reflecting an examinee's spoken expressive ability. In general, besides judging the content, a teacher also judges whether the student's word pronunciation is standard and assesses the overall pronunciation.
Traditional speaking-test scoring systems learn a rating model directly from teachers' total-score annotations and output a single total score. Whether the student's pronunciation in oral expression is standard, and the overall state of that pronunciation, cannot be learned from such a score.
Summary of the invention
In view of the difficulty of assessing pronunciation in current speaking tests, it is necessary to provide a method and device for assessing spoken-language pronunciation.
A method for assessing spoken-language pronunciation, the method comprising:
acquiring the text content of an examinee's spoken answer;
extracting the pronunciation features of each word in the text content;
determining, based on the pronunciation features of each word, whether each word is pronounced correctly;
assessing the examinee's spoken pronunciation based on the number of correctly pronounced words.
In one embodiment, the pronunciation features include the acoustic-likelihood feature of the word, and extracting the pronunciation features of each word in the text content comprises:
determining the acoustic likelihood score of each word based on the frame-average likelihoods of the word's vowels and consonants;
taking the acoustic likelihood score as the acoustic-likelihood feature of the corresponding word.
In one embodiment, the pronunciation features include a GOP (goodness of pronunciation) feature, and extracting the pronunciation features of each word in the text content comprises:
obtaining the GOP scores of the vowels and consonants in each word;
determining the GOP score of each word based on the GOP scores of its vowels and consonants;
taking the GOP score of each word as the word's GOP feature.
In one embodiment, the pronunciation features include a pronunciation-consistency feature, and extracting the pronunciation features of each word in the text content comprises:
determining the number of frames in which each word's vowels and consonants are consistent with the standard pronunciation;
taking that frame count as the word's pronunciation-consistency feature.
In one embodiment, the pronunciation features include a pronunciation-accuracy feature, and extracting the pronunciation features of each word in the text content comprises:
obtaining the number of correctly pronounced vowel and consonant frames in each word;
determining, from that number, the frame accuracy of each word's vowel and consonant pronunciation;
taking the frame accuracy of each word's vowels and consonants as the word's pronunciation-accuracy feature.
A device for assessing spoken-language pronunciation, the device comprising:
an acquisition module for acquiring the text content of an examinee's spoken answer;
an extraction module for extracting the pronunciation features of each word in the text content;
a determination module for determining, based on the pronunciation features of each word, whether each word is pronounced correctly;
an evaluation module for assessing the examinee's spoken pronunciation based on the number of correctly pronounced words.
In one embodiment, the pronunciation features include the acoustic-likelihood feature of the word, and the extraction module is configured to:
determine the acoustic likelihood score of each word based on the frame-average likelihoods of the word's vowels and consonants;
take the acoustic likelihood score as the acoustic-likelihood feature of the corresponding word.
In one embodiment, the pronunciation features include a GOP feature, and the extraction module is configured to:
obtain the GOP scores of the vowels and consonants in each word;
determine the GOP score of each word based on the GOP scores of its vowels and consonants;
take the GOP score of each word as the word's GOP feature.
In one embodiment, the pronunciation features include a pronunciation-consistency feature, and the extraction module is configured to:
determine the number of frames in which each word's vowels and consonants are consistent with the standard pronunciation;
take that frame count as the word's pronunciation-consistency feature.
In one embodiment, the pronunciation features include a pronunciation-accuracy feature, and the extraction module is configured to:
obtain the number of correctly pronounced vowel and consonant frames in each word;
determine, from that number, the frame accuracy of each word's vowel and consonant pronunciation;
take the frame accuracy of each word's vowels and consonants as the word's pronunciation-accuracy feature.
In the present invention, after the text content of an examinee's spoken answer is acquired, the pronunciation features of each word in the text content can be extracted; whether each word is pronounced correctly is determined based on those features; and the examinee's spoken pronunciation is assessed based on the number of correctly pronounced words. For an examinee's spoken answer, the invention can thus determine the number of correctly pronounced words from each word's pronunciation features, gauge how well the examinee pronounces, and help the examinee improve spoken proficiency in a targeted way.
Brief description of the drawings
Fig. 1 is a flowchart of a method for assessing spoken-language pronunciation according to an embodiment;
Fig. 2 is a structural diagram of a device for assessing spoken-language pronunciation according to an embodiment.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
Fig. 1 is a flowchart of a method for assessing spoken-language pronunciation according to an embodiment. As shown in Fig. 1, the method comprises:
Step 110: acquire the text content of an examinee's spoken answer;
Step 120: extract the pronunciation features of each word in the text content;
Step 130: determine, based on the pronunciation features of each word, whether each word is pronounced correctly;
Step 140: assess the examinee's spoken pronunciation based on the number of correctly pronounced words.
In the present invention, after the text content of an examinee's spoken answer is acquired, the pronunciation features of each word in the text content can be extracted; whether each word is pronounced correctly is determined based on those features; and the examinee's spoken pronunciation is assessed based on the number of correctly pronounced words. For an examinee's spoken answer, the invention can thus determine the number of correctly pronounced words from each word's pronunciation features, gauge how well the examinee pronounces, and help the examinee improve spoken proficiency in a targeted way.
The examinee's spoken answer may be an audio file collected by a voice-acquisition system, and the text content may be the word content extracted from the spoken answer by a speech-recognition system. It will be appreciated that this embodiment does not limit how the spoken answer and the text content are obtained.
In this embodiment, the pronunciation features may include at least one of an acoustic-likelihood feature, a GOP feature, a pronunciation-consistency feature, a pronunciation-accuracy feature, and the like.
In one implementation of this embodiment, the pronunciation features include the acoustic-likelihood feature of the word, and extracting the pronunciation features of each word in the text content comprises:
determining the acoustic likelihood score of each word based on the frame-average likelihoods of the word's vowels and consonants;
taking the acoustic likelihood score as the acoustic-likelihood feature of the corresponding word.
The acoustic likelihood scores of the words in the text content can be counted, distinguishing vowels from consonants, specifically based on the following data:
- the word-level frame-average likelihood (mean, max, min);
- the frame-average likelihood of the vowels in the word (mean, max, min);
- the frame-average likelihood of the consonants in the word (mean, max, min);
- the number of vowels, the number of consonants, and the total number of phones in the word.
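As a minimal sketch, the statistics above can be computed per word from its frame-level likelihoods. The input layout (separate vowel-frame and consonant-frame likelihood lists) and the helper name are assumptions; the description fixes only which statistics are collected.

```python
from statistics import mean

def acoustic_likelihood_features(vowel_ll, consonant_ll):
    """Per-word acoustic-likelihood statistics.
    vowel_ll / consonant_ll: per-frame acoustic log-likelihoods of the
    word's vowel and consonant frames (assumed layout)."""
    def stats(xs):
        # mean, max, min over a list of frame likelihoods
        return (mean(xs), max(xs), min(xs)) if xs else (0.0, 0.0, 0.0)
    word_ll = vowel_ll + consonant_ll
    return {
        "word": stats(word_ll),             # word-level frame statistics
        "vowels": stats(vowel_ll),          # vowel-frame statistics
        "consonants": stats(consonant_ll),  # consonant-frame statistics
    }
```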
In one implementation of this embodiment, the pronunciation features include a GOP feature, and extracting the pronunciation features of each word in the text content comprises:
obtaining the GOP scores of the vowels and consonants in each word;
determining the GOP score of each word based on the GOP scores of its vowels and consonants;
taking the GOP score of each word as the word's GOP feature.
The phone-level GOP scores of the words in the text content are counted, distinguishing vowels from consonants, specifically:
- the mean, max, and min of the GOP scores of the vowels in the word;
- the mean, max, and min of the GOP scores of the consonants in the word;
- the mean, max, and min of the GOP scores of all phones in the word.
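The per-word GOP statistics can be sketched in the same way. The vowel set and the (phone, score) input pairs are illustrative assumptions; the description only names the statistics to collect.

```python
from statistics import mean

# Illustrative vowel subset; a real system would use its full phone inventory.
VOWELS = {"AA", "AE", "AH", "EH", "IY", "UW"}

def gop_features(phone_gop):
    """phone_gop: list of (phone_label, gop_score) pairs for one word."""
    def stats(xs):
        return (mean(xs), max(xs), min(xs)) if xs else (0.0, 0.0, 0.0)
    vowel_scores = [s for p, s in phone_gop if p in VOWELS]
    consonant_scores = [s for p, s in phone_gop if p not in VOWELS]
    all_scores = [s for _, s in phone_gop]
    return {"vowels": stats(vowel_scores),
            "consonants": stats(consonant_scores),
            "phones": stats(all_scores)}
```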
In one implementation of this embodiment, the pronunciation features include a pronunciation-consistency feature, and extracting the pronunciation features of each word in the text content comprises:
determining the number of frames in which each word's vowels and consonants are consistent with the standard pronunciation;
taking that frame count as the word's pronunciation-consistency feature.
The consistency between the forced alignment (fa) and the recognition result (rec) of each input word can be counted for vowels, consonants, and all phones (unit: frames):
Overall pronunciation consistency:
agr_all = sum(HitFrames_phone_x) / NumFrames_All;
Vowel consistency:
agr_vowels = sum(HitFrames_vowels_x) / NumFrames_Vow;
Consonant consistency:
agr_consonants = sum(HitFrames_consonants) / NumFrames_Con;
HitFrames_x denotes the number of frames in which the corresponding phones of rec and fa agree; NumFrames_X denotes the frame count of all or part of the word.
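The three agr_* ratios can be sketched directly from hit-frame and total-frame counts; the argument names below are assumptions, not an interface fixed by the text.

```python
def agreement_ratios(hit_vowel_frames, hit_consonant_frames,
                     num_vowel_frames, num_consonant_frames):
    """Frame-level fa/rec agreement, mirroring agr_all, agr_vowels,
    agr_consonants above."""
    num_all = num_vowel_frames + num_consonant_frames
    agr_vowels = hit_vowel_frames / num_vowel_frames if num_vowel_frames else 0.0
    agr_consonants = (hit_consonant_frames / num_consonant_frames
                      if num_consonant_frames else 0.0)
    agr_all = ((hit_vowel_frames + hit_consonant_frames) / num_all
               if num_all else 0.0)
    return agr_all, agr_vowels, agr_consonants
```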
In one implementation of this embodiment, the pronunciation features include a pronunciation-accuracy feature, and extracting the pronunciation features of each word in the text content comprises:
obtaining the number of correctly pronounced vowel and consonant frames in each word;
determining, from that number, the frame accuracy of each word's vowel and consonant pronunciation;
taking the frame accuracy of each word's vowels and consonants as the word's pronunciation-accuracy feature.
The proportion of phones of each input word on which fa and rec agree can be counted for vowels, consonants, and all phones (unit: phones):
Accuracy of all pronunciations:
CountsAcc_all = sum(Hit_phone_x) / NumCounts_All;
Accuracy of vowels:
CountsAcc_vowels = sum(Hit_vowels_x) / NumCounts_Vow;
Accuracy of consonants:
CountsAcc_consonants = sum(Hit_consonants) / NumCounts_Con;
Hit_x is 1 if the consistency between fa and the corresponding phone of rec is greater than or equal to a threshold (0.5), and 0 otherwise; NumCounts_X denotes the number of some or all of the word's phones. The frame accuracy of the vowels, consonants, and all phones of each input word can then be counted (unit: frames):
Frame accuracy of all pronunciations:
FramesAcc_all = sum(Frames_HitPhone_x) / NumFrames_All;
Frame accuracy of vowels:
FramesAcc_vowels = sum(Frames_HitVowels_x) / NumFrames_Vow;
Frame accuracy of consonants:
FramesAcc_consonants = sum(Frames_HitConsonants) / NumFrames_Con;
Frames_Hit_X denotes the number of frames in which the phone in fa agrees with rec (0 otherwise); NumFrames_X denotes the frame count of some or all of the word.
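A minimal sketch of the two accuracy measures, assuming per-phone consistency values in [0, 1] and per-phone hit-frame counts as inputs (the function and argument names are assumptions):

```python
def counts_accuracy(phone_consistency, threshold=0.5):
    """Phone-level accuracy: Hit_x = 1 when the fa/rec consistency of a
    phone is >= the threshold (0.5 in the description), else 0."""
    if not phone_consistency:
        return 0.0
    hits = sum(1 for c in phone_consistency if c >= threshold)
    return hits / len(phone_consistency)

def frames_accuracy(hit_frames_per_phone, total_frames):
    """Frame-level accuracy: hit frames summed over phones, divided by
    the total frame count (the FramesAcc_* ratios above)."""
    return sum(hit_frames_per_phone) / total_frames if total_frames else 0.0
```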
In one embodiment of this implementation, a multilayer-perceptron (MLP) classifier may be used when determining, based on the pronunciation features of each word, whether the word is pronounced correctly, mainly because its model complexity is controllable and it supports training on large volumes of data. At least one of the acoustic-likelihood feature, the GOP feature, the pronunciation-consistency feature, and the pronunciation-accuracy feature above can be input into the MLP model to determine whether each word is pronounced correctly.
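The text names an MLP classifier but no specific library or architecture. As a hedged stand-in, a single-hidden-layer forward pass shows the decision step only; in practice the weights would be learned from teacher-labelled data, and the two-feature input below is purely illustrative.

```python
import math

def mlp_classify(features, W1, b1, W2, b2):
    """Single-hidden-layer perceptron forward pass: returns 1 ('pronounced
    correctly') when the sigmoid output is >= 0.5, else 0. Weights are
    supplied directly here rather than trained."""
    # hidden layer: tanh(W1 @ features + b1)
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(W1, b1)]
    # output logit: W2 @ hidden + b2
    logit = sum(w * h for w, h in zip(W2, hidden)) + b2
    return 1 if 1.0 / (1.0 + math.exp(-logit)) >= 0.5 else 0
```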
The examinee's spoken pronunciation can then be assessed based on the number of correctly pronounced words. Specifically, the number of correctly pronounced words can be divided by the total number of words to obtain a ratio, and an overall mark for the examinee's spoken answer can be determined from that ratio. The overall mark is proportional to the ratio; it measures the word accuracy of the whole open-ended spoken sample, so the higher the accuracy, the better the sample's overall pronunciation.
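The proportional mapping from correct-word ratio to an overall mark can be sketched as follows; the 100-point scale is an assumption, since the text fixes only the proportionality.

```python
def overall_mark(num_correct_words, num_total_words, full_mark=100.0):
    """Overall mark proportional to the ratio of correctly pronounced
    words to total words (full_mark scale assumed, not fixed by the text)."""
    if num_total_words == 0:
        return 0.0
    return (num_correct_words / num_total_words) * full_mark
```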
Fig. 2 is a structural diagram of a device for assessing spoken-language pronunciation according to an embodiment. As shown in Fig. 2, the device comprises:
an acquisition module 210 for acquiring the text content of an examinee's spoken answer;
an extraction module 220 for extracting the pronunciation features of each word in the text content;
a determination module 230 for determining, based on the pronunciation features of each word, whether each word is pronounced correctly;
an evaluation module 240 for assessing the examinee's spoken pronunciation based on the number of correctly pronounced words.
In the present invention, after the text content of an examinee's spoken answer is acquired, the pronunciation features of each word in the text content can be extracted; whether each word is pronounced correctly is determined based on those features; and the examinee's spoken pronunciation is assessed based on the number of correctly pronounced words. For an examinee's spoken answer, the invention can thus determine the number of correctly pronounced words from each word's pronunciation features, gauge how well the examinee pronounces, and help the examinee improve spoken proficiency in a targeted way.
In one implementation of this embodiment, the pronunciation features include the acoustic-likelihood feature of the word, and the extraction module is configured to:
determine the acoustic likelihood score of each word based on the frame-average likelihoods of the word's vowels and consonants;
take the acoustic likelihood score as the acoustic-likelihood feature of the corresponding word.
In one implementation of this embodiment, the pronunciation features include a GOP feature, and the extraction module is configured to:
obtain the GOP scores of the vowels and consonants in each word;
determine the GOP score of each word based on the GOP scores of its vowels and consonants;
take the GOP score of each word as the word's GOP feature.
In one implementation of this embodiment, the pronunciation features include a pronunciation-consistency feature, and the extraction module is configured to:
determine the number of frames in which each word's vowels and consonants are consistent with the standard pronunciation;
take that frame count as the word's pronunciation-consistency feature.
In one implementation of this embodiment, the pronunciation features include a pronunciation-accuracy feature, and the extraction module is configured to:
obtain the number of correctly pronounced vowel and consonant frames in each word;
determine, from that number, the frame accuracy of each word's vowel and consonant pronunciation;
take the frame accuracy of each word's vowels and consonants as the word's pronunciation-accuracy feature.
The device described above is implemented in the same way as the method described above; for details, refer to the method embodiments, which are not repeated here.
The technical features of the embodiments above may be combined arbitrarily. For brevity, not every possible combination is described, but any combination of these features that contains no contradiction should be considered within the scope of this specification.
The embodiments above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the inventive concept, and these fall within the scope of protection of the invention. The scope of protection of this patent is therefore subject to the appended claims.
Claims (10)
1. A method for assessing spoken-language pronunciation, characterized in that the method comprises:
acquiring the text content of an examinee's spoken answer;
extracting the pronunciation features of each word in the text content;
determining, based on the pronunciation features of each word, whether each word is pronounced correctly;
assessing the examinee's spoken pronunciation based on the number of correctly pronounced words.
2. The method according to claim 1, characterized in that the pronunciation features include an acoustic-likelihood feature of the word, and extracting the pronunciation features of each word in the text content comprises:
determining the acoustic likelihood score of each word based on the frame-average likelihoods of the word's vowels and consonants;
taking the acoustic likelihood score as the acoustic-likelihood feature of the corresponding word.
3. The method according to claim 1, characterized in that the pronunciation features include a GOP feature, and extracting the pronunciation features of each word in the text content comprises:
obtaining the GOP scores of the vowels and consonants in each word;
determining the GOP score of each word based on the GOP scores of its vowels and consonants;
taking the GOP score of each word as the word's GOP feature.
4. The method according to claim 1, characterized in that the pronunciation features include a pronunciation-consistency feature, and extracting the pronunciation features of each word in the text content comprises:
determining the number of frames in which each word's vowels and consonants are consistent with the standard pronunciation;
taking that frame count as the word's pronunciation-consistency feature.
5. The method according to claim 1, characterized in that the pronunciation features include a pronunciation-accuracy feature, and extracting the pronunciation features of each word in the text content comprises:
obtaining the number of correctly pronounced vowel and consonant frames in each word;
determining, from that number, the frame accuracy of each word's vowel and consonant pronunciation;
taking the frame accuracy of each word's vowels and consonants as the word's pronunciation-accuracy feature.
6. A device for assessing spoken-language pronunciation, characterized in that the device comprises:
an acquisition module for acquiring the text content of an examinee's spoken answer;
an extraction module for extracting the pronunciation features of each word in the text content;
a determination module for determining, based on the pronunciation features of each word, whether each word is pronounced correctly;
an evaluation module for assessing the examinee's spoken pronunciation based on the number of correctly pronounced words.
7. The device according to claim 6, characterized in that the pronunciation features include an acoustic-likelihood feature of the word, and the extraction module is configured to:
determine the acoustic likelihood score of each word based on the frame-average likelihoods of the word's vowels and consonants;
take the acoustic likelihood score as the acoustic-likelihood feature of the corresponding word.
8. The device according to claim 6, characterized in that the pronunciation features include a GOP feature, and the extraction module is configured to:
obtain the GOP scores of the vowels and consonants in each word;
determine the GOP score of each word based on the GOP scores of its vowels and consonants;
take the GOP score of each word as the word's GOP feature.
9. The device according to claim 6, characterized in that the pronunciation features include a pronunciation-consistency feature, and the extraction module is configured to:
determine the number of frames in which each word's vowels and consonants are consistent with the standard pronunciation;
take that frame count as the word's pronunciation-consistency feature.
10. The device according to claim 6, characterized in that the pronunciation features include a pronunciation-accuracy feature, and the extraction module is configured to:
obtain the number of correctly pronounced vowel and consonant frames in each word;
determine, from that number, the frame accuracy of each word's vowel and consonant pronunciation;
take the frame accuracy of each word's vowels and consonants as the word's pronunciation-accuracy feature.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910266722.4A CN110176249A (en) | 2019-04-03 | 2019-04-03 | A kind of appraisal procedure and device of spoken language pronunciation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910266722.4A CN110176249A (en) | 2019-04-03 | 2019-04-03 | A kind of appraisal procedure and device of spoken language pronunciation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110176249A true CN110176249A (en) | 2019-08-27 |
Family
ID=67689392
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910266722.4A Pending CN110176249A (en) | 2019-04-03 | 2019-04-03 | A kind of appraisal procedure and device of spoken language pronunciation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110176249A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110648690A (en) * | 2019-09-26 | 2020-01-03 | 广州三人行壹佰教育科技有限公司 | Audio evaluation method and server |
CN110782921A (en) * | 2019-09-19 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Voice evaluation method and device, storage medium and electronic device |
CN112614510A (en) * | 2020-12-23 | 2021-04-06 | 北京猿力未来科技有限公司 | Audio quality evaluation method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101887725A (en) * | 2010-04-30 | 2010-11-17 | 中国科学院声学研究所 | Phoneme confusion network-based phoneme posterior probability calculation method |
CN104485115A (en) * | 2014-12-04 | 2015-04-01 | 上海流利说信息技术有限公司 | Pronunciation evaluation equipment, method and system |
CN108682437A (en) * | 2018-05-18 | 2018-10-19 | 网易乐得科技有限公司 | Information processing method, device, medium and computing device |
CN108711319A (en) * | 2018-05-24 | 2018-10-26 | 李炜 | A kind of international professional Chinese teaching method and system |
CN109545243A (en) * | 2019-01-23 | 2019-03-29 | 北京猎户星空科技有限公司 | Pronunciation quality evaluating method, device, electronic equipment and storage medium |
- 2019-04-03: CN201910266722.4A filed; published as CN110176249A (status: pending)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101887725A (en) * | 2010-04-30 | 2010-11-17 | 中国科学院声学研究所 | Phoneme confusion network-based phoneme posterior probability calculation method |
CN104485115A (en) * | 2014-12-04 | 2015-04-01 | 上海流利说信息技术有限公司 | Pronunciation evaluation equipment, method and system |
CN108682437A (en) * | 2018-05-18 | 2018-10-19 | 网易乐得科技有限公司 | Information processing method, device, medium and computing device |
CN108711319A (en) * | 2018-05-24 | 2018-10-26 | 李炜 | A kind of international professional Chinese teaching method and system |
CN109545243A (en) * | 2019-01-23 | 2019-03-29 | 北京猎户星空科技有限公司 | Pronunciation quality evaluating method, device, electronic equipment and storage medium |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110782921A (en) * | 2019-09-19 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Voice evaluation method and device, storage medium and electronic device |
CN110782921B (en) * | 2019-09-19 | 2023-09-22 | 腾讯科技(深圳)有限公司 | Voice evaluation method and device, storage medium and electronic device |
CN110648690A (en) * | 2019-09-26 | 2020-01-03 | 广州三人行壹佰教育科技有限公司 | Audio evaluation method and server |
CN112614510A (en) * | 2020-12-23 | 2021-04-06 | 北京猿力未来科技有限公司 | Audio quality evaluation method and device |
CN112614510B (en) * | 2020-12-23 | 2024-04-30 | 北京猿力未来科技有限公司 | Audio quality assessment method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110782921B (en) | Voice evaluation method and device, storage medium and electronic device | |
CN109545244A (en) | Speech evaluating method, device, electronic equipment and storage medium | |
US8226416B2 (en) | Method and apparatus for reading education | |
Richards | Conversation | |
CN110176249A (en) | A kind of appraisal procedure and device of spoken language pronunciation | |
JP2005321817A (en) | Method and apparatus for obtaining combining information from speech signals for adaptive interaction in teaching and testing | |
KR20010074705A (en) | Automated language assessment using speech recognition modeling | |
JP2009503563A (en) | Assessment of spoken language proficiency by computer | |
Uchanski et al. | Automatic speech recognition to aid the hearing impaired: prospects for the automatic generation of cued speech. | |
CN101393694A (en) | Chinese character pronunciation studying device with pronunciation correcting function of Chinese characters, and method therefor | |
Inoue et al. | A Study of Objective Measurement of Comprehensibility through Native Speakers' Shadowing of Learners' Utterances. | |
CN110164422A (en) | A kind of the various dimensions appraisal procedure and device of speaking test | |
CN107910059A (en) | A kind of language ability obstacle hierarchy system and its implementation | |
CN109584906A (en) | Spoken language pronunciation evaluating method, device, equipment and storage equipment | |
US11138379B2 (en) | Determination of transcription accuracy | |
Cámara Arenas et al. | Automatic pronunciation assessment vs. automatic speech recognition: A study of conflicting conditions for L2-English | |
KR100997255B1 (en) | Language learning system of simultaneous interpretation type using voice recognition | |
KR102407055B1 (en) | Apparatus and method for measuring dialogue quality index through natural language processing after speech recognition | |
CN113327615B (en) | Voice evaluation method, device, equipment and storage medium | |
CN115099222A (en) | Punctuation mark misuse detection and correction method, device, equipment and storage medium | |
Davies | Language assessment in call centres: The case of the customer service representative | |
CN114708854A (en) | Voice recognition method and device, electronic equipment and storage medium | |
Jun et al. | Factors in Raters’ Perceptions of Comprehensibility and Accentedness | |
US20210304628A1 (en) | Systems and Methods for Automatic Video to Curriculum Generation | |
Nath | Towards naturally responsive spoken dialog systems by modelling pragmatic-prosody correlations of discourse markers |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190827 |