CN110164447B - Spoken language scoring method and device - Google Patents

Spoken language scoring method and device

Info

Publication number
CN110164447B
CN110164447B (application CN201910266167.5A)
Authority
CN
China
Prior art keywords
target
answer text
text information
reference answer
answering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910266167.5A
Other languages
Chinese (zh)
Other versions
CN110164447A (en
Inventor
彭书勇
方敏
戚自力
孙婷婷
林远东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Chivox Information Technology Co ltd
Original Assignee
Suzhou Chivox Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Chivox Information Technology Co ltd filed Critical Suzhou Chivox Information Technology Co ltd
Priority to CN201910266167.5A priority Critical patent/CN110164447B/en
Publication of CN110164447A publication Critical patent/CN110164447A/en
Application granted granted Critical
Publication of CN110164447B publication Critical patent/CN110164447B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/247Thesauruses; Synonyms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Abstract

The invention relates to a spoken language scoring method and device. The method comprises: acquiring a voice file recorded after a spoken language test question has been answered; decoding answering text information from the voice file; preprocessing a reference answer text and the answering text information to generate, respectively, a target reference answer text and target answering text information with reduced text content, wherein the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information; extracting, based on the target reference answer text and the target answering text information, content features representing the semantic similarity between them; and inputting the content features into a preset training model and outputting a corresponding spoken language score. The invention improves the scoring efficiency for spoken answers to open-ended questions.

Description

Spoken language scoring method and device
Technical Field
The invention relates to the technical field of computers, in particular to a spoken language scoring method and device.
Background
An open-ended spoken language test is an examination mechanism in which the content of a student's spoken answers to spoken test questions is scored.
At present, open-ended spoken language tests cannot be scored automatically, which increases the difficulty of scoring spoken language examinations.
Disclosure of Invention
Therefore, it is necessary to provide a spoken language scoring method and apparatus to solve the problem that current spoken language examinations cannot be scored automatically.
A method of spoken language scoring, the method comprising:
acquiring a voice file after answering the spoken language test question;
decoding answering text information from the voice file;
preprocessing a reference answer text and the answering text information to generate, respectively, a target reference answer text and target answering text information with reduced text content, wherein the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information;
extracting content characteristics representing semantic similarity of the target reference answer text and the target answering text information based on the target reference answer text and the target answering text information;
and inputting the content characteristics into a preset training model and outputting corresponding spoken language scores.
Preferably, the extracting of content features representing the semantic similarity between the target reference answer text and the target answering text information includes:
based on predefined word inverse document frequency (idf) information, selecting several words with the highest idf values from each reference answer, expanding them with synonyms to serve as the keywords of that reference answer, and then taking the union of these keywords to obtain the reference answer keywords;
counting the frequency with which the reference answer keywords appear in the target answering text information to obtain a keyword hit rate;
counting the hit rate of the N-grams of the target answering text information in each target reference answer text, and taking the maximum value as a Jaccard similarity coefficient;
obtaining document vector representations of the target reference answer text and the target answering text information by means of idf-weighted word vectors, calculating the cosine of the angle between the document vector of each target reference answer text and the document vector of the target answering text information, and taking the largest cosine value as a cosine similarity;
determining a word mover's distance between the target reference answer text and the target answering text information;
and taking the keyword hit rate, the Jaccard similarity coefficient, the cosine similarity and the word mover's distance as the content features.
Preferably, the preprocessing of the reference answer text and the answering text information includes:
removing stop words from the reference answer text and the answering text information respectively and performing lemmatization, wherein the stop words are words that do not affect the expression of content in a sentence, including: articles, prepositions, conjunctions, modal particles, and adverbs commonly used as conjunctions; lemmatization refers to converting a word of a given form into its base form.
Preferably, the inputting of the content features into a preset training model and outputting of corresponding spoken language scores includes:
inputting the content features into a support vector regression (SVR) model generated by training, and outputting the corresponding spoken language score.
A spoken language scoring apparatus, comprising:
the acquisition module is used for acquiring the voice file after answering the spoken language test question;
the decoding module is used for decoding the answering text information from the voice file;
the processing module is used for preprocessing a reference answer text and the answering text information to generate, respectively, a target reference answer text and target answering text information with reduced text content, wherein the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information;
the extraction module is used for extracting content characteristics representing semantic similarity of the target reference answer text and the target answering text information based on the target reference answer text and the target answering text information;
and the output module is used for inputting the content characteristics into a preset training model and outputting corresponding spoken language scores.
Preferably, the extraction module is configured to:
based on predefined word inverse document frequency (idf) information, selecting several words with the highest idf values from each reference answer, expanding them with synonyms to serve as the keywords of that reference answer, and then taking the union of these keywords to obtain the reference answer keywords;
counting the frequency with which the reference answer keywords appear in the target answering text information to obtain a keyword hit rate;
counting the hit rate of the N-grams of the target answering text information in each target reference answer text, and taking the maximum value as a Jaccard similarity coefficient;
obtaining document vector representations of the target reference answer text and the target answering text information by means of idf-weighted word vectors, calculating the cosine of the angle between the document vector of each target reference answer text and the document vector of the target answering text information, and taking the largest cosine value as a cosine similarity;
determining a word mover's distance between the target reference answer text and the target answering text information;
and taking the keyword hit rate, the Jaccard similarity coefficient, the cosine similarity and the word mover's distance as the content features.
Preferably, the processing module is configured to:
removing stop words from the reference answer text and the answering text information respectively and performing lemmatization, wherein the stop words are words that do not affect the expression of content in a sentence, including: articles, prepositions, conjunctions, modal particles, and adverbs commonly used as conjunctions; lemmatization refers to converting a word of a given form into its base form.
Preferably, the output module is configured to:
inputting the content features into a support vector regression (SVR) model generated by training, and outputting the corresponding spoken language score.
For spoken answers given in the open-ended question format, the spoken language score is a quantitative measure of how well the examinee has answered the question. The degree to which the answer content matches the reference answer can be fed back directly to the student, and it also determines how reasonable the final overall score is, so the scoring efficiency for spoken answers to open-ended questions is improved. The keywords extracted from the transcribed text of the examinee's answer summarize what the student expressed, and feeding this information back gives the student a clear view of their own performance. The keywords of the reference answers are expanded with synonyms after extraction, and feeding this information back can enrich the students' vocabulary and encourage a wider range of expression.
Drawings
FIG. 1 is a flow chart of a spoken language scoring method according to an example embodiment;
fig. 2 is a block diagram of a spoken language scoring device according to an example embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
FIG. 1 is a flow chart of a spoken language scoring method according to an example embodiment. As shown in fig. 1, the method includes:
step 110, acquiring a voice file after answering the spoken language test question;
step 120, decoding answer text information from the voice file;
step 130, preprocessing the reference answer text and the answering text information to generate, respectively, a target reference answer text and target answering text information with reduced text content, wherein the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information;
step 140, extracting content characteristics representing semantic similarity of the target reference answer text and the target answering text information based on the target reference answer text and the target answering text information;
step 150, inputting the content features into a preset training model and outputting the corresponding spoken language score.
In this embodiment, in an open-ended question scenario, the student's voice file is decoded to obtain the answering text information given by the student. The decoding process may be implemented by a speech recognition system.
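As a minimal illustration of this decoding step, the sketch below uses the open-source Python SpeechRecognition package; the patent does not name a particular speech recognition system, so the library, backend and language setting here are all assumptions.

import speech_recognition as sr

def decode_answer_text(wav_path):
    """Decode a student's answer audio file into answering text information (hypothetical helper)."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:   # the "voice file" recorded after answering
        audio = recognizer.record(source)    # read the whole recording
    # Any ASR backend could be substituted here; the free Google Web Speech API is one option.
    return recognizer.recognize_google(audio, language="en-US")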
In this embodiment, in step 140, the extracting content features representing semantic similarity between the target reference answer text and the target answer text information includes:
based on predefined word inverse document frequency (idf) information, selecting several words with the highest idf values from each reference answer, expanding them with synonyms to serve as the keywords of that reference answer, and then taking the union of these keywords to obtain the reference answer keywords;
counting the frequency with which the reference answer keywords appear in the target answering text information to obtain the keyword hit rate;
counting the hit rate of the N-grams of the target answering text information in each target reference answer text, and taking the maximum value as the Jaccard similarity coefficient;
obtaining document vector representations of the target reference answer text and the target answering text information by means of idf-weighted word vectors, calculating the cosine of the angle between the document vector of each target reference answer text and the document vector of the target answering text information, and taking the largest cosine value as the cosine similarity;
determining the word mover's distance between the target reference answer text and the target answering text information;
and taking the keyword hit rate, the Jaccard similarity coefficient, the cosine similarity and the word mover's distance as the content features.
In this embodiment, the content features include four parameters, which are specifically as follows:
1) Keyword hit rate. The extraction operation is as follows:
first, keywords of a target reference answer text are extracted. In the preprocessed target reference answer text, according to predefined word inverse document frequency (idf) information, selecting a plurality of words with larger idf values to be subjected to synonym expansion to serve as key words of the reference answer text marked by each item, and then taking the union set of the key words as the key words of the target reference answer text. The keyword list obtained at this time may be regarded as the point of the score content specified by the expert or teacher. Here, the idf value represents the importance degree of each word to the semantic expression of the sentence, and can be obtained by collecting and counting spoken language data of students through an online system.
Second, the frequency with which the keywords of the target reference answer text appear in the student's target answering text information, i.e. the keyword hit rate, is counted. In particular, if several keywords that are synonyms of one another all appear in the student's text, the student is considered to have hit the corresponding scoring point only once, and the hit is not counted repeatedly; in addition, when a keyword is matched, the presence of a negation word before the keyword can be checked, and if a negation word is present, the hit is discarded.
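This counting rule can be sketched as follows, reusing the keyword_groups format from the previous sketch (each group is a keyword plus its synonyms, counted at most once) and a small assumed list of negation words:

NEGATIONS = {"not", "no", "never", "n't"}   # assumed negation list

def keyword_hit_rate(keyword_groups, answer_tokens):
    """Fraction of scoring points hit by the student's (preprocessed) answer tokens."""
    hits = 0
    for group in keyword_groups:
        hit = False
        for i, tok in enumerate(answer_tokens):
            # a hit requires the keyword (or a synonym) without a negation word right before it
            if tok in group and (i == 0 or answer_tokens[i - 1] not in NEGATIONS):
                hit = True
                break            # each synonym group is counted at most once
        hits += int(hit)
    return hits / len(keyword_groups) if keyword_groups else 0.0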
2) Jaccard similarity coefficient. This coefficient compares the similarity and difference between two strings; the larger its value, the more similar the two strings are. Here, the Jaccard coefficient is computed as the N-gram hit rate of the student text in each reference answer, and the maximum value over the reference answers is taken as the feature; the formula is as follows:
Jaccard_N = max_i ( |REC_N ∩ REFER_N_i| / |REC_N| ), N ∈ {1, 2, 3, 4, 5}
wherein REC_N is the N-gram list of the examinee text, REFER_N_i is the N-gram list of the i-th reference answer, |X| denotes the number of distinct N-grams contained in text X, and N denotes the number of words in each phrase.
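The sketch below is one straightforward reading of this feature: for each N from 1 to 5, the hit rate of the student's N-grams in each reference answer is computed and the best value over the reference answers is kept.

def ngrams(tokens, n):
    """Set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_hit_features(student_tokens, reference_token_lists, max_n=5):
    features = []
    for n in range(1, max_n + 1):
        rec = ngrams(student_tokens, n)                  # REC_N
        best = 0.0
        for ref_tokens in reference_token_lists:
            refer = ngrams(ref_tokens, n)                # REFER_N_i
            if rec:
                best = max(best, len(rec & refer) / len(rec))
        features.append(best)                            # max over reference answers
    return features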
3) Cosine similarity based on idf-weighted word vectors. Each word in a word vector model has a vector representation that quantitatively describes its relationship to other words; intuitively, the word vectors of two semantically similar words lie closer together in the vector space. Document vector representations of the target reference answer text and the target answering text information are obtained by idf-weighted averaging of word vectors, the cosine of the angle between the two document vectors is calculated, and the maximum value over the reference answers is taken as the cosine similarity feature.
Here, the word vector model is trained on the decoded transcripts of students' spoken training audio collected by the online system.
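A minimal sketch of the idf-weighted document vectors and the cosine feature, assuming word_vectors is a dict-like mapping from word to vector (for example a gensim KeyedVectors model) and idf is a precomputed word-to-idf dictionary:

import numpy as np

def doc_vector(tokens, word_vectors, idf):
    """idf-weighted average of the word vectors of the tokens present in the model."""
    vecs, weights = [], []
    for tok in tokens:
        if tok in word_vectors:
            vecs.append(np.asarray(word_vectors[tok]))
            weights.append(idf.get(tok, 1.0))
    if not vecs:
        return None
    return np.average(np.vstack(vecs), axis=0, weights=weights)

def cosine_feature(student_tokens, reference_token_lists, word_vectors, idf):
    s = doc_vector(student_tokens, word_vectors, idf)
    if s is None:
        return 0.0
    best = 0.0
    for ref_tokens in reference_token_lists:
        r = doc_vector(ref_tokens, word_vectors, idf)
        if r is None:
            continue
        denom = np.linalg.norm(s) * np.linalg.norm(r)
        if denom:
            best = max(best, float(np.dot(s, r) / denom))   # max over reference answers
    return best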
4) Word mover's distance (WMD) based on the word vector space.
In the word vector space, WMD can be understood as the minimum total cost required to transform text A into text B, obtained as a weighted sum of the word-to-word movement costs between the two texts. Here, the movement cost between two words is measured by the Euclidean distance between their normalized word vectors.
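The sketch below computes a simplified, relaxed variant of WMD (each word of text A moves entirely to its nearest word of text B) rather than the full optimal-transport formulation; a complete implementation is available, for example, as gensim's KeyedVectors.wmdistance.

import numpy as np

def _normalized(v):
    n = np.linalg.norm(v)
    return v / n if n else v

def relaxed_wmd(tokens_a, tokens_b, word_vectors):
    """Lower-bound approximation of the word mover's distance between two token lists."""
    vecs_a = [_normalized(np.asarray(word_vectors[t])) for t in tokens_a if t in word_vectors]
    vecs_b = [_normalized(np.asarray(word_vectors[t])) for t in tokens_b if t in word_vectors]
    if not vecs_a or not vecs_b:
        return float("inf")
    # each word of A carries equal weight; its cost is the Euclidean distance
    # to the nearest word of B in the normalized word-vector space
    total = sum(min(np.linalg.norm(va - vb) for vb in vecs_b) for va in vecs_a)
    return total / len(vecs_a)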
In this embodiment, in step 130, the preprocessing of the reference answer text and the answering text information includes:
removing stop words from the reference answer text and the answering text information respectively and performing lemmatization, wherein the stop words are words that do not affect the expression of content in a sentence, including but not limited to: articles, prepositions, conjunctions, modal particles, and adverbs commonly used as conjunctions; lemmatization refers to converting a word of a given form into its base form.
In particular, this preprocessing effectively shortens the text. Lemmatization converts a word of a given form into its base form, for example converting verb participles into the base form and plural nouns into the singular, so the text is effectively simplified while its semantics remain unchanged.
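A minimal sketch of this preprocessing using NLTK (the patent names no specific toolkit, so the stop-word list and lemmatizer here are assumptions):

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# one-time setup: nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")
STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(text):
    """Lower-case, drop stop words and punctuation, and lemmatize the remaining words."""
    tokens = nltk.word_tokenize(text.lower())
    tokens = [t for t in tokens if t.isalpha() and t not in STOP_WORDS]
    # lemmatize as verb first (participles -> base form), then as noun (plural -> singular)
    return [LEMMATIZER.lemmatize(LEMMATIZER.lemmatize(t, pos="v"), pos="n") for t in tokens]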
In this embodiment, in step 150, inputting the content features into a preset training model and outputting corresponding spoken language scores, including:
and inputting the content characteristics into the SVR model based on the SVR model generated by training, and outputting the corresponding spoken language score.
During the model training and testing phase, the model selects support vector machine regression (SVR). For the input student audio, the operation of the steps is needed, the content characteristics are extracted, and then the extracted content characteristics are used as the input of the SVR. In the training stage, the audio of the students in the training set is manually marked with content scores, and the content scores are used as labels to obtain model parameters in an adaptive manner through SVR fitting; while in the testing phase, the content scores are output by the model. According to the output content scores, feedback such as the quality of spoken answers of students can be reasonably given.
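A minimal sketch of this training and prediction setup with scikit-learn's SVR, using toy numbers purely for illustration (the feature order follows the four content features above; real training data would be the manually scored responses):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# each row: [keyword hit rate, Jaccard coefficient, cosine similarity, word mover's distance]
X_train = np.array([[0.8, 0.45, 0.91, 1.2],
                    [0.3, 0.10, 0.62, 2.7]])
y_train = np.array([4.5, 2.0])          # manually annotated content scores (labels)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X_train, y_train)             # training phase: fit the model parameters

X_test = np.array([[0.6, 0.30, 0.85, 1.6]])
print(model.predict(X_test))            # testing phase: output the content score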
For spoken answers given in the open-ended question format, the spoken language score is a quantitative measure of how well the examinee has answered the question. The degree to which the answer content matches the reference answer can be fed back directly to the student, and it also determines how reasonable the final overall score is, so the scoring efficiency for spoken answers to open-ended questions is improved. The keywords extracted from the transcribed text of the examinee's answer summarize what the student expressed, and feeding this information back gives the student a clear view of their own performance. The keywords of the reference answers are expanded with synonyms after extraction, and feeding this information back can enrich the students' vocabulary and encourage a wider range of expression.
Fig. 2 is a block diagram of a spoken language scoring device according to an example embodiment. As shown in fig. 2, the apparatus includes:
an obtaining module 210, configured to obtain a voice file after answering a spoken language test question;
a decoding module 220, configured to decode the answering text information from the voice file;
a processing module 230, configured to preprocess the reference answer text and the answering text information and to generate, respectively, a target reference answer text and target answering text information with reduced text content, where the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information;
an extracting module 240, configured to extract, based on the target reference answer text and the target answering text information, content features representing the semantic similarity between the target reference answer text and the target answering text information;
and the output module 250 is used for inputting the content characteristics into a preset training model and outputting the corresponding spoken language scores.
Preferably, the extraction module 240 is configured to:
based on predefined word inverse document frequency (idf) information, select several words with the highest idf values from each reference answer, expand them with synonyms to serve as the keywords of that reference answer, and then take the union of these keywords to obtain the reference answer keywords;
count the frequency with which the reference answer keywords appear in the target answering text information to obtain the keyword hit rate;
count the hit rate of the N-grams of the target answering text information in each target reference answer text, and take the maximum value as the Jaccard similarity coefficient;
obtain document vector representations of the target reference answer text and the target answering text information by means of idf-weighted word vectors, calculate the cosine of the angle between the document vector of each target reference answer text and the document vector of the target answering text information, and take the largest cosine value as the cosine similarity;
determine the word mover's distance between the target reference answer text and the target answering text information;
and take the keyword hit rate, the Jaccard similarity coefficient, the cosine similarity and the word mover's distance as the content features.
Preferably, the processing module 230 is configured to:
removing stop words from the reference answer text and the answering text information respectively and performing lemmatization, wherein the stop words are words that do not affect the expression of content in a sentence, including but not limited to: articles, prepositions, conjunctions, modal particles, and adverbs commonly used as conjunctions; lemmatization refers to converting a word of a given form into its base form.
Preferably, the output module 250 is configured to:
inputting the content features into a support vector regression (SVR) model generated by training, and outputting the corresponding spoken language score.
It should be noted that the implementation process of the apparatus in this embodiment is the same as that of the method described above; for details, refer to the method implementation, which is not repeated here.
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not all possible combinations are described, but any combination of these technical features should be considered within the scope of this specification as long as it contains no contradiction.
The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (6)

1. A spoken language scoring method, the method comprising:
acquiring a voice file after answering the spoken language test question;
decoding answering text information from the voice file;
preprocessing a reference answer text and the answering text information to generate, respectively, a target reference answer text and target answering text information with reduced text content, wherein the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information;
extracting content characteristics representing semantic similarity of the target reference answer text and the target answering text information based on the target reference answer text and the target answering text information;
inputting the content characteristics into a preset training model and outputting corresponding spoken language scores;
the extracting of the content features representing the semantic similarity between the target reference answer text and the target answer text information comprises:
based on predefined word inverse document frequency (idf) information, selecting several words with the highest idf values from each reference answer, expanding them with synonyms to serve as the keywords of that reference answer, and then taking the union of these keywords to obtain the reference answer keywords;
counting the frequency with which the reference answer keywords appear in the target answering text information to obtain a keyword hit rate;
counting the hit rate of the N-grams of the target answering text information in each target reference answer text, and taking the maximum value as a Jaccard similarity coefficient;
obtaining document vector representations of the target reference answer text and the target answering text information by means of idf-weighted word vectors, calculating the cosine of the angle between the document vector of each target reference answer text and the document vector of the target answering text information, and taking the largest cosine value as a cosine similarity;
determining a word mover's distance between the target reference answer text and the target answering text information;
and taking the keyword hit rate, the Jaccard similarity coefficient, the cosine similarity and the word mover's distance as the content features.
2. The method of claim 1, wherein the preprocessing of the reference answer text and the answering text information comprises:
removing stop words from the reference answer text and the answering text information respectively and performing lemmatization, wherein the stop words are words that do not affect the expression of content in a sentence, including: articles, prepositions, conjunctions, modal particles, and adverbs commonly used as conjunctions; lemmatization refers to converting a word of a given form into its base form.
3. The method of claim 1, wherein inputting the content features into a preset training model and outputting corresponding spoken language scores comprises:
inputting the content features into a support vector regression (SVR) model generated by training, and outputting a corresponding spoken language score.
4. A spoken language scoring apparatus, comprising:
the acquisition module is used for acquiring the voice file after answering the spoken language test question;
the decoding module is used for decoding the answering text information from the voice file;
the processing module is used for preprocessing a reference answer text and the answering text information to generate, respectively, a target reference answer text and target answering text information with reduced text content, wherein the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information;
the extraction module is used for extracting content characteristics representing semantic similarity of the target reference answer text and the target answering text information based on the target reference answer text and the target answering text information;
the output module is used for inputting the content characteristics into a preset training model and outputting corresponding spoken language scores;
the extraction module is configured to:
based on predefined word inverse document frequency (idf) information, selecting several words with the highest idf values from each reference answer, expanding them with synonyms to serve as the keywords of that reference answer, and then taking the union of these keywords to obtain the reference answer keywords;
counting the frequency with which the reference answer keywords appear in the target answering text information to obtain a keyword hit rate;
counting the hit rate of the N-grams of the target answering text information in each target reference answer text, and taking the maximum value as a Jaccard similarity coefficient;
obtaining document vector representations of the target reference answer text and the target answering text information by means of idf-weighted word vectors, calculating the cosine of the angle between the document vector of each target reference answer text and the document vector of the target answering text information, and taking the largest cosine value as a cosine similarity;
determining a word mover's distance between the target reference answer text and the target answering text information;
and taking the keyword hit rate, the Jaccard similarity coefficient, the cosine similarity and the word mover's distance as the content features.
5. The apparatus of claim 4, wherein the processing module is configured to:
removing stop words from the reference answer text and the answering text information respectively and performing lemmatization, wherein the stop words are words that do not affect the expression of content in a sentence, including: articles, prepositions, conjunctions, modal particles, and adverbs commonly used as conjunctions; lemmatization refers to converting a word of a given form into its base form.
6. The apparatus of claim 4, wherein the output module is configured to:
inputting the content features into a support vector regression (SVR) model generated by training, and outputting a corresponding spoken language score.
CN201910266167.5A 2019-04-03 2019-04-03 Spoken language scoring method and device Active CN110164447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910266167.5A CN110164447B (en) 2019-04-03 2019-04-03 Spoken language scoring method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910266167.5A CN110164447B (en) 2019-04-03 2019-04-03 Spoken language scoring method and device

Publications (2)

Publication Number Publication Date
CN110164447A CN110164447A (en) 2019-08-23
CN110164447B true CN110164447B (en) 2021-07-27

Family

ID=67638457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910266167.5A Active CN110164447B (en) 2019-04-03 2019-04-03 Spoken language scoring method and device

Country Status (1)

Country Link
CN (1) CN110164447B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852069A (en) * 2019-10-24 2020-02-28 大唐融合通信股份有限公司 Text relevance scoring method and system
CN110706536B (en) * 2019-10-25 2021-10-01 北京猿力教育科技有限公司 Voice answering method and device
CN110797010A (en) * 2019-10-31 2020-02-14 腾讯科技(深圳)有限公司 Question-answer scoring method, device, equipment and storage medium based on artificial intelligence
CN111414456A (en) * 2020-03-20 2020-07-14 北京师范大学 Method and system for automatically scoring open type short answer questions
CN112289308A (en) * 2020-10-23 2021-01-29 上海凯石信息技术有限公司 Voice dictation scoring method and device and electronic equipment
CN113065311A (en) * 2021-02-26 2021-07-02 成都环宇知了科技有限公司 Scoring method and system for processing Power Point manuscript content based on OpenXml
CN112992154A (en) * 2021-05-08 2021-06-18 北京远鉴信息技术有限公司 Voice identity determination method and system based on enhanced voiceprint library
CN115204143B (en) * 2022-09-19 2022-12-20 江苏移动信息系统集成有限公司 Method and system for calculating text similarity based on prompt

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090070100A1 (en) * 2007-09-11 2009-03-12 International Business Machines Corporation Methods, systems, and computer program products for spoken language grammar evaluation
CN101520802A (en) * 2009-04-13 2009-09-02 腾讯科技(深圳)有限公司 Question-answer pair quality evaluation method and system
CN102509483A (en) * 2011-10-31 2012-06-20 苏州思必驰信息科技有限公司 Distributive automatic grading system for spoken language test and method thereof
CN103955874A (en) * 2014-03-31 2014-07-30 西南林业大学 Automatic subjective-question scoring system and method based on semantic similarity interval
US20140229161A1 (en) * 2013-02-12 2014-08-14 International Business Machines Corporation Latent semantic analysis for application in a question answer system
CN104504023A (en) * 2014-12-12 2015-04-08 广西师范大学 High-accuracy computer automatic marking method for subjective items based on domain ontology
CN107273861A (en) * 2017-06-20 2017-10-20 广东小天才科技有限公司 A kind of subjective question marking methods of marking, device and terminal device
CN107436864A (en) * 2017-08-04 2017-12-05 逸途(北京)科技有限公司 A kind of Chinese question and answer semantic similarity calculation method based on Word2Vec
CN108960574A (en) * 2018-06-07 2018-12-07 百度在线网络技术(北京)有限公司 Quality determination method, device, server and the storage medium of question and answer

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10108697B1 (en) * 2013-06-17 2018-10-23 The Boeing Company Event matching by analysis of text characteristics (e-match)
CN104050256B (en) * 2014-06-13 2017-05-24 西安蒜泥电子科技有限责任公司 Initiative study-based questioning and answering method and questioning and answering system adopting initiative study-based questioning and answering method
CN106847260B (en) * 2016-12-20 2020-02-21 山东山大鸥玛软件股份有限公司 Automatic English spoken language scoring method based on feature fusion

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090070100A1 (en) * 2007-09-11 2009-03-12 International Business Machines Corporation Methods, systems, and computer program products for spoken language grammar evaluation
CN101520802A (en) * 2009-04-13 2009-09-02 腾讯科技(深圳)有限公司 Question-answer pair quality evaluation method and system
CN102509483A (en) * 2011-10-31 2012-06-20 苏州思必驰信息科技有限公司 Distributive automatic grading system for spoken language test and method thereof
US20140229161A1 (en) * 2013-02-12 2014-08-14 International Business Machines Corporation Latent semantic analysis for application in a question answer system
CN103955874A (en) * 2014-03-31 2014-07-30 西南林业大学 Automatic subjective-question scoring system and method based on semantic similarity interval
CN104504023A (en) * 2014-12-12 2015-04-08 广西师范大学 High-accuracy computer automatic marking method for subjective items based on domain ontology
CN107273861A (en) * 2017-06-20 2017-10-20 广东小天才科技有限公司 A kind of subjective question marking methods of marking, device and terminal device
CN107436864A (en) * 2017-08-04 2017-12-05 逸途(北京)科技有限公司 A kind of Chinese question and answer semantic similarity calculation method based on Word2Vec
CN108960574A (en) * 2018-06-07 2018-12-07 百度在线网络技术(北京)有限公司 Quality determination method, device, server and the storage medium of question and answer

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Verification of an Automatic Scoring Model for Subjective Questions; He Chao; China Master's Theses Full-text Database, Information Science and Technology Series; 2017-05-15 (No. 5); pp. 34-39 *

Also Published As

Publication number Publication date
CN110164447A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
CN110164447B (en) Spoken language scoring method and device
CN110543639B (en) English sentence simplification algorithm based on pre-training transducer language model
CN108304372B (en) Entity extraction method and device, computer equipment and storage medium
US11182435B2 (en) Model generation device, text search device, model generation method, text search method, data structure, and program
CN108536654B (en) Method and device for displaying identification text
EP3862889A1 (en) Responding to user queries by context-based intelligent agents
US7587308B2 (en) Word recognition using ontologies
US20200183983A1 (en) Dialogue System and Computer Program Therefor
CN111339283B (en) Method and device for providing customer service answers aiming at user questions
US20080059147A1 (en) Methods and apparatus for context adaptation of speech-to-speech translation systems
KR20160026892A (en) Non-factoid question-and-answer system and method
WO2008107305A2 (en) Search-based word segmentation method and device for language without word boundary tag
CN101551947A (en) Computer system for assisting spoken language learning
US11031009B2 (en) Method for creating a knowledge base of components and their problems from short text utterances
US20210117458A1 (en) Response selecting apparatus, response selecting method, and response selecting program
CN110096572B (en) Sample generation method, device and computer readable medium
KR101255957B1 (en) Method and apparatus for tagging named entity
Malandrakis et al. Sail: Sentiment analysis using semantic similarity and contrast features
CN107562907B (en) Intelligent lawyer expert case response device
CN110059318B (en) Discussion question automatic evaluation method based on Wikipedia and WordNet
Vincent et al. Personalised language modelling of screen characters using rich metadata annotations
CN115292492A (en) Method, device and equipment for training intention classification model and storage medium
Anantaram et al. Adapting general-purpose speech recognition engine output for domain-specific natural language question answering
WO2023098971A1 (en) Method and apparatus for self-supervised extractive question answering
Zajíc et al. First insight into the processing of the language consulting center data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant