CN114842875A - Spoken language evaluation method and system based on text and voice recognition

Publication number
CN114842875A
Authority
CN
China
Prior art keywords
evaluation
score
text
voice
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210402853.2A
Other languages
Chinese (zh)
Inventor
郭松柳
刘宝泉
张小平
李铭晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Tsinghua International Innovation Center
Original Assignee
Shanghai Tsinghua International Innovation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Tsinghua International Innovation Center
Priority to CN202210402853.2A
Publication of CN114842875A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/06 Foreign languages
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a spoken language evaluation method and system based on text and voice recognition, comprising the following steps: recognizing the text and the voice of a user to be evaluated, respectively, to obtain a first evaluation score and a second evaluation score; determining a first evaluation score difference, and a second evaluation score difference under each scoring dimension, based on the first evaluation score and the second evaluation score; when the first evaluation score difference is larger than a first threshold score or any second evaluation score difference is larger than a second threshold score, determining a target user from the spoken language evaluation database; re-scoring the user to be evaluated through the spoken language evaluation system used by the target user to obtain a third evaluation score and a fourth evaluation score; and determining the final spoken language evaluation score of the user to be evaluated based on the first evaluation score, the second evaluation score, the third evaluation score and the fourth evaluation score. Spoken language evaluation is thus carried out by combining the voice and the text of the user to be evaluated, so that the spoken language ability of the user can be evaluated more comprehensively.

Description

Spoken language evaluation method and system based on text and voice recognition
Technical Field
The application relates to the technical field of voice evaluation, in particular to a spoken language evaluation method and system based on text and voice recognition.
Background
Pronunciation assessment scores serve as the standard for evaluating spoken language in language learning. With the popularization of spoken English examinations, more and more schools need to use spoken English training systems in daily teaching to assess and score students' spoken English pronunciation, thereby helping students improve their spoken English. Spoken English examination systems are also used in selection examinations, where students' spoken English examination scores form a component of the English subject score.
At present, mainstream spoken English training systems on the market usually evaluate a subject's spoken language level along a single dimension, identifiability: each piece of spoken evaluation material is recorded as a standard reading by a qualified speaker, and the subject is evaluated by calculating the voice similarity between the subject's speech and the standard recording. However, having a speaker record every piece of spoken material is too costly, and the resulting training material lacks variety, so students can only practice fixed texts over and over. Learners end up with "mute English", and the practice loses its practical significance.
Disclosure of Invention
In view of this, an object of the present application is to provide a spoken language assessment method and system based on text and voice recognition, which can evaluate spoken language abilities of users to be assessed more comprehensively by assessing reading voices, free statement voices, reading texts and free statement texts, and can improve accuracy of assessment results by performing secondary assessment on the users to be assessed through an assessment system used by different users.
The embodiment of the application provides a spoken language evaluation method based on text and voice recognition, and the spoken language evaluation method comprises the following steps:
acquiring reading voice, free statement voice, reading text corresponding to the reading voice and free statement text corresponding to the free statement voice of a user to be evaluated; the reading voice is the voice of the standard evaluation text read by the user to be evaluated, and the free statement voice is the voice of the user to be evaluated for freely stating the evaluation question;
determining a first evaluation score of the reading part based on the reading voice and the reading text, and determining a second evaluation score of the free statement part based on the free statement voice and the free statement text; the first evaluation score and the second evaluation score are both composed of evaluation sub-scores under a plurality of evaluation dimensions; different scoring dimensions are used for representing the spoken language abilities of different aspects of the user to be assessed;
determining a first evaluation score difference, and a second evaluation score difference under each scoring dimension, based on the first evaluation score and the second evaluation score; the first evaluation score difference is the difference between the first evaluation score and the second evaluation score;
when the first evaluation score difference is larger than a first threshold score or any second evaluation score difference is larger than a second threshold score, querying a spoken language evaluation database, according to the first evaluation score and the second evaluation score, for a spoken language evaluation score meeting preset requirements, and determining a target user corresponding to that spoken language evaluation score;
re-scoring the reading voice and the free statement voice of the user to be evaluated, respectively, through the spoken language evaluation system used by the target user, to obtain a third evaluation score and a fourth evaluation score of the user to be evaluated; the reading voice is the voice from which the reading text was obtained by speech-to-text conversion, and the free statement voice is the voice from which the free statement text was obtained by speech-to-text conversion;
and determining the final oral evaluation score of the user to be evaluated based on the first evaluation score, the second evaluation score, the third evaluation score and the fourth evaluation score of the user to be evaluated.
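The re-scoring trigger in the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `needs_rescoring`, the dimension names, the example values, the use of absolute differences, and the choice to take each part's overall score as the sum of its sub-scores are all hypothetical.

```python
from typing import Dict


def needs_rescoring(first: Dict[str, float], second: Dict[str, float],
                    first_threshold: float, second_threshold: float) -> bool:
    """Return True when the reading and free statement scores disagree too much."""
    # Assumption: the overall score of a part is the sum of its sub-scores.
    first_diff = abs(sum(first.values()) - sum(second.values()))
    # One second difference per scoring dimension present in both parts.
    second_diffs = {dim: abs(first[dim] - second[dim])
                    for dim in first.keys() & second.keys()}
    return (first_diff > first_threshold
            or any(d > second_threshold for d in second_diffs.values()))


# Hypothetical sub-scores for the reading part and the free statement part.
reading = {"identifiability": 22.0, "tone": 20.0, "fluency": 21.0, "intonation": 19.0}
free = {"identifiability": 21.0, "tone": 12.0, "fluency": 20.0, "intonation": 18.0}

# Overall difference is 11; the largest per-dimension difference is 8 (tone).
print(needs_rescoring(reading, free, first_threshold=15, second_threshold=5))
```

Only the trigger is sketched here; the text does not state how the four scores are combined into the final score, so that step is left out.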
Optionally, the second evaluation score of the user to be evaluated and the evaluation sub-score of each evaluation dimension included in the second evaluation score are determined by the following steps:
performing initial evaluation on the free statement text, and determining a reference score of each assigned segment included in the free statement text;
respectively extracting evaluation features of each assigned section, and determining evaluation parameters of various evaluation features included in each assigned section;
for each assigned segment, determining an initial segment evaluation score of the assigned segment in each scoring dimension based on the evaluation parameters of various evaluation features included in the assigned segment, the part of freely-stated voices corresponding to the assigned segment, the initial scoring weight in each scoring dimension and the reference score of the assigned segment;
aiming at each assigned segment, respectively adjusting the initial scoring weight under the corresponding scoring dimension based on the evaluation parameter of each evaluation feature included in the assigned segment, and determining the target scoring weight of each scoring dimension;
for each assigned segment, determining a target segment evaluation score of the assigned segment in each evaluation dimension based on the initial segment evaluation score of the assigned segment in each evaluation dimension, the initial evaluation weight of the assigned segment in each evaluation dimension and the target evaluation weight;
and determining a second evaluation score of the user to be evaluated and evaluation sub-scores of each evaluation dimension included by the second evaluation score based on the evaluation score of each assigned segment in each evaluation dimension.
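The last two steps above can be sketched as follows. The text does not give the formula that turns an initial segment score, an initial weight and a target weight into a target segment score, nor the aggregation rule, so both are assumptions made for illustration: the score is rescaled by the ratio of target weight to initial weight, and sub-scores are summed over segments. The names `target_segment_score` and `aggregate` are hypothetical.

```python
from typing import Dict, List


def target_segment_score(initial_score: float, initial_weight: float,
                         target_weight: float) -> float:
    """Assumed rule: rescale the initial score by target_weight / initial_weight."""
    return initial_score * target_weight / initial_weight


def aggregate(segment_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Assumed rule: each sub-score is the sum of that dimension over all segments."""
    totals: Dict[str, float] = {}
    for scores in segment_scores:
        for dim, score in scores.items():
            totals[dim] = totals.get(dim, 0.0) + score
    return totals


# Hypothetical per-segment target scores in two scoring dimensions.
segments = [{"fluency": 10.0, "tone": 6.0}, {"fluency": 5.0, "tone": 4.0}]
sub_scores = aggregate(segments)           # per-dimension evaluation sub-scores
second_score = sum(sub_scores.values())    # overall second evaluation score
```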
Optionally, the scoring dimensions include at least one of: identifiability, tone, fluency and intonation (accuracy of sound).
Optionally, the evaluation features include at least one of: the number of text events, the relevance of the answer content to the question topic, the number of word vectors and the number of vocabulary syllables.
Optionally, for each assigned segment, adjusting an initial scoring weight under a corresponding scoring dimension based on a scoring parameter of each scoring feature included in the assigned segment, and determining a target scoring weight of each scoring dimension, includes:
adjusting the initial scoring weight of identifiability based on the number of text events included in the assigned segment and the mapping relation between the number of text events and the weight, and determining the target scoring weight of identifiability;
adjusting the initial scoring weight of tone based on the relevance between the answer content of the assigned segment and the question topic and the mapping relation between the relevance and the weight, and determining the target scoring weight of tone;
adjusting the initial scoring weight of fluency based on the number of word vectors included in the assigned segment and the mapping relation between the number of word vectors and the weight, and determining the target scoring weight of fluency;
and adjusting the initial scoring weight of intonation based on the number of vocabulary syllables included in the assigned segment and the mapping relation between the number of vocabulary syllables and the weight, and determining the target scoring weight of intonation.
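One way to picture the "mapping relation" between a feature value and a weight is a step table. The sketch below assumes such a table; the boundaries, the multipliers and the function name `adjust_weight` are hypothetical, and only the feature-to-dimension pairing is taken from the four steps above.

```python
from bisect import bisect_right
from typing import Sequence

# Pairing of evaluation feature and scoring dimension, as listed in the steps above.
FEATURE_TO_DIMENSION = {
    "text_event_count": "identifiability",
    "topic_relevance": "tone",
    "word_vector_count": "fluency",
    "syllable_count": "intonation",
}

# Hypothetical step mapping for the number of text events:
# fewer than 3 events -> factor 0.5, 3 to 5 -> 1.0, 6 or more -> 1.5.
EVENT_BOUNDS = [3, 6]
EVENT_FACTORS = [0.5, 1.0, 1.5]


def adjust_weight(initial_weight: float, feature_value: float,
                  bounds: Sequence[float], factors: Sequence[float]) -> float:
    """Look up the factor for feature_value and scale the initial weight by it."""
    factor = factors[bisect_right(bounds, feature_value)]
    return initial_weight * factor


# A segment with 7 text events raises the identifiability weight from 0.25 to 0.375.
target_weight = adjust_weight(0.25, 7, EVENT_BOUNDS, EVENT_FACTORS)
```

The same lookup would apply to the other three features, each with its own table.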
Optionally, the relevance of the answer content to the topic is determined by:
obtaining question word vectors corresponding to the question text and paragraph word vectors corresponding to the assigned segment; the question text is the text obtained from the evaluation question;
clustering the question word vectors and the paragraph word vectors, respectively, to obtain at least one first feature cluster corresponding to the question word vectors and at least one second feature cluster corresponding to the paragraph word vectors;
extracting the central vector of each first feature cluster as a first topic vector, and extracting the central vector of each second feature cluster as a second topic vector;
performing weighted summation on all the first topic vectors to obtain a question topic vector, and performing weighted summation on all the second topic vectors to obtain a paragraph topic vector;
and determining the relevance of the answer content to the question topic based on the question topic vector and the paragraph topic vector.
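The relevance computation above can be sketched in plain Python. The text names neither the clustering algorithm, the summation weights, nor the similarity measure, so this sketch substitutes a small k-means (initialised from the first k vectors), equal weights, and cosine similarity; all function names and the toy two-dimensional vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: Sequence[Vector], k: int, iterations: int = 10) -> List[Vector]:
    """Tiny k-means that returns the cluster centres (the 'central vectors')."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centre.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def topic_vector(vectors: Sequence[Vector], k: int) -> Vector:
    """Weighted sum of cluster centres; equal weights are an assumption."""
    centroids = kmeans(vectors, min(k, len(vectors)))
    weight = 1.0 / len(centroids)
    return [sum(weight * x for x in col) for col in zip(*centroids)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def relevance(question_vectors: Sequence[Vector],
              paragraph_vectors: Sequence[Vector], k: int = 2) -> float:
    return cosine(topic_vector(question_vectors, k), topic_vector(paragraph_vectors, k))


# Toy word vectors standing in for real embeddings.
question = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
on_topic = relevance(question, question)
off_topic = relevance([[1.0, 0.0]], [[0.0, 1.0]])
```

With equal weights, a small cluster influences the topic vector as much as a large one; weighting each centre by its cluster size would instead reduce the result to the plain mean of all word vectors.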
Optionally, querying a spoken language evaluation score meeting preset requirements from the spoken language evaluation database according to the first evaluation score and the second evaluation score, and determining a target user corresponding to the spoken language evaluation score, includes:
querying the spoken language evaluation database for reading scores whose difference from the first evaluation score is less than a third threshold score and whose evaluation sub-score difference under each same scoring dimension is less than a fourth threshold score;
querying the spoken language evaluation database for free statement scores whose difference from the second evaluation score is less than the third threshold score and whose evaluation sub-score difference under each same scoring dimension is less than the fourth threshold score;
and determining the users corresponding to the reading scores and the free statement scores thus found as target users, respectively.
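The query above amounts to a two-condition filter over stored scores. The sketch below uses an in-memory list in place of the spoken language evaluation database; the `Record` type, the function name `find_target_users`, the sample data, and the assumption that every record carries the same scoring dimensions as the reference score are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    user_id: str
    sub_scores: Dict[str, float]  # sub-scores of one part (reading or free statement)


def find_target_users(records: List[Record], reference: Dict[str, float],
                      third_threshold: float, fourth_threshold: float) -> List[str]:
    """Return users whose stored score is close to the reference score."""
    ref_total = sum(reference.values())
    matches = []
    for rec in records:
        # Condition 1: overall score difference below the third threshold.
        total_ok = abs(sum(rec.sub_scores.values()) - ref_total) < third_threshold
        # Condition 2: every per-dimension difference below the fourth threshold.
        dims_ok = all(abs(rec.sub_scores[dim] - score) < fourth_threshold
                      for dim, score in reference.items())
        if total_ok and dims_ok:
            matches.append(rec.user_id)
    return matches


reference = {"tone": 20.0, "fluency": 20.0}
records = [
    Record("u1", {"tone": 21.0, "fluency": 19.0}),  # close in total and per dimension
    Record("u2", {"tone": 25.0, "fluency": 15.0}),  # same total, dimensions too far apart
    Record("u3", {"tone": 22.0, "fluency": 22.0}),  # total too far apart
]
targets = find_target_users(records, reference, third_threshold=3, fourth_threshold=2)
```

The same function would be called once with the first evaluation score against reading records and once with the second evaluation score against free statement records.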
Optionally, the initially evaluating the free statement text and determining a respective reference score of each assigned segment included in the free statement text includes:
determining an initial evaluation score of the user to be evaluated according to a first assignment rule, based on the number of characters or the number of words included in the free statement text;
for each assigned segment, determining a reference score of the assigned segment according to a second assignment rule based on the initial evaluation score and the text content of the assigned segment; the sum of the reference scores of all assigned segments equals the initial assessment score.
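The two assignment rules are not spelled out in the text. The sketch below assumes a capped linear rule for the initial score and a split proportional to segment word count for the reference scores; both rules, the function names and the sample segments are hypothetical. The only property taken from the text is that the reference scores sum to the initial evaluation score.

```python
from typing import List


def initial_score(text: str, full_marks: float = 100.0,
                  words_for_full_marks: int = 100) -> float:
    """Assumed first rule: score grows linearly with word count, up to full marks."""
    words = len(text.split())
    return full_marks * min(words, words_for_full_marks) / words_for_full_marks


def reference_scores(segments: List[str], total: float) -> List[float]:
    """Assumed second rule: split the total in proportion to segment word counts.

    Expects at least one non-empty segment.
    """
    counts = [len(s.split()) for s in segments]
    all_words = sum(counts)
    scores = [total * c / all_words for c in counts[:-1]]
    # The last segment takes the remainder, so the parts add up to the total
    # (up to floating-point rounding).
    scores.append(total - sum(scores))
    return scores


segments = ["I like", "reading long books", "because they teach me things"]
total = initial_score(" ".join(segments))   # 10 words in all
parts = reference_scores(segments, total)
```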
The embodiment of the present application further provides a spoken language assessment system based on text and speech recognition, the spoken language assessment system includes:
the acquisition module is used for acquiring reading voice, free statement voice, reading text corresponding to the reading voice and free statement text corresponding to the free statement voice of a user to be evaluated; the reading voice is the voice of the standard evaluation text read by the user to be evaluated, and the free statement voice is the voice of the user to be evaluated for freely stating the evaluation question;
the recognition module is used for determining a first evaluation score of the reading part based on the reading voice and the reading text and determining a second evaluation score of the free statement part based on the free statement voice and the free statement text; the first evaluation score and the second evaluation score are both composed of evaluation sub-scores under a plurality of evaluation dimensions; different scoring dimensions are used for representing the spoken language abilities of different aspects of the user to be assessed;
the first determining module is used for determining a first evaluation score difference, and a second evaluation score difference under each scoring dimension, based on the first evaluation score and the second evaluation score; the first evaluation score difference is the difference between the first evaluation score and the second evaluation score;
the query module is used for, when the first evaluation score difference is larger than the first threshold score or any second evaluation score difference is larger than the second threshold score, querying the spoken language evaluation database, according to the first evaluation score and the second evaluation score, for a spoken language evaluation score meeting the preset requirements, and determining a target user corresponding to that spoken language evaluation score;
the evaluation module is used for re-scoring the reading voice and the free statement voice of the user to be evaluated, respectively, through the spoken language evaluation system used by the target user, to obtain a third evaluation score and a fourth evaluation score of the user to be evaluated; the reading voice is the voice from which the reading text was obtained by speech-to-text conversion, and the free statement voice is the voice from which the free statement text was obtained by speech-to-text conversion;
and the second determining module is used for determining the final oral evaluation score of the user to be evaluated based on the first evaluation score, the second evaluation score, the third evaluation score and the fourth evaluation score of the user to be evaluated.
Optionally, the recognition module is configured to determine the second evaluation score of the user to be evaluated and the evaluation sub-scores of each scoring dimension included in the second evaluation score by:
performing initial evaluation on the free statement text, and determining the reference score of each assigned segment included in the free statement text;
respectively extracting evaluation features of each assigned section, and determining evaluation parameters of various evaluation features included in each assigned section;
for each assigned segment, determining an initial segment evaluation score of the assigned segment in each scoring dimension based on the evaluation parameters of various evaluation features included in the assigned segment, the part of freely-stated voices corresponding to the assigned segment, the initial scoring weight in each scoring dimension and the reference score of the assigned segment;
aiming at each assigned segment, respectively adjusting the initial scoring weight under the corresponding scoring dimension based on the evaluation parameter of each evaluation feature included in the assigned segment, and determining the target scoring weight of each scoring dimension;
for each assigned segment, determining a target segment evaluation score of the assigned segment in each evaluation dimension based on the initial segment evaluation score of the assigned segment in each evaluation dimension, the initial evaluation weight of the assigned segment in each evaluation dimension and the target evaluation weight;
and determining a second evaluation score of the user to be evaluated and evaluation sub-scores of each evaluation dimension included by the second evaluation score based on the evaluation score of each assigned segment in each evaluation dimension.
Optionally, the scoring dimensions include at least one of: identifiability, tone, fluency and intonation (accuracy of sound).
Optionally, the evaluation features include at least one of: the number of text events, the relevance of the answer content to the question topic, the number of word vectors and the number of vocabulary syllables.
Optionally, when adjusting, for each assigned segment, the initial scoring weight under the corresponding scoring dimension based on the evaluation parameter of each evaluation feature included in the assigned segment, and determining the target scoring weight of each scoring dimension, the recognition module is configured to:
adjusting the initial scoring weight of identifiability based on the number of text events included in the assigned segment and the mapping relation between the number of text events and the weight, and determining the target scoring weight of identifiability;
adjusting the initial scoring weight of tone based on the relevance between the answer content of the assigned segment and the question topic and the mapping relation between the relevance and the weight, and determining the target scoring weight of tone;
adjusting the initial scoring weight of fluency based on the number of word vectors included in the assigned segment and the mapping relation between the number of word vectors and the weight, and determining the target scoring weight of fluency;
and adjusting the initial scoring weight of intonation based on the number of vocabulary syllables included in the assigned segment and the mapping relation between the number of vocabulary syllables and the weight, and determining the target scoring weight of intonation.
Optionally, when determining the relevance of the answer content to the question topic, the recognition module is configured to:
obtaining question word vectors corresponding to the question text and paragraph word vectors corresponding to the assigned segment; the question text is the text obtained from the evaluation question;
clustering the question word vectors and the paragraph word vectors, respectively, to obtain at least one first feature cluster corresponding to the question word vectors and at least one second feature cluster corresponding to the paragraph word vectors;
extracting the central vector of each first feature cluster as a first topic vector, and extracting the central vector of each second feature cluster as a second topic vector;
performing weighted summation on all the first topic vectors to obtain a question topic vector, and performing weighted summation on all the second topic vectors to obtain a paragraph topic vector;
and determining the relevance of the answer content to the question topic based on the question topic vector and the paragraph topic vector.
Optionally, the spoken language assessment system further includes a third determination module, where the third determination module is configured to:
querying the spoken language evaluation database for reading scores whose difference from the first evaluation score is less than a third threshold score and whose evaluation sub-score difference under each same scoring dimension is less than a fourth threshold score;
querying the spoken language evaluation database for free statement scores whose difference from the second evaluation score is less than the third threshold score and whose evaluation sub-score difference under each same scoring dimension is less than the fourth threshold score;
and determining the users corresponding to the reading scores and the free statement scores thus found as target users, respectively.
Optionally, when performing the initial evaluation on the free statement text and determining the reference score of each assigned segment included in the free statement text, the recognition module is configured to:
determining an initial evaluation score of the user to be evaluated according to a first assignment rule, based on the number of characters or the number of words included in the free statement text;
for each assigned segment, determining a reference score of the assigned segment according to a second assignment rule and based on the initial evaluation score and the text content of the assigned segment; the sum of the reference scores of all assigned segments equals the initial assessment score.
An embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the spoken language assessment method as described above.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program performs the steps of the spoken language assessment method as described above.
The spoken language evaluation method and the spoken language evaluation system based on text and voice recognition provided by the embodiment of the application comprise the following steps:
acquiring reading voice, free statement voice, reading text corresponding to the reading voice and free statement text corresponding to the free statement voice of a user to be evaluated; the reading voice is the voice of the standard evaluation text read by the user to be evaluated, and the free statement voice is the voice of the user to be evaluated freely stating on the evaluation question;
determining a first evaluation score of the reading part based on the reading voice and the reading text, and determining a second evaluation score of the free statement part based on the free statement voice and the free statement text; the first evaluation score and the second evaluation score are both composed of evaluation sub-scores under a plurality of scoring dimensions; different scoring dimensions represent different aspects of the spoken language ability of the user to be evaluated;
determining a first evaluation score difference, and a second evaluation score difference under each scoring dimension, based on the first evaluation score and the second evaluation score; the first evaluation score difference is the difference between the first evaluation score and the second evaluation score;
when the first evaluation score difference is larger than a first threshold score or any second evaluation score difference is larger than a second threshold score, querying a spoken language evaluation database, according to the first evaluation score and the second evaluation score, for a spoken language evaluation score meeting preset requirements, and determining a target user corresponding to that spoken language evaluation score;
re-scoring the reading voice and the free statement voice of the user to be evaluated, respectively, through the spoken language evaluation system used by the target user, to obtain a third evaluation score and a fourth evaluation score of the user to be evaluated; the reading voice is the voice from which the reading text was obtained by speech-to-text conversion, and the free statement voice is the voice from which the free statement text was obtained by speech-to-text conversion;
and determining the final spoken language evaluation score of the user to be evaluated based on the first evaluation score, the second evaluation score, the third evaluation score and the fourth evaluation score of the user to be evaluated.
Therefore, the method and the device can evaluate the spoken language ability of the user to be evaluated more comprehensively by evaluating the reading voice, the free statement voice, the reading text and the free statement text, and can improve the accuracy of the evaluation result by evaluating the user to be evaluated secondarily by the evaluation systems used by different users. In addition, the application also discloses a technical scheme for carrying out multi-dimensional evaluation on the free statement of the spoken language based on text recognition, and solves the problem that the conventional spoken language evaluation system cannot score the free statement content of the user.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a flowchart of a spoken language assessment method based on text and speech recognition according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a spoken language assessment system based on text and speech recognition according to an embodiment of the present application;
Fig. 3 is a second schematic structural diagram of a spoken language assessment system based on text and speech recognition according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. Every other embodiment that can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present application falls within the protection scope of the present application.
Pronunciation assessment scores serve as the standard for evaluating spoken language in language learning. With the popularization of spoken English examinations, more and more schools need to use spoken English training systems in daily teaching to assess and score students' spoken English pronunciation, thereby helping students improve their spoken English. Spoken English examination systems are also used in selection examinations, where students' spoken English examination scores form a component of the English subject score.
At present, mainstream spoken English training systems on the market usually evaluate a subject's spoken language level along a single dimension, identifiability: each piece of spoken evaluation material is recorded as a standard reading by a qualified speaker, and the subject is evaluated by calculating the voice similarity between the subject's speech and the standard recording. However, having a speaker record every piece of spoken material is too costly, and the resulting training material lacks variety, so students can only practice fixed texts over and over. Learners end up with "mute English", and the practice loses its practical significance.
Based on this, the embodiments of the present application provide a spoken language evaluation method and system based on text and voice recognition, which can evaluate the spoken language ability of the user to be evaluated more comprehensively by evaluating the reading voice, the free statement voice, the reading text and the free statement text, and can improve the accuracy of the evaluation result by performing a secondary evaluation of the user through the evaluation systems used by different users.
Referring to fig. 1, fig. 1 is a flowchart illustrating a spoken language assessment method based on text and speech recognition according to an embodiment of the present disclosure. As shown in fig. 1, a spoken language evaluation method provided in an embodiment of the present application includes:
S101, obtaining a reading voice, a free statement voice, a reading text corresponding to the reading voice and a free statement text corresponding to the free statement voice of a user to be evaluated; the reading voice is the voice of the user to be evaluated reading a standard evaluation text aloud, and the free statement voice is the voice of the user to be evaluated making a free statement on an evaluation question.
A spoken language evaluation generally comprises two parts: reading aloud a standard evaluation text provided for the test (the reading part), and making a free statement on content related to an evaluation question (the free statement part).
For the reading part, reading voice and a reading text corresponding to the reading voice are obtained; for the free statement section, a free statement voice and a free statement text corresponding to the free statement voice are obtained.
Here, when performing the speech-to-text conversion, the conversion may be performed by a pre-trained speech-to-text conversion model, or may be performed by using an existing speech-to-text conversion tool, which is not limited herein.
Therefore, the spoken language evaluation method can evaluate the spoken language expression ability of the user to be evaluated more comprehensively by acquiring two different texts and voices and performing subsequent text and voice recognition to obtain the spoken language evaluation of the user to be evaluated.
S102, determining a first evaluation score of a reading part based on the reading voice and the reading text, and determining a second evaluation score of a free statement part based on the free statement voice and the free statement text; the first evaluation score and the second evaluation score are both composed of evaluation sub-scores under a plurality of evaluation dimensions; and different scoring dimensions are used for representing the spoken language abilities of different aspects of the user to be assessed.
Here, the first evaluation score is obtained by pre-scoring the reading voice and then adjusting the score by recognizing the reading text, and the second evaluation score is obtained by pre-scoring the free statement voice and then adjusting the score by recognizing the free statement text. The first evaluation score and the second evaluation score have the same number and types of scoring dimensions, but the evaluation sub-scores under each scoring dimension may be the same or different.
The first evaluation score represents the score of the reading part of the user to be evaluated, and the second evaluation score represents the score of the free statement part of the user to be evaluated.
In one embodiment of the present application, the second evaluation score of the user to be evaluated and the evaluation sub-score of each scoring dimension included in the second evaluation score are determined as follows: performing an initial evaluation on the free statement text and determining a reference score of each assigned segment included in the free statement text; extracting evaluation features from each assigned segment and determining the evaluation parameters of the evaluation features included in each assigned segment; for each assigned segment, determining an initial segment evaluation score of the assigned segment under each scoring dimension based on the evaluation parameters of the evaluation features included in the assigned segment, the part of the free statement voice corresponding to the assigned segment, the initial scoring weight of each scoring dimension and the reference score of the assigned segment; for each assigned segment, adjusting the initial scoring weight of the corresponding scoring dimension based on the evaluation parameter of each evaluation feature included in the assigned segment, and determining a target scoring weight of each scoring dimension; for each assigned segment, determining a target segment evaluation score of the assigned segment under each scoring dimension based on the initial segment evaluation score of the assigned segment under that scoring dimension, the initial scoring weight and the target scoring weight; and determining the second evaluation score of the user to be evaluated and the evaluation sub-score of each scoring dimension included in the second evaluation score based on the target segment evaluation score of each assigned segment under each scoring dimension.
In another embodiment provided by the present application, initially evaluating the free statement text and determining the reference score of each assigned segment included in the free statement text includes: determining an initial evaluation score of the user to be evaluated according to a first assignment rule, based on the number of characters or the number of words included in the free statement text; for each assigned segment, determining a reference score of the assigned segment according to a second assignment rule, based on the initial evaluation score and the text content of the assigned segment; the sum of the reference scores of all assigned segments equals the initial evaluation score.
Here, the initial evaluation of the free statement text may determine the number of characters or words included in the free statement text; the first assignment rule specifies that different character counts or word counts correspond to different initial evaluation scores. The free statement text includes at least one assigned segment, and the assigned segments it contains may be determined by identifying specific characters (e.g., periods) in the free statement text.
For example, assume that the free statement text contains 260 words and that the first assignment rule specifies the following: a text of 100 to 200 words is given an initial evaluation score of 60, a text of 201 to 300 words is given an initial evaluation score of 80, and a text of more than 300 words is given an initial evaluation score of 100, where each of these values is the full (maximum attainable) score for that text. According to the first assignment rule, the initial evaluation score of the user to be evaluated is therefore 80.
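The first assignment rule in this example can be sketched as a simple lookup. This is only an illustration of the bands quoted above; the function name is hypothetical, and the handling of texts under 100 words and of a text of exactly 300 words is an assumption, since the example does not settle either case.

```python
def initial_evaluation_score(word_count):
    """Map a word count to the initial evaluation score (the full score for the text)."""
    if 100 <= word_count <= 200:
        return 60
    if 201 <= word_count <= 300:
        # Assumption: a text of exactly 300 words falls in this band.
        return 80
    if word_count > 300:
        return 100
    # Assumption: the example gives no score for texts under 100 words.
    return 0


print(initial_evaluation_score(260))  # 80
```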
The second assignment rule specifies that the reference score of each assigned segment is derived from the initial evaluation score of the free statement text, according to either the proportion of the text that the assigned segment occupies or the relevance between the assigned segment and the test question.
To better understand the second assignment rule, consider the following example. Suppose the second assignment rule specifies that the reference score of an assigned segment is determined by its proportion of the text, the initial evaluation score of the free statement text is 80, and the free statement text comprises three assigned segments: assigned segment 1, assigned segment 2 and assigned segment 3. If assigned segment 1 contains 20% of the total number of words in the free statement text, its reference score (full score) is 16; if assigned segment 2 contains 30% of the total number of words, its reference score (full score) is 24; and if assigned segment 3 contains 50% of the total number of words, its reference score (full score) is 40.
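The proportional variant of the second assignment rule amounts to splitting the initial evaluation score by word share. A minimal sketch follows; the function name and the concrete word counts (52, 78 and 130 words, chosen to give the 20%, 30% and 50% shares of a 260-word text) are illustrative assumptions.

```python
def reference_scores(initial_score, segment_word_counts):
    """Split the initial evaluation score across segments in proportion to word count."""
    total_words = sum(segment_word_counts)
    return [initial_score * count / total_words for count in segment_word_counts]


# Three assigned segments holding 20%, 30% and 50% of a 260-word text.
print(reference_scores(80, [52, 78, 130]))  # [16.0, 24.0, 40.0]
```

By construction the reference scores sum to the initial evaluation score, as the rule requires.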
After the reference score of each assigned segment is determined, the actual score of each assigned segment (i.e., its initial paragraph evaluation score) still needs to be determined. To this end, evaluation feature extraction is performed on each assigned segment to determine the evaluation features it includes and the evaluation parameter of each evaluation feature, and the part of the free statement voice corresponding to the assigned segment is also determined. The evaluation features may include at least one of: the number of text events, the relevance of the answer content to the question topic, the number of word vectors, and the number of lexical syllable stresses.
The number of text events refers to the number of text events included in the assigned segment; it can be determined by extracting the text events included in each assigned segment with a text event extraction model.
In another embodiment provided herein, the relevance of the answer content to the question topic is determined as follows: obtaining question word vectors corresponding to the question text and paragraph word vectors corresponding to the assigned segment, where the question text is the text obtained from the test question; clustering the question word vectors and the paragraph word vectors separately to obtain at least one first feature cluster corresponding to the question word vectors and at least one second feature cluster corresponding to the paragraph word vectors; extracting the center vector of each first feature cluster as a first topic vector and the center vector of each second feature cluster as a second topic vector; performing a weighted summation of all the first topic vectors to obtain a question topic vector, and performing a weighted summation of all the second topic vectors to obtain a paragraph topic vector; and determining the relevance of the answer content to the question topic based on the question topic vector and the paragraph topic vector.
Here, the question word vectors corresponding to the question text and the paragraph word vectors corresponding to each assigned segment can be obtained through a word vector extraction model. Before the paragraph word vectors are obtained, the answer word vectors of the free statement text are obtained first, and the paragraph word vectors corresponding to each assigned segment are then determined from the answer word vectors. A word2vec model is preferred as the word vector extraction model.
The question word vectors and the paragraph word vectors may each be clustered with the KMeans method, which determines at least one first feature cluster corresponding to the question word vectors together with the center vector of each first feature cluster, and at least one second feature cluster corresponding to the paragraph word vectors together with the center vector of each second feature cluster. Finally, the relevance between the answer content in the assigned segment and the question topic can be determined by a vector similarity calculation.
After the paragraph word vectors of an assigned segment are obtained, the number of word vectors and the number of lexical syllable stresses included in the assigned segment can be determined.
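The clustering and similarity steps can be sketched end to end as follows. This is a simplified illustration, not the disclosed implementation: it uses a small hand-written k-means seeded with the first k vectors instead of a library KMeans, equal weights for the weighted summation, and cosine similarity as the vector similarity measure. All three choices, and all function names, are assumptions, since the text fixes neither the weights nor the similarity measure. The word vectors are assumed to have been produced beforehand (for example by a word2vec model).

```python
import math


def kmeans_centers(vectors, k, iterations=20):
    """Tiny deterministic k-means; requires len(vectors) >= k. Returns the k center vectors."""
    centers = [list(v) for v in vectors[:k]]  # assumption: seed with the first k vectors
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])),
            )
            clusters[nearest].append(v)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def topic_vector(vectors, k, weights=None):
    """Weighted sum of the cluster center vectors (equal weights unless given)."""
    centers = kmeans_centers(vectors, k)
    weights = weights or [1.0 / k] * k
    dim = len(centers[0])
    return [sum(w * c[d] for w, c in zip(weights, centers)) for d in range(dim)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relevance(question_vectors, paragraph_vectors, k=2):
    """Relevance of the answer content to the question topic, in [-1, 1]."""
    return cosine_similarity(
        topic_vector(question_vectors, k), topic_vector(paragraph_vectors, k)
    )
```

Equal weights are used deliberately here: weighting each center by its cluster size would reduce the topic vector to the plain mean of all word vectors and make the clustering step redundant.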
The initial paragraph evaluation score of an assigned segment under each scoring dimension may be determined as follows: first, the reference score of the assigned segment under each scoring dimension is determined based on the initial scoring weight of each scoring dimension for the assigned segment and the reference score of the assigned segment; then, the initial paragraph evaluation score of the assigned segment under each scoring dimension is determined based on the evaluation parameters of the evaluation features included in the assigned segment, the part of the free statement voice corresponding to the assigned segment, and a preset paragraph scoring rule.
Here, the scoring dimensions include at least one of: intelligibility, tone, fluency and intonation. It should be noted that the scoring dimensions and the evaluation features are in one-to-one correspondence: the number of text events corresponds to intelligibility, the relevance of the answer content to the question topic corresponds to tone, the number of word vectors corresponds to fluency, and the number of lexical syllable stresses corresponds to intonation. Therefore, the reference score and the initial paragraph evaluation score of an assigned segment under intelligibility can be determined from the specific evaluation parameter of the number of text events; the scores under the other scoring dimensions are determined similarly and are not described again.
To better understand how the initial paragraph evaluation score of an assigned segment under each scoring dimension is determined, consider the following example. Assume that the reference score of the assigned segment is 24, that the initial scoring weights of the four scoring dimensions (intelligibility, tone, fluency and intonation) are all 0.25, and that the assigned segment includes 5 text events, an 80% relevance of the answer content to the question topic, 78 word vectors and 10 lexical syllable stresses. Based on the initial scoring weights and the reference score of the assigned segment, the reference score of the assigned segment under each scoring dimension is 24 × 0.25 = 6. Then, according to the paragraph scoring rule, the initial paragraph evaluation score under intelligibility is determined to be 4 because the segment includes 5 text events; the initial paragraph evaluation score under tone is determined to be 5 because the relevance of the answer content to the question topic is 80%; the initial paragraph evaluation score under fluency is determined to be 6 because the segment includes 78 word vectors; and the initial paragraph evaluation score under intonation is determined to be 4 because the segment includes 10 lexical syllable stresses. The initial paragraph evaluation score of the assigned segment under each scoring dimension is thus determined. The paragraph scoring rule specifies, for each evaluation feature, the correspondence between its evaluation parameter and the ratio of the initial paragraph evaluation score to the reference score under the corresponding scoring dimension.
In addition, the initial paragraph evaluation score can be determined by a pre-trained evaluation model: the part of the free statement voice corresponding to the assigned segment is input into the evaluation model, which outputs an initial paragraph evaluation score for the assigned segment; this score is then multiplied by the initial scoring weight of each scoring dimension to determine the initial paragraph evaluation score of the assigned segment under each scoring dimension.
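The arithmetic of this example can be sketched as below. The paragraph scoring rule itself (how an evaluation parameter maps to a ratio) is not spelled out in the text, so the sketch takes the ratios as inputs; the values 4/6, 5/6, 6/6 and 4/6 are simply back-derived from the example scores over the per-dimension reference score of 6. Function and variable names are hypothetical.

```python
def initial_paragraph_scores(reference_score, initial_weights, ratios):
    """Per-dimension initial paragraph evaluation scores for one assigned segment.

    ratios[dim] is the ratio (initial paragraph score / per-dimension reference score)
    that the paragraph scoring rule assigns to the segment's evaluation parameter.
    """
    scores = {}
    for dim, weight in initial_weights.items():
        per_dimension_reference = reference_score * weight  # 24 * 0.25 = 6 in the example
        scores[dim] = per_dimension_reference * ratios[dim]
    return scores


weights = {"intelligibility": 0.25, "tone": 0.25, "fluency": 0.25, "intonation": 0.25}
# Ratios assumed to come from the paragraph scoring rule for 5 text events,
# 80% relevance, 78 word vectors and 10 lexical syllable stresses.
ratios = {"intelligibility": 4 / 6, "tone": 5 / 6, "fluency": 6 / 6, "intonation": 4 / 6}
scores = initial_paragraph_scores(24, weights, ratios)
print({dim: round(value, 2) for dim, value in scores.items()})
```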
In another embodiment provided by the present application, for each assigned segment, adjusting the initial scoring weight of the corresponding scoring dimension based on the evaluation parameter of each evaluation feature included in the assigned segment, and determining the target scoring weight of each scoring dimension, includes: adjusting the initial scoring weight of intelligibility based on the number of text events included in the assigned segment and the mapping relationship between the number of text events and the weight, and determining the target scoring weight of intelligibility; adjusting the initial scoring weight of tone based on the relevance of the answer content in the assigned segment to the question topic and the mapping relationship between the relevance and the weight, and determining the target scoring weight of tone; adjusting the initial scoring weight of fluency based on the number of word vectors included in the assigned segment and the mapping relationship between the number of word vectors and the weight, and determining the target scoring weight of fluency; and adjusting the initial scoring weight of intonation based on the number of lexical syllable stresses included in the assigned segment and the mapping relationship between the number of lexical syllable stresses and the weight, and determining the target scoring weight of intonation.
Here, the mapping relationship between the number of text events and the weight defines the relationship between the specific evaluation parameter of the number of text events and the corresponding target weight or the weight adjustment parameter, and the other three mapping relationships are similar to the mapping relationship between the number of text events and the weight, and are not described herein again.
To better understand how the initial scoring weights are adjusted, the above example is continued, taking the adjustment of the initial scoring weight of intelligibility as an illustration. The initial scoring weight of intelligibility is 0.25. The mapping relationship between the number of text events and the weight specifies that, when the number of text events is 4 to 6, 0.05 is subtracted from the weight. Since the number of text events is 5, the initial scoring weight of intelligibility is adjusted from 0.25 to a target scoring weight of 0.2. The initial scoring weights of the other scoring dimensions are adjusted in a similar way, which is not described again here.
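One way to represent such a mapping relationship is a list of parameter bands, each carrying a weight adjustment. The sketch below encodes only the single band stated in the example (4 to 6 text events, minus 0.05); the data layout, the names, and the choice to leave the weight unchanged outside every band are assumptions.

```python
# Each entry is (low, high, delta): when low <= parameter <= high, delta is added
# to the initial scoring weight. Only this band is given in the example.
TEXT_EVENT_WEIGHT_MAP = [(4, 6, -0.05)]


def target_scoring_weight(initial_weight, parameter, mapping):
    """Adjust an initial scoring weight according to a band-based mapping."""
    for low, high, delta in mapping:
        if low <= parameter <= high:
            return initial_weight + delta
    # Assumption: a parameter outside every band leaves the weight unchanged.
    return initial_weight


weight = target_scoring_weight(0.25, 5, TEXT_EVENT_WEIGHT_MAP)
print(round(weight, 2))
```

The same function could serve the other three scoring dimensions, each with its own mapping table.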
The target paragraph evaluation score of an assigned segment under each scoring dimension may be determined from the initial paragraph evaluation score, the initial scoring weight and the target scoring weight under that dimension as follows: the initial paragraph evaluation score of the assigned segment under the scoring dimension is divided by the initial scoring weight of that dimension and then multiplied by the target scoring weight of that dimension; the result is the target paragraph evaluation score of the assigned segment under that scoring dimension.
After the target paragraph evaluation score of each assigned segment under each scoring dimension is determined, the sum of the target paragraph evaluation scores of all assigned segments under the same scoring dimension is the evaluation sub-score of that scoring dimension included in the second evaluation score. The sum of the target paragraph evaluation scores of all assigned segments under all scoring dimensions is the second evaluation score.
In addition, it should be noted that the first evaluation score of the reading part is determined in a manner similar to the second evaluation score of the free statement part: scoring is likewise performed along the four scoring dimensions and the weights are likewise adjusted, so the details are not repeated here.
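The rescaling and aggregation described here reduce to a few lines. In the sketch below the function names and the second segment's scores are illustrative assumptions; the first value reuses the running example (initial score 4 under intelligibility, initial weight 0.25, target weight 0.2).

```python
def target_paragraph_score(initial_score, initial_weight, target_weight):
    """Rescale an initial paragraph evaluation score from the initial to the target weight."""
    return initial_score / initial_weight * target_weight


def second_evaluation_score(segments):
    """Aggregate per-segment target scores.

    segments: list of dicts mapping scoring dimension -> target paragraph evaluation score.
    Returns (evaluation sub-scores per dimension, second evaluation score).
    """
    sub_scores = {}
    for segment in segments:
        for dim, score in segment.items():
            sub_scores[dim] = sub_scores.get(dim, 0.0) + score
    return sub_scores, sum(sub_scores.values())


intelligibility = target_paragraph_score(4, 0.25, 0.2)  # 4 / 0.25 * 0.2
segments = [
    {"intelligibility": intelligibility, "fluency": 6.0},
    {"intelligibility": 2.0, "fluency": 3.0},  # hypothetical second segment
]
sub_scores, total = second_evaluation_score(segments)
print(round(sub_scores["intelligibility"], 2), round(total, 2))
```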
S103, determining a first evaluation score difference, and a second evaluation score difference under each scoring dimension, based on the first evaluation score and the second evaluation score; the first evaluation score difference is the score difference between the first evaluation score and the second evaluation score.
Here, the first evaluation score difference may be determined by subtracting the second evaluation score from the first evaluation score. The second evaluation score difference under each scoring dimension may be determined by subtracting, within the same scoring dimension, the evaluation sub-score included in the second evaluation score from the evaluation sub-score included in the first evaluation score. If a resulting difference is negative, its absolute value may be taken and used as the first evaluation score difference or the second evaluation score difference.
S104, when the first evaluation score difference is greater than a first threshold score or any second evaluation score difference is greater than a second threshold score, querying a spoken language evaluation database, according to the first evaluation score and the second evaluation score, for a spoken language evaluation score meeting preset requirements, and determining a target user corresponding to that spoken language evaluation score.
Here, the specific values of the first threshold score and the second threshold score may be selected as appropriate.
When the first evaluation score difference is not greater than the first threshold score and no second evaluation score difference is greater than the second threshold score, the sum of the first evaluation score and the second evaluation score of the user to be evaluated can be determined as the final spoken language evaluation score of the user to be evaluated.
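The difference computation of S103 and the trigger condition of S104 can be sketched together. The sketch assumes each evaluation score is represented by its evaluation sub-scores and that the total is their sum; the function names, the sample scores and the threshold values are hypothetical, since the text leaves the thresholds open.

```python
def score_differences(first_sub_scores, second_sub_scores):
    """Return (first evaluation score difference, second differences per dimension)."""
    first_diff = abs(sum(first_sub_scores.values()) - sum(second_sub_scores.values()))
    second_diffs = {
        dim: abs(first_sub_scores[dim] - second_sub_scores[dim])
        for dim in first_sub_scores
    }
    return first_diff, second_diffs


def needs_reevaluation(first_sub_scores, second_sub_scores, first_threshold, second_threshold):
    """True when the reading and free statement scores diverge enough to trigger S104."""
    first_diff, second_diffs = score_differences(first_sub_scores, second_sub_scores)
    return first_diff > first_threshold or any(
        diff > second_threshold for diff in second_diffs.values()
    )


reading = {"intelligibility": 20, "tone": 20, "fluency": 20, "intonation": 20}
free_statement = {"intelligibility": 18, "tone": 19, "fluency": 12, "intonation": 20}
print(needs_reevaluation(reading, free_statement, first_threshold=15, second_threshold=5))  # True
```

In this sample the total difference (11) stays under the first threshold, but the fluency difference (8) exceeds the second threshold, so the re-evaluation path is taken.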
For example, the preset requirement may be that the proportional deviation of each evaluation sub-score making up the first evaluation score is less than or equal to a preset similarity threshold (for example, 10%, relaxed step by step to 15% and then 20% if no match is retrieved); or the preset requirement may be that the proportional deviation of each evaluation sub-score of the second evaluation score is less than or equal to a preset similarity threshold (for example, 10%, relaxed step by step to 15% and then 20% if no match is retrieved).
In an embodiment provided by the present application, querying the spoken language evaluation database, according to the first evaluation score and the second evaluation score, for a spoken language evaluation score meeting the preset requirements, and determining the target user corresponding to that spoken language evaluation score, includes: querying the spoken language score database for reading spoken language scores whose reading-part score differs from the first evaluation score by less than a third threshold score and whose evaluation sub-score differences under the same scoring dimension are less than a fourth threshold score; querying the spoken language score database for free statement spoken language scores whose free-statement-part score differs from the second evaluation score by less than the third threshold score and whose evaluation sub-score differences under the same scoring dimension are less than the fourth threshold score; and determining the users corresponding to the retrieved reading spoken language scores and free statement spoken language scores, respectively, as target users.
Here, the spoken language score database stores the spoken language evaluation scores of a plurality of users in different regions, and the spoken language evaluation score of each user includes the evaluation score of the reading part, the evaluation sub-score of each scoring dimension in the reading part, the evaluation score of the free statement part, and the evaluation sub-score of each scoring dimension in the free statement part.
When the target users are determined, the target users corresponding to the reading part and the target users corresponding to the free statement part are respectively determined, and the number of the target users determined by each part is at least one.
For example, the target users corresponding to the reading part may be determined as follows: the evaluation scores of the reading part (reading spoken language scores) stored in the spoken language score database are traversed, and when a reading spoken language score is found that differs from the first evaluation score by less than the third threshold score and whose evaluation sub-score differences under intelligibility, fluency, tone and intonation are each less than the fourth threshold score, the user corresponding to that reading spoken language score is determined as a target user. The target users for the free statement part are determined in a manner similar to the reading part, which is not described in detail here.
S105, re-scoring the reading voice and the free statement voice of the user to be evaluated through the spoken language evaluation systems used by the target users, to obtain a third evaluation score and a fourth evaluation score of the user to be evaluated; the reading voice is the voice from which the reading text was obtained by speech-to-text conversion, and the free statement voice is the voice from which the free statement text was obtained by speech-to-text conversion.
It should be noted that the reading voice and the free statement voice of the user to be evaluated are re-evaluated by the spoken language evaluation systems used by the target users because the systems used in different regions may differ slightly in the initial scoring weights they set for the scoring dimensions. Equipment also differs between regions. Where the microphones and the acquisition environment are good, a system may set stricter speech-to-text conversion requirements for the examinee; where the acquisition environment is noisy and the equipment is old, a system may lower the accuracy required of speech-to-text conversion, for example by adding text error correction or by automatically recognizing certain sounds as correct. There are many such adjustments, so the spoken language evaluation systems used by different users may differ. In addition, the same speech may be converted into different texts in different places, which leads to different final spoken language evaluation scores.
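The traversal described above can be sketched as a filter over database records. The record layout (a dict with "user", "reading" and "free_statement" keys, each part holding a "score" and "sub_scores"), the function name, and the threshold values are all assumptions made for illustration; the text does not specify a storage format.

```python
def find_target_users(database, part, score, sub_scores, third_threshold, fourth_threshold):
    """Return users whose stored score for `part` is close to the given score.

    A record matches when the total differs by less than third_threshold and every
    per-dimension sub-score differs by less than fourth_threshold.
    """
    targets = []
    for record in database:
        entry = record[part]
        if abs(entry["score"] - score) >= third_threshold:
            continue
        if all(
            abs(entry["sub_scores"][dim] - sub_scores[dim]) < fourth_threshold
            for dim in sub_scores
        ):
            targets.append(record["user"])
    return targets


dims = ("intelligibility", "tone", "fluency", "intonation")
database = [
    {"user": "A", "reading": {"score": 70, "sub_scores": dict(zip(dims, (18, 17, 18, 17)))}},
    {"user": "B", "reading": {"score": 72, "sub_scores": dict(zip(dims, (25, 15, 20, 12)))}},
    {"user": "C", "reading": {"score": 90, "sub_scores": dict(zip(dims, (23, 22, 23, 22)))}},
]
query = dict(zip(dims, (18, 18, 18, 17)))
print(find_target_users(database, "reading", 71, query, third_threshold=5, fourth_threshold=3))  # ['A']
```

User B has a close total but diverges by 7 on intelligibility, and user C is too far off in total, so only user A qualifies. The same function would be called with the "free_statement" part and the second evaluation score to find the target users for the free statement part.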
The spoken language evaluation system used by the target user corresponding to the reading spoken language score is used for re-evaluating the reading voice of the user to be evaluated, and a third evaluation score of the user to be evaluated is determined; and re-grading the free statement voice of the user to be evaluated by the oral language evaluation system used by the target user corresponding to the free statement oral language score, and determining a fourth evaluation score of the user to be evaluated.
It should be further noted that, in the present application, the voice information with abnormal scoring is respectively transmitted to the assessment systems in different areas for re-assessment and mutual verification, so that the accuracy and objectivity of the voice assessment system can be improved.
S106, determining the final spoken language evaluation score of the user to be evaluated based on the first evaluation score, the second evaluation score, the third evaluation score and the fourth evaluation score of the user to be evaluated.
Here, the final spoken language evaluation score of the user to be evaluated may be determined from the first, second, third and fourth evaluation scores by averaging the two evaluation scores of each part. Specifically, the first evaluation score and the third evaluation score are averaged to obtain a fifth evaluation score, the second evaluation score and the fourth evaluation score are averaged to obtain a sixth evaluation score, and the sum of the fifth evaluation score and the sixth evaluation score is determined as the final spoken language evaluation score of the user to be evaluated.
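As a worked illustration of this averaging, a minimal sketch follows. The function name and the sample scores are hypothetical.

```python
def final_spoken_score(first, second, third, fourth):
    """Average original and re-evaluated scores per part, then add the two parts."""
    fifth = (first + third) / 2    # reading part: original and re-evaluated score
    sixth = (second + fourth) / 2  # free statement part: original and re-evaluated score
    return fifth + sixth


print(final_spoken_score(first=70, second=60, third=74, fourth=66))  # 135.0
```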
In addition, a difference threshold may be set for the first evaluation score and the second evaluation score. When the difference between the two evaluation scores is smaller than the preset difference threshold, the average of the two evaluation scores is selected as the final evaluation score; when the difference between the two evaluation scores is greater than or equal to the preset difference threshold, the higher of the two is selected as the final evaluation score.
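The threshold rule can be sketched generically for any pair of evaluation scores. Which two scores are paired is not fully clear from the text, so the function below takes them as plain arguments; the name and the sample values are hypothetical.

```python
def combine_scores(score_a, score_b, difference_threshold):
    """Average two scores when they are close; otherwise keep the higher one."""
    if abs(score_a - score_b) < difference_threshold:
        return (score_a + score_b) / 2
    return max(score_a, score_b)


print(combine_scores(70, 74, difference_threshold=10))  # 72.0
print(combine_scores(60, 80, difference_threshold=10))  # 80
```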
Therefore, the method and the device can evaluate the spoken language ability of the user to be evaluated more comprehensively by evaluating the reading voice, the free statement voice, the reading text and the free statement text, and can improve the accuracy of the evaluation result by evaluating the user to be evaluated secondarily by the evaluation systems used by different users. In addition, the application also discloses a technical scheme for carrying out multi-dimensional evaluation on the free statement of the spoken language based on text recognition, and solves the problem that the conventional spoken language evaluation system cannot score the free statement content of the user.
Referring to fig. 2 and 3, fig. 2 is a schematic structural diagram of a spoken language evaluation system based on text and speech recognition according to an embodiment of the present disclosure, and fig. 3 is a second schematic structural diagram of a spoken language evaluation system based on text and speech recognition according to an embodiment of the present disclosure. As shown in fig. 2, the spoken language evaluation system 200 includes:
the obtaining module 210 is configured to obtain a reading voice, a free statement voice, a reading text corresponding to the reading voice, and a free statement text corresponding to the free statement voice of a user to be evaluated; the reading voice is the voice of the standard evaluation text read by the user to be evaluated, and the free statement voice is the voice of the user to be evaluated for freely stating the evaluation question;
a recognition module 220, configured to determine a first evaluation score of the reading portion based on the reading voice and the reading text, and determine a second evaluation score of the free statement portion based on the free statement voice and the free statement text; the first evaluation score and the second evaluation score are both composed of evaluation sub-scores under a plurality of evaluation dimensions; different scoring dimensions are used for representing the spoken language abilities of different aspects of the user to be assessed;
a first determining module 230, configured to determine, based on the first evaluation score and the second evaluation score, a first evaluation score difference and a second evaluation score difference under each evaluation dimension; the first evaluation score difference is a score difference value of a first evaluation score and a second evaluation score;
the query module 240 is configured to query a spoken language evaluation score meeting preset requirements from a spoken language evaluation database according to the first evaluation score and the second evaluation score when the first evaluation score difference is greater than the first threshold score or any one of the second evaluation score differences is greater than the second threshold score, and determine a target user corresponding to the spoken language evaluation score;
the evaluation module 250 is configured to re-score the reading voice and the free statement voice of the user to be evaluated respectively through the spoken language evaluation system used by the target user, so as to obtain a third evaluation score and a fourth evaluation score of the user to be evaluated; the reading voice is the voice from which the reading text was obtained by speech-to-text conversion, and the free statement voice is the voice from which the free statement text was obtained by speech-to-text conversion;
and the second determining module 260 is configured to determine a final spoken language evaluation score of the user to be evaluated based on the first evaluation score, the second evaluation score, the third evaluation score and the fourth evaluation score of the user to be evaluated.
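The trigger condition used by the first determining module 230 and the query module 240 can be illustrated with a minimal sketch. It assumes that each score difference is an absolute difference, that the second evaluation score differences are the per-dimension sub-score differences, and that both scores share the same dimension keys; the names `EvaluationScore` and `needs_rescoring` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class EvaluationScore:
    """A total score plus its sub-scores, keyed by scoring dimension."""
    total: float
    sub_scores: Dict[str, float]


def needs_rescoring(first: EvaluationScore,
                    second: EvaluationScore,
                    first_threshold: float,
                    second_threshold: float) -> bool:
    """Return True when the reading and free statement scores disagree enough
    that a secondary evaluation should be requested.

    Assumption: differences are absolute values; the application only says
    "score difference value".
    """
    # First evaluation score difference: reading total vs. free statement total.
    first_diff = abs(first.total - second.total)
    # Second evaluation score differences: one per scoring dimension.
    second_diffs = [
        abs(first.sub_scores[dim] - second.sub_scores[dim])
        for dim in first.sub_scores
    ]
    return first_diff > first_threshold or any(
        diff > second_threshold for diff in second_diffs
    )
```

For example, a reading score of 80 and a free statement score of 78 would pass a first threshold of 10, yet a fluency sub-score gap of 8 against a second threshold of 5 would still trigger the query for a target user. How the four scores are finally combined is not specified in this section, so it is not sketched here.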
Optionally, the recognition module 220 is configured to determine the second evaluation score of the user to be evaluated, and the evaluation sub-score of each scoring dimension included in the second evaluation score, by:
performing initial evaluation on the free statement text, and determining a reference score of each assigned segment included in the free statement text;
respectively extracting evaluation features of each assigned segment, and determining evaluation parameters of the various evaluation features included in each assigned segment;
for each assigned segment, determining an initial segment evaluation score of the assigned segment in each scoring dimension based on the evaluation parameters of the various evaluation features included in the assigned segment, the part of the free statement voice corresponding to the assigned segment, the initial scoring weight in each scoring dimension and the reference score of the assigned segment;
for each assigned segment, respectively adjusting the initial scoring weight in the corresponding scoring dimension based on the evaluation parameter of each evaluation feature included in the assigned segment, and determining the target scoring weight of each scoring dimension;
for each assigned segment, determining a target segment evaluation score of the assigned segment in each scoring dimension based on the initial segment evaluation score of the assigned segment in each scoring dimension, the initial scoring weight of the assigned segment in each scoring dimension and the target scoring weight;
and determining the second evaluation score of the user to be evaluated and the evaluation sub-score of each scoring dimension included in the second evaluation score based on the target segment evaluation score of each assigned segment in each scoring dimension.
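The last two steps above can be sketched as follows. The application does not give the formulas, so this sketch assumes that a segment score is rescaled by the ratio of target weight to initial weight, and that sub-scores are obtained by summing segment scores per dimension. Both rules, and the function names, are illustrative assumptions only.

```python
from typing import Dict, List, Tuple


def target_segment_score(initial_score: float,
                         initial_weight: float,
                         target_weight: float) -> float:
    """Rescale an initial segment score from the initial weight to the target weight.

    Assumption: the score is proportional to its weight; the application only
    says the target score is "based on" these three quantities.
    """
    if initial_weight <= 0:
        raise ValueError("initial_weight must be positive")
    return initial_score * target_weight / initial_weight


def aggregate_segments(
    segment_scores: List[Dict[str, float]]
) -> Tuple[float, Dict[str, float]]:
    """Sum per-segment scores into per-dimension sub-scores and a total.

    Assumption: plain summation over segments and then over dimensions.
    """
    sub_scores: Dict[str, float] = {}
    for segment in segment_scores:
        for dimension, score in segment.items():
            sub_scores[dimension] = sub_scores.get(dimension, 0.0) + score
    return sum(sub_scores.values()), sub_scores
```

Under these assumptions, doubling a dimension's weight for a segment doubles that segment's contribution to the corresponding sub-score.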
Optionally, the scoring dimensions include at least one of: identifiability, tone, fluency, and intonation (accuracy of sound).
Optionally, the evaluation features include at least one of: the number of text events, the relevance of the answer content to the question topic, the number of word vectors, and the number of vocabulary syllables.
Optionally, when the recognition module 220 is configured to, for each assigned segment, adjust the initial scoring weight in the corresponding scoring dimension based on the evaluation parameter of each evaluation feature included in the assigned segment, and determine the target scoring weight of each scoring dimension, the recognition module 220 is configured to:
adjust the initial scoring weight of the identifiability based on the number of text events included in the assigned segment and the mapping relation between the number of text events and the weight, and determine the target scoring weight of the identifiability;
adjust the initial scoring weight of the tone based on the relevance of the answer content of the assigned segment to the question topic and the mapping relation between the relevance and the weight, and determine the target scoring weight of the tone;
adjust the initial scoring weight of the fluency based on the number of word vectors included in the assigned segment and the mapping relation between the number of word vectors and the weight, and determine the target scoring weight of the fluency;
and adjust the initial scoring weight of the intonation based on the number of vocabulary syllables included in the assigned segment and the mapping relation between the number of vocabulary syllables and the weight, and determine the target scoring weight of the intonation.
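The four adjustments above share one pattern: look up a feature value in a mapping relation and use the result to adjust one dimension's weight. A minimal sketch follows, assuming that each mapping relation is a step function over breakpoints and that the adjustment is multiplicative. The feature keys, the breakpoint tables and the function names are hypothetical, since the application does not define the mapping relations.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Pairing of scoring dimension and evaluation feature, as described above.
FEATURE_FOR_DIMENSION = {
    "identifiability": "text_event_count",
    "tone": "topic_relevance",
    "fluency": "word_vector_count",
    "intonation": "syllable_count",
}

# A mapping relation: sorted breakpoints plus one factor per interval,
# so len(factors) == len(breakpoints) + 1. (Assumed representation.)
Mapping = Tuple[List[float], List[float]]


def lookup_factor(value: float, mapping: Mapping) -> float:
    """Return the factor of the interval that contains value."""
    breakpoints, factors = mapping
    return factors[bisect_right(breakpoints, value)]


def adjust_weights(initial_weights: Dict[str, float],
                   features: Dict[str, float],
                   mappings: Dict[str, Mapping]) -> Dict[str, float]:
    """Compute the target scoring weight of each dimension for one segment.

    Assumption: target weight = initial weight * mapped factor.
    """
    target_weights = {}
    for dimension, weight in initial_weights.items():
        feature = FEATURE_FOR_DIMENSION[dimension]
        factor = lookup_factor(features[feature], mappings[feature])
        target_weights[dimension] = weight * factor
    return target_weights
```

With breakpoints `[2, 5]` and factors `[0.5, 1.0, 1.5]`, a segment containing six text events would raise an identifiability weight of 0.25 to 0.375. A table lookup is only one possible realization; a continuous function would fit the description equally well.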
Optionally, when the recognition module 220 is configured to determine the relevance of the answer content to the question topic, the recognition module 220 is configured to:
obtain question word vectors corresponding to the question text and paragraph word vectors corresponding to the assigned segment; the question text is a text obtained from the evaluation question;
cluster the question word vectors and the paragraph word vectors respectively to obtain at least one first feature cluster corresponding to the question word vectors and at least one second feature cluster corresponding to the paragraph word vectors;
extract the central vector of each first feature cluster as a first topic vector, and extract the central vector of each second feature cluster as a second topic vector;
perform weighted summation on all the first topic vectors to obtain a question topic vector, and perform weighted summation on all the second topic vectors to obtain a paragraph topic vector;
and determine the relevance of the answer content to the question topic based on the question topic vector and the paragraph topic vector.
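The steps after clustering can be sketched in a few lines. The sketch takes the clusters as given (any clustering method could produce them) and assumes that the central vector is the arithmetic mean of a cluster, that the summation weights are proportional to cluster size, and that the final relevance is the cosine similarity of the two topic vectors. None of these three choices is stated in the application, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Cluster = List[Vector]


def centroid(cluster: Cluster) -> List[float]:
    """Central vector of a cluster, taken here as the component-wise mean."""
    n = len(cluster)
    return [sum(column) / n for column in zip(*cluster)]


def topic_vector(clusters: List[Cluster]) -> List[float]:
    """Weighted sum of cluster centroids.

    Assumption: each centroid is weighted by its cluster's share of vectors.
    """
    total = sum(len(cluster) for cluster in clusters)
    result = [0.0] * len(clusters[0][0])
    for cluster in clusters:
        weight = len(cluster) / total
        for i, component in enumerate(centroid(cluster)):
            result[i] += weight * component
    return result


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance(question_clusters: List[Cluster],
              paragraph_clusters: List[Cluster]) -> float:
    """Relevance of the answer content to the question topic (assumed: cosine)."""
    return cosine_similarity(topic_vector(question_clusters),
                             topic_vector(paragraph_clusters))
```

Under these assumptions, a paragraph whose topic vector points in the same direction as the question topic vector scores close to 1, and an orthogonal one scores 0.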
Optionally, as shown in fig. 3, the spoken language evaluation system 200 further includes a third determining module 270, where the third determining module 270 is configured to:
querying, from the spoken language evaluation database, reading spoken language scores whose difference from the first evaluation score of the reading part is less than a third threshold score and whose evaluation sub-score differences under the same scoring dimension are less than a fourth threshold score;
querying, from the spoken language evaluation database, free statement spoken language scores whose difference from the second evaluation score of the free statement part is less than the third threshold score and whose evaluation sub-score differences under the same scoring dimension are less than the fourth threshold score;
and determining the users corresponding to the found reading spoken language score and the free statement spoken language score as target users.
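The query performed by the third determining module 270 amounts to a filter over stored scores. A minimal sketch is given below, assuming an in-memory list in place of the spoken language evaluation database and absolute differences for both comparisons. `ScoreRecord` and `find_matching_users` are hypothetical names, and the same function would be called once for the reading part and once for the free statement part.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoreRecord:
    """One stored score for one part (reading or free statement) of one user."""
    user_id: str
    total: float
    sub_scores: Dict[str, float]


def find_matching_users(records: List[ScoreRecord],
                        reference_total: float,
                        reference_sub_scores: Dict[str, float],
                        third_threshold: float,
                        fourth_threshold: float) -> List[str]:
    """Return the users whose stored score is close to the reference score.

    A record matches when its total differs from the reference total by less
    than third_threshold and every sub-score differs from the reference
    sub-score of the same dimension by less than fourth_threshold.
    """
    matches = []
    for record in records:
        if abs(record.total - reference_total) >= third_threshold:
            continue
        if all(
            abs(record.sub_scores[dim] - ref) < fourth_threshold
            for dim, ref in reference_sub_scores.items()
        ):
            matches.append(record.user_id)
    return matches
```

In a real deployment the filter would more likely be expressed as a database query; the list scan is used here only to keep the sketch self-contained.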
Optionally, when the recognition module 220 is configured to perform initial evaluation on the free statement text and determine a reference score of each assigned segment included in the free statement text, the recognition module 220 is configured to:
determine an initial evaluation score of the user to be evaluated according to a first assignment rule, based on the number of characters or the number of words included in the free statement text;
for each assigned segment, determine a reference score of the assigned segment according to a second assignment rule, based on the initial evaluation score and the text content of the assigned segment; the sum of the reference scores of all assigned segments equals the initial evaluation score.
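The two assignment rules are not spelled out in the application, so the following is only a sketch under stated assumptions: the first rule awards a fixed number of points per word up to a cap, and the second rule splits the initial evaluation score across segments in proportion to their word counts. The function names and the parameters `points_per_word` and `max_score` are hypothetical.

```python
from typing import List


def initial_evaluation_score(text: str,
                             points_per_word: float = 0.5,
                             max_score: float = 100.0) -> float:
    """First assignment rule (assumed): points per word, capped at max_score."""
    return min(max_score, len(text.split()) * points_per_word)


def reference_scores(segments: List[str], initial_score: float) -> List[float]:
    """Second assignment rule (assumed): split by each segment's word count.

    The last segment receives the remainder so that the reference scores
    add up to the initial evaluation score, as the description requires.
    """
    counts = [len(segment.split()) for segment in segments]
    total_words = sum(counts)
    if total_words == 0:
        raise ValueError("segments contain no words")
    scores = [initial_score * count / total_words for count in counts[:-1]]
    scores.append(initial_score - sum(scores))
    return scores
```

Whitespace splitting suits space-delimited languages; for Chinese text the character count mentioned in the rule would be used instead.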
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 4, the electronic device 400 includes a processor 410, a memory 420, and a bus 430.
The memory 420 stores machine-readable instructions executable by the processor 410. When the electronic device 400 runs, the processor 410 communicates with the memory 420 through the bus 430, and when the machine-readable instructions are executed by the processor 410, the steps of the spoken language assessment method in the method embodiment shown in fig. 1 may be performed.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the spoken language assessment method in the embodiment of the method shown in fig. 1 may be executed.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some communication interfaces, indirect coupling or communication connection between devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present application, anyone skilled in the art can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present application and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A spoken language assessment method based on text and speech recognition is characterized by comprising the following steps:
acquiring reading voice, free statement voice, reading text corresponding to the reading voice and free statement text corresponding to the free statement voice of a user to be evaluated; the reading voice is the voice of the standard evaluation text read by the user to be evaluated, and the free statement voice is the voice of the user to be evaluated for freely stating the evaluation question;
determining a first evaluation score of the reading part based on the reading voice and the reading text, and determining a second evaluation score of the free statement part based on the free statement voice and the free statement text; the first evaluation score and the second evaluation score each consist of evaluation sub-scores under a plurality of scoring dimensions; different scoring dimensions characterize different aspects of the spoken language ability of the user to be evaluated;
determining, based on the first evaluation score and the second evaluation score, a first evaluation score difference and a second evaluation score difference under each scoring dimension; the first evaluation score difference is the difference between the first evaluation score and the second evaluation score;
when the first evaluation score difference is greater than a first threshold score or any second evaluation score difference is greater than a second threshold score, querying a spoken language evaluation score meeting preset requirements from a spoken language evaluation database according to the first evaluation score and the second evaluation score, and determining a target user corresponding to the spoken language evaluation score;
re-scoring the reading voice and the free statement voice of the user to be evaluated respectively through the spoken language evaluation system used by the target user to obtain a third evaluation score and a fourth evaluation score of the user to be evaluated; the reading voice is the voice from which the reading text was obtained by speech-to-text conversion, and the free statement voice is the voice from which the free statement text was obtained by speech-to-text conversion;
and determining the final spoken language evaluation score of the user to be evaluated based on the first evaluation score, the second evaluation score, the third evaluation score and the fourth evaluation score of the user to be evaluated.
2. The spoken language assessment method according to claim 1, wherein a second assessment score of a user to be assessed and an assessment sub-score for each scoring dimension comprised by said second assessment score are determined by:
performing initial evaluation on the free statement text, and determining a reference score of each assigned segment included in the free statement text;
respectively extracting evaluation features of each assigned segment, and determining evaluation parameters of the various evaluation features included in each assigned segment;
for each assigned segment, determining an initial segment evaluation score of the assigned segment in each scoring dimension based on the evaluation parameters of the various evaluation features included in the assigned segment, the part of the free statement voice corresponding to the assigned segment, the initial scoring weight in each scoring dimension and the reference score of the assigned segment;
for each assigned segment, respectively adjusting the initial scoring weight in the corresponding scoring dimension based on the evaluation parameter of each evaluation feature included in the assigned segment, and determining the target scoring weight of each scoring dimension;
for each assigned segment, determining a target segment evaluation score of the assigned segment in each scoring dimension based on the initial segment evaluation score of the assigned segment in each scoring dimension, the initial scoring weight of the assigned segment in each scoring dimension and the target scoring weight;
and determining the second evaluation score of the user to be evaluated and the evaluation sub-score of each scoring dimension included in the second evaluation score based on the target segment evaluation score of each assigned segment in each scoring dimension.
3. The spoken language assessment method according to claim 2, wherein the scoring dimensions comprise at least one of: identifiability, tone, fluency, and intonation (accuracy of sound).
4. The spoken language assessment method according to claim 3, wherein the evaluation features comprise at least one of: the number of text events, the relevance of the answer content to the question topic, the number of word vectors, and the number of vocabulary syllables.
5. The spoken language evaluation method according to claim 4, wherein the adjusting, for each assigned segment, the initial scoring weight in the corresponding scoring dimension based on the evaluation parameter of each evaluation feature included in the assigned segment, respectively, and determining the target scoring weight in each scoring dimension comprises:
adjusting the initial scoring weight of the identifiability based on the number of text events included in the assigned segment and the mapping relation between the number of text events and the weight, and determining the target scoring weight of the identifiability;
adjusting the initial scoring weight of the tone based on the relevance of the answer content of the assigned segment to the question topic and the mapping relation between the relevance and the weight, and determining the target scoring weight of the tone;
adjusting the initial scoring weight of the fluency based on the number of word vectors included in the assigned segment and the mapping relation between the number of word vectors and the weight, and determining the target scoring weight of the fluency;
and adjusting the initial scoring weight of the intonation based on the number of vocabulary syllables included in the assigned segment and the mapping relation between the number of vocabulary syllables and the weight, and determining the target scoring weight of the intonation.
6. The spoken language assessment method according to claim 5, wherein the relevance of the answer content to the question topic is determined by:
obtaining question word vectors corresponding to the question text and paragraph word vectors corresponding to the assigned segment; the question text is a text obtained from the evaluation question;
clustering the question word vectors and the paragraph word vectors respectively to obtain at least one first feature cluster corresponding to the question word vectors and at least one second feature cluster corresponding to the paragraph word vectors;
extracting the central vector of each first feature cluster as a first topic vector, and extracting the central vector of each second feature cluster as a second topic vector;
performing weighted summation on all the first topic vectors to obtain a question topic vector, and performing weighted summation on all the second topic vectors to obtain a paragraph topic vector;
and determining the relevance of the answer content to the question topic based on the question topic vector and the paragraph topic vector.
7. The method according to claim 1, wherein the step of querying the spoken language evaluation score meeting preset requirements from the spoken language evaluation database according to the first evaluation score and the second evaluation score and determining the target user corresponding to the spoken language evaluation score comprises:
querying, from the spoken language evaluation database, reading spoken language scores whose difference from the first evaluation score of the reading part is less than a third threshold score and whose evaluation sub-score differences under the same scoring dimension are less than a fourth threshold score;
querying, from the spoken language evaluation database, free statement spoken language scores whose difference from the second evaluation score of the free statement part is less than the third threshold score and whose evaluation sub-score differences under the same scoring dimension are less than the fourth threshold score;
and determining the users corresponding to the found reading spoken language scores and the free statement spoken language scores as target users respectively.
8. The spoken language assessment method according to claim 2, wherein performing the initial evaluation on the free statement text and determining the reference score of each assigned segment included in the free statement text comprises:
determining an initial evaluation score of the user to be evaluated according to a first assignment rule, based on the number of characters or the number of words included in the free statement text;
for each assigned segment, determining a reference score of the assigned segment according to a second assignment rule, based on the initial evaluation score and the text content of the assigned segment; the sum of the reference scores of all assigned segments equals the initial evaluation score.
9. A spoken language assessment system based on text and speech recognition, the spoken language assessment system comprising:
the acquisition module is used for acquiring reading voice, free statement voice, reading text corresponding to the reading voice and free statement text corresponding to the free statement voice of a user to be evaluated; the reading voice is the voice of the standard evaluation text read by the user to be evaluated, and the free statement voice is the voice of the user to be evaluated for freely stating the evaluation question;
the recognition module is used for determining a first evaluation score of the reading part based on the reading voice and the reading text and determining a second evaluation score of the free statement part based on the free statement voice and the free statement text; the first evaluation score and the second evaluation score each consist of evaluation sub-scores under a plurality of scoring dimensions; different scoring dimensions characterize different aspects of the spoken language ability of the user to be evaluated;
the first determining module is used for determining, based on the first evaluation score and the second evaluation score, a first evaluation score difference and a second evaluation score difference under each scoring dimension; the first evaluation score difference is the difference between the first evaluation score and the second evaluation score;
the query module is used for querying a spoken language evaluation score meeting preset requirements from a spoken language evaluation database according to the first evaluation score and the second evaluation score, and determining a target user corresponding to the spoken language evaluation score, when the first evaluation score difference is greater than a first threshold score or any second evaluation score difference is greater than a second threshold score;
the evaluation module is used for re-scoring the reading voice and the free statement voice of the user to be evaluated respectively through the spoken language evaluation system used by the target user to obtain a third evaluation score and a fourth evaluation score of the user to be evaluated; the reading voice is the voice from which the reading text was obtained by speech-to-text conversion, and the free statement voice is the voice from which the free statement text was obtained by speech-to-text conversion;
and the second determining module is used for determining the final spoken language evaluation score of the user to be evaluated based on the first evaluation score, the second evaluation score, the third evaluation score and the fourth evaluation score of the user to be evaluated.
10. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operated, the machine-readable instructions being executable by the processor to perform the steps of the spoken language assessment method according to any one of claims 1 to 8.
11. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the spoken language assessment method according to any one of claims 1 to 8.
CN202210402853.2A 2022-04-18 2022-04-18 Spoken language evaluation method and system based on text and voice recognition Pending CN114842875A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210402853.2A CN114842875A (en) 2022-04-18 2022-04-18 Spoken language evaluation method and system based on text and voice recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210402853.2A CN114842875A (en) 2022-04-18 2022-04-18 Spoken language evaluation method and system based on text and voice recognition

Publications (1)

Publication Number Publication Date
CN114842875A true CN114842875A (en) 2022-08-02

Family

ID=82565276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210402853.2A Pending CN114842875A (en) 2022-04-18 2022-04-18 Spoken language evaluation method and system based on text and voice recognition

Country Status (1)

Country Link
CN (1) CN114842875A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385230A (en) * 2023-06-07 2023-07-04 北京奇趣万物科技有限公司 Child reading ability evaluation method and system

Similar Documents

Publication Publication Date Title
CN101105939B (en) Sonification guiding method
KR100733469B1 (en) Pronunciation Test System and Method of Foreign Language
Dalby et al. Explicit pronunciation training using automatic speech recognition technology
JP4002401B2 (en) Subject ability measurement system and subject ability measurement method
US9514109B2 (en) Computer-implemented systems and methods for scoring of spoken responses based on part of speech patterns
Aulia et al. A comparative study of MFCC-KNN and LPC-KNN for hijaiyyah letters pronounciation classification system
US9087519B2 (en) Computer-implemented systems and methods for evaluating prosodic features of speech
US10755595B1 (en) Systems and methods for natural language processing for speech content scoring
AU2003300130A1 (en) Speech recognition method
Peabody Methods for pronunciation assessment in computer aided language learning
CN110415725B (en) Method and system for evaluating pronunciation quality of second language using first language data
CN113486970B (en) Reading capability evaluation method and device
CN114842875A (en) Spoken language evaluation method and system based on text and voice recognition
Ryu Korean vowel identification by English and Mandarin listeners: Effects of L1-L2 vowel inventory size and acoustic relationship
Ng et al. Shefce: A Cantonese-English bilingual speech corpus for pronunciation assessment
Choi et al. Evaluation of English speaking proficiency under fixed speech rate: Focusing on utterances produced by Korean child learners of English
CN117275319B (en) Device for training language emphasis ability
CN113035237B (en) Voice evaluation method and device and computer equipment
KR101949880B1 (en) System for identifying and evaluating error of composition
KR20110024624A (en) System and method for evaluating foreign language pronunciation
Shengnan The Relationship between Metalinguistic Knowledge and the Identification of Thai Vowel Length by Chinese Learners before and after Praxis Intervention.
KR20240011076A (en) Language accuracy measurement system and method
Li et al. Education of Recognition Training Combined with Hidden Markov Model to Explore English Speaking
Digor et al. Instrumental shell for pronunciation training simulator design

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination