CN110164447B - Spoken language scoring method and device - Google Patents

Spoken language scoring method and device

Info

Publication number
CN110164447B
CN110164447B (application CN201910266167.5A)
Authority
CN
China
Prior art keywords
target
answer text
text information
reference answer
answering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910266167.5A
Other languages
Chinese (zh)
Other versions
CN110164447A (en
Inventor
彭书勇
方敏
戚自力
孙婷婷
林远东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Chivox Information Technology Co ltd
Original Assignee
Suzhou Chivox Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Chivox Information Technology Co ltd filed Critical Suzhou Chivox Information Technology Co ltd
Priority to CN201910266167.5A priority Critical patent/CN110164447B/en
Publication of CN110164447A publication Critical patent/CN110164447A/en
Application granted granted Critical
Publication of CN110164447B publication Critical patent/CN110164447B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/247Thesauruses; Synonyms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Abstract

The invention relates to a spoken language scoring method and device. The method comprises: acquiring a voice file recorded after a spoken language test question has been answered; decoding answering text information from the voice file; preprocessing a reference answer text and the answering text information to generate, respectively, a target reference answer text and target answering text information with reduced text content, wherein the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information; extracting, based on the target reference answer text and the target answering text information, content features representing the semantic similarity between them; and inputting the content features into a preset training model and outputting a corresponding spoken language score. The invention improves the scoring efficiency for spoken answers to open-ended questions.

Description

Spoken language scoring method and device
Technical Field
The invention relates to the technical field of computers, in particular to a spoken language scoring method and device.
Background
An open-ended spoken language test is an examination mechanism in which the content of a student's spoken answers to spoken test questions is scored.
At present, open-ended spoken language tests cannot be scored automatically, which increases the difficulty of scoring spoken language examinations.
Disclosure of Invention
Therefore, it is necessary to provide a spoken language scoring method and apparatus to solve the problem that current spoken language examinations cannot be scored automatically.
A method of spoken language scoring, the method comprising:
acquiring a voice file after answering the spoken language test question;
decoding answering text information from the voice file;
preprocessing a reference answer text and the answering text information to generate, respectively, a target reference answer text and target answering text information with reduced text content, wherein the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information;
extracting content characteristics representing semantic similarity of the target reference answer text and the target answering text information based on the target reference answer text and the target answering text information;
and inputting the content characteristics into a preset training model and outputting corresponding spoken language scores.
Preferably, the extracting of content features representing the semantic similarity between the target reference answer text and the target answering text information includes:
based on predefined word inverse document frequency (idf) information, selecting several words with the highest idf values from each reference answer, expanding them with synonyms to serve as the keywords of that reference answer, and then taking the union of these keywords to obtain the reference answer keywords;
counting the frequency with which the reference answer keywords appear in the target answering text information to obtain a keyword hit rate;
counting the hit rate of the N-grams of the target answering text information in each target reference answer text, and taking the maximum value as a Jaccard similarity coefficient;
obtaining document vector representations of the target reference answer text and the target answering text information by means of idf-weighted word vectors, calculating the cosine of the angle between the document vector of each target reference answer text and the document vector of the target answering text information, and taking the largest cosine value as a cosine similarity;
determining a word mover's distance between the target reference answer text and the target answering text information;
and taking the keyword hit rate, the Jaccard similarity coefficient, the cosine similarity and the word mover's distance as the content features.
Preferably, the preprocessing of the reference answer text and the answering text information includes:
removing stop words from the reference answer text and the answering text information respectively and performing lemmatization, wherein the stop words are words that do not affect the expression of content in a sentence, including: articles, prepositions, conjunctions, modal particles, and adverbs commonly used as conjunctions; lemmatization refers to converting a word of a given form into its base form.
Preferably, the inputting of the content features into a preset training model and outputting of corresponding spoken language scores includes:
inputting the content features into a support vector regression (SVR) model generated by training, and outputting the corresponding spoken language score.
A spoken language scoring apparatus, comprising:
the acquisition module is used for acquiring the voice file after answering the spoken language test question;
the decoding module is used for decoding the answering text information from the voice file;
the processing module is used for preprocessing a reference answer text and the answering text information to generate, respectively, a target reference answer text and target answering text information with reduced text content, wherein the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information;
the extraction module is used for extracting content characteristics representing semantic similarity of the target reference answer text and the target answering text information based on the target reference answer text and the target answering text information;
and the output module is used for inputting the content characteristics into a preset training model and outputting corresponding spoken language scores.
Preferably, the extraction module is configured to:
based on predefined word inverse document frequency (idf) information, selecting several words with the highest idf values from each reference answer, expanding them with synonyms to serve as the keywords of that reference answer, and then taking the union of these keywords to obtain the reference answer keywords;
counting the frequency with which the reference answer keywords appear in the target answering text information to obtain a keyword hit rate;
counting the hit rate of the N-grams of the target answering text information in each target reference answer text, and taking the maximum value as a Jaccard similarity coefficient;
obtaining document vector representations of the target reference answer text and the target answering text information by means of idf-weighted word vectors, calculating the cosine of the angle between the document vector of each target reference answer text and the document vector of the target answering text information, and taking the largest cosine value as a cosine similarity;
determining a word mover's distance between the target reference answer text and the target answering text information;
and taking the keyword hit rate, the Jaccard similarity coefficient, the cosine similarity and the word mover's distance as the content features.
Preferably, the processing module is configured to:
removing stop words from the reference answer text and the answering text information respectively and performing lemmatization, wherein the stop words are words that do not affect the expression of content in a sentence, including: articles, prepositions, conjunctions, modal particles, and adverbs commonly used as conjunctions; lemmatization refers to converting a word of a given form into its base form.
Preferably, the output module is configured to:
inputting the content features into a support vector regression (SVR) model generated by training, and outputting the corresponding spoken language score.
For spoken answers given in the open-ended question format, the spoken language score is a quantitative measure of how well the examinee has answered the question. The degree to which the answer content matches the reference answer can be fed back directly to the student, and it also determines how reasonable the final overall score is, so the scoring efficiency for spoken answers to open-ended questions is improved. The keywords extracted from the transcribed text of the examinee's answer summarize what the student expressed, and feeding this information back gives the student a clear view of their own performance. The keywords of the reference answers are expanded with synonyms after extraction, and feeding this information back can enrich the students' vocabulary and encourage a wider range of expression.
Drawings
FIG. 1 is a flow chart of a spoken language scoring method according to an example embodiment;
fig. 2 is a block diagram of a spoken language scoring device according to an example embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
FIG. 1 is a flow chart of a spoken language scoring method according to an example embodiment. As shown in fig. 1, the method includes:
step 110, acquiring a voice file after answering the spoken language test question;
step 120, decoding answer text information from the voice file;
step 130, preprocessing the reference answer text and the answering text information to generate, respectively, a target reference answer text and target answering text information with reduced text content, wherein the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information;
step 140, extracting content characteristics representing semantic similarity of the target reference answer text and the target answering text information based on the target reference answer text and the target answering text information;
step 150, inputting the content features into a preset training model and outputting the corresponding spoken language score.
In this embodiment, in an open-ended question scenario, the student's voice file is decoded to obtain the answering text information given by the student. The decoding process may be implemented by a speech recognition system.
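As a minimal illustration of this decoding step, the sketch below uses the open-source Python SpeechRecognition package; the patent does not name a particular speech recognition system, so the library, backend and language setting here are all assumptions.

import speech_recognition as sr

def decode_answer_text(wav_path):
    """Decode a student's answer audio file into answering text information (hypothetical helper)."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:   # the "voice file" recorded after answering
        audio = recognizer.record(source)    # read the whole recording
    # Any ASR backend could be substituted here; the free Google Web Speech API is one option.
    return recognizer.recognize_google(audio, language="en-US")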
In this embodiment, in step 140, the extracting content features representing semantic similarity between the target reference answer text and the target answer text information includes:
based on predefined word inverse document frequency (idf) information, selecting several words with the highest idf values from each reference answer, expanding them with synonyms to serve as the keywords of that reference answer, and then taking the union of these keywords to obtain the reference answer keywords;
counting the frequency with which the reference answer keywords appear in the target answering text information to obtain the keyword hit rate;
counting the hit rate of the N-grams of the target answering text information in each target reference answer text, and taking the maximum value as the Jaccard similarity coefficient;
obtaining document vector representations of the target reference answer text and the target answering text information by means of idf-weighted word vectors, calculating the cosine of the angle between the document vector of each target reference answer text and the document vector of the target answering text information, and taking the largest cosine value as the cosine similarity;
determining the word mover's distance between the target reference answer text and the target answering text information;
and taking the keyword hit rate, the Jaccard similarity coefficient, the cosine similarity and the word mover's distance as the content features.
In this embodiment, the content features include four parameters, which are specifically as follows:
1) Keyword hit rate. The extraction operation is as follows:
first, keywords of a target reference answer text are extracted. In the preprocessed target reference answer text, according to predefined word inverse document frequency (idf) information, selecting a plurality of words with larger idf values to be subjected to synonym expansion to serve as key words of the reference answer text marked by each item, and then taking the union set of the key words as the key words of the target reference answer text. The keyword list obtained at this time may be regarded as the point of the score content specified by the expert or teacher. Here, the idf value represents the importance degree of each word to the semantic expression of the sentence, and can be obtained by collecting and counting spoken language data of students through an online system.
Second, the frequency with which the keywords of the target reference answer text appear in the student's target answering text information, i.e. the keyword hit rate, is counted. In particular, if several keywords that are synonyms of one another all appear in the student's text, the student is considered to have hit the corresponding scoring point only once, and the hit is not counted repeatedly; in addition, when a keyword is matched, the presence of a negation word before the keyword can be checked, and if a negation word is present, the hit is discarded.
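This counting rule can be sketched as follows, reusing the keyword_groups format from the previous sketch (each group is a keyword plus its synonyms, counted at most once) and a small assumed list of negation words:

NEGATIONS = {"not", "no", "never", "n't"}   # assumed negation list

def keyword_hit_rate(keyword_groups, answer_tokens):
    """Fraction of scoring points hit by the student's (preprocessed) answer tokens."""
    hits = 0
    for group in keyword_groups:
        hit = False
        for i, tok in enumerate(answer_tokens):
            # a hit requires the keyword (or a synonym) without a negation word right before it
            if tok in group and (i == 0 or answer_tokens[i - 1] not in NEGATIONS):
                hit = True
                break            # each synonym group is counted at most once
        hits += int(hit)
    return hits / len(keyword_groups) if keyword_groups else 0.0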
2) Jaccard similarity coefficient. This coefficient compares the similarity and difference between two strings; the larger its value, the more similar the two strings are. Here, the Jaccard coefficient is computed as the N-gram hit rate of the student text in each reference answer, and the maximum value over the reference answers is taken as the feature; the formula is as follows:
Jaccard_N = max_i ( |REC_N ∩ REFER_N_i| / |REC_N| ), N ∈ {1, 2, 3, 4, 5}
wherein REC_N is the N-gram list of the examinee text, REFER_N_i is the N-gram list of the i-th reference answer, |X| denotes the number of distinct N-grams contained in text X, and N denotes the number of words in each phrase.
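The sketch below is one straightforward reading of this feature: for each N from 1 to 5, the hit rate of the student's N-grams in each reference answer is computed and the best value over the reference answers is kept.

def ngrams(tokens, n):
    """Set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_hit_features(student_tokens, reference_token_lists, max_n=5):
    features = []
    for n in range(1, max_n + 1):
        rec = ngrams(student_tokens, n)                  # REC_N
        best = 0.0
        for ref_tokens in reference_token_lists:
            refer = ngrams(ref_tokens, n)                # REFER_N_i
            if rec:
                best = max(best, len(rec & refer) / len(rec))
        features.append(best)                            # max over reference answers
    return features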
3) Cosine similarity based on idf-weighted word vectors. Each word in a word vector model has a vector representation that quantitatively describes its relationship to other words; intuitively, the word vectors of two semantically similar words lie closer together in the vector space. Document vector representations of the target reference answer text and the target answering text information are obtained by idf-weighted averaging of word vectors, the cosine of the angle between the two document vectors is calculated, and the maximum value over the reference answers is taken as the cosine similarity feature.
Here, the word vector model is trained on the decoded transcripts of students' spoken training audio collected by the online system.
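A minimal sketch of the idf-weighted document vectors and the cosine feature, assuming word_vectors is a dict-like mapping from word to vector (for example a gensim KeyedVectors model) and idf is a precomputed word-to-idf dictionary:

import numpy as np

def doc_vector(tokens, word_vectors, idf):
    """idf-weighted average of the word vectors of the tokens present in the model."""
    vecs, weights = [], []
    for tok in tokens:
        if tok in word_vectors:
            vecs.append(np.asarray(word_vectors[tok]))
            weights.append(idf.get(tok, 1.0))
    if not vecs:
        return None
    return np.average(np.vstack(vecs), axis=0, weights=weights)

def cosine_feature(student_tokens, reference_token_lists, word_vectors, idf):
    s = doc_vector(student_tokens, word_vectors, idf)
    if s is None:
        return 0.0
    best = 0.0
    for ref_tokens in reference_token_lists:
        r = doc_vector(ref_tokens, word_vectors, idf)
        if r is None:
            continue
        denom = np.linalg.norm(s) * np.linalg.norm(r)
        if denom:
            best = max(best, float(np.dot(s, r) / denom))   # max over reference answers
    return best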
4) Word mover's distance (WMD) based on the word vector space.
In the word vector space, WMD can be understood as the minimum total cost required to transform text A into text B, obtained as a weighted sum of the word-to-word movement costs between the two texts. Here, the movement cost between two words is measured by the Euclidean distance between their normalized word vectors.
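The sketch below computes a simplified, relaxed variant of WMD (each word of text A moves entirely to its nearest word of text B) rather than the full optimal-transport formulation; a complete implementation is available, for example, as gensim's KeyedVectors.wmdistance.

import numpy as np

def _normalized(v):
    n = np.linalg.norm(v)
    return v / n if n else v

def relaxed_wmd(tokens_a, tokens_b, word_vectors):
    """Lower-bound approximation of the word mover's distance between two token lists."""
    vecs_a = [_normalized(np.asarray(word_vectors[t])) for t in tokens_a if t in word_vectors]
    vecs_b = [_normalized(np.asarray(word_vectors[t])) for t in tokens_b if t in word_vectors]
    if not vecs_a or not vecs_b:
        return float("inf")
    # each word of A carries equal weight; its cost is the Euclidean distance
    # to the nearest word of B in the normalized word-vector space
    total = sum(min(np.linalg.norm(va - vb) for vb in vecs_b) for va in vecs_a)
    return total / len(vecs_a)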
In this embodiment, in step 130, the preprocessing of the reference answer text and the answering text information includes:
removing stop words from the reference answer text and the answering text information respectively and performing lemmatization, wherein the stop words are words that do not affect the expression of content in a sentence, including but not limited to: articles, prepositions, conjunctions, modal particles, and adverbs commonly used as conjunctions; lemmatization refers to converting a word of a given form into its base form.
In particular, this preprocessing effectively shortens the text. Lemmatization converts a word of a given form into its base form, for example converting verb participles into the base form and plural nouns into the singular, so the text is effectively simplified while its semantics remain unchanged.
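A minimal sketch of this preprocessing using NLTK (the patent names no specific toolkit, so the stop-word list and lemmatizer here are assumptions):

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# one-time setup: nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")
STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(text):
    """Lower-case, drop stop words and punctuation, and lemmatize the remaining words."""
    tokens = nltk.word_tokenize(text.lower())
    tokens = [t for t in tokens if t.isalpha() and t not in STOP_WORDS]
    # lemmatize as verb first (participles -> base form), then as noun (plural -> singular)
    return [LEMMATIZER.lemmatize(LEMMATIZER.lemmatize(t, pos="v"), pos="n") for t in tokens]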
In this embodiment, in step 150, inputting the content features into a preset training model and outputting corresponding spoken language scores, including:
and inputting the content characteristics into the SVR model based on the SVR model generated by training, and outputting the corresponding spoken language score.
During the model training and testing phase, the model selects support vector machine regression (SVR). For the input student audio, the operation of the steps is needed, the content characteristics are extracted, and then the extracted content characteristics are used as the input of the SVR. In the training stage, the audio of the students in the training set is manually marked with content scores, and the content scores are used as labels to obtain model parameters in an adaptive manner through SVR fitting; while in the testing phase, the content scores are output by the model. According to the output content scores, feedback such as the quality of spoken answers of students can be reasonably given.
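A minimal sketch of this training and prediction setup with scikit-learn's SVR, using toy numbers purely for illustration (the feature order follows the four content features above; real training data would be the manually scored responses):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# each row: [keyword hit rate, Jaccard coefficient, cosine similarity, word mover's distance]
X_train = np.array([[0.8, 0.45, 0.91, 1.2],
                    [0.3, 0.10, 0.62, 2.7]])
y_train = np.array([4.5, 2.0])          # manually annotated content scores (labels)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X_train, y_train)             # training phase: fit the model parameters

X_test = np.array([[0.6, 0.30, 0.85, 1.6]])
print(model.predict(X_test))            # testing phase: output the content score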
For spoken answers given in the open-ended question format, the spoken language score is a quantitative measure of how well the examinee has answered the question. The degree to which the answer content matches the reference answer can be fed back directly to the student, and it also determines how reasonable the final overall score is, so the scoring efficiency for spoken answers to open-ended questions is improved. The keywords extracted from the transcribed text of the examinee's answer summarize what the student expressed, and feeding this information back gives the student a clear view of their own performance. The keywords of the reference answers are expanded with synonyms after extraction, and feeding this information back can enrich the students' vocabulary and encourage a wider range of expression.
Fig. 2 is a block diagram of a spoken language scoring device according to an example embodiment. As shown in fig. 2, the apparatus includes:
an obtaining module 210, configured to obtain a voice file after answering a spoken language test question;
a decoding module 220, configured to decode the answering text information from the voice file;
a processing module 230, configured to preprocess the reference answer text and the answering text information and to generate, respectively, a target reference answer text and target answering text information with reduced text content, where the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information;
an extracting module 240, configured to extract, based on the target reference answer text and the target answering text information, content features representing the semantic similarity between the target reference answer text and the target answering text information;
and the output module 250 is used for inputting the content characteristics into a preset training model and outputting the corresponding spoken language scores.
Preferably, the extraction module 240 is configured to:
based on predefined word inverse document frequency (idf) information, select several words with the highest idf values from each reference answer, expand them with synonyms to serve as the keywords of that reference answer, and then take the union of these keywords to obtain the reference answer keywords;
count the frequency with which the reference answer keywords appear in the target answering text information to obtain the keyword hit rate;
count the hit rate of the N-grams of the target answering text information in each target reference answer text, and take the maximum value as the Jaccard similarity coefficient;
obtain document vector representations of the target reference answer text and the target answering text information by means of idf-weighted word vectors, calculate the cosine of the angle between the document vector of each target reference answer text and the document vector of the target answering text information, and take the largest cosine value as the cosine similarity;
determine the word mover's distance between the target reference answer text and the target answering text information;
and take the keyword hit rate, the Jaccard similarity coefficient, the cosine similarity and the word mover's distance as the content features.
Preferably, the processing module 230 is configured to:
removing stop words from the reference answer text and the answering text information respectively and performing lemmatization, wherein the stop words are words that do not affect the expression of content in a sentence, including but not limited to: articles, prepositions, conjunctions, modal particles, and adverbs commonly used as conjunctions; lemmatization refers to converting a word of a given form into its base form.
Preferably, the output module 250 is configured to:
inputting the content features into a support vector regression (SVR) model generated by training, and outputting the corresponding spoken language score.
It should be noted that the implementation process of the apparatus in this embodiment is the same as that of the method described above; for details, refer to the method implementation, which is not repeated here.
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not all possible combinations are described, but any combination of these technical features should be considered within the scope of this specification as long as it contains no contradiction.
The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (6)

1. A spoken language scoring method, the method comprising:
acquiring a voice file after answering the spoken language test question;
decoding answering text information from the voice file;
preprocessing a reference answer text and the answering text information to generate, respectively, a target reference answer text and target answering text information with reduced text content, wherein the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information;
extracting content characteristics representing semantic similarity of the target reference answer text and the target answering text information based on the target reference answer text and the target answering text information;
inputting the content characteristics into a preset training model and outputting corresponding spoken language scores;
the extracting of the content features representing the semantic similarity between the target reference answer text and the target answer text information comprises:
based on predefined word inverse document frequency (idf) information, selecting several words with the highest idf values from each reference answer, expanding them with synonyms to serve as the keywords of that reference answer, and then taking the union of these keywords to obtain the reference answer keywords;
counting the frequency with which the reference answer keywords appear in the target answering text information to obtain a keyword hit rate;
counting the hit rate of the N-grams of the target answering text information in each target reference answer text, and taking the maximum value as a Jaccard similarity coefficient;
obtaining document vector representations of the target reference answer text and the target answering text information by means of idf-weighted word vectors, calculating the cosine of the angle between the document vector of each target reference answer text and the document vector of the target answering text information, and taking the largest cosine value as a cosine similarity;
determining a word mover's distance between the target reference answer text and the target answering text information;
and taking the keyword hit rate, the Jaccard similarity coefficient, the cosine similarity and the word mover's distance as the content features.
2. The method of claim 1, wherein the preprocessing of the reference answer text and the answering text information comprises:
removing stop words from the reference answer text and the answering text information respectively and performing lemmatization, wherein the stop words are words that do not affect the expression of content in a sentence, including: articles, prepositions, conjunctions, modal particles, and adverbs commonly used as conjunctions; lemmatization refers to converting a word of a given form into its base form.
3. The method of claim 1, wherein inputting the content features into a preset training model and outputting corresponding spoken language scores comprises:
inputting the content features into a support vector regression (SVR) model generated by training, and outputting a corresponding spoken language score.
4. A spoken language scoring apparatus, comprising:
the acquisition module is used for acquiring the voice file after answering the spoken language test question;
the decoding module is used for decoding the answering text information from the voice file;
the processing module is used for preprocessing a reference answer text and the answering text information to generate, respectively, a target reference answer text and target answering text information with reduced text content, wherein the target reference answer text has the same semantics as the reference answer text, and the target answering text information has the same semantics as the answering text information;
the extraction module is used for extracting content characteristics representing semantic similarity of the target reference answer text and the target answering text information based on the target reference answer text and the target answering text information;
the output module is used for inputting the content characteristics into a preset training model and outputting corresponding spoken language scores;
the extraction module is configured to:
based on predefined word inverse document frequency (idf) information, selecting several words with the highest idf values from each reference answer, expanding them with synonyms to serve as the keywords of that reference answer, and then taking the union of these keywords to obtain the reference answer keywords;
counting the frequency with which the reference answer keywords appear in the target answering text information to obtain a keyword hit rate;
counting the hit rate of the N-grams of the target answering text information in each target reference answer text, and taking the maximum value as a Jaccard similarity coefficient;
obtaining document vector representations of the target reference answer text and the target answering text information by means of idf-weighted word vectors, calculating the cosine of the angle between the document vector of each target reference answer text and the document vector of the target answering text information, and taking the largest cosine value as a cosine similarity;
determining a word mover's distance between the target reference answer text and the target answering text information;
and taking the keyword hit rate, the Jaccard similarity coefficient, the cosine similarity and the word mover's distance as the content features.
5. The apparatus of claim 4, wherein the processing module is configured to:
removing stop words from the reference answer text and the answering text information respectively and performing lemmatization, wherein the stop words are words that do not affect the expression of content in a sentence, including: articles, prepositions, conjunctions, modal particles, and adverbs commonly used as conjunctions; lemmatization refers to converting a word of a given form into its base form.
6. The apparatus of claim 4, wherein the output module is configured to:
inputting the content features into a support vector regression (SVR) model generated by training, and outputting a corresponding spoken language score.
CN201910266167.5A 2019-04-03 2019-04-03 Spoken language scoring method and device Active CN110164447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910266167.5A CN110164447B (en) 2019-04-03 2019-04-03 Spoken language scoring method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910266167.5A CN110164447B (en) 2019-04-03 2019-04-03 Spoken language scoring method and device

Publications (2)

Publication Number Publication Date
CN110164447A CN110164447A (en) 2019-08-23
CN110164447B true CN110164447B (en) 2021-07-27

Family

ID=67638457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910266167.5A Active CN110164447B (en) 2019-04-03 2019-04-03 Spoken language scoring method and device

Country Status (1)

Country Link
CN (1) CN110164447B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852069A (en) * 2019-10-24 2020-02-28 大唐融合通信股份有限公司 Text relevance scoring method and system
CN110706536B (en) * 2019-10-25 2021-10-01 北京猿力教育科技有限公司 Voice answering method and device
CN110797010A (en) * 2019-10-31 2020-02-14 腾讯科技(深圳)有限公司 Question-answer scoring method, device, equipment and storage medium based on artificial intelligence
CN111414456A (en) * 2020-03-20 2020-07-14 北京师范大学 Method and system for automatically scoring open type short answer questions
CN112289308A (en) * 2020-10-23 2021-01-29 上海凯石信息技术有限公司 Voice dictation scoring method and device and electronic equipment
CN113065311A (en) * 2021-02-26 2021-07-02 成都环宇知了科技有限公司 Scoring method and system for processing Power Point manuscript content based on OpenXml
CN112992154A (en) * 2021-05-08 2021-06-18 北京远鉴信息技术有限公司 Voice identity determination method and system based on enhanced voiceprint library
CN115204143B (en) * 2022-09-19 2022-12-20 江苏移动信息系统集成有限公司 Method and system for calculating text similarity based on prompt

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090070100A1 (en) * 2007-09-11 2009-03-12 International Business Machines Corporation Methods, systems, and computer program products for spoken language grammar evaluation
CN101520802A (en) * 2009-04-13 2009-09-02 腾讯科技(深圳)有限公司 Question-answer pair quality evaluation method and system
CN102509483A (en) * 2011-10-31 2012-06-20 苏州思必驰信息科技有限公司 Distributive automatic grading system for spoken language test and method thereof
CN103955874A (en) * 2014-03-31 2014-07-30 西南林业大学 Automatic subjective-question scoring system and method based on semantic similarity interval
US20140229161A1 (en) * 2013-02-12 2014-08-14 International Business Machines Corporation Latent semantic analysis for application in a question answer system
CN104504023A (en) * 2014-12-12 2015-04-08 广西师范大学 High-accuracy computer automatic marking method for subjective items based on domain ontology
CN107273861A (en) * 2017-06-20 2017-10-20 广东小天才科技有限公司 A kind of subjective question marking methods of marking, device and terminal device
CN107436864A (en) * 2017-08-04 2017-12-05 逸途(北京)科技有限公司 A kind of Chinese question and answer semantic similarity calculation method based on Word2Vec
CN108960574A (en) * 2018-06-07 2018-12-07 百度在线网络技术(北京)有限公司 Quality determination method, device, server and the storage medium of question and answer

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10108697B1 (en) * 2013-06-17 2018-10-23 The Boeing Company Event matching by analysis of text characteristics (e-match)
CN104050256B (en) * 2014-06-13 2017-05-24 西安蒜泥电子科技有限责任公司 Initiative study-based questioning and answering method and questioning and answering system adopting initiative study-based questioning and answering method
CN106847260B (en) * 2016-12-20 2020-02-21 山东山大鸥玛软件股份有限公司 Automatic English spoken language scoring method based on feature fusion

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090070100A1 (en) * 2007-09-11 2009-03-12 International Business Machines Corporation Methods, systems, and computer program products for spoken language grammar evaluation
CN101520802A (en) * 2009-04-13 2009-09-02 腾讯科技(深圳)有限公司 Question-answer pair quality evaluation method and system
CN102509483A (en) * 2011-10-31 2012-06-20 苏州思必驰信息科技有限公司 Distributive automatic grading system for spoken language test and method thereof
US20140229161A1 (en) * 2013-02-12 2014-08-14 International Business Machines Corporation Latent semantic analysis for application in a question answer system
CN103955874A (en) * 2014-03-31 2014-07-30 西南林业大学 Automatic subjective-question scoring system and method based on semantic similarity interval
CN104504023A (en) * 2014-12-12 2015-04-08 广西师范大学 High-accuracy computer automatic marking method for subjective items based on domain ontology
CN107273861A (en) * 2017-06-20 2017-10-20 广东小天才科技有限公司 A kind of subjective question marking methods of marking, device and terminal device
CN107436864A (en) * 2017-08-04 2017-12-05 逸途(北京)科技有限公司 A kind of Chinese question and answer semantic similarity calculation method based on Word2Vec
CN108960574A (en) * 2018-06-07 2018-12-07 百度在线网络技术(北京)有限公司 Quality determination method, device, server and the storage medium of question and answer

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Verification of an Automatic Scoring Model for Subjective Questions; He Chao; China Master's Theses Full-text Database, Information Science and Technology Series; 2017-05-15 (No. 5); pp. 34-39 *

Also Published As

Publication number Publication date
CN110164447A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
CN110164447B (en) Spoken language scoring method and device
CN110543639B (en) English sentence simplification algorithm based on pre-training transducer language model
CN108304372B (en) Entity extraction method and device, computer equipment and storage medium
US11182435B2 (en) Model generation device, text search device, model generation method, text search method, data structure, and program
CN108536654B (en) Method and device for displaying identification text
EP3862889A1 (en) Responding to user queries by context-based intelligent agents
US7587308B2 (en) Word recognition using ontologies
US20200183983A1 (en) Dialogue System and Computer Program Therefor
CN111339283B (en) Method and device for providing customer service answers aiming at user questions
US20080059147A1 (en) Methods and apparatus for context adaptation of speech-to-speech translation systems
KR20160026892A (en) Non-factoid question-and-answer system and method
WO2008107305A2 (en) Search-based word segmentation method and device for language without word boundary tag
CN101551947A (en) Computer system for assisting spoken language learning
US11031009B2 (en) Method for creating a knowledge base of components and their problems from short text utterances
US20210117458A1 (en) Response selecting apparatus, response selecting method, and response selecting program
CN110096572B (en) Sample generation method, device and computer readable medium
KR101255957B1 (en) Method and apparatus for tagging named entity
Malandrakis et al. Sail: Sentiment analysis using semantic similarity and contrast features
CN107562907B (en) Intelligent lawyer expert case response device
CN110059318B (en) Discussion question automatic evaluation method based on Wikipedia and WordNet
Vincent et al. Personalised language modelling of screen characters using rich metadata annotations
CN115292492A (en) Method, device and equipment for training intention classification model and storage medium
Anantaram et al. Adapting general-purpose speech recognition engine output for domain-specific natural language question answering
WO2023098971A1 (en) Method and apparatus for self-supervised extractive question answering
Zajíc et al. First insight into the processing of the language consulting center data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant