CN113470651A - Voice scoring method and system based on abstract extraction - Google Patents

Voice scoring method and system based on abstract extraction

Info

Publication number
CN113470651A
CN113470651A (application CN202110625268.4A)
Authority
CN
China
Prior art keywords
text
sentence
sentences
word
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110625268.4A
Other languages
Chinese (zh)
Inventor
李苏梅
陈泽铭
李心广
陈帅
吴伟源
卢树炜
马姗娴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Foreign Studies
Original Assignee
Guangdong University of Foreign Studies
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Foreign Studies
Priority to CN202110625268.4A
Publication of CN113470651A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a voice scoring method and system based on abstract extraction, wherein the method comprises the following steps: obtaining an examinee voice segment to be scored, and segmenting the voice segment to obtain a plurality of voice sentences; performing text recognition and word segmentation on each voice sentence to obtain each text sentence and a plurality of text words forming the text sentence; calculating a word vector for each of the text words; carrying out weighted average processing on the word vector of each text word in each text sentence to obtain the sentence vector of each text sentence; constructing a text network graph model, and carrying out iterative computation by adopting the TextRank algorithm to obtain the importance score of each text sentence; and acquiring the text sentences meeting preset conditions, forming the abstract of the examinee speech segment, and scoring the translation content of the examinee speech segment. By adopting the method and the system, the voice of the examinee can be accurately recognized and the abstract information accurately extracted, thereby improving the scoring accuracy of the examinee's voice.

Description

Voice scoring method and system based on abstract extraction
Technical Field
The invention relates to the technical field of voice recognition and evaluation, in particular to a voice scoring method and system based on abstract extraction.
Background
With the rapid development of computer science and technology, the application of leading-edge technologies such as artificial intelligence and machine learning to speech has made voice intelligence a popular technology. Automatic scoring of oral English retelling questions is a research hotspot in the current speech evaluation field. In an oral English retelling question, the examinee first listens to a played recording and then, after one minute of preparation, retells the recording according to the content heard. Manual scoring mainly focuses on two aspects, the scoring of translation contents and the scoring of language expression, among which the technology for scoring the accuracy of the translation contents is the key to successful scoring. Generally, the scoring of translation contents mainly considers the number of correctly retold key information points in the examinee's answer, which involves abstract extraction technology.
In the prior art, the TF-IDF-based text abstract extraction method is the most basic and earliest statistics-based abstract extraction algorithm. However, the inventors found that the prior art has at least the following problem: text abstract extraction based on TF-IDF does not take semantically related information into account but simply and directly computes TF-IDF values, so the accuracy of the extracted abstract is not high.
Disclosure of Invention
The embodiment of the invention aims to provide a voice scoring method and system based on abstract extraction, which can accurately realize the recognition of the voices of examinees and the extraction of abstract information, thereby improving the scoring accuracy of the voices of the examinees.
In order to achieve the above object, an embodiment of the present invention provides a speech scoring method based on abstract extraction, including:
obtaining a voice segment of an examinee to be scored, and segmenting the voice segment to obtain a plurality of voice sentences;
performing text recognition and word segmentation on each voice sentence to obtain each text sentence and a plurality of text words forming the text sentence;
calculating a word vector for each of the text words;
carrying out weighted average processing on the word vector of each text word in each text sentence to obtain a sentence vector of each text sentence;
constructing a text network graph model according to the sentence vector of each text sentence; the text network graph model takes a sentence vector of each text sentence as a vertex and takes the similarity of the text sentences larger than a preset similarity threshold value as an edge;
iterative computation is carried out by adopting a TextRank algorithm to obtain the importance score of each text sentence;
acquiring text sentences meeting preset conditions, forming abstracts of the examinee speech segments, and scoring the translated contents of the examinee speech segments; wherein the preset conditions are as follows: the importance scores of the text sentences are larger than a preset score threshold value, or the text sentences are N text sentences with the highest importance scores.
Compared with the prior art, the speech scoring method based on abstract extraction disclosed by the invention has the advantages that after the speech segments of examinees are processed to obtain the word vector of each text word, the WR algorithm is adopted to perform weighted average processing on the word vector of each text word in each text sentence to obtain the sentence vector of each text sentence, and compared with the traditional weighted summation method, more accurate sentence vectors can be obtained. Constructing a text network graph model according to the sentence vector of each text sentence, wherein the text network graph model takes the sentence vector of each text sentence as a vertex and takes the similarity of the text sentences larger than a preset similarity threshold value as an edge; iterative computation is carried out by adopting a TextRank algorithm to obtain the importance score of each text sentence; the method comprises the steps of obtaining text sentences meeting preset conditions, forming abstracts of examinee speech segments, scoring the translated contents of the examinee speech segments, and improving a TextRank algorithm by constructing a text graph model, so that the abstract extraction effect is improved, and compared with a neural network, the method is simpler, more efficient and has no loss of effect.
As an improvement of the above scheme, the performing weighted average processing on the word vector of each text word in each text sentence to obtain a sentence vector of each text sentence specifically includes:
determining the weight of each text word according to a preset parameter factor and a set probability;
carrying out weighted average processing on the word vector of each text word in each text sentence through the following calculation formula to obtain an initial sentence vector of each text sentence:
$$v_s = \frac{1}{|s|} \sum_{\omega \in s} \frac{a}{a + p(\omega)}\, v_\omega$$

where s is a text sentence, |s| is the number of text words in s, ω is a text word in s, v_ω is the word vector of ω, a is the preset parameter factor, and p(ω) is the set probability;
and performing dimensionality reduction on each initial sentence vector to obtain a sentence vector of each text sentence.
As an improvement of the above scheme, the dimension reduction processing method includes: singular value decomposition algorithm, principal component analysis algorithm, factor analysis algorithm or independent component analysis algorithm.
As an improvement of the scheme, the method for calculating the similarity of the text sentences is a cosine similarity calculation method or a longest common subsequence algorithm.
As an improvement of the above scheme, the similarity of the text sentences is obtained by the following calculation formula:
$$\mathrm{Sim}(S_i, S_j) = \frac{\sum_{k=1}^{n} x_k\, y_k}{\sqrt{\sum_{k=1}^{n} x_k^2}\,\sqrt{\sum_{k=1}^{n} y_k^2}}$$

$$S_i = (x_1, x_2, \ldots, x_n); \qquad S_j = (y_1, y_2, \ldots, y_n)$$

wherein Sim(S_i, S_j) is the similarity of the text sentences S_i and S_j, S_i and S_j represent different text sentences expressed as sentence vectors, n is the dimension of the sentence vectors, x_k are the components of the vector representing S_i, and y_k are the components of the vector representing S_j.
As an improvement of the above scheme, the TextRank algorithm specifically includes:
$$WS(V_i) = (1-d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}}\, WS(V_j)$$

wherein WS(V_i) is the importance score of a text sentence, V_i denotes a vertex of the text network graph model, w_ji denotes the weight of the edge between V_j and V_i, In(V_i) is the set of vertices pointing to vertex V_i, Out(V_j) is the set of vertices that vertex V_j points to, and d is a preset damping coefficient.
As an improvement of the above scheme, the obtaining of the examinee speech segment to be scored and the segmentation to obtain a plurality of speech sentences specifically include:
obtaining a voice segment of an examinee to be scored;
windowing the examinee voice segment to be scored by adopting a preset window function to obtain a plurality of audio frames;
calculating the short-time average energy and the short-time average zero crossing rate of each audio frame;
and acquiring the audio frames of which the short-term average energy and the short-term average zero-crossing rate reach corresponding preset threshold values, and taking the audio frames as boundary cutting points to segment the examinee speech segments into a plurality of speech sentences.
As an improvement of the above scheme, the performing text recognition and word segmentation on each of the speech sentences to obtain each text sentence and a plurality of text words constituting the text sentence specifically includes:
performing MFCC (Mel frequency cepstrum coefficient) voice feature extraction on each voice sentence to obtain a language feature value;
inputting each language characteristic value into a BP neural network model which is trained in advance to perform text recognition, and obtaining each text sentence;
and performing word segmentation on each text sentence to obtain a plurality of text words forming the text sentences.
As an improvement of the above scheme, the calculating a word vector of each text word specifically includes:
and calculating a word vector of each text word by using a preset word2vec model.
The embodiment of the invention also provides a voice scoring system based on abstract extraction, which comprises:
the examinee voice segmentation module is used for acquiring examinee voice segments to be scored and segmenting the examinee voice segments to obtain a plurality of voice sentences;
the text word acquisition module is used for performing text recognition and word segmentation on each voice sentence to obtain each text sentence and a plurality of text words forming the text sentence;
the word vector calculation module is used for calculating a word vector of each text word;
a sentence vector calculation module, configured to perform weighted average processing on a word vector of each text word in each text sentence to obtain a sentence vector of each text sentence;
the text network graph building module is used for building a text network graph model according to the sentence vector of each text sentence; the text network graph model takes a sentence vector of each text sentence as a vertex and takes the similarity of the text sentences larger than a preset similarity threshold value as an edge;
the importance score calculation module is used for carrying out iterative calculation by adopting a TextRank algorithm to obtain an importance score of each text sentence;
the abstract extraction module is used for acquiring text sentences meeting preset conditions, forming an abstract of the examinee speech section and scoring the translation content of the examinee speech section; wherein the preset conditions are as follows: the importance scores of the text sentences are larger than a preset score threshold value, or the text sentences are N text sentences with the highest importance scores.
Compared with the prior art, the speech scoring method and system based on abstract extraction disclosed by the invention have the advantages that the examinee speech segment to be scored is obtained, the sentence segmentation is carried out by using double thresholds according to the characteristics of human pronunciation and the characteristics of self sentence break, and the examinee speech segment is segmented into a plurality of speech sentences; the cutting method is very simple and quick, but has good effect. And aiming at the difference of pronunciation habits of different speakers, the method proposes to establish double-threshold classification to solve the problem of difference of pronunciation habits, thereby improving the stability and accuracy of sentence segmentation. And performing text recognition on each speech sentence by adopting a BP neural network model to obtain each text sentence, changing the traditional HMM or DTW algorithm, and greatly improving the accuracy of speech recognition. Performing word segmentation processing on each text sentence to obtain a plurality of text words forming the text sentence, and calculating a word vector of each text word by using a word2vec model. And performing weighted average processing on the word vector of each text word in each text sentence by adopting a WR (weighting and removal) algorithm to obtain the sentence vector of each text sentence, wherein the sentence vector is more accurate compared with the sentence vector obtained by the traditional weighted summation method. Constructing a text network graph model according to the sentence vector of each text sentence, wherein the text network graph model takes the sentence vector of each text sentence as a vertex and takes the similarity of the text sentences larger than a preset similarity threshold value as an edge; iterative computation is carried out by adopting a TextRank algorithm to obtain the importance score of each text sentence; the method comprises the steps of obtaining text sentences meeting preset conditions, forming abstracts of examinee speech segments, scoring the translated contents of the examinee speech segments, and improving a TextRank algorithm by constructing a text graph model, so that the abstract extraction effect is improved, and compared with a neural network, the method is simpler, more efficient and has no loss of effect.
Drawings
FIG. 1 is a schematic diagram illustrating steps of a method for scoring a speech based on abstract extraction according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating the steps of a dual-threshold sentence segmentation method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a single neuron model in the BP neural network model according to an embodiment of the present invention;
FIG. 4 is a diagram of a BP neural network model in an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a speech scoring system based on abstract extraction according to a second embodiment of the present invention;
fig. 6 is a schematic structural diagram of a speech scoring system based on abstract extraction according to a third embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic diagram illustrating steps of a speech scoring method based on abstract extraction according to an embodiment of the present invention. The voice scoring method based on abstract extraction provided by the embodiment of the invention is implemented through steps S1-S7:
and S1, obtaining the examinee voice segments to be scored, and segmenting to obtain a plurality of voice sentences.
Specifically, the examinee voice segment is the examinee's speech produced when answering an oral English retelling question, i.e., the retelling of the recording according to the content heard, and it can be acquired through a microphone.
Because the examinee speech segment is a continuous piece of speech covering a whole passage, in order to perform accurate speech recognition with a speech recognition technology, it needs to be segmented by a sentence segmentation algorithm into speech segments in units of sentences to facilitate subsequent processing.
In one embodiment, the examinee speech segments are segmented into speech sentences by a dual-threshold sentence segmentation method. Comparison shows that between almost every pair of semantic units (paragraphs, sentences, words, etc.) there is a pause, at which some features of the speech change significantly. For example, at sentence boundaries the energy characteristics of the audio drop significantly, while within a sentence they are significantly higher. Different sound segments have different energies, and the energy of pause segments is typically much smaller than the average energy. An energy threshold can therefore be estimated; however, a single criterion cannot segment sentences accurately, so the time-delay characteristics of the pauses are also used. Speech attenuation exists between all language units, only with different attenuation magnitudes. In view of this feature, a mute delay threshold is used for discrimination: the silence segments between different language units of each type of audio, particularly between sentences, are analyzed; the average segment length and the shortest segment length of the silence segments are counted; and a preset strategy is then adopted to obtain the mute delay threshold.
Referring to fig. 2, a schematic step diagram of a dual-threshold sentence segmentation method according to an embodiment of the present invention is shown.
Step S1 specifically includes steps S11 to S14:
s11, obtaining a voice segment of the examinee to be scored;
and S12, windowing the examinee voice segment to be scored by adopting a preset window function to obtain a plurality of audio frames.
In an embodiment of the invention, the audio is segmented by windowing, each segment being 10-30ms in length, called a frame, with partial overlap (frame shift) between adjacent frames. The extraction of speech features is usually performed in units of frames, based on the short-time stationarity of speech.
The preset window function includes but is not limited to: rectangular windows, hanning windows and hamming windows.
The window functions are respectively:
rectangular window:

$$w(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

Hanning window:

$$w(n) = \begin{cases} 0.5\left[1 - \cos\left(\frac{2\pi n}{N-1}\right)\right], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

Hamming window:

$$w(n) = \begin{cases} 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

wherein N is the window length; different window functions are selected according to different requirements in the short-time analysis process.
By adopting the technical means of the embodiment of the invention, windowing the examinee voice segment to be scored makes the signal more continuous overall and avoids the Gibbs effect. After windowing, the originally aperiodic speech signal exhibits some of the characteristics of a periodic function.
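For illustration only, a minimal Python/NumPy sketch of the framing and windowing described above follows; the 25 ms frame length, 10 ms frame shift, and the choice of a Hamming window are assumptions of the example, not values fixed by the invention.

```python
import numpy as np

def frame_and_window(signal, sr, frame_ms=25, shift_ms=10):
    """Split a 1-D speech signal into overlapping frames and apply a Hamming window."""
    frame_len = int(sr * frame_ms / 1000)    # window width N (e.g., 25 ms of samples)
    frame_shift = int(sr * shift_ms / 1000)  # hop between adjacent frames (frame shift)
    if len(signal) < frame_len:              # pad very short signals to one full frame
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    window = np.hamming(frame_len)           # w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))
    frames = np.stack([signal[i * frame_shift : i * frame_shift + frame_len]
                       for i in range(n_frames)])
    return frames * window                   # windowed frames, shape (n_frames, frame_len)
```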
And S13, calculating the short-time average energy and the short-time average zero crossing rate of each audio frame.
Short-time average energy: the energy function describes the change in the amplitude of the audio energy, which can be used to separate silence from non-silence and unvoiced and voiced, where the short-time average energy of the ith frame of speech can be expressed as:
cumulative averaging of absolute values:

$$E_i = \frac{1}{N}\sum_{n=0}^{N-1} \left| x_i(n) \right|$$

or cumulative averaging of squares:

$$E_i = \frac{1}{N}\sum_{n=0}^{N-1} x_i(n)^2$$

or cumulative averaging of the logarithms of the squares:

$$E_i = \frac{1}{N}\sum_{n=0}^{N-1} \log\left(x_i(n)^2\right)$$

where i is the audio frame number, N is the number of sampled values in the audio frame (i.e., the window width), and x_i(n) is the signal sample at the n-th point in the i-th frame.
It should be noted that the above three expression modes are all calculation modes of short-time average energy, and one of the calculation modes may be selected to perform subsequent threshold determination according to actual application requirements. Most scenes are judged by using short-time average energy obtained by accumulation and averaging of squares as a threshold value.
Short-time average zero crossing rate: for discrete signals, a "zero crossing" occurs when two adjacent samples have different signs. The zero crossing rate is the number of times of zero crossing of the signal in a statistical short time, so that voiced sound and unvoiced sound as well as voiced sound and mute sound in the voice can be easily distinguished by using the zero crossing rate. The short-time average zero-crossing rate can be expressed as:
$$Z_i = \frac{1}{2}\sum_{n=1}^{N-1} \left| \operatorname{sgn}\left(x_i(n)\right) - \operatorname{sgn}\left(x_i(n-1)\right) \right|$$

where sgn(·) is the sign function and x_i(n) is the n-th sample of the i-th frame of the speech signal.
And S14, acquiring the audio frames of which the short-term average energy and the short-term average zero-crossing rate both reach corresponding preset threshold values, and taking the audio frames as boundary cutting points to segment the examinee speech segments into a plurality of speech sentences.
By analyzing the waveform of a particular audio, the amplitude of the waveform is significantly reduced at almost every pause in speech, so that the short-term average energy in the time domain can be used to capture this change. For each type of audio, a threshold called the silence energy threshold is counted, and if the energy of a frame is lower than the threshold, the frame is considered to have entered the interval of speech stop.
The boundary of speech can be detected well using the mute energy threshold alone, but the energy does not drop only at sentence boundaries; the same phenomenon exists between other semantic units, such as between paragraphs, between clauses, and even between words, which is obviously not what is wanted. Re-analysis of the waveform shows that amplitude attenuation indeed occurs between all semantic units, and that besides the attenuation amplitude there is another distinct characteristic: the duration of the attenuation. The attenuation is most obvious and lasts longest between paragraphs; between sentences it is also obvious, with only a slightly shorter duration; between clauses the duration is shorter still; and between words neither the attenuation amplitude nor its duration is obvious. In view of this characteristic, a mute delay threshold is used for discrimination. The silence segments between different semantic units of each type of audio, particularly between sentences, are analyzed; the average segment length and the shortest segment length of the silence segments are counted; and a certain strategy is then adopted to obtain the mute delay threshold, for example multiplying the average segment length by a coefficient smaller than 1, or directly using the shortest segment length. If the added windows are non-overlapping, the selected segment length is simply divided by the window length to give the mute delay threshold.
Based on this, a mute energy threshold corresponding to the short-time average energy E(i) and a mute time-domain threshold corresponding to the short-time average zero-crossing rate Z_i can be set. For each audio frame it is determined whether its short-time average energy is lower than the mute energy threshold and its short-time average zero-crossing rate is lower than the mute time-domain threshold; if so, the audio frame is taken as a sentence boundary cutting point, thereby cutting the examinee speech segment into a plurality of speech sentences, as sketched below.
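A minimal sketch of this double-threshold decision, using the short-time features of step S13, is given below; the concrete threshold values and the minimum silence length (the mute delay threshold, expressed in frames) are illustrative assumptions of the example.

```python
import numpy as np

def short_time_features(frames):
    """Per-frame short-time average energy and average zero-crossing rate."""
    energy = np.mean(frames ** 2, axis=1)    # E_i = (1/N) * sum x_i(n)^2
    signs = np.sign(frames)
    # |sgn(x(n)) - sgn(x(n-1))| summed over the frame, normalized by frame length
    zcr = 0.5 * np.mean(np.abs(np.diff(signs, axis=1)), axis=1)
    return energy, zcr

def split_sentences(frames, energy_thr, zcr_thr, min_silence_frames=30):
    """Frames whose energy AND zero-crossing rate fall below the thresholds count
    as silence; a silence run longer than the delay threshold yields a boundary."""
    energy, zcr = short_time_features(frames)
    silent = (energy < energy_thr) & (zcr < zcr_thr)
    boundaries, run = [], 0
    for i, is_silent in enumerate(silent):
        run = run + 1 if is_silent else 0
        if run == min_silence_frames:        # pause long enough: cut mid-pause
            boundaries.append(i - run // 2)
    return boundaries                        # frame indices of sentence cut points
```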
By adopting the technical means of the embodiment of the invention, sentences are segmented with double thresholds according to the characteristics of human pronunciation and of natural sentence breaks, which improves the stability and accuracy of sentence segmentation with good effect. Moreover, because the pronunciation in the examinee speech segment is basically standard, the speech clear, and the noise low, neither a complex model nor a large amount of computation is needed; it suffices to analyze the time-domain characteristics carefully and judge with the double thresholds, which effectively reduces the amount of calculation while ensuring sentence segmentation accuracy.
It should be understood that, in practical applications, other speech sentence segmentation methods may also be adopted to segment the speech sentences of the examinee speech segments, which do not form specific limitations of the present invention.
And S2, performing text recognition and word segmentation on each voice sentence to obtain each text sentence and a plurality of text words forming the text sentence.
Specifically, step S2 includes steps S21 to S23:
s21, performing MFCC voice feature extraction on each voice sentence to obtain a language feature value;
s22, inputting each language characteristic value into a pre-trained BP neural network model for text recognition to obtain each text sentence;
and S23, performing word segmentation on each text sentence to obtain a plurality of text words forming the text sentence.
In the embodiment of the invention, a BP neural network model is adopted for text recognition, so that a speech sentence of a test taker is recognized and converted into a text form. The BP neural network is also called an error reverse transmission neural network, and is a network model constructed by continuously adjusting the connection weight between nodes according to a feedback value.
Referring to fig. 3-4, fig. 3 is a schematic diagram of a single neuron model in the BP neural network model according to an embodiment of the present invention; FIG. 4 is a diagram of the BP neural network model in an embodiment of the present invention. The whole structure is divided into an input layer, a hidden layer and an output layer, wherein the hidden layer can have one layer or multiple layers according to the requirements of the specific situation. The more hidden layers there are, the slower the learning speed of the neural network; according to the Kolmogorov theorem, with a reasonable structure and appropriate weights, a 3-layer BP network can approximate any continuous function, so a 3-layer BP network with a relatively simple structure is selected.
As shown in fig. 3: y_k represents the output value of neuron k at a certain moment; f is the activation function, also called the transfer function; u_k represents the net input to the k-th neuron and is obtained by:

u_k = w_k1 * x_1 + w_k2 * x_2 + ... + w_km * x_m + b_k

where x_1, x_2, ..., x_m are the m input data; w_k1, w_k2, ..., w_km are the weights of the corresponding input signals; and b_k is the offset value, called the threshold.
The above-mentioned single neurons are connected to obtain the multi-layer neural network model shown in fig. 4, and the final output layer outputs the probability of each match.
The training process of the BP neural network model is as follows: a number of speech sentences and the corresponding labeled text sentences are acquired in advance as the training set. Speech parameters are extracted from each speech sentence; in the embodiment of the present invention, the extracted MFCC features form a two-dimensional matrix with an indefinite number of rows and 24 columns, and the rows × 24 = num elements are arranged into a column feature vector. Because the number of rows differs from sentence to sentence, num is set to 600 according to research experience (the vectors cannot simply be padded with zeros to length 600), i.e., there are 600 input neurons.
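Purely as an illustration, the following sketch extracts a 24-column MFCC matrix and flattens it to a fixed-length input vector using the librosa library (a library choice assumed here, not named by the patent; how vectors shorter than 600 elements should be lengthened is left open in the text, so the zero-padding below is only a placeholder).

```python
import numpy as np
import librosa

NUM_INPUT = 600  # fixed input length per the text: 600 input neurons

def mfcc_feature_vector(wav_path, sr=16000, n_mfcc=24):
    """24-column MFCC matrix with a variable number of rows, flattened
    and cropped to a fixed-length vector for the BP network's input layer."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # shape: (rows, 24)
    flat = mfcc.flatten()
    if flat.size >= NUM_INPUT:
        return flat[:NUM_INPUT]                       # crop long utterances
    return np.pad(flat, (0, NUM_INPUT - flat.size))   # placeholder for short ones
```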
The learning process consists of two phases: forward propagation of the signal and back propagation of the error. During forward propagation, input samples are fed in at the input layer, processed layer by layer by the hidden layers, and passed on to the output layer. If the actual output of the output layer does not match the expected output, the process switches to the error back-propagation phase. In error back-propagation, the output error is passed back through the hidden layers to the input layer in some form and distributed to all the units of each layer, yielding an error signal for each unit that serves as the basis for correcting the unit's weights. This cycle of forward signal propagation and backward error propagation with weight adjustment, i.e., the learning and training process of the network, is repeated until the output error of the network falls to an acceptable level or a preset number of learning iterations is reached.
In the figure, X represents an input layer, b represents a hidden layer, y represents an output layer, and Vh1 represents the weight of the first input neuron of the input layer to the h-th neuron of the hidden layer, and Wd1 represents the weight of the d-th neuron of the hidden layer to the first neuron of the output layer.
The input to the h-th neuron of the hidden layer is:

$$b_h = f\left(\sum_{i=1}^{m} v_{ih}\, x_i + \lambda_h\right)$$

where f(·) is the hidden-layer activation function and λ_h is the bias of the h-th neuron of the hidden layer.
The j-th output of the output layer is:

$$y_j = f\left(\sum_{h=1}^{d} w_{hj}\, b_h + \theta_j\right)$$

where θ_j is the bias of the j-th neuron of the output layer.
For each prediction, the error is obtained using the following formula, and the weights are adjusted continually:

$$E = \frac{1}{2}\sum_{j}\left(\hat{y}_j - y_j\right)^2$$

where ŷ_j is the predicted output of the network and y_j is the expected output of the sample.
Then, according to the trained BP neural network model, text recognition is performed on each speech sentence to obtain each text sentence, and word segmentation is performed on each text sentence to obtain the text words forming it.
By adopting the technical means of the embodiment of the invention, speech recognition based on the BP neural network replaces the traditional HMM or DTW algorithms, makes full use of the resource advantages of the laboratory corpus, and greatly improves the accuracy of speech recognition.
It should be noted that the above scenario is only used as an example, and in practical application, a word segmentation method in the prior art may also be used, which is not specifically limited herein.
And S3, calculating a word vector of each text word.
Specifically, a word vector of each text word is calculated by using a preset word2vec model.
The word2vec model utilizes a deep learning network to model semantic relations of words and contexts of corpus data so as to obtain a low-dimensional word vector. The word vector is generally about 100-300 dimensions, and the problem of high-dimensional sparsity of a traditional vector space model can be well solved.
It should be noted that the word2vec model includes the Continuous Bag-of-Words Model (CBOW) and the Continuous Skip-gram Model (Skip-gram). Both models include an input layer, a hidden layer, and an output layer. The construction and training of the word2vec model can follow the prior art and are not described here in detail.
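As a sketch, the word vectors can be trained with the gensim library (a library choice assumed here, not prescribed by the patent); the sg flag selects Skip-gram or CBOW, and the vector size follows the 100-300 dimension range mentioned above.

```python
from gensim.models import Word2Vec

# tokenized text sentences from step S2, e.g. [["tom", "worried", ...], ...]
corpus = [["tom", "worried", "about", "the", "harvest"],
          ["his", "sister", "did", "the", "same", "thing"]]

# sg=1 -> Skip-gram, sg=0 -> CBOW
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1, epochs=50)
vec = model.wv["harvest"]   # the word vector of one text word
```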
And S4, carrying out weighted average processing on the word vector of each text word in each text sentence to obtain the sentence vector of each text sentence.
Specifically, step S4 includes steps S41 to S43:
s41, determining the weight of each text word according to preset parameter factors and set probabilities;
s42, performing weighted average processing on the word vector of each text word in each text sentence through the following calculation formula to obtain an initial sentence vector of each text sentence:
$$v_s = \frac{1}{|s|} \sum_{\omega \in s} \frac{a}{a + p(\omega)}\, v_\omega$$

where s is a text sentence, |s| is the number of text words in s, ω is a text word in s, v_ω is the word vector of ω, a is a preset parameter factor, and p(ω) is the set probability;
and S43, performing dimensionality reduction on each initial sentence vector to obtain a sentence vector of each text sentence.
The traditional process of obtaining a sentence vector from a word vector usually adds the word vectors of each word in a sentence, and then averages the word vectors, which is simple but often not excellent.
In the embodiment of the invention, the WR algorithm is used as an unsupervised sentence modeling method to calculate the sentence vector. Where W denotes Weighted, meaning that each word vector in a sentence is Weighted using pre-estimated parameters. R represents Removal, meaning that irrelevant parts in the sentence vector are removed, and the sentence is subjected to dimensionality reduction.
First, each word vector in a sentence is weighted using the pre-estimated parameter a and the set probability p(ω); after the weighted averaging, dimension-reduction processing is performed on each initial sentence vector to remove the irrelevant parts, thereby obtaining the sentence vector of each text sentence.
It should be noted that the parameter a is an empirical value, which can be set according to the actual situation, and exemplarily, a ∈ [1e-4, 1e-3 ]. The probability p (ω) is set as a word frequency estimate, that is, the probability of occurrence of the text word in the whole corpus can be obtained by pre-calculation.
The dimension-reduction processing methods include: Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Factor Analysis (FA), and Independent Component Analysis (ICA).
In one embodiment, the PCA algorithm is used to remove the irrelevant part of the vectors and finally obtain the sentence vectors: the first singular vector u is computed from the set of initial sentence vectors, and for the initial sentence vector v_s of each text sentence the following is executed:

$$v_s' = v_s - u u^{T} v_s$$

i.e., the projection of v_s onto the common direction u is removed, thereby obtaining the sentence vector v_s' of each text sentence.
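The WR computation can be sketched as follows, assuming word_vecs maps each text word to its word2vec vector and word_prob to its pre-calculated corpus probability p(ω) (both names are assumptions of the example); the removal step subtracts the projection onto the first singular vector.

```python
import numpy as np

def sif_sentence_vectors(sentences, word_vecs, word_prob, a=1e-3):
    """WR sentence vectors: weighted average with a/(a + p(w)) per word,
    then removal of the projection onto the first singular vector."""
    vs = np.stack([
        np.mean([a / (a + word_prob[w]) * word_vecs[w] for w in sent], axis=0)
        for sent in sentences                 # assumes every word is in the vocabulary
    ])
    # first singular vector of the matrix of initial sentence vectors
    u = np.linalg.svd(vs, full_matrices=False)[2][0]   # shape: (dim,)
    return vs - np.outer(vs @ u, u)                    # v_s' = v_s - u u^T v_s
```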
By adopting the technical means of the embodiment of the invention, the WR algorithm is an efficient and convenient modeling method, compared with a neural network, the time consumption is short, but the effect is exactly equivalent to that of the neural network, and the method is very efficient and convenient.
S5, constructing a text network graph model according to the sentence vector of each text sentence; the text network graph model takes the sentence vector of each text sentence as a vertex, and takes the similarity of the text sentences larger than a preset similarity threshold value as an edge.
And S6, carrying out iterative computation by adopting a TextRank algorithm to obtain the importance score of each text sentence.
Specifically, the TextRank algorithm divides a text into a number of constituent units (words and sentences), establishes a graph model, and ranks the important components of the text by a voting mechanism; keyword extraction and abstracting can be achieved using only the information of a single document. The general TextRank model can be expressed as a directed weighted graph G = (V, E) consisting of a vertex set V and an edge set E, where E is a subset of V × V.
In the embodiment of the invention, sentence vectors of text sentences are used as vertexes, similarity calculation is carried out through the obtained sentence vectors, the similarity between the sentences is used as the weight of edges between nodes of a network graph, and the importance of the sentence units is finally obtained through iterative calculation until convergence or the calculation upper limit times are reached.
Preferably, the method for calculating the similarity of the text sentences includes: cosine similarity algorithm, longest common subsequence algorithm.
In an alternative embodiment, the similarity between text sentences is calculated by using a cosine similarity algorithm, and the similarity of the text sentences is obtained by the following calculation formula:
$$\mathrm{Sim}(S_i, S_j) = \frac{\sum_{k=1}^{n} x_k\, y_k}{\sqrt{\sum_{k=1}^{n} x_k^2}\,\sqrt{\sum_{k=1}^{n} y_k^2}}$$

$$S_i = (x_1, x_2, \ldots, x_n); \qquad S_j = (y_1, y_2, \ldots, y_n)$$

wherein Sim(S_i, S_j) is the similarity of the text sentences S_i and S_j, S_i and S_j represent different text sentences expressed as sentence vectors, n is the dimension of the sentence vectors, x_k are the components of the vector representing S_i, and y_k are the components of the vector representing S_j.
If the similarity between two text sentences is greater than the given similarity threshold, the two text sentences are considered semantically related and are connected; that is, the weight of the corresponding edge of the text network graph model is Sim(S_i, S_j).
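An illustrative sketch of building the edge-weight matrix of the text network graph model from the sentence vectors follows; the similarity threshold value is an assumption of the example.

```python
import numpy as np

def cosine_sim(vi, vj):
    return float(vi @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj)))

def build_text_graph(sent_vecs, sim_threshold=0.3):
    """Edge-weight (adjacency) matrix of the text network graph model:
    sentence vectors are vertices; similarities above the threshold become edges."""
    n = len(sent_vecs)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine_sim(sent_vecs[i], sent_vecs[j])
            if s > sim_threshold:           # preset similarity threshold
                W[i, j] = W[j, i] = s
    return W
```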
Further, the TextRank algorithm specifically includes:
$$WS(V_i) = (1-d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}}\, WS(V_j)$$

wherein WS(V_i) is the importance score of a text sentence, V_i denotes a vertex of the text network graph model, w_ji denotes the weight of the edge between V_j and V_i, In(V_i) is the set of vertices pointing to vertex V_i, Out(V_j) is the set of vertices that vertex V_j points to, and d is a preset damping coefficient.
Illustratively, the damping coefficient d is set to 0.85 so that the calculation converges.
Then, iterative computation is carried out using the TextRank algorithm until the result converges or the upper limit on the number of iterations is reached, yielding the importance score of each text sentence.
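A sketch of this iterative calculation over the weighted graph built above; the damping coefficient, iteration cap, and tolerance follow common practice rather than values mandated by the claims.

```python
import numpy as np

def textrank_scores(W, d=0.85, max_iter=100, tol=1e-6):
    """Iterate WS(V_i) = (1-d) + d * sum_j [w_ji / sum_k w_jk] * WS(V_j)
    over the weighted text graph until convergence or the iteration cap."""
    n = W.shape[0]
    out_sum = W.sum(axis=1)             # total edge weight leaving each vertex
    ws = np.ones(n)
    for _ in range(max_iter):
        contrib = np.divide(ws, out_sum, out=np.zeros(n), where=out_sum > 0)
        new = (1 - d) + d * (W.T @ contrib)
        if np.abs(new - ws).max() < tol:
            return new                  # converged
        ws = new
    return ws                           # iteration cap reached
```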
S7, obtaining text sentences meeting preset conditions, forming abstracts of the speech segments of the examinees, and scoring the translated contents of the speech segments of the examinees; wherein the preset conditions are as follows: the importance scores of the text sentences are larger than a preset score threshold value, or the text sentences are N text sentences with the highest importance scores.
Specifically, after the importance score of each text sentence is obtained, the text sentences are arranged in a set in descending order of importance score. According to the required number of words or sentences, the first N (N ≥ 1) text sentences are extracted from the set to form the abstract; or, according to the importance-score requirement, the text sentences whose scores exceed a preset score threshold are extracted from the set to form the abstract.
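A small sketch of selecting the abstract sentences under either form of the preset condition:

```python
def extract_summary(sentences, scores, top_n=3, score_threshold=None):
    """Pick sentences per the preset condition: the N with the highest importance
    scores, or all whose score exceeds the preset score threshold."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    if score_threshold is not None:
        chosen = [i for i in ranked if scores[i] > score_threshold]
    else:
        chosen = ranked[:top_n]
    return [sentences[i] for i in sorted(chosen)]   # keep original sentence order
```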
And further, scoring the translation content of the speech segment of the examinee according to a preset scoring standard according to the abstract.
By adopting the technical means of the embodiment of the invention, the text is processed as a graph: the sentence vectors obtained from the WR algorithm and word2vec serve as vertices, the cosine similarity between sentences represents the edges between the vertices, and the TextRank algorithm is improved by constructing this text graph model, thereby improving the abstract extraction effect. Compared with a neural network, the method adopted by the embodiment of the invention is simpler and more efficient, with no loss of effect.
As a preferred embodiment, the method further comprises steps S8 and S9:
s8, calculating the linguistic expression score of the speech segment of the examinee;
and S9, obtaining the total score of the speech segment of the examinee according to the translation content score and the speech expression score of the speech segment of the examinee.
Besides the scoring of translation contents, the scoring criteria of the oral English retelling question also include the scoring of language expression. Therefore, language-expression scoring is performed on the examinee speech segment, and the translation content score and the language expression score are added, or added with weights, to obtain the total score of the examinee speech segment.
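For instance, the weighted addition can be sketched in one line; the weight values are illustrative assumptions, not values fixed by the invention.

```python
def total_score(content_score, expression_score, w_content=0.6, w_expression=0.4):
    """Weighted combination of translation-content and language-expression scores."""
    return w_content * content_score + w_expression * expression_score
```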
By adopting the technical means of the embodiment of the invention, the abstract algorithm is applied to the scoring of spoken-language retelling questions: it is used to extract the key information of the examinee's speech, the pronunciation quality of the examinee's speech is scored, and the two are combined to give a final score to the examinee's answer, improving scoring accuracy.
The accuracy and efficiency of the voice scoring method based on abstract extraction provided by the embodiment of the invention were tested by randomly selecting one short-passage retelling test question, its standard answer, and the corresponding answer sheets from the oral English examination of the Guangdong college entrance examination for comparison.
400 answer sheets were selected for testing according to their scoring levels; from each of the high, middle, and low scoring levels, the answers of 8 examinees were randomly extracted as samples. The abstract-based scores obtained by the voice scoring method based on abstract extraction were compared with the teachers' scores; the results are shown in Table 1.
The title is as follows:
in summary: tom wore that sisters are short of food and overwinter, and steal rice to her, but discover that the sisters are doing the same thing.
Key words: worry (worry) harverset add pile
Strange (Strange) asleep hide (winter)
Same (Same) farm
TABLE 1 partial sample comparison of the inventive scores with the teacher scores
(Table 1 is provided as an image in the original publication.)
The student score levels are defined as follows: the high level means few information points are omitted and the speech is normal, fluent, and highly recognizable; the middle level means some information points are omitted, the expression is normal, and the recorded content can basically be identified; the low level means few information points are present, the language is not fluent enough, and the recorded content is barely recognizable. The teacher average score is the average of the scores given to the recording by several college-entrance-examination grading teachers.
Across the 24 groups of data, the error between the teacher scores and the scoring results of the invention is about 4.30%, which gives the method good reference value to a certain extent.
The resource level in Table 1 is the answer level of the examinee; it can be seen that the answer levels of the test cases basically accord with the keyword coverage detected by the system. High-level answers cover almost all of the keywords, and the method of the present invention identifies them essentially correctly. Correspondingly, low-level answers contain fewer keywords, so fewer are recognized by the method of the present invention.
The 2000 results selected by the project were counted and the number of keyword differences between the manual identification results and the system identification results compared; the data are shown in Table 2 below:
table 2 difference data results
(Table 2 is provided as an image in the original publication.)
As can be seen from Table 2, the man-machine agreement rate is about 84%; a difference of one keyword occurs in about 15% of cases and a difference of two keywords in about 2%, and there is no case of a three-keyword difference. The agreement rate between the system's keyword recognition and manual keyword recognition thus exceeds 80%; taken together, the data show that the method can accomplish the summarization of spoken test papers to a certain degree.
The first embodiment of the invention provides a speech scoring method based on abstract extraction, which comprises the steps of obtaining a speech segment of an examinee to be scored, segmenting sentences by using double thresholds according to the characteristics of human pronunciation and the characteristics of self sentence break of a human, and segmenting the speech segment of the examinee into a plurality of speech sentences; the cutting method is very simple and quick, but has good effect. And aiming at the difference of pronunciation habits of different speakers, the method proposes to establish double-threshold classification to solve the problem of difference of pronunciation habits, thereby improving the stability and accuracy of sentence segmentation. And performing text recognition on each speech sentence by adopting a BP neural network model to obtain each text sentence, changing the traditional HMM or DTW algorithm, and greatly improving the accuracy of speech recognition. Performing word segmentation processing on each text sentence to obtain a plurality of text words forming the text sentence, and calculating a word vector of each text word by using a word2vec model. And performing weighted average processing on the word vector of each text word in each text sentence by adopting a WR (weighting and removal) algorithm to obtain the sentence vector of each text sentence, wherein the sentence vector is more accurate compared with the sentence vector obtained by the traditional weighted summation method. Constructing a text network graph model according to the sentence vector of each text sentence, wherein the text network graph model takes the sentence vector of each text sentence as a vertex and takes the similarity of the text sentences larger than a preset similarity threshold value as an edge; iterative computation is carried out by adopting a TextRank algorithm to obtain the importance score of each text sentence; the method comprises the steps of obtaining text sentences meeting preset conditions, forming abstracts of examinee speech segments, scoring the translated contents of the examinee speech segments, and improving a TextRank algorithm by constructing a text graph model, so that the abstract extraction effect is improved, and compared with a neural network, the method is simpler, more efficient and has no loss of effect.
Fig. 5 is a schematic structural diagram of a speech scoring system based on abstract extraction according to a second embodiment of the present invention. The embodiment of the present invention provides a speech scoring system 20 based on abstract extraction, which includes: the system comprises an examinee voice segmentation module 21, a text word acquisition module 22, a word vector calculation module 23, a sentence vector calculation module 24, a text network diagram construction module 25, an importance score calculation module 26 and a summary extraction module 27; wherein,
the examinee voice segmentation module 21 is used for acquiring examinee voice segments to be scored and segmenting the examinee voice segments to obtain a plurality of voice sentences;
the text word obtaining module 22 is configured to perform text recognition and word segmentation on each voice sentence to obtain each text sentence and a plurality of text words constituting the text sentence;
the word vector calculation module 23 is configured to calculate a word vector for each text word;
the sentence vector calculation module 24 is configured to perform weighted average processing on a word vector of each text word in each text sentence to obtain a sentence vector of each text sentence;
the text network diagram building module 25 is configured to build a text network diagram model according to the sentence vector of each text sentence; the text network graph model takes a sentence vector of each text sentence as a vertex and takes the similarity of the text sentences larger than a preset similarity threshold value as an edge;
the importance score calculating module 26 is configured to perform iterative calculation by using a TextRank algorithm to obtain an importance score of each text sentence;
the abstract extracting module 27 is configured to acquire text sentences meeting preset conditions, form an abstract of the examinee speech segment, and score translation contents of the examinee speech segment; wherein the preset conditions are as follows: the importance scores of the text sentences are larger than a preset score threshold value, or the text sentences are N text sentences with the highest importance scores.
It should be noted that, the speech scoring system based on abstract extraction provided in the embodiment of the present invention is used for executing all the process steps of the speech scoring method based on abstract extraction in the above embodiment, and the working principles and beneficial effects of the two are in one-to-one correspondence, so that details are not repeated.
The second embodiment of the invention provides a speech scoring system based on abstract extraction, which is used for acquiring a speech segment of an examinee to be scored, segmenting sentences by using double thresholds according to the characteristics of human pronunciation and the characteristics of self sentence break of a human, and segmenting the speech segment of the examinee into a plurality of speech sentences; the cutting method is very simple and quick, but has good effect. And aiming at the difference of pronunciation habits of different speakers, the method proposes to establish double-threshold classification to solve the problem of difference of pronunciation habits, thereby improving the stability and accuracy of sentence segmentation. And performing text recognition on each speech sentence by adopting a BP neural network model to obtain each text sentence, changing the traditional HMM or DTW algorithm, and greatly improving the accuracy of speech recognition. Performing word segmentation processing on each text sentence to obtain a plurality of text words forming the text sentence, and calculating a word vector of each text word by using a word2vec model. And performing weighted average processing on the word vector of each text word in each text sentence by adopting a WR (weighting and removal) algorithm to obtain the sentence vector of each text sentence, wherein the sentence vector is more accurate compared with the sentence vector obtained by the traditional weighted summation method. Constructing a text network graph model according to the sentence vector of each text sentence, wherein the text network graph model takes the sentence vector of each text sentence as a vertex and takes the similarity of the text sentences larger than a preset similarity threshold value as an edge; iterative computation is carried out by adopting a TextRank algorithm to obtain the importance score of each text sentence; the method comprises the steps of obtaining text sentences meeting preset conditions, forming abstracts of examinee speech segments, scoring the translated contents of the examinee speech segments, and improving a TextRank algorithm by constructing a text graph model, so that the abstract extraction effect is improved, and compared with a neural network, the method is simpler, more efficient and has no loss of effect.
Fig. 6 is a schematic structural diagram of a speech scoring system based on abstract extraction according to a third embodiment of the present invention. The embodiment of the present invention further provides a speech scoring system 30 based on abstract extraction, which includes a processor 31, a memory 32, and a computer program stored in the memory and configured to be executed by the processor, and when the processor executes the computer program, the speech scoring method based on abstract extraction as provided in the first embodiment is implemented.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A speech scoring method based on abstract extraction is characterized by comprising the following steps:
obtaining a voice segment of an examinee to be scored, and segmenting the voice segment to obtain a plurality of voice sentences;
performing text recognition and word segmentation on each voice sentence to obtain each text sentence and a plurality of text words forming the text sentence;
calculating a word vector for each of the text words;
carrying out weighted average processing on the word vector of each text word in each text sentence to obtain a sentence vector of each text sentence;
constructing a text network graph model according to the sentence vector of each text sentence; the text network graph model takes a sentence vector of each text sentence as a vertex and takes the similarity of the text sentences larger than a preset similarity threshold value as an edge;
iterative computation is carried out by adopting a TextRank algorithm to obtain the importance score of each text sentence;
acquiring text sentences meeting preset conditions, forming abstracts of the examinee speech segments, and scoring the translated contents of the examinee speech segments; wherein the preset conditions are as follows: the importance scores of the text sentences are larger than a preset score threshold value, or the text sentences are N text sentences with the highest importance scores.
2. The method according to claim 1, wherein the weighted average processing of the word vector of each text word in each text sentence to obtain the sentence vector of each text sentence comprises:
determining the weight of each text word according to a preset parameter factor and a set probability;
carrying out weighted average processing on the word vector of each text word in each text sentence through the following calculation formula to obtain an initial sentence vector of each text sentence:
v_s = (1/|s|) · Σ_{ω∈s} [ a / (a + p(ω)) ] · v_ω
wherein s denotes a text sentence, |s| is the number of text words in s, ω denotes a text word, v_ω is the word vector of the text word ω, a is the preset parameter factor, and p(ω) is the set probability of ω;
and performing dimensionality reduction on each initial sentence vector to obtain a sentence vector of each text sentence.
3. The method for scoring a speech based on abstract extraction as claimed in claim 2, wherein the dimension reduction processing method comprises: singular value decomposition algorithm, principal component analysis algorithm, factor analysis algorithm or independent component analysis algorithm.
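As an illustration of claims 2 and 3, the sketch below is one assumption-laden reading, not the patent's own code: each word vector is weighted by a/(a + p(ω)) and averaged within a sentence, and the dimensionality reduction is realised via singular value decomposition (one of the listed options) by projecting out the first singular vector.

    import numpy as np

    def wr_sentence_vectors(tokenized_sentences, word_vecs, word_prob, a=1e-3):
        # Weighted average: each word vector is weighted by a / (a + p(w)),
        # where a is the preset parameter factor and p(w) the set probability.
        dim = len(next(iter(word_vecs.values())))
        vs = []
        for sent in tokenized_sentences:
            acc = np.zeros(dim)
            for w in sent:
                acc += (a / (a + word_prob[w])) * word_vecs[w]
            vs.append(acc / max(len(sent), 1))
        V = np.vstack(vs)
        # Dimensionality reduction via singular value decomposition:
        # project out the first right singular vector (common component).
        u = np.linalg.svd(V, full_matrices=False)[2][0]
        return V - np.outer(V @ u, u)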
4. The method of claim 1, wherein the similarity of the text sentences is calculated by a cosine similarity algorithm or a longest common subsequence algorithm.
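For the longest common subsequence alternative in claim 4, a minimal sketch follows; normalising by the longer sentence length is an assumption, since the claim does not fix a normalisation.

    def lcs_length(a, b):
        # Dynamic-programming length of the longest common subsequence.
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                dp[i + 1][j + 1] = (dp[i][j] + 1 if x == y
                                    else max(dp[i][j + 1], dp[i + 1][j]))
        return dp[len(a)][len(b)]

    def lcs_similarity(sent_a, sent_b):
        # Keeps the score in [0, 1] by dividing by the longer sentence.
        return lcs_length(sent_a, sent_b) / max(len(sent_a), len(sent_b), 1)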
5. The speech scoring method based on abstract extraction as recited in claim 1, wherein the similarity of the text sentences is obtained by the following calculation formula:
Sim(S_i, S_j) = ( Σ_{k=1}^{n} x_k · y_k ) / ( √(Σ_{k=1}^{n} x_k²) · √(Σ_{k=1}^{n} y_k²) )
Si=(x1,x2,…,xn);
Sj=(y1,y2,…,yn);
wherein Sim(S_i, S_j) is the similarity between text sentences S_i and S_j, S_i and S_j denote different text sentences, n is the number of text words in a text sentence, x_n denotes the text words constituting text sentence S_i, and y_n denotes the text words constituting text sentence S_j.
6. The method for scoring a speech based on abstract extraction as claimed in claim 1, wherein the TextRank algorithm is specifically:
WS(V_i) = (1 − d) + d · Σ_{V_j∈In(V_i)} [ w_ji / ( Σ_{V_k∈Out(V_j)} w_jk ) ] · WS(V_j)
wherein WS(V_i) is the importance score of a text sentence, V_i denotes a vertex of the text network graph model, w_ij denotes the weight of the edge between vertices V_i and V_j, In(V_i) is the set of vertices pointing to vertex V_i, Out(V_j) is the set of vertices pointed to by vertex V_j, and d is a preset damping coefficient.
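A minimal sketch of the iterative computation in claim 6, taking an edge-weight matrix such as the one built earlier; the damping value 0.85 and the convergence tolerance are conventional assumptions, not values fixed by the claim.

    import numpy as np

    def textrank_scores(W, d=0.85, tol=1e-6, max_iter=100):
        # Iterates WS(Vi) = (1 - d) + d * sum over Vj in In(Vi) of
        # [ w_ji / sum over Vk in Out(Vj) of w_jk ] * WS(Vj).
        n = W.shape[0]
        out_sum = W.sum(axis=1)   # total outgoing weight per vertex
        ws = np.ones(n)
        for _ in range(max_iter):
            new = np.empty(n)
            for i in range(n):
                incoming = sum(W[j, i] / out_sum[j] * ws[j]
                               for j in range(n) if W[j, i] > 0)
                new[i] = (1 - d) + d * incoming
            converged = np.abs(new - ws).max() < tol
            ws = new
            if converged:
                break
        return ws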
7. The speech scoring method based on abstract extraction as claimed in claim 1, wherein the obtaining of the examinee speech segments to be scored and the segmentation into a plurality of speech sentences specifically comprises:
obtaining a voice segment of an examinee to be scored;
windowing the examinee voice segment to be scored by adopting a preset window function to obtain a plurality of audio frames;
calculating the short-time average energy and the short-time average zero crossing rate of each audio frame;
and acquiring the audio frames whose short-time average energy and short-time average zero-crossing rate reach the corresponding preset thresholds, and taking these audio frames as boundary cut points to segment the examinee speech segment into a plurality of speech sentences.
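A minimal sketch of the dual-threshold segmentation of claim 7; the Hamming window, the frame and hop sizes, and the two threshold values are illustrative assumptions, and treating frames that pass both thresholds as voiced is one simple reading of the claim.

    import numpy as np

    def split_into_sentences(audio, sr, frame_ms=25, hop_ms=10,
                             energy_thr=1e-4, zcr_thr=0.02):
        # Windowing with a preset (Hamming) window function, then
        # per-frame short-time average energy and zero-crossing rate.
        frame = int(sr * frame_ms / 1000)
        hop = int(sr * hop_ms / 1000)
        win = np.hamming(frame)
        voiced = []
        for s in range(0, len(audio) - frame, hop):
            x = audio[s:s + frame] * win
            energy = float(np.mean(x ** 2))
            zcr = float(np.mean(np.abs(np.diff(np.sign(x))) > 0))
            voiced.append(energy > energy_thr and zcr > zcr_thr)
        # Contiguous runs of frames passing both thresholds form the
        # speech sentences; the run boundaries act as the cut points.
        sentences, start = [], None
        for k, v in enumerate(voiced):
            if v and start is None:
                start = k
            elif not v and start is not None:
                sentences.append(audio[start * hop:k * hop + frame])
                start = None
        if start is not None:
            sentences.append(audio[start * hop:])
        return sentences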
8. The method according to claim 1, wherein the performing text recognition and word segmentation on each of the speech sentences to obtain each text sentence and a plurality of text words constituting the text sentence comprises:
performing MFCC (Mel-frequency cepstral coefficient) speech feature extraction on each voice sentence to obtain a speech feature value;
inputting each speech feature value into a pre-trained BP neural network model for text recognition to obtain each text sentence;
and performing word segmentation on each text sentence to obtain a plurality of text words forming the text sentences.
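For the feature-extraction step of claim 8, a sketch using the librosa library (an assumption; any MFCC implementation would serve), with mean pooling over frames, also an assumption, to produce a fixed-length input for the pre-trained BP neural network.

    import librosa  # assumed available for MFCC extraction

    def mfcc_feature(speech_sentence, sr, n_mfcc=13):
        # MFCC speech feature extraction for one speech sentence;
        # mean pooling over frames yields one fixed-length feature value.
        mfcc = librosa.feature.mfcc(y=speech_sentence, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)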
9. The method according to claim 1, wherein the calculating of the word vector for each text word comprises:
and calculating a word vector of each text word by using a preset word2vec model.
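A minimal sketch of claim 9 using the gensim word2vec implementation (an assumption; the claim only requires a preset word2vec model); the hyper-parameters shown are illustrative defaults.

    from gensim.models import Word2Vec  # assumed available

    # Train (or load) a word2vec model on the tokenised text sentences.
    tokenized_sentences = [["open", "the", "door"],
                           ["close", "the", "window"]]
    model = Word2Vec(sentences=tokenized_sentences, vector_size=100,
                     window=5, min_count=1, sg=1)
    word_vector = model.wv["door"]  # word vector of one text word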
10. A speech scoring system based on abstract extraction, comprising:
the examinee voice segmentation module is used for acquiring examinee voice segments to be scored and segmenting the examinee voice segments to obtain a plurality of voice sentences;
the text word acquisition module is used for performing text recognition and word segmentation on each voice sentence to obtain each text sentence and a plurality of text words forming the text sentence;
the word vector calculation module is used for calculating a word vector of each text word;
a sentence vector calculation module, configured to perform weighted average processing on a word vector of each text word in each text sentence to obtain a sentence vector of each text sentence;
the text network graph building module is used for building a text network graph model according to the sentence vector of each text sentence; the text network graph model takes a sentence vector of each text sentence as a vertex and takes the similarity of the text sentences larger than a preset similarity threshold value as an edge;
the importance score calculation module is used for carrying out iterative calculation by adopting a TextRank algorithm to obtain an importance score of each text sentence;
the abstract extraction module is used for acquiring text sentences meeting preset conditions, forming an abstract of the examinee speech section and scoring the translation content of the examinee speech section; wherein the preset conditions are as follows: the importance scores of the text sentences are larger than a preset score threshold value, or the text sentences are N text sentences with the highest importance scores.
CN202110625268.4A 2021-06-04 2021-06-04 Voice scoring method and system based on abstract extraction Pending CN113470651A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110625268.4A CN113470651A (en) 2021-06-04 2021-06-04 Voice scoring method and system based on abstract extraction


Publications (1)

Publication Number Publication Date
CN113470651A true CN113470651A (en) 2021-10-01

Family

ID=77872266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110625268.4A Pending CN113470651A (en) 2021-06-04 2021-06-04 Voice scoring method and system based on abstract extraction

Country Status (1)

Country Link
CN (1) CN113470651A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105741831A (en) * 2016-01-27 2016-07-06 广东外语外贸大学 Spoken language evaluation method based on grammatical analysis and spoken language evaluation system
CN110110326A (en) * 2019-04-25 2019-08-09 西安交通大学 A kind of text cutting method based on subject information
CN110347787A (en) * 2019-06-12 2019-10-18 平安科技(深圳)有限公司 A kind of interview method, apparatus and terminal device based on AI secondary surface examination hall scape
CN111027331A (en) * 2019-12-05 2020-04-17 百度在线网络技术(北京)有限公司 Method and apparatus for evaluating translation quality
CN111125349A (en) * 2019-12-17 2020-05-08 辽宁大学 Graph model text abstract generation method based on word frequency and semantics


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chinese Association for Artificial Intelligence: "Progress of Artificial Intelligence in China 2007", Beijing University of Posts and Telecommunications Press, pages 125-127 *
Li Wei: "Extractive Summarization of Tibetan Text Based on Improved TextRank", Journal of Chinese Information Processing, pages 36-43 *

Similar Documents

Publication Publication Date Title
Venkataramanan et al. Emotion recognition from speech
CN110400579B (en) Speech emotion recognition based on direction self-attention mechanism and bidirectional long-time and short-time network
CN107221318B (en) English spoken language pronunciation scoring method and system
US5621857A (en) Method and system for identifying and recognizing speech
CN108428382A (en) It is a kind of spoken to repeat methods of marking and system
Ghai et al. Emotion recognition on speech signals using machine learning
CN112382310B (en) Human voice audio recording method and device
CN114203177A (en) Intelligent voice question-answering method and system based on deep learning and emotion recognition
CN109658918A (en) A kind of intelligence Oral English Practice repetition topic methods of marking and system
Maqsood et al. An efficientmis pronunciation detection system using discriminative acoustic phonetic features for arabic consonants.
Sheikh et al. Advancing stuttering detection via data augmentation, class-balanced loss and multi-contextual deep learning
Elbarougy Speech emotion recognition based on voiced emotion unit
Brena et al. Automated evaluation of foreign language speaking performance with machine learning
Calık et al. An ensemble-based framework for mispronunciation detection of Arabic phonemes
Zhu et al. Study on speech emotion recognition system in E-learning
Ferragne et al. Automatic dialect identification: A study of British English
Lin et al. A Noise Robust Method for Word-Level Pronunciation Assessment.
Yousfi et al. Isolated Iqlab checking rules based on speech recognition system
Hanifa et al. Comparative analysis on different cepstral features for speaker identification recognition
CN113470651A (en) Voice scoring method and system based on abstract extraction
Andra et al. Contextual keyword spotting in lecture video with deep convolutional neural network
Lleida et al. Speaker and language recognition and characterization: introduction to the CSL special issue
Suzuki et al. Automatic evaluation system of English prosody based on word importance factor
İleri et al. Comparison of different normalization techniques on speakers’ gender detection
Sen Voice activity detector for device with small processor and memory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211001