CN111105813B - Reading scoring method, device, equipment and readable storage medium - Google Patents

Info

Publication number
CN111105813B
Authority
CN
China
Prior art keywords
voice
scorer
evaluated
scoring
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911424069.6A
Other languages
Chinese (zh)
Other versions
CN111105813A (en
Inventor
吴奎
竺博
杨康
朱群
江勇军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201911424069.6A
Publication of CN111105813A
Application granted
Publication of CN111105813B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction

Abstract

The embodiments of the application disclose a reading scoring method, a reading scoring device, equipment and a readable storage medium. An initial score of the speech to be evaluated is determined for each scorer according to the speech features of the speech to be evaluated and the scoring scale features of at least one scorer, and the score of the speech to be evaluated is then determined according to the initial score corresponding to each scorer. In addition to the speech features, the reading scoring method takes into account the influence of the scorer's scoring scale on the score, so that the determined score of the speech to be evaluated is the score corresponding to the scoring scale of the scorer, which improves the scoring accuracy of the speech to be evaluated.

Description

Reading scoring method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a reading scoring method, apparatus, device, and readable storage medium.
Background
As society places ever greater importance on education and artificial intelligence technology develops rapidly, intelligent voice evaluation technology plays an increasingly important role: it can effectively relieve the pressure of manual scoring and reduce its cost. Reading scoring is an important part of the voice evaluation field and is common in oral examinations and oral learning of Chinese, English and other languages. The examinee is required to read a given text aloud, and the machine scores the pronunciation quality according to the test voice.
At present, a widely used scoring scheme is end-to-end scoring based on deep learning: acoustic features are extracted from the audio and input into a pre-trained neural network model, and the model outputs an evaluation score. However, the inventors of the present application have found through research that the current end-to-end scoring scheme scores based on acoustic features only, so its scoring accuracy is low.
Disclosure of Invention
In view of this, the present application provides a reading scoring method, device, apparatus and readable storage medium, so as to improve the accuracy of reading scoring.
In order to achieve the above object, the following solutions are proposed:
a reading scoring method comprising:
acquiring voice characteristics of a voice to be evaluated and scoring scale characteristics of at least one scorer;
determining an initial score of the voice to be evaluated corresponding to each scorer according to the voice characteristics and the scoring scale characteristics of the at least one scorer;
and determining the score of the voice to be evaluated according to the initial score of the voice to be evaluated corresponding to each scorer.
In the above method, preferably, the determining an initial score of the speech to be evaluated corresponding to each scorer according to the speech feature and the scoring scale feature of the at least one scorer includes:
fusing each scoring scale feature with the voice feature respectively to obtain a fused feature corresponding to each scorer;
and, for each scorer, determining the initial score of the speech to be evaluated corresponding to that scorer by using the fusion feature corresponding to that scorer.
In the above method, preferably, the speech feature of the speech to be evaluated is a speech feature of each speech frame of the speech to be evaluated; the step of fusing each scoring scale feature with the voice feature to obtain a fused feature corresponding to each scorer comprises the following steps:
for each scorer, splicing the scoring scale feature of the scorer with the voice feature of each voice frame respectively to obtain the splicing feature corresponding to each voice frame;
and fusing the splicing features corresponding to the same scorer to obtain the fusion feature corresponding to each scorer.
In the above method, preferably, the speech feature of the speech to be evaluated is a speech feature of each speech frame of the speech to be evaluated; the step of fusing each scoring scale feature with the voice feature to obtain a fused feature corresponding to each scorer comprises the following steps:
fusing the voice characteristics of each voice frame to obtain initial fusion characteristics;
and respectively splicing each scoring scale characteristic with the initial fusion characteristic to obtain the fusion characteristic corresponding to each scorer.
Preferably, the method for fusing the speech features of the speech frames includes:
acquiring hidden layer characteristics of each voice frame according to the voice characteristics of each voice frame;
calculating the mean value of the hidden layer characteristics of each voice frame to obtain the initial fusion characteristics; or,
and calculating the weight of the hidden layer characteristic of each voice frame according to the hidden layer characteristic of each voice frame, and weighting and summing the hidden layer characteristics of each voice frame according to the weight of the hidden layer characteristic of each voice frame to obtain the initial fusion characteristic.
In the method, preferably, the obtaining of the speech feature of the speech to be evaluated includes:
acquiring at least two voice characteristics of the voice to be evaluated;
and splicing the at least two voice characteristics to obtain the voice characteristics of the voice to be evaluated.
Preferably, the determining the score of the speech to be evaluated according to the initial score of the speech to be evaluated corresponding to each scorer includes:
if the scoring scale feature of only one scorer is obtained, determining the initial score of the voice to be evaluated corresponding to the scorer as the score of the voice to be evaluated;
and if the scoring scale features of at least two scorers are obtained, determining the average value of the initial scores of the speech to be evaluated, which correspond to the at least two scorers, as the score of the speech to be evaluated.
In the above method, preferably, the process of fusing each scoring scale feature with the voice feature to obtain a fusion feature corresponding to each scorer, and of determining, for each scorer, the initial score of the speech to be evaluated by using the fusion feature corresponding to that scorer, comprises:
inputting the voice characteristics and the scoring scale characteristics of the at least one scorer into a reading scoring model to obtain the initial score of the voice to be evaluated, output by the reading scoring model, corresponding to each scorer;
the reading scoring model has the capability of: fusing each scoring scale feature with the voice feature respectively to obtain a fusion feature corresponding to each scorer; and, for each scorer, determining the initial score of the speech to be evaluated corresponding to that scorer by using the fusion feature corresponding to that scorer.
In the above method, preferably, the reading scoring model is trained as follows:
acquiring voice characteristics of a sample voice and the scoring scale characteristics of the scorer associated with the sample voice;
inputting the voice characteristics of the sample voice and the scoring scale characteristics of the scorer associated with the sample voice into the reading scoring model to obtain an initial score of the sample voice, output by the reading scoring model, corresponding to the scorer associated with the sample voice;
and updating the parameters of the reading scoring model with the goal that the initial score of the sample voice approaches the score given by the scorer associated with the sample voice.
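The training steps above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the scoring head is taken to be a single linear layer over an already fused feature, "approaching" the scorer's score is measured by squared error, and plain gradient descent is used. The source does not specify the model structure, the loss, or the optimizer, and the names `predict` and `sgd_step` are hypothetical.

```python
def predict(weights, bias, fused_feature):
    """Hypothetical linear scoring head: initial score for one fused feature."""
    return sum(w * x for w, x in zip(weights, fused_feature)) + bias


def sgd_step(weights, bias, fused_feature, human_score, lr=0.1):
    """One gradient step on 0.5 * (prediction - human_score) ** 2."""
    error = predict(weights, bias, fused_feature) - human_score
    new_weights = [w - lr * error * x for w, x in zip(weights, fused_feature)]
    new_bias = bias - lr * error
    return new_weights, new_bias


weights, bias = [0.0, 0.0, 0.0], 0.0
sample = [0.5, -0.2, 1.0]   # made-up fused feature of one sample voice
target = 4.0                # score given by the scorer associated with the sample
for _ in range(200):
    weights, bias = sgd_step(weights, bias, sample, target)
```

After the loop, `predict(weights, bias, sample)` is very close to `target`, which is the sense in which the parameters are updated so that the model's initial score approaches the scorer's score.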
A reading scoring device, comprising:
the acquisition module is used for acquiring the voice characteristics of the voice to be evaluated and the scoring scale characteristics of at least one scorer;
the initial score determining module is used for determining the initial score of the voice to be evaluated corresponding to each scorer according to the voice characteristics and the scoring scale characteristics of the at least one scorer;
and the score determining module is used for determining the score of the voice to be evaluated according to the initial score of the voice to be evaluated corresponding to each scorer.
The above apparatus, preferably, the initial score determining module includes:
the fusion module is used for fusing each scoring scale characteristic with the voice characteristic respectively to obtain a fusion characteristic corresponding to each scorer;
and the determining module is used for determining, for each scorer, the initial score of the voice to be evaluated corresponding to that scorer by using the fusion characteristics corresponding to that scorer.
The above apparatus, preferably, the fusion module includes:
the first splicing module is used for splicing, for each scorer, the scoring scale characteristics of the scorer with the voice characteristics of each voice frame respectively to obtain the splicing characteristics corresponding to each voice frame;
and the first fusion module is used for fusing the splicing characteristics corresponding to the same scorer to obtain the fusion characteristics corresponding to each scorer.
The above apparatus, preferably, the fusion module includes:
the second fusion module is used for fusing the voice characteristics of each voice frame to obtain initial fusion characteristics;
and the second splicing module is used for splicing each scoring scale characteristic with the initial fusion characteristic respectively to obtain the fusion characteristic corresponding to each scorer.
The above apparatus, preferably, the second fusion module includes:
the hidden layer characteristic acquisition module is used for acquiring the hidden layer characteristics of each voice frame according to the voice characteristics of each voice frame;
the mean value module is used for calculating the mean value of the hidden layer characteristics of each voice frame to obtain the initial fusion characteristics; or,
and the attention module is used for calculating the weight of the hidden layer characteristic of each voice frame according to the hidden layer characteristic of each voice frame, and weighting and summing the hidden layer characteristics of each voice frame according to the weight of the hidden layer characteristic of each voice frame to obtain the initial fusion characteristic.
Preferably, in the above apparatus, the obtaining module is specifically configured to:
acquiring at least two voice characteristics of the voice to be evaluated;
and splicing the at least two voice characteristics to obtain the voice characteristics of the voice to be evaluated.
The above apparatus, preferably, the score determining module is specifically configured to:
if the scoring scale feature of only one scorer is obtained, determining the initial score of the voice to be evaluated corresponding to the scorer as the score of the voice to be evaluated;
and if the scoring scale features of at least two scorers are obtained, determining the average value of the initial scores of the speech to be evaluated, which correspond to the at least two scorers, as the score of the speech to be evaluated.
The above apparatus, preferably, the initial score determining module 72 is specifically configured to:
inputting the voice characteristics and the scoring scale characteristics of the at least one scorer into a reading scoring model to obtain the initial score of the voice to be evaluated, output by the reading scoring model, corresponding to each scorer;
the reading scoring model has the capability of: fusing each scoring scale feature with the voice feature respectively to obtain a fusion feature corresponding to each scorer; and, for each scorer, determining the initial score of the voice to be evaluated corresponding to that scorer by using the fusion feature corresponding to that scorer.
A reading scoring device, comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the reading scoring method described in any one of the above.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the reading scoring method described in any one of the above.
According to the above technical solutions, the reading scoring method, the reading scoring device, the reading scoring equipment and the readable storage medium provided by the embodiments of the application determine the initial score of the speech to be evaluated corresponding to each scorer according to the speech features of the speech to be evaluated and the scoring scale features of at least one scorer, and determine the score of the speech to be evaluated according to the initial score corresponding to each scorer. In addition to the speech features, the reading scoring method takes into account the influence of the scorer's scoring scale on the score, so that the determined score of the speech to be evaluated is the score corresponding to the scoring scale of the scorer, which improves the scoring accuracy of the speech to be evaluated.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of an implementation of a reading scoring method disclosed in an embodiment of the present application;
Fig. 2 is a flowchart of an implementation of determining an initial score of a speech to be evaluated corresponding to each scorer according to speech features and scoring scale features of at least one scorer, according to an embodiment of the present application;
Fig. 3 is a flowchart of one implementation of fusing each scoring scale feature with a speech feature to obtain a fusion feature corresponding to each scorer, according to an embodiment of the present application;
Fig. 4 is a flowchart of another implementation of fusing each scoring scale feature with a speech feature to obtain a fusion feature corresponding to each scorer, according to an embodiment of the present application;
Fig. 5 is a flowchart of an implementation of obtaining speech features of a speech to be evaluated, according to an embodiment of the present application;
Fig. 6 is an exemplary diagram of the reading scoring model predicting an initial score, according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a reading scoring apparatus disclosed in an embodiment of the present application;
Fig. 8 is a block diagram of a hardware structure of the reading scoring device disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To overcome the low scoring accuracy of the existing end-to-end scoring scheme, the basic idea of the present application is to score the voice to be evaluated based on the scoring scale of the scorer, so that the determined score of the voice to be evaluated is the score corresponding to the scoring scale of the scorer, which improves the scoring accuracy of the voice to be evaluated.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of a reading scoring method according to an embodiment of the present application, where the method may include:
Step S11: Acquire the voice feature of the voice to be evaluated and the scoring scale feature of at least one scorer.
The voice to be evaluated may be reading voice data captured by an audio acquisition device while a user reads a text aloud. The audio acquisition device may be a directional microphone, a smartphone, a home computer, or any other device containing a microphone.
In the embodiment of the application, when the voice to be evaluated is evaluated, the scoring scale feature of at least one scorer is obtained in addition to the voice feature of the voice to be evaluated. The scoring scale of the at least one scorer may be at least one of the preset scoring scales of several scorers, and may be determined by random selection among those preset scoring scales.
After the scoring scales of the at least one scorer (denoted as N scorers, where N is a positive integer) are obtained, feature extraction is performed on the scoring scale of each of the N scorers to obtain the scoring scale features of the N scorers.
Optionally, the scoring scale of a scorer may be a scorer identifier, and the corresponding scoring scale feature may be an embedded feature of the scorer identifier.
Optionally, the scoring scale of a scorer may be scoring preference information. This may be information representing how strictly the scorer scores, or information about what the scorer focuses on: for example, some scorers pay more attention to the continuity of the voice, while others pay more attention to the volume.
Where the scoring scales of the scorers differ, the ways of extracting the scoring scale features may be the same or different. In an alternative embodiment, the scoring scale feature may be extracted as an embedded feature of the scoring scale through a word embedding technique.
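As a minimal sketch of the scorer-identifier option, the lookup below maps each scorer identifier to an embedded feature. The table `SCALE_EMBEDDINGS`, the identifiers `scorer_a` and `scorer_b`, and the vector values are all hypothetical; in a trained system such vectors would normally be learned parameters rather than hand-written numbers.

```python
# Hypothetical embedding table: scorer identifier -> scoring scale feature.
# The values are placeholders for illustration only.
SCALE_EMBEDDINGS = {
    "scorer_a": [0.2, -0.1, 0.7],
    "scorer_b": [-0.4, 0.3, 0.1],
}


def scoring_scale_features(scorer_ids):
    """Return one scoring scale feature (embedding) per requested scorer."""
    return [list(SCALE_EMBEDDINGS[sid]) for sid in scorer_ids]


# Scoring scale features for N = 2 scorers.
features = scoring_scale_features(["scorer_a", "scorer_b"])
```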
Step S12: Determine the initial score of the voice to be evaluated corresponding to each scorer according to the voice features and the scoring scale feature of the at least one scorer.
In the embodiment of the application, for each scorer, the initial score of the voice to be evaluated corresponding to that scorer can be determined based on the voice features and the scoring scale feature of that scorer.
Step S13: Determine the score of the voice to be evaluated according to the initial score of the voice to be evaluated corresponding to each scorer.
After the initial scores of the voice to be evaluated corresponding to the scorers are obtained, these initial scores are fused to obtain the final score of the voice to be evaluated.
In addition to the voice features, the reading scoring method provided by the embodiment of the application takes into account the influence of the scorer's scoring scale on the score, so that the determined score of the voice to be evaluated is the score corresponding to the scoring scale of the scorer, which improves the scoring accuracy of the voice to be evaluated.
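Step S13 can be sketched as follows, using the rule stated in the disclosure: with a single scorer the initial score is taken as the score, and with at least two scorers the mean of the initial scores is taken. The function name `final_score` is hypothetical.

```python
def final_score(initial_scores):
    """Combine per-scorer initial scores into the score of the voice.

    With one scorer the initial score is returned unchanged; with several
    scorers the mean of their initial scores is returned.
    """
    if not initial_scores:
        raise ValueError("at least one initial score is required")
    if len(initial_scores) == 1:
        return initial_scores[0]
    return sum(initial_scores) / len(initial_scores)


print(final_score([3.0, 5.0]))  # 4.0
```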
In an optional embodiment, a flowchart of an implementation of determining the initial score of the voice to be evaluated corresponding to each scorer according to the voice features and the scoring scale features of at least one scorer is shown in fig. 2, and may include:
Step S21: Fuse each scoring scale feature with the voice feature respectively to obtain a fusion feature corresponding to each scorer.
That is, the fusion feature corresponding to each scorer is obtained by fusing the voice feature with the scoring scale feature of that scorer.
Step S22: For each scorer, determine the initial score of the voice to be evaluated corresponding to that scorer by using the fusion feature corresponding to that scorer.
Because the fusion feature of each scorer is related only to the scoring scale of that scorer, the initial score of the voice to be evaluated corresponding to the scorer is a score in the scoring scale space of that scorer. In other words, the initial score is made on a definite scoring scale rather than a fuzzy, uncertain one, so the voice to be evaluated can be scored more accurately based on the scheme of the application.
The voice features of the voice to be evaluated may be obtained as the voice features of each voice frame of the voice to be evaluated. On this basis, in an alternative embodiment, a flowchart of an implementation of fusing each scoring scale feature with the voice features to obtain the fusion feature corresponding to each scorer is shown in fig. 3, and may include:
Step S31: For each scorer, splice the scoring scale feature of the scorer with the voice feature of each voice frame respectively to obtain the splicing feature corresponding to each voice frame.
In the embodiment of the application, when the fusion feature corresponding to any scorer (for convenience of description, denoted scorer S) needs to be obtained, the scoring scale feature of the scorer S (for convenience of description, denoted scoring scale feature ts) may be spliced with the voice feature of each voice frame respectively, that is, the scoring scale feature ts is spliced onto each voice frame.
Step S32: Fuse the splicing features corresponding to the same scorer to obtain the fusion feature corresponding to each scorer.
Specifically, for any scorer S, each splicing feature corresponding to the scorer S may be encoded to obtain the hidden layer feature of that splicing feature, and the fusion feature corresponding to the scorer S is obtained from these hidden layer features. This may be implemented in either of two modes:
Mode one
Calculate the mean value of the hidden layer features of all the splicing features corresponding to the scorer S to obtain the fusion feature corresponding to the scorer S.
Mode two
For the hidden layer feature of each splicing feature corresponding to the scorer S, calculate the weight of that hidden layer feature, and then compute the weighted sum of the hidden layer features of the splicing features corresponding to the scorer S according to these weights to obtain the fusion feature corresponding to the scorer S.
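The flow of fig. 3 can be sketched as follows. Several parts are assumptions made only for illustration: `encode` is a stand-in element-wise tanh rather than a real neural encoder, and in mode two the weights are computed as a softmax over dot products with a hypothetical learned vector `query`. The source states only that a weight is computed from each hidden layer feature, not how.

```python
import math


def splice_per_frame(frame_features, scale_feature):
    """Step S31: append the scorer's scale feature to every voice frame."""
    return [frame + scale_feature for frame in frame_features]


def encode(spliced_frames):
    """Stand-in encoder (assumption): element-wise tanh per frame."""
    return [[math.tanh(x) for x in frame] for frame in spliced_frames]


def mean_pool(hidden):
    """Mode one: average the hidden layer features over all frames."""
    n = len(hidden)
    return [sum(col) / n for col in zip(*hidden)]


def attention_pool(hidden, query):
    """Mode two: softmax-weighted sum; `query` is a hypothetical learned vector."""
    logits = [sum(h * q for h, q in zip(frame, query)) for frame in hidden]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * frame[i] for w, frame in zip(weights, hidden))
            for i in range(len(hidden[0]))]


frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 frames, 2-dim voice features
ts = [1.0, -1.0]                               # scale feature of scorer S
hidden = encode(splice_per_frame(frames, ts))
fused_mean = mean_pool(hidden)
fused_attn = attention_pool(hidden, query=[0.0, 0.0, 0.0, 0.0])
```

With an all-zero `query` every frame receives the same weight, so mode two reduces to mode one; a non-zero `query` lets some frames contribute more than others.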
In the process of implementing the present application, the inventors found that, in the embodiment shown in fig. 3, the scoring scale feature is spliced after the voice feature of every voice frame, which requires a large amount of calculation. To reduce the amount of calculation, the embodiment of the present application provides another implementation of fusing each scoring scale feature with the voice features to obtain the fusion feature corresponding to each scorer, whose flowchart is shown in fig. 4, and which may include:
Step S41: Fuse the voice features of each voice frame to obtain an initial fusion feature.
Optionally, the voice features of each voice frame may be encoded to obtain the hidden layer feature of each voice frame, and the initial fusion feature is obtained from the hidden layer features of the voice frames. This may be implemented in either of two modes:
Mode one
Calculate the mean value of the hidden layer features of the voice frames to obtain the initial fusion feature.
Mode two
For the hidden layer feature of each voice frame, calculate the weight of that hidden layer feature, and then compute the weighted sum of the hidden layer features of the voice frames according to these weights to obtain the initial fusion feature.
Step S42: Splice each scoring scale feature with the initial fusion feature respectively to obtain the fusion feature corresponding to each scorer.
In this embodiment, the scoring scale feature of each scorer is spliced only with the initial fusion feature. Compared with the embodiment shown in fig. 3, in which the scoring scale feature of each scorer is spliced with the voice feature of every voice frame, the amount of data involved in calculating the hidden layer features is greatly reduced.
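A minimal sketch of the fig. 4 flow, assuming mean pooling (mode one) and hidden layer features that have already been computed; the function names and the numbers are hypothetical:

```python
def mean_pool(hidden):
    """Mode one: average the hidden layer features over all frames."""
    n = len(hidden)
    return [sum(col) / n for col in zip(*hidden)]


def fuse_after_pooling(frame_hidden, scale_features):
    """Steps S41-S42: pool once, then splice with each scorer's scale feature."""
    initial_fusion = mean_pool(frame_hidden)  # computed a single time
    return [initial_fusion + ts for ts in scale_features]


frame_hidden = [[0.0, 2.0], [2.0, 4.0]]  # hidden layer features of 2 frames
scale_features = [[1.0], [-1.0]]         # scale features of two scorers
fused = fuse_after_pooling(frame_hidden, scale_features)
# fused == [[1.0, 3.0, 1.0], [1.0, 3.0, -1.0]]
```

In this sketch the frames are pooled once regardless of the number of scorers, whereas the per-frame splicing of fig. 3 produces a separate set of spliced frames for every scorer.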
In an optional embodiment, an implementation flowchart of the obtaining of the speech feature of the speech to be evaluated is shown in fig. 5, and may include:
Step S51: obtaining at least two kinds of speech features of the speech to be evaluated.
Optionally, for any speech frame of the speech to be evaluated, at least two speech features of the speech frame are obtained. The at least two speech features may include, but are not limited to, at least two of the following: acoustic features, hidden layer features and phoneme embedding features.
The acoustic feature may be a log filter bank energy feature, a mel-frequency cepstral coefficient (MFCC) feature, or a perceptual linear prediction (PLP) feature.
The hidden layer feature can be obtained by inputting the speech data into a pre-trained acoustic model, and the acoustic model may use any one or any combination of the following network structures: Deep Neural Networks (DNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM), Convolutional Neural Networks (CNN), and the like.
The phoneme embedding feature can be obtained by converting the phoneme sequence corresponding to the speech frame into a phoneme embedding feature sequence through an embedding technique, and this sequence is used as the phoneme embedding feature of the speech frame.
Step S52: splicing the at least two kinds of speech features to obtain the speech features of the speech to be evaluated.
Specifically, when splicing, corresponding to each voice frame, at least two voice features corresponding to the voice frame are spliced together to obtain a splicing feature corresponding to the voice frame, and the splicing feature is used as the voice feature of the voice frame. For example, the acoustic feature, the hidden layer feature, and the phoneme embedding feature may be specifically spliced together to obtain a splicing feature corresponding to the speech frame, which is used as the speech feature of the speech frame. Or splicing the acoustic features and the hidden layer features together to obtain the splicing features corresponding to the voice frame as the voice features of the voice frame.
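The per-frame splicing of step S52 can be sketched as follows. This is only an illustration under the assumption that every feature stream has already been aligned to the same number of frames; the function name `splice_frame_features` and the toy values are hypothetical.

```python
from typing import List, Sequence


def splice_frame_features(*streams: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate, frame by frame, two or more aligned feature streams."""
    if len(streams) < 2:
        raise ValueError("at least two feature streams are required")
    n_frames = len(streams[0])
    if any(len(stream) != n_frames for stream in streams):
        raise ValueError("all streams must contain the same number of frames")
    # zip(*streams) yields, for each frame, one vector from every stream.
    return [[value for part in parts for value in part] for parts in zip(*streams)]


acoustic = [[0.1, 0.2], [0.3, 0.4]]   # e.g. filter bank features, 2 frames
hidden = [[1.0], [2.0]]               # e.g. acoustic-model hidden layer features
phoneme = [[7.0, 8.0], [9.0, 10.0]]   # e.g. phoneme embedding features

spliced = splice_frame_features(acoustic, hidden, phoneme)
print(spliced[0])  # [0.1, 0.2, 1.0, 7.0, 8.0]
```

Each spliced frame has a dimension equal to the sum of the dimensions of the individual features (2 + 1 + 2 = 5 in this example).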
In an optional embodiment, one implementation manner of determining the score of the speech to be evaluated according to the initial score of the speech to be evaluated corresponding to each scorer may be:
and if the scoring scale feature of only one scorer is obtained, determining the initial score of the voice to be evaluated corresponding to the scorer as the score of the voice to be evaluated.
And if the scoring scale features of at least two scorers are obtained, determining the average value of the initial scores of the speech to be evaluated corresponding to the at least two scorers as the score of the speech to be evaluated.
By averaging the initial scores of the voices to be evaluated corresponding to the multiple scorers, a more objective and stable scoring result can be obtained.
In addition, when the number of scorers is larger than a threshold, for example 10, one highest score and one lowest score may first be removed when calculating the average of the initial scores of the multiple scorers, and the average of the remaining initial scores is then calculated as the score of the speech to be evaluated.
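The aggregation rules described above (a single scorer, a plain average, and an average with the highest and lowest scores removed) can be sketched as a single function. The function name `final_score` and the parameter name `trim_threshold` are hypothetical; the default of 10 follows the example threshold given above.

```python
from typing import Sequence


def final_score(initial_scores: Sequence[float], trim_threshold: int = 10) -> float:
    """Combine per-scorer initial scores into the score of the speech."""
    if not initial_scores:
        raise ValueError("at least one initial score is required")
    scores = sorted(initial_scores)
    if len(scores) > trim_threshold:
        # Drop one lowest and one highest score before averaging.
        scores = scores[1:-1]
    return sum(scores) / len(scores)


print(final_score([4.0]))                        # 4.0 (single scorer)
print(final_score([3.0, 5.0]))                   # 4.0 (plain average)
print(final_score([0.0] + [1.0] * 9 + [11.0]))   # 1.0 (11 scorers, extremes removed)
```

With a single scorer the function simply returns that scorer's initial score, so the one-scorer case needs no separate branch.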
In an optional embodiment, the reading-aloud scoring method provided in the embodiment of the present application may be implemented based on a reading-aloud scoring model, and specifically, the fusion features corresponding to each scorer are obtained; the process of determining the initial score of the speech to be evaluated corresponding to each scorer by using the fusion features corresponding to the scorer may include:
inputting the speech features and the scoring scale feature of at least one scorer into a reading scoring model to obtain the initial score of the speech to be evaluated, output by the reading scoring model, corresponding to each scorer;
the reading scoring model can be used for fusing each scoring scale feature with the speech features to obtain the fusion feature corresponding to each scorer; and, for each scorer, determining the initial score of the speech to be evaluated corresponding to that scorer by using the fusion feature corresponding to that scorer.
In an optional embodiment, the above process of inputting the voice features and the scoring scale features of at least one scorer into the reading scoring model, and obtaining the initial score of the speech to be evaluated output by the reading scoring model and corresponding to each scorer may include:
fusing each scoring scale feature with the voice feature through a fusion feature extraction module of the reading scoring model to obtain a fusion feature corresponding to each scorer;
and determining the initial score of the voice to be evaluated corresponding to each scorer by utilizing the fusion characteristics corresponding to the scorer through an evaluation module of the reading scoring model.
In an optional embodiment, the fusing the voice features with the scoring scale features by the fused feature extracting module of the reading scoring model to obtain a fused feature corresponding to each scorer may include:
for each scorer, splicing the scoring scale feature of the scorer with the speech feature of each speech frame through a first splicing module of the reading scoring model to obtain the splicing feature corresponding to each speech frame;
and fusing the splicing features corresponding to the same scorer through a first fusion module of the reading scoring model to obtain the fusion feature corresponding to each scorer.
In an optional embodiment, through the first fusion module of the reading scoring model, the splicing features corresponding to the same scorer are fused to obtain a fusion feature corresponding to each scorer, which specifically includes:
acquiring the hidden layer feature of each speech frame according to the speech feature of each speech frame through a first hidden layer feature acquisition module of the reading scoring model;
calculating the mean of the hidden layer features of the speech frames through a first mean module of the reading scoring model to obtain the initial fusion feature; or,
calculating the weight of the hidden layer feature of each speech frame according to the hidden layer feature of each speech frame through a first attention module of the reading scoring model, and computing the weighted sum of the hidden layer features of the speech frames according to these weights to obtain the initial fusion feature. Specifically, the initial fusion feature can be calculated by the following formulas:
$$e_t = V \tanh(W x_t + b) \tag{1}$$
$$\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{N} \exp(e_k)} \tag{2}$$
$$s = \sum_{t=1}^{N} \alpha_t x_t \tag{3}$$
where $x_t$ denotes the t-th hidden layer feature; $N$ is the number of hidden layer features (namely, the number of speech frames contained in the speech to be evaluated); $\alpha_t$ is the weight of the t-th hidden layer feature; $W$ and $V$ are learnable weights, and $b$ is a learnable bias; $s$ denotes the initial fusion feature.
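The attention-based calculation of formulas (1)-(3) can be sketched in a few lines. The sketch assumes that the weights are normalized with a softmax over the frame scores and that V maps to a single scalar score per frame; these choices, and the function name `attention_pool`, are assumptions made for illustration rather than details confirmed by the application.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_pool(x: List[Vector], W: List[Vector], b: Vector, V: Vector) -> Tuple[Vector, Vector]:
    """Return (s, alphas): the weighted sum of the frames and the frame weights."""
    scores = []
    for x_t in x:
        # h = tanh(W x_t + b), one entry per row of W
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x_t)) + b_j)
             for row, b_j in zip(W, b)]
        # e_t = V h (a scalar score for frame t)
        scores.append(sum(v * h_j for v, h_j in zip(V, h)))
    # Softmax over frames (assumed normalization); subtract the max for stability.
    top = max(scores)
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(x[0])
    s = [sum(a * x_t[i] for a, x_t in zip(alphas, x)) for i in range(dim)]
    return s, alphas


frames = [[0.0, 1.0], [2.0, 1.0]]
s, alphas = attention_pool(frames, W=[[1.0, 0.0]], b=[0.0], V=[1.0])
print(alphas[1] > alphas[0])  # True: the second frame receives the larger weight
```

When all entries of W and b are zero, every frame receives the same weight and the result reduces to the mean of the hidden layer features, that is, to the mean-based alternative.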
In an optional embodiment, the fusing the voice features with the scoring scale features by the fused feature extraction module of the reading scoring model to obtain the fused feature corresponding to each scorer may include:
fusing the voice characteristics of each voice frame through a second fusion module of the reading scoring model to obtain initial fusion characteristics;
and splicing each scoring scale characteristic with the initial fusion characteristic respectively through a second splicing module of the reading scoring model to obtain the fusion characteristic corresponding to each scorer.
In an optional embodiment, the fusing the speech features of each speech frame by using the second fusion module of the reading scoring model to obtain initial fusion features includes:
acquiring the hidden layer feature of each speech frame according to the speech feature of each speech frame through a second hidden layer feature acquisition module of the reading scoring model;
calculating the mean of the hidden layer features of the speech frames through a second mean module of the reading scoring model to obtain the initial fusion feature; or,
calculating the weight of the hidden layer feature of each speech frame according to the hidden layer feature of each speech frame through a second attention module of the reading scoring model, and computing the weighted sum of the hidden layer features of the speech frames according to these weights to obtain the initial fusion feature. For the specific formulas, refer to the aforementioned formulas (1)-(3), which are not described in detail here. An initial fusion feature obtained in this way can better reflect the pronunciation quality of the more important frames.
In an alternative embodiment, the reading scoring model may be trained as follows:
Acquiring the speech features of a sample speech and the scoring scale feature of the scorer associated with the sample speech. The scorer associated with a sample speech is the scorer who scored that sample speech. The scoring scale of the scorer can be any of the scoring scales listed above. Preferably, the scoring scale of the scorer is the identifier of the scorer, such as the scorer's number.
In the case where at least two scorers have scored a sample speech, the average of their scores for the sample speech can be used as the score of the sample, and the numbers of these scorers need to be unified into a new number. For example, assume that 100 scorers are associated with all the sample speeches in the sample set and are numbered 1-100. If a sample speech Y is scored by scorer No. 1 and scorer No. 2, the final score of the sample speech Y is the average of the scores given by scorer No. 1 and scorer No. 2, and the scorer number for the sample speech Y may be 101 or any other value that does not duplicate an existing scorer number. In this way, the reading scoring model learns the combined scoring scale of the two scorers.
And inputting the voice characteristics of the sample voice and the grading scale characteristics of the grader associated with the sample voice into the reading grading model to obtain the initial grade of the sample voice, which is output by the reading grading model and corresponds to the grader associated with the sample voice.
Updating the parameters of the reading scoring model with the goal that the initial score of the sample speech approaches the score given by the scorer associated with the sample speech (the annotated score of the sample speech). Specifically, the mean square error between the initial score and the annotated score of the sample speech may be used as the loss function of the reading scoring model to update its parameters.
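The preparation of training targets and the loss described above can be sketched as follows. The sketch assumes that sample speeches scored by the same set of scorers share one new number, which is one possible way of keeping the new numbers distinct from the existing ones; the names `build_targets` and `mse` and the dictionary layout are hypothetical.

```python
from typing import Dict, FrozenSet, Sequence, Tuple


def build_targets(ratings: Dict[str, Dict[int, float]],
                  n_scorers: int) -> Dict[str, Tuple[int, float]]:
    """Map each sample to (scorer number, annotated score).

    ratings maps a sample id to {scorer number: score}. Samples scored by
    several scorers get the average score and a new, unused scorer number.
    """
    merged_ids: Dict[FrozenSet[int], int] = {}
    next_id = n_scorers + 1  # first number not used by an existing scorer
    targets: Dict[str, Tuple[int, float]] = {}
    for sample, by_scorer in ratings.items():
        if not by_scorer:
            raise ValueError(f"sample {sample!r} has no scores")
        if len(by_scorer) == 1:
            (scorer, score), = by_scorer.items()
            targets[sample] = (scorer, score)
            continue
        key = frozenset(by_scorer)
        if key not in merged_ids:
            merged_ids[key] = next_id
            next_id += 1
        targets[sample] = (merged_ids[key], sum(by_scorer.values()) / len(by_scorer))
    return targets


def mse(predicted: Sequence[float], annotated: Sequence[float]) -> float:
    """Mean square error between initial scores and annotated scores."""
    return sum((p - a) ** 2 for p, a in zip(predicted, annotated)) / len(predicted)


targets = build_targets({"X": {3: 4.0}, "Y": {1: 3.0, 2: 5.0}}, n_scorers=100)
print(targets)                      # {'X': (3, 4.0), 'Y': (101, 4.0)}
print(mse([4.5, 3.0], [4.0, 4.0]))  # 0.625
```

The loss value would then be minimized with any gradient-based optimizer; the optimizer itself is outside the scope of this sketch.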
As shown in fig. 6, an exemplary graph for predicting an initial score for the reading scoring model provided in the embodiment of the present application is shown.
In this example, the speech signal of the phrase "I love disappearing" is divided into N speech frames, and feature extraction is performed on the N speech frames. The extracted features are fed into network 1 to obtain the hidden layer feature of each speech frame. The hidden layer features of all the speech frames are either averaged or combined through an attention mechanism to obtain a fusion feature. This fusion feature is then fused with the scoring scale feature of a scorer S and fed into network 2, and network 3 predicts the score of the speech signal of the phrase "I love disappearing" in the scoring scale space of the scorer S.
Corresponding to the method embodiment, an embodiment of the present application further provides a reading scoring device, a schematic structural diagram of which is shown in fig. 7, and the reading scoring device may include:
an obtaining module 71, an initial score determining module 72 and a score determining module 73; wherein
the obtaining module 71 is configured to obtain a voice feature of a voice to be evaluated and a scoring scale feature of at least one scorer;
the initial score determining module 72 is configured to determine an initial score of the speech to be evaluated, which corresponds to each scorer, according to the speech feature and the scoring scale feature of the at least one scorer;
the score determining module 73 is configured to determine a score of the speech to be evaluated according to the initial score of the speech to be evaluated corresponding to each scorer.
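The cooperation of the three modules can be sketched as a small class. The class name `ReadingScoringDevice`, the `toy_model` stand-in and all values are hypothetical; a real device would call the trained reading scoring model in place of `toy_model`, and the obtaining step (module 71) is assumed to have been performed by the caller.

```python
from typing import Callable, List

Frames = List[List[float]]
ScaleFeature = List[float]
Model = Callable[[Frames, ScaleFeature], float]


class ReadingScoringDevice:
    """Chains the initial score determination and the score determination."""

    def __init__(self, model: Model) -> None:
        self.model = model

    def initial_scores(self, frames: Frames, scales: List[ScaleFeature]) -> List[float]:
        # Role of module 72: one initial score per scorer.
        return [self.model(frames, scale) for scale in scales]

    @staticmethod
    def final_score(initial: List[float]) -> float:
        # Role of module 73: average of the initial scores.
        return sum(initial) / len(initial)

    def run(self, frames: Frames, scales: List[ScaleFeature]) -> float:
        return self.final_score(self.initial_scores(frames, scales))


def toy_model(frames: Frames, scale: ScaleFeature) -> float:
    """Stand-in model: mean of the first feature dimension plus a scorer offset."""
    base = sum(frame[0] for frame in frames) / len(frames)
    return base + scale[0]


device = ReadingScoringDevice(toy_model)
result = device.run([[1.0], [3.0]], [[0.5], [-0.5]])
print(result)  # 2.0
```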
In an alternative embodiment, the initial score determining module 72 may include:
the fusion module is used for fusing each scoring scale characteristic with the voice characteristic respectively to obtain a fusion characteristic corresponding to each scorer;
and the determining module is used for determining, for each scorer, the initial score of the speech to be evaluated corresponding to that scorer by using the fusion feature corresponding to that scorer.
In an alternative embodiment, the fusion module may include:
the first splicing module is used for splicing the scoring scale characteristics of each scorer with the voice characteristics of each voice frame corresponding to each scorer to obtain the splicing characteristics corresponding to each voice frame;
and the first fusion module is used for fusing all splicing features corresponding to the same grader to obtain fusion features corresponding to all graders.
In an alternative embodiment, the fusion module may include:
the second fusion module is used for fusing the voice characteristics of each voice frame to obtain initial fusion characteristics;
and the second splicing module is used for splicing each scoring scale characteristic with the initial fusion characteristic respectively to obtain the fusion characteristic corresponding to each scorer.
In an optional embodiment, the second fusion module includes:
the hidden layer characteristic acquisition module is used for acquiring the hidden layer characteristics of each voice frame according to the voice characteristics of each voice frame;
and the mean value module is used for calculating the mean value of the hidden layer characteristics of each voice frame to obtain the initial fusion characteristics.
In an optional embodiment, the second fusion module comprises:
the hidden layer characteristic acquisition module is used for acquiring the hidden layer characteristics of each voice frame according to the voice characteristics of each voice frame;
and the attention module is used for calculating the weight of the hidden layer characteristic of each voice frame according to the hidden layer characteristic of each voice frame, and weighting and summing the hidden layer characteristics of each voice frame according to the weight of the hidden layer characteristic of each voice frame to obtain the initial fusion characteristic.
In an optional embodiment, the obtaining module 71 is specifically configured to:
acquiring at least two voice characteristics of the voice to be evaluated;
and splicing the at least two voice characteristics to obtain the voice characteristics of the voice to be evaluated.
In an alternative embodiment, the score determining module 73 may be specifically configured to:
if the scoring scale feature of only one scorer is obtained, determining the initial score of the voice to be evaluated corresponding to the scorer as the score of the voice to be evaluated;
and if the scoring scale features of at least two scorers are obtained, determining the average value of the initial scores of the speech to be evaluated, which correspond to the at least two scorers, as the score of the speech to be evaluated.
In an optional embodiment, the initial score determining module 72 may specifically be configured to:
inputting the voice characteristics and the grading scale characteristics of the at least one grader into a reading grading model to obtain an initial grade of the voice to be evaluated, which is output by the reading grading model and corresponds to each grader;
the reading scoring model has the capability of: fusing each scoring scale feature with the speech features to obtain the fusion feature corresponding to each scorer; and, for each scorer, determining the initial score of the speech to be evaluated corresponding to that scorer by using the fusion feature corresponding to that scorer.
The reading scoring device provided by the embodiment of the application can be applied to reading scoring equipment, such as a PC terminal, a cloud platform, a server cluster and the like. Alternatively, fig. 8 shows a block diagram of a hardware structure of the reading scoring device, and referring to fig. 8, the hardware structure of the reading scoring device may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete mutual communication through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention;
the memory 3 may include a high-speed RAM memory, and may further include a non-volatile memory (non-volatile memory) or the like, such as at least one disk memory;
wherein the memory stores a program and the processor can call the program stored in the memory, the program for:
acquiring voice characteristics of a voice to be evaluated and scoring scale characteristics of at least one scorer;
determining an initial score of the voice to be evaluated corresponding to each scorer according to the voice characteristics and the scoring scale characteristics of the at least one scorer;
and determining the score of the speech to be evaluated according to the initial score of the speech to be evaluated corresponding to each scorer.
Optionally, for the detailed functions and extended functions of the program, refer to the description above.
Embodiments of the present application further provide a storage medium, where a program suitable for execution by a processor may be stored, where the program is configured to:
acquiring voice characteristics of a voice to be evaluated and scoring scale characteristics of at least one scorer;
determining an initial score of the voice to be evaluated corresponding to each scorer according to the voice characteristics and the scoring scale characteristics of the at least one scorer;
and determining the score of the speech to be evaluated according to the initial score of the speech to be evaluated corresponding to each scorer.
Optionally, for the detailed functions and extended functions of the program, refer to the description above.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A reading scoring method, comprising:
acquiring voice characteristics of a voice to be evaluated and scoring scale characteristics of at least one scorer;
determining an initial score of the voice to be evaluated corresponding to each scorer according to the voice characteristics and the scoring scale characteristics of the at least one scorer;
determining the score of the speech to be evaluated according to the initial score of the speech to be evaluated corresponding to each scorer;
the determining the initial score of the voice to be evaluated corresponding to each scorer according to the voice characteristics and the scoring scale characteristics of the at least one scorer comprises the following steps:
fusing each scoring scale feature with the voice feature respectively to obtain a fused feature corresponding to each scorer;
for each scorer, determining the initial score of the speech to be evaluated corresponding to that scorer by using the fusion feature corresponding to that scorer;
wherein the process of obtaining the fusion feature corresponding to each scorer and, for each scorer, determining the initial score of the speech to be evaluated corresponding to that scorer by using the fusion feature corresponding to that scorer comprises:
and inputting the voice characteristics and the grading scale characteristics of the at least one grader into a reading grading model to obtain an initial grade of the voice to be evaluated, which is output by the reading grading model and corresponds to each grader.
2. The method according to claim 1, wherein the speech features of the speech to be evaluated are speech features of respective speech frames of the speech to be evaluated; the step of fusing each scoring scale feature with the voice feature to obtain a fused feature corresponding to each scorer comprises the following steps:
for each scorer, splicing the scoring scale feature of the scorer with the speech feature of each speech frame to obtain the splicing feature corresponding to each speech frame;
and fusing the splicing features corresponding to the same scorer to obtain the fusion features corresponding to each scorer.
3. The method according to claim 1, wherein the speech features of the speech to be evaluated are speech features of respective speech frames of the speech to be evaluated; the step of fusing each scoring scale feature with the voice feature to obtain a fused feature corresponding to each scorer comprises the following steps:
fusing the voice characteristics of each voice frame to obtain initial fusion characteristics;
and respectively splicing each scoring scale characteristic with the initial fusion characteristic to obtain the fusion characteristic corresponding to each scorer.
4. The method of claim 3, wherein the fusing the speech features of the speech frames comprises:
acquiring hidden layer characteristics of each voice frame according to the voice characteristics of each voice frame;
calculating the mean of the hidden layer features of the speech frames to obtain the initial fusion feature; or,
and calculating the weight of the hidden layer characteristic of each voice frame according to the hidden layer characteristic of each voice frame, and weighting and summing the hidden layer characteristics of each voice frame according to the weight of the hidden layer characteristic of each voice frame to obtain the initial fusion characteristic.
5. The method according to any one of claims 1 to 4, wherein the obtaining of the speech characteristics of the speech to be evaluated comprises:
acquiring at least two voice characteristics of the voice to be evaluated;
and splicing the at least two voice characteristics to obtain the voice characteristics of the voice to be evaluated.
6. The method according to claim 1, wherein the determining the score of the speech to be evaluated according to the initial score of the speech to be evaluated corresponding to each scorer comprises:
if the scoring scale characteristic of only one scorer is obtained, determining the initial score of the voice to be evaluated corresponding to the scorer as the score of the voice to be evaluated;
and if the scoring scale features of at least two scorers are obtained, determining the average of the initial scores of the speech to be evaluated corresponding to the at least two scorers as the score of the speech to be evaluated.
7. The method of claim 1,
the reading scoring model has the capability of: fusing each scoring scale feature with the speech features to obtain the fusion feature corresponding to each scorer; and, for each scorer, determining the initial score of the speech to be evaluated corresponding to that scorer by using the fusion feature corresponding to that scorer.
8. The method of claim 1, wherein the reading scoring model is trained by:
acquiring voice characteristics of sample voice and scoring scale characteristics of a scorer associated with the sample voice;
inputting the voice features of the sample voice and the scoring scale features of the scorers associated with the sample voice into the reading scoring model to obtain the initial score of the sample voice, which is output by the reading scoring model and corresponds to the scorers associated with the sample voice;
and updating the parameters of the reading scoring model by taking the initial scoring of the sample voice as a target to approach the scoring given by a grader associated with the sample voice.
9. A reading scoring device, comprising:
the acquisition module is used for acquiring the voice characteristics of the voice to be evaluated and the grading scale characteristics of at least one grader;
the initial score determining module is used for determining the initial score of the voice to be evaluated corresponding to each scorer according to the voice characteristics and the scoring scale characteristics of the at least one scorer;
the score determining module is used for determining the score of the voice to be evaluated according to the initial score of the voice to be evaluated corresponding to each scorer;
the initial score determining module is specifically configured to fuse each scoring scale feature with the speech features to obtain the fusion feature corresponding to each scorer; and, for each scorer, to determine the initial score of the speech to be evaluated corresponding to that scorer by using the fusion feature corresponding to that scorer;
wherein the process of obtaining the fusion feature corresponding to each scorer and, for each scorer, determining the initial score of the speech to be evaluated corresponding to that scorer by using the fusion feature corresponding to that scorer comprises:
and inputting the voice characteristics and the grading scale characteristics of the at least one grader into a reading grading model to obtain an initial grade of the voice to be evaluated, which is output by the reading grading model and corresponds to each grader.
10. A reading scoring device, comprising: a memory and a processor;
the memory is used for storing programs;
the processor is used for executing the program to realize the steps of the reading scoring method according to any one of claims 1-8.
11. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the reading scoring method of any one of claims 1 to 8.
CN201911424069.6A 2019-12-31 2019-12-31 Reading scoring method, device, equipment and readable storage medium Active CN111105813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911424069.6A CN111105813B (en) 2019-12-31 2019-12-31 Reading scoring method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111105813A CN111105813A (en) 2020-05-05
CN111105813B true CN111105813B (en) 2022-09-02

Family

ID=70427005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911424069.6A Active CN111105813B (en) 2019-12-31 2019-12-31 Reading scoring method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111105813B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101740024A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 Method for automatic evaluation based on generalized fluent spoken language fluency
CN103065626A (en) * 2012-12-20 2013-04-24 中国科学院声学研究所 Automatic grading method and automatic grading equipment for read questions in test of spoken English
CN103366759A (en) * 2012-03-29 2013-10-23 北京中传天籁数字技术有限公司 Speech data evaluation method and speech data evaluation device
CN103559892A (en) * 2013-11-08 2014-02-05 安徽科大讯飞信息科技股份有限公司 Method and system for evaluating spoken language
CN104464423A (en) * 2014-12-19 2015-03-25 科大讯飞股份有限公司 Calibration optimization method and system for speaking test evaluation
CN109273023A (en) * 2018-09-20 2019-01-25 科大讯飞股份有限公司 A kind of data evaluating method, device, equipment and readable storage medium storing program for executing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2544070B (en) * 2015-11-04 2021-12-29 The Chancellor Masters And Scholars Of The Univ Of Cambridge Speech processing system and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"FLORA: Fluent oral reading assessment of children's speech"; D Bolaños et al.; ACM Transactions on Speech and Language Processing; 20110831; vol. 7, no. 4; full text *
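"Speech evaluation technology supports spoken English teaching and assessment" (语音评测技术助力英语口语教学与评价); 魏思 et al.; 《人工智能》 (Artificial Intelligence); 20190610; full text *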

Also Published As

Publication number Publication date
CN111105813A (en) 2020-05-05

Similar Documents

Publication Publication Date Title
EP3528250B1 (en) Voice quality evaluation method and apparatus
CN108509619B (en) Voice interaction method and device
CN110782921A (en) Voice evaluation method and device, storage medium and electronic device
CN111081280B (en) Text-independent speech emotion recognition method and device and emotion recognition algorithm model generation method
CN111460111A (en) Evaluating retraining recommendations for automatic conversation services
CN111145733B (en) Speech recognition method, speech recognition device, computer equipment and computer readable storage medium
JP2015075706A (en) Error correction model learning device and program
CN109036471B (en) Voice endpoint detection method and device
WO2021103712A1 (en) Neural network-based voice keyword detection method and device, and system
CN111640456B (en) Method, device and equipment for detecting overlapping sound
CN107886968B (en) Voice evaluation method and system
CN112017694B (en) Voice data evaluation method and device, storage medium and electronic device
CN115798518B (en) Model training method, device, equipment and medium
CN111508505A (en) Speaker identification method, device, equipment and storage medium
Hansen et al. Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models
US11410685B1 (en) Method for detecting voice splicing points and storage medium
CN112562723B (en) Pronunciation accuracy determination method and device, storage medium and electronic equipment
CN110930988B (en) Method and system for determining phoneme score
CN111833842A (en) Synthetic sound template discovery method, device and equipment
CN111105813B (en) Reading scoring method, device, equipment and readable storage medium
CN113160801B (en) Speech recognition method, device and computer readable storage medium
CN113035238B (en) Audio evaluation method, device, electronic equipment and medium
CN111739518B (en) Audio identification method and device, storage medium and electronic equipment
CN114121038A (en) Sound voice testing method, device, equipment and storage medium
CN114528812A (en) Voice recognition method, system, computing device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant