CN110322895A

CN110322895A - Speech evaluating method and computer storage medium

Info

Publication number: CN110322895A
Application number: CN201810259445.XA
Authority: CN
Inventors: 吴介圣
Original assignee: Yiduhuida Educational Technology (beijing) Co Ltd
Current assignee: Yiduhuida Educational Technology (beijing) Co Ltd
Priority date: 2018-03-27
Filing date: 2018-03-27
Publication date: 2019-10-11
Anticipated expiration: 2038-03-27
Also published as: CN110322895B

Abstract

Embodiments of the invention provide a speech evaluating method and a computer storage medium. The speech evaluating method includes: generating a vector to be evaluated of voice data to be evaluated according to text data corresponding to the voice data to be evaluated; calculating a similarity between a preset standard content vector and the vector to be evaluated according to the preset standard content vector, the vector to be evaluated, and a similarity calculation, wherein the similarity calculation computes the similarity from the cosine value of the preset standard content vector and the vector to be evaluated and makes the calculated similarity a value greater than 0; and generating and outputting evaluation result data of the voice data to be evaluated according to the similarity and a preset speech assessment rule. The speech evaluating method can assess learning outcomes in an expression training course.

Description

Speech evaluating method and computer storage medium

Technical field

Embodiments of the present invention relate to the field of computer technology, and in particular to a speech evaluating method and a computer storage medium.

Background art

With the development of computer and Internet technology, learning and teaching by means of computers and the Internet has become a trend. Through computers and the Internet, students can learn anytime and anywhere without being limited by environmental factors such as venue and class size. In the education of young children in particular, the use of computers and the Internet has filled a gap in existing education for young children.

Take language expression training for children aged 3 to 8 conducted by computer and the Internet as an example. The existing expression training process is as follows: a computer or mobile terminal device shows a set of interesting pictures to the student, and the student trains the ability to express by describing the content of these pictures.

The existing expression training process has no feedback or judgment mechanism. As a result, after completing the training a student cannot readily tell how much he or she has improved and has little awareness of his or her own learning situation, which does not help motivate the student to keep learning and progressing.

Summary of the invention

In view of this, embodiments of the invention provide a speech evaluating method and a computer storage medium, to solve the problem that a student's learning outcome cannot be assessed in existing expression training courses.

According to a first aspect of the embodiments of the invention, a speech evaluating method is provided. The method includes: generating a vector to be evaluated of voice data to be evaluated according to text data corresponding to the voice data to be evaluated; calculating a similarity between a preset standard content vector and the vector to be evaluated according to the preset standard content vector, the vector to be evaluated, and a similarity calculation, wherein the similarity calculation computes the similarity from the cosine value of the preset standard content vector and the vector to be evaluated and makes the calculated similarity a value greater than 0; and generating and outputting evaluation result data of the voice data to be evaluated according to the similarity and a preset speech assessment rule.

According to a second aspect of the embodiments of the invention, a computer storage medium is provided. The computer storage medium stores: an instruction for generating a vector to be evaluated of voice data to be evaluated according to text data corresponding to the voice data to be evaluated; an instruction for calculating a similarity between a preset standard content vector and the vector to be evaluated according to the preset standard content vector, the vector to be evaluated, and a similarity calculation, wherein the similarity calculation computes the similarity from the cosine value of the preset standard content vector and the vector to be evaluated and makes the calculated similarity a value greater than 0; and an instruction for generating and outputting evaluation result data of the voice data to be evaluated according to the similarity and a preset speech assessment rule.

According to the scheme provided by the embodiments of the invention, the speech evaluating method can be applied in an expression training course. The voice input by the user is evaluated: for example, the text data corresponding to the voice data to be evaluated is converted into a vector to be evaluated, and the similarity between the vector to be evaluated and the preset standard content vector is assessed to obtain evaluation result data. The evaluation result data characterizes the user's ability to express, so that the user can clearly and easily understand his or her own expression, which motivates the user to keep learning and progressing.

When evaluating voice data to be evaluated, the speech evaluating method converts the corresponding text data into a vector to be evaluated, uses the similarity calculation to compute the similarity from the cosine value between the vector to be evaluated and the preset standard content vector, and makes the calculated similarity a value greater than 0. This solves the low evaluation accuracy of existing similarity calculation methods when they are used for speech evaluation. It also solves the problem of negative values in the similarity calculation, so that the result of the speech evaluation is more accurate.

Brief description of the drawings

To explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in the embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them.

Fig. 1 is a flow chart of the steps of a speech evaluating method according to Embodiment one of the invention;

Fig. 2 is a flow chart of the steps of a speech evaluating method according to Embodiment two of the invention;

Fig. 3 is a structural schematic diagram of the doc2vec model used by the speech evaluating method in the embodiment shown in Fig. 2.

Detailed description of the embodiments

To help those skilled in the art better understand the technical solutions in the embodiments of the invention, the technical solutions are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the invention shall fall within the scope of protection of the embodiments of the invention.

Embodiment one

Referring to Fig. 1, a flow chart of the steps of a speech evaluating method according to Embodiment one of the invention is shown.

The speech evaluating method of this embodiment includes the following steps:

Step S101: generate a vector to be evaluated of the voice data to be evaluated according to the text data corresponding to the voice data to be evaluated.

The text data corresponding to the voice data to be evaluated can be obtained in any suitable way. For example, by means of speech recognition, the voice data to be evaluated is recognized by a speech recognition model or algorithm and the corresponding text data is generated. In this way the voice to be evaluated is converted into corresponding text data automatically, with high conversion efficiency and low labor intensity.

Of course, in other embodiments the voice data to be evaluated can also be converted into corresponding text data manually, by manual transcription.

Similarly, the vector to be evaluated corresponding to the voice data to be evaluated can also be obtained in any suitable way. For example, the vector to be evaluated is generated by a deep learning model from the text data corresponding to the voice data to be evaluated. The deep learning model can be a Word2vec model, a doc2vec model, or the like. Word2vec and doc2vec models convert text data into a corresponding vector at the semantic level, so the vector reflects the semantics of the voice data to be evaluated well, which greatly benefits the accuracy of the subsequent speech evaluation at the semantic level. In addition, converting the recognized text data into a vector to be evaluated and carrying out the subsequent speech evaluation on that vector helps ensure the accuracy of the evaluation: it avoids the problem that, when the voice data to be evaluated is recognized as text data, words with similar or identical pronunciation make the recognized text inaccurate and thereby affect the accuracy of the speech evaluation.

Of course, in other embodiments the text data can also be converted into the corresponding vector to be evaluated by a one-hot representation model, a latent semantic analysis (LSA) model, or the like.

Step S102: calculate the similarity between the preset standard content vector and the vector to be evaluated according to the preset standard content vector, the vector to be evaluated, and a similarity calculation, wherein the similarity calculation computes the similarity from the cosine value of the preset standard content vector and the vector to be evaluated and makes the calculated similarity a value greater than 0.

The preset standard content vector is generated from reference answer data. For example, the reference answer data is converted into the corresponding standard content vector by a text vector model, such as the aforementioned Word2vec model, and the standard content vector is stored in advance in a computer device and/or a server.

Of course, in other embodiments the reference answer data can be preset in the computer device and/or the server and converted into the standard content vector when needed.

The similarity calculation is used to calculate the similarity between the preset standard content vector and the vector to be evaluated. This similarity characterizes how similar the reference answer data (corresponding to the preset standard content vector) and the text data (corresponding to the vector to be evaluated) are, and the voice data to be evaluated is scored according to that degree of similarity. In this embodiment, the similarity calculation computes the similarity from the cosine value between the preset standard content vector and the vector to be evaluated, and makes the calculated similarity a value greater than 0.

The similarity calculation determines the similarity from the cosine value of the two vectors and makes the calculated similarity a value greater than 0. This determines the similarity accurately and efficiently, and avoids the effect that homophones written with different characters have on similarity accuracy when speech evaluation relies on the existing approach of determining similarity from text keywords. It also solves the problem that the cosine value can be negative, which would make the calculated similarity negative and the speech evaluation result inaccurate.

Step S103: generate and output evaluation result data of the voice data to be evaluated according to the similarity and a preset speech assessment rule.

The evaluation result data characterizes the level of the user's ability to express. For example, in an expression training course, the evaluation result data can characterize the user's expression. Depending on which aspects of the ability to express are to be characterized, the evaluation result data may include different evaluation parameters. For example, in the aforementioned expression training course, the evaluation result data includes a semantic score, a dynamics score, a tone score, and the like.

In one feasible approach, the preset speech assessment rule includes a rule that a score in the evaluation result data is positively correlated with the aforementioned similarity: the higher the similarity, the higher the score, and the better the characterized ability to express.
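A positively correlated rule of this kind could be sketched as follows. This is only an illustration under stated assumptions: the function name `semantic_score`, the 100-point scale, and the simple proportional mapping are hypothetical and are not specified in the text, which only requires that the score rise with the similarity.

```python
def semantic_score(similarity: float, full_marks: float = 100.0) -> float:
    """Map a similarity in (0, 1] to a score that rises with the similarity.

    The proportional mapping and the 100-point scale are assumptions made
    for illustration; any monotonically increasing rule would fit the text.
    """
    if not 0.0 < similarity <= 1.0:
        raise ValueError("similarity is expected to lie in (0, 1]")
    return round(similarity * full_marks, 1)


# A higher similarity always yields a higher (or equal) score.
low = semantic_score(0.3)
high = semantic_score(0.9)
```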

The speech evaluating method can be applied in an expression training course. The voice input by the user is evaluated: for example, the voice data to be evaluated is converted into a vector to be evaluated, and the similarity between the vector to be evaluated and the preset standard content vector is assessed to obtain evaluation result data. The evaluation result data characterizes the user's ability to express, so that the user can clearly and easily understand his or her own expression, which motivates the user to keep learning and progressing.

The speech evaluating method of this embodiment can be implemented by any suitable device with a data processing function, including various terminal devices, servers, and the like.

Embodiment two

Referring to Fig. 2, a flow chart of the steps of a speech evaluating method according to Embodiment two of the invention is shown.

This embodiment is described by taking the application of the speech evaluating method to an expression training course, in particular an expression training course for young children (for example, children aged 3 to 8), as an example. Of course, in other embodiments the speech evaluating method can be applied to any other appropriate scenario, for example a scenario of evaluating the training of an artificial intelligence device; this embodiment places no restriction on this.

Step S201: generate a vector to be evaluated of the voice data to be evaluated according to the text data corresponding to the voice data to be evaluated.

In an expression training course, a set of pictures and/or videos can be shown to the user, and the user describes their content in words, which trains the user's ability to express. To let the user grasp his or her own learning situation more accurately, the voice data formed from the user's spoken description can be evaluated, so that the user understands his or her own learning situation more intuitively and clearly, is encouraged to keep learning, and becomes more motivated to learn.

When the user's voice data is evaluated to reflect the user's ability to express, an appropriate parameter can be used to characterize that ability. For example, a picture is shown to the user, and the similarity between the text data corresponding to the user's voice data and the reference answer data corresponding to the picture is calculated to judge how well the semantics of the user's voice data match the content of the displayed picture; the ability to express is characterized by this.

In one feasible approach, when speech evaluation is carried out, step S201 includes the following sub-steps:

Sub-step 1: recognize the voice data to be evaluated using a speech recognition model, and generate text data corresponding to the voice data to be evaluated.

Optionally, the voice data to be evaluated is obtained first; the voice data to be evaluated is then transcoded to generate transcoded voice data to be evaluated; and the transcoded voice data to be evaluated is used as the input of the speech recognition model, which generates the text data corresponding to the voice data to be evaluated.

The voice data to be evaluated can be the user's voice captured by a recording device, a recording input by the user, or the like; alternatively, saved user voice data can be extracted from a database as the voice data to be evaluated.

After the voice data to be evaluated is obtained, if it meets the format required by the speech recognition model, it can be used directly as the input of the speech recognition model, which generates the corresponding text data.

If the voice data to be evaluated does not meet the format required by the speech recognition model, it is transcoded to generate transcoded voice data to be evaluated. For example, suppose the voice data to be evaluated is in mp3 format with a sample rate of 44100 Hz and 2 audio channels. If this format does not match the input format required by the speech recognition model, the voice data is transcoded. Transcoding can be done in any suitable way, for example with ffmpeg, and the format after conversion is: wav format, sample rate 16000 Hz, 2 audio channels, 16 bit.
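The transcoding example above could be driven from a script roughly as follows. This is a sketch, not the implementation used in the embodiment: the helper names `build_transcode_command` and `transcode` and the file names are hypothetical, and it assumes the `ffmpeg` executable is installed and on the search path. The options `-ar` (sample rate), `-ac` (channel count), and `-c:a pcm_s16le` (16-bit PCM audio codec) are standard ffmpeg options.

```python
import subprocess
from typing import List


def build_transcode_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that converts src to 16 kHz, 2-channel, 16-bit wav."""
    return [
        "ffmpeg", "-y",        # -y: overwrite the output file if it exists
        "-i", src,             # input file, e.g. a 44100 Hz mp3 recording
        "-ar", "16000",        # target sample rate in Hz
        "-ac", "2",            # target number of audio channels
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM, as stored in wav
        dst,
    ]


def transcode(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_transcode_command(src, dst), check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
command = build_transcode_command("answer.mp3", "answer.wav")
```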

The transcoded voice data to be evaluated is used as the input of the speech recognition model, which generates the text data corresponding to the voice data to be evaluated.

Those skilled in the art can use any appropriate speech recognition model to recognize the voice data to be evaluated as needed; this embodiment places no restriction on this. For example, the speech recognition model can be one based on an HMM (Hidden Markov Model) and an N-gram model, or an existing speech recognition tool can be called directly to perform the speech recognition.

During speech recognition, speech with identical or similar pronunciation is inevitably recognized as different text, for example "", "", "horse". After the voice data to be evaluated is recognized and text data is generated, the converted text may be pronounced the same as the voice data to be evaluated but use different characters. For the voice data of young children in particular, unclear expression, pauses, and similar problems are unavoidable, which reduces the accuracy of speech recognition.

If this text data and the reference answer data were fed directly into an existing text similarity calculation model, the inaccuracy of the recognized text would make the similarity calculation inaccurate and affect the accuracy of the subsequent speech evaluation. This is because existing similarity calculation models such as TF-IDF (term frequency-inverse document frequency), LSI (latent semantic indexing), and LDA (Latent Dirichlet Allocation, a document topic generation model) calculate similarity by keyword matching, and the accuracy of a similarity calculated by keyword matching is affected by the accuracy of the recognized text.

To overcome this defect and improve speech recognition accuracy, a speech recognition model for young children can be trained by machine learning, and the trained speech recognition model is used to recognize the voice data to be evaluated, so as to obtain more accurate text data.

Sub-step 2: vectorize the text data with a text vector computation model to generate the vector to be evaluated of the voice data to be evaluated.

The text vector computation model converts the text data into the corresponding vector to be evaluated for the subsequent text similarity calculation.

Optionally, one way to implement this step may include:

Pre-process the text data and generate result data from the pre-processing result, where the result data includes word segmentation data indicating the multiple segmented words in the text data; then use the word segmentation data of each segmented word as the input of the text vector computation model, which generates the vector to be evaluated of the voice data to be evaluated.

In one feasible approach, the pre-processing includes dirty data removal, word segmentation, and stop word removal.

On this basis, pre-processing the text data and generating result data from the pre-processing result includes:

Pre-processing step 1: remove dirty data from the text data to obtain valid text data.

Dirty data can be text that is empty or that carries too little useful information to be used for training. For example, if the number of characters in a text is below a preset threshold and the sentences in the text lack a subject or predicate, the text data can be regarded as dirty data with very little useful information.

For example, Text 1: ",." Text 1 is empty data. Text 2: "has a little bear,." Text 2 contains very little useful information. Both are dirty data. Dirty data can be removed in any suitable way, for example with an existing dirty data removal method.
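A length-based filter for dirty data might look like the sketch below. It covers only the "empty or below a character threshold" criterion; the subject and predicate check mentioned above would need a syntactic parser and is left out. The names `is_dirty` and `remove_dirty`, the threshold of 10 characters, and the English sample sentences are assumptions for illustration.

```python
from typing import List


def is_dirty(text: str, min_chars: int = 10) -> bool:
    """Return True if the text has fewer than min_chars letters or digits.

    Punctuation and whitespace are ignored, so a text such as ",." counts
    as empty. The threshold value is a hypothetical choice.
    """
    content = "".join(ch for ch in text if ch.isalnum())
    return len(content) < min_chars


def remove_dirty(texts: List[str], min_chars: int = 10) -> List[str]:
    """Keep only the texts that are not dirty."""
    return [t for t in texts if not is_dirty(t, min_chars)]


samples = [",.", "a bear,.", "There is a bear playing on the picture."]
valid = remove_dirty(samples)
```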

Pre-processing step 2: perform word segmentation on the valid text data to obtain word segmentation data for the multiple segmented words in the valid text data.

Word segmentation can be performed in any suitable way, for example with a machine learning segmentation model based on a hidden Markov model (HMM).

Taking "Text 3: there is a bear on the picture" as an example, the word segmentation data after word segmentation is: picture / on / have / a / bear.

Pre-processing step 3: remove stop words from the word segmentation data of the multiple segmented words to obtain the result data.

Stop words are words removed from information and/or text during processing to save storage space and improve processing efficiency. Stop words can be determined as needed. For example, stop words can be auxiliary words or modal particles, such as "", "ground", "", and can also be other words, such as "have" and "on".

Taking the word segmentation data of the aforementioned Text 3 as an example, the result after the stop words are removed is: picture / a / bear.
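The stop word step for Text 3 can be illustrated with a short sketch. The function name `remove_stop_words` and the use of English tokens in place of the original Chinese segments are assumptions; the stop word list contains only the two words named in the text.

```python
from typing import Iterable, List, Set

# Hypothetical stop word list holding the two examples named in the text.
STOP_WORDS: Set[str] = {"have", "on"}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop every token that appears in the stop word list, keeping order."""
    return [t for t in tokens if t not in stop_words]


segmented = ["picture", "on", "have", "a", "bear"]
result = remove_stop_words(segmented)
```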

After the result data is obtained, the word segmentation data of each segmented word in it is used as the input of the text vector computation model, which generates the vector to be evaluated of the voice data to be evaluated.

The text vector computation model can be a deep learning model, for example a Word2vec model or a doc2vec model. This embodiment is described with doc2vec as the text vector computation model. The structure of the doc2vec model is shown in Fig. 3: it includes an input layer, a hidden layer, and an output layer. The input layer obtains the training sample data; the hidden layer vectorizes the training sample data; and the output layer outputs the result.

To better suit the voice data of young children, the doc2vec model can be trained on young children's voice data, so that the trained doc2vec model generates better vectors to be evaluated for the voice data to be evaluated.

The doc2vec model can be trained with a conventional training method, for example the following training process:

First, each word in the training text data is initialized as an N-dimensional vector, where N is an appropriate value determined as needed. Preferably, N ranges from 30 to 200 and can be adjusted as needed.

For example, suppose the word segmentation data of the text data is {picture, a, bear, play} and the word initialization vector dimension is 4. The vector of each word after initialization is then:

picture x1: [1, 0, 0, 0], where x1 indicates that the word "picture" is the first input (shown as X₁ in Fig. 3).

a x2: [0, 1, 0, 0], where x2 indicates that the word "a" is the second input (shown as X₂ in Fig. 3).

bear x3: [0, 0, 1, 0], where x3 indicates that the word "bear" is the third input.

play x4: [0, 0, 0, 1], where x4 indicates that the word "play" is the fourth input.
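The initialization above assigns each word a one-hot vector whose dimension equals the number of words. A minimal sketch, assuming the words are unique and using the hypothetical helper name `one_hot_vectors`:

```python
from typing import Dict, List


def one_hot_vectors(words: List[str]) -> Dict[str, List[int]]:
    """Assign each word a one-hot vector with a 1 at the word's position."""
    n = len(words)
    return {w: [1 if i == j else 0 for j in range(n)] for i, w in enumerate(words)}


vectors = one_hot_vectors(["picture", "a", "bear", "play"])
x1, x2, x3, x4 = (vectors[w] for w in ["picture", "a", "bear", "play"])
```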

When training the doc2vec model, x1, x2, and x4 can be used as the input of the model to be trained, that is, the input layer in Fig. 3 receives x1, x2, and x4, and x3 is used as the correction reference.

During training, the result of the hidden layer is calculated with a training parameter matrix w set up for the doc2vec model. Matrix w is shown in Fig. 3 as the box between the input layer and the hidden layer. In one feasible approach, matrix w is as follows:

The hidden layer result v2 is calculated from x2 and matrix w as follows:

Similarly, the hidden layer results v1 and v4 are calculated from x1 and x4 and matrix w. The hidden layer is determined by averaging the calculated v1, v2, and v4.

Then a matrix o is set up between the hidden layer and the output layer. Matrix o is multiplied by the hidden layer, and softmax is applied to obtain probabilities. For example, if the output layer result is [0.23, 0.03, 0.62, 0.12], the third value, 0.62, is the largest, which is close to the true expectation [0, 0, 1, 0]. The parameters are tuned according to the output layer result and the true expectation for x3, for example by adjusting each of the aforementioned matrices, to obtain the optimal model and complete the training of the doc2vec model.
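The forward pass described above (project each context word through w, average the results, multiply by o, apply softmax) can be sketched in plain Python. The matrix values here are random placeholders, not the values of the embodiment; the hidden size of 3, the seed, and all function names are assumptions. The sketch follows the description in the text and omits the separate paragraph vector that doc2vec implementations commonly add to the context.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(x: Vector, m: Matrix) -> Vector:
    """Multiply a row vector x (length r) by a matrix m (r rows, c columns)."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def softmax(z: Vector) -> Vector:
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(context: List[Vector], w: Matrix, o: Matrix) -> Vector:
    """Average the hidden results of the context words, then project and normalize."""
    hidden_each = [matvec(x, w) for x in context]               # v1, v2, v4
    hidden = [sum(col) / len(hidden_each) for col in zip(*hidden_each)]
    return softmax(matvec(hidden, o))


rng = random.Random(0)                                          # fixed seed, placeholder weights
vocab_size, hidden_size = 4, 3
w = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(vocab_size)]
o = [[rng.uniform(-0.5, 0.5) for _ in range(vocab_size)] for _ in range(hidden_size)]

x1, x2, x4 = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]
probs = forward([x1, x2, x4], w, o)

# Cross-entropy against the true expectation [0, 0, 1, 0] for x3; training
# would adjust w and o to reduce this value.
loss = -math.log(probs[2])
```

With untrained placeholder weights the probabilities are close to uniform; tuning w and o against the loss is what moves the third probability toward 1.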

After the doc2vec model has been trained, the word segmentation data of each aforementioned segmented word is used as the input of the text vector computation model (the trained doc2vec model), which generates the vector to be evaluated of the voice data to be evaluated. Because the text vector computation model has been trained to obtain parameters optimized for the young children usage scenario, the vector to be evaluated that it generates is more accurate.

Step S202: calculate the similarity between the preset standard content vector and the vector to be evaluated according to the preset standard content vector, the vector to be evaluated, and a similarity calculation, wherein the similarity calculation computes the similarity from the cosine value of the preset standard content vector and the vector to be evaluated and makes the calculated similarity a value greater than 0.

The similarity produced by the similarity calculation can be regarded as the degree of similarity between the reference answer data and the text data corresponding to the voice data to be evaluated, and the voice data to be evaluated is scored according to this similarity. In this embodiment, the similarity calculation computes the similarity from the cosine value between the preset standard content vector and the vector to be evaluated, and makes the calculated similarity a value greater than 0.

The similarity calculation determines the similarity from the cosine value of the two vectors and makes the calculated similarity a value greater than 0. This determines the similarity accurately and efficiently, and avoids the effect that homophones written with different characters have on similarity accuracy when speech evaluation relies on the existing approach of determining similarity from text keywords. It also solves the problem that the cosine value can be negative, which would make the calculated similarity negative and the speech evaluation result inaccurate.

In one feasible approach, the preset standard content vector and the vector to be evaluated are used as the input of the similarity calculation, and the similarity is computed by the similarity calculation.

Optionally, to improve the accuracy of the evaluation, the similarity is a value greater than 0 and less than or equal to 1. This solves the problem that a similarity indicated directly by the cosine value can be negative, which does not reflect the ability to express accurately, and it also makes the subsequent scoring according to the similarity simpler and more convenient.

Optionally, similarity calculation includes:

Here, x_i denotes the vector to be evaluated, x_j denotes the standard content vector corresponding to the vector to be evaluated, and score denotes the similarity. The similarity calculated in this way varies linearly with the cosine value: the larger the cosine value, the closer the converted similarity is to 1.0; the smaller the cosine value, the closer it is to 0.0. The similarity therefore ranges from 0.0 to 1.0, which is convenient for subsequently generating evaluation result data such as an evaluation score.

Of course, in other embodiments the similarity calculation includes score = e^(cos(x_i, x_j)), where x_i denotes the vector to be evaluated, x_j denotes the standard content vector corresponding to the vector to be evaluated, and score denotes the similarity.
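Both ways of turning a cosine value into a non-negative similarity can be sketched as follows. The exponential form is the one stated in the text. The linear form, 0.5 + 0.5 × cos, is an assumption: it is one mapping consistent with the description (linear in the cosine value, range 0.0 to 1.0), since the formula itself is not reproduced in this text. The function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def linear_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed linear mapping of the cosine from [-1, 1] onto [0, 1]."""
    return 0.5 + 0.5 * cosine(a, b)


def exp_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Exponential mapping from the text: score = e^(cos), always above 0."""
    return math.exp(cosine(a, b))


same = linear_score([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = linear_score([1.0, 0.0], [0.0, 1.0])  # unrelated direction
opposite = exp_score([1.0, 0.0], [-1.0, 0.0])      # opposite direction
```

Note that the assumed linear form reaches exactly 0 only when the two vectors point in opposite directions, while the exponential form stays strictly positive (between 1/e and e), so a scoring rule built on it would need a different scale.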

Step S203: generate and output evaluation result data of the voice data to be evaluated according to the similarity and a preset speech assessment rule.

The speech evaluating method can be applied in an expression training course. The voice input by the user is evaluated: for example, the text data corresponding to the voice data to be evaluated is converted into a vector to be evaluated, and the similarity between the vector to be evaluated and the preset standard content vector is assessed to obtain evaluation result data. The evaluation result data characterizes the user's ability to express, so that the user can clearly and easily understand his or her own expression, which motivates the user to keep learning and progressing.

When evaluating voice data to be evaluated, the speech evaluating method uses semantic understanding technology to convert text data that speech recognition has recognized inconsistently into numeric vectors, which reduces error. It uses the similarity calculation to compute the similarity from the cosine value between the vector to be evaluated and the preset standard content vector, and makes the calculated similarity a value greater than 0. This solves the low evaluation accuracy of existing similarity calculation methods when they are used for speech evaluation. It also solves the problem of negative values in the similarity calculation, so that the result of the speech evaluation is more accurate.

Embodiment three

According to an embodiment of the present invention, a computer storage medium is provided. The computer storage medium stores: instructions for generating a vector to be evaluated of voice data to be evaluated according to text data corresponding to the voice data to be evaluated; instructions for calculating the similarity between a preset standard content vector and the vector to be evaluated according to the preset standard content vector, the vector to be evaluated, and a similarity calculation model, where the similarity calculation model is used to calculate the similarity from the cosine value of the preset standard content vector and the vector to be evaluated, and to make the calculated similarity a value greater than 0; and instructions for generating and outputting evaluation result data of the voice data to be evaluated according to the similarity and a preset speech evaluation rule.

Optionally, the instructions for calculating the similarity between the preset standard content vector and the vector to be evaluated according to the preset standard content vector, the vector to be evaluated, and the similarity calculation model include: instructions for taking the preset standard content vector and the vector to be evaluated as the input of the similarity calculation model and calculating the similarity through the similarity calculation model.

Optionally, the similarity is a value greater than 0 and less than or equal to 1.

Optionally, the similarity calculation model includes:

where x_i denotes the vector to be evaluated, x_j denotes the standard content vector corresponding to the vector to be evaluated, and score denotes the similarity.

Optionally, the instructions for generating the vector to be evaluated of the voice data to be evaluated according to the text data corresponding to the voice data to be evaluated include: instructions for recognizing the voice data to be evaluated using a speech recognition model and generating the text data corresponding to the voice data to be evaluated; and instructions for vectorizing the text data through a text vector calculation model and generating the vector to be evaluated of the voice data to be evaluated.

Optionally, the instructions for recognizing the voice data to be evaluated using the speech recognition model and generating the text data corresponding to the voice data to be evaluated include: instructions for obtaining the voice data to be evaluated; instructions for transcoding the voice data to be evaluated and generating the transcoded voice data to be evaluated; and instructions for taking the transcoded voice data to be evaluated as the input of the speech recognition model and generating, through the speech recognition model, the text data corresponding to the voice data to be evaluated.

Optionally, the instructions for vectorizing the text data through the text vector calculation model and generating the vector to be evaluated of the voice data to be evaluated include: instructions for preprocessing the text data and generating result data according to the preprocessing result, where the result data include segment data indicating the multiple word segments in the text data; and instructions for taking the segment data of each word segment as the input of the text vector calculation model and generating, through the text vector calculation model, the vector to be evaluated of the voice data to be evaluated.
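The internal structure of the text vector calculation model is not described here. As a stand-in, the Python sketch below averages per-word vectors looked up in a small table. The function text_vector, the embeddings table, and the averaging approach are assumptions used only to show how word segments could be turned into one fixed-length vector.

```python
def text_vector(tokens, embeddings, dim):
    """Average the vectors of known word segments into one vector of length dim.

    `embeddings` is a hypothetical dict mapping a word segment to a list of
    `dim` floats; word segments missing from the table are skipped.
    """
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue  # unknown word segment: contributes nothing
        for k, value in enumerate(vec):
            total[k] += value
        count += 1
    if count == 0:
        return total  # no known word segment: all-zero vector
    return [t / count for t in total]
```

For example, with a toy table {"red": [1.0, 0.0], "apple": [0.0, 1.0]}, the word segments ["red", "apple"] average to [0.5, 0.5].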

Optionally, the preprocessing includes dirty-data removal, word segmentation, and stop-word removal. The instructions for preprocessing the text data and generating the result data according to the preprocessing result include: instructions for removing dirty data from the text data to obtain valid text data; instructions for performing word segmentation on the valid text data to obtain the segment data of the multiple word segments in the valid text data; and instructions for removing stop words from the segment data of the multiple word segments to obtain the result data.
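The three preprocessing stages can be sketched in Python as follows. What counts as dirty data, the stop-word list, and whitespace-based segmentation are assumptions made for this example; Chinese text in particular would need a dedicated word segmenter in place of the simple split used here.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = frozenset({"the", "a", "an", "is", "of"})


def remove_dirty_data(text):
    """Treat punctuation and repeated whitespace as dirty data (an assumption)."""
    cleaned = re.sub(r"[^\w\s]", " ", text).replace("_", " ")
    return re.sub(r"\s+", " ", cleaned).strip().lower()


def segment(valid_text):
    """Stand-in word segmentation: split on whitespace."""
    return valid_text.split()


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop word segments that appear in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


def preprocess(text):
    """Dirty-data removal, then word segmentation, then stop-word removal."""
    return remove_stop_words(segment(remove_dirty_data(text)))
```

The list returned by preprocess plays the role of the result data, that is, the word segments that would be passed on to the text vector calculation model.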

When the voice data to be evaluated are evaluated, the instructions stored in the computer storage medium can use semantic understanding technology to convert the possibly inconsistent text data produced by speech recognition into numeric vectors, which reduces error. They use the similarity calculation model to calculate the similarity from the cosine value between the vector to be evaluated and the preset standard content vector, and ensure that the calculated similarity is a value greater than 0. This solves the problem that existing similarity calculation methods give low evaluation accuracy when used for speech evaluation. It also solves the problem of negative values appearing in the similarity calculation, so that the result of the speech evaluation is more accurate.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software together with a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the essence of the above technical solutions, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, which includes any mechanism for storing or transmitting information in a form readable by a computer. For example, machine-readable media include read-only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash storage media, and electrical, optical, acoustic, or other forms of propagated signals (for example, carrier waves, infrared signals, and digital signals). The computer software product includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the embodiments of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Those skilled in the art will understand that the embodiments of the present invention may be provided as a method, an apparatus (device), or a computer program product. Accordingly, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.

The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, apparatuses (devices), and computer program products according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means. The instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Claims

1. A speech evaluation method, characterized by comprising:

generating a vector to be evaluated of voice data to be evaluated according to text data corresponding to the voice data to be evaluated;

calculating the similarity between a preset standard content vector and the vector to be evaluated according to the preset standard content vector, the vector to be evaluated, and a similarity calculation model, wherein the similarity calculation model is used to calculate the similarity from the cosine value of the preset standard content vector and the vector to be evaluated, and to make the calculated similarity a value greater than 0;

generating and outputting evaluation result data of the voice data to be evaluated according to the similarity and a preset speech evaluation rule.

2. The method according to claim 1, wherein calculating the similarity between the preset standard content vector and the vector to be evaluated according to the preset standard content vector, the vector to be evaluated, and the similarity calculation model comprises:

taking the preset standard content vector and the vector to be evaluated as the input of the similarity calculation model, and calculating the similarity through the similarity calculation model.

3. The method according to claim 1, wherein the similarity is a value greater than 0 and less than or equal to 1.

4. The method according to claim 1, wherein the similarity calculation model includes:

where x_i denotes the vector to be evaluated, x_j denotes the standard content vector corresponding to the vector to be evaluated, and score denotes the similarity.

5. The method according to any one of claims 1 to 4, wherein generating the vector to be evaluated of the voice data to be evaluated according to the text data corresponding to the voice data to be evaluated comprises:

recognizing the voice data to be evaluated using a speech recognition model, and generating the text data corresponding to the voice data to be evaluated;

vectorizing the text data through a text vector calculation model, and generating the vector to be evaluated of the voice data to be evaluated.

6. The method according to claim 5, wherein recognizing the voice data to be evaluated using the speech recognition model and generating the text data corresponding to the voice data to be evaluated comprises:

obtaining the voice data to be evaluated;

transcoding the voice data to be evaluated, and generating the transcoded voice data to be evaluated;

taking the transcoded voice data to be evaluated as the input of the speech recognition model, and generating, through the speech recognition model, the text data corresponding to the voice data to be evaluated.

7. The method according to claim 5, wherein vectorizing the text data through the text vector calculation model and generating the vector to be evaluated of the voice data to be evaluated comprises:

preprocessing the text data, and generating result data according to the preprocessing result, wherein the result data include segment data indicating the multiple word segments in the text data;

taking the segment data of each word segment as the input of the text vector calculation model, and generating, through the text vector calculation model, the vector to be evaluated of the voice data to be evaluated.

8. The method according to claim 7, wherein the preprocessing includes dirty-data removal, word segmentation, and stop-word removal;

preprocessing the text data and generating the result data according to the preprocessing result comprises:

removing dirty data from the text data to obtain valid text data;

performing word segmentation on the valid text data to obtain the segment data of the multiple word segments in the valid text data;

removing stop words from the segment data of the multiple word segments to obtain the result data.

9. A computer storage medium, characterized in that the computer storage medium stores: instructions for generating a vector to be evaluated of voice data to be evaluated according to text data corresponding to the voice data to be evaluated; instructions for calculating the similarity between a preset standard content vector and the vector to be evaluated according to the preset standard content vector, the vector to be evaluated, and a similarity calculation model, wherein the similarity calculation model is used to calculate the similarity from the cosine value of the preset standard content vector and the vector to be evaluated, and to make the calculated similarity a value greater than 0; and instructions for generating and outputting evaluation result data of the voice data to be evaluated according to the similarity and a preset speech evaluation rule.

10. The computer storage medium according to claim 9, wherein the instructions for calculating the similarity between the preset standard content vector and the vector to be evaluated according to the preset standard content vector, the vector to be evaluated, and the similarity calculation model comprise: instructions for taking the preset standard content vector and the vector to be evaluated as the input of the similarity calculation model and calculating the similarity through the similarity calculation model.