CN110136721A - A scoring generation method, apparatus, storage medium and electronic device - Google Patents
A scoring generation method, apparatus, storage medium and electronic device
- Publication number
- CN110136721A CN110136721A CN201910280448.6A CN201910280448A CN110136721A CN 110136721 A CN110136721 A CN 110136721A CN 201910280448 A CN201910280448 A CN 201910280448A CN 110136721 A CN110136721 A CN 110136721A
- Authority
- CN
- China
- Prior art keywords
- voice
- assessment
- text
- standard pronunciation
- curve
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
One or more embodiments of the present application disclose a scoring generation method, apparatus, storage medium and server. The method includes: collecting an input assessment speech set; obtaining the assessment text corresponding to the assessment speech set; comparing the assessment text with a sample text to obtain the difference text information between the assessment text and the sample text; and scoring the assessment speech set based on the difference text information to generate the score corresponding to the assessment speech set. With the one or more embodiments of the present application, the accuracy of speech assessment can be improved.
Description
Technical field
This application relates to the field of computer technology, and in particular to a scoring generation method, apparatus, storage medium and electronic device.
Background technique
With the development of the times and the increasingly obvious trend of global integration, more and more people wish to learn and master one or more foreign languages fluently in order to communicate more easily. In foreign language learning, spoken proficiency is especially important.
At present, users mostly practice speaking with computer-assisted language learning systems, which include a spoken-language scoring unit that can determine whether the user's pronunciation is accurate. However, such a scoring unit can only score the standard degree of the words the user has actually pronounced; once the user omits words and/or adds redundant words while practicing a long dialogue or a long text, the dialogue or text can no longer be scored accurately, which reduces the accuracy of speech assessment.
Summary of the invention
The embodiments of the present application provide a scoring generation method, apparatus, storage medium and electronic device that can improve the accuracy of speech scoring. The technical solution is as follows:
In a first aspect, an embodiment of the present application provides a scoring generation method, the method comprising:
collecting an input assessment speech set, and obtaining the assessment text corresponding to the assessment speech set;
comparing the assessment text with a sample text to obtain the difference text information between the assessment text and the sample text;
scoring the assessment speech set based on the difference text information to generate the score corresponding to the assessment speech set.
In a second aspect, an embodiment of the present application provides a scoring generation apparatus, the apparatus comprising:
a text obtaining module, configured to collect an input assessment speech set and obtain the assessment text corresponding to the assessment speech set;
an information obtaining module, configured to compare the assessment text with a sample text and obtain the difference text information between the assessment text and the sample text;
a score generation module, configured to score the assessment speech set based on the difference text information and generate the score corresponding to the assessment speech set.
In a third aspect, an embodiment of the present application provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor to execute the above method steps.
In a fourth aspect, an embodiment of the present application provides an electronic device, which may include a processor and a memory, wherein the memory stores a computer program adapted to be loaded by the processor to execute the above method steps.
The beneficial effects brought by the technical solutions provided in some embodiments of the present application include at least the following:
In one or more embodiments of the present application, a user terminal collects an input assessment speech set, obtains the assessment text corresponding to the assessment speech set, compares the assessment text with a sample text, obtains the difference text information between the assessment text and the sample text, and finally scores the assessment speech set based on the difference text information to generate the corresponding score. By comparing the obtained assessment text with the sample text to obtain the difference text information and then scoring according to that information, the speech assessment set input by the user can be scored accurately, thereby improving the accuracy of speech assessment.
Brief description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram of the scene architecture of scoring generation provided by one or more embodiments;
Fig. 2 is a schematic flowchart of the scoring generation method provided by one or more embodiments;
Fig. 3 is a schematic diagram of an interface of the scoring generation method provided by one or more embodiments;
Fig. 4a is a schematic diagram of the user verification interface of the scoring generation method provided by one or more embodiments;
Fig. 4b is a schematic diagram of the test prompt interface of the scoring generation method provided by one or more embodiments;
Fig. 4c is a schematic diagram of the recording start interface of the scoring generation method provided by one or more embodiments;
Fig. 4d is a schematic diagram of the speech collection interface of the scoring generation method provided by one or more embodiments;
Fig. 4e is a schematic diagram of the assessment submission interface of the scoring generation method provided by one or more embodiments;
Fig. 4f is a schematic diagram of the successful submission interface of the scoring generation method provided by one or more embodiments;
Fig. 5 is a schematic flowchart of the scoring generation method provided by one or more embodiments;
Fig. 6 is a schematic diagram of an interface of the scoring generation method provided by one or more embodiments;
Fig. 7 is a schematic diagram of an interface of the scoring generation method provided by one or more embodiments;
Fig. 8 is a schematic structural diagram of the scoring generation apparatus provided by one or more embodiments;
Fig. 9 is a schematic structural diagram of the set acquisition module in the scoring generation apparatus provided by one or more embodiments;
Fig. 10 is a schematic structural diagram of the electronic device provided by one or more embodiments.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
In the description of the present application, it should be understood that the terms "first", "second", etc. are used for descriptive purposes only and cannot be interpreted as indicating or implying relative importance. It should also be noted that, unless otherwise specified and limited, "comprising" and "having" and any variations thereof are intended to cover a non-exclusive inclusion: a process, method, system, product or device containing a series of steps or units is not limited to the listed steps or units, but may optionally also include steps or units not listed, or other steps or units inherent to the process, method, product or device. For those of ordinary skill in the art, the specific meanings of the above terms in the present application can be understood according to the specific situation. In addition, unless otherwise indicated, "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the preceding and following objects.
The present application is described in detail below with reference to specific embodiments.
Referring to Fig. 1, which is an architecture diagram of a scoring generation system provided by an embodiment of the present application. As shown in Fig. 1, the scoring generation system may include a user 110 and a scoring generation apparatus 120. The scoring generation apparatus 120 may be an electronic device, including but not limited to: a personal computer, a tablet computer, a handheld device, an in-vehicle device, a wearable device, a computing device, or another processing device connected to a wireless modem. User terminals may be called different names in different networks, for example: user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote station, remote terminal, mobile device, user terminal, terminal, wireless communication device, user agent or user apparatus, cellular phone, cordless phone, personal digital assistant (PDA), or a terminal device in a 5G network or a future evolved network. The scoring generation apparatus may also be a server with a scoring processing function.
For convenience, the scoring generation apparatus is described as a user terminal in the embodiments of the present application.
As shown in Fig. 1, the user 110 inputs an assessment speech instruction to the user terminal 120. After receiving the instruction, the user terminal 120 responds to it, loads the preset sample text, and displays the sample text on the display screen.
The user 110 then starts to input assessment speech according to the sample text on the display screen.
At this point, the user terminal 120 can collect the assessment speech input by the user 110 through a built-in or external audio collection device, which can be one or more microphones. When there are multiple microphones, they can be distributed at different positions to form a microphone array; the user terminal obtains the assessment speech collected by each microphone through the array and merges the speech collected on the multiple channels into a high-fidelity assessment speech set.
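The multi-channel merging step above can be pictured as follows. This is only a minimal sketch: the patent does not specify a fusion algorithm, so plain sample averaging stands in for real beamforming, and the channel data is invented for illustration.

```python
def merge_channels(channels):
    """Fuse sample-aligned microphone channels by averaging.

    A real high-fidelity pipeline would use beamforming; averaging
    only shows the shape of the merge step.
    """
    n = min(len(c) for c in channels)  # truncate to the shortest channel
    return [sum(c[i] for c in channels) / len(channels) for i in range(n)]


# Two hypothetical channels from a bottom and a top microphone
merged = merge_channels([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
# merged == [2.0, 3.0, 4.0]
```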
Optionally, when the audio collection device is external, it can transmit the collected assessment speech to the user terminal 120 in real time through a preset interface (such as a 3.5mm headphone jack, a USB interface, or Bluetooth). The user terminal 120 saves the assessment speech into the assessment speech set. The user terminal 120 may collect the user's assessment speech set several times and then select a final assessment speech set from the multiple sets according to a selection instruction of the user 110.
For example: user Xiao Ming wants to test his oral foreign-language level, so he opens the speech assessment software on his mobile phone and issues an assessment speech request by clicking the assessment button in the assessment interface. In response to the request, the phone displays the sample text and corresponding prompt information on the screen. The phone has 2 built-in microphones, located at the bottom and the top of the phone respectively; it collects Xiao Ming's assessment speech set through the 2 microphones, then filters and denoises the speech collected on the two microphone channels to obtain a high-fidelity assessment speech set, and saves it.
The user terminal 120 obtains the difference information between the assessment text and the sample text.
Specifically, the user terminal 120 takes the current assessment speech in the assessment speech set, obtains the assessment speech curve corresponding to the current assessment speech, then obtains the similarity set between the assessment speech curve and each standard pronunciation curve in a standard pronunciation curve set, and then obtains the maximum similarity in the similarity set and the target standard pronunciation curve indicated by that maximum. The target standard pronunciation corresponding to the target standard pronunciation curve is then determined as the standard pronunciation corresponding to the current assessment speech.
After processing the current assessment speech, the user terminal 120 obtains the next assessment speech, determines it as the current assessment speech, and again executes the steps of obtaining the assessment speech curve corresponding to the current assessment speech and obtaining the similarity set between that curve and each standard pronunciation curve in the standard pronunciation curve set.
If the user terminal 120 detects that there is no next assessment speech, it generates the standard pronunciation set containing the determined standard pronunciations: the standard pronunciation corresponding to each assessment speech is combined into the standard pronunciation set in the order of the assessment speeches, and the assessment text corresponding to the standard pronunciation set is determined based on the correspondence between standard pronunciation information and text information.
Here, the standard pronunciation refers to preset speech stored in the standard pronunciation information bank; the speech information can be the pitch, intensity, duration, timbre, etc. of the speech. The text information refers to the text corresponding to the speech information. It should be emphasized that the text information here corresponds to the standard pronunciation information; it can be understood that what is stored in the standard pronunciation bank is standard pronunciation information together with the text information corresponding to each piece of standard pronunciation information, usually complete, meaningful words, sentences, paragraphs, etc. For example: the text information corresponding to the speech "Today" is the written word "Today", the text information corresponding to the speech "good" is the written word "good", the text information corresponding to the speech "good day" is the written phrase "good day", and the text information corresponding to the speech "Today is a good day" is the written sentence "Today is a good day", and so on.
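The correspondence between standard pronunciation information and text information can be pictured as a simple lookup table. A minimal sketch with hypothetical keys; a real bank would index acoustic features rather than string identifiers:

```python
# Hypothetical standard pronunciation bank: pronunciation id -> text information
STANDARD_BANK = {
    "pron_today": "Today",
    "pron_good": "good",
    "pron_good_day": "good day",
}


def assessment_text(standard_ids):
    """Concatenate the text information of each matched standard
    pronunciation, in the order of the assessment speeches."""
    return " ".join(STANDARD_BANK[i] for i in standard_ids)


text = assessment_text(["pron_today", "pron_good"])
# text == "Today good"
```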
The user terminal 120 then compares the assessment text with the sample text to obtain the difference text information between the assessment text and the sample text.
The difference text information includes the amount of difference text, the difference text content, and the attributes of the difference text content (such as the part of speech and importance of the difference word, sentence or paragraph).
The difference text content may include the text in which the assessment text is inconsistent with the words, sentences or paragraphs of the sample text; this text may include certain words, sentences or paragraphs of the sample text, and may also include words, sentences or paragraphs added by the user when inputting the assessment speech set.
The user terminal 120 inputs the difference text information into a scoring model for scoring according to preset scoring rules and outputs the score corresponding to the assessment speech set; the user terminal 120 then displays a scoring report containing this score to the user 110.
In one or more embodiments, the user terminal collects the input assessment speech set, obtains the assessment text corresponding to the assessment speech set, compares the assessment text with the sample text, obtains the difference text information between the assessment text and the sample text, and scores the assessment speech set based on the difference text information to generate the corresponding score. By comparing the obtained assessment text with the sample text to obtain the difference text information and then scoring according to that information, the speech assessment set input by the user can be scored accurately, thereby improving the accuracy of speech assessment.
The scoring generation method provided by the embodiments of the present application is described in detail below with reference to Fig. 2 to Fig. 7. The scoring generation apparatus in the embodiments of the present application can be the user terminal shown in Fig. 1.
Referring to Fig. 2, a schematic flowchart of a scoring generation method is provided for the embodiments of the present application. The method can be implemented by a computer program and run on a scoring generation apparatus based on the von Neumann architecture. The computer program can be integrated into an application or run as an independent tool application. The scoring generation apparatus in the embodiments of the present application can be the user terminal described in Fig. 1 or the above embodiments.
Specifically, the scoring generation method includes:
S101: collect the input assessment speech set, and obtain the assessment text corresponding to the assessment speech set.
Here, the sample text is a reference text provided by the user terminal; it can be a sentence, paragraph, article, etc. composed of characters/words. When inputting assessment speech, the user inputs the corresponding assessment speech set according to the sample text provided by the user terminal.
For example, the user terminal displays the sample text "Today is a good day" on the screen; the user inputs the assessment speech set corresponding to "Today is a good day" according to the displayed sample text, and the user terminal collects the input assessment speech set in real time.
Specifically, the assessment speech set refers to the assessment speech input by the user according to the sample text provided by the user terminal. The set may include at least one assessment speech, and each assessment speech can be understood as an assessment word or phrase. For example, for the assessment speech set "Today is a good day", the pronunciations corresponding to "Today", "is", "a", "good" and "day" are the assessment speeches in the set. Each assessment speech corresponds to an assessment word or phrase, and the assessment text includes multiple assessment words or phrases.
Specifically, in response to the user's assessment speech instruction, the user terminal displays the sample text on the screen and prompts the user to input assessment speech through the microphone; the user completes the input of the assessment speech with reference to the sample text.
Optionally, the user's assessment speech instruction can be completed through an external device: for example, the user can click the assessment button on the display interface with a mouse connected to the user terminal, input the corresponding command through a keyboard or touchpad connected to the user terminal, issue the assessment instruction by voice, complete the operation through gesture control instructions captured by a camera, or select the assessment button by touching the user terminal screen. It should be noted that there are many ways to complete the operation, which are not specifically limited here.
Optionally, the user terminal collects the assessment speech set input by the user through one or more built-in or external microphones. When there are multiple microphones, their placement positions can be designed according to actual needs, for example placed at different angles, so as to collect a better assessment speech set. After the assessment time ends or the user triggers a submit instruction, the user terminal saves the collected assessment speech set.
Optionally, when collecting the user's assessment speech set through a microphone, the user terminal has an effective voice collection distance range, within which the voice collected from the user can be recognized.
In a possible embodiment, the user terminal monitors the assessment speech input by the user in real time and judges whether the user is within the effective voice collection distance range. When the user is outside the effective voice collection range, the user terminal displays prompt information instructing the user to adjust the relative distance to the user terminal.
For example, the effective voice collection range of the user terminal's microphone is 0-30cm, and the user inputs the assessment speech set at a position 35cm away from the microphone. The user terminal monitors in real time that the user is too far away and not within the effective voice collection range, so it displays the text prompt "Too far away, please adjust the distance to the microphone" on the display screen as shown in Fig. 3; the user can then move closer to the microphone so that a better assessment speech set can be collected.
Optionally, the user terminal can collect the user's assessment speech set several times and then select a final assessment speech set from the multiple sets according to the user's selection instruction.
In a feasible implementation, the display interface of the user terminal can refer to Fig. 4a to Fig. 4f. The verification interface containing the user image shown in Fig. 4a is a graphical interface including a face-image preview area and prompt information; it can be used to prompt the speech assessment taker to input a face image using the front camera of the mobile phone and place the face image in the preview area. When the user terminal detects that the face image currently in the preview area matches the preset face image of the assessment taker, the user's identity is verified successfully, which triggers the next step of collecting the assessment speech set.
Further, the display interface of the user terminal can also include the pre-test prompt interface shown in Fig. 4b, which contains the relevant information of this speech test and a test confirmation button, such as the test duration, the test procedure, and specific points for attention. When the current page of the user terminal detects a click on the confirmation button, it displays, as shown in Fig. 4c, a graphical interface containing the sample text, a test-start prompt, and a speech assessment start button; the start button is a control on the graphical interface used to trigger the formal start of the assessment, after which the user inputs the corresponding assessment speech set according to the sample text.
Further, the user inputs the assessment speech set through the user terminal. During the input, the user terminal can display page content such as Fig. 4d on the display interface; the page content includes the total time of the current assessment input, and showing the total time makes it convenient for the user to reasonably control the time schedule.
In a possible embodiment, the graphical interface displayed by the user terminal includes a completion button for the speech assessment. When the user terminal detects a trigger action on the completion button, it saves the collected assessment speech set for processing by the user terminal's processor. To prevent the user from triggering the completion instruction by mistake, a trigger action of higher complexity can be set; for example, if the user wants to end the assessment in advance before the assessment time is over, the submit instruction is generated successfully only if the user clicks the submit button three times in a row within 3 seconds. A submission confirmation process can also be set to avoid misoperation: referring to Fig. 4d, the user clicks the completion button to submit the speech assessment set, which triggers the user terminal to display the submission confirmation information of Fig. 4e on the display interface; when the user wants to submit the current speech assessment set, clicking confirm submits it, and the user terminal then displays the assessment waiting interface shown in Fig. 4f, which includes information about this speech assessment set and prompt information, such as the assessment duration.
Optionally, while the user terminal is collecting the speech assessment set input by the user, the quality of the collected speech may be affected by interference factors such as ambient noise and echo. In actual implementation, the speech collected by the microphone array can be preprocessed; the preprocessing includes endpoint detection, noise reduction, and beamforming. The preprocessed speech is post-filtered to eliminate residual noise, and the energy of the collected speech is then adjusted by an automatic gain algorithm. Finally, the user terminal saves the processed speech set, and the processor of the user terminal performs speech recognition on the saved assessment speech set to convert it into the corresponding assessment text.
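Of the preprocessing steps named above, endpoint detection is the simplest to sketch. A crude energy-threshold version, assuming normalized float samples and an invented frame size and threshold (real systems use far more robust voice-activity detection):

```python
def trim_silence(samples, frame=160, threshold=0.01):
    """Crude endpoint detection: drop leading and trailing frames whose
    mean absolute amplitude falls below the threshold."""
    frames = [samples[i:i + frame] for i in range(0, len(samples), frame)]

    def active(fr):
        return sum(abs(x) for x in fr) / len(fr) >= threshold

    start = next((i for i, fr in enumerate(frames) if active(fr)), len(frames))
    end = next((i for i in range(len(frames) - 1, -1, -1) if active(frames[i])), -1)
    out = []
    for fr in frames[start:end + 1]:
        out.extend(fr)
    return out


# Silence, 0.5-amplitude speech, silence -> only the middle frame survives
trimmed = trim_silence([0.0] * 160 + [0.5] * 160 + [0.0] * 160)
# trimmed == [0.5] * 160
```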
S102: compare the assessment text with the sample text to obtain the difference text information between the assessment text and the sample text.
Here, the difference text refers to the text in which the words, sentences or paragraphs of the assessment text are inconsistent with those of the sample text; this text may include certain words, sentences or paragraphs of the sample text, and may also include words, sentences or paragraphs added by the user when inputting the assessment speech set. The difference text information includes the amount of difference text, the difference text content, and the attributes of the difference text content (such as the part of speech and importance of the difference word, sentence or paragraph).
Specifically, the user terminal compares the converted assessment text with the sample text and records the information of the inconsistent difference text during the comparison, so that the user terminal can score the assessment speech set based on the difference text information and generate the corresponding score.
For example, the user terminal obtains the assessment text by performing speech recognition on the collected assessment speech set.
The assessment text includes, for example, the following information:
it was the of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness.
The sample text includes, for example, the following information:
it was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness.
Through comparison, the difference text is obtained as: the omitted word "best" and the redundant word "the".
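The word-level comparison can be sketched with standard sequence alignment. The patent does not name an alignment algorithm, so Python's `difflib` is used as an assumption; the spoken text below is adjusted to contain an extra "the" so that both the omission and the redundancy cases are visible.

```python
import difflib


def diff_texts(spoken, sample):
    """Word-level comparison of the assessment text against the sample text.

    Returns (missed, redundant): sample words that were not spoken, and
    spoken words that are not in the sample.
    """
    s_words, a_words = sample.split(), spoken.split()
    missed, redundant = [], []
    matcher = difflib.SequenceMatcher(None, s_words, a_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            missed.extend(s_words[i1:i2])      # omitted words
        if tag in ("insert", "replace"):
            redundant.extend(a_words[j1:j2])   # redundant words
    return missed, redundant


sample = "it was the best of times it was the worst of times"
spoken = "it was the of times it was the the worst of times"
missed, redundant = diff_texts(spoken, sample)
# missed == ["best"], redundant == ["the"]
```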
Optionally, the user terminal can add an attribute identifier to the difference text. The attribute identifier is used to record the attributes of the difference text, such as the position of the difference text in the sample text, its part of speech, and the type of the difference. The attribute identifier can be a group of binary random numbers generated according to a predetermined rule, or a long integer character constant string generated according to a predetermined rule; this is not specifically limited here. For the convenience of illustration, the following uses a generated group of binary codes for representation.
In one feasible implementation, each difference text has an attribute identifier, and each attribute identifier is represented by a group of binary codes, here 10 bits. The first two bits indicate omission or redundancy, specifically: 01 (omission), 10 (redundancy); the 3rd and 4th bits indicate the difference type, specifically: 01 (word), 10 (sentence), 11 (paragraph); and the 5th to 10th bits indicate the specific position parameter of the difference text.
For example, the above difference text information can be recorded as shown in Table 1.
Table 1
Difference text | Attribute identifier
best            | 0101000111
the             | 1001001100
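Reading the identifiers as 10-bit codes (2 bits for omission/redundancy, 2 bits for the difference type, and, as an assumption for illustration, 6 bits for the position parameter), the values in Table 1 can be reproduced as follows:

```python
DIFF_KIND = {"omission": "01", "redundancy": "10"}
DIFF_TYPE = {"word": "01", "sentence": "10", "paragraph": "11"}


def attribute_id(kind, dtype, position):
    """Build a 10-bit attribute identifier: 2 bits kind, 2 bits type,
    6 bits position (the 6-bit position width is an assumption)."""
    return DIFF_KIND[kind] + DIFF_TYPE[dtype] + format(position, "06b")


# An omitted word at position 7, a redundant word at position 12
omitted = attribute_id("omission", "word", 7)      # "0101000111"
redundant = attribute_id("redundancy", "word", 12)  # "1001001100"
```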
S103: score the assessment speech set based on the difference text information to generate the score corresponding to the assessment speech set.
Specifically, the user terminal can input the difference text information into a scoring model for scoring according to preset scoring rules and output the score corresponding to the assessment speech set. The user terminal generates a scoring report containing this assessment score and displays it on the display interface. The assessment report includes the omitted or redundant words in the user's assessment speech and voice attribute information, which includes but is not limited to: speech rate information, prosody information, and timbre. Further, the speech assessment report can also include the standard pronunciation and an evaluation of this speech.
Optionally, the rating model can be trained with a large number of test samples. For example, the rating model can be implemented based on at least one of a convolutional neural network (Convolutional Neural Network, CNN) model, a deep neural network (Deep Neural Network, DNN) model, a recurrent neural network (Recurrent Neural Networks, RNN) model, an embedding model, a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT) model and a logistic regression (Logistic Regression, LR) model. By training the rating model on labeled sample data, a trained rating model can be obtained.
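The embodiment leaves the choice of rating model open. Purely as an illustration of the simplest option named above, a logistic-regression-style rating over hand-made difference features could look like the following sketch; the features, weights and bias are hypothetical stand-ins, not trained parameters from this application.

```python
import math

def rating(features, weights, bias):
    """Logistic-regression-style rating: squash a weighted sum of
    difference features into a score in the open interval (0, 100)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 100.0 / (1.0 + math.exp(-z))

# Hypothetical features: [omission count, redundancy count, error count].
features = [2.0, 2.0, 1.0]
# Hypothetical parameters: every kind of difference pushes the score down.
weights = [-0.6, -0.4, -0.9]
bias = 4.0

print(round(rating(features, weights, bias), 1))  # 75.0
```

In a real system the weights and bias would come from training on the labeled sample data mentioned above rather than being set by hand.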
In one or more embodiments, the user terminal collects the input assessment voice set, obtains the assessment text corresponding to the assessment voice set, compares the assessment text with the sample text to obtain the difference text information between the assessment text and the sample text, performs scoring processing on the assessment voice set based on the difference text information, and generates the score corresponding to the assessment voice set. By comparing the acquired assessment text with the sample text to obtain the difference text information and then scoring according to the difference text information, the voice assessment set input by the user can be scored accurately, thereby improving the accuracy of speech assessment.
Refer to Fig. 5, which is a flow diagram of another embodiment of the scoring generation method proposed by this application. Specifically:
S201: collect the input assessment voice set, and obtain, in the standard voice information bank, the current assessment voice in the assessment voice set.
For the specific collection method, refer to step S101; it is not repeated here.
Specifically, voice refers to the sound of a language and is the carrier of the linguistic symbol system. The assessment voice collected by the user terminal is actually a signal wave. When starting to recognize the assessment voice set, the user terminal needs to preprocess the collected signal wave and divide it into frames, at which point the voice has become many small segments. The voice signal waveform is then transformed; a common method is to extract Mel-frequency cepstral coefficient (Mel Frequency Cepstral Coefficients, MFCC) feature information. According to the physiological characteristics of the human ear, each frame of the signal waveform becomes a multi-dimensional vector, which can simply be understood as containing the content information of that frame of voice, that is, the content information of the assessment voice. It should be noted that MFCC is not the only method for extracting feature information. After the above process, the collected assessment voice becomes a multi-dimensional vector matrix.
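The framing step described above can be sketched as follows; the frame and hop lengths are typical 25 ms / 10 ms values at a 16 kHz sampling rate, assumed here for the example rather than fixed by the embodiment. Each resulting frame would then be converted into an MFCC feature vector, yielding the multi-dimensional vector matrix described above.

```python
def frame_signal(samples, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (400 samples = 25 ms and
    160 samples = 10 ms at 16 kHz). Trailing samples that cannot fill a
    whole frame are dropped."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

# A 1-second signal of 16000 samples yields 98 overlapping frames:
signal = [0.0] * 16000
frames = frame_signal(signal)
print(len(frames), len(frames[0]))  # 98 400
```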
The assessment voice set is essentially a set composed of the pronunciations of multiple words, and the pronunciation of a word is composed of phonemes. Taking English as an example, a common phone set is composed of 39 phonemes, and a phoneme is usually divided into 3 states, where a state refers to a phonetic unit. After the signal wave is divided into frames, several frames correspond to one state, every 3 states constitute one phoneme, and several phonemes constitute one word.
The standard voice information bank is an information bank storing word pronunciations. The user terminal collects the assessment voice set, obtains, in the standard voice information bank, the standard voice corresponding to the first word pronunciation in the assessment voice set, and takes the acquired standard voice as the current assessment voice.
S202: obtain the assessment voice curve corresponding to the current assessment voice, and obtain the similarity set between the assessment voice curve and each standard voice curve in the standard voice curve set.
Specifically, the voice information includes the voice curve. Through the above standard voice information bank, an acoustic model and a language model can be connected. For English, for example, a sentence can be segmented into several connected words, and each word pronunciation corresponds to a phoneme sequence in the standard voice information bank. The phoneme sequence is obtained after the standard voice signal wave is divided into frames and transformed, and this voice signal wave is commonly referred to as the voice curve.
The user terminal collects the assessment voice set and obtains, in the standard voice information bank, the current assessment voice in the assessment voice set. Specifically, it obtains the assessment voice curve corresponding to the current assessment voice, then searches the standard voice curve set for standard voice curves having similarity with the assessment voice curve, and saves each similarity and its corresponding standard voice curve into the similarity set.
For example, after the above processing, the assessment voice set collected by the user terminal corresponds to 5 phoneme sequences, and each phoneme sequence corresponds to a segment of assessment voice curve. Suppose there are assessment voice curves A, B, C, D and E. The user terminal first obtains assessment voice curve A, and then searches the set storing the standard voice curves for standard voice curves similar to assessment voice curve A.
By searching, the user terminal finds 3 voice curves with similarity; the corresponding similarity relationships are shown in Table 2.
Table 2
Title | Similarity | State |
Standard voice curve 1 | 80 | Normal |
Standard voice curve 2 | 60 | Normal |
Standard voice curve 3 | 40 | Normal |
The state in the table indicates whether the voice curve with similarity can be acquired by the terminal.
S203: obtain the maximum similarity value in the similarity set, obtain the target standard voice curve indicated by the maximum similarity value, and determine the target standard voice corresponding to the target standard voice curve as the standard voice corresponding to the current assessment voice.
Specifically, the similarity set records the similarity values of multiple standard voice curves. For Table 2 above, there are 3 standard voice curves in the similarity set, with corresponding similarities 80, 60 and 40, and all states are normal, so the user terminal can acquire the standard voice curves in the table. The user terminal first traverses the similarity set and finds that the title corresponding to the maximum similarity value 80 is standard voice curve 1. It then finds standard voice curve 1 in the standard voice curve set and takes standard voice curve 1 as the target standard voice curve. At this point, the voice corresponding to the target standard voice curve is the standard voice corresponding to the current assessment voice.
Specifically, the current assessment voice often suffers from interference such as noise during collection; after the above steps are performed, the current assessment voice can be converted into the standard voice.
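Steps S202 and S203 can be sketched together as follows. Cosine similarity scaled to 0-100 stands in for whatever curve-similarity measure the terminal actually uses, and the curves are short hypothetical vectors.

```python
def similarity(a, b):
    """Cosine similarity between two curves, scaled to the 0-100 range."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return 100.0 * dot / (na * nb) if na and nb else 0.0

def best_standard_curve(assessment_curve, standard_curves):
    """Build the similarity set, then return the name of the standard voice
    curve with the maximum similarity, together with the whole set."""
    sims = {name: similarity(assessment_curve, curve)
            for name, curve in standard_curves.items()}
    return max(sims, key=sims.get), sims

curve_a = [0.1, 0.8, 0.3, 0.5]
standard_curves = {
    "standard voice curve 1": [0.1, 0.8, 0.3, 0.5],  # closest to curve A
    "standard voice curve 2": [0.5, 0.5, 0.5, 0.5],
    "standard voice curve 3": [0.9, 0.1, 0.1, 0.1],
}
name, sims = best_standard_curve(curve_a, standard_curves)
print(name)  # standard voice curve 1
```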
S204: obtain the next assessment voice after the current assessment voice, determine the next assessment voice as the current assessment voice, and perform the step of obtaining the assessment voice curve corresponding to the current assessment voice and obtaining the similarity set between the assessment voice curve and each standard voice curve in the standard voice curve set.
Optionally, after finding the standard voice curve of the current assessment voice, the user terminal obtains the next assessment voice. For the example above, after the user terminal finds the standard voice corresponding to assessment voice curve A, it takes the next assessment voice as the current assessment voice: specifically, it searches for the standard voice curve similar to assessment voice curve B of the current assessment voice, and then acquires the standard voice corresponding to that standard voice curve. For the specific search steps, refer to S202; they are not repeated here.
S205: when it is detected that there is no next assessment voice, generate the standard voice set containing the standard voices.
Specifically, when the user terminal detects that all assessment voices have been searched, that is, there is no next assessment voice, the user terminal saves all the acquired standard voices into the standard voice set.
For the example above, after the user terminal has searched assessment voice curves A, B, C, D and E, it saves the 5 corresponding standard voices acquired into the standard voice set.
S206: combine, based on the sequence of the assessment voices, the standard voices corresponding to the assessment voices into the standard voice set.
Optionally, the user terminal assigns a priority parameter to each assessment voice curve; the priority parameter can run from large to small, from low to high, and so on. Suppose there are assessment voices A, B, C, D and E, and the user terminal sets the priority levels of the assessment voices as A > B > C > D > E. The standard voice corresponding to each assessment voice is acquired through the above steps, and the correspondence is shown in Table 3.
Table 3
Assessment voice | Standard voice |
A | 001 |
B | 002 |
C | 004 |
D | 003 |
E | 005 |
Optionally, the user terminal combines the standard voices 001, 002, 004, 003 and 005 respectively corresponding to the assessment voices into the standard voice set according to the priority levels of the assessment voices A > B > C > D > E, as shown in Table 4.
Table 4
Standard voice 001 | Standard voice 002 | Standard voice 004 | Standard voice 003 | Standard voice 005 |
Table 4 shows the combined standard voice set.
S207: determine, based on the correspondence between standard voice information and text information, the assessment text corresponding to the standard voice set.
Specifically, text information refers to the written representation of a language, usually a sentence or a combination of sentences with complete, systematic meaning. Taking English as an example, the text information can be a word, a sentence or a paragraph; it can be the written form of the language, and in specific implementations it usually refers to some written-language information. It should be emphasized that the text information here corresponds to the standard voice information: it can be understood that what is stored in the standard voice bank is standard voice information together with the text information corresponding to each piece of standard voice information, usually words, sentences, paragraphs and the like with complete, systematic meaning. For example, the text information corresponding to the voice "he" is the written word "he", the text information corresponding to the voice "likes" is the written word "likes", the text information corresponding to the voice "middle school" is the written phrase "middle school", and so on.
Specifically, the standard voice refers to voice set in advance, and these voices are stored in the standard voice information bank. The voice information can be the pitch, intensity, duration, timbre and so on of the voice: pitch refers to the frequency of the sound wave, that is, the number of vibrations per second; intensity refers to the amplitude of the sound wave; duration refers to the length of time the sound wave vibrates; timbre refers to the characteristic and essence of the sound, also called sound quality.
Optionally, the user terminal determines the corresponding assessment text from the standard voices based on the correspondence between standard voice information and text information; see Table 5, which shows one correspondence between standard voice information and text information.
Table 5
Standard voice information | Text information | Text type |
Standard voice 001 | he | Subject |
Standard voice 002 | likes | Verb |
Standard voice 003 | riding | Verb |
Standard voice 004 | a | Article |
Standard voice 005 | bike | Noun |
From the correspondence in the table above, the assessment text corresponding to the standard voice set can be determined as: He likes riding a bike.
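The lookup of S207 amounts to a mapping from standard voice to text information, joined in the spoken order of the assessment voices. A minimal sketch (the ordering below is simply the one that yields the example sentence):

```python
# Table 5 as a lookup from standard voice to text information.
text_of = {
    "standard voice 001": "he",
    "standard voice 002": "likes",
    "standard voice 003": "riding",
    "standard voice 004": "a",
    "standard voice 005": "bike",
}

def to_assessment_text(ordered_voices):
    """Join the text of each standard voice into a capitalized sentence."""
    sentence = " ".join(text_of[v] for v in ordered_voices)
    return sentence[0].upper() + sentence[1:] + "."

order = ["standard voice 001", "standard voice 002", "standard voice 003",
         "standard voice 004", "standard voice 005"]
print(to_assessment_text(order))  # He likes riding a bike.
```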
S208: compare the assessment text with the sample text to obtain the difference text information between the assessment text and the sample text, the difference text information including the number of difference texts, the difference text content and the attributes of the difference text content.
Specifically, the sample text is text preset by the terminal. When the user terminal receives an assessment voice request, the sample text can be generated at random by the user terminal from a preset sample text bank; alternatively, as shown in Fig. 6, the user can select a sample text by touching the user terminal screen. The sample text corresponds to a sample voice curve.
For example, the sample text selected by the user is:
I have a good friend. She is a pretty girl. She lives in Jiujiang. She is a middle school student. She has big eyes, a small mouth, a small nose and a round face. She is tall and thin. She likes watching TV and playing the basketball. On the weekend, she always plays basketball with her friends in the afternoon and watches TV in the evening.
The assessment text is:
I have a good good friend. She is a pretty girl. She lives in Jiujiang. She is a middle student. She has big eyes, a small mouth, a nose and a round face. She is tall and thin. She likes watching TV and playing the basket. On the weekend, she always always plays basketball with her friends in the afternoon and watches TV in the evening.
By comparison, the user terminal generates the difference text information; the display form of the difference text information can be the representation shown in Table 6.
Table 6
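The word-level comparison of S208 can be sketched with Python's standard difflib: words present only in the sample text are omissions, words present only in the assessment text are redundancies, and spans that differ on both sides are errors. The snippet runs on shortened versions of the two texts above.

```python
import difflib

sample = ("I have a good friend . She is a middle school student . "
          "She has a small nose .").split()
assessment = ("I have a good good friend . She is a middle student . "
              "She has a nose .").split()

differences = []
matcher = difflib.SequenceMatcher(a=sample, b=assessment)
for op, a0, a1, b0, b1 in matcher.get_opcodes():
    if op == "delete":      # in the sample only -> omission
        differences += [("omission", w) for w in sample[a0:a1]]
    elif op == "insert":    # in the assessment only -> redundancy
        differences += [("redundancy", w) for w in assessment[b0:b1]]
    elif op == "replace":   # differing spans on both sides -> error
        differences += [("error", w) for w in assessment[b0:b1]]

for kind, word in differences:
    print(kind, word)
```

On these shortened texts the sketch reports the redundant "good" and the omitted "school" and "small", which is the kind of record Table 6 is meant to hold.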
S209: perform scoring processing on the assessment voice set based on the difference text information, and generate the score corresponding to the assessment voice set.
Specifically, the user terminal can input the difference text information into the rating model for scoring processing according to the preset scoring rule and output the score corresponding to the assessment voice set. The terminal generates a score report containing this assessment score and displays it on the display interface. The assessment report includes the omitted, erroneous and redundant words in the user's assessment speech and voice attribute information; the voice attribute information includes, but is not limited to, speech-rate information, prosody information and timbre. Further, the voice assessment report can also include the pronunciation of the standard voice.
Optionally, the preset scoring rule of the user terminal can be: set the total score to 100 points, and set a deduction base, difference-type deduction coefficients and part-of-speech deduction coefficients for the difference text.
For example: the deduction base is set to 1 point; there are 3 difference deduction types, with the coefficient of the redundancy type set to 2.0, the coefficient of the omission type set to 3.0, and the coefficient of the error type set to 4.0; the part-of-speech deduction coefficients for the difference text can be set as: noun 1.0, verb 2.0, adjective 2.0, interjection 1.5, adverb 2.0, and so on. The final score can then be calculated from the difference text information:

Final score = 100 - 1*2.0*2.0 - 1*1.0*3.0 - 1*2.0*3.0 - 1*1.0*4.0 - 1*2.0*2.0 = 79
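The deduction rule of the example can be written out directly: each difference deducts base × part-of-speech coefficient × difference-type coefficient. The part of speech assigned to each word below follows the factors of the worked formula (for example, "good" as a redundant adjective); that assignment is a reading of the example rather than something the embodiment states explicitly.

```python
BASE = 1.0
TYPE_COEF = {"redundancy": 2.0, "omission": 3.0, "error": 4.0}
POS_COEF = {"noun": 1.0, "verb": 2.0, "adjective": 2.0,
            "interjection": 1.5, "adverb": 2.0}

def final_score(differences):
    """Score = 100 minus one deduction per difference."""
    deduction = sum(BASE * POS_COEF[pos] * TYPE_COEF[kind]
                    for kind, pos, _word in differences)
    return 100.0 - deduction

# The five differences of the worked example above.
diffs = [
    ("redundancy", "adjective", "good"),
    ("omission", "noun", "school"),
    ("omission", "adjective", "small"),
    ("error", "noun", "basket"),
    ("redundancy", "adverb", "always"),
]
print(final_score(diffs))  # 79.0
```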
Optionally, the user terminal can obtain a corresponding speech test evaluation according to this voice assessment score. The user terminal generates a score report containing this assessment score and evaluation and displays it on the display interface, as shown in Fig. 7. The assessment report can include the omitted or redundant words in the user's assessment speech and voice attribute information; the voice attribute information includes, but is not limited to, speech-rate information, prosody information and timbre.
In one or more embodiments, the user terminal collects the input assessment voice set, obtains, in the standard voice information bank, the current assessment voice in the assessment voice set, obtains the assessment voice curve corresponding to the current assessment voice, and compares the assessment voice curve with each standard voice curve in the standard voice curve set to obtain the similarity set. It then obtains the target standard voice curve indicated by the maximum similarity value in the similarity set, determines the target standard voice corresponding to the target standard voice curve as the standard voice corresponding to the current assessment voice, and generates the standard voices corresponding to the other assessment voices in the assessment voice set in the same way. The standard voices corresponding to the assessment voices are then converted into the assessment text, and the assessment voice set is scored by comparing the assessment text with the sample text. By comparing the acquired assessment text with the sample text to obtain the difference text information and then scoring according to the difference text information, the voice assessment set input by the user can be scored accurately, thereby improving the accuracy of speech assessment.
The following are apparatus embodiments of this application, which can be used to perform the method embodiments of this application. For details not disclosed in the apparatus embodiments of this application, please refer to the method embodiments of this application.
Refer to Fig. 8, which shows a structural schematic diagram of the scoring generation apparatus provided by an exemplary embodiment of this application. The scoring generation apparatus can be implemented as all or part of a terminal by software, hardware or a combination of both. The apparatus 1 includes a text acquisition module 11, an information acquisition module 12 and a score generation module 13.
The text acquisition module 11 is used to collect the input assessment voice set and obtain the assessment text corresponding to the assessment voice set;
the information acquisition module 12 is used to compare the assessment text with the sample text to obtain the difference text information between the assessment text and the sample text;
the score generation module 13 is used to perform scoring processing on the assessment voice set based on the difference text information and generate the score corresponding to the assessment voice set.
Optionally, as shown in Fig. 9, the text acquisition module 11 can specifically include:
a voice set acquisition unit 110 for obtaining the standard voice set corresponding to the assessment voice set; and
an assessment text determination unit 111 for determining, based on the correspondence between standard voice information and text information, the assessment text corresponding to the standard voice set.
Optionally, the voice set acquisition unit 110 is specifically used to:
obtain, in the standard voice information bank, the standard voice corresponding to each assessment voice in the assessment voice set; and
combine, based on the sequence of the assessment voices, the standard voices corresponding to the assessment voices into the standard voice set.
Optionally, the voice set acquisition unit 110 is specifically used to:
obtain, in the standard voice information bank, the current assessment voice in the assessment voice set;
obtain the assessment voice curve corresponding to the current assessment voice, and obtain the similarity set between the assessment voice curve and each standard voice curve in the standard voice curve set;
determine, based on the similarity set, the standard voice corresponding to the current assessment voice;
obtain the next assessment voice after the current assessment voice, determine the next assessment voice as the current assessment voice, and perform the step of obtaining the assessment voice curve corresponding to the current assessment voice and obtaining the similarity set between the assessment voice curve and each standard voice curve in the standard voice curve set; and
when it is detected that there is no next assessment voice, generate the standard voice set containing the standard voices.
Optionally, the voice set acquisition unit 110 is specifically used to:
obtain the maximum similarity value in the similarity set; and
obtain the target standard voice curve indicated by the maximum similarity value, and determine the target standard voice corresponding to the target standard voice curve as the standard voice corresponding to the current assessment voice.
Optionally, in the apparatus 1, the difference text information includes the number of difference texts, the difference text content and the attributes of the difference text content.
It should be noted that, when the scoring generation apparatus provided by the above embodiments performs the scoring generation method, the division into the above functional modules is only an example; in practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the scoring generation apparatus provided by the above embodiments and the scoring generation method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
The serial numbers of the above embodiments of this application are for description only and do not represent the merits of the embodiments.
In one or more embodiments, the user terminal collects the input assessment voice set, obtains, in the standard voice information bank, the current assessment voice in the assessment voice set, obtains the assessment voice curve corresponding to the current assessment voice, and compares the assessment voice curve with each standard voice curve in the standard voice curve set to obtain the similarity set. It then obtains the target standard voice curve indicated by the maximum similarity value in the similarity set, determines the target standard voice corresponding to the target standard voice curve as the standard voice corresponding to the current assessment voice, and generates the standard voices corresponding to the other assessment voices in the assessment voice set in the same way. The standard voices corresponding to the assessment voices are then converted into the assessment text, and the assessment voice set is scored by comparing the assessment text with the sample text. By comparing the acquired assessment text with the sample text to obtain the difference text information and then scoring according to the difference text information, the voice assessment set input by the user can be scored accurately, thereby improving the accuracy of speech assessment.
The embodiments of this application also provide a computer storage medium. The computer storage medium can store multiple instructions, and the instructions are suitable for being loaded by a processor to perform the method steps of the embodiments shown in Figs. 1 to 7 above; for the specific execution process, see the description of the embodiments shown in Figs. 1 to 7, which is not repeated here.
This application also provides a computer program product. The computer program product stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the scoring generation method described in each of the above embodiments.
Refer to Fig. 10, which is a structural schematic diagram of an electronic device provided by an embodiment of this application. As shown in Fig. 10, the server 1000 can include: at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005 and at least one communication bus 1002.
The communication bus 1002 is used to realize connection and communication between these components.
The user interface 1003 can include a display screen (Display) and a camera (Camera); optionally, the user interface 1003 can also include a standard wired interface and a wireless interface.
The network interface 1004 can optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface).
The processor 1001 can include one or more processing cores. The processor 1001 connects the various parts of the entire server 1000 using various interfaces and lines, and executes the various functions of the server 1000 and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 1005 and calling the data stored in the memory 1005. Optionally, the processor 1001 can be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA) and programmable logic array (Programmable Logic Array, PLA). The processor 1001 can integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem and the like. The CPU mainly handles the operating system, the user interface, application programs and so on; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; the modem is used to handle wireless communication. It can be understood that the above modem may also not be integrated into the processor 1001 and may instead be implemented separately by a single chip.
The memory 1005 can include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory 1005 includes a non-transitory computer-readable storage medium. The memory 1005 can be used to store instructions, programs, code, code sets or instruction sets. The memory 1005 can include a program storage area and a data storage area, where the program storage area can store instructions for implementing the operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing each of the above method embodiments, and so on; the data storage area can store the data involved in each of the above method embodiments. Optionally, the memory 1005 can also be at least one storage device located remotely from the aforementioned processor 1001. As shown in Fig. 10, as a computer storage medium, the memory 1005 can include an operating system, a network communication module, a user interface module and a scoring generation application program.
In the server 1000 shown in Fig. 10, the user interface 1003 is mainly used to provide an input interface for the user and acquire the data input by the user, and the processor 1001 can be used to call the scoring generation application program stored in the memory 1005 and specifically perform the following operations:
collect the input assessment voice set, and obtain the assessment text corresponding to the assessment voice set;
compare the assessment text with the sample text to obtain the difference text information between the assessment text and the sample text; and
perform scoring processing on the assessment voice set based on the difference text information to generate the score corresponding to the assessment voice set.
In one embodiment, when performing the obtaining of the assessment text corresponding to the assessment voice set, the processor 1001 specifically performs the following operations:
obtain the standard voice set corresponding to the assessment voice set; and
determine, based on the correspondence between standard voice information and text information, the assessment text corresponding to the standard voice set.
In one embodiment, when performing the obtaining of the standard voice set corresponding to the assessment voice set, the processor 1001 specifically performs the following operations:
obtain, in the standard voice information bank, the standard voice corresponding to each assessment voice in the assessment voice set; and
combine, based on the sequence of the assessment voices, the standard voices corresponding to the assessment voices into the standard voice set.
In one embodiment, when performing the obtaining, in the standard voice information bank, of the standard voice corresponding to each assessment voice in the assessment voice set, the processor 1001 specifically performs the following operations:
obtain, in the standard voice information bank, the current assessment voice in the assessment voice set;
obtain the assessment voice curve corresponding to the current assessment voice, and obtain the similarity set between the assessment voice curve and each standard voice curve in the standard voice curve set;
determine, based on the similarity set, the standard voice corresponding to the current assessment voice;
obtain the next assessment voice after the current assessment voice, determine the next assessment voice as the current assessment voice, and perform the step of obtaining the assessment voice curve corresponding to the current assessment voice and obtaining the similarity set between the assessment voice curve and each standard voice curve in the standard voice curve set; and
when it is detected that there is no next assessment voice, generate the standard voice set containing the standard voices.
In one embodiment, when performing the determining, based on the similarity set, of the standard voice corresponding to the current assessment voice, the processor 1001 specifically performs the following operations:
obtain the maximum similarity value in the similarity set; and
obtain the target standard voice curve indicated by the maximum similarity value, and determine the target standard voice corresponding to the target standard voice curve as the standard voice corresponding to the current assessment voice.
In one embodiment, the difference text information includes the number of text differences, the differing text content, and the attributes of the differing text content.
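The three components of the difference text information (the number of differences, the differing content, and each difference's attribute) can be illustrated with Python's standard `difflib`. Treating difflib's opcode tags (`"replace"`, `"delete"`, `"insert"`) as the attribute of a difference is an assumption for illustration only; the publication does not define the attribute values.

```python
import difflib

def difference_text_info(assessment_text, sample_text):
    """Compare the assessment text with the sample text and return the
    difference text information: difference count, differing content,
    and an attribute (here, the difflib opcode tag) for each difference."""
    matcher = difflib.SequenceMatcher(None, sample_text, assessment_text)
    diffs = [
        {"attribute": tag,                    # assumed attribute: difflib opcode
         "sample": sample_text[i1:i2],        # differing content in the sample text
         "assessment": assessment_text[j1:j2]}  # differing content in the assessment text
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"                     # keep only the spans that differ
    ]
    return {"count": len(diffs), "differences": diffs}
```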
In one or more embodiments, the user terminal collects an input assessment speech set, acquires the current assessment speech in the assessment speech set from the standard speech information library, and acquires the assessment speech curve corresponding to the current assessment speech. The assessment speech curve is then compared with each standard speech curve in the standard speech curve set to obtain a similarity set. The target standard speech curve indicated by the maximum similarity value in the similarity set is acquired, and the target standard speech corresponding to the target standard speech curve is determined as the standard speech corresponding to the current assessment speech. The standard speeches corresponding to the other assessment speeches in the assessment speech set are generated in the same way. The standard speeches corresponding to the assessment speeches are converted into text to obtain the assessment text, and the assessment speech set is scored by comparing the assessment text with the sample text.
By comparing the acquired assessment text with the sample text to obtain difference text information and then scoring according to the difference text information, the assessment speech set input by the user can be scored accurately, thereby improving the accuracy of speech assessment.
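As a minimal sketch of the scoring step, the score can be derived from the overall similarity between the assessment text and the sample text. The full-score baseline of 100 and the use of difflib's `ratio` are illustrative assumptions, since the publication does not fix a particular scoring formula.

```python
import difflib

def generate_score(assessment_text, sample_text, full_score=100):
    """Score the assessment text against the sample text:
    fewer textual differences yield a score closer to full_score."""
    ratio = difflib.SequenceMatcher(None, sample_text, assessment_text).ratio()
    return round(full_score * ratio, 1)
```

For example, an assessment text identical to the sample text scores the full 100, while each differing span lowers the match ratio and hence the generated score.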
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.
The above disclosure is only a preferred embodiment of the application and certainly cannot limit the scope of the claims of the application; equivalent variations made according to the claims of the application therefore still fall within the scope covered by the application.
Claims (14)
1. A scoring generation method, characterized in that the method comprises:
collecting an input assessment speech set, and acquiring an assessment text corresponding to the assessment speech set;
comparing the assessment text with a sample text to obtain difference text information between the assessment text and the sample text; and
scoring the assessment speech set based on the difference text information to generate a score corresponding to the assessment speech set.
2. The method according to claim 1, characterized in that the acquiring of the assessment text corresponding to the assessment speech set comprises:
acquiring a standard speech set corresponding to the assessment speech set; and
determining, based on a correspondence between standard speech information and text information, the assessment text corresponding to the standard speech set.
3. The method according to claim 2, characterized in that the acquiring of the standard speech set corresponding to the assessment speech set comprises:
acquiring, from a standard speech information library, the standard speech corresponding to each assessment speech in the assessment speech set; and
combining, based on the order of the assessment speeches, the standard speeches corresponding to the assessment speeches into the standard speech set.
4. The method according to claim 3, characterized in that the acquiring, from the standard speech information library, of the standard speech corresponding to each assessment speech in the assessment speech set comprises:
acquiring the current assessment speech in the assessment speech set from the standard speech information library;
acquiring the assessment speech curve corresponding to the current assessment speech, and obtaining a similarity set between the assessment speech curve and each standard speech curve in a standard speech curve set;
determining, based on the similarity set, the standard speech corresponding to the current assessment speech;
acquiring the next assessment speech after the current assessment speech, determining the next assessment speech as the current assessment speech, and repeating the steps of acquiring the assessment speech curve corresponding to the current assessment speech and obtaining the similarity set between the assessment speech curve and each standard speech curve in the standard speech curve set; and
when it is detected that no next assessment speech exists, generating the standard speech set containing the determined standard speeches.
5. The method according to claim 4, characterized in that the determining, based on the similarity set, of the standard speech corresponding to the current assessment speech comprises:
acquiring the maximum similarity value in the similarity set; and
acquiring the target standard speech curve indicated by the maximum similarity value, and determining the target standard speech corresponding to the target standard speech curve as the standard speech corresponding to the current assessment speech.
6. The method according to claim 1, characterized in that the difference text information comprises the number of text differences, the differing text content, and the attributes of the differing text content.
7. A scoring generation apparatus, characterized in that the apparatus comprises:
a text acquisition module, configured to collect an input assessment speech set and acquire an assessment text corresponding to the assessment speech set;
an information acquisition module, configured to compare the assessment text with a sample text to obtain difference text information between the assessment text and the sample text; and
a score generation module, configured to score the assessment speech set based on the difference text information to generate a score corresponding to the assessment speech set.
8. The apparatus according to claim 7, characterized in that the text acquisition module comprises:
a speech set acquisition unit, configured to acquire a standard speech set corresponding to the assessment speech set; and
an assessment text determination unit, configured to determine, based on a correspondence between standard speech information and text information, the assessment text corresponding to the standard speech set.
9. The apparatus according to claim 8, characterized in that the speech set acquisition unit is specifically configured to:
acquire, from a standard speech information library, the standard speech corresponding to each assessment speech in the assessment speech set; and
combine, based on the order of the assessment speeches, the standard speeches corresponding to the assessment speeches into the standard speech set.
10. The apparatus according to claim 9, characterized in that the speech set acquisition unit is specifically configured to:
acquire the current assessment speech in the assessment speech set from the standard speech information library;
acquire the assessment speech curve corresponding to the current assessment speech, and obtain a similarity set between the assessment speech curve and each standard speech curve in a standard speech curve set;
determine, based on the similarity set, the standard speech corresponding to the current assessment speech;
acquire the next assessment speech after the current assessment speech, determine the next assessment speech as the current assessment speech, and repeat the steps of acquiring the assessment speech curve corresponding to the current assessment speech and obtaining the similarity set between the assessment speech curve and each standard speech curve in the standard speech curve set; and
when it is detected that no next assessment speech exists, generate the standard speech set containing the determined standard speeches.
11. The apparatus according to claim 10, characterized in that the speech set acquisition unit is specifically configured to:
acquire the maximum similarity value in the similarity set; and
acquire the target standard speech curve indicated by the maximum similarity value, and determine the target standard speech corresponding to the target standard speech curve as the standard speech corresponding to the current assessment speech.
12. The apparatus according to claim 7, characterized in that the difference text information comprises the number of text differences, the differing text content, and the attributes of the differing text content.
13. A computer storage medium, characterized in that the computer storage medium stores a plurality of instructions, the instructions being adapted to be loaded by a processor to execute the method steps according to any one of claims 1 to 6.
14. An electronic device, characterized by comprising: a processor and a memory, wherein the memory stores a computer program, the computer program being adapted to be loaded by the processor to execute the method steps according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910280448.6A CN110136721A (en) | 2019-04-09 | 2019-04-09 | A kind of scoring generation method, device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110136721A true CN110136721A (en) | 2019-08-16 |
Family
ID=67569299
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910280448.6A Pending CN110136721A (en) | 2019-04-09 | 2019-04-09 | A kind of scoring generation method, device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110136721A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110322895A (en) * | 2018-03-27 | 2019-10-11 | 亿度慧达教育科技(北京)有限公司 | Speech evaluating method and computer storage medium |
CN110600052A (en) * | 2019-08-19 | 2019-12-20 | 天闻数媒科技(北京)有限公司 | Voice evaluation method and device |
CN112597065A (en) * | 2021-03-03 | 2021-04-02 | 浙江口碑网络技术有限公司 | Page testing method and device |
CN112597066A (en) * | 2021-03-03 | 2021-04-02 | 浙江口碑网络技术有限公司 | Page testing method and device |
CN112686020A (en) * | 2020-12-29 | 2021-04-20 | 科大讯飞股份有限公司 | Composition scoring method and device, electronic equipment and storage medium |
CN112802456A (en) * | 2021-04-14 | 2021-05-14 | 北京世纪好未来教育科技有限公司 | Voice evaluation scoring method and device, electronic equipment and storage medium |
CN113053403A (en) * | 2021-03-19 | 2021-06-29 | 北京乐学帮网络技术有限公司 | Voice evaluation method and device |
CN113205729A (en) * | 2021-04-12 | 2021-08-03 | 华侨大学 | Foreign student-oriented speech evaluation method, device and system |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102568475A (en) * | 2011-12-31 | 2012-07-11 | 安徽科大讯飞信息科技股份有限公司 | System and method for assessing proficiency in Putonghua |
CN103730032A (en) * | 2012-10-12 | 2014-04-16 | 李志刚 | Method and system for controlling multimedia data |
CN105374356A (en) * | 2014-08-29 | 2016-03-02 | 株式会社理光 | Speech recognition method, speech assessment method, speech recognition system, and speech assessment system |
CN105741831A (en) * | 2016-01-27 | 2016-07-06 | 广东外语外贸大学 | Spoken language evaluation method based on grammatical analysis and spoken language evaluation system |
CN105845134A (en) * | 2016-06-14 | 2016-08-10 | 科大讯飞股份有限公司 | Spoken language evaluation method through freely read topics and spoken language evaluation system thereof |
CN106303187A (en) * | 2015-05-11 | 2017-01-04 | 小米科技有限责任公司 | The acquisition method of voice messaging, device and terminal |
CN106847260A (en) * | 2016-12-20 | 2017-06-13 | 山东山大鸥玛软件股份有限公司 | A kind of Oral English Practice automatic scoring method of feature based fusion |
CN107135247A (en) * | 2017-02-16 | 2017-09-05 | 江苏南大电子信息技术股份有限公司 | A kind of service system and method for the intelligent coordinated work of person to person's work |
CN107579883A (en) * | 2017-08-25 | 2018-01-12 | 上海肖克利信息科技股份有限公司 | Distributed pickup intelligent home furnishing control method |
CN107578778A (en) * | 2017-08-16 | 2018-01-12 | 南京高讯信息科技有限公司 | A kind of method of spoken scoring |
CN109272992A (en) * | 2018-11-27 | 2019-01-25 | 北京粉笔未来科技有限公司 | A kind of spoken language assessment method, device and a kind of device for generating spoken appraisal model |
CN109448730A (en) * | 2018-11-27 | 2019-03-08 | 广州广电运通金融电子股份有限公司 | A kind of automatic speech quality detecting method, system, device and storage medium |
CN109461459A (en) * | 2018-12-07 | 2019-03-12 | 平安科技(深圳)有限公司 | Speech assessment method, apparatus, computer equipment and storage medium |
CN109493852A (en) * | 2018-12-11 | 2019-03-19 | 北京搜狗科技发展有限公司 | A kind of evaluating method and device of speech recognition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110136721A (en) | A kind of scoring generation method, device, storage medium and electronic equipment | |
US11830499B2 (en) | Providing answers to voice queries using user feedback | |
US10339166B1 (en) | Systems and methods for providing natural responses to commands | |
CN101309327B (en) | Sound chat system, information processing device, speech recognition and key words detection | |
US10446141B2 (en) | Automatic speech recognition based on user feedback | |
US8478592B2 (en) | Enhancing media playback with speech recognition | |
CN109686383B (en) | Voice analysis method, device and storage medium | |
US20050144013A1 (en) | Conversation control apparatus, conversation control method, and programs therefor | |
CN110570873B (en) | Voiceprint wake-up method and device, computer equipment and storage medium | |
CN111833853B (en) | Voice processing method and device, electronic equipment and computer readable storage medium | |
CN106782607A (en) | Determine hot word grade of fit | |
JP7230806B2 (en) | Information processing device and information processing method | |
CN109036395A (en) | Personalized speaker control method, system, intelligent sound box and storage medium | |
CN110782875B (en) | Voice rhythm processing method and device based on artificial intelligence | |
US20210217403A1 (en) | Speech synthesizer for evaluating quality of synthesized speech using artificial intelligence and method of operating the same | |
CN111261195A (en) | Audio testing method and device, storage medium and electronic equipment | |
CN110675866A (en) | Method, apparatus and computer-readable recording medium for improving at least one semantic unit set | |
US10417345B1 (en) | Providing customer service agents with customer-personalized result of spoken language intent | |
CN113129867A (en) | Training method of voice recognition model, voice recognition method, device and equipment | |
CN109119073A (en) | Audio recognition method, system, speaker and storage medium based on multi-source identification | |
CN110853669B (en) | Audio identification method, device and equipment | |
CN110781329A (en) | Image searching method and device, terminal equipment and storage medium | |
CN110809796B (en) | Speech recognition system and method with decoupled wake phrases | |
JP6347939B2 (en) | Utterance key word extraction device, key word extraction system using the device, method and program thereof | |
CN112116181A (en) | Classroom quality model training method, classroom quality evaluation method and classroom quality evaluation device |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190816 |