CN110148413A - Speech evaluating method and relevant apparatus - Google Patents
- Publication number
- CN110148413A (application CN201910422699.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- voice
- translation
- unit
- confidence level
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
Abstract
The embodiments of the present application disclose a speech evaluation method and a related apparatus. The method includes: obtaining a first voice serving as the evaluation reference under a first evaluation mode, and obtaining a second voice to be evaluated; processing the first voice to obtain a first text, and processing the second voice to obtain a second text; obtaining a first text detection strategy corresponding to the first evaluation mode; and processing the first text and the second text according to the first text detection strategy to obtain a detection result for the second voice. The application helps improve the flexibility and comprehensiveness with which a device evaluates speech.
Description
Technical field
The present application relates to the technical field of electronic devices, and in particular to a speech evaluation method and a related apparatus.
Background technique
Simultaneous interpretation is a demanding form of language translation performed under strict time constraints. It requires the interpreter, while listening to and parsing the source-language speech, to draw on existing subject knowledge to rapidly predict, understand, memorize, and convert the source-language information, and to organize and express it in the target language; for this reason simultaneous interpretation is also called synchronous interpretation. Training a simultaneous interpretation student is a complex process that mainly involves building command of the source and target languages, broad general knowledge, and the skills specific to simultaneous interpretation. Of these, skill training is the core of current student development.

At present, basic skill training relies on retelling exercises to build short-term memory and on counting-while-listening exercises to build the ability to speak while listening under interference. Once the student has a sufficient foundation, training in simultaneous interpretation proper begins. For the exercises at each stage, timely and effective evaluation of and feedback on the results are crucial to rapid improvement of the student's ability.
Summary of the invention
The embodiments of the present application provide a speech evaluation method and a related apparatus, so as to improve the flexibility and comprehensiveness with which a device evaluates speech.
In a first aspect, an embodiment of the present application provides a speech evaluation method, including:
obtaining a first voice serving as the evaluation reference under a first evaluation mode, and obtaining a second voice to be evaluated;
processing the first voice to obtain a first text, and processing the second voice to obtain a second text;
obtaining a first text detection strategy corresponding to the first evaluation mode; and
processing the first text and the second text according to the first text detection strategy to obtain a detection result for the second voice.
In a second aspect, an embodiment of the present application provides a speech evaluation apparatus, including a processing unit and a communication unit, wherein the processing unit is configured to: obtain, through the communication unit, a first voice serving as the evaluation reference under a first evaluation mode, and obtain a second voice to be evaluated through the communication unit, the first evaluation mode including a retelling test mode or an interpreting test mode; process the first voice to obtain a first text, and process the second voice to obtain a second text; obtain a first text detection strategy corresponding to the first evaluation mode; and process the first text and the second text according to the first text detection strategy to obtain a detection result for the second voice.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs including instructions for performing the steps of any method of the first aspect of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program for electronic data interchange, wherein the computer program causes a computer to perform some or all of the steps described in any method of the first aspect of the embodiments of the present application.
In a fifth aspect, an embodiment of the present application provides a computer program product, including a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform some or all of the steps described in any method of the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
It can be seen that in the embodiments of the present application, when performing speech detection under different test modes, the electronic device can dynamically select the text detection strategy specific to the current evaluation mode and process the texts corresponding to the voices according to that strategy to obtain a detection result, thereby detecting the voice to be evaluated. This avoids the situation where a single detection strategy cannot adapt to different evaluation modes, and helps improve the flexibility and comprehensiveness of the electronic device's speech evaluation.
Detailed description of the invention
In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram of a speech evaluation system provided by an embodiment of the present application;
Fig. 2a is a schematic flowchart of a speech evaluation method provided by an embodiment of the present application;
Fig. 2b is an example detection-result interface for the retelling test mode provided by an embodiment of the present application;
Fig. 2c is an example detection-result interface for the interpreting test mode provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
Fig. 4 is a block diagram of the functional units of a speech evaluation apparatus provided by an embodiment of the present application.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
As shown in Fig. 1, Fig. 1 is a schematic diagram of a speech evaluation system 100. The system includes a voice acquisition apparatus 110 and a voice processing apparatus 120, with the voice acquisition apparatus 110 connected to the voice processing apparatus 120. The voice acquisition apparatus 110 is configured to obtain voice data and send it to the voice processing apparatus 120 for processing; the voice processing apparatus 120 is configured to process the voice data and output the processing result. The system may be an integrated single device or multiple devices; for convenience of description, the present application refers to the speech evaluation system 100 as the electronic device. The electronic device may include various handheld devices with wireless communication functions, in-vehicle devices, wearable devices, computing devices, or other processing devices connected to a wireless modem, as well as various forms of user equipment (User Equipment, UE), mobile stations (Mobile Station, MS), terminal devices, and the like.
At present there are two ways of evaluating and giving feedback on a simultaneous interpretation student's performance. The first is live exercises with feedback from a teacher or classmate, which helps the student find problems in the interpreted speech; the second is analysis and summary by the teacher or student through recording playback. The first requires the cooperation of others, so the practice settings are limited; the second is inefficient, since the playback must be reviewed pass by pass.

Based on this, the embodiments of the present application propose a speech evaluation method to solve the above problems. The embodiments of the present application are described in detail below.
Referring to Fig. 2a, Fig. 2a is a schematic flowchart of a speech evaluation method provided by an embodiment of the present application, applied to the electronic device shown in Fig. 1. As shown in the figure, the speech evaluation method includes the following steps.

S201: The electronic device obtains a first voice serving as the evaluation reference under a first evaluation mode, and obtains a second voice to be evaluated.
The first voice is the source voice and the second voice is the target voice. The source voice can be selected or specified by the user (e.g., a student or teacher), such as a BBC English broadcast; the target voice is the recorded file of the user's (e.g., the student's) interpretation.

In a specific implementation, the first voice may be a voice file prestored on the electronic device, or a voice file pushed by a server (e.g., the cloud) and obtained through real-time interaction with the server. The second voice may be obtained through the device's own voice acquisition apparatus, or through a dedicated recording system communicating with the device; no unique restriction is made here.
The evaluation modes supported by the electronic device include a retelling test mode and an interpreting test mode. In the retelling test mode, the user listens to the audio content of the source voice and retells the content of the audio (e.g., with a delay of 2-3 seconds). This mainly trains the student's ability to speak while listening and involves no translation between languages. The interpreting test mode, also called the simultaneous interpretation mode, has the user listen to the source-language audio while converting it into the target language and expressing it in speech, consistent with real simultaneous interpretation. Compared with the retelling test mode, the interpreting test mode adds the translation from the source language to the target language, so its difficulty is greater. Besides delivery of tone, the completeness, accuracy, and fluency of the translation are important indices of simultaneous interpretation ability.
S202: The electronic device processes the first voice to obtain a first text, and processes the second voice to obtain a second text.

The processing by which the electronic device turns the first and second voices into the first and second texts includes not only the speech-to-text step but also post-processing of the raw converted text, which may include at least one of the following: normalizing numbers and times, filtering out meaningless filler particles, and predicting sentence breaks and punctuation.
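As a concrete illustration, the post-processing pass might look like the following minimal Python sketch. The filler list and tokenization are hypothetical placeholders — the patent does not enumerate the particles it filters — and number/time normalization and punctuation prediction are noted only as further passes:

```python
import re

# Hypothetical filler list; the patent does not enumerate the
# "meaningless modal particles" it filters, so these are placeholders.
FILLERS = {"uh", "um", "er", "ah"}

def preprocess_transcript(text: str) -> str:
    """Post-process raw ASR output: lowercase, strip filler tokens, and
    collapse whitespace. Number/time normalization and punctuation
    prediction would plug in here as further passes."""
    tokens = [t for t in re.findall(r"[\w']+", text.lower())
              if t not in FILLERS]
    return " ".join(tokens)

print(preprocess_transcript("Um, the meeting er starts at 9 AM"))
# -> the meeting starts at 9 am
```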
In a specific implementation, the speech-to-text processing of the electronic device may cover the following two cases.

First, when the first evaluation mode is the retelling test mode, the language of the first voice is the same as that of the second voice, and the language of the first text is the same as that of the second text. The electronic device processes the first voice to obtain the first text and processes the second voice to obtain the second text as follows: the electronic device calls the first speech recognition system corresponding to the language of the first voice; processes the first voice through the first speech recognition system to obtain the first text; and processes the second voice through the first speech recognition system to obtain the second text. It can be seen that this case can be completed with a single speech recognition system.
Second, when the first evaluation mode is the interpreting test mode, the language of the first voice differs from that of the second voice, and the language of the first text differs from that of the second text. The electronic device processes the first voice to obtain the first text and processes the second voice to obtain the second text as follows: the electronic device calls the first speech recognition system corresponding to the language of the first voice and calls the second speech recognition system corresponding to the language of the second voice; processes the first voice through the first speech recognition system to obtain the first text; and processes the second voice through the second speech recognition system to obtain the second text.
S203: The electronic device obtains a first text detection strategy corresponding to the first evaluation mode.

When the first evaluation mode is the retelling test mode, the electronic device also supports a second evaluation mode, which is then the interpreting test mode; likewise, if the first evaluation mode is the interpreting test mode, the electronic device also supports a second evaluation mode, which is then the retelling test mode.

The electronic device may locally prestore the correspondence between the first evaluation mode and the first text detection strategy, and the correspondence between the second evaluation mode and the second text detection strategy. In a specific implementation, the electronic device then only needs to query the prestored set of mapping relations to quickly determine the specific content of the first text detection strategy corresponding to the current first evaluation mode, which is convenient and efficient.
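The prestored correspondence can be as simple as a dictionary keyed by evaluation mode. The strategy fields below are illustrative only; the patent does not fix their exact content:

```python
# Sketch of the prestored mode-to-strategy correspondence (S203).
# Field names and values are illustrative assumptions.
DETECTION_STRATEGIES = {
    "retell": {
        "same_language": True,
        "metrics": ["omission_rate", "extra_rate", "accuracy"],
    },
    "interpret": {
        "same_language": False,
        "metrics": ["fluency", "omission_rate", "extra_rate", "accuracy"],
    },
}

def lookup_strategy(mode: str) -> dict:
    """Resolve the current evaluation mode to its text detection strategy."""
    try:
        return DETECTION_STRATEGIES[mode]
    except KeyError:
        raise ValueError(f"unsupported evaluation mode: {mode}")
```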
Alternatively, the above correspondences may be configured in advance on the server side. In a specific implementation, the electronic device can interact with the server in real time, asking the server to look up the first text detection strategy corresponding to the current first evaluation mode and return it to the electronic device.

It should be noted that the specific implementations by which the electronic device obtains the first text detection strategy in the present application include but are not limited to the above examples; other ways are possible, and no unique restriction is made here.
S204: The electronic device processes the first text and the second text according to the first text detection strategy to obtain a detection result for the second voice.

If the first evaluation mode is the retelling test mode, the detection result includes any one of the following forms of presentation: a comprehensive retelling-quality score; scores on individual sub-indices (e.g., retelling-fidelity score, amount of omitted retelling, amount of extra retelling, retelling accuracy); display of erroneous text units (e.g., errors that occur during retelling displayed by aligning the two sentences); and so on.
Taking the text display unit as an example, Fig. 2b shows an example detection-result interface for the retelling test mode. The interface includes a first playback progress control for the first voice and a second playback progress control for the second voice, as well as, for each erroneous sentence, the corresponding original text unit and retold text unit, in which the inconsistent words or phrases are highlighted. In addition, a first target interval of the progress bar of the first playback progress control is marked to help the user quickly locate the position of the mis-retold original sentence (the first target interval includes the progress interval corresponding to the original text unit), and a second target interval of the progress bar of the second playback progress control is marked to help the user quickly locate the position of the erroneous retold sentence (the second target interval includes the progress interval corresponding to the retold text unit). Thus, when the user clicks the original text unit or selects the first target interval, the electronic device can play the voice containing the mis-retold original sentence, and when the user clicks the retold text unit or selects the second target interval, the electronic device can play the voice containing the erroneous retold sentence, improving the convenience of review.

In addition, a target interval of a playback progress control may include the contextual information of the erroneous retold sentence, for example the previous and the next sentence; it may also be organized hierarchically by paragraph, for example including only the paragraph to which the erroneous sentence belongs.
If the first evaluation mode is the interpreting test mode, the detection result includes any one of the following forms of presentation: a comprehensive interpreting-quality score; scores on individual sub-indices (e.g., interpreting-fidelity score, translation-fluency score, amount of omitted translation, amount of extra translation, and translation accuracy); display of erroneous text units (e.g., errors that occur during interpreting displayed by aligning the two sentences); and so on.
Taking the text display unit as an example, Fig. 2c shows an example detection-result interface for the interpreting test mode. The interface includes a first playback progress control for the first voice and a second playback progress control for the second voice, as well as, for each erroneous sentence, the corresponding original text unit and interpreted text unit, in which the inconsistent words or phrases are highlighted. In addition, a first target interval of the progress bar of the first playback progress control is marked to help the user quickly locate the position of the misinterpreted original sentence (the first target interval includes the progress interval corresponding to the original text unit), and a second target interval of the progress bar of the second playback progress control is marked to help the user quickly locate the position of the erroneous interpreted sentence (the second target interval includes the progress interval corresponding to the interpreted text unit). Thus, when the user clicks the original text unit or selects the first target interval, the electronic device can play the voice containing the misinterpreted original sentence, and when the user clicks the interpreted text unit or selects the second target interval, the electronic device can play the voice containing the erroneous interpreted sentence, improving the convenience of review.

In addition, a playback progress control may include the contextual information of the mistranslated sentence, for example the previous and the next sentence; it may also be organized hierarchically by paragraph, for example including only the paragraph to which the mistranslated sentence belongs.
It can be seen that in the embodiments of the present application, when performing speech detection under different test modes, the electronic device can dynamically select the text detection strategy specific to the current evaluation mode and process the texts corresponding to the voices according to that strategy to obtain a detection result, thereby detecting the voice to be evaluated. This avoids the situation where a single detection strategy cannot adapt to different evaluation modes, and helps improve the flexibility and comprehensiveness of the electronic device's speech evaluation.
In one possible example, the first evaluation mode includes the retelling test mode; the language of the first voice is the same as that of the second voice, and the language of the first text is the same as that of the second text.

It can be understood that the specific implementations by which the electronic device processes the first text and the second text according to the first text detection strategy to obtain the detection result for the second voice can be varied; the present application makes no unique restriction, and examples are given below.
In one possible example of the present application, the electronic device processing the first text and the second text according to the first text detection strategy to obtain the detection result for the second voice may be: the electronic device determines the matching degree of the second text relative to the first text, and generates the detection result according to the matching degree.

In a specific implementation, the electronic device determines the matching degree of the second text relative to the first text as follows: the electronic device decomposes the first text to obtain a first set of text units, decomposes the second text to obtain a second set of text units, and calculates the matching degree from the first set of word-level text units and the second set of word-level text units.

The granularity of the first and second text units may be word level, phrase level, sentence level, paragraph level, and so on; for example, sentence vectors may be computed to obtain a sentence-level matching degree. No unique restriction is made here.
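At word-level granularity, one way to score a second-text unit against a first-text unit is the fraction of the second unit's words that also occur in the first unit. This is an illustrative sketch under a whitespace-tokenization assumption, not the patent's exact formula:

```python
def unit_matching_degree(first_unit: str, second_unit: str) -> float:
    """Word-level matching degree of a second-text unit relative to a
    first-text unit: the count of the second unit's words that also
    occur in the first unit, divided by the second unit's word count."""
    first_words = set(first_unit.lower().split())
    second_words = second_unit.lower().split()
    if not second_words:
        return 0.0
    shared = sum(1 for w in second_words if w in first_words)
    return shared / len(second_words)
```

A perfect retelling of a unit scores 1.0; a unit sharing no words with the reference scores 0.0.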
In another possible example of the present application, the electronic device processing the first text and the second text according to the first text detection strategy to obtain the detection result for the second voice may also be: the electronic device determines the alignment information of the first text and the second text; determines the retelling accuracy of the second voice according to the alignment information; and generates the detection result according to the alignment information and/or the retelling accuracy.

In a specific implementation, the electronic device determines the alignment information of the first text and the second text as follows. The electronic device calculates the retelling matching degree of the second text relative to the first text to obtain a retelling matching-degree matrix, where the retelling matching degree indicates the matching degree of a second text unit relative to a first text unit and is calculated from a first quantity and a second quantity: the first quantity indicates the number of word-level text units the second text unit shares with the first text unit, and the second quantity indicates the number of word-level text units in the second text unit. The first text includes at least one first text unit, and the second text includes at least one second text unit. The electronic device then filters out the optimal retelling alignment path from the retelling matching-degree matrix and determines the alignment information according to the optimal retelling alignment path.

The granularity of the first and second text units may be word level, sentence level, phrase level, paragraph level, and so on; no unique restriction is made here. The optimal retelling alignment path is the alignment-relation path with the largest cumulative matching-degree score in the retelling matching-degree matrix, and can be found with common algorithms such as the Viterbi algorithm; no specific limitation is made here.
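The path search can be sketched as a small dynamic program. Since the patent only names Viterbi-like algorithms, the exact recurrence here is an assumption: the aligned first-text index never moves backward as the second text advances, which permits both omissions (skipped first-text units) and repeats (one first-text unit aligned to several second-text units):

```python
def best_alignment_path(A):
    """A: n_src x n_tgt matrix (list of lists) of matching degrees.
    Returns the maximum-cumulative-score path as 0-indexed (i, j)
    pairs, one per second-text unit j."""
    n_src, n_tgt = len(A), len(A[0])
    score = [[float("-inf")] * n_tgt for _ in range(n_src)]
    back = [[0] * n_tgt for _ in range(n_src)]
    for i in range(n_src):
        score[i][0] = A[i][0]
    for j in range(1, n_tgt):
        for i in range(n_src):
            # best predecessor among source indices k <= i (monotonic)
            k = max(range(i + 1), key=lambda k: score[k][j - 1])
            score[i][j] = score[k][j - 1] + A[i][j]
            back[i][j] = k
    # backtrack from the best final source index
    i = max(range(n_src), key=lambda i: score[i][n_tgt - 1])
    path = [(i, n_tgt - 1)]
    for j in range(n_tgt - 1, 0, -1):
        i = back[i][j]
        path.append((i, j - 1))
    return path[::-1]
```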
In a specific implementation, the electronic device determines the retelling accuracy of the second voice according to the alignment information as follows: the electronic device calculates the retelling accuracy of the second voice from the sentence-level retelling matching degrees on the optimal retelling alignment path, for example as a weighted average.

In one possible example of the present application, the electronic device generates the detection result according to the alignment information and/or the retelling accuracy as follows: the electronic device calculates reference retelling-quality parameters of the second text according to the alignment information, the reference retelling-quality parameters including an omission rate and/or an extra-retelling rate, and determines the detection result according to the reference retelling-quality parameters and/or the retelling accuracy.
The above calculation is illustrated below, taking sentence-level text units as an example.

Suppose the first text includes five first sentence-level text units A, B, C, D, E, and the second text includes six second sentence-level text units a, b, c, d, e, f. The sentence-level retelling matching degree A(i, j) can then be defined by the formula:

A(i, j) = E(i, j) / Q(j),

where i indexes the i-th first sentence-level text unit in the first text, j indexes the j-th second sentence-level text unit in the second text, E(i, j) denotes the number of word-level text units the j-th second sentence-level text unit shares with the i-th first sentence-level text unit, and Q(j) denotes the total number of word-level text units in the j-th second text unit.
Suppose the optimal repetition alignment path determined from the resulting sentence-level repetition matching degree matrix is:

A_{1,1} → A_{2,2} → A_{4,3} → A_{5,5} → A_{5,6}.
It can then be determined that, in this repetition test, the 3rd first sentence-level text unit of the first text is a missed (un-repeated) sentence-level text unit, while the 4th and 6th second sentence-level text units of the second text are over-repeated sentence-level text units. The missed-repetition rate, over-repetition rate, and repetition accuracy are therefore calculated as follows:
The missed-repetition rate is the number of missed sentence-level text units divided by the number of first sentence-level text units, i.e. 1/5 = 0.2;
the over-repetition rate is the number of over-repeated sentence-level text units divided by the number of second sentence-level text units, i.e. 2/6 ≈ 0.33;
the repetition accuracy is the weighted average of the sentence-level repetition matching degrees on the optimal repetition alignment path, for example: (A_{1,1} + A_{2,2} + A_{4,3} + A_{5,5} + A_{5,6}) / 6.
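The worked example above can be sketched in code. This is a minimal illustration that assumes the optimal alignment path and the per-pair matching degrees A_{i,j} are already given; the numeric A values and the function name below are illustrative, not from the patent.

```python
def repetition_metrics(path, scores, n_ref, n_test):
    """path: ordered 1-indexed (i, j) alignment pairs; scores: (i, j) -> A_{i,j}."""
    ref_on_path = {i for i, _ in path}
    test_on_path = {j for _, j in path}

    # reference units never reached by the path were not repeated at all
    leak_rate = (n_ref - len(ref_on_path)) / n_ref

    # test units off the path, plus path steps that reuse a reference unit,
    # count as over-repetitions (units 4 and 6 in the example above)
    seen, reused = set(), 0
    for i, _ in path:
        if i in seen:
            reused += 1
        seen.add(i)
    over_rate = ((n_test - len(test_on_path)) + reused) / n_test

    # repetition accuracy: average of path matching degrees over the test units
    accuracy = sum(scores[p] for p in path) / n_test
    return leak_rate, over_rate, accuracy

path = [(1, 1), (2, 2), (4, 3), (5, 5), (5, 6)]   # A1,1 -> ... -> A5,6
scores = {(1, 1): 0.9, (2, 2): 0.8, (4, 3): 0.7,
          (5, 5): 0.6, (5, 6): 0.5}               # illustrative A values
leak, over, acc = repetition_metrics(path, scores, n_ref=5, n_test=6)
```

With these made-up A values the sketch reproduces the rates from the example: a missed-repetition rate of 1/5 and an over-repetition rate of 2/6.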
It should be noted that the above way of computing the alignment information in the repetition test mode is merely illustrative; the alignment information between the first text and the second text may also be computed with other methods well known in the art, such as a cosine-similarity algorithm over space vectors, and no unique restriction is made here.
It can be seen that, in this example, for the repetition test mode the electronic device can compute the repetition accuracy based on the alignment information of the first text and the second text, and then generate the detection result based on the alignment information and/or the repetition accuracy. Since the alignment information and the repetition accuracy reflect the user's repetition quality more comprehensively, evaluation accuracy and comprehensiveness can be improved.
In a possible example, the first evaluation mode includes an interpreting test mode; the language of the first voice differs from the language of the second voice, and the language of the first text differs from the language of the second text.
In a possible example of the application, the electronic device processing the first text and the second text according to the first text detection strategy to obtain the detection result for the second voice includes: the electronic device determining the alignment information of the first text and the second text; determining the translation fluency of the second text; and generating the detection result according to the alignment information and/or the translation fluency.
In a specific implementation, the electronic device may determine the translation fluency of the second text in a variety of ways, and no unique restriction is made here. For example, the electronic device may process the second text based on a preset fluency prediction model to obtain a prediction of the translation fluency; the fluency prediction model may be a neural network language model, or a simple n-gram language model.
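As one possible instantiation of the n-gram option, a tiny add-alpha-smoothed bigram model can stand in for the fluency prediction model. The corpus, the smoothing constant, and the class name below are illustrative assumptions, not the patent's implementation.

```python
from collections import Counter
import math

class BigramLM:
    """Add-alpha-smoothed bigram language model used as a fluency scorer."""

    def __init__(self, corpus, alpha=1.0):
        self.alpha = alpha
        self.unigrams = Counter()   # counts of left-context tokens
        self.bigrams = Counter()    # counts of (left, right) token pairs
        self.vocab = set()
        for sent in corpus:
            toks = ["<s>"] + sent.split() + ["</s>"]
            self.vocab.update(toks)
            for a, b in zip(toks, toks[1:]):
                self.unigrams[a] += 1
                self.bigrams[(a, b)] += 1

    def fluency(self, sentence):
        # average log-probability per token: higher means more fluent
        toks = ["<s>"] + sentence.split() + ["</s>"]
        v = len(self.vocab)
        lp = 0.0
        for a, b in zip(toks, toks[1:]):
            p = (self.bigrams[(a, b)] + self.alpha) / (self.unigrams[a] + self.alpha * v)
            lp += math.log(p)
        return lp / (len(toks) - 1)

lm = BigramLM(["i love singing", "i love music"])
fluent = lm.fluency("i love singing")      # word order seen in training
disfluent = lm.fluency("singing love i")   # scrambled word order
```

A well-ordered sentence scores higher than its scrambled counterpart, which is the property the fluency score relies on.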
In a possible example of the application, the electronic device generating the detection result according to the alignment information and/or the translation fluency includes: the electronic device determining a reference interpreting quality parameter of the second voice according to the alignment information, the reference interpreting quality parameter including at least one of: a missed-translation rate, an over-translation rate, and a translation accuracy rate; and generating the detection result according to the reference interpreting quality parameter and/or the translation fluency.
In a possible example of the application, the electronic device determining the alignment information of the first text and the second text includes: the electronic device determining the intertranslation confidence matrix of the first text and the second text; screening out an optimal translation alignment path from the intertranslation confidence matrix; and determining the alignment information according to the optimal translation alignment path.
In a possible example of the application, the electronic device determining the intertranslation confidence matrix of the first text and the second text includes: the electronic device obtaining a forward translation model and a reverse translation model, the forward translation model being used to convert the language of the first text into the language of the second text, and the reverse translation model being used to convert the language of the second text into the language of the first text; and determining, through the forward translation model, the reverse translation model, the first text and the second text, the intertranslation confidence of each first text unit and each second text unit to obtain the intertranslation confidence matrix, the first text including multiple first text units and the second text including multiple second text units.
In a possible example of the application, the electronic device determining, through the forward translation model, the reverse translation model, the first text and the second text, the intertranslation confidence of each first text unit and each second text unit to obtain the intertranslation confidence matrix includes: the electronic device calculating the forward translation confidence of the second text through the forward translation model, the first text and the second text; calculating the reverse translation confidence of the second text through the reverse translation model, the first text and the second text; and determining the intertranslation confidence of each second text unit relative to each first text unit according to the forward translation confidence and the reverse translation confidence, to obtain the intertranslation confidence matrix.
The forward translation confidence is used to indicate a first translation confidence of a second text unit relative to a first text unit. The first translation confidence is obtained by weighted-averaging the multiple first translation sub-confidences of the multiple first text subunits in the first text unit; each first translation sub-confidence is the maximum value among the multiple first output probabilities in the first output probability set of the corresponding first text subunit, where each first output probability refers to the probability that, given the first text subunit as input, the forward translation model outputs a second text subunit in the second text unit. The reverse translation confidence is used to indicate a second translation confidence of a first text unit relative to a second text unit. The second translation confidence is obtained by weighted-averaging the multiple second translation sub-confidences of the multiple second text subunits in the second text unit; each second translation sub-confidence is the maximum value among the multiple second output probabilities in the second output probability set of the corresponding second text subunit, where each second output probability refers to the probability that, given the second text subunit as input, the reverse translation model outputs a first text subunit in the first text unit.
Taking sentence-level text units and word-level text subunits as an example: the forward translation confidence indicates a first sentence-level translation confidence of a second sentence-level text unit relative to a first sentence-level text unit, obtained by weighted-averaging the first word-level translation confidences of the multiple first word-level text units in the first sentence-level text unit. Each first word-level translation confidence is the maximum value among the first output probabilities in the first output probability set of the corresponding first word-level text unit, where each first output probability refers to the probability that, given the first word-level text unit as input, the forward translation model outputs a second word-level text unit in the second sentence-level text unit. The reverse translation confidence indicates a second sentence-level translation confidence of a first sentence-level text unit relative to a second sentence-level text unit, obtained by weighted-averaging the second word-level translation confidences of the multiple second word-level text units in the second sentence-level text unit. Each second word-level translation confidence is the maximum value among the second output probabilities in the second output probability set of the corresponding second word-level text unit, where each second output probability refers to the probability that, given the second word-level text unit as input, the reverse translation model outputs a first word-level text unit in the first sentence-level text unit.
The interpreting test mode is illustrated below in conjunction with a concrete example, taking sentence-level text units as an example.
(1) Obtain the original audio signal Wx for the student's practice and the audio signal Wy generated during the practice; Wx is the source-language speech used for practice, chosen by the student or specified by the teacher (e.g. an English BBC broadcast), and Wy is the recording file of the student's target-language interpretation, obtained through a recording system.
(2) Using a speech recognition system for the corresponding language, transcribe the audio signal Wx into a text representation Tx, and likewise transcribe the audio signal Wy into a text representation Ty.
(3) Post-process the recognized texts Tx and Ty, including normalizing numbers and times, filtering out meaningless filler words, and performing sentence segmentation and punctuation prediction, to obtain output results Tx = {S_{X,1}, …, S_{X,M}} and Ty = {S_{Y,1}, …, S_{Y,N}}, where S_{X,i} and S_{Y,j} are respectively the i-th sentence of the segmented source-language audio text and the j-th sentence of the text of the student's output audio, and M and N are respectively the number of sentences predicted from the transcription of the original audio content and the number of sentences predicted from the transcription of the student's output audio.
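Step (3) might look roughly as follows. The filler-word list, and splitting on final punctuation as a stand-in for a learned punctuation/segmentation predictor, are simplifying assumptions; a real pipeline would also normalize numbers and times.

```python
import re

FILLERS = {"um", "uh", "er"}  # illustrative English filler words

def postprocess(transcript):
    """Drop filler words, then split the cleaned text into sentences."""
    words = [w for w in transcript.split()
             if w.lower().strip(".,!?") not in FILLERS]
    text = " ".join(words)
    # split after sentence-final punctuation, keeping the punctuation
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return sentences

sents = postprocess("um I love singing. uh it is fun.")
```

Each returned sentence then plays the role of one S_{X,i} or S_{Y,j} in the alignment steps below.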
(4) Train a pair of machine translation models between the source and target languages: a forward model E_f and a reverse model E_b, where forward translation refers to translating source-language text into target-language text, and reverse translation refers to translating target-language text into source-language text. The machine translation models may follow a statistical machine translation (SMT) scheme or a neural machine translation (NMT) scheme, without limitation.
(5) Based on the forward translation model E_f, calculate, for every sentence of the student's simultaneous interpretation, its translation confidence F_{i,j} with respect to every sentence of the source-language audio, defined as follows:

F_{i,j} = P(S_{Y,j} | S_{X,i}, E_f), i = 1, 2, …, M; j = 1, 2, …, N

F_{i,j} indicates the confidence score that, given the source-language sentence S_{X,i} and the forward machine translation model E_f, the target-language sentence S_{Y,j} is its translation. It is computed by first preprocessing (e.g. word-segmenting) the source sentence S_{X,i}, and then using the machine translation model E_f to calculate the scoring probability of the target sentence S_{Y,j} under that model. The concrete calculation depends on the machine translation modeling scheme used; with neural machine translation, for example, the target sentence S_{Y,j} is likewise word-segmented, the probability of the corresponding target word is computed at each decoding step, and finally the decoding probabilities of all target words are weighted-averaged as the confidence of the whole sentence.
As an example, assume the source-language sentence S_{X,i} means "I love singing" and segments into three source words glossed "I", "love" and "singing", while the target-language sentence S_{Y,j} is "I love singing", segmented into "I", "love", "singing". Through the forward translation model E_f, the probabilities that the first source word is translated as "I", "love", "singing" are 0.9, 0, 0 respectively; for the second source word they are 0, 0.8, 0; and for the third they are 0, 0, 0.7. The optimal word-level translation path is therefore: first source word → "I" (0.9), second source word → "love" (0.8), third source word → "singing" (0.7). The weighted average gives the target sentence S_{Y,j} "I love singing" a confidence score of 0.8 relative to the source sentence S_{X,i}.
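The per-word computation in this example can be sketched with the model's output probabilities stubbed out as a table (a real system would query E_f instead); the function name and the gloss comments are illustrative, and the weights are uniform as in the example.

```python
def forward_confidence(prob_table):
    """prob_table: per source word, a dict mapping target word -> output probability.
    Takes the best target-word probability per source word, then averages."""
    best = [max(row.values()) for row in prob_table]
    return sum(best) / len(best)

table = [
    {"I": 0.9, "love": 0.0, "singing": 0.0},   # source word glossed "I"
    {"I": 0.0, "love": 0.8, "singing": 0.0},   # source word glossed "love"
    {"I": 0.0, "love": 0.0, "singing": 0.7},   # source word glossed "singing"
]
conf = forward_confidence(table)  # (0.9 + 0.8 + 0.7) / 3
```

The same routine with E_b's output table in place of E_f's gives the reverse confidence of step (6).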
(6) Based on the reverse translation model, calculate, for every sentence of the student's simultaneous interpretation, the translation confidence B_{j,i} with respect to every sentence of the source-language audio, defined as follows:

B_{j,i} = P(S_{X,i} | S_{Y,j}, E_b), i = 1, 2, …, M; j = 1, 2, …, N

B_{j,i} indicates the confidence score that, given the target-language sentence S_{Y,j} and the reverse machine translation model E_b, the source-language sentence S_{X,i} is its translation. The calculation is the same as for the forward translation confidence, except that the input is the target sentence S_{Y,j} and the output is the source sentence S_{X,i}.
(7) Based on the forward and reverse translation confidence scores, calculate the intertranslation confidence C_{i,j} of each source sentence and target sentence:

C_{i,j} = (F_{i,j} + B_{j,i}) / 2, i = 1, 2, …, M; j = 1, 2, …, N

where C_{i,j} indicates the confidence that S_{X,i} and S_{Y,j} are translations of each other; a higher score indicates a more accurate simultaneous interpretation by the student.
(8) Based on the intertranslation confidence matrix C = {C_{i,j}}, use the Viterbi algorithm to compute the maximum-score alignment path between the source sentences and the target sentences the student produced. Let σ_j(i) denote the cumulative confidence score of the maximum alignment path in which the j-th target sentence is aligned to the i-th source sentence, and let î_j denote the source-sentence index achieving the maximum confidence score for the j-th target sentence on that path. The Viterbi recursion is then:

σ_j(i) = max over i′ with max(1, i − K) ≤ i′ ≤ i of [ σ_{j−1}(i′) ] + C_{i,j}

where σ_0(i) = 0, i = 1, 2, …, M, and M and N are the numbers of source sentences and target sentences respectively. Under normal circumstances one source sentence corresponds to one target sentence, so the index of a source sentence and the index of the student's corresponding translated sentence differ only within a limited range; in the Viterbi search, therefore, each extension only looks backward within a search-path width K, whose value is determined from experimental results. The concrete decoding process is the same as in the prior art and is not detailed here. Finally, backtracking yields the decoded optimal translation alignment path, i.e. each target sentence S_{Y,j} corresponds to the source sentence S_{X,î_j}, where î_j denotes the source-sentence index aligned to the j-th target sentence by the alignment algorithm. In this way, the alignment relation of every target-language sentence with the source-language sentences is obtained.
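A minimal version of the step-(8) search can be sketched as follows, under the assumptions that the path moves monotonically through the source sentences and may advance by at most K positions per step; the value of K, the function name, and the toy matrix are illustrative.

```python
def viterbi_align(C, K=2):
    """C: M x N matrix, C[i][j] = intertranslation confidence of source
    sentence i with target sentence j. Returns, for each target sentence,
    the 0-indexed source sentence it is aligned to."""
    M, N = len(C), len(C[0])
    sigma = [[float("-inf")] * M for _ in range(N)]  # cumulative scores
    back = [[0] * M for _ in range(N)]               # backpointers
    for i in range(M):
        sigma[0][i] = C[i][0]
    for j in range(1, N):
        for i in range(M):
            # predecessor may stay at i (over-translation) or trail by up to K
            lo = max(0, i - K)
            prev = max(range(lo, i + 1), key=lambda p: sigma[j - 1][p])
            sigma[j][i] = sigma[j - 1][prev] + C[i][j]
            back[j][i] = prev
    # backtrace from the best final state
    i = max(range(M), key=lambda q: sigma[N - 1][q])
    path = [i]
    for j in range(N - 1, 0, -1):
        i = back[j][i]
        path.append(i)
    return path[::-1]

# toy matrix where the diagonal pairs are clearly the best translations
C = [[0.9, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.7]]
alignment = viterbi_align(C)
```

Because a step may reuse or skip source indices, the recovered path naturally exposes the over-translated and missed sentences counted in step (9).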
(9) Translation error detection. Mistranslated sentences may occur in the student's translation. For example, with a missed translation, a source sentence cannot find any corresponding target sentence; the optimal decoded alignment path is therefore used to count the source sentences not on the path, which are the missed sentences. Let the set of missed sentences be A; the missed-translation score is then defined as:

Score1 = |A| / M

where |A| is the number of missed sentences and M is the total number of source sentences.
Besides missed translations, over-translations may also occur in the student's practice. If the j-th target sentence is an extra, over-translated sentence, the confidence score C_{î_j, j} of its aligned pair will be relatively low; a threshold T is therefore set, and if C_{î_j, j} < T the sentence is considered over-translated. Assuming the number of over-translated sentences is P, the over-translation score is defined as follows:

Score2 = P / N

where N is the total number of target sentences.
For the aligned sentence pairs (S_{X,î_j}, S_{Y,j}), a good translation yields a relatively high confidence score C_{î_j, j}; the translation accuracy score is therefore defined as:

Score3 = (1/L) · Σ_j C_{î_j, j}

where L is the number of aligned sentence pairs, serving as the normalization parameter.
A high-quality translation also requires remarkable fluency, so a fluency score is defined as:

Score4 = (1/N) · Σ_j P(S_{Y,j} | λ)

where λ is a language model trained on a large amount of target-language text, and P(S_{Y,j} | λ) is the fluency score of the practice sentence S_{Y,j} under that language model. The language model may be a neural network language model, or a simple n-gram language model.
Based on the above missed- and over-translation scores, translation accuracy score, and translation fluency score, the overall score of this practice session is defined as:

Score = α_1·(1 − Score1) + α_2·(1 − Score2) + α_3·Score3 + α_4·Score4

where α_i, i = 1, 2, 3, 4 are the weights of the respective scores, whose specific values can be set from experimental results or experience.
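Combining the component scores into the practice total might look like this; the component values and the equal weights α_i below are illustrative assumptions, since the text leaves both to experiment or experience.

```python
def total_score(s1, s2, s3, s4, alphas=(0.25, 0.25, 0.25, 0.25)):
    a1, a2, a3, a4 = alphas
    # Score1 and Score2 are error rates, so they enter as (1 - rate)
    return a1 * (1 - s1) + a2 * (1 - s2) + a3 * s3 + a4 * s4

M, N = 10, 9          # source / target sentence counts (assumed)
missed, extra = 1, 1  # sentences missed / over-translated (assumed)
score1 = missed / M   # missed-translation score
score2 = extra / N    # over-translation score
score3 = 0.8          # mean aligned-pair confidence (assumed)
score4 = 0.7          # language-model fluency score (assumed)
total = total_score(score1, score2, score3, score4)
```

Raising the weight on fluency versus accuracy simply shifts the α vector, which is why the text leaves the weights tunable.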
Consistent with the embodiment shown in Fig. 2a above, referring to Fig. 3, Fig. 3 is a structural schematic diagram of an electronic device 300 provided by an embodiment of the application. As shown, the electronic device 300 includes an application processor 310, a memory 320, a communication interface 330, and one or more programs 321, wherein the one or more programs 321 are stored in the memory 320 and configured to be executed by the application processor 310, the one or more programs 321 including instructions for performing the following steps:
obtaining a first voice serving as the evaluation reference in a first evaluation mode, and obtaining a second voice to be evaluated; processing the first voice to obtain a first text, and processing the second voice to obtain a second text; obtaining a first text detection strategy corresponding to the first evaluation mode; and processing the first text and the second text according to the first text detection strategy to obtain a detection result for the second voice.
As can be seen, in the embodiment of the application, when the electronic device performs speech detection under different test modes, it can dynamically select the text detection strategy specific to the current evaluation mode and process the texts corresponding to the voices according to that strategy to obtain the detection result, thereby realizing detection of the voice to be evaluated. This avoids the situation where a single detection strategy cannot be adapted to different evaluation modes, and helps improve the flexibility and comprehensiveness of speech evaluation by the electronic device.
In a possible example, the first evaluation mode includes a repetition test mode; the language of the first voice is the same as the language of the second voice, and the language of the first text is the same as the language of the second text.
In a possible example, in the aspect of processing the first text and the second text according to the first text detection strategy to obtain the detection result for the second voice, the instructions in the program are specifically for performing the following operations: determining the alignment information of the first text and the second text; determining the repetition accuracy of the second voice according to the alignment information; and generating the detection result according to the alignment information and/or the repetition accuracy.
In a possible example, in the aspect of generating the detection result according to the alignment information and/or the repetition accuracy, the instructions in the program are specifically for performing the following operations: calculating a reference repetition quality parameter of the second text according to the alignment information, the reference repetition quality parameter including a missed-repetition rate and/or an over-repetition rate; and determining the detection result according to the reference repetition quality parameter and/or the repetition accuracy.
In a possible example, the first evaluation mode includes an interpreting test mode; the language of the first voice differs from the language of the second voice, and the language of the first text differs from the language of the second text.
In a possible example, in the aspect of processing the first text and the second text according to the first text detection strategy to obtain the detection result for the second voice, the instructions in the program are specifically for performing the following operations: determining the alignment information of the first text and the second text; determining the translation fluency of the second text; and generating the detection result according to the alignment information and/or the translation fluency.
In a possible example, in the aspect of generating the detection result according to the alignment information and/or the translation fluency, the instructions in the program are specifically for performing the following operations: determining a reference interpreting quality parameter of the second voice according to the alignment information, the reference interpreting quality parameter including at least one of: a missed-translation rate, an over-translation rate, and a translation accuracy rate; and generating the detection result according to the reference interpreting quality parameter and/or the translation fluency.
In a possible example, in the aspect of determining the alignment information of the first text and the second text, the instructions in the program are specifically for performing the following operations: determining the intertranslation confidence matrix of the first text and the second text; screening out an optimal translation alignment path from the intertranslation confidence matrix; and determining the alignment information according to the optimal translation alignment path.
In a possible example, in the aspect of determining the intertranslation confidence matrix of the first text and the second text, the instructions in the program are specifically for performing the following operations: obtaining a forward translation model and a reverse translation model, the forward translation model being used to convert the language of the first text into the language of the second text, and the reverse translation model being used to convert the language of the second text into the language of the first text; and determining, through the forward translation model, the reverse translation model, the first text and the second text, the intertranslation confidence of each first text unit and each second text unit to obtain the intertranslation confidence matrix, the first text including multiple first text units and the second text including multiple second text units.
In a possible example, in the aspect of determining, through the forward translation model, the reverse translation model, the first text and the second text, the intertranslation confidence of each first text unit and each second text unit to obtain the intertranslation confidence matrix, the instructions in the program are specifically for performing the following operations: calculating the forward translation confidence of the second text through the forward translation model, the first text and the second text; calculating the reverse translation confidence of the second text through the reverse translation model, the first text and the second text; and determining the intertranslation confidence of each second text unit relative to each first text unit according to the forward translation confidence and the reverse translation confidence, to obtain the intertranslation confidence matrix.
The above mainly describes the solutions of the embodiments of the application from the perspective of the method-side execution process. It can be understood that, in order to realize the above functions, the electronic device includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should readily appreciate that, in combination with the exemplary units and algorithm steps described in the embodiments presented herein, the application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is actually executed by hardware, or by computer software driving hardware, depends on the specific application and design constraints of the technical solution. Skilled professionals may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the application.
The embodiments of the application may divide the electronic device into functional units according to the above method examples; for example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware, or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the application is schematic and is merely a logical functional division; other division manners may exist in actual implementation.
Fig. 4 is a block diagram of the functional units of a speech evaluation apparatus 400 involved in an embodiment of the application. The speech evaluation apparatus 400 is applied to an electronic device and includes a processing unit 401 and a communication unit 402, wherein the processing unit 401 is configured to: obtain, through the communication unit 402, a first voice serving as the evaluation reference in a first evaluation mode, and obtain a second voice to be evaluated through the communication unit, the first evaluation mode including a repetition test mode or an interpreting test mode; process the first voice to obtain a first text, and process the second voice to obtain a second text; obtain a first text detection strategy corresponding to the first evaluation mode; and process the first text and the second text according to the first text detection strategy to obtain a detection result for the second voice.
The speech evaluation apparatus 400 may further include a storage unit 403 for storing program code and data of the electronic device. The processing unit 401 may be a processor, the communication unit 402 may be an internal communication interface, and the storage unit 403 may be a memory.
As can be seen, in the embodiment of the application, when the electronic device performs speech detection under different test modes, it can dynamically select the text detection strategy specific to the current evaluation mode and process the texts corresponding to the voices according to that strategy to obtain the detection result, thereby realizing detection of the voice to be evaluated. This avoids the situation where a single detection strategy cannot be adapted to different evaluation modes, and helps improve the flexibility and comprehensiveness of speech evaluation by the electronic device.
In a possible example, the first evaluation mode includes a repetition test mode; the language of the first voice is the same as the language of the second voice, and the language of the first text is the same as the language of the second text.
In a possible example, in the aspect of processing the first text and the second text according to the first text detection strategy to obtain the detection result for the second voice, the processing unit 401 is specifically configured to: determine the alignment information of the first text and the second text; determine the repetition accuracy of the second voice according to the alignment information; and generate the detection result according to the alignment information and/or the repetition accuracy.
In a possible example, in the aspect of generating the detection result according to the alignment information and/or the repetition accuracy, the processing unit 401 is specifically configured to: calculate a reference repetition quality parameter of the second text according to the alignment information, the reference repetition quality parameter including a missed-repetition rate and/or an over-repetition rate; and determine the detection result according to the reference repetition quality parameter and/or the repetition accuracy.
In a possible example, the first evaluation mode includes an interpreting test mode; the language of the first voice differs from the language of the second voice, and the language of the first text differs from the language of the second text.
In a possible example, in the aspect of processing the first text and the second text according to the first text detection strategy to obtain the detection result for the second voice, the processing unit 401 is specifically configured to: determine the alignment information of the first text and the second text; determine the translation fluency of the second text; and generate the detection result according to the alignment information and/or the translation fluency.
In a possible example, in terms of generating a detection result according to the alignment information and/or the translation fluency, the processing unit 401 is specifically configured to: determine a reference interpreting quality parameter of the second voice according to the alignment information, the reference interpreting quality parameter including at least one of the following: a missed-translation rate, an extra-translation rate, and a translation accuracy; and generate a detection result according to the reference interpreting quality parameter and/or the translation fluency.
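By analogy with the repeat-mode sketch, the interpreting-mode quality parameters can be illustrated from an alignment plus per-pair confidences. Everything below (the pair list, the confidence dictionary, and the 0.5 threshold) is a hypothetical illustration, not the patent's concrete computation.

```python
def interpreting_quality(n_src_units, n_tgt_units, aligned_pairs, conf, threshold=0.5):
    # aligned_pairs: (source_unit, target_unit) index pairs from the alignment.
    # conf[(i, j)]: mutual-translation confidence of an aligned pair.
    aligned_src = {i for i, _ in aligned_pairs}
    aligned_tgt = {j for _, j in aligned_pairs}
    missed_rate = 1 - len(aligned_src) / n_src_units  # source units never translated
    extra_rate = 1 - len(aligned_tgt) / n_tgt_units   # target units with no source
    correct = sum(1 for p in aligned_pairs if conf[p] >= threshold)
    accuracy = correct / len(aligned_pairs) if aligned_pairs else 0.0
    return missed_rate, extra_rate, accuracy

pairs = [(0, 0), (1, 1)]
conf = {(0, 0): 0.9, (1, 1): 0.3}
missed, extra, acc = interpreting_quality(3, 2, pairs, conf)
print(missed, extra, acc)  # one of three source units unaligned; one pair below threshold
```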
In a possible example, in terms of determining the alignment information between the first text and the second text, the processing unit 401 is specifically configured to: determine a mutual-translation confidence matrix of the first text and the second text; select an optimal translation alignment path from the mutual-translation confidence matrix; and determine alignment information according to the optimal translation alignment path.
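One plausible, purely hypothetical reading of "selecting an optimal translation alignment path from the confidence matrix" is a dynamic-programming search for a monotone path through the matrix that maximizes accumulated confidence, in the spirit of DTW-style alignment; the move set and scoring below are assumptions, not the patent's specified algorithm.

```python
def best_alignment_path(conf):
    # conf: m x n mutual-translation confidence matrix (list of lists).
    # Finds a monotone path from (0, 0) to (m-1, n-1) maximizing summed
    # confidence; allowed moves are down, right, or diagonal.
    m, n = len(conf), len(conf[0])
    score = [[float("-inf")] * n for _ in range(m)]
    back = [[None] * n for _ in range(m)]
    score[0][0] = conf[0][0]
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pi >= 0 and pj >= 0 and score[pi][pj] + conf[i][j] > score[i][j]:
                    score[i][j] = score[pi][pj] + conf[i][j]
                    back[i][j] = (pi, pj)
    # Trace the optimal path back from the final cell.
    path, cell = [], (m - 1, n - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    return path[::-1]

path = best_alignment_path([[0.9, 0.1], [0.2, 0.8]])
print(path)  # [(0, 0), (1, 0), (1, 1)]
```

The cells on the returned path then give the aligned (first-text unit, second-text unit) pairs that constitute the alignment information.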
In a possible example, in terms of determining the mutual-translation confidence matrix of the first text and the second text, the processing unit 401 is specifically configured to: obtain a forward translation model and a reverse translation model, the forward translation model being used to convert the language of the first text into the language of the second text, and the reverse translation model being used to convert the language of the second text into the language of the first text; and determine, through the forward translation model, the reverse translation model, the first text, and the second text, a mutual-translation confidence of each first text unit and each second text unit to obtain a mutual-translation confidence matrix, wherein the first text includes multiple first text units and the second text includes multiple second text units.
In a possible example, in terms of determining, through the forward translation model, the reverse translation model, the first text, and the second text, the mutual-translation confidence of each first text unit and each second text unit to obtain the mutual-translation confidence matrix, the processing unit 401 is specifically configured to: calculate a forward translation confidence of the second text through the forward translation model, the first text, and the second text; calculate a reverse translation confidence of the second text through the reverse translation model, the first text, and the second text; and determine, according to the forward translation confidence and the reverse translation confidence, a mutual-translation confidence of each second text unit relative to each first text unit to obtain a mutual-translation confidence matrix.
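How the forward and reverse translation confidences are combined into a single per-unit-pair score is not spelled out here. As a hypothetical sketch under that assumption, a geometric mean of the two directions gives a symmetric score that is high only when both models agree; the matrices below are invented example values.

```python
import math

def mutual_confidence_matrix(fwd_conf, rev_conf):
    # fwd_conf[i][j]: confidence that second-text unit j translates first-text
    #                 unit i under the forward model (first -> second language).
    # rev_conf[i][j]: the same pair scored by the reverse model.
    # Geometric mean combines both directions into one mutual confidence.
    m, n = len(fwd_conf), len(fwd_conf[0])
    return [[math.sqrt(fwd_conf[i][j] * rev_conf[i][j]) for j in range(n)]
            for i in range(m)]

M = mutual_confidence_matrix([[0.64, 0.81]], [[0.25, 1.0]])
print(M)  # [[0.4, 0.9]]
```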
An embodiment of the present application further provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data interchange, and the computer program causes a computer to execute some or all of the steps of any method recorded in the above method embodiments. The above computer includes an electronic device.
An embodiment of the present application further provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any method recorded in the above method embodiments. The computer program product may be a software installation package, and the above computer includes an electronic device.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as a series of action combinations. However, those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division of the units is only a logical function division, and other division manners may exist in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods in the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disk.
Those of ordinary skill in the art can understand that all or part of the steps in the methods of the above embodiments may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable memory, and the memory may include a flash disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disk, or the like.
The embodiments of the present application are described in detail above, and specific examples are used herein to explain the principles and implementations of the present application. The description of the above embodiments is only intended to help understand the method and core ideas of the present application. Meanwhile, those skilled in the art may make changes to the specific implementations and application scope according to the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.
Claims (13)
1. A speech evaluation method, comprising:
obtaining a first voice serving as an evaluation reference in a first evaluation mode, and obtaining a second voice to be evaluated;
processing the first voice to obtain a first text, and processing the second voice to obtain a second text;
obtaining a first text detection strategy corresponding to the first evaluation mode; and
processing the first text and the second text according to the first text detection strategy to obtain a detection result for the second voice.
2. The method according to claim 1, wherein the first evaluation mode includes a repeat test mode; the language of the first voice is the same as the language of the second voice, and the language of the first text is the same as the language of the second text.
3. The method according to claim 2, wherein the processing the first text and the second text according to the first text detection strategy to obtain a detection result for the second voice comprises:
determining alignment information between the first text and the second text;
determining a repeat accuracy of the second voice according to the alignment information; and
generating a detection result according to the alignment information and/or the repeat accuracy.
4. The method according to claim 3, wherein the generating a detection result according to the alignment information and/or the repeat accuracy comprises:
calculating a reference repeat quality parameter of the second text according to the alignment information, the reference repeat quality parameter including a missed-repetition rate and/or an extra-repetition rate; and
determining a detection result according to the reference repeat quality parameter and/or the repeat accuracy.
5. The method according to claim 1, wherein the first evaluation mode includes an interpreting test mode; the language of the first voice is different from the language of the second voice, and the language of the first text is different from the language of the second text.
6. The method according to claim 5, wherein the processing the first text and the second text according to the first text detection strategy to obtain a detection result for the second voice comprises:
determining alignment information between the first text and the second text;
determining a translation fluency of the second text; and
generating a detection result according to the alignment information and/or the translation fluency.
7. The method according to claim 6, wherein the generating a detection result according to the alignment information and/or the translation fluency comprises:
determining a reference interpreting quality parameter of the second voice according to the alignment information, the reference interpreting quality parameter including at least one of the following: a missed-translation rate, an extra-translation rate, and a translation accuracy; and
generating a detection result according to the reference interpreting quality parameter and/or the translation fluency.
8. The method according to claim 6 or 7, wherein the determining alignment information between the first text and the second text comprises:
determining a mutual-translation confidence matrix of the first text and the second text;
selecting an optimal translation alignment path from the mutual-translation confidence matrix; and
determining alignment information according to the optimal translation alignment path.
9. The method according to claim 8, wherein the determining a mutual-translation confidence matrix of the first text and the second text comprises:
obtaining a forward translation model and a reverse translation model, the forward translation model being used to convert the language of the first text into the language of the second text, and the reverse translation model being used to convert the language of the second text into the language of the first text; and
determining, through the forward translation model, the reverse translation model, the first text, and the second text, a mutual-translation confidence of each first text unit and each second text unit to obtain a mutual-translation confidence matrix, wherein the first text includes multiple first text units and the second text includes multiple second text units.
10. The method according to claim 9, wherein the determining, through the forward translation model, the reverse translation model, the first text, and the second text, a mutual-translation confidence of each first text unit and each second text unit to obtain a mutual-translation confidence matrix comprises:
calculating a forward translation confidence of the second text through the forward translation model, the first text, and the second text;
calculating a reverse translation confidence of the second text through the reverse translation model, the first text, and the second text; and
determining, according to the forward translation confidence and the reverse translation confidence, a mutual-translation confidence of each second text unit relative to each first text unit to obtain a mutual-translation confidence matrix.
11. A speech evaluation apparatus, comprising a processing unit and a communication unit, wherein
the processing unit is configured to: obtain, through the communication unit, a first voice serving as an evaluation reference in a first evaluation mode, and obtain a second voice to be evaluated through the communication unit, the first evaluation mode including a repeat test mode or an interpreting test mode; process the first voice to obtain a first text, and process the second voice to obtain a second text; obtain a first text detection strategy corresponding to the first evaluation mode; and process the first text and the second text according to the first text detection strategy to obtain a detection result for the second voice.
12. An electronic device, comprising a processor, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the steps in the method according to any one of claims 1-10.
13. A computer-readable storage medium storing a computer program for electronic data interchange, wherein the computer program causes a computer to execute the method according to any one of claims 1-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910422699.3A CN110148413B (en) | 2019-05-21 | 2019-05-21 | Voice evaluation method and related device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110148413A true CN110148413A (en) | 2019-08-20 |
CN110148413B CN110148413B (en) | 2021-10-08 |
Family
ID=67592304
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910422699.3A Active CN110148413B (en) | 2019-05-21 | 2019-05-21 | Voice evaluation method and related device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110148413B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110134952A (en) * | 2019-04-29 | 2019-08-16 | 华南师范大学 | A kind of Error Text rejection method for identifying, device and storage medium |
CN111402924A (en) * | 2020-02-28 | 2020-07-10 | 联想(北京)有限公司 | Spoken language evaluation method and device and computer readable storage medium |
CN112562737A (en) * | 2021-02-25 | 2021-03-26 | 北京映客芝士网络科技有限公司 | Method, device, medium and electronic equipment for evaluating audio processing quality |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000222406A (en) * | 1999-01-27 | 2000-08-11 | Sony Corp | Voice recognition and translation device and its method |
JP5986883B2 (en) * | 2012-10-23 | 2016-09-06 | 日本電信電話株式会社 | Language model evaluation method, apparatus and program |
CN107316638A (en) * | 2017-06-28 | 2017-11-03 | 北京粉笔未来科技有限公司 | A kind of poem recites evaluating method and system, a kind of terminal and storage medium |
CN107578778A (en) * | 2017-08-16 | 2018-01-12 | 南京高讯信息科技有限公司 | A kind of method of spoken scoring |
Non-Patent Citations (2)
Title |
---|
YU JINGSONG ET AL.: "Research on a High-Accuracy Bilingual Chunk Alignment Algorithm", JOURNAL OF CHINESE INFORMATION PROCESSING *
CHENG WEI ET AL.: "A Bilingual Chunk Processing Method for Chinese-English Spoken Language Translation", JOURNAL OF CHINESE INFORMATION PROCESSING *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110134952A (en) * | 2019-04-29 | 2019-08-16 | 华南师范大学 | A kind of Error Text rejection method for identifying, device and storage medium |
CN110134952B (en) * | 2019-04-29 | 2020-03-31 | 华南师范大学 | Error text rejection method, device and storage medium |
CN111402924A (en) * | 2020-02-28 | 2020-07-10 | 联想(北京)有限公司 | Spoken language evaluation method and device and computer readable storage medium |
CN111402924B (en) * | 2020-02-28 | 2024-04-19 | 联想(北京)有限公司 | Spoken language evaluation method, device and computer readable storage medium |
CN112562737A (en) * | 2021-02-25 | 2021-03-26 | 北京映客芝士网络科技有限公司 | Method, device, medium and electronic equipment for evaluating audio processing quality |
CN112562737B (en) * | 2021-02-25 | 2021-06-22 | 北京映客芝士网络科技有限公司 | Method, device, medium and electronic equipment for evaluating audio processing quality |
Also Published As
Publication number | Publication date |
---|---|
CN110148413B (en) | 2021-10-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109918680B (en) | Entity identification method and device and computer equipment | |
CN108288468B (en) | Audio recognition method and device | |
JP2019102063A (en) | Method and apparatus for controlling page | |
US7412389B2 (en) | Document animation system | |
CN108711420A (en) | Multilingual hybrid model foundation, data capture method and device, electronic equipment | |
CN103677729B (en) | Voice input method and system | |
WO2022078146A1 (en) | Speech recognition method and apparatus, device, and storage medium | |
CN108877782A (en) | Audio recognition method and device | |
CN110147451B (en) | Dialogue command understanding method based on knowledge graph | |
CN110310619A (en) | Polyphone prediction technique, device, equipment and computer readable storage medium | |
CN110148413A (en) | Speech evaluating method and relevant apparatus | |
CN108228576B (en) | Text translation method and device | |
CN109754783A (en) | Method and apparatus for determining the boundary of audio sentence | |
CN106202288B (en) | A kind of optimization method and system of man-machine interactive system knowledge base | |
CN111694937A (en) | Interviewing method and device based on artificial intelligence, computer equipment and storage medium | |
CN112487139A (en) | Text-based automatic question setting method and device and computer equipment | |
CN109461459A (en) | Speech assessment method, apparatus, computer equipment and storage medium | |
CN110457661A (en) | Spatial term method, apparatus, equipment and storage medium | |
CN109741641A (en) | Langue leaning system based on new word detection | |
CN110223365A (en) | A kind of notes generation method, system, device and computer readable storage medium | |
CN112463942A (en) | Text processing method and device, electronic equipment and computer readable storage medium | |
CN104536677A (en) | Three-dimensional digital portrait with intelligent voice interaction function | |
CN109325178A (en) | Method and apparatus for handling information | |
CN107945802A (en) | Voice recognition result processing method and processing device | |
CN109448717A (en) | A kind of phonetic word spelling recognition methods, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||