CN110176225A - Method and device for evaluating prosody prediction effect - Google Patents

Info

Publication number
CN110176225A
Authority
CN
China
Prior art keywords
artificial
test case
result
weight
prosodic labeling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910461506.5A
Other languages
Chinese (zh)
Other versions
CN110176225B (en)
Inventor
杨勤英
吴陈成
宋明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201910461506.5A
Publication of CN110176225A
Application granted
Publication of CN110176225B
Legal status: Active
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10: Prosody rules derived from text; Stress or intonation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This application provides a method and a device for evaluating prosody prediction effect. The method includes: obtaining the prosody prediction results respectively corresponding to multiple test cases in a test case set, where the prosody prediction result corresponding to each test case is predicted by the prosody prediction engine to be evaluated; determining the weights of the prosody prediction results respectively corresponding to the multiple test cases, based on the previously obtained weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases, where the weight of any manual prosodic labeling result characterizes the degree of reasonableness of that result; and determining the evaluation result of the prosody prediction effect of the engine to be evaluated according to the prosody prediction results respectively corresponding to the multiple test cases and the weights of those results. The evaluation method provided by this application can evaluate the prediction effect of a prosody prediction engine automatically, efficiently, and objectively.

Description

Method and device for evaluating prosody prediction effect
Technical field
This application relates to the field of speech synthesis technology, and in particular to a method and a device for evaluating prosody prediction effect.
Background
Prosody prediction is an indispensable part of a speech synthesis system. It belongs to the front-end processing of the system and predicts the prosodic boundary positions in text data; the back-end processing of the system can then insert audio pauses according to those boundary positions.
Prosody prediction is implemented by a prosody prediction engine, and the quality of the engine's prosody prediction effect directly affects the overall quality of speech synthesis. To obtain higher speech synthesis quality, the prosody prediction effect of the prosody prediction engine usually needs to be evaluated.
At present, the prosody prediction effect of a prosody prediction engine is evaluated manually, that is, evaluators assess the engine's prosody prediction results. However, manual evaluation is easily influenced by subjective factors (such as the evaluator's experience and state), which lowers the credibility of the evaluation result, and its labor and time costs are high.
Summary of the invention
In view of this, this application provides a method and a device for evaluating prosody prediction effect, to solve the problems in the prior art that manual evaluation is easily influenced by subjective factors, which lowers the credibility of the evaluation result, and that its labor and time costs are high. The technical solution is as follows:
A method for evaluating prosody prediction effect, comprising:
Obtaining the prosody prediction results respectively corresponding to multiple test cases in a test case set, where the prosody prediction result corresponding to each test case is predicted by the prosody prediction engine to be evaluated;
Determining the weights of the prosody prediction results respectively corresponding to the multiple test cases, based on the previously obtained weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases, where the manual prosodic labeling result set corresponding to any test case includes at least one manual prosodic labeling result for that test case, and the weight of any manual prosodic labeling result characterizes the degree of reasonableness of that result;
Determining the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated according to the prosody prediction results respectively corresponding to the multiple test cases and the weights of those prosody prediction results.
Optionally, obtaining the weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases comprises:
Obtaining the manual prosodic labeling result sets respectively corresponding to the multiple test cases;
Synthesizing audio from each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases, to obtain the synthesized audio sets respectively corresponding to the multiple test cases;
Determining the weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases, according to the best manual prosodic labeling result chosen by each of multiple listeners for each test case; where the best manual prosodic labeling result chosen by any listener for any test case is the result that the listener selects from the manual prosodic labeling result set corresponding to that test case after listening to each synthesized audio in the synthesized audio set corresponding to that test case.
Optionally, determining the weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases, according to the best manual prosodic labeling result chosen by each of the multiple listeners for each test case, comprises:
Dividing the test cases in the test case set into multiple groups, each group forming one test case subset, to obtain multiple test case subsets;
Obtaining, from the multiple test case subsets, one test case subset that has not yet been obtained as the target test case subset;
Determining the target weights respectively corresponding to the multiple listeners, based on the initial weights respectively corresponding to the multiple listeners and the best manual prosodic labeling result chosen by each listener for each test case in the target test case subset;
Determining, from the target weights respectively corresponding to the listeners, the weight of each manual prosodic labeling result in the manual prosodic labeling result set corresponding to each test case in the target test case subset;
Taking the target weights respectively corresponding to the multiple listeners as the initial weights respectively corresponding to the multiple listeners, and then returning to the step of obtaining, from the multiple test case subsets, one test case subset that has not yet been obtained as the target test case subset, until no unobtained test case subset remains, thereby obtaining the weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases.
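The subset-by-subset loop above can be sketched as follows. This is only an illustrative sketch: the function names, the data layout (each subset maps a test case ID to the label each listener chose), and the two toy stand-in functions are assumptions made for the example, not part of the application. In particular, `keep_weights` is a placeholder that leaves listener weights unchanged; the text does not fix the update rule at this point.

```python
def run_over_subsets(subsets, initial_weights, update_listeners, weigh_labels):
    """Visit each test case subset once, carrying listener weights forward.

    update_listeners and weigh_labels are caller-supplied (hypothetical)
    callables standing in for the two weight-determination steps.
    """
    listener_weights = dict(initial_weights)
    label_weights = {}
    for subset in subsets:
        # Target weights for this subset, derived from the current initial weights.
        listener_weights = update_listeners(listener_weights, subset)
        # Weights of the labeling results of every test case in this subset.
        label_weights.update(weigh_labels(listener_weights, subset))
        # The target weights now serve as initial weights for the next subset.
    return label_weights, listener_weights


def keep_weights(listener_weights, subset):
    # Placeholder update rule (assumption): weights are left unchanged.
    return dict(listener_weights)


def sum_votes(listener_weights, subset):
    # Toy stand-in: a label's weight is the sum of the weights of its voters.
    result = {}
    for case_id, picks in subset.items():
        totals = {}
        for listener, label in picks.items():
            totals[label] = totals.get(label, 0.0) + listener_weights[listener]
        result[case_id] = totals
    return result


subsets = [
    {"case_1": {"u1": "A", "u2": "B"}},
    {"case_2": {"u1": "A", "u2": "A"}},
]
label_weights, final_weights = run_over_subsets(
    subsets, {"u1": 0.5, "u2": 0.5}, keep_weights, sum_votes
)
```

Passing the two steps in as callables keeps the loop itself independent of how listener weights are actually updated.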
Optionally, determining the target weights respectively corresponding to the multiple listeners, based on the initial weights respectively corresponding to the multiple listeners and the best manual prosodic labeling result chosen by each listener for each test case in the target test case subset, comprises:
Determining the test case quantity ratio corresponding to any two listeners from the best manual prosodic labeling result chosen by each listener for each test case in the target test case subset, where the test case quantity ratio corresponding to any two listeners is the ratio of the number of test cases in the target test case subset for which the two listeners chose the same best manual prosodic labeling result to the total number of test cases in the target test case subset;
Determining the target weights respectively corresponding to the multiple listeners, based on the initial weights respectively corresponding to the multiple listeners and the test case quantity ratio corresponding to any two listeners.
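The test case quantity ratio defined above can be computed as in the sketch below. The function name, the listener and test case identifiers, and the data layout are assumptions made for illustration. The sketch covers only the pairwise ratio; how the ratios are combined with the initial weights to produce target weights is not specified in the passage above, so it is left out.

```python
from itertools import combinations


def agreement_ratios(choices):
    """Return the test case quantity ratio for every pair of listeners.

    choices maps listener -> {test_case_id: chosen best labeling result}
    for one target test case subset (hypothetical layout).
    """
    listeners = sorted(choices)
    case_ids = list(next(iter(choices.values())))
    ratios = {}
    for a, b in combinations(listeners, 2):
        # Count the test cases on which both listeners chose the same result.
        same = sum(1 for c in case_ids if choices[a][c] == choices[b][c])
        ratios[(a, b)] = same / len(case_ids)
    return ratios


choices = {
    "listener_1": {"case_1": "A", "case_2": "B", "case_3": "A", "case_4": "C"},
    "listener_2": {"case_1": "A", "case_2": "B", "case_3": "B", "case_4": "C"},
    "listener_3": {"case_1": "B", "case_2": "B", "case_3": "A", "case_4": "A"},
}
ratios = agreement_ratios(choices)
# listener_1 and listener_2 agree on 3 of 4 test cases, so their ratio is 0.75.
```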
Optionally, determining, from the target weights respectively corresponding to the listeners, the weight of each manual prosodic labeling result in the manual prosodic labeling result set corresponding to each test case in the target test case subset comprises:
For any manual prosodic labeling result in the manual prosodic labeling result set corresponding to any test case in the target test case subset:
Determining the weight of the manual prosodic labeling result from the target weights of the listeners who chose that result as the best manual prosodic labeling result;
Thereby obtaining the weight of each manual prosodic labeling result in the manual prosodic labeling result set corresponding to each test case in the target test case subset.
Optionally, determining the weight of the manual prosodic labeling result from the target weights of the listeners who chose that result as the best manual prosodic labeling result comprises:
Summing the target weights of the listeners who chose the manual prosodic labeling result as the best manual prosodic labeling result, and taking the sum as the weight of that manual prosodic labeling result.
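The summation rule above can be illustrated for a single test case as follows. The function and variable names and the example weights are assumptions chosen for the example. A labeling result that no listener chose receives a weight of 0, which follows from summing over an empty set of listeners.

```python
def label_weights(labels, case_choices, target_weights):
    """Weight each labeling result of one test case.

    labels: all manual prosodic labeling results of the test case.
    case_choices: listener -> the result that listener chose as best.
    target_weights: listener -> target weight.
    """
    weights = {label: 0.0 for label in labels}
    for listener, label in case_choices.items():
        # Each listener adds their target weight to the result they chose.
        weights[label] += target_weights[listener]
    return weights


target_weights = {"listener_1": 0.5, "listener_2": 0.25, "listener_3": 0.25}
case_choices = {"listener_1": "A", "listener_2": "B", "listener_3": "A"}
weights = label_weights(["A", "B", "C"], case_choices, target_weights)
# Result "A" was chosen by listener_1 and listener_3: 0.5 + 0.25 = 0.75.
```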
Optionally, determining the weights of the prosody prediction results respectively corresponding to the multiple test cases, based on the previously obtained weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases, comprises:
For the prosody prediction result corresponding to any test case:
Determining, from the manual prosodic labeling result set corresponding to the test case, the manual prosodic labeling result that is consistent with the prosody prediction result corresponding to the test case, and taking the weight of the determined manual prosodic labeling result as the weight of the prosody prediction result corresponding to the test case;
Thereby obtaining the weights of the prosody prediction results respectively corresponding to the multiple test cases.
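A minimal sketch of this lookup is given below. It rests on two assumptions that the text does not state: "consistent" is treated as exact equality of the boundary-marked strings, and a prediction that matches no manual labeling result is given a weight of 0. The `|` boundary notation and the example sentence are likewise hypothetical.

```python
def prediction_weight(prediction, labeled_weights, default=0.0):
    """Return the weight of the manual labeling result matching the prediction.

    labeled_weights maps each manual prosodic labeling result of one test
    case to its weight. Exact string equality and the 0.0 fallback for an
    unmatched prediction are assumptions of this sketch.
    """
    return labeled_weights.get(prediction, default)


# Hypothetical notation: "|" marks a prosodic boundary position.
labeled_weights = {
    "the weather | is nice today": 0.75,
    "the weather is nice | today": 0.25,
}
matched = prediction_weight("the weather is nice | today", labeled_weights)
unmatched = prediction_weight("the | weather is nice today", labeled_weights)
```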
Optionally, determining the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated according to the prosody prediction results respectively corresponding to the multiple test cases and the weights of those prosody prediction results comprises:
Determining the score of the prosody prediction effect of the prosody prediction engine to be evaluated from the prosody prediction results respectively corresponding to the multiple test cases and the weights of those prosody prediction results;
Determining the ratio of the score of the prosody prediction effect of the prosody prediction engine to be evaluated to the manual top score, as the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated;
Where the manual top score is obtained by summing the maximum weights corresponding to the test cases, and the maximum weight corresponding to any test case is the largest of the weights of the manual prosodic labeling results in the manual prosodic labeling result set corresponding to that test case.
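The ratio described above can be sketched as follows. The manual top score follows the definition given (the sum of each test case's maximum weight). The engine score is assumed here to be the sum of the weights of the engine's prediction results, with unmatched predictions counted as 0; the passage only says the score is determined from the results and their weights, so both choices are assumptions of this sketch, as are all names.

```python
def evaluate(predictions, labeled_weights):
    """Return the engine score as a fraction of the manual top score.

    predictions: test_case_id -> prosody prediction result of the engine.
    labeled_weights: test_case_id -> {manual labeling result: weight}.
    """
    # Assumption: engine score = sum of the weights of its predictions.
    score = sum(
        labeled_weights[case_id].get(prediction, 0.0)
        for case_id, prediction in predictions.items()
    )
    # Manual top score: sum of the maximum weight of every test case.
    top_score = sum(max(labeled_weights[c].values()) for c in predictions)
    return score / top_score


labeled_weights = {
    "case_1": {"A": 0.75, "B": 0.25},
    "case_2": {"A": 0.5, "B": 0.5},
}
predictions = {"case_1": "B", "case_2": "A"}
result = evaluate(predictions, labeled_weights)
# score = 0.25 + 0.5 = 0.75; manual top score = 0.75 + 0.5 = 1.25.
```

Under these assumptions, an engine that always reproduces the highest-weighted manual labeling result would obtain a ratio of 1.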
A device for evaluating prosody prediction effect, comprising: a prosody prediction result obtaining module, a prosody prediction result weight determining module, and a prosody prediction effect evaluation module;
The prosody prediction result obtaining module is configured to obtain the prosody prediction results respectively corresponding to multiple test cases in a test case set, where the prosody prediction result corresponding to each test case is predicted by the prosody prediction engine to be evaluated;
The prosody prediction result weight determining module is configured to determine the weights of the prosody prediction results respectively corresponding to the multiple test cases, based on the previously obtained weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases, where the manual prosodic labeling result set corresponding to any test case includes at least one manual prosodic labeling result for that test case, and the weight of any manual prosodic labeling result characterizes the degree of reasonableness of that result;
The prosody prediction effect evaluation module is configured to determine the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated according to the prosody prediction results respectively corresponding to the multiple test cases and the weights of those prosody prediction results.
The device for evaluating prosody prediction effect further comprises: a manual prosodic labeling result set obtaining module, an audio synthesis module, and a manual prosodic labeling result weight determining module;
The manual prosodic labeling result set obtaining module is configured to obtain the manual prosodic labeling result sets respectively corresponding to the multiple test cases;
The audio synthesis module is configured to synthesize audio from each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases, to obtain the synthesized audio sets respectively corresponding to the multiple test cases;
The manual prosodic labeling result weight determining module is configured to determine the weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases, according to the best manual prosodic labeling result chosen by each of multiple listeners for each test case; where the best manual prosodic labeling result chosen by any listener for any test case is the result that the listener selects from the manual prosodic labeling result set corresponding to that test case after listening to each synthesized audio in the synthesized audio set corresponding to that test case.
Optionally, the manual prosodic labeling result weight determining module comprises: a grouping submodule, an obtaining submodule, a first weight determining submodule, a second weight determining submodule, and an initial weight determining submodule;
The grouping submodule is configured to divide the test cases in the test case set into multiple groups, each group forming one test case subset, to obtain multiple test case subsets;
The obtaining submodule is configured to obtain, from the multiple test case subsets, one test case subset that has not yet been obtained as the target test case subset;
The first weight determining submodule is configured to determine the target weights respectively corresponding to the multiple listeners, based on the initial weights respectively corresponding to the multiple listeners and the best manual prosodic labeling result chosen by each listener for each test case in the target test case subset;
The second weight determining submodule is configured to determine, from the target weights respectively corresponding to the listeners, the weight of each manual prosodic labeling result in the manual prosodic labeling result set corresponding to each test case in the target test case subset;
The initial weight determining submodule is configured to take the target weights respectively corresponding to the multiple listeners as the initial weights respectively corresponding to the multiple listeners, and then trigger the obtaining submodule to obtain, from the multiple test case subsets, one test case subset that has not yet been obtained as the target test case subset, until no unobtained test case subset remains, thereby obtaining the weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases.
Optionally, the first weight determining submodule is specifically configured to determine the test case quantity ratio corresponding to any two listeners from the best manual prosodic labeling result chosen by each listener for each test case in the target test case subset, and to determine the target weights respectively corresponding to the multiple listeners based on the initial weights respectively corresponding to the multiple listeners and the test case quantity ratio corresponding to any two listeners;
Where the test case quantity ratio corresponding to any two listeners is the ratio of the number of test cases in the target test case subset for which the two listeners chose the same best manual prosodic labeling result to the total number of test cases in the target test case subset.
Optionally, the second weight determining submodule is specifically configured to, for any manual prosodic labeling result in the manual prosodic labeling result set corresponding to any test case in the target test case subset, determine the weight of the manual prosodic labeling result from the target weights of the listeners who chose that result as the best manual prosodic labeling result, thereby obtaining the weight of each manual prosodic labeling result in the manual prosodic labeling result set corresponding to each test case in the target test case subset.
Optionally, the prosody prediction result weight determining module is specifically configured to, for the prosody prediction result corresponding to any test case, determine, from the manual prosodic labeling result set corresponding to the test case, the manual prosodic labeling result that is consistent with the prosody prediction result corresponding to the test case, and take the weight of the determined manual prosodic labeling result as the weight of the prosody prediction result corresponding to the test case, thereby obtaining the weights of the prosody prediction results respectively corresponding to the multiple test cases.
Optionally, the prosody prediction effect evaluation module comprises: a prosody prediction effect score determining submodule and an evaluation result determining submodule;
The prosody prediction effect score determining submodule is configured to determine the score of the prosody prediction effect of the prosody prediction engine to be evaluated from the prosody prediction results respectively corresponding to the multiple test cases and the weights of those prosody prediction results;
The evaluation result determining submodule is configured to determine the ratio of the score of the prosody prediction effect of the prosody prediction engine to be evaluated to the manual top score, as the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated; where the manual top score is obtained by summing the maximum weights corresponding to the test cases, and the maximum weight corresponding to any test case is the largest of the weights of the manual prosodic labeling results in the manual prosodic labeling result set corresponding to that test case.
An apparatus for evaluating prosody prediction effect, comprising: a memory and a processor;
The memory is configured to store a program;
The processor is configured to execute the program to implement each step of the method for evaluating prosody prediction effect.
A readable storage medium storing a computer program which, when executed by a processor, implements each step of the method for evaluating prosody prediction effect.
As can be seen from the above solution, the method and device for evaluating prosody prediction effect provided by this application first obtain the prosody prediction results respectively corresponding to multiple test cases in a test case set, then determine the weights of those prosody prediction results based on the previously obtained weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases, and finally determine the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated according to the weights of the prosody prediction results respectively corresponding to the multiple test cases. The evaluation method provided by this application can therefore evaluate the prosody prediction effect of the engine to be evaluated automatically, based on the weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to the multiple test cases in the test case set. Compared with the existing manual evaluation approach, it avoids the influence of subjective factors on the evaluation result, saves labor, and reduces evaluation time.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flow diagram of the method for evaluating prosody prediction effect provided by an embodiment of this application;
Fig. 2 is a flow diagram, provided by an embodiment of this application, of obtaining the weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to multiple test cases;
Fig. 3 is a flow diagram, provided by an embodiment of this application, of determining the weight of each manual prosodic labeling result in the manual prosodic labeling result sets respectively corresponding to multiple test cases according to the best manual prosodic labeling result chosen by each of multiple listeners for each test case;
Fig. 4 is a flow diagram, provided by an embodiment of this application, of determining the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated according to the weights of the prosody prediction results respectively corresponding to multiple test cases;
Fig. 5 is a structural diagram of the device for evaluating prosody prediction effect provided by an embodiment of this application;
Fig. 6 is a structural diagram of the apparatus for evaluating prosody prediction effect provided by an embodiment of this application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In the existing manual evaluation method, multiple evaluators assess the prosody prediction result corresponding to each test case in the test case set (that is, the prosody prediction results produced by the prosody prediction engine to be evaluated for the test case set). Specifically, the evaluators proofread the prosodic labeling of the prosody prediction results, or of the audio synthesized from them, pick out the unacceptable prosody prediction results among those corresponding to the test cases, and then compute the unacceptable rate.
The inventors found through research that the above manual evaluation approach is highly subjective. For example, when different evaluators listen to audio synthesized from prosody prediction results, their listening results differ by as much as 25%, and even the same evaluator's listening results for the same audio differ considerably across different time periods. This lowers the credibility of the evaluation result. Moreover, manual evaluation requires considerable manpower and a long evaluation time, that is, its labor and time costs are high.
In addition, the manually labeled data in the above manual evaluation method cannot be reused. That is, both before and after the prosody prediction engine is optimized, evaluators must proofread the prosodic labeling of the engine's prosody prediction results, or of the audio synthesized from them, to pick out the unacceptable prosody prediction results; the data labeled before the engine is optimized has no value for evaluating the prosody prediction results of the optimized engine.
In view of the problems of the above manual evaluation approach, the inventors conducted in-depth research and finally proposed an effective method for evaluating the prediction effect of a prosody prediction engine. The method is applicable to scenarios in which the prediction effect of a prosody prediction engine is evaluated, and it can evaluate the prediction effect of the engine to be evaluated automatically, efficiently, and objectively. The method can be applied to a terminal or to a server. The method for evaluating prosody prediction effect provided by this application is introduced through the following embodiments.
Referring to Fig. 1, the flow diagram of the appraisal procedure of prosody prediction effect provided by the embodiments of the present application is shown, This method may include:
Step S101: it obtains test case and concentrates the corresponding prosody prediction result of multiple test cases.
Wherein, test case concentrate include chosen from the different field of user's scene in preset big data ratio it is multiple Test case, the corresponding prosody prediction result of each test case are predicted to obtain by prosody prediction engine to be assessed.
Step S102: based on every in the corresponding artificial prosodic labeling result set of being obtained ahead of time, multiple test cases The weight of a artificial prosodic labeling result, determines the weight of the corresponding prosody prediction result of multiple test cases.
Wherein, the corresponding artificial prosodic labeling result set of any test case, the corresponding artificial rhythm of any test case Annotation results collection includes at least one corresponding artificial prosodic labeling result of the test case.
In view of same test case there may be multiple reasonable prosodic labelings as a result, therefore, the application is for any Test case can obtain multiple reasonable artificial prosodic labeling results and form the corresponding artificial prosodic labeling result of the test case Collection, wherein multiple reasonable artificial prosodic labeling results can carry out rhythm boundary to the test case by multiple mark personnel Position marks to obtain.
Wherein, the weight of any prosodic labeling result can characterize the resonable degree of the prosodic labeling result.
Test case is obtained in advance concentrates in the corresponding artificial prosodic labeling result set of multiple test cases everyone The realization process of the weight of work prosodic labeling result can be found in the explanation of subsequent embodiment.
Step S103: according to the prosody prediction results corresponding to the multiple test cases and the weights of those prosody prediction results, determine the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed.
Specifically, the score of the prosody prediction effect of the prosody prediction engine to be assessed can be determined from the prosody prediction results corresponding to the multiple test cases in the test case set and the weights of those prosody prediction results, and the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed is then determined based on that score.
The assessment method of prosody prediction effect provided by the embodiments of the present application first obtains the prosody prediction result corresponding to each of multiple test cases in the test case set; then, based on the pre-obtained weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases, determines the weight of the prosody prediction result corresponding to each test case; and finally determines the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed according to the weights of those prosody prediction results. The method can therefore assess the prosody prediction effect of the prosody prediction engine to be assessed automatically, based on the weights of the artificial prosodic labeling results. Compared with the existing manual evaluation mode, it not only avoids the influence of subjective factors on the assessment result but also saves manpower and reduces assessment time.
In addition, in the present embodiment, the artificial prosodic labeling result sets corresponding to the multiple test cases in the test case set, and the weight of each artificial prosodic labeling result in those sets, need to be obtained only once. They can then be used to assess the prediction effect of every version of the prosody prediction engine; that is, both the artificial prosodic labeling result sets and the weights of the artificial prosodic labeling results in them are reusable.
Next, the process of obtaining in advance the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases in the test case set is introduced.
Referring to Fig. 2, which shows a flow diagram of obtaining the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases in the test case set, the process may include:
Step S201: obtain the artificial prosodic labeling result set corresponding to each of the multiple test cases in the test case set.
The artificial prosodic labeling results in the artificial prosodic labeling result set corresponding to any test case are obtained by multiple labeling personnel marking the prosodic phrase boundary positions of that test case.
Step S202: perform audio synthesis on each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases, obtaining the synthesized audio set corresponding to each of the multiple test cases.
Specifically, for any test case, audio synthesis is performed separately on each artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to that test case, and the set formed by the synthesized audio is taken as the synthesized audio set corresponding to that test case. In this way the synthesized audio set corresponding to each of the multiple test cases is obtained.
Step S203: according to the best artificial prosodic labeling result chosen by each of multiple audiometry personnel for each test case, determine the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases.
The best artificial prosodic labeling result chosen by any audiometry person for any test case is the one that person selects from the artificial prosodic labeling result set corresponding to the test case after listening to each synthesized audio in the synthesized audio set corresponding to that test case. It should be noted that, when any audiometry person listens to the synthesized audio set corresponding to any test case, that person picks the best audio from the set, and the artificial prosodic labeling result corresponding to the best audio is the best artificial prosodic labeling result.
Next, "step S203: according to the best artificial prosodic labeling result chosen by each of multiple audiometry personnel for each test case, determine the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases" is introduced.
Referring to Fig. 3, which shows a flow diagram of determining the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases according to the best artificial prosodic labeling result chosen by each of multiple audiometry personnel for each test case, the process may include:
Step S301: divide the test cases in the test case set into multiple groups, each group of test cases forming one test case subset, to obtain multiple test case subsets.
Illustratively, if the test case set includes 100 test cases, the 100 test cases can be divided into 5 groups of 20 test cases each, which yields 5 test case subsets.
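The grouping in step S301 can be sketched as follows. The source does not say how test cases are assigned to groups, so this minimal sketch assumes contiguous groups of near-equal size; the function name `split_into_subsets` is hypothetical.

```python
def split_into_subsets(test_cases, num_groups):
    """Split test_cases into num_groups contiguous subsets of near-equal size.

    Assumption: contiguous grouping; the source only says the set is divided
    into multiple groups.
    """
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    base, extra = divmod(len(test_cases), num_groups)
    subsets, start = [], 0
    for g in range(num_groups):
        # The first `extra` groups take one additional test case each.
        size = base + (1 if g < extra else 0)
        subsets.append(test_cases[start:start + size])
        start += size
    return subsets


# 100 placeholder test case ids divided into 5 subsets.
subsets = split_into_subsets(list(range(100)), 5)
```

Any other partition (for example a random one) would serve equally well for the later steps, since each subset is processed independently.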
Step S302: obtain, from the multiple test case subsets, one test case subset that has not yet been obtained, as the target test case subset.
Step S303: based on the initial weights corresponding to the multiple audiometry personnel and the best artificial prosodic labeling result chosen by each audiometry person for each test case in the target test case subset, determine the target weights corresponding to the multiple audiometry personnel.
Specifically, the implementation of step S303 may include:
Step S3031: from the best artificial prosodic labeling result chosen by each audiometry person for each test case in the target test case subset, determine the test case quantity ratio corresponding to every pair of audiometry personnel.
The test case quantity ratio corresponding to any two audiometry personnel is the ratio of the number of test cases in the target test case subset for which the two audiometry personnel chose the same best artificial prosodic labeling result to the total number of test cases in the target test case subset.
Illustratively, suppose the target test case subset includes 3 test cases A, B, and C, whose artificial prosodic labeling result sets are {a1, a2, a3}, {b1, b2, b3}, and {c1, c2, c3} respectively. Assume audiometry person x chooses a2 as the best artificial prosodic labeling result for test case A, b1 for test case B, and c3 for test case C, while audiometry person y chooses a1 for test case A, b1 for test case B, and c3 for test case C. Then the number of test cases among A, B, and C for which x and y chose the same best artificial prosodic labeling result is 2, the total number of test cases in the target test case subset is 3, and the test case quantity ratio corresponding to x and y is 2/3.
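The example above can be reproduced with a minimal sketch. The function name `agreement_ratio` and the dictionary layout (test case id mapped to the chosen labeling result id) are assumptions made for illustration only.

```python
def agreement_ratio(choices_a, choices_b):
    """Test case quantity ratio for two audiometry persons.

    Each argument maps a test case id to the id of the labeling result that
    the person chose as best for that test case.
    """
    if choices_a.keys() != choices_b.keys():
        raise ValueError("both persons must have rated the same test cases")
    if not choices_a:
        raise ValueError("the test case subset must not be empty")
    same = sum(1 for case in choices_a if choices_a[case] == choices_b[case])
    return same / len(choices_a)


# Choices of audiometry persons x and y for test cases A, B, C.
choices_x = {"A": "a2", "B": "b1", "C": "c3"}
choices_y = {"A": "a1", "B": "b1", "C": "c3"}
ratio_xy = agreement_ratio(choices_x, choices_y)  # 2 matching cases out of 3
```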
Step S3032: based on the initial weights corresponding to the multiple audiometry personnel and the test case quantity ratio corresponding to every pair of audiometry personnel, determine the target weights corresponding to the multiple audiometry personnel.
The higher the target weight corresponding to an audiometry person, the richer that person's experience in listening evaluation.
Assume the number of audiometry personnel is $N_p$. In the present embodiment, the initial weights corresponding to the $N_p$ audiometry personnel can form an $N_p$-dimensional column vector that serves as the initial probability distribution corresponding to the target test case subset, and the test case quantity ratios of every pair of audiometry personnel can form an $N_p \times N_p$ matrix that serves as the transfer matrix corresponding to the target test case subset. It should be noted that, for the first target test case subset, the value of every element of the corresponding initial probability distribution is $1/N_p$.
For a target test case subset $S_i$, after the initial probability distribution $V_{i-1}$ and the transfer matrix $M_i$ corresponding to $S_i$ are obtained, several iterations can be carried out according to $\lim_{n\to\infty} M_i^{n} V_{i-1}$ until the calculated result stabilizes. The final result is the probability distribution $V_i$ corresponding to $S_i$, and the values of the elements of $V_i$ are the target weights corresponding to the respective audiometry personnel, where $i$ ranges from 1 to $Q$ and $Q$ is the number of test case subsets.
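The iteration can be sketched as a simple repeated matrix-vector product. Two points are assumptions, because the source does not specify them: the diagonal of the transfer matrix is set to 1.0 (a person always agrees with themselves), and the vector is rescaled to sum to 1 after every step so that it remains a probability distribution and the iteration can settle. The names `build_transfer_matrix` and `iterate_weights` are hypothetical.

```python
def build_transfer_matrix(ratios, n):
    """Build an n x n matrix from pairwise test case quantity ratios.

    ratios maps an index pair (i, j), i < j, to the ratio for persons i and j.
    Assumption: diagonal entries are 1.0.
    """
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (i, j), r in ratios.items():
        matrix[i][j] = r
        matrix[j][i] = r
    return matrix


def iterate_weights(matrix, v_init, tol=1e-10, max_iter=1000):
    """Repeat V <- M V (rescaled to sum to 1) until V stops changing."""
    v = list(v_init)
    for _ in range(max_iter):
        nxt = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        total = sum(nxt)
        if total == 0:
            raise ValueError("weights collapsed to zero")
        nxt = [x / total for x in nxt]  # assumption: renormalize each step
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    return v


n_p = 3  # three audiometry persons: 0, 1, 2
m_1 = build_transfer_matrix({(0, 1): 2 / 3, (0, 2): 1 / 3, (1, 2): 2 / 3}, n_p)
v_0 = [1 / n_p] * n_p          # uniform initial probability distribution
v_1 = iterate_weights(m_1, v_0)  # target weights for the first subset
```

In this toy input, person 1 agrees most with the other two and therefore ends up with the largest target weight. Passing `v_1` as `v_init` for the next subset mirrors the chaining described for steps S305 and S306.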
Step S304: from the target weights corresponding to the audiometry personnel, determine the weight of each artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to each test case in the target test case subset.
Specifically, for any artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to any test case in the target test case subset, the weight of that artificial prosodic labeling result is determined from the target weights of the audiometry personnel who chose it as the best artificial prosodic labeling result. In this way the weight of each artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to each test case in the target test case subset is obtained.
Further, for any artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to any test case in the target test case subset, the target weights of the audiometry personnel who chose it as the best artificial prosodic labeling result can be summed, and the sum is taken as the weight of that artificial prosodic labeling result. In this way the weight of each artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to each test case in the target test case subset is obtained.
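The summation described above can be sketched as follows. The data layout and the name `labeling_weights` are assumptions; the target weights in the example are made-up values, not results from the source. A labeling result that nobody chose simply has no entry here, which this sketch treats as weight 0.

```python
from collections import defaultdict


def labeling_weights(choices, person_weights):
    """Sum the target weights of the persons who chose each labeling result.

    choices: {person: {test_case: chosen_labeling_result}}
    person_weights: {person: target_weight}
    Returns {test_case: {labeling_result: weight}}.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for person, picks in choices.items():
        for case, label in picks.items():
            weights[case][label] += person_weights[person]
    return {case: dict(labels) for case, labels in weights.items()}


# Hypothetical target weights for three audiometry persons.
person_weights = {"x": 0.3, "y": 0.4, "z": 0.3}
choices = {
    "x": {"A": "a2"},
    "y": {"A": "a1"},
    "z": {"A": "a2"},
}
weights_by_case = labeling_weights(choices, person_weights)
```

Here a2 receives the weights of x and z, and a1 receives the weight of y.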
Step S305: judge whether any of the multiple test case subsets has not yet been obtained. If there is still a test case subset that has not been obtained, execute step S306; if there is none, end the weight determination process.
Step S306: take the target weights corresponding to the multiple audiometry personnel as the initial weights corresponding to the multiple audiometry personnel, then return to step S302.
In the present embodiment, when calculating the weight of each artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to each test case in the first target test case subset, $[1/N_p \;\; 1/N_p \;\; \dots \;\; 1/N_p]^{T}$ is used as the initial probability distribution $V_0$ corresponding to the first target test case subset, and the probability distribution $V_1$ corresponding to the first target test case subset is determined from $V_0$ and the transfer matrix $M_1$ corresponding to the first target test case subset. When calculating the weight of each artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to each test case in the second target test case subset, $V_1$ is used as the initial probability distribution corresponding to the second target test case subset, and the probability distribution $V_2$ corresponding to the second target test case subset is determined from $V_1$ and the transfer matrix $M_2$ corresponding to the second target test case subset, and so on.
Through the above process, the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases in the test case set can be obtained.
It should be noted that the present embodiment groups the test cases in the test case set and determines weights for each group of test cases separately. Because each computation involves relatively little data, computational efficiency is high. In addition, using the probability distribution corresponding to the previous target test case subset as the initial probability distribution corresponding to the next target test case subset reduces the number of iterations and lets the calculated result stabilize quickly, which further improves computational efficiency.
It should further be noted that steps S301 to S306 above are one preferred implementation of step S203. The present embodiment does not limit step S203 to being realized only through steps S301 to S306; it can also be realized in other ways. For example, the test case set need not be grouped: an initial probability distribution $V_0$ is set for the entire test case set ($V_0$ is an $N_p$-dimensional column vector in which every element is $1/N_p$), the test case quantity ratios of every pair of audiometry personnel form the transfer matrix $M$ ($N_p \times N_p$) corresponding to the entire test case set, and several iterations are carried out according to $\lim_{n\to\infty} M^{n} V_0$ until the calculated result stabilizes. The final result is the probability distribution $V$ corresponding to the entire test case set, and the values of the elements of $V$ are the target weights corresponding to the respective audiometry personnel. The weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases in the entire test case set is then determined from the target weights corresponding to the audiometry personnel.
The foregoing describes the process of obtaining in advance the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases in the test case set. Next, "step S102: based on the pre-obtained weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases, determine the weight of the prosody prediction result corresponding to each of the multiple test cases" in the above embodiment is introduced.
The process of determining the weight of the prosody prediction result corresponding to each of the multiple test cases, based on the pre-obtained weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases, may include: for the prosody prediction result corresponding to any test case, determine from the artificial prosodic labeling result set corresponding to that test case the artificial prosodic labeling result that is consistent with the prosody prediction result, and take the weight of the determined artificial prosodic labeling result as the weight of the prosody prediction result corresponding to that test case. In this way the weight of the prosody prediction result corresponding to each of the multiple test cases in the test case set is obtained.
Illustratively, suppose the prosody prediction result corresponding to a test case is x and the artificial prosodic labeling result set corresponding to the test case is {a1, a2, a3}. If prosody prediction result x is consistent with artificial prosodic labeling result a2, the weight of a2 is taken as the weight of x. It should be noted that x being consistent with a2 can mean that the prosodic phrase boundary positions in x are identical to those in a2.
It should be noted that, for the prosody prediction result corresponding to any test case, if the artificial prosodic labeling result set corresponding to that test case contains no artificial prosodic labeling result consistent with the prosody prediction result, the weight of the prosody prediction result corresponding to that test case is set to 0.
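The lookup, including the fallback to 0, can be sketched as below. Representing a prosodic result as a tuple of phrase boundary positions is an assumption made for illustration, as are the name `prediction_weight` and the example weights.

```python
def prediction_weight(predicted, labeled_results):
    """Weight of the labeling result whose boundaries equal the prediction.

    predicted: sequence of prosodic phrase boundary positions from the engine.
    labeled_results: list of (boundaries, weight) pairs for the same test case.
    Returns 0.0 when no labeling result is consistent with the prediction.
    """
    target = tuple(predicted)
    for boundaries, weight in labeled_results:
        if tuple(boundaries) == target:
            return weight
    return 0.0


# Hypothetical labeling result set {a1, a2, a3} with made-up weights.
labeled = [((2, 5), 0.4), ((3, 5), 0.6), ((5,), 0.0)]
w_match = prediction_weight([3, 5], labeled)  # consistent with a2
w_miss = prediction_weight([1, 4], labeled)   # consistent with none
```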
After the weight of the prosody prediction result corresponding to each of the multiple test cases in the test case set is obtained, the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed can be determined according to those weights.
Referring to Fig. 4, which shows a flow diagram of determining the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed according to the prosody prediction results corresponding to the multiple test cases and the weights of those prosody prediction results, the process may include:
Step S401: from the prosody prediction results corresponding to the multiple test cases and the weights of those prosody prediction results, determine the score of the prosody prediction effect of the prosody prediction engine to be assessed.
Specifically, the weights of the prosody prediction results corresponding to the multiple test cases can be summed, and the sum, after normalization, is taken as the score of the prosody prediction effect of the prosody prediction engine to be assessed.
Step S402: determine the ratio of the score of the prosody prediction effect of the prosody prediction engine to be assessed to the artificial top score, as the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed.
The artificial top score is obtained by summing the maximum weights corresponding to the multiple test cases, where the maximum weight corresponding to any test case is the largest of the weights of the artificial prosodic labeling results in the artificial prosodic labeling result set corresponding to that test case.
In the present embodiment, the ratio of the score of the prosody prediction effect of the prosody prediction engine to be assessed to the artificial top score reflects how favorably most users regard the prosody prediction effect of the engine. It can be understood that the larger this ratio, the more favorably the prosody prediction effect of the prosody prediction engine to be assessed is regarded.
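Steps S401 and S402 can be sketched together. The source does not say how the sum in step S401 is normalized; this sketch assumes the same normalization would apply to the engine score and to the artificial top score, so it cancels in the ratio and is omitted. The name `assessment_ratio` and all example weights are hypothetical.

```python
def assessment_ratio(prediction_weights, labeling_weights_by_case):
    """Ratio of the engine's summed weights to the artificial top score.

    prediction_weights: {test_case: weight of the engine's prediction}
    labeling_weights_by_case: {test_case: {labeling_result: weight}}
    """
    engine_score = sum(prediction_weights.values())
    # Artificial top score: best achievable weight on every test case.
    top_score = sum(max(labels.values())
                    for labels in labeling_weights_by_case.values())
    if top_score == 0:
        raise ValueError("artificial top score is zero")
    return engine_score / top_score


labeling_weights_by_case = {
    "A": {"a1": 0.4, "a2": 0.6},
    "B": {"b1": 1.0},
    "C": {"c1": 0.3, "c3": 0.7},
}
# The engine matched a2 on A, matched nothing on B, and matched c1 on C.
prediction_weights = {"A": 0.6, "B": 0.0, "C": 0.3}
ratio = assessment_ratio(prediction_weights, labeling_weights_by_case)
```

An engine that always reproduces the highest-weighted labeling result would reach a ratio of 1.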
The assessment method of prosody prediction effect provided by the embodiments of the present application can assess the prosody prediction effect of the prosody prediction engine to be assessed automatically. Compared with the existing manual evaluation mode, it not only avoids the influence of subjective factors on the assessment result but also saves manpower and reduces assessment time; that is, the assessment method provided by the embodiments of the present application can assess the prosody prediction effect of the prosody prediction engine to be assessed automatically, efficiently, and objectively. In addition, in the present embodiment, the artificial prosodic labeling result sets corresponding to the multiple test cases in the test case set, and the weight of each artificial prosodic labeling result in those sets, are reusable.
The embodiments of the present application also provide an assessment device for prosody prediction effect. The assessment device provided by the embodiments of the present application is described below; the assessment device described below and the assessment method described above may be referred to in correspondence with each other.
Referring to Fig. 5, which shows a structural schematic diagram of an assessment device for prosody prediction effect provided by the embodiments of the present application, the device may include: a prosody prediction result obtaining module 501, a prosody prediction result weight determining module 502, and a prosody prediction effect assessment module 503.
The prosody prediction result obtaining module 501 is used for obtaining the prosody prediction result corresponding to each of multiple test cases in a test case set.
The prosody prediction result corresponding to each test case is obtained by prediction with the prosody prediction engine to be assessed.
The prosody prediction result weight determining module 502 is used for determining the weight of the prosody prediction result corresponding to each of the multiple test cases, based on the pre-obtained weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases.
The artificial prosodic labeling result set corresponding to any test case includes at least one artificial prosodic labeling result for that test case, and the weight of any artificial prosodic labeling result characterizes how reasonable that artificial prosodic labeling result is.
The prosody prediction effect assessment module 503 is used for determining the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed, according to the prosody prediction results corresponding to the multiple test cases and the weights of those prosody prediction results.
The assessment device for prosody prediction effect provided by the embodiments of the present application first obtains the prosody prediction result corresponding to each of multiple test cases in the test case set; then, based on the pre-obtained weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases, determines the weight of the prosody prediction result corresponding to each test case; and finally determines the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed according to the weights of those prosody prediction results. The device can therefore assess the prosody prediction effect of the prosody prediction engine to be assessed automatically, based on the weights of the artificial prosodic labeling results. Compared with the existing manual evaluation mode, it not only avoids the influence of subjective factors on the assessment result but also saves manpower and reduces assessment time.
In one possible implementation, the assessment device for prosody prediction effect provided by the above embodiment further includes: an artificial prosodic labeling result set obtaining module, an audio synthesis module, and an artificial prosodic labeling result weight determining module.
The artificial prosodic labeling result set obtaining module is used for obtaining the artificial prosodic labeling result set corresponding to each of the multiple test cases.
The audio synthesis module is used for performing audio synthesis on each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases, obtaining the synthesized audio set corresponding to each of the multiple test cases.
The artificial prosodic labeling result weight determining module is used for determining the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases, according to the best artificial prosodic labeling result chosen by each of multiple audiometry personnel for each test case.
The best artificial prosodic labeling result chosen by any audiometry person for any test case is the one that person selects from the artificial prosodic labeling result set corresponding to the test case after listening to each synthesized audio in the synthesized audio set corresponding to that test case.
In one possible implementation, the artificial prosodic labeling result weight determining module includes: a grouping submodule, an obtaining submodule, a first weight determining submodule, a second weight determining submodule, and an initial weight determining submodule;
The grouping submodule is used for dividing the test cases in the test case set into multiple groups, each group of test cases forming one test case subset, to obtain multiple test case subsets;
The obtaining submodule is used for obtaining, from the multiple test case subsets, one test case subset that has not yet been obtained, as the target test case subset;
The first weight determining submodule is used for determining the target weights corresponding to the multiple audiometry personnel, based on the initial weights corresponding to the multiple audiometry personnel and the best artificial prosodic labeling result chosen by each audiometry person for each test case in the target test case subset;
The second weight determining submodule is used for determining, from the target weights corresponding to the audiometry personnel, the weight of each artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to each test case in the target test case subset;
The initial weight determining submodule is used for determining the target weights corresponding to the multiple audiometry personnel as the initial weights corresponding to the multiple audiometry personnel, and then triggering the obtaining submodule to obtain, from the multiple test case subsets, one test case subset that has not yet been obtained as the target test case subset, until no test case subset remains that has not been obtained, so as to obtain the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets corresponding to the multiple test cases in the test case set.
In one possible implementation, the first weight determining submodule is specifically used for: determining the test case quantity ratio corresponding to every pair of audiometry personnel from the best artificial prosodic labeling result chosen by each audiometry person for each test case in the target test case subset; and determining the target weights corresponding to the multiple audiometry personnel based on the initial weights corresponding to the multiple audiometry personnel and the test case quantity ratio corresponding to every pair of audiometry personnel;
The test case quantity ratio of any two audiometry personnel is the ratio of the number of test cases in the target test case subset for which the two audiometry personnel chose the same best artificial prosodic labeling result to the total number of test cases in the target test case subset.
In one possible implementation, the second weight determining submodule is specifically used for: for any artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to any test case in the target test case subset, determining the weight of that artificial prosodic labeling result from the target weights of the audiometry personnel who chose it as the best artificial prosodic labeling result, so as to obtain the weight of each artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to each test case in the target test case subset.
In one possible implementation, when determining the weight of an artificial prosodic labeling result from the target weights of the audiometry personnel who chose it as the best artificial prosodic labeling result, the second weight determining submodule is specifically used for summing the target weights of those audiometry personnel and taking the sum as the weight of that artificial prosodic labeling result.
In one possible implementation, in the assessment device for prosody prediction effect provided by the above embodiment, the prosody prediction result weight determining module 502 is specifically used for: for the prosody prediction result corresponding to any test case, determining from the artificial prosodic labeling result set corresponding to that test case the artificial prosodic labeling result that is consistent with the prosody prediction result, and taking the weight of the determined artificial prosodic labeling result as the weight of the prosody prediction result corresponding to that test case, so as to obtain the weight of the prosody prediction result corresponding to each of the multiple test cases.
In one possible implementation, in the assessment device for prosody prediction effect provided by the above embodiment, the prosody prediction effect assessment module 503 may include: a prosody prediction effect score determining submodule and an assessment result determining submodule.
The prosody prediction effect score determining submodule is used for determining the score of the prosody prediction effect of the prosody prediction engine to be assessed, from the prosody prediction results corresponding to the multiple test cases and the weights of those prosody prediction results.
The assessment result determining submodule is used for determining the ratio of the score of the prosody prediction effect of the prosody prediction engine to be assessed to the artificial top score, as the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed.
The artificial top score is obtained by summing the maximum weights corresponding to the multiple test cases, where the maximum weight corresponding to any test case is the largest of the weights of the artificial prosodic labeling results in the artificial prosodic labeling result set corresponding to that test case.
An embodiment of the present application also provides assessment equipment for the prosody prediction effect. Referring to Fig. 6, which shows a structural schematic diagram of the assessment equipment, the assessment equipment may include: at least one processor 601, at least one communication interface 602, at least one memory 603 and at least one communication bus 604;
In the embodiment of the present application, there is at least one of each of the processor 601, the communication interface 602, the memory 603 and the communication bus 604, and the processor 601, the communication interface 602 and the memory 603 communicate with one another via the communication bus 604;
The processor 601 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention, etc.;
The memory 603 may include high-speed RAM, and may further include non-volatile memory, for example at least one magnetic disk memory;
The memory stores a program, and the processor can call the program stored in the memory. The program is used for:
Obtaining the prosody prediction results respectively corresponding to multiple test cases in a test case set, wherein the prosody prediction result corresponding to each test case is predicted by the prosody prediction engine to be assessed;
Determining, based on the pre-obtained weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets respectively corresponding to the multiple test cases, the weights of the prosody prediction results respectively corresponding to the multiple test cases, wherein the artificial prosodic labeling result set corresponding to any test case includes at least one artificial prosodic labeling result corresponding to that test case, and the weight of any artificial prosodic labeling result characterizes the degree of reasonableness of that artificial prosodic labeling result;
Determining, according to the prosody prediction results respectively corresponding to the multiple test cases and the weights of those prosody prediction results, the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed.
Optionally, for the refined and extended functions of the program, refer to the description above.
An embodiment of the present application also provides a readable storage medium, which may store a program suitable for execution by a processor, the program being used for:
Obtaining the prosody prediction results respectively corresponding to multiple test cases in a test case set, wherein the prosody prediction result corresponding to each test case is predicted by the prosody prediction engine to be assessed;
Determining, based on the pre-obtained weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets respectively corresponding to the multiple test cases, the weights of the prosody prediction results respectively corresponding to the multiple test cases, wherein the artificial prosodic labeling result set corresponding to any test case includes at least one artificial prosodic labeling result corresponding to that test case, and the weight of any artificial prosodic labeling result characterizes the degree of reasonableness of that artificial prosodic labeling result;
Determining, according to the prosody prediction results respectively corresponding to the multiple test cases and the weights of those prosody prediction results, the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed.
Finally, it should also be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (15)

1. A method for assessing prosody prediction effect, characterized by comprising:
obtaining prosody prediction results respectively corresponding to multiple test cases in a test case set, wherein the prosody prediction result corresponding to each test case is predicted by a prosody prediction engine to be assessed;
determining, based on the pre-obtained weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets respectively corresponding to the multiple test cases, the weights of the prosody prediction results respectively corresponding to the multiple test cases, wherein the artificial prosodic labeling result set corresponding to any test case includes at least one artificial prosodic labeling result corresponding to that test case, and the weight of any artificial prosodic labeling result characterizes the degree of reasonableness of that artificial prosodic labeling result;
determining, according to the prosody prediction results respectively corresponding to the multiple test cases and the weights of those prosody prediction results, the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed.
2. The method for assessing prosody prediction effect according to claim 1, characterized in that obtaining the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets respectively corresponding to the multiple test cases comprises:
obtaining the artificial prosodic labeling result sets respectively corresponding to the multiple test cases;
performing audio synthesis on each artificial prosodic labeling result in the artificial prosodic labeling result sets respectively corresponding to the multiple test cases, to obtain synthesized audio sets respectively corresponding to the multiple test cases;
determining, according to the best artificial prosodic labeling result chosen by each of multiple audiometry personnel for each test case, the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets respectively corresponding to the multiple test cases; wherein the best artificial prosodic labeling result chosen by any audiometry personnel for any test case is the result that the audiometry personnel selects from the artificial prosodic labeling result set corresponding to the test case after listening to each synthesized audio in the synthesized audio set corresponding to the test case.
3. The method for assessing prosody prediction effect according to claim 2, characterized in that determining, according to the best artificial prosodic labeling result chosen by each of the multiple audiometry personnel for each test case, the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets respectively corresponding to the multiple test cases comprises:
dividing the test cases in the test case set into multiple groups, each group of test cases forming one test case subset, to obtain multiple test case subsets;
obtaining, from the multiple test case subsets, one test case subset that has not yet been obtained as the target test case subset;
determining the target weights respectively corresponding to the multiple audiometry personnel, based on the initial weights respectively corresponding to the multiple audiometry personnel and the best artificial prosodic labeling result chosen by each audiometry personnel for each test case in the target test case subset;
determining, from the target weights respectively corresponding to the audiometry personnel, the weight of each artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to each test case in the target test case subset;
using the target weights respectively corresponding to the multiple audiometry personnel as the initial weights respectively corresponding to the multiple audiometry personnel, and then returning to the step of obtaining, from the multiple test case subsets, one test case subset that has not yet been obtained as the target test case subset, until no unobtained test case subset remains, so as to obtain the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets respectively corresponding to the multiple test cases.
4. The method for assessing prosody prediction effect according to claim 3, characterized in that determining the target weights respectively corresponding to the multiple audiometry personnel, based on the initial weights respectively corresponding to the multiple audiometry personnel and the best artificial prosodic labeling result chosen by each audiometry personnel for each test case in the target test case subset, comprises:
determining, from the best artificial prosodic labeling result chosen by each audiometry personnel for each test case in the target test case subset, the test case quantity ratio corresponding to any two audiometry personnel, wherein the test case quantity ratio corresponding to any two audiometry personnel is the ratio of the total number of test cases in the target test case subset for which the two audiometry personnel chose the same best artificial prosodic labeling result to the total number of test cases in the target test case subset;
determining the target weights respectively corresponding to the multiple audiometry personnel, based on the initial weights respectively corresponding to the multiple audiometry personnel and the test case quantity ratio corresponding to any two audiometry personnel.
5. The method for assessing prosody prediction effect according to claim 3, characterized in that determining, from the target weights respectively corresponding to the audiometry personnel, the weight of each artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to each test case in the target test case subset comprises:
for any artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to any test case in the target test case subset:
determining the weight of the artificial prosodic labeling result from the target weights of the audiometry personnel who chose that artificial prosodic labeling result as the best artificial prosodic labeling result;
so as to obtain the weight of each artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to each test case in the target test case subset.
6. The method for assessing prosody prediction effect according to claim 5, characterized in that determining the weight of the artificial prosodic labeling result from the target weights of the audiometry personnel who chose that artificial prosodic labeling result as the best artificial prosodic labeling result comprises:
summing the target weights of the audiometry personnel who chose the artificial prosodic labeling result as the best artificial prosodic labeling result, and using the resulting sum as the weight of the artificial prosodic labeling result.
7. The method for assessing prosody prediction effect according to any one of claims 1 to 6, characterized in that determining, based on the pre-obtained weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets respectively corresponding to the multiple test cases, the weights of the prosody prediction results respectively corresponding to the multiple test cases comprises:
for the prosody prediction result corresponding to any test case:
determining, from the artificial prosodic labeling result set corresponding to the test case, the artificial prosodic labeling result that is consistent with the prosody prediction result corresponding to the test case, and using the weight of the determined artificial prosodic labeling result as the weight of the prosody prediction result corresponding to the test case;
so as to obtain the weights of the prosody prediction results respectively corresponding to the multiple test cases.
8. The method for assessing prosody prediction effect according to any one of claims 1 to 6, characterized in that determining, according to the prosody prediction results respectively corresponding to the multiple test cases and the weights of those prosody prediction results, the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed comprises:
determining the score of the prosody prediction effect of the prosody prediction engine to be assessed from the prosody prediction results respectively corresponding to the multiple test cases and the weights of those prosody prediction results;
determining the ratio of the score of the prosody prediction effect of the prosody prediction engine to be assessed to the artificial top score, as the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed;
wherein the artificial top score is obtained by summing the maximum weights corresponding to the respective test cases, and the maximum weight corresponding to any test case is the largest of the weights of the artificial prosodic labeling results in the artificial prosodic labeling result set corresponding to that test case.
9. A device for assessing prosody prediction effect, characterized by comprising: a prosody prediction result obtaining module, a prosody prediction result weight determining module and a prosody prediction effect assessment module;
the prosody prediction result obtaining module is configured to obtain prosody prediction results respectively corresponding to multiple test cases in a test case set, wherein the prosody prediction result corresponding to each test case is predicted by a prosody prediction engine to be assessed;
the prosody prediction result weight determining module is configured to determine, based on the pre-obtained weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets respectively corresponding to the multiple test cases, the weights of the prosody prediction results respectively corresponding to the multiple test cases, wherein the artificial prosodic labeling result set corresponding to any test case includes at least one artificial prosodic labeling result corresponding to that test case, and the weight of any artificial prosodic labeling result characterizes the degree of reasonableness of that artificial prosodic labeling result;
the prosody prediction effect assessment module is configured to determine, according to the prosody prediction results respectively corresponding to the multiple test cases and the weights of those prosody prediction results, the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed.
10. The device for assessing prosody prediction effect according to claim 9, characterized by further comprising: an artificial prosodic labeling result set obtaining module, an audio synthesis module and an artificial prosodic labeling result weight determining module;
the artificial prosodic labeling result set obtaining module is configured to obtain the artificial prosodic labeling result sets respectively corresponding to the multiple test cases;
the audio synthesis module is configured to perform audio synthesis on each artificial prosodic labeling result in the artificial prosodic labeling result sets respectively corresponding to the multiple test cases, to obtain synthesized audio sets respectively corresponding to the multiple test cases;
the artificial prosodic labeling result weight determining module is configured to determine, according to the best artificial prosodic labeling result chosen by each of multiple audiometry personnel for each test case, the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets respectively corresponding to the multiple test cases; wherein the best artificial prosodic labeling result chosen by any audiometry personnel for any test case is the result that the audiometry personnel selects from the artificial prosodic labeling result set corresponding to the test case after listening to each synthesized audio in the synthesized audio set corresponding to the test case.
11. The device for assessing prosody prediction effect according to claim 9, characterized in that the artificial prosodic labeling result weight determining module comprises: a grouping submodule, an obtaining submodule, a first weight determination submodule, a second weight determination submodule and an initial weight determination submodule;
the grouping submodule is configured to divide the test cases in the test case set into multiple groups, each group of test cases forming one test case subset, to obtain multiple test case subsets;
the obtaining submodule is configured to obtain, from the multiple test case subsets, one test case subset that has not yet been obtained as the target test case subset;
the first weight determination submodule is configured to determine the target weights respectively corresponding to the multiple audiometry personnel, based on the initial weights respectively corresponding to the multiple audiometry personnel and the best artificial prosodic labeling result chosen by each audiometry personnel for each test case in the target test case subset;
the second weight determination submodule is configured to determine, from the target weights respectively corresponding to the audiometry personnel, the weight of each artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to each test case in the target test case subset;
the initial weight determination submodule is configured to use the target weights respectively corresponding to the multiple audiometry personnel as the initial weights respectively corresponding to the multiple audiometry personnel, and then to trigger the obtaining submodule to obtain, from the multiple test case subsets, one test case subset that has not yet been obtained as the target test case subset, until no unobtained test case subset remains, so as to obtain the weight of each artificial prosodic labeling result in the artificial prosodic labeling result sets respectively corresponding to the multiple test cases.
12. The device for assessing prosody prediction effect according to claim 11, characterized in that the first weight determination submodule is specifically configured to: determine, from the best artificial prosodic labeling result chosen by each audiometry personnel for each test case in the target test case subset, the test case quantity ratio corresponding to any two audiometry personnel; and determine the target weights respectively corresponding to the multiple audiometry personnel, based on the initial weights respectively corresponding to the multiple audiometry personnel and the test case quantity ratio corresponding to any two audiometry personnel;
wherein the test case quantity ratio of any two audiometry personnel is the ratio of the total number of test cases in the target test case subset for which the two audiometry personnel chose the same best artificial prosodic labeling result to the total number of test cases in the target test case subset.
13. The device for assessing prosody prediction effect according to claim 11, characterized in that the second weight determination submodule is specifically configured to, for any artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to any test case in the target test case subset, determine the weight of the artificial prosodic labeling result from the target weights of the audiometry personnel who chose that artificial prosodic labeling result as the best artificial prosodic labeling result, so as to obtain the weight of each artificial prosodic labeling result in the artificial prosodic labeling result set corresponding to each test case in the target test case subset.
14. The device for assessing prosody prediction effect according to any one of claims 9 to 13, characterized in that the prosody prediction result weight determining module is specifically configured to, for the prosody prediction result corresponding to any test case: determine, from the artificial prosodic labeling result set corresponding to the test case, the artificial prosodic labeling result that is consistent with the prosody prediction result corresponding to the test case, and use the weight of the determined artificial prosodic labeling result as the weight of the prosody prediction result corresponding to the test case, so as to obtain the weights of the prosody prediction results respectively corresponding to the multiple test cases.
15. The device for assessing prosody prediction effect according to any one of claims 9 to 13, characterized in that the prosody prediction effect assessment module comprises: a prosody prediction effect score determination submodule and an assessment result determination submodule;
the prosody prediction effect score determination submodule is configured to determine the score of the prosody prediction effect of the prosody prediction engine to be assessed from the prosody prediction results respectively corresponding to the multiple test cases and the weights of those prosody prediction results;
the assessment result determination submodule is configured to determine the ratio of the score of the prosody prediction effect of the prosody prediction engine to be assessed to the artificial top score, as the assessment result of the prosody prediction effect of the prosody prediction engine to be assessed; wherein the artificial top score is obtained by summing the maximum weights corresponding to the respective test cases, and the maximum weight corresponding to any test case is the largest of the weights of the artificial prosodic labeling results in the artificial prosodic labeling result set corresponding to that test case.
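The test case quantity ratio defined in claims 4 and 12 can be illustrated with a minimal sketch. It covers only the ratio itself; how the ratios and the initial weights are combined into target weights is not shown here. The function name `quantity_ratios`, the identifiers and the sample choices are hypothetical.

```python
from itertools import combinations


def quantity_ratios(choices, subset_cases):
    """Compute the test case quantity ratio for every pair of personnel.

    choices: audiometry personnel id -> {test case id: best result chosen}.
    subset_cases: ids of the test cases in the target test case subset.
    """
    total = len(subset_cases)
    ratios = {}
    for a, b in combinations(sorted(choices), 2):
        # Count the test cases on which both persons chose the same result.
        same = sum(1 for c in subset_cases if choices[a][c] == choices[b][c])
        ratios[(a, b)] = same / total
    return ratios


# Hypothetical data: three audiometry personnel, four test cases.
subset_cases = ["t1", "t2", "t3", "t4"]
choices = {
    "A": {"t1": "r1", "t2": "r2", "t3": "r1", "t4": "r3"},
    "B": {"t1": "r1", "t2": "r2", "t3": "r2", "t4": "r3"},
    "C": {"t1": "r2", "t2": "r2", "t3": "r1", "t4": "r1"},
}
ratios = quantity_ratios(choices, subset_cases)
print(ratios)
```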
CN201910461506.5A 2019-05-30 2019-05-30 Method and device for evaluating rhythm prediction effect Active CN110176225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910461506.5A CN110176225B (en) 2019-05-30 2019-05-30 Method and device for evaluating rhythm prediction effect

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910461506.5A CN110176225B (en) 2019-05-30 2019-05-30 Method and device for evaluating rhythm prediction effect

Publications (2)

Publication Number Publication Date
CN110176225A true CN110176225A (en) 2019-08-27
CN110176225B CN110176225B (en) 2021-08-13

Family

ID=67696566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910461506.5A Active CN110176225B (en) 2019-05-30 2019-05-30 Method and device for evaluating rhythm prediction effect

Country Status (1)

Country Link
CN (1) CN110176225B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110797005A (en) * 2019-11-05 2020-02-14 百度在线网络技术(北京)有限公司 Prosody prediction method, apparatus, device, and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101000765A (en) * 2007-01-09 2007-07-18 黑龙江大学 Speech synthetic method based on rhythm character
CN101051458A (en) * 2006-04-04 2007-10-10 中国科学院自动化研究所 Rhythm phrase predicting method based on module analysis
CN101131818A (en) * 2006-07-31 2008-02-27 株式会社东芝 Speech synthesis apparatus and method
US20100075806A1 (en) * 2008-03-24 2010-03-25 Michael Montgomery Biorhythm feedback system and method
CN104485115A (en) * 2014-12-04 2015-04-01 上海流利说信息技术有限公司 Pronunciation evaluation equipment, method and system
CN105244020A (en) * 2015-09-24 2016-01-13 百度在线网络技术(北京)有限公司 Prosodic hierarchy model training method, text-to-speech method and text-to-speech device

Also Published As

Publication number Publication date
CN110176225B (en) 2021-08-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant