CN106157974A

CN106157974A - Text recitation quality assessment device and method

Info

Publication number: CN106157974A
Application number: CN201510161880.5A
Authority: CN
Inventors: 石自强; 刘汝杰
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-04-07
Filing date: 2015-04-07
Publication date: 2016-11-23

Abstract

The present disclosure relates to a text recitation quality assessment device and method. The device includes: an acquiring unit for obtaining a text recitation feature curve produced by reciting a text; a segmentation unit for segmenting the text recitation feature curve to obtain a word recitation feature curve of each word in the text; a prosody score acquiring unit for comparing the word recitation feature curve of each word with the standard word feature curve of that word to obtain a prosody score of each word; an acoustic score acquiring unit for determining the recitation accuracy of each word from its word recitation feature curve to obtain an acoustic score of each word; and an assessment unit for assessing the recitation quality of the text based on the prosody score and the acoustic score of each word. Because both the prosody and the pronunciation of each word can be scored, the recitation quality of the text can be assessed in a way that makes the result more accurate and closer to reality.

Description

Text recitation quality assessment device and method

Technical field

The present disclosure relates to the technical field of audio processing, and more particularly to a text recitation quality assessment device and method.

Background

This section provides background information related to the present disclosure, which is not necessarily prior art.

Automatic assessment of recitation speech quality is a technology in which a student recites a specified text and a computer returns a score for the quality of the recitation based on the recited speech and a standard reading of the text. This technology can help students better understand their own recitation level and thereby improve their recitation skills, while greatly reducing the workload of teachers. Although this kind of technology has broad applicability, little work has been done in this area to date.

Summary of the invention

This section provides a general summary of the present disclosure, and is not a comprehensive disclosure of its full scope or of all of its features.

An object of the present disclosure is to provide a text recitation quality assessment device and method that can score the prosody and the pronunciation of each word in a text, so that the recitation quality of the text can be assessed and the result of the assessment is more accurate and closer to reality.

According to one aspect of the present disclosure, there is provided a text recitation quality assessment device, the device including: an acquiring unit for obtaining a text recitation feature curve produced by reciting a text; a segmentation unit for segmenting the text recitation feature curve to obtain a word recitation feature curve of each word in the text; a prosody score acquiring unit for comparing the word recitation feature curve of each word with the standard word feature curve of that word to obtain a prosody score of each word; an acoustic score acquiring unit for determining the recitation accuracy of each word from its word recitation feature curve to obtain an acoustic score of each word; and an assessment unit for assessing the recitation quality of the text based on the prosody score and the acoustic score of each word.

According to another aspect of the present disclosure, there is provided a method for assessing the quality of a recitation of a text based on the text, which is composed of words, and on a standard pronunciation of the text, the method including: obtaining a text recitation feature curve produced by the recitation; segmenting the text recitation feature curve to obtain a word recitation feature curve of each word in the text; comparing the word recitation feature curve of each word with the standard word feature curve of that word to obtain a prosody score of each word; determining the recitation accuracy of each word from its word recitation feature curve to obtain an acoustic score of each word; and assessing the recitation quality of the text based on the prosody score and the acoustic score of each word.

According to another aspect of the present disclosure, there is provided a machine-readable storage medium carrying a program product that includes machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, causes the computer to perform the text recitation quality assessment method according to the present disclosure.

With the text recitation quality assessment device and method according to the present disclosure, a prosody score and an acoustic score can be obtained for each word in the text, and the recitation quality of the text can be assessed based on the prosody score and the acoustic score of each word, so that the result of the assessment is more accurate and closer to reality.

Further areas of applicability will become apparent from the description provided herein. The description and specific examples in this summary are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.

Brief description of the drawings

The drawings described here are for the purpose of illustrating selected embodiments only, not all possible implementations, and are not intended to limit the scope of the present disclosure. In the drawings:

Fig. 1 is a block diagram illustrating the structure of a text recitation quality assessment device according to an embodiment of the present disclosure;

Fig. 2 is a schematic diagram of a text recitation feature curve;

Fig. 3 is a schematic diagram illustrating the probability that recitation is present and the energy of the recitation;

Fig. 4 is a schematic diagram illustrating the fundamental frequency curves of words in a text;

Fig. 5 is a block diagram illustrating the structure of the assessment unit in the text recitation quality assessment device according to an embodiment of the present disclosure;

Fig. 6 is a block diagram illustrating the structure of the start and end position determining unit in the text recitation quality assessment device according to an embodiment of the present disclosure;

Fig. 7 is a block diagram illustrating the structure of the determining unit in the start and end position determining unit in the text recitation quality assessment device according to an embodiment of the present disclosure;

Fig. 8 is a block diagram illustrating another structure of the determining unit in the start and end position determining unit in the text recitation quality assessment device according to an embodiment of the present disclosure;

Fig. 9 is a block diagram illustrating another structure of the determining unit in the start and end position determining unit in the text recitation quality assessment device according to an embodiment of the present disclosure;

Fig. 10 is a block diagram illustrating another structure of the determining unit in the start and end position determining unit in the text recitation quality assessment device according to an embodiment of the present disclosure;

Fig. 11 is a block diagram illustrating the structure of the prosody score acquiring unit in the text recitation quality assessment device according to an embodiment of the present disclosure;

Fig. 12 is a flowchart of a text recitation quality assessment method according to an embodiment of the present disclosure; and

Fig. 13 is a block diagram of an example structure of a general-purpose personal computer in which the text recitation quality assessment device and method according to an embodiment of the present disclosure can be implemented.

Although the present disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and are described in detail here. It should be understood, however, that the description of specific embodiments here is not intended to limit the present disclosure to the particular forms disclosed; on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present disclosure. Note that corresponding reference numerals indicate corresponding parts throughout the several drawings.

Detailed description

Examples of the present disclosure will now be described more fully with reference to the drawings. The following description is merely exemplary in nature and is not intended to limit the present disclosure, its application, or its uses.

Example embodiments are provided so that the present disclosure will be thorough and will fully convey its scope to those skilled in the art. Numerous specific details, such as examples of particular components, devices and methods, are set forth to provide a thorough understanding of the embodiments of the present disclosure. It will be apparent to those skilled in the art that the specific details need not be employed, that the example embodiments may be implemented in many different forms, and that none of them should be construed as limiting the scope of the present disclosure. In some example embodiments, well-known processes, well-known structures and well-known technologies are not described in detail.

The present disclosure proposes a technique for automatically assessing recitation speech. First, a feature curve can be extracted from the speech, for example speech recited by a student. On the basis of this feature curve, the recited speech can be segmented into pronunciation feature curves corresponding to the individual words. The feature curve of each word can then be compared with the feature curve of the corresponding word in the standard pronunciation, so that a pronunciation score is obtained for each word and, from these, a score for the text. Because this score is related only to prosodic features, different texts may give rise to the same feature curve. To overcome this problem, an acoustic model obtained in advance can be used to calculate an acoustic score of the recited pronunciation according to the text. Finally, the prosody score and the acoustic score are combined to obtain the quality assessment score of the recitation.

Fig. 1 illustrates the structure of a text recitation quality assessment device 100 according to an embodiment of the present disclosure. As shown in Fig. 1, the text recitation quality assessment device 100 according to an embodiment of the present disclosure may include an acquiring unit 110, a segmentation unit 120, a prosody score acquiring unit 130, an acoustic score acquiring unit 140 and an assessment unit 150.

The acquiring unit 110 can obtain a text recitation feature curve produced by reciting a text.

Fig. 2 shows an example of a text recitation feature curve. In Fig. 2, the abscissa represents time in seconds and the ordinate represents the amplitude of the speech waveform. To facilitate comparison and subsequent scoring, according to a preferred embodiment of the present disclosure, the device 100 may further include a compression unit (not shown) that performs amplitude compression on the text recitation feature curve. In Fig. 2, the amplitude of the text recitation feature curve is compressed into the range [-1, 1].
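The disclosure does not say how the amplitude compression is carried out. As a minimal sketch, assuming simple peak normalization (the function name `compress_amplitude` and the sample values are hypothetical), a waveform could be scaled into [-1, 1] as follows:

```python
def compress_amplitude(samples):
    """Scale a waveform so that its largest absolute sample becomes 1.0.

    Assumption: peak normalization stands in for the unspecified
    "amplitude compression" step; other schemes are equally possible.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # An all-zero (or empty) waveform is returned unchanged in shape.
        return [0.0 for _ in samples]
    return [s / peak for s in samples]


waveform = [0.5, -2.0, 1.0, 0.0]
normalized = compress_amplitude(waveform)
```

After this step every sample lies in [-1, 1], which makes recitations recorded at different volumes easier to compare.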

The segmentation unit 120 can segment the text recitation feature curve to obtain a word recitation feature curve of each word in the text.

After this, the prosody score acquiring unit 130 can compare the word recitation feature curve of each word with the standard word feature curve of that word to obtain the prosody score of each word.

Further, the acoustic score acquiring unit 140 can determine the recitation accuracy of each word from its word recitation feature curve to obtain the acoustic score of each word.

Finally, the assessment unit 150 can assess the recitation quality of the text based on the prosody score and the acoustic score of each word.

In the text recitation quality assessment device 100 according to an embodiment of the present disclosure, the prosody score acquiring unit 130 can obtain the prosody score of each word, and the acoustic score acquiring unit 140 can obtain the acoustic score of each word. Based on the prosody score and the acoustic score of each word, the assessment unit 150 can assess the recitation quality of the text. Compared with assessing recitation quality from the continuous prosodic and continuous acoustic characteristics of a whole recited sentence (or text), the technical solution of the present disclosure makes the result of the assessment more accurate and closer to reality.

For a better understanding of the technical solution of the present disclosure, the text recitation quality assessment device of the present disclosure is described in more detail below.

Fig. 5 shows an assessment unit 500 in the text recitation quality assessment device according to an embodiment of the present disclosure. The assessment unit 500 shown in Fig. 5 corresponds to the assessment unit 150 shown in Fig. 1.

As shown in Fig. 5, the assessment unit 500 may include a word recitation score acquiring unit 510, a text recitation score acquiring unit 520 and a quality assessment unit 530.

The word recitation score acquiring unit 510 can merge the prosody score and the acoustic score of each word to obtain the recitation score of each word.

Further, the text recitation score acquiring unit 520 can merge the recitation scores of all the words contained in the text to obtain the total recitation score of the text.

The quality assessment unit 530 can then assess the recitation quality of the text according to the total recitation score.

The assessment unit 500 shown in Fig. 5 is only one embodiment of the function of assessing the recitation quality of a text. Those skilled in the art will appreciate that other embodiments can be used to accomplish this function. For example, instead of the approach described above, the prosody scores of all the words contained in the text may first be merged to obtain the total prosody score of the text, the acoustic scores of all the words contained in the text may then be merged to obtain the total acoustic score of the text, and finally the total prosody score and the total acoustic score may be merged to obtain the total recitation score of the text.
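The disclosure leaves the merging rule open. The sketch below assumes a weighted average per word followed by a plain mean over the words; the weight `prosody_weight`, the function names and the example scores are all hypothetical. It also shows the alternative order (merge per text first, then combine), which gives the same total under this linear rule but need not do so under other merging rules.

```python
def word_recitation_score(prosody, acoustic, prosody_weight=0.5):
    """Merge one word's prosody (rhythm) score and acoustic score.

    Assumption: a weighted average; the source does not fix the rule.
    """
    return prosody_weight * prosody + (1.0 - prosody_weight) * acoustic


def total_by_word(prosody_scores, acoustic_scores, prosody_weight=0.5):
    """Merge per word first, then average over all words in the text."""
    if len(prosody_scores) != len(acoustic_scores) or not prosody_scores:
        raise ValueError("need one prosody and one acoustic score per word")
    per_word = [
        word_recitation_score(p, a, prosody_weight)
        for p, a in zip(prosody_scores, acoustic_scores)
    ]
    return sum(per_word) / len(per_word)


def total_by_text(prosody_scores, acoustic_scores, prosody_weight=0.5):
    """Alternative order: average each score type over the text, then merge."""
    if len(prosody_scores) != len(acoustic_scores) or not prosody_scores:
        raise ValueError("need one prosody and one acoustic score per word")
    total_prosody = sum(prosody_scores) / len(prosody_scores)
    total_acoustic = sum(acoustic_scores) / len(acoustic_scores)
    return word_recitation_score(total_prosody, total_acoustic, prosody_weight)


prosody_scores = [80.0, 60.0]
acoustic_scores = [100.0, 40.0]
score_a = total_by_word(prosody_scores, acoustic_scores)
score_b = total_by_text(prosody_scores, acoustic_scores)
```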

In addition, according to an embodiment of the present disclosure, as shown in Fig. 1, the segmentation unit 120 may include a start and end position determining unit 121.

The start and end position determining unit 121 can determine, from the text recitation feature curve, the start and end positions of each word of the text in the text recitation feature curve, so as to obtain the word recitation feature curve of each word in the text.

Fig. 6 shows a start and end position determining unit 600 in the text recitation quality assessment device according to an embodiment of the present disclosure. The start and end position determining unit 600 shown in Fig. 6 corresponds to the start and end position determining unit 121 shown in Fig. 1.

As shown in Fig. 6, the start and end position determining unit 600 may include a computing unit 610, a fundamental frequency curve acquiring unit 620 and a determining unit 630.

The computing unit 610 can calculate, from the text recitation feature curve, the probability that recitation is present in each frame and the energy of the recitation.

Fig. 3 shows curves of the probability that recitation is present and of the energy of the recitation. For example, a spectrum can first be extracted from a waveform such as the one shown in Fig. 2, and the spectrum can be converted to a logarithmic scale to compress its amplitude. Next, the log-scale spectrum can be filtered, after which the probability that someone is speaking and the energy can be calculated for each frame. In Fig. 3, the solid line represents the probability that a person is speaking, and the dashed line represents the energy.
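The exact spectral filtering is not given in the text. As a simplified, time-domain stand-in, the sketch below frames the waveform, computes a log energy per frame, and squashes it into a speech-presence probability with a logistic function. The frame length, hop, `threshold` and `slope` values, and the logistic mapping itself, are assumptions for illustration only.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a waveform into frames of frame_len samples, hop samples apart."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def frame_log_energy(frame, floor=1e-10):
    """Log of the mean squared amplitude; the floor avoids log(0) on silence."""
    mean_power = sum(s * s for s in frame) / len(frame)
    return math.log(max(mean_power, floor))


def speech_probability(log_energy, threshold=-5.0, slope=1.0):
    """Hypothetical mapping from log energy to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-slope * (log_energy - threshold)))


# Silence, a short burst of "speech", then silence again.
samples = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
frames = frame_signal(samples, frame_len=4, hop=4)
energies = [frame_log_energy(f) for f in frames]
probabilities = [speech_probability(e) for e in energies]
```

The middle frame receives a high probability and the silent frames a probability near zero, which is the qualitative behaviour of the solid and dashed curves described for Fig. 3.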

The fundamental frequency curve acquiring unit 620 can obtain the fundamental frequency curve of each word in the text from the text recitation feature curve.

Fig. 4 is a schematic diagram illustrating the fundamental frequency curves of words in a text. A dynamic programming algorithm can be used to extract the fundamental frequency feature of the waveform, so as to obtain the fundamental frequency curve of each individual word. Dynamic programming algorithms are well known in the art and are not described in detail in the present disclosure.

Based on the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each word, the determining unit 630 can determine the start and end positions of each word of the text in the text recitation feature curve.

Fig. 7 shows a determining unit 700 in the start and end position determining unit of the text recitation quality assessment device according to an embodiment of the present disclosure. The determining unit 700 shown in Fig. 7 corresponds to the determining unit 630 shown in Fig. 6.

As shown in Fig. 7, the determining unit 700 may include a fundamental frequency segment number determining unit 710 and a determining unit (first determining unit) 720.

The fundamental frequency segment number determining unit 710 can determine the number of fundamental frequency segments from the fundamental frequency curve of each word. Here, a fundamental frequency segment is a continuous, uninterrupted stretch of fundamental frequency in the fundamental frequency curve of each word. Several fundamental frequency segments can be seen in Fig. 4, for example.

Then, according to the relationship between the number of fundamental frequency segments and the number of words in the text, the determining unit 720 can determine the start and end positions of each word of the text in the text recitation feature curve.
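A fundamental frequency segment, as defined above, is a run of consecutive frames with a fundamental frequency. The sketch below counts such runs, assuming the contour is a per-frame list in which unvoiced frames are marked with 0.0 (a common convention, but an assumption here; the function name and the values are hypothetical). How the count is reconciled with the number of words when the two differ is not specified in the text and is not attempted.

```python
def find_f0_segments(f0, unvoiced=0.0):
    """Return (start, end) frame index pairs of continuous voiced runs.

    end is exclusive, so f0[start:end] is the segment itself.
    """
    segments = []
    start = None
    for i, value in enumerate(f0):
        voiced = value > unvoiced
        if voiced and start is None:
            start = i                      # a new segment begins
        elif not voiced and start is not None:
            segments.append((start, i))    # the current segment ends
            start = None
    if start is not None:                  # contour ends while still voiced
        segments.append((start, len(f0)))
    return segments


# Hypothetical F0 contour in Hz for a three-word text.
f0 = [0.0, 0.0, 210.0, 215.0, 220.0, 0.0, 0.0,
      180.0, 175.0, 0.0, 190.0, 195.0, 200.0]
segments = find_f0_segments(f0)
number_of_words = 3

# Simplest case only: one segment per word.
one_segment_per_word = len(segments) == number_of_words
```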

Fig. 8 shows a determining unit 800 in the start and end position determining unit of the text recitation quality assessment device according to another embodiment of the present disclosure. The determining unit 800 shown in Fig. 8 corresponds to the determining unit 630 shown in Fig. 6.

As shown in Fig. 8, the determining unit 800 may include a probability curve determining unit 810, an energy curve determining unit 820, a dividing unit 830 and a determining unit (second determining unit) 840.

The probability curve determining unit 810 can determine a probability curve (shown by the solid line in Fig. 3) from the probability that recitation is present in each frame.

The energy curve determining unit 820 can determine an energy curve (shown by the dashed line in Fig. 3) from the energy of the recitation.

Then, the dividing unit 830 can divide the text recitation feature curve into curve segments at the valley points of the probability curve or the energy curve.

After this, according to the relationship between the number of curve segments and the number of words in the text, the determining unit 840 can determine the start and end positions of each word of the text in the text recitation feature curve.

Fig. 9 shows a determining unit 900 in the start and end position determining unit of the text recitation quality assessment device according to another embodiment of the present disclosure. The determining unit 900 shown in Fig. 9 corresponds to the determining unit 630 shown in Fig. 6.

As shown in Fig. 9, the determining unit 900 may include a probability curve determining unit 910, an energy curve determining unit 920, an energy probability curve determining unit 930, a dividing unit 940 and a determining unit (third determining unit) 950.

Similarly, the probability curve determining unit 910 can determine a probability curve from the probability that recitation is present in each frame, and the energy curve determining unit 920 can determine an energy curve from the energy of the recitation.

Further, the energy probability curve determining unit 930 can determine an energy probability curve from the probability curve and the energy curve. Here, the energy probability curve includes both the features of the probability curve and the features of the energy curve.

Then, the dividing unit 940 can divide the text recitation feature curve into curve segments at the valley points of the energy probability curve.

After this, according to the relationship between the number of curve segments and the number of words in the text, the determining unit 950 can determine the start and end positions of each word of the text in the text recitation feature curve.

Fig. 10 shows a determining unit 1000 in the start and end position determining unit of the text recitation quality assessment device according to another embodiment of the present disclosure. The determining unit 1000 shown in Fig. 10 corresponds to the determining unit 630 shown in Fig. 6.

As shown in Fig. 10, the determining unit 1000 may include a probability curve determining unit 1010, an energy curve determining unit 1020, a fundamental frequency energy probability curve determining unit 1030, a dividing unit 1040 and a determining unit (fourth determining unit) 1050.

Similarly, the probability curve determining unit 1010 can determine a probability curve from the probability that recitation is present in each frame, and the energy curve determining unit 1020 can determine an energy curve from the energy of the recitation.

Further, the fundamental frequency energy probability curve determining unit 1030 can determine a fundamental frequency energy probability curve from the probability curve, the energy curve and the fundamental frequency curve of each word. Here, the fundamental frequency energy probability curve includes the features of the word fundamental frequency curve, the features of the probability curve and the features of the energy curve.

Then, the dividing unit 1040 can divide the text recitation feature curve into curve segments at the valley points of the fundamental frequency energy probability curve.

After this, according to the relationship between the number of curve segments and the number of words in the text, the determining unit 1050 can determine the start and end positions of each word of the text in the text recitation feature curve.

Fig. 11 shows a prosody score acquiring unit 1100 in the text recitation quality assessment device according to an embodiment of the present disclosure. The prosody score acquiring unit 1100 shown in Fig. 11 corresponds to the prosody score acquiring unit 130 shown in Fig. 1.

As shown in Fig. 11, the prosody score acquiring unit 1100 may include a converting unit (first converting unit) 1110, a converting unit (second converting unit) 1120 and a comparing unit 1130.

The converting unit 1110 can convert the word recitation feature curve of each word into a recitation fundamental frequency sequence.

The converting unit 1120 can convert the standard word feature curve of each word into a standard fundamental frequency sequence.

After this, the comparing unit 1130 can compare the recitation fundamental frequency sequence with the standard fundamental frequency sequence to obtain the prosody score of each word.

It should be noted that the specific way of converting a feature curve into a fundamental frequency sequence is well known in the art, and the present disclosure places no particular limitation on it.

As can be seen from Fig. 4, the fundamental frequency curves of words with different pronunciations differ. By combining the fundamental frequency feature with other speech features, such as MFCC (Mel-Frequency Cepstral Coefficients), and comparing the pronunciation of individual characters (words) in the standard reading and in the pupil's reading, the accuracy with which each individual character is read can be obtained. On this basis, accuracy scores can in turn be given for a single sentence and for the whole poem.

Although a score for a sentence, and further a score for the recited speech, can be obtained from the word-based recitation scores, this score is related only to prosodic features, so different texts may result in the same feature curve. To overcome this problem, a pre-trained acoustic model can be used to obtain an acoustic score according to the predetermined text. The prosody score and the acoustic score are then combined to obtain the final speech assessment score for the recitation.

Specifically, the acoustic score acquiring unit 140 shown in Fig. 1 may include a modeling unit (not shown) and an accuracy determining unit (not shown).

The modeling unit can establish a hidden Markov model so as to convert the word recitation feature curve of each word into a feature sequence.

Further, the accuracy determining unit can determine the recitation accuracy of each word from the feature sequence and the hidden Markov model, to obtain the acoustic score of each word.

The details of calculating the acoustic score are as follows. The problem becomes: given the hidden Markov model corresponding to a word, what is the probability of the feature curve extracted above? Answering it requires taking all possible state sequences into account. Accordingly, the probability of observing a feature curve of length L,

$$Y = y(0), y(1), \ldots, y(L-1),$$

that is, the acoustic score, is given by:

$$P(Y) = \sum_{X} P(Y \mid X)\, P(X),$$

where the summation is over all possible state sequences:

$$X = x(0), x(1), \ldots, x(L-1).$$

The concrete calculation can be carried out using the principle of dynamic programming. Hidden Markov models are well known in the art and are not described in detail in the present disclosure.
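The dynamic programming solution usually meant here is the forward algorithm, which computes the sum over all state sequences without enumerating them. The sketch below is an illustration under simplifying assumptions: observations are discrete symbols rather than continuous features such as MFCC vectors, and the two-state model parameters are invented for the example. A brute-force enumeration of every state sequence X is included only to show that both give the same P(Y).

```python
from itertools import product


def forward_probability(observations, initial, transition, emission):
    """P(Y) via the forward algorithm, in O(L * N^2) for N states."""
    n = len(initial)
    # alpha[i] = P(y(0..t), x(t) = i), starting at t = 0.
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n)]
    for y in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n)) * emission[j][y]
            for j in range(n)
        ]
    return sum(alpha)


def brute_force_probability(observations, initial, transition, emission):
    """P(Y) as the literal sum over all N^L state sequences (tiny cases only)."""
    n = len(initial)
    total = 0.0
    for states in product(range(n), repeat=len(observations)):
        p = initial[states[0]] * emission[states[0]][observations[0]]
        for t in range(1, len(observations)):
            p *= transition[states[t - 1]][states[t]]
            p *= emission[states[t]][observations[t]]
        total += p
    return total


# Hypothetical two-state model with two discrete observation symbols.
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.5, 0.5], [0.1, 0.9]]
observations = [0, 1]

p_forward = forward_probability(observations, initial, transition, emission)
p_brute = brute_force_probability(observations, initial, transition, emission)
```

For long sequences the products underflow, so practical implementations typically work with log probabilities or rescale `alpha` at each step.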

A text recitation quality assessment method according to an embodiment of the present disclosure is described below with reference to Fig. 12. The method according to the present disclosure can assess the quality of a recitation of a text based on the text, which is composed of words, and on a standard pronunciation of the text.

As shown in Fig. 12, the text recitation quality assessment method according to an embodiment of the present disclosure starts at step S110. In step S110, a text recitation feature curve produced by the recitation is obtained.

Next, in step S120, the text recitation feature curve is segmented to obtain a word recitation feature curve of each word in the text.

Next, in step S130, the word recitation feature curve of each word is compared with the standard word feature curve of that word to obtain the prosody score of each word.

Next, in step S140, the recitation accuracy of each word is determined from its word recitation feature curve to obtain the acoustic score of each word.

Next, in step S150, the recitation quality of the text is assessed based on the prosody score and the acoustic score of each word. After this, the process ends.

According to embodiment of the disclosure, rhythm score based on each word and acoustics in step S150 Text is recited quality and is estimated may include that and merges the rhythm score of each word and each by score The acoustic score of word, recites score with obtain each word；Merge the back of the body of all words comprised in text Read aloud score, recite PTS with obtain described text；And according to reciting the PTS matter to reciting Amount is estimated.

According to embodiment of the disclosure, in the step s 120 text is recited characteristic curve and split Recite characteristic curve and may include that and recite characteristic curve according to text obtaining the word of each word in text Determine the start-stop position in characteristic curve recited by text of each word in text, every to obtain in text Characteristic curve recited in the word of individual word.

According to embodiment of the disclosure, recite, according to text, each word that characteristic curve determines in text and exist Text is recited the start-stop position in characteristic curve and be may include that reciting characteristic curve according to text calculates every One frame exists and carries out the probability recited and the energy recited；Recite characteristic curve according to text and obtain literary composition The fundamental curve of each word in Ben；And carry out the probability recited according to each frame exists and recite The fundamental curve of energy and each word determines that each word in text is in characteristic curve recited by text Start-stop position.

According to embodiment of the disclosure, carry out the probability recited and the energy recited according to each frame exists The fundamental curve of amount and each word determines that each word in text is in characteristic curve recited by text Start-stop position may include that the fundamental curve according to each word determines the number of fundamental frequency section, wherein fundamental frequency Section be each word fundamental curve in the fragment of continuous continual fundamental frequency；And the number according to fundamental frequency section With the relation of the number of words of text, mesh determines that each word in text is in characteristic curve recited by text Start-stop position.

According to embodiment of the disclosure, carry out the probability recited and the energy recited according to each frame exists The fundamental curve of amount and each word determines that each word in text is in characteristic curve recited by text Start-stop position can also include: determines probability curve according to there is the probability carrying out reciting in each frame； Energy according to reciting determines energy curve；According to the valley point in probability curve or energy curve by literary composition Originally recite characteristic curve and be divided into curved section；And the pass according to the number of curved section with the number of words of text System determines the start-stop position in characteristic curve recited by text of each word in text.

According to embodiment of the disclosure, carry out the probability recited and the energy recited according to each frame exists The fundamental curve of amount and each word determines that each word in text is in characteristic curve recited by text Start-stop position can also include: determines probability curve according to there is the probability carrying out reciting in each frame； Energy according to reciting determines energy curve；Determine that energy probability is bent according to probability curve and energy curve Line；According to the valley point in energy probability curve, text is recited characteristic curve and be divided into curved section；And With the relation of the number of words of text, number according to curved section determines that each word in text is carried on the back at text Read aloud the start-stop position in characteristic curve.

According to embodiment of the disclosure, carry out the probability recited and the energy recited according to each frame exists The fundamental curve of amount and each word determines that each word in text is in characteristic curve recited by text Start-stop position can also include: determines probability curve according to there is the probability carrying out reciting in each frame； Energy according to reciting determines energy curve；According to probability curve, energy curve and the fundamental frequency of each word Curve determines fundamental frequency energy probability curve；According to the valley point in fundamental frequency energy probability curve, text is recited Characteristic curve is divided into curved section；And come really with the relation of the number of words of text according to the number of curved section Determine the start-stop position in characteristic curve recited by text of each word in text.

According to embodiments of the present disclosure, comparing the word recitation feature curve of each word with the standard feature curve of each word to obtain the rhythm score of each word may include: converting the word recitation feature curve of each word into a recited fundamental frequency sequence; converting the standard feature curve of each word into a standard fundamental frequency sequence; and comparing the recited fundamental frequency sequence with the standard fundamental frequency sequence to obtain the rhythm score of each word.
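
The comparison of the two fundamental frequency sequences might, for example, be sketched as follows. The specific measure is an assumption for illustration: the recited sequence is linearly resampled to the length of the standard sequence, and the Pearson correlation of the two contours is mapped to a score between 0 and 100. The names `resample` and `rhythm_score` are hypothetical.

```python
import math
from typing import List


def resample(seq: List[float], n: int) -> List[float]:
    """Linearly interpolate seq to n points."""
    if n < 1 or not seq:
        raise ValueError("need a non-empty sequence and n >= 1")
    if len(seq) == 1 or n == 1:
        return [seq[0]] * n
    out = []
    for k in range(n):
        pos = k * (len(seq) - 1) / (n - 1)
        i = min(int(pos), len(seq) - 2)
        frac = pos - i
        out.append(seq[i] * (1 - frac) + seq[i + 1] * frac)
    return out


def rhythm_score(recited_f0: List[float], standard_f0: List[float]) -> float:
    """Score in [0, 100] from the Pearson correlation of two F0 contours."""
    x = resample(recited_f0, len(standard_f0))
    y = standard_f0
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat contour carries no shape information
    r = cov / math.sqrt(var_x * var_y)
    return 50.0 * (r + 1.0)
```

Correlation compares the shape of the pitch contour rather than its absolute level, so a speaker with a higher or lower voice than the standard recording is not penalized for pitch range alone.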

According to embodiments of the present disclosure, determining the recitation accuracy of each word according to the word recitation feature curve of each word to obtain the acoustic score of each word may include: establishing a hidden Markov model so as to convert the word recitation feature curve of each word into a feature sequence; and determining the recitation accuracy of each word according to the feature sequence and the hidden Markov model, to obtain the acoustic score of each word.
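
As a minimal sketch of scoring a feature sequence against a hidden Markov model, the forward algorithm below computes the log-likelihood of a sequence under a discrete-observation model. The use of discrete symbols, the per-frame normalization, and the names `forward_log_likelihood` and `acoustic_score` are assumptions for illustration; the disclosure does not specify the model topology or the feature type.

```python
import math
from typing import List, Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(
    obs: List[int],
    log_start: List[float],
    log_trans: List[List[float]],
    log_emit: List[List[float]],
) -> float:
    """Log P(obs | model) of a discrete HMM via the forward algorithm."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(log_start)
    alpha = [log_start[s] + log_emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + log_trans[p][s] for p in range(n)])
            + log_emit[s][o]
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def acoustic_score(
    obs: List[int],
    log_start: List[float],
    log_trans: List[List[float]],
    log_emit: List[List[float]],
) -> float:
    """Average per-frame log-likelihood; closer to 0 means a better match."""
    return forward_log_likelihood(obs, log_start, log_trans, log_emit) / len(obs)
```

Dividing by the number of frames keeps the scores of long and short words comparable before they are fused with the rhythm scores.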

According to embodiments of the present disclosure, before the text recitation feature curve is segmented, the method may further include: performing amplitude compression on the text recitation feature curve.
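
Amplitude compression can be realized in several ways. The sketch below uses mu-law companding as one assumed example; it is not stated to be the compression used in the disclosure. The function name `mu_law_compress` and the default `mu = 255` are illustrative choices, and the input is assumed to be scaled to the range [-1, 1].

```python
import math
from typing import List


def mu_law_compress(samples: List[float], mu: float = 255.0) -> List[float]:
    """Compress amplitudes in [-1, 1] with the mu-law curve."""
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip to the valid input range
        y = math.log1p(mu * abs(x)) / math.log1p(mu)
        out.append(math.copysign(y, x))
    return out
```

Compression of this kind raises quiet portions relative to loud ones, which reduces the influence of large differences in loudness on the subsequent segmentation.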

The various specific implementations of the above steps of the text recitation quality assessment method according to embodiments of the present disclosure have been described in detail above and are not repeated here.

Obviously, each operation of the text recitation quality assessment method according to the present disclosure may be implemented as a computer-executable program stored in various machine-readable storage media.

Moreover, the object of the present disclosure may also be achieved as follows: a storage medium storing the above executable program code is supplied directly or indirectly to a system or device, and a computer or central processing unit (CPU) in the system or device reads and executes the program code. In this case, as long as the system or device has the function of executing programs, embodiments of the present disclosure are not limited to programs, and the program may take any form, for example an object program, a program executed by an interpreter, or a script program supplied to an operating system.

The above machine-readable storage media include, but are not limited to: various memories and storage units, semiconductor devices, disk units such as optical, magnetic and magneto-optical disks, and other media suitable for storing information.

In addition, the technical solution of the present disclosure may also be implemented by a computer connecting to a corresponding website on the Internet, downloading and installing the computer program code according to the present disclosure, and then executing the program.

As shown in Figure 13, a CPU 1301 performs various processes according to a program stored in a read-only memory (ROM) 1302 or a program loaded from a storage section 1308 into a random access memory (RAM) 1303. The RAM 1303 also stores, as needed, data required when the CPU 1301 performs the various processes. The CPU 1301, the ROM 1302 and the RAM 1303 are connected to one another via a bus 1304. An input/output interface 1305 is also connected to the bus 1304.

The following components are connected to the input/output interface 1305: an input section 1306 (including a keyboard, a mouse, etc.), an output section 1307 (including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, etc.), a storage section 1308 (including a hard disk, etc.), and a communication section 1309 (including a network interface card such as a LAN card, a modem, etc.). The communication section 1309 performs communication processing via a network such as the Internet. A drive 1310 may also be connected to the input/output interface 1305 as needed. A removable medium 1311 such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory is mounted on the drive 1310 as needed, so that a computer program read from it is installed into the storage section 1308 as needed.

When the above series of processes is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1311.

Those skilled in the art will understand that this storage medium is not limited to the removable medium 1311 shown in Figure 13, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 1311 include magnetic disks (including floppy disks (registered trademark)), optical discs (including compact disc read-only memory (CD-ROM) and digital versatile disc (DVD)), magneto-optical disks (including MiniDisc (MD) (registered trademark)), and semiconductor memories. Alternatively, the storage medium may be the ROM 1302, a hard disk contained in the storage section 1308, or the like, in which the program is stored and which is distributed to the user together with the device containing it.

In the systems and methods of the present disclosure, each component or each step can obviously be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present disclosure. Furthermore, the steps of the above series of processes may naturally be performed chronologically in the order described, but need not be performed chronologically. Some steps may be performed in parallel or independently of one another.

Although embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, it should be understood that the embodiments described above only illustrate the present disclosure and do not limit it. Those skilled in the art may make various changes and modifications to the above embodiments without departing from the spirit and scope of the present disclosure. Therefore, the scope of the present disclosure is defined only by the appended claims and their equivalents.

With respect to the implementations including the above embodiments, the following notes are also disclosed:

Note 1. A text recitation quality assessment device, including:

an acquisition unit for obtaining a text recitation feature curve produced by reciting a text;

a segmentation unit for segmenting the text recitation feature curve to obtain a word recitation feature curve of each word in the text;

a rhythm score acquisition unit for comparing the word recitation feature curve of each word with a standard feature curve of each word, to obtain a rhythm score of each word;

an acoustic score acquisition unit for determining the recitation accuracy of each word according to the word recitation feature curve of each word, to obtain an acoustic score of each word; and

an assessment unit for assessing the quality of the text recitation based on the rhythm score and the acoustic score of each word.

Note 2. The device according to Note 1, wherein the assessment unit includes:

a word recitation score acquisition unit for fusing the rhythm score of each word with the acoustic score of each word, to obtain a recitation score of each word;

a text recitation score acquisition unit for fusing the recitation scores of all the words contained in the text, to obtain a total recitation score of the text; and

a quality assessment unit for assessing the quality of the text recitation according to the total recitation score.
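
The two fusion steps and the final assessment might be sketched as follows. The weighted average, the arithmetic mean, the 0 to 100 score range and the grade thresholds are all assumptions chosen for illustration; the names `word_score`, `total_score` and `grade` are hypothetical.

```python
from typing import List, Tuple


def word_score(rhythm: float, acoustic: float, w_rhythm: float = 0.5) -> float:
    """Fuse one word's rhythm and acoustic scores by a weighted average."""
    if not 0.0 <= w_rhythm <= 1.0:
        raise ValueError("w_rhythm must lie in [0, 1]")
    return w_rhythm * rhythm + (1.0 - w_rhythm) * acoustic


def total_score(per_word: List[Tuple[float, float]], w_rhythm: float = 0.5) -> float:
    """Fuse all word scores into a total by taking their mean."""
    if not per_word:
        raise ValueError("text must contain at least one word")
    return sum(word_score(r, a, w_rhythm) for r, a in per_word) / len(per_word)


def grade(total: float) -> str:
    """Map a 0-100 total score to a coarse quality label (assumed thresholds)."""
    if total >= 85:
        return "good"
    if total >= 60:
        return "fair"
    return "poor"
```

For example, a two-word text with (rhythm, acoustic) scores of (80, 90) and (60, 70) yields a total of 75 under equal weights, which the assumed thresholds label as "fair".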

Note 3. The device according to Note 1, wherein the segmentation unit includes:

a start-stop position determination unit for determining, according to the text recitation feature curve, the start-stop position of each word of the text in the text recitation feature curve, to obtain the word recitation feature curve of each word in the text.

Note 4. The device according to Note 3, wherein the start-stop position determination unit includes:

a calculation unit for calculating, according to the text recitation feature curve, the probability that recitation is present in each frame and the energy of the recitation;

a fundamental frequency curve acquisition unit for obtaining the fundamental frequency curve of each word in the text according to the text recitation feature curve; and

a determination unit for determining the start-stop position of each word of the text in the text recitation feature curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each word.
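
The per-frame energy computed by the calculation unit might, for instance, be a short-time energy. The sketch below assumes that the input is a list of waveform samples and that frames have a fixed length and hop; these parameters and the name `frame_energies` are hypothetical.

```python
from typing import List


def frame_energies(samples: List[float], frame_len: int, hop: int) -> List[float]:
    """Short-time energy: mean squared amplitude of each frame."""
    if frame_len < 1 or hop < 1:
        raise ValueError("frame_len and hop must be positive")
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies
```

The resulting list has one value per frame and can serve directly as the energy curve in the segmentation steps.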

Note 5. The device according to Note 4, wherein the determination unit includes:

a fundamental frequency segment count determination unit for determining the number of fundamental frequency segments according to the fundamental frequency curve of each word, wherein a fundamental frequency segment is a fragment of continuous, uninterrupted fundamental frequency in the fundamental frequency curve of each word; and

a first determination unit for determining the start-stop position of each word of the text in the text recitation feature curve according to the relationship between the number of fundamental frequency segments and the number of words in the text.
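
Counting fragments of continuous, uninterrupted fundamental frequency can be sketched as a scan for runs of voiced frames. The sketch assumes that unvoiced frames are encoded as an F0 value of 0, which is a common convention but is not stated in the disclosure; the name `f0_segments` is hypothetical.

```python
from typing import List, Tuple


def f0_segments(f0: List[float]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame ranges of uninterrupted voiced F0.

    A frame counts as voiced when its F0 value is positive; unvoiced frames
    are assumed to be encoded as 0.
    """
    segments = []
    start = None
    for i, value in enumerate(f0):
        if value > 0 and start is None:
            start = i
        elif value <= 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(f0)))
    return segments
```

The number of fundamental frequency segments is then `len(f0_segments(f0))`, which can be compared with the number of words in the text.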

Note 6. The device according to Note 4, wherein the determination unit includes:

a probability curve determination unit for determining a probability curve according to the probability that recitation is present in each frame;

an energy curve determination unit for determining an energy curve according to the energy of the recitation;

a division unit for dividing the text recitation feature curve into curve segments according to the valley points in the probability curve or the energy curve; and

a second determination unit for determining the start-stop position of each word of the text in the text recitation feature curve according to the relationship between the number of curve segments and the number of words in the text.

Note 7. The device according to Note 4, wherein the determination unit includes:

an energy probability curve determination unit for determining an energy probability curve according to the probability curve and the energy curve;

a division unit for dividing the text recitation feature curve into curve segments according to the valley points in the energy probability curve; and

a third determination unit for determining the start-stop position of each word of the text in the text recitation feature curve according to the relationship between the number of curve segments and the number of words in the text.

Note 8. The device according to Note 4, wherein the determination unit includes:

a fundamental frequency energy probability curve determination unit for determining a fundamental frequency energy probability curve according to the probability curve, the energy curve, and the fundamental frequency curve of each word;

a division unit for dividing the text recitation feature curve into curve segments according to the valley points in the fundamental frequency energy probability curve; and

a fourth determination unit for determining the start-stop position of each word of the text in the text recitation feature curve according to the relationship between the number of curve segments and the number of words in the text.

Note 9. The device according to Note 1, wherein the rhythm score acquisition unit includes:

a first conversion unit for converting the word recitation feature curve of each word into a recited fundamental frequency sequence;

a second conversion unit for converting the standard feature curve of each word into a standard fundamental frequency sequence; and

a comparison unit for comparing the recited fundamental frequency sequence with the standard fundamental frequency sequence, to obtain the rhythm score of each word.

Note 10. The device according to Note 1, wherein the acoustic score acquisition unit includes:

a modeling unit for establishing a hidden Markov model so as to convert the word recitation feature curve of each word into a feature sequence; and

an accuracy determination unit for determining the recitation accuracy of each word according to the feature sequence and the hidden Markov model, to obtain the acoustic score of each word.

Note 11. The device according to Note 1, further including:

a compression unit for performing amplitude compression on the text recitation feature curve.

Note 12. A method for assessing the quality of a recitation of a text, based on the text, which is composed of words, and a standard pronunciation of the text, including:

obtaining a text recitation feature curve produced by the recitation;

segmenting the text recitation feature curve to obtain a word recitation feature curve of each word in the text;

comparing the word recitation feature curve of each word with a standard feature curve of each word, to obtain a rhythm score of each word;

determining the recitation accuracy of each word according to the word recitation feature curve of each word, to obtain an acoustic score of each word; and

assessing the quality of the text recitation based on the rhythm score and the acoustic score of each word.

Note 13. The method according to Note 12, wherein assessing the quality of the text recitation based on the rhythm score and the acoustic score of each word includes:

fusing the rhythm score of each word with the acoustic score of each word, to obtain a recitation score of each word;

fusing the recitation scores of all the words contained in the text, to obtain a total recitation score of the text; and

assessing the quality of the recitation according to the total recitation score.

Note 14. The method according to Note 12, wherein segmenting the text recitation feature curve to obtain the word recitation feature curve of each word in the text includes:

determining, according to the text recitation feature curve, the start-stop position of each word of the text in the text recitation feature curve, to obtain the word recitation feature curve of each word in the text.

Note 15. The method according to Note 14, wherein determining, according to the text recitation feature curve, the start-stop position of each word of the text in the text recitation feature curve includes:

calculating, according to the text recitation feature curve, the probability that recitation is present in each frame and the energy of the recitation;

obtaining the fundamental frequency curve of each word in the text according to the text recitation feature curve; and

determining the start-stop position of each word of the text in the text recitation feature curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each word.

Note 16. The method according to Note 15, wherein determining the start-stop position of each word of the text in the text recitation feature curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each word includes:

determining the number of fundamental frequency segments according to the fundamental frequency curve of each word, wherein a fundamental frequency segment is a fragment of continuous, uninterrupted fundamental frequency in the fundamental frequency curve of each word; and

determining the start-stop position of each word of the text in the text recitation feature curve according to the relationship between the number of fundamental frequency segments and the number of words in the text.

Note 17. The method according to Note 15, wherein determining the start-stop position of each word of the text in the text recitation feature curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each word includes:

determining a probability curve according to the probability that recitation is present in each frame;

determining an energy curve according to the energy of the recitation;

dividing the text recitation feature curve into curve segments according to the valley points in the probability curve or the energy curve; and

determining the start-stop position of each word of the text in the text recitation feature curve according to the relationship between the number of curve segments and the number of words in the text.

Note 18. The method according to Note 15, wherein determining the start-stop position of each word of the text in the text recitation feature curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each word includes:

determining an energy curve according to the energy of the recitation;

determining an energy probability curve according to the probability curve and the energy curve;

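dividing the text recitation feature curve into curve segments according to the valley points in the energy probability curve; and

determining the start-stop position of each word of the text in the text recitation feature curve according to the relationship between the number of curve segments and the number of words in the text.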

Note 19. The method according to Note 15, wherein determining the start-stop position of each word of the text in the text recitation feature curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each word includes:

determining an energy curve according to the energy of the recitation;

determining a fundamental frequency energy probability curve according to the probability curve, the energy curve, and the fundamental frequency curve of each word;

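dividing the text recitation feature curve into curve segments according to the valley points in the fundamental frequency energy probability curve; and

determining the start-stop position of each word of the text in the text recitation feature curve according to the relationship between the number of curve segments and the number of words in the text.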

Note 20. A machine-readable storage medium carrying a program product that includes machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, causes the computer to perform the method according to any one of Notes 12-19.

Claims

1. A text recitation quality assessment device, including:

The device according to claim 1, wherein the assessment unit includes:

The device according to claim 1, wherein the segmentation unit includes:

The device according to claim 3, wherein the start-stop position determination unit includes:

The device according to claim 4, wherein the determination unit includes:

The device according to claim 1, wherein the rhythm score acquisition unit includes:

10. A method for assessing the quality of a recitation of a text, based on the text, which is composed of words, and a standard pronunciation of the text, including: