CN102394061A

CN102394061A - Text-to-speech method and system based on semantic retrieval

Info

Publication number: CN102394061A
Application number: CN2011103512258A
Authority: CN
Inventors: 傅泽田; 李鑫星; 张领先; 温皓杰; 李道亮; 刘雪
Original assignee: China Agricultural University
Current assignee: China Agricultural University
Priority date: 2011-11-08
Filing date: 2011-11-08
Publication date: 2012-03-28
Anticipated expiration: 2031-11-08
Also published as: CN102394061B

Abstract

The invention relates to the technical field of speech synthesis and provides a text-to-speech method and system based on semantic retrieval. First, the method and system apply both forward maximum matching and reverse maximum matching, so that almost all non-ambiguous segments are obtained by segmentation, which greatly improves the accuracy of text segmentation and the quality of the synthesized speech. Second, by combining text segmentation with conventional semantics-based information retrieval, the way information retrieval handles keywords is borrowed to process ambiguous segments, which effectively improves the efficiency and accuracy of automatic ambiguous-segment identification and further improves the synthesized speech.

Description

Text-to-speech method and system based on semantic retrieval

Technical field

The present invention relates to the technical field of speech synthesis, and in particular to a text-to-speech method and system based on semantic retrieval.

Background art

Speech synthesis is the technology of producing artificial speech by mechanical or electronic means. Together with speech recognition, it is a key technology for advanced human-machine interaction such as spoken human-machine communication. Its purpose is to convert arbitrary information into standard, fluent speech in real time. It draws on acoustics, linguistics, digital signal processing, computer science and other fields, and is a frontier technology in information processing. Speech synthesis lets a computer automatically produce continuous speech of high intelligibility and high naturalness, which is fundamentally different from conventional sound playback. Conventional playback equipment, such as a tape recorder, reproduces speech by recording sound in advance and playing it back; this approach is severely limited in content, storage, transmission, convenience and timeliness. Computer-based speech synthesis, by contrast, can convert arbitrary information into highly natural speech at any time, and thereby truly realizes intelligent human-machine interaction.

Text-to-speech (TTS) conversion is a branch of speech synthesis. It is the technology of converting text that a computer generates automatically or receives as input into intelligible, fluent spoken Chinese (or another language, depending on actual needs). A text-to-speech system can in fact be regarded as an artificial intelligence system. To synthesize high-quality speech, the system must not only rely on various rules, including semantic, lexical and phonetic rules, but also understand the text well, which involves natural language understanding. Text-to-speech conversion generally comprises linguistic processing, prosody processing and acoustic processing. A good text-to-speech system should output clear, natural and fluent speech rather than merely concatenated sounds. This requires the system to analyze the text linguistically and to determine the boundaries of characters, words and sentences that matter for pronunciation, so as to facilitate the subsequent prosody processing.

Most existing text-to-speech systems traverse the full text against a dictionary and segment it by identifying words and phrases of maximum matching length. However, text usually contains many ambiguous segments (segments that can be recognized in more than one way), and the resulting diversity of possible segmentations makes automatic word segmentation very difficult. Most current text-to-speech techniques therefore have to split every character of an ambiguous segment into a separate word and insert a pause marker between them. The synthesized speech then breaks after every character, with many unnatural pauses and a strongly mechanical quality far removed from natural, fluent human speech. This is the biggest bottleneck restricting the development of text-to-speech technology.

Summary of the invention

(1) Technical problem to be solved

To solve the problem in the prior art that ambiguous segments are poorly recognized automatically, the present invention provides a text-to-speech method and system based on semantic retrieval that effectively segments the ambiguous segments in a text automatically and significantly improves the quality of the synthesized speech.

(2) Technical solution

To achieve the above purpose, in one aspect, the present invention provides a text-to-speech method based on semantic retrieval, the method comprising the steps of:

S1, performing forward maximum matching and reverse maximum matching respectively on the input text;

S2, comparing the segmentation results of the two matching methods; for text strings segmented identically, taking them directly as the segmentation result and performing steps S6 to S7; for text strings segmented differently, performing steps S3 to S7;

S3, extracting the maximal overlapping ambiguous segment in each text string;

S4, performing semantic retrieval on each ambiguous segment;

S5, identifying each ambiguous segment according to the semantic retrieval match, and taking it as the segmentation result;

S6, performing phonetic and prosody processing of the words according to the segmentation result;

S7, synthesizing all the prosody-processed words into speech output in text order.

Preferably, in step S3, the maximal overlapping ambiguous segment is extracted by calculating the mutual information between the Chinese characters at the front and rear boundaries of the ambiguous segment in the text string.

Preferably, in step S4, the semantic retrieval comprises:

S401, performing qualitative reasoning on each ambiguous segment on the basis of a formal representation of the ontology model in the Resource Description Framework, thereby achieving semantic expansion of the ambiguous segment;

S402, performing quantitative reasoning on the semantically expanded ambiguous segment on the basis of a formal representation of the ontology model as a Voronoi diagram, thereby obtaining the semantic similarity between the ambiguous segment and the expanded semantic concepts;

S403, retrieving the expanded semantic concepts using a vocabulary association degree, and judging whether the ambiguous segment can express a clear semantic concept.

Preferably, in step S403, the retrieval is performed among ontology instances.

Preferably, in step S5, the ambiguous segment is segmented as a complete word when it can express a clear semantic concept, and is segmented into a combination of single characters when it cannot.

In another aspect, the present invention also provides a text-to-speech system based on semantic retrieval, the system comprising:

a forward maximum matching module and a reverse maximum matching module, which perform forward maximum matching and reverse maximum matching respectively on the input text;

a matching result comparison module, which compares the segmentation results of the two matching methods, passes text strings segmented identically directly to the prosody processing module as the segmentation result, and passes text strings segmented differently to the extraction module;

an extraction module, which extracts the maximal overlapping ambiguous segments from the text strings provided by the matching result comparison module and passes them to the semantic retrieval module;

a semantic retrieval module, which performs semantic retrieval on each ambiguous segment;

a match identification module, which identifies each ambiguous segment according to the semantic retrieval match and passes it to the prosody processing module as the segmentation result;

a prosody processing module, which performs phonetic and prosody processing of the words according to the segmentation result;

a speech synthesis module, which synthesizes all the prosody-processed words into speech output in text order.

Preferably, in the extraction module, the maximal overlapping ambiguous segment is extracted by calculating the mutual information between the Chinese characters at the front and rear boundaries of the ambiguous segment in the text string.

Preferably, the semantic retrieval module further comprises:

a qualitative reasoning module, which performs qualitative reasoning on each ambiguous segment on the basis of a formal representation of the ontology model in the Resource Description Framework, thereby achieving semantic expansion of the ambiguous segment;

a quantitative reasoning module, which performs quantitative reasoning on the semantically expanded ambiguous segment on the basis of a formal representation of the ontology model as a Voronoi diagram, thereby obtaining the semantic similarity between the ambiguous segment and the expanded semantic concepts;

a concept retrieval module, which retrieves the expanded semantic concepts using a vocabulary association degree and judges whether the ambiguous segment can express a clear semantic concept.

Preferably, in the concept retrieval module, the retrieval is performed among ontology instances.

Preferably, in the match identification module, the ambiguous segment is segmented as a complete word when it can express a clear semantic concept, and is segmented into a combination of single characters when it cannot.

(3) Beneficial effects

With the method and system of the present invention, text segmentation is combined with conventional semantics-based information retrieval, and the way information retrieval handles keywords is borrowed to process ambiguous segments. This effectively improves the efficiency and accuracy of automatic ambiguous-segment identification and greatly improves the speech synthesis result.

Description of drawings

Fig. 1 is a flowchart of the text-to-speech method based on semantic retrieval according to the present invention.

Embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawing. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

The purpose of the present invention is to combine text-to-speech conversion with conventional semantic retrieval, borrowing the way semantic retrieval handles keywords to process ambiguous segments. A mature ontology-based semantic search method is introduced into text-to-speech conversion. Only a small number of instances need to be added to an already built ontology model according to the specific knowledge domain; the semantic search itself needs no change, and no text actually has to be retrieved. The system is therefore easy to implement and its development cost is greatly reduced.

The text-to-speech method based on semantic retrieval is described below with reference to Fig. 1, which shows the complete flow of the method:

S1: perform forward maximum matching and reverse maximum matching respectively on the input text;

S2: compare the segmentation results of the two matching methods; text strings segmented identically are taken directly as the segmentation result and proceed to step S6; text strings segmented differently go through steps S3 to S7;

S3: extract the maximal overlapping ambiguous segment in each text string;

S4: perform semantic retrieval on each ambiguous segment;

S5: identify each ambiguous segment according to the semantic retrieval match, and take it as the segmentation result;

S6: perform phonetic and prosody processing of the words according to the segmentation result;

S7: synthesize all the prosody-processed words into speech output in text order.

In step S1, forward maximum matching traverses the text in its natural order (natural reading order or input order) and segments the traversed text string into the longest strings recognizable in the dictionary; reverse maximum matching traverses the text in the reverse of its natural order and segments in the same way. In general, reverse matching is slightly more accurate than forward matching and runs into fewer ambiguities: statistics show an error rate of 1/169 for forward maximum matching alone and 1/245 for reverse maximum matching alone. Ordinary text recognition methods therefore usually adopt reverse maximum matching (input methods, for example, usually recognize input words this way). For speech synthesis, however, this accuracy is still far from sufficient, and the synthesized speech has a rather pronounced halting quality. The present invention uses forward and reverse maximum matching together, which greatly improves the accuracy of the preliminary segmentation.
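The two traversals of step S1 can be sketched roughly as follows. This is a minimal illustration, not the implementation of the invention: the function names, the `max_len` limit and the toy dictionary (in which only "AB" and "BC" are words) are assumptions made for the example, and any single character is accepted as a fallback when no dictionary word matches.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Scan left to right, always taking the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, dictionary, max_len=4):
    """Scan right to left, always taking the longest dictionary word."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    return words[::-1]  # pieces were collected back to front


# Hypothetical toy dictionary: "AB" and "BC" are words, "ABC" is not.
toy_dictionary = {"AB", "BC"}
print(forward_max_match("ABC", toy_dictionary))  # ['AB', 'C']
print(reverse_max_match("ABC", toy_dictionary))  # ['A', 'BC']
```

The two results disagree on where to cut, which is exactly the situation that step S2 routes to the further processing of steps S3 to S5.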

Strings that both matching methods segment identically can be regarded as correctly segmented and output directly. Using both methods, however, inevitably produces some strings whose segmentations are not identical (unless the text contains no ambiguous expressions at all, which essentially never happens in practice). These strings still affect the synthesis quality, so the present invention processes the differing parts further in order to extract the ambiguous segments more accurately. Specifically, in step S3, all maximal overlapping ambiguous segments are extracted from each text string whose segmentation results differ.

An overlapping ambiguous segment is defined as follows: in a string ABC, if AB is a word or phrase in the dictionary and BC is also a word or phrase in the dictionary, but the whole string ABC is not in the dictionary, then ABC is called an overlapping ambiguous segment. A maximal overlapping ambiguous segment is defined as follows: let S = C_{1}C_{2}C_{3}…C_{n} be any string of length n, and let S_{max} = C_{i}…C_{j} (1 ≤ i < j ≤ n) be a substring of S that is an overlapping ambiguous segment; if S contains no larger overlapping ambiguous segment that includes S_{max}, then S_{max} is called a maximal overlapping ambiguous segment of S. For example, in the sentence "你任何时候都可以来找我" ("you can come and find me at any time"), both "任何时" and "任何时候" are overlapping ambiguous segments; but "任何时候" contains "任何时" and is not itself contained in any overlapping ambiguous segment, so "任何时候" is a maximal overlapping ambiguous segment and "任何时" is not. A maximal overlapping ambiguous segment has no further overlap with any surrounding character. It therefore has a degree of independence, which allows it to be separated from its context and processed on its own. While the system is running, however, a maximal overlapping ambiguous segment cannot be determined from the dictionary alone, and identifying it manually would obviously defeat the purpose of automatic speech synthesis.

The present invention therefore extracts maximal overlapping ambiguous segments according to the mutual information between Chinese characters. For an ordered character string xy, the mutual information between characters x and y is I(x, y) = \frac{p(x, y)}{p(x) p(y)}, where p(x, y) is the probability that the string xy occurs as a two-character word, and p(x) and p(y) are the probabilities that x and y each occur independently as a single-character word. To guarantee that the extracted ambiguous segment is maximal, first calculate the mutual information of the two adjacent characters at the front and rear boundaries of the part where the forward and reverse maximum matching results differ. If I(x, y) ≠ 0, merge the character at the boundary into the ambiguous segment and calculate the mutual information at the new boundary; repeat until I(x, y) = 0.

For example, applying forward maximum matching (FMM) and reverse maximum matching (RMM) to the text above, "你任何时候都可以来找我", gives:

FMM: 你/任何/时候/都/可以/来/找/我 (you/any/time/all/can/come/look for/I)

RMM: 你/任/何时/候/都/可以/来/找/我 (you/appoint/when/wait/all/can/come/look for/I)

If "任何时" is extracted as the ambiguous segment, the mutual information at its front and rear boundaries is:

Front boundary: I(你, 任) = 0; rear boundary: I(时, 候) ≠ 0. Therefore "任" can serve as the front boundary of the ambiguous segment, but "时" cannot serve as its rear boundary, and the following character "候" should be merged into the segment.

The ambiguous segment thus becomes "任何时候", and the mutual information at the new boundaries is calculated again:

Front boundary: I(你, 任) = 0; rear boundary: I(候, 都) = 0. The mutual-information requirement is now satisfied, so the ambiguous segment finally extracted is "任何时候", which achieves the goal of extracting the maximal overlapping ambiguous segment.

After the maximal overlapping ambiguous segments have been extracted, the rest of the string contains no ambiguous segment, and the remaining words can be output directly as the segmentation result.
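The boundary-expansion procedure of step S3 might be sketched as below. The sketch assumes the ratio form of mutual information given in the description, and it starts from the three-character segment at positions 1 to 3 of the example sentence. The probability tables `pair_prob` and `char_prob` are hypothetical, with made-up values chosen only so that the boundary pair at positions 3 and 4 has non-zero mutual information.

```python
def mutual_information(x, y, pair_prob, char_prob):
    """I(x, y) = p(x, y) / (p(x) * p(y)); 0 when xy never occurs as a word."""
    p_xy = pair_prob.get(x + y, 0.0)
    if p_xy == 0.0:
        return 0.0
    return p_xy / (char_prob[x] * char_prob[y])


def expand_ambiguous_segment(text, start, end, pair_prob, char_prob):
    """Grow text[start:end] until the mutual information at both boundaries is 0."""
    changed = True
    while changed:
        changed = False
        # Front boundary: character before the segment vs. its first character.
        if start > 0 and mutual_information(
                text[start - 1], text[start], pair_prob, char_prob) != 0:
            start -= 1
            changed = True
        # Rear boundary: last character of the segment vs. the one after it.
        if end < len(text) and mutual_information(
                text[end - 1], text[end], pair_prob, char_prob) != 0:
            end += 1
            changed = True
    return start, end


# Hypothetical probabilities: only "时候" is known as a two-character word.
pair_prob = {"时候": 0.002}
char_prob = {"时": 0.01, "候": 0.005}
sentence = "你任何时候都可以来找我"

start, end = expand_ambiguous_segment(sentence, 1, 4, pair_prob, char_prob)
print(sentence[start:end])  # 任何时候
```

The loop stops as soon as neither boundary pair forms a known word, which corresponds to the condition I(x, y) = 0 at both ends.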

In step S4, the maximal overlapping ambiguous segment is submitted as a keyword for semantic retrieval to decide whether it should be segmented as a complete word. As long as the segment (keyword) can be confirmed to express a clear semantic concept, the keyword is segmented as one complete word; otherwise it is segmented into a combination of single characters.

This step therefore first has to determine whether the segment can express a clear semantic concept. In the present invention, semantic reasoning is first performed on the submitted keyword to extract its semantic concepts. The extraction is carried out on the basis of the ontology model and comprises two kinds of reasoning, qualitative and quantitative. Qualitative reasoning is performed on a formal representation of the ontology model in the Resource Description Framework (RDF(S)); quantitative reasoning is performed on a formal representation of the ontology model as a Voronoi diagram.

Specifically, a binary relation on a set is defined first: a binary relation R on a set U is a subset of U × U, that is, a set of ordered pairs <x, y> with x, y ∈ U, written xRy. The set of all binary relations on U is denoted Rel(U). The smallest relation on U is the empty set; the largest is the universal relation U × U, denoted V. In addition, let R be a relation on U. Then:

(1) if xRx for all x ∈ U, then R is reflexive;

(2) if xRy → yRx for all x, y ∈ U, then R is symmetric;

(3) if xRy and yRz → xRz for all x, y, z ∈ U, then R is transitive.
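The three properties above can be checked mechanically when a relation is stored as a set of ordered pairs. The following is only an illustrative sketch; the function names and the small example relation are assumptions, not part of the described system.

```python
def is_reflexive(universe, relation):
    """xRx holds for every x in the universe."""
    return all((x, x) in relation for x in universe)


def is_symmetric(relation):
    """Whenever xRy holds, yRx holds as well."""
    return all((y, x) in relation for (x, y) in relation)


def is_transitive(relation):
    """Whenever xRy and yRz hold, xRz holds as well."""
    return all(
        (x, z) in relation
        for (x, y1) in relation
        for (y2, z) in relation
        if y1 == y2
    )


# Hypothetical relation on U = {1, 2, 3}.
universe = {1, 2, 3}
relation = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}
print(is_reflexive(universe, relation),
      is_symmetric(relation),
      is_transitive(relation))  # True True True
```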

Suppose the submitted keyword is a node in the domain ontology. Nodes are related through class inheritance: an instance of one class may be an attribute of another class, an instance of one class may be a subclass of another class, and classes that share an attribute are correlated. Reasoning over these relations between the concepts of the domain ontology associates the node represented by the keyword with predefined ontology instance nodes and yields the expanded concepts of the keyword. This is the first step of semantic expansion.

The semantic expansion of the previous step yields the expanded concepts of the keyword. These concepts are semantically related to the keyword, but the degree of relatedness has not yet been measured. To describe the relatedness between the keyword and the related concepts objectively, a semantic similarity algorithm between concepts based on the Voronoi diagram representation is adopted. The semantic similarity of two nodes is computed from their path distance: if the path distance between two nodes in the Voronoi diagram is d, their semantic similarity is:

Sim(n_{1}, n_{2}) = \frac{α}{d + α};

where n_{1} and n_{2} are two nodes in the Voronoi diagram, d is the path distance between n_{1} and n_{2}, and α is an adjustable parameter.
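As a rough illustration of the path-distance similarity, the sketch below treats the node structure as an undirected graph and takes d to be the number of edges on the shortest path. That reading of "path distance", the breadth-first search, the function names and the small example graph are all assumptions made for the example; the description does not specify how d is obtained from the Voronoi diagram.

```python
from collections import deque


def path_distance(graph, source, target):
    """Number of edges on the shortest path, or None if no path exists."""
    if source == target:
        return 0
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(d, alpha=1.0):
    """Sim = alpha / (d + alpha); equals 1 at distance 0 and decays with d."""
    return alpha / (d + alpha)


# Hypothetical adjacency list linking a concept to two of its instances.
graph = {
    "cotton disease": ["anthracnose", "brown spot"],
    "anthracnose": ["cotton disease"],
    "brown spot": ["cotton disease"],
}
d = path_distance(graph, "anthracnose", "brown spot")
print(d, path_similarity(d, alpha=2.0))  # 2 0.5
```

A larger α makes the similarity decay more slowly with distance, which is one way to read its role as an adjustable parameter.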

When two nodes are correlated, the correlation falls into one of three cases: direct correlation (an instance of one class is an attribute of another class, an instance of one class is a subclass of another class, or the classes share an attribute), containment correlation (a second-level subclass inherits from its parent first-level subclass) and transitive correlation (produced by passing along direct or containment correlations). The three cases influence the similarity result to different degrees. To distinguish these degrees of influence, the present invention calculates the semantic similarity of two nodes with the following formula:

Sim(n_{1}, n_{2}) = Σ_{i = 1}^{3} β_{i} Π_{j = 1}^{i} Sim_{j}(n_{1}, n_{2});

where n_{1} and n_{2} are two nodes of the Voronoi diagram, and β_{1}, β_{2} and β_{3} are the weights in the semantic similarity calculation of the direct-correlation similarity Sim_{1}(n_{1}, n_{2}), the containment-correlation similarity Sim_{2}(n_{1}, n_{2}) and the transitive-correlation similarity Sim_{3}(n_{1}, n_{2}), respectively.
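Written out, the weighted formula expands to β_1·Sim_1 + β_2·Sim_1·Sim_2 + β_3·Sim_1·Sim_2·Sim_3. A minimal sketch of that computation follows; the function name and the numeric values are hypothetical and serve only to show the running product.

```python
def combined_similarity(sims, betas):
    """Sum over i of beta_i times the product of Sim_1 .. Sim_i."""
    total, product = 0.0, 1.0
    for sim_j, beta_i in zip(sims, betas):
        product *= sim_j          # running product Sim_1 * ... * Sim_i
        total += beta_i * product
    return total


# Hypothetical similarities (direct, containment, transitive) and weights.
sims = (0.5, 0.5, 0.5)
betas = (0.5, 0.25, 0.25)
print(combined_similarity(sims, betas))  # 0.34375
```

Because each term multiplies in one more factor, a low direct-correlation similarity Sim_1 lowers all three terms.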

Finally, retrieval is performed with the semantic concepts. The present invention retrieves with the semantic concepts of the keyword, but what is searched is not a large body of text; it is the ontology instances that have been built, which saves a great deal of retrieval time and improves efficiency. In other words, if the ontology instances are regarded as the texts of a retrieval problem, every text in the text library is a single word. If a match is retrieved, the keyword is considered able to express a clear semantic concept.

During semantic retrieval, to judge whether a semantic concept of the keyword matches an ontology instance, a vocabulary association degree method is used to extract gene pairs that have a potential relation, and a thesaurus is then used to give the relation between the gene pairs. Specifically, for genes k and l, the association degree is calculated as:

association[k][l] = Σ_{i = 1}^{N} W_{i}[k] * W_{i}[l];

where k is the k-th gene item in the document and represents a semantic concept of the keyword, l is an ontology instance, W_{i}[k] = T_{i}[k] * log(N / n[k]), N is the total number of texts in the corpus (that is, the number of ontology instances), T_{i}[k] is the frequency of the k-th gene item in document d_{i} (since each document is a single word, T_{i}[k] can only be 0 or 1), and n[k] is the number of texts in the corpus that contain the k-th gene item (that is, the number of ontology instances containing it; n[k] can likewise only be 0 or 1).

When the association degree association[k][l] exceeds a set threshold, an ontology instance has been retrieved, and the keyword is considered able to express a clear semantic concept and is segmented as a complete word; otherwise it is segmented into a combination of single characters.
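The association degree and the threshold decision might be sketched as follows. Several points are assumptions: each ontology instance is modelled as a one-element set, the logarithm is taken as the natural logarithm (the description does not state the base), a weight of 0 is used when n[k] = 0 to avoid division by zero, and the function names, example instances and threshold value are hypothetical.

```python
import math


def term_weight(term, doc, docs):
    """W_i[k] = T_i[k] * log(N / n[k]), with T_i[k] in {0, 1}."""
    n_k = sum(1 for d in docs if term in d)
    if n_k == 0 or term not in doc:
        return 0.0  # term absent from this document or from the whole corpus
    return math.log(len(docs) / n_k)


def association(k, l, docs):
    """Sum over all documents of W_i[k] * W_i[l]."""
    return sum(term_weight(k, d, docs) * term_weight(l, d, docs) for d in docs)


def expresses_clear_concept(k, l, docs, threshold):
    """True when the association degree exceeds the set threshold."""
    return association(k, l, docs) > threshold


# Hypothetical corpus: every "document" is a single ontology instance.
docs = [{"anthracnose"}, {"brown spot"}, {"cutworm"}]
print(expresses_clear_concept("anthracnose", "anthracnose", docs, 0.5))  # True
print(expresses_clear_concept("anthracnose", "cutworm", docs, 0.5))      # False
```

With single-word documents, two different terms never share a document, so under these assumptions the score is non-zero only when the concept coincides with an instance.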

The present invention introduces a text retrieval module but does not actually retrieve text: what is searched is not a large body of text but the ontology instances. A small number of ontology instances therefore need to be added during text segmentation, according to the knowledge of the actual domain.

The present invention takes knowledge of cotton diseases and insect pests as an example to illustrate how ontology instances are expanded. This knowledge comprises three subclasses: the nominal concept subclass, the individual subclass and the predicate concept subclass.

(1) Nominal concept subclass (Norminal.SC)

1) Indispensable element

Instances: nitrogen, phosphorus, potassium ...

2) Cotton disease

Instances: nitrogen deficiency, anthracnose, brown spot ...

3) Common drug

A. Powder

Instances: dichloroquinone powder, fosetyl-aluminium powder, Bravo powder ...

B. Spray

Instances: Bordeaux mixture, thiophanate-methyl, nematicide ...

4) Cotton growth stage

Instances: seedling stage, boll-opening stage, flowering and boll-setting stage ...

5) Cotton type

Instances: saw-ginned cotton, long-staple cotton, Lumianyan No. 20 ...

(2) Individual subclass (Individual-Organization.SC)

1) Cotton position

Instances: root, stem, leaf ...

2) Cotton pest

Instances: corn borer, cutworm, cotton whitefly ...

(3) Predicate concept subclass (Verbal.SC)

The concepts in the predicate subclass of cotton disease and pest knowledge denote actions. The members of this verb subclass include:

1) seed selection

2) seedling protection

3) pesticide application
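For illustration only, a fragment of such an instance library could be held in a plain mapping from concept to instances, as sketched below. The data structure and function names are assumptions, and "aphid" is a hypothetical instance added just to show the expansion step; the description itself builds the ontology with a modeling tool rather than in code.

```python
def add_instance(ontology, concept, instance):
    """Expand the ontology by registering one more instance under a concept."""
    ontology.setdefault(concept, set()).add(instance)


def find_concept(ontology, word):
    """Return the concept whose instances contain the word, or None."""
    for concept, instances in ontology.items():
        if word in instances:
            return concept
    return None


# Small fragment using instances named in the list above.
ontology = {
    "cotton disease": {"anthracnose", "brown spot"},
    "cotton pest": {"cutworm"},
}
add_instance(ontology, "cotton pest", "aphid")  # hypothetical new instance

print(find_concept(ontology, "anthracnose"))  # cotton disease
print(find_concept(ontology, "aphid"))        # cotton pest
```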

The present invention uses the Protégé ontology modeling tool to create the domain ontology of cotton diseases and insect pests. The leftmost tab in Protégé is the class tab, which is used to navigate and create the entire class structure of the ontology. Classes in Protégé are organized hierarchically: each class may contain subclasses, and classes and subclasses can each define their own attributes. The initial class structure in Protégé contains only the THING class, which is the parent of all classes in Protégé. A new class is created by clicking the create-class button, and the class editing area accepts information about the class such as its name, documentation and constraints. To create a subclass of a class, first select that class and then click the create-class button. For example, to create a subclass of the "common drug" class in the system, the "common drug" class must be selected first. The detailed process of creating the knowledge ontology and expanding ontology instances with Protégé is in fact a process of manually predefining ontology instances and rules, which a developer can carry out according to knowledge of the relevant domain. It is not a key implementation step of the present invention and is not described in detail here.

Finally, prosody processing and speech synthesis also directly determine the quality of the synthesized speech. The main function of steps S6 and S7 is to take the relevant speech units from the original speech library according to the segmented text and the corresponding prosodic parameters. The speech units in the library are recorded in advance and cover the pronunciation of all Chinese characters. Their prosodic parameters are adjusted and modified by means of phonetic annotation, and speech that meets the requirements is finally synthesized. More specifically, the present invention uses the Microsoft SDK to collect 1176 toned Chinese syllables as the original speech units and synthesizes the speech of all Chinese characters from them. The present invention synthesizes not only the speech of single characters but also the speech of all ontology entries in the ontology library, that is, the speech of each "word". When synthesizing the speech of a word, the silent segments or white noise at the beginning and end of each character must also be removed. This exploits the fact that the short-time energy and zero-crossing rate of voiced segments of a speech signal are both greater than those of silent segments: the short-time energy and short-time zero-crossing rate of the recorded speech signal are calculated first, different thresholds are set, and the silent segments are deleted by the double-threshold comparison method. The speech units are sampled at 22050 Hz, and the final synthesized speech is stored in ".wav" format.
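The silence removal described above could look roughly like the sketch below. It is a simplified reading of the double-threshold comparison: a frame counts as speech only when both its short-time energy and its zero-crossing count exceed their thresholds, and leading and trailing non-speech frames are cut. The function names, frame length and threshold values are assumptions, and trailing samples that do not fill a whole frame are ignored.

```python
def frame_features(samples, frame_len):
    """Short-time energy and zero-crossing count for each full frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((energy, crossings))
    return feats


def trim_silence(samples, frame_len, energy_threshold, zcr_threshold):
    """Drop leading and trailing frames that fail either threshold."""
    feats = frame_features(samples, frame_len)
    speech = [
        i for i, (energy, crossings) in enumerate(feats)
        if energy > energy_threshold and crossings > zcr_threshold
    ]
    if not speech:
        return []
    return samples[speech[0] * frame_len:(speech[-1] + 1) * frame_len]


# Hypothetical signal: silence, an alternating burst, then silence again.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
trimmed = trim_silence(signal, frame_len=4, energy_threshold=1.0, zcr_threshold=1)
print(len(trimmed))  # 8
```

In practice the frame length and both thresholds would need tuning against the recorded units, which are sampled at 22050 Hz according to the description.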

The present invention provides a text-to-speech method and system based on semantic retrieval that combines text segmentation with conventional semantics-based information retrieval and borrows the way information retrieval handles keywords to process ambiguous segments. A relatively mature ontology-based semantic search method is introduced into text segmentation. This introduction requires almost no change to the semantic search; it simply no longer has to retrieve text. The ontology construction method, the ontology-based semantic reasoning model, the retrieval model and so on need no change at all. Only a small number of instances need to be added to the already built ontology model according to the specific knowledge domain, which fully reflects the low-cost approach. The present invention effectively solves the problem of handling ambiguous segments in text segmentation and can thereby break through the bottleneck restricting the development of text-to-speech conversion.

According to statistics, about 90% sentence in the Chinese language text, the cutting of forward maximum match and reverse maximum match overlaps and correctly fully; Though and two kinds of cutting differences of the sentence about 9%, it is correct wherein must having one; Less than 1% sentence the cutting mistake can appear only.Thereby in the present invention, owing to use forward maximum match and reverse maximum match simultaneously, most non-ambiguity field can be syncopated as, and the accuracy that this has improved text dividing has greatly improved the phonetic synthesis effect.

The part that is not segmented in this way, namely the part on which the two methods produce inconsistent segmentations, is the text in which the ambiguous fields lie, and the present invention processes it further. According to statistics, in real Chinese text the probability of a segmentation ambiguity is about 1/110, that is, on average one segmentation ambiguity occurs in every 110 Chinese characters, and crossing (overlapping) ambiguities account for 86% of them. The present invention handles crossing ambiguities efficiently and accurately: performing semantic retrieval on the ambiguous fields greatly improves their automatic recognition rate, which reduces unnatural pauses in the synthesized speech as far as possible and improves the speech synthesis effect.
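
The decision rule that the claims apply to each ambiguous segmentation phrase (segment it as a complete word when it expresses a clear semantic concept, otherwise as a combination of single characters) can be sketched as follows. This is only an illustration: the ontology-based semantic retrieval is replaced here by a plain set lookup, and the function name and the sample concepts are hypothetical.

```python
def segment_phrase(phrase, ontology_concepts):
    """Keep the phrase whole if it names a known concept, else split it.

    `ontology_concepts` stands in for the result of semantic retrieval;
    a real system would query an ontology rather than a Python set.
    """
    if phrase in ontology_concepts:
        return [phrase]
    return list(phrase)  # fall back to single characters


# Usage with hypothetical agricultural concepts.
concepts = {"小麦", "锈病"}
print(segment_phrase("小麦", concepts))  # ['小麦']
print(segment_phrase("麦锈", concepts))  # ['麦', '锈']
```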

Finally, with the development of information and communication technology, and especially with the opening of the national 12316 new-countryside hotline, transmitting agricultural knowledge to farmers in the form of speech through a call center platform would be of great help to their production and daily life. At present, however, the local 12316 hotlines can mostly offer only human operators and expert consultation, and large amounts of knowledge in books and texts have not been converted into speech and delivered directly to farmers, because the automatic text-to-speech technology of the prior art is still immature. The method and system of the present invention can effectively support voice hotlines of this kind, both reducing the cost of providing the service and greatly enriching the service content and its practical effect.

The above embodiments are intended only to illustrate the present invention and not to limit it. A person of ordinary skill in the relevant technical field can make various changes and modifications without departing from the spirit and scope of the present invention; all equivalent technical solutions therefore also fall within the scope of the present invention, and the scope of patent protection of the present invention shall be defined by the claims.

Claims

1. A text-to-speech method based on semantic retrieval, characterized in that the method comprises the steps of:

S4: performing semantic retrieval on each ambiguous segmentation phrase;

2. The method according to claim 1, characterized in that, in step S3, the maximal overlapping (crossing) ambiguous phrases are extracted by calculating the mutual information between the Chinese characters at the front and rear boundaries of each ambiguous field in the text string.

3. The method according to claim 1, characterized in that, in step S4, said semantic retrieval comprises:

4. The method according to claim 3, characterized in that, in step S403, the retrieval is performed among the ontology instances.

5. The method according to claim 3, characterized in that, in step S5, an ambiguous segmentation phrase is segmented as a complete word when it can express a clear semantic concept, and is segmented into a combination of single characters when it cannot express a clear semantic concept.

6. A text-to-speech system based on semantic retrieval, characterized in that said system comprises:

7. The system according to claim 6, characterized in that, in said extraction module, the maximal overlapping (crossing) ambiguous phrases are extracted by calculating the mutual information between the Chinese characters at the front and rear boundaries of each ambiguous field in the text string.

8. The system according to claim 6, characterized in that said semantic retrieval module further comprises:

9. The system according to claim 8, characterized in that, in said concept retrieval module, the retrieval is performed among the ontology instances.

10. The system according to claim 8, characterized in that, in said matching and identification module, an ambiguous segmentation phrase is segmented as a complete word when it can express a clear semantic concept, and is segmented into a combination of single characters when it cannot express a clear semantic concept.