CN102394061B - Text-to-speech method and system based on semantic retrieval - Google Patents

Text-to-speech method and system based on semantic retrieval

Publication number
CN102394061B
Authority
CN
China
Prior art keywords
semantic
text
utterance
ambiguous
cutting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2011103512258A
Other languages
Chinese (zh)
Other versions
CN102394061A (en)
Inventor
傅泽田
李鑫星
张领先
温皓杰
李道亮
刘雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University
Priority to CN2011103512258A
Publication of CN102394061A
Application granted
Publication of CN102394061B
Legal status: Expired - Fee Related

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of speech synthesis and provides a text-to-speech method and system based on semantic retrieval. First, the method and system apply both forward maximum matching and reverse maximum matching, so that almost all non-ambiguous segments can be obtained by segmentation, which greatly improves the accuracy of text segmentation and the quality of the synthesized speech. Second, by combining text segmentation with traditional semantic-based information retrieval, the way information retrieval handles keywords is borrowed to process ambiguous segments, which effectively improves the efficiency and accuracy of automatic ambiguous-segment identification and further improves the speech synthesis result.

Description

Text-to-speech method and system based on semantic retrieval
Technical field
The present invention relates to the field of speech synthesis, and in particular to a text-to-speech method and system based on semantic retrieval.
Background art
Speech synthesis is the technology of producing artificial speech by mechanical or electronic means. Together with speech recognition, it is a key technology for advanced human-machine interaction such as spoken human-machine communication. The purpose of speech synthesis is to convert arbitrary information into standard, fluent speech and read it aloud in real time. It draws on acoustics, linguistics, digital signal processing, and computer science, and is a frontier technology in information processing. Speech synthesis lets a computer automatically produce continuous speech of high clarity and high naturalness, which is fundamentally different from traditional sound playback. Traditional playback equipment, such as a tape recorder, reproduces speech by recording sound in advance and then playing it back; this approach is severely limited in content, storage, transmission, convenience, and timeliness. Computer-based speech synthesis, by contrast, can convert any information into highly natural speech at any time, and thus truly enables intelligent interaction between human and machine.
Text-to-speech (TTS) technology is a branch of speech synthesis. It converts text that a computer device generates automatically or receives as external input into intelligible, fluent spoken Chinese (or another language, depending on actual needs). A text-to-speech system can in fact be regarded as an artificial intelligence system. To synthesize high-quality speech, the system must not only rely on various rules, including semantic, lexical, and phonetic rules, but also understand the text well, which involves natural language understanding. The text-to-speech process generally comprises linguistic processing, prosody processing, and acoustic processing. A good text-to-speech system should output clear, natural, and fluent speech rather than perform simple speech synthesis; this requires linguistic analysis of the text to determine the boundaries of characters, words, and sentences, which facilitates the subsequent prosody processing.
Most current text-to-speech systems traverse the full text against a dictionary and segment it by recognizing words and short phrases of maximum matching length. However, text usually contains a large number of ambiguous segments (segments that can be recognized in more than one way), and the resulting variety of possible segmentations makes automatic word segmentation very difficult. Most current text-to-speech techniques therefore have to cut every character of an ambiguous segment into a separate word and insert a pause marker between the characters. The synthesized speech then breaks at every character, with many unnatural pauses and a strongly mechanical feel that is far from natural, fluent human pronunciation. This is the greatest bottleneck restricting the development of text-to-speech technology.
Summary of the invention
(1) Technical problem to be solved
To solve the problem in the prior art that ambiguous segments are poorly recognized automatically, the present invention provides a text-to-speech method and system based on semantic retrieval that effectively segments the ambiguous segments in text automatically and significantly improves the quality of the synthesized speech.
(2) Technical solution
To achieve the above object, in one aspect, the invention provides a text-to-speech method based on semantic retrieval, the method comprising the steps of:
S1: perform forward maximum matching and reverse maximum matching separately on the input text;
S2: compare the text segmentation results of the two matching methods; text strings that are segmented identically are taken directly as the segmentation result and passed to steps S6 to S7; text strings that are segmented differently are passed to steps S3 to S7;
S3: extract the maximal overlapping ambiguous segments from each such text string;
S4: perform semantic retrieval on each ambiguous segment;
S5: identify each ambiguous segment according to the semantic retrieval match and take it as the segmentation result;
S6: perform speech prosody processing of the words according to the segmentation result;
S7: synthesize all prosody-processed words into speech in text order and output it.
Preferably, in step S3, the maximal overlapping ambiguous segment is extracted by calculating the mutual information between the Chinese characters at the front and rear boundaries of the ambiguous segment in the text string.
Preferably, in step S4, the semantic retrieval comprises:
S401: perform qualitative reasoning on each ambiguous segment on the basis of a formal representation of the ontology model in the Resource Description Framework, thereby achieving semantic expansion of the ambiguous segment;
S402: perform quantitative reasoning on the semantically expanded ambiguous segment on the basis of a Voronoi-diagram representation of the ontology model, obtaining the semantic similarity between the ambiguous segment and the expanded semantic concepts;
S403: retrieve the expanded semantic concepts using the vocabulary association degree, and judge whether the ambiguous segment can express a clear semantic concept.
Preferably, in step S403, the retrieval is performed over ontology instances.
Preferably, in step S5, the ambiguous segment is segmented as a complete word when it can express a clear semantic concept, and is segmented into a combination of single characters when it cannot.
In another aspect, the present invention also provides a text-to-speech system based on semantic retrieval, the system comprising:
a forward maximum matching module and a reverse maximum matching module, which perform forward maximum matching and reverse maximum matching, respectively, on the input text;
a matching result comparison module, which compares the text segmentation results of the two matching methods, passes text strings that are segmented identically directly to the prosody processing module as the segmentation result, and passes text strings that are segmented differently to the extraction module;
an extraction module, which extracts the maximal overlapping ambiguous segments from the text strings provided by the matching result comparison module and passes them to the semantic retrieval module;
a semantic retrieval module, which performs semantic retrieval on each ambiguous segment;
a match identification module, which identifies each ambiguous segment according to the semantic retrieval match and passes it to the prosody processing module as the segmentation result;
a prosody processing module, which performs speech prosody processing of the words according to the segmentation result;
a speech synthesis module, which synthesizes all prosody-processed words into speech in text order and outputs it.
Preferably, in the extraction module, the maximal overlapping ambiguous segment is extracted by calculating the mutual information between the Chinese characters at the front and rear boundaries of the ambiguous segment in the text string.
Preferably, the semantic retrieval module further comprises:
a qualitative reasoning module, which performs qualitative reasoning on each ambiguous segment on the basis of a formal representation of the ontology model in the Resource Description Framework, thereby achieving semantic expansion of the ambiguous segment;
a quantitative reasoning module, which performs quantitative reasoning on the semantically expanded ambiguous segment on the basis of a Voronoi-diagram representation of the ontology model, obtaining the semantic similarity between the ambiguous segment and the expanded semantic concepts;
a concept retrieval module, which retrieves the expanded semantic concepts using the vocabulary association degree and judges whether the ambiguous segment can express a clear semantic concept.
Preferably, in the concept retrieval module, the retrieval is performed over ontology instances.
Preferably, in the match identification module, the ambiguous segment is segmented as a complete word when it can express a clear semantic concept, and is segmented into a combination of single characters when it cannot.
(3) Beneficial effects
With the method and system of the present invention, text segmentation is combined with traditional semantic-based information retrieval, and the way information retrieval handles keywords is borrowed to process ambiguous segments. This effectively improves the efficiency and accuracy of automatic ambiguous-segment identification and greatly improves the quality of speech synthesis.
Description of the drawings
Fig. 1 is a flowchart of the text-to-speech method based on semantic retrieval according to the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
The object of the present invention is to combine the text-to-speech method with traditional semantic retrieval and to borrow the way semantic retrieval handles keywords in order to process ambiguous segments. The technically mature ontology-based semantic search method is introduced into text-to-speech conversion: only a small number of instances for the specific knowledge domain need to be added to an already built ontology model, the semantic search itself needs no change, and no text has to be retrieved, so the system is easier to implement and development cost is greatly reduced.
The text-to-speech method based on semantic retrieval is described below with reference to Fig. 1. As shown in Fig. 1, the complete flow of the method is:
S1: perform forward maximum matching and reverse maximum matching separately on the input text;
S2: compare the text segmentation results of the two matching methods; text strings that are segmented identically are taken directly as the segmentation result and passed to steps S6 to S7; text strings that are segmented differently are passed to steps S3 to S7;
S3: extract the maximal overlapping ambiguous segments from each such text string;
S4: perform semantic retrieval on each ambiguous segment;
S5: identify each ambiguous segment according to the semantic retrieval match and take it as the segmentation result;
S6: perform speech prosody processing of the words according to the segmentation result;
S7: synthesize all prosody-processed words into speech in text order and output it.
In step S1, forward maximum matching traverses the text in its natural order (natural reading order or input order) and segments the text string encountered during the traversal into the longest strings recognizable in the dictionary; reverse maximum matching traverses the text in the reverse of its natural order and segments it in the same way. In general, the segmentation precision of reverse matching is slightly higher than that of forward matching, and it runs into fewer ambiguities: statistics show that the error rate of forward maximum matching alone is 1/169, while that of reverse maximum matching alone is 1/245. Common text recognition methods therefore usually adopt reverse maximum matching (input methods, for example, usually recognize input words this way). For speech synthesis, however, this precision is still far from sufficient, and the synthesized speech output has a pronounced stop-and-start feel. The present invention uses forward and reverse maximum matching together, so that the precision of the preliminary text segmentation is greatly improved.
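As a minimal sketch of step S1, the two traversals can be written as follows. The dictionary `TOY_DICT`, the `max_len` limit, and the example sentence 研究生命的起源 ("study the origin of life") are illustrative assumptions that do not come from the patent; falling back to a single character when no dictionary word matches is likewise an assumed convention.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Scan left to right, taking the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Assumed fallback: a single character is always accepted.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, dictionary, max_len=4):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


# Hypothetical toy dictionary and sentence, for illustration only.
TOY_DICT = {"研究", "研究生", "生命", "起源"}
SENTENCE = "研究生命的起源"

fmm = forward_max_match(SENTENCE, TOY_DICT)
rmm = reverse_max_match(SENTENCE, TOY_DICT)
print("FMM:", "/".join(fmm))
print("RMM:", "/".join(rmm))
```

With this toy dictionary, forward matching yields 研究生/命/的/起源 and reverse matching yields 研究/生命/的/起源, so the two results disagree on the first four characters and agree on the rest.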
Strings that the two matching methods segment identically can be regarded as correctly segmented and output as such. When both methods are used, however, strings that are not segmented identically will inevitably appear (unless the text contains no ambiguous expressions at all, which essentially never happens in practice). These strings still affect the quality of speech synthesis, so to extract ambiguous segments more accurately the present invention further processes the parts where the results differ. Specifically, in step S3, all maximal overlapping ambiguous segments are extracted from each text string whose two segmentation results differ.
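One possible way to locate the parts where the two segmentations differ (step S2) is to compare the character offsets at which each result places a cut. The sketch below assumes both inputs are word lists covering the same text; the function names and the example word lists are hypothetical.

```python
def cut_offsets(words):
    """Return the set of character offsets at which a segmentation ends a word."""
    cuts, pos = set(), 0
    for word in words:
        pos += len(word)
        cuts.add(pos)
    return cuts


def differing_spans(fmm_words, rmm_words):
    """Return (start, end) character spans on which the two segmentations disagree."""
    f_cuts, r_cuts = cut_offsets(fmm_words), cut_offsets(rmm_words)
    shared = sorted((f_cuts & r_cuts) | {0})
    spans = []
    for start, end in zip(shared, shared[1:]):
        # Between two shared cuts, any extra cut belongs to only one of the
        # segmentations, so its presence marks a disagreement.
        f_inner = {c for c in f_cuts if start < c < end}
        r_inner = {c for c in r_cuts if start < c < end}
        if f_inner != r_inner:
            spans.append((start, end))
    return spans


# Hypothetical example results, not taken from the patent.
fmm_words = ["研究生", "命", "的", "起源"]
rmm_words = ["研究", "生命", "的", "起源"]
print(differing_spans(fmm_words, rmm_words))
```

Spans returned here would go on to steps S3 to S7, while everything outside them can be passed directly to prosody processing.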
An overlapping ambiguous segment is defined as follows: in a string ABC, if AB is a word or short phrase in the dictionary and BC is likewise a word or short phrase in the dictionary, but the whole string ABC is not in the dictionary, then ABC is called an overlapping ambiguous segment. A maximal overlapping ambiguous segment is defined as follows: let S = C_1 C_2 C_3 ... C_n be any string of length n, and let S_max = C_i ... C_j (1 ≤ i < j ≤ n) be a substring of S that is an overlapping ambiguous segment; if S contains no larger overlapping ambiguous segment that includes S_max, then S_max is called a maximal overlapping ambiguous segment of S. For example, in the sentence "你任何时候都可以来找我" ("you can come to look for me at any time"), both "任何时" and "任何时候" are overlapping ambiguous segments, but "任何时候" contains "任何时" and is not itself contained in any other overlapping ambiguous segment; therefore "任何时候" is a maximal overlapping ambiguous segment and "任何时" is not. A maximal overlapping ambiguous segment no longer overlaps with any surrounding word and has a certain independence, which makes it possible to separate it from its context and process it on its own. While the system is running, however, a maximal overlapping ambiguous segment cannot be confirmed from the dictionary alone, and resorting to manual identification would obviously defeat the purpose of automatic speech synthesis.
The present invention therefore extracts maximal overlapping ambiguous segments according to the mutual information between Chinese characters. For an ordered character string xy, the mutual information between characters x and y is I(x, y) = p(x, y) / (p(x) p(y)), where p(x, y) is the probability that the string xy occurs as a two-character word, and p(x) and p(y) are the probabilities that x and y occur independently as single-character words. To guarantee that the segment extracted is the maximal ambiguous segment, the mutual information of the two adjacent characters at the front and rear boundaries of the part where the forward and reverse maximum matching results differ is calculated first. If I(x, y) ≠ 0, the character at the boundary is merged into the ambiguous segment and the mutual information at the new boundary is calculated, and so on until I(x, y) = 0.
For example, applying forward maximum matching (FMM) and reverse maximum matching (RMM) to the sentence "你任何时候都可以来找我" gives:
FMM: 你/任何/时候/都/可以/来/找/我
RMM: 你/任/何时/候/都/可以/来/找/我
If "任何时" is extracted as the ambiguous segment, the mutual information at its front and rear boundaries is:
Front boundary: I(你, 任) = 0; rear boundary: I(时, 候) ≠ 0. "任" can therefore serve as the front boundary of the ambiguous segment, but "时" cannot serve as its rear boundary, and "候" should be merged into the segment.
The ambiguous segment thus becomes "任何时候", and the mutual information at the new boundaries is calculated again:
Front boundary: I(你, 任) = 0; rear boundary: I(候, 都) = 0. The mutual information requirement is now satisfied, and the ambiguous segment finally extracted is "任何时候", which achieves the goal of extracting the maximal overlapping ambiguous segment.
Once the maximal overlapping ambiguous segment has been extracted, the rest of the string contains no ambiguous segment, and the remaining words can be output directly as the segmentation result.
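The boundary-extension procedure can be sketched as below. The sentence 你任何时候都可以来找我 ("you can come to look for me at any time") is assumed here to be the Chinese original of the worked example, the initial span covers 任何时, and the probability tables `P_SINGLE` and `P_PAIR` hold made-up values chosen only so that I(时, 候) ≠ 0 while the other boundary pairs give 0. The formula follows the definition in the text, I(x, y) = p(x, y) / (p(x) p(y)), without a logarithm.

```python
def mutual_information(x, y, p_pair, p_single):
    """I(x, y) = p(x, y) / (p(x) * p(y)); treated as 0 when a probability is missing."""
    px = p_single.get(x, 0.0)
    py = p_single.get(y, 0.0)
    if px == 0.0 or py == 0.0:
        return 0.0
    return p_pair.get((x, y), 0.0) / (px * py)


def extend_to_maximal(text, start, end, p_pair, p_single):
    """Grow the span text[start:end] until I = 0 across both of its boundaries."""
    # Front boundary: the character before the span and the first character in it.
    while start > 0 and mutual_information(text[start - 1], text[start], p_pair, p_single) != 0:
        start -= 1
    # Rear boundary: the last character in the span and the character after it.
    while end < len(text) and mutual_information(text[end - 1], text[end], p_pair, p_single) != 0:
        end += 1
    return start, end


# Hypothetical statistics, for illustration only.
TEXT = "你任何时候都可以来找我"
P_SINGLE = {ch: 0.01 for ch in TEXT}
P_PAIR = {("时", "候"): 0.002}

start, end = extend_to_maximal(TEXT, 1, 4, P_PAIR, P_SINGLE)
print(TEXT[start:end])
```

In a real system the two tables would be estimated from a corpus; the sketch only shows how the stopping condition I(x, y) = 0 determines the final span.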
In step S4, semantic retrieval is performed with the maximal overlapping ambiguous segment as the keyword to judge whether the segment can be segmented as a complete word. In this step, as long as the segment (keyword) is determined to express a clear semantic concept, the keyword is segmented as one complete word; otherwise it is segmented into a combination of several single characters.
This step must therefore first determine whether the segment can express a clear semantic concept. In the present invention, semantic reasoning is first performed on the submitted keyword to extract its semantic concept. The extraction is carried out on the basis of the ontology model and comprises two kinds of reasoning, qualitative and quantitative. Qualitative reasoning is carried out on the basis of a formal representation of the ontology model in the Resource Description Framework (RDF(S)), and quantitative reasoning on the basis of a Voronoi-diagram representation of the ontology model.
Specifically, binary relations on a set are defined first: a binary relation R on a set U is a subset of U × U, that is, a set of ordered pairs <x, y> with x, y ∈ U, written xRy. The set of all binary relations on U is denoted Rel(U). The minimum relation on U is the empty set, denoted ∅, and the maximum relation is the universal relation U × U, denoted V. In addition, let R be a relation on U; then:
(1) if xRx for all x ∈ U, then R is reflexive;
(2) if xRy → yRx for all x, y ∈ U, then R is symmetric;
(3) if xRy and yRz → xRz for all x, y, z ∈ U, then R is transitive.
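The three properties can be checked mechanically when a relation is stored as a set of ordered pairs. The following is a small illustrative sketch; the set `U` and the relation `R` are arbitrary examples and are not part of the ontology model described in the patent.

```python
def is_reflexive(universe, relation):
    """R is reflexive if (x, x) is in R for every x in U."""
    return all((x, x) in relation for x in universe)


def is_symmetric(relation):
    """R is symmetric if (y, x) is in R whenever (x, y) is in R."""
    return all((y, x) in relation for (x, y) in relation)


def is_transitive(relation):
    """R is transitive if (x, z) is in R whenever (x, y) and (y, z) are in R."""
    return all(
        (x, z) in relation
        for (x, y) in relation
        for (y2, z) in relation
        if y == y2
    )


# Arbitrary example relation on a three-element set.
U = {1, 2, 3}
R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}
print(is_reflexive(U, R), is_symmetric(R), is_transitive(R))
```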
Suppose the submitted keyword is a node in the knowledge ontology. Through class inheritance, nodes are related when an instance of one class is an attribute of another class, when an instance of one class is a subclass of another class, or when classes share a common attribute. Reasoning among the concepts of the knowledge ontology associates the node represented by the keyword with predefined ontology instance nodes and yields the expanded concepts of the keyword, which completes the first step, semantic expansion.
After the semantic expansion of the previous step, the expanded concepts of the keyword are available. These concepts are semantically related to the keyword, but the degree of relatedness has not yet been measured. To describe objectively the degree of relatedness between the keyword and the related concepts, a semantic similarity algorithm between concepts based on the Voronoi-diagram representation is adopted. The semantic similarity of nodes is calculated from path distance: if the path distance between two nodes in the Voronoi diagram is d, the semantic similarity between the two nodes is:
$$\mathrm{Sim}(n_1, n_2) = \frac{\alpha}{d + \alpha}$$
where $n_1$ and $n_2$ are two nodes in the Voronoi diagram, $d$ is the path distance between $n_1$ and $n_2$, and $\alpha$ is an adjustable parameter.
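To make the path-distance similarity concrete, the sketch below computes d as the number of edges on the shortest path in a small undirected concept graph and then evaluates α / (d + α). Representing the diagram as an adjacency dictionary, the node names, the default α = 1, and returning 0 for unconnected nodes are all assumptions made for illustration.

```python
from collections import deque


def path_distance(graph, source, target):
    """Breadth-first search for the number of edges on the shortest path."""
    if source == target:
        return 0
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # no path between the two nodes


def path_similarity(graph, n1, n2, alpha=1.0):
    """Sim(n1, n2) = alpha / (d + alpha); assumed to be 0 when no path exists."""
    d = path_distance(graph, n1, n2)
    return 0.0 if d is None else alpha / (d + alpha)


# Hypothetical concept graph, stored as an adjacency dictionary.
GRAPH = {
    "cotton disease": ["anthracnose", "brown spot"],
    "anthracnose": ["cotton disease"],
    "brown spot": ["cotton disease"],
    "cutworm": [],
}
print(path_similarity(GRAPH, "anthracnose", "brown spot"))
```

With α = 1, identical nodes have similarity 1, direct neighbours 1/2, and nodes two edges apart 1/3; a larger α flattens this decay.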
Two nodes can be related in three ways: direct relation (an instance of one class is an attribute of another class, an instance of one class is a subclass of another class, or the classes share a common attribute), inclusion relation (a second-level subclass inherits from its parent first-level subclass), and transitive relation (produced by passing on a direct or inclusion relation). These three cases affect the similarity calculation to different degrees. To distinguish their different degrees of influence, the present invention calculates the semantic similarity of two nodes with the following formula:
$$\mathrm{Sim}(n_1, n_2) = \sum_{i=1}^{3} \beta_i \prod_{j=1}^{i} \mathrm{Sim}_j(n_1, n_2)$$
where $n_1$ and $n_2$ are two nodes of the Voronoi diagram, and $\beta_1$, $\beta_2$, $\beta_3$ are the weights in the semantic similarity calculation of the direct-relation similarity $\mathrm{Sim}_1(n_1, n_2)$, the inclusion-relation similarity $\mathrm{Sim}_2(n_1, n_2)$, and the transitive-relation similarity $\mathrm{Sim}_3(n_1, n_2)$, respectively.
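The weighted formula can be evaluated with a running product, as in the sketch below. The three component similarities and the weights β are supplied by the caller; the numeric values used here are invented for illustration, and the patent does not state how the weights are chosen.

```python
def combined_similarity(component_sims, betas):
    """Evaluate sum over i of beta_i * (Sim_1 * ... * Sim_i).

    component_sims: (Sim_1, Sim_2, Sim_3) for direct, inclusion, and
    transitive relations; betas: the matching weights (beta_1, beta_2, beta_3).
    """
    if len(component_sims) != len(betas):
        raise ValueError("one weight is needed per component similarity")
    total, running_product = 0.0, 1.0
    for beta, sim in zip(betas, component_sims):
        running_product *= sim  # product of Sim_1 .. Sim_i
        total += beta * running_product
    return total


# Invented example values.
sims = (0.8, 0.5, 0.5)
betas = (0.5, 0.3, 0.2)
print(combined_similarity(sims, betas))
```

For these values the three terms are 0.5 × 0.8, 0.3 × 0.8 × 0.5, and 0.2 × 0.8 × 0.5 × 0.5, so each later relation type contributes less unless its weight is raised.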
Finally, retrieval is performed with the semantic concept. The present invention retrieves with the semantic concept of the keyword, but what is searched is not a large body of text; it is the set of ontology instances already built, which saves a great deal of retrieval time and improves efficiency. In other words, if the ontology instances are regarded as the texts of a retrieval problem, every text to be retrieved in the text library is a single word. If a match can be retrieved, the keyword is considered to express a clear semantic concept.
During semantic retrieval, to judge whether the semantic concept of the keyword matches an ontology instance, a vocabulary association degree method is adopted: gene pairs containing a potential relation are extracted, and a thesaurus is then used to give the relation between the members of each pair. Specifically, for genes k and l, the association degree is calculated as:
$$\mathrm{association}[k][l] = \sum_{i=1}^{N} W_i[k] \cdot W_i[l]$$
where k is the k-th gene term in the document and represents the semantic concept of the keyword, l is an ontology instance, and $W_i[k] = T_i[k] \cdot \log(N / n[k])$. N is the total number of texts in the corpus (that is, the number of ontology instances); $T_i[k]$ is the frequency with which the k-th gene term occurs in document $d_i$ (since each document is a single word, $T_i[k]$ can only be 0 or 1); and $n[k]$ is the number of texts in the corpus that contain the k-th gene term (that is, the number of ontology instances containing it; $n[k]$ can likewise only be 0 or 1).
When the association degree association[k][l] exceeds the set threshold, an ontology instance has been retrieved, and the keyword is considered to express a clear semantic concept and is segmented as a complete word; otherwise it is segmented into a combination of single characters.
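A minimal sketch of this retrieval test is given below, under the assumption that each "document" is one single-word ontology instance and that a term occurs in a document only when it equals that instance. The instance list, the threshold of 1.0, and the function names are hypothetical; the natural logarithm is also an assumption, since the text does not state the base.

```python
import math


def weight(term, document, instances):
    """W_i[k] = T_i[k] * log(N / n[k]) for one single-word document."""
    n_k = sum(1 for inst in instances if inst == term)
    if n_k == 0:
        return 0.0  # the term matches no ontology instance
    t_ik = 1 if document == term else 0
    return t_ik * math.log(len(instances) / n_k)


def association(k, l, instances):
    """association[k][l] = sum over all documents i of W_i[k] * W_i[l]."""
    return sum(weight(k, doc, instances) * weight(l, doc, instances) for doc in instances)


def expresses_clear_concept(keyword_concept, instances, threshold):
    """True if the best association with any ontology instance exceeds the threshold."""
    best = max((association(keyword_concept, inst, instances) for inst in instances), default=0.0)
    return best > threshold


# Hypothetical ontology instances and threshold.
INSTANCES = ["anthracnose", "brown spot", "Bordeaux mixture", "cutworm"]
THRESHOLD = 1.0

for concept in ("anthracnose", "unknown term"):
    as_word = expresses_clear_concept(concept, INSTANCES, THRESHOLD)
    print(concept, "->", "complete word" if as_word else "single characters")
```

Because every document holds exactly one word, the sum is non-zero only when the keyword's concept coincides with an instance, so the test behaves like a weighted lookup. With a library of a single instance, log(N / n[k]) is 0 and nothing can exceed a positive threshold, which is one reason the threshold would need tuning to the size of the library.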
The present invention introduces a text retrieval module but does not actually retrieve text: what is searched is not a large body of text but ontology instances. During text segmentation, a small number of ontology instances therefore need to be added according to the knowledge of the actual domain.
The expansion of ontology instances is described below using cotton disease and pest knowledge as an example. Cotton disease and pest knowledge comprises three subsets: a nominal concept subset, an individual subset, and a predicate concept subset.
(1) Nominal concept subset (Norminal.SC)
  (1) Essential elements
  Examples: nitrogen, phosphorus, potassium ...
  (2) Cotton diseases
  Examples: nitrogen deficiency, anthracnose, brown spot ...
  (3) Common drugs
    A. Powders
    Examples: dichlone powder, fosetyl-aluminium powder, chlorothalonil powder ...
    B. Sprays
    Examples: Bordeaux mixture, thiophanate-methyl, nematicide ...
  (4) Cotton growth stages
  Examples: seedling stage, boll-opening stage, flowering and boll-setting stage ...
  (5) Cotton types
  Examples: saw-ginned cotton, long-staple cotton, Lumianyan No. 20 ...
(2) Individual subset (Individual-Organization.SC)
  (1) Cotton parts
  Examples: root, stem, leaf ...
  (2) Cotton pests
  Examples: corn borer, cutworm, cotton whitefly ...
(3) Predicate concept subset (Verbal.SC)
The concepts in the predicate subset of cotton disease and pest knowledge denote actions; the members of this verb subclass include:
  (1) seed selection
  (2) seedling protection
  (3) pesticide application
The present invention creates the cotton disease and pest knowledge ontology with the Protégé ontology modeling tool. The leftmost column in Protégé is the class tab, which is used to navigate and create the whole class structure of the ontology. Classes in Protégé are organized hierarchically; each class can contain subclasses, and classes and subclasses can define their own attributes. Initially the class structure of Protégé contains only the THING class, which is the parent of all classes in Protégé. A new class is created in Protégé by clicking the create-class button, and information about the class, such as its name, documentation, and constraints, can be entered in the class editing area. To create a subclass of a class, first select that class and then click the create-class button; for example, to create a subclass of the "common drug" class in the system, the "common drug" class must be selected first. Building the knowledge ontology and expanding ontology instances with Protégé is in practice a process of manually predefining ontology instance rules, which developers carry out according to the knowledge of the relevant domain. It is not a key implementation step of the present invention and is not described in detail here.
Finally, prosody processing and speech synthesis also directly determine the quality of the synthesized speech. The main function of steps S6 and S7 is as follows: according to the fully segmented text and the corresponding prosodic parameters, the corresponding speech units are taken from the original speech library; the speech units in the library are recorded in advance and cover the pronunciation of all Chinese characters. The prosodic parameters of the speech units are then adjusted and modified by means of speech annotation, and speech that meets the requirements is finally synthesized. More specifically, the present invention uses the Microsoft SDK to collect 1176 toned Chinese syllables as the original speech units and synthesizes the speech of all Chinese characters from these units. The invention synthesizes not only the speech of single characters but also the speech of every ontology instance in the ontology library, that is, of each "word". When synthesizing the speech of a word, the silent segments or white noise at the head and tail of each word must also be removed. This exploits the fact that both the short-time energy and the zero-crossing rate of the sounded part of a speech signal are greater than those of a silent segment: the short-time energy and short-time zero-crossing rate of the recorded speech signal are calculated first, different thresholds are set, and the silent segments are then deleted by double-threshold comparison. The speech units are sampled at 22050 Hz, and the final synthesized speech is stored in ".wav" format.
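The head and tail trimming can be illustrated with a simplified double-threshold comparison: a frame counts as sound only if both its short-time energy and its zero-crossing count exceed their thresholds, and everything before the first and after the last such frame is dropped. The frame length, both threshold values, and the synthetic test signal are assumptions for this sketch; the patent does not give concrete values, and practical endpoint detectors are usually more elaborate.

```python
import math


def frame_features(samples, frame_len=256):
    """Return (short-time energy, zero-crossing count) for each full frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        features.append((energy, crossings))
    return features


def trim_silence(samples, energy_thr, zcr_thr, frame_len=256):
    """Keep the samples from the first to the last frame that passes both thresholds."""
    features = frame_features(samples, frame_len)
    sounded = [i for i, (energy, zcr) in enumerate(features)
               if energy > energy_thr and zcr > zcr_thr]
    if not sounded:
        return []
    first, last = sounded[0], sounded[-1]
    return samples[first * frame_len:(last + 1) * frame_len]


# Synthetic signal: silence, a sine burst with a 32-sample period, silence.
silence = [0.0] * 512
burst = [math.sin(2 * math.pi * n / 32) for n in range(1024)]
signal = silence + burst + silence

trimmed = trim_silence(signal, energy_thr=50.0, zcr_thr=5)
print(len(signal), "->", len(trimmed))
```

On recorded speech the thresholds would have to be tuned to the recording conditions, since low-level background noise can have a high zero-crossing count while its energy stays small.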
The present invention has designed a kind of text-to-speech method of same and system of semantic-based retrieval, text dividing method is combined with the information retrieval method of traditional semantic-based, for the processing mode of keyword, carry out the processing of ambiguity field in the reference information retrieval technique.Simultaneously the comparatively ripe semantic searching method based on body is introduced in the text dividing, and this introducing, almost need not semantic search is made any change, just need not to go again to retrieve text, and body constructing method, all need not to do any change based on the semantic reasoning mould model of body, retrieval model etc., just need to follow according to the concrete knowledge field, expand a small amount of example in the ontology model of having built up, this has also demonstrated fully cheaply thought.The present invention can effectively solve the problem that the text dividing technology is processed about the ambiguity field, and then can break through the bottleneck of restriction literary composition language shifting method development.
According to statistics, for about 90% of the sentences in Chinese text the segmentations produced by forward maximum matching and reverse maximum matching coincide completely and are correct; for about 9% of the sentences the two segmentations differ, but one of them must be correct; and only for fewer than 1% of the sentences does a segmentation error occur. Therefore, because the present invention uses forward maximum matching and reverse maximum matching together, the great majority of unambiguous segments can be segmented out, which greatly improves the accuracy of text segmentation and hence the speech synthesis effect.
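The two matching passes can be illustrated with a minimal Python sketch. The function names, the maximum word length of 4 characters and the toy lexicon are assumptions made for this example only; a real system would use a full dictionary or the ontology library as its lexicon.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Scan left to right, always taking the longest lexicon word at the cursor."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Scan right to left, always taking the longest lexicon word ending at the cursor."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


# Toy lexicon (hypothetical) containing a classic overlapping ambiguity.
lexicon = {"研究", "研究生", "生命", "起源"}
fmm = forward_max_match("研究生命起源", lexicon)  # ['研究生', '命', '起源']
rmm = reverse_max_match("研究生命起源", lexicon)  # ['研究', '生命', '起源']
```

Here the two passes disagree on the first four characters, and the reverse pass happens to give the correct reading; this is the kind of sentence that falls into the roughly 9% category mentioned above.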
The parts that are not segmented out, namely the parts on which the two methods disagree and in which the ambiguous segments are located, are processed further by the present invention. According to statistics, in real Chinese text segmentation ambiguity occurs with a probability of about 1/110, that is, on average one segmentation ambiguity appears in every 110 Chinese characters, and crossing (overlapping) ambiguity accounts for 86% of these cases. By handling crossing ambiguity efficiently and accurately, and by applying semantic retrieval to the ambiguous segments, the present invention greatly improves the automatic identification rate of ambiguous segments, thereby reducing unnatural pauses in the synthesized speech as far as possible and improving the speech synthesis effect.
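As a hedged illustration of how the agreed and disagreeing parts might be separated, the following Python sketch compares the word boundaries of two segmentations of the same text. The function names are hypothetical, and the sketch only locates the spans on which the two results differ; it does not perform the mutual-information extraction or the semantic retrieval of the invention.

```python
from typing import List, Tuple


def boundaries(words: List[str]) -> List[int]:
    """Cumulative character offsets at which a segmentation places word boundaries."""
    offsets, pos = [0], 0
    for word in words:
        pos += len(word)
        offsets.append(pos)
    return offsets


def split_agreement(fmm: List[str], rmm: List[str]) -> List[Tuple[str, bool]]:
    """Return (span, agreed) pairs; agreed is False where the two results differ."""
    text = "".join(fmm)
    if text != "".join(rmm):
        raise ValueError("both segmentations must cover the same text")
    f_set, r_set = set(boundaries(fmm)), set(boundaries(rmm))
    common = sorted(f_set & r_set)
    result = []
    for start, end in zip(common, common[1:]):
        # A span is agreed only if neither method puts a boundary strictly inside it.
        has_inner = any(start < b < end for b in f_set | r_set)
        result.append((text[start:end], not has_inner))
    return result


spans = split_agreement(["研究生", "命", "起源"], ["研究", "生命", "起源"])
# spans == [('研究生命', False), ('起源', True)]
```

In this sketch the agreed span would go straight to prosody processing, while the span marked False is the candidate that would be handed on for ambiguity processing.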
Finally, with the development of information and communication technology, and in particular with the opening of the national 12316 "new countryside" hotline, delivering agricultural knowledge to farmers in spoken form through a call center platform would be of great help to their production and daily life. At present, however, most local 12316 hotlines can only rely on human operators and expert consultation, and the large body of books and textual knowledge has not been recorded as speech and delivered directly to farmers, because the automatic text-to-speech technology of the prior art is still immature. The method and system of the present invention can effectively support voice hotlines of this kind, both reducing the cost of providing the service and greatly enriching the service content and improving its effect.
The above embodiments are intended only to illustrate the present invention and not to limit it. A person of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the present invention; all equivalent technical solutions therefore also fall within the scope of the present invention, and the scope of patent protection of the present invention shall be defined by the claims.

Claims (10)

1. A text-to-speech method based on semantic retrieval, characterized in that the method is used for text-to-speech conversion of Chinese and comprises the steps of:
S1: performing forward maximum matching and reverse maximum matching, respectively, on the input text information;
S2: comparing the text segmentation results of the two matching methods; text strings that are segmented identically are taken directly as segmentation results and steps S6 to S7 are executed; for text strings that are segmented differently, steps S3 to S7 are executed;
S3: extracting the maximal overlapping ambiguous segment in each text string;
S4: performing semantic retrieval on each ambiguous segment;
S5: identifying each ambiguous segment according to the semantic retrieval matching result and taking it as a segmentation result;
S6: performing speech prosody processing on the words according to the segmentation results;
S7: synthesizing all the prosody-processed words into speech in text order and outputting the speech.
2. The method according to claim 1, characterized in that, in step S3, the maximal overlapping ambiguous segment is extracted by calculating the mutual information between the Chinese characters at the front and rear boundaries of the ambiguous segment in the text string.
3. The method according to claim 1, characterized in that, in step S4, the semantic retrieval comprises:
S401: performing qualitative reasoning on each ambiguous segment on the basis of a formal representation of the ontology model in the Resource Description Framework, thereby achieving semantic expansion of the ambiguous segment;
S402: representing, in the form of a Voronoi diagram, the quantitative reasoning on the ambiguous segment after semantic expansion on the basis of the ontology model, and obtaining the semantic similarity between the ambiguous segment and the expanded semantic concepts;
S403: retrieving the expanded semantic concepts using the lexical association degree, and judging whether the ambiguous segment can express a definite semantic concept.
4. The method according to claim 3, characterized in that, in step S403, the retrieval is performed among ontology instances.
5. The method according to claim 3, characterized in that, in step S5, when the ambiguous segment can express a definite semantic concept it is segmented as a complete word, and when the ambiguous segment cannot express a definite semantic concept it is segmented into a combination of single characters.
6. A text-to-speech system based on semantic retrieval, characterized in that the system is used for text-to-speech conversion of Chinese and comprises:
a forward maximum matching module and a reverse maximum matching module, which respectively perform forward maximum matching and reverse maximum matching on the input text information;
a matching result comparison module, which compares the text segmentation results of the two matching methods, passes text strings that are segmented identically directly to the prosody processing module as segmentation results, and passes text strings that are segmented differently to the extraction module;
an extraction module, which extracts the maximal overlapping ambiguous segment from each text string provided by the matching result comparison module and passes it to the semantic retrieval module;
a semantic retrieval module, which performs semantic retrieval on each ambiguous segment;
a match identification module, which identifies each ambiguous segment according to the semantic retrieval matching result and passes it to the prosody processing module as a segmentation result;
a prosody processing module, which performs speech prosody processing on the words according to the segmentation results;
a speech synthesis module, which synthesizes all the prosody-processed words into speech in text order and outputs the speech.
7. The system according to claim 6, characterized in that, in the extraction module, the maximal overlapping ambiguous segment is extracted by calculating the mutual information between the Chinese characters at the front and rear boundaries of the ambiguous segment in the text string.
8. The system according to claim 6, characterized in that the semantic retrieval module further comprises:
a qualitative reasoning module, which performs qualitative reasoning on each ambiguous segment on the basis of a formal representation of the ontology model in the Resource Description Framework, thereby achieving semantic expansion of the ambiguous segment;
a quantitative reasoning module, which represents, in the form of a Voronoi diagram, the quantitative reasoning on the ambiguous segment after semantic expansion on the basis of the ontology model, and obtains the semantic similarity between the ambiguous segment and the expanded semantic concepts;
a concept retrieval module, which retrieves the expanded semantic concepts using the lexical association degree and judges whether the ambiguous segment can express a definite semantic concept.
9. The system according to claim 8, characterized in that, in the concept retrieval module, the retrieval is performed among ontology instances.
10. The system according to claim 8, characterized in that, in the match identification module, when the ambiguous segment can express a definite semantic concept it is segmented as a complete word, and when the ambiguous segment cannot express a definite semantic concept it is segmented into a combination of single characters.
CN2011103512258A 2011-11-08 2011-11-08 Text-to-speech method and system based on semantic retrieval Expired - Fee Related CN102394061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103512258A CN102394061B (en) 2011-11-08 2011-11-08 Text-to-speech method and system based on semantic retrieval

Publications (2)

Publication Number Publication Date
CN102394061A (en) 2012-03-28
CN102394061B (en) 2013-01-02

Family

ID=45861360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103512258A Expired - Fee Related CN102394061B (en) 2011-11-08 2011-11-08 Text-to-speech method and system based on semantic retrieval

Country Status (1)

Country Link
CN (1) CN102394061B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001043221A (en) * 1999-07-29 2001-02-16 Matsushita Electric Ind Co Ltd Chinese word dividing device
JP2004294639A (en) * 2003-03-26 2004-10-21 Omron Corp Text analyzing device for speech synthesis and speech synthesiser
CN1731509A (en) * 2005-09-02 2006-02-08 清华大学 Mobile speech synthesis method
JP2006145690A (en) * 2004-11-17 2006-06-08 Kenwood Corp Speech synthesizer, method for speech synthesis, and program
CN101064103A (en) * 2006-04-24 2007-10-31 中国科学院自动化研究所 Chinese voice synthetic method and system based on syllable rhythm restricting relationship
CN101464855A (en) * 2009-01-13 2009-06-24 吴长林 Word separation method for character string containing Chinese language, and method for searching words in character string

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
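Ying Zhiwei et al. Research on corpus-based automatic Chinese word segmentation in text-to-speech conversion systems. Computer Applications (《计算机应用》). 2000, Vol. 20 (No. 02), 8-11. *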

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130102

Termination date: 20131108