CN103440253A - Speech retrieval method and system

Info

Publication number: CN103440253A
Application number: CN2013103152393A
Authority: CN
Inventors: 吴及; 李伟; 贺志阳; 吕萍; 何婷婷
Original assignee: Tsinghua University; iFlytek Co Ltd
Current assignee: Tsinghua University; iFlytek Co Ltd
Priority date: 2013-07-25
Filing date: 2013-07-25
Publication date: 2013-12-11

Abstract

The invention relates to the technical field of speech retrieval and discloses a speech retrieval method and system. The method comprises the following steps: receiving a search keyword input by a user; performing single-character segmentation on the search keyword to obtain single-character segments; expanding the search keyword according to the single-character segments to generate a keyword graph structure; and retrieving the word on each arc of the keyword graph structure in turn against a pre-built index database to obtain retrieval results. The method and system can improve the effectiveness and comprehensiveness of the retrieval results.

Description

Speech retrieval method and system

Technical field

The present invention relates to the technical field of speech retrieval, and in particular to a speech retrieval method and system.

Background art

Speech retrieval is a branch of multimedia retrieval and is mainly used for fast retrieval over massive collections of speech documents. A speech retrieval system receives a query consisting of a text word or phrase (commonly called a keyword), determines which speech documents in the library to be searched contain the keyword, and determines the position of the keyword within those documents.

A traditional speech retrieval system first uses speech recognition to convert the speech documents to text and then creates a search index from the recognition results. In the retrieval phase, when the system receives the search keyword input by the user, it matches the keyword directly against the search index to determine which speech documents contain it.

A traditional speech retrieval system faces the following two kinds of problems in the retrieval phase:

(1) Retrieval problems caused by inconsistent word segmentation

The recognition results of a word-based large-vocabulary continuous speech recognition system may be segmented differently from the user's query. As a result, the query words input by the user may not be identical to the words contained in the recognition results, and the corresponding result cannot be retrieved. For example, the user enters the query "conference agenda" and the segmentation system returns "conference | agenda". Some speech file contains the spoken content "conference agenda", but the recognition result of the corresponding speech segment is segmented differently (for example, "large | meeting | journey"), so the system cannot retrieve that recognition result.

(2) Retrieval problems caused by recognition errors

Because no current speech recognition system can guarantee a 100% recognition rate, a search index built from recognition results that contain errors will inevitably degrade retrieval. For example, the user enters the query "conference", and some speech file contains the spoken content "conference", but the recognition result of the corresponding speech segment is "opening conference".

For these reasons, a traditional speech retrieval system cannot achieve satisfactory retrieval results.

Summary of the invention

The embodiments of the present invention provide a speech retrieval method and system that address the retrieval errors caused in the prior art by problems such as speech recognition errors, and that improve the effectiveness and comprehensiveness of the retrieval results.

To this end, the invention provides the following technical solutions:

A speech retrieval method, comprising:

Receiving a search keyword input by a user;

Performing single-character segmentation on the search keyword to obtain single-character segments;

Expanding the search keyword according to the single-character segments to generate a keyword graph structure;

Retrieving the word on each arc of the keyword graph structure in turn against a pre-built index database to obtain retrieval results.

Preferably, expanding the search keyword according to the single-character segments to generate the keyword graph structure comprises:

Judging in turn whether each combination of two or more adjacent characters among the single-character segments is a word in a preset dictionary;

If so, taking the combination as a sub-word of the search keyword;

Representing all sub-words in one directed graph to obtain the keyword graph structure.

Preferably, expanding the search keyword according to the single-character segments to generate the keyword graph structure further comprises:

Performing prefix expansion on the segments in the keyword graph structure according to the words in the preset dictionary; and/or

Performing suffix expansion on the segments in the keyword graph structure according to the words in the preset dictionary.

Preferably, performing the prefix expansion comprises:

Judging in turn whether a leading part of the search keyword, or the whole keyword, is the suffix of a specific word in the preset dictionary;

If so, taking the specific word as an expansion word of the search keyword;

Adding the expansion word to the keyword graph structure.

Preferably, performing the suffix expansion comprises:

Judging in turn whether a trailing part of the search keyword, or the whole keyword, is the prefix of a specific word in the preset dictionary;

If so, taking the specific word as an expansion word of the search keyword;

Adding the expansion word to the keyword graph structure.

Preferably, retrieving the word on each arc of the expanded keyword graph structure in turn against the pre-built index database to obtain the retrieval results comprises:

Traversing each arc in the keyword graph structure, retrieving the word on the arc against the pre-built index database, and storing the retrieval result on the arc;

Updating the retrieval information of the nodes in the keyword graph structure in turn, from left to right, wherein for each node its outgoing arcs are processed in turn, the retrieval information on each outgoing arc is passed to the terminal node of that arc, and when the terminal node receives a new retrieval result, an order-preserving merge is performed with the retrieval result it already holds.

Preferably, the method further comprises building the index database offline, wherein building the index database comprises:

Performing speech recognition on the speech documents to be searched to obtain a word graph containing words and time information, the word graph representing multiple recognition results of a speech segment as a directed acyclic graph;

Building an inverted index over each word in the word graph.

A speech retrieval system, comprising:

A receiving module, configured to receive a search keyword input by a user;

A segmentation module, configured to perform single-character segmentation on the search keyword to obtain single-character segments;

An expansion module, configured to expand the search keyword according to the single-character segments to generate a keyword graph structure;

A retrieval module, configured to retrieve the word on each arc of the keyword graph structure in turn against a pre-built index database to obtain retrieval results.

Preferably, the expansion module comprises:

A judging unit, configured to judge in turn whether each combination of two or more adjacent characters among the single-character segments is a word in a preset dictionary and, if so, to take the combination as a sub-word of the search keyword;

A graph structure generation unit, configured to represent all sub-words in one directed graph to obtain the keyword graph structure.

Preferably, the expansion module further comprises:

A prefix expansion unit, configured to perform prefix expansion on the segments in the keyword graph structure according to the words in the preset dictionary; and/or

A suffix expansion unit, configured to perform suffix expansion on the segments in the keyword graph structure according to the words in the preset dictionary.

Preferably, the prefix expansion unit is specifically configured to judge in turn whether a leading part of the search keyword, or the whole keyword, is the suffix of a specific word in the preset dictionary; if so, to take the specific word as an expansion word of the search keyword; and to add the expansion word to the keyword graph structure.

Preferably, the suffix expansion unit is specifically configured to judge in turn whether a trailing part of the search keyword, or the whole keyword, is the prefix of a specific word in the preset dictionary; if so, to take the specific word as an expansion word of the search keyword; and to add the expansion word to the keyword graph structure.

Preferably, the retrieval module comprises:

A traversal unit, configured to traverse each arc of the keyword graph structure, retrieve the word on the arc against the pre-built index database, and store the retrieval result on the arc;

An updating unit, configured to update the retrieval information of the nodes of the keyword graph structure in turn, from left to right, wherein for each node its outgoing arcs are processed in turn, the retrieval information on each outgoing arc is passed to the terminal node of that arc, and when the terminal node receives a new retrieval result, an order-preserving merge is performed with the retrieval result it already holds.

Preferably, the system further comprises:

An index database building module, configured to build the index database offline, the index database building module comprising:

A speech recognition unit, configured to perform speech recognition on the speech documents to be searched to obtain a word graph containing words and time information, the word graph representing multiple recognition results of a speech segment as a directed acyclic graph;

An inverted index building unit, configured to build an inverted index over each word in the word graph.

With the speech retrieval method and system provided by the embodiments of the present invention, when a search keyword is received, full segmentation and prefix/suffix expansion are used to expand the keyword into a corresponding graph structure, and the expanded keyword is then retrieved against an inverted index built from multi-candidate speech recognition results, which greatly improves the effectiveness and comprehensiveness of the retrieval results.

Further, retrieval results are obtained by searching over the graph structure, which reduces the complexity of retrieval and effectively improves retrieval efficiency.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them.

Fig. 1 is a flowchart of building the index database in an embodiment of the present invention;

Fig. 2 is a schematic diagram of an inverted index in an embodiment of the present invention;

Fig. 3 is a flowchart of the speech retrieval method of an embodiment of the present invention;

Fig. 4 is a schematic diagram of the keyword graph structure in an embodiment of the present invention;

Fig. 5 is a schematic diagram of the keyword graph structure after expansion based on full segmentation in an embodiment of the present invention;

Fig. 6 is a schematic diagram of the keyword graph structure after expansion based on full segmentation and prefix/suffix expansion in an embodiment of the present invention;

Fig. 7 is a structural schematic diagram of the speech retrieval system of an embodiment of the present invention.

Detailed description of the embodiments

To help those skilled in the art better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings and implementations.

Traditional speech retrieval systems give unsatisfactory retrieval results because of inconsistent word segmentation and speech recognition errors. To address this, the invention provides a speech retrieval method and system: when a search keyword is received, full segmentation and/or prefix and suffix expansion are used to expand the keyword into a corresponding graph structure, and the expanded keyword is then retrieved against an inverted index built from multi-candidate speech recognition results, which greatly improves the effectiveness and comprehensiveness of the retrieval results.

In the embodiments of the present invention, the index database can be built offline. Fig. 1 is a flowchart of building the index database in an embodiment of the present invention, comprising the following steps:

Step 101: perform speech recognition on the speech documents to be searched to obtain a word graph containing words and time information.

The speech recognition process mainly comprises segmenting the speech data, extracting acoustic features, decoding, and so on. It finally yields the word graph of the recognition result for the speech data, that is, a directed acyclic graph containing words and time information.

The nodes of the word graph describe time information and the arcs describe word information. A start node, an end node, and the arc connecting them together describe the word that the speech signal may correspond to within a given time period. A complete path through the word graph describes a word string with timing information, that is, a text content of the speech signal.

The word graph represents the multi-candidate recognition results of a speech segment as a directed acyclic graph. Under a probabilistic framework, the recognition results of a speech recognition system contain some errors. Representing multiple candidate recognition results as a word graph therefore improves the coverage of the correct result and helps improve the performance of the retrieval system.
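As an illustration only, a word graph of this kind can be modelled as a list of arcs whose endpoints are time stamps. The sketch below enumerates the word strings spelled by the complete paths of a small hand-made graph. The arc tuple layout, the function name `candidate_transcripts`, the time values, and the Chinese words (reconstructed from the translated glosses given in the comments) are assumptions, not data taken from the patent.

```python
from collections import defaultdict

# Each arc is (start_time, end_time, word); the time stamps act as graph nodes.
# Hypothetical word graph for one utterance with two competing readings.
LATTICE = [
    (0.0, 0.4, "大会"),  # gloss: "conference"
    (0.4, 0.8, "议程"),  # gloss: "agenda"
    (0.0, 0.2, "大"),    # gloss: "large"
    (0.2, 0.6, "会议"),  # gloss: "meeting"
    (0.6, 0.8, "程"),    # gloss: "journey"
]


def candidate_transcripts(arcs, start, end):
    """Return the word string of every complete path from start to end."""
    outgoing = defaultdict(list)
    for arc_start, arc_end, word in arcs:
        outgoing[arc_start].append((arc_end, word))

    def walk(node):
        if node == end:
            yield []
            return
        for next_node, word in outgoing[node]:
            for rest in walk(next_node):
                yield [word] + rest

    return sorted(walk(start))
```

Calling `candidate_transcripts(LATTICE, 0.0, 0.8)` returns both readings of the utterance, which is the sense in which the word graph keeps several candidate recognition results at once.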

Step 102: build an inverted index over each word in the word graph.

Building the index database is similar to building the index database of a text-based retrieval system: an inverted index is built over each word (that is, each arc) in the word graph. Unlike a traditional text retrieval system, an index entry no longer holds document information, word position, and the like, but speech file information, segment information, time point information, and so on, which respectively record the speech file the word belongs to, the segment it belongs to, and its relative start and end times.

Fig. 2 is a schematic diagram of an inverted index in an embodiment of the present invention.
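A minimal sketch of such an index, assuming each recognized segment is available as a list of word-graph arcs: every arc contributes one posting that records the speech file, the segment, and the start and end times. The names `Arc`, `Posting`, and `build_inverted_index`, the file names, and the sample words are hypothetical; the actual layout shown in Fig. 2 may differ.

```python
from collections import defaultdict
from typing import NamedTuple


class Arc(NamedTuple):
    word: str
    start: float  # relative start time in seconds
    end: float    # relative end time in seconds


class Posting(NamedTuple):
    file_id: str
    segment_id: int
    start: float
    end: float


def build_inverted_index(lattices):
    """lattices: iterable of (file_id, segment_id, list_of_arcs)."""
    index = defaultdict(list)
    for file_id, segment_id, arcs in lattices:
        for arc in arcs:
            index[arc.word].append(Posting(file_id, segment_id, arc.start, arc.end))
    # Keep every posting list sorted so that later merges can rely on the order.
    for postings in index.values():
        postings.sort()
    return dict(index)
```

Keeping the posting lists sorted is a design choice of this sketch, not something the text requires; it makes an ordered merge of retrieval results straightforward.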

With the index database built offline as described above, speech retrieval expands the search keyword input by the user into a graph structure and retrieves it against the inverted index built from multi-candidate speech recognition results, which effectively improves the effectiveness and comprehensiveness of the retrieval results.

Fig. 3 is a flowchart of the speech retrieval method of an embodiment of the present invention, comprising the following steps:

Step 301: receive the search keyword input by the user.

Step 302: perform single-character segmentation on the search keyword to obtain single-character segments.

Step 303: expand the search keyword according to the single-character segments to generate the keyword graph structure.

In the embodiments of the present invention, the search keyword input by the user can be expanded by full segmentation, that is, every possible word segmentation of the keyword is produced based on a preset dictionary. In other words, all combinations of two or more adjacent characters are considered; whenever a combination forms a word in the dictionary, it is a possible segmentation unit, so that all sub-words of the search keyword are obtained. Representing all sub-words in one directed graph yields the keyword expansion graph structure based on full segmentation.

Full segmentation avoids the situation in a traditional speech retrieval system where the segmenter gives only one segmentation result and that result finds nothing in the index database.

Suppose the user enters the search keyword "conference agenda". Single-character segmentation yields the segmentation result made up of all single characters, "large | meet | view | journey". The expansion then proceeds from this result. If, besides the four single-character words "large", "meet", "view", and "journey", the dictionary also contains the three two-character words "conference" ("large" + "meet"), "meeting" ("meet" + "view"), and "agenda" ("view" + "journey"), these possible character combinations are taken as sub-words of the search keyword, finally forming the directed graph shown in Fig. 4.

As can be seen, after keyword expansion by full segmentation, every complete path through the graph is one segmentation result.
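The full-segmentation step can be sketched as follows, under the assumption that graph nodes are the character boundaries 0..N of the keyword and that an arc spans every single character and every dictionary word found in the keyword. The four-character keyword and the dictionary entries are reconstructed from the translated glosses ("large", "meet", "view", "journey", "conference", "meeting", "agenda") and are assumptions, as are the function names.

```python
from collections import defaultdict


def full_segmentation_arcs(keyword, dictionary):
    """Return arcs (start_node, end_node, word) over character boundaries."""
    n = len(keyword)
    arcs = []
    for i in range(n):
        arcs.append((i, i + 1, keyword[i]))   # single-character segment
        for j in range(i + 2, n + 1):         # two or more adjacent characters
            if keyword[i:j] in dictionary:
                arcs.append((i, j, keyword[i:j]))
    return arcs


def segmentations(arcs, n):
    """List every complete path from node 0 to node n as 'w1|w2|...'."""
    outgoing = defaultdict(list)
    for start, end, word in arcs:
        outgoing[start].append((end, word))

    def walk(node):
        if node == n:
            yield []
            return
        for end, word in outgoing[node]:
            for rest in walk(end):
                yield [word] + rest

    return ["|".join(path) for path in walk(0)]


# Hypothetical keyword and dictionary reconstructed from the glosses.
KEYWORD = "大会议程"
DICTIONARY = {"大会", "会议", "议程"}
ARCS = full_segmentation_arcs(KEYWORD, DICTIONARY)
PATHS = segmentations(ARCS, len(KEYWORD))
```

With these inputs the graph has seven arcs and five complete paths, including both `大会|议程` ("conference | agenda") and `大|会议|程` ("large | meeting | journey").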

In a traditional speech retrieval system, if the user enters the search keyword "conference agenda", the segmentation result might be "conference | agenda". Retrieval is then carried out only for "conference" and "agenda", with no segmentation expansion.

Suppose the recognition result file of some speech file contains the recognition result "large | meeting | journey". A traditional speech retrieval system cannot retrieve this result, which leads to a miss, a situation that clearly should be avoided.

In the embodiments of the present invention, full segmentation expands the search keyword "conference agenda" into the keyword graph structure shown in Fig. 4, so that all segmentation results enter the retrieval system. Because the keyword graph structure produced by full segmentation contains the path "large | meeting | journey", the full segmentation approach still obtains the retrieval result.

Further, consider the case where the search keyword input by the user does not appear in the recognition results or the search index but overlaps with some words in the recognition results, for example when the user enters "conference agenda" and the recognition results contain only "conference" and "agenda".

For this case, the embodiments of the present invention can additionally apply prefix and/or suffix expansion to the keyword, on top of the full-segmentation expansion, to obtain a more complete expanded keyword graph structure.

For prefix expansion, given a search keyword of N characters input by the user, judge in turn whether its first K characters (1≤K≤N) are the suffix of some word in the dictionary; if so, take that word as an expansion word of the search keyword and add it to the keyword graph structure.

As shown in Fig. 5, the dashed arcs "vast", "opening conference", and "people's congress" are the prefix expansions of "large", "conference", and "Great Council", respectively.

Likewise, for suffix expansion, given a search keyword of N characters input by the user, judge in turn whether its last K characters (1≤K≤N) are the prefix of some word in the dictionary; if so, take that word as an expansion word of the search keyword as well and add it to the keyword graph structure.

Again taking Fig. 5 as an example, the dashed arcs "agenda" and "program" are the suffix expansions of "agenda" and "journey", respectively.

After the prefix and suffix expansion described above, the set of objects to be retrieved grows further beyond that of full segmentation, as shown by the directed graph in Fig. 5.

Compared with Fig. 4, Fig. 5 adds all paths involving the dashed arcs. The prefix and suffix expansion effectively avoids the problem where the leading or trailing part of the search keyword input by the user does not appear in the search library on its own, but does appear there as the suffix or prefix of some word.
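The prefix and suffix expansion can be sketched as below, assuming the same node numbering by character boundary: a prefix-expansion word replaces the first K characters and therefore spans nodes 0 to K, while a suffix-expansion word replaces the last K characters and spans nodes N-K to N. The function name, the keyword, and the dictionary entries (with their glosses) are hypothetical stand-ins, and requiring the expansion word to be strictly longer than the matched part is an assumption of this sketch.

```python
def affix_expansion_arcs(keyword, dictionary):
    """Return extra arcs (start_node, end_node, expansion_word)."""
    n = len(keyword)
    arcs = []
    for k in range(1, n + 1):
        head = keyword[:k]       # first K characters
        tail = keyword[n - k:]   # last K characters
        for word in sorted(dictionary):
            if len(word) <= k:
                continue         # only strictly longer words count as expansions
            if word.endswith(head):
                arcs.append((0, k, word))       # prefix expansion
            if word.startswith(tail):
                arcs.append((n - k, n, word))   # suffix expansion
    return arcs


# Hypothetical dictionary; glosses: "vast", "opening conference",
# "program", "agenda (list)", "conference".
EXPANSION_DICTIONARY = {"广大", "开大会", "程序", "议程表", "大会"}
EXTRA_ARCS = affix_expansion_arcs("大会议程", EXPANSION_DICTIONARY)
```

Here `广大` and `开大会` become arcs leaving the first node, and `程序` and `议程表` become arcs entering the last node, mirroring the dashed arcs described for Fig. 5.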

Again taking the user input "conference agenda" as an example: if the recognition results contain only "conference" and "agenda", a conventional method cannot find "conference agenda" in an index built from them; only prefix and suffix expansion of the search keyword input by the user makes it possible to finally retrieve "conference agenda". This greatly strengthens the fault tolerance of the retrieval system and further improves its retrieval performance.

Step 304: retrieve the word on each arc of the keyword graph structure in turn against the pre-built index database to obtain the retrieval results.

A traditional speech retrieval method retrieves a single word string, that is, a single segmentation result. Specifically, the system retrieves each word in order. The result accumulated so far is called the retrieval state; it is merged with the retrieval result of the "next word" to form a new retrieval state, and so on until all words have been processed. If the final retrieval state is not empty, the retrieval result for this segmentation result has been obtained.

The criterion for merging the current retrieval state with the retrieval result of the "next word" is that the words in the current retrieval state and the "next word" lie in the same segment of the same speech file, and that the interval between the end time of the last word in the current retrieval state and the start time of the "next word" falls within a certain range.
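The merge criterion can be sketched as a small function. The text only says the time interval must fall "within a certain range", so the symmetric tolerance `max_gap` and its default value are assumptions, as are the names `Hit` and `extend_state`.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    file_id: str
    segment_id: int
    start: float  # start time of the first matched word, in seconds
    end: float    # end time of the last matched word, in seconds


def extend_state(state, next_word_hits, max_gap=0.05):
    """Merge the current retrieval state with the hits of the next word."""
    new_state = []
    for cur in state:
        for nxt in next_word_hits:
            same_place = (cur.file_id, cur.segment_id) == (nxt.file_id, nxt.segment_id)
            close_in_time = abs(nxt.start - cur.end) <= max_gap
            if same_place and close_in_time:
                # The merged hit spans from the first word to the new last word.
                new_state.append(Hit(cur.file_id, cur.segment_id, cur.start, nxt.end))
    return new_state
```

An empty result from `extend_state` corresponds to an empty retrieval state, meaning the word string was not found.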

The traditional speech retrieval method cannot be applied directly to the expanded keyword graph structure of the present invention, because in the keyword graph structure the user's search term is no longer a single segmentation result but a graph (as shown in Fig. 5) made up of multiple possible segmentation results and expansion word strings. The word string represented by each complete path in the keyword graph structure needs to be retrieved, and the total number of paths can grow exponentially compared with a single path, so retrieving the paths one by one would noticeably reduce retrieval efficiency.
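To make the growth concrete, the sketch below counts complete paths with a simple dynamic program and compares the count with the number of arcs. The function names and the `dense_graph` construction (every character and every adjacent character pair is a word) are hypothetical and chosen only to show the trend.

```python
def count_paths(arcs, n_nodes):
    """Count complete paths from node 0 to the last node of a left-to-right graph."""
    ways = [0] * n_nodes
    ways[0] = 1
    # Sorting by start node guarantees a node's count is final before it is used.
    for start, end, _word in sorted(arcs):
        ways[end] += ways[start]
    return ways[n_nodes - 1]


def dense_graph(n_chars):
    """Hypothetical graph where every character and adjacent pair is a word."""
    singles = [(i, i + 1, f"c{i}") for i in range(n_chars)]
    pairs = [(i, i + 2, f"c{i}c{i + 1}") for i in range(n_chars - 1)]
    return singles + pairs


ARCS_10 = dense_graph(10)
PATHS_10 = count_paths(ARCS_10, 11)
```

For a ten-character keyword this construction has 19 arcs but 89 complete paths, and the path count follows the Fibonacci sequence as the keyword gets longer, while the arc count grows only linearly.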

For this reason, the embodiments of the present invention can adopt a retrieval method based on passing retrieval states, to improve retrieval efficiency.

In the keyword graph structure, one arc may belong to several different paths, that is, one arc may be shared by several paths, and the number of paths passing through an arc is the number of times the arc is shared. In principle, the word represented by an arc needs to be retrieved only once; if the arc is shared by different paths, its retrieval result can be reused. For example, in Fig. 4 two different paths may both begin with the arc "conference"; both need to retrieve "conference", so the retrieval result for "conference" can be reused.

Based on this property, in the embodiments of the present invention the words on the arcs of the keyword graph structure are retrieved as follows:

(1) Traverse each arc in the keyword graph structure, retrieve the word on the arc, and store the retrieval result on the arc.

The retrieval result records the documents that contain the word and the start and end times of the word within each document. Taking the arc "conference" in Fig. 5 as an example, the index entry for "conference" is read from the inverted index table and its content is stored on the arc "conference".

(2) Update the retrieval information of the nodes: first set the retrieval information of all nodes to empty, then update the retrieval information of the nodes in the keyword graph structure in turn from left to right. For each node, process its outgoing arcs in turn and pass the retrieval information on each outgoing arc to the terminal node of that arc; when the terminal node receives a new retrieval result, perform an order-preserving merge with the result it already holds. The order-preserving merge algorithm serves two purposes: eliminating redundant information and speeding up the merge operation.

Taking Fig. 5 as an example, the retrieval information of all nodes is first set to empty. The passing of retrieval information then starts from the first node. The first node processes its outgoing arcs in turn, namely the arcs "conference", "large", "vast", "opening conference", and "people's congress". The retrieval result on the arc "conference" is passed to the third node; because the retrieval information of the third node is currently empty, it simply keeps the retrieval result of the arc "conference". Likewise, the arc "large" passes its retrieval result to the second node. The terminal node of the arc "vast" is also the second node, so the second node cannot simply take over the retrieval result of the arc "vast"; it must perform an order-preserving merge with the retrieval result received earlier. Similarly, when the arc "opening conference" passes its retrieval result to the third node, an order-preserving merge is also needed at the third node. The remaining arcs and nodes are handled in the same way.

(3) Return the retrieval information of the terminal node of the keyword graph structure.

Taking Fig. 5 as an example, the fifth node is the terminal node of the keyword graph, and the retrieval information on it is the final retrieval result.

The whole process can be viewed as retrieval information being continually passed along and updated with new retrieval information. With this method the retrieval result on each arc is processed only once, which greatly reduces the retrieval complexity and can fully meet efficiency requirements while preserving retrieval quality.
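Steps (1) to (3) can be sketched end to end as below. This is one possible reading, not the patented implementation: it assumes nodes are numbered left to right, that arcs leaving the first node deliver their index hits unchanged, and that arcs leaving later nodes first combine the node's state with the arc's hits using the same-file, same-segment, small-time-gap criterion. For brevity the merge at the terminal node uses `sorted(set(...))` as a stand-in for the order-preserving merge. All names, the `max_gap` tolerance, and the sample index are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    file_id: str
    segment_id: int
    start: float
    end: float


def extend(state, word_hits, max_gap):
    """Join a node's state with an arc's hits (same file and segment, small gap)."""
    return [
        Hit(s.file_id, s.segment_id, s.start, h.end)
        for s in state
        for h in word_hits
        if (s.file_id, s.segment_id) == (h.file_id, h.segment_id)
        and abs(h.start - s.end) <= max_gap
    ]


def search_graph(arcs, n_nodes, index, max_gap=0.05):
    # Step (1): look up each arc's word once and keep the result for reuse.
    arc_hits = {word: [Hit(*p) for p in index.get(word, [])] for _, _, word in arcs}
    outgoing = defaultdict(list)
    for start, end, word in arcs:
        outgoing[start].append((end, word))

    # Step (2): all node states start empty, then are updated left to right.
    state = {node: [] for node in range(n_nodes)}
    for node in range(n_nodes - 1):
        for end, word in outgoing[node]:
            if node == 0:
                passed = arc_hits[word]
            else:
                passed = extend(state[node], arc_hits[word], max_gap)
            # Stand-in for the order-preserving merge: sorted and duplicate free.
            state[end] = sorted(set(state[end]) | set(passed))

    # Step (3): the state of the last node is the final retrieval result.
    return state[n_nodes - 1]


# Hypothetical keyword graph (full segmentation of a 4-character keyword)
# and a tiny hypothetical inverted index of (file, segment, start, end) tuples.
GRAPH = [(0, 1, "大"), (0, 2, "大会"), (1, 2, "会"), (1, 3, "会议"),
         (2, 3, "议"), (2, 4, "议程"), (3, 4, "程")]
INDEX = {
    "大": [("a.wav", 0, 0.0, 0.2)],
    "会议": [("a.wav", 0, 0.2, 0.6)],
    "程": [("a.wav", 0, 0.6, 0.8)],
    "大会": [("b.wav", 0, 1.0, 1.4)],
    "议程": [("b.wav", 0, 1.4, 1.9)],
}
RESULT = search_graph(GRAPH, 5, INDEX)
```

In this example `a.wav` is found through the path `大|会议|程` and `b.wav` through `大会|议程`, even though each arc's word was looked up only once.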

With the speech retrieval method provided by the embodiments of the present invention, when a search keyword is received, full segmentation and/or prefix and suffix expansion are used to expand the keyword into a corresponding graph structure, and the expanded keyword is then retrieved against an inverted index built from multi-candidate speech recognition results, which greatly improves the effectiveness and comprehensiveness of the retrieval results.

Correspondingly, the embodiments of the present invention also provide a speech retrieval system; Fig. 6 is a structural schematic diagram of this system.

In this embodiment, the system comprises:

A receiving module 601, configured to receive a search keyword input by a user;

A segmentation module 602, configured to perform single-character segmentation on the search keyword to obtain single-character segments;

An expansion module 603, configured to expand the search keyword according to the single-character segments to generate a keyword graph structure;

A retrieval module 604, configured to retrieve the word on each arc of the keyword graph structure in turn against a pre-built index database to obtain retrieval results.

In the embodiments of the present invention, the index database can be built offline. As shown in Fig. 7, the system of the embodiments of the present invention may therefore further comprise an index database building module 701, configured to build the index database offline. The index database building module 701 comprises a speech recognition unit and an inverted index building unit (not shown), wherein:

The speech recognition unit is configured to perform speech recognition on the speech documents to be searched to obtain a word graph containing words and time information, the word graph representing multiple recognition results of a speech segment as a directed acyclic graph;

The inverted index building unit is configured to build an inverted index over each word in the word graph.

In embodiments of the present invention, described expansion module 603 not only can be expanded keyword based on full slit mode; But also can to keyword, be expanded based on full cutting and the forward and backward mode of sewing.

For this reason, a kind of embodiment of described expansion module 603 comprises: judging unit and graph structure generation unit (not shown).Wherein:

Whether described judging unit is the word of preset dictionary for the combination of adjacent two or more individual characters of judging successively described individual character cutting participle; If so, the sub-word using described combination as described search key;

Described graph structure generation unit, for all sub-vocabularys are shown in to a digraph, obtain the keyword graph structure.

Another embodiment of the expansion module comprises, in addition to the above judging unit and graph structure generating unit, a prefix expansion unit and/or a suffix expansion unit (not shown), wherein:

The prefix expansion unit is configured to perform prefix expansion on the segments in the keyword graph structure according to the words in the preset dictionary. Specifically, it may judge in turn whether the front part of the search keyword, or the whole search keyword, is the suffix of a specific word in the preset dictionary; if so, the specific word is taken as an expansion word of the search keyword and is added to the keyword graph structure.

The suffix expansion unit is configured to perform suffix expansion on the segments in the keyword graph structure according to the words in the preset dictionary. Specifically, it may judge in turn whether the rear part of the search keyword, or the whole search keyword, is the prefix of a specific word in the preset dictionary; if so, the specific word is taken as an expansion word of the search keyword and is added to the keyword graph structure.
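The two tests applied by the prefix expansion unit and the suffix expansion unit can be illustrated with the sketch below. How an expansion word is attached to the keyword graph structure is not detailed here, so the sketch assumes that the expansion word becomes an arc spanning the part of the keyword it matched, and that a dictionary word identical to the matched part is not an expansion. The function names and the sample dictionary are hypothetical.

```python
def prefix_expansions(keyword, dictionary):
    """Dictionary words that end with a front part of the keyword."""
    arcs = []
    for k in range(1, len(keyword) + 1):
        front = keyword[:k]  # front part, or the whole keyword when k == len
        for word in sorted(dictionary):
            if word != front and word.endswith(front):
                arcs.append((0, k, word))  # assumed arc placement
    return arcs


def suffix_expansions(keyword, dictionary):
    """Dictionary words that start with a rear part of the keyword."""
    n = len(keyword)
    arcs = []
    for k in range(n):
        rear = keyword[k:]  # rear part, or the whole keyword when k == 0
        for word in sorted(dictionary):
            if word != rear and word.startswith(rear):
                arcs.append((k, n, word))  # assumed arc placement
    return arcs


dictionary = {"北京大学", "大学生", "学生"}
prefix_arcs = prefix_expansions("大学", dictionary)
suffix_arcs = suffix_expansions("大学", dictionary)
```

For the keyword 大学, the whole keyword is a suffix of 北京大学, so prefix expansion yields that word; the whole keyword is a prefix of 大学生 and its rear part 学 is a prefix of 学生, so suffix expansion yields those two words.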

With the speech retrieval system of the embodiment of the present invention, when speech retrieval is performed, the search keyword input by the user is expanded into a graph structure and retrieval is carried out on an inverted index built from multi-candidate speech recognition results, which effectively improves the effectiveness and comprehensiveness of the retrieval results.

Further, an arc in the keyword graph structure may belong to several different paths; that is, one arc may be shared by multiple paths. Therefore, in the system of the embodiment of the present invention, the retrieval module 604 reuses the retrieval result of each arc during retrieval, which effectively reduces the complexity of retrieval and improves retrieval efficiency.

To this end, one specific implementation of the retrieval module 604 comprises a traversal unit and an updating unit, wherein:

The traversal unit is configured to traverse each arc of the keyword graph structure, retrieve the word on the arc according to the pre-built index database, and store the retrieval result on the arc.

The updating unit is configured to update the retrieval information of the nodes of the keyword graph structure in turn, in left-to-right order. For each node, its outgoing arcs are processed in turn: the retrieval information on each outgoing arc is passed to the end node of that arc, and when the end node receives a new retrieval result, an order-preserving merge is performed with its existing retrieval results.
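The cooperation of the traversal unit and the updating unit can be sketched as follows. The rule for combining the retrieval information of a node with the result on an outgoing arc is not specified here, so the sketch assumes that a partial hit at a node is extended by an arc hit in the same document whose start time equals the partial hit's end time; it also assumes integer frame times, nodes numbered from left to right, and the hypothetical names search_graph, arc_hits and node_hits.

```python
import heapq
from itertools import groupby


def search_graph(num_nodes, arcs, index):
    """Return the hits that reach the last node of the keyword graph."""
    # Traversal step: each arc is looked up once and its result is kept
    # on the arc, so every path sharing the arc reuses the same result.
    arc_hits = {arc: sorted(index.get(arc[2], [])) for arc in arcs}

    node_hits = [[] for _ in range(num_nodes)]
    # Update step: visit nodes left to right and push results along
    # each outgoing arc to the arc's end node.
    for node in range(num_nodes):
        for arc in arcs:
            src, dst, _word = arc
            if src != node:
                continue
            if node == 0:
                extended = arc_hits[arc]
            else:
                # Assumed join rule: same document, adjacent in time.
                extended = sorted(
                    (doc, start, end2)
                    for doc, start, end in node_hits[node]
                    for doc2, start2, end2 in arc_hits[arc]
                    if doc == doc2 and end == start2
                )
            # Order-preserving merge of two sorted lists, dropping
            # hits that arrive twice through different paths.
            merged = heapq.merge(node_hits[dst], extended)
            node_hits[dst] = [hit for hit, _ in groupby(merged)]
    return node_hits[-1]


index = {
    "北京": [("doc1", 10, 30)],
    "大学": [("doc1", 30, 55), ("doc2", 5, 28)],
    "北京大学": [("doc3", 0, 50)],
}
arcs = [(0, 2, "北京"), (0, 4, "北京大学"), (2, 4, "大学")]
result = search_graph(5, arcs, index)
```

In this example doc1 is found through the path 北京 followed by 大学, doc3 is found through the single arc 北京大学, and the isolated occurrence of 大学 in doc2 is not reported because no hit for 北京 precedes it.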

With the speech retrieval system provided by the embodiment of the present invention, when a search keyword is received, full segmentation and/or prefix and suffix expansion is used to expand the search keyword into the corresponding graph structure, and retrieval is performed with the expanded keyword on an inverted index built from multi-candidate speech recognition results, which greatly improves the effectiveness and comprehensiveness of the retrieval results.

The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant details, refer to the corresponding description of the method embodiment. The system embodiment described above is merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.

The embodiments of the present invention have been described in detail above, and specific embodiments have been used herein to illustrate the invention; the description of the above embodiments is intended only to help in understanding the method and apparatus of the present invention. At the same time, a person of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific embodiments and to the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A speech retrieval method, characterized by comprising:

receiving a search keyword input by a user;

2. The method according to claim 1, characterized in that expanding the search keyword according to the single-character segmentation to generate the keyword graph structure comprises:

if so, taking the combination as a sub-word of the search keyword;

representing all sub-words in a directed graph to obtain the keyword graph structure.

3. The method according to claim 2, characterized in that expanding the search keyword according to the single-character segmentation to generate the keyword graph structure further comprises:

4. The method according to claim 3, characterized in that performing prefix expansion on the single-character segmentation comprises:

adding the expansion word to the keyword graph structure.

5. The method according to claim 3, characterized in that performing suffix expansion on the single-character segmentation comprises:

adding the expansion word to the keyword graph structure.

6. The method according to claim 1, characterized in that retrieving in turn the word on each arc of the expanded keyword graph structure according to the pre-built index database to obtain the retrieval results comprises:

7. The method according to any one of claims 1 to 6, characterized in that the method further comprises: building the index database offline, wherein building the index database comprises:

building an inverted index for each word in the word graph.

8. A speech retrieval system, characterized by comprising:

a receiving module, configured to receive a search keyword input by a user;

9. The system according to claim 8, characterized in that the expansion module comprises:

10. The system according to claim 9, characterized in that the expansion module further comprises:

11. The system according to claim 10, characterized in that

the prefix expansion unit is specifically configured to judge in turn whether the front part of the search keyword, or the whole search keyword, is the suffix of a specific word in the preset dictionary; if so, to take the specific word as an expansion word of the search keyword; and to add the expansion word to the keyword graph structure.

12. The system according to claim 10, characterized in that

the suffix expansion unit is specifically configured to judge in turn whether the rear part of the search keyword, or the whole search keyword, is the prefix of a specific word in the preset dictionary; if so, to take the specific word as an expansion word of the search keyword; and to add the expansion word to the keyword graph structure.

13. The system according to claim 8, characterized in that the retrieval module comprises:

14. The system according to any one of claims 8 to 13, characterized in that the system further comprises: