CN102314876A - Speech retrieval method and system - Google Patents


Info

Publication number
CN102314876A
Authority: CN (China)
Prior art keywords: retrieval, confidence, input, list
Legal status: Granted
Application number
CN 201010212269
Other languages
Chinese (zh)
Other versions
CN102314876B (en)
Inventor
史达飞
鲁耀杰
王磊
尹悦燕
郑继川
Current Assignee
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date
Filing date
Publication date
Application filed by Ricoh Co Ltd
Priority to CN 201010212269
Publication of CN102314876A
Application granted
Publication of CN102314876B
Legal status: Active; anticipated expiration tracked

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a speech retrieval method and system. The speech retrieval method comprises the steps of: receiving a retrieval input from a user; extracting a plurality of retrieval-input speech features using multiple groups of acoustic models and language models, and acquiring a first confidence for each retrieval-input speech feature; retrieving with each of the plurality of retrieval-input speech features to obtain a retrieval result list corresponding to each feature, together with a second confidence and a search-engine score for every result record in the list; calculating a retrieval score for each result record of each speech feature from the first confidence, the second confidence and the search-engine score of that feature, and normalizing the retrieval scores; re-ranking each retrieval result list according to the normalized retrieval scores; and merging the re-ranked retrieval result lists of all features to obtain a final retrieval list.

Description

Speech retrieval method and system
Technical field
Embodiments of the present invention relate to a speech retrieval method and a speech retrieval system, and more particularly to a speech retrieval method and system that retrieve using a plurality of speech features.
Background technology
In recent years, traditional text retrieval has become increasingly unable to satisfy people's diversifying needs. With the development of speech recognition technology, speech retrieval based on speech recognition has attracted more and more attention. However, because a user's pronunciation at retrieval time may differ from the standard pronunciation assumed by the retrieval system, current approaches that retrieve with a single speech feature (for example, words) suffer a high speech-recognition error rate and place very high demands on the pronunciation of the user's retrieval input.
United States Patent Application US 2009/0030894 A1 (Patent Document 1) discloses a speech indexing and retrieval method. The method comprises: receiving a retrieval input formed of one or more search terms; judging whether each search term is "in-vocabulary" or "out-of-vocabulary"; selecting one or more indexes for retrieval according to the vocabulary category of the search term; merging the retrieval results for each search term; and merging the retrieval results of all search terms. By splitting the retrieval input into multiple search terms, classifying each term by vocabulary category, selecting different search engines based on that judgment, and adopting a two-stage merging method, the indexing and retrieval method of Patent Document 1 improves the accuracy of speech recognition.
United States Patent Application US 2009/0132251 A1 (Patent Document 2) also discloses a speech retrieval method. In that method, sub-word text units are used as the retrieval feature for speech retrieval, which improves both retrieval speed and speech-recognition accuracy.
However, existing speech retrieval methods still have problems such as low retrieval precision and the user's desired results not being ranked near the top of the retrieval results. In addition, when multiple languages or newly emerging vocabulary need to be retrieved, existing speech retrieval methods require rebuilding the retrieval system, which means an enormous workload.
Summary of the invention
In view of the above problems, it is desirable to provide a speech retrieval method and system that can improve retrieval precision.
According to an aspect of the present invention, a speech retrieval method is provided. The method comprises: receiving a retrieval input from a user; extracting a plurality of retrieval-input speech features from the retrieval input using multiple groups of acoustic models and language models, and obtaining a first confidence for each retrieval-input speech feature; retrieving with each of the plurality of retrieval-input speech features to obtain a retrieval result list corresponding to each feature, together with a second confidence and a search-engine score for every result record in the list; calculating, from the first confidence, the second confidence and the search-engine score of each speech feature, a retrieval score for every result record of that feature, and normalizing the scores; re-ranking each retrieval result list according to the normalized retrieval scores; and merging the re-ranked retrieval result lists of all features to obtain a final retrieval list.
According to another aspect of the present invention, a speech retrieval system is provided. The system comprises: an input module for receiving a retrieval input from a user; a decoding module for extracting a plurality of retrieval-input speech features from the retrieval input using multiple groups of acoustic models and language models and obtaining a first confidence for each retrieval-input speech feature; a retrieval module for retrieving with each of the retrieval-input speech features extracted by the decoding module, to obtain a retrieval result list corresponding to each feature together with a second confidence and a search-engine score for every result record in the list; a re-ranking module for calculating, from the first confidence, the second confidence and the search-engine score of each speech feature, a retrieval score for every result record of that feature, normalizing the scores, and re-ranking each retrieval result list according to the normalized retrieval scores; and a merging module for merging the re-ranked retrieval result lists of all features to obtain a final retrieval list.
By exploiting multiple features of the speech, the speech retrieval method and system of the present invention can obtain better results than a speech retrieval method or system that uses a single speech feature. Moreover, by using confidences to re-rank the retrieval results, the influence of low-confidence speech-recognition results on the speech search is reduced.
In addition, the speech retrieval method and system of the present invention are applicable to multilingual speech retrieval and to the retrieval of newly emerging vocabulary.
Description of drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a speech retrieval method according to an embodiment of the invention;
Fig. 2 is a flowchart of one example of a step of the method of Fig. 1 according to an embodiment of the invention;
Fig. 3 is a flowchart of another example of a step of the method of Fig. 1 according to an embodiment of the invention;
Fig. 4 is a block diagram of a speech retrieval system according to an embodiment of the invention;
Fig. 5 is a block diagram of a speech retrieval system according to another embodiment of the invention;
Fig. 6 is a block diagram of a speech retrieval system according to yet another embodiment of the invention.
Detailed description of the embodiments
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. Note that in this specification and the drawings, substantially identical steps and elements are denoted by the same reference numerals, and repeated explanation of these steps and elements will be omitted.
Fig. 1 is a flowchart of a speech retrieval method 1000 according to an embodiment of the invention. The speech retrieval method 1000 according to an embodiment of the present invention will be described below with reference to Fig. 1.
In step S1010 of Fig. 1, a retrieval input is received from the user. In step S1020, multiple groups of acoustic models and language models are used to extract a plurality of retrieval-input speech features from the retrieval input and to obtain a first confidence for each retrieval-input speech feature. A speech feature may be an acoustic feature, a phoneme feature, a sub-word-unit feature, a word feature, a speech-recognition result, and so on. A phoneme is the smallest speech unit a human can utter; permutations and combinations of phonemes form words, and the phoneme set of any language is finite — the International Phonetic Alphabet is one of the most commonly used phoneme sets for English. Digital speech can be converted into a phoneme sequence by speech recognition technology. Taking the English word "banana" as an example: the digital speech for "banana" can be converted by speech recognition into B AA N AA N AA HH, where "B" is a phoneme. A sub-word-unit feature is a reasonable combination of phoneme features (usually more than two phonemes) that does not by itself constitute the pronunciation of a word. The sub-word units of a language also form a finite set, larger than the set of phonemes. Speech recognition technology can convert speech into a sub-word-unit sequence. Again taking "banana" as an example: the digital speech can be converted into B-AA N-AA N-AA-HH, where "B-AA" is one sub-word unit. A speech-recognition result is the human-readable text into which speech recognition technology converts a digital speech file. The recognition result of each speech feature produced by speech recognition is not absolutely accurate.
An acoustic model is a probability model between words and the acoustic features of digital speech files, built using speech recognition technology and trained on digital speech files together with their corresponding manual annotations as speech material. The acoustic model is an important input to the speech recognition engine, so it must be trained in advance before speech recognition is performed. Different acoustic models can be obtained by using different training speech materials.
A language model is a statistical model trained on a large amount of text material, based on the occurrence frequencies and ordering of words in the text. Embodiments of the present invention are described using three kinds of language models as examples: a phoneme language model, a sub-word-unit language model, and a word language model. Acoustic models and language models can be used in many fields such as natural language processing, machine learning, text annotation and full-text retrieval.
Next, in step S1030, retrieval is performed with each of the plurality of retrieval-input speech features, to obtain a retrieval result list corresponding to each feature together with a confidence and a search-engine score for every result record in the list. To distinguish it from the first confidence of the speech feature described above, the confidence of a result record is called the second confidence. Different search engines may be used to retrieve different input speech features, and multiple search engines may be used for the same input speech feature.
A confidence is an output of decoding in the speech-recognition process. As stated above, because the recognition result of each speech feature produced by speech recognition is not absolutely accurate, and the decoding process of speech recognition is in fact a process of probabilistic and statistical output, each decoding output carries a confidence representing the probability that this decoding is correct. The confidence value usually varies between 0.0 and 1.0. The concrete method of computing the confidence does not limit the scope of the present invention; the first and second confidences can be computed by any method known in the art. For example, the first or second confidence CL_i of speech feature E_i can be computed by the following Formula 1:
CL_i = ∏_{i=1}^{n} P_i(E_i | E_1, E_2, …, E_{i-1})    (Formula 1)
Here, one group of acoustic model and language model is used to decode the speech file, transforming it into a sequence of speech features E_1, E_2, E_3, E_4, ..., E_n. P_i(E_i | E_1, E_2, ..., E_{i-1}) denotes the probability that position i is E_i after the sequence E_1, E_2, ..., E_{i-1} has occurred. Acoustic features can also be used when computing the confidence. For example, the node confidences produced while decoding with the hidden Markov model (HMM) of the acoustic features can be combined with the confidence mentioned above to form a new confidence. In speech recognition, HMMs are mainly used to build acoustic models (see http://www.hudong.com/wiki/ hidden Markov model). The nodes of an HMM generally represent phonemes (phones), and the output of the HMM carries transition probabilities between the nodes of the resulting sequence; as stated, these transition probabilities can be combined with the confidence from the language model to form a new confidence.
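As a hedged illustration of Formula 1, the sketch below computes a sequence confidence as a product of conditional probabilities. The function name and the probability values are hypothetical; a real decoder would supply the per-position probabilities from its language-model scores.

```python
def sequence_confidence(cond_probs):
    """Product of the conditional probabilities P(E_i | E_1..E_{i-1})
    from Formula 1; `cond_probs` stands in for the per-position
    probabilities a decoder would emit (hypothetical values below)."""
    conf = 1.0
    for p in cond_probs:
        conf *= p  # multiply in P(E_i | E_1, ..., E_{i-1})
    return conf

# Longer sequences naturally yield smaller products, one reason the
# method later normalizes scores before comparing features.
print(round(sequence_confidence([0.9, 0.8, 0.95]), 3))  # 0.684
```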
In step S1040, a retrieval score is computed for every result record of each speech feature from the first confidence, the second confidence and the search-engine score of that feature. Because the scores of different search engines are not comparable, the retrieval scores must also be normalized in this step. The concrete normalization method does not limit the scope of the present invention; the retrieval scores can be normalized by any method known in the art. For example, with a linear model, the minimum score is mapped to 0.0 and the maximum score to 1.0; with a statistical model, the minimum of the statistical distribution is mapped to 0.0 and its total to 1.0; or, with a Gaussian model, the scores are Gaussianized so that the mean becomes 0.0 and the variance 1.0, and so on.
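The three normalization options named above can be sketched as follows. This is a minimal illustration under the assumption that scores arrive as a plain list; it is not the patent's actual implementation, and the function names are invented for the example.

```python
import statistics

def minmax_normalize(scores):
    """Linear model: minimum score -> 0.0, maximum score -> 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)  # degenerate case: all scores equal
    return [(s - lo) / (hi - lo) for s in scores]

def sum_normalize(scores):
    """Statistical model: minimum -> 0.0, total of the shifted scores -> 1.0."""
    lo = min(scores)
    shifted = [s - lo for s in scores]
    total = sum(shifted)
    return [s / total for s in shifted] if total else [0.0] * len(scores)

def gaussian_normalize(scores):
    """Gaussian model: mean -> 0.0, variance -> 1.0 (z-scores)."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    return [(s - mean) / std for s in scores] if std else [0.0] * len(scores)

print(minmax_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```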
Then, in step S1050, each retrieval result list is re-ranked according to the normalized retrieval scores. Finally, in step S1060, the re-ranked retrieval result lists of all features are merged to obtain the final retrieval list. The processing performed for a retrieval input can be called online processing.
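The online flow S1010–S1060 can be sketched end to end as below. This is only an illustrative skeleton under simplifying assumptions: each decoder yields a single feature, normalization is max-based, and the merge is an unweighted sum. The callables `decoders` and `engines` are hypothetical stand-ins for the patent's acoustic/language model groups and feature search engines.

```python
def speech_retrieval(query, decoders, engines):
    """Hedged sketch of Fig. 1: decode (S1020), retrieve (S1030),
    score and normalize (S1040), re-rank (S1050), merge (S1060)."""
    ranked_lists = []
    for decode, search in zip(decoders, engines):
        feature, cl1 = decode(query)          # S1020: feature + first confidence
        hits = search(feature)                # S1030: [(record, cl2, engine_score)]
        scored = [(rec, s * cl1 * cl2) for rec, cl2, s in hits]  # S1040 (simplified)
        top = max(sc for _, sc in scored)
        normalized = [(rec, sc / top) for rec, sc in scored]     # max score -> 1.0
        normalized.sort(key=lambda kv: kv[1], reverse=True)      # S1050: re-rank
        ranked_lists.append(normalized)
    merged = {}                               # S1060: unweighted linear merge
    for lst in ranked_lists:
        for rec, sc in lst:
            merged[rec] = merged.get(rec, 0.0) + sc
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# Toy stand-ins: one "decoder" and one "engine" with fixed outputs.
decoders = [lambda q: ("B-AA N-AA", 0.9)]
engines = [lambda f: [("doc1", 0.8, 1.0), ("doc2", 0.5, 1.0)]]
print(speech_retrieval("banana", decoders, engines)[0][0])  # doc1
```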
In the present embodiment, the retrieval input can be speech input or text input. When the user's retrieval input is text, a dictionary can be used to extract the plurality of retrieval-input speech features from it, and the first confidence is set to a value indicating that the extracted speech feature exactly matches the speech feature actually input — for example, 1.0.
The steps of method 1000 are described in detail below with reference to the drawings. Fig. 2 is a flowchart of one example of a step of method 1000 of Fig. 1 according to an embodiment of the invention. A concrete implementation example of step S1030 of Fig. 1 is described below with reference to Fig. 2.
In the method shown in Fig. 2, to improve retrieval speed, an index can be used to retrieve the plurality of retrieval-input speech features in the speech record collection. The index of the retrieval-input speech features can be obtained as shown in Fig. 2. Specifically, first, in step S2010, a speech file is read from the speech record collection. Then, in step S2020, similarly to step S1020, multiple groups of acoustic models and language models are used to extract a plurality of file speech features from the speech file, and the confidence of each file speech feature is computed as the second confidence. Next, in step S2030, each file speech feature is associated with the speech file containing it, its position within that file, and the second confidence, so as to improve retrieval speed. Finally, in step S2040, the association of the file speech feature with its speech file, its position in the file and the second confidence is stored as the index. Table 1 schematically shows the generated index, where E denotes a speech feature, AM an acoustic model, and LM a language model.
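The offline indexing steps S2010–S2040 amount to building an inverted map from features to (file, position, second-confidence) entries. The sketch below illustrates this under the assumption that feature extraction is abstracted into a callable; `fake_extract` and the sample data are hypothetical stand-ins, not the patent's models.

```python
from collections import defaultdict

def build_index(speech_files, extract_features):
    """S2010: read each file; S2020: extract (feature, start, end,
    confidence) tuples; S2030/S2040: store the association as the index."""
    index = defaultdict(list)
    for path in speech_files:
        for feat, start, end, conf in extract_features(path):
            # Associate the feature with its file, position and second confidence.
            index[feat].append({"file": path, "start": start,
                                "end": end, "confidence": conf})
    return dict(index)

# Hypothetical extractor standing in for the AM/LM decoding step.
def fake_extract(path):
    return [("B-AA", 0.0, 0.4, 0.9), ("N-AA", 0.4, 0.8, 0.7)]

idx = build_index(["rec1.wav"], fake_extract)
print(sorted(idx))  # ['B-AA', 'N-AA']
```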
[Table 1 is reproduced as an image in the original publication.]
Table 1: Speech feature index and description
Table 1 shows an example of the index obtained according to the present embodiment. As shown in Table 1, a speech feature, its position in the speech file (that is, the start and end times at which the feature occurs in the speech file containing it) and the confidence of the feature in that file are associated to form the index of the feature. One speech feature may match speech files in multiple speech collections, and may match multiple speech segments within a single speech file. The process of generating the index can be called offline processing.
During retrieval, the retrieval-input speech features correspond to the file speech features contained in the index; below, the file speech features are also called the speech features to be retrieved. For example, a retrieval-input phoneme feature corresponds to a phoneme feature to be retrieved, a retrieval-input sub-word-unit feature to a sub-word-unit feature to be retrieved, and a retrieval-input word feature to a word feature to be retrieved. Table 2 is an exemplary illustration of the retrieval correspondences according to an embodiment of the invention. Here, "E11 (phoneme, AM1, LM1)" denotes the phoneme feature extracted from the retrieval input using acoustic model 1 and language model 1; "E11 (phoneme, dictionary)" denotes the phoneme feature converted using a dictionary when the user's input is text rather than speech; and "√" indicates a correspondence that needs to be retrieved, from which a retrieval list for that feature can be obtained. During retrieval, the confidences have already been obtained from the index by the time the retrieval lists are fetched, which improves retrieval speed.
[Table 2 is reproduced as an image in the original publication.]
Table 2: Retrieval correspondences
Fig. 3 is a flowchart of another example of a step of method 1000 of Fig. 1 according to an embodiment of the invention. A concrete implementation example of step S1040 of Fig. 1 — computing, from the first confidence, the second confidence and the search-engine score of each speech feature, the retrieval score of every result record of that feature — is described below with reference to Fig. 3. As shown in Fig. 3, in step S3010 a record is fetched from the retrieval result list and its retrieval score TS_i is initialized to 0.0, where i is the position of the record in the list. In step S3020 the confidence of one retrieval-input speech feature is obtained, and in step S3025 the result record is scanned to check whether this retrieval-input speech feature is present in it. If the retrieval-input speech feature is present in the result record, the method proceeds to step S3030; otherwise it proceeds to step S3040.
In step S3030, the retrieval score is updated as TS_i += S_i × CL_q × CL_r, where S_i is the search-engine score of the result record, CL_q is the first confidence of the retrieval-input speech feature, and CL_r is the second confidence of that feature in the current result record. Then, in step S3045, it is judged whether there are more retrieval-input speech features; if so, the method returns to step S3020, otherwise it proceeds to step S3050.
In step S3040, the total score is updated as TS_i += S_i, where S_i is the search-engine score of the result record, and step S3045 likewise judges whether there are more retrieval-input speech features; if so, the method returns to step S3020, otherwise it proceeds to step S3050.
Next, in step S3050, TS_i is saved, and step S3055 judges whether there are more result records. If so, the method returns to step S3010; otherwise it proceeds to step S3060. Finally, in step S3060, the retrieval list is re-sorted using TS_i. Alternatively, the re-sorting can also be performed after normalization.
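Steps S3010–S3060 can be sketched as the loop below. It follows the update rules literally — S_i × CL_q × CL_r when the query feature occurs in the record, plus S_i when it does not — while the data shapes (dicts and tuples) are assumptions made for illustration.

```python
def record_score(engine_score, query_features, record_features):
    """TS for one result record (S3010-S3050). `query_features` maps a
    retrieval-input feature to its first confidence CLq; `record_features`
    maps a feature present in the record to its second confidence CLr."""
    ts = 0.0                                  # S3010: initialize TS to 0.0
    for feat, clq in query_features.items():  # S3020/S3025: scan each query feature
        if feat in record_features:           # S3030: feature present in the record
            ts += engine_score * clq * record_features[feat]
        else:                                 # S3040: feature absent
            ts += engine_score
    return ts                                 # S3050: save TS

def rerank(results, query_features):
    """S3060: re-sort the result list by TS. `results` is a list of
    (record_id, engine_score, record_features) tuples."""
    scored = [(rid, record_score(s, query_features, feats))
              for rid, s, feats in results]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)

q = {"B-AA": 0.8, "N-AA": 0.9}
results = [("doc1", 1.0, {"B-AA": 0.5}),
           ("doc2", 1.0, {"B-AA": 0.9, "N-AA": 0.9})]
print(rerank(results, q)[0][0])  # doc2
```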
A concrete implementation of step S1060 of Fig. 1 is described below based on another embodiment of the present invention. Because the scoring standards of different search engines differ, the retrieval result lists need to be weighted according to the concrete requirements.
In step S1060, the re-ranked retrieval result lists are weighted, and the weighted retrieval lists are then merged to obtain the final retrieval list. The simplest merging method is linear merging: each feature search engine is given a weight, and the weights must sum to 1.0. Linear merging can be performed according to Formula 2, where Weight_i is the weight of the i-th feature search engine. In practice the weight values are often obtained by training, and the training must consider whether the application emphasizes precision or recall — for the same speech feature, precision and recall are inversely related. n is the number of feature search engines. If a result record appears in the i-th search engine, Score_i is the score of that record in that engine; otherwise Score_i is 0.0. NewScore denotes the new score.
NewScore = ∑_{i=1}^{n} Weight_i · Score_i    (Formula 2)
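A minimal sketch of the linear merge of Formula 2, assuming each feature engine's re-ranked list is represented as a dict of record → normalized score (a record absent from an engine's list contributes 0.0):

```python
def linear_merge(result_lists, weights):
    """NewScore = sum over the n feature search engines of
    Weight_i * Score_i; the weights are required to sum to 1.0 (Formula 2)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1.0"
    merged = {}
    for scores, w in zip(result_lists, weights):
        for rec, s in scores.items():  # missing records implicitly add 0.0
            merged[rec] = merged.get(rec, 0.0) + w * s
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

lists = [{"doc1": 1.0, "doc2": 0.5}, {"doc1": 0.2}]
print([(r, round(s, 2)) for r, s in linear_merge(lists, [0.6, 0.4])])
# [('doc1', 0.68), ('doc2', 0.3)]
```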
Another merging method, CombMNZ, is shown in Formula 3. CombMNZ denotes the new score, SUM(Individual Similarities) is the sum of the scores of the same record across the different search engines, and Number of Nonzero Similarities is the number of feature search engines whose results contain this record.
CombMNZ = SUM(Individual Similarities) × Number of Nonzero Similarities    (Formula 3)
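A hedged sketch of CombMNZ under the same assumed representation (one dict of record → score per engine): the summed score of a record is multiplied by the number of engines that returned it with a nonzero score.

```python
def comb_mnz(result_lists):
    """CombMNZ = SUM(individual similarities) x number of nonzero
    similarities (Formula 3)."""
    totals, nonzero = {}, {}
    for scores in result_lists:
        for rec, s in scores.items():
            if s != 0.0:  # only engines that actually matched the record count
                totals[rec] = totals.get(rec, 0.0) + s
                nonzero[rec] = nonzero.get(rec, 0) + 1
    return {rec: totals[rec] * nonzero[rec] for rec in totals}

lists = [{"doc1": 0.5, "doc2": 0.4}, {"doc1": 0.25}]
print(comb_mnz(lists))  # {'doc1': 1.5, 'doc2': 0.4}
```

Records found by several engines are boosted relative to single-engine hits, which is the intended effect of the MNZ multiplier.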
Other merging methods, such as Borda-fuse and Bayes-fuse, can also be used; the concrete merging method does not limit the scope of the present invention.
For different search engines, the new scores obtained through weighting are on a unified, comparable standard, so the lists can be merged based on the weighted scores to obtain the final retrieval list.
In addition, the speech retrieval method according to embodiments of the invention is also applicable to multilingual speech retrieval. The speech record collection is often very large and may sometimes contain speech in multiple languages. If language models trained on different languages are added when extracting speech features, speech features of different languages can be obtained, and the speech retrieval method described in the above embodiments can handle the multilingual problem with ease. For example, when an acoustic model and language model trained on Japanese speech are used to extract speech features from Japanese speech, the confidences of the extracted features will be very high; when the same Japanese-trained models process Chinese or English speech, the confidences of the extracted features will be very low. Consequently, when the final merge is performed according to the retrieval scores, the Japanese result records will naturally be ranked near the top.
In addition, the speech retrieval method according to embodiments of the invention can dynamically extend the languages and vocabulary of a speech retrieval system. Extending languages and vocabulary is a very important task for a speech retrieval system: in practice the speech record collection is very complex and grows every day, and a frequent cause of degraded search results is that many new words, or files in new languages, have joined the collection. With the speech retrieval method based on the above embodiments, it is only necessary to train new acoustic and language models on the new vocabulary or language, use the new models to extract speech features from the speech files, and generate the index. When a user's retrieval input contains the new vocabulary or language, the newly produced speech features join the search, take effect in the retrieval and re-ranking, and, after normalization and merging, contribute to the final result list. Because the speech features extracted in this way obtain higher confidences, their rank in the retrieval is improved. And because the new acoustic and language models used when new vocabulary or languages are added are trained only on that new vocabulary or language, the vocabulary they contain is small and processing is relatively fast.
A speech retrieval system according to an embodiment of the invention is described below with reference to Fig. 4. Fig. 4 is a block diagram of a speech retrieval system 400 according to an embodiment of the invention. As shown in Fig. 4, the speech retrieval system 400 of this embodiment comprises an input module 410, a decoding module 420, a retrieval module 430, a re-ranking module 440 and a merging module 450. The modules of the speech retrieval system 400 can respectively perform the steps/functions of the speech retrieval method in the embodiments above, so for brevity of description they are not described again in detail.
For example, the input module 410 can receive a retrieval input from the user. The decoding module 420 can use multiple groups of acoustic models and language models to extract a plurality of retrieval-input speech features from the retrieval input and obtain the first confidence of each retrieval-input speech feature. The retrieval module 430 retrieves with each of the retrieval-input speech features extracted by the decoding module 420, to obtain a retrieval result list corresponding to each feature together with the second confidence and search-engine score of every result record in the list. The re-ranking module 440 can compute, from the first confidence, the second confidence and the search-engine score of each speech feature, the retrieval score of every result record of that feature, normalize the scores, and re-rank each retrieval result list according to the normalized retrieval scores. The merging module 450 can merge the re-ranked retrieval result lists of all features to obtain the final retrieval list.
Fig. 5 is a block diagram of a speech retrieval system 500 according to another embodiment of the invention. In the speech retrieval system 500 shown in Fig. 5, components identical to those of the speech retrieval system 400 of Fig. 4 are denoted by the same reference numerals.
In the speech retrieval system 500 of this embodiment, the input module 510 not only receives the retrieval input from the user but also reads speech files from the speech record collection. The decoding module 520 uses multiple groups of acoustic models and language models to extract the plurality of retrieval-input speech features from the retrieval input and obtain the first confidence of each, and also uses the same groups of acoustic models and language models to extract a plurality of file speech features from the speech files and compute the confidence of each file speech feature as the second confidence. The speech retrieval system 500 can further comprise an index module 560, which associates each file speech feature extracted by the decoding module 520 with the speech file containing it, its position in that file, and the second confidence, and stores the association as an index. The retrieval module 430 uses the index stored in the index module 560 to retrieve the plurality of retrieval-input speech features in the speech record collection. Similarly to the speech retrieval system 400, the speech retrieval system 500 also comprises the re-ranking module 440, which computes the retrieval scores from the first confidence, second confidence and search-engine score of each speech feature, normalizes them and re-ranks each retrieval result list according to the normalized scores, and the merging module 450, which merges the re-ranked lists of all features into the final retrieval list; these are not described again here.
By using a plurality of speech features, the speech retrieval system of the present invention can obtain better results than a speech retrieval system that uses a single speech feature. Moreover, by using the confidence degrees to reorder the retrieval results, the influence of low-confidence speech recognition results on the speech search is reduced.
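The scoring scheme described above (combining first confidence, second confidence, and search engine score into a normalized retrieval score, then reordering each per-feature list and merging with weights) might be sketched as follows. The product-of-scores combination and the weight values are illustrative assumptions; the patent does not fix a specific formula:

```python
def rescore(results, first_conf):
    """results: list of (record_id, second_conf, engine_score) for one feature.
    Returns the list reordered by normalized retrieval score."""
    scored = [(rid, first_conf * c2 * eng) for rid, c2, eng in results]
    total = sum(s for _, s in scored) or 1.0
    # normalize so the scores of one list sum to 1, then sort by score
    normalized = [(rid, s / total) for rid, s in scored]
    return sorted(normalized, key=lambda x: x[1], reverse=True)

def merge(lists, weights):
    """Weighted merge of the reordered per-feature lists into a final list."""
    combined = {}
    for lst, w in zip(lists, weights):
        for rid, s in lst:
            combined[rid] = combined.get(rid, 0.0) + w * s
    return sorted(combined.items(), key=lambda x: x[1], reverse=True)
```

For example, a word-feature list and a phoneme-feature list, each rescored with its own first confidence, can be merged with weights such as `[0.6, 0.4]` to produce the final retrieval list.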
In addition, the acoustic models and language models, the voice record set, and the obtained index may be stored in an external memory. The speech retrieval system may also comprise an output module for outputting the retrieval results to the user.
In addition, the speech retrieval system of embodiments of the invention may be constituted in a centralized manner (for example, as shown in Fig. 4 and Fig. 5) or in a distributed manner. Fig. 6 shows a block diagram of a speech retrieval system 600 according to yet another embodiment of the invention. In Fig. 6, the speech retrieval system shown in Fig. 5 is constituted in a distributed manner. For example, the input module 510 is arranged in a distributed apparatus 610, while the decoder module 520, the index module 560, the retrieval module 430, the reordering module 440, and the merging module 450 are arranged in a distributed apparatus 620. The distributed apparatuses 610 and 620 are separate devices, may be located remotely from each other, and may be connected to each other through, for example, a network 630. Of course, the above modules may also be combined or sub-combined in other ways and distributed among remotely located devices.
In addition, a plurality of the speech retrieval systems shown in Fig. 4 and/or Fig. 5 may also be interconnected through a network.
It should be noted that the steps of the methods shown in Figs. 1-3 need not be performed in the order shown. Some steps may be reversed or performed in parallel. For example, after multiple groups of acoustic models and language models are used in step S1020 to extract a plurality of retrieval input speech features from the retrieval input and to obtain the first confidence degree of each retrieval input speech feature, steps S1030 to S1050 may be performed simultaneously for the extracted retrieval input speech features.
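The parallel execution of steps S1030-S1050 for the extracted features could, for instance, use a thread pool; `retrieve_and_rescore` below is a hypothetical stand-in for the per-feature retrieval, scoring, and reordering steps:

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve_and_rescore(feature):
    # stand-in for steps S1030-S1050 applied to a single speech feature;
    # returns the feature and its (reordered) retrieval result list
    return (feature, [("rec-%s" % feature, 1.0)])

features = ["word", "phoneme", "subword"]
with ThreadPoolExecutor() as pool:
    # each extracted retrieval input speech feature is processed concurrently;
    # pool.map preserves the input order of the features
    per_feature_lists = list(pool.map(retrieve_and_rescore, features))
```

The per-feature lists returned here would then be merged into the final retrieval list as in step S1060.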
Those of ordinary skill in the art will recognize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered as going beyond the scope of the present invention.
It should be appreciated by those skilled in the art that various modifications, combinations, sub-combinations, and substitutions may be made to the present invention depending on design requirements and other factors, insofar as they are within the scope of the appended claims and their equivalents.

Claims (10)

1. the method for a speech retrieval may further comprise the steps:
Reception is from user's retrieval input;
Utilize to organize acoustic models and a plurality of retrieval input phonetic features of language model extraction from said retrieval is imported more and obtain each and retrieve first degree of confidence of importing phonetic feature;
Respectively said a plurality of retrievals input phonetic features are retrieved, to obtain second degree of confidence and search engine score corresponding to every outcome record in the result for retrieval tabulation of each retrieval input phonetic feature and the tabulation of said result for retrieval;
Calculate according to first degree of confidence of each phonetic feature, second degree of confidence and search engine score this phonetic feature every outcome record the retrieval score and carry out normalization;
According to normalized retrieval score, each result for retrieval tabulation is resequenced; And
Merge the tabulation of the result for retrieval after the rearrangement of each characteristic to obtain final retrieval tabulation.
2. the method for claim 1, the wherein said merging tabulation of the result for retrieval after the rearrangement of each characteristic to obtain final retrieval tabulation comprises:
To the tabulation carrying out of the result for retrieval after said rearrangement weighting; And
Merge each retrieval tabulation after the weighting to obtain final retrieval tabulation.
3. the method for claim 1; Saidly respectively said a plurality of retrievals input phonetic features are retrieved, are comprised corresponding to second degree of confidence and the search engine score of every outcome record in the result for retrieval tabulation of each retrieval input phonetic feature and the tabulation of said result for retrieval obtaining:
Utilize the said a plurality of retrieval input phonetic features of index retrieval in the voice record set,
Wherein obtaining said index comprises:
Read voice document from said voice record set;
The degree of confidence of utilizing said many group acoustic models and language model from said voice document, to extract a plurality of file voice characteristics and calculating each file voice characteristic is as said second degree of confidence;
Each file voice characteristic and the voice document at its place, position and said second degree of confidence in said voice document are associated; And
The storage related information is as index.
4. The method of claim 3, wherein the retrieval input speech features to be retrieved correspond to the file speech features contained in the index.
5. the method for claim 1, wherein
Said many group acoustic models and language model are corresponding at least a language.
6. the method for claim 1, wherein
Said many group acoustic models comprise different vocabulary each other with language model.
7. the method for claim 1, wherein
Said retrieval is input as phonetic entry and/or literal input,
When said retrieval is input as literal when input, said first degree of confidence is set to represent the value that the phonetic feature of the phonetic feature that extracted and actual input matees fully.
8. the method for claim 1, wherein said phonetic feature is acoustic feature, phoneme characteristic, inferior character features, speech characteristic or voice identification result.
9. A speech retrieval system, comprising:
an input module for receiving a retrieval input from a user;
a decoder module for extracting a plurality of retrieval input speech features from said retrieval input by using multiple groups of acoustic models and language models, and obtaining a first confidence degree of each retrieval input speech feature;
a retrieval module for retrieving each of said plurality of retrieval input speech features extracted by said decoder module, to obtain a retrieval result list corresponding to each retrieval input speech feature, and a second confidence degree and a search engine score of each result record in said retrieval result list;
a reordering module for computing the retrieval score of each result record of each speech feature from the first confidence degree, the second confidence degree, and the search engine score of that speech feature, normalizing the retrieval scores, and reordering each retrieval result list according to the normalized retrieval scores; and
a merging module for merging the reordered retrieval result lists of the features to obtain a final retrieval list.
10. The system of claim 9, wherein
said input module is further for reading voice files from a voice record set;
said decoder module is further for extracting a plurality of file speech features from said voice files by using said multiple groups of acoustic models and language models, and computing the confidence degree of each file speech feature as said second confidence degree;
said system further comprises:
an index module for associating each file speech feature extracted by the decoder module with the voice file in which it occurs, its position in said voice file, and said second confidence degree, and storing the association information as an index; and
said retrieval module retrieves said plurality of retrieval input speech features in the voice record set by using the index stored in said index module.
CN 201010212269 2010-06-29 2010-06-29 Speech retrieval method and system Active CN102314876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010212269 CN102314876B (en) 2010-06-29 2010-06-29 Speech retrieval method and system

Publications (2)

Publication Number Publication Date
CN102314876A true CN102314876A (en) 2012-01-11
CN102314876B CN102314876B (en) 2013-04-10

Family

ID=45427986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010212269 Active CN102314876B (en) 2010-06-29 2010-06-29 Speech retrieval method and system

Country Status (1)

Country Link
CN (1) CN102314876B (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10282465B2 (en) 2014-02-25 2019-05-07 Intel Corporation Systems, apparatuses, and methods for deep learning of feature detectors with sparse coding
US10296660B2 (en) 2014-02-25 2019-05-21 Intel Corporation Systems, apparatuses, and methods for feature searching

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1409842A (en) * 1999-10-28 2003-04-09 佳能株式会社 Pattern matching method and apparatus
US20090030894A1 (en) * 2007-07-23 2009-01-29 International Business Machines Corporation Spoken Document Retrieval using Multiple Speech Transcription Indices

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Pedro J. Moreno et al., "A Boosting Approach for Confidence Scoring", EUROSPEECH-2001, 7 September 2001. *
Kenney Ng, "Towards Robust Methods for Spoken Document Retrieval", ICSLP-1998, Paper 1088, 4 December 1998. *
M. A. Siegler et al., "Experiments in Spoken Document Retrieval at CMU", Text REtrieval Conference (TREC-6), 19 November 1997. *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103365849A (en) * 2012-03-27 2013-10-23 富士通株式会社 Keyword search method and equipment
WO2014108032A1 (en) * 2013-01-09 2014-07-17 华为终端有限公司 Speech processing method, system and terminal
CN103559881A (en) * 2013-11-08 2014-02-05 安徽科大讯飞信息科技股份有限公司 Language-irrelevant key word recognition method and system
CN103559881B (en) * 2013-11-08 2016-08-31 科大讯飞股份有限公司 Keyword recognition method that languages are unrelated and system
CN104978366A (en) * 2014-04-14 2015-10-14 深圳市北科瑞声科技有限公司 Voice data index building method and system based on mobile terminal
CN105551485A (en) * 2015-11-30 2016-05-04 讯飞智元信息科技有限公司 Audio file retrieval method and system
CN107562220A (en) * 2017-08-15 2018-01-09 百度在线网络技术(北京)有限公司 Input recommendation method, apparatus, computer equipment and the computer-readable recording medium of information
CN108335696A (en) * 2018-02-09 2018-07-27 百度在线网络技术(北京)有限公司 Voice awakening method and device
US11322138B2 (en) 2018-02-09 2022-05-03 Baidu Online Network Technology (Beijing) Co., Ltd. Voice awakening method and device

Also Published As

Publication number Publication date
CN102314876B (en) 2013-04-10


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant