CN109410935A - Destination search method and device based on speech recognition - Google Patents

Destination search method and device based on speech recognition

Info

Publication number
CN109410935A
CN109410935A (application CN201811295008.XA)
Authority
CN
China
Prior art keywords
voice signal
destination
keyword
voice
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811295008.XA
Other languages
Chinese (zh)
Inventor
安栋
伍朗
刘继鹏
魏斌斌
冯智斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811295008.XA priority Critical patent/CN109410935A/en
Publication of CN109410935A publication Critical patent/CN109410935A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present invention provides a speech-recognition-based destination search method and device, relating to the field of artificial intelligence. The method includes: acquiring a first voice signal from a first user, the first voice signal being a voice signal issued by the first user to instruct a search for real-scene pictures of a destination; performing feature extraction on the first voice signal to obtain feature information; identifying voice characteristics from the feature information, and recognizing the first voice signal with a speech recognition model matched to the voice characteristics to obtain destination text; crawling, through a search engine and according to the destination text, multiple real-scene pictures and the structured fields of the web pages on which they appear; extracting keywords from the structured fields and computing similarity values between the keywords and the destination text; and taking the real-scene picture corresponding to the keyword with the highest similarity value as the destination real-scene picture and outputting it to the first user. The technical solution provided by the embodiment addresses the low accuracy of destination search in the prior art.

Description

Destination search method and device based on speech recognition
[Technical Field]
The present invention relates to the field of artificial intelligence, and in particular to a speech-recognition-based destination search method and device.
[Background Art]
When a user needs to search for certain pictures, text must be typed into a search engine, which then displays the corresponding pictures. For example, when a user types "West Lake" into a search engine, the search engine displays pictures of the West Lake. While driving, however, it is very inconvenient for a user to obtain pictures of a destination this way, and doing so may even compromise driving safety. At present, it is difficult for a user searching for a destination to get an intuitive view of the destination's real surroundings; confirming a destination by text alone makes low search accuracy a common problem.
[Summary of the Invention]
In view of this, embodiments of the present invention provide a speech-recognition-based destination search method and device to address the low accuracy of destination search in the prior art.
To achieve the above object, according to one aspect of the present invention, a speech-recognition-based destination search method is provided, the method comprising:
acquiring a first voice signal from a first user, the first voice signal being a voice signal issued by the first user to instruct a search for real-scene pictures of a destination; performing feature extraction on the first voice signal to obtain feature information; identifying voice characteristics from the feature information, and recognizing the first voice signal with a speech recognition model matched to the voice characteristics to obtain destination text; crawling, through a search engine and according to the destination text, multiple real-scene pictures and the structured fields of the web pages on which they appear; extracting keywords from the structured fields, and computing similarity values between the keywords and the destination text; and taking the real-scene picture corresponding to the keyword with the highest similarity value as the destination real-scene picture, and outputting the destination real-scene picture to the first user.
Further, after acquiring the first voice signal of the first user and before performing feature extraction on the first voice signal to obtain the feature information, the method further comprises:
acquiring multiple noisy speech signal samples and multiple clean speech signal samples; constructing and training a noise reduction model, wherein the noise reduction model comprises a generator and a discriminator: the generator receives a noisy speech signal sample and generates a new speech signal from it, while the discriminator judges whether the new speech signal produced by the generator is a real signal or a generated one; obtaining a trained noise reduction model through adversarial training of the discriminator and the generator; feeding the first voice signal into the trained noise reduction model, wherein the noise reduction model denoises the first voice signal and generates a second voice signal; and obtaining the second voice signal output by the noise reduction model to replace the acquired first voice signal.
Further, after acquiring the first voice signal of the first user and before performing feature extraction on it to obtain the feature information, the method further comprises: denoising the first voice signal with the least mean squares (LMS) algorithm and obtaining the mean square error gradient of the current iteration; determining, from the oscillation of the mean square error gradients over M iterations (the current iteration included), whether the mean-square-error convergence-sensitive zone has been reached; updating, according to the determination, the convergence factor used by the LMS algorithm in the next iteration; outputting the denoised first voice signal based on the convergence factor; and replacing the acquired first voice signal with the denoised one.
Further, extracting the keywords from the structured fields and computing the similarity values between the keywords and the destination text comprises: segmenting the structured fields into words; extracting keywords from the segmented structured fields; feeding the extracted keywords and the destination text into a preset word-vector representation model, and obtaining from its output a vector representation of each keyword and of the destination text; and computing the similarity value between each keyword vector and the destination text vector with the cosine similarity formula.
Further, after outputting the destination real-scene picture to the first user, the method further comprises: obtaining comment text about the destination according to the destination text; segmenting the comment text with a string-matching-based segmentation method to obtain keywords with evaluation attributes; generating an evaluation voice message from those keywords; and outputting the evaluation voice message to the first user.
To achieve the above object, according to one aspect of the present invention, a speech-recognition-based destination search device is provided, the device comprising: a first acquisition unit for acquiring a first voice signal from a first user, the first voice signal being a voice signal issued by the first user to instruct a search for real-scene pictures of a destination; an extraction unit for performing feature extraction on the first voice signal to obtain feature information; a first recognition unit for identifying voice characteristics from the feature information and recognizing the first voice signal with a speech recognition model matched to the voice characteristics to obtain destination text; a crawling unit for crawling, through a search engine and according to the destination text, multiple real-scene pictures and the structured fields of the web pages on which they appear; a computing unit for extracting keywords from the structured fields and computing similarity values between the keywords and the destination text; and a first output unit for taking the real-scene picture corresponding to the keyword with the highest similarity value as the destination real-scene picture and outputting it to the first user.
Further, the computing unit comprises: a first processing subunit for segmenting the structured fields; a second processing subunit for extracting keywords from the segmented structured fields; an acquisition subunit for feeding the extracted keywords and the destination text into a preset word-vector representation model and obtaining the vector of each keyword and of the destination text from the model's output; and a computation subunit for computing the similarity value between each keyword vector and the destination text vector with the cosine similarity formula.
Further, the device further comprises: a second acquisition unit for obtaining comment text about the destination according to the destination text; a processing unit for segmenting the comment text with a string-matching-based segmentation method to obtain keywords with evaluation attributes; a generation unit for generating an evaluation voice message from the keywords; and a second output unit for outputting the evaluation voice message to the first user.
To achieve the above object, according to one aspect of the present invention, a storage medium is provided, the storage medium comprising a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the above speech-recognition-based destination search method.
To achieve the above object, according to one aspect of the present invention, a server is provided, comprising a memory and a processor, the memory storing information including program instructions and the processor controlling the execution of the program instructions, wherein the program instructions, when loaded and executed by the processor, implement the steps of the above speech-recognition-based destination search method.
In this solution, speech recognition is performed on the user's voice with a speech recognition model matched to the user's voice characteristics; real-scene pictures are then retrieved according to the recognized destination text; keywords from the structured fields of the web pages hosting those pictures are compared against the destination text by similarity; and the real-scene picture with the highest similarity value is taken as the destination real-scene picture. This improves the accuracy of destination search and lets users get a more intuitive view of the destination's surroundings, helping them reach the destination more reliably.
[Brief Description of the Drawings]
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a speech-recognition-based destination search method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a speech-recognition-based destination search device according to an embodiment of the present invention.
[Detailed Description of the Embodiments]
For a better understanding of the technical solutions of the present invention, the embodiments of the present invention are described in detail below with reference to the drawings.
It should be clear that the described embodiments are only a part, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms used in the embodiments of the present invention are for describing particular embodiments only and are not intended to limit the present invention. The singular forms "a", "said", and "the" used in the embodiments of the present invention and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association between related objects and indicates that three relationships are possible: for example, A and/or B may mean A alone, A and B together, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present invention to describe units, the units should not be limited by these terms, which serve only to distinguish them from one another. For example, without departing from the scope of the embodiments of the present invention, a first acquisition unit may also be called a second acquisition unit, and likewise a second acquisition unit may be called a first acquisition unit.
Depending on context, the word "if" as used herein may be construed as "when", "upon", "in response to determining", or "in response to detecting". Similarly, depending on context, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be construed as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
Fig. 1 is a flowchart of a speech-recognition-based destination search method according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S101: acquire a first voice signal from a first user, the first voice signal being a voice signal issued by the first user to instruct a search for real-scene pictures of a destination;
Step S102: perform feature extraction on the first voice signal to obtain feature information;
Step S103: identify voice characteristics from the feature information, and recognize the first voice signal with a speech recognition model matched to the voice characteristics to obtain destination text;
Step S104: crawl, through a search engine and according to the destination text, multiple real-scene pictures and the structured fields of the web pages on which they appear;
Step S105: extract keywords from the structured fields, and compute similarity values between the keywords and the destination text;
Step S106: take the real-scene picture corresponding to the keyword with the highest similarity value as the destination real-scene picture, and output the destination real-scene picture to the first user.
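The flow from step S101 through S106 can be sketched as a single orchestration function. The recognizer, crawler, keyword extractor, and similarity measure are injected as callables, since the patent leaves their concrete implementations to the optional embodiments; all names below are illustrative, not from the patent.

```python
def search_destination(voice_signal, recognize, fetch, extract_keywords, similarity):
    """Orchestrate steps S101-S106: recognize the voice, crawl candidate
    pictures with their structured fields, and return the picture whose
    best keyword most resembles the destination text."""
    dest_text = recognize(voice_signal)          # S101-S103
    results = fetch(dest_text)                   # S104: [(picture, structured_field), ...]
    best_pic, best_score = None, float("-inf")
    for picture, field in results:               # S105: score every keyword
        for kw in extract_keywords(field):
            score = similarity(kw, dest_text)
            if score > best_score:
                best_pic, best_score = picture, score
    return best_pic                              # S106: highest-similarity picture
```

With toy callables (exact-substring similarity, whitespace keyword extraction), the function picks the picture whose page keywords overlap the recognized text.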
In this solution, speech recognition is performed on the user's voice with a speech recognition model matched to the user's voice characteristics; real-scene pictures are then retrieved according to the recognized destination text; keywords from the structured fields of the web pages hosting those pictures are compared against the destination text by similarity; and the real-scene picture with the highest similarity value is taken as the destination real-scene picture. This improves the accuracy of destination search and lets users get a more intuitive view of the destination's surroundings, helping them reach the destination more reliably.
Optionally, the feature extraction may be, for example, spectral feature extraction, fundamental frequency feature extraction, power feature extraction, or zero-crossing rate extraction. Discrimination models may be built with modeling techniques such as support vector machines (SVM) or hidden Markov models (HMM), the discrimination models including a Mandarin model, a Chongqing accent model, a Henan accent model, a Cantonese accent model, a Wu dialect model, and a northern accent model, so that the identified voice characteristic may be Mandarin, a Chongqing accent, a Wu accent, a Henan accent, a Cantonese accent, and so on.
Optionally, after acquiring the first voice signal of the first user and before performing feature extraction on the first voice signal to obtain the feature information, the method further comprises:
acquiring multiple noisy speech signal samples and multiple clean speech signal samples; constructing and training a noise reduction model, wherein the noise reduction model comprises a generator and a discriminator: the generator receives a noisy speech signal sample and generates a new speech signal from it, while the discriminator judges whether the new speech signal produced by the generator is a real signal or a generated one; obtaining a trained noise reduction model through adversarial training of the discriminator and the generator; feeding the first voice signal into the trained noise reduction model, wherein the noise reduction model denoises the first voice signal and generates a second voice signal; and obtaining the second voice signal output by the noise reduction model to replace the acquired first voice signal.
By learning from a large number of noisy and clean speech samples, the generator acquires the ability to produce clean speech from noisy speech, to the point that the new speech signals it generates can fool the discriminator. This deep-learning noise reduction model suits a wide range of noise types and environments, is generally applicable, and is easy to deploy.
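The adversarial idea can be illustrated with a deliberately tiny stand-in: a one-parameter "generator" that learns to subtract a constant noise offset, driven by a "discriminator" signal measuring how far its output sits from the clean-speech distribution. A real implementation would use deep networks for both parts; everything below is a toy sketch of the training dynamic, not the patent's model.

```python
def train_denoiser(noisy_samples, clean_samples, steps=200, lr=0.05):
    """Toy adversarial loop: generator g(x) = x - bias removes a constant
    offset; the 'discriminator' score is the squared distance of the fake
    from the clean-sample mean, and the generator steps to reduce it."""
    bias = 0.0
    clean_mean = sum(clean_samples) / len(clean_samples)
    for _ in range(steps):
        for x in noisy_samples:
            fake = x - bias              # generator output
            gap = fake - clean_mean      # discriminator's "realness" gap
            bias += lr * 2 * gap         # gradient step on the generator
    return bias

def denoise(signal, bias):
    # Apply the trained generator to a noisy first voice signal.
    return [x - bias for x in signal]
```

Trained on samples offset by a constant 0.5, the generator's bias converges near that offset, and applying it recovers signals close to the clean ones.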
Optionally, after acquiring the first voice signal of the first user and before performing feature extraction on it to obtain the feature information, the method further comprises: denoising the first voice signal with the least mean squares (LMS) algorithm and obtaining the mean square error gradient of the current iteration; determining, from the oscillation of the mean square error gradients over M iterations (the current iteration included), whether the mean-square-error convergence-sensitive zone has been reached; updating, according to the determination, the convergence factor used by the LMS algorithm in the next iteration; outputting the denoised first voice signal based on the convergence factor; and replacing the acquired first voice signal with the denoised one.
Specifically, if the mean square error gradients of the M iterations satisfy formula A a number of times greater than or equal to a preset value, it is determined that the mean-square-error convergence-sensitive zone has been reached; otherwise, it is determined that the zone has not been reached. Formula A: [e(i)x(i)][e(i-1)x(i-1)] < 0, where e(i) is the error signal and x(i) the voice signal of the i-th of the M iterations, and e(i-1) and x(i-1) are the error signal and voice signal of the (i-1)-th iteration.
Optionally, the structured fields include at least one of a web page title field, an anchor text field, and a picture attribute field.
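Those three structured fields can be pulled from a crawled page with the standard-library HTML parser. A sketch, under the assumption that the picture attribute of interest is the image's `alt` text:

```python
from html.parser import HTMLParser

class StructuredFieldParser(HTMLParser):
    """Collect the structured fields named in the patent: the page
    title, anchor texts, and image alt attributes."""
    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "anchors": [], "img_alts": []}
        self._tag = None

    def handle_starttag(self, tag, attrs):
        self._tag = tag
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.fields["img_alts"].append(alt)

    def handle_endtag(self, tag):
        self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "title":
            self.fields["title"] += text
        elif self._tag == "a":
            self.fields["anchors"].append(text)
```

Feeding a page through `feed()` leaves the three fields in the `fields` dict, ready for segmentation and keyword extraction.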
Optionally, extracting the keywords from the structured fields and computing the similarity values between the keywords and the destination text comprises:
segmenting the structured fields and extracting keywords from the segmented fields, for example with open-source segmentation tools such as ICTCLAS or SCWS, or directly with an in-house segmentation interface; feeding the extracted keywords and the destination text into a preset word-vector representation model and obtaining the vector of each keyword and of the destination text, where the word-vector representation model may, for example, be a WORD2VEC neural network model; and computing the similarity value between each keyword vector and the destination text vector with the cosine similarity formula.
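With the keyword and destination vectors in hand, the cosine similarity of step S105 is a few lines. The 3-dimensional toy vectors in the usage below stand in for real WORD2VEC embeddings, which would typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar_keyword(dest_vec, keyword_vecs):
    # Step S106 picks the keyword (and hence its picture) with the top score.
    scores = {kw: cosine_similarity(dest_vec, v) for kw, v in keyword_vecs.items()}
    return max(scores, key=scores.get)
```

Because cosine similarity depends only on direction, keywords whose embeddings point the same way as the destination text score near 1 regardless of vector magnitude.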
Optionally, after outputting the destination real-scene picture to the first user, the method further comprises: obtaining comment text about the destination according to the destination text; segmenting the comment text with a string-matching-based segmentation method to obtain keywords with evaluation attributes; generating an evaluation voice message from the keywords; and outputting the evaluation voice message to the first user. For example, related comments can be crawled directly from group-buying or review sites according to the destination text, and keywords with evaluation attributes, such as "clean", "tasty", "dirty and messy", or "good service", extracted from them. Broadcasting the destination's evaluations directly lets the user perceive more intuitively whether the destination matches their expectations.
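String-matching segmentation is classically implemented as forward maximum matching against a dictionary. A sketch with a hypothetical miniature dictionary and evaluation-word list (a real system would use far larger lexicons):

```python
def forward_max_match(text, dictionary, max_len=4):
    """Forward maximum matching: at each position, take the longest
    dictionary word starting there, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + length]
            if length == 1 or cand in dictionary:
                words.append(cand)
                i += length
                break
    return words

def evaluation_keywords(text, dictionary, eval_words):
    # Keep only segmented words that carry an evaluation attribute.
    return [w for w in forward_max_match(text, dictionary) if w in eval_words]
```

On a short review like 服务好又干净 ("good service and clean"), the segmenter splits out the dictionary words, and the filter keeps just those marked as evaluative.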
Optionally, after outputting the destination real-scene picture to the first user, the method further comprises: acquiring a second voice signal from the first user, the second voice signal being a voice issued by the first user to confirm the destination; recognizing the second voice signal with the speech recognition model matched to the voice characteristics to obtain confirmation text; and outputting the destination text to a navigation system based on the confirmation text. It will be appreciated that letting the driver check the real-scene picture allows a quick judgment of whether the destination is correct; only then is the confirmed destination text passed to the navigation system, avoiding navigation errors and wrong route planning caused by identical or similar place names.
An embodiment of the present invention provides a speech-recognition-based destination search device for executing the above speech-recognition-based destination search method. As shown in Fig. 2, the device comprises: a first acquisition unit 10, an extraction unit 20, a first recognition unit 30, a crawling unit 40, a computing unit 50, and a first output unit 60.
The first acquisition unit 10 is for acquiring a first voice signal from a first user, the first voice signal being a voice signal issued by the first user to instruct a search for real-scene pictures of a destination. The extraction unit 20 is for performing feature extraction on the first voice signal to obtain feature information. The first recognition unit 30 is for identifying voice characteristics from the feature information and recognizing the first voice signal with a speech recognition model matched to the voice characteristics to obtain destination text. The crawling unit 40 is for crawling, through a search engine and according to the destination text, multiple real-scene pictures and the structured fields of the web pages on which they appear. The computing unit 50 is for extracting keywords from the structured fields and computing similarity values between the keywords and the destination text. The first output unit 60 is for taking the real-scene picture corresponding to the keyword with the highest similarity value as the destination real-scene picture and outputting it to the first user.
In this solution, speech recognition is performed on the user's voice with a speech recognition model matched to the user's voice characteristics; real-scene pictures are then retrieved according to the recognized destination text; keywords from the structured fields of the web pages hosting those pictures are compared against the destination text by similarity; and the real-scene picture with the highest similarity value is taken as the destination real-scene picture. This improves the accuracy of destination search and lets users get a more intuitive view of the destination's surroundings, helping them reach the destination more reliably.
Optionally, the feature extraction may be, for example, spectral feature extraction, fundamental frequency feature extraction, power feature extraction, or zero-crossing rate extraction. Discrimination models may be built with modeling techniques such as support vector machines (SVM) or hidden Markov models (HMM), the discrimination models including a Mandarin model, a Chongqing accent model, a Henan accent model, a Cantonese accent model, a Wu dialect model, and a northern accent model, so that the identified voice characteristic may be Mandarin, a Chongqing accent, a Wu accent, a Henan accent, a Cantonese accent, and so on.
Optionally, the device further comprises a third acquisition unit, a construction unit, a training unit, an input unit, and a first replacement unit.
The third acquisition unit is for acquiring multiple noisy speech signal samples and multiple clean speech signal samples. The construction unit is for constructing and training a noise reduction model, the noise reduction model comprising a generator and a discriminator: the generator receives a noisy speech signal and generates a new speech signal from it, and the discriminator judges whether the new speech signal produced by the generator is a real signal or a generated one. The training unit is for obtaining a trained noise reduction model through adversarial training of the discriminator and the generator. The input unit is for feeding the first voice signal into the trained noise reduction model, wherein the noise reduction model denoises the first voice signal and generates a second voice signal. The first replacement unit is for obtaining the second voice signal output by the noise reduction model to replace the acquired first voice signal.
By learning from a large number of noisy and clean speech samples, the generator acquires the ability to produce clean speech from noisy speech, to the point that the new speech signals it generates can fool the discriminator. This deep-learning noise reduction model suits a wide range of noise types and environments, is generally applicable, and is easy to deploy.
Optionally, the device further comprises a fourth acquisition unit, a determination unit, an updating unit, a third output unit, and a second replacement unit.
The fourth acquisition unit is for denoising the first voice signal with the least mean squares algorithm and obtaining the mean square error gradient of the current iteration. The determination unit is for determining, from the oscillation of the mean square error gradients over M iterations (the current iteration included), whether the mean-square-error convergence-sensitive zone has been reached. The updating unit is for updating, according to the determination, the convergence factor used by the least mean squares algorithm in the next iteration. The third output unit is for outputting the denoised first voice signal based on the convergence factor. The second replacement unit is for replacing the acquired first voice signal with the denoised one.
Specifically, if the mean square error gradients of the M iterations satisfy formula A a number of times greater than or equal to a preset value, it is determined that the mean-square-error convergence-sensitive zone has been reached; otherwise, it is determined that the zone has not been reached. Formula A: [e(i)x(i)][e(i-1)x(i-1)] < 0, where e(i) is the error signal and x(i) the voice signal of the i-th of the M iterations, and e(i-1) and x(i-1) are the error signal and voice signal of the (i-1)-th iteration.
Optionally, the structured fields include at least one of a web page title field, an anchor text field, and a picture attribute field.
Optionally, the computing unit 50 comprises a first processing subunit, a second processing subunit, an acquisition subunit, and a computation subunit.
The first processing subunit is for segmenting the structured fields. The second processing subunit is for extracting keywords from the segmented structured fields, for example with open-source segmentation tools such as ICTCLAS or SCWS, or directly with an in-house segmentation interface. The acquisition subunit is for feeding the extracted keywords and the destination text into a preset word-vector representation model and obtaining the vector of each keyword and of the destination text; the word-vector representation model may, for example, be a neural network model such as WORD2VEC. The computation subunit is for computing the similarity value between each keyword vector and the destination text vector with the cosine similarity formula.
Optionally, the device further includes a second acquiring unit, a processing unit, a generation unit, and a second output unit.
The second acquiring unit is configured to obtain comment text information about the destination according to the destination text. The processing unit is configured to perform word segmentation on the comment text information using a string-matching segmentation method, obtaining keywords with evaluation attributes. The generation unit is configured to generate an evaluation voice based on the keywords, and the second output unit is configured to output the evaluation voice to the first user. For example, relevant comment text may be crawled directly from group-buying or review websites according to the destination text, yielding keywords such as "sanitary", "delicious", "dirty and messy", or "good service". By broadcasting the destination's evaluations directly, the user can perceive more intuitively whether the destination matches his or her expectations.
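Forward maximum matching is one common string-matching segmentation method of the kind the processing unit could use; the dictionary, the sample comment, and the evaluation-word filter below are invented for illustration:

```python
def forward_max_match(text, dictionary, max_len=4):
    """String-matching word segmentation by forward maximum matching.

    Scans left to right, greedily taking the longest dictionary word that
    starts at the current position; characters not in any dictionary word
    fall back to single-character tokens.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

# Segment a toy comment, then keep only tokens carrying an evaluation attribute.
dictionary = {"干净", "好吃"}          # "clean", "tasty"
evaluation_words = {"干净", "好吃", "脏乱"}
tokens = forward_max_match("很干净好吃", dictionary)
evaluation_keywords = [t for t in tokens if t in evaluation_words]
```

The resulting evaluation keywords would then be passed to a text-to-speech step to generate the evaluation voice.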
Optionally, the device further includes a fifth acquiring unit, a second recognition unit, and a fourth output unit. The fifth acquiring unit is configured to obtain a second voice of the first user, the second voice being a voice issued by the first user to confirm the destination. The second recognition unit is configured to recognize the second voice using the speech recognition model matching the voice features, obtaining a confirmation text. The fourth output unit is configured to output, based on the confirmation text, the destination text information to a navigation system. It can be appreciated that by letting the driver consult real pictures, the driver can quickly and correctly judge whether the destination is the intended one; the confirmed destination text information is then output to the navigation system, which avoids navigation errors, and hence erroneous path planning, caused by identical or similar place names.
An embodiment of the present invention provides a storage medium. The storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to perform the following steps:
obtaining a first voice signal of a first user, the first voice signal being a voice signal issued by the first user to instruct a search for real pictures of a destination; performing feature extraction on the first voice signal to obtain feature information; identifying voice features according to the feature information, and recognizing the first voice signal using a speech recognition model matching the voice features, to obtain a destination text; grabbing, on a search engine according to the destination text, multiple real pictures and the structured fields of the web pages on which the multiple real pictures reside; extracting the keywords from the structured fields, and calculating the similarity value between each keyword and the destination text; and taking the real picture corresponding to the keyword with the maximum similarity value as the destination real picture, and outputting the destination real picture to the first user.
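The steps above can be sketched as a pipeline in which the speech recognition model, the search-engine crawl, the keyword extraction, and the similarity computation are injected as stand-ins; every callable here is an assumption for illustration, not the patent's actual implementation:

```python
def find_destination_picture(voice_signal, recognize, fetch_pages,
                             extract_keywords, similarity):
    """End-to-end flow: voice -> destination text -> crawled pages ->
    keyword similarity -> best-matching real picture."""
    destination_text = recognize(voice_signal)          # ASR stand-in
    # fetch_pages returns [(real_picture, structured_field), ...]
    pages = fetch_pages(destination_text)
    best_pic, best_score = None, float("-inf")
    for picture, field in pages:
        for kw in extract_keywords(field):              # segmentation stand-in
            score = similarity(kw, destination_text)    # e.g. cosine similarity
            if score > best_score:
                best_pic, best_score = picture, score
    return best_pic

# Toy usage with trivially simple stand-ins for each stage.
picture = find_destination_picture(
    "raw-audio",
    recognize=lambda v: "west lake",
    fetch_pages=lambda t: [("pic1.jpg", "west lake scenery"),
                           ("pic2.jpg", "east museum")],
    extract_keywords=lambda field: field.split(),
    similarity=lambda kw, text: 1.0 if kw in text else 0.0,
)
```

Structuring the flow around injected callables makes each stage (noise reduction, ASR, crawling, scoring) independently replaceable and testable.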
Optionally, when the program runs, the device on which the storage medium resides further performs the following steps: obtaining multiple noisy voice signal samples and multiple clean voice signal samples; constructing and training a noise reduction model, wherein the noise reduction model includes a generator and a discriminator, the generator receives a noisy voice signal sample and generates a new voice signal from it, and the discriminator determines whether the new voice signal produced by the generator is a real signal or a generated one; obtaining the trained noise reduction model through adversarial training of the discriminator and the generator; inputting the first voice signal into the trained noise reduction model, wherein the noise reduction model performs noise reduction on the first voice signal and generates a second voice signal; obtaining the second voice signal output by the noise reduction model to replace the acquired first voice signal.
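The adversarial structure described above can be illustrated with a deliberately tiny model: a generator that is just a single gain applied to the noisy signal, and a discriminator that judges real versus generated from mean signal power alone, trained with hand-derived gradients. Real speech-enhancement GANs use deep networks for both parts; this sketch only shows the alternating generator/discriminator update pattern, and all parameter values are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_toy_denoise_gan(clean, noisy, steps=500, lr=0.05):
    """Toy adversarial training: generator output = w * noisy signal;
    discriminator score = sigmoid(a * mean_power + b)."""
    w = 1.0            # generator parameter (output gain)
    a, b = 0.0, 0.0    # discriminator parameters
    p_real = np.mean(clean ** 2)   # power of real (clean) samples
    p_x = np.mean(noisy ** 2)      # power of noisy input
    for _ in range(steps):
        p_fake = (w ** 2) * p_x
        s_r = sigmoid(a * p_real + b)   # D's score on a real sample
        s_f = sigmoid(a * p_fake + b)   # D's score on a generated sample
        # Discriminator step: descend -log s_r - log(1 - s_f)
        grad_a = -(1 - s_r) * p_real + s_f * p_fake
        grad_b = -(1 - s_r) + s_f
        a -= lr * grad_a
        b -= lr * grad_b
        # Generator step: descend -log s_f (try to fool the discriminator)
        s_f = sigmoid(a * (w ** 2) * p_x + b)
        grad_w = -(1 - s_f) * a * 2 * w * p_x
        w -= lr * grad_w
    return w
```

At equilibrium the generator's output power matches the clean-signal power, so the learned gain tends toward sqrt(P_clean / P_noisy), a Wiener-like attenuation; the same adversarial loop, with networks in place of scalars, underlies the patent's noise reduction model.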
Optionally, when the program runs, the device on which the storage medium resides further performs the following steps: performing noise reduction on the first voice signal using the least mean square algorithm, and obtaining the mean square error gradient corresponding to the current iteration; determining, according to the oscillation of the mean square error gradients over M iterations (which include the current iteration), whether the sensitive region of mean square error convergence has been reached; updating, according to the determination result, the convergence factor used by the least mean square algorithm in the next iteration; outputting the noise-reduced first voice signal based on the convergence factor; replacing the acquired first voice signal with the noise-reduced first voice signal.
Optionally, when the program runs, the device on which the storage medium resides further performs the following steps: performing word segmentation on the structured fields; extracting the keywords from the segmented structured fields; inputting the extracted keywords and the destination text into a preset word-vector representation model, and obtaining the vector representation of each keyword and the vector representation of the destination text output by the model; calculating the similarity value between the vector of each keyword and the vector of the destination text using the cosine similarity formula.
Optionally, when the program runs, the device on which the storage medium resides further performs the following steps: obtaining comment text information about the destination according to the destination text; performing word segmentation on the comment text information using a string-matching segmentation method, to obtain keywords with evaluation attributes; generating an evaluation voice based on the keywords; outputting the evaluation voice to the first user.
An embodiment of the present invention provides a server, including a memory and a processor. The memory is configured to store information including program instructions, and the processor is configured to control the execution of the program instructions. When loaded and executed by the processor, the program instructions implement the following steps:
obtaining a first voice signal of a first user, the first voice signal being a voice signal issued by the first user to instruct a search for real pictures of a destination; performing feature extraction on the first voice signal to obtain feature information; identifying voice features according to the feature information, and recognizing the first voice signal using a speech recognition model matching the voice features, to obtain a destination text; grabbing, on a search engine according to the destination text, multiple real pictures and the structured fields of the web pages on which the multiple real pictures reside; extracting the keywords from the structured fields, and calculating the similarity value between each keyword and the destination text; and taking the real picture corresponding to the keyword with the maximum similarity value as the destination real picture, and outputting the destination real picture to the first user.
Optionally, when loaded and executed by the processor, the program instructions further implement the following steps: obtaining multiple noisy voice signal samples and multiple clean voice signal samples; constructing and training a noise reduction model, wherein the noise reduction model includes a generator and a discriminator, the generator receives a noisy voice signal sample and generates a new voice signal from it, and the discriminator determines whether the new voice signal produced by the generator is a real signal or a generated one; obtaining the trained noise reduction model through adversarial training of the discriminator and the generator; inputting the first voice signal into the trained noise reduction model, wherein the noise reduction model performs noise reduction on the first voice signal and generates a second voice signal; obtaining the second voice signal output by the noise reduction model to replace the acquired first voice signal.
Optionally, when loaded and executed by the processor, the program instructions further implement the following steps: performing noise reduction on the first voice signal using the least mean square algorithm, and obtaining the mean square error gradient corresponding to the current iteration; determining, according to the oscillation of the mean square error gradients over M iterations (which include the current iteration), whether the sensitive region of mean square error convergence has been reached; updating, according to the determination result, the convergence factor used by the least mean square algorithm in the next iteration; outputting the noise-reduced first voice signal based on the convergence factor; replacing the acquired first voice signal with the noise-reduced first voice signal.
Optionally, when loaded and executed by the processor, the program instructions further implement the following steps: performing word segmentation on the structured fields; extracting the keywords from the segmented structured fields; inputting the extracted keywords and the destination text into a preset word-vector representation model, and obtaining the vector representation of each keyword and the vector representation of the destination text output by the model; calculating the similarity value between the vector of each keyword and the vector of the destination text using the cosine similarity formula.
Optionally, when loaded and executed by the processor, the program instructions further implement the following steps: obtaining comment text information about the destination according to the destination text; performing word segmentation on the comment text information using a string-matching segmentation method, to obtain keywords with evaluation attributes; generating an evaluation voice based on the keywords; outputting the evaluation voice to the first user.
It should be noted that the terminals involved in the embodiments of the present invention may include, but are not limited to, personal computers (Personal Computer, PC), personal digital assistants (Personal Digital Assistant, PDA), wireless handheld devices, tablet computers (Tablet Computer), mobile phones, MP3 players, MP4 players, and the like.
It can be understood that the application may be an application program (nativeApp) installed on the terminal, or may be a web page program (webApp) of a browser on the terminal; the embodiments of the present invention impose no limitation on this.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may exist physically as separate units, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computing device (which may be a personal computer, a server, a network device, or the like) or a processor (Processor) to execute some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A destination searching method based on speech recognition, characterized in that the method comprises:
obtaining a first voice signal of a first user, the first voice signal being a voice signal issued by the first user to instruct a search for real pictures of a destination;
performing feature extraction on the first voice signal to obtain feature information;
identifying voice features according to the feature information, and recognizing the first voice signal using a speech recognition model matching the voice features, to obtain a destination text;
grabbing, on a search engine according to the destination text, multiple real pictures and the structured fields of the web pages on which the multiple real pictures reside;
extracting the keywords from the structured fields, and calculating the similarity value between each keyword and the destination text;
taking the real picture corresponding to the keyword with the maximum similarity value as the destination real picture, and outputting the destination real picture to the first user.
2. The method according to claim 1, characterized in that, after the obtaining of the first voice signal of the first user, and before the performing of feature extraction on the first voice signal to obtain the feature information, the method further comprises:
obtaining multiple noisy voice signal samples and multiple clean voice signal samples;
constructing and training a noise reduction model, wherein the noise reduction model comprises a generator and a discriminator, the generator receives a noisy voice signal sample and generates a new voice signal from the noisy voice signal sample, and the discriminator determines whether the new voice signal produced by the generator is a real signal or a generated signal;
obtaining a trained noise reduction model through adversarial training of the discriminator and the generator;
inputting the first voice signal into the trained noise reduction model, wherein the noise reduction model performs noise reduction on the first voice signal and generates a second voice signal;
obtaining the second voice signal output by the noise reduction model to replace the obtained first voice signal.
3. The method according to claim 1, characterized in that, after the obtaining of the first voice signal of the first user, and before the performing of feature extraction on the first voice signal to obtain the feature information, the method further comprises:
performing noise reduction on the first voice signal using a least mean square algorithm, and obtaining the mean square error gradient corresponding to the current iteration;
determining, according to the oscillation of the mean square error gradients corresponding to M iterations, whether a sensitive region of mean square error convergence has been reached, the M iterations including the current iteration;
updating, according to the determination result, the convergence factor used by the least mean square algorithm in the next iteration;
outputting the noise-reduced first voice signal based on the convergence factor;
replacing the obtained first voice signal with the noise-reduced first voice signal.
4. The method according to claim 1, characterized in that the extracting of the keywords from the structured fields and the calculating of the similarity value between each keyword and the destination text comprise:
performing word segmentation on the structured fields;
extracting the keywords from the segmented structured fields;
inputting the extracted keywords and the destination text into a preset word-vector representation model, and obtaining the vector representation of each keyword and the vector representation of the destination text output by the word-vector representation model;
calculating the similarity value between the vector of each keyword and the vector of the destination text using the cosine similarity formula.
5. The method according to any one of claims 1-4, characterized in that, after the outputting of the destination real picture to the first user, the method further comprises:
obtaining comment text information about the destination according to the destination text;
performing word segmentation on the comment text information using a string-matching segmentation method, to obtain keywords with evaluation attributes;
generating an evaluation voice based on the keywords;
outputting the evaluation voice to the first user.
6. A destination searching device based on speech recognition, characterized in that the device comprises:
a first acquiring unit, configured to obtain a first voice signal of a first user, the first voice signal being a voice signal issued by the first user to instruct a search for real pictures of a destination;
an extraction unit, configured to perform feature extraction on the first voice signal to obtain feature information;
a first recognition unit, configured to identify voice features according to the feature information, and to recognize the first voice signal using a speech recognition model matching the voice features, obtaining a destination text;
a grabbing unit, configured to grab, on a search engine according to the destination text, multiple real pictures and the structured fields of the web pages on which the multiple real pictures reside;
a computing unit, configured to extract the keywords from the structured fields and to calculate the similarity value between each keyword and the destination text;
a first output unit, configured to take the real picture corresponding to the keyword with the maximum similarity value as the destination real picture, and to output the destination real picture to the first user.
7. The device according to claim 6, characterized in that the computing unit comprises:
a first processing subunit, configured to perform word segmentation on the structured fields;
a second processing subunit, configured to extract the keywords from the segmented structured fields;
an acquiring subunit, configured to input the extracted keywords and the destination text into a preset word-vector representation model, and to obtain the vector of each keyword and the vector of the destination text output by the word-vector representation model;
a computation subunit, configured to calculate the similarity value between the vector of each keyword and the vector of the destination text using the cosine similarity formula.
8. The device according to claim 6, characterized in that the device further comprises:
a second acquiring unit, configured to obtain comment text information about the destination according to the destination text;
a processing unit, configured to perform word segmentation on the comment text information using a string-matching segmentation method, obtaining keywords with evaluation attributes;
a generation unit, configured to generate an evaluation voice based on the keywords;
a second output unit, configured to output the evaluation voice to the first user.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to perform the destination searching method based on speech recognition according to any one of claims 1 to 5.
10. A server, comprising a memory and a processor, the memory being configured to store information including program instructions, the processor being configured to control the execution of the program instructions, characterized in that the program instructions, when loaded and executed by the processor, implement the steps of the destination searching method based on speech recognition according to any one of claims 1 to 5.
CN201811295008.XA 2018-11-01 2018-11-01 A kind of destination searching method and device based on speech recognition Pending CN109410935A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811295008.XA CN109410935A (en) 2018-11-01 2018-11-01 A kind of destination searching method and device based on speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811295008.XA CN109410935A (en) 2018-11-01 2018-11-01 A kind of destination searching method and device based on speech recognition

Publications (1)

Publication Number Publication Date
CN109410935A true CN109410935A (en) 2019-03-01

Family

ID=65470899

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811295008.XA Pending CN109410935A (en) 2018-11-01 2018-11-01 A kind of destination searching method and device based on speech recognition

Country Status (1)

Country Link
CN (1) CN109410935A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110706696A (en) * 2019-09-25 2020-01-17 珠海格力电器股份有限公司 Voice control method and device
CN111914153A (en) * 2020-07-24 2020-11-10 广州中医药大学第一附属医院 Follower method, follower system, server, and storage medium
CN112102843A (en) * 2020-09-18 2020-12-18 北京搜狗科技发展有限公司 Voice recognition method and device and electronic equipment
CN113658598A (en) * 2021-08-12 2021-11-16 海信电子科技(深圳)有限公司 Voice interaction method of display equipment and display equipment

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6029130A (en) * 1996-08-20 2000-02-22 Ricoh Company, Ltd. Integrated endpoint detection for improved speech recognition method and system
US20060058947A1 (en) * 2004-09-10 2006-03-16 Schalk Thomas B Systems and methods for off-board voice-automated vehicle navigation
CN101976304A (en) * 2010-10-16 2011-02-16 陈长江 Intelligent life housekeeper system and method
CN201830294U (en) * 2010-08-18 2011-05-11 深圳市子栋科技有限公司 Navigation system and navigation server based on voice command
CN104391673A (en) * 2014-11-20 2015-03-04 百度在线网络技术(北京)有限公司 Voice interaction method and voice interaction device
TWM517851U (en) * 2015-11-25 2016-02-21 Jie-Zhong Xu Figure communication system
CN105893564A (en) * 2016-03-31 2016-08-24 百度在线网络技术(北京)有限公司 Search method and device based on search engine client
CN106328154A (en) * 2015-06-30 2017-01-11 芋头科技(杭州)有限公司 Front-end audio processing system
CN106354852A (en) * 2016-09-02 2017-01-25 北京百度网讯科技有限公司 Search method and device based on artificial intelligence
CN107068161A (en) * 2017-04-14 2017-08-18 百度在线网络技术(北京)有限公司 Voice de-noising method, device and computer equipment based on artificial intelligence
CN107274885A (en) * 2017-05-31 2017-10-20 广东欧珀移动通信有限公司 Audio recognition method and Related product
CN107346316A (en) * 2016-05-06 2017-11-14 北京搜狗科技发展有限公司 A kind of searching method, device and electronic equipment
CN108520504A (en) * 2018-04-16 2018-09-11 湘潭大学 A kind of blurred picture blind restoration method based on generation confrontation network end-to-end

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6029130A (en) * 1996-08-20 2000-02-22 Ricoh Company, Ltd. Integrated endpoint detection for improved speech recognition method and system
US20060058947A1 (en) * 2004-09-10 2006-03-16 Schalk Thomas B Systems and methods for off-board voice-automated vehicle navigation
CN201830294U (en) * 2010-08-18 2011-05-11 深圳市子栋科技有限公司 Navigation system and navigation server based on voice command
CN101976304A (en) * 2010-10-16 2011-02-16 陈长江 Intelligent life housekeeper system and method
CN104391673A (en) * 2014-11-20 2015-03-04 百度在线网络技术(北京)有限公司 Voice interaction method and voice interaction device
CN106328154A (en) * 2015-06-30 2017-01-11 芋头科技(杭州)有限公司 Front-end audio processing system
TWM517851U (en) * 2015-11-25 2016-02-21 Jie-Zhong Xu Figure communication system
CN105893564A (en) * 2016-03-31 2016-08-24 百度在线网络技术(北京)有限公司 Search method and device based on search engine client
CN107346316A (en) * 2016-05-06 2017-11-14 北京搜狗科技发展有限公司 A kind of searching method, device and electronic equipment
CN106354852A (en) * 2016-09-02 2017-01-25 北京百度网讯科技有限公司 Search method and device based on artificial intelligence
CN107068161A (en) * 2017-04-14 2017-08-18 百度在线网络技术(北京)有限公司 Voice de-noising method, device and computer equipment based on artificial intelligence
CN107274885A (en) * 2017-05-31 2017-10-20 广东欧珀移动通信有限公司 Audio recognition method and Related product
CN108520504A (en) * 2018-04-16 2018-09-11 湘潭大学 A kind of blurred picture blind restoration method based on generation confrontation network end-to-end

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110706696A (en) * 2019-09-25 2020-01-17 珠海格力电器股份有限公司 Voice control method and device
CN111914153A (en) * 2020-07-24 2020-11-10 广州中医药大学第一附属医院 Follower method, follower system, server, and storage medium
CN112102843A (en) * 2020-09-18 2020-12-18 北京搜狗科技发展有限公司 Voice recognition method and device and electronic equipment
CN113658598A (en) * 2021-08-12 2021-11-16 海信电子科技(深圳)有限公司 Voice interaction method of display equipment and display equipment
CN113658598B (en) * 2021-08-12 2024-02-27 Vidaa(荷兰)国际控股有限公司 Voice interaction method of display equipment and display equipment

Similar Documents

Publication Publication Date Title
CN104915340B (en) Natural language question-answering method and device
US10997370B2 (en) Hybrid classifier for assigning natural language processing (NLP) inputs to domains in real-time
WO2019153737A1 (en) Comment assessing method, device, equipment and storage medium
CN105869642B (en) A kind of error correction method and device of speech text
CN109410935A (en) A kind of destination searching method and device based on speech recognition
CN105808590B (en) Search engine implementation method, searching method and device
CN109299457A (en) A kind of opining mining method, device and equipment
CN109033305A (en) Question answering method, equipment and computer readable storage medium
CN110427463A (en) Search statement response method, device and server and storage medium
CN107391614A (en) A kind of Chinese question and answer matching process based on WMD
CN109271493A (en) A kind of language text processing method, device and storage medium
CN103425727B (en) Context speech polling expands method and system
CN110096567A (en) Selection method, system are replied in more wheels dialogue based on QA Analysis of Knowledge Bases Reasoning
CN107710192A (en) Measurement for the automatic Evaluation of conversational response
CN109582969A (en) Methodology for Entities Matching, device and electronic equipment
CN108735201A (en) continuous speech recognition method, device, equipment and storage medium
JP6308708B1 (en) Patent requirement conformity prediction device and patent requirement conformity prediction program
CN106356057A (en) Speech recognition system based on semantic understanding of computer application scenario
CN104715063B (en) search ordering method and device
CN108804526A (en) Interest determines that system, interest determine method and storage medium
CN117521814B (en) Question answering method and device based on multi-modal input and knowledge graph
CN109544104A (en) A kind of recruitment data processing method and device
CN110085217A (en) Phonetic navigation method, device and terminal device
CN109271624A (en) A kind of target word determines method, apparatus and storage medium
CN103488782A (en) Method for recognizing musical emotion through lyrics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190301