CN109410935A - Destination search method and device based on speech recognition - Google Patents
Destination search method and device based on speech recognition
- Publication number: CN109410935A
- Application number: CN201811295008.XA
- Authority: CN (China)
- Prior art keywords: voice signal, destination, keyword, voice, user
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/22 — Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26 — Speech recognition; speech to text systems
- G10L21/0216 — Noise filtering characterised by the method used for estimating noise (under G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation)
Abstract
An embodiment of the invention provides a destination search method and device based on speech recognition, relating to the field of artificial intelligence. The method includes: obtaining a first voice signal of a first user, the first voice signal being a voice signal issued by the first user to instruct a search for a real-scene picture of a destination; performing feature extraction on the first voice signal to obtain feature information; identifying a voice characteristic from the feature information, and recognizing the first voice signal with a speech recognition model matched to that voice characteristic to obtain a destination text; crawling, on a search engine and according to the destination text, multiple real-scene pictures and the structured fields of the web pages on which they appear; extracting the keywords in the structured fields and calculating the similarity between each keyword and the destination text; and taking the real-scene picture corresponding to the keyword with the highest similarity value as the destination real-scene picture and outputting it to the first user. The technical solution provided by the embodiment of the invention solves the prior-art problem of low accuracy in destination search.
Description
[technical field]
The present invention relates to the field of artificial intelligence, and in particular to a destination search method and device based on speech recognition.
[background art]
When a user needs to search for pictures, text must be typed into a search engine, which then displays the corresponding pictures. For example, when the user enters the words "West Lake", the search engine shows pictures of the West Lake. While driving, however, typing a query to obtain pictures of a destination is inconvenient and may even compromise driving safety. At present, a user searching for a destination has no intuitive view of the destination's actual surroundings; confirming the destination by text alone readily leads to inaccurate destination searches.
[summary of the invention]
In view of this, embodiments of the invention provide a destination search method and device based on speech recognition, intended to solve the prior-art problem of low accuracy in destination search.
To achieve the above object, according to one aspect of the invention, a destination search method based on speech recognition is provided, the method comprising:
obtaining a first voice signal of a first user, the first voice signal being a voice signal issued by the first user to instruct a search for a real-scene picture of a destination; performing feature extraction on the first voice signal to obtain feature information; identifying a voice characteristic from the feature information, and recognizing the first voice signal with a speech recognition model matched to that voice characteristic to obtain a destination text; crawling, on a search engine and according to the destination text, multiple real-scene pictures and the structured fields of the web pages on which they appear; extracting the keywords in the structured fields and calculating the similarity between each keyword and the destination text; and taking the real-scene picture corresponding to the keyword with the highest similarity value as the destination real-scene picture, and outputting the destination real-scene picture to the first user.
Further, after obtaining the first voice signal of the first user and before performing feature extraction on it to obtain feature information, the method further comprises:
obtaining multiple noisy speech signal samples and multiple clean speech signal samples; constructing and training a noise reduction model, wherein the noise reduction model comprises a generator and a discriminator, the generator receives a noisy speech signal sample and generates a new voice signal from it, and the discriminator judges whether the new voice signal produced by the generator is a real signal or a generated one; obtaining a trained noise reduction model through adversarial training of the discriminator and the generator; feeding the first voice signal into the trained noise reduction model, which performs noise reduction on it and generates a second voice signal; and obtaining the second voice signal output by the noise reduction model to replace the originally obtained first voice signal.
Further, after obtaining the first voice signal of the first user and before performing feature extraction on it to obtain feature information, the method further comprises: performing noise reduction on the first voice signal using a least-mean-square (LMS) algorithm and obtaining the mean-square-error gradient for the current iteration; determining, from the oscillation behaviour of the mean-square-error gradients over M iterations (the current iteration included), whether the convergence-sensitive region of the mean square error has been reached; updating, according to the determination, the convergence factor used by the LMS algorithm in the next iteration; outputting a noise-reduced first voice signal based on that convergence factor; and replacing the originally obtained first voice signal with the noise-reduced one.
Further, extracting the keywords in the structured fields and calculating the similarity between each keyword and the destination text comprises: performing word segmentation on the structured fields; extracting the keywords from the segmented structured fields; feeding the extracted keywords and the destination text into a preset word-vector representation model and obtaining the vector representation of each keyword and of the destination text output by that model; and calculating, with the cosine similarity formula, the similarity between each keyword's vector and the destination text's vector.
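The similarity step above can be sketched with a small cosine-similarity routine. This is a minimal illustration, not the patent's implementation: the toy three-dimensional vectors stand in for the output of a trained word-vector model such as WORD2VEC, and their entries are invented for the example.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two word vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_keyword(keyword_vecs, dest_vec):
    """Pick the keyword whose vector is most similar to the destination-text vector."""
    return max(keyword_vecs, key=lambda kw: cosine_similarity(keyword_vecs[kw], dest_vec))

# Hypothetical toy embeddings; a real system would query a trained model.
vecs = {
    "West Lake": np.array([0.9, 0.1, 0.0]),
    "restaurant": np.array([0.1, 0.8, 0.3]),
}
dest = np.array([0.8, 0.2, 0.1])
print(best_keyword(vecs, dest))  # → West Lake
```

The real-scene picture attached to the winning keyword would then be returned to the user as the destination picture.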
Further, after outputting the destination real-scene picture to the first user, the method further comprises: obtaining comment text about the destination according to the destination text; performing word segmentation on the comment text with a string-matching segmentation method to obtain keywords carrying evaluation attributes; generating an evaluation voice from those keywords; and outputting the evaluation voice to the first user.
To achieve the above object, according to one aspect of the invention, a destination search device based on speech recognition is provided, the device comprising: a first acquisition unit for obtaining a first voice signal of a first user, the first voice signal being a voice signal issued by the first user to instruct a search for a real-scene picture of a destination; an extraction unit for performing feature extraction on the first voice signal to obtain feature information; a first recognition unit for identifying a voice characteristic from the feature information and recognizing the first voice signal with a speech recognition model matched to that voice characteristic to obtain a destination text; a crawling unit for crawling, on a search engine and according to the destination text, multiple real-scene pictures and the structured fields of the web pages on which they appear; a computing unit for extracting the keywords in the structured fields and calculating the similarity between each keyword and the destination text; and a first output unit for taking the real-scene picture corresponding to the keyword with the highest similarity value as the destination real-scene picture and outputting it to the first user.
Further, the computing unit comprises: a first processing subunit for performing word segmentation on the structured fields; a second processing subunit for extracting the keywords from the segmented structured fields; an acquisition subunit for feeding the extracted keywords and the destination text into a preset word-vector representation model and obtaining the vector of each keyword and the vector of the destination text output by that model; and a computation subunit for calculating, with the cosine similarity formula, the similarity between each keyword's vector and the destination text's vector.
Further, the device also comprises: a second acquisition unit for obtaining comment text about the destination according to the destination text; a processing unit for performing word segmentation on the comment text with a string-matching segmentation method to obtain keywords carrying evaluation attributes; a generation unit for generating an evaluation voice from those keywords; and a second output unit for outputting the evaluation voice to the first user.
To achieve the above object, according to one aspect of the invention, a storage medium is provided, the storage medium comprising a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the above destination search method based on speech recognition.
To achieve the above object, according to one aspect of the invention, a server is provided, comprising a memory and a processor, the memory storing information including program instructions and the processor controlling the execution of the program instructions, which, when loaded and executed by the processor, implement the steps of the above destination search method based on speech recognition.
In this solution, the user's speech is recognized with a speech recognition model matched to the user's voice characteristics, real-scene pictures are then retrieved according to the recognized destination text, and the keywords in the structured fields of the pages hosting those pictures are compared with the destination text by similarity, the picture with the highest similarity value being taken as the destination real-scene picture. This improves the accuracy of the user's destination search and lets the user grasp the destination's surroundings more intuitively, thereby helping the user reach the destination more accurately.
[Brief description of the drawings]
In order to explain the technical solutions of the embodiments of the invention more clearly, the drawings needed for the embodiments are briefly described below. Evidently, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a destination search method based on speech recognition according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a destination search device based on speech recognition according to an embodiment of the present invention.
[specific embodiment]
For a better understanding of the technical solutions of the invention, the embodiments of the present invention are described in detail below with reference to the drawings.
It should be clear that the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms used in the embodiments of the present invention are for the purpose of describing particular embodiments only and are not intended to limit the invention. The singular forms "a", "said" and "the" used in the embodiments and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association between related objects and indicates that three relationships may exist: A and/or B may mean that A exists alone, A and B exist together, or B exists alone. The character "/" herein generally indicates an "or" relationship between the preceding and following objects.
It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present invention to describe terminals, the terminals are not limited by these terms, which serve only to distinguish terminals from one another. For example, without departing from the scope of the embodiments of the present invention, a first acquisition unit could also be called a second acquisition unit, and similarly a second acquisition unit could be called a first acquisition unit.
Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
Fig. 1 is a flowchart of a destination search method based on speech recognition according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S101: obtain a first voice signal of a first user, the first voice signal being a voice signal issued by the first user to instruct a search for a real-scene picture of a destination;
Step S102: perform feature extraction on the first voice signal to obtain feature information;
Step S103: identify a voice characteristic from the feature information, and recognize the first voice signal with a speech recognition model matched to that voice characteristic to obtain a destination text;
Step S104: crawl, on a search engine and according to the destination text, multiple real-scene pictures and the structured fields of the web pages on which they appear;
Step S105: extract the keywords in the structured fields and calculate the similarity between each keyword and the destination text;
Step S106: take the real-scene picture corresponding to the keyword with the highest similarity value as the destination real-scene picture, and output it to the first user.
In this solution, the user's speech is recognized with a speech recognition model matched to the user's voice characteristics, real-scene pictures are then retrieved according to the recognized destination text, and the keywords in the structured fields of the pages hosting those pictures are compared with the destination text by similarity, the picture with the highest similarity value being taken as the destination real-scene picture. This improves the accuracy of the user's destination search and lets the user grasp the destination's surroundings more intuitively, thereby helping the user reach the destination more accurately.
Optionally, the feature extraction may be, for example, spectral feature extraction, fundamental-frequency feature extraction, power feature extraction or zero-crossing-rate extraction. Furthermore, a discrimination model may be built with modeling techniques such as a support vector machine (SVM) or a hidden Markov model (HMM), the discrimination model comprising a Mandarin model, a Chongqing-accent model, a Henan-accent model, a Cantonese-accent model, a Wu-dialect model and a northern-accent model, so that the identified voice characteristic is Mandarin, a Chongqing accent, a Wu-dialect accent, a Henan accent, a Cantonese accent, and so on.
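As one possible reading of the discrimination-model step, the sketch below trains a linear SVM (hinge loss, sub-gradient descent) to separate two accent classes from toy two-dimensional acoustic features. Everything here — the feature values, class layout and hyperparameters — is invented for illustration; a production system would use real spectral or pitch features and a full SVM or HMM toolkit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy acoustic feature vectors (e.g. pitch/energy statistics) for two accents.
mandarin = rng.normal(loc=[1.0, 1.0], scale=0.2, size=(20, 2))
chongqing = rng.normal(loc=[-1.0, -1.0], scale=0.2, size=(20, 2))
X = np.vstack([mandarin, chongqing])
y = np.array([1] * 20 + [-1] * 20)   # +1 = Mandarin, -1 = Chongqing accent

# Linear SVM trained by sub-gradient descent on the regularised hinge loss.
w, b = np.zeros(2), 0.0
lr, lam = 0.1, 0.01
for _ in range(200):
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) < 1:      # inside the margin: hinge gradient
            w += lr * (yi * xi - lam * w)
            b += lr * yi
        else:                          # outside the margin: only weight decay
            w -= lr * lam * w

def predict(x):
    return 1 if w @ x + b > 0 else -1

print(predict(np.array([0.9, 1.1])))    # → 1  (Mandarin-like features)
print(predict(np.array([-1.1, -0.9])))  # → -1 (Chongqing-like features)
```

The predicted accent class would then select the matching speech recognition model for the first voice signal.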
Optionally, after obtaining the first voice signal of the first user and before performing feature extraction on it to obtain feature information, the method further comprises:
obtaining multiple noisy speech signal samples and multiple clean speech signal samples; constructing and training a noise reduction model comprising a generator and a discriminator, the generator receiving a noisy speech signal sample and generating a new voice signal from it, and the discriminator judging whether the new voice signal produced by the generator is a real signal or a generated one; obtaining a trained noise reduction model through adversarial training of the discriminator and the generator; feeding the first voice signal into the trained noise reduction model, which performs noise reduction on it and generates a second voice signal; and obtaining the second voice signal output by the noise reduction model to replace the originally obtained first voice signal.
By learning from a large number of noisy and clean speech samples, the generator acquires the ability to produce clean speech from noisy speech, and the generated voice signal becomes good enough to fool the discriminator. A noise reduction model trained this way, by deep learning, suits a wide range of noise types and environments, is generally applicable and easy to popularize.
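The adversarial scheme can be illustrated structurally in numpy. This is a deliberately tiny stand-in — a per-sample gain as the "generator" and logistic regression as the "discriminator" — meant only to show the two opposing gradient updates; it makes no claim about the patent's actual network architecture or its denoising quality.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: a clean sinusoid and its noisy version (the generator's input).
t = np.linspace(0, 1, n)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.3 * rng.normal(size=n)

g_w = np.ones(n)                      # "generator": per-sample gain
d_w = rng.normal(scale=0.1, size=n)   # "discriminator": logistic weights

def generator(x):
    return g_w * x

def discriminator(x):
    return sigmoid(d_w @ x)           # probability that x is real clean speech

for _ in range(100):
    # Discriminator step: push D(clean) toward 1 and D(fake) toward 0.
    fake = generator(noisy)
    d_w += 0.01 * ((1 - discriminator(clean)) * clean
                   - discriminator(fake) * fake)
    # Generator step: make the discriminator score the generated signal as real.
    fake = generator(noisy)
    g_w += 0.01 * (1 - discriminator(fake)) * d_w * noisy

denoised = generator(noisy)
print(denoised.shape)  # → (64,)
```

In the patent's pipeline the trained model would receive the first voice signal and emit the denoised second voice signal in its place.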
Optionally, after obtaining the first voice signal of the first user and before performing feature extraction on it to obtain feature information, the method further comprises: performing noise reduction on the first voice signal using a least-mean-square algorithm and obtaining the mean-square-error gradient for the current iteration; determining, from the oscillation behaviour of the mean-square-error gradients over M iterations (the current iteration included), whether the convergence-sensitive region of the mean square error has been reached; updating, according to the determination, the convergence factor used by the least-mean-square algorithm in the next iteration; outputting a noise-reduced first voice signal based on that convergence factor; and replacing the originally obtained first voice signal with the noise-reduced one.
Specifically, if, over the M iterations, the number of mean-square-error gradients satisfying formula A is greater than or equal to a preset value, it is determined that the convergence-sensitive region of the mean square error has been reached; otherwise it has not. Formula A: [e(i)x(i)]·[e(i−1)x(i−1)] < 0, where e(i) is the error signal of the i-th of the M iterations, x(i) is the voice signal of the i-th iteration, e(i−1) is the error signal of the (i−1)-th iteration, and x(i−1) is the voice signal of the (i−1)-th iteration.
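A toy numpy rendering of the adaptive-step idea: the gradient products e(i)x(i) are tracked over a sliding window of M iterations, and when their signs oscillate often enough (formula A fires) the convergence factor is shrunk. The single-tap filter, the use of a known clean reference, and all constants are simplifications for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 500, 8     # total samples; window of iterations for the oscillation check
mu = 0.05         # initial convergence factor (LMS step size)

clean = np.sin(2 * np.pi * np.arange(N) / 25)
noisy = clean + 0.3 * rng.normal(size=N)

w, grads, errors = 0.0, [], []
for i in range(N):
    x = noisy[i]
    e = clean[i] - w * x     # error signal e(i) (toy: clean reference known)
    w += mu * e * x          # standard LMS weight update
    grads.append(e * x)      # gradient term e(i)x(i) of the mean square error
    errors.append(e)
    if len(grads) >= M:
        window = grads[-M:]
        # Formula A: [e(i)x(i)]*[e(i-1)x(i-1)] < 0 marks an oscillation.
        flips = sum(1 for a, b in zip(window, window[1:]) if a * b < 0)
        if flips >= M // 2:  # oscillating: convergence-sensitive region reached
            mu = max(mu * 0.9, 1e-4)   # shrink the factor for the next iteration

print(f"final weight {w:.2f}, final step size {mu:.5f}")
```

Shrinking the step only once the gradient starts oscillating keeps adaptation fast far from the optimum and suppresses steady-state misadjustment near it.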
Optionally, the structured fields include at least one of a web page title field, an anchor text field and a picture attribute field.
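The structured-field crawl can be illustrated with Python's standard-library HTMLParser, pulling the title, anchor texts and image alt attributes out of a page. The parser class and sample markup are invented for the example; a real crawler would fetch live search-result pages.

```python
from html.parser import HTMLParser

class StructuredFieldParser(HTMLParser):
    """Collect the page title, anchor texts, and image alt attributes."""
    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "anchors": [], "img_alt": []}
        self._in_title = False
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._in_anchor = True
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.fields["img_alt"].append(alt)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_title:
            self.fields["title"] += data
        elif self._in_anchor and data.strip():
            self.fields["anchors"].append(data.strip())

html = ('<html><head><title>West Lake Scenery</title></head>'
        '<body><a href="/photo">West Lake photo</a>'
        '<img src="1.jpg" alt="West Lake at dusk"></body></html>')
p = StructuredFieldParser()
p.feed(html)
print(p.fields["title"])  # → West Lake Scenery
```

The collected title, anchor and alt strings are the fields that would then be segmented and matched against the destination text.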
Optionally, extracting the keywords in the structured fields and calculating the similarity between each keyword and the destination text comprises:
performing word segmentation on the structured fields and extracting the keywords from the segmented fields, for example with an open-source segmentation tool such as ICTCLAS or SCWS, or directly through an independently developed segmentation interface; feeding the extracted keywords and the destination text into a preset word-vector representation model, which may be, for example, a WORD2VEC neural network model, and obtaining the vector representation of each keyword and of the destination text output by that model; and calculating, with the cosine similarity formula, the similarity between each keyword's vector and the destination text's vector.
Optionally, after the destination real-scene picture is output to the first user, the method further comprises: obtaining comment text about the destination according to the destination text; performing word segmentation on the comment text with a string-matching segmentation method to obtain keywords carrying evaluation attributes; generating an evaluation voice from those keywords; and outputting the evaluation voice to the first user. For example, relevant comment text can be crawled directly from group-buying or review websites according to the destination text, and keywords such as "clean", "tasty", "dirty and messy" or "good service" extracted from it. Broadcasting the destination's reviews directly lets the user perceive more intuitively whether the destination matches their expectations.
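The string-matching segmentation that extracts evaluation keywords can be sketched as forward maximum matching against a small evaluation dictionary. The classic algorithm works character by character on Chinese text; the sketch below adapts it to whitespace tokens, and the dictionary entries and sample comment are invented for illustration.

```python
# Hypothetical dictionary of words carrying evaluation attributes.
EVAL_DICT = {"clean", "tasty", "crowded", "good service", "dirty"}

def forward_max_match(text, dictionary, max_len=3):
    """Forward maximum matching: at each position take the longest dictionary entry."""
    words, i = [], 0
    tokens = text.split()
    while i < len(tokens):
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            cand = " ".join(tokens[i:i + size])
            if cand in dictionary:
                words.append(cand)
                i += size
                break
        else:
            i += 1  # no dictionary entry starts here; skip the token
    return words

comment = "very clean place with good service but crowded on weekends"
print(forward_max_match(comment, EVAL_DICT))
# → ['clean', 'good service', 'crowded']
```

The matched keywords would then be synthesized into the evaluation voice that is played to the user.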
Optionally, after the destination real-scene picture is output to the first user, the method further comprises: obtaining a second voice of the first user, the second voice being a voice issued by the first user to confirm the destination; recognizing the second voice with the speech recognition model matched to the voice characteristic to obtain a confirmation text; and outputting, based on the confirmation text, the destination text to a navigation system. Understandably, letting the driver view the real-scene picture allows a quick judgment of whether the destination is correct, and only the confirmed destination text is then passed to the navigation system, which avoids navigation errors and mistaken route planning caused by identical or similar place names.
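One minimal way to picture this confirmation gate: only when the driver's second utterance contains a confirmation word is the destination text released to the navigation system. The confirmation vocabulary and function names are hypothetical; the real method would first recognize the second voice with the accent-matched speech model.

```python
def confirm_and_dispatch(destination_text, confirm_words=("yes", "correct", "confirm")):
    """Return a handler that releases destination_text to navigation only on confirmation."""
    def handle(second_utterance):
        if any(w in second_utterance.lower() for w in confirm_words):
            return destination_text   # forwarded to the navigation system
        return None                   # driver rejected; re-run the search
    return handle

handler = confirm_and_dispatch("West Lake, Hangzhou")
print(handler("yes, that is correct"))  # → West Lake, Hangzhou
print(handler("no, wrong place"))       # → None
```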
An embodiment of the invention provides a destination search device based on speech recognition, the device being configured to execute the above destination search method based on speech recognition. As shown in Fig. 2, the device comprises: a first acquisition unit 10, an extraction unit 20, a first recognition unit 30, a crawling unit 40, a computing unit 50 and a first output unit 60.
The first acquisition unit 10 obtains a first voice signal of a first user, the first voice signal being a voice signal issued by the first user to instruct a search for a real-scene picture of a destination; the extraction unit 20 performs feature extraction on the first voice signal to obtain feature information; the first recognition unit 30 identifies a voice characteristic from the feature information and recognizes the first voice signal with a speech recognition model matched to that voice characteristic to obtain a destination text; the crawling unit 40 crawls, on a search engine and according to the destination text, multiple real-scene pictures and the structured fields of the web pages on which they appear; the computing unit 50 extracts the keywords in the structured fields and calculates the similarity between each keyword and the destination text; and the first output unit 60 takes the real-scene picture corresponding to the keyword with the highest similarity value as the destination real-scene picture and outputs it to the first user.
In this solution, the user's speech is recognized with a speech recognition model matched to the user's voice characteristics, real-scene pictures are then retrieved according to the recognized destination text, and the keywords in the structured fields of the pages hosting those pictures are compared with the destination text by similarity, the picture with the highest similarity value being taken as the destination real-scene picture. This improves the accuracy of the user's destination search and lets the user grasp the destination's surroundings more intuitively, thereby helping the user reach the destination more accurately.
Optionally, the feature extraction may be, for example, spectral feature extraction, fundamental-frequency feature extraction, power feature extraction or zero-crossing-rate extraction. Furthermore, a discrimination model may be built with modeling techniques such as a support vector machine (SVM) or a hidden Markov model (HMM), the discrimination model comprising a Mandarin model, a Chongqing-accent model, a Henan-accent model, a Cantonese-accent model, a Wu-dialect model and a northern-accent model, so that the identified voice characteristic is Mandarin, a Chongqing accent, a Wu-dialect accent, a Henan accent, a Cantonese accent, and so on.
Optionally, the device further comprises a third acquisition unit, a construction unit, a training unit, an input unit and a first replacement unit.
The third acquisition unit obtains multiple noisy speech signal samples and multiple clean speech signal samples; the construction unit constructs and trains a noise reduction model comprising a generator and a discriminator, the generator receiving a noisy speech signal and generating a new voice signal from it, and the discriminator judging whether the new voice signal produced by the generator is a real signal or a generated one; the training unit obtains a trained noise reduction model through adversarial training of the discriminator and the generator; the input unit feeds the first voice signal into the trained noise reduction model, which performs noise reduction on it and generates a second voice signal; and the first replacement unit obtains the second voice signal output by the noise reduction model to replace the originally obtained first voice signal.
By learning from a large number of noisy and clean speech samples, the generator acquires the ability to produce clean speech from noisy speech, and the generated voice signal becomes good enough to fool the discriminator. A noise reduction model trained this way, by deep learning, suits a wide range of noise types and environments, is generally applicable and easy to popularize.
Optionally, the device further includes a fourth acquiring unit, a determination unit, an updating unit, a third output unit, and a second replacement unit.
The fourth acquiring unit is configured to perform noise reduction on the first voice signal using the least mean square (LMS) algorithm and obtain the mean square error gradient corresponding to the current iteration. The determination unit is configured to determine, from the oscillation behavior of the mean square error gradients of M iterations (the current iteration being one of the M), whether the convergence-sensitive region of the mean square error has been reached. The updating unit is configured to update, according to the determination result, the convergence factor used by the LMS algorithm in the next iteration. The third output unit is configured to output the noise-reduced first voice signal based on the convergence factor. The second replacement unit is configured to replace the obtained first voice signal with the noise-reduced first voice signal.
Specifically, if the number of times the mean square error gradients of the M iterations satisfy formula A is greater than or equal to a preset value, the convergence-sensitive region of the mean square error is deemed reached; otherwise it is deemed not reached. Formula A: [e(i)·x(i)]·[e(i−1)·x(i−1)] < 0, where e(i) is the error signal of the i-th of the M iterations, x(i) is the voice signal of the i-th iteration, e(i−1) is the error signal of the (i−1)-th iteration, and x(i−1) is the voice signal of the (i−1)-th iteration.
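Formula A amounts to counting sign changes of the instantaneous LMS gradient e(i)·x(i). The sketch below is an assumed Python illustration: the function names are hypothetical, and the halving policy in `next_step_size` is one possible update rule — the text only says the convergence factor is updated according to the determination result.

```python
def reached_sensitive_region(errors, signals, preset_count):
    """Apply formula A over the last M iterations: count indices i with
    [e(i)*x(i)] * [e(i-1)*x(i-1)] < 0. Frequent sign flips of the
    instantaneous gradient suggest the filter is oscillating around
    the MSE minimum."""
    flips = 0
    for i in range(1, len(errors)):
        if (errors[i] * signals[i]) * (errors[i - 1] * signals[i - 1]) < 0:
            flips += 1
    return flips >= preset_count

def next_step_size(mu, in_sensitive_region, shrink=0.5):
    # Assumed policy: shrink the convergence factor near the minimum to
    # cut steady-state misadjustment, otherwise keep it unchanged.
    return mu * shrink if in_sensitive_region else mu
```

A large step size speeds up initial convergence but causes the weights to bounce around the minimum; detecting the oscillation and shrinking the factor trades speed for a lower steady-state error.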
Optionally, the structured field includes at least one of a web page title field, an anchor text field, and a picture attribute field.
Optionally, the computing unit 50 includes a first processing subunit, a second processing subunit, an obtaining subunit, and a computing subunit.
The first processing subunit is configured to perform word segmentation on the structured field; the second processing subunit is configured to extract the keywords from the segmented structured field. For example, open-source segmentation tools such as ICTCLAS or SCWS may be used, or the structured field may be segmented directly through an independently developed segmentation interface. The obtaining subunit is configured to input the extracted keywords and the destination text into a preset word-vector representation model and obtain the vector of each keyword and the vector of the destination text output by the model; the word-vector representation model may, for example, be a neural network model such as word2vec. The computing subunit is configured to calculate the similarity value between each keyword vector and the destination text vector using the cosine similarity formula.
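The last step uses the standard cosine similarity formula, cos(u, v) = u·v / (‖u‖‖v‖). The Python sketch below is illustrative: in practice the vectors would come from the trained word-vector model, but here they are plain lists, and the hypothetical `best_matching_keyword` shows how the keyword with the largest similarity value is selected.

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (|u| * |v|); 0.0 for a zero vector
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def best_matching_keyword(keyword_vectors, destination_vector):
    # The real picture attached to this keyword is the one returned
    # to the user as the destination real picture.
    return max(keyword_vectors,
               key=lambda kw: cosine_similarity(keyword_vectors[kw],
                                                destination_vector))
```

Cosine similarity depends only on the angle between the vectors, not their magnitudes, which suits embeddings whose norms vary with word frequency.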
Optionally, the device further includes a second acquisition unit, a processing unit, a generation unit, and a second output unit.
The second acquisition unit is configured to obtain comment text information about the destination according to the destination text. The processing unit is configured to segment the comment text using a string-matching segmentation method and obtain keywords with evaluation attributes. The generation unit is configured to generate an evaluation voice based on the keywords. The second output unit is configured to output the evaluation voice to the first user. For example, related comment text can be crawled directly from group-buying or review websites according to the destination text information to obtain keywords such as "clean", "tasty", "dirty and messy", or "good service". By playing the destination's evaluation aloud, the user can more intuitively perceive whether it matches their expectations.
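The string-matching segmentation used to pick out evaluation keywords can be sketched as forward maximum matching against an evaluation-attribute lexicon: at each position, take the longest lexicon entry that matches and keep it as a keyword. The lexicon contents and function name below are hypothetical illustrations, not part of the patent; a real system would use a much larger sentiment dictionary.

```python
# hypothetical evaluation-attribute lexicon
EVAL_LEXICON = {"clean", "tasty", "dirty and messy", "good service", "noisy"}

def extract_eval_keywords(text, lexicon, max_len=16):
    """Forward maximum matching: at each position try the longest
    candidate first; on a hit, record it and jump past it."""
    found, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                found.append(candidate)
                i += length
                break
        else:
            i += 1  # no lexicon entry starts here; advance one character
    return found
```

Matching longest-first lets multi-word entries such as "good service" win over any shorter fragments they contain.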
Optionally, the device further includes a fifth acquiring unit, a second recognition unit, and a fourth output unit. The fifth acquiring unit is configured to obtain a second voice of the first user, the second voice being issued by the first user to confirm the destination. The second recognition unit is configured to recognize the second voice using the speech recognition model matching the voice characteristics to obtain confirmation text. The fourth output unit is configured to output the destination text information to a navigation system based on the confirmation text. It can be understood that by letting the driver consult the real picture, the driver can quickly and correctly judge the destination; the confirmed destination text information is then output to the navigation system, which avoids mistaken path planning caused by identical or similar place names.
An embodiment of the present invention provides a storage medium. The storage medium includes a stored program, and when the program runs, the device on which the storage medium resides is controlled to perform the following steps:
obtaining a first voice signal of a first user, the first voice signal being a voice signal issued by the first user to instruct a search for a real picture of a destination; performing feature extraction on the first voice signal to obtain characteristic information; identifying voice characteristics from the characteristic information, and recognizing the first voice signal using a speech recognition model matching the voice characteristics to obtain destination text; crawling, on a search engine according to the destination text, multiple real pictures and the structured fields of the web pages on which they appear; extracting the keywords in the structured fields and calculating the similarity value between each keyword and the destination text; taking the real picture corresponding to the keyword with the largest similarity value as the destination real picture, and outputting the destination real picture to the first user.
Optionally, when the program runs, the device on which the storage medium resides also performs the following steps: obtaining multiple noisy speech signal samples and multiple clean speech signal samples; constructing and training a noise reduction model, where the noise reduction model includes a generator and a discriminator, the generator receives the noisy speech signal samples and generates new speech signals from them, and the discriminator identifies whether a new speech signal produced by the generator is a real signal or a generated one; obtaining the trained noise reduction model through adversarial training of the discriminator and the generator; inputting the first voice signal into the trained noise reduction model, where the noise reduction model performs noise reduction on the first voice signal and generates a second voice signal; obtaining the second voice signal output by the noise reduction model to replace the obtained first voice signal.
Optionally, when the program runs, the device on which the storage medium resides also performs the following steps: performing noise reduction on the first voice signal using the least mean square algorithm and obtaining the mean square error gradient of the current iteration; determining, from the oscillation behavior of the mean square error gradients of M iterations (including the current iteration), whether the convergence-sensitive region of the mean square error has been reached; updating, according to the determination result, the convergence factor used by the least mean square algorithm in the next iteration; outputting the noise-reduced first voice signal based on the convergence factor; and replacing the obtained first voice signal with the noise-reduced first voice signal.
Optionally, when the program runs, the device on which the storage medium resides also performs the following steps: performing word segmentation on the structured field; extracting the keywords from the segmented structured field; inputting the extracted keywords and the destination text into a preset word-vector representation model and obtaining the vector representation of each keyword and of the destination text output by the model; and calculating the similarity value between each keyword vector and the destination text vector using the cosine similarity formula.
Optionally, when the program runs, the device on which the storage medium resides also performs the following steps: obtaining comment text information about the destination according to the destination text; segmenting the comment text using a string-matching segmentation method to obtain keywords with evaluation attributes; generating an evaluation voice based on the keywords; and outputting the evaluation voice to the first user.
An embodiment of the present invention provides a server including a memory and a processor, the memory being configured to store information including program instructions and the processor being configured to control the execution of the program instructions, which, when loaded and executed by the processor, implement the following steps:
obtaining a first voice signal of a first user, the first voice signal being a voice signal issued by the first user to instruct a search for a real picture of a destination; performing feature extraction on the first voice signal to obtain characteristic information; identifying voice characteristics from the characteristic information, and recognizing the first voice signal using a speech recognition model matching the voice characteristics to obtain destination text; crawling, on a search engine according to the destination text, multiple real pictures and the structured fields of the web pages on which they appear; extracting the keywords in the structured fields and calculating the similarity value between each keyword and the destination text; taking the real picture corresponding to the keyword with the largest similarity value as the destination real picture, and outputting the destination real picture to the first user.
Optionally, the program instructions, when loaded and executed by the processor, also implement the following steps: obtaining multiple noisy speech signal samples and multiple clean speech signal samples; constructing and training a noise reduction model, where the noise reduction model includes a generator and a discriminator, the generator receives the noisy speech signal samples and generates new speech signals from them, and the discriminator identifies whether a new speech signal produced by the generator is a real signal or a generated one; obtaining the trained noise reduction model through adversarial training of the discriminator and the generator; inputting the first voice signal into the trained noise reduction model, where the noise reduction model performs noise reduction on the first voice signal and generates a second voice signal; obtaining the second voice signal output by the noise reduction model to replace the obtained first voice signal.
Optionally, the program instructions, when loaded and executed by the processor, also implement the following steps: performing noise reduction on the first voice signal using the least mean square algorithm and obtaining the mean square error gradient of the current iteration; determining, from the oscillation behavior of the mean square error gradients of M iterations (including the current iteration), whether the convergence-sensitive region of the mean square error has been reached; updating, according to the determination result, the convergence factor used by the least mean square algorithm in the next iteration; outputting the noise-reduced first voice signal based on the convergence factor; and replacing the obtained first voice signal with the noise-reduced first voice signal.
Optionally, the program instructions, when loaded and executed by the processor, also implement the following steps: performing word segmentation on the structured field; extracting the keywords from the segmented structured field; inputting the extracted keywords and the destination text into a preset word-vector representation model and obtaining the vector representation of each keyword and of the destination text output by the model; and calculating the similarity value between each keyword vector and the destination text vector using the cosine similarity formula.
Optionally, the program instructions, when loaded and executed by the processor, also implement the following steps: obtaining comment text information about the destination according to the destination text; segmenting the comment text using a string-matching segmentation method to obtain keywords with evaluation attributes; generating an evaluation voice based on the keywords; and outputting the evaluation voice to the first user.
It should be noted that the terminals involved in the embodiments of the present invention may include, but are not limited to, personal computers (Personal Computer, PC), personal digital assistants (Personal Digital Assistant, PDA), wireless handheld devices, tablet computers (Tablet Computer), mobile phones, MP3 players, MP4 players, and the like.
It can be understood that the application may be an application program installed on the terminal (a native app), or a web page program running in a browser on the terminal (a web app); the embodiments of the present invention do not limit this.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.
The above are merely preferred embodiments of the present invention and are not intended to limit the invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (10)
1. A destination searching method based on speech recognition, characterized in that the method includes:
obtaining a first voice signal of a first user, the first voice signal being a voice signal issued by the first user to instruct a search for a real picture of a destination;
performing feature extraction on the first voice signal to obtain characteristic information;
identifying voice characteristics from the characteristic information, and recognizing the first voice signal using a speech recognition model matching the voice characteristics to obtain destination text;
crawling, on a search engine according to the destination text, multiple real pictures and the structured fields of the web pages on which the multiple real pictures appear;
extracting the keywords in the structured fields, and calculating the similarity value between each keyword and the destination text;
taking the real picture corresponding to the keyword with the largest similarity value as the destination real picture, and outputting the destination real picture to the first user.
2. The method according to claim 1, characterized in that after obtaining the first voice signal of the first user and before performing feature extraction on the first voice signal to obtain characteristic information, the method further includes:
obtaining multiple noisy speech signal samples and multiple clean speech signal samples;
constructing and training a noise reduction model, wherein the noise reduction model includes a generator and a discriminator, the generator receives the noisy speech signal samples and generates new speech signals from them, and the discriminator identifies whether a new speech signal produced by the generator is a real signal or a generated one;
obtaining the trained noise reduction model through adversarial training of the discriminator and the generator;
inputting the first voice signal into the trained noise reduction model, wherein the noise reduction model performs noise reduction on the first voice signal and generates a second voice signal;
obtaining the second voice signal output by the noise reduction model to replace the obtained first voice signal.
3. The method according to claim 1, characterized in that after obtaining the first voice signal of the first user and before performing feature extraction on the first voice signal to obtain characteristic information, the method further includes:
performing noise reduction on the first voice signal using the least mean square algorithm, and obtaining the mean square error gradient corresponding to the current iteration;
determining, from the oscillation behavior of the mean square error gradients of M iterations, whether the convergence-sensitive region of the mean square error has been reached, the M iterations including the current iteration;
updating, according to the determination result, the convergence factor used by the least mean square algorithm in the next iteration;
outputting the noise-reduced first voice signal based on the convergence factor;
replacing the obtained first voice signal with the noise-reduced first voice signal.
4. The method according to claim 1, characterized in that extracting the keywords in the structured field and calculating the similarity value between the keywords and the destination text comprises:
performing word segmentation on the structured field;
extracting the keywords from the segmented structured field;
inputting the extracted keywords and the destination text into a preset word-vector representation model, and obtaining the vector representation of each keyword and of the destination text output by the word-vector representation model;
calculating the similarity value between each keyword vector and the destination text vector using the cosine similarity formula.
5. The method according to any one of claims 1-4, characterized in that after outputting the destination real picture to the first user, the method further includes:
obtaining comment text information about the destination according to the destination text;
segmenting the comment text using a string-matching segmentation method to obtain keywords with evaluation attributes;
generating an evaluation voice based on the keywords;
outputting the evaluation voice to the first user.
6. A destination searching device based on speech recognition, characterized in that the device includes:
a first acquisition unit, configured to obtain a first voice signal of a first user, the first voice signal being a voice signal issued by the first user to instruct a search for a real picture of a destination;
an extraction unit, configured to perform feature extraction on the first voice signal to obtain characteristic information;
a first recognition unit, configured to identify voice characteristics from the characteristic information and recognize the first voice signal using a speech recognition model matching the voice characteristics to obtain destination text;
a crawling unit, configured to crawl, on a search engine according to the destination text, multiple real pictures and the structured fields of the web pages on which the multiple real pictures appear;
a computing unit, configured to extract the keywords in the structured fields and calculate the similarity value between each keyword and the destination text;
a first output unit, configured to take the real picture corresponding to the keyword with the largest similarity value as the destination real picture and output the destination real picture to the first user.
7. The device according to claim 6, characterized in that the computing unit includes:
a first processing subunit, configured to perform word segmentation on the structured field;
a second processing subunit, configured to extract the keywords from the segmented structured field;
an obtaining subunit, configured to input the extracted keywords and the destination text into a preset word-vector representation model and obtain the vector of each keyword and the vector of the destination text output by the word-vector representation model;
a computing subunit, configured to calculate the similarity value between each keyword vector and the destination text vector using the cosine similarity formula.
8. The device according to claim 6, characterized in that the device further includes:
a second acquisition unit, configured to obtain comment text information about the destination according to the destination text;
a processing unit, configured to segment the comment text using a string-matching segmentation method and obtain keywords with evaluation attributes;
a generation unit, configured to generate an evaluation voice based on the keywords;
a second output unit, configured to output the evaluation voice to the first user.
9. A storage medium, characterized in that the storage medium includes a stored program, wherein when the program runs, the device on which the storage medium resides is controlled to perform the destination searching method based on speech recognition according to any one of claims 1 to 5.
10. A server, including a memory and a processor, the memory being configured to store information including program instructions and the processor being configured to control the execution of the program instructions, characterized in that the program instructions, when loaded and executed by the processor, implement the steps of the destination searching method based on speech recognition according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811295008.XA CN109410935A (en) | 2018-11-01 | 2018-11-01 | A kind of destination searching method and device based on speech recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811295008.XA CN109410935A (en) | 2018-11-01 | 2018-11-01 | A kind of destination searching method and device based on speech recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109410935A true CN109410935A (en) | 2019-03-01 |
Family
ID=65470899
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811295008.XA Pending CN109410935A (en) | 2018-11-01 | 2018-11-01 | A kind of destination searching method and device based on speech recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109410935A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110706696A (en) * | 2019-09-25 | 2020-01-17 | 珠海格力电器股份有限公司 | Voice control method and device |
CN111914153A (en) * | 2020-07-24 | 2020-11-10 | 广州中医药大学第一附属医院 | Follower method, follower system, server, and storage medium |
CN112102843A (en) * | 2020-09-18 | 2020-12-18 | 北京搜狗科技发展有限公司 | Voice recognition method and device and electronic equipment |
CN113658598A (en) * | 2021-08-12 | 2021-11-16 | 海信电子科技(深圳)有限公司 | Voice interaction method of display equipment and display equipment |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6029130A (en) * | 1996-08-20 | 2000-02-22 | Ricoh Company, Ltd. | Integrated endpoint detection for improved speech recognition method and system |
US20060058947A1 (en) * | 2004-09-10 | 2006-03-16 | Schalk Thomas B | Systems and methods for off-board voice-automated vehicle navigation |
CN101976304A (en) * | 2010-10-16 | 2011-02-16 | 陈长江 | Intelligent life housekeeper system and method |
CN201830294U (en) * | 2010-08-18 | 2011-05-11 | 深圳市子栋科技有限公司 | Navigation system and navigation server based on voice command |
CN104391673A (en) * | 2014-11-20 | 2015-03-04 | 百度在线网络技术(北京)有限公司 | Voice interaction method and voice interaction device |
TWM517851U (en) * | 2015-11-25 | 2016-02-21 | Jie-Zhong Xu | Figure communication system |
CN105893564A (en) * | 2016-03-31 | 2016-08-24 | 百度在线网络技术(北京)有限公司 | Search method and device based on search engine client |
CN106328154A (en) * | 2015-06-30 | 2017-01-11 | 芋头科技(杭州)有限公司 | Front-end audio processing system |
CN106354852A (en) * | 2016-09-02 | 2017-01-25 | 北京百度网讯科技有限公司 | Search method and device based on artificial intelligence |
CN107068161A (en) * | 2017-04-14 | 2017-08-18 | 百度在线网络技术(北京)有限公司 | Voice de-noising method, device and computer equipment based on artificial intelligence |
CN107274885A (en) * | 2017-05-31 | 2017-10-20 | 广东欧珀移动通信有限公司 | Audio recognition method and Related product |
CN107346316A (en) * | 2016-05-06 | 2017-11-14 | 北京搜狗科技发展有限公司 | A kind of searching method, device and electronic equipment |
CN108520504A (en) * | 2018-04-16 | 2018-09-11 | 湘潭大学 | A kind of blurred picture blind restoration method based on generation confrontation network end-to-end |
- 2018-11-01 CN CN201811295008.XA patent/CN109410935A/en active Pending
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6029130A (en) * | 1996-08-20 | 2000-02-22 | Ricoh Company, Ltd. | Integrated endpoint detection for improved speech recognition method and system |
US20060058947A1 (en) * | 2004-09-10 | 2006-03-16 | Schalk Thomas B | Systems and methods for off-board voice-automated vehicle navigation |
CN201830294U (en) * | 2010-08-18 | 2011-05-11 | 深圳市子栋科技有限公司 | Navigation system and navigation server based on voice command |
CN101976304A (en) * | 2010-10-16 | 2011-02-16 | 陈长江 | Intelligent life housekeeper system and method |
CN104391673A (en) * | 2014-11-20 | 2015-03-04 | 百度在线网络技术(北京)有限公司 | Voice interaction method and voice interaction device |
CN106328154A (en) * | 2015-06-30 | 2017-01-11 | 芋头科技(杭州)有限公司 | Front-end audio processing system |
TWM517851U (en) * | 2015-11-25 | 2016-02-21 | Jie-Zhong Xu | Figure communication system |
CN105893564A (en) * | 2016-03-31 | 2016-08-24 | 百度在线网络技术(北京)有限公司 | Search method and device based on search engine client |
CN107346316A (en) * | 2016-05-06 | 2017-11-14 | 北京搜狗科技发展有限公司 | A kind of searching method, device and electronic equipment |
CN106354852A (en) * | 2016-09-02 | 2017-01-25 | 北京百度网讯科技有限公司 | Search method and device based on artificial intelligence |
CN107068161A (en) * | 2017-04-14 | 2017-08-18 | 百度在线网络技术(北京)有限公司 | Voice de-noising method, device and computer equipment based on artificial intelligence |
CN107274885A (en) * | 2017-05-31 | 2017-10-20 | 广东欧珀移动通信有限公司 | Audio recognition method and Related product |
CN108520504A (en) * | 2018-04-16 | 2018-09-11 | 湘潭大学 | A kind of blurred picture blind restoration method based on generation confrontation network end-to-end |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110706696A (en) * | 2019-09-25 | 2020-01-17 | 珠海格力电器股份有限公司 | Voice control method and device |
CN111914153A (en) * | 2020-07-24 | 2020-11-10 | 广州中医药大学第一附属医院 | Follower method, follower system, server, and storage medium |
CN112102843A (en) * | 2020-09-18 | 2020-12-18 | 北京搜狗科技发展有限公司 | Voice recognition method and device and electronic equipment |
CN113658598A (en) * | 2021-08-12 | 2021-11-16 | 海信电子科技(深圳)有限公司 | Voice interaction method of display equipment and display equipment |
CN113658598B (en) * | 2021-08-12 | 2024-02-27 | Vidaa(荷兰)国际控股有限公司 | Voice interaction method of display equipment and display equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104915340B (en) | Natural language question-answering method and device | |
US10997370B2 (en) | Hybrid classifier for assigning natural language processing (NLP) inputs to domains in real-time | |
WO2019153737A1 (en) | Comment assessing method, device, equipment and storage medium | |
CN105869642B (en) | A kind of error correction method and device of speech text | |
CN109410935A (en) | A kind of destination searching method and device based on speech recognition | |
CN105808590B (en) | Search engine implementation method, searching method and device | |
CN109299457A (en) | A kind of opining mining method, device and equipment | |
CN109033305A (en) | Question answering method, equipment and computer readable storage medium | |
CN110427463A (en) | Search statement response method, device and server and storage medium | |
CN107391614A (en) | A kind of Chinese question and answer matching process based on WMD | |
CN109271493A (en) | A kind of language text processing method, device and storage medium | |
CN103425727B (en) | Context speech polling expands method and system | |
CN110096567A (en) | Selection method, system are replied in more wheels dialogue based on QA Analysis of Knowledge Bases Reasoning | |
CN107710192A (en) | Measurement for the automatic Evaluation of conversational response | |
CN109582969A (en) | Methodology for Entities Matching, device and electronic equipment | |
CN108735201A (en) | continuous speech recognition method, device, equipment and storage medium | |
JP6308708B1 (en) | Patent requirement conformity prediction device and patent requirement conformity prediction program | |
CN106356057A (en) | Speech recognition system based on semantic understanding of computer application scenario | |
CN104715063B (en) | search ordering method and device | |
CN108804526A (en) | Interest determines that system, interest determine method and storage medium | |
CN117521814B (en) | Question answering method and device based on multi-modal input and knowledge graph | |
CN109544104A (en) | A kind of recruitment data processing method and device | |
CN110085217A (en) | Phonetic navigation method, device and terminal device | |
CN109271624A (en) | A kind of target word determines method, apparatus and storage medium | |
CN103488782A (en) | Method for recognizing musical emotion through lyrics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190301 |