CN109976702A - Speech recognition method, apparatus, and terminal - Google Patents
Speech recognition method, apparatus, and terminal
- Publication number
- CN109976702A CN109976702A CN201910211472.4A CN201910211472A CN109976702A CN 109976702 A CN109976702 A CN 109976702A CN 201910211472 A CN201910211472 A CN 201910211472A CN 109976702 A CN109976702 A CN 109976702A
- Authority
- CN
- China
- Prior art keywords
- recognition result
- target voice
- speech recognition
- voice recognition
- target file
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The invention discloses a speech recognition method, apparatus, and terminal. The method comprises: receiving input voice information; determining, according to a pre-trained voice matching model, at least one speech recognition result for the voice information that satisfies a first matching threshold; determining, among the at least one speech recognition result, the result with the highest matching degree as the target speech recognition result; obtaining the target file corresponding to the target speech recognition result; and displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is shown in a first display mode and the other speech recognition results are shown in a second display mode.
Description
Technical field
The invention relates to the field of smart-home technology, and in particular to a speech recognition method, apparatus, and terminal.
Background technique
Speech recognition products are increasingly common, and as the technology matures and adoption grows, users have gradually come to accept this mode of interaction. With continuous improvements in voice interaction technology and artificial intelligence, application scenarios are expanding rapidly beyond voice assistants and smart speakers. In use, a speech recognition product captures sound from the surrounding environment, parses its semantics, and executes the user's voice commands.

At present, speech recognition technology still needs substantial improvement in adaptivity. Real-world voices are diverse: by acoustic characteristics they can be divided into male, female, and child voices. Moreover, many people's pronunciation differs considerably from standard pronunciation, and homophones are common. As a result, the recognition result obtained when a user inputs speech may be inconsistent with the user's intent, degrading the usability of speech recognition products.
Summary of the invention
Embodiments of the invention provide a speech recognition method, apparatus, and terminal to solve the prior-art problem that recognition results fail to satisfy user needs: the result obtained when a user inputs speech is inconsistent with the user's intent, degrading the usability of speech recognition products.
An embodiment of the invention provides a speech recognition method applied to a terminal. The method comprises:

receiving input voice information;

determining, according to a pre-trained voice matching model, at least one speech recognition result for the voice information that satisfies a first matching threshold;

determining, among the at least one speech recognition result, the result with the highest matching degree as the target speech recognition result;

obtaining the target file corresponding to the target speech recognition result;

displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is shown in a first display mode and the other speech recognition results are shown in a second display mode.
In one possible implementation, obtaining the target file corresponding to the target speech recognition result comprises:

performing semantic recognition on the target speech recognition result to determine its corresponding service type;

searching a resource library, under the service type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In one possible implementation, performing semantic recognition on the target speech recognition result to determine its corresponding service type comprises:

performing word segmentation on the target speech recognition result according to a preset dictionary, performing semantic recognition on each segmented word, and determining the service type corresponding to each word;

determining the service type corresponding to the target speech recognition result according to the weights of the service types of the individual words.
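The weighted selection described above can be sketched as follows. The patent specifies only that each segmented word carries a weighted service type; the label and weight tables here are hypothetical placeholders, not values from the disclosure.

```python
from collections import defaultdict

def service_type(segments, word_types, weights):
    """Pick the service type whose segmented words carry the most total weight.

    segments: words produced by dictionary-based segmentation.
    word_types: mapping word -> service type (from semantic recognition).
    weights: mapping word -> weight (defaults to 1.0 when absent).
    """
    totals = defaultdict(float)
    for word in segments:
        if word in word_types:
            totals[word_types[word]] += weights.get(word, 1.0)
    # The highest-weighted type is taken as the result's service type
    return max(totals, key=totals.get) if totals else None
```

For example, if a music-related word carries a higher weight than a generic command word, the whole utterance is classified under the music service type.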
In one possible implementation, displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface comprises:

determining the priority of each speech recognition result;

displaying the speech recognition results on the terminal's display interface, arranged by priority;

displaying the target file corresponding to the target speech recognition result on the terminal's display interface.
In one possible implementation, displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface further comprises:

obtaining a user's switching instruction for the target speech recognition result;

determining, according to the switching instruction, the target file corresponding to the changed target speech recognition result;

displaying the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
In one possible implementation, determining, according to the pre-trained voice matching model, the speech recognition results for the voice information that satisfy the first matching threshold comprises:

inputting the voice information into the voice matching model and identifying the pinyin sequence in the voice information to form all possible candidate words;

determining, for each possible candidate word, the possible Chinese character sequences and their scores by syntax rules and statistical methods;

taking the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
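The final filtering step above can be sketched as follows: keep only the scored character sequences that meet the first matching threshold, ordered so the highest-matching one (the target speech recognition result) comes first. The threshold value and score scale are assumptions for illustration.

```python
def filter_results(scored_sequences, first_threshold):
    """Keep character sequences whose score meets the first matching threshold.

    scored_sequences: list of (sequence, score) pairs from the decoder.
    Returns the surviving pairs sorted best-first, so index 0 is the
    target speech recognition result.
    """
    kept = [(seq, s) for seq, s in scored_sequences if s >= first_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```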
An embodiment of the invention provides a speech recognition apparatus, comprising:

a transceiver unit for receiving input voice information;

a processing unit for determining, according to a pre-trained voice matching model, at least one speech recognition result for the voice information that satisfies the first matching threshold; determining, among the at least one speech recognition result, the result with the highest matching degree as the target speech recognition result; and obtaining the target file corresponding to the target speech recognition result;

a display unit for displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is shown in the first display mode and the other speech recognition results are shown in the second display mode.
In one possible implementation, the processing unit is specifically configured to: perform semantic recognition on the target speech recognition result to determine its corresponding service type; and search a resource library, under that service type, for the target file corresponding to the target speech recognition result.
In one possible implementation, the processing unit is specifically configured to: perform word segmentation on the target speech recognition result according to a preset dictionary; perform semantic recognition on each segmented word and determine the service type of each word; and determine the service type of the target speech recognition result according to the weights of the service types of the individual words.
In one possible implementation, the processing unit is specifically configured to determine the priority of each speech recognition result and display the results on the terminal's display interface arranged by priority; the display unit is specifically configured to display the target file corresponding to the target speech recognition result on the terminal's display interface.
In one possible implementation, the transceiver unit is further configured to obtain a user's switching instruction for the target speech recognition result; the processing unit is further configured to determine, according to the switching instruction, the target file corresponding to the changed target speech recognition result; and the display unit is further configured to display the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
In one possible implementation, the processing unit is specifically configured to: input the voice information into the voice matching model and identify the pinyin sequence in the voice information to form all possible candidate words; determine, for each possible candidate word, the possible Chinese character sequences and their scores by syntax rules and statistical methods; and take the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
An embodiment of the invention provides a terminal comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, communication interface, and memory communicate with one another via the communication bus; the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of any of the above methods applied to a terminal.
An embodiment of the invention provides a computer-readable storage medium storing a computer program executable by a terminal; when the program runs on the terminal, the terminal performs the steps of any of the above methods applied to a terminal.
Embodiments of the invention provide a speech recognition method, apparatus, and terminal. The method comprises: receiving input voice information; determining, according to a pre-trained voice matching model, at least one speech recognition result for the voice information that satisfies a first matching threshold; determining, among the at least one speech recognition result, the result with the highest matching degree as the target speech recognition result; obtaining the target file corresponding to the target speech recognition result; and displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is shown in a first display mode and the other speech recognition results in a second display mode. By showing the highest-matching speech recognition result in the first display mode, the result can be presented quickly, improving convenience for the user. By performing semantic recognition on each of the at least one speech recognition results, more possible user search intents are obtained, and by displaying the remaining results in the second display mode on the terminal's display interface, more search results are provided to the user. This improves the coverage of recognition results relative to user intent, raises the success rate of voice search, and improves the usability of speech recognition products.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of a speech recognition method provided by Embodiment 1 of the present invention;
Fig. 2 is an example diagram of a voice matching model provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of a speech recognition method provided by an embodiment of the present invention;
Fig. 4 is a flow diagram of speech recognition result display provided by an embodiment of the present invention;
Fig. 4a is a schematic diagram of speech recognition result display provided by an embodiment of the present invention;
Fig. 5 is a flow diagram of a speech recognition method provided by an embodiment of the present invention;
Fig. 6 is a flow diagram of speech recognition result display provided by an embodiment of the present invention;
Fig. 7 is a flow diagram of speech recognition result display provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of speech recognition result display provided by an embodiment of the present invention;
Fig. 9 is a schematic diagram of speech recognition result display provided by an embodiment of the present invention;
Fig. 10 is a structural diagram of a speech recognition apparatus provided by an embodiment of the present invention;
Fig. 11 is a structural diagram of a server provided by an embodiment of the present invention;
Fig. 12 is a structural diagram of a terminal provided by an embodiment of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. Based on these embodiments, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the invention.
Speech recognition enables a machine to receive, identify, and understand a voice signal and convert it into a corresponding digital signal. Although speech recognition has found a large number of applications in many industries, much work remains before truly natural human-machine interaction is achieved. For example, adaptivity needs substantial improvement so that recognition is unaffected by accent, dialect, or the particular speaker. Real-world voices are diverse: by acoustic characteristics they can be divided into male, female, and child voices, and many people's pronunciation differs greatly from standard pronunciation, which requires accent and dialect processing. These factors cause the recognition result obtained when a user inputs speech to be inconsistent with the user's intent.

After the user inputs speech, many recognized words need to be converted according to common-usage vocabulary. The original gives near-homophone pairs in Chinese as examples, rendered roughly as: "four big name gangs" corrected to "the Four Great Classical Novels", "Lu Yao knows a horse" corrected to the proverb "distance tests a horse's stamina", and "weather forecast" misrecognized as the homophone "weather pre-explosion". When a terminal with a speech recognition function performs recognition, influences such as homophones, accents, dialects, speaker-specific traits, and meaningless modal particles in continuous speech make the factors affecting recognition many and varied. The system may fail to recognize the result the user wants, so some users' intents cannot be fulfilled and misrecognition occurs, which further degrades the recognition speed and efficiency of the speech recognition system and reduces the user experience.
To solve the problems in the prior art, taking a television set as an example of the terminal, all of the solutions provided by the embodiments of the invention may be executed by the terminal or by a server, as needed; this is not limited here. As shown in Fig. 1, the method comprises:

Step 101: receiving the input voice information.

The terminal may obtain the voice information input by the user through its own audio device or through an external audio device. Specifically, a speech recognition module is provided in the terminal, which can recognize voice information and perform voice acquisition.

In addition, a communication module, such as a WiFi wireless communication module, is provided in the terminal so that the terminal can connect to a server and send the collected voice information to it. Of course, all processing may be performed by the terminal, or only the portion of the voice information that requires server processing may be transmitted; this is not limited here.
Step 102: determining, according to the pre-trained voice matching model, at least one speech recognition result for the voice information that satisfies the first matching threshold.

In a specific implementation, the voice matching model may be deployed on the terminal or on the server; this is not limited here. If it is deployed on the server, the server determines the at least one speech recognition result satisfying the first matching threshold and then sends it to the terminal.
Step 103: determining, among the at least one speech recognition result, the result with the highest matching degree as the target speech recognition result.

In a specific implementation, the terminal may determine the highest-matching target result according to the scores of the speech recognition results; alternatively, the server may determine the highest-matching target result from the scores and then send the target speech recognition result to the terminal.
Step 104: obtaining the target file corresponding to the target speech recognition result.

In a specific implementation, the terminal may search a local or network resource system for the target file according to the target speech recognition result. Alternatively, the server may search a network resource library for the target file according to the target speech recognition result and, once the target file is determined, send the target file or its identification information to the terminal, so that the terminal determines the target file corresponding to the target speech recognition result.
Step 105: displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface, wherein the target speech recognition result is shown in the first display mode and the other speech recognition results in the second display mode.

In a specific implementation, the second display mode may be the opposite of the first. For example, the first display mode may be highlighted or marked with a check box, while the second display mode is not highlighted and has no check box. As shown in Fig. 7, the target result is displayed in the first mode, highlighted with a check box, while results in the second mode are shown without one. The specific display modes are not limited here.
In the speech recognition method provided by the embodiment of the invention, showing the highest-matching speech recognition result in the first display mode allows it to be presented quickly, improving convenience for the user; performing semantic recognition on each of the at least one recognition results yields more possible user search intents, and displaying the remaining results in the second display mode on the terminal's display interface provides the user with more search results. This improves the coverage of recognition results relative to user intent, raises the success rate of voice search, and improves the usability of speech recognition products.
In an embodiment of the invention, as shown in Fig. 2, a method by which the speech recognition model determines speech recognition results is provided, comprising:

Step 1: obtaining the voice information input by the user and determining the acoustic feature probability of the voice information from acoustic features.

Specifically, acoustic feature extraction extracts speech acoustic feature information from the voice information. To guarantee recognition accuracy, the extracted features should discriminate well between the modeling units of the acoustic model. Acoustic features in the embodiment of the invention may include mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), and perceptual linear prediction coefficients (PLP).
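The front end described above can be sketched with NumPy: pre-emphasis, framing, windowing, power spectrum, and a mel filterbank (the first stages of MFCC, stopping before the cepstral transform). The frame length, hop size, FFT size, and filter count below are illustrative defaults, not values from the patent.

```python
import numpy as np

def log_mel_features(signal, sr=16000, frame_len=400, hop=160, n_mels=26, n_fft=512):
    """Toy log-mel filterbank front end; returns (n_frames, n_mels) features."""
    # Pre-emphasis boosts high frequencies, which carry consonant detail
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(sig) - frame_len) // hop
    frames = np.stack([sig[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Per-frame power spectrum
    spec = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Triangular filters spaced evenly on the mel scale
    hz2mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel2hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ctr):
            fbank[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - ctr, 1)
    # Log compression mimics loudness perception
    return np.log(spec @ fbank.T + 1e-10)
```

A full MFCC pipeline would additionally apply a discrete cosine transform to each log-mel frame to decorrelate the coefficients.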
Step 2: inputting the voice information, after acoustic feature extraction, into the voice matching model, which includes a language model and an acoustic model.

For example, in an embodiment of the invention, the training process of the voice matching model may include:

Step 1: obtaining sample voice information, each sample carrying annotation information for the speech it contains;

Step 2: inputting each sample of voice information into the voice matching model;

Step 3: training the voice matching model according to each sample of voice information and the output of the voice matching model.

To facilitate training of the voice matching model, a large amount of sample voice information can be collected; the samples may be acquired by the terminal or obtained through other channels, and each sample of voice information can be annotated.

The samples are input into the voice matching model and the model is trained. The model may be one based on dynamic time warping, hidden Markov models, artificial neural networks, or support vector machines. Each sample is input into the model, and the model is trained according to the annotation information of each sample and the output of the voice matching model.
In the embodiment of the invention, the voice matching model is obtained by training on a large number of samples of voice information, and the trained model can then perform speech recognition on collected voice information.
The acoustic model is built by acoustic modeling on the training speech features and their corresponding annotation information. The acoustic model establishes the mapping between the observed features of the speech signal and the pronunciation modeling units, and classifies phonemes or phoneme states on that basis. In the embodiment of the invention, the acoustic model may use an HMM as the basic acoustic modeling unit.
The language model may adopt a speech recognition framework based on statistical learning. An N-gram statistical language model contains a Markov chain representing the generation of a word sequence; that is, the probability p(W) of generating word sequence W is expressed as:

p(W) = ∏_k p(w_k | w_{k−n+1}, …, w_{k−1})

where w_k denotes the k-th word in the sequence. From the formula it can be seen that the probability of generating the current word depends only on the preceding n−1 words.
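The N-gram factorization above can be sketched for the simplest case, n = 2 (a bigram model), where conditional probabilities are estimated by counting adjacent word pairs in a corpus. The sentence-boundary tokens are a standard convention, not part of the patent.

```python
from collections import Counter

def train_bigram(corpus):
    """Estimate p(w_k | w_{k-1}) by maximum likelihood from tokenized sentences."""
    unigram, bigram = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        unigram.update(toks[:-1])            # count each history word
        bigram.update(zip(toks[:-1], toks[1:]))  # count adjacent pairs
    return lambda prev, w: bigram[(prev, w)] / unigram[prev] if unigram[prev] else 0.0

def sequence_prob(p, sent):
    """p(W) as the product of conditional bigram probabilities."""
    toks = ["<s>"] + sent + ["</s>"]
    prob = 1.0
    for prev, w in zip(toks[:-1], toks[1:]):
        prob *= p(prev, w)
    return prob
```

Unsmoothed maximum-likelihood estimates like these assign zero probability to unseen pairs, which is exactly the sparsity problem that discounting and backing-off (discussed below in the document) address.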
In the embodiment of the invention, the language model can be trained and evaluated with the perplexity (PP) metric, defined as the inverse of the geometric mean of the word-sequence generation probability:

PP(W) = p(W)^(−1/K)

where K is the number of words in the sequence. From the formula it can be seen that the smaller the language model's expected perplexity over generated word sequences, the higher its accuracy in predicting the current word given the history of preceding words; the training objective of the language model is therefore to minimize the perplexity over the training corpus.
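The perplexity definition above can be computed directly from the per-word conditional probabilities; working in log space avoids underflow on long sequences. This is a generic sketch of the standard metric, not code from the patent.

```python
import math

def perplexity(probs):
    """PP = p(W)^(-1/K) where p(W) is the product of the K per-word probabilities."""
    log_p = sum(math.log(p) for p in probs)  # log p(W)
    return math.exp(-log_p / len(probs))     # exp(-log p(W) / K)
```

A model that assigns every word in a 4-word sequence probability 1/4 has perplexity 4: on average it is as uncertain as a uniform choice among 4 words.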
During training, the occurrence counts of each word and of the related word combinations in the training corpus are first tallied, and the relevant parameters of the language model are estimated from these counts.

However, the number of related word combinations grows geometrically with the vocabulary size, so counting every possible combination is infeasible. In practice, training data are usually sparse: the probability of some word combinations is very small, or the combination never appears at all. To address these problems, the language model can be optimized by methods such as discounting and backing-off, and by modeling the language with a recurrent neural network (RNN).
Step 3: inputting the result obtained by the speech model into the decoder for decoding to obtain the possible text information of the speech.

Combining the phonetic feature acoustic probabilities calculated by the acoustic model with the probabilities calculated by the language model, the decoder analyzes the most probable word sequence W' by means of a search algorithm and outputs the possible text information in the voice information.
In step 102, determining, according to the speech matching model completed by training in advance, the speech recognition results of the speech information that meet the first matching threshold comprises:
Step 1: inputting the speech information into the speech matching model, identifying the pinyin sequence in the speech information, and forming all possible candidate words;
To determine the correct character for each syllable, all possible character hypotheses, or single-syllable and multi-syllable word hypotheses, are first formed according to the input pinyin sequence. For example, taking the input sequence "the video of Zheng Kai" as an example, the corresponding pinyin sequence is [zheng4, kai3, de1, shi4, pin2]. As shown in Fig. 3, each path is a possible recognition result.
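The candidate-word formation of Step 1 can be sketched as follows. The homophone lexicon below is a hypothetical toy dictionary; the character choices for each syllable are illustrative guesses at the homophones the example alludes to, and a real system would use a far larger lexicon.

```python
from itertools import product

# Hypothetical homophone lexicon: each toned pinyin syllable maps to its
# candidate characters (the character choices are illustrative guesses).
HOMOPHONES = {
    "zheng4": ["郑", "正"],
    "kai3":   ["凯", "恺", "楷"],
    "de1":    ["的"],
    "shi4":   ["视", "是"],
    "pin2":   ["频", "贫"],
}

def candidate_paths(pinyin_seq):
    """Enumerate every character sequence the pinyin could spell out;
    each returned string is one path of the kind shown in Fig. 3."""
    options = [HOMOPHONES[p] for p in pinyin_seq]
    return ["".join(chars) for chars in product(*options)]

paths = candidate_paths(["zheng4", "kai3", "de1", "shi4", "pin2"])
```

The Cartesian product over per-syllable candidates yields 2 × 3 × 1 × 2 × 2 = 24 paths for this toy lexicon, one of which is the intended "郑凯的视频".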
Step 2: for each possible candidate word, determining the possible Chinese character sequences and the scores of the Chinese character sequences by syntax rules and statistical methods;
Specifically, the Chinese character sequences and their scores are obtained from the multiple candidate words of each syllable to be identified by using grammar rules and statistical principles, and some pinyin recognition errors are corrected. A probabilistic statistical language model is applied to search for possibly correct paths through the character strings or word sequences. The decoder in the embodiment of the present invention may use the Viterbi algorithm based on the idea of dynamic programming, and may perform fast synchronous probability calculation and search-space pruning through certain algorithms (such as Gaussian Selection and language model look-ahead (Language Model Look-Ahead)), thereby reducing the computational complexity and memory overhead and improving the efficiency of the search algorithm.
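A minimal sketch of the beam-pruned Viterbi search described above, run over the per-syllable candidate characters. The bigram scores and the beam width are illustrative; a real decoder scores acoustic and language model probabilities jointly rather than using a fixed table.

```python
import math

def viterbi_beam(options, bigram_logp, beam=3):
    """Beam-pruned Viterbi over per-syllable candidate characters.
    options: one candidate list per syllable; bigram_logp(prev, cur)
    scores each transition. The beam cut is the search-space pruning."""
    beams = [(0.0, [])]                      # (log score, partial path)
    for cands in options:
        expanded = []
        for score, chars in beams:
            prev = chars[-1] if chars else "<s>"
            for c in cands:
                expanded.append((score + bigram_logp(prev, c), chars + [c]))
        # keep only the `beam` best partial paths
        beams = sorted(expanded, key=lambda x: -x[0])[:beam]
    best_score, best_chars = beams[0]
    return "".join(best_chars), best_score

# Toy log-probabilities that favour one path; every other transition gets
# a floor value.
GOOD = {("<s>", "郑"), ("郑", "凯"), ("凯", "的"), ("的", "视"), ("视", "频")}
def logp(prev, cur):
    return math.log(0.9) if (prev, cur) in GOOD else math.log(0.01)

options = [["郑", "正"], ["凯", "恺"], ["的"], ["视", "是"], ["频", "贫"]]
best, score = viterbi_beam(options, logp)
```

Even with the beam cut to three hypotheses per step, the favoured path survives the pruning and is returned as the best-scoring character sequence.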
Step 3: taking the Chinese character sequences whose scores meet the first matching threshold as the speech recognition results.
Specifically, the at least one Chinese character sequence matched by the language model is ranked together with its score. Judging from the template-matching results, a recognition result with a higher matching degree has a higher probability of being correct. There certainly exist cases where, because some corpus is missing from the model, the model matching score is high and yet the result is not the correct one. Therefore, the speech recognition results whose matching scores meet the first threshold can be selected for semantic recognition.
Specifically, a score meets the first threshold when it is greater than the first threshold ρ, and the speech recognition results with such scores are taken as possibly correct recognition results. For example, the recognition results are as follows:
Recognition result | Matching score |
The video of Zheng Kai | 0.641 |
The video of Zheng Kai | 0.629 |
The video of Chinese regular script | 0.457 |
Just triumphant moral food | 0.231 |
As shown above, the first threshold may be 0.4; the speech recognition results at this time are: "the video of Zheng Kai, the video of Zheng Kai, the video of Chinese regular script".
In one possible implementation, when the model matching scores of all recognition results are less than the first threshold ρ, the recognition result with the highest score is taken and step 104 is executed to perform semantic processing.
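The threshold rule and its fallback can be sketched as below, with ρ = 0.4 as in the example; the function name and the "(homophone)" label are illustrative.

```python
def select_results(scored, threshold=0.4):
    """Keep the results whose matching score exceeds the first threshold;
    if none qualify, fall back to the single highest-scoring result."""
    kept = [(text, score) for text, score in scored if score > threshold]
    if not kept:
        kept = [max(scored, key=lambda pair: pair[1])]
    return kept

scored = [
    ("the video of Zheng Kai", 0.641),
    ("the video of Zheng Kai (homophone)", 0.629),
    ("the video of Chinese regular script", 0.457),
    ("Just triumphant moral food", 0.231),
]
kept = select_results(scored)               # three results pass ρ = 0.4
low = select_results([("a", 0.1), ("b", 0.3)])  # fallback: best of a bad lot
```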
To improve the user experience and show the search process effectively, before step 103 the target speech recognition result may also be output to the interface of the terminal and displayed. The interface of the terminal may be the display interface of the voice-assistant client that collects the speech information, or may be another interface of the terminal, which is not limited here. For example, as shown in Fig. 4a, the target speech recognition result is "the video of Zheng Kai".
As shown in Fig. 4, the process of displaying the recognition result comprises the following steps:
Step 1: creating a layout file of the interface;
wherein the layout file includes a text control for displaying the speech recognition result.
Step 2: creating the interface, loading the layout file, and initializing the text control.
Step 3: displaying the speech recognition result, that is, the identified text information, on the display interface of the terminal.
To effectively improve the accuracy and coverage of recognition, a preset dictionary is stored in the server. The dictionary contains a large amount of corpus data and has a semantic parsing function. After the cloud server receives and judges the speech information, it performs semantic parsing on the speech recognition result using its own semantic parsing function. Specifically, a semantic recognition model is saved in the server; the semantic recognition model can identify the segments of the speech information, determine the segments in the speech information, recognize the semantics of the segments, and determine the target file corresponding to each semantic item. Of course, if the dictionary to be retrieved is small, the semantic recognition may be completed at the terminal to improve the parsing rate, which is not limited here.
Step 104 comprises:
Step 1: performing semantic recognition on the target speech recognition result, and determining the service type corresponding to the target speech recognition result;
If the terminal performs the semantic recognition, the segments of the speech recognition result can be output according to the semantic recognition model in the terminal, the semantics of the segments are parsed to obtain the annotation results corresponding to the semantics, and the annotation results are searched to determine whether they contain a related service type.
If the server performs the semantic recognition, then after receiving the speech recognition result sent by the terminal, the server outputs the segments of the speech recognition result according to the semantic recognition model on the server, parses the semantics of the segments to obtain the annotation results corresponding to the semantics, and searches the annotation results to determine whether they contain a related service type.
Step 2: searching, under the service type corresponding to the target speech recognition result in the resource library, for the target file corresponding to the target speech recognition result.
To further improve the accuracy of semantic recognition, in the embodiment of the present invention the specific recognition process of the semantic recognition model may include:
Step 1: performing word segmentation on the target speech recognition result according to the preset dictionary, performing semantic recognition on each segment in the target speech recognition result, and determining the service type corresponding to each segment;
wherein the preset dictionary may obtain corpus by methods such as web crawling, so as to update the segments and the annotations of their corresponding service types.
Step 2: determining the service type corresponding to the target speech recognition result according to the weights of the service types corresponding to the segments.
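Steps 1 and 2 can be sketched as follows. The segment-to-type annotations and the per-type weights are hypothetical stand-ins for what the preset dictionary, the application priorities, and the user preference would supply in practice.

```python
from collections import defaultdict

# Hypothetical per-type weights and segment annotations (illustrative).
TYPE_WEIGHT = {"video": 0.6, "education": 0.4, "weather": 0.5}
SEGMENT_TYPES = {
    "video": ["video"],
    "Chinese regular script": ["education"],
    "Zheng Kai": [],          # a name carries no service type of its own
    "weather": ["weather"],
}

def service_type(segments):
    """Sum the weights of the service types of all segments and return
    the top-weighted type(s); a tie yields every tied type."""
    totals = defaultdict(float)
    for seg in segments:
        for t in SEGMENT_TYPES.get(seg, []):
            totals[t] += TYPE_WEIGHT[t]
    if not totals:
        return []
    best = max(totals.values())
    return sorted(t for t, w in totals.items() if w == best)

st = service_type(["Chinese regular script", "video"])
```

With these weights, "video" (0.6) outweighs "education" (0.4), so the result as a whole is classed under the video type; with equal weights, both types would be returned, matching the tie-handling described later for speech recognition result 3.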
To further improve retrieval efficiency, the above operations may also be performed, simultaneously with the target speech recognition result, on the other speech recognition results exceeding the first threshold apart from the target speech recognition result. Of course, the above operations may instead be executed only after a switching instruction is received from the user, which is not limited here.
As shown in Fig. 5, this may specifically include:
Step 1: performing semantic recognition on each speech recognition result in the at least one speech recognition result, and determining the service type corresponding to the speech recognition result;
Specifically, speech recognition result 1 is input into the semantic recognition model; if the result output by the semantic recognition model contains service type 1, it is considered that speech recognition result 1 includes service type 1, and the subsequent processing needs to be executed in the application corresponding to service type 1.
For example, if speech recognition result 1 is "the video of Zheng Kai", the word segmentation result output by the semantic recognition model is: Zheng Kai, video. The service type of "video" is the video type, so the service type of the speech recognition result is the video type.
In one possible implementation, the service type may also be determined according to the attributes of the segments. For example, speech recognition result 2 is "weather forecast", and the word segmentation result determined by the semantic recognition model is: weather, forecast; since "weather" has the weather attribute (weatherKeys), the service type is determined to be the weather-query type.
To further improve the accuracy of semantic recognition, in the embodiment of the present invention the specific recognition process of the semantic recognition model may include:
Step 1: performing word segmentation on the speech recognition result according to the preset dictionary, performing semantic recognition on each segment in the speech recognition result, and determining the service type corresponding to each segment;
wherein the preset dictionary may obtain corpus by methods such as web crawling, so as to update the segments and the annotations of their corresponding service types.
Step 2: determining the service type corresponding to the speech recognition result according to the weights of the service types corresponding to the segments.
In one possible implementation, the weight of a service type is determined according to at least one of: the priority of the service type in the terminal, the priority in the preset dictionary of the data bank from which the segment originates, or the user preference of the user of the terminal.
For example, speech recognition result 3 is "the video of Chinese regular script", and the word segmentation result determined by the semantic recognition model is: Chinese regular script, video. The service type of "video" is the video type, and the service type of "Chinese regular script" is the education type. If it is determined that the weight of the video type corresponding to "video" is greater than the weight of the education type corresponding to "Chinese regular script", the service type of speech recognition result 3 is determined to be the video type. If it is determined that the weight of the video type corresponding to "video" is the same as the weight of the education type corresponding to "Chinese regular script", the service types corresponding to speech recognition result 3 may also be determined to be both the education type and the video type.
For another example, speech recognition result 4 is "weather pre-explosion", and the word segmentation result determined by the semantic recognition model is: weather pre-explosion. According to the preset dictionary, "weather pre-explosion" is determined to be a film, and its corresponding service types include the video type, the song type, and so on; the service type of speech recognition result 4 is then determined according to the weight of the video type corresponding to "weather pre-explosion" and the weight of the song type corresponding to "weather pre-explosion".
In step 2, the target file corresponding to each of the at least one speech recognition result is searched for in the resource library under the service type corresponding to that speech recognition result.
Combining the examples above: for speech recognition result 1, the target file of the video of Zheng Kai can be searched for under the video type in the resource library. For speech recognition result 2, the target file of the weather forecast can be searched for under the weather-query service in the resource library. For speech recognition result 3, the target file of Chinese regular script can be searched for under the video type, the education type, or the education-video type in the resource library. For speech recognition result 4, the target file of "weather pre-explosion" can be searched for under the video type or the song type in the resource library.
Step 105 specifically includes:
Step 1: determining the priority of each speech recognition result in the at least one speech recognition result;
Specifically, combined with the semantic-analysis UI, the search results are displayed in the form of tabs, and the order in which the results are displayed mainly follows the trending-search ranking for ordering the tabs.
Step 2: displaying each speech recognition result on the display interface of the terminal, arranged according to the priority;
The priority may be determined based on user big-data analysis, scores, user preference, and the like, which is not limited here.
Step 3: displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface.
In the specific implementation process, as shown in Fig. 6, this comprises:
the semantic recognition module converts the tab data corresponding to the speech recognition results and the target files into JSON data and transmits it to the display module of the terminal;
after obtaining the JSON data, the display module of the terminal parses out the corresponding speech recognition results and target files;
each speech recognition result and its corresponding target file are displayed according to the parsing result.
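The JSON hand-off between the semantic recognition module and the display module can be sketched as below. The field names ("tab", "files") and the file names are assumptions for illustration, not from the patent.

```python
import json

def pack_tabs(results):
    """Serialise the (recognition result, target files) tab data to JSON
    for transfer from the semantic recognition module to the display module."""
    return json.dumps(
        [{"tab": text, "files": files} for text, files in results],
        ensure_ascii=False)

def unpack_tabs(payload):
    """Display-module side: parse each tab title and its file list back out."""
    return [(item["tab"], item["files"]) for item in json.loads(payload)]

tabs = [
    ("the video of Zheng Kai", ["zhengkai_001.mp4"]),
    ("the video of Chinese regular script", ["kaiti_course.mp4"]),
]
payload = pack_tabs(tabs)      # semantic module side
parsed = unpack_tabs(payload)  # display module side
```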
Combining the examples above, if it is determined that the ranking result is: Zheng Kai > Zheng Kai > Chinese regular script, the displayed result may be as shown in Fig. 7.
In one possible implementation, for a service type that cannot be determined by semantic analysis, or for which the corresponding target file cannot be determined, the speech recognition result is not displayed on the terminal. For example, if the semantics of "the video of Chinese regular script" cannot be understood, or no calligraphy-related content for "Chinese regular script" can be found in the resource library, that speech recognition result is not displayed on the terminal.
Combining the examples above, if it is determined that the ranking result is: weather forecast > weather pre-explosion, the displayed results may be as shown in Fig. 8 and Fig. 9.
Further, if the user wants to switch the target speech recognition result, the speech recognition results can be switched, which specifically includes:
obtaining the user's switching instruction for the target speech recognition result;
determining, according to the switching instruction, the target file corresponding to the changed target speech recognition result;
displaying the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
Specifically, determining the target file corresponding to the changed target speech recognition result may refer to the above embodiments and is not described again here.
To further improve the accuracy of speech recognition, in the embodiment of the present invention the method further includes:
obtaining the user's operation instruction for the speech recognition result or the target file;
increasing the matching degree of the speech recognition result or the target file corresponding to the operation instruction, so as to update the user preference.
For example, if the user selects "weather pre-explosion" on the display interface, "weather pre-explosion" is recorded in the user's preference, and its matching degree is increased.
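Updating the user preference by boosting the matching degree of a selected result can be sketched as below; the class name and the boost constant are illustrative.

```python
class PreferenceStore:
    """Record which results the user actually selects and boost their
    matching degree so that later rankings favour them."""

    def __init__(self, boost=0.05):
        self.boost = boost   # increment per selection (illustrative value)
        self.bonus = {}      # result text -> accumulated bonus

    def record_selection(self, result):
        self.bonus[result] = self.bonus.get(result, 0.0) + self.boost

    def adjusted_score(self, result, base_score):
        return base_score + self.bonus.get(result, 0.0)

prefs = PreferenceStore()
prefs.record_selection("weather pre-explosion")  # user tapped this tab
prefs.record_selection("weather pre-explosion")  # tapped again later
score = prefs.adjusted_score("weather pre-explosion", 0.5)
```

Results the user never picks keep their base score, so repeated selections gradually pull a preferred homophone toward the top of the ranking.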
To further improve the accuracy of speech recognition, the embodiment of the present invention also provides a possible implementation, comprising:
judging whether the speech information includes a first control instruction for controlling the terminal;
if the user speech information is a first control instruction for controlling the terminal, executing the first control instruction on the terminal.
In one possible implementation, if the speech information also contains a segment of an operation type, the terminal must perform the corresponding operation according to the speech information. In that case, the instruction for processing according to the speech information can be sent directly to the terminal. Examples are segments of operation types such as open, view, and play.
In one possible implementation, whether the semantics of the speech information include a target control instruction set for the terminal is judged; if so, the first control instruction is executed on the terminal.
For example, if the identified speech recognition result is "open the video of Zheng Kai", the first control instruction can be determined to be "open".
In one possible implementation, if the target file of "the video of Zheng Kai" is determined to be unique, the target file "the video of Zheng Kai" can be opened directly.
In one possible implementation, if there are multiple target files for "the video of Zheng Kai", the multiple target files can be displayed first, and the open control instruction executed after obtaining the user's operation instruction.
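Detecting an operation-type segment and splitting it from the remaining query can be sketched as below; the operation vocabulary is a toy stand-in for the annotations a semantic model would produce.

```python
# Hypothetical operation-type vocabulary (illustrative; a real system
# would take these labels from the semantic model's segment annotations).
OPERATION_WORDS = {"open", "play", "watch"}

def parse_control(segments):
    """If any segment is an operation-type word, return (operation, query);
    otherwise the utterance is a plain search request (operation is None)."""
    for i, seg in enumerate(segments):
        if seg in OPERATION_WORDS:
            rest = segments[:i] + segments[i + 1:]
            return seg, " ".join(rest)
    return None, " ".join(segments)

op, query = parse_control(["open", "the", "video", "of", "Zheng Kai"])
plain_op, plain_query = parse_control(["the", "video", "of", "Zheng Kai"])
```

An utterance with an operation word yields a control instruction plus a search query; without one, it falls through to the ordinary search-and-display flow.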
In the embodiment of the present invention, by recognizing the speech information, the recognition result with the highest matching score in the speech matching model is displayed to the user, while semantic recognition is performed separately on each of the at least one speech recognition result meeting the first matching threshold; combined with the semantic processing results, the search results of the different services are displayed to the user through UI interaction, so that the user's intention can be understood more fully. Compared with speech recognition methods in the prior art, the search and display of homophone-name services are realized through multiple semantic-analysis requests, and the user can select the desired result according to his or her intention.
Based on the same technical idea, the embodiment of the present invention provides a speech recognition apparatus 1000, as shown in Fig. 10, comprising:
a transceiving unit 1001 for receiving input speech information;
a processing unit 1002 for determining, according to a pre-trained speech matching model, at least one speech recognition result of the speech information meeting a first matching threshold; determining the speech recognition result with the highest matching degree among the at least one speech recognition result as the target speech recognition result; and obtaining the target file corresponding to the target speech recognition result;
a display unit 1003 for displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In one possible implementation, the processing unit 1002 is specifically configured to: perform semantic recognition on the target speech recognition result, determine the service type corresponding to the target speech recognition result, and search, under the service type corresponding to the target speech recognition result in the resource library, for the target file corresponding to the target speech recognition result.
In one possible implementation, the processing unit 1002 is specifically configured to: determine the priority of each speech recognition result, and display each speech recognition result on the display interface of the terminal arranged according to the priority;
the display unit 1003 is specifically configured to: display the target file corresponding to the target speech recognition result on the display interface of the terminal.
In one possible implementation, the transceiving unit 1001 is further configured to: obtain the user's switching instruction for the target speech recognition result;
the processing unit 1002 is further configured to: determine, according to the switching instruction, the target file corresponding to the changed target speech recognition result;
the display unit 1003 is further configured to: display the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
In one possible implementation, the processing unit 1002 is specifically configured to: input the speech information into the speech matching model, identify the pinyin sequence in the speech information, and form all possible candidate words; for each possible candidate word, determine the possible Chinese character sequences and their scores by syntax rules and statistical methods; and take the Chinese character sequences whose scores meet the first matching threshold as the speech recognition results.
On the basis of the above embodiments, the embodiment of the present invention also provides a server 1100, as shown in Fig. 11, comprising: a processor 1101, a communication interface 1102, a memory 1103, and a communication bus 1104, wherein the processor 1101, the communication interface 1102, and the memory 1103 communicate with one another through the communication bus 1104.
A computer program is stored in the memory 1103; when the program is executed by the processor 1101, the processor 1101 performs the following steps:
determining, according to a speech matching model completed by training in advance, at least one speech recognition result of the speech information meeting a first matching threshold; determining the speech recognition result with the highest matching degree among the at least one speech recognition result as the target speech recognition result; obtaining the target file corresponding to the target speech recognition result; and displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface of the terminal, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In one possible implementation, the processor 1101 is specifically configured to: perform semantic recognition on the target speech recognition result, determine the service type corresponding to the target speech recognition result, and search, under the service type corresponding to the target speech recognition result in the resource library, for the target file corresponding to the target speech recognition result.
In one possible implementation, the processor 1101 is specifically configured to: perform word segmentation on the target speech recognition result according to the preset dictionary, perform semantic recognition on each segment in the target speech recognition result, determine the service type corresponding to each segment, and determine the service type corresponding to the target speech recognition result according to the weights of the service types corresponding to the segments.
In one possible implementation, the processor 1101 is specifically configured to: determine the priority of each speech recognition result, and display each speech recognition result on the display interface of the terminal arranged according to the priority.
In one possible implementation, the processor 1101 is further configured to: determine, according to a switching instruction, the target file corresponding to the changed target speech recognition result; display the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode; and simultaneously display the target file corresponding to the changed target speech recognition result. The switching instruction is the user's switching instruction for the target speech recognition result obtained through the communication interface 1102.
In one possible implementation, the processor 1101 is specifically configured to: input the speech information into the speech matching model, identify the pinyin sequence in the speech information, and form all possible candidate words; for each possible candidate word, determine the possible Chinese character sequences and their scores by syntax rules and statistical methods; and take the Chinese character sequences whose scores meet the first matching threshold as the speech recognition results.
The communication bus mentioned for the above server may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 1102 is used for communication between the above server and other devices.
The memory may include a random access memory (Random Access Memory, RAM) and may also include a non-volatile memory (Non-Volatile Memory, NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit, a network processor (Network Processor, NP), or the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
On the basis of the above embodiments, the embodiment of the present invention also provides a computer-readable storage medium in which a computer program executable by a server is stored; when the program runs on the server, the server implements any of the methods in the above embodiments when executing it.
The above computer-readable storage medium may be any usable medium or data storage device that the processor in the server can access, including but not limited to magnetic memories such as floppy disks, hard disks, magnetic tapes, and magneto-optical disks (MO), optical memories such as CD, DVD, BD, and HVD, and semiconductor memories such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), and solid-state disks (SSD).
On the basis of the above embodiments, the embodiment of the present invention also provides a terminal 1200, as shown in Fig. 12, comprising: a processor 1201, a communication interface 1202, a memory 1203, and a communication bus 1204, wherein the processor 1201, the communication interface 1202, and the memory 1203 communicate with one another through the communication bus 1204.
A computer program is stored in the memory 1203; when the program is executed by the processor 1201, the processor 1201 performs the following steps:
determining, according to a pre-trained speech matching model, at least one speech recognition result of the speech information meeting a first matching threshold; determining the speech recognition result with the highest matching degree among the at least one speech recognition result as the target speech recognition result; obtaining the target file corresponding to the target speech recognition result; and displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In one possible implementation, the processor 1201 is specifically configured to: perform semantic recognition on the target speech recognition result, determine the service type corresponding to the target speech recognition result, and search, under the service type corresponding to the target speech recognition result in the resource library, for the target file corresponding to the target speech recognition result.
In one possible implementation, the processor 1201 is specifically configured to: perform word segmentation on the target speech recognition result according to the preset dictionary, perform semantic recognition on each segment in the target speech recognition result, determine the service type corresponding to each segment, and determine the service type corresponding to the target speech recognition result according to the weights of the service types corresponding to the segments.
In one possible implementation, the processor 1201 is specifically configured to: determine the priority of each speech recognition result; display each speech recognition result on the display interface of the terminal arranged according to the priority; and display the target file corresponding to the target speech recognition result on the display interface of the terminal.
In one possible implementation, the processor 1201 is further configured to: determine, according to a switching instruction, the target file corresponding to the changed target speech recognition result; display the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode; and simultaneously display the target file corresponding to the changed target speech recognition result. The switching instruction is the user's switching instruction for the target speech recognition result obtained through the communication interface 1202.
In one possible implementation, the processor 1201 is specifically configured to: input the speech information into the speech matching model, identify the pinyin sequence in the speech information, and form all possible candidate words; for each possible candidate word, determine the possible Chinese character sequences and their scores by syntax rules and statistical methods; and take the Chinese character sequences whose scores meet the first matching threshold as the speech recognition results.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 1202 is used for communication between the above terminal and other devices.
The memory may include a random access memory (Random Access Memory, RAM) and may also include a non-volatile memory (Non-Volatile Memory, NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit, a network processor (Network Processor, NP), or the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
On the basis of the above embodiments, the embodiment of the present invention also provides a computer-readable storage medium in which a computer program executable by a terminal is stored; when the program runs on the terminal, the terminal implements any of the methods in the above embodiments when executing it.
The above computer-readable storage medium may be any usable medium or data storage device that the processor in the terminal can access, including but not limited to magnetic memories such as floppy disks, hard disks, magnetic tapes, and magneto-optical disks (MO), optical memories such as CD, DVD, BD, and HVD, and semiconductor memories such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), and solid-state disks (SSD).
As for the system/device embodiments, since they are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations.
Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, additional changes and modifications may be made to these embodiments once those skilled in the art learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to encompass them.
Claims (10)
1. A speech recognition method applied to a terminal, wherein the method comprises:
receiving input voice information;
determining, according to a pre-trained voice matching model, at least one speech recognition result of the voice information that meets a first matching threshold;
determining, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target voice recognition result;
obtaining a target file corresponding to the target voice recognition result;
displaying each speech recognition result and the target file corresponding to the target voice recognition result on a display interface, wherein the target voice recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
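Not part of the claims, but the flow recited in claim 1 can be sketched as follows. All names here are hypothetical: `match_model` stands in for the claimed pre-trained voice matching model (assumed to yield text/score pairs) and `fetch_target_file` for the resource lookup of later claims.

```python
def recognize_and_display(voice_info, match_model, fetch_target_file, threshold=0.5):
    """Hypothetical sketch of claim 1: match, pick the best result as the
    target, fetch its target file, and split results into two display modes."""
    # Keep only results whose matching degree meets the first matching threshold.
    results = [(text, score) for text, score in match_model(voice_info)
               if score >= threshold]
    if not results:
        return None
    # The result with the highest matching degree becomes the target
    # voice recognition result.
    results.sort(key=lambda pair: pair[1], reverse=True)
    target_text = results[0][0]
    return {
        "target": target_text,                        # first display mode
        "alternatives": [t for t, _ in results[1:]],  # second display mode
        "target_file": fetch_target_file(target_text),
    }
```

A display layer would then render `target` prominently (first display mode) alongside the dimmed `alternatives` (second display mode) and the fetched `target_file`.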
2. The method according to claim 1, wherein obtaining the target file corresponding to the target voice recognition result comprises:
performing semantic recognition on the target voice recognition result to determine a service type corresponding to the target voice recognition result;
searching, in a resource library under the service type corresponding to the target voice recognition result, for the target file corresponding to the target voice recognition result.
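A minimal sketch of the lookup in claim 2, assuming (hypothetically) that the resource library is organized as a mapping from service type to the files available under that type, so the search is confined to one service type rather than the whole library:

```python
def find_target_file(resource_library, service_type, target_text):
    """Search only within the service type determined by semantic recognition."""
    return resource_library.get(service_type, {}).get(target_text)

# Hypothetical library keyed first by service type, then by recognized text.
library = {"video": {"play movie Titanic": "titanic.mp4"},
           "music": {"play song Hello": "hello.mp3"}}
print(find_target_file(library, "video", "play movie Titanic"))  # titanic.mp4
```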
3. The method according to claim 2, wherein performing semantic recognition on the target voice recognition result to determine the service type corresponding to the target voice recognition result comprises:
performing word segmentation on the target voice recognition result according to a preset dictionary, performing semantic recognition on each segmented word in the target voice recognition result, and determining the service type corresponding to each segmented word;
determining the service type corresponding to the target voice recognition result according to the weight of the service type corresponding to each segmented word.
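The weighted vote in claim 3 could be sketched as below. The segmentation itself is assumed done; `segment_types` and `type_weights` are hypothetical stand-ins for the per-word semantic recognition output and the claimed service-type weights.

```python
from collections import defaultdict

def classify_service_type(segments, segment_types, type_weights):
    """Pick the service type for a recognition result by accumulating the
    weights of the service types assigned to each segmented word."""
    scores = defaultdict(float)
    for word in segments:
        service_type = segment_types.get(word)
        if service_type is not None:
            scores[service_type] += type_weights.get(service_type, 1.0)
    # The service type with the highest accumulated weight wins.
    return max(scores, key=scores.get) if scores else None

# Hypothetical example: two words vote "video", one votes "media".
segments = ["play", "movie", "Titanic"]
segment_types = {"play": "media", "movie": "video", "Titanic": "video"}
type_weights = {"media": 0.5, "video": 1.0}
print(classify_service_type(segments, segment_types, type_weights))  # video
```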
4. The method according to claim 1, wherein displaying each speech recognition result and the target file corresponding to the target voice recognition result on the display interface comprises:
determining the priority of each speech recognition result;
displaying each speech recognition result, arranged by priority, on the display interface of the terminal;
displaying the target file corresponding to the target voice recognition result on the display interface of the terminal.
5. The method according to claim 4, wherein displaying each speech recognition result and the target file corresponding to the target voice recognition result on the display interface further comprises:
obtaining a user's switching instruction for the target voice recognition result;
determining, according to the switching instruction, the target file corresponding to the changed target voice recognition result;
displaying the changed target voice recognition result in the first display mode and the other speech recognition results in the second display mode, and simultaneously displaying the target file corresponding to the changed target voice recognition result.
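The switching flow of claim 5 could be sketched as follows, assuming (hypothetically) that the switching instruction carries the index of the alternative the user selects; the names are illustrative, not the claimed implementation.

```python
def switch_target(results, selected_index, fetch_target_file):
    """Promote the user-selected alternative to the target voice recognition
    result and refresh the target file displayed alongside it."""
    new_target = results[selected_index]
    others = [r for i, r in enumerate(results) if i != selected_index]
    return {
        "target": new_target,    # redrawn in the first display mode
        "alternatives": others,  # redrawn in the second display mode
        "target_file": fetch_target_file(new_target),
    }

# E.g. the user switches from the first result to the second one.
state = switch_target(["play movie", "play music"], 1, lambda t: t + ".mp3")
```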
6. The method according to any one of claims 1 to 5, wherein determining, according to the pre-trained voice matching model, the speech recognition result of the voice information that meets the first matching threshold comprises:
inputting the voice information into the voice matching model, and recognizing the pinyin sequence in the voice information to form all possible candidate words;
determining, for each possible candidate word, the possible Chinese character sequences and the scores of the Chinese character sequences by syntax rules and statistical methods;
taking the Chinese character sequence whose score meets the first matching threshold as the speech recognition result.
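The selection step of claim 6 amounts to thresholding and ranking scored candidate character sequences. A sketch, with made-up candidates and scores for the pinyin "dian ying" (the scoring by syntax rules and statistics is assumed to have already happened):

```python
def select_recognition_results(scored_sequences, first_matching_threshold):
    """Keep the candidate Chinese character sequences whose syntax/statistics
    score meets the first matching threshold, ordered best first."""
    kept = [(seq, s) for seq, s in scored_sequences
            if s >= first_matching_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical candidates and scores for the pinyin "dian ying".
candidates = [("电影", 0.92), ("点映", 0.71), ("电营", 0.12)]
results = select_recognition_results(candidates, first_matching_threshold=0.5)
# results[0][0] would be taken as the target voice recognition result
```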
7. A speech recognition device, wherein the device comprises:
a transceiver unit, configured to receive input voice information, and to obtain the target file corresponding to each speech recognition result in the at least one speech recognition result;
a processing unit, configured to determine, according to a pre-trained voice matching model, at least one speech recognition result of the voice information that meets a first matching threshold, and to determine, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target voice recognition result;
a display unit, configured to display each speech recognition result and the target file corresponding to the target voice recognition result on a display interface, wherein the target voice recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
8. The device according to claim 7, wherein the processing unit is specifically configured to:
perform semantic recognition on the target voice recognition result to determine the service type corresponding to the target voice recognition result; and search, in a resource library under the service type corresponding to the target voice recognition result, for the target file corresponding to the target voice recognition result.
9. A terminal, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory stores a computer program, and when the program is executed by the processor, the processor is caused to perform the steps of the method according to any one of claims 1-6.
10. A computer-readable storage medium, storing a computer program executable by a terminal or a server, wherein when the program runs on the terminal or the server, the terminal or the server is caused to perform the steps of the method according to any one of claims 1-6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910211472.4A CN109976702A (en) | 2019-03-20 | 2019-03-20 | A kind of audio recognition method, device and terminal |
PCT/CN2019/106806 WO2020186712A1 (en) | 2019-03-20 | 2019-09-19 | Voice recognition method and apparatus, and terminal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910211472.4A CN109976702A (en) | 2019-03-20 | 2019-03-20 | A kind of audio recognition method, device and terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109976702A true CN109976702A (en) | 2019-07-05 |
Family
ID=67079603
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910211472.4A Pending CN109976702A (en) | 2019-03-20 | 2019-03-20 | A kind of audio recognition method, device and terminal |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109976702A (en) |
WO (1) | WO2020186712A1 (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050038657A1 (en) * | 2001-09-05 | 2005-02-17 | Voice Signal Technologies, Inc. | Combined speech recognition and text-to-speech generation
US20070055520A1 (en) * | 2005-08-31 | 2007-03-08 | Microsoft Corporation | Incorporation of speech engine training into interactive user tutorial |
CN101309327A (en) * | 2007-04-16 | 2008-11-19 | Sony Corporation | Sound chat system, information processing device, speech recognition and keyword detection
CN101557651A (en) * | 2008-04-08 | 2009-10-14 | Lg电子株式会社 | Mobile terminal and menu control method thereof |
CN101557432A (en) * | 2008-04-08 | 2009-10-14 | Lg电子株式会社 | Mobile terminal and menu control method thereof |
CN101604520A (en) * | 2009-07-16 | 2009-12-16 | 北京森博克智能科技有限公司 | Spoken language voice recognition method based on statistical model and syntax rule |
CN102867512A (en) * | 2011-07-04 | 2013-01-09 | 余喆 | Method and device for recognizing natural speech |
CN103176591A (en) * | 2011-12-21 | 2013-06-26 | 上海博路信息技术有限公司 | Text location and selection method based on voice recognition |
CN103176998A (en) * | 2011-12-21 | 2013-06-26 | 上海博路信息技术有限公司 | Read auxiliary system based on voice recognition |
CN103811005A (en) * | 2012-11-13 | 2014-05-21 | Lg电子株式会社 | Mobile terminal and control method thereof |
CN105489220A (en) * | 2015-11-26 | 2016-04-13 | 小米科技有限责任公司 | Method and device for recognizing speech |
CN105679318A (en) * | 2015-12-23 | 2016-06-15 | 珠海格力电器股份有限公司 | Display method and device based on speech recognition, display system and air conditioner |
CN105869636A (en) * | 2016-03-29 | 2016-08-17 | 上海斐讯数据通信技术有限公司 | Speech recognition apparatus and method thereof, smart television set and control method thereof |
CN106098063A (en) * | 2016-07-01 | 2016-11-09 | 海信集团有限公司 | A kind of sound control method, terminal unit and server |
CN106356056A (en) * | 2016-10-28 | 2017-01-25 | 腾讯科技(深圳)有限公司 | Speech recognition method and device |
CN109492175A (en) * | 2018-10-23 | 2019-03-19 | 青岛海信电器股份有限公司 | The display methods and device of Application Program Interface, electronic equipment, storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH1021254A (en) * | 1996-06-28 | 1998-01-23 | Toshiba Corp | Information retrieval device with speech recognizing function |
CN109976702A (en) * | 2019-03-20 | 2019-07-05 | 青岛海信电器股份有限公司 | A kind of audio recognition method, device and terminal |
- 2019-03-20 CN CN201910211472.4A patent/CN109976702A/en active Pending
- 2019-09-19 WO PCT/CN2019/106806 patent/WO2020186712A1/en active Application Filing
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020186712A1 (en) * | 2019-03-20 | 2020-09-24 | 海信视像科技股份有限公司 | Voice recognition method and apparatus, and terminal |
CN110427459A (en) * | 2019-08-05 | 2019-11-08 | 苏州思必驰信息科技有限公司 | Visualized generation method, system and the platform of speech recognition network |
CN110427459B (en) * | 2019-08-05 | 2021-09-17 | 思必驰科技股份有限公司 | Visual generation method, system and platform of voice recognition network |
CN110335606A (en) * | 2019-08-07 | 2019-10-15 | 广东电网有限责任公司 | A kind of voice interaction device for Work tool control |
CN110335606B (en) * | 2019-08-07 | 2022-04-19 | 广东电网有限责任公司 | Voice interaction device for management and control of tools and appliances |
CN112802474A (en) * | 2019-10-28 | 2021-05-14 | 中国移动通信有限公司研究院 | Voice recognition method, device, equipment and storage medium |
CN110931018A (en) * | 2019-12-03 | 2020-03-27 | 珠海格力电器股份有限公司 | Intelligent voice interaction method and device and computer readable storage medium |
CN111192572A (en) * | 2019-12-31 | 2020-05-22 | 斑马网络技术有限公司 | Semantic recognition method, device and system |
CN112735394A (en) * | 2020-12-16 | 2021-04-30 | 青岛海尔科技有限公司 | Semantic parsing method and device for voice |
Also Published As
Publication number | Publication date |
---|---|
WO2020186712A1 (en) | 2020-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109976702A (en) | A kind of audio recognition method, device and terminal | |
US10811013B1 (en) | Intent-specific automatic speech recognition result generation | |
CN108305634B (en) | Decoding method, decoder and storage medium | |
US20170206897A1 (en) | Analyzing textual data | |
CN105723449B (en) | speech content analysis system and speech content analysis method | |
KR101309042B1 (en) | Apparatus for multi domain sound communication and method for multi domain sound communication using the same | |
CN108305643B (en) | Method and device for determining emotion information | |
KR102390940B1 (en) | Context biasing for speech recognition | |
CN111933129B (en) | Audio processing method, language model training method and device and computer equipment | |
US20220246149A1 (en) | Proactive command framework | |
US11189277B2 (en) | Dynamic gazetteers for personalized entity recognition | |
JP2021033255A (en) | Voice recognition method, device, apparatus, and computer readable storage medium | |
CN108711420A (en) | Multilingual hybrid model foundation, data capture method and device, electronic equipment | |
CN108428446A (en) | Audio recognition method and device | |
CN108735201A (en) | Continuous speech recognition method, apparatus, equipment and storage medium | |
CN104572631B (en) | The training method and system of a kind of language model | |
US9922650B1 (en) | Intent-specific automatic speech recognition result generation | |
CN108711421A (en) | A kind of voice recognition acoustic model method for building up and device and electronic equipment | |
CN108899013A (en) | Voice search method, device and speech recognition system | |
US11120799B1 (en) | Natural language processing policies | |
CN109616096A (en) | Construction method, device, server and the medium of multilingual tone decoding figure | |
CN111090727A (en) | Language conversion processing method and device and dialect voice interaction system | |
CN109582788A (en) | Comment spam training, recognition methods, device, equipment and readable storage medium storing program for executing | |
Öktem et al. | Attentional parallel RNNs for generating punctuation in transcribed speech | |
CN104750677A (en) | Speech translation apparatus, speech translation method and speech translation program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: No. 218 Hong Kong Road, Qingdao Economic and Technological Development Zone, Shandong, 266555
Applicant after: Hisense Video Technology Co., Ltd.
Address before: No. 218 Hong Kong Road, Qingdao Economic and Technological Development Zone, Shandong, 266555
Applicant before: HISENSE ELECTRIC Co., Ltd.
|
CB02 | Change of applicant information |