CN109976702A - Speech recognition method, apparatus, and terminal - Google Patents


Info

Publication number
CN109976702A
CN109976702A (application CN201910211472.4A)
Authority
CN
China
Prior art keywords
recognition result
target speech
speech recognition
voice recognition
target file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910211472.4A
Other languages
Chinese (zh)
Inventor
任晓楠
崔保磊
戴磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Hisense Electronics Co Ltd
Original Assignee
Qingdao Hisense Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Hisense Electronics Co Ltd filed Critical Qingdao Hisense Electronics Co Ltd
Priority to CN201910211472.4A priority Critical patent/CN109976702A/en
Publication of CN109976702A publication Critical patent/CN109976702A/en
Priority to PCT/CN2019/106806 priority patent/WO2020186712A1/en
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Abstract

The invention discloses a speech recognition method, apparatus, and terminal. The method comprises: receiving input voice information; determining, according to a pre-trained voice matching model, at least one speech recognition result of the voice information that satisfies a first matching threshold; determining, among the at least one speech recognition result, the result with the highest matching degree as the target speech recognition result; obtaining the target file corresponding to the target speech recognition result; and displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.

Description

Speech recognition method, apparatus, and terminal
Technical field
The invention relates mainly to the technical field of smart home, and in particular to a speech recognition method, apparatus, and terminal.
Background technique
Speech recognition products are increasingly common, and as the technology advances and its penetration grows, users have gradually come to accept this mode of interaction. With the continuous improvement of voice interaction technology and artificial intelligence, application scenarios are expanding rapidly beyond voice assistants and smart speakers. In use, a speech recognition product collects sound from the surrounding environment, performs semantic parsing, and executes the user's spoken instructions.
At present, speech recognition technology still needs substantial improvement in adaptivity. Voice types in practice are diverse: by acoustic characteristics they can be divided into male, female, and child voices. Moreover, many people's pronunciation differs considerably from the standard pronunciation, and homophones also occur. As a result, the recognition result a user obtains when inputting voice may be inconsistent with the user's intention, which degrades the usability of speech recognition products.
Summary of the invention
Embodiments of the invention provide a speech recognition method, apparatus, and terminal, to solve the prior-art problems that speech recognition results fail to meet user needs and that the recognition result obtained when a user inputs voice is inconsistent with the user's intention, degrading the usability of speech recognition products.
An embodiment of the invention provides a speech recognition method applied to a terminal, the method comprising:
receiving input voice information;
determining, according to a pre-trained voice matching model, at least one speech recognition result of the voice information that satisfies a first matching threshold;
determining, among the at least one speech recognition result, the result with the highest matching degree as the target speech recognition result;
obtaining the target file corresponding to the target speech recognition result;
displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In one possible implementation, obtaining the target file corresponding to the target speech recognition result comprises:
performing semantic recognition on the target speech recognition result to determine the service type corresponding to the target speech recognition result;
searching, in the resource library under the service type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In one possible implementation, performing semantic recognition on the target speech recognition result to determine the service type corresponding to the target speech recognition result comprises:
performing word segmentation on the target speech recognition result according to a preset dictionary, performing semantic recognition on each segment of the target speech recognition result, and determining the service type corresponding to each segment;
determining the service type of the target speech recognition result according to the weight of the service type corresponding to each segment.
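As an illustrative sketch of the weighted vote described above (the segment dictionary, service-type labels, and weights below are assumptions chosen for illustration, not values from the patent):

```python
from collections import defaultdict

# Hypothetical per-segment service-type weights (assumed values).
SEGMENT_TYPES = {
    "郑恺": [("video", 0.8), ("music", 0.2)],
    "的": [],                      # function word: carries no service type
    "视频": [("video", 1.0)],
}

def service_type(segments):
    """Pick the service type with the highest summed weight over all segments."""
    scores = defaultdict(float)
    for seg in segments:
        for stype, weight in SEGMENT_TYPES.get(seg, []):
            scores[stype] += weight
    return max(scores, key=scores.get) if scores else None

print(service_type(["郑恺", "的", "视频"]))  # -> video
```

A real system would obtain the segments from the preset dictionary's word segmenter and the weights from the semantic recognizer; this sketch only shows the final weighted aggregation step.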
In one possible implementation, displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface comprises:
determining the priority of each speech recognition result;
displaying the speech recognition results, arranged by priority, on the display interface of the terminal;
displaying the target file corresponding to the target speech recognition result on the display interface of the terminal.
In one possible implementation, displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface further comprises:
obtaining a switching instruction from the user for the target speech recognition result;
determining, according to the switching instruction, the target file corresponding to the changed target speech recognition result;
displaying the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
In one possible implementation, determining, according to the pre-trained voice matching model, the speech recognition results of the voice information that satisfy the first matching threshold comprises:
inputting the voice information into the voice matching model and identifying the pinyin sequence in the voice information to form all possible candidate words;
determining, for each possible candidate word, the possible Chinese character sequences and their scores by syntax rules and statistical methods;
taking the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
An embodiment of the invention provides a speech recognition apparatus, the apparatus comprising:
a transceiver unit for receiving input voice information;
a processing unit for determining, according to a pre-trained voice matching model, at least one speech recognition result of the voice information that satisfies a first matching threshold; determining, among the at least one speech recognition result, the result with the highest matching degree as the target speech recognition result; and obtaining the target file corresponding to the target speech recognition result;
a display unit for displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In one possible implementation, the processing unit is specifically configured to:
perform semantic recognition on the target speech recognition result to determine the service type corresponding to the target speech recognition result; and search, in the resource library under the service type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In one possible implementation, the processing unit is specifically configured to:
perform word segmentation on the target speech recognition result according to a preset dictionary, perform semantic recognition on each segment of the target speech recognition result, and determine the service type corresponding to each segment; and determine the service type of the target speech recognition result according to the weight of the service type corresponding to each segment.
In one possible implementation, the processing unit is specifically configured to: determine the priority of each speech recognition result; and display the speech recognition results, arranged by priority, on the display interface of the terminal.
The display unit is specifically configured to: display the target file corresponding to the target speech recognition result on the display interface of the terminal.
In one possible implementation, the transceiver unit is further configured to: obtain a switching instruction from the user for the target speech recognition result.
The processing unit is further configured to: determine, according to the switching instruction, the target file corresponding to the changed target speech recognition result.
The display unit is further configured to: display the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
In one possible implementation, the processing unit is specifically configured to:
input the voice information into the voice matching model and identify the pinyin sequence in the voice information to form all possible candidate words; determine, for each possible candidate word, the possible Chinese character sequences and their scores by syntax rules and statistical methods; and take the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
An embodiment of the invention provides a terminal comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of any of the above methods applied to a terminal.
An embodiment of the invention provides a computer-readable storage medium storing a computer program executable by a terminal; when the program runs on the terminal, the terminal performs the steps of any of the above methods applied to a terminal.
Embodiments of the invention provide a speech recognition method, apparatus, and terminal. The method comprises: receiving input voice information; determining, according to a pre-trained voice matching model, at least one speech recognition result of the voice information that satisfies a first matching threshold; determining, among the at least one speech recognition result, the result with the highest matching degree as the target speech recognition result; obtaining the target file corresponding to the target speech recognition result; and displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode. Displaying the speech recognition result with the highest matching degree in the first display mode allows it to be shown quickly, improving convenience for the user. Performing semantic recognition separately on at least one speech recognition result yields more of the user's possible search intentions, and displaying each speech recognition result on the display interface of the terminal in the second display mode effectively provides more search results for the user, improves the coverage between speech recognition results and user intentions, raises the success rate of the user's voice searches, and improves the usability of speech recognition products.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a speech recognition method provided by Embodiment 1 of the present invention;
Fig. 2 is an example diagram of a voice matching model provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of a speech recognition method provided by an embodiment of the present invention;
Fig. 4 is a flow diagram of displaying speech recognition results provided by an embodiment of the present invention;
Fig. 4a is a schematic diagram of displaying speech recognition results provided by an embodiment of the present invention;
Fig. 5 is a flow diagram of a speech recognition method provided by an embodiment of the present invention;
Fig. 6 is a flow diagram of displaying speech recognition results provided by an embodiment of the present invention;
Fig. 7 is a flow diagram of displaying speech recognition results provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of displaying speech recognition results provided by an embodiment of the present invention;
Fig. 9 is a schematic diagram of displaying speech recognition results provided by an embodiment of the present invention;
Fig. 10 is a structural diagram of a speech recognition apparatus provided by an embodiment of the present invention;
Fig. 11 is a structural diagram of a server provided by an embodiment of the present invention;
Fig. 12 is a structural diagram of a terminal provided by an embodiment of the present invention.
Specific embodiments
The present invention is described below in further detail with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Speech recognition enables a machine to receive, identify, and understand a voice signal and convert it into a corresponding digital signal. Although speech recognition has produced a large number of applications in many industries, much work remains before truly natural human-machine interaction is achieved. For example, greater improvement is needed in adaptivity, so that recognition is not affected by accent, dialect, or the particular speaker. Voice types in practice are diverse: by acoustic characteristics they can be divided into male, female, and child voices. In addition, many people's pronunciation differs considerably from the standard pronunciation, which requires handling of accents and dialects. These factors cause the recognition result obtained when the user inputs voice to be inconsistent with the user's intention.
After the user inputs voice, many words in the speech recognition output must be corrected by mapping homophones to everyday expressions, for example: a homophone string corrected to "the Four Great Classical Novels" (四大名著), a homophone string corrected to the idiom "distance tests a horse's strength" (路遥知马力), a homophone string corrected to "weather forecast" (天气预报), and so on. When a terminal with a speech recognition function performs voice recognition, the influence of polyphonic characters, accents, dialects, speaker-specific traits, meaningless modal particles in continuous speech, and the like makes the influencing factors diverse, so the terminal may fail to recognize the result the user wants. Consequently some users' intentions cannot be realized and misrecognition occurs, which further degrades the recognition speed and efficiency of the speech recognition system and reduces the user experience.
To solve the problems of the prior art, taking a television set as the terminal scenario as an example, all the schemes provided by the embodiments of the invention may be executed by the terminal or by a server, configured as needed and not limited here. As shown in Fig. 1, the method comprises:
Step 101: receive input voice information.
The terminal may obtain the voice information input by the user through its own voice device or through an external voice device. Specifically, the terminal is provided with a speech recognition module that can identify voice information and perform voice information collection.
In addition, the terminal is provided with a communication module, such as a WiFi wireless communication module, enabling the terminal to connect to a server and send the collected voice information to the server. Of course, everything may also be executed by the terminal, or only the portion of the voice information requiring server processing may be transmitted, without limitation here.
Step 102: determine, according to the pre-trained voice matching model, at least one speech recognition result of the voice information that satisfies the first matching threshold.
In a specific implementation, the voice matching model may be set in the terminal or on the server, without limitation here. If it is set on the server, the server, after determining at least one speech recognition result of the voice information that satisfies the first matching threshold, sends the at least one speech recognition result to the terminal.
Step 103: determine, among the at least one speech recognition result, the result with the highest matching degree as the target speech recognition result.
In a specific implementation, the terminal may determine the target speech recognition result with the highest matching degree according to the scores of the at least one speech recognition result; or the server determines the target speech recognition result with the highest matching degree according to those scores and then sends the target speech recognition result to the terminal.
Step 104: obtain the target file corresponding to the target speech recognition result.
In a specific implementation, the terminal may search a local or network resource system for the target file corresponding to the target speech recognition result; or the server searches a network resource library for the target file according to the target speech recognition result and, after determining it, sends the target file or its identification information to the terminal, so that the terminal determines the target file corresponding to the target speech recognition result.
Step 105: display each speech recognition result and the target file corresponding to the target speech recognition result on the display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In a specific implementation, the second display mode may be a display mode contrasting with the first. For example, the first display mode may be highlighted display with a check box, and the second display mode may be non-highlighted display without a check box. For example, in Fig. 7 the target display result uses the first display mode of a highlighted check box, and the second display mode is non-highlighted without a check box. The specific display modes are not limited here.
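As an illustration only, the two display modes might be modeled as plain-text markers (the check-box and highlight rendering below are assumptions; the patent leaves the concrete display modes open):

```python
def render_results(results, target):
    """Mark the target result (first display mode) and the rest (second mode)."""
    lines = []
    for r in results:
        if r == target:
            lines.append(f"[x] **{r}**")   # first mode: check box + highlight
        else:
            lines.append(f"    {r}")       # second mode: plain, no check box
    return lines

for line in render_results(["A", "B", "C"], target="B"):
    print(line)
```

On a real terminal the two modes would map to UI styles (focus ring, bold, check box) rather than text markers.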
In the speech recognition method provided by the embodiment of the invention, displaying the speech recognition result with the highest matching degree in the first display mode allows it to be shown quickly, improving convenience for the user. Performing semantic recognition separately on at least one speech recognition result yields more of the user's possible search intentions, and displaying each speech recognition result on the display interface of the terminal in the second display mode effectively provides more search results for the user, improves the coverage between speech recognition results and user intentions, raises the success rate of the user's voice searches, and improves the usability of speech recognition products.
In an embodiment of the invention, as shown in Fig. 2, a method for a speech recognition model to determine speech recognition results is provided, comprising:
Step 1: obtain the voice information input by the user, and determine the acoustic probability of the features of the voice information from acoustic features.
Specifically, acoustic feature extraction extracts speech acoustic feature information from the voice information. To guarantee recognition accuracy, this extraction stage should discriminate well among the modeling units of the acoustic model. The acoustic features in the embodiment of the invention may include mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), perceptual linear prediction coefficients (PLP), and the like.
Step 2: input the voice information after acoustic feature extraction into the voice matching model, which includes a language model and an acoustic model.
For example, in embodiments of the invention, the training process of the voice matching model may include:
Step 1: obtain sample voice information, each sample carrying annotation information of the speech it belongs to;
Step 2: input each sample voice information into the voice matching model;
Step 3: train the voice matching model according to each sample voice information and the output of the voice matching model.
To facilitate training, a large amount of sample voice information can be collected; the samples may be acquired by the terminal or obtained through other channels, and each sample voice information can be annotated.
The sample voice information is input into the voice matching model, and the model is trained. The model may be one based on dynamic time warping, hidden Markov models, artificial neural networks, support vector machines, and the like. Each sample is input into the model, and the model is trained according to the annotation information of each sample and the output of the voice matching model.
In the embodiment of the invention, the voice matching model is obtained by training on a large number of voice samples, and through the voice matching model, speech recognition can be performed on the collected voice information.
The acoustic model is built by acoustic modeling using the training speech features and their corresponding annotation information. It constructs the mapping between the observed features in the voice signal and the pronunciation modeling units, and with this performs classification of phonemes or phoneme states. In the embodiment of the invention, the acoustic model may use HMMs as the modeling basis.
The language model can adopt an N-gram statistical language model under a speech recognition framework based on statistical learning. It contains a Markov chain representing the generation process of a word sequence; that is, the probability p(W) of generating a word sequence W is expressed as:

p(W) = ∏_k p(w_k | w_(k-n+1), …, w_(k-1))

where w_k denotes the k-th word in the word sequence. From the above formula, the probability of generating the current word depends only on the preceding n-1 words.
In the embodiment of the invention, the training and evaluation metric of the language model can be the language model perplexity (PP), defined as the inverse of the geometric mean of the word-sequence generation probability, that is:

PP(W) = p(W)^(-1/N)

where N is the number of words in W. As the formula shows, the smaller the expected perplexity of the language model on generated word sequences, the more accurately the model predicts the current word given the history word sequence; therefore, the training objective of the language model is to minimize the perplexity on the training corpus.
In the training process, the probability of each word and of related word combinations occurring in the training corpus is counted first, and the relevant parameters of the language model are estimated on this basis.
However, the number of related word combinations grows geometrically with the vocabulary size, so counting all possible occurrences is infeasible. In practice the training data are usually sparse: the probability of some word combinations is very small, or they never appear at all. For these problems, the language model can be optimized by methods such as discounting and backing-off, and by modeling the language model with a recurrent neural network (RNN).
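A minimal sketch of the counting, smoothing, and perplexity computation described above, using a toy corpus and add-one smoothing as a stand-in for the discounting/backing-off methods mentioned (the corpus and the smoothing choice are illustrative assumptions):

```python
import math
from collections import Counter

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]  # toy training corpus

unigrams = Counter(w for s in corpus for w in s)
bigrams = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))
V = len(unigrams)

def p_bigram(prev, word):
    # Add-one (Laplace) smoothing so unseen pairs get nonzero probability.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

def perplexity(sentence):
    # PP(W) = p(W)^(-1/N), computed in log space for numerical stability.
    log_p = sum(math.log(p_bigram(a, b)) for a, b in zip(sentence, sentence[1:]))
    n = len(sentence) - 1
    return math.exp(-log_p / n)

seen, unseen = perplexity(["the", "cat", "sat"]), perplexity(["the", "cat", "dog"])
print(seen < unseen)  # -> True: a training sequence is less perplexing
```

The design point matches the text: unseen combinations never receive zero probability, and training amounts to driving the perplexity of the training corpus down.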
Step 3: input the result obtained by the speech model into a decoder for decoding, obtaining the possible text information of the speech recognition.
Combining the acoustic probability of the speech features calculated by the acoustic model with the probability calculated by the language model, the decoder analyzes the most likely word sequence W' through a relevant search algorithm, in order to output the possible text information in the voice information.
In step 102, determining, according to the pre-trained voice matching model, the speech recognition results of the voice information that satisfy the first matching threshold comprises:
Step 1: input the voice information into the voice matching model and identify the pinyin sequence in the voice information to form all possible candidate words.
To assign the correct character to each syllable, all possible character hypotheses, or single-syllable and multi-syllable word hypotheses, are first formed according to the input pinyin sequence. For example, taking the input sequence "the video of Zheng Kai" as an example, the corresponding pinyin sequence is [zheng4, kai3, de1, shi4, pin2]. As shown in Fig. 3, each path is a possible recognition result.
Step 2: determine, for each possible candidate word, the possible Chinese character sequences and their scores by syntax rules and statistical methods.
Specifically, from the multiple candidate words for each sound to be identified, the score of each Chinese character sequence is obtained using grammar rules and statistical principles, and some pinyin recognition errors are corrected. The probabilistic statistical language model is applied to search for the likely correct path in the character or word sequence. The decoder in the embodiment of the invention may use the Viterbi algorithm based on dynamic programming, and perform fast synchronous probability calculation and search-space pruning through certain algorithms (Gaussian selection, language model look-ahead, and the like), thereby reducing computational complexity and memory overhead and improving the efficiency of the search algorithm.
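The dynamic-programming search over pinyin candidates might be sketched as follows; the candidate lattice and bigram scores are invented for illustration (a real decoder would derive them from the acoustic and language models, with pruning on top):

```python
# Toy candidate lattice: hypothetical character candidates per pinyin syllable.
lattice = [["郑", "正"], ["恺", "凯", "楷"], ["的"], ["视"], ["频"]]

# Assumed bigram scores favoring the name reading; unseen pairs get a floor.
bigram = {("郑", "恺"): 0.9, ("正", "楷"): 0.6, ("恺", "的"): 0.8,
          ("的", "视"): 0.9, ("视", "频"): 0.95}
FLOOR = 0.05

def viterbi(lattice):
    """Dynamic-programming search for the best-scoring character path."""
    paths = {c: 1.0 for c in lattice[0]}   # score of the best path ending in c
    back = {c: (c,) for c in lattice[0]}
    for stage in lattice[1:]:
        new_paths, new_back = {}, {}
        for c in stage:
            prev = max(paths, key=lambda p: paths[p] * bigram.get((p, c), FLOOR))
            new_paths[c] = paths[prev] * bigram.get((prev, c), FLOOR)
            new_back[c] = back[prev] + (c,)
        paths, back = new_paths, new_back
    best = max(paths, key=paths.get)
    return "".join(back[best]), paths[best]

print(viterbi(lattice)[0])  # -> 郑恺的视频
```

Keeping only the best path into each candidate at every stage is what bounds the search; the Gaussian-selection and look-ahead tricks named in the text prune this same space further.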
Step 3: take the Chinese character sequences whose scores meet the first matching threshold as the speech recognition results.
Specifically, the at least one Chinese character sequence matched by the language model is sorted together with its score. Judging from the template matching results, a recognition result with a higher matching degree has a higher probability of being correct. However, because some corpus has not been added to the model, there are cases where the model matching score is high but the result is not correct. Therefore, the speech recognition results whose matching scores meet the first threshold can be selected for semantic recognition.
Specifically, a score meets the first threshold when it is greater than the first threshold ρ; that is, the speech recognition results with scores greater than ρ are taken as possibly correct recognition results. For example, the recognition results are as follows:
    Recognition result                     Matching score
    The video of Zheng Kai                 0.641
    The video of Zheng Kai                 0.629
    The video of Chinese regular script    0.457
    The food of Zheng Kai                  0.231
As shown above, the first threshold may be 0.4; the speech recognition results at this time are: "the video of Zheng Kai", "the video of Zheng Kai", "the video of Chinese regular script".
In a possible implementation, when the model matching scores of all recognition results are less than the first threshold ρ, the recognition result with the highest score is taken and step 104 is executed to perform semantic processing.
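The threshold rule plus its fall-back can be sketched as follows; the scores and result strings are illustrative, lightly adapted from the example table.

```python
def select_results(scored_results, threshold=0.4):
    """Keep results scoring above the threshold; if none do, keep the best one."""
    passed = [(text, s) for text, s in scored_results if s > threshold]
    if passed:
        return passed
    # all scores below the first threshold: fall back to the top-scoring result
    return [max(scored_results, key=lambda r: r[1])]

results = [("The video of Zheng Kai", 0.641),
           ("The video of Zheng Kai", 0.629),
           ("The video of Chinese regular script", 0.457),
           ("The food of Zheng Kai", 0.231)]
kept = select_results(results)                        # three results pass 0.4
fallback = select_results([("a", 0.1), ("b", 0.3)])   # none pass -> best one
```

The fall-back guarantees that semantic processing in step 104 always receives at least one candidate.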
To improve the user experience and show the search process effectively, before step 103, the target speech recognition result may also be output to the interface of the terminal for display. The interface of the terminal may be the display interface of the voice assistant client that collects the voice information, or may be another interface of the terminal, which is not limited here. For example, as shown in Fig. 4a, the target speech recognition result is "the video of Zheng Kai".
As shown in Fig. 4, the process of displaying the recognition result includes the following steps:
Step 1: create the layout file of the interface;
The layout file includes a text control for showing the speech recognition result.
Step 2: create the interface, load the layout file, and initialize the text control.
Step 3: show the speech recognition result, that is, the recognized text information, on the display interface of the terminal.
To effectively improve the accuracy and coverage of recognition, a preset dictionary is stored in the server. The dictionary contains a large amount of corpus data and provides a semantic parsing function. After the cloud server receives the voice information, it performs semantic parsing on the speech recognition result using its own semantic parsing function. Specifically, a semantic recognition model is stored in the server; the semantic recognition model can identify the word segments of the voice information, determine the word segments in the voice information, recognize the semantics of the word segments, and determine the target file corresponding to each semantic. Of course, if the dictionary to be retrieved is small, semantic recognition can also be completed at the terminal to improve parsing speed, which is not limited here.
Step 104 includes:
Step 1: perform semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result;
If the terminal performs semantic recognition, the word segments of the speech recognition result can be output according to the semantic recognition model in the terminal, the semantics of the word segments parsed, and the annotation results corresponding to the semantics searched to see whether they contain a relevant business type.
If the server performs semantic recognition, after receiving the speech recognition result sent by the terminal, the server outputs the word segments of the speech recognition result according to the semantic recognition model on the server, parses the semantics of the word segments, and searches the annotation results corresponding to the semantics to see whether they contain a relevant business type.
Step 2: search the resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
To further improve the accuracy of semantic recognition, in the embodiment of the present invention, the specific recognition process of the semantic recognition model may include:
Step 1: according to the preset dictionary, perform word segmentation on the target speech recognition result, perform semantic recognition on each word segment in the target speech recognition result, and determine the business type corresponding to each word segment;
The preset dictionary can obtain corpus through methods such as web crawlers, so as to update the word segments and the annotations of their corresponding business types.
Step 2: determine the business type corresponding to the target speech recognition result according to the weights of the business types corresponding to the word segments.
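Step 1 can be sketched as a greedy longest-match segmentation over the preset dictionary followed by a business-type lookup. The dictionary contents and type names are assumptions for illustration (real dictionaries would be crawled and far larger), and segmentation is done over space-separated tokens only for readability.

```python
# Toy preset dictionary mapping word segments to annotated business types.
DICT = {
    "Zheng Kai": "star",
    "video":     "video",
    "weather":   "weather",
    "forecast":  "weather",
}

def segment(text):
    """Greedy longest-match word segmentation over the preset dictionary."""
    words, i = [], 0
    tokens = text.split()
    while i < len(tokens):
        for j in range(len(tokens), i, -1):
            cand = " ".join(tokens[i:j])
            if cand in DICT or j == i + 1:   # fall back to single token
                words.append(cand)
                i = j
                break
    return words

def business_types(words):
    """Look up the annotated business type of each word segment."""
    return {w: DICT.get(w, "unknown") for w in words}

words = segment("Zheng Kai video")
types = business_types(words)
```

Updating the dictionary with crawled corpus then amounts to adding entries to `DICT`, which changes both segmentation boundaries and the type annotations.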
To further improve retrieval efficiency, the other speech recognition results exceeding the first threshold, apart from the target speech recognition result, may also undergo the above operations simultaneously with the target speech recognition result. Of course, the above operations may instead be executed only after receiving a switching instruction from the user, which is not limited here.
As shown in Fig. 5, this may specifically include:
Step 1: perform semantic recognition on each speech recognition result in the at least one speech recognition result to determine the business type corresponding to the speech recognition result;
Specifically, speech recognition result 1 is input into the semantic recognition model; if the result output by the semantic recognition model contains business type 1, it is considered that speech recognition result 1 contains business type 1, and subsequent processing needs to be executed in the application corresponding to business type 1.
For example, speech recognition result 1 is "the video of Zheng Kai", and the word segmentation result output by the semantic recognition model is: Zheng Kai, video. The business type of "video" is the video type, so the business type of the speech recognition result is the video type.
In a possible implementation, the business type may also be determined according to the attributes of the word segments. For example, speech recognition result 2 is "weather forecast", and the word segmentation result determined by the semantic recognition model is: weather, forecast; "weather" has the weather attribute (weatherKeys), so the business type is determined to be the weather query type.
To further improve the accuracy of semantic recognition, in the embodiment of the present invention, the specific recognition process of the semantic recognition model may include:
Step 1: according to the preset dictionary, perform word segmentation on the speech recognition result, perform semantic recognition on each word segment in the speech recognition result, and determine the business type corresponding to each word segment;
The preset dictionary can obtain corpus through methods such as web crawlers, so as to update the word segments and the annotations of their corresponding business types.
Step 2: determine the business type corresponding to the speech recognition result according to the weights of the business types corresponding to the word segments.
In a possible implementation, the weight of a business type is determined according to at least one of: the priority of the business type in the terminal, the priority of the data source from which the word segment in the preset dictionary was obtained, or the preferences of the user of the terminal.
For example, speech recognition result 3 is "the video of Chinese regular script", and the word segmentation result determined by the semantic recognition model is: Chinese regular script, video. The business type of "video" is the video type; the business type of "Chinese regular script" is the education type. If the weight of the video type corresponding to "video" is determined to be greater than the weight of the education type corresponding to "Chinese regular script", the business type of speech recognition result 3 is determined to be the video type. If the two weights are determined to be the same, the business types corresponding to speech recognition result 3 may be determined to be both the education type and the video type.
For another example, speech recognition result 4 is "Weather Pre-Explosion", and the word segmentation result determined by the semantic recognition model is: Weather Pre-Explosion. According to the preset dictionary, "Weather Pre-Explosion" is a film, and its corresponding business types include the video type, the song type, etc. The business type of speech recognition result 4 is then determined according to the weight of the video type corresponding to "Weather Pre-Explosion" and the weight of the song type corresponding to it.
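The weight-based resolution, including the tie case in the example above, can be sketched as follows; the weight values are illustrative assumptions, and ties are resolved by keeping every top-weighted type.

```python
def resolve_business_type(segment_types, weights):
    """Pick the business type(s) with the highest weight among the segments.

    segment_types: list of (word_segment, business_type) pairs.
    weights: business_type -> weight. Returns a sorted list so that a tie
    yields several types, matching the 'education and video' case.
    """
    best = max(weights[t] for _, t in segment_types)
    return sorted({t for _, t in segment_types if weights[t] == best})

segments = [("Chinese regular script", "education"), ("video", "video")]

r3 = resolve_business_type(segments, {"video": 0.8, "education": 0.5})
tie = resolve_business_type(segments, {"video": 0.5, "education": 0.5})
```

With unequal weights only the video type survives; with equal weights both types are kept, so the later resource search can cover both.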
In step 2, the target files corresponding to the at least one speech recognition result are searched for in the resource library, within the business types corresponding to the at least one speech recognition result.
Combining the above examples: for speech recognition result 1, the target files of Zheng Kai can be searched for in the video type of the resource library. For speech recognition result 2, the target files of weather and forecast can be searched for in the weather query business of the resource library. For speech recognition result 3, the target files of Chinese regular script can be searched for in the video type, the education type, or the education video type of the resource library. For speech recognition result 4, the target files of "Weather Pre-Explosion" can be searched for in the video type or the song type of the resource library.
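The per-type search can be sketched as a lookup in a resource library keyed by business type; the library contents and matching rule (case-insensitive substring) are assumptions for illustration.

```python
# Toy resource library keyed by business type; entries are illustrative.
RESOURCES = {
    "video":   ["Zheng Kai - episode 1", "Weather Pre-Explosion (film)"],
    "weather": ["weather forecast page"],
}

def search(keywords, business_types):
    """Search only within the given business types for matching target files."""
    hits = []
    for btype in business_types:
        for item in RESOURCES.get(btype, []):
            if any(k.lower() in item.lower() for k in keywords):
                hits.append(item)
    return hits

files = search(["Zheng Kai"], ["video"])  # restricted to the video type
```

Restricting the search to the resolved business types is what keeps retrieval cheap compared with scanning the whole resource library.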
Step 105 specifically includes:
Step 1: determine the priority of each speech recognition result in the at least one speech recognition result;
Specifically, combined with the semantic analysis UI, the search results are shown in the form of TABs, and the order of the TABs mainly follows the hot-search ranking.
Step 2: show each speech recognition result arranged according to its priority on the display interface of the terminal;
The priority may be determined based on user big-data analysis, scores, user preferences and other means, which is not limited here.
Step 3: show each speech recognition result and the target file corresponding to the target speech recognition result on the display interface.
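The TAB ordering step can be sketched as a sort keyed by the hot-search ranking; the rank numbers are assumptions, and results missing from the ranking are simply placed last.

```python
def order_tabs(results, hot_rank):
    """Order result TABs by hot-search ranking (lower rank = shown first)."""
    return sorted(results, key=lambda r: hot_rank.get(r, float("inf")))

tabs = order_tabs(
    ["The video of Chinese regular script", "The video of Zheng Kai"],
    {"The video of Zheng Kai": 1, "The video of Chinese regular script": 2})
```

The same function accommodates the other priority sources mentioned above (scores, user preferences) by swapping in a different key mapping.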
In the specific implementation process, as shown in Fig. 6, this includes:
the semantic recognition module converts the TAB data corresponding to the speech recognition results and the target files into JSON data and sends it to the display module of the terminal;
after the display module of the terminal obtains the JSON data, it parses out the corresponding speech recognition results and target files;
each speech recognition result and its corresponding target file are shown according to the parsing result.
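The JSON handoff between the semantic recognition module and the display module can be sketched as below; the field names (`tabs`, `result`, `files`) are assumptions, since the patent does not specify a schema.

```python
import json

# Semantic recognition module side: serialize TAB data to JSON.
payload = json.dumps({
    "tabs": [
        {"result": "The video of Zheng Kai",
         "files": ["Zheng Kai - episode 1"]},
        {"result": "The video of Chinese regular script", "files": []},
    ]
})

# Display module side: parse the JSON back into results and target files.
parsed = json.loads(payload)
first_tab = parsed["tabs"][0]["result"]
```

Serializing to JSON keeps the two modules decoupled: the display module only depends on the agreed schema, not on the semantic module's internal types.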
Combining the above examples, if the ranking result is: Zheng Kai > Zheng Kai > Chinese regular script, the shown result may be as shown in Fig. 7.
In a possible implementation, when semantic analysis cannot determine the business type or cannot determine the corresponding target file, the speech recognition result is not shown on the terminal. For example, if the semantics of "the video of Chinese regular script" cannot be understood, or no calligraphy content related to "Chinese regular script" can be found in the resource library, the speech recognition result is not shown on the terminal.
Combining the above examples, if the ranking result is: weather forecast > Weather Pre-Explosion, the shown result may be as shown in Fig. 8 and Fig. 9.
Further, if the user wants to switch the target speech recognition result, the speech recognition result can be switched, which specifically includes:
obtaining the user's switching instruction for the target speech recognition result;
determining, according to the switching instruction, the target file corresponding to the changed target speech recognition result;
showing the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while showing the target file corresponding to the changed target speech recognition result.
Specifically, for determining the target file corresponding to the changed target speech recognition result, reference can be made to the above embodiments, and details are not repeated here.
To further improve the accuracy of speech recognition, in the embodiment of the present invention, the method further includes:
obtaining the user's operation instruction for the speech recognition result or the target file;
increasing the matching degree of the speech recognition result or target file corresponding to the operation instruction, so as to update the user preference.
For example, if the user selects "Weather Pre-Explosion" on the display interface, "Weather Pre-Explosion" is recorded in the user preference of the user, and its matching degree is increased.
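The preference update can be sketched as a simple counter over selected results; the increment value is an assumption, and a real system would persist and decay these scores.

```python
def record_selection(preferences, selected, bump=0.05):
    """Bump the matching degree of the result the user selected."""
    preferences[selected] = preferences.get(selected, 0.0) + bump
    return preferences

prefs = record_selection({}, "Weather Pre-Explosion")
prefs = record_selection(prefs, "Weather Pre-Explosion")
# repeated selections accumulate, biasing future ranking toward this result
```

Because the matching degree feeds back into the weight-based business-type resolution and the priority ordering, repeated selections gradually personalize the shown results.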
To further improve the accuracy of speech recognition, the embodiment of the present invention also provides a possible implementation, including:
judging whether the voice information includes a first control instruction for controlling the terminal;
if the user voice information is a first control instruction for controlling the terminal, executing the first control instruction in the terminal.
In a possible implementation, if the voice information also contains an operation-type word segment, the terminal must perform a corresponding operation according to the voice information. In this case, an instruction to be processed according to the voice information can be sent to the terminal directly. Operation-type word segments include, for example, open, watch, and play.
In a possible implementation, it is judged whether the semantics of the voice information contain a target control instruction for the terminal; if so, the first control instruction is executed in the terminal.
For example, if the recognized speech recognition result is "open the video of Zheng Kai", the first control instruction can be determined to be "open".
In a possible implementation, if it is determined that the target file of "the video of Zheng Kai" is unique, the target file "the video of Zheng Kai" can be opened directly.
In a possible implementation, if it is determined that there are multiple target files for "the video of Zheng Kai", the multiple target files can first be shown, and the open control instruction executed after obtaining the user's operation instruction.
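The control-instruction branch can be sketched as follows: detect an operation-type word segment, execute directly when the target file is unique, and otherwise list the choices first. The operation words and the action labels returned are illustrative assumptions.

```python
OPERATION_WORDS = {"open", "watch", "play"}

def handle(segments, target_files):
    """Decide the action for a recognized utterance and its target files."""
    op = next((w for w in segments if w in OPERATION_WORDS), None)
    if op is None:
        return ("show_results", target_files)   # no control instruction found
    if len(target_files) == 1:
        return (op, target_files[0])            # unique file: act directly
    return ("ask_user", target_files)           # several files: let user pick

direct = handle(["open", "Zheng Kai", "video"], ["Zheng Kai - episode 1"])
choose = handle(["open", "Zheng Kai", "video"], ["ep 1", "ep 2"])
```

The "ask_user" branch corresponds to showing multiple target files and waiting for the user's operation instruction before executing the open control instruction.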
In the embodiment of the present invention, the recognition result with the highest matching score in the voice matching model is shown to the user, while semantic recognition is performed separately on each of the at least one speech recognition result that meets the first matching threshold. Combined with the semantic processing results, the search results of different services are shown to the user through UI interaction, so that the user's intention can be understood more fully. Compared with speech recognition methods in the prior art, the search and display of homophone services are realized through multiple semantic analysis requests, and the user can select the desired result according to his or her intention.
Based on the same technical idea, the embodiment of the present invention provides a speech recognition apparatus 1000, as shown in Fig. 10, including:
a transceiver unit 1001, configured to receive input voice information;
a processing unit 1002, configured to determine, according to a pre-trained voice matching model, at least one speech recognition result of the voice information that meets a first matching threshold; determine the speech recognition result with the highest matching degree among the at least one speech recognition result as the target speech recognition result; and obtain the target file corresponding to the target speech recognition result;
a display unit 1003, configured to show each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is shown in a first display mode and the other speech recognition results are shown in a second display mode.
In a possible implementation, the processing unit 1002 is specifically configured to: perform semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result; and search the resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In a possible implementation, the processing unit 1002 is specifically configured to: determine the priority of each speech recognition result; and show each speech recognition result arranged according to its priority on the display interface of the terminal;
the display unit 1003 is specifically configured to: show the target file corresponding to the target speech recognition result on the display interface of the terminal.
In a possible implementation, the transceiver unit 1001 is further configured to: obtain the user's switching instruction for the target speech recognition result;
the processing unit 1002 is further configured to: determine, according to the switching instruction, the target file corresponding to the changed target speech recognition result;
the display unit 1003 is further configured to: show the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while showing the target file corresponding to the changed target speech recognition result.
In a possible implementation, the processing unit 1002 is specifically configured to:
input the voice information into the voice matching model, identify the pinyin sequence in the voice information, and form all possible candidate words; for each possible candidate word, determine the possible Chinese character sequences and their scores through syntax rules and statistical methods; and take the Chinese character sequences whose scores meet the first matching threshold as the speech recognition results.
On the basis of the above embodiments, the embodiment of the present invention also provides a server 1100, as shown in Fig. 11, including: a processor 1101, a communication interface 1102, a memory 1103, and a communication bus 1104, wherein the processor 1101, the communication interface 1102, and the memory 1103 communicate with each other through the communication bus 1104;
a computer program is stored in the memory 1103, and when the program is executed by the processor 1101, the processor 1101 performs the following steps:
determining, according to the pre-trained voice matching model, at least one speech recognition result of the voice information that meets the first matching threshold; determining the speech recognition result with the highest matching degree among the at least one speech recognition result as the target speech recognition result; obtaining the target file corresponding to the target speech recognition result; and showing each speech recognition result and the target file corresponding to the target speech recognition result on the display interface of the terminal, wherein the target speech recognition result is shown in a first display mode and the other speech recognition results are shown in a second display mode.
In a possible implementation, the processor 1101 is specifically configured to: perform semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result; and search the resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In a possible implementation, the processor 1101 is specifically configured to:
according to the preset dictionary, perform word segmentation on the target speech recognition result, perform semantic recognition on each word segment in the target speech recognition result, and determine the business type corresponding to each word segment; and determine the business type corresponding to the target speech recognition result according to the weights of the business types corresponding to the word segments.
In a possible implementation, the processor 1101 is specifically configured to: determine the priority of each speech recognition result; and show each speech recognition result arranged according to its priority on the display interface of the terminal;
In a possible implementation, the processor 1101 is further configured to: determine, according to a switching instruction, the target file corresponding to the changed target speech recognition result; show the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode; and show the target file corresponding to the changed target speech recognition result. The switching instruction is the user's switching instruction for the target speech recognition result, obtained through the communication interface 1102.
In a possible implementation, the processor 1101 is specifically configured to:
input the voice information into the voice matching model, identify the pinyin sequence in the voice information, and form all possible candidate words; for each possible candidate word, determine the possible Chinese character sequences and their scores through syntax rules and statistical methods; and take the Chinese character sequences whose scores meet the first matching threshold as the speech recognition results.
The communication bus mentioned for the above server may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 1102 is used for communication between the above server and other devices.
The memory may include a Random Access Memory (RAM), and may also include a Non-Volatile Memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit, a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
On the basis of the above embodiments, the embodiment of the present invention also provides a computer-readable storage medium. A computer program executable by a server is stored in the computer-readable storage medium, and when the program runs on the server, the server implements any of the methods in the above embodiments.
The above computer-readable storage medium may be any usable medium or data storage device that the processor in the server can access, including but not limited to magnetic memories such as floppy disks, hard disks, magnetic tapes and magneto-optical disks (MO), optical memories such as CDs, DVDs, BDs and HVDs, and semiconductor memories such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH) and solid-state drives (SSD).
On the basis of the above embodiments, the embodiment of the present invention also provides a terminal 1200, as shown in Fig. 12, including: a processor 1201, a communication interface 1202, a memory 1203, and a communication bus 1204, wherein the processor 1201, the communication interface 1202, and the memory 1203 communicate with each other through the communication bus 1204;
a computer program is stored in the memory 1203, and when the program is executed by the processor 1201, the processor 1201 performs the following steps:
determining, according to the pre-trained voice matching model, at least one speech recognition result of the voice information that meets the first matching threshold; determining the speech recognition result with the highest matching degree among the at least one speech recognition result as the target speech recognition result; obtaining the target file corresponding to the target speech recognition result; and showing each speech recognition result and the target file corresponding to the target speech recognition result on the display interface, wherein the target speech recognition result is shown in a first display mode and the other speech recognition results are shown in a second display mode.
In a possible implementation, the processor 1201 is specifically configured to: perform semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result; and search the resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In a possible implementation, the processor 1201 is specifically configured to:
according to the preset dictionary, perform word segmentation on the target speech recognition result, perform semantic recognition on each word segment in the target speech recognition result, and determine the business type corresponding to each word segment; and determine the business type corresponding to the target speech recognition result according to the weights of the business types corresponding to the word segments.
In a possible implementation, the processor 1201 is specifically configured to: determine the priority of each speech recognition result; show each speech recognition result arranged according to its priority on the display interface of the terminal; and show the target file corresponding to the target speech recognition result on the display interface of the terminal.
In a possible implementation, the processor 1201 is further configured to: determine, according to a switching instruction, the target file corresponding to the changed target speech recognition result; show the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode; and show the target file corresponding to the changed target speech recognition result. The switching instruction is the user's switching instruction for the target speech recognition result, obtained through the communication interface 1202.
In a possible implementation, the processor 1201 is specifically configured to:
input the voice information into the voice matching model, identify the pinyin sequence in the voice information, and form all possible candidate words; for each possible candidate word, determine the possible Chinese character sequences and their scores through syntax rules and statistical methods; and take the Chinese character sequences whose scores meet the first matching threshold as the speech recognition results.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 1202 is used for communication between the above terminal and other devices.
The memory may include a Random Access Memory (RAM), and may also include a Non-Volatile Memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit, a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
On the basis of the above embodiments, the embodiment of the present invention also provides a computer-readable storage medium. A computer program executable by a terminal is stored in the computer-readable storage medium, and when the program runs on the terminal, the terminal implements any of the methods in the above embodiments.
The above computer-readable storage medium may be any usable medium or data storage device that the processor in the terminal can access, including but not limited to magnetic memories such as floppy disks, hard disks, magnetic tapes and magneto-optical disks (MO), optical memories such as CDs, DVDs, BDs and HVDs, and semiconductor memories such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH) and solid-state drives (SSD).
For the system/apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively brief; for relevant parts, refer to the description of the method embodiments.
It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations.
Those skilled in the art should understand that the embodiments of the application may be provided as a method, a system, or a computer program product. Therefore, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memories, CD-ROMs, optical memories, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to encompass them.

Claims (10)

1. A speech recognition method, applied to a terminal, characterized in that the method comprises:
receiving input voice information;
determining, according to a pre-trained voice matching model, at least one speech recognition result of the voice information that meets a first matching threshold;
determining the speech recognition result with the highest matching degree among the at least one speech recognition result as a target speech recognition result;
obtaining a target file corresponding to the target speech recognition result;
displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
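The claim as a whole can be pictured as a small candidate-selection routine. The following is a hypothetical sketch only, not an implementation from the patent: the threshold value, result tuples, and display-mode labels are all invented for illustration.

```python
# Sketch of claim 1: keep candidates meeting the first matching threshold,
# pick the highest-matching one as the "target" result, and assign display
# modes (target -> first mode, others -> second mode).
FIRST_MATCHING_THRESHOLD = 0.6  # illustrative value, not from the patent

def select_results(candidates, threshold=FIRST_MATCHING_THRESHOLD):
    """candidates: list of (text, matching_degree) pairs."""
    # Keep only results meeting the first matching threshold.
    passing = [c for c in candidates if c[1] >= threshold]
    # Sort descending by matching degree; the best becomes the target result.
    passing.sort(key=lambda c: c[1], reverse=True)
    if not passing:
        return None, []
    target, others = passing[0], passing[1:]
    # Target is shown in the first display mode, the rest in the second.
    display_plan = [(target[0], "first_display_mode")] + \
                   [(text, "second_display_mode") for text, _ in others]
    return target, display_plan

target, plan = select_results(
    [("play music", 0.92), ("play muse", 0.71), ("lay music", 0.40)])
```

Here the low-scoring third candidate is filtered out before display, matching the claim's threshold step.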
2. The method according to claim 1, characterized in that obtaining the target file corresponding to the target speech recognition result comprises:
performing semantic recognition on the target speech recognition result to determine a service type corresponding to the target speech recognition result;
searching, in a resource library, for the target file corresponding to the target speech recognition result within the service type corresponding to the target speech recognition result.
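The point of claim 2 is that the lookup is scoped to one service type rather than the whole resource library. A minimal illustration, with an entirely hypothetical library layout:

```python
# Claim-2 lookup sketch: once the service type is known, the target file is
# searched only within that type's section of the resource library.
RESOURCE_LIBRARY = {  # invented example data
    "media": {"play movie": "movie_list.json"},
    "information": {"today weather": "weather_card.json"},
}

def find_target_file(recognition_result, service_type):
    # Restrict the search to resources of the recognized service type.
    return RESOURCE_LIBRARY.get(service_type, {}).get(recognition_result)
```

Scoping the search this way means a result such as "play movie" is never matched against, say, weather resources.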
3. The method according to claim 2, characterized in that performing semantic recognition on the target speech recognition result to determine the service type corresponding to the target speech recognition result comprises:
performing word segmentation on the target speech recognition result according to a preset dictionary, performing semantic recognition on each segmented word in the target speech recognition result, and determining the service type corresponding to each segmented word;
determining the service type corresponding to the target speech recognition result according to the weights of the service types corresponding to the segmented words.
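The weighted-vote step of claim 3 can be sketched as follows. The dictionary entries, weights, and type names below are invented; real dictionary-based Chinese word segmentation (which the patent assumes) is stood in for by whitespace splitting.

```python
# Claim-3 sketch: segment the recognition result with a preset dictionary,
# map each word to a (service_type, weight) pair, accumulate weights per
# type, and pick the type with the greatest total weight.
PRESET_DICTIONARY = {  # hypothetical word -> (service type, weight) table
    "play":    ("media", 0.5),
    "movie":   ("media", 1.0),
    "weather": ("information", 1.0),
    "today":   ("information", 0.3),
}

def service_type(recognition_result):
    totals = {}
    # Whitespace splitting stands in for dictionary-based segmentation.
    for word in recognition_result.split():
        if word in PRESET_DICTIONARY:
            stype, weight = PRESET_DICTIONARY[word]
            totals[stype] = totals.get(stype, 0.0) + weight
    if not totals:
        return None
    # The service type with the highest accumulated weight wins.
    return max(totals, key=totals.get)
```

So "play movie" resolves to the media type even though "play" alone is weakly weighted, because the per-type weights are summed before the decision.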
4. The method according to claim 1, characterized in that displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface comprises:
determining a priority of each speech recognition result;
displaying each speech recognition result, arranged according to the priority, on the display interface of the terminal;
displaying the target file corresponding to the target speech recognition result on the display interface of the terminal.
5. The method according to claim 4, characterized in that displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface further comprises:
obtaining a switching instruction of a user for the target speech recognition result;
determining, according to the switching instruction, the target file corresponding to the changed target speech recognition result;
displaying the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
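The switching behaviour of claim 5 amounts to moving the highlight to a different displayed result and refreshing the accompanying target file. A hedged sketch, with a hypothetical result list and file-lookup callback:

```python
# Claim-5 sketch: a user switching instruction (here modelled as +1/-1)
# selects a new target result; display modes and the shown target file
# are updated to follow it.
def apply_switch(results, current_index, direction, fetch_file):
    """results: recognition results in display order; direction: +1 or -1."""
    # Clamp the new selection to the list bounds.
    new_index = max(0, min(len(results) - 1, current_index + direction))
    # New target gets the first display mode, everything else the second.
    plan = [(r, "first_display_mode" if i == new_index else "second_display_mode")
            for i, r in enumerate(results)]
    # Fetch the target file for the newly selected result.
    return new_index, plan, fetch_file(results[new_index])

results = ["play music", "play muse", "lay music"]
idx, plan, target_file = apply_switch(
    results, 0, +1, lambda r: "file_for:" + r)
```

The clamp keeps a repeated "down" instruction from moving the highlight past the last displayed result.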
6. The method according to any one of claims 1 to 5, characterized in that determining, according to the pre-trained voice matching model, the speech recognition result of the voice information that meets the first matching threshold comprises:
inputting the voice information into the voice matching model, recognizing a pinyin sequence in the voice information, and forming all possible candidate words;
determining, for each possible candidate word, the possible Chinese character sequences and the scores of the Chinese character sequences by means of grammar rules and statistical methods;
taking the Chinese character sequences whose scores meet the first matching threshold as the speech recognition results.
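The pinyin-to-characters step of claim 6 can be sketched with toy lookup tables. The pinyin table and the score table below are invented; a flat score dictionary stands in for the claimed combination of grammar rules and statistical methods.

```python
# Claim-6 sketch: expand a pinyin sequence into candidate Chinese character
# sequences, score each sequence, and keep those meeting the threshold.
from itertools import product

PINYIN_TO_CHARS = {  # hypothetical pinyin -> candidate characters table
    "bo":   ["播", "博"],
    "fang": ["放", "方"],
}
SEQUENCE_SCORE = {  # toy stand-in for grammar-rule + statistical scoring
    "播放": 0.9, "播方": 0.2, "博放": 0.1, "博方": 0.05,
}

def recognize(pinyin_seq, threshold=0.5):
    # Expand every combination of candidate characters for the syllables.
    combos = product(*(PINYIN_TO_CHARS[p] for p in pinyin_seq))
    sequences = ["".join(chars) for chars in combos]
    # Keep only sequences whose score meets the first matching threshold.
    return [s for s in sequences if SEQUENCE_SCORE.get(s, 0.0) >= threshold]
```

For the sequence ["bo", "fang"], four character sequences are generated but only the well-formed one ("播放", i.e. "play") survives the threshold.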
7. A speech recognition device, characterized in that the device comprises:
a transceiver unit, configured to receive input voice information, and to obtain the target file corresponding to each speech recognition result among the at least one speech recognition result;
a processing unit, configured to determine, according to a pre-trained voice matching model, at least one speech recognition result of the voice information that meets a first matching threshold, and to determine the speech recognition result with the highest matching degree among the at least one speech recognition result as a target speech recognition result;
a display unit, configured to display each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
8. The device according to claim 7, characterized in that the processing unit is specifically configured to:
perform semantic recognition on the target speech recognition result to determine the service type corresponding to the target speech recognition result; and search, in a resource library, for the target file corresponding to the target speech recognition result within the service type corresponding to the target speech recognition result.
9. A terminal, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus;
the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1-6.
10. A computer-readable storage medium, characterized in that it stores a computer program executable by a terminal or a server; when the program runs on the terminal or the server, the terminal or the server performs the steps of the method according to any one of claims 1-6.
CN201910211472.4A 2019-03-20 2019-03-20 A kind of audio recognition method, device and terminal Pending CN109976702A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910211472.4A CN109976702A (en) 2019-03-20 2019-03-20 A kind of audio recognition method, device and terminal
PCT/CN2019/106806 WO2020186712A1 (en) 2019-03-20 2019-09-19 Voice recognition method and apparatus, and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910211472.4A CN109976702A (en) 2019-03-20 2019-03-20 A kind of audio recognition method, device and terminal

Publications (1)

Publication Number Publication Date
CN109976702A true CN109976702A (en) 2019-07-05

Family

ID=67079603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910211472.4A Pending CN109976702A (en) 2019-03-20 2019-03-20 A kind of audio recognition method, device and terminal

Country Status (2)

Country Link
CN (1) CN109976702A (en)
WO (1) WO2020186712A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110335606A (en) * 2019-08-07 2019-10-15 广东电网有限责任公司 A kind of voice interaction device for Work tool control
CN110427459A (en) * 2019-08-05 2019-11-08 苏州思必驰信息科技有限公司 Visualized generation method, system and the platform of speech recognition network
CN110931018A (en) * 2019-12-03 2020-03-27 珠海格力电器股份有限公司 Intelligent voice interaction method and device and computer readable storage medium
CN111192572A (en) * 2019-12-31 2020-05-22 斑马网络技术有限公司 Semantic recognition method, device and system
WO2020186712A1 (en) * 2019-03-20 2020-09-24 海信视像科技股份有限公司 Voice recognition method and apparatus, and terminal
CN112735394A (en) * 2020-12-16 2021-04-30 青岛海尔科技有限公司 Semantic parsing method and device for voice
CN112802474A (en) * 2019-10-28 2021-05-14 中国移动通信有限公司研究院 Voice recognition method, device, equipment and storage medium

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050038657A1 (en) * 2001-09-05 2005-02-17 Voice Signal Technologies, Inc. Combined speech recongnition and text-to-speech generation
US20070055520A1 (en) * 2005-08-31 2007-03-08 Microsoft Corporation Incorporation of speech engine training into interactive user tutorial
CN101309327A (en) * 2007-04-16 2008-11-19 索尼株式会社 Sound chat system, information processing device, speech recognition and key words detectiion
CN101557651A (en) * 2008-04-08 2009-10-14 Lg电子株式会社 Mobile terminal and menu control method thereof
CN101557432A (en) * 2008-04-08 2009-10-14 Lg电子株式会社 Mobile terminal and menu control method thereof
CN101604520A (en) * 2009-07-16 2009-12-16 北京森博克智能科技有限公司 Spoken language voice recognition method based on statistical model and syntax rule
CN102867512A (en) * 2011-07-04 2013-01-09 余喆 Method and device for recognizing natural speech
CN103176591A (en) * 2011-12-21 2013-06-26 上海博路信息技术有限公司 Text location and selection method based on voice recognition
CN103176998A (en) * 2011-12-21 2013-06-26 上海博路信息技术有限公司 Read auxiliary system based on voice recognition
CN103811005A (en) * 2012-11-13 2014-05-21 Lg电子株式会社 Mobile terminal and control method thereof
CN105489220A (en) * 2015-11-26 2016-04-13 小米科技有限责任公司 Method and device for recognizing speech
CN105679318A (en) * 2015-12-23 2016-06-15 珠海格力电器股份有限公司 Display method and device based on speech recognition, display system and air conditioner
CN105869636A (en) * 2016-03-29 2016-08-17 上海斐讯数据通信技术有限公司 Speech recognition apparatus and method thereof, smart television set and control method thereof
CN106098063A (en) * 2016-07-01 2016-11-09 海信集团有限公司 A kind of sound control method, terminal unit and server
CN106356056A (en) * 2016-10-28 2017-01-25 腾讯科技(深圳)有限公司 Speech recognition method and device
CN109492175A (en) * 2018-10-23 2019-03-19 青岛海信电器股份有限公司 The display methods and device of Application Program Interface, electronic equipment, storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1021254A (en) * 1996-06-28 1998-01-23 Toshiba Corp Information retrieval device with speech recognizing function
CN109976702A (en) * 2019-03-20 2019-07-05 青岛海信电器股份有限公司 A kind of audio recognition method, device and terminal


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020186712A1 (en) * 2019-03-20 2020-09-24 海信视像科技股份有限公司 Voice recognition method and apparatus, and terminal
CN110427459A (en) * 2019-08-05 2019-11-08 苏州思必驰信息科技有限公司 Visualized generation method, system and the platform of speech recognition network
CN110427459B (en) * 2019-08-05 2021-09-17 思必驰科技股份有限公司 Visual generation method, system and platform of voice recognition network
CN110335606A (en) * 2019-08-07 2019-10-15 广东电网有限责任公司 A kind of voice interaction device for Work tool control
CN110335606B (en) * 2019-08-07 2022-04-19 广东电网有限责任公司 Voice interaction device for management and control of tools and appliances
CN112802474A (en) * 2019-10-28 2021-05-14 中国移动通信有限公司研究院 Voice recognition method, device, equipment and storage medium
CN110931018A (en) * 2019-12-03 2020-03-27 珠海格力电器股份有限公司 Intelligent voice interaction method and device and computer readable storage medium
CN111192572A (en) * 2019-12-31 2020-05-22 斑马网络技术有限公司 Semantic recognition method, device and system
CN112735394A (en) * 2020-12-16 2021-04-30 青岛海尔科技有限公司 Semantic parsing method and device for voice

Also Published As

Publication number Publication date
WO2020186712A1 (en) 2020-09-24

Similar Documents

Publication Publication Date Title
CN109976702A (en) A kind of audio recognition method, device and terminal
US10811013B1 (en) Intent-specific automatic speech recognition result generation
CN108305634B (en) Decoding method, decoder and storage medium
US20170206897A1 (en) Analyzing textual data
CN105723449B (en) speech content analysis system and speech content analysis method
KR101309042B1 (en) Apparatus for multi domain sound communication and method for multi domain sound communication using the same
CN108305643B (en) Method and device for determining emotion information
KR102390940B1 (en) Context biasing for speech recognition
CN111933129B (en) Audio processing method, language model training method and device and computer equipment
US20220246149A1 (en) Proactive command framework
US11189277B2 (en) Dynamic gazetteers for personalized entity recognition
JP2021033255A (en) Voice recognition method, device, apparatus, and computer readable storage medium
CN108711420A (en) Multilingual hybrid model foundation, data capture method and device, electronic equipment
CN108428446A (en) Audio recognition method and device
CN108735201A (en) Continuous speech recognition method, apparatus, equipment and storage medium
CN104572631B (en) The training method and system of a kind of language model
US9922650B1 (en) Intent-specific automatic speech recognition result generation
CN108711421A (en) A kind of voice recognition acoustic model method for building up and device and electronic equipment
CN108899013A (en) Voice search method, device and speech recognition system
US11120799B1 (en) Natural language processing policies
CN109616096A (en) Construction method, device, server and the medium of multilingual tone decoding figure
CN111090727A (en) Language conversion processing method and device and dialect voice interaction system
CN109582788A (en) Comment spam training, recognition methods, device, equipment and readable storage medium storing program for executing
Öktem et al. Attentional parallel RNNs for generating punctuation in transcribed speech
CN104750677A (en) Speech translation apparatus, speech translation method and speech translation program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 266555 Qingdao economic and Technological Development Zone, Shandong, Hong Kong Road, No. 218

Applicant after: Hisense Video Technology Co., Ltd

Address before: 266555 Qingdao economic and Technological Development Zone, Shandong, Hong Kong Road, No. 218

Applicant before: HISENSE ELECTRIC Co.,Ltd.
