CN105931642A - Speech recognition method, apparatus and system - Google Patents
- Publication number
- CN105931642A CN105931642A CN201610375073.8A CN201610375073A CN105931642A CN 105931642 A CN105931642 A CN 105931642A CN 201610375073 A CN201610375073 A CN 201610375073A CN 105931642 A CN105931642 A CN 105931642A
- Authority
- CN
- China
- Prior art keywords
- user
- speech recognition
- voice
- speech
- identification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
Abstract
The invention provides a speech recognition method, a speech recognition apparatus and a speech recognition system. The method includes the following steps: a speech input of a user is obtained; a speech database is selected to recognize the speech input by the user, and recognition outputs are produced as results; domain judgment is used to select one or more candidate optimal recognition outputs from the recognition outputs; and the optimal recognition output among the one or more candidate optimal recognition outputs is determined using the personal identification information of the user as a decision condition. With the speech recognition method of the above technical schemes, the accuracy of speech recognition can be improved without increasing response time.
Description
Technical field
The present invention relates to the field of speech recognition, and in particular to a speech recognition method, apparatus and system.
Background technology
With the spread of smart devices, speech recognition systems have become a new means of applying information; at the same time, intelligent control of devices can be achieved through speech recognition systems.
In the use of speech recognition systems, user experience has become the focus of many systems. For speech recognition applications, response time and judgment accuracy are the core of improving user experience. Current judgment schemes mostly use a single specific data model to judge voice data; that is, a generic system is used to perform the judgment for all voice environments. Such a scheme inevitably increases the workload of speech recognition and lengthens the response and judgment time, thereby degrading the user experience.
In this field, a common automatic speech recognition (ASR) system recognizes speech input by means of a recognition engine. The engine model of a speech recognition system generally consists of two parts, an acoustic model and a language model, which correspond respectively to computing the probability from speech to syllables and from syllables to words. Language models are broadly divided into rule-based models and statistical models; the latter use probability and statistics to reveal the statistical regularities within linguistic units. The engine judges within a knowledge domain to complete the recognition output for the speech input.
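The two-part engine described above amounts to score combination: each candidate transcript receives an acoustic log-probability (speech to syllables) and a language-model log-probability (syllables to words), and candidates are ranked by their weighted sum. A minimal sketch with made-up scores; the `lm_weight` parameter and the candidate phrases are illustrative assumptions, not taken from the patent:

```python
def rank_hypotheses(candidates, lm_weight=0.8):
    """Rank candidate transcripts best-first by combined score.

    `candidates` maps each transcript to a pair
    (acoustic_log_prob, language_model_log_prob).
    """
    return sorted(
        candidates,
        key=lambda t: candidates[t][0] + lm_weight * candidates[t][1],
        reverse=True,
    )

# Made-up scores: acoustically the second guess is slightly better,
# but the language model strongly prefers the first.
scores = {
    "I want to listen to Shaoxing opera": (-12.5, -4.0),
    "I want to listen to Guangdong opera": (-12.0, -6.0),
}
```

With the language model included, the first transcript wins (-12.5 + 0.8 × (-4.0) = -15.7 beats -12.0 + 0.8 × (-6.0) = -16.8); with `lm_weight=0` the acoustic score alone would pick the second, which is why both parts of the engine matter.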
There are various ways to add specific user identifiers to a generic system so that speech judgment is performed over a narrower range, improving response time and judgment accuracy. A common approach in this field is to classify the databases set up for different dialects and accent forms, so that the speech input can be classified by the system at the initial judgment stage, achieving faster response times. The selection among these databases can incorporate a specific identification mark. This identification mark can come from the user side: it can be obtained by processing the user's speech input, or by other means such as the user's location information or the signal source of the user's mobile device. This information is fed into the ASR system as the user's identification information, assisting the selection of the user's data, improving response time, and reducing the error rate.
However, although the above schemes add the user's identification information, that information is used only as language-type or location input to help the system select the language database. While this reduces response time, it cannot exploit the identification information to produce a purposive output for the particular user in the final recognition result; that is, the recognition efficiency is not high.
A recognition method is therefore needed that can improve the recognition efficiency for the user while keeping the response time guaranteed.
Summary of the invention
To solve the above problems, embodiments of the present invention provide a speech recognition method, apparatus and system that improve the accuracy of speech recognition without increasing response time.
According to one aspect of the present invention, a speech recognition method is provided, including: obtaining a speech input of a user; selecting a speech database to recognize the speech input by the user, and outputting recognition outputs as results; using domain judgment to select one or more candidate optimal recognition outputs from the recognition outputs; and determining the optimal recognition output among the one or more candidate optimal recognition outputs using the personal identification information of the user as a decision condition.
According to another aspect of the present invention, a speech recognition apparatus is provided, including: a speech obtaining unit for obtaining the speech input of a user; a speech recognition unit for selecting a speech database to recognize the speech input by the user and outputting the recognition outputs as results; a first judgment unit for using domain judgment to select one or more candidate optimal recognition outputs from the recognition outputs; and a second judgment unit for determining the optimal recognition output among the one or more candidate optimal recognition outputs using the personal identification information of the user as a decision condition.
According to a third aspect of the present invention, a speech recognition system is provided, including: the above speech recognition apparatus; and a client device communicatively connected with the speech recognition apparatus.
The above schemes use the user's specific identification mark to perform a second-level judgment of the speech recognition results, and output that judgment result as the final result, achieving a multi-level recognition output. The newly added judgment stage takes the output of the domain judgment as its input, so only a small number of results need to be retained for the final judgment. The schemes therefore do not increase the load on the system, and can judge the output of speech recognition more accurately without reducing response time.
Brief description of the drawings
The above features and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a speech recognition method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method of performing speech recognition using the user's native-place information according to an embodiment of the present invention;
Fig. 3 is a flowchart of another speech recognition method according to an embodiment of the present invention;
Fig. 4 is a schematic block diagram of a speech recognition apparatus for implementing the speech recognition method according to an embodiment of the present invention; and
Fig. 5 is a schematic block diagram of a speech recognition system according to an embodiment of the present invention.
Detailed description of the invention
Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In the drawings, the same reference numerals are used to denote identical or similar components, even when they appear in different figures. For clarity and conciseness, detailed descriptions of well-known functions and structures are omitted here to avoid obscuring the subject matter of the present invention.
Fig. 1 shows a schematic flowchart of a speech recognition method according to an embodiment of the present invention.
As shown in Fig. 1, in step S01, the speech input of the user is obtained.
In some examples, a client device currently being used by the user (for example, a voice receiving unit of the client device, such as a microphone) obtains the user's speech input. A speech recognition apparatus communicatively connected with the client device can then obtain the speech input from the client device. The client device used by the user may be the user's mobile phone, fixed terminal, PDA (personal digital assistant), notebook computer, netbook, tablet computer, and so on, but the present invention is not limited to these: any mobile or non-mobile device that those skilled in the art can conceive of may be used as the client device.
The speech recognition apparatus described herein may in some implementations be referred to as a server, a cloud server, a remote terminal, and so on, but the present invention is likewise not so limited: the speech recognition apparatus in the present invention may be any device capable of realizing the technical scheme of the invention, regardless of whether it is mobile or non-mobile, and regardless of what it is called in a given implementation.
In some examples, the user's voice information can be read by a unit such as the microphone of the client device, converted into an electronic signal, and stored. For example, the user may perform speech input through the microphone system of an electronic device: "play musical drama", "play opera", "I want to listen to Shaoxing opera", and so on.
Moreover, in some examples, such as when the speech recognition apparatus is local to the user, the client device may be omitted, and the user may input speech directly at the speech recognition apparatus (for example, at its microphone).
In step S02, a speech database is selected to recognize the speech input by the user, and the recognition outputs are produced as results.
In some examples, the speech database to be used may be selected, and according to the selected speech database, the acoustic model and language model of the speech recognition engine are used to recognize the speech and output the recognition results.
In step S03, domain judgment is used to select one or more candidate optimal recognition outputs from the recognition outputs.
Domain judgment can select the most preferred candidates among the output results. The output may contain multiple candidate results; for example, the candidates may be "I want to listen to Shaoxing opera", "I want to listen to Guangdong opera", and so on. Of course, in some cases only one output result may be produced.
Optionally, in step S04, the personal identification information of the user is detected.
This step can be performed between step S03 and step S05, which is elaborated next, but the invention is not limited to this; this step may be performed at any time before step S05. For example, when the user has used the speech recognition apparatus multiple times, the personal identification information detected during the user's previous uses of the apparatus may be stored, and the stored personal identification information may be used in the current recognition.
The personal identification information may include, for example, the user's geographical location information, the current connection signal source of the mobile device used by the user, the user's native place, and any other information known to those skilled in the art that can identify the user in a personalized way. The user's geographical location information can be obtained in several ways, used individually or in combination. For example, it can be obtained from the IP address of the user's network connection: when the user uses an intelligent voice device connected to a cloud server, detection of the user's network information may yield the user's address as "Shaoxing, Zhejiang Province". It can also be determined from the base station associated with the user's mobile device, or obtained by locating the user through the GPS system of the user's mobile device. One of these acquisition modes can be used alone, or any combination of them can be used to avoid misjudgment (for example, when an Internet user uses a proxy server, it is difficult to judge the user's position from the network information).
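The multi-source acquisition just described can be sketched as a priority-ordered fallback; the probe functions and their ordering are illustrative assumptions, not part of the patent:

```python
def resolve_location(sources):
    """Return the first location any source yields, trying sources in
    priority order. Each source is a zero-argument callable returning a
    location string, or None when it cannot determine one (e.g. GPS
    unavailable, or an IP hidden behind a proxy)."""
    for probe in sources:
        location = probe()
        if location is not None:
            return location
    return None

# Hypothetical probes, most reliable first.
probes = [
    lambda: None,                  # GPS fix unavailable indoors
    lambda: "Shaoxing, Zhejiang",  # base-station lookup succeeds
    lambda: None,                  # IP lookup unreliable behind a proxy
]
```

Ordering GPS and base-station lookup ahead of IP geolocation is one way to realize the "combination to avoid misjudgment" idea: an unreliable source simply returns None and the next one is tried.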
In step S05, the optimal recognition output among the one or more candidate optimal recognition outputs is determined using the personal identification information of the user as the decision condition.
Using the user's personal identification information as the decision condition, the multiple candidate optimal recognition outputs are judged; through a further small-range retrieval, the best among the candidate optimal recognition outputs is determined. For example, suppose the candidate optimal recognition outputs determined in step S03 are "I want to listen to Guangdong opera" and "I want to listen to Shaoxing opera", and the user's geographical location obtained in step S04 is "Shaoxing, Zhejiang Province". Then, by using this information as the decision condition, a low-sample-size retrieval is performed over the candidate optimal recognition outputs determined in step S03, and the output result can be determined to be "I want to listen to Shaoxing opera".
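The second-level judgment of step S05 can be sketched as a small re-ranking over the surviving candidates; the affinity table mapping locations to favored phrases is a made-up illustration, not data from the patent:

```python
def pick_best(candidates, user_location, affinity):
    """Among the domain-judgment candidates, prefer the one containing a
    keyword associated with the user's location; fall back to the first
    candidate when no association matches."""
    favored = affinity.get(user_location, ())
    for cand in candidates:
        if any(keyword in cand for keyword in favored):
            return cand
    return candidates[0]

# Hypothetical location-to-genre associations.
affinity = {
    "Shaoxing, Zhejiang": ("Shaoxing opera",),
    "Guangzhou, Guangdong": ("Guangdong opera",),
}
candidates = ["I want to listen to Guangdong opera",
              "I want to listen to Shaoxing opera"]
```

Because the search runs only over the few candidates that survived the domain judgment, this second pass stays cheap, which is the point the patent makes about response time.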
Thus, the accuracy of recognition is improved through the association between the user's personal identification information and the recognition outputs. In the above method, step S05 performs the second judgment only within a small recognition range, so this judgment does not impose an excessive load on the overall response time. The above scheme therefore ensures that, with essentially no increase in response time, the recognition rate of the user's speech input is improved, yielding a better user experience.
In another example, suppose the user inputs "I want to listen to Hongyan" through the intelligent voice system. The system's judgment may find that the higher-probability combinations are "swan goose" (Hongyan) and "red gorgeous" (Hongyan), two homophonous interpretations, and both can be output as candidate optimal combinations. In the final selection, the database adds the personalized identifier to perform the system judgment; according to the different personalized identifiers acquired, different results are ultimately output. This greatly improves the user experience and precisely identifies the user's request.
When the user's input itself clearly indicates, for example, a geographical location, the small-sample retrieval can be avoided after the multiple candidate optimal results are judged for output: the geographical information recognized from the user's input is used directly as the judgment information and compared with the multiple optimal solutions, so that the output result is obtained faster. For example, if the user inputs "Chaoyang weather", then among the multiple Chaoyang districts in the output, the selection is made by the geographical information identifier recognized from the user's input. This approach simplifies the recognition mode, but it is applicable only under the condition that the multiple optimal solutions all point to the same kind of personal identification information (such as geographical information).
It is also possible that only one candidate recognition output is produced in step S03. In that case, the processing of step S05 can be bypassed. In some other examples, however, step S05 can still be used to determine whether the single candidate recognition output of step S03 is suitable, to discard a clearly unsuitable recognition output, and to prompt the user to input speech again.
After the optimal recognition output is determined, in step S06, the optimal recognition output is output. The output mode adopted here may include, but is not limited to, sound, image, text, or any other mode of outputting information used in this field; the present invention places no limitation on this.
The description of the technical scheme above uses the user's geographical location information as the example of personal identification information, but other personal identification information can also be used, for example the user's native-place information.
When the user's native-place information is used, the dialect and accent of the user's speech input can be judged so as to determine the user's native-place information. Fig. 2 provides a flowchart of a method of performing speech recognition using the user's native-place information according to an embodiment of the present invention.
When the user's speech input is obtained in step S01 shown in Fig. 2, the dialect and/or accent attributes of the user can be recognized from the acquired user speech, so as to judge the user's native-place information (step S07).
After the native-place information is obtained, it is used in step S05 as the user's personal identification information to judge the optimal output result.
For example, in step S02 the speech recognition system can judge the dialect attribute of the speech; the judgment result may be, for instance, "Zhejiang dialect". Then, in step S05, the "Zhejiang dialect" attribute can be used as the decision condition to further judge the multiple candidate optimal results selected in step S03. For example, when the candidate optimal results to be judged are "I want to listen to Shaoxing opera" and "I want to listen to Guangdong opera", the decision condition "Zhejiang dialect" determines the final output result to be "I want to listen to Shaoxing opera".
Performing the judgment with native-place information as the user's personal identification information can avoid the misjudgment caused by obtaining the geographical information identifier through device association — for example, the error that might arise when a user whose native place is Zhejiang is currently in Guangdong while using the above speech recognition apparatus.
The cases of using geographical location information and native-place information as the user's personal identification information have been described above with reference to Fig. 1 and Fig. 2, respectively. In some examples, however, the two cases can be combined to obtain a more accurate judgment result. For example, the user's native-place information and geographical location information can be combined and used together as the personal identification information.
A third embodiment combines the above two embodiments: the judgment based on native-place information and the judgment based on geographical location information are both performed, the two judgment results are compared, and the comparison result is used as the judgment identification information in step S05. For example, when the two judgment results are identical (e.g., both are Zhejiang), that result is used as the judgment identification information. In other embodiments, if the two judgment results differ, a higher priority can be given to the native-place judgment or the geographical-location judgment according to a default or a user setting. In still other embodiments, when more kinds of personal identification information are available, they can be combined in the judgment: for example, different weights are assigned to the different identification information, and the judgment result with the highest total score is selected. The technical scheme of the present invention can adopt any other method of judging over multiple kinds of personal identification information that will readily occur to those skilled in the art, which is not repeated here.
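The weighted combination mentioned above can be sketched as a score vote; the signal names and weights below are illustrative assumptions, not values from the patent:

```python
from collections import defaultdict

def weighted_decision(signals):
    """Each signal is a (judged_value, weight) pair; return the value
    with the highest total weight, e.g. 'Zhejiang' vs 'Guangdong'."""
    totals = defaultdict(float)
    for value, weight in signals:
        totals[value] += weight
    return max(totals, key=totals.get)

# Hypothetical signals: the dialect judgment and a stored native-place
# record both say Zhejiang, while the current position says Guangdong.
signals = [
    ("Zhejiang", 0.6),   # dialect/accent judgment
    ("Guangdong", 0.3),  # current geographical position
    ("Zhejiang", 0.5),   # stored native-place record
]
```

Here the two Zhejiang signals outvote the current position (1.1 vs 0.3), matching the patent's example of a Zhejiang native temporarily located in Guangdong.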
The above examples use the user's personal identification information only in the judgment of step S05, but in some examples the user's personal identification information can also be used in the speech recognition of step S02. Fig. 3 shows a flowchart of another speech recognition method according to an embodiment of the present invention.
As shown in Fig. 3, in step S01, the speech input of the user is obtained.
In a subsequent step, the personal identification information of the user is detected. For example, the user's native-place information can be detected from the user's speech input, or the user's geographical location information can be detected by other means; the present invention places no limitation on this. As noted above, this detection step can be performed at any time before the personal identification information is used (in this example, before step S02). In some cases, previously obtained and stored personal identification information can even be used.
Then, in the speech recognition step S02, the personal identification information is used as the criterion for database selection, to speed up the speech recognition. In the subsequent steps, data recognition proceeds in the same way as before, and in step S05 the personal identification information is used again to perform the small-sample judgment and finally obtain the output data precisely.
The personal identification information is thus used twice in the above example. Its first use acts on the selection of the speech judgment database (for example, a specific geographical information identifier selects the database used in speech recognition), and its second use selects the appropriate judgment output among the candidate optimal results. Even when a suitable speech database has been selected, inappropriate output information can still appear according to the probability combinations; therefore, the user's personal identification information (such as native-place information, or the geographical information identifier described above) can be used to screen the optimal results.
Fig. 4 is a schematic block diagram of a speech recognition apparatus for implementing the above speech recognition method according to an embodiment of the present invention. As shown in Fig. 4, the speech recognition apparatus may include: a speech obtaining unit 410 for obtaining the speech input of the user; a speech recognition unit 420 for selecting a speech database to recognize the speech input by the user and outputting the recognition outputs as results; a first judgment unit 430 for using domain judgment to select one or more candidate optimal recognition outputs from the recognition outputs; and a second judgment unit 440 for determining the optimal recognition output among the one or more candidate optimal recognition outputs using the user's personal identification information as the decision condition.
In some examples, the speech recognition unit 420 is further operable to recognize the speech input by the user according to the selected speech database, using the acoustic model and language model of the speech recognition engine.
In some examples, the speech recognition apparatus may further include an information detection unit 450 for detecting the personal identification information of the user.
In some examples, the speech recognition apparatus may further include a memory 460 for storing the personal identification information detected by the information detection unit 450. The memory may also store any data used by the speech recognition apparatus when performing speech recognition, such as the speech databases described above; the present invention places no limitation on this.
The personal identification information of the user described in the present invention may include one or more of the user's geographical location information, the current connection signal source of the mobile device used by the user, and the user's native place. As noted above, the personal identification information in the present invention is not limited to these, and may be any information known in this field that identifies a user in a personalized way.
In some examples, the information detection unit 450 is further operable to obtain the user's native place by recognizing the dialect and/or accent attributes of the user while recognizing the speech input by the user.
In some examples, the speech recognition unit 420 is further operable to select the speech database for speech recognition using the user's personal identification information.
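The unit structure of Fig. 4 can be sketched as a class whose members mirror units 410–460; all of the internals below are illustrative stand-ins, not the patent's implementation:

```python
class SpeechRecognitionApparatus:
    """Sketch of the Fig. 4 units: recognition (420), domain judgment
    (430), personal-info judgment (440), and a memory (460) that can
    hold previously detected personal info (450)."""

    def __init__(self, database, domain_filter, personal_judge):
        self.database = database              # unit 420's database
        self.domain_filter = domain_filter    # unit 430
        self.personal_judge = personal_judge  # unit 440
        self.memory = {}                      # memory 460

    def run(self, speech, user_id):
        info = self.memory.get(user_id)       # reuse stored info, if any
        outputs = self.database(speech)       # unit 420
        candidates = self.domain_filter(outputs)
        return self.personal_judge(candidates, info)

apparatus = SpeechRecognitionApparatus(
    database=lambda s: ["opera A", "opera B"],
    domain_filter=lambda outs: outs,
    personal_judge=lambda cands, info: next(
        (c for c in cands if info and info in c), cands[0]),
)
apparatus.memory["u1"] = "B"  # info detected on a previous use
```

With stored info for user "u1" the second judgment picks "opera B"; for an unknown user it falls back to the first domain-judgment candidate.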
The speech recognition apparatus according to embodiments of the present invention has been described above in the form of a schematic block diagram of modules/units. It should be noted, however, that one or more of these modules/units can be realized by one or more pieces of specific hardware. Moreover, Fig. 4 is merely a schematic block diagram used to explain the technical scheme of the present invention; an actual realization may include more or fewer modules/units. For example, some implementations may also include an output device for outputting information, such as a speaker or a display, and some implementations may also include various storage devices for storing the data/programs required by, or produced in, realizing the technical scheme of the present invention. The present invention is not limited in this respect.
Fig. 5 shows a schematic block diagram of a speech recognition system according to an embodiment of the present invention. As shown in Fig. 5, the speech recognition system includes the speech recognition cloud server (or simply the speech recognition apparatus) shown in Fig. 4 and a client intelligent voice device (or simply the client device) communicatively connected with the speech recognition apparatus. As noted above, when the user is co-located with the speech recognition apparatus, the client device may be omitted, and the user can input speech directly at the speech recognition apparatus.
The speech recognition processing of the speech recognition apparatus shown in Fig. 5 is identical to the processes described with reference to Figs. 1, 2 and 3, and is not repeated here.
Furthermore, it should be noted that the technical schemes described in the embodiments of the present invention can be combined in any manner as long as they do not conflict.
In the several embodiments provided by the present invention, it should be understood that the disclosed method and apparatus may be realized in other ways. The apparatus embodiments described above are only schematic; for example, the division of the units is only a division by logical function, and other divisions are possible in an actual realization: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, the functional units in the various embodiments of the present invention may all be integrated into one processing unit, or each unit may serve as a unit individually, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The above description is provided only for implementing the embodiments of the present invention. Those skilled in the art should understand that any modification or partial replacement that does not depart from the scope of the present invention shall fall within the scope defined by the claims; therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (13)
1. A speech recognition method, comprising:
obtaining a voice input of a user;
selecting a speech database to recognize the voice input by the user, and outputting resulting recognition outputs;
using domain judgment to select one or more candidate optimal recognition outputs from the recognition outputs; and
using personalized identification information of the user as a decision condition to determine an optimal recognition output among the one or more candidate optimal recognition outputs.
2. The speech recognition method according to claim 1, wherein said selecting a speech database to recognize the voice input by the user comprises:
recognizing the voice input by the user, according to the selected speech database, by using an acoustic model and a language model of a speech recognition engine.
3. The speech recognition method according to claim 1, further comprising:
detecting the personalized identification information of the user.
4. The speech recognition method according to claim 3, wherein the personalized identification information of the user includes one or more of: geographical location information of the user, a signal source to which a mobile device used by the user is currently connected, and a native place of the user.
5. The speech recognition method according to claim 4, wherein the native place of the user is obtained by identifying a dialect and/or accent attribute of the user during the recognition of the voice input by the user.
6. The speech recognition method according to claim 1, further comprising:
using the personalized identification information of the user to select the speech database for speech recognition.
7. A speech recognition apparatus, comprising:
a voice obtaining unit configured to obtain a voice input of a user;
a voice recognition unit configured to select a speech database to recognize the voice input by the user and to output resulting recognition outputs;
a first judging unit configured to use domain judgment to select one or more candidate optimal recognition outputs from the recognition outputs; and
a second judging unit configured to use personalized identification information of the user as a decision condition to determine an optimal recognition output among the one or more candidate optimal recognition outputs.
8. The speech recognition apparatus according to claim 7, wherein the voice recognition unit is further configured to:
recognize the voice input by the user, according to the selected speech database, by using an acoustic model and a language model of a speech recognition engine.
9. The speech recognition apparatus according to claim 7, further comprising:
an information detecting unit configured to detect the personalized identification information of the user.
10. The speech recognition apparatus according to claim 9, wherein the personalized identification information of the user includes one or more of: geographical location information of the user, a signal source to which a mobile device used by the user is currently connected, and a native place of the user.
11. The speech recognition apparatus according to claim 10, wherein the information detecting unit is further configured to obtain the native place of the user by identifying a dialect and/or accent attribute of the user during the recognition of the voice input by the user.
12. The speech recognition apparatus according to claim 7, wherein the voice recognition unit is further configured to:
use the personalized identification information of the user to select the speech database for speech recognition.
13. A speech recognition system, comprising:
the speech recognition apparatus according to any one of claims 7 to 12; and
a client device communicatively connected with the speech recognition apparatus.
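Not part of the patent text: a minimal sketch, under stated assumptions, of the four steps of claim 1 — obtain voice input, recognize against a selected speech database, narrow the hypotheses by domain judgment, then decide the single optimal output using the user's personalized identification information. The helper names, example hypotheses, keyword lists and scoring heuristic are all hypothetical, not the patented implementation.

```python
# Illustrative pipeline for claim 1; every name and heuristic here is invented.

def recognize_with_database(audio, database):
    """Step 2: produce recognition hypotheses from the selected speech database.
    A real engine would apply the database's acoustic and language models (claim 2)."""
    return [("play Peking opera", 0.40),
            ("play Beijing opera excerpts", 0.35),
            ("play baking program", 0.25)]

def domain_filter(hypotheses, active_domain):
    """Step 3: domain judgment keeps the candidate optimal recognition outputs."""
    domain_keywords = {"music": ("opera", "song"),
                       "navigation": ("traffic", "route")}
    keywords = domain_keywords[active_domain]
    return [h for h in hypotheses if any(k in h[0] for k in keywords)]

def personalized_decision(candidates, profile):
    """Step 4: personalized identification information (e.g. native place,
    location) acts as the decision condition for the single optimal output."""
    def score(hyp):
        text, engine_score = hyp
        # Hypothetical heuristic: boost hypotheses matching the user's native place.
        bonus = 0.2 if profile.get("native_place", "") in text else 0.0
        return engine_score + bonus
    return max(candidates, key=score)[0]

profile = {"native_place": "Beijing", "location": "Haidian"}   # step 1 context
hyps = recognize_with_database(b"...", database="mandarin")     # step 2
candidates = domain_filter(hyps, active_domain="music")         # step 3
best = personalized_decision(candidates, profile)               # step 4
```

In this toy run the domain filter keeps two "opera" candidates, and the native-place bonus flips the decision toward the Beijing-related hypothesis even though the engine scored it lower, which is the role claim 1 assigns to the personalized decision condition.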
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610375073.8A CN105931642B (en) | 2016-05-31 | 2016-05-31 | Voice recognition method, device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105931642A true CN105931642A (en) | 2016-09-07 |
CN105931642B CN105931642B (en) | 2020-11-10 |
Family
ID=56832830
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610375073.8A Active CN105931642B (en) | 2016-05-31 | 2016-05-31 | Voice recognition method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105931642B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107464115A (en) * | 2017-07-20 | 2017-12-12 | 北京小米移动软件有限公司 | personal characteristic information verification method and device |
CN107785021A (en) * | 2017-08-02 | 2018-03-09 | 上海壹账通金融科技有限公司 | Pronunciation inputting method, device, computer equipment and medium |
CN108206020A (en) * | 2016-12-16 | 2018-06-26 | 北京智能管家科技有限公司 | A kind of audio recognition method, device and terminal device |
CN109101475A (en) * | 2017-06-20 | 2018-12-28 | 北京嘀嘀无限科技发展有限公司 | Trip audio recognition method, system and computer equipment |
CN110517660A (en) * | 2019-08-22 | 2019-11-29 | 珠海格力电器股份有限公司 | Noise reduction method and device based on embedded Linux real-time kernel |
CN111353091A (en) * | 2018-12-24 | 2020-06-30 | 北京三星通信技术研究有限公司 | Information processing method and device, electronic equipment and readable storage medium |
CN112363631A (en) * | 2019-07-24 | 2021-02-12 | 北京搜狗科技发展有限公司 | Input method, input device and input device |
US11302313B2 (en) | 2017-06-15 | 2022-04-12 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for speech recognition |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1842842A (en) * | 2003-08-29 | 2006-10-04 | 松下电器产业株式会社 | Method and apparatus for improved speech recognition with supplementary information |
CN103037117A (en) * | 2011-09-29 | 2013-04-10 | 中国电信股份有限公司 | Method and system of voice recognition and voice access platform |
CN103578469A (en) * | 2012-08-08 | 2014-02-12 | 百度在线网络技术(北京)有限公司 | Method and device for showing voice recognition result |
CN103903611A (en) * | 2012-12-24 | 2014-07-02 | 联想(北京)有限公司 | Speech information identifying method and equipment |
CN103956169A (en) * | 2014-04-17 | 2014-07-30 | 北京搜狗科技发展有限公司 | Speech input method, device and system |
CN105070288A (en) * | 2015-07-02 | 2015-11-18 | 百度在线网络技术(北京)有限公司 | Vehicle-mounted voice instruction recognition method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104836720B (en) * | 2014-02-12 | 2022-02-25 | 北京三星通信技术研究有限公司 | Method and device for information recommendation in interactive communication |
- 2016-05-31: CN application CN201610375073.8A filed; granted as CN105931642B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN105931642B (en) | 2020-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105931642A (en) | Speech recognition method, apparatus and system | |
CN107210033B (en) | Updating language understanding classifier models for digital personal assistants based on crowd sourcing | |
US20230072352A1 (en) | Speech Recognition Method and Apparatus, Terminal, and Storage Medium | |
US11600259B2 (en) | Voice synthesis method, apparatus, device and storage medium | |
US10032454B2 (en) | Speaker and call characteristic sensitive open voice search | |
CN102782751B (en) | Digital media voice tags in social networks | |
CN103974109B (en) | Speech recognition apparatus and for providing the method for response message | |
CN110415679A (en) | Voice error correction method, device, equipment and storage medium | |
CN108447471A (en) | Audio recognition method and speech recognition equipment | |
CN104335160A (en) | Function execution instruction system, function execution instruction method, and function execution instruction program | |
CN112530408A (en) | Method, apparatus, electronic device, and medium for recognizing speech | |
CN103474065A (en) | Method for determining and recognizing voice intentions based on automatic classification technology | |
CN106959999A (en) | Voice search method and device | |
CN111128134A (en) | Acoustic model training method, voice awakening method, device and electronic equipment | |
WO2022135496A1 (en) | Voice interaction data processing method and device | |
CN104252464A (en) | Information processing method and information processing device | |
CN106649410B (en) | Method and device for obtaining chat reply content | |
CN102497481A (en) | Method, device and system for voice dialing | |
CN109710949A (en) | A kind of interpretation method and translator | |
CN105609105A (en) | Speech recognition system and speech recognition method | |
CN110085217A (en) | Phonetic navigation method, device and terminal device | |
KR102312993B1 (en) | Method and apparatus for implementing interactive message using artificial neural network | |
CN107885845B (en) | Audio classification method and device, computer equipment and storage medium | |
CN109637536A (en) | A kind of method and device of automatic identification semantic accuracy | |
CN111625636A (en) | Man-machine conversation refusal identification method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20190312. Address after: 8th Floor, 76 Zhichun Road, Haidian District, Beijing 100086. Applicants after: Beijing Jingdong Shangke Information Technology Co., Ltd.; iFlytek Co., Ltd. Address before: Room C-301, 3rd Floor, Building 2, 20 Suzhou Street, Haidian District, Beijing 100080. Applicant before: Beijing Linglong Technology Co., Ltd. |
GR01 | Patent grant | |