CN106157956A - Method and device for speech recognition - Google Patents
Method and device for speech recognition
- Publication number
- CN106157956A CN106157956A CN201510130636.2A CN201510130636A CN106157956A CN 106157956 A CN106157956 A CN 106157956A CN 201510130636 A CN201510130636 A CN 201510130636A CN 106157956 A CN106157956 A CN 106157956A
- Authority
- CN
- China
- Prior art keywords
- vocabulary
- user
- information
- candidate
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
Abstract
The invention discloses a method and device for speech recognition. The method obtains speech recognition information for a user's current speech, and obtains auxiliary recognition information for that speech recognition information based on a user current state corresponding to the user's current speech; a final recognition result for the user's current speech is then determined according to the speech recognition information together with the auxiliary recognition information. The invention thereby solves the problem in the related art that obtaining the user's speech content only through the user's voice leads to low speech recognition accuracy, and accordingly improves the accuracy of speech recognition.
Description
Technical field
The present invention relates to the field of communications, and in particular to a method and device for speech recognition.
Background art
With the development of computers and related software and hardware, speech recognition technology is being applied in more and more fields, and its recognition rate is continually improving. Under specific conditions such as a quiet environment and standard pronunciation, dictation input systems currently used in speech recognition have reached recognition rates above 95%. Conventional speech recognition technology is relatively mature; for speech recognition on mobile terminals, however, voice quality is comparatively poor relative to normal speech recognition scenarios, so recognition performance is limited. Poor voice quality here has several causes: background noise on the client side, noise from the client's voice capture and speaking equipment, noise and interference on the communication line, and the speaker having an accent, using a dialect, or speaking indistinctly or unclearly. All of these factors may degrade speech recognition performance.
The recognition rate is affected by many factors, and for the problem in the related art that a low speech recognition rate leads to a poor user experience, no effective solution has yet been proposed. In a vehicle, in heavy noise, or with nonstandard pronunciation, the recognition rate drops sharply, to the point where genuinely practical use cannot be achieved; the low correct-recognition rate impairs accurate control, and the result is unsatisfactory. If other methods could be used for auxiliary judgment to improve the accuracy of speech recognition, its practicality would improve significantly.
Human speech understanding is a multichannel perception process. In everyday person-to-person communication, the content of another person's speech is perceived through sound; when the environment is noisy or the other party's pronunciation is indistinct, one also needs to observe the mouth shape and changes of expression with the eyes in order to understand exactly what the other party said. Existing speech recognition systems ignore this visual aspect of speech perception and rely solely on auditory features, so that under noisy conditions or with multiple speakers their recognition rate drops markedly, reducing the practicality of speech recognition and restricting its range of application.
For the problem in the related art that obtaining the user's speech content only through the user's voice leads to low speech recognition accuracy, no effective solution has yet been proposed.
Summary of the invention
The invention provides a method and device for speech recognition, at least to solve the problem in the related art that obtaining the user's speech content only through the user's voice leads to low speech recognition accuracy.
According to one aspect of the invention, a method of speech recognition is provided, including: obtaining speech recognition information for a user's current speech, and obtaining auxiliary recognition information for the speech recognition information based on a user current state corresponding to the user's current speech; and determining a final recognition result for the user's current speech according to the speech recognition information and the auxiliary recognition information.
Further, determining the final recognition result for the user's current speech according to the speech recognition information and the auxiliary recognition information includes: obtaining, according to the speech recognition information, one or more first candidate words corresponding to the user's current speech; obtaining, according to the auxiliary recognition information, a vocabulary category corresponding to the user's current speech, or one or more second candidate words; and determining the final recognition result for the user's current speech according to the one or more first candidate words and the vocabulary category, or according to the one or more first candidate words and the one or more second candidate words.
Further, determining the final recognition result for the user's current speech according to the one or more first candidate words and the vocabulary category includes: selecting, from the one or more first candidate words, a first specific word matching the vocabulary category, and taking the first specific word as the final recognition result for the user's current speech.
Further, determining the final recognition result for the user's current speech according to the one or more first candidate words and the one or more second candidate words includes: selecting, from the one or more second candidate words, a second specific word with high similarity to the one or more first candidate words, and taking the second specific word as the final recognition result for the user's current speech.
Further, obtaining the auxiliary recognition information for the speech recognition information based on the user current state corresponding to the user's current speech includes: obtaining an image indicating the user current state; obtaining image feature information from the image; and obtaining, according to the image feature information, a vocabulary category and/or one or more candidate words corresponding to the image feature information, and taking the vocabulary category and/or the one or more candidate words as the auxiliary recognition information.
Further, obtaining the vocabulary category and/or the one or more candidate words corresponding to the image feature information according to the image feature information includes: searching a predetermined image library for the specific image with the highest similarity to the image feature information; and obtaining, according to a preset correspondence between images and vocabulary categories or one or more candidate words, the vocabulary category or the one or more candidate words corresponding to the specific image.
Further, the user current state includes at least one of the following: the lip movement state of the user, the laryngeal vibration state of the user, the facial movement state of the user, and the gesture movement state of the user.
Further, before obtaining the speech recognition information for the user's current speech and obtaining the auxiliary recognition information for the speech recognition information based on the user current state corresponding to the user's current speech, the method includes: judging that the accuracy of the final recognition result for the user's current speech determined from the speech recognition information alone is below a predetermined threshold.
According to another aspect of the invention, a device for speech recognition is provided, the device including: an acquisition module, configured to obtain speech recognition information for a user's current speech and to obtain auxiliary recognition information for the speech recognition information based on a user current state corresponding to the user's current speech; and a determination module, configured to determine a final recognition result for the user's current speech according to the speech recognition information and the auxiliary recognition information.
Further, the determination module includes: a first acquiring unit, configured to obtain, according to the speech recognition information, one or more first candidate words corresponding to the user's current speech; a second acquiring unit, configured to obtain, according to the auxiliary recognition information, a vocabulary category corresponding to the user's current speech or one or more second candidate words; and a determining unit, configured to determine the final recognition result for the user's current speech according to the one or more first candidate words and the vocabulary category, or according to the one or more first candidate words and the one or more second candidate words.
Further, the determining unit is further configured to select, from the one or more first candidate words, a first specific word matching the vocabulary category, and to take the first specific word as the final recognition result for the user's current speech.
Further, the determining unit is further configured to select, from the one or more second candidate words, a second specific word with high similarity to the one or more first candidate words, and to take the second specific word as the final recognition result for the user's current speech.
Further, the acquisition module also includes: a third acquiring unit, configured to obtain an image indicating the user current state; a fourth acquiring unit, configured to obtain image feature information from the image; and a fifth acquiring unit, configured to obtain, according to the image feature information, a vocabulary category and/or one or more candidate words corresponding to the image feature information, and to take the vocabulary category and/or the one or more candidate words as the auxiliary recognition information.
Further, the fifth acquiring unit also includes: a searching subunit, configured to search a predetermined image library for the specific image with the highest similarity to the image feature information; and an obtaining subunit, configured to obtain, according to a preset correspondence between images and vocabulary categories or one or more candidate words, the vocabulary category or the one or more candidate words corresponding to the specific image.
Further, the user current state includes at least one of the following: the lip movement state of the user, the laryngeal vibration state of the user, the facial movement state of the user, and the gesture movement state of the user.
Further, the device also includes: a judgment module, configured to judge that the accuracy of the final recognition result for the user's current speech determined from the speech recognition information is below a predetermined threshold.
According to another aspect of the invention, a terminal is also provided, including a processor, the processor being configured to obtain speech recognition information for a user's current speech, obtain auxiliary recognition information for the speech recognition information based on a user current state corresponding to the user's current speech, and determine a final recognition result for the user's current speech according to the speech recognition information and the auxiliary recognition information.
Through the present invention, speech recognition information for the user's current speech is obtained, auxiliary recognition information for that speech recognition information is obtained based on the user current state corresponding to the user's current speech, and the final recognition result for the user's current speech is determined according to the speech recognition information and the auxiliary recognition information. This solves the problem in the related art that obtaining the user's speech content only through the user's voice leads to low speech recognition accuracy, and thereby improves the accuracy of speech recognition.
Brief description of the drawings
The accompanying drawings described here are provided for further understanding of the present invention and constitute a part of the application; the schematic embodiments of the present invention and their description are used to explain the present invention and do not constitute an undue limitation of it. In the drawings:
Fig. 1 is a flowchart of the speech recognition method according to an embodiment of the present invention;
Fig. 2 is a structural block diagram of the speech recognition device according to an embodiment of the present invention;
Fig. 3 is a structural block diagram (1) of the speech recognition device according to an embodiment of the present invention;
Fig. 4 is a structural block diagram (2) of the speech recognition device according to an embodiment of the present invention;
Fig. 5 is a structural block diagram (3) of the speech recognition device according to an embodiment of the present invention;
Fig. 6 is a structural block diagram (4) of the speech recognition device according to an embodiment of the present invention;
Fig. 7 is a flowchart of the speech recognition processing method according to an embodiment of the present invention;
Fig. 8 is a structural block diagram of the speech recognition processing device according to an embodiment of the present invention;
Fig. 9 is a speech recognition processing flowchart according to an embodiment of the present invention.
Detailed description of the invention
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments in the application and the features in the embodiments can be combined with each other.
This embodiment provides a method of speech recognition. Fig. 1 is a flowchart of the speech recognition method according to an embodiment of the present invention; as shown in Fig. 1, the flow comprises the following steps:
Step S102: obtain speech recognition information for the user's current speech, and obtain auxiliary recognition information for the speech recognition information based on the user current state corresponding to the user's current speech;
Step S104: determine the final recognition result for the user's current speech according to the speech recognition information and the auxiliary recognition information.
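As a rough illustration of steps S102 and S104, the sketch below combines toy acoustic candidates with a vocabulary category inferred from the user's state. All function names, words, scores, and category mappings here are invented for illustration; a real system would use acoustic and visual recognition models.

```python
def get_voice_recognition_info(audio):
    """Toy stand-in for the acoustic recognizer: candidate words with scores."""
    return [("mail", 0.41), ("nail", 0.39), ("rail", 0.20)]

def get_auxiliary_info(user_state):
    """Toy stand-in for the auxiliary channel: map an observed state to a category."""
    categories = {"typing_gesture": "communication", "hammering_gesture": "tools"}
    return categories.get(user_state)

# Invented word-to-category mapping used to filter acoustic candidates.
WORD_CATEGORIES = {"mail": "communication", "nail": "tools", "rail": "transport"}

def final_recognition(audio, user_state):
    candidates = get_voice_recognition_info(audio)   # step S102: acoustic channel
    category = get_auxiliary_info(user_state)        # step S102: auxiliary channel
    if category is not None:                         # step S104: combine the two
        matching = [(w, s) for w, s in candidates
                    if WORD_CATEGORIES.get(w) == category]
        if matching:
            candidates = matching
    return max(candidates, key=lambda c: c[1])[0]
```

With these invented inputs, a hammering gesture steers the near-tie between the acoustic candidates toward "nail" rather than "mail".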
Through the above steps, speech recognition information for the user's current speech is obtained together with characteristic information about the user's state while the speech is being produced, and the state characteristic information serves as auxiliary information for recognizing the current speech. Compared with the prior art, in which recognition accuracy from the user's current speech alone is relatively low, the above steps solve the problem in the related art that obtaining the user's speech content only through the user's voice leads to low speech recognition accuracy, and thereby improve the accuracy of speech recognition.
Step S104 above involves determining the final recognition result for the user's current speech according to the speech recognition information and the auxiliary recognition information. In an alternative embodiment, one or more first candidate words corresponding to the user's current speech are obtained according to the speech recognition information; a vocabulary category corresponding to the user's current speech, or one or more second candidate words, are obtained according to the auxiliary recognition information; and the final recognition result for the user's current speech is determined according to the one or more first candidate words and the vocabulary category, or according to the one or more first candidate words and the one or more second candidate words.
The final recognition result for the user's current speech can be determined from the one or more first candidate words and the vocabulary category in a variety of ways. In one alternative embodiment, a first specific word matching the vocabulary category is selected from the one or more first candidate words, and the first specific word is taken as the final recognition result for the user's current speech. In another alternative embodiment, a second specific word with high similarity to the one or more first candidate words is selected from the one or more second candidate words, and the second specific word is taken as the final recognition result for the user's current speech.
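The similarity-based variant above can be sketched with a plain string-similarity measure. The patent does not specify which similarity is used, so Python's difflib stands in for it here, and both candidate lists are invented:

```python
from difflib import SequenceMatcher

def pick_by_similarity(first_candidates, second_candidates):
    """Pick the second-channel word most similar to any first-channel word."""
    def best_score(word):
        return max(SequenceMatcher(None, word, c).ratio()
                   for c in first_candidates)
    return max(second_candidates, key=best_score)

# Acoustic candidates vs. (e.g.) lip-reading candidates -- invented data.
final = pick_by_similarity(["bat", "pat"], ["bath", "cat"])  # "bath"
```

Any other similarity (phonetic distance, embedding distance) could be substituted without changing the selection structure.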
In the process of determining the final recognition result for the user's current speech according to the one or more first candidate words and the one or more second candidate words, in one alternative embodiment, an image indicating the user current state is first obtained, image feature information is then obtained from the image, and a vocabulary category and/or one or more candidate words corresponding to the image feature information are then obtained according to the image feature information, with the vocabulary category and/or the one or more candidate words serving as the auxiliary recognition information.
In one alternative embodiment, a predetermined image library is searched for the specific image with the highest similarity to the image feature information, and the vocabulary category or one or more candidate words corresponding to the specific image are obtained according to a preset correspondence between images and vocabulary categories or one or more candidate words. In this way, the vocabulary category and/or one or more candidate words corresponding to the image feature information are obtained from the image feature information.
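A minimal sketch of this image-library lookup follows. The feature vectors and the image-to-words mapping are invented stand-ins for a real template library, and cosine similarity stands in for whatever similarity measure an implementation would choose:

```python
import math

# Invented feature vectors standing in for extracted lip/face image features,
# and an invented mapping standing in for the preset image-to-words correspondence.
IMAGE_LIBRARY = {
    "open_mouth":  [0.9, 0.1, 0.0],
    "closed_lips": [0.1, 0.8, 0.2],
}
IMAGE_TO_WORDS = {"open_mouth": ["ah", "car"], "closed_lips": ["mm", "bomb"]}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def most_similar_image(features):
    """Search the predetermined library for the image closest to the observed features."""
    return max(IMAGE_LIBRARY, key=lambda name: cosine(features, IMAGE_LIBRARY[name]))

def auxiliary_candidates(features):
    """Map the best-matching library image to its candidate words."""
    return IMAGE_TO_WORDS[most_similar_image(features)]
```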
The user current state may include several aspects, illustrated below. In one alternative embodiment, it includes the lip movement state of the user, the laryngeal vibration state of the user, the facial movement state of the user, and the gesture movement state of the user. The information included in the user's current-state features above is only illustrative and is not limiting. For example, in real life the content of a speaker's words can be recognized through lip reading alone; lip reading is therefore an important auxiliary factor for recognizing speech.
In one alternative embodiment, before the speech recognition information for the user's current speech is obtained and the auxiliary recognition information for the speech recognition information is obtained based on the user current state corresponding to the user's current speech, it is judged that the accuracy of the final recognition result for the user's current speech determined from the speech recognition information alone is below a predetermined threshold.
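This confidence-gated triggering of the auxiliary channel can be sketched as follows; the threshold value and the callable interface are assumptions for illustration, not specified by the patent:

```python
def recognize_with_fallback(asr_result, confidence, auxiliary_recognizer,
                            threshold=0.8):
    """Invoke the auxiliary channel only when acoustic confidence is below threshold."""
    if confidence >= threshold:
        return asr_result                      # acoustic result trusted as-is
    return auxiliary_recognizer(asr_result)    # low confidence: refine with auxiliary cues
```

This keeps the cost of the visual pipeline off the common path: camera capture and image matching run only when the acoustic recognizer is uncertain.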
This embodiment also provides a device for speech recognition; the device is used to realize the above embodiments and preferred implementations, and what has already been explained is not repeated. As used below, the term "module" can realize a combination of software and/or hardware with predetermined functions. Although the device described in the following embodiments is preferably realized in software, realization in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 2 is a structural block diagram of the speech recognition device according to an embodiment of the present invention. As shown in Fig. 2, the device includes: an acquisition module 22, configured to obtain speech recognition information for the user's current speech and to obtain auxiliary recognition information for the speech recognition information based on the user current state corresponding to the user's current speech; and a determination module 24, configured to determine the final recognition result for the user's current speech according to the speech recognition information and the auxiliary recognition information.
Fig. 3 is a structural block diagram (1) of the speech recognition device according to an embodiment of the present invention. As shown in Fig. 3, the determination module 24 includes: a first acquiring unit 242, configured to obtain, according to the speech recognition information, one or more first candidate words corresponding to the user's current speech; a second acquiring unit 244, configured to obtain, according to the auxiliary recognition information, a vocabulary category corresponding to the user's current speech or one or more second candidate words; and a determining unit 246, configured to determine the final recognition result for the user's current speech according to the one or more first candidate words and the vocabulary category, or according to the one or more first candidate words and the one or more second candidate words.
Optionally, the determining unit 246 is further configured to select, from the one or more first candidate words, a first specific word matching the vocabulary category, and to take the first specific word as the final recognition result for the user's current speech.
Optionally, the determining unit 246 is further configured to select, from the one or more second candidate words, a second specific word with high similarity to the one or more first candidate words, and to take the second specific word as the final recognition result for the user's current speech.
Fig. 4 is a structural block diagram (2) of the speech recognition device according to an embodiment of the present invention. As shown in Fig. 4, the acquisition module 22 also includes: a third acquiring unit 222, configured to obtain an image indicating the user current state; a fourth acquiring unit 224, configured to obtain image feature information from the image; and a fifth acquiring unit 226, configured to obtain, according to the image feature information, a vocabulary category and/or one or more candidate words corresponding to the image feature information, and to take the vocabulary category and/or the one or more candidate words as the auxiliary recognition information.
Fig. 5 is a structural block diagram (3) of the speech recognition device according to an embodiment of the present invention. As shown in Fig. 5, the fifth acquiring unit 226 also includes: a searching subunit 2262, configured to search a predetermined image library for the specific image with the highest similarity to the image feature information; and an obtaining subunit 2264, configured to obtain, according to a preset correspondence between images and vocabulary categories or one or more candidate words, the vocabulary category or the one or more candidate words corresponding to the specific image.
Optionally, the user current state includes at least one of the following: the lip movement state of the user, the laryngeal vibration state of the user, the facial movement state of the user, and the gesture movement state of the user.
Fig. 6 is a structural block diagram (4) of the speech recognition device according to an embodiment of the present invention. As shown in Fig. 6, the device also includes: a judgment module 26, configured to judge that the accuracy of the final recognition result for the user's current speech determined from the speech recognition information is below a predetermined threshold.
According to another aspect of the present invention, a terminal is also provided, including a processor, the processor being configured to obtain speech recognition information for the user's current speech, obtain auxiliary recognition information for the speech recognition information based on the user current state corresponding to the user's current speech, and determine the final recognition result for the user's current speech according to the speech recognition information and the auxiliary recognition information.
It should be noted that the above modules can be realized in software or in hardware. In the latter case, realization is possible in the following ways, but is not limited to them: the above modules are all located in the same processor; or the above modules are respectively located in a first processor, a second processor, a third processor, and so on.
With regard to the above problems in the related art, an explanation is given below in conjunction with concrete alternative embodiments; the following alternative embodiments combine the above alternative embodiments and their optional implementations.
This alternative embodiment provides a speech recognition processing method and device to solve the problem in the related art that a low speech recognition rate leads to a poor user experience. To overcome the above drawbacks and deficiencies of the prior art, the purpose of this alternative embodiment is to provide an intelligent speech recognition method and device based on an auxiliary interactive mode: on the basis of speech recognition, which serves as the baseband signal, lip reading recognition, face recognition, gesture recognition, laryngeal vibration recognition, and the like are used as auxiliary signals. Each technique is used where its application excels, so that strengths offset weaknesses; the technique modules are relatively independent yet mutually fused, greatly improving the speech processing recognition rate. Preferably, whether auxiliary signal recognition is added can be determined by the speech recognition result: when the probability of the speech recognition result falls below a threshold, auxiliary data are added. This matches the human process of understanding speech as a multichannel perception: the terminal perceives the spoken content through sound, and coordinates recognition of the mouth shape, facial changes, and the like, in order to understand exactly what was said.
According to one aspect of this alternative embodiment, a speech recognition processing method is provided. On the basis of speech recognition performed, as the baseband signal, on voice data obtained by an audio sensor, moving images of the human body are collected by the terminal device's camera or an external sensor, including gesture movement, facial movement, laryngeal vibration, and lip reading, and are resolved by an integrated image algorithm and action processing chip to serve as the auxiliary signal for speech recognition. The recognition results of the baseband signal and the auxiliary signal are comprehensively processed by the terminal, which performs the corresponding operation. The auxiliary signal recognition result and the baseband speech recognition result are accumulated to form a unified recognition result, so that the auxiliary signal assists speech recognition and improves the speech recognition rate.
Gesture movement, facial movement, laryngeal vibration and lip reading are integrated; each modality organically combines feature extraction, template training, template classification and decision. The logical order is: the speech signal is first analyzed and confirmed as the base signal, and the auxiliary signals are then used for auxiliary judgment, which effectively reduces the probability of recognition errors caused by noise and external sound interference. During auxiliary signal recognition, feature data is collected by sensors and cameras, features are extracted and subjected to a series of matching judgments against a preset template library, the matches are compared with the corresponding recognition results, and the possible candidate words in the speech recognition model dictionary are identified.
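The template-matching step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the feature vectors, the template library contents, and the choice of cosine similarity are all assumptions made for the example.

```python
import math

# Hypothetical template library: each entry maps a preset feature
# template (an illustrative vector) to the vocabulary categories
# associated with that feature pattern.
TEMPLATE_LIBRARY = [
    ([0.9, 0.1, 0.3], ["greeting"]),
    ([0.2, 0.8, 0.5], ["contact", "call"]),
    ([0.1, 0.2, 0.9], ["browse"]),
]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_auxiliary_features(feature_vec):
    """Compare an extracted feature vector against every preset template
    and return the vocabulary categories of the most similar template."""
    best_categories, best_sim = [], -1.0
    for template, categories in TEMPLATE_LIBRARY:
        sim = cosine_similarity(feature_vec, template)
        if sim > best_sim:
            best_sim, best_categories = sim, categories
    return best_categories, best_sim
```

A feature vector extracted from, say, a lip image would be passed to `match_auxiliary_features`, and the returned categories would then constrain the speech candidates.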
Optionally, the above lip reading collects an image of the speaker's lips through a camera, performs image processing on the lip image, extracts lip features dynamically in real time, and then determines the spoken content with a lip pattern recognition algorithm. A judgment method combining lip shape and lip color is used to locate the lip position accurately, and a suitable lip matching algorithm is used for recognition.
Optionally, the above lip reading extracts lip image features from the preprocessed video data and uses them to recognize the current user's mouth-shape changes; recognizing lip movement by detecting the user's mouth motion improves recognition efficiency and accuracy. The mouth motion feature maps are classified to obtain category information; each feature type of mouth motion feature map corresponds to a number of vocabulary categories. After a series of processing steps such as denoising and analog-to-digital (A/D) conversion, the information obtained by lip reading is compared with the template library preset in the image/speech recognition processing module: the similarity between the lip-reading information and every pre-sampled mouth motion feature map is computed, and the vocabulary categories corresponding to the feature map with the highest similarity are read out.
Optionally, the above laryngeal vibration recognition collects the speaker's laryngeal vibration pattern through an external sensor, processes the vibration pattern, extracts vibration features dynamically in real time, and then determines the spoken content with a vibration pattern recognition algorithm.
Optionally, before laryngeal vibration recognition is performed for a user, the user's laryngeal vibration motion feature maps must first be sampled, and separate laryngeal vibration feature profiles are established for different users. When sampling a user's laryngeal vibration feature maps in advance, the feature map produced when the user utters a single syllable may be sampled, or the feature map produced when the user utters a whole word. Different pronunciation events produce different laryngeal vibration motions; and because successive speech events uttered by a user are correlated, contextual error correction is applied after laryngeal vibration recognition to verify the recognized vibrations. This reduces recognition errors between similar laryngeal vibration feature maps and further improves the accuracy of laryngeal vibration recognition.
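The contextual error correction can be sketched as a lexicon cross-check over per-syllable candidates: easily confused vibration patterns are resolved by preferring a combination that forms a known word. The lexicon entries and the fallback strategy below are illustrative assumptions, not details from the text.

```python
from itertools import product

# Hypothetical lexicon of valid syllable sequences (words), used to
# cross-check easily confused vibration patterns against their context.
LEXICON = {("ka", "pian"), ("hu", "jiao")}

def correct_with_context(syllable_candidates):
    """syllable_candidates: one candidate list per recognized syllable,
    ordered by vibration-matching score. Return the first combination
    that forms a word in the lexicon; otherwise fall back to the
    top-ranked candidate for each syllable."""
    for combo in product(*syllable_candidates):
        if combo in LEXICON:
            return list(combo)
    return [cands[0] for cands in syllable_candidates]
```

Here a second-ranked syllable candidate can displace the first-ranked one when only the former yields a word the user could plausibly have said.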
Optionally, the above laryngeal vibration recognition extracts laryngeal vibration image features from the preprocessed vibration data and uses them to recognize the current user's laryngeal vibration changes; detecting the user's laryngeal vibration motion improves recognition efficiency and accuracy. The laryngeal vibration motion feature maps are classified to obtain category information; each feature type of laryngeal vibration feature map corresponds to a number of vocabulary categories. The information obtained by laryngeal vibration recognition is compared with the template library preset in the image/speech recognition processing module: the similarity between the laryngeal vibration information and every pre-sampled laryngeal vibration feature map is computed, and the vocabulary categories corresponding to the feature map with the highest similarity are read out.
The above face recognition extracts the user's facial features from the video data and determines the user's identity and position. When a person speaks, the facial muscles also follow corresponding movement patterns; by capturing facial muscle movements, the corresponding muscle movement pattern can be recognized from the signal features and used to assist in recognizing the speech information.
According to an aspect of this optional embodiment, a speech recognition processing device is also provided, including: a base signal module, an auxiliary signal module, and a signal processing module.
The base signal module is a conventional speech recognition module that recognizes, through the audio sensor, the preprocessed audio data. Its recognition objects include isolated-vocabulary speech recognition and continuous large-vocabulary speech recognition; the former is mainly used to determine control instructions, while the latter is mainly used for text input. This description mainly uses isolated-vocabulary recognition as an example; continuous large-vocabulary recognition is handled in the same way.
Optionally, the audio sensor is a microphone array or a directional microphone. Various forms of noise interference exist in the environment, and audio capture based on an ordinary microphone is equally sensitive to the user's speech and to environmental noise, with no ability to distinguish speech from noise, which easily degrades the accuracy of speech command operation. A microphone array or directional microphone overcomes this problem: sound source localization and speech enhancement algorithms track the operating user's voice and enhance its acoustic signal while suppressing environmental noise and interfering voices, improving the signal-to-noise ratio of the audio input and ensuring reliable data quality for the back-end algorithms.
The auxiliary signal module includes a front-end camera, an audio sensor, and a laryngeal vibration sensor, and is used to obtain video data, audio data, and motion data.
Optionally, the laryngeal vibration sensor is integrated in a wearable device positioned in contact with the user's throat and detects the speech vibrations the user produces. One temperature sensor is placed inside the wearable device and another outside it; by comparing the temperatures detected by the two sensors, a microprocessor judges whether the wearable device is being worn. When the device is not worn, it automatically enters sleep mode to reduce overall power consumption. The microprocessor judges and recognizes the voice instruction issued by the user from the state of the vibration sensor, and sends the instruction via Bluetooth to the device to be controlled, which executes the voice recognition instruction.
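The two-sensor wearing judgment can be sketched as a simple temperature comparison: when worn, the inner sensor sits against warm skin while the outer one reads ambient temperature. The temperature-difference threshold below is an illustrative assumption; the text does not specify one.

```python
def is_worn(inner_temp_c, outer_temp_c, threshold_c=2.0):
    """When the device is worn, the inner sensor (against the throat)
    reads noticeably warmer than the outer one. The 2 degree Celsius
    threshold is an illustrative assumption, not a value from the text."""
    return (inner_temp_c - outer_temp_c) > threshold_c

def power_mode(inner_temp_c, outer_temp_c):
    # Enter sleep mode when not worn, reducing overall power consumption.
    return "active" if is_worn(inner_temp_c, outer_temp_c) else "sleep"
```

In practice the microprocessor would poll both sensors periodically and switch modes only after the reading is stable, to avoid toggling on transient differences.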
The signal processing unit includes a lip reading module, a face recognition module, a vibration recognition module, a gesture recognition module, a speech recognition module, and a score adjusting module. It recognizes the base signal (the speech signal) and the auxiliary signals, selecting the base signal as the primary speech information and the auxiliary signals as assistant speech information. The logical order is: the speech signal is first analyzed and confirmed as the base signal, and the auxiliary signals then perform auxiliary judgment. In the concrete recognition process, the several words with the highest probability scores from speech recognition are selected as candidate words, and for each candidate word a multi-level related-term set is generated according to a predetermined vocabulary. The assistant information produced by the auxiliary signals is used to raise the scores, in the speech recognition model dictionary, of the candidate words and of the related terms in their related-term sets. After the base signal and the auxiliary signals have all been processed, the candidate word or related term with the highest score is selected as the recognition result.
The above lip reading module extracts lip image features from the preprocessed video data and uses the lip movement information to recognize the current user's mouth-shape changes.
The above face recognition module extracts the user's facial features from the video data and determines the user's identity and position. Recognizing the identity of different registered users mainly serves the personalization of the whole device and differentiated control authorization. The user's position information can assist gesture recognition in determining the operating area of the user's hands, and can determine the user's bearing during voice operation so as to raise the microphone's audio input gain in that direction. When several possible users are present, this module can recognize the positions of all faces, judge every user's identity, and handle each separately; the user within the camera's field of view is granted control.
The above gesture recognition module extracts gesture information from the preprocessed video data, determines the hand shape, the hand's motion trajectory, and the hand's coordinates in the image, then tracks any hand shape and analyzes the hand contour in the image; the user obtains startup and control of the whole terminal through specific gestures or actions.
Through this optional embodiment, existing forms of human-computer interaction technology, including gesture recognition, laryngeal vibration recognition, speech recognition, face recognition, and lip reading, are fused: speech recognition serves as the base signal, while lip reading, face recognition, gesture recognition, laryngeal vibration recognition and the like serve as auxiliary signals that perform score adjustment on the speech recognition candidate words. The logical order of first analyzing and confirming the speech signal as the base signal and then performing auxiliary judgment with the auxiliary signals uses each technique where it is strongest so that they complement one another, with the modules relatively independent yet fused. On this basis, the mouth-shape changes of the current user recognized from lip movement information reduce the error rate of voice operation, ensuring that voice operation still works in noisy environments; the position information recognized by the face recognition module can assist gesture recognition in determining the operating area of the user's hands and determine the user's bearing during voice operation so as to raise the microphone's input gain in that direction. The impact of noise is thereby overcome, the speech recognition rate is significantly improved, and the result is then converted into the corresponding instruction, making terminal speech recognition stable and comfortable to operate.
The steps shown in the flowcharts of the accompanying drawings may be executed on a user terminal such as a smartphone or tablet computer; and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from that given here.
This embodiment provides a speech recognition processing method. Fig. 7 is a flowchart of the speech recognition processing method according to an embodiment of the present invention. As shown in Fig. 7, the flow includes:
Step S702: the speech information obtained by the audio sensor is recognized as the base signal;
Step S704: lip reading, face recognition, vibration recognition, and gesture recognition are performed as auxiliary signals, and the recognition result of the base signal is score-adjusted.
The objects of speech recognition include isolated-vocabulary recognition and continuous large-vocabulary recognition; the former is mainly used to determine control instructions, the latter for text input. This embodiment uses isolated-vocabulary recognition as an example; continuous large-vocabulary recognition is handled in the same way. Through the above steps, the speech signal is first analyzed and confirmed as the base signal and the auxiliary signals then perform auxiliary judgment: the several words with the highest probability scores from speech recognition are selected as candidate words, and for each candidate word a multi-level related-term set is generated according to a predetermined vocabulary. The candidate word category with the highest probability score produced by auxiliary signal recognition serves as assistant information; the candidate words recognized from the base signal are judged in turn, and if a candidate word matches the category recognized from the auxiliary signal, the scores of that candidate word and of the related terms in its related-term set are raised in the speech recognition model dictionary. After the base signal and the auxiliary signals have all been processed, the candidate word or related term with the highest score is selected as the recognition result.
In a specific implementation, lip reading, face recognition, vibration recognition, and gesture recognition are performed as auxiliary signals. The recognition modalities are mutually independent, and one or more of them can serve as auxiliary signal inputs at the same time.
An embodiment also provides a device corresponding to the method in the above embodiment; what has already been explained is not repeated here. The modules or units of this device may be code stored in a memory of the user terminal and executable by a processor, or may be implemented in other ways, which are not enumerated here one by one.
According to an aspect of the present invention, a speech recognition processing device is also provided. Fig. 8 is a structural block diagram of the speech recognition processing device according to an embodiment of the present invention. As shown in Fig. 8, the device includes:
a base signal module, including an audio sensor, which is a conventional speech recognition module for recognizing the preprocessed audio data obtained through the audio sensor;
an auxiliary signal module, including a front-end camera and a laryngeal vibration sensor, for obtaining video data, audio data, and motion data used for lip reading, face recognition, laryngeal vibration recognition, gesture recognition, and the like;
a signal processing module, including a lip reading module, a face recognition module, a vibration recognition module, a gesture recognition module, a speech recognition module, and a score adjusting module, which recognizes the base signal (the speech signal) and the auxiliary signals, selects the base signal as the primary speech information, and uses the auxiliary signals as assistant information for score adjustment.
The above lip reading module extracts lip image features from the preprocessed video data and uses the lip movement information to recognize the current user's mouth-shape changes.
The above face recognition module extracts the user's facial features from the video data and determines the user's identity and position; recognizing the identity of different registered users mainly serves the personalization of the whole device and differentiated control authorization.
The above gesture recognition module extracts gesture information from the preprocessed video data, determines the hand shape, the hand's motion trajectory, and the hand's coordinates in the image, then tracks any hand shape and analyzes the hand contour in the image; the user obtains startup and control of the whole terminal through specific gestures or actions.
Fig. 9 is a flowchart of the speech recognition processing method according to the present invention. As shown in Fig. 9, the speech recognition method of this embodiment is as follows:
Step S902: speech information is obtained from the audio sensor, and video data and motion data, including information for lip reading, face recognition, laryngeal vibration recognition, and gesture identification, are obtained from the front-end camera and the laryngeal vibration sensor;
Step S904: taking isolated-vocabulary speech recognition as an example, the speech signal is recognized and confirmed as the base signal, and the several words with the highest probability of matching the isolated vocabulary item are obtained as candidate words;
Step S906: moving images of the human body collected by the terminal device's camera or external sensors, including gesture movement, facial movement, laryngeal vibration, and lip movement, are analyzed and confirmed as auxiliary signals to obtain the candidate word category with the highest probability score;
Step S908: the candidate words recognized from the base signal are judged in turn; if a candidate word matches the category recognized from the auxiliary signal, its score in the speech recognition model dictionary is raised;
Step S910: after the base signal and the auxiliary signals have all been processed, the candidate word with the highest score is selected as the recognition result.
This optional embodiment is illustrated below with a concrete example. Suppose recognizing the owner's speech yields the following result:
"please (0.6) call (0.9) browser (0.7) card holder (0.9)", where each number in parentheses is a probability score: the higher the score, the higher the probability. The words with the highest probability scores are selected as candidate words, for example: card holder (0.9) and call (0.9) as the speech recognition result.
At the same time, gesture movement, facial movement, laryngeal vibration, lip reading and the like, in combination or individually, are recognized as auxiliary signals to obtain the candidate word category with the highest probability score.
The card holder (0.9) and call (0.9) recognized from the speech signal are judged in turn to see whether they match the candidate word category recognized from the auxiliary signal. Suppose "card holder" matches the category; its probability score is then raised, for example updated to card holder (1.0) and call (0.9).
After the speech base signal and the auxiliary signals have all been processed, the candidate word with the highest score, card holder (1.0), is selected as the recognition result.
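The worked example above can be reproduced in a short sketch; the fixed boost of 0.1 matches the 0.9 to 1.0 update in the example but is otherwise an assumption.

```python
def pick_result(candidates, matches_aux, boost=0.1):
    """Raise the score of each candidate confirmed by the auxiliary
    signal, then return the top-scoring word and the adjusted scores."""
    scores = {w: s + (boost if matches_aux(w) else 0.0)
              for w, s in candidates.items()}
    return max(scores, key=scores.get), scores

# The worked example: "card holder" and "call" both score 0.9 from the
# speech signal; the auxiliary signal confirms only "card holder".
result, scores = pick_result(
    {"card holder": 0.9, "call": 0.9},
    matches_aux=lambda w: w == "card holder",
)
# "card holder" rises from 0.9 to 1.0 and is selected as the result.
```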
As an alternative to this embodiment, the reverse logical order may be used: auxiliary signal recognition first determines the candidate word category, and the speech signal is then analyzed and confirmed as the base signal. Gesture movement, facial movement, laryngeal vibration, lip reading and the like, in combination or individually, are first recognized as auxiliary signals; when several modalities are used, their recognition results are accumulated to obtain the candidate word category with the highest probability score. The speech recognition result is combined on this basis, and the word with the highest probability score is selected from it as the final recognition result. This scheme is illustrated below with a concrete example. Suppose recognizing the owner's speech yields the following result:
"please (0.6) call (0.9) browser (0.7) card holder (0.9)", where each number in parentheses is a probability score. The words with the highest probability scores are selected as candidate words, for example: card holder (0.9) and call (0.9) as the speech recognition result.
At the same time, laryngeal vibration and lip reading are combined as auxiliary signals. Suppose laryngeal vibration recognition comes first: the card holder (0.9) and call (0.9) recognized from the base signal are judged in turn to see whether they match the candidate word category recognized by laryngeal vibration. Suppose "card holder" matches that category; its probability score is raised, for example updated to card holder (1.0) and call (0.9). Lip-reading judgment then proceeds on the basis of the previous recognition result: card holder (1.0) and call (0.9) are judged in turn against the candidate word category from lip reading. Suppose "card holder" matches again; its probability score is raised once more, for example updated to card holder (1.1) and call (0.9). The recognition results of the two modalities have thus been accumulated.
After the speech base signal and the auxiliary signals have all been processed, the candidate word with the highest score, card holder (1.1), is selected as the recognition result.
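The cumulative two-modality adjustment can be sketched as repeated boosting, one pass per auxiliary modality; the 0.1 boost mirrors the 0.9 to 1.0 to 1.1 updates in the example above and is otherwise an assumption.

```python
def accumulate_boosts(candidates, modality_checks, boost=0.1):
    """Apply each auxiliary modality in turn (e.g. laryngeal vibration,
    then lip reading). Every modality that confirms a candidate adds a
    further boost, so confirmations accumulate across modalities."""
    scores = dict(candidates)
    for matches in modality_checks:
        for word in scores:
            if matches(word):
                scores[word] += boost
    return max(scores, key=scores.get), scores
```

Each entry in `modality_checks` stands for one auxiliary modality's category judgment; a candidate confirmed by both laryngeal vibration and lip reading is boosted twice.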
As an alternative to this embodiment, the screening is further completed by score adjustment: the scores of candidate words that match the auxiliary signal recognition may be raised, or the scores of candidate words that do not match it may be lowered. After the base signal and the auxiliary signals have all been processed, the candidate word with the highest score is selected as the recognition result.
As an alternative to this embodiment, using the added assistant information to confirm the recognition result and improve speech recognition accuracy is optional for the user. The speech recognizer determines a recognition result from the input speech and computes a likelihood measure for it. If the likelihood measure is below a threshold, the user is prompted whether to input auxiliary data, or auxiliary data recognition is turned on automatically. If the likelihood measure is above the threshold, the user is prompted whether to turn off auxiliary data, or auxiliary data recognition is turned off automatically. The concrete threshold value is not limited here; it may be derived empirically or from user experience.
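The threshold-based switching of auxiliary data can be sketched as a simple hysteresis rule; the concrete threshold values are illustrative, since the text deliberately leaves them to empirical tuning or user experience.

```python
def auxiliary_enabled(confidence, aux_on, low=0.6, high=0.9):
    """Enable auxiliary-data recognition when speech-only confidence
    falls below a lower threshold, disable it when confidence is
    comfortably above an upper threshold, and otherwise keep the
    current setting. The threshold values are illustrative assumptions."""
    if confidence < low:
        return True   # turn on (or prompt the user to turn on) aux data
    if confidence > high:
        return False  # turn off aux data to save processing
    return aux_on
```

Using two thresholds rather than one avoids rapid toggling when the likelihood measure hovers near a single cut-off value.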
The improved speech recognition method of the above embodiment fuses existing forms of human-computer interaction technology, including gesture recognition, laryngeal vibration recognition, speech recognition, face recognition, and lip reading: speech recognition serves as the base signal, while lip reading, face recognition, gesture recognition, laryngeal vibration recognition and the like serve as auxiliary signals that perform score adjustment on the speech recognition candidate words. The logical order of first analyzing and confirming the speech signal as the base signal and then performing auxiliary judgment with the auxiliary signals makes terminal speech recognition stable and comfortable to operate.
In summary, in the speech recognition processing method and device provided by the present invention, on the basis of speech recognition the speech signal serves as the base signal and lip reading, face recognition, gesture recognition, laryngeal vibration recognition and the like serve as auxiliary signals. This solves the problem in the related art that a low speech recognition rate leads to a poor user experience. Each technique is used where it is strongest so that the techniques complement one another; the modules are relatively independent yet fused, greatly improving the speech recognition rate.
In another embodiment, software is also provided for executing the technical solutions described in the above embodiments and preferred implementations.
In another embodiment, a storage medium is also provided in which the above software is stored; the storage medium includes, but is not limited to, an optical disc, a floppy disk, a hard disk, a scratch-pad memory, and the like.
Obviously, those skilled in the art should understand that the above modules or steps of the present invention may be implemented with a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, and in some cases the steps shown or described may be executed in an order different from that given here; or they may be made into individual integrated circuit modules, or multiple modules or steps among them may be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only the preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (17)
1. A speech recognition method, characterized by comprising:
obtaining speech recognition information of a user's current speech, and obtaining auxiliary recognition information for the speech recognition information based on a current state of the user corresponding to the user's current speech;
determining a final recognition result of the user's current speech according to the speech recognition information and the auxiliary recognition information.
2. The method according to claim 1, characterized in that determining the final recognition result of the user's current speech according to the speech recognition information and the auxiliary recognition information comprises:
obtaining one or more first candidate vocabulary items corresponding to the user's current speech according to the speech recognition information;
obtaining, according to the auxiliary recognition information, a vocabulary category corresponding to the user's current speech, or one or more second candidate vocabulary items;
determining the final recognition result of the user's current speech according to the one or more first candidate vocabulary items and the vocabulary category; or determining the final recognition result of the user's current speech according to the one or more first candidate vocabulary items and the one or more second candidate vocabulary items.
3. The method according to claim 2, characterized in that determining the final recognition result of the user's current speech according to the one or more first candidate vocabulary items and the vocabulary category comprises:
selecting, from the one or more first candidate vocabulary items, a first specific vocabulary item that matches the vocabulary category, and taking the first specific vocabulary item as the final recognition result of the user's current speech.
4. The method according to claim 2, characterized in that determining the final recognition result of the user's current speech according to the one or more first candidate vocabulary items and the one or more second candidate vocabulary items comprises:
selecting, from the one or more second candidate vocabulary items, a second specific vocabulary item with high similarity to the one or more first candidate vocabulary items, and taking the second specific vocabulary item as the final recognition result of the user's current speech.
5. The method according to claim 1, characterized in that obtaining the auxiliary recognition information for the speech recognition information based on the current state of the user corresponding to the user's current speech comprises:
obtaining an image used to indicate the user's current state;
obtaining image feature information according to the image;
obtaining, according to the image feature information, a vocabulary category and/or one or more candidate vocabulary items corresponding to the image feature information, and taking the vocabulary category and/or the one or more candidate vocabulary items as the auxiliary recognition information.
6. The method according to claim 5, characterized in that obtaining, according to the image feature information, the vocabulary category and/or the one or more candidate vocabulary items corresponding to the image feature information comprises:
searching a predetermined image library for a specific image with the highest similarity to the image feature information;
obtaining, according to a preset correspondence between images and vocabulary categories or one or more candidate vocabulary items, the vocabulary category or the one or more candidate vocabulary items corresponding to the specific image.
7. The method according to any one of claims 1 to 6, characterized in that the user's current state includes at least one of the following: a lip movement state of the user, a laryngeal vibration state of the user, a facial movement state of the user, and a gesture movement state of the user.
8. The method according to any one of claims 1 to 7, wherein, before obtaining the speech recognition information of the user's current speech and obtaining the auxiliary recognition information of the speech recognition information based on the current state of the user corresponding to the user's current speech, the method comprises:
determining that the accuracy of the final recognition result of the user's current speech determined based on the speech recognition information is lower than a predetermined threshold.
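The pre-condition in claim 8 gates the auxiliary pipeline on a confidence estimate: image information is only consulted when the acoustic result alone falls below a threshold. A hedged sketch, with an arbitrary 0.8 threshold and stub callbacks standing in for the unspecified acquisition and fusion steps:

```python
def recognize(speech_result, confidence, acquire_auxiliary, fuse,
              threshold=0.8):
    """Fall back to auxiliary (image-based) recognition only when the
    acoustic result's accuracy estimate is below a predetermined
    threshold, as claim 8 describes. Threshold and hooks are illustrative.
    """
    if confidence >= threshold:
        return speech_result          # acoustic result is trusted as-is
    auxiliary = acquire_auxiliary()   # e.g. capture lip/gesture image info
    return fuse(speech_result, auxiliary)

# Usage with stub callbacks: a confident result skips the camera entirely.
print(recognize("write", 0.95, lambda: ["rite"], lambda s, a: a[0]))  # -> write
print(recognize("write", 0.30, lambda: ["rite"], lambda s, a: a[0]))  # -> rite
```

Gating on confidence keeps the cheap acoustic path as the common case and reserves image capture and processing for utterances the recognizer is unsure about.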
9. A speech recognition device, wherein the device comprises:
an acquisition module, configured to obtain the speech recognition information of a user's current speech, and to obtain auxiliary recognition information of the speech recognition information based on the current state of the user corresponding to the user's current speech;
a determination module, configured to determine the final recognition result of the user's current speech according to the speech recognition information and the auxiliary recognition information.
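As a rough illustration of how the two claimed modules might cooperate, here is a minimal Python sketch. The module internals (a stub recognizer and a stub auxiliary source, plus an intersection-based combination rule) are hypothetical; the claim does not prescribe any particular implementation.

```python
class SpeechRecognitionDevice:
    """Sketch of claim 9: an acquisition module gathers acoustic and
    auxiliary (user-state) information; a determination module combines
    them into a final result. Both hooks are illustrative stand-ins."""

    def __init__(self, recognizer, auxiliary_source):
        # Acquisition module: two information sources.
        self.recognizer = recognizer              # audio -> candidate words
        self.auxiliary_source = auxiliary_source  # user state -> candidate words

    def final_result(self, audio, user_state):
        # Determination module: keep an acoustic candidate that the
        # auxiliary information also supports; otherwise take the top one.
        first = self.recognizer(audio)
        second = self.auxiliary_source(user_state)
        supported = [w for w in first if w in second]
        return supported[0] if supported else first[0]

device = SpeechRecognitionDevice(
    recognizer=lambda audio: ["write", "right"],
    auxiliary_source=lambda state: ["right", "rite"],
)
print(device.final_result(b"...", "lip-motion"))  # -> right
```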
10. The device according to claim 9, wherein the determination module comprises:
a first acquisition unit, configured to obtain one or more first candidate vocabularies corresponding to the user's current speech according to the speech recognition information;
a second acquisition unit, configured to obtain a vocabulary category or one or more second candidate vocabularies corresponding to the user's current speech according to the auxiliary recognition information;
a determination unit, configured to determine the final recognition result of the user's current speech according to the one or more first candidate vocabularies and the vocabulary category, or according to the one or more first candidate vocabularies and the one or more second candidate vocabularies.
11. The device according to claim 10, wherein the determination unit is further configured to select, from the one or more first candidate vocabularies, a first specific vocabulary that fits the vocabulary category, and to use the first specific vocabulary as the final recognition result of the user's current speech.
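Claim 11's category filter can be sketched as a simple lookup: keep the first acoustic candidate whose category assignment matches the image-derived vocabulary category. The category table below is invented purely for illustration.

```python
# Hypothetical category assignments for illustration only.
VOCAB_CATEGORY = {"play": "media", "pray": "other", "pay": "commerce"}

def pick_by_category(first_candidates, category):
    """Select from the acoustic candidates the first vocabulary that
    fits the image-derived category (claim 11); None if nothing fits."""
    for word in first_candidates:
        if VOCAB_CATEGORY.get(word) == category:
            return word
    return None

print(pick_by_category(["pray", "play"], "media"))  # -> play
```

The auxiliary channel here only narrows the acoustic hypothesis space rather than proposing words of its own, which is the cheaper of the two combination modes in claim 10.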
12. The device according to claim 10, wherein the determination unit is further configured to select, from the one or more second candidate vocabularies, a second specific vocabulary with the highest similarity to the one or more first candidate vocabularies, and to use the second specific vocabulary as the final recognition result of the user's current speech.
13. The device according to claim 9, wherein the acquisition module further comprises:
a third acquisition unit, configured to obtain an image for indicating the current state of the user;
a fourth acquisition unit, configured to acquire image feature information according to the image;
a fifth acquisition unit, configured to obtain, according to the image feature information, the vocabulary category and/or one or more candidate vocabularies corresponding to the image feature information, and to use the vocabulary category and/or the one or more candidate vocabularies as the auxiliary recognition information.
14. The device according to claim 13, wherein the fifth acquisition unit further comprises:
a searching subunit, configured to search a predetermined image library for the specific image with the highest similarity to the image feature information;
an obtaining subunit, configured to obtain, according to a preset correspondence between images and vocabulary categories or candidate vocabularies, the vocabulary category or the one or more candidate vocabularies corresponding to the specific image.
15. The device according to any one of claims 9 to 14, wherein the current state of the user comprises at least one of: a lip movement state of the user, a laryngeal vibration state of the user, a facial movement state of the user, and a gesture movement state of the user.
16. The device according to any one of claims 9 to 15, wherein the device further comprises:
a judgment module, configured to determine that the accuracy of the final recognition result of the user's current speech determined based on the speech recognition information is lower than a predetermined threshold.
17. A terminal, comprising a processor, wherein the processor is configured to obtain the speech recognition information of a user's current speech, to obtain auxiliary recognition information of the speech recognition information based on the current state of the user corresponding to the user's current speech, and to determine the final recognition result of the user's current speech according to the speech recognition information and the auxiliary recognition information.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510130636.2A CN106157956A (en) | 2015-03-24 | 2015-03-24 | The method and device of speech recognition |
PCT/CN2015/079317 WO2016150001A1 (en) | 2015-03-24 | 2015-05-19 | Speech recognition method, device and computer storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510130636.2A CN106157956A (en) | 2015-03-24 | 2015-03-24 | The method and device of speech recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106157956A true CN106157956A (en) | 2016-11-23 |
Family
ID=56976870
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510130636.2A Pending CN106157956A (en) | 2015-03-24 | 2015-03-24 | The method and device of speech recognition |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106157956A (en) |
WO (1) | WO2016150001A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107800860A (en) * | 2016-09-07 | 2018-03-13 | 中兴通讯股份有限公司 | Method of speech processing, device and terminal device |
EP3618457A1 (en) * | 2018-09-02 | 2020-03-04 | Oticon A/s | A hearing device configured to utilize non-audio information to process audio signals |
CN110865705B (en) * | 2019-10-24 | 2023-09-19 | 中国人民解放军军事科学院国防科技创新研究院 | Multi-mode fusion communication method and device, head-mounted equipment and storage medium |
CN112672021B (en) * | 2020-12-25 | 2022-05-17 | 维沃移动通信有限公司 | Language identification method and device and electronic equipment |
CN116434027A (en) * | 2023-06-12 | 2023-07-14 | 深圳星寻科技有限公司 | Artificial intelligent interaction system based on image recognition |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4769845A (en) * | 1986-04-10 | 1988-09-06 | Kabushiki Kaisha Carrylab | Method of recognizing speech using a lip image |
JPS6419399A (en) * | 1987-07-15 | 1989-01-23 | Mitsubishi Electric Corp | Voice recognition equipment |
CN102023703A (en) * | 2009-09-22 | 2011-04-20 | 现代自动车株式会社 | Combined lip reading and voice recognition multimodal interface system |
CN102298443A (en) * | 2011-06-24 | 2011-12-28 | 华南理工大学 | Smart home voice control system combined with video channel and control method thereof |
CN104409075A (en) * | 2014-11-28 | 2015-03-11 | 深圳创维-Rgb电子有限公司 | Voice identification method and system |
CN104423543A (en) * | 2013-08-26 | 2015-03-18 | 联想(北京)有限公司 | Information processing method and device |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002304194A (en) * | 2001-02-05 | 2002-10-18 | Masanobu Kujirada | System, method and program for inputting voice and/or mouth shape information |
US7587318B2 (en) * | 2002-09-12 | 2009-09-08 | Broadcom Corporation | Correlating video images of lip movements with audio signals to improve speech recognition |
CN101472066A (en) * | 2007-12-27 | 2009-07-01 | 华晶科技股份有限公司 | Near-end control method of image viewfinding device and image viewfinding device applying the method |
CN102324035A (en) * | 2011-08-19 | 2012-01-18 | 广东好帮手电子科技股份有限公司 | Method and system of applying lip posture assisted speech recognition technique to vehicle navigation |
CN105096935B (en) * | 2014-05-06 | 2019-08-09 | 阿里巴巴集团控股有限公司 | A kind of pronunciation inputting method, device and system |
- 2015-03-24: CN application CN201510130636.2A filed (published as CN106157956A); status: Pending
- 2015-05-19: PCT application PCT/CN2015/079317 filed (published as WO2016150001A1); status: Application Filing
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106875941B (en) * | 2017-04-01 | 2020-02-18 | 彭楚奥 | Voice semantic recognition method of service robot |
CN106875941A (en) * | 2017-04-01 | 2017-06-20 | 彭楚奥 | A kind of voice method for recognizing semantics of service robot |
CN109213970A (en) * | 2017-06-30 | 2019-01-15 | 北京国双科技有限公司 | Put down generation method and device |
CN109213970B (en) * | 2017-06-30 | 2022-07-29 | 北京国双科技有限公司 | Method and device for generating notes |
CN108010526A (en) * | 2017-12-08 | 2018-05-08 | 北京奇虎科技有限公司 | Method of speech processing and device |
CN108074561A (en) * | 2017-12-08 | 2018-05-25 | 北京奇虎科技有限公司 | Method of speech processing and device |
CN107945789A (en) * | 2017-12-28 | 2018-04-20 | 努比亚技术有限公司 | Audio recognition method, device and computer-readable recording medium |
CN108346427A (en) * | 2018-02-05 | 2018-07-31 | 广东小天才科技有限公司 | Voice recognition method, device, equipment and storage medium |
CN108449323A (en) * | 2018-02-14 | 2018-08-24 | 深圳市声扬科技有限公司 | Login authentication method, device, computer equipment and storage medium |
CN108449323B (en) * | 2018-02-14 | 2021-05-25 | 深圳市声扬科技有限公司 | Login authentication method and device, computer equipment and storage medium |
CN108510988A (en) * | 2018-03-22 | 2018-09-07 | 深圳市迪比科电子科技有限公司 | Language identification system and method for deaf-mutes |
CN108446641A (en) * | 2018-03-22 | 2018-08-24 | 深圳市迪比科电子科技有限公司 | Mouth shape image recognition system based on machine learning and method for recognizing and sounding through facial texture |
CN110415689B (en) * | 2018-04-26 | 2022-02-15 | 富泰华工业(深圳)有限公司 | Speech recognition device and method |
CN110415689A (en) * | 2018-04-26 | 2019-11-05 | 富泰华工业(深圳)有限公司 | Speech recognition equipment and method |
CN110473570A (en) * | 2018-05-09 | 2019-11-19 | 广达电脑股份有限公司 | Integrated voice identification system and method |
CN108986818A (en) * | 2018-07-04 | 2018-12-11 | 百度在线网络技术(北京)有限公司 | Video calling hangs up method, apparatus, equipment, server-side and storage medium |
CN110830708A (en) * | 2018-08-13 | 2020-02-21 | 深圳市冠旭电子股份有限公司 | Tracking camera shooting method and device and terminal equipment |
CN108965621A (en) * | 2018-10-09 | 2018-12-07 | 北京智合大方科技有限公司 | Self study smart phone sells the assistant that attends a banquet |
CN109448711A (en) * | 2018-10-23 | 2019-03-08 | 珠海格力电器股份有限公司 | Voice recognition method and device and computer storage medium |
CN109462694A (en) * | 2018-11-19 | 2019-03-12 | 维沃移动通信有限公司 | A kind of control method and mobile terminal of voice assistant |
CN109583359B (en) * | 2018-11-26 | 2023-10-24 | 北京小米移动软件有限公司 | Method, apparatus, electronic device, and machine-readable storage medium for recognizing expression content |
CN109583359A (en) * | 2018-11-26 | 2019-04-05 | 北京小米移动软件有限公司 | Presentation content recognition methods, device, electronic equipment, machine readable storage medium |
CN109697976A (en) * | 2018-12-14 | 2019-04-30 | 北京葡萄智学科技有限公司 | A kind of pronunciation recognition methods and device |
CN109872714A (en) * | 2019-01-25 | 2019-06-11 | 广州富港万嘉智能科技有限公司 | A kind of method, electronic equipment and storage medium improving accuracy of speech recognition |
CN111951629A (en) * | 2019-05-16 | 2020-11-17 | 上海流利说信息技术有限公司 | Pronunciation correction system, method, medium and computing device |
CN111447325A (en) * | 2020-04-03 | 2020-07-24 | 上海闻泰电子科技有限公司 | Call auxiliary method, device, terminal and storage medium |
CN111445912A (en) * | 2020-04-03 | 2020-07-24 | 深圳市阿尔垎智能科技有限公司 | Voice processing method and system |
CN113823278A (en) * | 2021-09-13 | 2021-12-21 | 北京声智科技有限公司 | Voice recognition method and device, electronic equipment and storage medium |
CN113823278B (en) * | 2021-09-13 | 2023-12-08 | 北京声智科技有限公司 | Speech recognition method, device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2016150001A1 (en) | 2016-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106157956A (en) | The method and device of speech recognition | |
CN112088402B (en) | Federated neural network for speaker recognition | |
CN108000526B (en) | Dialogue interaction method and system for intelligent robot | |
CN107799126B (en) | Voice endpoint detection method and device based on supervised machine learning | |
WO2017112813A1 (en) | Multi-lingual virtual personal assistant | |
CN110310623A (en) | Sample generating method, model training method, device, medium and electronic equipment | |
KR102167760B1 (en) | Sign language analysis Algorithm System using Recognition of Sign Language Motion process and motion tracking pre-trained model | |
JP2002182680A (en) | Operation indication device | |
CN112016367A (en) | Emotion recognition system and method and electronic equipment | |
WO2016173132A1 (en) | Method and device for voice recognition, and user equipment | |
KR20100001928A (en) | Service apparatus and method based on emotional recognition | |
CN109101663A (en) | A kind of robot conversational system Internet-based | |
CN111126280B (en) | Gesture recognition fusion-based aphasia patient auxiliary rehabilitation training system and method | |
CN110570873A (en) | voiceprint wake-up method and device, computer equipment and storage medium | |
CN108074571A (en) | Sound control method, system and the storage medium of augmented reality equipment | |
CN113129867B (en) | Training method of voice recognition model, voice recognition method, device and equipment | |
CN111341350A (en) | Man-machine interaction control method and system, intelligent robot and storage medium | |
CN110931018A (en) | Intelligent voice interaction method and device and computer readable storage medium | |
WO2022072752A1 (en) | Voice user interface using non-linguistic input | |
CN118197315A (en) | Cabin voice interaction method, system and computer readable medium | |
CN111158490A (en) | Auxiliary semantic recognition system based on gesture recognition | |
CN114239610A (en) | Multi-language speech recognition and translation method and related system | |
CN113873297A (en) | Method and related device for generating digital character video | |
CN113822187A (en) | Sign language translation, customer service, communication method, device and readable medium | |
CN114466179A (en) | Method and device for measuring synchronism of voice and image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | |
Application publication date: 20161123 |