CN109872714A - Method, electronic device, and storage medium for improving speech recognition accuracy

Publication number
CN109872714A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910072525.9A
Other languages
Chinese (zh)
Inventor
傅峰峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Fugang Wanjia Intelligent Technology Co Ltd
Original Assignee
Guangzhou Fugang Wanjia Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Fugang Wanjia Intelligent Technology Co Ltd
Priority to CN201910072525.9A
Publication of CN109872714A

Abstract

The invention discloses a method for improving speech recognition accuracy, comprising the following steps: acquiring the current user's voice information through a sound collection device and obtaining speech recognition information through speech recognition technology, the speech recognition information comprising a speech recognition probability and a speech recognition result; judging whether the speech recognition probability is greater than a first preset value and, if so, outputting the speech recognition result; acquiring the current user's mouth shape through an image capture device and obtaining mouth-shape recognition information through image recognition technology; and judging whether the mouth-shape recognition probability is higher than the speech recognition probability and, if so, outputting the mouth-shape recognition result. The invention also provides an electronic device and a computer-readable storage medium. By comparing the speech recognition result and the mouth-shape recognition result as a whole, the method obtains the recognition result with the higher accuracy and thereby improves recognition accuracy.

Description

Method, electronic device, and storage medium for improving speech recognition accuracy
Technical field
The present invention relates to the field of recognition technology, and in particular to a method, an electronic device, and a storage medium for improving speech recognition accuracy.
Background
Speech recognition is a technology that converts digital speech into text that a computer can understand. In recent years speech recognition technology has achieved notable breakthroughs and has gradually entered everyday life, bringing convenience to daily life and work. It is beginning to be applied in industry, household appliances, communications, automotive electronics, medical care, home services, consumer electronics, and other fields. The present invention focuses mainly on speech recognition in the sense of recognizing recorded audio, for example meeting minutes, the recognition and analysis of telephone customer-service speech, and ordering dishes in a restaurant. Although speech recognition technology has developed rapidly and its accuracy is already relatively high, it still cannot be fully accurate. Further improving speech recognition accuracy has therefore become a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
To overcome the deficiencies of the prior art, the first object of the present invention is to provide a method for improving speech recognition accuracy that can further improve recognition accuracy.
The second object of the present invention is to provide an electronic device that can further improve recognition accuracy.
The third object of the present invention is to provide a computer-readable storage medium that can further improve recognition accuracy.
The first object of the present invention is achieved by the following technical solution:
A method for improving speech recognition accuracy, comprising the following steps:
Speech recognition step: acquire the current user's voice information through a sound collection device and obtain speech recognition information through speech recognition technology, the speech recognition information comprising a speech recognition probability and a speech recognition result;
First judgment step: judge whether the speech recognition probability is greater than a first preset value; if so, output the speech recognition result; if not, execute the mouth-shape recognition step;
Mouth-shape recognition step: acquire the current user's mouth shape through an image capture device and obtain mouth-shape recognition information through image recognition technology, the mouth-shape recognition information comprising a mouth-shape recognition probability and a mouth-shape recognition result;
Second judgment step: judge whether the mouth-shape recognition probability is higher than the speech recognition probability; if so, output the mouth-shape recognition result.
Further, the second judgment step comprises the following sub-steps:
Calculation step: compute the difference between the mouth-shape recognition probability and the speech recognition probability;
Result step: judge whether the difference is greater than a second preset value; if so, output the mouth-shape recognition result; the second preset value is positive.
Further, in the result step, judge whether the difference is greater than the second preset value, the second preset value being positive; if so, output the mouth-shape recognition result; if not, output both the speech recognition result and the mouth-shape recognition result and mark them.
Further, in the result step, judge whether the difference is greater than the second preset value; if so, output the mouth-shape recognition result; if not, use natural language processing to compute the semantic relevance between each of the speech recognition result and the mouth-shape recognition result and the context sentences, and output the one with the higher semantic relevance as the prediction result.
Further, the speech recognition result, the mouth-shape recognition result, and the prediction result are meeting minutes or dish-ordering instructions.
Further, the first preset value is 85%.
Further, the second preset value is 5%.
Further, the sound collection device is a ring microphone, and the image capture device is a ring camera array.
The second object of the present invention is achieved by the following technical solution:
An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for improving speech recognition accuracy described in any one of the solutions for the first object of the invention.
The third object of the present invention is achieved by the following technical solution:
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for improving speech recognition accuracy described in any one of the solutions for the first object of the invention.
Compared with the prior art, the beneficial effects of the present invention are as follows:
By comparing the speech recognition result and the mouth-shape recognition result as a whole, the method obtains the recognition result with the higher accuracy and thereby improves recognition accuracy.
Description of the drawings
Fig. 1 is a flowchart of the method for improving speech recognition accuracy according to Embodiment 1.
Detailed description of the embodiments
The present invention is further described below with reference to the drawing and specific embodiments. It should be noted that, provided they do not conflict, the embodiments or technical features described below may be combined in any way to form new embodiments.
Embodiment 1
This embodiment is described mainly in terms of two scenarios, meeting minutes and a user ordering food. In practice the method is not limited to these two scenarios and can be applied to others as required.
As shown in Fig. 1, this embodiment provides a method for improving speech recognition accuracy, comprising the following steps:
S1: Acquire the current user's voice information through the sound collection device and obtain speech recognition information through speech recognition technology; the speech recognition information comprises a speech recognition probability and a speech recognition result. The sound collection device is a ring microphone. A ring microphone can capture sound from all around a round table more efficiently and accurately, and the captured source audio is clearer, which makes the later speech translation more accurate.
Speech recognition technology mainly involves three aspects: feature extraction, pattern-matching criteria, and model training. A speech recognition system mainly consists of a speech signal sampling module, a speech signal preprocessing module, a speech feature parameter extraction module, a core speech recognition module, and a recognition post-processing module. Pattern matching is the main process of speech recognition. The human voice is first analyzed, features are extracted, and a targeted speech model is built; the patterns required for recognition are established through this model. During recognition, the features of the incoming speech signal are compared and matched against the previously established speech patterns using the overall recognition model, and preset search and matching strategies yield the pattern that best matches the input speech signal. Finally, the computer outputs the recognition result.
In general there are three approaches to speech recognition: methods based on vocal tract models and phonetic knowledge, template matching methods, and methods using artificial neural networks. Template matching is comparatively mature and has reached the practical stage. It involves four steps: feature extraction, template training, template classification, and decision. Three techniques are commonly used: dynamic time warping (DTW), hidden Markov model (HMM) theory, and vector quantization (VQ).
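Of the three template matching techniques named above, dynamic time warping is the simplest to illustrate. The following is a minimal sketch only, assuming one-dimensional feature sequences and an absolute-difference local cost; the function names `dtw_distance` and `classify` and the toy templates are hypothetical and do not come from the patent.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(sequence, templates):
    """Return the label of the stored template closest to the sequence."""
    return min(templates, key=lambda label: dtw_distance(sequence, templates[label]))


# Hypothetical templates: the same contour spoken at different speeds still matches.
templates = {"rising": [0, 1, 2], "falling": [2, 1, 0]}
label = classify([0, 1, 1, 2], templates)
```

Because the warping path may repeat elements, a slower or faster utterance of the same template aligns at low cost, which is why DTW suits template classification of speech of varying duration.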
The above only outlines the techniques used in the speech recognition field. The recognition process is now described in detail for one segment of speech. To recognize a segment of speech, the first step is to extract speech features: an acoustic observation feature vector sequence O that can be modeled is extracted from the input speech signal (a time-domain signal). Put simply, feature extraction turns the speech segment to be recognized into a set of vectors that characterize it, and all subsequent operations on the speech are based on this set of vectors.
Given the observed feature vectors O, the goal is to find the word sequence W that maximizes the probability P(W | O). This is what a person does on hearing a segment of speech: find, among all known text, the one that best matches the speech. This formula alone does not solve the recognition problem, however. It has to be transformed with Bayes' theorem into a form whose factors can each be modeled separately:
$$\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)}$$
Here P(O) is the prior probability of the acoustic observation. During automatic speech recognition the input acoustic observation sequence is fixed, so P(O) can be treated as a constant; it has no effect on the maximization and can be ignored. Only P(O | W) and P(W) remain to be considered. The acoustic model and the language model provide the means of computing P(O | W) and P(W), respectively.
Using the acoustic model, the language model, and a pronunciation dictionary, a decoding space can be built. A decoder then searches this space with each group of input speech feature vectors to find an optimal word sequence, that is, a path that maximizes P(O | W) P(W). The resulting word sequence is the desired recognition result. Along with the recognition result, the corresponding recognition probability is obtained, and this probability indicates, to some extent, the accuracy of the result.
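The maximization of P(O | W) P(W) can be sketched over a small list of candidate word sequences. This is an illustrative sketch, not the decoder the patent uses: the candidate list, the log-probability tables, and the function name `decode` are assumptions, and normalizing over the listed candidates as a stand-in for P(O) is one possible way to obtain a recognition probability, which the patent does not specify.

```python
import math


def decode(acoustic_logp, language_logp):
    """Pick the candidate W that maximizes log P(O|W) + log P(W).

    Returns (best_candidate, probability), where the probability is the
    best score normalized over the listed candidates only (an assumption).
    """
    scores = {w: acoustic_logp[w] + language_logp[w] for w in acoustic_logp}
    best = max(scores, key=scores.get)
    # Log-sum-exp over the candidates approximates log P(O) on this list.
    top = max(scores.values())
    log_total = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return best, math.exp(scores[best] - log_total)


# Hypothetical two-candidate example with made-up model scores.
acoustic = {"two beers": math.log(0.6), "too beers": math.log(0.4)}
language = {"two beers": math.log(0.9), "too beers": math.log(0.1)}
best, probability = decode(acoustic, language)
```

Working in the log domain avoids numerical underflow when many small probabilities are multiplied, which is why the sum of log scores replaces the product in the formula above.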
S2: Judge whether the speech recognition probability is greater than the first preset value; if so, output the speech recognition result; if not, execute the mouth-shape recognition step. The first preset value is 85%. In conventional meeting minutes, speech is recorded regardless of whether its recognition probability is high or low, which creates a considerable risk of errors. One way to avoid this is as follows: in meeting minutes, text whose speech recognition probability is below the first preset value is marked and left for whoever later compiles the minutes to check; the marking may be bold, italics, a color change, and so on. In the ordering scenario, if the accuracy obtained by speech recognition is low, the order is not placed; instead a voice prompt is issued to communicate further with the user and determine whether to proceed with the order.
This application does not take that approach, however. In this embodiment, to make the speech recognition system more automated, a second way of checking the voice information is provided: recognizing the mouth shape from images. Because the two use different models and recognition logic, their errors are to some extent less likely to coincide; checking a sentence with two computations avoids some errors and further improves speech recognition accuracy. The system may be configured so that both recognitions run at the same time and are then compared, or so that mouth-shape recognition runs first and speech recognition afterwards.
The preferred implementation is as follows: the voice information is recognized first, and the sentence is then checked by mouth-shape recognition. Because speech involves a relatively small amount of data compared with images, a judgment step is placed here so that mouth-shape recognition is used only when the probability obtained by speech recognition is low. This ensures a certain recognition accuracy while keeping overall processing fast; fewer computing resources are consumed, and the efficiency of automated recognition as a whole improves.
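The preferred ordering, in which the costlier mouth-shape recognition runs only when speech recognition is not confident enough, can be sketched as a two-stage cascade. This is a minimal sketch under stated assumptions: probabilities are floats in [0, 1], the mouth-shape recognizer is passed in as a callable so that it is invoked lazily, and the names `recognize`, `run_mouth_shape`, and the source labels are hypothetical.

```python
from typing import Callable, Tuple

Recognition = Tuple[str, float]  # (result text, probability in [0, 1])

FIRST_PRESET = 0.85  # first preset value stated in the embodiment


def recognize(speech: Recognition,
              run_mouth_shape: Callable[[], Recognition]) -> Tuple[str, str]:
    """Return (text, source); mouth-shape recognition runs only when needed."""
    text, prob = speech
    if prob > FIRST_PRESET:
        # First judgment: speech is confident, so no image processing is done.
        return text, "speech"
    mouth_text, mouth_prob = run_mouth_shape()
    if mouth_prob > prob:
        # Second judgment: the mouth-shape result is the more probable one.
        return mouth_text, "mouth-shape"
    return text, "speech"


# Hypothetical recognizer stub that records how often it is called.
calls = []


def fake_mouth_shape() -> Recognition:
    calls.append(1)
    return ("braised pork", 0.90)


confident = recognize(("braised pork", 0.92), fake_mouth_shape)
uncertain = recognize(("raised fork", 0.70), fake_mouth_shape)
```

Passing a callable rather than a precomputed result is the design choice that realizes the saving described above: the image pipeline is never started for utterances that pass the first threshold.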
S3: Acquire the current user's mouth shape through the image capture device and obtain mouth-shape recognition information through image recognition technology; the mouth-shape recognition information comprises a mouth-shape recognition probability and a mouth-shape recognition result. The image capture device is a ring camera array. Preferably, the number of microphones in the sound collection device equals the number of cameras in the ring camera array, so that the two are in one-to-one correspondence and the acquired information can be matched directly without having to work out the correspondence afterwards.
To combine the two, a camera can be added to each microphone, so that the unit captures not only the voice signal the user produces while speaking but also the mouth-shape image signal. The image signal includes at least an image of the lip region of the face; to better recognize changes in mouth shape, it may also include images of other parts of the face, because changes in mouth shape are sometimes related to changes in facial expression.
Mouth-shape recognition requires first building a mouth-shape recognition library and is carried out by template matching. The library is trained on a large number of images or videos so that the model is more robust.
The concrete implementation is as follows. The image capture device uses the camera to obtain a video sequence containing only the changes in the user's mouth shape and feeds it to the video decoding unit. The video decoding unit applies a key-frame extraction technique to the input lip-motion video to obtain the representative key frames in the video stream, and sends the extracted key-frame sequence (normalized color still images of the lips) to the image preprocessing unit. The image preprocessing unit uses OpenCV library functions to apply grayscale conversion and median filtering to the key-frame images from the previous unit, then binarizes them, and finally scans and denoises them to obtain standardized binary mouth-shape images.
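The preprocessing chain described above (grayscale conversion, median filtering, binarization) can be sketched in plain Python. This is a dependency-free stand-in written for illustration, not the OpenCV calls the patent refers to; in OpenCV the corresponding functions would be `cv2.cvtColor`, `cv2.medianBlur`, and `cv2.threshold`. The luma weights, the 3x3 window, the fixed threshold of 128, and the clipped window at image borders are all assumptions.

```python
from statistics import median


def to_gray(rgb_image):
    """Convert rows of (r, g, b) pixels to grayscale with common luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]


def median_filter(gray, radius=1):
    """Replace each pixel by the median of its neighborhood (clipped at borders)."""
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            row.append(int(median(window)))
        out.append(row)
    return out


def binarize(gray, threshold=128):
    """Map pixels at or above the threshold to 1 and the rest to 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in gray]


def preprocess(rgb_image):
    """Grayscale, then median filter, then binarize, in the order described."""
    return binarize(median_filter(to_gray(rgb_image)))


# Hypothetical 3x3 key frame: white background with one black noise pixel.
white, black = (255, 255, 255), (0, 0, 0)
frame = [[white, white, white],
         [white, black, white],
         [white, white, white]]
binary = preprocess(frame)
```

The isolated dark pixel is removed by the median filter before thresholding, which is the reason for filtering ahead of binarization: salt-and-pepper noise would otherwise survive as spurious foreground in the binary mouth-shape image.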
The feature extraction unit uses templates to extract mouth-shape features from the normalized binary images produced by image processing, yielding feature vectors that represent the mouth shape. The mouth-shape template library is a pre-built module for storing standard mouth-shape feature vectors; it stores the standard mouth-shape templates collected in earlier tests, including lip-motion image samples (single or multiple) for the pronunciation of all Chinese pinyin and the feature vectors extracted from those mouth-shape images with the templates. The mouth-shape recognition unit recognizes the processed normalized binary images, taking the feature vector of each image in sequence from the feature extraction unit, and finally obtains the mouth-shape recognition information. The mouth-shape recognition result likewise has a corresponding mouth-shape recognition probability, which is later used to decide whether to output the result.
For meeting minutes, all mouth-shape features certainly need to be obtained and stored so that there is enough information to predict and record everything that occurs during a meeting reasonably completely. The ordering scenario can be handled the same way, but another approach is also available. Because each merchant's menu is fixed, the sentences users typically use when ordering can be collected, covering all the imperative and interrogative sentences that occur when placing an order; all the collected speech sounds are then matched with mouth shapes, and the mouth-shape features corresponding to these training sounds are extracted. This greatly reduces the effort of building the database, and the smaller database also makes matching considerably faster, giving a better result. The main purpose of this step is to obtain the corresponding recognition result from the mouth shape through image recognition.
S4: Judge whether the mouth-shape recognition probability is higher than the speech recognition probability; if so, output the mouth-shape recognition result. There are several ways to implement this. In one, whenever the mouth-shape recognition probability is higher than the speech recognition probability, the result with the higher probability is output directly. This is simple to operate, but it has a problem: even though the mouth-shape recognition probability is the higher of the two, both results may have an accuracy too low to be acceptable for normal speech recognition, for example when one is 80% and the other 81%. The mouth-shape recognition probability is 1% higher, but for meeting minutes such a low recognition rate is still unacceptable. Moreover, since the speech and mouth-shape recognition results are both obtained probabilistically, each carries some error, and a gap of 1% is sometimes negligible.
A more preferable implementation is therefore as follows.
Step S4 comprises the following sub-steps:
Calculation step: compute the difference between the mouth-shape recognition probability and the speech recognition probability. The difference is signed; it is not an absolute value.
Result step: judge whether the difference is greater than the second preset value, which is positive; if so, output the mouth-shape recognition result; if not, output both the speech recognition result and the mouth-shape recognition result and mark them. The second preset value is 5%. The "if not" case is the one in which both recognition accuracies are at a low level, for example both about 80%. In a meeting-minutes type of recognition, the two results are each marked and their correspondence is indicated, so that the person checking later can locate and verify them more precisely. In an ordering type of recognition, one of the results is selected directly and sent to the audio output to be read aloud so that the person ordering can confirm it.
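The calculation and result sub-steps for the meeting-minutes case can be sketched as follows. This is a minimal illustration assuming probabilities as floats in [0, 1]; the function name `second_judgment` and the dictionary layout of the return value are hypothetical, and how the marking is rendered (bold, color, and so on) is left out.

```python
SECOND_PRESET = 0.05  # second preset value stated in the embodiment (positive)


def second_judgment(speech, mouth_shape, margin=SECOND_PRESET):
    """Apply the signed-difference test between the two recognizers.

    Each argument is a (text, probability) pair. When the mouth-shape
    probability does not exceed the speech probability by more than the
    margin, both results are returned and flagged for later checking.
    """
    speech_text, speech_prob = speech
    mouth_text, mouth_prob = mouth_shape
    difference = mouth_prob - speech_prob  # signed, not an absolute value
    if difference > margin:
        return {"output": [mouth_text], "marked": False}
    return {"output": [speech_text, mouth_text], "marked": True}


# The 80% versus 81% case from the text: the 1% gap is below the margin.
close_call = second_judgment(("meet at ten", 0.80), ("meet at tin", 0.81))
clear_win = second_judgment(("meet at tin", 0.70), ("meet at ten", 0.80))
```

Keeping the difference signed matters: a speech probability that is higher than the mouth-shape probability yields a negative difference and therefore never selects the mouth-shape result on its own.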
Although the above approach improves accuracy to some extent, this embodiment also uses another approach to increase automation, reducing the workload of the person checking the minutes and letting the person ordering get the right dish with fewer confirmations. In the result step, judge whether the difference is greater than the second preset value; if so, output the mouth-shape recognition result; if not, use natural language processing to compute the semantic relevance between each of the speech recognition result and the mouth-shape recognition result and the context sentences, and output the one with the higher semantic relevance as the prediction result.
When analyzing with natural language processing, the first thing to do is to segment both recognition results into words; only after segmentation can the two be compared further. The comparison can proceed as follows: first, syntactic analysis identifies content in the two recognition results that does not conform to word-usage norms; then semantic analysis is performed on the relations between the words in the segmentation results.
The task of semantic analysis differs by linguistic unit. At the word level the basic task is word sense disambiguation (WSD); at the sentence level it is semantic role labeling (SRL). Supervised word sense disambiguation performs a classification task based on context and annotated data. Unsupervised word sense disambiguation is usually treated as a clustering task: a clustering algorithm partitions all contexts of a polysemous word into equivalence classes, and when identifying a sense, the word's context is compared with the equivalence class of contexts for each sense, so that the sense is determined by the equivalence class its context falls into. Besides supervised and unsupervised disambiguation, there are also dictionary-based disambiguation methods. The result with the higher semantic relevance, as computed by natural language processing, is output. The speech recognition result, the mouth-shape recognition result, and the prediction result are meeting minutes or dish-ordering instructions.
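The idea of choosing between two candidate results by their relevance to the surrounding sentences can be sketched with a much simpler measure than the semantic analysis described above. The sketch below swaps in bag-of-words cosine similarity as a stand-in for semantic relevance; it is not the patent's method. Whitespace tokenization is an assumption that only suits space-delimited text (Chinese input would need a real word segmenter), and the names `tokenize`, `cosine`, and `pick_by_context` are hypothetical.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Split on whitespace; a placeholder for proper word segmentation."""
    return sentence.lower().split()


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def pick_by_context(candidates, context_sentences):
    """Return the candidate whose words overlap most with the context."""
    context = Counter(t for s in context_sentences for t in tokenize(s))
    return max(candidates, key=lambda c: cosine(Counter(tokenize(c)), context))


# Hypothetical ordering dialogue with two competing recognition results.
context = ["i would like to order some noodles", "and a bowl of soup"]
candidates = ["one more bowl of noodles please", "one more bowl of poodles please"]
prediction = pick_by_context(candidates, context)
```

A word-overlap score captures only surface agreement with the context; the word sense disambiguation and semantic role labeling mentioned in the text would be needed to tell apart candidates that share the same vocabulary but differ in meaning.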
The scheme of this embodiment further improves speech recognition accuracy by combining speech recognition with mouth-shape recognition. When the accuracy of both recognition results is low, natural language processing is used to make a further correction by choosing the result with the higher semantic relevance. Verifying and judging from several angles in this way greatly improves speech recognition accuracy and makes the system more practical.
Embodiment 2
Embodiment 2 discloses an electronic device comprising a processor, a memory, and a program. There may be one or more processors and memories. The program is stored in the memory and configured to be executed by the processor; when the processor executes the program, the method for improving speech recognition accuracy of Embodiment 1 is implemented. The electronic device may be a mobile phone, a computer, a tablet computer, or a similar device.
Embodiment 3
Embodiment 3 discloses a computer-readable storage medium for storing a program; when the program is executed by a processor, the method for improving speech recognition accuracy of Embodiment 1 is implemented.
Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present invention, the computer-executable instructions are not limited to the method operations described above and can also perform related operations in the method provided by any embodiment of the present invention.
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, and can of course also be implemented by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes instructions that cause an electronic device (which may be a personal computer, a server, a network device, or the like) to execute the method described in each embodiment of the present invention.
It is worth noting that, in the above embodiment based on the content update notification device, the units and modules included are divided only according to functional logic; the division is not limited to the above as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and do not limit the scope of protection of the present invention.
The above embodiments are only preferred embodiments of the present invention and do not limit its scope of protection. Any insubstantial change or substitution made by those skilled in the art on the basis of the present invention falls within the scope claimed by the present invention.

Claims (10)

1. a kind of method for improving accuracy of speech recognition, which comprises the following steps:
Voice recognition step: the voice messaging of active user is obtained by sound collection equipment, and is obtained by speech recognition technology Voice recognition information is taken, the voice recognition information includes speech recognition probability and speech recognition result;
First judgment step: judging whether speech recognition probability is greater than the first preset value, if it is, output speech recognition knot Fruit, if it is not, then executing Mouth-Shape Recognition step;
Mouth-Shape Recognition step: the Shape of mouth of active user is obtained by image capture device, and is obtained by image recognition technology Mouth-Shape Recognition information is taken, the Mouth-Shape Recognition information includes Mouth-Shape Recognition probability and Mouth-Shape Recognition result;
Second judgment step: judging whether Mouth-Shape Recognition probability is higher than speech recognition probability, if it is, output Mouth-Shape Recognition knot Fruit.
2. a kind of method for improving accuracy of speech recognition as described in claim 1, which is characterized in that the second judgement step Suddenly include following sub-step:
It calculates step: the difference of Mouth-Shape Recognition probability Yu speech recognition probability is calculated;
Result step: judging whether the difference is greater than the second preset value, if so, output shape of the mouth as one speaks recognition result, and described second Preset value is positive value.
3. The method for improving speech recognition accuracy according to claim 2, characterized in that, in the result step, it is judged whether the difference is greater than the second preset value, the second preset value being a positive value; if so, the mouth-shape recognition result is output; if not, both the speech recognition result and the mouth-shape recognition result are output and marked.
4. The method for improving speech recognition accuracy according to claim 2, characterized in that, in the result step, it is judged whether the difference is greater than the second preset value; if so, the mouth-shape recognition result is output; if not, natural language processing technology is used to calculate, separately, the semantic relevance of the speech recognition result to the context sentences and the semantic relevance of the mouth-shape recognition result to the context sentences, and the result with the higher semantic relevance is output as the prediction result.
5. The method for improving speech recognition accuracy according to claim 4, characterized in that the speech recognition result, the mouth-shape recognition result and the prediction result are meeting minutes or dish-ordering instructions.
6. The method for improving speech recognition accuracy according to any one of claims 1-5, characterized in that the first preset value is 85%.
7. The method for improving speech recognition accuracy according to any one of claims 1-5, characterized in that the second preset value is 5%.
8. The method for improving speech recognition accuracy according to any one of claims 1-5, characterized in that the sound collection device is an annular microphone and the image capture device is an annular camera array.
9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the method for improving speech recognition accuracy according to any one of claims 1-8.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method for improving speech recognition accuracy according to any one of claims 1-8.
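
The decision cascade recited in claims 1 to 4 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The names Recognition, choose_result, read_mouth and relevance are hypothetical, as is the marking format used for claim 3. The sketch assumes that probabilities lie in [0, 1], that the difference of claim 2 is the mouth-shape probability minus the speech probability, that the 85% and 5% values of claims 6 and 7 map to 0.85 and 0.05, and that a tie in semantic relevance falls back to the speech result.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Recognition:
    """A recognizer's best hypothesis and its probability in [0, 1]."""
    text: str
    probability: float


# Values taken from claims 6 and 7. Treating "5%" as 0.05 on the same
# 0-to-1 probability scale is an assumption of this sketch.
FIRST_PRESET = 0.85
SECOND_PRESET = 0.05


def choose_result(
    speech: Recognition,
    read_mouth: Callable[[], Recognition],
    relevance: Optional[Callable[[str], float]] = None,
    first_preset: float = FIRST_PRESET,
    second_preset: float = SECOND_PRESET,
) -> Tuple[str, str]:
    """Return (output text, source); source is 'speech', 'mouth' or 'marked'."""
    # First judgment step: a confident speech result is output directly,
    # and the mouth-shape recognizer is never invoked.
    if speech.probability > first_preset:
        return speech.text, "speech"

    # Mouth-shape recognition step, run only when speech confidence is low.
    mouth = read_mouth()

    # Calculation and result steps (claim 2): the mouth-shape result must
    # beat the speech result by more than the positive second preset value.
    difference = mouth.probability - speech.probability
    if difference > second_preset:
        return mouth.text, "mouth"

    if relevance is None:
        # Claim 3: output both results, marked. The format is hypothetical.
        return f"[speech] {speech.text} / [mouth] {mouth.text}", "marked"

    # Claim 4: keep the candidate that fits the context sentences better.
    # Falling back to speech on a tie is an assumption of this sketch.
    if relevance(mouth.text) > relevance(speech.text):
        return mouth.text, "mouth"
    return speech.text, "speech"
```

Passing read_mouth as a callable keeps the image path lazy, which mirrors the claim 1 ordering in which mouth-shape recognition runs only after the first judgment fails. The relevance callable stands in for whatever natural language processing model scores a candidate against the context sentences; the claims do not specify one.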

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910072525.9A CN109872714A (en) 2019-01-25 2019-01-25 A kind of method, electronic equipment and storage medium improving accuracy of speech recognition

Publications (1)

Publication Number Publication Date
CN109872714A true CN109872714A (en) 2019-06-11

Family

ID=66918031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910072525.9A Pending CN109872714A (en) 2019-01-25 2019-01-25 A kind of method, electronic equipment and storage medium improving accuracy of speech recognition

Country Status (1)

Country Link
CN (1) CN109872714A (en)

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102023703A (en) * 2009-09-22 2011-04-20 现代自动车株式会社 Combined lip reading and voice recognition multimodal interface system
CN103915095A (en) * 2013-01-06 2014-07-09 华为技术有限公司 Method, interaction device, server and system for voice recognition
CN105122353A (en) * 2013-05-20 2015-12-02 英特尔公司 Natural human-computer interaction for virtual personal assistant systems
CN106157956A (en) * 2015-03-24 2016-11-23 中兴通讯股份有限公司 The method and device of speech recognition
CN105046231A (en) * 2015-07-27 2015-11-11 小米科技有限责任公司 Face detection method and device
CN106971733A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 The method and system and intelligent terminal of Application on Voiceprint Recognition based on voice de-noising
CN105760363A (en) * 2016-02-17 2016-07-13 腾讯科技(深圳)有限公司 Text file word sense disambiguation method and device
CN107404381A (en) * 2016-05-19 2017-11-28 阿里巴巴集团控股有限公司 A kind of identity identifying method and device
CN106297800A (en) * 2016-08-10 2017-01-04 中国科学院计算技术研究所 A kind of method and apparatus of adaptive speech recognition
CN106504754A (en) * 2016-09-29 2017-03-15 浙江大学 A kind of real-time method for generating captions according to audio output
CN106571135A (en) * 2016-10-27 2017-04-19 苏州大学 Whisper speech feature extraction method and system
CN106875941A (en) * 2017-04-01 2017-06-20 彭楚奥 A kind of voice method for recognizing semantics of service robot
CN107885483A (en) * 2017-11-07 2018-04-06 广东欧珀移动通信有限公司 Method of calibration, device, storage medium and the electronic equipment of audio-frequency information
CN107945789A (en) * 2017-12-28 2018-04-20 努比亚技术有限公司 Audio recognition method, device and computer-readable recording medium
CN108289244A (en) * 2017-12-28 2018-07-17 努比亚技术有限公司 Video caption processing method, mobile terminal and computer readable storage medium
CN108346427A (en) * 2018-02-05 2018-07-31 广东小天才科技有限公司 A kind of audio recognition method, device, equipment and storage medium
CN109036381A (en) * 2018-08-08 2018-12-18 平安科技(深圳)有限公司 Method of speech processing and device, computer installation and readable storage medium storing program for executing
CN109147775A (en) * 2018-10-18 2019-01-04 深圳供电局有限公司 A kind of audio recognition method neural network based and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
洪文学等: "《可视化模式识别》", 31 July 2014 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739534A (en) * 2020-06-04 2020-10-02 广东小天才科技有限公司 Processing method and device for assisting speech recognition, electronic equipment and storage medium
CN111739534B (en) * 2020-06-04 2022-12-27 广东小天才科技有限公司 Processing method and device for assisting speech recognition, electronic equipment and storage medium
CN112562646A (en) * 2020-12-09 2021-03-26 江苏科技大学 Robot voice recognition method
CN112885168A (en) * 2021-01-21 2021-06-01 绍兴市人民医院 Immersive speech feedback training system based on AI
CN112927688A (en) * 2021-01-25 2021-06-08 思必驰科技股份有限公司 Voice interaction method and system for vehicle
CN114757209A (en) * 2022-06-13 2022-07-15 天津大学 Man-machine interaction instruction analysis method and device based on multi-mode semantic role recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20190611