CN107644638A - Audio recognition method, device, terminal and computer-readable recording medium - Google Patents


Info

Publication number
CN107644638A
Authority
CN
China
Prior art keywords
phoneme sequence
phoneme
voice
decoding network
noise
Legal status
Granted
Application number
CN201710964474.1A
Other languages
Chinese (zh)
Other versions
CN107644638B
Inventor
何金来 (He Jinlai)
雷宇 (Lei Yu)
Current Assignee
Beijing Rubu Technology Co.,Ltd.
Original Assignee
Beijing Intelligent Housekeeper Technology Co Ltd
Application filed by Beijing Intelligent Housekeeper Technology Co Ltd
Priority to CN201710964474.1A
Publication of CN107644638A
Application granted
Publication of CN107644638B
Legal status: Active

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a speech recognition method comprising three steps:

- calculating, from the acoustic features of collected speech, the acoustic likelihood between the speech and each phoneme sequence in a decoding network;
- obtaining, from the acoustic likelihood, the matching probability between the speech and each phoneme sequence;
- recognizing the speech as the content corresponding to the phoneme sequence with the highest matching probability.

The decoding network contains multiple phoneme sequences, and each one corresponds either to the content of a preset command word or to noise content. The invention also discloses a corresponding speech recognition device, terminal, and computer-readable storage medium. The invention prevents noise from being recognized as a command word without computing a confidence score after recognition, which reduces the false recognition rate.

Description

Audio recognition method, device, terminal and computer-readable recording medium
Technical field
The embodiments of the present invention relate to speech recognition technology. In particular, they relate to a speech recognition method, device, terminal, and computer-readable storage medium.
Background
In voice command word recognition, misrecognition has long been a difficult problem. The false recognition rate is high because prior-art command word recognition is usually implemented with a decoding network. That network contains multiple phoneme sequences, each corresponding to a preset command word. Any input speech, whatever it contains, is matched to the closest phoneme sequence in the network. As a result, noise is always recognized as some command word.
The current way to keep noise from being recognized as a command word is to compute a confidence score for the recognition result. A score above a preset threshold indicates correct recognition, and a score below it indicates that no command word was recognized. However, the confidence computation depends on many factors and is strongly affected by the environment, so confidence values vary over a wide range. In noisy environments, correct results often receive very low confidence while incorrect results receive very high confidence. The false recognition rate therefore remains high.
Summary of the invention
The present invention provides a voice command recognition method, device, terminal, and computer-readable storage medium. They prevent noise from being recognized as a command word without computing a confidence score after recognition, which reduces the false recognition rate.
In a first aspect, an embodiment of the present invention provides a speech recognition method, comprising:
Calculating, from the acoustic features of collected speech, the acoustic likelihood between the speech and the phoneme sequences in a decoding network. The decoding network contains multiple phoneme sequences, and each phoneme sequence corresponds either to the content of a preset command word or to noise content;
Obtaining, from the acoustic likelihood, the matching probability between the speech and each phoneme sequence;
Recognizing the speech as the content corresponding to the phoneme sequence with the highest matching probability.
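The final decision of the method can be illustrated with a minimal sketch. It assumes the matching scores have already been computed, for example as log-probabilities. The `Entry` class, labels, phoneme spellings, and score values are all hypothetical. Running the decision with and without a noise entry shows the effect the invention relies on. Without a noise entry, background chatter is forced onto the closest command word. With one, the noise sequence absorbs it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    label: str       # command word text or a noise tag
    kind: str        # "command" or "noise" (hypothetical tags)
    phonemes: tuple  # phoneme sequence in the decoding network

def recognize(scores, network):
    """Return the entry whose phoneme sequence has the highest matching score.

    `scores` maps each entry label to its matching score; how the scores are
    computed (Steps 110 and 120) is outside this sketch.
    """
    return max(network, key=lambda e: scores[e.label])

# Mandarin initials/finals used as phoneme labels (illustrative only).
commands = [
    Entry("volume up", "command", ("y", "in", "l", "iang", "t", "iao", "d", "a")),
    Entry("shut down", "command", ("g", "uan", "j", "i")),
]
noise = Entry("<noise>", "noise", ("<filler>",))

# Hypothetical scores for an input that is really background chatter.
scores = {"volume up": -42.0, "shut down": -40.5, "<noise>": -31.2}

print(recognize(scores, commands).label)            # shut down
print(recognize(scores, commands + [noise]).label)  # <noise>
```

No confidence threshold appears anywhere. Rejecting noise falls out of the same argmax that selects commands.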
In a second aspect, the present invention further provides a speech recognition device, comprising:
A computing module, configured to calculate, from the acoustic features of collected speech, the acoustic likelihood between the speech and the phoneme sequences in a decoding network. The decoding network contains multiple phoneme sequences, and each phoneme sequence corresponds either to the content of a preset command word or to noise content;
A matching module, configured to obtain, from the acoustic likelihood, the matching probability between the speech and each phoneme sequence;
An identification module, configured to recognize the speech as the content corresponding to the phoneme sequence with the highest matching probability.
In a third aspect, the present invention further provides a terminal, comprising:
One or more processors;
A memory, configured to store one or more programs,
Wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the speech recognition method provided by any embodiment of the present invention.
In a fourth aspect, the present invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the speech recognition method provided by any embodiment of the present invention.
The present invention adds phoneme sequences corresponding to noise content to the decoding network. Collected speech is then recognized as noise or as a command word as soon as the best-matching phoneme sequence is found in the network. No confidence computation on the search result is needed afterwards. This solves the prior-art problem of a high false recognition rate caused by confidence computation that is sensitive to environmental factors. It prevents noise from being recognized as a command word and reduces the false recognition rate.
Brief description of the drawings
Fig. 1 is a flowchart of the speech recognition method provided by Embodiment one of the present invention;
Fig. 2 is a flowchart of the speech recognition method provided by Embodiment two of the present invention;
Fig. 3 is a schematic structural diagram of the speech recognition device provided by Embodiment three of the present invention;
Fig. 4 is a schematic structural diagram of the terminal provided by Embodiment four of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here serve only to explain the invention, not to limit it. For ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of the speech recognition method provided by Embodiment one of the present invention. This embodiment applies to command word recognition. The method can be performed by a speech recognition device and includes the following steps:
Step 110: calculate, from the acoustic features of the collected speech, the acoustic likelihood between the speech and the phoneme sequences in the decoding network.
The decoding network contains multiple phoneme sequences, and each phoneme sequence corresponds either to the content of a preset command word or to noise content. The embodiments target voice command recognition, so any non-command-word speech interferes with command word recognition and is treated as noise. In the embodiments of the present invention, noise therefore means any non-command-word speech. The decoding network may consist of multiple phoneme nodes and the connections between them, and a chain of connected phoneme nodes forms a phoneme sequence. In speech recognition, the acoustic likelihood between a phoneme of the speech and a phoneme in the decoding network is generally obtained by building an acoustic model for each phoneme in the network. The acoustic likelihood is the output probability of that acoustic model when the acoustic features of the speech are its input.
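The acoustic likelihood of Step 110 can be illustrated with a deliberately small acoustic model. This sketch makes two simplifying assumptions: one diagonal Gaussian per phoneme, and frames split evenly across the phonemes of a sequence. Real systems typically use HMM-based or neural acoustic models with a searched alignment. The model parameters, feature values, and function names are hypothetical. Scores are kept as log-likelihoods to avoid numerical underflow.

```python
import math

def frame_log_likelihood(frame, mean, var):
    """Log-density of one feature vector under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )

def sequence_log_likelihood(frames, phonemes, model):
    """Acoustic log-likelihood of `frames` given one phoneme sequence.

    `model` maps each phoneme to a (mean, variance) pair, a hypothetical
    single-Gaussian acoustic model. Frames are split into equal-length
    segments, one per phoneme; a real decoder would search for the best
    (Viterbi) alignment instead.
    """
    n = len(phonemes)
    total = 0.0
    for i, frame in enumerate(frames):
        mean, var = model[phonemes[i * n // len(frames)]]
        total += frame_log_likelihood(frame, mean, var)
    return total

model = {
    "a": ([0.0, 0.0], [1.0, 1.0]),
    "b": ([3.0, 3.0], [1.0, 1.0]),
}
frames = [[0.1, -0.2], [0.0, 0.1], [2.9, 3.2], [3.1, 2.8]]
print(sequence_log_likelihood(frames, ("a", "b"), model) >
      sequence_log_likelihood(frames, ("b", "a"), model))  # True
```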
Step 120: obtain, from the acoustic likelihood, the matching probability between the speech and each phoneme sequence.
To simplify data processing during recognition, the acoustic likelihood may be used directly as the matching probability. Scenarios that demand high recognition accuracy can add other information to the matching probability. For example, in a decoding network built as a weighted finite-state transducer (WFST), the matching probability also includes the weight of each phoneme sequence. This weight may reflect how likely the sequence is to occur in actual use, that is, a language model probability. In command word recognition, some command words such as "volume up" and "shut down" occur often in practice, while others occur rarely. When the two kinds sound similar, the phoneme sequences of the frequent commands can be given higher weights than those of the rare ones. The weights can also be adjusted according to the recognition rate observed while the method is in use.
Step 130: recognize the speech as the content corresponding to the phoneme sequence with the highest matching probability.
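The weighting described for Step 120 can be sketched by adding a scaled log-weight to the acoustic log-likelihood. The source does not fix the numeric form, so this log-domain combination is an assumption. The `lm_scale` factor, the command names, and all numbers are hypothetical. In the example, two commands that sound alike get nearly equal acoustic scores, and the prior of the frequently used command decides the result.

```python
import math

def matching_score(acoustic_loglik, weight, lm_scale=1.0):
    """Matching score = acoustic log-likelihood + scaled log of the sequence weight.

    `weight` is treated as a prior probability of the command occurring in
    practice; `lm_scale` is a hypothetical tuning factor.
    """
    return acoustic_loglik + lm_scale * math.log(weight)

# Hypothetical values: two commands that sound alike get nearly equal
# acoustic scores, but "volume up" is used far more often.
acoustic = {"volume up": -50.0, "rare command": -49.8}
weights = {"volume up": 0.30, "rare command": 0.02}

best_acoustic = max(acoustic, key=acoustic.get)
best_matching = max(acoustic, key=lambda c: matching_score(acoustic[c], weights[c]))
print(best_acoustic, "->", best_matching)  # rare command -> volume up
```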
These steps work because the decoding network now contains phoneme sequences for noise content. When noise is captured, its acoustic features make it match one of those noise sequences. The noise is thus identified from its acoustic features, and non-command-word speech is not recognized as a command word. The prior art computes a confidence score after recognition. This embodiment instead prevents noise from being recognized as a command word in a way that is not affected by environmental factors, which greatly reduces the false recognition rate.
This embodiment provides a preferred implementation that further reduces the false recognition rate. It does so by making noise more likely to match a noise phoneme sequence in the decoding network. Specifically, Step 110 (calculating, from the acoustic features of the collected speech, the acoustic likelihood between the speech and the phoneme sequences in the decoding network) includes:
Obtaining pre-trained acoustic models of the phoneme sequences in the decoding network. The noise samples used to train the acoustic model for noise content include multiple speech samples whose pairwise differences in acoustic features exceed a preset threshold;
Calculating, from the acoustic features of the collected speech, the acoustic likelihood between the speech and the phoneme sequences in the decoding network using the acoustic models.
In this preferred implementation, any two of the noise training samples differ in acoustic features by more than a preset threshold. The noise acoustic model is therefore trained on many highly dissimilar samples, such as noisy ambient sound and a large number of mutually different non-command-word phrases. A model trained on such varied samples tends to represent a generic sound that minimizes the differences among them, so it matches a wide range of non-command-word speech more easily. By contrast, command word samples are usually the same command word read aloud in different accents. Their acoustic features differ little, so a command word acoustic model yields a high acoustic likelihood only for sounds close to that command word. This implementation therefore makes noise more likely to match a noise phoneme sequence in the decoding network and reduces the false recognition rate.
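The pairwise-difference condition on noise training samples can be met by a simple greedy filter. The source states only the condition, not how to enforce it, so this selection procedure is an assumption. The sketch also assumes each candidate is summarized as one fixed-length feature vector, for example the mean of its frame features, and that "difference" means Euclidean distance. The function name and data are hypothetical.

```python
import math

def select_diverse_noise(candidates, threshold):
    """Greedily keep samples whose distance to every kept sample exceeds threshold.

    Each candidate is a fixed-length feature vector (assumption); the result
    satisfies the pairwise condition by construction.
    """
    kept = []
    for feat in candidates:
        if all(math.dist(feat, k) > threshold for k in kept):
            kept.append(feat)
    return kept

candidates = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2], [-4.0, 3.0]]
print(select_diverse_noise(candidates, threshold=1.0))
# [[0.0, 0.0], [5.0, 5.0], [-4.0, 3.0]]
```

The greedy pass is order-dependent, so different orderings can keep different subsets. Any subset it returns still meets the stated condition.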
Further, the decoding network is built as a weighted finite-state transducer. Step 120 (obtaining, from the acoustic likelihood, the matching probability between the speech and each phoneme sequence) then includes calculating the sum of the acoustic likelihood and the weight of the phoneme sequence. This sum is used as the matching probability between the speech and that phoneme sequence. Alternatively, the product of the acoustic likelihood and the weight can be used as the matching probability.
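The "sum" and "product" options can be reconciled if both terms are read as log-probabilities, an assumption since the text does not state the scale. The symbols below are introduced for illustration only: X is the feature sequence of the speech, S a phoneme sequence, w(S) its WFST weight, and N the set of sequences in the decoding network, noise sequences included. In the log domain, the sum is the logarithm of the linear-domain product:

```latex
\log\bigl(P_{\mathrm{ac}}(X \mid S)\, w(S)\bigr) = \log P_{\mathrm{ac}}(X \mid S) + \log w(S),
\qquad
\hat{S} = \arg\max_{S \in \mathcal{N}} \bigl[\log P_{\mathrm{ac}}(X \mid S) + \log w(S)\bigr]
```

Because the logarithm is monotonic, both forms select the same winning sequence.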
Further, the decoding network also contains a phoneme sequence corresponding to silence. This improves the user experience because noise and silence can then be distinguished and different feedback returned to the user. Noise may come from the user speaking incorrectly, so the device can output a prompt asking the user to repeat. Silence may come from the user triggering the recognition device by accident. In that case the recognition output can be empty and no action is performed, so the user is not disturbed.
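The different feedback for commands, noise, and silence can be sketched as a small dispatch on the kind of sequence that won the search. The kind tags, prompt text, and function name are hypothetical. The source specifies only the behavior: prompt the user to repeat on noise, and do nothing on silence.

```python
from typing import Optional

def respond(kind: str, label: str) -> Optional[str]:
    """Map a recognition result to user feedback (hypothetical policy)."""
    if kind == "command":
        return f"execute: {label}"
    if kind == "noise":
        return "Sorry, I didn't catch that. Please say it again."
    if kind == "silence":
        return None  # probably an accidental trigger: stay quiet, do nothing
    raise ValueError(f"unknown result kind: {kind!r}")

print(respond("command", "volume up"))  # execute: volume up
print(respond("silence", "<sil>"))      # None
```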
The search for the phoneme sequence with the highest matching probability can be done in either of two ways.

- Exhaustive comparison: compute the matching probability between the collected speech and every phoneme sequence, then compare them to find the highest.
- Incremental search: first find the phonemes in the decoding network that are acoustically close to the initial phoneme of the collected speech. Then, using the acoustic likelihood and the weights (including language model probability information), determine which of the sequences containing those phonemes has a next phoneme that best matches the next phoneme of the speech, and fix that next node. Repeating this step yields the phoneme sequence with the highest matching probability.
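The incremental strategy amounts to a search over a prefix tree of the decoding network's phoneme sequences. The sketch below uses beam search, a common generalization that keeps several partial hypotheses instead of committing to one. With `beam=1` it reduces to a purely greedy step-by-step choice, and a wider beam makes it less likely that the best-scoring sequence is pruned early. The following are all hypothetical:

- the phoneme labels and per-position scores;
- the assumption that the input is already segmented into phoneme positions;
- the function names.

Sequence weights are left out for brevity.

```python
import heapq

def build_trie(network):
    """Build a prefix tree from {label: phoneme tuple}; key None marks an end node."""
    root = {}
    for label, phonemes in network.items():
        node = root
        for p in phonemes:
            node = node.setdefault(p, {})
        node[None] = label
    return root

def beam_search(trie, position_scores, beam=2):
    """Extend partial phoneme sequences one input position at a time.

    position_scores[t][p] is a hypothetical log-likelihood that input segment t
    is phoneme p (segmentation is assumed known). Only the `beam` best partial
    hypotheses survive each step; beam=1 is the purely greedy walk.
    """
    hyps = [(0.0, trie)]
    for scores in position_scores:
        expanded = [
            (score + scores.get(p, float("-inf")), child)
            for score, node in hyps
            for p, child in node.items()
            if p is not None
        ]
        hyps = heapq.nlargest(beam, expanded, key=lambda h: h[0])
    finished = [(score, node[None]) for score, node in hyps if None in node]
    return max(finished)[1] if finished else None

network = {
    "shut down": ("g", "uan", "j", "i"),      # guan ji
    "lights off": ("g", "uan", "d", "eng"),   # guan deng
    "<noise>": ("n1", "n2", "n3", "n4"),      # hypothetical filler units
}
trie = build_trie(network)
clean = [{"g": -1, "n1": -3}, {"uan": -1, "n2": -3},
         {"j": -1, "d": -2, "n3": -3}, {"i": -1, "eng": -4, "n4": -3}]
noisy = [{"g": -5, "n1": -1}, {"uan": -5, "n2": -1},
         {"j": -5, "d": -5, "n3": -1}, {"i": -5, "eng": -5, "n4": -1}]
print(beam_search(trie, clean))  # shut down
print(beam_search(trie, noisy))  # <noise>
```

Shared prefixes such as "g uan" are scored once in the tree, which is the main saving of the incremental strategy over scoring every sequence separately.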
In summary, this embodiment adds phoneme sequences corresponding to noise content to the decoding network. Collected speech is recognized as noise or a command word as soon as the best-matching phoneme sequence is found in the network, with no confidence computation on the search result afterwards. This solves the prior-art problem of a high false recognition rate caused by confidence computation that is sensitive to environmental factors. It prevents noise from being recognized as a command word and reduces the false recognition rate.
Embodiment two
Fig. 2 is a flowchart of the speech recognition method provided by Embodiment two of the present invention. This embodiment applies to command word recognition, and the method can be performed by a speech recognition device. It builds on the method of Embodiment one and adds a step that automatically adjusts the parameters of the decoding network. The method can therefore change its parameters dynamically and keep reducing the false recognition rate. The speech recognition method provided by this embodiment includes:
Step 210: calculate, from the acoustic features of the collected speech, the acoustic likelihood between the speech and the phoneme sequences in the decoding network. The decoding network contains multiple phoneme sequences, and each phoneme sequence corresponds either to the content of a preset command word or to noise content;
Step 220: obtain, from the acoustic likelihood, the matching probability between the speech and each phoneme sequence;
Step 230: recognize the speech as the content corresponding to the phoneme sequence with the highest matching probability;
Step 240: if the collected speech is confirmed to be noise but was recognized as a preset command word, increase the weight of the phoneme sequences corresponding to noise content in the decoding network.
After recognizing the speech, this embodiment can also collect confirmation information, which the user may provide, to check whether the result is correct. Suppose the collected speech is confirmed to be noise but was recognized as a command word. This means the false recognition rate is still somewhat high, so the weight of the noise phoneme sequences in the decoding network is increased. The higher weight raises the matching probability between those sequences and collected speech, making non-command-word speech more likely to be recognized as noise. The increase can also be deferred until such confirmed errors reach a preset count threshold. This keeps isolated recognition errors from causing unbalanced adjustment.
Preferably, the method further includes: if the collected speech is confirmed to be a command word but was recognized as noise, decreasing the weight of the phoneme sequences corresponding to noise content in the decoding network.
The decrease can likewise be deferred until the number of times collected speech is confirmed to be a command word but recognized as noise reaches a preset threshold. Reducing the false recognition rate inevitably causes a small number of command words to be recognized as noise. This preferred scheme improves the recognition rate for command words.
Further, the weight of the noise phoneme sequences in the decoding network can also be adjusted by an instruction the user triggers. This can serve either to reduce the false recognition rate or to improve the recognition rate.
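The count-threshold adjustment in Step 240 and its mirror-image decrease can be sketched as a small stateful object. The following choices are illustrative assumptions, not values or rules from the source:

- the weight is kept in the log domain;
- each adjustment uses a fixed step;
- the error counter resets after each adjustment.

Correct recognitions leave the state unchanged.

```python
from dataclasses import dataclass

@dataclass
class NoiseWeightAdjuster:
    """Feedback-driven adjustment of the noise sequence weight (hypothetical values)."""
    noise_weight: float = 0.0  # log-domain weight added to noise sequences
    step: float = 0.5          # change applied per adjustment
    threshold: int = 3         # confirmed errors needed before adjusting
    false_accepts: int = 0     # noise recognized as a command word
    false_rejects: int = 0     # command word recognized as noise

    def feedback(self, truth_is_noise: bool, recognized_as_noise: bool) -> None:
        if truth_is_noise and not recognized_as_noise:
            self.false_accepts += 1
            if self.false_accepts >= self.threshold:
                self.noise_weight += self.step  # favour noise sequences
                self.false_accepts = 0
        elif not truth_is_noise and recognized_as_noise:
            self.false_rejects += 1
            if self.false_rejects >= self.threshold:
                self.noise_weight -= self.step  # favour command sequences
                self.false_rejects = 0
        # correct results leave the weight unchanged

adj = NoiseWeightAdjuster()
for _ in range(3):
    adj.feedback(truth_is_noise=True, recognized_as_noise=False)
print(adj.noise_weight)  # 0.5
```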
This embodiment adds phoneme sequences corresponding to noise content to the decoding network. Collected speech is recognized as noise or a command word as soon as the best-matching phoneme sequence is found in the network. This prevents noise from being recognized as a command word and reduces the false recognition rate. In addition, the weight of the noise phoneme sequences is adjusted according to the recognition results, so the parameters change dynamically and the false recognition rate keeps decreasing.
Embodiment three
Fig. 3 is a schematic structural diagram of the speech recognition device provided by Embodiment three of the present invention. The speech recognition device includes:
A computing module 310, configured to calculate, from the acoustic features of the collected speech, the acoustic likelihood between the speech and the phoneme sequences in the decoding network. The decoding network contains multiple phoneme sequences, and each phoneme sequence corresponds either to the content of a preset command word or to noise content;
A matching module 320, configured to obtain, from the acoustic likelihood, the matching probability between the speech and each phoneme sequence;
An identification module 330, configured to recognize the speech as the content corresponding to the phoneme sequence with the highest matching probability.
Preferably, the decoding network is built as a weighted finite-state transducer, and the speech recognition device further includes:
A weight adjustment module 340. It increases the weight of the noise phoneme sequences in the decoding network when the collected speech is confirmed to be noise but was recognized as a preset command word.
Preferably, the matching module 320 includes:
A sum computing unit, configured to calculate the sum of the acoustic likelihood and the weight of the phoneme sequence. This sum is used as the matching probability between the speech and that phoneme sequence.
Preferably, the decoding network also contains a phoneme sequence corresponding to silence.
Preferably, the computing module includes:
A model acquisition unit, configured to obtain pre-trained acoustic models of the phoneme sequences in the decoding network. The noise samples used to train the acoustic model for noise content include multiple speech samples whose pairwise differences in acoustic features exceed a preset threshold;
A model computing unit, configured to calculate, from the acoustic features of the collected speech, the acoustic likelihood between the speech and the phoneme sequences in the decoding network using the acoustic models.
The speech recognition device provided by this embodiment can perform the speech recognition method provided by any embodiment of the present invention. It has the functional modules and beneficial effects corresponding to that method.
Embodiment four
Fig. 4 is a schematic structural diagram of the terminal provided by Embodiment four of the present invention. As shown in Fig. 4, the terminal includes a processor 410, a memory 420, an input device 430, and an output device 440. The terminal may have one or more processors 410, and Fig. 4 shows one processor 410 as an example. These components may be connected by a bus or in other ways, and Fig. 4 shows a bus connection as an example.
The memory 420 is a computer-readable storage medium that can store software programs, computer-executable programs, and modules. Examples are the program instructions and modules corresponding to the speech recognition method in the embodiments of the present invention, such as the computing module 310, matching module 320, identification module 330, and weight adjustment module 340 of the speech recognition device. The processor 410 runs the software programs, instructions, and modules stored in the memory 420. In doing so it executes the various functional applications and data processing of the terminal, that is, it implements the speech recognition method described above.
The memory 420 may mainly include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required by at least one function. The data storage area may store data created through use of the terminal. The memory 420 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410 and connected to the terminal over a network. Such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 can receive numeric or character input and generate key signal input related to user settings and function control of the terminal. The output device 440 may include a display device such as a display screen.
Embodiment five
Embodiment five of the present invention further provides a computer-readable storage medium storing a computer program. When executed by a computer processor, the program implements a speech recognition method comprising:
Calculating, from the acoustic features of collected speech, the acoustic likelihood between the speech and the phoneme sequences in a decoding network. The decoding network contains multiple phoneme sequences, and each phoneme sequence corresponds either to the content of a preset command word or to noise content;
Obtaining, from the acoustic likelihood, the matching probability between the speech and each phoneme sequence;
Recognizing the speech as the content corresponding to the phoneme sequence with the highest matching probability.
The program stored on this computer-readable storage medium is not limited to the method operations described above. It can also perform related operations of the speech recognition method provided by any embodiment of the present invention.
From the description of the above embodiments, those skilled in the art will understand that the invention can be implemented by software plus the necessary general-purpose hardware, or by hardware alone. In many cases the former is the better implementation. The technical solution of the invention, or the part that contributes over the prior art, can therefore be embodied as a software product. That product can be stored in a computer-readable storage medium, such as a floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc. It includes instructions that cause a computer device, such as a personal computer, server, or network device, to perform the methods described in the embodiments of the present invention.
In the above embodiment of the speech recognition device, the units and modules are divided only by functional logic. Other divisions are possible as long as the corresponding functions can be implemented. The specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.
The above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here. Various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. The invention has been described in further detail through the above embodiments but is not limited to them. It may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the appended claims.

Claims (10)

  1. A speech recognition method, characterized by comprising:
    calculating, from the acoustic features of collected speech, the acoustic likelihood between the speech and the phoneme sequences in a decoding network, wherein the decoding network contains multiple phoneme sequences, and each phoneme sequence corresponds either to the content of a preset command word or to noise content;
    obtaining, from the acoustic likelihood, the matching probability between the speech and each phoneme sequence;
    recognizing the speech as the content corresponding to the phoneme sequence with the highest matching probability.
  2. The speech recognition method according to claim 1, characterized in that the decoding network is built as a weighted finite-state transducer;
    the obtaining, from the acoustic likelihood, of the matching probability between the speech and each phoneme sequence specifically comprises:
    calculating the sum of the acoustic likelihood and the weight of the phoneme sequence as the matching probability between the speech and that phoneme sequence.
  3. The speech recognition method according to claim 2, characterized by further comprising:
    if the collected speech is confirmed to be noise but was recognized as a preset command word, increasing the weight of the phoneme sequences corresponding to noise content in the decoding network.
  4. The speech recognition method according to any one of claims 1 to 3, characterized in that the decoding network further contains a phoneme sequence corresponding to silence.
  5. The speech recognition method according to any one of claims 1 to 3, characterized in that the calculating, according to the acoustic features of the collected speech, acoustic likelihood probabilities of the speech with respect to the phoneme sequences in the decoding network specifically comprises:
    obtaining pre-trained acoustic models of the phoneme sequences in the decoding network; wherein the noise samples used to train the acoustic model corresponding to the noise content comprise a plurality of speech samples whose acoustic features differ pairwise by more than a preset threshold;
    calculating, according to the acoustic features of the collected speech and using the acoustic models, the acoustic likelihood probabilities of the speech with respect to the phoneme sequences in the decoding network.
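Claim 5 requires that the noise training samples differ pairwise by more than a preset threshold, so that the noise model covers a wide variety of sounds rather than many near-duplicates. A minimal sketch, assuming each sample is summarized by one fixed-length feature vector (for example, averaged frame features) and that "difference" means Euclidean distance; both choices are assumptions. A greedy pass keeps a sample only if it is farther than the threshold from every sample already kept, which satisfies the pairwise condition but does not guarantee the largest possible set.

```python
import math
from typing import List, Sequence


def select_diverse_noise(samples: List[Sequence[float]],
                         threshold: float) -> List[int]:
    """Greedily return indices of samples whose feature vectors are pairwise
    more than `threshold` apart (Euclidean distance)."""
    kept: List[int] = []
    for i, feat in enumerate(samples):
        # keep this sample only if it is far from everything kept so far
        if all(math.dist(feat, samples[k]) > threshold for k in kept):
            kept.append(i)
    return kept
```

For example, with `samples = [[0, 0], [0.1, 0], [3, 4], [3, 4.2], [-5, 0]]` and `threshold = 1.0`, the near-duplicates at indices 1 and 3 are dropped and the function returns `[0, 2, 4]`.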
  6. A speech recognition device, characterized by comprising:
    a computing module, configured to calculate, according to acoustic features of collected speech, acoustic likelihood probabilities of the speech with respect to the phoneme sequences in a decoding network; wherein the decoding network comprises multiple groups of phoneme sequences, and each group of phoneme sequences corresponds either to the content of a preset command word or to noise content;
    a matching module, configured to obtain, according to the acoustic likelihood probabilities, matching probabilities between the speech and the phoneme sequences;
    a recognition module, configured to recognize the speech as the content corresponding to the phoneme sequence with the highest matching probability.
  7. The speech recognition device according to claim 6, characterized in that the decoding network adopts a weighted finite-state transducer (WFST) structure;
    the speech recognition device further comprises:
    a weight adjustment module, configured to increase the weight of the phoneme sequence corresponding to the noise content in the decoding network if it is confirmed that the collected speech is noise but the speech has been recognized as a preset command word.
  8. The speech recognition device according to claim 6 or 7, characterized in that the computing module comprises:
    a model acquisition unit, configured to obtain pre-trained acoustic models of the phoneme sequences in the decoding network; wherein the noise samples used to train the acoustic model corresponding to the noise content comprise a plurality of speech samples whose acoustic features differ pairwise by more than a preset threshold;
    a model computing unit, configured to calculate, according to the acoustic features of the collected speech and using the acoustic models, the acoustic likelihood probabilities of the speech with respect to the phoneme sequences in the decoding network.
  9. A terminal, characterized in that the terminal comprises:
    one or more processors;
    a memory, configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the speech recognition method according to any one of claims 1 to 5.
  10. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the speech recognition method according to any one of claims 1 to 5.
CN201710964474.1A 2017-10-17 2017-10-17 Audio recognition method, device, terminal and computer readable storage medium Active CN107644638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710964474.1A CN107644638B (en) 2017-10-17 2017-10-17 Audio recognition method, device, terminal and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710964474.1A CN107644638B (en) 2017-10-17 2017-10-17 Audio recognition method, device, terminal and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN107644638A true CN107644638A (en) 2018-01-30
CN107644638B CN107644638B (en) 2019-01-04

Family

ID=61123547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710964474.1A Active CN107644638B (en) 2017-10-17 2017-10-17 Audio recognition method, device, terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN107644638B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108831446A (en) * 2018-05-24 2018-11-16 百度在线网络技术(北京)有限公司 Method and apparatus for generating sample
CN108932943A (en) * 2018-07-12 2018-12-04 广州视源电子科技股份有限公司 Order word sound detection method, device, equipment and storage medium
CN109243429A (en) * 2018-11-21 2019-01-18 苏州奇梦者网络科技有限公司 A kind of pronunciation modeling method and device
CN109274845A (en) * 2018-08-31 2019-01-25 平安科技(深圳)有限公司 Intelligent sound pays a return visit method, apparatus, computer equipment and storage medium automatically
CN109273007A (en) * 2018-10-11 2019-01-25 科大讯飞股份有限公司 Voice awakening method and device
CN110415710A (en) * 2019-08-06 2019-11-05 大众问问(北京)信息科技有限公司 Parameter regulation means, device, equipment and the medium of interactive system for vehicle-mounted voice
CN110570842A (en) * 2019-10-25 2019-12-13 南京云白信息科技有限公司 Speech recognition method and system based on phoneme approximation degree and pronunciation standard degree
CN110716767A (en) * 2018-07-13 2020-01-21 阿里巴巴集团控股有限公司 Model component calling and generating method, device and storage medium
CN110992932A (en) * 2019-12-18 2020-04-10 睿住科技有限公司 Self-learning voice control method, system and storage medium
CN111145748A (en) * 2019-12-30 2020-05-12 广州视源电子科技股份有限公司 Audio recognition confidence determining method, device, equipment and storage medium
CN111179974A (en) * 2019-12-30 2020-05-19 苏州思必驰信息科技有限公司 Improved decoding network, command word recognition method and device
CN111710337A (en) * 2020-06-16 2020-09-25 睿云联(厦门)网络通讯技术有限公司 Voice data processing method and device, computer readable medium and electronic equipment
CN111798846A (en) * 2020-06-02 2020-10-20 厦门亿联网络技术股份有限公司 Voice command word recognition method and device, conference terminal and conference terminal system
CN113889083A (en) * 2021-11-03 2022-01-04 广州博冠信息科技有限公司 Voice recognition method and device, storage medium and electronic equipment
WO2023036014A1 (en) * 2021-09-07 2023-03-16 广西电网有限责任公司贺州供电局 Method for automatically saving power grid scheduling command on basis of voice recognition

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101840699A (en) * 2010-04-30 2010-09-22 中国科学院声学研究所 Voice quality evaluation method based on pronunciation model
JP2012113087A (en) * 2010-11-24 2012-06-14 Nippon Telegr & Teleph Corp <Ntt> Voice recognition wfst creation apparatus, voice recognition device employing the same, methods thereof, program and storage medium
US20130138441A1 (en) * 2011-11-28 2013-05-30 Electronics And Telecommunications Research Institute Method and system for generating search network for voice recognition
CN103971685A (en) * 2013-01-30 2014-08-06 腾讯科技(深圳)有限公司 Method and system for recognizing voice commands
CN103985391A (en) * 2014-04-16 2014-08-13 柳超 Phonetic-level low power consumption spoken language evaluation and defect diagnosis method without standard pronunciation
CN104036774A (en) * 2014-06-20 2014-09-10 国家计算机网络与信息安全管理中心 Method and system for recognizing Tibetan dialects
CN107112010A (en) * 2015-01-16 2017-08-29 三星电子株式会社 Method and apparatus for performing speech recognition using syntactic model
CN107195296A (en) * 2016-03-15 2017-09-22 阿里巴巴集团控股有限公司 A kind of audio recognition method, device, terminal and system

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108831446A (en) * 2018-05-24 2018-11-16 百度在线网络技术(北京)有限公司 Method and apparatus for generating sample
CN108831446B (en) * 2018-05-24 2019-10-18 百度在线网络技术(北京)有限公司 Method and apparatus for generating sample
CN108932943A (en) * 2018-07-12 2018-12-04 广州视源电子科技股份有限公司 Order word sound detection method, device, equipment and storage medium
CN110716767A (en) * 2018-07-13 2020-01-21 阿里巴巴集团控股有限公司 Model component calling and generating method, device and storage medium
CN110716767B (en) * 2018-07-13 2023-05-05 阿里巴巴集团控股有限公司 Model component calling and generating method, device and storage medium
CN109274845A (en) * 2018-08-31 2019-01-25 平安科技(深圳)有限公司 Intelligent sound pays a return visit method, apparatus, computer equipment and storage medium automatically
CN109273007A (en) * 2018-10-11 2019-01-25 科大讯飞股份有限公司 Voice awakening method and device
CN109273007B (en) * 2018-10-11 2022-05-17 西安讯飞超脑信息科技有限公司 Voice wake-up method and device
CN109243429A (en) * 2018-11-21 2019-01-18 苏州奇梦者网络科技有限公司 A kind of pronunciation modeling method and device
CN109243429B (en) * 2018-11-21 2021-12-10 苏州奇梦者网络科技有限公司 Voice modeling method and device
CN110415710A (en) * 2019-08-06 2019-11-05 大众问问(北京)信息科技有限公司 Parameter regulation means, device, equipment and the medium of interactive system for vehicle-mounted voice
CN110415710B (en) * 2019-08-06 2022-05-31 大众问问(北京)信息科技有限公司 Parameter adjusting method, device, equipment and medium for vehicle-mounted voice interaction system
CN110570842A (en) * 2019-10-25 2019-12-13 南京云白信息科技有限公司 Speech recognition method and system based on phoneme approximation degree and pronunciation standard degree
CN110992932A (en) * 2019-12-18 2020-04-10 睿住科技有限公司 Self-learning voice control method, system and storage medium
CN110992932B (en) * 2019-12-18 2022-07-26 广东睿住智能科技有限公司 Self-learning voice control method, system and storage medium
CN111145748A (en) * 2019-12-30 2020-05-12 广州视源电子科技股份有限公司 Audio recognition confidence determining method, device, equipment and storage medium
CN111145748B (en) * 2019-12-30 2022-09-30 广州视源电子科技股份有限公司 Audio recognition confidence determining method, device, equipment and storage medium
CN111179974A (en) * 2019-12-30 2020-05-19 苏州思必驰信息科技有限公司 Improved decoding network, command word recognition method and device
CN111798846A (en) * 2020-06-02 2020-10-20 厦门亿联网络技术股份有限公司 Voice command word recognition method and device, conference terminal and conference terminal system
CN111710337A (en) * 2020-06-16 2020-09-25 睿云联(厦门)网络通讯技术有限公司 Voice data processing method and device, computer readable medium and electronic equipment
CN111710337B (en) * 2020-06-16 2023-07-07 睿云联(厦门)网络通讯技术有限公司 Voice data processing method and device, computer readable medium and electronic equipment
WO2023036014A1 (en) * 2021-09-07 2023-03-16 广西电网有限责任公司贺州供电局 Method for automatically saving power grid scheduling command on basis of voice recognition
CN113889083A (en) * 2021-11-03 2022-01-04 广州博冠信息科技有限公司 Voice recognition method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN107644638B (en) 2019-01-04

Similar Documents

Publication Publication Date Title
CN107644638A (en) Audio recognition method, device, terminal and computer-readable recording medium
JP6538779B2 (en) Speech dialogue system, speech dialogue method and method for adapting a speech dialogue system
JP7150770B2 (en) Interactive method, device, computer-readable storage medium, and program
US10635698B2 (en) Dialogue system, a dialogue method and a method of adapting a dialogue system
JP6828001B2 (en) Voice wakeup method and equipment
US10446148B2 (en) Dialogue system, a dialogue method and a method of adapting a dialogue system
JP4195428B2 (en) Speech recognition using multiple speech features
CN108899013B (en) Voice search method and device and voice recognition system
KR101622111B1 (en) Dialog system and conversational method thereof
JP2016126330A (en) Speech recognition device and speech recognition method
US20220076674A1 (en) Cross-device voiceprint recognition
CN108364650B (en) Device and method for adjusting voice recognition result
CN104157285A (en) Voice recognition method and device, and electronic equipment
US10152298B1 (en) Confidence estimation based on frequency
CN110689881B (en) Speech recognition method, speech recognition device, computer equipment and storage medium
CN111145733B (en) Speech recognition method, speech recognition device, computer equipment and computer readable storage medium
CN110164416B (en) Voice recognition method and device, equipment and storage medium thereof
CN113314119B (en) Voice recognition intelligent household control method and device
CN104462912A (en) Biometric password security
CN108682415A (en) voice search method, device and system
CN111386566A (en) Device control method, cloud device, intelligent device, computer medium and device
CN109065026B (en) Recording control method and device
CN111508497A (en) Voice recognition method and device, electronic equipment and storage medium
US11615787B2 (en) Dialogue system and method of controlling the same
CN112863496B (en) Voice endpoint detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room 508-598, Xitian Gezhuang Town Government Office Building, No. 8 Xitong Road, Miyun District Economic Development Zone, Beijing 101500

Patentee after: Beijing Rubo Technology Co., Ltd.

Address before: Room 508-598, Xitian Gezhuang Town Government Office Building, No. 8 Xitong Road, Miyun District Economic Development Zone, Beijing 101500

Patentee before: BEIJING INTELLIGENT HOUSEKEEPER TECHNOLOGY CO., LTD.

TR01 Transfer of patent right

Effective date of registration: 20210824

Address after: 301-112, floor 3, building 2, No. 18, YANGFANGDIAN Road, Haidian District, Beijing 100038

Patentee after: Beijing Rubu Technology Co.,Ltd.

Address before: Room 508-598, Xitian Gezhuang Town Government Office Building, No. 8 Xitong Road, Miyun District Economic Development Zone, Beijing 101500

Patentee before: BEIJING ROOBO TECHNOLOGY Co.,Ltd.
