CN105719649B - Audio recognition method and device - Google Patents

Publication number
CN105719649B
Authority
CN
China
Prior art keywords
scene
input
voice messaging
voice
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610035394.3A
Other languages
Chinese (zh)
Other versions
CN105719649A (en)
Inventor
穆向禹
张东栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201610035394.3A
Publication of CN105719649A
Application granted
Publication of CN105719649B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/28: Constructional details of speech recognition systems

Abstract

The application proposes a speech recognition method and device. The method comprises: configuring a proprietary recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene; and establishing a speech recognition library that includes the proprietary recognition resource and the general recognition resource, so that voice information is recognized using the speech recognition library according to the input scene of the voice information. With the method and device provided by the application, speech recognition is performed using the recognition resource that corresponds to the voice input scene, which improves recognition accuracy and processing efficiency.

Description

Speech recognition method and device
Technical field
The application relates to the technical field of speech recognition, and in particular to a speech recognition method and device.
Background
With the development of the mobile internet, large-screen mobile phones have become mainstream, and both keyboard input and handwriting input have various limitations. Voice input will become a mainstream and increasingly favored input method. Because voice input is more natural and has a lower learning cost, it is gradually being accepted by more users. Children and elderly people alike can learn it quickly and become accustomed to this input mode.
Existing speech recognition technology is trained on a large amount of everyday-scene data in order to recognize speech input in different scenes. As a result, recognition accuracy is too low for some customized scenes, and speech in some customized scenes cannot be recognized at all, which wastes processing resources and reduces processing efficiency.
Summary of the invention
The application is intended to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first purpose of the application is to propose a speech recognition method that performs speech recognition using the recognition resource corresponding to the voice input scene, improving recognition accuracy and processing efficiency.
A second purpose of the application is to propose a speech recognition device.
To achieve the above purposes, an embodiment of the first aspect of the application proposes a speech recognition method, comprising: configuring a proprietary recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene; and establishing a speech recognition library including the proprietary recognition resource and the general recognition resource, so as to recognize voice information using the speech recognition library according to the input scene of the voice information.
The speech recognition method of the embodiment of the application configures a proprietary recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene, and establishes a speech recognition library including the proprietary recognition resource and the general recognition resource, so as to recognize voice information using the speech recognition library according to the input scene of the voice information. Speech recognition is thereby performed using the recognition resource corresponding to the voice input scene, which improves recognition accuracy and processing efficiency.
To achieve the above purposes, an embodiment of the second aspect of the application proposes a speech recognition device, comprising: a configuration module, for configuring a proprietary recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene; and an establishing module, for establishing a speech recognition library including the proprietary recognition resource and the general recognition resource, so as to recognize voice information using the speech recognition library according to the input scene of the voice information.
The speech recognition device of the embodiment of the application configures a proprietary recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene, and establishes a speech recognition library including the proprietary recognition resource and the general recognition resource, so as to recognize voice information using the speech recognition library according to the input scene of the voice information. Speech recognition is thereby performed using the recognition resource corresponding to the voice input scene, which improves recognition accuracy and processing efficiency.
Brief description of the drawings
The above and/or additional aspects and advantages of the application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of a speech recognition method according to one embodiment of the application;
Fig. 2 is a flow chart of a speech recognition method according to another embodiment of the application;
Fig. 3 is a flow chart of a speech recognition method according to yet another embodiment of the application;
Fig. 4 is a structural schematic diagram of a speech recognition device according to one embodiment of the application;
Fig. 5 is a structural schematic diagram of a speech recognition device according to another embodiment of the application.
Detailed description of the embodiments
Embodiments of the application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the application and should not be construed as limiting it.
The speech recognition method and device of the embodiments of the application are described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of a speech recognition method according to one embodiment of the application.
As shown in Fig. 1, the speech recognition method includes:
Step 101, configure a proprietary recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene.
Specifically, the speech recognition method provided by this embodiment is applied to a terminal device having a voice input function. In general, the terminal device implements the voice input function through a human-machine voice interaction interface; the specific voice input interface may be a device such as a microphone.
It should be noted that the terminal device can provide voice input services to the user through an application that is able to access the human-machine voice interaction interface. The application can be selected according to actual needs, for example a navigation application with a voice input function or a search engine; this embodiment places no restriction on this.
When the user needs to perform voice input, the user inputs voice information to the human-machine voice input interface. The voice information input by the user is then recognized, and corresponding processing is performed based on the recognition result. Different voice input applications process the recognition result differently. For example:
For a voice search application, after the voice information input by the user is recognized, search results are fed back to the user according to the recognition result; or,
For an instant messaging application, after the voice information input by the user is recognized, it is converted into text according to the recognition result and displayed in the input box.
For voice information input in different scenes, in order to improve the accuracy and processing performance of speech recognition, the speech recognition method provided by this embodiment first configures a proprietary recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene.
It should be noted that there are many types of customized voice scenes, and different customized voice scenes correspond to different proprietary recognition resources. The specific content can be configured and selected according to the needs of different application scenes, and this embodiment places no restriction on this. Examples include:
For the voice scene of map navigation, the corresponding proprietary recognition resource is a place name recognition resource; or,
For the voice scene of an e-commerce platform, the corresponding proprietary recognition resource is an e-commerce product name recognition resource; or,
For the voice scene of movie search, the corresponding proprietary recognition resource is a movie name recognition resource.
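The scene-to-resource configuration of step 101 can be sketched as a simple lookup table. This is a minimal illustration, not the patent's implementation: the `RecognitionResource` class, the scene keys, and the sample vocabulary entries are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionResource:
    """Vocabulary bundle used to bias recognition toward one scene (hypothetical)."""
    name: str
    vocabulary: frozenset = frozenset()


# Proprietary resources, one per customized voice scene.
# Scene keys and sample vocabulary entries are invented for illustration.
PROPRIETARY_RESOURCES = {
    "map_navigation": RecognitionResource(
        "place_names", frozenset({"Zhongguancun", "Xizhimen"})),
    "e_commerce": RecognitionResource(
        "product_names", frozenset({"phone case", "power bank"})),
    "movie_search": RecognitionResource(
        "movie_titles", frozenset({"Example Movie"})),
}

# A single general resource covers the general voice scene.
GENERAL_RESOURCE = RecognitionResource("general")
```

Keeping one entry per customized scene mirrors the statement that different customized voice scenes correspond to different proprietary recognition resources.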
Step 102, establish a speech recognition library including the proprietary recognition resource and the general recognition resource, so as to recognize voice information using the speech recognition library according to the input scene of the voice information.
Specifically, according to the preconfigured proprietary recognition resource corresponding to the customized voice scene and the general recognition resource corresponding to the general voice scene, a speech recognition library including the proprietary recognition resource and the general recognition resource is established.
Then, when voice information input by the user is received, the input scene of the voice information is determined, and the type of the input scene is determined, that is, whether the input scene is a customized voice scene or a general voice scene. The recognition resource corresponding to the input scene type is then obtained from the speech recognition library to recognize the input voice information.
The speech recognition method of the embodiment of the application configures a proprietary recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene, and establishes a speech recognition library including the proprietary recognition resource and the general recognition resource, so as to recognize voice information using the speech recognition library according to the input scene of the voice information. The recognition environment is thereby customized for different vertical scenes, and speech recognition is performed using the recognition resource corresponding to the voice input scene, which improves recognition accuracy and processing efficiency.
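One way to picture the library of step 102 is as an object that holds both kinds of resource and answers two questions: what type a scene is, and which resource to use for it. The sketch below is an assumption-laden illustration; the class name `SpeechRecognitionLibrary`, its methods, and the string placeholders standing in for real resources do not come from the patent.

```python
from enum import Enum


class SceneType(Enum):
    CUSTOMIZED = "customized"
    GENERAL = "general"


class SpeechRecognitionLibrary:
    """Holds proprietary resources keyed by scene plus one general resource (hypothetical)."""

    def __init__(self, proprietary, general):
        self._proprietary = dict(proprietary)
        self._general = general

    def scene_type(self, scene):
        # A scene counts as customized only if a proprietary resource was configured for it.
        if scene in self._proprietary:
            return SceneType.CUSTOMIZED
        return SceneType.GENERAL

    def resource_for(self, scene):
        # Return the resource matching the scene type.
        if self.scene_type(scene) is SceneType.CUSTOMIZED:
            return self._proprietary[scene]
        return self._general


# Placeholder strings stand in for real recognition resources.
library = SpeechRecognitionLibrary(
    proprietary={"map_navigation": "place-name resource",
                 "movie_search": "movie-name resource"},
    general="general resource",
)
```

With this layout, `library.resource_for("map_navigation")` yields the place-name resource, while any scene without a configured entry falls through to the general resource.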
Fig. 2 is a flow chart of a speech recognition method according to another embodiment of the application.
As shown in Fig. 2, after step 102, the method may further include the following steps:
Step 201, receive the input voice information.
Step 202, determine the input scene of the voice information according to a preset scene acquisition strategy.
Specifically, the voice information input by the user is received, and the input scene corresponding to the currently received voice information is determined according to the preset scene acquisition strategy.
It should be noted that different scene acquisition strategies can be preset according to the actual application, and this embodiment places no restriction on this. Examples include:
Example one: the input scene of the voice information is determined according to the application program;
Specifically, the input scene of the voice information is determined according to the application program in which the user is currently performing voice input. For example, if the user inputs voice information to a map navigation application, the input scene of the voice information is determined to be map navigation.
Alternatively,
Example two: the input scene of the voice information is determined according to context;
Specifically, the input scene of the voice information is determined according to the context of the conversation record between the user and other users. For example, in an instant messaging application, if the earlier conversation between the user and other users concerns travel, the input scene of the voice information is a travel scene.
Alternatively,
Example three: the input scene of the voice information is determined according to geographical location information.
Specifically, the user's current geographical location information is obtained from the GPS information of the terminal device, and the input scene of the voice information is then determined according to that location. For example, when the GPS information of the terminal device indicates that the user is currently at a movie theater, the input scene of the voice information is a movie scene.
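The three example strategies can be sketched as small functions that each inspect a request context and return a scene name or nothing. Everything here is illustrative: the `ctx` dictionary layout, the lookup tables, and the keyword list are assumptions. The text presents the strategies as alternatives, so trying them in a fixed order inside `determine_scene` is also an assumption of this sketch, as is the idea that the GPS position has already been resolved to a place type elsewhere.

```python
from typing import Callable, Optional, Sequence

# A strategy maps a request context to a scene name, or None if it cannot decide.
SceneStrategy = Callable[[dict], Optional[str]]

# Hypothetical lookup tables; real deployments would define their own.
APP_TO_SCENE = {"map_app": "map_navigation"}
TRAVEL_KEYWORDS = ("trip", "flight", "hotel")
PLACE_TYPE_TO_SCENE = {"cinema": "movie_search"}


def scene_from_application(ctx: dict) -> Optional[str]:
    # Example one: the application currently receiving voice input.
    return APP_TO_SCENE.get(ctx.get("app"))


def scene_from_context(ctx: dict) -> Optional[str]:
    # Example two: earlier conversation content (naive keyword match).
    history = " ".join(ctx.get("history", [])).lower()
    return "travel" if any(word in history for word in TRAVEL_KEYWORDS) else None


def scene_from_location(ctx: dict) -> Optional[str]:
    # Example three: place type derived from the device's GPS position.
    return PLACE_TYPE_TO_SCENE.get(ctx.get("place_type"))


DEFAULT_STRATEGIES: Sequence[SceneStrategy] = (
    scene_from_application, scene_from_context, scene_from_location)


def determine_scene(ctx: dict,
                    strategies: Sequence[SceneStrategy] = DEFAULT_STRATEGIES) -> Optional[str]:
    # Return the first scene any strategy can determine, else None.
    for strategy in strategies:
        scene = strategy(ctx)
        if scene is not None:
            return scene
    return None
```

Returning `None` when no strategy applies leaves room for the case where the input scene cannot be determined at all.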
Step 203, recognize the input voice information according to the input scene and the speech recognition library.
Specifically, the input voice information is recognized according to the input scene of the current voice information and the pre-established speech recognition library, which specifically includes:
If the input scene of the current voice information is a preconfigured customized voice scene, the proprietary recognition resource corresponding to that customized voice scene is obtained from the speech recognition library, and the voice information is recognized using the proprietary recognition resource;
If the input scene of the current voice information is not a preconfigured customized voice scene, the general recognition resource is obtained from the speech recognition library, and the voice information is recognized using the general recognition resource.
Based on the embodiment shown in Fig. 1, the speech recognition method of this embodiment further receives the input voice information, determines the input scene of the voice information according to the preset scene acquisition strategy, and recognizes the input voice information according to the input scene and the speech recognition library. Speech recognition is thereby performed using the recognition resource corresponding to the voice input scene, which improves recognition accuracy and processing efficiency.
Fig. 3 is a flow chart of a speech recognition method according to yet another embodiment of the application. With reference to Fig. 3, the method is described as follows:
Step 1: after voice information is received, judge whether the input scene of the voice information can be determined according to the preset scene acquisition strategy.
Step 2: if the input scene of the voice information cannot be determined, recognize the voice information using the general recognition resource.
Step 3: if the input scene of the voice information can be determined, judge whether it is a preconfigured customized voice scene.
Step 4: if the input scene is a preconfigured customized voice scene, recognize the voice information using the proprietary recognition resource in the speech recognition library that corresponds to that customized voice scene;
Step 5: if the input scene is not a customized voice scene, recognize the voice information using the general recognition resource in the speech recognition library.
Based on the embodiment shown in Fig. 1, the speech recognition method of this embodiment further receives the input voice information, determines the input scene of the voice information according to the preset scene acquisition strategy, and recognizes the input voice information according to the input scene and the speech recognition library. Speech recognition is thereby performed using the recognition resource corresponding to the voice input scene, which improves recognition accuracy and processing efficiency.
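Steps 1 to 5 amount to a three-way decision, which the following sketch makes explicit. The function names, the use of `None` for "scene could not be determined", and the injected `decode` callable standing in for an actual recognizer are all assumptions of the example rather than details from the patent.

```python
def select_resource(scene, proprietary, general):
    """Pick a recognition resource following the Fig. 3 decision flow (hypothetical helper)."""
    # Steps 1-2: the scene could not be determined, so fall back to the general resource.
    if scene is None:
        return general
    # Steps 3-4: the scene is a configured customized scene, so use its proprietary resource.
    if scene in proprietary:
        return proprietary[scene]
    # Step 5: the scene is known but not customized, so use the general resource.
    return general


def recognize(audio, scene, proprietary, general, decode):
    # `decode` is a placeholder for whatever recognizer consumes the audio and a resource.
    return decode(audio, select_resource(scene, proprietary, general))
```

Both fallback branches return the same general resource; they are kept separate here only to match the step numbering in the text.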
In order to implement the above embodiments, the application also proposes a speech recognition device.
Fig. 4 is a structural schematic diagram of a speech recognition device according to one embodiment of the application.
As shown in Fig. 4, the speech recognition device includes:
A configuration module 11, for configuring a proprietary recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene;
Specifically, the proprietary recognition resource includes at least one of the following:
A place name recognition resource, a search hot word recognition resource, an e-commerce product name recognition resource, and a movie name recognition resource.
An establishing module 12, for establishing a speech recognition library including the proprietary recognition resource and the general recognition resource, so as to recognize voice information using the speech recognition library according to the input scene of the voice information.
It should be noted that the foregoing explanation of the speech recognition method embodiments also applies to the speech recognition device of this embodiment, and details are not repeated here.
The speech recognition device of the embodiment of the application configures a proprietary recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene, and establishes a speech recognition library including the proprietary recognition resource and the general recognition resource, so as to recognize voice information using the speech recognition library according to the input scene of the voice information. The recognition environment is thereby customized for different vertical scenes, and speech recognition is performed using the recognition resource corresponding to the voice input scene, which improves recognition accuracy and processing efficiency.
Fig. 5 is a structural schematic diagram of a speech recognition device according to another embodiment of the application. As shown in Fig. 5, based on the embodiment shown in Fig. 4, the device further includes:
A receiving module 13, for receiving input voice information;
An obtaining module 14, for determining the input scene of the voice information according to a preset scene acquisition strategy;
A recognition module 15, for recognizing the input voice information according to the input scene and the speech recognition library.
In one embodiment, the obtaining module 14 is configured to: determine the input scene of the voice information according to the application program in which the user is currently performing voice input;
Alternatively,
In one embodiment, the obtaining module 14 is configured to: determine the input scene of the voice information according to the context of the conversation record between the user and other users;
Alternatively,
In one embodiment, the obtaining module 14 is configured to: determine the input scene of the voice information according to the user's current geographical location information.
In one embodiment, the recognition module 15 is configured to:
If the input scene is the customized voice scene, recognize the voice information using the proprietary recognition resource in the speech recognition library that corresponds to the customized voice scene;
If the input scene is not the customized voice scene, recognize the voice information using the general recognition resource in the speech recognition library.
In another embodiment, the recognition module 15 is further configured to:
If the input scene cannot be determined, recognize the voice information using the general recognition resource.
It should be noted that the foregoing explanation of the speech recognition method embodiments also applies to the speech recognition device of this embodiment, and details are not repeated here.
Based on the embodiment shown in Fig. 4, the speech recognition device of this embodiment further receives the input voice information, determines the input scene of the voice information according to the preset scene acquisition strategy, and recognizes the input voice information according to the input scene and the speech recognition library. Speech recognition is thereby performed using the recognition resource corresponding to the voice input scene, which improves recognition accuracy and processing efficiency.
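The module structure of Fig. 4 and Fig. 5 could be wired together roughly as follows. This is a sketch under stated assumptions: the `SpeechRecognitionDevice` class, its field names, and the injected `obtain_scene` and `decode` callables are hypothetical stand-ins for modules 11 to 15, and string placeholders replace real recognition resources.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class SpeechRecognitionDevice:
    """Composes the modules of Fig. 5 as injected parts (hypothetical wiring)."""
    proprietary: Mapping[str, str]                  # set up by configuration module 11
    general: str                                    # general resource held in the library (module 12)
    obtain_scene: Callable[[dict], Optional[str]]   # obtaining module 14
    decode: Callable[[bytes, str], str]             # stand-in recognizer used by recognition module 15

    def handle(self, audio: bytes, ctx: dict) -> str:
        # Receiving module 13: accept the input voice information.
        scene = self.obtain_scene(ctx)
        # Recognition module 15: undetermined or non-customized scenes use the general resource.
        if scene is not None and scene in self.proprietary:
            resource = self.proprietary[scene]
        else:
            resource = self.general
        return self.decode(audio, resource)
```

Passing the scene strategy and the recognizer in as callables keeps each module replaceable, which fits the statement that the scene acquisition strategy can be preset according to the actual application.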
In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the application. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided there is no conflict, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the application, "plurality" means at least two, for example two or three, unless otherwise specifically defined.
Any process or method description in a flow chart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the application includes other implementations in which functions may be executed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the application belong.
The logic and/or steps represented in a flow chart or otherwise described herein, for example an ordered list of executable instructions that may be considered to implement logic functions, may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, device, or apparatus (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, device, or apparatus and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in combination with, an instruction execution system, device, or apparatus. More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps carried by the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the application may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software function module. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the application have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the application; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the application.

Claims (12)

1. A speech recognition method, characterized by comprising the following steps:
Configuring a proprietary recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene, wherein different customized voice scenes correspond to different proprietary recognition resources;
Establishing a speech recognition library including the proprietary recognition resource and the general recognition resource, so as to, according to the input scene of voice information, determine the type of the input scene and obtain from the speech recognition library the recognition resource corresponding to the type of the input scene to recognize the voice information.
2. The method according to claim 1, characterized in that the proprietary recognition resource includes at least one of the following:
A place name recognition resource, a search hot word recognition resource, an e-commerce product name recognition resource, and a movie name recognition resource.
3. The method according to claim 1 or 2, characterized by further comprising:
Receiving input voice information;
Determining the input scene of the voice information according to a preset scene acquisition strategy;
Recognizing the input voice information according to the input scene and the speech recognition library.
4. The method according to claim 3, characterized in that determining the input scene of the voice information according to the preset scene acquisition strategy comprises:
Determining the input scene of the voice information according to the application program in which the user is currently performing voice input;
Alternatively,
Determining the input scene of the voice information according to the context of the conversation record between the user and other users;
Alternatively,
Determining the input scene of the voice information according to the user's current geographical location information.
5. The method according to claim 3, characterized in that recognizing the input voice information according to the input scene and the speech recognition library comprises:
If the input scene is the customized voice scene, recognizing the voice information using the proprietary recognition resource in the speech recognition library that corresponds to the customized voice scene;
If the input scene is not the customized voice scene, recognizing the voice information using the general recognition resource in the speech recognition library.
6. The method according to claim 3, characterized by further comprising:
If the input scene cannot be determined, recognizing the voice information using the general recognition resource.
7. A speech recognition device, characterized by comprising:
a configuration module, configured to configure a proprietary identification resource corresponding to a customized voice scene and a universal identification resource corresponding to a universal voice scene, wherein different customized voice scenes correspond to different proprietary identification resources;
an establishing module, configured to establish a speech recognition library that includes the proprietary identification resource and the universal identification resource, so that, according to the input scene of voice messaging and the determined type of the input scene, the identification resource corresponding to the type of the input scene is obtained from the speech recognition library to identify the voice messaging.
8. The device as claimed in claim 7, characterized in that the proprietary identification resource includes at least one of:
a place name identification resource, a search hot word identification resource, an e-commerce product name identification resource, and a movie name identification resource.
9. The device as claimed in claim 7 or 8, characterized by further comprising:
a receiving module, configured to receive input voice messaging;
an acquisition module, configured to determine the input scene of the voice messaging according to a preset scene acquisition strategy;
an identification module, configured to identify the input voice messaging according to the input scene and the speech recognition library.
10. The device as claimed in claim 9, characterized in that the acquisition module is configured to:
determine the input scene of the voice messaging according to the application program in which the user is currently performing voice input;
or,
determine the input scene of the voice messaging according to the context of the session record between the user and other users;
or,
determine the input scene of the voice messaging according to the current geographical location information of the user.
11. The device as claimed in claim 9, characterized in that the identification module is configured to:
if the input scene is the customized voice scene, apply the proprietary identification resource in the speech recognition library that corresponds to the customized voice scene to identify the voice messaging;
if the input scene is not the customized voice scene, apply the universal identification resource in the speech recognition library to identify the voice messaging.
12. The device as claimed in claim 9, characterized in that the identification module is further configured to:
if the input scene cannot be determined, identify the voice messaging using the universal identification resource.
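
The flow recited in claims 3 to 6 (receive voice input, determine the input scene, select an identification resource, fall back to the universal resource) can be sketched in Python as follows. This is an illustrative sketch only, not the patented implementation. Every name in it (RecognitionResource, InputContext, SpeechRecognitionLibrary, determine_scene, recognize, and the scene tables) is hypothetical. The claims list the three scene acquisition strategies as alternatives; trying them in a fixed order is an assumption made here for brevity. No audio is actually decoded: the function only reports which resource would be applied.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RecognitionResource:
    """Stand-in for an identification resource (e.g. a vocabulary or model)."""
    name: str


@dataclass
class InputContext:
    """Hypothetical signals available when the voice input arrives."""
    app_name: Optional[str] = None      # application receiving the voice input
    session_text: Optional[str] = None  # context of the session with other users
    location: Optional[str] = None      # coarse label for the current location


@dataclass
class SpeechRecognitionLibrary:
    """Holds one universal resource plus one proprietary resource per scene."""
    universal: RecognitionResource
    proprietary: Dict[str, RecognitionResource] = field(default_factory=dict)

    def select(self, scene: Optional[str]) -> RecognitionResource:
        # Claims 5 and 6: a customized scene maps to its proprietary resource;
        # any other scene, or an undetermined scene (None), uses the universal one.
        if scene is not None and scene in self.proprietary:
            return self.proprietary[scene]
        return self.universal


# Hypothetical lookup tables; a real system would configure these elsewhere.
APP_SCENES = {"maps": "place_name", "shopping": "ecommerce_product"}
SESSION_KEYWORDS = {"movie": "movie_name", "cinema": "movie_name"}
LOCATION_SCENES = {"cinema": "movie_name"}


def determine_scene(ctx: InputContext) -> Optional[str]:
    """Claim 4 strategies, tried in order (the ordering is an assumption)."""
    if ctx.app_name in APP_SCENES:
        return APP_SCENES[ctx.app_name]
    if ctx.session_text:
        text = ctx.session_text.lower()
        for keyword, scene in SESSION_KEYWORDS.items():
            if keyword in text:
                return scene
    if ctx.location in LOCATION_SCENES:
        return LOCATION_SCENES[ctx.location]
    return None  # scene cannot be determined


def recognize(audio: bytes, ctx: InputContext,
              library: SpeechRecognitionLibrary) -> str:
    """Return the name of the resource that would be used to decode `audio`."""
    scene = determine_scene(ctx)
    resource = library.select(scene)
    # A real recognizer would decode `audio` with `resource` here.
    return resource.name
```

Keeping the fallback inside select means that both "scene is not a customized scene" and "scene cannot be determined" reach the universal resource through a single code path.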
CN201610035394.3A 2016-01-19 2016-01-19 Audio recognition method and device Active CN105719649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610035394.3A CN105719649B (en) 2016-01-19 2016-01-19 Audio recognition method and device

Publications (2)

Publication Number Publication Date
CN105719649A CN105719649A (en) 2016-06-29
CN105719649B CN105719649B (en) 2019-07-05

Family

ID=56147425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610035394.3A Active CN105719649B (en) 2016-01-19 2016-01-19 Audio recognition method and device

Country Status (1)

Country Link
CN (1) CN105719649B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122179A (en) * 2017-03-31 2017-09-01 阿里巴巴集团控股有限公司 The function control method and device of voice
CN108288467B (en) * 2017-06-07 2020-07-14 腾讯科技(深圳)有限公司 Voice recognition method and device and voice recognition engine
CN107463700B (en) * 2017-08-15 2020-09-08 北京百度网讯科技有限公司 Method, device and equipment for acquiring information
CN107728783B (en) * 2017-09-25 2021-05-18 联想(北京)有限公司 Artificial intelligence processing method and system
CN109920429A (en) * 2017-12-13 2019-06-21 上海擎感智能科技有限公司 It is a kind of for vehicle-mounted voice recognition data processing method and system
CN110299136A (en) * 2018-03-22 2019-10-01 上海擎感智能科技有限公司 A kind of processing method and its system for speech recognition
CN109087639B (en) * 2018-08-02 2021-01-15 泰康保险集团股份有限公司 Method, apparatus, electronic device and computer readable medium for speech recognition
TWI698857B (en) * 2018-11-21 2020-07-11 財團法人工業技術研究院 Speech recognition system and method thereof, and computer program product
CN109360565A (en) * 2018-12-11 2019-02-19 江苏电力信息技术有限公司 A method of precision of identifying speech is improved by establishing resources bank
CN111312233A (en) * 2018-12-11 2020-06-19 阿里巴巴集团控股有限公司 Voice data identification method, device and system
CN111312235B (en) * 2018-12-11 2023-06-30 阿里巴巴集团控股有限公司 Voice interaction method, device and system
CN109671421B (en) * 2018-12-25 2020-07-10 苏州思必驰信息科技有限公司 Off-line navigation customizing and implementing method and device
CN110349575A (en) * 2019-05-22 2019-10-18 深圳壹账通智能科技有限公司 Method, apparatus, electronic equipment and the storage medium of speech recognition
CN111049996B (en) * 2019-12-26 2021-06-15 思必驰科技股份有限公司 Multi-scene voice recognition method and device and intelligent customer service system applying same
CN111161739B (en) * 2019-12-28 2023-01-17 科大讯飞股份有限公司 Speech recognition method and related product
CN113223510B (en) * 2020-01-21 2022-09-20 青岛海尔电冰箱有限公司 Refrigerator and equipment voice interaction method and computer readable storage medium thereof
CN111583909B (en) * 2020-05-18 2024-04-12 科大讯飞股份有限公司 Voice recognition method, device, equipment and storage medium
CN112687261B (en) * 2020-12-15 2022-05-03 思必驰科技股份有限公司 Speech recognition training and application method and device
CN113470619B (en) * 2021-06-30 2023-08-18 北京有竹居网络技术有限公司 Speech recognition method, device, medium and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6738784B1 (en) * 2000-04-06 2004-05-18 Dictaphone Corporation Document and information processing system
CN101329868A (en) * 2008-07-31 2008-12-24 林超 Speech recognition optimizing system aiming at locale language use preference and method thereof
CN103674012A (en) * 2012-09-21 2014-03-26 高德软件有限公司 Voice customizing method and device and voice identification method and device
CN104240698A (en) * 2014-09-24 2014-12-24 上海伯释信息科技有限公司 Voice recognition method
CN105225665A (en) * 2015-10-15 2016-01-06 桂林电子科技大学 A kind of audio recognition method and speech recognition equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8069159B2 (en) * 2004-09-07 2011-11-29 Robert O Stuart More efficient search algorithm (MESA) using prioritized search sequencing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant