CN109360565A - A method for improving speech recognition accuracy by establishing a resource library
- Publication number: CN109360565A
- Application number: CN201811508983.4A
- Authority: CN (China)
- Prior art keywords: scene, input, voice, voice information, resource
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G: PHYSICS
- G10: MUSICAL INSTRUMENTS; ACOUSTICS
- G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00: Speech recognition
- G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26: Speech to text systems
- G10L2015/226: Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/228: Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
Abstract
The invention discloses a method for improving speech recognition accuracy by establishing a resource library. Dedicated recognition resources corresponding to customized voice scenes, and general recognition resources corresponding to general voice scenes, are configured. A speech recognition library that includes the dedicated recognition resources and the general recognition resources is then established, so that voice information is recognized using the speech recognition library according to the input scene of the voice information. The invention performs speech recognition with the recognition resources that correspond to the voice input scene, which improves recognition accuracy and processing efficiency.
Description
Technical field
The application belongs to the technical field of speech recognition, and in particular relates to a method for improving speech recognition accuracy by establishing a resource library.
Background
With the development of the mobile internet, large-screen mobile phones have become mainstream, and both keyboard input and handwriting input have various limitations. Voice input is well placed to become the mainstream input method. Because voice input is more natural and has a lower learning cost, it is gradually being accepted by more users. Children and elderly users alike can learn it quickly and get used to this input mode.
Existing speech recognition technology is trained on a large amount of everyday-life scene data in order to recognize speech that is input in different scenes. As a result, recognition accuracy is too low for some customized scenes, and speech in some customized scenes cannot be recognized at all, which wastes processing resources and reduces processing efficiency.
Summary of the invention
To overcome the problems of the prior art, the object of the present invention is to provide a method for improving speech recognition accuracy by establishing a resource library, which performs speech recognition with the recognition resources that correspond to the voice input scene and thereby improves recognition accuracy and processing efficiency.
The object of the present invention is achieved through the following technical solution:
A method for improving speech recognition accuracy by establishing a resource library, characterized by comprising the following steps:
Step 101: configure dedicated recognition resources corresponding to customized voice scenes, and general recognition resources corresponding to general voice scenes;
Step 102: establish a speech recognition library that includes the dedicated recognition resources and the general recognition resources, and recognize voice information using the speech recognition library according to the input scene of the voice information.
Step 101 is specifically as follows. When a user needs to perform voice input, the user inputs voice information to the human-machine voice input interface; the voice information input by the user is then recognized, and corresponding processing is performed based on the recognition result. Different voice input applications process the recognition result in different ways:
For a voice search application, after the voice information input by the user is recognized, search results are fed back to the user according to the recognition result. Alternatively, for an instant messaging application, after the voice information input by the user is recognized, the recognition result is converted into text and displayed in the input box.
For voice information input in different scenes, dedicated recognition resources corresponding to customized voice scenes, and general recognition resources corresponding to general voice scenes, are configured.
Step 102 is specifically as follows. Based on the preconfigured dedicated recognition resources corresponding to customized voice scenes and the general recognition resources corresponding to general voice scenes, a speech recognition library that includes the dedicated recognition resources and the general recognition resources is established. When voice information input by the user is received, the input scene of the voice information is determined, together with the type of that input scene, that is, whether the input scene is a customized voice scene or a general voice scene. The recognition resource corresponding to the input scene type is then obtained from the speech recognition library and used to recognize the input voice information.
The present invention performs speech recognition with the recognition resources that correspond to the voice input scene, which improves recognition accuracy and processing efficiency.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a flow chart of the speech recognition method of another embodiment.
Fig. 3 is a flow chart of the speech recognition method of a further embodiment.
Detailed description of the embodiments
Embodiments of the application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the application and should not be construed as limiting it.
A method for improving speech recognition accuracy by establishing a resource library, as shown in Fig. 1, comprises:
Step 101: configure dedicated recognition resources corresponding to customized voice scenes, and general recognition resources corresponding to general voice scenes. Specifically, the speech recognition method provided by this embodiment is applied in a terminal device that has a voice input function. In general, the terminal device implements the voice input function through a human-machine voice interaction interface; the concrete voice input interface may be a device such as a microphone.
It should be noted that the terminal device can provide a voice input service to the user through any application that is able to access the human-machine voice interaction interface. The application can be chosen according to actual needs, for example a navigation application or a search engine with a voice input function; this embodiment places no restriction on it. When a user needs to perform voice input, the user inputs voice information to the human-machine voice input interface; the voice information input by the user is then recognized, and corresponding processing is performed based on the recognition result. Different voice input applications process the recognition result in different ways. For example:
For a voice search application, after the voice information input by the user is recognized, search results are fed back to the user according to the recognition result. Alternatively, for an instant messaging application, after the voice information input by the user is recognized, the recognition result is converted into text and displayed in the input box.
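As a rough illustration of the two application examples, the following sketch routes one recognition result to different post-processing depending on the application. The function name `handle_result`, the application kinds, and the returned dictionaries are assumptions made for this example; none of them come from the application text.

```python
def handle_result(app_kind: str, recognized_text: str) -> dict:
    """Route a recognition result to application-specific post-processing.

    Hypothetical helper: the application kinds and return shapes are
    placeholders chosen for illustration only.
    """
    if app_kind == "voice_search":
        # A real application would query its search backend here.
        return {"action": "search", "query": recognized_text}
    if app_kind == "instant_messaging":
        # The text is placed in the input box for the user to review or send.
        return {"action": "fill_input_box", "text": recognized_text}
    raise ValueError(f"unsupported application kind: {app_kind}")
```

The recognition step itself is the same in both cases; only what happens to the recognized text differs.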
For voice information input in different scenes, in order to improve the accuracy and processing performance of speech recognition, the speech recognition method provided by this embodiment first configures dedicated recognition resources corresponding to customized voice scenes, and general recognition resources corresponding to general voice scenes.
It should be noted that there are many types of customized voice scene, and different customized voice scenes correspond to different dedicated recognition resources. The specific content can be configured and selected according to the needs of the application scenario; this embodiment places no restriction on it. Examples include:
For a map navigation voice scene, the corresponding dedicated recognition resource is a place-name recognition resource; for an e-commerce platform voice scene, the corresponding dedicated recognition resource is a product-name recognition resource; for a movie search voice scene, the corresponding dedicated recognition resource is a movie-title recognition resource.
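The configuration in step 101 can be pictured as a small mapping from scene to resource. The sketch below is one possible representation, assuming that a recognition resource can be modeled as a named vocabulary; the `RecognitionResource` class, the scene keys, and the vocabulary entries are all hypothetical placeholders, since the application does not specify how a resource is stored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecognitionResource:
    """A named vocabulary for one scene (hypothetical structure)."""
    name: str
    vocabulary: frozenset = field(default_factory=frozenset)


# Dedicated resources keyed by customized voice scene (step 101).
# Scene keys and vocabulary entries are illustrative placeholders.
DEDICATED_RESOURCES = {
    "map_navigation": RecognitionResource(
        "place_names", frozenset({"west lake", "central station"})),
    "e_commerce": RecognitionResource(
        "product_names", frozenset({"wireless earbuds", "rice cooker"})),
    "movie_search": RecognitionResource(
        "movie_titles", frozenset({"example movie one", "example movie two"})),
}

# One general resource for every scene without a dedicated entry.
GENERAL_RESOURCE = RecognitionResource(
    "general", frozenset({"hello", "weather", "today"}))
```

Keeping the resources immutable (`frozen=True`, `frozenset`) is a design choice of this sketch: it makes it harder for one scene's vocabulary to be modified by accident at recognition time.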
Step 102: establish a speech recognition library that includes the dedicated recognition resources and the general recognition resources, so that voice information is recognized using the speech recognition library according to the input scene of the voice information. Specifically, based on the preconfigured dedicated recognition resources corresponding to customized voice scenes and the general recognition resources corresponding to general voice scenes, a speech recognition library that includes both is established. Then, when voice information input by the user is received, the input scene of the voice information is determined, together with the type of that input scene, that is, whether the input scene is a customized voice scene or a general voice scene. The recognition resource corresponding to the input scene type is obtained from the speech recognition library and used to recognize the input voice information.
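A minimal sketch of step 102 follows, assuming the library is an in-memory object that classifies a scene by checking whether a dedicated resource was configured for it. The class and method names (`SpeechRecognitionLibrary`, `scene_type`, `resource_for`) are invented for this example, and the resources are represented by plain strings to keep the sketch short.

```python
from enum import Enum


class SceneType(Enum):
    CUSTOMIZED = "customized"
    GENERAL = "general"


class SpeechRecognitionLibrary:
    """Holds dedicated resources per customized scene plus one general resource."""

    def __init__(self, dedicated, general):
        self._dedicated = dict(dedicated)  # scene name -> dedicated resource
        self._general = general

    def scene_type(self, scene):
        """Classify a scene as customized (has a dedicated resource) or general."""
        if scene in self._dedicated:
            return SceneType.CUSTOMIZED
        return SceneType.GENERAL

    def resource_for(self, scene):
        """Return the resource that matches the type of the given scene."""
        if self.scene_type(scene) is SceneType.CUSTOMIZED:
            return self._dedicated[scene]
        return self._general


# Placeholder resources; a real system would store models or lexicons here.
library = SpeechRecognitionLibrary(
    dedicated={"map_navigation": "place-name resource"},
    general="general resource",
)
```

With this setup, `library.resource_for("map_navigation")` yields the place-name resource, while any scene without a dedicated entry falls back to the general one.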
The speech recognition method of this embodiment configures dedicated recognition resources corresponding to customized voice scenes and general recognition resources corresponding to general voice scenes, and establishes a speech recognition library that includes both, so that voice information is recognized using the speech recognition library according to the input scene of the voice information. The recognition environment is thereby customized for different vertical-domain scenes, and speech recognition is performed with the recognition resources that correspond to the voice input scene, which improves recognition accuracy and processing efficiency.
Fig. 2 is a flow chart of the speech recognition method of another embodiment of the application. As shown in Fig. 2, after step 102 the method may further include the following steps:
Step 201: receive the input voice information.
Step 202: determine the input scene of the voice information according to a preset scene acquisition strategy.
Specifically, the voice information input by the user is received, and the input scene corresponding to the currently received voice information is determined according to the preset scene acquisition strategy. It should be noted that different scene acquisition strategies can be preset according to the actual application; this embodiment places no restriction on them. Examples include:
Example one: determine the input scene of the voice information according to the application program.
Specifically, the input scene of the voice information is determined from the application in which the user is currently performing voice input. For example, if the user inputs voice information to a map navigation application, the input scene of the voice information is determined to be map navigation.
Example two: determine the input scene of the voice information according to context.
Specifically, the input scene of the voice information is determined from the context of the conversation record between the user and other users. For example, in an instant messaging application, if the user's earlier conversation with other users concerns travel, the input scene of the voice information is the travel scene.
Example three: determine the input scene of the voice information according to geographical location information.
Specifically, the user's current geographical location is obtained from the GPS information of the terminal device, and the input scene of the voice information is then determined from that location. For example, when the GPS information of the terminal device shows that the user is currently at a movie theater, the input scene of the voice information is the movie scene.
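The three example strategies can be sketched as interchangeable functions that each try to name a scene. In the sketch below, the request is a plain dictionary whose keys (`app`, `history`, `place_type`), the keyword list, and the scene names are assumptions for illustration. Trying the strategies in a fixed order and taking the first result is also an assumption; the text does not say how several strategies would be combined.

```python
from typing import Callable, List, Optional

# Each strategy inspects a request dict and returns a scene name or None.
Strategy = Callable[[dict], Optional[str]]


def scene_from_application(request: dict) -> Optional[str]:
    """Example one: map the application receiving voice input to a scene."""
    app_to_scene = {"map_app": "map_navigation", "shop_app": "e_commerce"}
    return app_to_scene.get(request.get("app"))


def scene_from_context(request: dict) -> Optional[str]:
    """Example two: look for travel-related words in the earlier conversation."""
    history = " ".join(request.get("history", [])).lower()
    travel_words = ("trip", "flight", "hotel")  # illustrative keyword list
    return "travel" if any(w in history for w in travel_words) else None


def scene_from_location(request: dict) -> Optional[str]:
    """Example three: map a place type resolved from GPS data to a scene."""
    return {"cinema": "movie"}.get(request.get("place_type"))


def determine_scene(request: dict, strategies: List[Strategy]) -> Optional[str]:
    """Return the first scene any strategy can determine, else None."""
    for strategy in strategies:
        scene = strategy(request)
        if scene is not None:
            return scene
    return None


STRATEGIES = [scene_from_application, scene_from_context, scene_from_location]
```

For instance, `determine_scene({"app": "map_app"}, STRATEGIES)` resolves through the first strategy, whereas an empty request yields `None`, meaning that no scene could be determined.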
Step 203: recognize the input voice information according to the input scene and the speech recognition library.
Specifically, the input voice information is recognized according to the input scene of the current voice information and the pre-established speech recognition library, as follows:
If the input scene of the current voice information is a voice scene customized in advance, the dedicated recognition resource corresponding to that customized voice scene is obtained from the speech recognition library, and the voice information is recognized using the dedicated recognition resource.
If the input scene of the current voice information is not a voice scene customized in advance, the general recognition resource is obtained from the speech recognition library, and the voice information is recognized using the general recognition resource.
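The application does not describe how a recognition resource is applied during decoding. Purely to illustrate why a scene-specific resource can change the outcome, the toy sketch below assumes that an upstream recognizer emits scored candidate transcripts and that the selected resource is a vocabulary used to boost matching candidates. The function `pick_transcript`, the `boost` value, and all scores are made up for this example.

```python
def pick_transcript(candidates, vocabulary, boost=0.2):
    """Pick the best hypothesis after boosting those found in the vocabulary.

    candidates: list of (text, score) pairs from some upstream recognizer.
    vocabulary: set of phrases from the selected recognition resource.
    """
    def rescored(item):
        text, score = item
        return score + (boost if text in vocabulary else 0.0)

    return max(candidates, key=rescored)[0]


# Two similar-sounding hypotheses; the scores are made-up numbers.
candidates = [("west lake", 0.50), ("best cake", 0.55)]
place_names = {"west lake", "central station"}  # dedicated: map navigation
general_words = {"best cake", "hello"}          # general resource

in_navigation = pick_transcript(candidates, place_names)
in_general = pick_transcript(candidates, general_words)
```

Under these assumed scores, the same audio resolves to "west lake" with the place-name vocabulary and to "best cake" with the general one, which is the kind of scene-dependent behavior the step aims for.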
Building on the embodiment shown in Fig. 1, the speech recognition method of this embodiment further receives the input voice information, determines the input scene of the voice information according to the preset scene acquisition strategy, and recognizes the input voice information according to the input scene and the speech recognition library. Speech recognition is thereby performed with the recognition resources that correspond to the voice input scene, which improves recognition accuracy and processing efficiency.
Fig. 3 is a flow chart of the speech recognition method of a further embodiment of the application. Referring to Fig. 3, the method is as follows:
Step 1: after voice information is received, judge whether the input scene of the voice information can be determined according to the preset scene acquisition strategy.
Step 2: if the input scene of the voice information cannot be determined, recognize the voice information using the general recognition resource.
Step 3: if the input scene of the voice information can be determined, judge whether it is a voice scene customized in advance.
Step 4: if the input scene is a voice scene customized in advance, recognize the voice information using the dedicated recognition resource in the speech recognition library that corresponds to the customized voice scene.
Step 5: if the input scene is not a customized voice scene, recognize the voice information using the general recognition resource in the speech recognition library.
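The five steps reduce to a three-way decision. The sketch below follows that flow and also reports which step decided the outcome, so that each branch is visible; representing an undetermined scene as `None` and resources as plain strings are assumptions of this example.

```python
from typing import Dict, Optional, Tuple


def select_resource(
    scene: Optional[str], dedicated: Dict[str, str], general: str
) -> Tuple[int, str]:
    """Walk the Fig. 3 decision flow; return (deciding step, chosen resource)."""
    # Step 1: check whether a scene could be determined at all.
    if scene is None:
        return 2, general            # Step 2: no scene, use the general resource
    # Step 3: check whether the scene was customized in advance.
    if scene in dedicated:
        return 4, dedicated[scene]   # Step 4: use the dedicated resource
    return 5, general                # Step 5: scene known but not customized


# Placeholder configuration for the usage note below.
dedicated = {"map_navigation": "place-name resource"}
```

Calling `select_resource("map_navigation", dedicated, "general resource")` takes the step 4 branch; passing `None` or an unconfigured scene such as `"travel"` ends in the general resource through step 2 or step 5 respectively.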
Building on the embodiment shown in Fig. 1, the speech recognition method of this embodiment further receives the input voice information, determines the input scene of the voice information according to the preset scene acquisition strategy, and recognizes the input voice information according to the input scene and the speech recognition library. Speech recognition is thereby performed with the recognition resources that correspond to the voice input scene, which improves recognition accuracy and processing efficiency.
Claims (4)
1. A method for improving speech recognition accuracy by establishing a resource library, characterized by comprising the following steps:
Step 101: configuring dedicated recognition resources corresponding to customized voice scenes, and general recognition resources corresponding to general voice scenes;
Step 102: establishing a speech recognition library that includes the dedicated recognition resources and the general recognition resources, and recognizing voice information using the speech recognition library according to the input scene of the voice information.
2. The method for improving speech recognition accuracy by establishing a resource library according to claim 1, characterized in that step 101 is specifically: when a user needs to perform voice input, the user inputs voice information to the human-machine voice input interface; the voice information input by the user is then recognized, and corresponding processing is performed based on the recognition result; different voice input applications process the recognition result in different ways;
for a voice search application, after the voice information input by the user is recognized, search results are fed back to the user according to the recognition result; alternatively, for an instant messaging application, after the voice information input by the user is recognized, the recognition result is converted into text and displayed in the input box;
for voice information input in different scenes, dedicated recognition resources corresponding to customized voice scenes, and general recognition resources corresponding to general voice scenes, are configured;
step 102 is specifically: based on the preconfigured dedicated recognition resources corresponding to customized voice scenes and the general recognition resources corresponding to general voice scenes, a speech recognition library that includes the dedicated recognition resources and the general recognition resources is established; when voice information input by the user is received, the input scene of the voice information is determined, together with the type of that input scene, that is, whether the input scene is a customized voice scene or a general voice scene, and the recognition resource corresponding to the input scene type is obtained from the speech recognition library and used to recognize the input voice information.
3. The method for improving speech recognition accuracy by establishing a resource library according to claim 2, characterized by further comprising the following steps after step 102:
Step 201: receiving the input voice information;
Step 202: determining the input scene of the voice information according to a preset scene acquisition strategy;
Step 203: recognizing the input voice information according to the input scene and the speech recognition library.
4. The method for improving speech recognition accuracy by establishing a resource library according to claim 2, characterized by further comprising the following steps after step 102:
Step 1: after voice information is received, judging whether the input scene of the voice information can be determined according to a preset scene acquisition strategy;
Step 2: if the input scene of the voice information cannot be determined, recognizing the voice information using the general recognition resource;
Step 3: if the input scene of the voice information can be determined, judging whether it is a voice scene customized in advance;
Step 4: if the input scene is a voice scene customized in advance, recognizing the voice information using the dedicated recognition resource in the speech recognition library that corresponds to the customized voice scene;
Step 5: if the input scene is not a customized voice scene, recognizing the voice information using the general recognition resource in the speech recognition library.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811508983.4A CN109360565A (en) | 2018-12-11 | 2018-12-11 | A method for improving speech recognition accuracy by establishing a resource library |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109360565A (en) | 2019-02-19 |
Family
ID=65332116
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811508983.4A Pending CN109360565A (en) | A method for improving speech recognition accuracy by establishing a resource library | 2018-12-11 | 2018-12-11 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109360565A (en) |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190219 |