CN109360565A

CN109360565A - Method for improving speech recognition accuracy by establishing a resource library

Info

Publication number: CN109360565A
Application number: CN201811508983.4A
Authority: CN
Inventors: 杨铭; 许斌锋; 戚群朗
Original assignee: Jiangsu Electric Power Information Technology Co Ltd
Current assignee: Jiangsu Electric Power Information Technology Co Ltd
Priority date: 2018-12-11
Filing date: 2018-12-11
Publication date: 2019-02-19

Abstract

The invention discloses a method for improving speech recognition accuracy by establishing a resource library. A proprietary recognition resource corresponding to each customized voice scene and a universal recognition resource corresponding to the universal voice scene are configured; a speech recognition library containing the proprietary recognition resources and the universal recognition resource is then established, so that voice information is recognized using the speech recognition library according to the input scene of the voice information. The invention performs speech recognition with the recognition resource that corresponds to the voice input scene, which improves recognition accuracy and processing efficiency.

Description

Method for improving speech recognition accuracy by establishing a resource library

Technical field

The application belongs to the technical field of speech recognition, and in particular relates to a method for improving speech recognition accuracy by establishing a resource library.

Background art

With the development of the mobile internet, large-screen mobile phones have become mainstream, and both keyboard input and handwriting input have various limitations. Voice input is expected to become a mainstream input method and to grow in popularity. Because voice input is more natural and has a lower learning cost, it is gradually being accepted by more users. Both children and the elderly can learn it quickly and get used to this input mode.

Existing speech recognition technology is trained on a large amount of everyday-scene data so that it can recognize speech input in many different scenes. As a result, recognition accuracy is too low for some customized scenes, and for other customized scenes the speech cannot be recognized at all, which wastes processing resources and reduces processing efficiency.

Summary of the invention

To overcome the problems of the prior art, the object of the present invention is to provide a method for improving speech recognition accuracy by establishing a resource library. The method performs speech recognition with the recognition resource corresponding to the voice input scene, and thereby improves recognition accuracy and processing efficiency.

The object of the present invention is achieved through the following technical solution:

A method for improving speech recognition accuracy by establishing a resource library, characterized in that it comprises the following steps:

Step 101: configure a proprietary recognition resource corresponding to a customized voice scene, and a universal recognition resource corresponding to a universal voice scene;

Step 102: establish a speech recognition library containing the proprietary recognition resource and the universal recognition resource, so that voice information is recognized using the speech recognition library according to the input scene of the voice information.

Step 101 specifically comprises: when a user needs voice input, the user inputs voice information through the human-machine voice input interface, and the voice information is then recognized so that corresponding processing can be performed based on the recognition result. Different voice input applications process the recognition result in different ways;

For a voice search application, after the voice information input by the user is recognized, search results are returned to the user according to the recognition result; alternatively, for an instant messaging application, after the voice information input by the user is recognized, the recognition result is converted into text and displayed in the input box;

For voice information input in different scenes, a proprietary recognition resource corresponding to each customized voice scene and a universal recognition resource corresponding to the universal voice scene are configured;

Step 102 specifically comprises: according to the preconfigured proprietary recognition resource corresponding to the customized voice scene and the universal recognition resource corresponding to the universal voice scene, a speech recognition library containing the proprietary recognition resource and the universal recognition resource is established. When voice information input by the user is received, the input scene of the voice information is determined, together with the type of that input scene, that is, whether it is a customized voice scene or a universal voice scene. The recognition resource corresponding to the input scene type is then obtained from the speech recognition library and used to recognize the input voice information.

The present invention performs speech recognition with the recognition resource corresponding to the voice input scene, which improves recognition accuracy and processing efficiency.

Brief description of the drawings

Fig. 1 is a flow chart of the invention;

Fig. 2 is a flow chart of the speech recognition method of another embodiment;

Fig. 3 is a flow chart of the speech recognition method of a further embodiment.

Detailed description of the embodiments

Embodiments of the application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the application and should not be understood as limiting it.

A method for improving speech recognition accuracy by establishing a resource library is shown in Fig. 1. The speech recognition method comprises:

Step 101: configure a proprietary recognition resource corresponding to a customized voice scene, and a universal recognition resource corresponding to a universal voice scene. Specifically, the speech recognition method provided by this embodiment of the invention is applied to a terminal device with a voice input function. In general, the terminal device implements voice input through a human-machine voice interaction interface, and the specific voice input interface may be a device such as a microphone.

It should be noted that the terminal device can provide a voice input service to the user through an application that is able to access the human-machine voice interaction interface. The application can be selected according to actual needs, for example a navigation application with a voice input function or a search engine; this embodiment places no restriction on it. When the user needs voice input, the user inputs voice information through the human-machine voice input interface, and the voice information is then recognized so that corresponding processing can be performed based on the recognition result. Different voice input applications process the recognition result in different ways. For example:

For a voice search application, after the voice information input by the user is recognized, search results are returned to the user according to the recognition result; alternatively, for an instant messaging application, after the voice information input by the user is recognized, the recognition result is converted into text and displayed in the input box.
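The per-application handling described above can be sketched as a simple dispatch table. This is a minimal illustration rather than the patent's implementation: the handler functions, the `HANDLERS` registry, and the application names are all hypothetical.

```python
from typing import Callable, Dict


def handle_search(text: str) -> str:
    # A voice search application would run a query and return search results.
    return f"search results for: {text}"


def handle_instant_message(text: str) -> str:
    # An instant messaging application would show the text in its input box.
    return f"input box: {text}"


# Hypothetical registry: application name -> post-processing step.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "voice_search": handle_search,
    "instant_messaging": handle_instant_message,
}


def process_recognition_result(app: str, recognized_text: str) -> str:
    """Route the recognized text to the handler registered for the application."""
    return HANDLERS[app](recognized_text)
```

Recognition itself is the same in both cases; only the step applied to the recognition result differs per application.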

For voice information input in different scenes, in order to improve the accuracy and processing performance of speech recognition, the speech recognition method provided by this embodiment first configures a proprietary recognition resource corresponding to each customized voice scene and a universal recognition resource corresponding to the universal voice scene.

It should be noted that there are many types of customized voice scene, and different customized voice scenes correspond to different proprietary recognition resources. The specific content can be configured and selected according to the needs of the application scenario, and this embodiment places no restriction on it. Examples include:

For the voice scene of map navigation, the corresponding proprietary recognition resource is a place-name recognition resource; for the voice scene of an e-commerce platform, the corresponding proprietary recognition resource is an e-commerce product-name recognition resource; and for the voice scene of film search, the corresponding proprietary recognition resource is a movie-title recognition resource.
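The configuration in step 101 could be represented as a mapping from each customized voice scene to its proprietary resource, plus one universal resource. The sketch below assumes a recognition resource is simply a named vocabulary; the patent does not specify the resource format, and the `RecognitionResource` type, the scene keys, and the placeholder vocabulary entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RecognitionResource:
    """Assumed shape of a recognition resource: a name and a vocabulary."""
    name: str
    vocabulary: Tuple[str, ...]


def configure_resources() -> Tuple[Dict[str, RecognitionResource], RecognitionResource]:
    """Step 101 sketch: return (proprietary resources by scene, universal resource)."""
    # Placeholder vocabularies; a real resource would be far larger.
    proprietary = {
        "map_navigation": RecognitionResource("place names", ("Central Station", "North Bridge")),
        "e_commerce": RecognitionResource("product names", ("wireless earbuds", "rice cooker")),
        "film_search": RecognitionResource("movie titles", ("Example Movie One", "Example Movie Two")),
    }
    universal = RecognitionResource("universal", ("hello", "thank you"))
    return proprietary, universal
```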

Step 102: establish a speech recognition library containing the proprietary recognition resource and the universal recognition resource, so that voice information is recognized using the speech recognition library according to the input scene of the voice information. Specifically, the speech recognition library is established from the preconfigured proprietary recognition resource corresponding to the customized voice scene and the universal recognition resource corresponding to the universal voice scene. When voice information input by the user is received, the input scene of the voice information is determined, together with the type of that input scene, that is, whether it is a customized voice scene or a universal voice scene. The recognition resource corresponding to the input scene type is then obtained from the speech recognition library and used to recognize the input voice information.
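One way to picture the speech recognition library of step 102 is as a container that holds the proprietary resources keyed by scene together with the universal resource, and returns the resource matching the scene type. The class and method names below are hypothetical, and a resource is assumed to be a plain list of vocabulary entries.

```python
from typing import Dict, List, Optional


class SpeechRecognitionLibrary:
    """Holds proprietary resources keyed by scene plus one universal resource."""

    def __init__(self, proprietary: Dict[str, List[str]], universal: List[str]) -> None:
        self._proprietary = dict(proprietary)
        self._universal = list(universal)

    def is_customized(self, scene: Optional[str]) -> bool:
        # A scene is "customized" if a proprietary resource was configured for it.
        return scene is not None and scene in self._proprietary

    def get_resource(self, scene: Optional[str]) -> List[str]:
        # Customized scene -> its proprietary resource; anything else -> universal.
        if self.is_customized(scene):
            return self._proprietary[scene]
        return self._universal
```

For example, `SpeechRecognitionLibrary({"film_search": ["Example Movie"]}, ["hello"]).get_resource("film_search")` returns the film-search vocabulary, while an unknown or missing scene returns the universal one.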

The speech recognition method of this embodiment of the application configures a proprietary recognition resource corresponding to each customized voice scene and a universal recognition resource corresponding to the universal voice scene, and establishes a speech recognition library containing the proprietary recognition resource and the universal recognition resource, so that voice information is recognized using the speech recognition library according to the input scene of the voice information. The recognition environment is thereby customized for different vertical-domain scenes, and speech recognition is performed with the recognition resource corresponding to the voice input scene, which improves recognition accuracy and processing efficiency.

Fig. 2 is a flow chart of the speech recognition method of another embodiment of the application. As shown in Fig. 2, the following steps may further be included after step 102:

Step 201: receive the input voice information.

Step 202: determine the input scene of the voice information according to a preset scene acquisition strategy.

Specifically, the voice information input by the user is received, and the input scene corresponding to the currently received voice information is determined according to the preset scene acquisition strategy. It should be noted that different scene acquisition strategies can be preset according to the actual application, and this embodiment places no restriction on them. Examples include:

Example one: the input scene of the voice information is determined according to the application program.

Specifically, the input scene of the voice information is determined according to the application in which the user is currently performing voice input. For example, if the user inputs voice information into a map navigation application, the input scene of the voice information is determined to be map navigation.
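Example one amounts to a lookup from the foreground application to a scene. A minimal sketch follows; the application identifiers and the `APP_TO_SCENE` table are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical table: application identifier -> input scene.
APP_TO_SCENE: Dict[str, str] = {
    "map_navigation_app": "map_navigation",
    "shopping_app": "e_commerce",
    "video_app": "film_search",
}


def scene_from_application(app_name: str) -> Optional[str]:
    """Return the scene for the application, or None if it cannot be determined."""
    return APP_TO_SCENE.get(app_name)
```

Returning `None` for an unknown application leaves room for a fallback to the universal recognition resource.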

Example two: the input scene of the voice information is determined according to context.

Specifically, the input scene of the voice information is determined according to the context of the conversation record between the user and other users. For example, in an instant messaging application, if the user's preceding conversation with other users is about travel, the input scene of the voice information is a tourism scene.
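The patent does not say how the conversation context is analyzed. As one possible illustration, the sketch below counts scene keywords in recent messages and picks the scene with the most hits; the keyword sets, the scene names, and the whitespace tokenization are assumptions made for this example only.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical keyword sets per scene.
SCENE_KEYWORDS: Dict[str, Set[str]] = {
    "tourism": {"trip", "hotel", "flight", "sightseeing"},
    "film": {"movie", "cinema", "ticket"},
}


def scene_from_context(messages: Iterable[str]) -> Optional[str]:
    """Pick the scene whose keywords occur most often in recent messages."""
    counts = {scene: 0 for scene in SCENE_KEYWORDS}
    for message in messages:
        # Naive tokenization: lowercase and split on whitespace.
        words = set(message.lower().split())
        for scene, keywords in SCENE_KEYWORDS.items():
            counts[scene] += len(words & keywords)
    best_scene = max(counts, key=lambda scene: counts[scene])
    # No keyword hit at all means the scene cannot be determined.
    return best_scene if counts[best_scene] > 0 else None
```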

Example three: the input scene of the voice information is determined according to geographical location information.

Specifically, the user's current geographical location is obtained from the GPS information of the terminal device, and the input scene of the voice information is then determined according to that location. For example, when the GPS information of the terminal device shows that the user is currently at a cinema, the input scene of the voice information is a film scene.
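Mapping a GPS fix to a scene requires some notion of what is at that location, which the patent leaves open. The sketch below assumes a small table of points of interest and treats the user as being "at" one when within a fixed radius, using the haversine great-circle distance. The coordinates, the table, and the 100 m radius are made up for illustration.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical points of interest: (latitude, longitude, scene).
POINTS_OF_INTEREST: List[Tuple[float, float, str]] = [
    (32.0600, 118.7960, "film"),     # a cinema (made-up coordinates)
    (32.0400, 118.7800, "tourism"),  # a scenic spot (made-up coordinates)
]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def scene_from_location(lat: float, lon: float, radius_m: float = 100.0) -> Optional[str]:
    """Return the scene of the first point of interest within radius_m, else None."""
    for poi_lat, poi_lon, scene in POINTS_OF_INTEREST:
        if haversine_m(lat, lon, poi_lat, poi_lon) <= radius_m:
            return scene
    return None
```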

Step 203: recognize the input voice information according to the input scene and the speech recognition library.

Specifically, the input voice information is recognized according to the input scene of the current voice information and the pre-established speech recognition library, as follows:

If the input scene of the current voice information is a pre-customized voice scene, the proprietary recognition resource corresponding to that customized voice scene is obtained from the speech recognition library, and the voice information is recognized using the proprietary recognition resource;

If the input scene of the current voice information is not a pre-customized voice scene, the universal recognition resource is obtained from the speech recognition library, and the voice information is recognized using the universal recognition resource.
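To make the effect of resource selection concrete, the sketch below assumes that an upstream recognizer has already produced a ranked list of candidate transcriptions, and that a recognition resource is a vocabulary list used to prefer in-vocabulary candidates. Neither assumption comes from the patent, which does not describe how a resource is applied; the function names are hypothetical.

```python
from typing import Dict, List, Optional


def select_resource(
    scene: Optional[str],
    proprietary: Dict[str, List[str]],
    universal: List[str],
) -> List[str]:
    """Customized scene -> its proprietary resource; otherwise the universal one."""
    if scene in proprietary:
        return proprietary[scene]
    return universal


def recognize(candidates: List[str], resource: List[str]) -> str:
    """Return the first candidate found in the resource, else the top candidate."""
    vocabulary = {entry.lower() for entry in resource}
    for candidate in candidates:
        if candidate.lower() in vocabulary:
            return candidate
    # No candidate is in the vocabulary: keep the recognizer's top choice.
    return candidates[0]
```

With candidates `["north ridge", "north bridge"]`, a map-navigation resource containing "North Bridge" selects the second candidate, whereas a universal resource that contains neither keeps the first.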

Building on the embodiment shown in Fig. 1, the speech recognition method of this embodiment of the application further receives the input voice information, determines the input scene of the voice information according to the preset scene acquisition strategy, and recognizes the input voice information according to the input scene and the speech recognition library. Speech recognition is thereby performed with the recognition resource corresponding to the voice input scene, which improves recognition accuracy and processing efficiency.

Fig. 3 is a flow chart of the speech recognition method of a further embodiment of the application. Referring to Fig. 3, the method is described as follows:

Step 1: after the voice information is received, judge whether the input scene of the voice information can be determined according to the preset scene acquisition strategy.

Step 2: if the input scene of the voice information cannot be determined, recognize the voice information using the universal recognition resource.

Step 3: if the input scene of the voice information can be determined, judge whether it is a pre-customized voice scene.

Step 4: if the input scene is a pre-customized voice scene, recognize the voice information using the proprietary recognition resource in the speech recognition library that corresponds to the customized voice scene.

Step 5: if the input scene is not a customized voice scene, recognize the voice information using the universal recognition resource in the speech recognition library.
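The five steps of Fig. 3 reduce to two checks, which can be sketched as below. The scene acquisition strategy is passed in as a function that returns `None` when the scene cannot be determined; that convention, the function names, and the list-based resources are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional


def choose_resource(
    voice_info: str,
    acquire_scene: Callable[[str], Optional[str]],
    proprietary: Dict[str, List[str]],
    universal: List[str],
) -> List[str]:
    """Return the recognition resource to use for the received voice information."""
    # Step 1: try to determine the input scene with the preset strategy.
    scene = acquire_scene(voice_info)
    # Step 2: the scene cannot be determined, so use the universal resource.
    if scene is None:
        return universal
    # Step 3: the scene is known; check whether it is a pre-customized scene.
    if scene in proprietary:
        # Step 4: use the proprietary resource configured for that scene.
        return proprietary[scene]
    # Step 5: known but not customized, so use the universal resource.
    return universal
```

Steps 2 and 5 lead to the same result, so the universal resource acts as the fallback both when the scene is unknown and when it is known but has no proprietary resource.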

Claims

1. A method for improving speech recognition accuracy by establishing a resource library, characterized in that it comprises the following steps:

2. The method for improving speech recognition accuracy by establishing a resource library according to claim 1, characterized in that step 101 specifically comprises: when a user needs voice input, the user inputs voice information through the human-machine voice input interface, and the voice information input by the user is then recognized so that corresponding processing can be performed based on the recognition result; different voice input applications process the recognition result in different ways;

3. The method for improving speech recognition accuracy by establishing a resource library according to claim 2, characterized in that the following steps are further included after step 102:

Step 201: receive the input voice information;

Step 202: determine the input scene of the voice information according to a preset scene acquisition strategy;

4. The method for improving speech recognition accuracy by establishing a resource library according to claim 2, characterized in that the following steps are further included after step 102:

Step 1: after the voice information is received, judge whether the input scene of the voice information can be determined according to the preset scene acquisition strategy;

Step 2: if the input scene of the voice information cannot be determined, recognize the voice information using the universal recognition resource;

Step 3: if the input scene of the voice information can be determined, judge whether it is a pre-customized voice scene;

Step 4: if the input scene is a pre-customized voice scene, recognize the voice information using the proprietary recognition resource in the speech recognition library that corresponds to the customized voice scene;