CN105719649A - Voice recognition method and device

Info

Publication number: CN105719649A
Application number: CN201610035394.3A
Authority: CN
Inventors: 穆向禹; 张东栋
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2016-01-19
Filing date: 2016-01-19
Publication date: 2016-06-29
Anticipated expiration: 2036-01-19
Also published as: CN105719649B

Abstract

The invention provides a speech recognition method and device. The method comprises: configuring a dedicated recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene; and establishing a speech recognition library that includes the dedicated recognition resource and the general recognition resource, so that voice information is recognized with the speech recognition library according to the input scene of the voice information. Speech recognition is thereby performed with the recognition resource that corresponds to the voice input scene, which improves recognition precision and processing efficiency.

Description

Speech recognition method and device

Technical field

The present application relates to the technical field of speech recognition, and in particular to a speech recognition method and device.

Background

With the development of the mobile Internet, large-screen mobile phones have become mainstream, and both keyboard input and handwriting input are subject to various limitations. Voice input is better suited to such devices and is expected to become a mainstream input method. Because voice input is more natural and has a lower learning cost, it is gradually being accepted by more users. Children and elderly users alike can quickly learn it and become accustomed to this input mode.

Existing speech recognition technology is trained on large amounts of everyday-scene data so as to recognize voice input in different scenes. As a result, recognition precision is too low for some customized scenes and recognition fails altogether for others, which wastes processing resources and reduces processing efficiency.

Summary of the invention

The present application aims to solve, at least to some extent, one of the technical problems in the related art.

To this end, a first objective of the present application is to provide a speech recognition method that performs speech recognition with the recognition resource corresponding to the voice input scene, thereby improving recognition precision and processing efficiency.

A second objective of the present application is to provide a speech recognition device.

To achieve the above objectives, an embodiment of the first aspect of the present application provides a speech recognition method, including: configuring a dedicated recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene; and establishing a speech recognition library that includes the dedicated recognition resource and the general recognition resource, so that voice information is recognized with the speech recognition library according to the input scene of the voice information.

In the speech recognition method of the embodiment of the present application, a dedicated recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene are configured, and a speech recognition library including both resources is established, so that voice information is recognized with the speech recognition library according to the input scene of the voice information. Speech recognition is thereby performed with the recognition resource corresponding to the voice input scene, which improves recognition precision and processing efficiency.

To achieve the above objectives, an embodiment of the second aspect of the present application provides a speech recognition device, including: a configuration module, configured to configure a dedicated recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene; and an establishing module, configured to establish a speech recognition library that includes the dedicated recognition resource and the general recognition resource, so that voice information is recognized with the speech recognition library according to the input scene of the voice information.

In the speech recognition device of the embodiment of the present application, a dedicated recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene are configured, and a speech recognition library including both resources is established, so that voice information is recognized with the speech recognition library according to the input scene of the voice information. Speech recognition is thereby performed with the recognition resource corresponding to the voice input scene, which improves recognition precision and processing efficiency.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of a speech recognition method according to one embodiment of the present application;

Fig. 2 is a flowchart of a speech recognition method according to another embodiment of the present application;

Fig. 3 is a flowchart of a speech recognition method according to yet another embodiment of the present application;

Fig. 4 is a schematic structural diagram of a speech recognition device according to one embodiment of the present application;

Fig. 5 is a schematic structural diagram of a speech recognition device according to another embodiment of the present application.

Detailed description

Embodiments of the present application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present application and should not be construed as limiting it.

The speech recognition method and device of the embodiments of the present application are described below with reference to the accompanying drawings.

Fig. 1 is a flowchart of a speech recognition method according to one embodiment of the present application.

As shown in Fig. 1, the speech recognition method includes:

Step 101: configure a dedicated recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene.

Specifically, the speech recognition method provided by the embodiment of the present invention is applied in a terminal device having a voice input function. Typically, the terminal device implements the voice input function through a human-machine voice interaction interface, and the specific voice input interface may be a device such as a microphone.

It should be noted that the terminal device may provide the voice input service to the user through an application that can access the human-machine voice interaction interface. The application may be chosen according to actual needs, for example a navigation application or a search engine with a voice input function; this embodiment places no limitation on it.

When a user needs to perform voice input, the user inputs voice information to the human-machine voice input interface, and the voice information is then recognized so that corresponding processing can be performed based on the recognition result. Different voice input applications process the recognition result differently. For example:

For a voice search application, after the voice information input by the user is recognized, search results are fed back to the user according to the recognition result; or,

For an instant messaging application, after the voice information input by the user is recognized, it is converted into text according to the recognition result and displayed in the input box.

To improve the precision and processing performance of speech recognition for voice information input in different scenes, the speech recognition model provided by this embodiment first configures a dedicated recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene.

It should be noted that there are many types of customized voice scenes, and different customized voice scenes correspond to different dedicated recognition resources. The specific content may be configured and selected according to the needs of different application scenes; this embodiment places no limitation on it. Examples include:

For the voice scene of map navigation, the corresponding dedicated recognition resource is a place name recognition resource; or,

For the voice scene of an e-commerce platform, the corresponding dedicated recognition resource is an e-commerce product name recognition resource; or,

For the voice scene of movie search, the corresponding dedicated recognition resource is a movie name recognition resource.
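The configuration in step 101 can be pictured as a table that pairs each customized voice scene with its own resource, alongside a single general resource. The following is a minimal sketch under stated assumptions: the `RecognitionResource` type, the scene identifiers, and the sample vocabularies are hypothetical and serve only to illustrate the pairing; the text does not specify what a recognition resource contains.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecognitionResource:
    """Hypothetical container: a named vocabulary a recognizer could favor."""
    name: str
    vocabulary: frozenset = field(default_factory=frozenset)


# Dedicated resources, one per customized voice scene (illustrative values only).
DEDICATED_RESOURCES = {
    "map_navigation": RecognitionResource(
        "place_names", frozenset({"Zhongguancun", "Xizhimen"})),
    "e_commerce": RecognitionResource(
        "product_names", frozenset({"phone case", "power bank"})),
    "movie_search": RecognitionResource(
        "movie_titles", frozenset({"Inception", "Titanic"})),
}

# A single general resource for the general voice scene.
GENERAL_RESOURCE = RecognitionResource(
    "general", frozenset({"hello", "today", "weather"}))
```

Keeping the scene-to-resource mapping as plain data means a new customized scene could be supported by adding one entry, without touching recognition logic.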

Step 102: establish a speech recognition library that includes the dedicated recognition resource and the general recognition resource, so that voice information is recognized with the speech recognition library according to the input scene of the voice information.

Specifically, the speech recognition library is established from the preconfigured dedicated recognition resource corresponding to the customized voice scene and the preconfigured general recognition resource corresponding to the general voice scene.

Then, when voice information input by a user is received, the input scene of the voice information is determined, along with the type of that input scene, that is, whether the input scene is a customized voice scene or a general voice scene. The recognition resource corresponding to the input scene type is then obtained from the speech recognition library and used to recognize the input voice information.
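A library of this kind can be sketched as a small lookup structure that reports the scene type and returns the matching resource. This is an illustrative sketch only: the class name `SpeechRecognitionLibrary`, its methods, and the string placeholders standing in for real resources are assumptions, not details given in the text.

```python
class SpeechRecognitionLibrary:
    """Hypothetical library: one general resource plus dedicated ones by scene."""

    def __init__(self, general, dedicated=None):
        self.general = general
        # Copy the mapping so later changes by the caller do not leak in.
        self.dedicated = dict(dedicated or {})

    def is_customized(self, scene):
        """Scene type check: True for a customized scene, False otherwise."""
        return scene in self.dedicated

    def resource_for(self, scene):
        """Return the dedicated resource for the scene, else the general one."""
        return self.dedicated.get(scene, self.general)


# Strings stand in for real recognition resources in this sketch.
library = SpeechRecognitionLibrary(
    general="general-resource",
    dedicated={"map_navigation": "place-name-resource"},
)
chosen = library.resource_for("map_navigation")
```

Falling back to the general resource inside `resource_for` keeps the caller free of special cases for scenes that were never customized.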

In the speech recognition method of the embodiment of the present application, a dedicated recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene are configured, and a speech recognition library including both resources is established, so that voice information is recognized with the speech recognition library according to the input scene of the voice information. The recognition environment is thereby customized for different vertical scenes, and speech recognition is performed with the recognition resource corresponding to the voice input scene, which improves recognition precision and processing efficiency.

Fig. 2 is a flowchart of a speech recognition method according to another embodiment of the present application.

As shown in Fig. 2, the following steps may also be included after step 102:

Step 201: receive input voice information.

Step 202: determine the input scene of the voice information according to a preset scene acquisition strategy.

Specifically, the voice information input by the user is received, and the input scene corresponding to the currently received voice information is determined according to the preset scene acquisition strategy.

It should be noted that different scene acquisition strategies may be preset according to the needs of the actual application; this embodiment places no limitation on them. Examples include:

Example one: determine the input scene of the voice information according to the application program;

Specifically, the input scene of the voice information is determined according to the application program in which the user is currently performing voice input. For example, if the user inputs voice information to a map navigation application, the input scene of the voice information is determined to be map navigation.

Or,

Example two: determine the input scene of the voice information according to context;

Specifically, the input scene of the voice information is determined according to the context of the conversation records between the user and other users. For example, in an instant messaging application, if the earlier conversation between the user and other users concerns travel, the input scene of the voice information is a travel scene.

Or,

Example three: determine the input scene of the voice information according to geographic location information.

Specifically, the user's current geographic location information is obtained from the GPS information of the terminal device, and the input scene of the voice information is then determined according to that location. For example, if the user's current geographic location obtained from the GPS information of the terminal device is a movie theater, the input scene of the voice information is a movie scene.
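The three example strategies can be sketched as interchangeable functions that each map some request context to a scene, or to `None` when they cannot decide. Everything named here is hypothetical: the lookup tables, the context keys (`app_id`, `recent_messages`, `place_type`), and the keyword matching are illustrative stand-ins. The text presents the strategies as alternatives; trying them in a fixed order, as `determine_scene` does, is an added assumption.

```python
from typing import Optional

# Hypothetical lookup tables; a real deployment would supply its own.
APP_TO_SCENE = {"com.example.maps": "map_navigation"}
KEYWORD_TO_SCENE = {"flight": "travel", "hotel": "travel"}
PLACE_TYPE_TO_SCENE = {"cinema": "movie"}


def scene_from_app(ctx: dict) -> Optional[str]:
    """Example one: the application receiving the voice input."""
    return APP_TO_SCENE.get(ctx.get("app_id"))


def scene_from_context(ctx: dict) -> Optional[str]:
    """Example two: keywords in the user's earlier conversation."""
    for message in ctx.get("recent_messages", []):
        for word in message.lower().split():
            if word in KEYWORD_TO_SCENE:
                return KEYWORD_TO_SCENE[word]
    return None


def scene_from_location(ctx: dict) -> Optional[str]:
    """Example three: the kind of place resolved from the device's GPS fix."""
    return PLACE_TYPE_TO_SCENE.get(ctx.get("place_type"))


STRATEGIES = (scene_from_app, scene_from_context, scene_from_location)


def determine_scene(ctx: dict, strategies=STRATEGIES) -> Optional[str]:
    """Return the first scene any strategy yields, or None if none can decide."""
    for strategy in strategies:
        scene = strategy(ctx)
        if scene is not None:
            return scene
    return None
```

Returning `None` rather than guessing leaves room for a caller to fall back to general recognition when no scene can be determined.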

Step 203: recognize the input voice information according to the input scene and the speech recognition library.

Specifically, the input voice information is recognized according to the input scene of the current voice information and the pre-established speech recognition library, which includes:

If the input scene of the current voice information is a pre-customized voice scene, the dedicated recognition resource corresponding to that customized voice scene is obtained from the speech recognition library and applied to recognize the voice information;

If the input scene of the current voice information is not a pre-customized voice scene, the general recognition resource is obtained from the speech recognition library and applied to recognize the voice information.

On the basis of the embodiment shown in Fig. 1, the speech recognition method of this embodiment further receives input voice information, determines the input scene of the voice information according to a preset scene acquisition strategy, and recognizes the input voice information according to the input scene and the speech recognition library. Speech recognition is thereby performed with the recognition resource corresponding to the voice input scene, which improves recognition precision and processing efficiency.

Fig. 3 is a flowchart of a speech recognition method according to yet another embodiment of the present application. Referring to Fig. 3, the method is described as follows:

Step 1: after voice information is received, judge whether the input scene of the voice information can be determined according to the preset scene acquisition strategy.

Step 2: if the input scene of the voice information cannot be determined, apply the general recognition resource to recognize the voice information.

Step 3: if the input scene of the voice information can be determined, further judge whether it is a pre-customized voice scene.

Step 4: if the input scene is a pre-customized voice scene, apply the dedicated recognition resource in the speech recognition library that corresponds to that customized voice scene to recognize the voice information;

Step 5: if the input scene is not a customized voice scene, apply the general recognition resource in the speech recognition library to recognize the voice information.
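The five steps of Fig. 3 can be sketched as a single dispatch function with three outcomes. This is a minimal illustration rather than the implementation of the embodiment: `recognize`, `fake_decode`, the branch labels, the scene names, and the string placeholders for resources are all hypothetical, and how a resource is actually used during decoding is not specified in the text.

```python
def recognize(audio, scene, dedicated, general, decode):
    """Pick a resource following the Fig. 3 flow and decode with it.

    scene is None when no input scene could be determined (Step 2).
    Returns (branch_label, decoded_text); the label is for illustration only.
    """
    if scene is None:
        # Step 2: scene undetermined, use the general resource.
        return "general:unknown-scene", decode(audio, general)
    if scene in dedicated:
        # Steps 3 and 4: scene determined and customized in advance.
        return "dedicated", decode(audio, dedicated[scene])
    # Step 5: scene determined but not customized.
    return "general:non-customized", decode(audio, general)


def fake_decode(audio, resource):
    """Stand-in decoder that only records which resource was applied."""
    return f"<{resource}>{audio}"


dedicated = {"map_navigation": "place-names"}
result = recognize("audio-bytes", "map_navigation", dedicated, "general", fake_decode)
```

Both fallback branches use the same general resource; they are kept separate here only to mirror the two distinct decisions in the flowchart.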

To implement the above embodiments, the present application further provides a speech recognition device.

Fig. 4 is a schematic structural diagram of a speech recognition device according to one embodiment of the present application.

As shown in Fig. 4, the speech recognition device includes:

A configuration module 11, configured to configure a dedicated recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene;

Specifically, the dedicated recognition resource includes at least one of the following:

A place name recognition resource, a search hot word recognition resource, an e-commerce product name recognition resource, and a movie name recognition resource.

An establishing module 12, configured to establish a speech recognition library that includes the dedicated recognition resource and the general recognition resource, so that voice information is recognized with the speech recognition library according to the input scene of the voice information.

It should be noted that the foregoing explanation of the speech recognition method embodiments also applies to the speech recognition device of this embodiment and is not repeated here.

In the speech recognition device of the embodiment of the present application, a dedicated recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene are configured, and a speech recognition library including both resources is established, so that voice information is recognized with the speech recognition library according to the input scene of the voice information. The recognition environment is thereby customized for different vertical scenes, and speech recognition is performed with the recognition resource corresponding to the voice input scene, which improves recognition precision and processing efficiency.

Fig. 5 is a schematic structural diagram of a speech recognition device according to another embodiment of the present application. As shown in Fig. 5, on the basis of the embodiment shown in Fig. 4, the device further includes:

A receiving module 13, configured to receive input voice information;

An acquisition module 14, configured to determine the input scene of the voice information according to a preset scene acquisition strategy;

A recognition module 15, configured to recognize the input voice information according to the input scene and the speech recognition library.

In one embodiment, the acquisition module 14 is configured to determine the input scene of the voice information according to the application program in which the user is currently performing voice input;

Or,

In one embodiment, the acquisition module 14 is configured to determine the input scene of the voice information according to the context of the conversation records between the user and other users;

Or,

In one embodiment, the acquisition module 14 is configured to determine the input scene of the voice information according to the user's current geographic location information.

In one embodiment, the recognition module 15 is configured to:

If the input scene is the customized voice scene, apply the dedicated recognition resource in the speech recognition library that corresponds to the customized voice scene to recognize the voice information;

If the input scene is not the customized voice scene, apply the general recognition resource in the speech recognition library to recognize the voice information.

In another embodiment, the recognition module 15 is further configured to:

If the input scene cannot be determined, apply the general recognition resource to recognize the voice information.
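The division of labor among modules 11 to 15 can be sketched as five small classes composed into one device object. This is an illustrative arrangement only: the class and method names, the dictionary used as the library, and the string placeholders for resources are assumptions, and the text notes elsewhere that such modules may equally be realized in hardware or merged into fewer units.

```python
class ConfigurationModule:  # module 11
    def configure(self):
        """Return (dedicated resources by scene, general resource)."""
        return {"movie_search": "movie-title-resource"}, "general-resource"


class EstablishingModule:  # module 12
    def establish(self, dedicated, general):
        """Build the speech recognition library from both kinds of resource."""
        return {"dedicated": dict(dedicated), "general": general}


class ReceivingModule:  # module 13
    def receive(self, audio):
        """Accept the input voice information (passed through unchanged here)."""
        return audio


class AcquisitionModule:  # module 14
    def __init__(self, strategy):
        self.strategy = strategy  # preset scene acquisition strategy

    def acquire(self, context):
        """Return the input scene, or None if it cannot be determined."""
        return self.strategy(context)


class RecognitionModule:  # module 15
    def select_resource(self, scene, library):
        """Dedicated resource for a customized scene, general otherwise."""
        if scene is None:
            return library["general"]
        return library["dedicated"].get(scene, library["general"])


class SpeechRecognitionDevice:
    def __init__(self, strategy):
        dedicated, general = ConfigurationModule().configure()
        self.library = EstablishingModule().establish(dedicated, general)
        self.receiver = ReceivingModule()
        self.acquirer = AcquisitionModule(strategy)
        self.recognizer = RecognitionModule()

    def handle(self, audio, context):
        """Run receive, acquire, select; return (voice, chosen resource)."""
        voice = self.receiver.receive(audio)
        scene = self.acquirer.acquire(context)
        return voice, self.recognizer.select_resource(scene, self.library)


# Hypothetical strategy: read the scene directly from the context.
device = SpeechRecognitionDevice(lambda ctx: ctx.get("scene"))
voice, resource = device.handle("audio-bytes", {"scene": "movie_search"})
```

Passing the strategy into `AcquisitionModule` keeps the choice of scene acquisition strategy separate from the rest of the device, in line with the alternatives listed for module 14.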

On the basis of the embodiment shown in Fig. 4, the speech recognition device of this embodiment further receives input voice information, determines the input scene of the voice information according to a preset scene acquisition strategy, and recognizes the input voice information according to the input scene and the speech recognition library. Speech recognition is thereby performed with the recognition resource corresponding to the voice input scene, which improves recognition precision and processing efficiency.

In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic expressions of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.

In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "multiple" means at least two, for example two or three, unless specifically defined otherwise.

Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.

The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that the parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.

Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, the functional units in the embodiments of the present application may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present application have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims

1. A speech recognition method, characterized by comprising the following steps:

Configuring a dedicated recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene;

Establishing a speech recognition library that includes the dedicated recognition resource and the general recognition resource, so that voice information is recognized with the speech recognition library according to the input scene of the voice information.

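2. The method of claim 1, characterized in that the dedicated recognition resource includes at least one of the following:

A place name recognition resource, a search hot word recognition resource, an e-commerce product name recognition resource, and a movie name recognition resource.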

3. The method of claim 1 or 2, characterized by further comprising:

Receiving input voice information;

Determining the input scene of the voice information according to a preset scene acquisition strategy;

Recognizing the input voice information according to the input scene and the speech recognition library.

4. The method of claim 3, characterized in that determining the input scene of the voice information according to the preset scene acquisition strategy comprises:

Determining the input scene of the voice information according to the application program in which the user is currently performing voice input;

Or,

Determining the input scene of the voice information according to the context of the conversation records between the user and other users;

Or,

Determining the input scene of the voice information according to the user's current geographic location information.

5. The method of claim 3, characterized in that recognizing the input voice information according to the input scene and the speech recognition library comprises:

If the input scene is the customized voice scene, applying the dedicated recognition resource in the speech recognition library that corresponds to the customized voice scene to recognize the voice information;

If the input scene is not the customized voice scene, applying the general recognition resource in the speech recognition library to recognize the voice information.

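6. The method of claim 3, characterized by further comprising:

If the input scene cannot be determined, applying the general recognition resource to recognize the voice information.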

7. A speech recognition device, characterized by comprising:

A configuration module, configured to configure a dedicated recognition resource corresponding to a customized voice scene and a general recognition resource corresponding to a general voice scene;

An establishing module, configured to establish a speech recognition library that includes the dedicated recognition resource and the general recognition resource, so that voice information is recognized with the speech recognition library according to the input scene of the voice information.

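8. The device of claim 7, characterized in that the dedicated recognition resource includes at least one of the following:

A place name recognition resource, a search hot word recognition resource, an e-commerce product name recognition resource, and a movie name recognition resource.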

9. The device of claim 7 or 8, characterized by further comprising:

A receiving module, configured to receive input voice information;

An acquisition module, configured to determine the input scene of the voice information according to a preset scene acquisition strategy;

A recognition module, configured to recognize the input voice information according to the input scene and the speech recognition library.

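10. The device of claim 9, characterized in that the acquisition module is configured to:

Determine the input scene of the voice information according to the application program in which the user is currently performing voice input;

Or,

Determine the input scene of the voice information according to the context of the conversation records between the user and other users;

Or,

Determine the input scene of the voice information according to the user's current geographic location information.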

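11. The device of claim 9, characterized in that the recognition module is configured to:

If the input scene is the customized voice scene, apply the dedicated recognition resource in the speech recognition library that corresponds to the customized voice scene to recognize the voice information;

If the input scene is not the customized voice scene, apply the general recognition resource in the speech recognition library to recognize the voice information.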

12. The device of claim 9, characterized in that the recognition module is further configured to:

If the input scene cannot be determined, apply the general recognition resource to recognize the voice information.