CN105719649B

CN105719649B - Speech recognition method and device

Info

Publication number: CN105719649B
Application number: CN201610035394.3A
Authority: CN
Inventors: Mu Xiangyu (穆向禹); Zhang Dongdong (张东栋)
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2016-01-19
Filing date: 2016-01-19
Publication date: 2019-07-05
Anticipated expiration: 2036-01-19
Also published as: CN105719649A

Abstract

This application provides a speech recognition method and device. The method includes: configuring proprietary recognition resources corresponding to customized voice scenes, and a universal recognition resource corresponding to the universal voice scene; and establishing a speech recognition library that contains the proprietary recognition resources and the universal recognition resource, so that voice information is recognized with the speech recognition library according to the input scene of the voice information. The method and device perform speech recognition with the recognition resource that corresponds to the voice input scene, which improves recognition accuracy and processing efficiency.

Description

Speech recognition method and device

Technical field

This application relates to the technical field of speech recognition, and in particular to a speech recognition method and device.

Background

With the development of the mobile Internet, large-screen mobile phones have become mainstream, and both keyboard and handwriting input have various limitations. Voice input methods are therefore likely to become the mainstream input method. Because voice input is more natural and has a lower learning cost, it is gradually being accepted by more users; both children and the elderly can learn it quickly and become accustomed to it.

Existing speech recognition technology is trained on a large amount of everyday-scene data in order to recognize speech input in different scenes. As a result, recognition accuracy is too low for some customized scenes, and some customized scenes cannot be recognized at all, which wastes processing resources and reduces processing efficiency.

Summary of the invention

This application aims to solve, at least to some extent, one of the technical problems in the related art.

To this end, a first object of this application is to provide a speech recognition method that performs speech recognition with the recognition resource corresponding to the voice input scene, thereby improving recognition accuracy and processing efficiency.

A second object of this application is to provide a speech recognition device.

To achieve the above objects, an embodiment of the first aspect of this application provides a speech recognition method, including: configuring proprietary recognition resources corresponding to customized voice scenes, and a universal recognition resource corresponding to the universal voice scene; and establishing a speech recognition library that contains the proprietary recognition resources and the universal recognition resource, so that voice information is recognized with the speech recognition library according to its input scene.

The speech recognition method of the embodiments of this application configures proprietary recognition resources corresponding to customized voice scenes and a universal recognition resource corresponding to the universal voice scene, and establishes a speech recognition library containing both, so that voice information is recognized with the library according to its input scene. Speech recognition is thus performed with the recognition resource corresponding to the voice input scene, improving recognition accuracy and processing efficiency.

To achieve the above objects, an embodiment of the second aspect of this application provides a speech recognition device, including: a configuration module, configured to configure proprietary recognition resources corresponding to customized voice scenes, and a universal recognition resource corresponding to the universal voice scene; and an establishing module, configured to establish a speech recognition library that contains the proprietary recognition resources and the universal recognition resource, so that voice information is recognized with the speech recognition library according to its input scene.

The speech recognition device of the embodiments of this application configures proprietary recognition resources corresponding to customized voice scenes and a universal recognition resource corresponding to the universal voice scene, and establishes a speech recognition library containing both, so that voice information is recognized with the library according to its input scene. Speech recognition is thus performed with the recognition resource corresponding to the voice input scene, improving recognition accuracy and processing efficiency.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments with reference to the accompanying drawings, in which:

Fig. 1 is a flowchart of a speech recognition method according to an embodiment of this application;

Fig. 2 is a flowchart of a speech recognition method according to another embodiment of this application;

Fig. 3 is a flowchart of a speech recognition method according to yet another embodiment of this application;

Fig. 4 is a schematic structural diagram of a speech recognition device according to an embodiment of this application;

Fig. 5 is a schematic structural diagram of a speech recognition device according to another embodiment of this application.

Detailed description of embodiments

Embodiments of this application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain this application and should not be construed as limiting it.

The speech recognition method and device of the embodiments of this application are described below with reference to the accompanying drawings.

Fig. 1 is a flowchart of a speech recognition method according to an embodiment of this application.

As shown in Fig. 1, the speech recognition method includes:

Step 101: configure proprietary recognition resources corresponding to customized voice scenes, and a universal recognition resource corresponding to the universal voice scene.

Specifically, the speech recognition method provided by the embodiments of the present invention is applied to a terminal device with a voice input function. Typically, the terminal device provides the voice input function through a human-machine voice interaction interface, and the specific voice input interface may be a device such as a microphone.

It should be noted that the terminal device can provide a voice input service to the user through any application that can access the human-machine voice interaction interface. The application can be chosen according to actual needs, for example a navigation application or a search engine with a voice input function; this embodiment places no restriction on this.

When the user needs to perform voice input, the user inputs voice information through the human-machine voice input interface; the input voice information is then recognized, and corresponding processing is performed based on the recognition result. Different voice input applications process the recognition result differently. For example:

for a voice search application, after the user's voice information is recognized, search results are returned to the user according to the recognition result; or

for an instant messaging application, after the user's voice information is recognized, the recognition result is converted into text and displayed in the input box.

To improve the accuracy and processing performance of speech recognition for voice information input in different scenes, the speech recognition model provided by this embodiment first configures proprietary recognition resources corresponding to customized voice scenes, and a universal recognition resource corresponding to the universal voice scene.

It should be noted that there are many types of customized voice scenes, and different customized voice scenes correspond to different proprietary recognition resources. The specific content can be configured and selected according to the needs of different application scenarios, and this embodiment places no restriction on it. Examples include:

for the map navigation voice scene, the corresponding proprietary recognition resource is a place-name recognition resource; or

for the e-commerce platform voice scene, the corresponding proprietary recognition resource is an e-commerce product-name recognition resource; or

for the movie search voice scene, the corresponding proprietary recognition resource is a movie-name recognition resource.
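The scene-to-resource configuration of step 101 can be pictured as a lookup table. The following Python sketch is illustrative only: the `RecognitionResource` class, the scene keys, and the sample vocabularies are hypothetical, because the patent does not say how a recognition resource is represented (it might be a vocabulary, a language model, or something else).

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class RecognitionResource:
    """Hypothetical stand-in for a recognition resource (here just a vocabulary)."""
    name: str
    vocabulary: FrozenSet[str] = field(default_factory=frozenset)


# Step 101 (sketch): each customized voice scene gets its own proprietary resource.
# Scene keys and vocabularies are illustrative assumptions, not taken from the patent.
PROPRIETARY_RESOURCES: Dict[str, RecognitionResource] = {
    "map_navigation": RecognitionResource("place_names", frozenset({"Zhongguancun", "Wudaokou"})),
    "ecommerce": RecognitionResource("product_names", frozenset({"smartphone", "sneakers"})),
    "movie_search": RecognitionResource("movie_names", frozenset({"Titanic", "Avatar"})),
}

# One universal resource serves the universal (non-customized) voice scene.
UNIVERSAL_RESOURCE = RecognitionResource("universal")
```

Keeping one distinct resource per scene key mirrors the requirement that different customized voice scenes correspond to different proprietary recognition resources.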

Step 102: establish a speech recognition library that contains the proprietary recognition resources and the universal recognition resource, so that voice information is recognized with the speech recognition library according to its input scene.

Specifically, a speech recognition library containing the proprietary recognition resources and the universal recognition resource is established from the preconfigured proprietary recognition resources corresponding to the customized voice scenes and the universal recognition resource corresponding to the universal voice scene.

Then, when voice information input by the user is received, the input scene of the voice information is determined, together with its type, that is, whether the input scene is a customized voice scene or the universal voice scene. The recognition resource corresponding to that type is obtained from the speech recognition library and used to recognize the input voice information.
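A minimal sketch of the library built in step 102, assuming each resource can be identified by name. `SpeechRecognitionLibrary`, `scene_type`, and `resource_for` are hypothetical names; the sketch only shows that one library holds both kinds of resource and that the type of the input scene decides which one is returned.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RecognitionResource:
    name: str  # hypothetical: a resource is identified only by its name here


class SpeechRecognitionLibrary:
    """Step 102 sketch: one library holding proprietary and universal resources."""

    def __init__(self, proprietary: Dict[str, RecognitionResource],
                 universal: RecognitionResource) -> None:
        self._proprietary = dict(proprietary)  # customized scene -> proprietary resource
        self._universal = universal

    def scene_type(self, scene: Optional[str]) -> str:
        """Classify the input scene as 'customized' or 'universal'."""
        return "customized" if scene in self._proprietary else "universal"

    def resource_for(self, scene: Optional[str]) -> RecognitionResource:
        """Return the resource matching the type of the input scene."""
        if self.scene_type(scene) == "customized":
            return self._proprietary[scene]
        return self._universal
```

For example, a library configured only with `map_navigation` returns the place-name resource for that scene and falls back to the universal resource for any other scene, including `None`.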

The speech recognition method of this embodiment configures proprietary recognition resources corresponding to customized voice scenes and a universal recognition resource corresponding to the universal voice scene, and establishes a speech recognition library containing both, so that voice information is recognized with the library according to its input scene. Recognition environments are thus customized for different vertical scenes, and speech recognition is performed with the recognition resource corresponding to the voice input scene, improving recognition accuracy and processing efficiency.

Fig. 2 is a flowchart of a speech recognition method according to another embodiment of this application.

As shown in Fig. 2, after step 102, the method may further include the following steps:

Step 201: receive input voice information.

Step 202: determine the input scene of the voice information according to a preset scene acquisition strategy.

Specifically, voice information input by the user is received, and the input scene corresponding to the currently received voice information is determined according to the preset scene acquisition strategy.

It should be noted that different scene acquisition strategies can be preset according to the actual application, and this embodiment places no restriction on them. Examples include:

Example 1: determine the input scene of the voice information according to the application program;

Specifically, the input scene is determined according to the application into which the user is currently entering speech. For example, if the user inputs voice information into a map navigation application, the input scene of the voice information is determined to be map navigation.

Alternatively,

Example 2: determine the input scene of the voice information according to context;

Specifically, the input scene is determined from the context of the user's session log with other users. For example, in an instant messaging application, if the user's earlier conversation with other users was about travel, the input scene of the voice information is the travel scene.

Alternatively,

Example 3: determine the input scene of the voice information according to geographic location information.

Specifically, the user's current geographic location is obtained from the GPS information of the terminal device, and the input scene of the voice information is then determined from that location. For example, if the GPS information of the terminal device indicates that the user is currently in a movie theater, the input scene of the voice information is the movie scene.
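The three example strategies can each be written as a small function that returns a scene name, or `None` when it cannot decide. This Python sketch assumes hypothetical lookup tables (`APP_TO_SCENE`, `CONTEXT_KEYWORDS`, `POI_TO_SCENE`) and naive keyword matching for the context strategy; it also assumes that GPS coordinates have already been resolved to a place category, a step the patent does not describe.

```python
from typing import Optional, Sequence

# Hypothetical lookup tables; the patent does not specify how they are built.
APP_TO_SCENE = {"map_app": "map_navigation", "shop_app": "ecommerce"}
CONTEXT_KEYWORDS = {"travel": ("flight", "hotel", "itinerary")}
POI_TO_SCENE = {"cinema": "movie_search"}


def scene_from_application(app_id: str) -> Optional[str]:
    """Example 1: the application receiving the voice input implies the scene."""
    return APP_TO_SCENE.get(app_id)


def scene_from_context(history: Sequence[str]) -> Optional[str]:
    """Example 2: naive keyword match over earlier messages in the session log."""
    text = " ".join(history).lower()
    for scene, keywords in CONTEXT_KEYWORDS.items():
        if any(word in text for word in keywords):
            return scene
    return None


def scene_from_location(poi_category: Optional[str]) -> Optional[str]:
    """Example 3: a place category, assumed to be derived from the device's GPS fix."""
    return POI_TO_SCENE.get(poi_category) if poi_category else None
```

Returning `None` rather than guessing lets the caller fall back to the universal recognition resource when a strategy has no answer.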

Step 203: recognize the input voice information according to the input scene and the speech recognition library.

Specifically, the input voice information is recognized according to the input scene of the current voice information and the pre-established speech recognition library, as follows:

if the input scene of the current voice is a pre-customized voice scene, the proprietary recognition resource corresponding to that customized voice scene is obtained from the speech recognition library, and the voice information is recognized with the proprietary recognition resource;

if the input scene of the current voice is not a pre-customized voice scene, the universal recognition resource is obtained from the speech recognition library, and the voice information is recognized with the universal recognition resource.

Building on the embodiment shown in Fig. 1, the speech recognition method of this embodiment further receives input voice information, determines its input scene according to a preset scene acquisition strategy, and recognizes the input voice information according to the input scene and the speech recognition library. Speech recognition is thus performed with the recognition resource corresponding to the voice input scene, improving recognition accuracy and processing efficiency.

Fig. 3 is a flowchart of a speech recognition method according to yet another embodiment of this application. Referring to Fig. 3, the flow is as follows:

Step 1: after the voice information is received, judge whether the input scene of the voice information can be determined according to the preset scene acquisition strategy.

Step 2: if the input scene of the voice information cannot be determined, recognize the voice information with the universal recognition resource.

Step 3: if the input scene of the voice information can be determined, judge whether it is a pre-customized voice scene.

Step 4: if the input scene is a pre-customized voice scene, recognize the voice information with the proprietary recognition resource in the speech recognition library that corresponds to that customized voice scene;

Step 5: if the input scene is not a customized voice scene, recognize the voice information with the universal recognition resource in the speech recognition library.
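The Fig. 3 flow can be sketched as two functions. One assumption here is that several preset strategies may be tried in order until one yields a scene; the patent presents the strategies as alternatives and does not fix an order. Resource names are plain strings for brevity, and the `request` dictionary is a hypothetical container for whatever signals (application, context, location) are available.

```python
from typing import Callable, Dict, Iterable, Optional

# A scene strategy inspects a request and returns a scene name, or None if unsure.
SceneStrategy = Callable[[dict], Optional[str]]


def determine_scene(request: dict, strategies: Iterable[SceneStrategy]) -> Optional[str]:
    """Step 1: try each preset strategy in turn; None means 'cannot be determined'."""
    for strategy in strategies:
        scene = strategy(request)
        if scene is not None:
            return scene
    return None


def select_resource(scene: Optional[str], proprietary: Dict[str, str], universal: str) -> str:
    """Steps 2-5: pick the recognition resource for the (possibly unknown) scene."""
    if scene is None:            # Step 2: scene undetermined -> universal resource
        return universal
    if scene in proprietary:     # Steps 3-4: pre-customized scene -> proprietary resource
        return proprietary[scene]
    return universal             # Step 5: known but not customized -> universal resource
```

Keeping `None` distinct from a known but non-customized scene preserves the difference between step 2 and step 5, even though both end up using the universal recognition resource.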

In order to realize above-described embodiment, the application also proposes a kind of speech recognition equipment.

Fig. 4 is the structural schematic diagram of the speech recognition equipment of the application one embodiment.

As shown in figure 4, the speech recognition equipment includes:

Configuration module 11, for configuring corresponding with customized voice scene proprietary identification resource, and with common language sound field The corresponding universal identification resource of scape；

Specifically, the proprietary identification resource includes at least one of:

Place name identification resource, search hot word identification resource, electric business product name identification resource, movie name identify resource.

Module 12 is established, includes the proprietary speech recognition for identifying resource and the universal identification resource for establishing Library, to identify the voice messaging using the speech recognition library according to the input scene of voice messaging.

It should be noted that the foregoing explanation of the speech recognition method embodiments also applies to the speech recognition device of this embodiment and is not repeated here.

The speech recognition device of this embodiment configures proprietary recognition resources corresponding to customized voice scenes and a universal recognition resource corresponding to the universal voice scene, and establishes a speech recognition library containing both, so that voice information is recognized with the library according to its input scene. Recognition environments are thus customized for different vertical scenes, and speech recognition is performed with the recognition resource corresponding to the voice input scene, improving recognition accuracy and processing efficiency.

Fig. 5 is a schematic structural diagram of a speech recognition device according to another embodiment of this application. As shown in Fig. 5, building on the embodiment shown in Fig. 4, the device further includes:

a receiving module 13, configured to receive input voice information;

an acquisition module 14, configured to determine the input scene of the voice information according to a preset scene acquisition strategy;

a recognition module 15, configured to recognize the input voice information according to the input scene and the speech recognition library.

In one embodiment, the acquisition module 14 is configured to determine the input scene of the voice information according to the application into which the user is currently entering speech;

Alternatively,

In one embodiment, the acquisition module 14 is configured to determine the input scene of the voice information according to the context of the user's session log with other users;

Alternatively,

In one embodiment, the acquisition module 14 is configured to determine the input scene of the voice information according to the user's current geographic location information.

In one embodiment, the recognition module 15 is configured to:

if the input scene is the customized voice scene, recognize the voice information with the proprietary recognition resource in the speech recognition library that corresponds to the customized voice scene;

if the input scene is not the customized voice scene, recognize the voice information with the universal recognition resource in the speech recognition library.

In another embodiment, the recognition module 15 is further configured to:

if the input scene cannot be determined, recognize the voice information with the universal recognition resource.
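The module split of Figs. 4 and 5 maps naturally onto one class per module. The following Python sketch is hypothetical throughout: class and method names are illustrative, the acquisition module implements only the application-based strategy, and `decode` stands in for an actual recognizer that would consume the selected resource.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Library:
    proprietary: Dict[str, str] = field(default_factory=dict)  # scene -> resource name
    universal: str = "universal"


class ConfigurationModule:  # module 11
    def configure(self) -> Dict[str, str]:
        # Illustrative scene -> resource names; real resources are not specified.
        return {"map_navigation": "place_names", "movie_search": "movie_names"}


class EstablishingModule:  # module 12
    def establish(self, proprietary: Dict[str, str], universal: str = "universal") -> Library:
        return Library(dict(proprietary), universal)


class ReceivingModule:  # module 13
    def receive(self, audio: bytes, app_id: str) -> dict:
        return {"audio": audio, "app": app_id}


class AcquisitionModule:  # module 14, application-based strategy only
    def __init__(self, app_to_scene: Dict[str, str]) -> None:
        self.app_to_scene = app_to_scene

    def acquire(self, request: dict) -> Optional[str]:
        return self.app_to_scene.get(request["app"])


class IdentificationModule:  # module 15 (the recognition module)
    def __init__(self, library: Library, decode: Callable[[bytes, str], str]) -> None:
        self.library = library
        self.decode = decode

    def identify(self, request: dict, scene: Optional[str]) -> str:
        # Customized scene -> proprietary resource; any other or undetermined
        # scene (None) -> universal resource.
        if scene is not None and scene in self.library.proprietary:
            resource = self.library.proprietary[scene]
        else:
            resource = self.library.universal
        return self.decode(request["audio"], resource)


if __name__ == "__main__":
    library = EstablishingModule().establish(ConfigurationModule().configure())
    identifier = IdentificationModule(library, lambda audio, res: f"[{res}] {len(audio)} bytes")
    request = ReceivingModule().receive(b"\x00\x01", "map_app")
    scene = AcquisitionModule({"map_app": "map_navigation"}).acquire(request)
    print(identifier.identify(request, scene))  # prints "[place_names] 2 bytes"
```

Injecting `decode` keeps the scene-routing logic testable without a real speech engine.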

Building on the embodiment shown in Fig. 4, the speech recognition device of this embodiment further receives input voice information, determines its input scene according to a preset scene acquisition strategy, and recognizes the input voice information according to the input scene and the speech recognition library. Speech recognition is thus performed with the recognition resource corresponding to the voice input scene, improving recognition accuracy and processing efficiency.

In the description of this specification, references to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where they do not contradict each other, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those different embodiments or examples.

In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of this application, "plurality" means at least two, for example two or three, unless otherwise specifically defined.

Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of this application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of this application belong.

The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions that may be considered to implement logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that each part of this application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.

Those of ordinary skill in the art can understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, the functional units in the embodiments of this application may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If implemented as a software functional module and sold or used as an independent product, the integrated module may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of this application have been shown and described above, it should be understood that they are exemplary and should not be construed as limiting this application; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of this application.

Claims

1. A speech recognition method, comprising the following steps:

configuring proprietary recognition resources corresponding to customized voice scenes, and a universal recognition resource corresponding to a universal voice scene, wherein different customized voice scenes correspond to different proprietary recognition resources; and

establishing a speech recognition library comprising the proprietary recognition resources and the universal recognition resource; determining, according to the input scene of voice information, the type of the input scene; and obtaining, from the speech recognition library, the recognition resource corresponding to the type of the input scene to recognize the voice information.

2. The method according to claim 1, wherein the proprietary recognition resources comprise at least one of the following:

3. The method according to claim 1 or 2, further comprising:

receiving input voice information;

determining the input scene of the voice information according to a preset scene acquisition strategy; and

recognizing the input voice information according to the input scene and the speech recognition library.

4. The method according to claim 3, wherein determining the input scene of the voice information according to the preset scene acquisition strategy comprises:

determining the input scene of the voice information according to the application into which the user is currently entering speech;

Alternatively,

determining the input scene of the voice information according to the context of the user's session log with other users;

Alternatively,

determining the input scene of the voice information according to the user's current geographic location information.

5. The method according to claim 3, wherein recognizing the input voice information according to the input scene and the speech recognition library comprises:

if the input scene is the customized voice scene, recognizing the voice information with the proprietary recognition resource in the speech recognition library that corresponds to the customized voice scene; and

if the input scene is not the customized voice scene, recognizing the voice information with the universal recognition resource in the speech recognition library.

6. The method according to claim 3, further comprising:

if the input scene cannot be determined, recognizing the voice information with the universal recognition resource.

7. A speech recognition device, comprising:

a configuration module, configured to configure proprietary recognition resources corresponding to customized voice scenes, and a universal recognition resource corresponding to a universal voice scene, wherein different customized voice scenes correspond to different proprietary recognition resources; and

an establishing module, configured to establish a speech recognition library comprising the proprietary recognition resources and the universal recognition resource, to determine, according to the input scene of voice information, the type of the input scene, and to obtain, from the speech recognition library, the recognition resource corresponding to the type of the input scene to recognize the voice information.

8. The device according to claim 7, wherein the proprietary recognition resources comprise at least one of the following:

9. The device according to claim 7 or 8, further comprising:

a receiving module, configured to receive input voice information;

an acquisition module, configured to determine the input scene of the voice information according to a preset scene acquisition strategy; and

a recognition module, configured to recognize the input voice information according to the input scene and the speech recognition library.

10. The device according to claim 9, wherein the acquisition module is configured to:

Alternatively,

11. The device according to claim 9, wherein the recognition module is configured to:

12. The device according to claim 9, wherein the recognition module is further configured to: