CN109272995A

CN109272995A - Speech recognition method, apparatus and electronic device

Info

Publication number: CN109272995A
Application number: CN201811126924.0A
Authority: CN
Inventors: 叶顺平; 邹明
Original assignee: Chumen Wenwen Information Technology Co Ltd
Current assignee: Chumen Wenwen Information Technology Co Ltd
Priority date: 2018-09-26
Filing date: 2018-09-26
Publication date: 2019-01-25

Abstract

Embodiments of the invention disclose a speech recognition method, apparatus and electronic device. The method includes: obtaining collected voice information input by a user; and recognizing the voice information according to at least one language model matched to the user, to obtain a speech recognition result. Because the voice information is recognized with at least one language model matched to the user, that is, with a designated language model rather than a generic one, the recognition result can meet the user's individual needs, and both the accuracy and the efficiency of speech recognition are improved. This addresses the technical problem in the related art that recognition with an undifferentiated general-purpose language model fails to recognize the input or recognizes it incorrectly, and it improves the user experience.

Description

Speech recognition method, apparatus and electronic device

Technical field

Embodiments of the present invention relate to the technical field of speech recognition, and in particular to a speech recognition method, apparatus and electronic device.

Background technique

With the development of speech recognition technology, voice wake-up has been widely applied in, for example, robots, mobile terminals, wearable devices, smart home devices and in-vehicle devices. In the related art, however, speech recognition can recognize only common words; highly specialized or uncommon words either cannot be recognized or are recognized incorrectly.

Summary of the invention

In view of this, embodiments of the present invention provide a speech recognition method, apparatus and electronic device that can solve the above technical problem.

To solve the above problem, embodiments of the present invention mainly provide the following technical solutions:

In a first aspect, an embodiment of the present invention provides a speech recognition method, the method comprising:

obtaining collected voice information input by a user;

recognizing the voice information according to at least one language model matched to the user, to obtain a speech recognition result.

In a second aspect, an embodiment of the present invention further provides a speech recognition apparatus, the apparatus comprising:

a voice obtaining module, configured to obtain collected voice information input by a user;

a speech recognition module, configured to recognize the voice information according to at least one language model matched to the user, to obtain a speech recognition result.

In a third aspect, an embodiment of the present invention further provides an electronic device, comprising:

at least one processor;

and at least one memory connected to the processor, and a bus; wherein

the processor and the memory communicate with each other through the bus;

the processor is configured to call program instructions in the memory to execute the speech recognition method.

In a fourth aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium that stores computer instructions, the computer instructions causing a computer to execute the speech recognition method.

With the above technical solutions, the technical solutions provided by embodiments of the present invention have at least the following advantages. The voice information is recognized with at least one language model matched to the user, that is, with a designated language model. This improves the accuracy of recognizing the voice information, ensures that the recognition result can meet the user's individual needs, and improves recognition efficiency. It solves the technical problem in the related art that recognition with an undifferentiated general-purpose language model fails to recognize the input or recognizes it incorrectly, and it improves the user experience.

The above is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the embodiments are more apparent, specific embodiments of the present invention are set out below.

Brief description of the drawings

Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the embodiments of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:

Fig. 1 shows a schematic flowchart of a speech recognition method provided by an embodiment of the present invention;

Fig. 2 shows a schematic flowchart of determining a language model matched to the user, provided by an embodiment of the present invention;

Fig. 3 shows a schematic structural diagram of a speech recognition apparatus provided by an embodiment of the present invention;

Fig. 4 shows a schematic structural diagram of another speech recognition apparatus provided by an embodiment of the present invention;

Fig. 5 shows a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.

Embodiment one

As shown in Fig. 1, the present invention provides a speech recognition method that includes the following steps:

Step S101: obtain collected voice information input by a user.

The method provided by this embodiment is executed by an electronic device, which obtains the collected voice information input by the user while it is in a working state.

Specifically, the electronic device may be a terminal device or a cloud server.

In a specific application, the voice information input by the user may be collected by the terminal device itself or by a voice capture device connected to the terminal device, such as a microphone. In practice, the voice capture device may be connected to the terminal device by a data cable or wirelessly, for example over Bluetooth, so that the voice information it collects is sent to the terminal device and the terminal device or the cloud server can obtain it.

Step S102: recognize the voice information according to at least one language model matched to the user, to obtain a speech recognition result.

Specifically, a language model matched to the user may be a language model that can recognize, in a targeted way, the technical terms of a specific field to which the user belongs, such as a language model for computer terminology, a language model for legal terminology, or a language model for financial terminology. Performing speech recognition with language models specifically matched to the user can improve both the speed and the accuracy of speech recognition.

Specifically, there may be one language model matched to the user, or several.

In the speech recognition method provided by this embodiment, the voice information is recognized with at least one language model matched to the user. Such a model can recognize, in a targeted way, the technical terms of the specific field to which the user belongs, so recognition is carried out by a designated language model matched to the user. This improves the accuracy of speech recognition, ensures that the recognition result serves the user's purpose, and improves recognition speed. It solves the technical problem in the related art that speech recognition with an undifferentiated general-purpose language model produces results that do not meet the user's needs, or fails to recognize the input at all, and it improves the user experience.
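The patent does not say how the matched language models are combined with the acoustic side of recognition. As a minimal sketch only, the Python below assumes that an upstream recognizer has already produced candidate transcriptions with acoustic scores, and that each matched model adds a bonus for domain terms it finds in a candidate. The class DomainLanguageModel, the function recognize, the weights and the sample sentences are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DomainLanguageModel:
    """Toy stand-in for a language model of one field's terminology (hypothetical)."""
    name: str
    term_weights: dict = field(default_factory=dict)  # term -> bonus score

    def score(self, text: str) -> float:
        # Sum the bonus of every domain term that appears in the candidate text.
        return sum(w for term, w in self.term_weights.items() if term in text)


def recognize(candidates, matched_models):
    """Step S102 (sketch): return the candidate transcription that scores
    highest once the bonuses of all user-matched models are added."""
    if not matched_models:
        raise ValueError("at least one matched language model is required")

    def total(candidate):
        text, acoustic_score = candidate
        return acoustic_score + sum(m.score(text) for m in matched_models)

    return max(candidates, key=total)[0]


# Assumed output of an acoustic front end: (text, acoustic score) pairs.
candidates = [
    ("the plane tiff filed a motion", 0.52),
    ("the plaintiff filed a motion", 0.48),
]
law = DomainLanguageModel("law", {"plaintiff": 1.0, "motion": 0.2})
result = recognize(candidates, [law])
print(result)
```

In this toy setup the legal model tips the decision toward the candidate containing the legal term, even though its assumed acoustic score is slightly lower.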

In practice, the electronic device that applies the speech recognition method provided by the present invention may be either a terminal device or a cloud server.

When the electronic device is a cloud server, the terminal device, or an audio capture device connected to it, collects the voice information input by the user and sends it to the cloud server. The cloud server recognizes the voice information according to at least one language model matched to the user and feeds the resulting speech recognition result back to the terminal device, which processes it to complete the recognition of the voice information.

When the electronic device is a terminal device (such as a mobile phone, speaker, tablet computer, laptop, PC or wearable device), take a mobile phone as an example: while powered on, the phone obtains the voice information input by the user and collected by the phone's microphone, and recognizes the voice information according to at least one language model matched to the user to obtain a speech recognition result.

In some examples, before the voice information is recognized according to at least one language model matched to the user, the method further includes: determining at least one language model matched to the user.

In this embodiment, the at least one language model matched to the user is determined so that it can be used as the designated language model for recognizing the voice information, which improves the accuracy of recognition.

Specifically, in some embodiments, one way of determining the at least one language model matched to the user may include step S201 (not shown in the figures) or step S202.

Step S201: determine at least one language model matched to the user based on a received selection instruction of the user for language models.

This embodiment uses the received selection instruction of the user for language models to determine the language models matched to the user. Specifically, the selection instruction may select one language model or several.

Specifically, the selection instruction may be sent by the user through a human-computer interaction interface provided by the terminal device. For example, the interface presents identifiers for language models covering the terminology of several specific fields, such as law, medicine, computer and finance. Suppose the user selects the identifiers law, computer and finance. The interface generates a corresponding selection instruction from the selected identifiers, and the terminal device determines the language models matched to the user from the received instruction. The language models finally determined are the language model for legal terminology, the language model for computer terminology and the language model for financial terminology.
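Step S201 can be pictured as a lookup from the identifiers the user picked to the corresponding models. The following is a sketch under stated assumptions: the catalogue MODEL_CATALOGUE, the handle strings and the function models_from_selection are hypothetical names, and the patent does not specify how the selection instruction is encoded.

```python
# Hypothetical catalogue: identifier shown in the interface -> language model handle.
MODEL_CATALOGUE = {
    "law": "lm-law",
    "medicine": "lm-medicine",
    "computer": "lm-computer",
    "finance": "lm-finance",
}


def models_from_selection(selected_identifiers):
    """Step S201 (sketch): map the identifiers in the selection instruction
    to the language models matched to the user."""
    if not selected_identifiers:
        raise ValueError("the selection instruction must name at least one model")
    unknown = [i for i in selected_identifiers if i not in MODEL_CATALOGUE]
    if unknown:
        raise KeyError(f"unknown language model identifiers: {unknown}")
    # Preserve the user's order and drop duplicate selections.
    return [MODEL_CATALOGUE[i] for i in dict.fromkeys(selected_identifiers)]


matched = models_from_selection(["law", "computer", "law"])
print(matched)
```

Rejecting an empty selection keeps the "at least one" condition of the method explicit rather than silently falling back to a general-purpose model.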

The following description takes as an example the case where the electronic device that determines the at least one language model matched to the user is a terminal device.

In practice, the terminal device may be a mobile phone, a tablet, a laptop, a wearable device, or a smart device such as a smartwatch or smart speaker. The language models may all be stored locally on the terminal device, or they may be stored in the cloud. Accordingly, the language model matched to the user can be determined in two ways. In the first, based on the received selection instruction of the user for language models, the language models stored in the cloud are downloaded to the terminal device, and the downloaded models are used as the language models matched to the user for recognizing the voice information. In the second, based on the received selection instruction, the language models matched to the user are determined among the language models stored locally on the terminal device.
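The two storage arrangements can be illustrated together. The sketch below is an assumption-laden simplification: the local and cloud stores are plain dictionaries, "downloading" is a dictionary copy, and the function resolve_matched_models is hypothetical. It also merges the two ways into one lookup with a cloud fallback, which the patent describes as separate alternatives.

```python
def resolve_matched_models(selected, local_store, cloud_store):
    """Return {identifier: model} for the selected models.

    Models already on the device are used as they are; missing ones are
    copied ("downloaded") from the cloud store into the local store first.
    """
    matched = {}
    for identifier in selected:
        if identifier not in local_store:
            if identifier not in cloud_store:
                raise KeyError(f"model {identifier!r} is not available anywhere")
            local_store[identifier] = cloud_store[identifier]  # simulated download
        matched[identifier] = local_store[identifier]
    return matched


# Hypothetical stores; real models would be files or network resources.
local = {"law": "law-model-data"}
cloud = {"law": "law-model-data", "finance": "finance-model-data"}
matched = resolve_matched_models(["law", "finance"], local, cloud)
print(sorted(matched))
```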

As shown in Fig. 2, step S202 includes steps S2021 and S2022.

Step S2021: obtain the history input record and/or the user attributes of the user.

In this embodiment, the history input record of the user is the content the user input during some past period. The content may have been input as text or by voice, and it may have been input in a search engine, in an input method, or, in practice, through other software tools. For example, over the past month, the content the user entered in an input method (such as Sogou Input Method or Baidu Input Method) mainly related to editing code, writing novels and writing papers.

In this embodiment, the user attributes may include the user's occupation, field of study, interests and so on, where the user's interests may be determined by the user's selection from preset interest categories. Specifically, the user's interests may include celebrities, sports, film, prose, law, technology and so on.

Step S2022: determine at least one field corresponding to the user according to the history input record and/or the user attributes, and use the language model corresponding to the at least one field as the language model matched to the user.

This embodiment determines at least one field corresponding to the user from the user's history input record and/or user attributes, and then determines the corresponding language model from the determined field or fields, which completes the determination of at least one language model matched to the user.

In a specific application, only the history input record may be obtained, only the user attributes may be obtained, or both may be obtained together, in order to determine the language model matched to the user.
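How a field is derived from the history input record or the user attributes is left open by the patent. As one possible sketch, the Python below counts keyword hits per field in the history and also accepts fields named directly in the attributes. The keyword table DOMAIN_KEYWORDS, the threshold min_hits and the attribute keys "occupation" and "interests" are illustrative assumptions.

```python
from collections import Counter

# Hypothetical keyword sets per field.
DOMAIN_KEYWORDS = {
    "computer": {"code", "compiler", "function", "bug"},
    "law": {"contract", "plaintiff", "statute"},
    "finance": {"bond", "equity", "interest"},
}


def infer_domains(history=None, attributes=None, min_hits=2):
    """Steps S2021-S2022 (sketch): derive the user's fields from the history
    input record and/or the user attributes; either argument may be omitted."""
    domains = []
    if history:
        hits = Counter()
        for entry in history:
            words = set(entry.lower().split())
            for domain, keywords in DOMAIN_KEYWORDS.items():
                hits[domain] += len(words & keywords)
        domains += [d for d, n in hits.most_common() if n >= min_hits]
    if attributes:
        declared = [attributes.get("occupation"), *attributes.get("interests", [])]
        for value in declared:
            if value in DOMAIN_KEYWORDS and value not in domains:
                domains.append(value)
    return domains


history = ["fix the compiler bug", "refactor this function code"]
attributes = {"occupation": "lawyer", "interests": ["law"]}
domains = infer_domains(history, attributes)
print(domains)
```

Each returned field would then be mapped to its language model, as in step S2022.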

It should be noted that, in practice, the at least one language model matched to the user may be determined with reference to step S201, with reference to step S202, or with reference to both.

In some examples, after the at least one language model matched to the user has been determined with reference to step S201, step S202, or both, the method further includes: updating the language model matched to the user based on a user-defined lexicon.

In this embodiment, the at least one language model matched to the user is updated with the user-defined lexicon, so that during speech recognition the updated language model can recognize quickly according to the user-defined lexicon.

Specifically, the user-defined lexicon may be a user-defined voice lexicon or a user-defined context lexicon.

For example, when the user-defined lexicon is a user-defined voice lexicon, it may specify that the result for user-defined voice "A" is "new information", the result for user-defined voice "B" is "DNA", and the result for user-defined voice "C" is "science and technology create destiny". Once the language model matched to the user has been updated with this lexicon, when the updated language model detects that the voice input by the user is "A", it can output "new information" directly without further recognition, which speeds up recognition.
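The shortcut described for the user-defined voice lexicon amounts to a lookup that runs before ordinary recognition. The sketch below assumes each user-defined voice has already been reduced to a key such as "A" (how a voice is matched to its key is not specified in the patent); the function recognize_with_voice_lexicon and the fallback recognizer are hypothetical.

```python
def recognize_with_voice_lexicon(voice_key, voice_lexicon, fallback_recognizer):
    """Return the user-defined result if the voice is in the lexicon;
    otherwise fall back to ordinary recognition."""
    if voice_key in voice_lexicon:
        return voice_lexicon[voice_key]
    return fallback_recognizer(voice_key)


voice_lexicon = {"A": "new information", "B": "DNA"}
calls = []


def fallback(voice_key):
    calls.append(voice_key)  # record that full recognition was needed
    return f"<recognized {voice_key}>"


first = recognize_with_voice_lexicon("A", voice_lexicon, fallback)
second = recognize_with_voice_lexicon("Z", voice_lexicon, fallback)
print(first, second, calls)
```

Only the voice that is missing from the lexicon reaches the fallback recognizer, which is the source of the speed-up the text describes.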

For example, when the user-defined lexicon is a context lexicon, suppose the language model matched to the user is a language model for legal terminology, and the user-defined context lexicon contains the medical term "Alzheimer's disease" and the financial term "interbank lending", both of which the user uses often. After these frequently used terms are added to the legal-terminology language model, the updated model can, in later speech recognition, quickly recognize these non-legal technical terms while still recognizing legal terms.
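Updating a domain model with a context lexicon can be sketched as merging the user's terms into the model's vocabulary. The representation of a model as a term-to-weight dictionary, the default weight and the function update_with_context_lexicon are assumptions made for illustration, not the patent's data structures.

```python
def update_with_context_lexicon(model_terms, context_lexicon, weight=1.0):
    """Return a copy of model_terms that also contains the user's terms.
    Existing weights are kept; new terms receive the given weight."""
    updated = dict(model_terms)
    for term in context_lexicon:
        updated.setdefault(term, weight)
    return updated


# Hypothetical legal-terminology model and user-defined context lexicon.
law_terms = {"plaintiff": 1.0, "statute": 0.8}
context_lexicon = ["Alzheimer's disease", "interbank lending", "plaintiff"]
updated = update_with_context_lexicon(law_terms, context_lexicon, weight=0.5)
print(sorted(updated))
```

Returning a copy leaves the original domain model untouched, so the same base model could serve several users with different lexicons.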

Specifically, in some embodiments, another way of determining the at least one language model matched to the user includes step S301, step S302 and step S303.

Step S301 (not shown in the figures): obtain the user-defined lexicon.

Step S302 (not shown in the figures): generate a personalized language model of the user based on the user-defined lexicon.

Step S303 (not shown in the figures): determine the personalized language model of the user as a language model matched to the user.

This embodiment generates a personalized language model suited to the user from the user-defined lexicon and determines it as a language model matched to the user. Because the personalized language model is the model that best fits the user's input habits, the recognition result can be determined quickly from it during speech recognition.
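Steps S301 to S303 can be sketched with a very small model. The patent does not state what kind of model is generated, so the example below assumes a unigram model that maps each lexicon term to its log relative frequency; the functions build_personalized_model and add_personalized_model and the sample lexicon are hypothetical.

```python
import math
from collections import Counter


def build_personalized_model(lexicon_entries):
    """Steps S301-S302 (sketch): turn the user-defined lexicon into a unigram
    model mapping each term to its log relative frequency."""
    counts = Counter(lexicon_entries)
    total = sum(counts.values())
    return {term: math.log(count / total) for term, count in counts.items()}


def add_personalized_model(matched_models, personalized_model):
    """Step S303 (sketch): register the personalized model as a matched model."""
    return {**matched_models, "personalized": personalized_model}


# Hypothetical lexicon in which repeated entries reflect frequent use.
lexicon = ["DNA", "DNA", "DNA", "interbank lending"]
personalized = build_personalized_model(lexicon)
matched = add_personalized_model({"law": {}}, personalized)
print(sorted(matched))
```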

In some embodiments, after the at least one language model matched to the user has been determined with reference to step S201, step S202, or both, steps S301 to S303 of this embodiment may also be applied to determine the personalized language model as a language model matched to the user.

For example, suppose that with reference to step S201, step S202, or both, the determined language models are the language model for legal terminology, the language model for computer terminology and the language model for financial terminology, and the personalized language model is also determined as a language model matched to the user. Speech recognition can then use the domain-terminology language models while the personalized language model allows fast recognition and yields the recognition result. In practice, when the determined language models matched to the user include both domain-terminology language models and a personalized language model, the personalized language model may be given a higher priority than the other language models matched to the user, so that it is used preferentially during recognition.
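The priority rule can be illustrated by consulting the models in descending priority and stopping at the first one that produces a result. This is only a sketch: each model is reduced to a lookup table keyed by a hypothetical utterance key, and the function recognize_by_priority and the priority numbers are assumptions.

```python
def recognize_by_priority(voice_key, models):
    """models: list of (priority, name, lookup_table) tuples.
    Consult models from highest to lowest priority and return the first hit."""
    for _, name, table in sorted(models, key=lambda m: m[0], reverse=True):
        if voice_key in table:
            return name, table[voice_key]
    return None, None


# Hypothetical models: the personalized model outranks the domain model.
models = [
    (1, "law", {"u1": "tort", "u2": "statute"}),
    (2, "personalized", {"u1": "Tort & Co."}),
]
print(recognize_by_priority("u1", models))
print(recognize_by_priority("u2", models))
```

When both models know an utterance, the personalized one wins; otherwise recognition falls through to the domain model.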

The embodiment of the present invention is described below by taking the application of the method to a search engine as an example.

While the search engine is running in the background or foreground, after the mobile phone obtains the collected voice information input by the user, it recognizes the voice information with at least one language model matched to the user that has been downloaded to the phone, outputs the resulting speech recognition result to the input box of the phone's search engine, controls the search engine to search with that result, and displays the search results. Alternatively, after obtaining the collected voice information, the phone sends it to a server; the server recognizes it with at least one pre-stored language model matched to the user, searches based on the speech recognition result, and feeds the corresponding search results back to the phone.
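Both search-engine variants share one shape: recognize first, then search with the recognized text. The sketch below keeps the recognizer and the search backend as injected callables, so the same function stands for the on-device path and the server path; voice_search, toy_recognize, toy_search and the index contents are hypothetical stand-ins.

```python
def voice_search(voice_information, recognize, search):
    """Recognize the voice information, then search with the recognized text.
    Returns the query (for the input box) and the search results."""
    query = recognize(voice_information)
    return query, search(query)


# Hypothetical stand-ins for the matched-model recognizer and the search engine.
INDEX = {"interbank lending": ["Interbank lending rates explained"]}


def toy_recognize(voice_information):
    return voice_information["transcript"]


def toy_search(query):
    return INDEX.get(query, [])


query, results = voice_search({"transcript": "interbank lending"}, toy_recognize, toy_search)
print(query, results)
```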

To further illustrate the speech recognition method provided by the present invention, the embodiment is described below by taking the application of the method to an input method application as an example.

When the user downloads and installs an input method, the input method application may present language models for the lexicons of different fields to the user to choose from. The application then downloads the corresponding language models from the server according to the user's choice and stores them locally as the language models matched to the user. Alternatively, the input method application downloads all language models to local storage at installation time and presents the identifiers of the language models for the different fields to the user to choose from. Later, when recognizing voice information, it preferentially uses the language models corresponding to the user's choice, and once a recognition result is obtained it displays the result in the editing interface, completing the input.

Embodiment two

Fig. 3 is a schematic structural diagram of a speech recognition apparatus provided by an embodiment of the present invention. The speech recognition apparatus 30 of this embodiment may include a voice obtaining module 301 and a speech recognition module 302.

The voice obtaining module 301 is configured to obtain collected voice information input by a user.

The speech recognition module 302 is configured to recognize the voice information according to at least one language model matched to the user, to obtain a speech recognition result.

The speech recognition apparatus provided by this embodiment recognizes the voice information with at least one language model matched to the user, that is, with a designated language model. This improves the accuracy of recognizing the voice information, ensures that the recognition result can meet the user's individual needs, and improves recognition efficiency. It solves the technical problem in the related art that recognition with an undifferentiated general-purpose language model fails to recognize the input or recognizes it incorrectly, and it improves the user experience.

The speech recognition apparatus of this embodiment can execute the speech recognition method provided in Embodiment one; the implementation principle is similar and is not repeated here.

Further, as shown in Fig. 4, the apparatus 30 further includes a first model determining module 303, a user-defined lexicon obtaining module 304, a personalized model generation module 305 and a second model determining module 306.

The first model determining module 303 is configured to determine at least one language model matched to the user before the voice information is recognized according to the at least one language model matched to the user.

The user-defined lexicon obtaining module 304 is configured to obtain the user-defined lexicon.

The personalized model generation module 305 is configured to generate the personalized language model of the user based on the user-defined lexicon.

The second model determining module 306 is configured to determine the personalized language model of the user as a language model matched to the user before the voice information is recognized according to the at least one language model matched to the user.
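The module structure of the apparatus can be mirrored in a small class. This is a sketch under stated assumptions: the class SpeechRecognitionDevice, its method names and the dictionary representation of a model are hypothetical, and the capture and recognition functions are injected callables rather than real audio or decoding code. Comments give the module numbers from the description.

```python
class SpeechRecognitionDevice:
    """Sketch of apparatus 30; comments name the corresponding modules."""

    def __init__(self, capture, recognize):
        self._capture = capture        # callable returning voice information
        self._recognize = recognize    # callable(voice, models) -> text
        self.matched_models = []

    def determine_models(self, models):
        # First model determining module 303.
        self.matched_models = list(models)

    def add_personalized_model(self, lexicon):
        # Modules 304 (obtain lexicon), 305 (generate model), 306 (register it).
        personalized = {"name": "personalized", "terms": set(lexicon)}
        self.matched_models.append(personalized)

    def run(self):
        voice = self._capture()        # voice obtaining module 301
        if not self.matched_models:
            raise RuntimeError("no language model matched to the user")
        # Speech recognition module 302.
        return self._recognize(voice, self.matched_models)


device = SpeechRecognitionDevice(
    capture=lambda: "raw-audio",
    recognize=lambda voice, models: f"{voice} via {len(models)} model(s)",
)
device.determine_models([{"name": "law", "terms": {"plaintiff"}}])
device.add_personalized_model(["DNA"])
output = device.run()
print(output)
```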

In some embodiments, the first model determining module 303 includes a first determination unit 3031 (not shown in the figures), where the first determination unit 3031 is configured to determine at least one language model matched to the user based on a received selection instruction of the user for language models.

In some embodiments, the first model determining module 303 includes a user data unit 3032 (not shown in the figures) and a second determination unit 3033 (not shown in the figures), where:

the user data unit 3032 is configured to obtain the history input record and/or the user attributes of the user;

the second determination unit 3033 is configured to determine at least one field corresponding to the user according to the history input record and/or the user attributes, and to use the language model corresponding to the at least one field as the language model matched to the user.

Further, as shown in Fig. 4, the apparatus 30 further includes a model re-determining module 307.

The model re-determining module 307 is configured to update the language model matched to the user based on the user-defined lexicon.

The speech recognition apparatus provided by this embodiment can update the language model matched to the user based on the user-defined lexicon, so that voice information is recognized according to the user-defined lexicon and recognition efficiency is improved.

In a specific application, the speech recognition apparatus provided by this embodiment may omit the user-defined lexicon obtaining module and the personalized model generation module and still update the language model matched to the user by obtaining the user-defined lexicon.

The speech recognition apparatus of this embodiment can execute the speech recognition method provided in Embodiment one; the implementation principle is similar and is not repeated here.

Embodiment three

An embodiment of the present invention provides an electronic device. As shown in Fig. 5, the electronic device 600 includes a processor 6001 and a memory 6003. The processor 6001 is connected to the memory 6003, for example through a bus 6002. The electronic device 600 may further include a transceiver 6006. It should be noted that, in practice, the transceiver 6006 is not limited to one, and the structure of the electronic device 600 does not limit the embodiments of the present invention.

The processor 6001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules and circuits described in connection with this disclosure. The processor 6001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

The bus 6002 may include a path that carries information between the above components. The bus 6002 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration it is drawn as a single thick line in Fig. 5, but this does not mean that there is only one bus or one type of bus.

The memory 6003 may be a ROM or another type of static storage device that can store static information and instructions, a RAM or another type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but it is not limited to these.

The memory 6003 stores the application code for carrying out the solution of the present invention, and its execution is controlled by the processor 6001. The processor 6001 executes the application code stored in the memory 6003 to implement the speech recognition apparatus provided by the embodiments shown in Fig. 3 and Fig. 4.

The electronic device provided by this embodiment applies the speech recognition method and recognizes the voice information with at least one language model matched to the user, that is, with a designated language model. This improves the accuracy of recognizing the voice information, ensures that the recognition result can meet the user's individual needs, and improves recognition efficiency. It solves the technical problem in the related art that recognition with an undifferentiated general-purpose language model fails to recognize the input or recognizes it incorrectly, and it improves the user experience.

Example IV

The embodiment of the present invention provides a kind of non-transient computer readable storage medium, non-transient computer readable storage medium Computer instruction is stored, computer instruction makes computer execute audio recognition method shown in above-mentioned each method embodiment one.

A kind of non-transient computer readable storage medium provided in an embodiment of the present invention utilizes compared with prior art At least one identifies voice messaging with the language model that user matches, this language mould by matching with user Type knows voice messaging realizes the purpose that voice messaging is identified by appointed language model otherwise, not only mentions The high accuracy to voice messaging identification, ensure that recognition result can meet the individual demand of user, and improve The accuracy and recognition efficiency of speech recognition solve the indiscriminate general language model used in the related technology and are identified The technical issues of leading to not identification or even wrong identification, improve user experience.

It should be understood that although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless expressly stated otherwise herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.

The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A speech recognition method, characterized by comprising:

acquiring collected voice information input by a user;

recognizing the voice information according to at least one language model that matches the user, to obtain a speech recognition result.

2. The method according to claim 1, characterized in that, before the recognizing the voice information according to at least one language model that matches the user, the method further comprises:

determining at least one language model that matches the user.

3. The method according to claim 2, characterized in that the determining at least one language model that matches the user comprises:

determining at least one language model that matches the user based on a received selection instruction of the user for a language model.

4. The method according to claim 2, characterized in that the determining at least one language model that matches the user comprises:

acquiring a historical input record and/or a user attribute of the user;

determining at least one domain corresponding to the user according to the historical input record and/or the user attribute, and taking the language model corresponding to the at least one domain as the language model that matches the user.

5. The method according to claim 2, characterized in that the determining at least one language model that matches the user comprises:

acquiring a custom dictionary of the user;

generating a personalized language model of the user based on the custom dictionary of the user;

determining the personalized language model of the user as the language model that matches the user.

6. The method according to claim 1, characterized in that the method further comprises:

updating the language model that matches the user based on the custom dictionary of the user.

7. A speech recognition apparatus, characterized by comprising:

a speech recognition module, configured to recognize the voice information according to at least one language model that matches the user, to obtain a speech recognition result.

8. The apparatus according to claim 7, characterized in that, before the speech recognition module recognizes the voice information according to at least one language model that matches the user, the apparatus further comprises:

a model determining module, configured to determine at least one language model that matches the user.

9. An electronic device, characterized by comprising:

at least one processor;

and at least one memory and a bus connected to the processor; wherein

the processor and the memory communicate with each other via the bus;

the processor is configured to call program instructions in the memory to execute the speech recognition method according to any one of claims 1 to 6.

10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the speech recognition method according to any one of claims 1 to 6.
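For illustration only, the three ways of determining a language model that matches the user recited in claims 3 to 5 can be sketched as follows. This is a minimal sketch under stated assumptions and does not limit the claims. The keyword tables, the occupation attribute, the thresholds and all function names are hypothetical, and a language model is again assumed to be a plain word-probability table.

```python
from collections import Counter

# Hypothetical keyword lists per domain, for illustration only.
DOMAIN_KEYWORDS = {
    "medical": {"stent", "dosage", "diagnosis"},
    "legal": {"plaintiff", "statute", "tort"},
}
# Hypothetical mapping from one user attribute (occupation) to a domain.
OCCUPATION_DOMAIN = {"doctor": "medical", "lawyer": "legal"}


def models_from_selection(selected_names, available_models):
    """Claim 3 style: keep the models named in the user's selection instruction."""
    return [available_models[n] for n in selected_names if n in available_models]


def determine_domains(history=None, attributes=None, min_hits=2):
    """Claim 4 style: infer domains from input history and/or user attributes."""
    domains = set()
    if history:
        hits = Counter()
        for utterance in history:
            words = set(utterance.lower().split())
            for domain, keywords in DOMAIN_KEYWORDS.items():
                hits[domain] += len(words & keywords)
        # A domain counts only if enough of its keywords appear in the history.
        domains.update(d for d, n in hits.items() if n >= min_hits)
    if attributes:
        domain = OCCUPATION_DOMAIN.get(attributes.get("occupation"))
        if domain:
            domains.add(domain)
    return sorted(domains)


def build_personalized_model(custom_dictionary, base_probs, boost=0.05):
    """Claim 5 style: derive a word -> probability table from a custom dictionary.

    Each custom word receives at least `boost` weight on top of the assumed
    base probabilities, and the table is then renormalized to sum to 1.
    """
    weights = dict(base_probs)
    for word in custom_dictionary:
        weights[word] = max(weights.get(word, 0.0), boost)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no words to build a model from")
    return {word: w / total for word, w in weights.items()}
```

The domains returned by `determine_domains` would then be used to look up the corresponding domain language models. Under the same assumptions, calling `build_personalized_model` again with an existing table as `base_probs` is one simple way to picture the update recited in claim 6.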