CN109389978A

CN109389978A - Speech recognition method and device

Info

Publication number: CN109389978A
Application number: CN201811306260.6A
Authority: CN
Inventors: 韩雪; 王慧君; 毛跃辉; 陶梦春
Original assignee: Gree Electric Appliances Inc of Zhuhai
Current assignee: Gree Electric Appliances Inc of Zhuhai
Priority date: 2018-11-05
Filing date: 2018-11-05
Publication date: 2019-02-26
Anticipated expiration: 2038-11-05
Also published as: CN109389978B

Abstract

The application provides a speech recognition method and device. The method includes: receiving speech according to pre-stored information on a first position of a first user, the speech including a first voice command of the first user and noise; and then identifying the first voice command from the speech according to pre-stored voiceprint information of the first user. In this scheme, speech is received according to the pre-stored information on the first position of the first user, the voice command matching the first user's voiceprint information is identified from the received speech using the pre-stored voiceprint information, and the function corresponding to that voice command is executed. The accuracy of speech recognition in a noisy environment can therefore be improved.

Description

Speech recognition method and device

Technical field

The application relates to the field of voice control technology, and in particular to a speech recognition method and device.

Background

An intelligent voice-controlled device can receive a user's speech, parse the speech to obtain a voice command, and then execute the corresponding function according to the voice command.

When an existing intelligent voice-controlled device is used in a noisy environment, the voice command issued by the user is subject to interference from the surroundings. The device may then fail to parse a voice command from the user's speech, or may parse an incorrect voice command.

Summary of the invention

The application provides a speech recognition method and device to improve the speech recognition accuracy of intelligent voice-controlled devices in noisy environments.

In a first aspect, the application provides a speech recognition method. The method includes: receiving speech according to pre-stored information on a first position of a first user, the speech including a first voice command of the first user and noise; and then identifying the first voice command from the speech according to pre-stored voiceprint information of the first user. In this scheme, speech is received according to the pre-stored information on the first position of the first user, the voice command matching the first user's voiceprint information is identified from the received speech using the pre-stored voiceprint information, and the function corresponding to that voice command is executed. The accuracy of speech recognition in a noisy environment can therefore be improved.

In one possible implementation, receiving speech according to the pre-stored information on the first position of the first user includes: determining a voice collection strategy according to the pre-stored information on the first position of the first user, and then receiving speech according to the voice collection strategy. The voice collection strategy is as follows: the voice reception strength at any position within the voice reception range is inversely proportional to a first distance, where the first distance is the distance between that position and the first position, and the voice reception range includes the first position of the first user. With this voice collection strategy, different positions are given different voice reception strengths, and the reception strength is greater near the pre-stored first position of the first user, which helps the voice commands of the first user to be received better.

In one possible implementation, before the speech is received, the method may further include: receiving a second voice command of the first user, and determining and storing, according to the second voice command, the voiceprint information of the first user and/or the information on the first position of the first user. In this scheme, the stored information on the first position of the first user is used to determine the voice collection strategy, and the stored voiceprint information of the first user is used to identify the first voice command from the received speech.

In one possible implementation, the method may further include: determining, according to the first voice command, information on the position corresponding to the first voice command, and updating the information on the first position according to the information on the position corresponding to the first voice command. In this scheme, after the first voice command is received, the stored position information of the first user is updated so that the voice collection strategy can be adjusted, which helps the voice commands of the first user to be received better.

In one possible implementation, identifying the first voice command from the speech according to the pre-stored voiceprint information of the first user includes: identifying the first voice command from the speech according to both the pre-stored voiceprint information of the first user and the information on the first position of the first user. In this scheme, the identified voice command is consistent with both the pre-stored voiceprint information of the first user and the information on the first position of the first user, so the identified voice command is more accurate.

In a second aspect, the application provides a speech recognition device. The device includes a voice receiving unit and a voice recognition unit. The voice receiving unit is configured to receive speech according to pre-stored information on a first position of a first user, the speech including a first voice command of the first user and noise. The voice recognition unit is configured to identify the first voice command from the speech according to pre-stored voiceprint information of the first user. In this scheme, the voice command matching the first user's voiceprint information is identified from the received speech using the pre-stored voiceprint information, and the function corresponding to that voice command is executed. The accuracy of speech recognition in a noisy environment can therefore be improved.

In one possible implementation, the device may further include a determination unit configured to determine a voice collection strategy according to the pre-stored information on the first position of the first user. The voice collection strategy is as follows: the voice reception strength at any position within the voice reception range is inversely proportional to a first distance, where the first distance is the distance between that position and the first position, and the voice reception range includes the first position of the first user. The voice receiving unit is specifically configured to receive speech according to the voice collection strategy. With this voice collection strategy, different positions are given different voice reception strengths, and the reception strength is greater near the pre-stored first position of the first user, which helps the voice commands of the first user to be received better.

In one possible implementation, the voice receiving unit may be further configured to receive a second voice command of the first user, and the device may further include a voiceprint recognition unit, a sound source localization unit, and a storage unit. The voiceprint recognition unit is configured to determine the voiceprint information of the first user according to the second voice command. The sound source localization unit is configured to determine the information on the first position of the first user according to the second voice command. The storage unit is configured to store the voiceprint information of the first user and/or the information on the first position. In this scheme, the stored information on the first position of the first user is used to determine the voice collection strategy, and the stored voiceprint information of the first user is used to identify the first voice command from the received speech.

In one possible implementation, the sound source localization unit may be further configured to determine, according to the first voice command, information on the position corresponding to the first voice command. The storage unit may be further configured to update the information on the first position according to the information on the position corresponding to the first voice command. In this scheme, after the first voice command is received, the stored position information is updated so that the voice collection strategy can be adjusted, which helps the voice commands of the first user to be received better.

In one possible implementation, the voice recognition unit is specifically configured to identify the first voice command from the speech according to both the pre-stored voiceprint information of the first user and the information on the first position of the first user. In this scheme, the identified voice command is consistent with both the pre-stored voiceprint information of the first user and the information on the first position of the first user, so the identified voice command is more accurate.

In a third aspect, an embodiment of the present invention provides a network device, comprising:

a memory for storing program instructions;

a processor for calling the program instructions stored in the memory and executing, according to the obtained program, the method described in the first aspect or in any embodiment of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium stores computer-executable instructions for causing a computer to execute the method described in the first aspect or in any embodiment of the first aspect.

Brief description of the drawings

Fig. 1 is a schematic flowchart of a speech recognition method provided by the application;

Fig. 2 is a schematic diagram of a speech recognition application scenario provided by the application;

Fig. 3 is a schematic diagram of a speech recognition device provided by the application;

Fig. 4 is a schematic structural diagram of a network device provided by the application.

Detailed description

To make the objectives, technical solutions, and advantages of the application clearer, the application is described in further detail below with reference to the accompanying drawings. The specific operations in the method embodiments may also be applied to the device embodiments or system embodiments. In the description of the application, unless otherwise stated, "a plurality of" means two or more.

Fig. 1 shows, by way of example, a flowchart of a speech recognition method provided by the application. The speech recognition method may be executed by a speech recognition device. The speech recognition device may be a smart device that can be controlled by voice, such as a voice-controlled TV, a voice-controlled toy, or a mobile phone; it may also be a chip in any such smart device, or a functional module with a speech recognition function in any such smart device.

The method includes the following steps:

Step 105: receive speech according to the pre-stored information on the first position of the first user.

The received speech includes the first voice command of the first user and noise. The noise may include sounds made by people other than the first user (who may be referred to as second users), or sounds in the external environment that can interfere with the first voice command (such as car horns or wind).

Step 106: identify the first voice command of the first user from the speech according to the pre-stored voiceprint information of the first user.

In step 106, because the first voice command contained in the speech was issued by the first user, the voiceprint information of the first voice command matches the pre-stored voiceprint information of the first user. The first voice command of the first user can therefore be identified from the received speech using the pre-stored voiceprint information of the first user.
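The matching in step 106 can be sketched as follows. This is only an illustration under assumptions that the application does not state: the received speech has already been split into segments, each voiceprint is represented as a numeric feature vector, and two voiceprints are treated as matching when their cosine similarity reaches a threshold. The names `cosine_similarity` and `identify_first_command`, the threshold value, and the sample vectors are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector cannot match anything
    return dot / (norm_u * norm_v)


def identify_first_command(segments, stored_voiceprint, threshold=0.8):
    """Return the label of the segment whose voiceprint best matches the
    stored one, or None if no segment reaches the (assumed) threshold."""
    best_label, best_score = None, threshold
    for label, voiceprint in segments:
        score = cosine_similarity(voiceprint, stored_voiceprint)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical data: the stored voiceprint of the first user and three
# segments separated from the received speech.
stored = [0.9, 0.1, 0.4]
segments = [
    ("segment from another speaker", [0.1, 0.9, 0.2]),
    ("segment from the first user", [0.88, 0.12, 0.41]),
    ("segment of background noise", [0.3, 0.2, 0.9]),
]
matched = identify_first_command(segments, stored)
```

Segments that do not reach the threshold are simply ignored, which corresponds to treating them as noise.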

Through steps 105 and 106, speech is received according to the pre-stored information on the first position of the first user, the voice command matching the first user's voiceprint information is identified from the received speech using the pre-stored voiceprint information, and the function corresponding to that voice command is executed. The accuracy of speech recognition in a noisy environment can therefore be improved.

In one possible implementation, before step 105, the method may further include:

Step 104: determine a voice collection strategy according to the pre-stored information on the first position of the first user.

The voice collection strategy is as follows: the voice reception strength at any position within the voice reception range is inversely proportional to a first distance, where the first distance is the distance between that position and the first position, and the voice reception range includes the first position of the first user. With this voice collection strategy, different positions are given different voice reception strengths, and the reception strength is greater near the pre-stored first position of the first user, which helps the voice commands of the first user to be received better. The information on the first position is pre-stored in the speech recognition device and describes where the first user is located, for example the satellite coordinates of the first user, or the position of the first user relative to the speech recognition device.

Step 105 may then be implemented as follows: receive speech according to the voice collection strategy.

For example, suppose the first position of the user indicated by the stored information is coordinate A, and there are two other positions, coordinate B and coordinate C, where coordinate B is closer to coordinate A than coordinate C is. Under the determined voice collection strategy, the speech recognition device then receives sound emitted at coordinate B with greater strength than sound emitted at coordinate C.

With this voice collection strategy, different positions are given different voice reception strengths, and the reception strength is greater near the pre-stored first position of the first user, which helps the voice commands of the first user to be received better.
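The inverse-proportional rule and the coordinate A/B/C example can be sketched numerically. This is a minimal illustration and not the application's implementation: the function name `reception_gain`, the proportionality constant `k`, the clamp `min_distance` (added so that the value stays finite at the first position itself), and the coordinates are all assumptions.

```python
import math


def reception_gain(position, first_position, k=1.0, min_distance=0.1):
    """Reception strength inversely proportional to the distance between
    `position` and the stored first position (gain = k / distance)."""
    distance = math.dist(position, first_position)
    # Clamp the distance so the gain stays finite at the first position.
    return k / max(distance, min_distance)


coord_a = (0.0, 0.0)  # stored first position of the first user
coord_b = (1.0, 0.0)  # 1 unit from A
coord_c = (3.0, 4.0)  # 5 units from A

gain_b = reception_gain(coord_b, coord_a)  # 1.0 / 1.0
gain_c = reception_gain(coord_c, coord_a)  # 1.0 / 5.0
```

Because B is closer to A than C is, `gain_b` is larger than `gain_c`, which is the ordering described in the example.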

Step 101: receive a second voice command of the first user.

The second voice command may be a wake-up command, which is used to put the speech recognition device into the working state. The wake-up command may be a specific phrase, such as "turn on the voice system". When the speech recognition device receives the speech "turn on the voice system", it determines that this is the second voice command and enters the working state. The second voice command comes from the first user.

After step 101, the method may further include:

Step 102: determine and store the voiceprint information of the first user according to the second voice command.

The voiceprint information of the first user stored in step 102 can be used in step 106 to identify the voice command of the first user from the received speech. Here, the voiceprint information identifies the sound characteristics of the first user. Different users have different voiceprint information, so voiceprint information can be used to distinguish the speech of different users.

Steps 101 and 102 pre-store the voiceprint information of the first user, so that the first voice command can later be identified, according to the pre-stored voiceprint information, from received speech that includes the first voice command and noise. The voiceprint information of the first user may of course be pre-stored in other ways; for example, it may be enrolled when the speech recognition device is initialized at startup.

In one possible implementation, after step 101, the method may further include:

Step 103: determine and store the information on the first position of the first user according to the second voice command.

The second voice command is the one received in step 101, and the information on the first position describes where the first user was located when sending the second voice command. The information on the first position of the first user stored in step 103 is used in step 104 to determine the voice collection strategy.

It should be noted that steps 102 and 103 have no strict execution order. For example, step 102 may be executed before step 103, step 103 may be executed before step 102, or steps 102 and 103 may be executed in a single step.
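Steps 101 to 103 can be sketched as a single enrollment routine. The sketch assumes that upstream components (not shown, and not specified by the application) have already produced the recognized text, a voiceprint feature vector, and a located position for the incoming speech. `UserProfile`, `enroll_on_wake`, and the exact `WAKE_PHRASE` string are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Placeholder wake-up phrase; the real phrase is device-specific.
WAKE_PHRASE = "turn on the voice system"


@dataclass
class UserProfile:
    voiceprint: Tuple[float, ...]        # stored in step 102
    first_position: Tuple[float, float]  # stored in step 103


def enroll_on_wake(text: str,
                   voiceprint: Sequence[float],
                   position: Sequence[float]) -> Optional[UserProfile]:
    """Step 101: accept the speech only if it is the wake-up command.
    Steps 102 and 103 are performed together here, which the description
    allows because they have no strict order."""
    if text.strip().lower() != WAKE_PHRASE:
        return None
    return UserProfile(tuple(voiceprint), (position[0], position[1]))


profile = enroll_on_wake("Turn on the voice system", [0.9, 0.1, 0.4], (2.0, 1.0))
ignored = enroll_on_wake("hello", [0.1, 0.9, 0.2], (5.0, 5.0))
```

The stored `first_position` would feed the voice collection strategy of step 104, and the stored `voiceprint` would feed the matching of step 106.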

In one possible implementation, after step 106, the method may further include:

Determining, according to the first voice command, information on the position corresponding to the first voice command, and updating the information on the first position according to the information on the position corresponding to the first voice command. In this scheme, after the first voice command is received, the stored position information of the first user is updated so that the voice collection strategy can be adjusted, which helps the voice commands of the first user to be received better.
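The effect of the position update on the voice collection strategy can be sketched as follows. As before, the inverse-distance function, its clamp, the dictionary layout, and the coordinates are illustrative assumptions; the application only states that the stored first position is updated from the position of the first voice command.

```python
import math


def reception_gain(position, first_position, min_distance=0.1):
    """Reception strength inversely proportional to distance (clamped)."""
    return 1.0 / max(math.dist(position, first_position), min_distance)


stored = {"first_position": (0.0, 0.0)}  # position saved at wake-up
probe = (4.0, 0.0)                       # some point in the reception range

gain_before = reception_gain(probe, stored["first_position"])  # 1 / 4

# The first voice command is located at (3.0, 0.0); overwrite the stored
# first position so that the strategy now centres on the new location.
stored["first_position"] = (3.0, 0.0)

gain_after = reception_gain(probe, stored["first_position"])   # 1 / 1
```

After the update, points near the user's new location are received more strongly than before.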

In one possible implementation, step 106 may specifically be: identifying the first voice command from the speech according to both the pre-stored voiceprint information of the first user and the information on the first position of the first user. In this scheme, the identified voice command is consistent with both the pre-stored voiceprint information of the first user and the information on the first position of the first user, so the identified voice command is more accurate.

In one possible implementation, if the speech received in step 101 contains only a voice command and no noise, it is determined whether the voiceprint information of that voice command matches the pre-stored voiceprint information of the first user. If it matches, the corresponding function is executed according to the voice command; if it does not match, the function is not executed.

A specific example is given below to illustrate the speech recognition method. Fig. 2 is a schematic diagram of a speech recognition application scenario provided by the application.

The speech recognition device may be a smart device that can be controlled by voice, such as a voice-controlled TV, a voice-controlled toy, or a mobile phone; it may also be a chip in any such smart device, or a functional module with a speech recognition function in any such smart device. The following description takes a voice-controlled TV as the speech recognition device. The first user sends a wake-up command to the speech recognition device from the first position. This wake-up command corresponds to the second voice command described above and may be, for example, the speech "power on". After receiving the wake-up command, the voice-controlled TV turns on and, according to the wake-up command, determines and stores the voiceprint information of the first user and the information on the first position of the first user. Once the voice-controlled TV has stored the voiceprint information of the first user, the TV will subsequently be controlled by the first user; that is, the TV executes the corresponding operation according to the received voice commands of the first user. For voice commands issued by other users, such as a second user, the voiceprint information of the second user does not match the voiceprint information of the first user stored by the voice-controlled TV, so the TV treats the voice commands issued by the second user as noise.

Further, the voice-controlled TV may also adjust the voice collection strategy according to the information on the first position of the first user, so that the TV's voice collection strength is greater at positions closer to the first position.

As an example, suppose the first user moves from the first position to a second position, and the positions of second user A and second user B at that time are as shown in Fig. 2. Ordered by distance from the first position, from nearest to farthest, the positions are: the position of second user A, the second position of the first user, and the position of second user B. Under the voice collection strategy, the voice-controlled TV's voice collection strength at these positions is therefore, from strongest to weakest: the position of second user A, the second position of the first user, and the position of second user B.

Suppose that while the first user sends the voice command "change channel" to the voice-controlled TV from the second position, second user A sends the voice command "increase volume", second user B sends the voice command "decrease volume", and a car horn is also sounding at the position shown in Fig. 2. Because the car horn is far from the first position, the voice collection strategy gives it a lower collection strength than the voice commands "change channel", "increase volume", and "decrease volume". The interference of the car horn with the voice commands is therefore greatly reduced, and the voice-controlled TV can clearly receive the three voice commands. The speech finally received by the voice-controlled TV includes the voice commands "change channel", "increase volume", and "decrease volume", together with a relatively weak car horn. The voice-controlled TV then determines, according to the stored voiceprint information of the first user, that the voice command "change channel" in the received speech was sent by the first user, and changes the channel accordingly.
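The scenario above can be traced with made-up numbers. Everything in this sketch is an illustrative assumption: the coordinates (Fig. 2 gives no values), the inverse-distance function with its clamp, the English command strings, and the use of a speaker label as a stand-in for an actual voiceprint comparison.

```python
import math

first_position = (0.0, 0.0)  # stored position on which the strategy centres

# (sound source, position, command or None); coordinates are invented so
# that the distance order matches the one described for Fig. 2.
sounds = [
    ("second user B", (3.0, 0.0), "decrease volume"),
    ("car horn", (10.0, 0.0), None),
    ("first user", (2.0, 0.0), "change channel"),
    ("second user A", (1.0, 0.0), "increase volume"),
]


def collection_strength(position):
    """Inverse-distance collection strength, clamped near the origin."""
    return 1.0 / max(math.dist(position, first_position), 0.1)


# Order the sources from strongest to weakest collection strength.
by_strength = sorted(sounds, key=lambda s: collection_strength(s[1]), reverse=True)
strength_order = [source for source, _, _ in by_strength]

# Stand-in for voiceprint matching: keep only commands whose source is
# the enrolled speaker.
enrolled_speaker = "first user"
executed = [command for source, _, command in sounds
            if source == enrolled_speaker and command is not None]
```

With these numbers the car horn has the lowest collection strength, the three commands are all received, and only the first user's command is selected for execution.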

Based on the above scheme, speech is received according to the pre-stored information on the first position of the first user, the voice command matching the first user's voiceprint information is identified from the received speech using the pre-stored voiceprint information, and the function corresponding to that voice command is executed. The accuracy of speech recognition in a noisy environment can therefore be improved.

Based on the same inventive concept, Fig. 3 shows, by way of example, a speech recognition device provided by the application. The device can execute the flow of the speech recognition method. The device includes:

A voice receiving unit 301, configured to receive speech according to the pre-stored information on the first position of the first user. The speech includes the first voice command of the first user and noise, where the noise includes sounds made by people other than the first user and sounds in the external environment that can interfere with the first voice command (such as car horns or wind).

A voice recognition unit 302, configured to identify the first voice command from the speech according to the pre-stored voiceprint information of the first user.

In one possible implementation, the device may further include a determination unit 303, configured to determine a voice collection strategy according to the pre-stored information on the first position of the first user. The voice collection strategy is as follows: the voice reception strength at any position within the voice reception range is inversely proportional to a first distance, where the first distance is the distance between that position and the first position, and the voice reception range includes the first position of the first user. The voice receiving unit 301 is specifically configured to receive speech according to the voice collection strategy.

In one possible implementation, the voice receiving unit 301 is further configured to receive a second voice command of the first user. The device may further include a voiceprint recognition unit 304, a sound source localization unit 305, and a storage unit 306. The voiceprint recognition unit 304 is configured to determine the voiceprint information of the first user according to the second voice command. The sound source localization unit 305 is configured to determine the information on the first position of the first user according to the second voice command. The storage unit 306 is configured to store the voiceprint information of the first user and/or the information on the first position.

In one possible implementation, the sound source localization unit 305 may be further configured to determine, according to the first voice command, information on the position corresponding to the first voice command. The storage unit 306 may be further configured to update the information on the first position according to the information on the position corresponding to the first voice command.

In one possible implementation, the voice recognition unit 302 is specifically configured to identify the first voice command from the speech according to both the pre-stored voiceprint information of the first user and the information on the first position of the first user.

In one possible implementation, if the speech received by the voice receiving unit 301 contains only a voice command and no noise, the voice recognition unit 302 determines whether the voiceprint information of that voice command matches the pre-stored voiceprint information of the first user.

For explanations and detailed descriptions of the concepts in the above device that relate to the technical solution of the application, and for other steps, refer to the descriptions of these contents in the foregoing speech recognition method or other embodiments; they are not repeated here.

Based on the same concept as the foregoing embodiments, the application also provides a network device.

Fig. 4 is a schematic structural diagram of a network device provided by the application. As shown in Fig. 4, the network device 400 includes:

a memory 401 for storing program instructions;

a processor 402 for calling the program instructions stored in the memory and executing, according to the obtained program, the speech recognition method described in any of the foregoing embodiments.

Based on the same concept as the foregoing embodiments, the application also provides a computer storage medium. The computer-readable storage medium stores computer-executable instructions for causing a computer to execute the speech recognition method described in any of the foregoing embodiments.

It should be noted that the division into units in the application is schematic and is merely a division by logical function; other divisions are possible in an actual implementation. The functional units in the application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, wireless, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Those skilled in the art should understand that the application may be provided as a method, a system, or a computer program product. The application may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.

The application be referring to according to the present processes, equipment (system) and computer program product flow chart and/or Block diagram describes.It should be understood that each process that can be realized by computer program instructions in flowchart and/or the block diagram and/or The combination of process and/or box in box and flowchart and/or the block diagram.It can provide these computer program instructions to arrive General purpose computer, special purpose computer, Embedded Processor or other programmable data processing devices processor to generate one Machine, so that being generated by the instruction that computer or the processor of other programmable data processing devices execute for realizing flowing The device for the function of being specified in journey figure one process or multiple processes and/or block diagrams one box or multiple boxes.

These computer program instructions, which may also be stored in, is able to guide computer or other programmable data processing devices with spy Determine in the computer-readable memory that mode works, so that it includes referring to that instruction stored in the computer readable memory, which generates, Enable the manufacture of device, the command device realize in one box of one or more flows of the flowchart and/or block diagram or The function of being specified in multiple boxes.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.

Obviously, those skilled in the art can make various modifications and variations to the present application without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.

Claims

1. A speech recognition method, characterized by comprising:

receiving voice according to prestored information on a first position of a first user, the voice comprising a first voice command of the first user and noise;

identifying the first voice command from the voice according to prestored voiceprint information of the first user.

2. The method according to claim 1, characterized in that receiving voice according to the prestored information on the first position of the first user comprises:

determining a voice collection strategy according to the prestored information on the first position of the first user, the voice collection strategy being that the voice reception intensity at any position within a voice reception range is inversely proportional to a first distance, the first distance being the distance between that position and the first position of the first user, and the voice reception range including the first position of the first user;

receiving voice according to the voice collection strategy.

3. The method according to claim 1 or 2, characterized in that, before receiving the voice, the method further comprises:

receiving a second voice command of the first user;

determining and storing, according to the second voice command, the voiceprint information of the first user and/or the information on the first position of the first user.

4. The method according to claim 3, characterized in that the method further comprises:

determining information on the position corresponding to the first voice command;

updating the information on the first position according to the information on the position corresponding to the first voice command.

5. The method according to claim 1 or 2, characterized in that identifying the first voice command from the voice according to the prestored voiceprint information of the first user comprises:

identifying the first voice command from the voice according to the prestored voiceprint information of the first user and the information on the first position of the first user.

6. A speech recognition device, characterized by comprising:

a voice receiving unit, configured to receive voice according to prestored information on a first position of a first user, the voice comprising a first voice command of the first user and noise;

a voice recognition unit, configured to identify the first voice command from the voice according to prestored voiceprint information of the first user.

7. The device according to claim 6, characterized in that the device further comprises a determination unit, the determination unit being configured to determine a voice collection strategy according to the prestored information on the first position of the first user, the voice collection strategy being that the voice reception intensity at any position within a voice reception range is inversely proportional to a first distance, the first distance being the distance between that position and the first position of the first user, and the voice reception range including the first position of the first user;

the voice receiving unit is specifically configured to receive voice according to the voice collection strategy.

8. The device according to claim 6 or 7, characterized in that the voice receiving unit is further configured to receive a second voice command of the first user;

the device further comprises a voiceprint recognition unit, configured to determine the voiceprint information of the first user according to the second voice command;

the device further comprises a sound source localization unit, configured to determine the information on the first position of the first user according to the second voice command;

the device further comprises a storage unit, configured to store the voiceprint information and/or the information on the first position of the first user.

9. The device according to claim 8, characterized in that the sound source localization unit is further configured to determine information on the position corresponding to the first voice command;

the storage unit is further configured to update the information on the first position according to the information on the position corresponding to the first voice command.

10. The device according to claim 6 or 7, characterized in that the voice recognition unit is specifically configured to identify the first voice command from the voice according to the prestored voiceprint information of the first user and the information on the first position of the first user.
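
As a rough illustration of the voice collection strategy and the combined voiceprint-and-position identification recited in the claims, the following Python sketch weights each candidate sound source by the inverse of its distance from the stored first position and keeps only sources whose voiceprint matches the stored one. It is a minimal sketch under stated assumptions, not the implementation of the application: the function names, the `min_distance` clamp (needed because 1/d is undefined at d = 0), the use of cosine similarity on embedding vectors as the voiceprint comparison, and the 0.8 threshold are all hypothetical choices that the application does not specify.

```python
import math


def reception_weight(position, user_position, min_distance=0.1):
    """Reception intensity inversely proportional to the distance between
    `position` and the stored user position.

    `min_distance` is an assumed clamp that avoids division by zero when
    the source sits exactly at the stored position.
    """
    distance = math.dist(position, user_position)
    return 1.0 / max(distance, min_distance)


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (assumed voiceprint measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_command(sources, user_position, user_voiceprint, threshold=0.8):
    """Return the source most likely to carry the first user's command.

    Each source is a dict with hypothetical keys 'position', 'embedding',
    and 'text'. Sources whose voiceprint similarity is below `threshold`
    are treated as noise; the remainder are ranked by similarity times
    the distance-based reception weight. Returns None if nothing matches.
    """
    best, best_score = None, 0.0
    for source in sources:
        similarity = cosine_similarity(source["embedding"], user_voiceprint)
        if similarity < threshold:
            continue  # voiceprint does not match the stored user
        score = similarity * reception_weight(source["position"], user_position)
        if score > best_score:
            best, best_score = source, score
    return best
```

Multiplying the two factors is only one way to combine voiceprint and position evidence; the claims require that both be used but leave the combination open.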