CN110517682A - Audio recognition method, device, equipment and storage medium - Google Patents
- Publication number
- CN110517682A
- Authority
- CN
- China
- Prior art keywords
- frequency spectrum
- voice information
- voice
- processing
- beam
- Legal status
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Signal Processing (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
This application provides a speech recognition method, apparatus, device, and storage medium. The method includes: performing adaptive beamforming (ADBF) processing on acquired first voice information to obtain spectra in at least two directions; among the spectra in the at least two directions, determining the direction corresponding to the spectrum whose spectral feature meets a preset condition as a target direction; and acquiring second voice information collected in the target direction and performing speech recognition on the second voice information. With this application, second voice information from an accurate direction can be obtained, improving the accuracy of speech recognition.
Description
Technical field
This application relates to the technical field of electronic devices, and relates to, but is not limited to, a speech recognition method, apparatus, device, and storage medium.
Background
Currently, in electronic devices with a speech recognition function, the front-end signal processing usually performs echo cancellation and single-channel noise reduction on the signal collected by the microphone. The processed signal is then used to wake up the electronic device, and speech recognition is performed after the device has been woken up.
However, in this related-art method, the signal after acoustic echo cancellation (AEC) and single-channel noise suppression (NS) still contains directional interference noise from other directions, and sound source localization tends to be inaccurate when interference or reverberation is strong. Both effects reduce the accuracy of the subsequent speech recognition.
Summary of the invention
Embodiments of this application provide a speech recognition method, apparatus, device, and storage medium that can accurately locate the direction of a sound source, thereby improving the accuracy of speech recognition.
The technical solutions of the embodiments of this application are implemented as follows:
An embodiment of this application provides a speech recognition method, including:
performing ADBF processing on acquired first voice information to obtain spectra in at least two directions;
among the spectra in the at least two directions, determining the direction corresponding to the spectrum whose spectral feature meets a preset condition as a target direction; and
acquiring second voice information collected in the target direction, and performing speech recognition on the second voice information.
An embodiment of this application provides a speech recognition apparatus, including:
a first processing module, configured to perform ADBF processing on acquired first voice information to obtain spectra in at least two directions;
a determining module, configured to determine, among the spectra in the at least two directions, the direction corresponding to the spectrum whose spectral feature meets a preset condition as a target direction; and
a second processing module, configured to acquire second voice information collected in the target direction and perform speech recognition on the second voice information.
An embodiment of this application provides a speech recognition device, including:
a memory, configured to store executable instructions; and
a processor, configured to implement the above method when executing the executable instructions stored in the memory.
An embodiment of this application provides a storage medium storing executable instructions which, when executed by a processor, cause the processor to implement the above method.
The embodiments of this application have the following beneficial effects:
ADBF processing is performed on the acquired first voice information to obtain spectra in at least two directions, and among these spectra, the direction corresponding to the spectrum whose spectral feature meets a preset condition is determined as the target direction. In this way, the direction of the sound source can be located accurately, so that second voice information from the accurate direction can be obtained in the subsequent speech recognition process, improving the accuracy of speech recognition.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a speech recognition method in the related art;
Fig. 2A is a schematic diagram of an optional architecture of a speech recognition system provided by an embodiment of this application;
Fig. 2B is a schematic structural diagram of a server provided by an embodiment of this application;
Fig. 3A is an optional schematic flowchart of the speech recognition method provided by an embodiment of this application;
Fig. 3B is a schematic diagram of an optional scenario of the speech recognition method provided by an embodiment of this application;
Fig. 3C is a schematic diagram of an optional scenario of the speech recognition method provided by an embodiment of this application;
Fig. 3D is a schematic diagram of an optional scenario of the speech recognition method provided by an embodiment of this application;
Fig. 4 is an optional schematic flowchart of the speech recognition method provided by an embodiment of this application;
Fig. 5 is an optional schematic flowchart of the speech recognition method provided by an embodiment of this application;
Fig. 6A is an optional schematic flowchart of the speech recognition method provided by an embodiment of this application;
Fig. 6B is a schematic diagram of determining the user's direction provided by an embodiment of this application;
Fig. 7 is an optional schematic flowchart of the speech recognition method provided by an embodiment of this application;
Fig. 8 is an optional schematic flowchart of the speech recognition method provided by an embodiment of this application;
Fig. 9A is the beam pattern in the 0° direction according to an embodiment of this application;
Fig. 9B is the beam pattern in the 90° direction according to an embodiment of this application;
Fig. 9C is the beam pattern in the 180° direction according to an embodiment of this application;
Fig. 10 is a spectrogram of the wake-up word delivered to the wake-up unit of the electronic device.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting this application. All other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
In the following description, "some embodiments" describes subsets of all possible embodiments. It can be understood that "some embodiments" may refer to the same subset or to different subsets of all possible embodiments, and these may be combined with each other where no conflict arises.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of this application and are not intended to limit this application.
Before the speech recognition method of the embodiments of this application is explained further, a speech recognition method in the related art is first described.
Fig. 1 is a schematic flowchart of a speech recognition method in the related art. As shown in Fig. 1, the method includes the following steps:
Step S101: collect a voice signal through a microphone.
Step S102: perform AEC processing on the voice signal to obtain an AEC-processed voice signal.
Step S103: perform NS processing on the AEC-processed voice signal to obtain an NS-processed voice signal.
Step S104: send the NS-processed voice signal to the wake-up module of the electronic device to wake up the electronic device.
Step S105: determine whether the electronic device has been woken up.
If so, perform step S106; if not, end the process.
Step S106: once the electronic device has been woken up, it enables its speech recognition function and performs speech recognition on the collected voice information.
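The control flow of steps S101 to S106 can be summarized in a few lines. Every stage here (AEC, NS, the wake-up decision, the recognizer, and fetching the next utterance) is a hypothetical placeholder callable; only the ordering and the early exit after a failed wake-up come from the text. Note that no direction-dependent processing appears anywhere in this flow.

```python
from typing import Callable, Optional, Sequence

Signal = Sequence[float]


def related_art_flow(
    mic_signal: Signal,                    # S101: signal collected by the microphone
    aec: Callable[[Signal], Signal],       # S102: acoustic echo cancellation (placeholder)
    ns: Callable[[Signal], Signal],        # S103: single-channel noise suppression (placeholder)
    is_woken: Callable[[Signal], bool],    # S104/S105: wake-up module decision (placeholder)
    recognize: Callable[[Signal], str],    # S106: speech recognizer (placeholder)
    next_utterance: Callable[[], Signal],  # voice collected after wake-up (placeholder)
) -> Optional[str]:
    # AEC then NS; no stage here knows where the talker is.
    cleaned = ns(aec(mic_signal))
    if not is_woken(cleaned):
        return None                        # S105 "no": the process ends
    return recognize(next_utterance())     # S106: recognize the subsequent speech
```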
However, the above related-art method has at least the following problems:
1) Because the sound source is unknown before the electronic device is woken up, the related art performs only AEC processing and NS processing on the collected voice signal. The NS-processed voice signal therefore still contains directional interference noise from other directions, so the voice signal sent to the wake-up module of the electronic device retains a relatively high level of interference. As a result, the electronic device cannot be woken up effectively, and the wake-up rate is low.
2) The angle of the sound source also has to be estimated at the moment the electronic device is woken up. For an electronic device with dual microphones, sound source localization is prone to be inaccurate or even wrong when interference or reverberation is strong. The subsequent processing then not only fails to enhance the speech but may even damage it, seriously degrading the speech recognition result and reducing recognition accuracy.
To perform speech recognition accurately, embodiments of this application provide a speech recognition method, apparatus, device, and storage medium that can accurately locate the direction of a sound source, thereby improving the accuracy of speech recognition.
Exemplary applications of the speech recognition device provided by the embodiments of this application are described below. The speech recognition device may be implemented as a server; the following describes an exemplary application in which the device is implemented as a server.
Referring to Fig. 2A, Fig. 2A is a schematic diagram of an optional architecture of a speech recognition system 10 provided by an embodiment of this application. To perform speech recognition on a user's voice information, terminals 100 (terminal 100-1 and terminal 100-2 are shown as examples) are connected to a server 300 through a network 200. The network 200 may be a wide area network, a local area network, or a combination of the two.
A terminal 100 displays the interface of an application (APP) on a current interface 110 (current interface 110-1 and current interface 110-2 are shown as examples); for example, the APP may be one with a voice input function. Terminal 100-1 and terminal 100-2 may display, on the current interface, the text corresponding to the recognized voice information. In this embodiment, the server 300 obtains, through the network 200, the first voice information collected by terminal 100-1 or terminal 100-2, performs ADBF processing on it to obtain spectra in at least two directions, and determines, among those spectra, the direction corresponding to the spectrum whose spectral feature meets a preset condition as the target direction. The server then obtains the second voice information collected by the terminal, performs speech recognition on it, and feeds the recognition result back to the terminal. Note that the voice information shown in Fig. 2A includes both the first voice information and the second voice information.
Fig. 2B is a schematic structural diagram of the server 300 provided by an embodiment of this application. As shown in Fig. 2B, the server 300 includes at least one processor 210, a memory 250, at least one network interface 220, and a user interface 230. The components of the server 300 are coupled through a bus system 240, which implements connection and communication among them. In addition to a data bus, the bus system 240 includes a power bus, a control bus, and a status signal bus; for clarity, however, all buses are labeled as the bus system 240 in Fig. 2B.
The processor 210 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor.
The user interface 230 includes one or more output devices 231 that enable presentation of media content, including one or more speakers and/or one or more visual displays. The user interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, a mouse, a microphone, a touchscreen display, a camera, and other input buttons and controls.
The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard disk drives, optical disc drives, and the like. The memory 250 optionally includes one or more storage devices physically located away from the processor 210. The memory 250 includes volatile memory or non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), and the volatile memory may be random access memory (RAM). The memory 250 described in the embodiments of this application is intended to include any suitable type of memory. In some embodiments, the memory 250 can store data to support various operations; examples of such data include programs, modules, and data structures, or subsets or supersets thereof, as illustrated below.
An operating system 251, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer, used to implement various basic services and process hardware-based tasks;
A network communication module 252, used to reach other computing devices via one or more (wired or wireless) network interfaces 220; exemplary network interfaces 220 include Bluetooth, Wireless Fidelity (WiFi), and Universal Serial Bus (USB);
An input processing module 253, used to detect one or more user inputs or interactions from one of the one or more input devices 232 and to translate the detected inputs or interactions.
In some embodiments, the apparatus provided by the embodiments of this application may be implemented in software. Fig. 2B shows a speech recognition apparatus 254 stored in the memory 250, which may be software in the form of a program, a plug-in, or the like, and includes the following software modules: a first processing module 2541, a determining module 2542, and a second processing module 2543. These modules are logical and can therefore be combined arbitrarily or further split according to the functions they implement. The function of each module is described below.
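As an illustration of how the three logical modules could be wired together in software, the skeleton below injects each stage as a callable. The class, field, and method names are hypothetical, and the stage implementations (beamformer, preset condition, directional capture, recognizer) are placeholders supplied by the caller; the patent does not define them at this level.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

Spectrum = Sequence[float]


@dataclass
class SpeechRecognitionDevice:
    """Hypothetical wiring of modules 2541-2543; every stage is injected."""
    adbf: Callable[[object], Dict[int, Spectrum]]  # first processing module 2541
    meets_condition: Callable[[Spectrum], bool]    # preset condition used by module 2542
    capture: Callable[[int], object]               # acquires audio from a given direction
    recognize: Callable[[object], str]             # recognizer used by module 2543

    def determine_target(self, spectra: Dict[int, Spectrum]) -> Optional[int]:
        # Determining module 2542: first direction whose spectrum meets the condition.
        for direction, spectrum in spectra.items():
            if self.meets_condition(spectrum):
                return direction
        return None

    def run(self, first_voice: object) -> Optional[str]:
        spectra = self.adbf(first_voice)         # module 2541: spectra per direction
        target = self.determine_target(spectra)  # module 2542: target direction
        if target is None:
            return None
        second_voice = self.capture(target)      # module 2543: acquire in the target direction
        return self.recognize(second_voice)      # ... and recognize it
```

Injecting the stages keeps the module boundaries the same as in the description while leaving each algorithm swappable.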
In other embodiments, the apparatus provided by the embodiments of this application may be implemented in hardware. As an example, the apparatus may be a processor in the form of a hardware decoding processor, programmed to perform the speech recognition method provided by the embodiments of this application. For example, the processor in the form of a hardware decoding processor may use one or more application-specific integrated circuits (ASIC), DSPs, programmable logic devices (PLD), complex programmable logic devices (CPLD), field-programmable gate arrays (FPGA), or other electronic components.
The speech recognition method provided by the embodiments of this application is described below with reference to the exemplary application and implementation of the speech recognition device provided by the embodiments of this application.
Referring to Fig. 3A, Fig. 3A is an optional schematic flowchart of the speech recognition method provided by an embodiment of this application. The method is described with reference to the steps shown in Fig. 3A.
Step S301: the server performs adaptive beamforming (ADBF) processing on the acquired first voice information to obtain spectra in at least two directions.
Here, the first voice information is collected by a voice collection unit of the electronic device, which may be a microphone on the electronic device. In the embodiments of this application, the electronic device may have multiple microphones, for example two, and the dual microphones enable accurate and effective voice collection.
ADBF processing uses previously obtained prior data and, according to an adaptive algorithm and criterion, changes the weighting coefficients so as to retain the desired signal and filter out interference. In the embodiments of this application, the first voice information may be collected in the form of sound waves, so ADBF processing can be applied to the sound waves to retain the desired signal and filter out interfering noise.
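The description does not name the adaptive algorithm or criterion. One widely used adaptive beamformer that matches "adjust the weights from data to keep the desired signal and suppress interference" is MVDR (minimum variance distortionless response), sketched below per STFT frequency bin as an assumed stand-in, not as the patent's method. The weights minimize output power subject to unit gain toward the look direction, w = R⁻¹d / (dᴴR⁻¹d), where R is the interference-plus-noise covariance and d the steering vector; the diagonal-loading constant is an arbitrary illustrative value.

```python
import numpy as np


def estimate_cov(frames: np.ndarray) -> np.ndarray:
    """Sample covariance (1/T) * sum_t x_t x_t^H of M-channel STFT values for one bin.

    frames has shape (T, M): T frames, M microphones.
    """
    return frames.T @ frames.conj() / frames.shape[0]


def mvdr_weights(noise_cov: np.ndarray, steering: np.ndarray, diag_load: float = 1e-6) -> np.ndarray:
    """MVDR weights w = R^-1 d / (d^H R^-1 d) for one frequency bin."""
    m = noise_cov.shape[0]
    r = noise_cov + diag_load * np.eye(m)     # diagonal loading for numerical stability
    r_inv_d = np.linalg.solve(r, steering)    # R^-1 d without forming the inverse
    return r_inv_d / (steering.conj() @ r_inv_d)
```

The beam output for a frame x is then w.conj() @ x. Because R is re-estimated from the incoming data, the weights adapt as interferers move, which is what distinguishes this from a fixed beam.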
It should be noted that in the embodiments of this application, ADBF processing is performed on the first voice information in at least two directions. Each direction can be defined by its angle relative to the voice collection unit of the electronic device, for example 0°, 90°, or 180°; the corresponding directions are then referred to as the 0° direction, the 90° direction, the 180° direction, and so on.
During ADBF processing, the first voice information is processed in multiple directions simultaneously, and one spectrum is obtained for each direction, so spectra and directions correspond one to one. Each spectrum carries the wake-up word used to wake the electronic device; the electronic device is then woken by the wake-up word and enters the working state.
For example, if a user says "please power on" to the electronic device from the 90° direction, the user's voice information is collected by the voice collection unit of the electronic device, and ADBF processing is performed on it in three directions, 0°, 90°, and 180°, to obtain a spectrum in each of those directions. The spectra in all three directions carry the wake-up word requesting the electronic device to power on.
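The per-direction processing in the example above can be sketched as follows: for each candidate angle (0°, 90°, 180°), a beam is steered at that angle and applied to every bin and frame of a multichannel STFT, giving one magnitude spectrum per direction. For brevity this sketch uses fixed delay-and-sum weights rather than adaptive ones, and the geometry (microphones on a line, angles from the array axis, 343 m/s) is assumed.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def steer(angle_deg: float, freqs: np.ndarray, mic_x: np.ndarray) -> np.ndarray:
    """(F, M) far-field steering vectors for microphones at positions mic_x on a line."""
    tau = -mic_x * np.cos(np.deg2rad(angle_deg)) / C             # (M,) arrival offsets in seconds
    return np.exp(-2j * np.pi * freqs[:, None] * tau[None, :])   # (F, M)


def directional_spectra(stft: np.ndarray, freqs: np.ndarray, mic_x: np.ndarray,
                        angles=(0, 90, 180)) -> dict:
    """stft has shape (M, F, T); returns {angle: (F, T) magnitude spectrum of that beam}."""
    out = {}
    for angle in angles:
        d = steer(angle, freqs, mic_x)
        w = d / d.shape[1]                               # delay-and-sum: align phases, average
        y = np.einsum("fm,mft->ft", w.conj(), stft)      # beam output for every bin and frame
        out[angle] = np.abs(y)
    return out
```

With a synthetic plane wave arriving from 90°, the 90° beam should carry the most energy, because its weights match the wave's phase pattern exactly while the other beams combine the two channels partly out of phase.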
Step S302: among the spectra in the at least two directions, determine the direction corresponding to the spectrum whose spectral feature meets a preset condition as the target direction.
Here, the target direction is the direction closest to the user's actual direction. In the embodiments of this application, the direction corresponding to the spectrum whose spectral feature meets the preset condition can be determined as the target direction.
The spectral feature is attribute information of the spectrum, which can reflect the quality of the spectrum and the parameters associated with it.
Step S303: acquire the second voice information collected in the target direction, and perform speech recognition on the second voice information.
Here, once the target direction has been determined, the user is known to be in the target direction. Therefore, when speech recognition is performed on the second voice information that follows the user's first voice information, the second voice information in the target direction can be collected. During voice collection, the user's second voice information is the speech actually needed, and voices other than the user's can be regarded as noise. Collecting only the second voice information in the target direction therefore captures only the wanted speech, obtains the user's voice information accurately, and avoids picking up noise from other directions.
In the embodiments of this application, speech recognition on the second voice information may use any speech recognition approach. The lexical content of the second voice information is converted into input the electronic device can read, such as key presses, binary codes, or character strings, or the recognized voice information is converted into readable text that is displayed as output.
It should be noted that the sound source of the second voice information may be the same as or different from that of the first voice information. For example, the embodiments of this application can be applied in the following scenarios:
Scenario one: as shown in Fig. 3B, user 31 utters the first voice information from the 90° direction of electronic device 30. The server performs ADBF processing on the first voice information to obtain spectra in at least two directions and, among those spectra, determines the direction corresponding to the spectrum whose spectral feature meets the preset condition as the target direction; that is, it determines that the direction closest to user 31 is the 90° direction. Electronic device 30 then continues to collect second voice information in the 90° direction. This second voice information is also uttered by user 31, and user 31's voice information needs to be recognized at this point, so speech recognition is performed on the second voice information.
Scenario two: as shown in Fig. 3C, user 31 utters the first voice information from the 90° direction of electronic device 30. The server performs ADBF processing on the first voice information to obtain spectra in at least two directions and, among those spectra, determines the direction corresponding to the spectrum whose spectral feature meets the preset condition as the target direction; that is, it determines that the direction closest to user 31 is the 90° direction. Electronic device 30 then continues to collect second voice information in the 90° direction. This second voice information is uttered by user 32; that is, user 31 and user 32 are located in the same direction. The electronic device is woken by user 31's first voice information, and user 32's voice information needs to be recognized at this point, so speech recognition is performed on the second voice information.
Scenario three: still referring to Fig. 3C, user 31 utters the first voice information from the 90° direction of electronic device 30. The server performs ADBF processing on the first voice information to obtain spectra in at least two directions and, among those spectra, determines the direction corresponding to the spectrum whose spectral feature meets the preset condition as the target direction; that is, it determines that the direction closest to user 31 is the 90° direction. The electronic device then continues to collect second voice information in the 90° direction. This second voice information is uttered by user 32; that is, user 31 and user 32 are located in the same direction. The electronic device is woken by user 31's first voice information, but user 31's voice information is the one that needs to be recognized at this point. The method of the embodiments of this application may therefore further include a voice information judgment step: determining whether the timbre of the second voice information is the same as that of the first voice information. If it is, user 32 and user 31 are the same person, and speech recognition can be performed directly on the collected second voice information; if not, user 32 and user 31 are different people, and voice information must be re-collected until second voice information from user 31 is obtained. Alternatively, in other embodiments, when the collected voice information belongs to user 32 and user 32's voice information does not need to be recognized at this time, user 32's voice information can be saved.
Scenario four: still referring to Fig. 3C, user 31 utters the first voice information from the 90° direction of electronic device 30. The server performs ADBF processing on the first voice information to obtain spectra in at least two directions and, among those spectra, determines the direction corresponding to the spectrum whose spectral feature meets the preset condition as the target direction; that is, it determines that the direction closest to user 31 is the 90° direction. The electronic device then continues to collect second voice information in the 90° direction. This second voice information is uttered by user 32; that is, user 31 and user 32 are located in the same direction. The electronic device is woken by user 31's first voice information, and no rule specifies whose voice information must be recognized. Therefore, instead of waiting to receive and recognize user 31's voice information, user 32's second voice information can be treated as first voice information to wake the electronic device a second time, thereby serving user 32's current needs.
Scenario five: as shown in Fig. 3D, user 31 utters the first voice information in the 90° direction of electronic device 30. The server performs ADBF processing on the first voice information to obtain spectra in at least two directions. Among these spectra, it determines the direction of the spectrum whose spectral feature satisfies a preset condition as the target direction; that is, it determines that the direction closest to user 31 is 90°. Electronic device 30 then continues to acquire second voice information in the 90° direction. If electronic device 30 does not collect second voice information in the 90° direction within a preset time, it can collect voice information from other directions. For example, if it collects voice information of user 32 in the 60° direction, it can determine from that voice information that user 32's direction is the target direction. It then acquires the second voice information of user 32 and performs speech recognition on it.
In the speech recognition method provided by the embodiments of the present application, ADBF processing is performed on the acquired first voice information to obtain spectra in at least two directions. Among these spectra, the direction of the spectrum whose spectral feature satisfies a preset condition is determined as the target direction. In this way, the sound-source direction can be located accurately, so that the second voice information from the correct direction can be obtained in the subsequent speech recognition process, improving speech recognition accuracy.
Fig. 4 is an optional flowchart of the speech recognition method provided by the embodiments of the present application. As shown in Fig. 4, the method includes the following steps:
Step S401: the server determines the keyword contained in the first voice information.
Here, determining the keyword contained in the first voice information may consist of parsing the first voice information to find the keyword in it. For example, text recognition can be performed on the first voice information to obtain text information. Word segmentation is then performed on the text information to obtain at least one word. According to the part of speech of each word, a word meeting a preset part-of-speech condition is determined as the keyword. Alternatively, artificial intelligence techniques can be used to parse the first voice information and determine the keyword it contains.
For example, when the first voice information is the user saying "please play music" to the electronic device, the keyword can be determined to be "music". A music-related application on the electronic device therefore needs to be started.
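The part-of-speech filter can be sketched as follows. This sketch assumes that text recognition, word segmentation and part-of-speech tagging have already been done upstream. The tag names are illustrative assumptions, not taken from the source. So is the choice of nouns as the "preset part-of-speech condition".

```python
from typing import Iterable, List, Tuple

def extract_keywords(tagged_words: Iterable[Tuple[str, str]],
                     allowed_pos: frozenset = frozenset({"NOUN"})) -> List[str]:
    """Keep the words whose part-of-speech tag satisfies the preset condition."""
    return [word for word, pos in tagged_words if pos in allowed_pos]

# "please play music", already segmented and tagged (tags are hypothetical)
tagged = [("please", "INTJ"), ("play", "VERB"), ("music", "NOUN")]
print(extract_keywords(tagged))  # ['music']
```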
Step S402: perform the ADBF processing on the first voice information in at least two directions to obtain, for each direction, a corresponding beam that contains the keyword.
Here, the first voice information is a sound wave. When the sound wave is acquired, ADBF processing is applied to it from at least two directions. Because the first voice information contains the keyword, the beams obtained by ADBF processing of the sound wave also contain the keyword.
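The source does not give the ADBF weights. As a simplified, non-adaptive stand-in, the sketch below steers a frequency-domain delay-and-sum beamformer for a microphone line array toward several directions at once, producing one beam per direction. It rests on several assumptions:

- The source is in the far field.
- The microphones lie on a line at the given positions (in metres).
- Angles are measured from the microphone axis, as in Fig. 6B.
- The speed of sound is 343 m/s.

An adaptive beamformer would replace the fixed weights `w` with data-dependent ones.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def steering_vectors(freqs, angle_deg, mic_positions):
    """Far-field steering vectors, shape (n_freqs, n_mics).

    Angles are measured from the microphone axis: 0 deg points toward the
    positive end of the axis, 180 deg toward the negative end, 90 deg is broadside.
    """
    tau = mic_positions * np.cos(np.deg2rad(angle_deg)) / SPEED_OF_SOUND  # per-mic time advance
    return np.exp(2j * np.pi * np.outer(freqs, tau))

def delay_and_sum_beams(spectra, freqs, mic_positions, angles_deg=(0, 90, 180)):
    """spectra: (n_mics, n_freqs) complex STFT frame. Returns {angle: beam spectrum}."""
    n_mics = spectra.shape[0]
    beams = {}
    for angle in angles_deg:
        w = steering_vectors(freqs, angle, mic_positions) / n_mics  # (n_freqs, n_mics)
        beams[angle] = np.sum(np.conj(w) * spectra.T, axis=1)        # w^H x per bin
    return beams
```

Consider two microphones 6 cm apart and a source at 180°. The 180° beam passes the source with unit gain. The 0° beam attenuates it at all frequencies between 0 Hz and roughly 2.9 kHz.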
Step S403: determine the wake-word spectrum in each direction according to the beam containing the keyword for that direction.
Here, once the beam for each direction is obtained, the frequency distribution of the beam can be determined, giving the beam's frequency-distribution curve, that is, its spectrum. Because the beam contains the keyword, the spectrum also carries the keyword. Since the keyword is used to wake up the electronic device, the resulting spectrum containing the keyword is the wake-word spectrum.
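The per-direction spectrum can be computed with a short-time Fourier transform. This sketch assumes a mono beam signal sampled at 16 kHz. The Hann window, frame size and hop size are illustrative choices; the source specifies none of them.

```python
import numpy as np

def magnitude_spectrogram(beam, frame_len=512, hop=256):
    """|STFT| of a time-domain beam signal, shape (n_frames, frame_len // 2 + 1)."""
    beam = np.asarray(beam, dtype=float)
    if len(beam) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(beam) - frame_len) // hop
    frames = np.stack([beam[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 16000                                   # assumed sampling rate
t = np.arange(fs) / fs
spec = magnitude_spectrogram(np.sin(2 * np.pi * 1000 * t))
bin_freqs = np.fft.rfftfreq(512, d=1 / fs)   # centre frequency of each bin in Hz
print(bin_freqs[np.argmax(spec.mean(axis=0))])  # 1000.0
```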
Step S404: among the wake-word spectra in the at least two directions, determine the direction of the wake-word spectrum whose spectral feature satisfies a preset condition as the target direction.
Here, the target direction is the direction closest to the user's actual direction. In the embodiments of the present application, the direction of the wake-word spectrum whose spectral feature satisfies the preset condition can be determined as the target direction.
The spectral feature is attribute information of the wake-word spectrum; it can reflect the quality of the wake-word spectrum and the parameters associated with it.
Step S405: obtain the second voice information acquired in the target direction, and perform speech recognition on it.
In the speech recognition method provided by the embodiments of the present application, ADBF processing is performed on the acquired first voice information to obtain wake-word spectra in at least two directions. Among these, the direction of the wake-word spectrum whose spectral feature satisfies a preset condition is determined as the target direction. In this way, the sound-source direction can be located accurately, so that the second voice information from the correct direction can be obtained in the subsequent speech recognition process, improving speech recognition accuracy.
Fig. 5 is an optional flowchart of the speech recognition method provided by the embodiments of the present application. As shown in Fig. 5, the method includes the following steps:
Step S501: the server determines the keyword contained in the first voice information.
Step S502: perform the ADBF processing on the first voice information in at least two directions to obtain, for each direction, a corresponding beam that contains the keyword.
It should be noted that steps S501 to S502 are the same as steps S401 to S402 above, and are not repeated in this embodiment of the present application.
Step S503: perform speech enhancement processing on the keyword-containing beam for each direction to obtain a speech-enhanced beam for the corresponding direction.
Here, the purpose of speech enhancement processing is to enhance the beam signal so that the keyword can be accurately recognized in the speech-enhanced beam.
In the embodiments of the present application, speech enhancement can be performed in either of the following two ways:
Way one: perform single-channel speech enhancement processing on the beam for each direction to obtain the speech-enhanced beam for that direction.
Here, single-channel speech enhancement enhances the speech in the beam for the corresponding direction. The signal strength of the resulting speech-enhanced beam is higher than that of the original beam, which makes it easier to determine the spectrum of the speech-enhanced beam accurately later.
Way two: eliminate the noise in the beam for each direction to obtain the speech-enhanced beam for that direction.
Here, eliminating the noise in the beam reduces the influence of noise on the effective speech. As the noise decreases, the effective speech signal becomes relatively stronger. This also enhances the beam corresponding to the first voice information, yielding the speech-enhanced beam.
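Magnitude spectral subtraction is one classical way to realize way two. The source does not name a specific noise-elimination method, so the sketch below is only an illustration. It applies spectral subtraction to the spectrogram of a single beam and assumes the first few frames contain no speech, so that they can serve as the noise estimate. The frame count, over-subtraction factor and spectral floor are illustrative values, not taken from the source.

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_frames=10, over_subtraction=1.0, floor=0.05):
    """Noise reduction on one beam's magnitude spectrogram (n_frames, n_bins).

    The noise magnitude is estimated as the mean of the first `noise_frames`
    frames (assumed speech-free) and subtracted from every frame; a spectral
    floor (a fraction of the noisy magnitude) prevents negative values.
    """
    noisy_mag = np.asarray(noisy_mag, dtype=float)
    noise_est = noisy_mag[:noise_frames].mean(axis=0)
    cleaned = noisy_mag - over_subtraction * noise_est
    return np.maximum(cleaned, floor * noisy_mag)
```

To return to the time domain, the cleaned magnitude is typically combined with the noisy phase before the inverse STFT.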
Step S504: take the speech-enhanced beam in each direction as the wake-word spectrum for that direction.
Here, the wake-word spectrum contains a wake word, which may be the keyword; the wake word is used to wake up the electronic device.
Step S505: among the wake-word spectra in the at least two directions, determine the direction of the wake-word spectrum whose spectral feature satisfies a preset condition as the target direction.
Step S506: obtain the second voice information acquired in the target direction, and perform speech recognition on it.
It should be noted that steps S505 to S506 are the same as steps S404 to S405 above, and are not repeated in this embodiment of the present application.
In some embodiments, the spectral feature includes the signal-to-noise ratio or the wake-up rate. Accordingly, step S302 above can be implemented in either of the following two ways:
First, among the spectra in the at least two directions, determine the direction of the spectrum with the highest signal-to-noise ratio as the target direction.
Second, determine the direction of the spectrum with the highest wake-up rate as the target direction.
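Both selection rules reduce to an arg-max over directions. This sketch assumes the spectral feature of each direction has already been computed and keyed by direction angle. The SNR helper also assumes that separate signal-power and noise-power estimates are available; the source does not detail how they are obtained. The usage example reuses the wake-up scores reported for the 0°, 90° and 180° beams in panels d, e and f of Fig. 10.

```python
import math
from typing import Mapping

def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(signal_power / noise_power)

def select_target_direction(feature_by_direction: Mapping[float, float]) -> float:
    """Return the direction whose spectral feature (SNR or wake-up score) is highest."""
    if not feature_by_direction:
        raise ValueError("need the feature of at least one direction")
    return max(feature_by_direction, key=feature_by_direction.get)

wake_scores = {0: 0.10, 90: 0.09, 180: 0.9}   # panels d, e, f of Fig. 10
print(select_target_direction(wake_scores))   # 180
```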
Fig. 6A is an optional flowchart of the speech recognition method provided by the embodiments of the present application. As shown in Fig. 6A, the method includes the following steps:
Step S601: acquire the first voice information through the voice acquisition units on the electronic device, and perform echo cancellation on the first voice information.
Here, the electronic device may have two voice acquisition units; for example, it may be a smart speaker with two microphones. The user's first voice information is acquired through the two microphones on the smart speaker.
In the embodiments of the present application, after the first voice information is collected, echo cancellation is performed on it to remove the echo it contains.
In other embodiments, the electronic device may be playing audio while the microphones acquire the user's first voice information. The microphones may therefore also pick up a loopback signal, that is, sound played by the smart speaker itself. When the smart speaker picks up this loopback signal, the signal needs to be cancelled so that it does not affect the user's first voice information.
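The source does not say which echo canceller is used. A common choice is a normalized least-mean-squares (NLMS) adaptive filter. It models the path from the loudspeaker playback (the loopback reference) to the microphone and subtracts the predicted echo. The sketch below handles a single microphone. The filter length and step size are illustrative values.

```python
import numpy as np

def nlms_echo_cancel(mic, ref, filter_len=64, mu=0.5, eps=1e-6):
    """Subtract the estimated loudspeaker echo from `mic`.

    mic: microphone samples (user speech + echo of the playback)
    ref: playback (loopback) samples sent to the loudspeaker, same length as mic
    Returns the error signal e[n] = mic[n] - w^T x[n], i.e. the echo-cancelled signal.
    """
    mic = np.asarray(mic, dtype=float)
    ref = np.asarray(ref, dtype=float)
    w = np.zeros(filter_len)          # estimated echo-path impulse response
    x_buf = np.zeros(filter_len)      # most recent reference samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = ref[n]
        e = mic[n] - w @ x_buf
        w += mu * e * x_buf / (x_buf @ x_buf + eps)   # normalized LMS update
        out[n] = e
    return out
```

A practical canceller would also pause adaptation while the user is talking (double-talk detection). This sketch omits that.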
Step S602: in the at least two directions, perform ADBF processing on the echo-cancelled first voice information to obtain the spectra in the at least two directions.
Here, a method for determining direction is described first, using the above smart speaker with two microphones as an example.
As shown in Fig. 6B, smart speaker 60 has a first microphone 601 and a second microphone 602, with midpoint 603 between them. Suppose user 61, positioned as shown, speaks toward smart speaker 60. The line segment between user 61's position and midpoint 603 is taken as the first line. The line segment between first microphone 601 and second microphone 602 is taken as the second line. The user's direction is then angle 62 between the first line and the second line; that is, user 61 is located in the direction of angle 62 relative to smart speaker 60.
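Angle 62 of Fig. 6B can be computed from planar coordinates. This sketch assumes the (x, y) positions of both microphones and the user are known. It measures the angle from the ray pointing from the first microphone toward the second. The source does not fix which end of the microphone axis corresponds to 0°.

```python
import math

def user_angle_deg(mic1, mic2, user):
    """Angle between the line midpoint->user and the ray mic1->mic2, in [0, 180] degrees.

    All points are (x, y) tuples in the same plane and units.
    """
    mid_x, mid_y = (mic1[0] + mic2[0]) / 2, (mic1[1] + mic2[1]) / 2
    ux, uy = user[0] - mid_x, user[1] - mid_y        # first line: midpoint -> user
    ax, ay = mic2[0] - mic1[0], mic2[1] - mic1[1]    # second line: mic1 -> mic2
    dot = ux * ax + uy * ay
    cross = ux * ay - uy * ax
    return math.degrees(math.atan2(abs(cross), dot))
```

Using `atan2` on the cross and dot products avoids the loss of precision that `acos` suffers near 0° and 180°.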
Using the angle-direction method of Fig. 6B, in the embodiments of the present application, the user's first voice information is obtained first. ADBF processing can then be performed on the echo-cancelled first voice information in any two or more angle directions, yielding a spectrum for each angle direction.
Step S603: obtain the wake word contained in the spectrum.
Step S604: wake up the electronic device with the wake word, so that the electronic device performs the speech recognition.
For example, the wake word may be identification information pre-stored in the smart speaker. When the obtained spectrum contains a wake word corresponding to the smart speaker's identification information, the smart speaker is woken up.
Step S605: among the spectra in the at least two directions, determine the direction of the spectrum whose spectral feature satisfies a preset condition as the target direction.
Step S606: obtain the second voice information acquired in the target direction, and perform speech recognition on it.
It should be noted that steps S605 to S606 are the same as steps S302 to S303 above, and are not repeated in this embodiment of the present application.
Fig. 7 is an optional flowchart of the speech recognition method provided by the embodiments of the present application. As shown in Fig. 7, the method includes the following steps:
Step S701: the server performs ADBF processing on the acquired first voice information to obtain spectra in at least two directions.
Step S702: among the spectra in the at least two directions, determine the direction of the spectrum whose spectral feature satisfies a preset condition as the target direction.
It should be noted that steps S701 to S702 are the same as steps S301 to S302 above, and are not repeated in this embodiment of the present application.
Step S703: obtain the voice acquisition direction pre-stored on the electronic device.
Here, a preset voice acquisition direction is stored in a storage unit of the electronic device. It may be the direction of voice information that activated the electronic device within a historical time period.
Step S704: when the target direction differs from the voice acquisition direction, store the target direction.
Here, the determined target direction is relatively close to the user's actual direction, that is, to the sound-source direction. After the target direction is determined, it is compared with the voice acquisition direction pre-stored on the electronic device. If the two are the same, the voice acquisition direction can be used directly as the target direction for subsequent processing. If they differ, the sound source lies in a direction other than the pre-stored voice acquisition direction. The target direction can therefore be stored in the storage unit of the electronic device, so that it becomes the voice acquisition direction in subsequent speech processing.
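The compare-and-store logic of steps S703 and S704 can be sketched as a small class. The class name and the optional angular tolerance are hypothetical; the source compares directions for identity only, which corresponds to a tolerance of 0. Angles are assumed to lie in the 0° to 180° range of a two-microphone array, so no wrap-around handling is included.

```python
from typing import Optional

class AcquisitionDirectionStore:
    """Hypothetical store for the pre-stored voice acquisition direction (degrees)."""

    def __init__(self, stored_direction: Optional[float] = None, tolerance_deg: float = 0.0):
        self.stored_direction = stored_direction
        self.tolerance_deg = tolerance_deg

    def resolve(self, target_direction: float) -> float:
        """Return the direction to acquire from, updating the store if the target differs."""
        if (self.stored_direction is not None
                and abs(self.stored_direction - target_direction) <= self.tolerance_deg):
            return self.stored_direction          # same direction: reuse the stored one
        self.stored_direction = target_direction  # different direction: store the new target
        return target_direction
```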
Step S705: obtain the second voice information acquired in the target direction, and perform speech recognition on it.
In the speech recognition method provided by the embodiments of the present application, the target direction is stored in the storage unit of the electronic device as the pre-stored voice acquisition direction. In subsequent speech recognition and processing, when the sound-source direction matches the stored voice acquisition direction, the pre-stored direction can be used directly to acquire and recognize voice information. This keeps subsequent voice acquisition directions consistent with historical ones, so that voice information from the same direction is acquired.
The following describes an exemplary application of the embodiments of the present application in an actual application scenario.
The embodiments of the present application provide a speech recognition method that can greatly improve the wake-up rate and recognition rate under strong noise and strong reverberation.
Fig. 8 is an optional flowchart of the speech recognition method provided by the embodiments of the present application. As shown in Fig. 8, the method includes the following steps:
Step S801: acquire the user's first voice information through the voice acquisition unit.
Step S802: perform echo cancellation on the first voice information through acoustic echo cancellation (AEC) processing.
Step S803: perform ADBF processing on the first voice information in the 0°, 90°, and 180° directions respectively, obtaining beams in these three directions.
As shown in Fig. 8, step S803a performs ADBF processing in the 0° direction, step S803b in the 90° direction, and step S803c in the 180° direction.
Step S804: perform noise suppression (NS) processing on the beams in the different directions to obtain speech-enhanced beams for the corresponding directions.
Here, as shown in Fig. 8, step S804 includes steps S804a, S804b, and S804c, which perform NS processing on the beams in the 0°, 90°, and 180° directions respectively.
Step S805: use keyword spotting (KWS) technology to perform keyword recognition on each speech-enhanced beam and determine the keyword.
Here, as shown in Fig. 8, step S805 includes steps S805a, S805b, and S805c, which perform keyword recognition on the speech-enhanced beams in the 0°, 90°, and 180° directions respectively.
Step S806: wake up the electronic device according to the keyword, and determine whether the wake-up is valid.
Here, if the result is yes, step S807 is executed; if the result is no, the process ends.
Step S807: use direction-of-arrival (DOA) technology to perform DOA estimation on the speech-enhanced beams, estimating the direction of the sound source from the array information.
Step S808: use ADBF technology to perform adaptive beamforming on the speech-enhanced beams, designing the weighting coefficients from real-time data to form a directional beam.
Step S809: perform automatic speech recognition (ASR) on the resulting directional beam.
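Step S808 designs the beam weights from real-time data. The minimum variance distortionless response (MVDR) beamformer is one standard adaptive beamformer that fits this description. The source does not say whether MVDR is the ADBF variant used, so the sketch below is only an illustration. For one frequency bin, it computes w = R^-1 d / (d^H R^-1 d). Here R is a recursively updated spatial covariance and d is a steering vector toward the estimated direction. The forgetting factor and diagonal loading are illustrative values.

```python
import numpy as np

def update_covariance(R, x, alpha=0.95):
    """Recursive spatial covariance update with a new snapshot x (n_mics,) of one bin."""
    return alpha * R + (1.0 - alpha) * np.outer(x, np.conj(x))

def mvdr_weights(R, d, diagonal_loading=1e-3):
    """MVDR weights for one frequency bin, satisfying w^H d = 1.

    R: (n_mics, n_mics) Hermitian spatial covariance estimate
    d: (n_mics,) steering vector toward the look (target) direction
    """
    n_mics = R.shape[0]
    loading = diagonal_loading * np.trace(R).real / n_mics   # regularizes a poorly estimated R
    Rinv_d = np.linalg.solve(R + loading * np.eye(n_mics), d)
    return Rinv_d / (np.conj(d) @ Rinv_d)

def apply_beamformer(w, x):
    """Beam output y = w^H x for one snapshot."""
    return np.conj(w) @ x
```

The distortionless constraint keeps the target direction at unit gain. Minimizing output power then places a null toward strong interference captured in R.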
The speech recognition method of the embodiments of the present application can be applied to an electronic device with two microphones. The two microphones acquire the user's voice information simultaneously, yielding two voice signals.
In the embodiments of the present application, adaptive beams in the 0°, 90°, and 180° directions are designed for the two voice signals acquired by the two microphones. This yields three beams, and single-channel speech enhancement is then applied to each of them. The directivity of the three beams is shown in Figs. 9A to 9C, each of which shows the pattern of beam 901. Fig. 9A shows the beam pattern for the 0° direction, Fig. 9B for the 90° direction, and Fig. 9C for the 180° direction. Each beam effectively suppresses noise arriving from outside it. As Figs. 9A to 9C show, the beams of a two-microphone array are wide, so their tolerance to angle errors is somewhat higher. When the speaker deviates from a beam's main direction within a certain range, the speech signal is not noticeably damaged.
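The ADBF weights are not given, so the width of a two-microphone beam is illustrated here with a simple delay-and-sum beamformer as a stand-in. Take two microphones spaced d apart, with the beam steered to angle θ0. The magnitude response to a far-field source at angle θ and frequency f is |cos(π f d (cos θ − cos θ0) / c)|. The sketch assumes a 6 cm spacing and c = 343 m/s. It only shows why the main lobe is wide; it does not reproduce Figs. 9A to 9C.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def two_mic_das_response(theta_deg, steer_deg, freq_hz, mic_distance=0.06):
    """|response| of a two-microphone delay-and-sum beam steered to steer_deg."""
    delta = np.cos(np.deg2rad(theta_deg)) - np.cos(np.deg2rad(steer_deg))
    return np.abs(np.cos(np.pi * freq_hz * mic_distance * delta / SPEED_OF_SOUND))

angles = np.array([90, 70, 60, 30, 0])
print(np.round(two_mic_das_response(angles, steer_deg=90, freq_hz=1000), 3))
```

At 1 kHz, a source 20° off the 90° steering direction keeps more than 98% of the on-axis amplitude, consistent with the tolerance to angle errors described above.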
As an example, assume the speaker is in the 160° direction and interference noise comes from the 60° direction. Fig. 10 shows the wake-word spectra delivered to the wake-up unit of the electronic device:

- Panel a is the clean speech signal.
- Panel b is the speech signal with superimposed interference.
- Panel c is the speech after single-channel noise reduction.
- Panels d, e, and f are the beam output spectra in the 0°, 90°, and 180° directions respectively.

Because the noise is speech interference, the single-channel noise reduction in panel c has almost no effect. Because the interference comes from the 60° direction, panels d and e retain more of the noise signal. Panel f retains more of the target speech signal and has the highest signal-to-noise ratio, so its wake-up rate is naturally also the highest.
In one experiment, the wake-up scores of panels a, b, c, d, e, and f were 0.98, 0.32, 0.33, 0.10, 0.09, and 0.9 respectively, showing that the method of the embodiments of the present application achieves a higher wake-up rate.
In other embodiments, still referring to Fig. 10, the three processed signals are sent in real time to the wake-up unit of the electronic device to detect whether they contain the wake word. If the device wakes up, its DOA unit uses the cached wake-word speech to estimate the speaker's angle. Under strong reverberation or strong noise, a two-microphone array usually produces large angle-estimation errors, or even completely wrong estimates. Accuracy can be greatly improved by first using the wake-up information to delimit an angle range and then estimating the angle only within that range.
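The source does not specify the DOA unit's algorithm. GCC-PHAT is a common two-microphone estimator, and it works in three steps:

1. Normalize the cross-power spectrum to unit magnitude.
2. Transform it back to a cross-correlation; the lag of its peak gives the time difference of arrival (TDOA).
3. Compute the angle as θ = arccos(c·TDOA / d).

The sketch makes the following assumptions:

- Microphone 1 sits at −d/2 and microphone 2 at +d/2 on the axis.
- 0° points toward microphone 2.
- Lags are resolved to whole samples.
- c = 343 m/s.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def gcc_phat_doa(x1, x2, fs, mic_distance):
    """Estimate the source angle in degrees (0 = toward mic 2, 180 = toward mic 1)."""
    n = len(x1) + len(x2)                        # zero-pad so the correlation is linear
    cross = np.fft.rfft(x1, n=n) * np.conj(np.fft.rfft(x2, n=n))
    cross /= np.abs(cross) + 1e-12               # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)                # cc[k] peaks where x1 lags x2 by k samples
    max_lag = int(np.ceil(fs * mic_distance / SPEED_OF_SOUND))
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))   # lags -max_lag .. +max_lag
    lag = int(np.argmax(cc)) - max_lag
    cos_theta = np.clip(lag * SPEED_OF_SOUND / (fs * mic_distance), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))
```

With a 6 cm spacing at 16 kHz, only a few integer lags exist, which partly explains the coarse and error-prone estimates of a two-microphone array. Real systems often interpolate around the peak.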
As shown in Fig. 10, panel f has the highest wake-up score. When the DOA technique is used to estimate the speaker's angle, the estimation range can therefore be biased toward 180°, for example 120° to 180°. This clearly and greatly reduces angle-estimation errors.
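Restricting the estimation range amounts to searching the cross-correlation only over the lags that correspond to the delimited angles. The sketch assumes microphone 1 at −d/2 and microphone 2 at +d/2, with 0° pointing toward microphone 2. Under that convention, the lag in samples is fs·d·cos θ / c. The correlation values in the usage example are made up to show a spurious peak outside the range.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def restricted_doa(cc_by_lag, lags, fs, mic_distance, angle_range_deg):
    """Pick the strongest correlation peak only among lags consistent with angle_range_deg."""
    lo, hi = angle_range_deg
    # lag = fs * d * cos(theta) / c decreases as theta goes from 0 to 180 degrees
    lag_max = fs * mic_distance * np.cos(np.radians(lo)) / SPEED_OF_SOUND
    lag_min = fs * mic_distance * np.cos(np.radians(hi)) / SPEED_OF_SOUND
    mask = (lags >= lag_min) & (lags <= lag_max)
    if not np.any(mask):
        raise ValueError("no lag falls inside the requested angle range")
    best_lag = lags[mask][np.argmax(np.abs(cc_by_lag[mask]))]
    cos_theta = np.clip(best_lag * SPEED_OF_SOUND / (fs * mic_distance), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

lags = np.arange(-3, 4)
cc = np.array([0.1, 0.6, 0.2, 0.1, 0.0, 0.9, 0.1])  # made-up; spurious peak at lag +2
print(restricted_doa(cc, lags, fs=16000, mic_distance=0.06, angle_range_deg=(120, 180)))
```

The stronger but spurious peak at lag +2 (about 44°) is ignored, and the estimate falls inside the 120° to 180° range suggested by the wake-up scores.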
In the embodiments of the present application, the speaker's speech is processed after the electronic device is woken up. It undergoes dereverberation, beamforming, single-channel speech enhancement, and similar processing, and is then sent to the ASR unit for recognition, completing voice interaction with the device.
The speech recognition method provided by the embodiments of the present application performs multi-beam wake-up with two microphones, which can improve the wake-up rate. It also uses the wake-up information to improve the accuracy of angle estimation. This reduces the speech damage caused by beam-angle deviation and raises the speech signal-to-noise ratio, further improving speech recognition accuracy.
The following describes an exemplary structure in which the speech recognition apparatus 254 provided by the embodiments of the present application is implemented as software modules. In some embodiments, as shown in Fig. 2B, the software modules stored in the speech recognition apparatus 254 of memory 250 may include a first processing module 2541, a determining module 2542, and a second processing module 2543.
First processing module 2541, configured to perform ADBF processing on the acquired first voice information to obtain spectra in at least two directions;
Determining module 2542, configured to determine, among the spectra in the at least two directions, the direction of the spectrum whose spectral feature satisfies a preset condition as the target direction;
Second processing module 2543, configured to obtain the second voice information acquired in the target direction and perform speech recognition on the second voice information.
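The three-module structure of Fig. 2B can be sketched as a small class whose collaborators are injected as callables. All names and signatures here are hypothetical. The "preset condition" is taken to be "largest spectral feature", one of the options the document gives for SNR and wake-up rate.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class SpeechRecognitionApparatus:
    """Hypothetical sketch of apparatus 254 with modules 2541, 2542 and 2543."""
    adbf_spectra: Callable[[Any], Dict[float, Any]]   # module 2541: voice -> {direction: spectrum}
    spectral_feature: Callable[[Any], float]          # feature used by module 2542 (e.g. SNR)
    acquire_in_direction: Callable[[float], Any]      # captures second voice information
    recognize: Callable[[Any], str]                   # ASR used by module 2543

    def determine_target_direction(self, spectra: Dict[float, Any]) -> float:
        """Module 2542: direction whose spectrum has the largest feature."""
        return max(spectra, key=lambda direction: self.spectral_feature(spectra[direction]))

    def process(self, first_voice: Any) -> str:
        spectra = self.adbf_spectra(first_voice)                # module 2541
        target = self.determine_target_direction(spectra)       # module 2542
        second_voice = self.acquire_in_direction(target)        # module 2543: acquire ...
        return self.recognize(second_voice)                     # ... and recognize

# Toy collaborators standing in for real signal processing and ASR
apparatus = SpeechRecognitionApparatus(
    adbf_spectra=lambda voice: {0: 0.10, 90: 0.09, 180: 0.9},
    spectral_feature=lambda spectrum: spectrum,
    acquire_in_direction=lambda direction: f"audio@{direction}",
    recognize=lambda audio: audio.upper(),
)
print(apparatus.process("wake word"))  # AUDIO@180
```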
In some embodiments, the spectrum is a wake-word spectrum. Accordingly, the first processing module is further configured to:

- determine the keyword contained in the first voice information;
- perform the ADBF processing on the first voice information in at least two directions to obtain, for each direction, a corresponding beam containing the keyword;
- determine the wake-word spectrum in each direction according to the beam containing the keyword for that direction.
In some embodiments, the apparatus further includes:
a speech enhancement processing module, configured to perform speech enhancement processing on the keyword-containing beam for each direction to obtain the speech-enhanced beam for the corresponding direction;
accordingly, the first processing module is further configured to take the speech-enhanced beam in each direction as the wake-word spectrum for that direction.
In some embodiments, the speech enhancement processing module is further configured to:
perform single-channel speech enhancement processing on the beam for each direction to obtain the speech-enhanced beam for that direction; or eliminate the noise in the beam for each direction to obtain the speech-enhanced beam for that direction.
In some embodiments, the spectral feature includes the signal-to-noise ratio or the wake-up rate.
Accordingly, the determining module is further configured to: among the spectra in the at least two directions, determine the direction of the spectrum with the highest signal-to-noise ratio as the target direction, or determine the direction of the spectrum with the highest wake-up rate as the target direction.
In some embodiments, the apparatus further includes:
an echo cancellation module, configured to perform echo cancellation on the acquired first voice information before ADBF processing is performed on it;
accordingly, the first processing module is further configured to perform, in the at least two directions, ADBF processing on the echo-cancelled first voice information to obtain the spectra in the at least two directions.
In some embodiments, the apparatus further includes:
a first obtaining module, configured to obtain the wake word contained in the spectrum;
a wake-up module, configured to wake up the electronic device with the wake word, so that the electronic device performs the speech recognition.
In some embodiments, the apparatus further includes:
a second obtaining module, configured to obtain the voice acquisition direction pre-stored on the electronic device;
a storage module, configured to store the target direction when the target direction differs from the voice acquisition direction.
It should be noted that the description of the apparatus embodiments is similar to that of the method embodiments above and has similar beneficial effects, so it is not repeated here. For technical details not disclosed in the apparatus embodiments, refer to the description of the method embodiments of the present application.
An embodiment of the present application provides a storage medium storing executable instructions. When executed by a processor, the executable instructions cause the processor to perform the method provided by the embodiments of the present application, for example, the method shown in Fig. 3A.
In some embodiments, the storage medium may be any of the following memories, or any device including one or any combination of them:

- ferroelectric random access memory (FRAM)
- read-only memory (ROM)
- programmable read-only memory (PROM)
- erasable programmable read-only memory (EPROM)
- electrically erasable programmable read-only memory (EEPROM)
- flash memory
- magnetic surface memory
- optical disc
- compact disc read-only memory (CD-ROM)
In some embodiments, the executable instructions may take the form of a program, software, software module, script, or code. They may be written in any form of programming language, including compiled or interpreted languages and declarative or procedural languages. They may be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
As an example, the executable instructions may, but need not, correspond to a file in a file system. They may be stored:

- as part of a file that holds other programs or data, for example in one or more scripts in a Hypertext Markup Language (HTML) document;
- in a single file dedicated to the program in question;
- in multiple coordinated files, for example files that store one or more modules, subprograms, or code sections.
As an example, the executable instructions may be deployed to execute on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
The above are merely embodiments of the present application and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, or the like made within the spirit and scope of the present application shall fall within its scope of protection.
Claims (10)
1. A speech recognition method, comprising:
performing adaptive beamforming (ADBF) processing on acquired first voice information to obtain spectra in at least two directions;
determining, among the spectra in the at least two directions, the direction of a spectrum whose spectral feature satisfies a preset condition as a target direction; and
obtaining second voice information acquired in the target direction, and performing speech recognition on the second voice information.
2. The method according to claim 1, wherein the spectra are wake-word spectra; and correspondingly, the performing ADBF processing on acquired first voice information to obtain spectra in at least two directions comprises:
determining a keyword contained in the first voice information;
performing the ADBF processing on the first voice information in at least two directions to obtain, for each direction, a corresponding beam including the keyword; and
determining, according to the beam including the keyword for each direction, the wake-word spectrum in the corresponding direction.
3. The method according to claim 2, further comprising:
performing speech enhancement processing on the beam including the keyword for each direction to obtain a speech-enhanced beam for the corresponding direction;
correspondingly, the determining, according to the beam including the keyword for each direction, the wake-word spectrum in the corresponding direction comprises: determining the speech-enhanced beam in each direction as the wake-word spectrum in the corresponding direction.
4. The method according to claim 3, wherein performing speech enhancement processing on the beam containing the keyword corresponding to each direction to obtain the speech-enhanced beam for that direction comprises:
performing single-channel speech enhancement processing on the beam corresponding to each direction to obtain the speech-enhanced beam for that direction; or eliminating noise from the beam corresponding to each direction to obtain the speech-enhanced beam for that direction.
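Claims 3 and 4 do not name a single-channel enhancement algorithm. Spectral subtraction is swapped in below as one classic option. The sketch assumes the first few frames of each beam's magnitude spectrogram contain only noise. The function names and the spectral-floor value are illustrative choices, not taken from the patent.

```python
import numpy as np


def spectral_subtraction(mag, noise_frames=5, floor=0.05):
    """Single-channel spectral subtraction on a magnitude spectrogram.

    mag: (num_frames, num_bins) magnitudes of one direction's beam
    noise_frames: leading frames assumed to contain noise only
    floor: spectral floor as a fraction of the noisy magnitude; keeping a
        small residual instead of zero limits "musical noise" artefacts
    """
    noise = mag[:noise_frames].mean(axis=0, keepdims=True)  # per-bin noise estimate
    cleaned = mag - noise
    return np.maximum(cleaned, floor * mag)


def enhance_all_directions(spectra, **kwargs):
    """Apply the same single-channel enhancement to every direction's beam."""
    return {angle: spectral_subtraction(m, **kwargs) for angle, m in spectra.items()}
```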
5. The method according to claim 1, wherein the spectral feature comprises a signal-to-noise ratio or a wake-up rate;
correspondingly, determining, among the spectra in the at least two directions, the direction corresponding to the spectrum whose spectral feature satisfies the preset condition as the target direction comprises:
determining, among the spectra in the at least two directions, the direction corresponding to the spectrum with the highest signal-to-noise ratio as the target direction, or determining the direction corresponding to the spectrum with the highest wake-up rate as the target direction.
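Claim 5 selects the direction with the highest signal-to-noise ratio or the highest wake-up rate but does not say how either quantity is measured. The sketch below assumes SNR is estimated from a keyword segment and a noise-only segment of the same beam, and treats the wake-up rate as a number supplied by some external wake-word detector. Both helpers are hypothetical.

```python
import numpy as np


def snr_db(keyword_mag, noise_mag, eps=1e-12):
    """Rough SNR estimate in dB for one direction.

    keyword_mag: magnitude spectrum over the keyword segment (speech + noise)
    noise_mag: magnitude spectrum over a noise-only segment of the same beam,
        e.g. audio just before the wake word (an assumption)
    """
    noisy_power = float(np.mean(np.square(keyword_mag)))
    noise_power = float(np.mean(np.square(noise_mag)))
    speech_power = max(noisy_power - noise_power, eps)  # remove the noise floor
    return 10.0 * np.log10(speech_power / (noise_power + eps))


def select_target_direction(feature_by_direction):
    """Return the direction with the highest feature value (SNR or wake-up rate)."""
    return max(feature_by_direction, key=feature_by_direction.get)
```

The same `select_target_direction` call serves both alternatives of the claim; only the dictionary of per-direction values changes.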
6. The method according to any one of claims 1 to 5, further comprising:
performing echo cancellation on the first voice information before performing ADBF processing on the acquired first voice information;
correspondingly, performing ADBF processing on the acquired first voice information to obtain the spectra in at least two directions comprises:
performing, in the at least two directions, ADBF processing on the echo-cancelled first voice information to obtain the spectra in the at least two directions.
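Claim 6 applies echo cancellation before ADBF without naming a method. A normalized least-mean-squares (NLMS) adaptive filter driven by the loudspeaker reference signal is a common choice and is used here purely as an illustration. The tap count, the step size and the assumption that the reference is time-aligned with the microphone signals are placeholders.

```python
import numpy as np


def nlms_echo_cancel(mic, ref, taps=128, mu=0.5, eps=1e-6):
    """Remove loudspeaker echo from one microphone channel with an NLMS filter.

    mic: microphone samples (near-end speech + echo)
    ref: loudspeaker reference samples, same length, assumed time-aligned
    Returns the error signal e = mic - estimated echo, i.e. the cleaned channel.
    """
    w = np.zeros(taps)   # adaptive estimate of the echo path
    buf = np.zeros(taps)  # most recent reference samples, newest first
    out = np.empty(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        echo_hat = w @ buf
        e = mic[n] - echo_hat
        w += mu * e * buf / (buf @ buf + eps)  # step normalized by input power
        out[n] = e
    return out


def cancel_echo_all_mics(frames, ref, **kwargs):
    """Run echo cancellation on every channel before beamforming."""
    return np.stack([nlms_echo_cancel(ch, ref, **kwargs) for ch in frames])
```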
7. The method according to any one of claims 1 to 5, further comprising:
acquiring a wake word contained in the spectrum, and waking up an electronic device by means of the wake word so that the electronic device performs the speech recognition;
or, the method further comprises: acquiring a voice collection direction pre-stored on the electronic device; and
storing the target direction when the target direction differs from the voice collection direction.
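Claim 7 describes two alternative follow-ups: waking the device on a wake word, or persisting the target direction when it differs from the pre-stored voice collection direction. The sketch models device state with a hypothetical `DeviceState` dataclass. The `tolerance_deg` parameter is an added assumption (the claim only says "differs"); it defaults to zero, so any change is stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceState:
    """Hypothetical device-side state; field names are illustrative only."""
    stored_direction_deg: Optional[float] = None  # pre-stored voice collection direction
    awake: bool = False


def wake_if_keyword(state: DeviceState, heard: str, wake_word: str) -> bool:
    """Wake the device when the recognized word matches the configured wake word."""
    if heard.strip().lower() == wake_word.strip().lower():
        state.awake = True  # speech recognition would then run on this device
    return state.awake


def update_collection_direction(state: DeviceState, target_deg: float,
                                tolerance_deg: float = 0.0) -> bool:
    """Store target_deg if it differs from the stored direction; True if stored."""
    stored = state.stored_direction_deg
    if stored is None or abs(target_deg - stored) > tolerance_deg:
        state.stored_direction_deg = target_deg
        return True
    return False
```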
8. A speech recognition apparatus, comprising:
a first processing module, configured to perform ADBF processing on acquired first voice information to obtain spectra in at least two directions;
a determining module, configured to determine, among the spectra in the at least two directions, the direction corresponding to a spectrum whose spectral feature satisfies a preset condition as a target direction; and
a second processing module, configured to acquire second voice information collected in the target direction and to perform speech recognition on the second voice information.
9. A speech recognition device, comprising:
a memory, configured to store executable instructions; and
a processor, configured to implement the method according to any one of claims 1 to 7 when executing the executable instructions stored in the memory.
10. A storage medium storing executable instructions which, when executed by a processor, cause the processor to implement the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910822237.0A CN110517682B (en) | 2019-09-02 | 2019-09-02 | Voice recognition method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910822237.0A CN110517682B (en) | 2019-09-02 | 2019-09-02 | Voice recognition method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110517682A (en) | 2019-11-29 |
CN110517682B (en) | 2022-08-30 |
Family ID: 68629170
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910822237.0A Active CN110517682B (en) | 2019-09-02 | 2019-09-02 | Voice recognition method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110517682B (en) |
2019
- 2019-09-02: CN application CN201910822237.0A (CN110517682B), Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104103277A (en) * | 2013-04-15 | 2014-10-15 | 北京大学深圳研究生院 | Time frequency mask-based single acoustic vector sensor (AVS) target voice enhancement method |
CN104810021A (en) * | 2015-05-11 | 2015-07-29 | 百度在线网络技术(北京)有限公司 | Pre-processing method and device applied to far-field recognition |
US20170076720A1 (en) * | 2015-09-11 | 2017-03-16 | Amazon Technologies, Inc. | Arbitration between voice-enabled devices |
CN110164446A (en) * | 2018-06-28 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Voice signal recognition methods and device, computer equipment and electronic equipment |
CN109272989A (en) * | 2018-08-29 | 2019-01-25 | 北京京东尚科信息技术有限公司 | Voice awakening method, device and computer readable storage medium |
CN109599124A (en) * | 2018-11-23 | 2019-04-09 | 腾讯科技(深圳)有限公司 | A kind of audio data processing method, device and storage medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111402873A (en) * | 2020-02-25 | 2020-07-10 | 北京声智科技有限公司 | Voice signal processing method, device, equipment and storage medium |
CN111402873B (en) * | 2020-02-25 | 2023-10-20 | 北京声智科技有限公司 | Voice signal processing method, device, equipment and storage medium |
CN112599126A (en) * | 2020-12-03 | 2021-04-02 | 海信视像科技股份有限公司 | Awakening method of intelligent device, intelligent device and computing device |
CN112599126B (en) * | 2020-12-03 | 2022-05-27 | 海信视像科技股份有限公司 | Awakening method of intelligent device, intelligent device and computing device |
Also Published As
Publication number | Publication date |
---|---|
CN110517682B (en) | 2022-08-30 |
Similar Documents
Publication | Title |
---|---|
CN109599124B (en) | Audio data processing method and device and storage medium |
US10546593B2 (en) | Deep learning driven multi-channel filtering for speech enhancement |
US11250383B2 (en) | Automated clinical documentation system and method |
US10522164B2 (en) | Method and device for improving audio processing performance |
US10249299B1 (en) | Tailoring beamforming techniques to environments |
CN111630876B (en) | Audio device and audio processing method |
CN111883156B (en) | Audio processing method and device, electronic equipment and storage medium |
CN109074816B (en) | Far field automatic speech recognition preprocessing |
US9978388B2 (en) | Systems and methods for restoration of speech components |
CN108681440A (en) | A kind of smart machine method for controlling volume and system |
CN107112012A (en) | It is used for low-power keyword detection and noise suppressed using digital microphone |
US11587560B2 (en) | Voice interaction method, device, apparatus and server |
Chatterjee et al. | ClearBuds: wireless binaural earbuds for learning-based speech enhancement |
CN101253755A (en) | Audio data stream synchronization |
CN110769352B (en) | Signal processing method and device and computer storage medium |
CN109218882A (en) | The ambient sound monitor method and earphone of earphone |
CN108335697A (en) | Minutes method, apparatus, equipment and computer-readable medium |
JP2024507916A (en) | Audio signal processing method, device, electronic device, and computer program |
KR20170063618A (en) | Electronic device and its reverberation removing method |
CN110992967A (en) | Voice signal processing method and device, hearing aid and storage medium |
CN115482830A (en) | Speech enhancement method and related equipment |
CN110517682A (en) | Audio recognition method, device, equipment and storage medium |
CN113241085A (en) | Echo cancellation method, device, equipment and readable storage medium |
US11996114B2 (en) | End-to-end time-domain multitask learning for ML-based speech enhancement |
JP2008116534A (en) | Voice communication device |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |