CN110931007A - Voice recognition method and system - Google Patents

Publication number
CN110931007A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911225468.XA
Other languages
Chinese (zh)
Other versions
CN110931007B (en)
Inventor
周晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AI Speech Ltd
Original Assignee
AI Speech Ltd
Application filed by AI Speech Ltd
Priority to CN201911225468.XA
Publication of CN110931007A
Application granted
Publication of CN110931007B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Abstract

An embodiment of the invention provides a voice recognition method. The method comprises: a voice recognition device collecting first noisy voice audio in real time and synchronously receiving first noise audio sent by at least one noise collection microphone; performing echo cancellation on the first noisy voice audio and the first noise audio, and determining second noisy voice audio and second noise audio after echo cancellation; estimating the noise power spectral density of the second noisy voice audio in real time, and performing peripheral noise reduction on the second noisy voice audio according to the noise power spectral density and the second noise audio to generate noise-reduced clean voice; and performing voice recognition on the clean voice. An embodiment of the invention also provides a voice recognition system. The embodiments provide the most effective noise source for noise reduction in the intelligent voice device, and the self-noise of the noise device is eliminated from the signal collected by the voice microphone. The method does not require a large amount of computation, has low latency, is applicable to a wider range of devices, and ensures the accuracy of voice recognition and the wake-up success rate.

Description

Voice recognition method and system
Technical Field
The invention relates to the field of intelligent voice, in particular to a voice recognition method and a voice recognition system.
Background
With the development of intelligent voice technology, intelligent voice devices are gradually entering users' homes. At home, a user can have such a device perform an operation at any time simply by speaking. For example, with a smart television, the user only needs to say the desired program or channel to jump to the corresponding video. Similarly, the user may name a song to be played or ask about the day's weather, and the smart speaker performs the corresponding operation after speech recognition.
In a home environment, other devices constantly emit noise. For example, sound emitted by a smart television acts as noise for an intelligent voice device and degrades voice recognition. To address this, the smart television captures the self-noise emitted by its own loudspeaker through a hardware loopback or software loopback and uses this self-noise to actively reduce noise in the received sound, so that this noise does not affect voice recognition.
In the process of implementing the invention, the inventor found that at least the following problems exist in the related art:
Other devices in the home environment, such as washing machines, refrigerators, ovens, and range hoods, also emit noise that may affect voice recognition. The self-noise sources of these devices are difficult to obtain, so active noise reduction cannot be performed, and the wake-up success rate and recognition accuracy of the intelligent voice device suffer.
Disclosure of Invention
The invention aims to solve at least the prior-art problem that noise sources in a home environment interfere with an intelligent voice device and reduce its wake-up success rate and recognition accuracy.
In a first aspect, an embodiment of the present invention provides a noise self-acquisition method for a noise device, which is applied to a noise acquisition microphone disposed at a noise source of the noise device, and the method includes:
the noise acquisition microphone receives analog gain configuration information and configures a signal acquisition mode according to the analog gain configuration information;
and carrying out multi-channel signal acquisition through the signal acquisition mode, and sending the acquired noise audio to voice recognition equipment.
In a second aspect, an embodiment of the present invention provides a speech recognition method, which is applied to a speech recognition device that establishes a connection with the noise collection microphone, and the method includes:
the voice recognition equipment acquires a first voice audio with noise in real time and synchronously receives a first noise audio sent by at least one noise acquisition microphone;
respectively carrying out echo cancellation on the first voice audio with noise and the first noise audio, and determining a second voice audio with noise and a second noise audio after echo cancellation;
estimating the noise power spectral density of the second voice audio with noise in real time, and performing peripheral noise reduction on the second voice audio with noise according to the noise power spectral density and the second noise audio to generate noise-reduced clean voice;
and performing voice recognition on the clean voice, and determining information corresponding to the clean voice.
In a third aspect, an embodiment of the present invention provides a noise self-acquisition system for a noise device, applied to a noise acquisition microphone disposed at a noise source of the noise device, where the system includes:
the analog gain configuration program module is used for receiving analog gain configuration information by the noise acquisition microphone and configuring a signal acquisition mode according to the analog gain configuration information;
and the noise acquisition program module is used for carrying out multi-channel signal acquisition through the signal acquisition mode and sending the acquired noise audio to the voice recognition equipment.
In a fourth aspect, an embodiment of the present invention provides a speech recognition system, which is applied to a speech recognition device connected to the noise collection microphone, and the system includes:
the audio acquisition program module is used for acquiring a first voice audio with noise in real time by the voice recognition equipment and synchronously receiving the first noise audio sent by at least one noise acquisition microphone;
the echo cancellation program module is used for performing echo cancellation on the first voice audio with noise and the first noise audio respectively, and determining a second voice audio with noise and a second noise audio after echo cancellation;
the noise reduction program module is used for estimating the noise power spectral density of the second voice audio with noise in real time, and performing peripheral noise reduction on the second voice audio with noise according to the noise power spectral density and the second noise audio to generate noise-reduced clean voice;
and the recognition program module is used for carrying out voice recognition on the clean voice and determining the information corresponding to the clean voice.
In a fifth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the steps of the noise self-acquisition method for a noise device and the voice recognition method of any embodiment of the invention.
In a sixth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the noise self-acquisition method and the speech recognition method for a noise device according to any embodiment of the present invention.
The embodiments of the invention have the following beneficial effects. A noise collection microphone dedicated to collecting the device's self-noise is placed at the self-noise source and connected to the intelligent voice device, so that the noise is effectively transmitted to the intelligent voice device and the most effective noise source is provided for its noise reduction. The noisy voice and the noise audio are collected synchronously and input to an echo cancellation module, which cancels the device self-noise in the signal collected by the voice microphone. Because noise reduction is performed at the signal level, a large amount of computation is not needed, the requirements on the intelligent voice device are modest, and the method is more widely applicable. Latency is low, which improves the user experience while also ensuring the accuracy of voice recognition and the wake-up success rate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a noise self-acquisition method for a noise device according to an embodiment of the present invention;
Fig. 2 is a flowchart of a speech recognition method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a noise self-acquisition system for a noise device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a speech recognition system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a noise self-acquisition method for a noise device according to an embodiment of the present invention, which includes the following steps:
S11: the noise acquisition microphone receives analog gain configuration information and configures a signal acquisition mode according to the analog gain configuration information;
S12: carrying out multi-channel signal acquisition through the signal acquisition mode, and sending the acquired noise audio to voice recognition equipment.
In this embodiment, some devices produce very strong self-noise when operating, for example floor sweepers, vacuum cleaners, range hoods, washing machines, juicers, and ovens. This strong self-noise is picked up by the microphone of the intelligent voice device and seriously degrades voice interaction between the device and the user, for example the wake-up success rate and the recognition accuracy.
It is first necessary to determine where the self-noise sources of the device are located. Taking a floor sweeper as an example, the self-noise sources include the main brush motor, the side brush motor, the blower motor, the laser displacement sensor motor, and the friction of the brushes against the floor; noise microphones are then arranged near these self-noise sources. The noise microphones are typically analog microphones with a sensitivity of about -38 dBV/Pa. A noise microphone can be positioned between several self-noise sources so that it captures them simultaneously, which reduces the number of noise microphones and, in turn, the hardware cost and the computational load of the echo cancellation module. In general, although a floor sweeper has many self-noise sources, its internal structure is compact, and 2 noise microphones achieve a good effect; a vacuum cleaner or range hood has a single self-noise source and needs only 1 noise microphone. Manufacturers of these devices select positions and reserve space during production to integrate the noise microphones inside the devices, so the user can use the device directly after purchase without installing anything.
For step S11, each microphone receives analog gain configuration information. The analog gain adjusts the strength of the signal fed to the linear amplifier, and within a certain range its magnitude directly determines the output audio power; a larger input improves the output signal-to-noise ratio and increases the output power proportionally. When the input is too large, however, the output power increases only slowly while distortion rises sharply. The optimum setting keeps the peak output voltage within the linear range of the amplifier. The signal acquisition mode is configured according to the analog gain configuration information. In one embodiment, the analog gain configuration information is 0 dB, that is, no analog gain is applied. This prevents the noise microphone from picking up the speaker's voice, which would otherwise cause voice self-cancellation in the echo cancellation module: the speaker's voice would be mistaken for self-noise and cancelled. Because there is no analog gain, the noise microphone must not be too far from the self-noise source, or it cannot capture the self-noise signal with a high signal-to-noise ratio; the distance is preferably within 20 cm, and the closer the better.
For step S12, multi-channel signal acquisition is performed in the signal acquisition mode determined in step S11, and the acquired noise audio is sent to the voice recognition device. The voice recognition device is connected in advance to the noise collection microphones of the noise devices to facilitate transmission of the noise audio.
As can be seen from this embodiment, a noise collection microphone dedicated to collecting the device's self-noise is placed at the self-noise source and connected to the intelligent voice device, so that the noise is effectively transmitted to the intelligent voice device. This provides the most effective noise source for noise reduction in the intelligent voice device and thus better reduces the influence of noise on its recognition.
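As a non-limiting illustration of the gain discussion above, the following Python sketch converts a microphone sensitivity and an analog gain from decibels to a voltage and checks the result against an amplifier limit. The function names, the example sound levels (100 dB SPL near the noise source, 60 dB SPL for distant speech), and the 1 V RMS limit are assumptions chosen for illustration and are not taken from the embodiment.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear voltage ratio (0 dB -> 1.0)."""
    return 10.0 ** (gain_db / 20.0)

def mic_output_volts(spl_db: float,
                     sensitivity_dbv_per_pa: float = -38.0,
                     analog_gain_db: float = 0.0) -> float:
    """RMS voltage after the analog gain stage for a given sound level.

    Sensitivity in dBV/Pa is referenced to 1 Pa, which is 94 dB SPL.
    """
    pressure_pa = 10.0 ** ((spl_db - 94.0) / 20.0)
    return (pressure_pa
            * db_to_linear(sensitivity_dbv_per_pa)
            * db_to_linear(analog_gain_db))

def within_linear_range(volts_rms: float, max_volts_rms: float = 1.0) -> bool:
    """Hypothetical check: 1.0 V RMS stands in for the amplifier's linear limit."""
    return volts_rms <= max_volts_rms

# Assumed levels: loud self-noise close to the source, quieter speech far away.
near_noise = mic_output_volts(spl_db=100.0)   # 0 dB analog gain
far_speech = mic_output_volts(spl_db=60.0)    # 0 dB analog gain
print(f"self-noise: {near_noise * 1000:.2f} mV, speech: {far_speech * 1000:.3f} mV")
```

Under these assumed levels, the nearby self-noise dominates the distant speech at 0 dB gain by a wide margin, which is consistent with the stated purpose of leaving the analog gain unset.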
Fig. 2 is a flowchart of a speech recognition method according to an embodiment of the present invention, which includes the following steps:
S21: the voice recognition equipment acquires a first voice audio with noise in real time and synchronously receives a first noise audio sent by at least one noise acquisition microphone;
S22: respectively carrying out echo cancellation on the first voice audio with noise and the first noise audio, and determining a second voice audio with noise and a second noise audio after echo cancellation;
S23: estimating the noise power spectral density of the second voice audio with noise in real time, and performing peripheral noise reduction on the second voice audio with noise according to the noise power spectral density and the second noise audio to generate noise-reduced clean voice;
S24: performing voice recognition on the clean voice, and determining information corresponding to the clean voice.
In this embodiment, the intelligent voice device establishes a connection with the noise collection microphone in advance, for example over a wireless network, so that the intelligent voice device can receive the noise collected by the noise collection microphone in real time.
For step S21, in use, the speech recognition device and the noise collection microphone may be connected to each other directly or through separate connections, and they operate in the same microphone network with synchronous multi-channel signal acquisition. The noise collection microphone collects noise audio and sends it to the speech recognition device (that is, the intelligent voice device) in real time; the speech recognition device receives the first noise audio collected by the noise collection microphone in real time while also collecting the first noisy speech audio itself.
For step S22, the first noisy speech audio and the first noise audio are then input to an echo cancellation module, which outputs a speech signal with most of the device self-noise removed. The reference input of the echo cancellation algorithm is the source of the echo to be cancelled: for a device containing a loudspeaker, it is the audio being played; for the devices to which this method relates, it is the signal picked up at the noise source, that is, by the noise microphone. The microphone input of the echo cancellation algorithm is the signal containing both echo and speech, that is, the signal collected by the voice microphone. Echo cancellation exploits the correlation between the reference input and the microphone input, using methods such as linear adaptive filtering and residual echo suppression. The second noisy speech audio and the second noise audio are thereby determined.
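As a non-limiting illustration of the linear adaptive filter mentioned for step S22, the following Python sketch uses a normalised least-mean-squares (NLMS) filter, with the noise microphone signal as the reference input and the voice microphone signal as the microphone input. The embodiment does not state which adaptive algorithm is used, so NLMS, the tap count, the step size, and the synthetic signals are all assumptions made for illustration.

```python
import math
import random

def nlms_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract from `mic` the component that is linearly predictable from `ref`.

    mic: samples from the voice microphone (speech plus device self-noise)
    ref: samples from the noise microphone (reference input)
    Returns the residual signal, i.e. the estimate of the speech.
    """
    weights = [0.0] * taps
    history = [0.0] * taps              # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate              # what remains after removing the estimated noise
        energy = sum(h * h for h in history) + eps
        step = mu * err / energy        # normalised step size
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(err)
    return residual

def mean_power(samples):
    """Average squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)

# Synthetic demo with made-up signals: the acoustic path from the noise
# source to the voice microphone is modelled as a short FIR filter.
random.seed(0)
n = 4000
ref = [random.gauss(0.0, 1.0) for _ in range(n)]
path = [0.6, 0.3, -0.2]
noise_at_mic = [sum(c * ref[i - k] for k, c in enumerate(path) if i - k >= 0)
                for i in range(n)]
speech = [0.1 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
mic = [s + v for s, v in zip(speech, noise_at_mic)]
cleaned = nlms_cancel(mic, ref)
```

In this sketch the speech is not present in the reference channel, so the filter removes only the self-noise. If speech leaked into the reference, it would be partly cancelled as well, which is the self-cancellation problem that the 0 dB analog gain setting is intended to avoid.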
For step S23, if the power spectral density of the second noise audio changes little over time, that is, the noise is stationary, a post-filtering module may be connected after the echo cancellation module. The post-filtering algorithm suppresses noise by estimating the noise power spectral density in real time and removing the estimated noise from the noisy signal, while introducing little or no speech distortion. For example, devices with a fan, such as floor sweepers, vacuum cleaners, and range hoods, generate stationary wind noise during operation, and this noise can be reduced by the post-filtering module. The noise-reduced clean voice is then generated.
For step S24, after the clean voice is determined, voice recognition is performed to determine the information corresponding to the clean voice, and a wake-up operation or voice interaction operation is carried out. The method does not rely on techniques such as pattern matching or neural networks, which handle noise only for known product types. Pattern matching and neural network methods need a large amount of supporting data (such as recordings of noise in various scenes) and cannot achieve good noise reduction for noise types that were not recorded; they can be improved only by adding data, that is, they are sensitive to the training data and do not generalize. The present method can adapt to various products by adjusting specific algorithms and parameters. In addition, pattern matching and neural networks require far more computation than the signal-level noise reduction of the present method, placing high demands on the computing power and memory of the intelligent voice device, whereas the present method occupies few resources and can be used in intelligent devices with little memory or computing power. Finally, because of their heavy computation and model characteristics, pattern matching and neural networks often introduce response delay, which users experience as slow wake-up response. The present method supports real-time processing and avoids this delay problem.
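As a non-limiting illustration of post-filtering for stationary noise, the following Python sketch tracks a per-bin noise power spectral density by recursive averaging and derives a Wiener-style suppression gain from it. The embodiment does not specify the estimator or the gain rule, so both are assumptions here, as are the smoothing factor, the update threshold, and the gain floor. The short-time Fourier transform is omitted: the input is assumed to be a sequence of power-spectrum frames.

```python
def track_noise_psd(frames, alpha=0.9, speech_ratio=2.0):
    """Track a per-bin noise PSD over successive power-spectrum frames.

    A bin is updated by recursive averaging only while its power stays below
    `speech_ratio` times the current estimate, so louder speech is skipped.
    Assumes the first frame contains noise only.
    """
    noise = list(frames[0])
    estimates = []
    for frame in frames:
        for k, power in enumerate(frame):
            if power < speech_ratio * noise[k]:
                noise[k] = alpha * noise[k] + (1.0 - alpha) * power
        estimates.append(list(noise))
    return estimates

def suppression_gains(frame, noise, floor=0.1):
    """Wiener-style gain per bin, 1 - noise/power, limited below by `floor`."""
    gains = []
    for power, n in zip(frame, noise):
        gain = 1.0 - n / power if power > 0.0 else 0.0
        gains.append(max(floor, gain))
    return gains

# Made-up example: 20 noise-only frames, then speech appears in bin 1.
frames = [[1.0, 1.0, 1.0, 1.0] for _ in range(20)]
frames += [[1.0, 11.0, 1.0, 1.0] for _ in range(5)]
noise_now = track_noise_psd(frames)[-1]
gains = suppression_gains(frames[-1], noise_now)
```

The gain floor limits how strongly any bin is attenuated, which is one common way to trade residual noise against speech distortion.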
According to the implementation method, the voice with noise and the noise audio are synchronously acquired and input to the echo cancellation module, so that the self-noise of the equipment in the voice microphone acquisition signal is eliminated. And noise reduction processing is performed on the signal level, a large amount of calculation is not needed, the requirement on intelligent voice equipment is not high, and the method is more widely applicable. The time delay is low, the user experience is improved, and meanwhile, the accuracy rate of voice recognition and the awakening success rate are also ensured.
As an implementation manner, in this embodiment, after performing echo cancellation on the first noisy speech audio and the first noise audio, respectively, and determining a second noisy speech audio and a second noise audio after echo cancellation, the method further includes:
performing beam forming processing by using phase differences among microphones in a microphone array in the voice recognition equipment, enhancing voice signals in the voice sound source direction of the microphone array, suppressing noise signals in at least one non-voice sound source direction, and determining third voice with noise and third noise audio;
and estimating the noise power spectral density of the third noisy speech audio in real time, and performing peripheral noise reduction on the third noisy speech audio according to the noise power spectral density and the third noise audio to generate noise-reduced clean speech.
The microphone array includes at least: a dual-microphone array, a linear four-microphone array, a circular four-microphone array, and a circular six-microphone array.
In this embodiment, if the voice microphone is a microphone array, such as a dual-microphone array, a linear four-microphone array, or a circular six-microphone array, a beamforming module may be connected after the echo cancellation module. The beamforming algorithm uses the phase-difference and amplitude-difference information between each pair of microphones in the array to enhance voice from the desired direction and suppress noise from undesired directions, that is, non-voice directions, thereby obtaining a good noise reduction effect.
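As a non-limiting illustration of beamforming from inter-microphone delays, the following Python sketch implements a basic delay-and-sum beamformer with integer sample delays. The embodiment does not name a specific beamforming algorithm, so delay-and-sum is an assumption, as are the uniform linear array geometry, the 343 m/s speed of sound, and the synthetic two-microphone signals.

```python
import math

def steering_delays(num_mics, spacing_m, angle_deg, sample_rate, speed_of_sound=343.0):
    """Integer sample delays for a uniform linear array (angle from broadside)."""
    per_mic = spacing_m * math.sin(math.radians(angle_deg)) / speed_of_sound
    return [round(m * per_mic * sample_rate) for m in range(num_mics)]

def delay_and_sum(channels, delays):
    """Advance each channel by its delay so the target direction lines up, then average."""
    n = len(channels[0])
    output = []
    for i in range(n):
        total = 0.0
        for channel, delay in zip(channels, delays):
            j = i + delay
            if 0 <= j < n:
                total += channel[j]
        output.append(total / len(channels))
    return output

# Made-up two-microphone example: the second microphone hears the source
# 3 samples later than the first.
n = 120
source = [math.sin(2 * math.pi * i / 6) for i in range(n)]
mic0 = source
mic1 = [0.0] * 3 + source[:-3]
steered = delay_and_sum([mic0, mic1], [0, 3])      # look toward the source
unsteered = delay_and_sum([mic0, mic1], [0, 0])    # look toward broadside
```

When the delays match the source direction the two channels add coherently, and when they do not, the same signal is attenuated. In this contrived example the mismatch is exactly half a period, so the signal cancels.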
It can be seen from this embodiment that the voice in the desired direction (for example, the direction in which the user is located) is enhanced through the beamforming, and the noise in the undesired direction (i.e., the noise in the non-voice direction, that is, the direction in which the user is not located) is suppressed, so that the noise reduction effect can be further improved.
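As a non-limiting illustration of the order of the processing stages described in these embodiments, the following Python sketch chains echo cancellation, optional beamforming, post-filtering, and recognition. The function name and the stage callables are hypothetical placeholders; each stage is supplied by the caller and no particular implementation is implied.

```python
def recognize(voice_frame, noise_frame, echo_cancel, post_filter, recognizer, beamform=None):
    """Run the stages in the described order and return the recognition result.

    echo_cancel(voice, noise) -> (voice, noise)   # second noisy speech and noise audio
    beamform(voice, noise)    -> (voice, noise)   # optional, for microphone arrays
    post_filter(voice, noise) -> clean            # noise reduction using the noise PSD
    recognizer(clean)         -> result           # information corresponding to the speech
    """
    voice, noise = echo_cancel(voice_frame, noise_frame)
    if beamform is not None:
        voice, noise = beamform(voice, noise)
    clean = post_filter(voice, noise)
    return recognizer(clean)
```

Passing the stages as callables keeps the sketch independent of any particular algorithm, so the beamforming stage can simply be omitted for a single voice microphone.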
Fig. 3 is a schematic structural diagram of a noise self-acquisition system for a noise device according to an embodiment of the present invention, which can execute the noise self-acquisition method for a noise device according to any of the above embodiments and is configured in a terminal.
The noise self-acquisition system for the noise equipment provided by the embodiment comprises: an analog gain configuration program module 11 and a noise acquisition program module 12.
The analog gain configuration program module 11 is configured to receive analog gain configuration information by the noise acquisition microphone, and configure a signal acquisition mode according to the analog gain configuration information; the noise acquisition program module 12 is configured to perform multi-channel signal acquisition in the signal acquisition mode, and send the acquired noise audio to the speech recognition device.
Further, the analog gain configuration information is 0 decibel, and is used for preventing the noise collection microphone from collecting speaking voice.
As one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
the noise acquisition microphone receives analog gain configuration information and configures a signal acquisition mode according to the analog gain configuration information;
and carrying out multi-channel signal acquisition through the signal acquisition mode, and sending the acquired noise audio to voice recognition equipment.
The non-volatile computer-readable storage medium may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the noise self-acquisition method for a noise device in any of the method embodiments described above.
Fig. 4 is a schematic structural diagram of a speech recognition system according to an embodiment of the present invention, which can execute the speech recognition method according to any of the above embodiments and is configured in a terminal.
The speech recognition system provided by the embodiment comprises: an audio acquisition program module 21, an echo cancellation program module 22, a noise reduction program module 23 and an identification program module 24.
The audio acquisition program module 21 is configured for the voice recognition device to acquire a first voice audio with noise in real time and to synchronously receive a first noise audio sent by at least one noise acquisition microphone; the echo cancellation program module 22 is configured to perform echo cancellation on the first noisy speech audio and the first noise audio, respectively, and determine a second noisy speech audio and a second noise audio after echo cancellation; the noise reduction program module 23 is configured to estimate a noise power spectral density of the second noisy speech audio in real time, and perform peripheral noise reduction on the second noisy speech audio according to the noise power spectral density and the second noise audio to generate a noise-reduced clean speech; the recognition program module 24 is configured to perform speech recognition on the clean speech and determine information corresponding to the clean speech.
Further, after the echo cancellation program module, the system further comprises:
a beam forming program module, configured to perform beam forming processing using phase differences between microphones in a microphone array in the speech recognition device, enhance speech signals in a speech sound source direction of the microphone array, suppress noise signals in at least one non-speech sound source direction, and determine a third noisy speech and a third noise audio;
and the noise reduction program module is used for estimating the noise power spectral density of the third noisy speech audio in real time, and performing peripheral noise reduction on the third noisy speech audio according to the noise power spectral density and the third noise audio to generate noise-reduced clean speech.
Further, the microphone array includes at least: a dual-microphone array, a linear four-microphone array, a circular four-microphone array, and a circular six-microphone array.
As one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
the voice recognition equipment acquires a first voice audio with noise in real time and synchronously receives a first noise audio sent by at least one noise acquisition microphone;
respectively carrying out echo cancellation on the first voice audio with noise and the first noise audio, and determining a second voice audio with noise and a second noise audio after echo cancellation;
estimating the noise power spectral density of the second voice audio with noise in real time, and performing peripheral noise reduction on the second voice audio with noise according to the noise power spectral density and the second noise audio to generate noise-reduced clean voice;
and performing voice recognition on the clean voice, and determining information corresponding to the clean voice.
The non-volatile computer-readable storage medium may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the speech recognition method in any of the method embodiments described above.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the steps of the noise self-acquisition method for a noisy device and of the speech recognition method of any embodiment of the present invention.
The clients of the embodiments of the present application exist in various forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication capabilities, and their primary purpose is to provide voice and data communication. Such terminals include smartphones, multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players, handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Other electronic devices with speech processing capabilities.
In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A noise self-acquisition method for a noisy device, applied to a noise-acquisition microphone disposed at a noise source of the noisy device, the method comprising:
receiving, by the noise acquisition microphone, analog gain configuration information, and configuring a signal acquisition mode according to the analog gain configuration information;
and performing multi-channel signal acquisition in the signal acquisition mode, and sending the acquired noise audio to a speech recognition device.
2. The method of claim 1, wherein the analog gain configuration information specifies a gain of 0 decibels, so as to prevent the noise acquisition microphone from collecting spoken speech.
3. A speech recognition method, applied to a speech recognition device that establishes a connection with the noise acquisition microphone, the method comprising:
acquiring, by the speech recognition device, a first noisy speech audio in real time, and synchronously receiving a first noise audio sent by at least one noise acquisition microphone;
performing echo cancellation on the first noisy speech audio and the first noise audio respectively, and determining an echo-cancelled second noisy speech audio and second noise audio;
estimating the noise power spectral density of the second noisy speech audio in real time, and performing peripheral noise reduction on the second noisy speech audio according to the noise power spectral density and the second noise audio to generate noise-reduced clean speech;
and performing speech recognition on the clean speech, and determining the information corresponding to the clean speech.
4. The method of claim 3, wherein, after performing echo cancellation on the first noisy speech audio and the first noise audio respectively and determining the echo-cancelled second noisy speech audio and second noise audio, the method further comprises:
performing beamforming using the phase differences between the microphones of a microphone array in the speech recognition device, enhancing the speech signal arriving from the speech source direction of the microphone array, suppressing noise signals from at least one non-speech source direction, and determining a third noisy speech audio and a third noise audio;
and estimating the noise power spectral density of the third noisy speech audio in real time, and performing peripheral noise reduction on the third noisy speech audio according to the noise power spectral density and the third noise audio to generate noise-reduced clean speech.
5. The method of claim 4, wherein the microphone array includes at least: a dual-microphone array, a linear four-microphone array, a circular four-microphone array, and a circular six-microphone array.
6. A noise self-acquisition system for a noisy device, applied to a noise-acquisition microphone arranged at a noise source of said noisy device, said system comprising:
an analog gain configuration program module, configured to receive, by the noise acquisition microphone, analog gain configuration information, and to configure a signal acquisition mode according to the analog gain configuration information;
and a noise acquisition program module, configured to perform multi-channel signal acquisition in the signal acquisition mode and to send the acquired noise audio to a speech recognition device.
7. The system of claim 6, wherein the analog gain configuration information specifies a gain of 0 decibels, so as to prevent the noise acquisition microphone from collecting spoken speech.
8. A speech recognition system, applied to a speech recognition device that establishes a connection with the noise acquisition microphone, the system comprising:
an audio acquisition program module, configured to acquire, by the speech recognition device, a first noisy speech audio in real time, and to synchronously receive a first noise audio sent by at least one noise acquisition microphone;
an echo cancellation program module, configured to perform echo cancellation on the first noisy speech audio and the first noise audio respectively, and to determine an echo-cancelled second noisy speech audio and second noise audio;
a noise reduction program module, configured to estimate the noise power spectral density of the second noisy speech audio in real time, and to perform peripheral noise reduction on the second noisy speech audio according to the noise power spectral density and the second noise audio to generate noise-reduced clean speech;
and a recognition program module, configured to perform speech recognition on the clean speech and to determine the information corresponding to the clean speech.
9. The system of claim 8, wherein, after the echo cancellation program module, the system further comprises:
a beamforming program module, configured to perform beamforming using the phase differences between the microphones of a microphone array in the speech recognition device, enhance the speech signal arriving from the speech source direction of the microphone array, suppress noise signals from at least one non-speech source direction, and determine a third noisy speech audio and a third noise audio;
and a noise reduction program module, configured to estimate the noise power spectral density of the third noisy speech audio in real time, and to perform peripheral noise reduction on the third noisy speech audio according to the noise power spectral density and the third noise audio to generate noise-reduced clean speech.
10. The system of claim 9, wherein the microphone array includes at least: a dual-microphone array, a linear four-microphone array, a circular four-microphone array, and a circular six-microphone array.
CN201911225468.XA 2019-12-04 2019-12-04 Voice recognition method and system Active CN110931007B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911225468.XA CN110931007B (en) 2019-12-04 2019-12-04 Voice recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911225468.XA CN110931007B (en) 2019-12-04 2019-12-04 Voice recognition method and system

Publications (2)

Publication Number Publication Date
CN110931007A CN110931007A (en) 2020-03-27
CN110931007B CN110931007B (en) 2022-07-12

Family

ID=69857724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911225468.XA Active CN110931007B (en) 2019-12-04 2019-12-04 Voice recognition method and system

Country Status (1)

Country Link
CN (1) CN110931007B (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1210608A (en) * 1996-02-01 1999-03-10 艾利森电话股份有限公司 Noisy speech parameter enhancement method and apparatus
JP2001318687A (en) * 2000-02-28 2001-11-16 Mitsubishi Electric Corp Speech recognition device
US6674865B1 (en) * 2000-10-19 2004-01-06 Lear Corporation Automatic volume control for communication system
CN101778183A (en) * 2009-01-13 2010-07-14 华为终端有限公司 Method and device for suppressing residual echo
US20130054231A1 (en) * 2011-08-29 2013-02-28 Intel Mobile Communications GmbH Noise reduction for dual-microphone communication devices
EP2701145A1 (en) * 2012-08-24 2014-02-26 Retune DSP ApS Noise estimation for use with noise reduction and echo cancellation in personal communication
US9685171B1 (en) * 2012-11-20 2017-06-20 Amazon Technologies, Inc. Multiple-stage adaptive filtering of audio signals
CN105976826A (en) * 2016-04-28 2016-09-28 中国科学技术大学 Speech noise reduction method applied to dual-microphone small handheld device
US10079026B1 (en) * 2017-08-23 2018-09-18 Cirrus Logic, Inc. Spatially-controlled noise reduction for headsets with variable microphone array orientation
CN107464552A (en) * 2017-08-24 2017-12-12 徐银海 A kind of distributed locomotive active noise reduction system and method
CN107680609A (en) * 2017-09-12 2018-02-09 桂林电子科技大学 A kind of double-channel pronunciation Enhancement Method based on noise power spectral density
CN108899052A (en) * 2018-07-10 2018-11-27 南京邮电大学 A kind of Parkinson's sound enhancement method based on mostly with spectrum-subtraction
CN109616139A (en) * 2018-12-25 2019-04-12 平安科技(深圳)有限公司 Pronunciation signal noise power spectral density estimation method and device
CN109727605A (en) * 2018-12-29 2019-05-07 苏州思必驰信息科技有限公司 Handle the method and system of voice signal
CN110148420A (en) * 2019-06-30 2019-08-20 桂林电子科技大学 A kind of audio recognition method suitable under noise circumstance

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
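Peng Yuchen et al.: "Research on speech enhancement by spectral subtraction based on multitaper spectrum estimation", Computer Engineering and Applications *
Yang Yi et al.: "Research on microphone arrays and their noise cancellation performance", Computer Engineering *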

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112992170A (en) * 2021-01-29 2021-06-18 青岛海尔科技有限公司 Model training method and device, storage medium and electronic device
CN112992170B (en) * 2021-01-29 2022-10-28 青岛海尔科技有限公司 Model training method and device, storage medium and electronic device
CN113286047A (en) * 2021-04-22 2021-08-20 维沃移动通信(杭州)有限公司 Voice signal processing method and device and electronic equipment
CN113286047B (en) * 2021-04-22 2023-02-21 维沃移动通信(杭州)有限公司 Voice signal processing method and device and electronic equipment
CN113838472A (en) * 2021-08-24 2021-12-24 盛景智能科技(嘉兴)有限公司 Voice noise reduction method and device
CN113808605A (en) * 2021-09-29 2021-12-17 睿云联(厦门)网络通讯技术有限公司 Building intercom system-based voice enhancement method, device and equipment
CN113808605B (en) * 2021-09-29 2023-09-12 睿云联(厦门)网络通讯技术有限公司 Voice enhancement method, device and equipment based on building intercom system

Also Published As

Publication number Publication date
CN110931007B (en) 2022-07-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant before: AI SPEECH Co.,Ltd.

GR01 Patent grant