CN109473111B - Voice enabling device and method - Google Patents

Voice enabling device and method

Info

Publication number
CN109473111B
CN109473111B (Application CN201811644724.4A)
Authority
CN
China
Prior art keywords
audio data
sound source
voice
wake
data
Prior art date
Legal status
Active
Application number
CN201811644724.4A
Other languages
Chinese (zh)
Other versions
CN109473111A (en)
Inventor
雷雄国
涂长宇
郑炜乔
郭彭亮
刘强
何家锋
徐瑞婷
卢玉环
Current Assignee
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Sipic Technology Co Ltd
Priority to CN201811644724.4A
Publication of CN109473111A
Application granted
Publication of CN109473111B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces, the user being prompted to utter a password or a predefined phrase
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02082 Noise filtering, the noise being echo or reverberation of the speech
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a voice enabling device comprising a sound source acquisition module, a voice processing module and a data transmission module. The sound source acquisition module acquires audio data and outputs it to the voice processing module; the voice processing module processes the audio data to generate first audio data and second audio data; the data transmission module exchanges data with an external device and outputs the first audio data and the second audio data to the external device connected to it. The invention also discloses a method of voice enablement using the device. The device and method give a host device without speech recognition a voice interaction function, overcome the noise-processing problem of speech recognition in the prior art, and optimize the speech recognition result, while reducing power consumption and leaving host resources unoccupied.

Description

Voice enabling device and method
Technical Field
The invention relates to the technical field of voice interaction, in particular to a voice enabling device and method.
Background
With the development of science and technology, smart devices have become increasingly popular, yet most smart devices currently on the market have no voice interaction capability. Most commonly used devices that do offer voice interaction rely on near-field pickup or a simple single-turn dialogue design, so noise handling and speech recognition accuracy in voice interaction are poor; moreover, the audio played by the host device itself cannot be cancelled, and far-field speech signal processing cannot be achieved.
On the other hand, the voice interaction of most devices runs on the host device itself, which affects power consumption to some extent and often cannot meet low-power requirements; likewise, most front-end signal processing is performed on the host device, which occupies substantial system resources and degrades system efficiency.
Disclosure of Invention
In view of these problems, the invention aims to provide a technical solution for far-field voice interaction with a host device, in particular a solution that conveniently extends the host device with far-field voice interaction capability without changing the host device's structure.
According to a first aspect of the present invention, there is provided a voice enabling device comprising:
a sound source acquisition module for acquiring audio data and outputting the audio data to the voice processing module;
a voice processing module for processing the audio data and generating first audio data;
and a data transmission module for exchanging data with an external device and outputting the first audio data to the external device connected to it.
According to a second aspect of the present invention, there is provided a method of voice enablement using the voice enabling device, comprising the steps of:
connecting the voice enabling device to a host device through the data transmission module;
the voice enabling device acquiring audio data and processing it to generate first audio data and second audio data;
the voice enabling device outputting the first audio data and the second audio data to the host device.
With the device and method provided by the invention, a host device without speech recognition can be given a voice interaction function: the device communicates directly with the host device through the data transmission module to acquire and process voice information, so that any host device connected to it conveniently gains far-field voice interaction capability, which greatly eases the expansion of the host device's voice functions. In addition, the device and method of embodiments of the invention perform front-end signal processing on the audio data themselves, overcoming the increased power consumption, resource occupation and similar problems caused in the prior art by the host device having to perform front-end signal processing.
Drawings
FIG. 1 is a schematic block diagram of a speech enabling apparatus according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a speech enabling apparatus according to yet another embodiment of the present invention;
FIG. 3 is a flowchart of a method for implementing voice enablement using a voice enabling device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In the present invention, "module," "device," "system," and the like refer to a related entity applied to a computer: hardware, a combination of hardware and software, or software in execution. In particular, an element may be, but is not limited to, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Both an application or script running on a server and the server itself may be an element. One or more elements may reside in a process and/or thread of execution, and an element may be localized on one computer and/or distributed between two or more computers, and may run from various computer-readable media. Elements may also communicate by way of local and/or remote processes, for example in accordance with a signal having one or more data packets, such as data from one element interacting with another element in a local system or distributed system, or with other systems across a network such as the Internet.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises that element.
The invention is described in further detail below with reference to the accompanying drawings.
FIG. 1 schematically shows a block diagram of a voice enabling device according to an embodiment of the invention. As shown in FIG. 1, the voice enabling device comprises a sound source acquisition module 1, a voice processing module 2 and a data transmission module 3.
The sound source acquisition module 1 acquires audio data and outputs it to the voice processing module 2. By way of example, the module is implemented as a plurality of microphones, in particular movable directional microphones, so that the sound source can be localized and the user can issue voice interaction instructions directly toward the module, such as "I want to record", achieving far-field pickup. Because the microphones are movable, their orientation can be adjusted to enhance the sound source direction and attenuate noise from other angles, ensuring audio quality.
The voice processing module 2 processes the audio data to generate first audio data and second audio data. The first audio data is the voice command uttered by the user; the second audio data is a wake-up control signal, i.e. data related to the wake-up result, obtained by performing voice wake-up inside the device on the user's voice command. In other embodiments, the voice enabling device may be arranged to generate only the first audio data, i.e. to perform only front-end signal processing without wake-up recognition.
The data transmission module 3 exchanges data with an external device and outputs the first audio data and the second audio data to the external device connected to it, so that a host device without a voice interaction function can implement voice interaction based on the first and second audio data. The data transmission module 3 supports at least one of the USB, Bluetooth and WiFi protocols and may, for example, be implemented as a USB interface. Where the voice enabling device performs only front-end signal processing, the data transmission module 3 outputs only the first audio data to the connected external device.
The sound source acquisition module 1 comprises a first sound source acquisition component 101 and a second sound source acquisition component 102. The first sound source acquisition component 101 acquires sound source audio data; the second sound source acquisition component 102 acquires reference audio data. Illustratively, each component is implemented as two movable microphones that record speech at 16 kHz/16-bit. When acquiring sound source audio data, the user may speak directly toward the two movable microphones, and the first sound source acquisition component 101 records the sound source audio. The reference audio data mainly targets the background sound of the connected host device: the movable microphones can be attached directly to the host device's sound outlet (such as a loudspeaker), or rotated through an angle toward the direction of the sound source to be suppressed, so that the audio played by the host device, or the sound from that direction, is collected as reference audio data. Both sets of audio data are transmitted to the voice processing module 2.
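Purely as an illustrative sketch (the patent specifies only the 16 kHz/16-bit sample format, not a channel layout or driver API), the two acquisition components might be represented on the processing side as follows; the interleaved buffer layout, channel ordering and frame size are assumptions:

```python
import numpy as np

SAMPLE_RATE = 16_000   # 16 kHz capture, as described for the acquisition components
FRAME_SAMPLES = 256    # hypothetical frame size; the patent does not specify one

def split_channels(interleaved: np.ndarray, num_channels: int = 4):
    """Split an interleaved int16 capture buffer into per-microphone channels.

    Assumes channels 0-1 belong to the first sound source acquisition component 101
    and channels 2-3 to the second (reference) component 102; the channel order is
    an assumption, not taken from the patent.
    """
    frames = interleaved.reshape(-1, num_channels)
    source_mics = frames[:, 0:2]      # user-facing microphones (component 101)
    reference_mics = frames[:, 2:4]   # microphones at the host's sound outlet (component 102)
    return source_mics, reference_mics

# Usage with a synthetic one-frame buffer of 4-channel 16-bit audio:
buf = np.zeros(FRAME_SAMPLES * 4, dtype=np.int16)
src, ref = split_channels(buf)
print(src.shape, ref.shape)  # (256, 2) (256, 2)
```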
The voice processing module 2 comprises a noise cancellation unit 201 and a beam forming unit 203.
The noise cancellation unit 201 denoises the sound source audio data using both the sound source audio data and the reference audio data, so that the speech recognition result can be optimized, a more accurate recognition effect is obtained, and the background-noise interference of the prior art is overcome.
The beam forming unit 203 performs beam forming on the denoised sound source audio data, filtering it to obtain clean first audio data that can be output to the external device.
The noise cancellation unit 201 mainly applies DSP (digital signal processing) noise reduction and comprises an analog-to-digital conversion component 2011, an echo cancellation component 2012 and a digital-to-analog conversion component 2013. The analog-to-digital conversion component 2011 converts the sound source audio data and the reference audio data into digital signals using a conventional analog-to-digital conversion circuit. The echo cancellation component 2012 performs a subtraction on the digital signals produced by the analog-to-digital conversion component: the digital signal corresponding to the reference audio data is subtracted from the digital signal corresponding to the sound source audio data, yielding the denoised sound source digital signal. The digital-to-analog conversion component 2013 converts the denoised sound source digital signal back into analog form, producing the denoised sound source audio data. The cooperation of these components yields audio data from which the reference sound has been removed.
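The subtraction described above can be sketched as follows. This is a minimal illustration of the reference-subtraction step under the assumption that the two signals are already time-aligned and of equal length; a practical echo canceller would normally estimate the echo path with an adaptive filter (e.g. NLMS) rather than subtracting the raw reference:

```python
import numpy as np

def cancel_reference(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Subtract the reference (host playback) signal from the source signal."""
    src = source.astype(np.int32)            # widen to avoid int16 overflow
    ref = reference.astype(np.int32)
    cleaned = np.clip(src - ref, -32768, 32767)
    return cleaned.astype(np.int16)

# Usage: a 440 Hz "command" mixed with a 1 kHz "host playback" tone at 16 kHz.
sr = 16_000
t = np.arange(sr) / sr
command = (0.3 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
playback = (0.3 * np.sin(2 * np.pi * 1000 * t) * 32767).astype(np.int16)
mixed = command + playback                     # what the source microphones would hear
recovered = cancel_reference(mixed, playback)  # approximately equals `command`
```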
The beam forming unit 203 may be implemented with reference to the prior art, so a detailed description of its implementation is omitted.
According to this embodiment, a host device without voice interaction capability can be given that capability, and the collected voice commands and content undergo front-end signal processing such as denoising and filtering, yielding a better speech recognition result. At the same time, the device lets the external equipment achieve far-field pickup simply: integrating several microphones makes it easy to localize the sound source, enhance the sound source direction, and attenuate noise from other angles, ensuring audio quality. By placing a microphone close to the host device's sound outlet specifically for the background sound the host device emits, the audio played by the host device, or sound from the direction to be suppressed, is collected as a reference for echo cancellation, making the device robust against this interference and improving recognition of the target audio.
In addition, the front-end signal processing, wake-up and related functions are integrated into a hardware chip, so host-device system resources are not occupied; in terms of power consumption, the voice algorithms can be heavily optimized on a dedicated voice chip, meeting low-power requirements.
FIG. 2 is a schematic block diagram of a voice enabling device according to still another embodiment of the present invention. As shown in FIG. 2, the voice processing module 2 of the voice enabling device further comprises a wake-up verification unit 202 and a second audio data generating unit 205.
The wake-up verification unit 202 performs wake-up recognition on the denoised sound source audio data and generates a wake-up control signal and a wake-up angle. Recognition is performed on the semantics by analyzing the voice content of the denoised sound source audio, or its corresponding semantic interpretation, to obtain the wake-up word the user intends to express; this may be implemented with reference to the prior art. The wake-up angle is an additional parameter introduced by the inventors alongside the semantic analysis, and may be obtained as follows: at the point of sound collection there is an array formed by several microphones, and the data collected by these microphones are supplied simultaneously to the wake-up verification unit 202. Using the wake-up speech algorithm, the unit determines the sound source point from the time delays and energy distribution of the audio received by the different microphones. Since every frame of audio carries sound localization information, the localization result can be obtained by confirming the sound source point during wake-up verification, and this localization result is output as the wake-up angle. Determining the time-delay and energy distribution of the audio with a speech algorithm can be achieved with prior-art techniques.
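The patent does not disclose a particular localization algorithm; purely as a hedged illustration of the time-delay idea, the sketch below estimates a direction of arrival for a single microphone pair from a plain cross-correlation, with an assumed microphone spacing and the far-field plane-wave model:

```python
import numpy as np

SR = 16_000             # sample rate (Hz)
MIC_SPACING = 0.06      # assumed spacing between the two microphones (m); not from the patent
SPEED_OF_SOUND = 343.0  # m/s

def estimate_angle(mic_a: np.ndarray, mic_b: np.ndarray) -> float:
    """Estimate a direction of arrival (degrees from broadside) for one mic pair.

    Illustrative only: finds the inter-microphone time delay with plain
    cross-correlation, then converts the delay to an angle. A production
    wake-up engine would do this per frame and fuse it with the wake-word decision.
    """
    corr = np.correlate(mic_a.astype(np.float64), mic_b.astype(np.float64), mode="full")
    lag = np.argmax(corr) - (len(mic_b) - 1)           # delay in samples between the mics
    tau = lag / SR                                     # delay in seconds
    sin_theta = np.clip(tau * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# Usage: the same noise signal on both mics, one copy delayed by 2 samples.
rng = np.random.default_rng(0)
sig = rng.standard_normal(4096)
print(round(abs(estimate_angle(sig, np.roll(sig, 2))), 1))  # about 45.6 degrees in magnitude
```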
Preferably, in this embodiment the voice processing module further comprises a first audio data generating unit 204. The beam forming unit 203 performs beam forming on the denoised sound source audio data and generates three audio streams, i.e. three 16 kHz audio outputs. The first audio data generating unit 204 processes the three audio streams generated by the beam forming unit 203 to produce the first audio data output; which of the three streams is taken as the first audio data depends on the wake-up angle indicated by the sound source localization result, which is output together with the wake-up result during wake-up processing.
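For illustration only, if the three beams are assumed to cover fixed, adjacent angular sectors (a layout the patent does not specify), selecting the stream to output as the first audio data from the wake-up angle reduces to a simple lookup:

```python
from typing import Sequence
import numpy as np

# Assumed beam sectors in degrees; the patent only states that three 16 kHz beams exist.
BEAM_SECTORS = [(-90.0, -30.0), (-30.0, 30.0), (30.0, 90.0)]

def select_beam(beams: Sequence[np.ndarray], wake_angle_deg: float) -> np.ndarray:
    """Pick the beamformed stream whose sector contains the wake-up angle.

    `beams` holds the three outputs of the beam forming unit 203; the returned
    stream is what the first audio data generating unit 204 would forward.
    """
    for (lo, hi), beam in zip(BEAM_SECTORS, beams):
        if lo <= wake_angle_deg < hi:
            return beam
    return beams[1]  # fall back to the central beam if the angle is out of range

# Usage: three dummy 16 kHz beams; a wake-up detected at +45 degrees selects the third beam.
beams = [np.zeros(16_000, dtype=np.int16) for _ in range(3)]
first_audio = select_beam(beams, wake_angle_deg=45.0)
```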
As for the second audio data, which contains the wake-up control signal: the wake-up control signal generated by the wake-up verification unit 202 is passed directly to the second audio data generating unit 205, which converts it into audio form (digital-to-audio conversion), likewise producing a 48 kHz audio output, i.e. the second audio data.
The two audio streams, namely the first audio data and the second audio data, are transmitted via the driver of the data transmission module 3 to the application layer of the host device. The application layer receives the two streams, splits the first audio data into three audio channels A, B and C, stores them in a circular queue used for backtracking, and continuously monitors the wake-up signal in the second audio data. When a wake-up signal is detected, the application layer determines from which of A, B and C the beam forming unit 203 obtained the wake-up, takes that channel as the recognition object, matches it with the wake-up signal, and thereby realizes voice interaction.
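A minimal host-side sketch of this application-layer flow is given below. All names are hypothetical, since the patent defines no host API; it assumes the two streams arrive as byte frames and that the wake-up payload carries the index of the beam that triggered the wake-up:

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class WakeEvent:
    beam_index: int   # which of the channels A/B/C triggered the wake-up (0, 1 or 2)
    wake_word: str

class HostApplicationLayer:
    """Hypothetical host-side handler for the two streams received over the data transmission module."""

    def __init__(self, history_frames: int = 200):
        # One circular queue per split channel A/B/C, used for backtracking.
        self.channels = [deque(maxlen=history_frames) for _ in range(3)]

    def on_first_audio(self, frame_a: bytes, frame_b: bytes, frame_c: bytes) -> None:
        """Store the three channels split out of the first audio data."""
        for queue, frame in zip(self.channels, (frame_a, frame_b, frame_c)):
            queue.append(frame)

    def on_second_audio(self, wake: Optional[WakeEvent]) -> None:
        """Monitor the second audio data and start recognition on a wake-up."""
        if wake is None:
            return
        recognition_object = b"".join(self.channels[wake.beam_index])
        self.start_recognition(recognition_object, wake.wake_word)

    def start_recognition(self, audio: bytes, wake_word: str) -> None:
        # Placeholder: hand the matched channel to the host's recognizer.
        print(f"wake word '{wake_word}': recognizing {len(audio)} bytes of audio")
```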
According to this embodiment, a host device without speech recognition is given a voice interaction function, the noise-processing problem of prior-art speech recognition is overcome, and the speech recognition result is optimized. The front-end signal processing, wake-up and related functions are integrated into the hardware chip, so host-device system resources are no longer occupied; in terms of power consumption, the voice algorithms can be heavily optimized on a dedicated voice chip, meeting low-power requirements.
FIG. 3 schematically shows a flowchart of a method for implementing voice enablement using a voice enabling device according to an embodiment of the present invention. As shown in FIG. 3, this embodiment includes the following steps:
step S301: the voice enabling device is connected to the main equipment through the data transmission module. The voice enabling device can be connected with the main equipment through a USB protocol, a Bluetooth protocol, a WIFI protocol and the like, and supports various types of main equipment.
Step S302: the voice enabling device acquires audio data and processes it to generate first audio data and second audio data. The audio data acquired by the voice enabling device comprises sound source audio data and reference audio data. Concretely, the sound source audio data is denoised using both the sound source audio data and the reference audio data, applying DSP noise-reduction techniques. To ease the denoising computation, the sound source audio data and the reference audio data are first converted into digital signals, a subtraction is performed on the converted digital signals, and the result of the subtraction is converted back into an analog signal, yielding the denoised sound source audio data and thereby optimizing the subsequent voice interaction.
Beam forming is then performed on the denoised sound source audio data to generate the first audio data, and wake-up recognition is performed on the denoised sound source audio data to generate the second audio data. During beam forming, the output audio is also selected according to the wake-up angle: since the sound source acquisition module 1 comprises several microphones, the beam forming algorithm produces multiple audio streams, each corresponding to enhancement at a different angle. Which stream is output as the first audio data depends on the wake-up angle indicated by the sound source localization, which is output together with the wake-up result during wake-up processing. A specific implementation may follow the principle of the apparatus of FIG. 2.
Step S303: the voice enabling device outputs the first audio data and the second audio data to the host device. The data transmission may follow step S301; a specific implementation is to provide, in the voice enabling device, multiple interfaces adapted to different types of host device.
According to this method, a host device without speech recognition is given a voice interaction function, the noise-processing problem of prior-art speech recognition is overcome, the speech recognition result is optimized, and the host device's power consumption is reduced without occupying its resources.
Taking a television as an example of the external host device, the specific way of using the voice enabling device of the invention on a television to achieve far-field pickup is as follows:
firstly, a user installs the voice enabling device on the top of the television, ensures that the microphone array of the main body part faces the habit direction of the user, and has no main barrier in the middle, so as to keep the horizontal angle as much as possible. Then, the USB cable of the voice enabling device is inserted into the connection position behind the television to keep power supply and signal transmission. And then the microphone array of the voice enabling device is fixed near the loudspeaker of the television by means of pasting and the like. Thereby completing the installation process of the voice enabling device.
In use, the voice enabling device picks up the user's voice through the microphone described above as the first sound source collection assembly 101, while the television's own output is picked up by the microphone affixed near the television's loudspeaker (i.e. the second sound source collection assembly 102 described above). The voice enabling device compares the two captured signals and filters out the self-emitted sound, leaving the command actively uttered by the user, which then undergoes further signal processing. Subsequent processing follows the method described above.
By feeding the audio in externally in this way, the system-level debugging work that a software loopback of the audio at the system layer would require is avoided, as are the terminal dependence and system adaptation work of a hardware loopback. At the same time, the real interference from the device's own sound output is reproduced more faithfully, avoiding the problems caused when the power amplifier, loudspeaker and other parts of the playback chain are out of sync with the sound signal.
From the above description of embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software on a general-purpose hardware platform, or by hardware. Based on this understanding, the part of the foregoing technical solution that contributes over the related art may be embodied in the form of a software product stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform the method described in the respective embodiments or parts thereof.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present application and are not limiting. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents, and such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (8)

1. A voice enabling device, characterized by comprising:
a sound source acquisition module for acquiring audio data and outputting the audio data to the voice processing module, the sound source acquisition module comprising a first sound source acquisition component for acquiring sound source audio data and a second sound source acquisition component for acquiring reference audio data;
a voice processing module for processing the audio data to generate first audio data and second audio data, wherein the first audio data is a voice command uttered by a user and the second audio data is a wake-up control signal; and
a data transmission module for exchanging data with an external device and outputting the first audio data and the second audio data to the external device connected to the data transmission module, wherein the external device is a host device without a voice interaction function, so that the host device implements voice interaction according to the first audio data and the second audio data;
wherein the voice processing module comprises:
a noise cancellation unit for denoising the sound source audio data according to the sound source audio data and the reference audio data;
a beam forming unit for performing beam forming on the denoised sound source audio data to generate three 16 kHz audio streams as output;
a wake-up verification unit for performing wake-up recognition on the denoised sound source audio data and outputting the wake-up control signal and a wake-up angle;
a first audio data generating unit for processing the three audio streams generated by the beam forming unit to generate the first audio data output; and
a second audio data generating unit for converting the wake-up control signal generated by the wake-up verification unit into audio to generate the second audio data output;
and wherein the data transmission module transmits the first audio data and the second audio data to an application layer of the host device, so that the application layer of the host device splits the first audio data into three audio channels for storage and continuously monitors the wake-up signal in the second audio data, and, when it detects the wake-up signal, determines the recognition object from among the three audio channels and matches the corresponding recognition object with the wake-up signal, thereby realizing voice interaction.
2. The device of claim 1, wherein the noise cancellation unit comprises:
an analog-to-digital conversion component for performing analog-to-digital conversion on the sound source audio data and the reference audio data to generate digital signals;
an echo cancellation component for performing a subtraction on the digital signals generated by the analog-to-digital conversion component to obtain a denoised sound source digital signal; and
a digital-to-analog conversion component for performing digital-to-analog conversion on the denoised sound source digital signal to generate the denoised sound source audio data.
3. The device of claim 1, wherein the first sound source acquisition component and the second sound source acquisition component are each implemented as at least two movable microphones.
4. The device of claim 3, wherein the data transmission module supports at least one of a USB protocol, a WiFi protocol, and a Bluetooth protocol.
5. A method of voice enablement using the voice enabling device of claim 1, comprising the steps of:
connecting the voice enabling device to a host device through the data transmission module;
the voice enabling device acquiring audio data and processing it to generate first audio data and second audio data;
the voice enabling device outputting the first audio data and the second audio data to the host device.
6. The method of claim 5, wherein the audio data acquired by the voice enabling device comprises sound source audio data and reference audio data, and wherein the processing of the audio data by the voice enabling device comprises:
denoising the sound source audio data according to the sound source audio data and the reference audio data;
performing beam forming on the denoised sound source audio data to generate the first audio data; and
performing wake-up recognition on the denoised sound source audio data to generate the second audio data.
7. The method of claim 6, wherein the voice enabling device acquires the sound source audio data by:
arranging the first sound source acquisition component of the voice enabling device toward the direction the user usually speaks from, and picking up the sound source audio through the first sound source acquisition component;
and wherein the voice enabling device acquires the reference audio data by:
fixing the second sound source acquisition component of the voice enabling device near a loudspeaker of the host device, and picking up the reference audio of the host device through the second sound source acquisition component.
8. The method of claim 7, wherein denoising the sound source audio data according to the sound source audio data and the reference audio data comprises:
converting the sound source audio data and the reference audio data into digital signals respectively;
performing a subtraction on the converted digital signals; and
converting the digital signal obtained after the subtraction into an analog signal to obtain the denoised sound source audio data.
CN201811644724.4A 2018-12-29 2018-12-29 Voice enabling device and method Active CN109473111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811644724.4A CN109473111B (en) 2018-12-29 2018-12-29 Voice enabling device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811644724.4A CN109473111B (en) 2018-12-29 2018-12-29 Voice enabling device and method

Publications (2)

Publication Number Publication Date
CN109473111A CN109473111A (en) 2019-03-15
CN109473111B (en) 2024-03-08

Family

ID=65678383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811644724.4A Active CN109473111B (en) 2018-12-29 2018-12-29 Voice enabling device and method

Country Status (1)

Country Link
CN (1) CN109473111B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110265029A (en) * 2019-06-21 2019-09-20 百度在线网络技术(北京)有限公司 Speech chip and electronic equipment
CN110213696B (en) * 2019-06-30 2021-10-22 联想(北京)有限公司 Audio device, signal processing method and system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101246687A (en) * 2008-03-20 2008-08-20 北京航空航天大学 Intelligent voice interaction system and method thereof
CN101753871A (en) * 2008-11-28 2010-06-23 康佳集团股份有限公司 Voice remote control TV system
CN202721771U (en) * 2012-04-24 2013-02-06 青岛海尔电子有限公司 Television system with audio recognition function
CN107566874A (en) * 2017-09-22 2018-01-09 百度在线网络技术(北京)有限公司 Far field speech control system based on television equipment
CN207603830U (en) * 2017-12-05 2018-07-10 炬芯(珠海)科技有限公司 A kind of household electrical appliance intelligent voice system
CN108364648A (en) * 2018-02-11 2018-08-03 北京百度网讯科技有限公司 Method and device for obtaining audio-frequency information
CN108447483A (en) * 2018-05-18 2018-08-24 深圳市亿道数码技术有限公司 Speech recognition system
CN108538305A (en) * 2018-04-20 2018-09-14 百度在线网络技术(北京)有限公司 Audio recognition method, device, equipment and computer readable storage medium
CN208256287U (en) * 2017-09-29 2018-12-18 杭州聪普智能科技有限公司 Control device and smart home device based on speech recognition
CN209515191U (en) * 2018-12-29 2019-10-18 苏州思必驰信息科技有限公司 A kind of voice enabling apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120233765A1 (en) * 2007-07-31 2012-09-20 Mitchell Altman System and Method for Controlling the Environment of a Steambath


Also Published As

Publication number Publication date
CN109473111A (en) 2019-03-15

Similar Documents

Publication Publication Date Title
CN107577449B (en) Wake-up voice pickup method, device, equipment and storage medium
US10318016B2 (en) Hands free device with directional interface
WO2020103703A1 (en) Audio data processing method and apparatus, device and storage medium
Okuno et al. Robot audition: Its rise and perspectives
US10297250B1 (en) Asynchronous transfer of audio data
US9076450B1 (en) Directed audio for speech recognition
JP2019191554A (en) Voice recognition method, apparatus, device and computer readable storage medium
JP2019159305A (en) Method, equipment, system, and storage medium for implementing far-field speech function
CN110288997A (en) Equipment awakening method and system for acoustics networking
US20190355354A1 (en) Method, apparatus and system for speech interaction
CN110675887B (en) Multi-microphone switching method and system for conference system
CN109920419B (en) Voice control method and device, electronic equipment and computer readable medium
US11627405B2 (en) Loudspeaker with transmitter
CN109473111B (en) Voice enabling device and method
CN106872945A (en) Sound localization method, device and electronic equipment
CN110992967A (en) Voice signal processing method and device, hearing aid and storage medium
Chatterjee et al. ClearBuds: wireless binaural earbuds for learning-based speech enhancement
TWI581255B (en) Front-end audio processing system
CN109697987B (en) External far-field voice interaction device and implementation method
CN111383629B (en) Voice processing method and device, electronic equipment and storage medium
US10747494B2 (en) Robot and speech interaction recognition rate improvement circuit and method thereof
CN110517682B (en) Voice recognition method, device, equipment and storage medium
CN112466305B (en) Voice control method and device of water dispenser
Novoa et al. Robustness over time-varying channels in DNN-hmm ASR based human-robot interaction.
CN110827845B (en) Recording method, device, equipment and storage medium

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu

Applicant before: AI SPEECH Co.,Ltd.

GR01 Patent grant