CN211957118U

CN211957118U - Control system for voice acquisition and recognition

Info

Publication number: CN211957118U
Application number: CN201922385345.4U
Authority: CN
Inventors: 娄燕忠; 段晓亮
Original assignee: Shanghai Fengqi Intelligent Technology Co ltd
Current assignee: Shanghai Fengqi Intelligent Technology Co ltd
Priority date: 2019-12-26
Filing date: 2019-12-26
Publication date: 2020-11-17
Anticipated expiration: 2029-12-26

Abstract

The utility model discloses a control system for voice acquisition and recognition, which aims to solve the problems of high energy consumption, strong susceptibility to ambient noise and poor voice acquisition in existing speech recognition systems. The control system comprises a low-power-consumption MCU; a main MIC chip responsible for voice detection and for acquiring human voice after the system is awakened; a secondary MIC chip responsible for collecting environmental noise and transmitting it to the high-performance processor; and a high-performance processor that processes the data collected by the main MIC chip and the secondary MIC chip. The utility model greatly reduces the power consumption burden that speech recognition places on the system. In addition, it realizes active noise reduction and improves the speech recognition success rate, which is a notable improvement over the prior art.

Description

Control system for voice acquisition and recognition

Technical Field

The utility model belongs to the technical field of voice acquisition and voice recognition, and specifically relates to a low-power-consumption control system for voice acquisition and voice recognition.

Background

Voice recognition technology is widely applied in household appliances, communication, automotive electronics, medical treatment, home services, consumer electronics and other fields. Depending on the application scenario, speech recognition can be divided into semantic recognition and command recognition. Semantic recognition generally uses locally recognized keywords to wake up the whole system, then processes the collected voice data and uploads it to the cloud for semantic recognition. Under this architecture, the resource requirement on the main control chip is high: the keyword wake-up algorithm runs in the main control chip and the keyword detection part is always working, so the standby power consumption of the whole system is very high. In addition, when the environmental noise is loud, the system is repeatedly awakened, which further increases power consumption. Command recognition is generally an offline scenario, such as the voice control module of a smart home, and is divided into primary command entries and secondary command entries. In command recognition mode, the resource requirement on the voice recognition chip is relatively low, but detecting the primary command entry has the same power consumption problem as the semantic recognition scenario.

The voice recognition control system uses an analog-to-digital converter (ADC) to quantize the analog voice signal collected by the analog microphone. The accuracy of the ADC directly determines the quality of voice acquisition; however, a high-accuracy ADC also consumes more power. The collected sound data undergo noise reduction, echo cancellation and other processing, followed by voice comparison and recognition. These operations occupy substantial on-chip resources, which increases the power consumption of the chip and of the voice recognition system.

For those skilled in the art, ambient noise elimination and human voice extraction are very important technical indicators in speech recognition technology. In the prior art, environmental noise and human voice are generally distinguished through a software algorithm (a neural network related algorithm can also be used for modeling and training the environmental voice), however, the performance requirement of a processor is high in this way, and the overall energy consumption and the cost of the system are high. How to balance the cost and performance of a speech recognition product is a very important topic in the technical field.

Summary of the Utility Model

An object of the utility model is to overcome the above problems and provide a low-power-consumption control system for voice acquisition and voice recognition with an active noise reduction function.

The purpose of the utility model is realized through the following technical scheme:

a control system for voice acquisition and voice recognition comprises a low-power consumption MCU;

the main MIC chip is responsible for voice detection and the acquisition of human voice after the system is awakened;

the secondary MIC chip is responsible for collecting environmental noise and transmitting the environmental noise to the high-performance processor;

the high-performance processor is used for processing the data collected by the main MIC and the auxiliary MIC;

the first sampling switch is matched with the main MIC to be responsible for acquiring human voice and transmitting the human voice to the low-power-consumption MCU;

the second sampling switch is matched with the main MIC to be responsible for acquiring human voice and transmitting the human voice to the high-performance processor;

and the third sampling switch is matched with the secondary MIC and is responsible for collecting environmental noise.

Preferably, the main MIC chip and the low-power-consumption MCU are packaged together using SIP to form a MIC chip with a VAD function and a noise reduction function. With this arrangement, the high-performance processor can rely on this architecture to realize active noise reduction, low-power standby, voice detection and system wake-up at lower cost.

Further, pin 12 of the low-power-consumption MCU carries the interrupt signal IRQ, which is sent to the high-performance processor. The high-performance processor is in sleep mode most of the time; when a human voice is detected, the low-power-consumption MCU on the MIC chip side wakes up the high-performance processor through this interrupt.

Furthermore, the low-power consumption MCU can simultaneously acquire sound analog signals of the main MIC and the auxiliary MIC, and converts the sound analog signals into digital signals which are respectively used for human voice detection and environmental noise acquisition.

Furthermore, the low-power consumption MCU pin 4 is externally connected with RST. Through the arrangement, the system can be ensured to be normally reset and started.

Further, pin 1 of the low-power-consumption MCU is connected to an LED. The LED indicates whether the system has detected a human voice, giving users and testers the most intuitive feedback.

Preferably, the low-power consumption MCU selects a CX32L003 chip.

Preferably, the primary MIC chip and the secondary MIC chip both use SC7CT27180 chip as MIC chip, which is an analog MIC chip.

Furthermore, the high-performance processor supports voice recognition and other high-performance processing functions. Separating human voice detection from high-performance data processing greatly reduces the power consumption of the system.

The utility model also provides a method for implementing the above control system for voice acquisition and voice recognition, comprising the following steps:

(1) low-power-consumption voice detection: the low-power-consumption MCU periodically enters and exits its low-power-consumption mode to detect whether the environmental sound contains a human voice; during this stage the main MIC works, the secondary MIC is turned off, and the high-performance processor is in low-power-consumption mode; if a human voice is detected, the next step is executed;

(2) the low-power-consumption MCU generates an interrupt signal to wake up the high-performance processor and, at the same time, turns on the secondary MIC to collect environmental noise;

(3) active noise reduction: the high-performance processor collects the human voice data of the main MIC and the environmental noise data of the secondary MIC and completes noise reduction of the environmental noise;

(4) the high-performance processor checks whether the collected sound matches a voice command; if so, step (5) is executed, and if not, step (6) is executed;

(5) high-performance operating mode: the high-performance processor turns on the high-precision ADC and configures system resources, the system completes the voice recognition processing, and then step (6) is executed;

(6) the high-performance processor enters low-power-consumption mode and releases the main MIC, the secondary MIC is turned off, and the system re-enters low-power-consumption voice detection.
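Steps (1) to (6) can be read as a small state machine. The following Python sketch is only an illustration of that control flow under stated assumptions: the names `State`, `next_state` and `run_cycle` are hypothetical, and the two boolean inputs stand in for the outcome of voice detection in step (1) and command comparison in step (4), which the utility model implements in hardware and firmware.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical state names, one per step of the method."""
    LOW_POWER_DETECT = auto()   # step (1)
    WAKE_UP = auto()            # step (2)
    NOISE_REDUCTION = auto()    # step (3)
    COMMAND_CHECK = auto()      # step (4)
    HIGH_PERFORMANCE = auto()   # step (5)
    RETURN_TO_STANDBY = auto()  # step (6)


def next_state(state, voice_detected=False, command_matched=False):
    """Return the state that follows `state` according to steps (1)-(6)."""
    if state is State.LOW_POWER_DETECT:
        # Stay in standby until a human voice is detected.
        return State.WAKE_UP if voice_detected else State.LOW_POWER_DETECT
    if state is State.WAKE_UP:
        return State.NOISE_REDUCTION
    if state is State.NOISE_REDUCTION:
        return State.COMMAND_CHECK
    if state is State.COMMAND_CHECK:
        # A matched command goes to step (5); anything else skips to step (6).
        return State.HIGH_PERFORMANCE if command_matched else State.RETURN_TO_STANDBY
    if state is State.HIGH_PERFORMANCE:
        return State.RETURN_TO_STANDBY
    # Step (6) always leads back to low-power voice detection.
    return State.LOW_POWER_DETECT


def run_cycle(voice_detected, command_matched):
    """Trace one pass from standby until the system is back in standby."""
    state = State.LOW_POWER_DETECT
    trace = [state]
    while True:
        state = next_state(state, voice_detected, command_matched)
        trace.append(state)
        if state is State.LOW_POWER_DETECT:
            return trace
```

Calling `run_cycle(True, False)` traces the path for a detected voice that is not a command: the trace passes through steps (2) to (4) and then goes directly to step (6) without entering the high-performance operating mode.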

The design principle of the utility model is as follows: the voice recognition system is divided into a low-power-consumption voice detection system and a high-performance processing system, and the voice detection system controls the switching on and off of the whole voice recognition system. The voice detection part is always on, and the high-performance processing circuit part is switched on selectively according to the voice detection result, which reduces both the operating energy consumption and the hardware resource requirements of the system. On this basis, the utility model adopts a two-way MIC structure: one main MIC and one secondary MIC. Both MICs are sampled through an ADC (analog-to-digital converter), which digitizes and quantizes the acquired sound data. The main MIC is responsible for VAD (voice activity detection) and for acquiring human voice after the system is awakened; the secondary MIC is responsible for collecting environmental noise, which is transmitted to the high-performance processor through a data transmission interface such as UART or SPI. The high-performance processor processes the data collected by the main MIC and the secondary MIC to realize the active noise reduction function of the system.

Compared with the prior art, the utility model has the following advantages and beneficial effects:

(1) The utility model splits the traditional speech recognition approach into a low-power-consumption voice detection part and a high-performance voice processing part. Normally the system runs in the low-power-consumption stage and selectively enters the high-performance operating mode according to the voice detection result; after processing is completed, the system resets. This greatly reduces the power consumption burden that speech recognition places on the system.

(2) The low-power consumption MCU used in the utility model has low performance requirement, low extra added cost and wide selectable range; meanwhile, due to the existence of the low-power consumption MCU, the low-power consumption design requirement on the high-performance processor is low, and the selectable range is wide.

(3) The utility model adopts a two-way MIC structure: the main MIC is responsible for VAD voice detection and the secondary MIC is responsible for collecting environmental noise. Combined with the data processing of the high-performance processor, this realizes the active noise reduction function of the system and improves the success rate of speech recognition.

(4) In the utility model, the main MIC and the low-power-consumption MCU are packaged together using SIP. A MIC chip based on this technique can directly replace the MIC chips currently on the market, so that the various electronic products on sale that need speech recognition can complete a technical upgrade simply and conveniently.

(5) When applied in call scenarios such as TWS earphones and smart speakers, the utility model can improve voice quality, effectively solving the problem that excessive environmental noise during voice pickup leads to poor speech recognition and poor voice pickup in current TWS earphones and smart speakers.

Drawings

Fig. 1 is a schematic view of the present invention.

Fig. 2 is a system block diagram of the present invention.

Fig. 3 is a schematic circuit diagram of the present invention.

Fig. 4 is a flowchart of the present invention.

Fig. 5 is a first partial enlarged view of Fig. 3.

Fig. 6 is a second partial enlarged view of Fig. 3.

Fig. 7 is a third partial enlarged view of Fig. 3.

Fig. 8 is a fourth partial enlarged view of Fig. 3.

Detailed Description

Examples

As shown in Figs. 1 to 8, the present embodiment provides a control system for voice acquisition and recognition, which is divided into a low-power-consumption processing part and a high-performance processing part. The low-power-consumption processing part includes a low-power-consumption processor voice detection system and is always on; in this always-on state, the voice detection system samples for human voice cyclically, and the high-performance processing part is switched on selectively according to the voice detection result. Specifically, the control system comprises a low-power-consumption MCU, a main MIC chip, a secondary MIC chip, a high-performance processor and three sampling switches (a first sampling switch, a second sampling switch and a third sampling switch).

A high-precision neural network-based voice detection algorithm is embedded in the low-power-consumption MCU and used for detecting whether human voice exists or not, and a person skilled in the art can select the distance (for example, the distance from tens of centimeters to several meters) for detecting the human voice by adjusting the sensitivity of the algorithm. And the ADC of the low-power-consumption processor is used for collecting the sound analog signal sent by the main MIC and judging whether the ambient sound data has human voice.
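The idea of tuning detection distance through algorithm sensitivity can be illustrated with a much simpler detector than the neural-network algorithm described above. The Python sketch below deliberately swaps in a plain frame-energy threshold as a stand-in; it is not the algorithm of the utility model, and the function names `frame_energy` and `detect_voice`, the frame length and the threshold values are all hypothetical.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of ADC samples (normalized to [-1, 1])."""
    return sum(s * s for s in samples) / len(samples)


def detect_voice(samples, threshold):
    """Energy-threshold stand-in for the VAD decision.

    A lower threshold makes the detector more sensitive, which in practice
    corresponds to triggering on quieter (more distant) speakers.
    """
    return frame_energy(samples) > threshold


# Hypothetical frames: a near (loud) talker and a far (quiet) talker.
near_frame = [0.5, -0.5] * 80
far_frame = [0.01, -0.01] * 80
```

With a threshold of `0.01` only `near_frame` triggers detection; lowering the threshold to `0.00001` makes `far_frame` trigger as well, at the cost of more false wake-ups in a noisy environment.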

The voice detection algorithm occupies few system resources and places low requirements on ADC precision and processor performance, so this embodiment can use a low-power-consumption MCU. The advantage is that the architecture is easy and inexpensive to upgrade. In the utility model, the ADC sampling time of the low-power-consumption MCU is shortened to several milliseconds and sampling is cyclic: an internal timer interrupt wakes the MCU to perform ADC sampling, so the MCU repeatedly re-enters low-power-consumption mode between samples, which further reduces the power consumption of the system.
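The benefit of timer-driven cyclic sampling can be estimated with a simple duty-cycle calculation: the average current is the time-weighted mix of the active and sleep currents. The Python sketch below shows that arithmetic; the function name `average_current_ua` and every number in the example are hypothetical and are not taken from the CX32L003 datasheet or from the utility model.

```python
def average_current_ua(active_ua, sleep_ua, active_ms, period_ms):
    """Average current (in microamps) of an MCU that wakes for `active_ms`
    out of every `period_ms` and sleeps for the rest of the period."""
    if not 0 < active_ms <= period_ms:
        raise ValueError("active_ms must be positive and no longer than period_ms")
    duty = active_ms / period_ms
    return duty * active_ua + (1.0 - duty) * sleep_ua


# Hypothetical example: 1000 uA while sampling, 1 uA asleep,
# awake for 2 ms out of every 20 ms (10 % duty cycle).
example = average_current_ua(active_ua=1000.0, sleep_ua=1.0,
                             active_ms=2.0, period_ms=20.0)
```

Under these assumed numbers the average current is roughly one tenth of the always-active current, which is the effect the cyclic sampling scheme relies on.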

The high-performance processor is mainly responsible for processing the data acquired by the main MIC and the secondary MIC. It starts working after the low-power-consumption MCU detects a human voice: the system enters high-performance processing mode, the high-precision ADC is turned on, more system resources are configured, and the acquired voice data are transmitted to the low-power-consumption MCU for recognition and comparison against voice commands. When the low-power-consumption MCU detects no human voice, or once recognition and comparison of the voice command is complete, the system re-enters low-power-consumption mode. In the prior art, completing voice recognition requires the processor to have a high-precision ADC; moreover, implementing multiple wake-up commands and multiple secondary commands places high demands on the RAM, FLASH and main frequency of the high-performance processor, so operating power consumption is inevitably very high. In this embodiment, because the low-power-consumption MCU is present, the low-power design requirements on the high-performance processor are low and the range of suitable parts is wide.

Those skilled in the art can select a suitable specification according to the application scenario of the high-performance processor: (1) if only voice recognition and control processing are required, the ADC precision requirement is generally high (at least a SAR ADC of 16 bits or more); a noise reduction algorithm, an echo cancellation algorithm and a voice recognition algorithm are integrated, so the requirements on RAM, FLASH and chip main frequency are high, and the core may need to be ARM Cortex-M4 or above with support for floating-point and DSP instructions; (2) if additional data processing functions are needed besides voice recognition (such as voice playback and Bluetooth connection, with TWS earphones and Bluetooth speakers as typical application scenarios), then a radio-frequency wireless connection function (such as Bluetooth) and a high-performance DAC are also required (for example, the Luoda 1536U chip).
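To give a feel for why ADC resolution matters, the widely used textbook approximation for an ideal N-bit quantizer driven by a full-scale sine wave is SQNR ≈ 6.02·N + 1.76 dB. The Python sketch below simply evaluates that formula; the function name `ideal_sqnr_db` is hypothetical, and a real SAR ADC will achieve less than this ideal figure because of noise and non-linearity.

```python
def ideal_sqnr_db(bits):
    """Ideal signal-to-quantization-noise ratio, in dB, of an N-bit ADC
    for a full-scale sine input (textbook approximation)."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    return 6.02 * bits + 1.76


# Compare a modest 12-bit converter with the 16-bit class mentioned above.
sqnr_12 = ideal_sqnr_db(12)
sqnr_16 = ideal_sqnr_db(16)
```

Each additional bit adds about 6 dB of ideal dynamic range, so moving from 12 to 16 bits gains roughly 24 dB, which helps when quiet speech has to be separated from background noise.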

In the prior art, environmental noise strongly affects the success rate of voice recognition. The present embodiment adopts a two-way MIC structure in which the main MIC is responsible for VAD voice detection, the secondary MIC is responsible for collecting environmental noise, and the high-performance processor processes the data from both MICs to realize the active noise reduction function of the system. The main MIC, together with the first sampling switch, collects human voice and transmits it to the low-power-consumption MCU; the main MIC, together with the second sampling switch, collects human voice and transmits it to the high-performance processor; the secondary MIC, together with the third sampling switch, collects environmental noise, which is transmitted to the high-performance processor through a data transmission interface such as UART (universal asynchronous receiver/transmitter) or SPI (serial peripheral interface). In this embodiment, the main MIC chip and the low-power-consumption MCU are packaged together using SIP to form a MIC chip with a VAD function and a noise reduction function; both the main MIC chip and the secondary MIC chip are preferably SC7CT27180. Based on this architecture, the high-performance processor can realize noise reduction, low-power standby, voice detection and system wake-up at lower cost. Preferably, two MOS devices are used as power switches for the MIC chips to switch them on and off, which reduces the standby power consumption of the whole system.
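The utility model does not specify which noise reduction algorithm the high-performance processor runs on the two MIC signals. One classic way to use a noise-only reference channel is an LMS adaptive noise canceller, sketched below in Python purely as an assumed example: the function names `lms_cancel` and `demo`, the tap count, the step size `mu` and the synthetic signals are all hypothetical.

```python
import math
import random


def lms_cancel(primary, reference, taps=4, mu=0.05):
    """LMS adaptive noise canceller.

    primary:   main MIC samples (speech plus filtered ambient noise)
    reference: secondary MIC samples (ambient noise only)
    Returns the error signal, which is the cleaned speech estimate.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - noise_estimate
        # Gradient step: move the weights so the noise estimate tracks the noise in d.
        weights = [w + mu * e * h for w, h in zip(weights, history)]
        output.append(e)
    return output


def demo(n=4000, seed=0):
    """Return (noise power before, residual noise power after) over the last 1000 samples."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    speech = [0.5 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
    # The main MIC hears the speech plus a short-filtered copy of the ambient noise.
    primary = [speech[i] + 0.8 * noise[i] + (0.3 * noise[i - 1] if i else 0.0)
               for i in range(n)]
    cleaned = lms_cancel(primary, noise)
    tail = range(n - 1000, n)
    before = sum((primary[i] - speech[i]) ** 2 for i in tail) / 1000
    after = sum((cleaned[i] - speech[i]) ** 2 for i in tail) / 1000
    return before, after
```

In this synthetic setup the residual noise power after adaptation is well below the noise power in the raw main MIC signal. A real product would have to deal with speech leaking into the secondary MIC and with the sampling and transport delay between the two channels, which this sketch ignores.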

In this embodiment, the low-power-consumption MCU is a CX32L003 chip, and its pins are connected as follows: two UARTs are used as the interfaces for transmitting data to the master control, and pins 2, 3, 5 and 6 are the TX and RX signals of UART0 and UART1 respectively; pins 19 and 20 are connected to MIC1_EN and MIC2_EN respectively and control the power on and off of MIC1 and MIC2; MIC1 serves as the main MIC, and its output signal MIC1_IN is connected to pin 17 of the low-power-consumption MCU for human voice detection and is also sent out to the high-performance master control (connected to header pin 5); MIC2 serves as the secondary MIC, its output signal MIC2_IN is connected to pin 14 of the low-power-consumption MCU, and the acquired data are sent to the high-performance processor through UART0 or UART1; pin 12 of the low-power-consumption MCU carries the interrupt signal IRQ, which is sent out to the high-performance master control to wake it up; pin 4 of the low-power-consumption MCU is externally connected to RST so that the system can be reset and started normally; pin 1 of the low-power-consumption MCU is externally connected to an LED, which indicates whether the system has detected a human voice and gives users and testers the most intuitive feedback.
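The pin assignments described above can be captured as a small lookup table, which is convenient for a bring-up checklist or test script. The Python sketch below is only a restatement of the text: the names `PIN_MAP`, `UART_PINS` and `pins_for` are hypothetical, and because the text does not say which of pins 2, 3, 5 and 6 is TX or RX of which UART, those four pins are kept as an unassigned group rather than guessed.

```python
# MCU pin number -> signal name, as listed in the embodiment.
PIN_MAP = {
    1: "LED",       # voice-detected indicator
    4: "RST",       # external reset
    12: "IRQ",      # wake-up interrupt to the high-performance processor
    14: "MIC2_IN",  # secondary MIC analog output
    17: "MIC1_IN",  # main MIC analog output
    19: "MIC1_EN",  # main MIC power enable
    20: "MIC2_EN",  # secondary MIC power enable
}

# UART0/UART1 TX and RX lines; the exact per-pin assignment is not specified.
UART_PINS = (2, 3, 5, 6)


def pins_for(prefix):
    """Return the sorted MCU pins whose signal name starts with `prefix`."""
    return sorted(pin for pin, signal in PIN_MAP.items() if signal.startswith(prefix))
```

For example, `pins_for("MIC1")` lists the two pins that belong to the main MIC path, its analog input and its power enable.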

The control system for voice acquisition and recognition in this embodiment works as follows. In the initial state, the whole system is in low-power-consumption mode and the low-power-consumption MCU periodically enters and exits its low-power-consumption mode to detect whether the ambient sound contains a human voice. At this time the main MIC works, the secondary MIC is turned off, the high-performance processor is in low-power-consumption mode, the first sampling switch is turned on so that the data collected by the main MIC are transmitted to the low-power-consumption MCU, and the second and third sampling switches are turned off. When a human voice is detected, the low-power-consumption MCU generates an interrupt signal to wake up the high-performance processor, turns off the first sampling switch, turns on the secondary MIC, and turns on the second sampling switch to release the main MIC to the high-performance processor, which then collects the main MIC data. The low-power-consumption MCU also turns on the third sampling switch; the secondary MIC collects environmental noise and transmits it to the low-power-consumption MCU, which forwards it to the high-performance processor through a data transmission interface such as UART or SPI. After receiving the data from the main MIC and the secondary MIC, the high-performance processor performs noise reduction. The high-performance processor then checks whether the sound matches a voice command. If so, the system enters high-performance operating mode; after the event has been processed, the high-performance processor notifies the low-power-consumption MCU through a data transmission interface such as UART or SPI, then automatically enters low-power-consumption mode and releases the main MIC; the first sampling switch is turned on, the second and third sampling switches are turned off, and the secondary MIC is turned off. If the collected sound is not a voice command, the high-performance processor likewise notifies the low-power-consumption MCU through a data transmission interface such as UART or SPI, then automatically enters low-power-consumption mode and releases the main MIC; the first sampling switch is turned on, the second and third sampling switches are turned off, and the secondary MIC is turned off. The system then re-enters the initial state, that is, low-power-consumption voice detection.
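The switch and MIC settings in the two operating conditions described above can be summarized as two fixed configurations. The Python sketch below records them as data so they can be compared side by side; the names `SignalPath`, `STANDBY`, `AWAKE` and `configure` are hypothetical and only restate the working method, they do not model the actual MOS switch hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalPath:
    """On/off state of each switch and device in one operating condition."""
    switch1_on: bool        # main MIC -> low-power-consumption MCU
    switch2_on: bool        # main MIC -> high-performance processor
    switch3_on: bool        # secondary MIC noise capture
    secondary_mic_on: bool
    processor_awake: bool


# Initial state: only the main MIC path to the low-power MCU is active.
STANDBY = SignalPath(switch1_on=True, switch2_on=False, switch3_on=False,
                     secondary_mic_on=False, processor_awake=False)

# After a human voice is detected: the main MIC is released to the processor
# and the secondary MIC starts collecting environmental noise.
AWAKE = SignalPath(switch1_on=False, switch2_on=True, switch3_on=True,
                   secondary_mic_on=True, processor_awake=True)


def configure(voice_detected):
    """Select the configuration that follows a voice detection result."""
    return AWAKE if voice_detected else STANDBY
```

Note that in both configurations exactly one of the first and second sampling switches is on, so the main MIC output is routed to only one consumer at a time.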

As described above, the utility model can be well implemented. The above description is only a preferred embodiment of the utility model and should not be taken as limiting it; any modifications, equivalent replacements, improvements and the like made within the spirit and principle of the utility model shall fall within its scope of protection.

Claims

1. A control system for voice capture and recognition, comprising:

a low power consumption MCU;

2. The control system for voice acquisition and recognition according to claim 1, wherein the main MIC chip and the low-power-consumption MCU are packaged together using SIP to form a MIC chip with a VAD function and a noise reduction function.

3. The control system for voice acquisition and recognition according to claim 2, wherein pin 12 of the low-power-consumption MCU carries an interrupt signal IRQ that is sent to a high-performance processor.

4. The control system for voice acquisition and recognition according to claim 3, wherein the low-power-consumption MCU is a CX32L003 chip.

5. The control system for voice acquisition and recognition according to claim 4, wherein the main MIC chip and the secondary MIC chip are both SC7CT27180 chips, which are analog MIC chips.