CN113129905A - Distributed voice awakening system based on multiple microphone array nodes - Google Patents

Distributed voice awakening system based on multiple microphone array nodes

Info

Publication number: CN113129905A
Application number: CN202110346067.0A
Authority: CN (China)
Prior art keywords: awakening; voice; microphone array; identification; server
Legal status: Granted; currently active
Other languages: Chinese (zh)
Other versions: CN113129905B
Inventor: 廖奎华
Current Assignee: Shenzhen Yuliang Technology Co ltd
Original Assignee: Shenzhen Yuliang Technology Co ltd
Application filed by Shenzhen Yuliang Technology Co ltd
Priority to CN202110346067.0A
Publication of CN113129905A
Application granted
Publication of CN113129905B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 Training, enrolment or model building
    • G10L17/20 Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase

Abstract

The invention discloses a distributed voice wake-up system based on multiple microphone array nodes, and relates in particular to the field of voice wake-up systems. The system comprises a client, a resource management server and a recognition server connected in sequence; the client is also connected with a microphone array for capturing wake-up speech, the microphone array is formed by a plurality of microphones in a distributed arrangement, and the recognition server is also connected with a sound processing module for recognizing and processing the wake-up speech; the sound processing module includes a sound channel connected to the recognition server. According to the invention, by deploying a certain number of microphone arrays to capture the wake-up keyword and wake the system, the efficiency of the voice wake-up system can be greatly improved, the wake-up probability can be guaranteed under different use environments, false wake-ups are reduced, and the practicality of the system is improved.

Description

Distributed voice awakening system based on multiple microphone array nodes
Technical Field
The invention relates to the field of voice wake-up systems, and in particular to a distributed voice wake-up system based on multiple microphone array nodes.
Background
Speech recognition refers to converting a speech signal into a character string, or recognizing its linguistic meaning, by analyzing the speech signal and matching the analysis against a database of patterns.
In speech recognition technology, a speech recognition model analyzes the input speech data, extracts features, and measures their similarity against a previously collected database of speech models, converting the most similar match into text or a command.
Speech recognition technology is a type of pattern recognition process. Because each person's voice, pronunciation, and intonation are different, conventional speech recognition techniques collect speech data from as many people as possible, extract common features from them, and generate reference patterns.
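As a rough illustration of this reference-pattern matching step (the cosine-similarity measure, the shape of the feature vectors and the example labels are assumptions for illustration, not details from the patent), a minimal sketch:

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two feature vectors."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def best_match(features: np.ndarray, reference_patterns: dict) -> tuple:
        """Return the label of the most similar reference pattern and its similarity score."""
        best_label, best_score = None, -1.0
        for label, pattern in reference_patterns.items():
            score = cosine_similarity(features, pattern)
            if score > best_score:
                best_label, best_score = label, score
        return best_label, best_score

    # e.g. best_match(extracted_feature_vector, {"turn on the light": ref_a, "play music": ref_b})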
However, existing voice recognition systems acquire the wake-up voice through a single path, so an effective wake-up command may not be captured and the wake-up success rate is inconsistent; the device deviates greatly across different use environments and has low practicality.
Disclosure of Invention
In order to achieve the above object, the invention provides the following technical solution: a distributed voice wake-up system based on multiple microphone array nodes comprises a client, a resource management server and a recognition server connected in sequence, wherein the client is further connected with a microphone array for capturing wake-up speech, the microphone array is formed by a plurality of microphones in a distributed arrangement, and the recognition server is further connected with a sound processing module for recognizing and processing the wake-up speech;
The sound processing module comprises a sound channel connected with the recognition server, and the sound channel is connected with endpoint detection, feature extraction, an acoustic model, a speech model and recognition search.
In a preferred embodiment, the client sends a connection request to the resource management server, which finds a free recognition server among all the recognition servers and then sends an allocation request to that server.
In a preferred embodiment, the recognition server looks for a free connection and replies to the resource management server with an allocation-success message, and the resource management server returns the recognition server's information to the client.
In a preferred embodiment, the client establishes a connection with the recognition server and starts the recognition operation.
In a preferred embodiment, the wake-up voice information collected by the microphone array is sent to the recognition server through the client and enters the sound processing module through the sound channel.
In a preferred embodiment, endpoint detection receives the wake-up speech information and deletes the noise, silence and leading segments at both ends of its speech frames to generate a wake-up speech frame segment;
the feature extraction is used for extracting a feature segment containing the wake word from the wake-up speech frame segment;
the acoustic model is specifically a wake-up model generated by training on speech data; it provides comparison samples for the feature segment obtained by feature extraction and judges whether the wake word meets the wake-up criterion;
the recognition search is used to retrieve matching features from the acoustic model.
The technical effects and advantages of the invention are as follows:
by deploying a certain number of microphone arrays to capture the wake-up keyword and wake the system, the efficiency of the voice wake-up system can be greatly improved, the wake-up probability can be guaranteed under different use environments, false wake-ups are reduced, and the practicality of the system is improved.
Drawings
FIG. 1 is a schematic diagram of the system framework of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The distributed voice wake-up system based on multiple microphone array nodes shown in fig. 1 comprises a client, a resource management server and a recognition server connected in sequence, wherein the client is further connected with a microphone array for capturing wake-up speech, the microphone array is formed by a plurality of microphones in a distributed arrangement, and the recognition server is further connected with a sound processing module for recognizing and processing the wake-up speech;
the sound processing module comprises a sound channel connected with the recognition server, and the sound channel is connected with endpoint detection, feature extraction, an acoustic model, a speech model and recognition search;
the client sends a connection request to the resource management server, and the resource management server searches all the recognition servers for an idle one and then sends an allocation request to that recognition server;
the recognition server searches for an idle connection and replies to the resource management server with an allocation-success message, and the resource management server returns the recognition server's information to the client;
the client establishes a connection with the recognition server and starts the recognition operation;
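The allocation handshake just described can be sketched as follows; the class names, the in-process call structure and the busy flag are assumptions for illustration, since the patent does not specify a wire protocol.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class RecognitionServer:
        server_id: str
        busy: bool = False

        def allocate(self) -> bool:
            """Handle an allocation request: succeed only if a free connection is available."""
            if not self.busy:
                self.busy = True
                return True
            return False

    @dataclass
    class ResourceManagementServer:
        servers: List[RecognitionServer] = field(default_factory=list)

        def handle_connection_request(self) -> Optional[RecognitionServer]:
            """Find an idle recognition server, forward the allocation request, return its info."""
            for server in self.servers:
                if not server.busy and server.allocate():
                    return server   # allocation succeeded; the client can now connect to this server
            return None             # no idle recognition server available

    # Usage: the client asks the resource manager, then connects to the returned server.
    manager = ResourceManagementServer([RecognitionServer("rec-1"), RecognitionServer("rec-2")])
    assigned = manager.handle_connection_request()
    if assigned is not None:
        print(f"client connects to {assigned.server_id} and starts recognition")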
the wake-up speech information collected by the microphone array is sent to the recognition server through the client and enters the sound processing module through the sound channel;
endpoint detection receives the wake-up speech information and deletes the noise, silence and leading segments at both ends of its speech frames to generate a wake-up speech frame segment;
the feature extraction is used for extracting a feature segment containing the wake word from the wake-up speech frame segment;
the acoustic model is specifically a wake-up model generated by training on speech data; it provides comparison samples for the feature segment obtained by feature extraction and judges whether the wake word meets the wake-up criterion;
the recognition search is used for acquiring, from the acoustic model, wake-up model information similar to the feature segment, comparing it with the speech model, and waking the system according to the result;
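A compact sketch of this trim-then-score flow follows; the energy-based endpoint rule, the threshold value and the helper names (extract_features, wake_model_score) are assumptions, not details from the patent.

    import numpy as np

    def trim_endpoints(frames: np.ndarray, energy_threshold: float = 0.01) -> np.ndarray:
        """Endpoint detection: drop low-energy (noise/silence) frames at both ends."""
        energies = (frames ** 2).mean(axis=1)
        voiced = np.where(energies > energy_threshold)[0]
        if voiced.size == 0:
            return frames[:0]                     # nothing but silence or noise
        return frames[voiced[0]:voiced[-1] + 1]   # keep the span between first and last voiced frame

    def detect_wake_word(frames: np.ndarray, extract_features, wake_model_score,
                         threshold: float = 0.8) -> bool:
        """Trim endpoints, extract the feature segment, and score it against the wake-up model."""
        segment = trim_endpoints(frames)
        if segment.size == 0:
            return False
        features = extract_features(segment)
        return wake_model_score(features) >= threshold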
on this basis, the microphone arrays connected to the client can be placed uniformly throughout the operating site, and the number of deployed microphone arrays grows linearly with the size of the site, the number of people in the scene and the complexity of the operating instructions to be handled;
let the size of the scene be A, the number of people in the scene be α, the number of operation instructions to be received be β, and the number of microphone arrays be B, where B = (α + β) × (1 + C);
here C is the number of standby recognition servers, and the factor (1 + C) is used instead of C to avoid a zero factor when no standby servers are available;
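A worked example of this sizing rule (the variable names follow the formula above; the concrete numbers are purely illustrative):

    def microphone_array_count(people: int, instructions: int, standby_servers: int) -> int:
        """B = (alpha + beta) * (1 + C), the sizing rule stated above."""
        return (people + instructions) * (1 + standby_servers)

    # e.g. 5 people in the scene, 3 operation instructions, 1 standby recognition server -> 16 arrays
    print(microphone_array_count(people=5, instructions=3, standby_servers=1))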
the client sends a connection request to the resource management server; the resource management server searches all the recognition servers for a free one and then sends an allocation request to it; the recognition server searches for a free connection and replies to the resource management server with an allocation-success message; the resource management server returns the recognition server's information to the client; the client then establishes a connection with the recognition server and starts the recognition operation;
when the microphone array acquires speech information, the information is transmitted to the recognition server and segmented into speech frames; endpoint detection deletes the noise frames, silent frames and leading sections at both ends of the speech frames to generate wake-up speech frame segments suitable for recognition;
the wake word is then read from the wake-up speech frame segment through feature extraction, and the acoustic model and the speech model are used to recognize the wake word and complete the wake-up of the system.
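Tying the sketches above together, an end-to-end flow might look like the following; it reuses the hypothetical handle_connection_request and detect_wake_word helpers from the earlier sketches and is not part of the patent itself.

    def run_wake_pipeline(manager, microphone_frames, extract_features, wake_model_score) -> bool:
        """Allocate a recognition server, then score the captured frames against the wake-up model."""
        server = manager.handle_connection_request()    # allocation handshake (see earlier sketch)
        if server is None:
            return False                                # no idle recognition server available
        woken = detect_wake_word(microphone_frames,     # endpoint trim + feature scoring (see earlier sketch)
                                 extract_features, wake_model_score)
        if woken:
            print(f"system woken via recognition server {server.server_id}")
        return woken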
The points to be finally explained are: first, in the description of the present application, it should be noted that, unless otherwise specified and limited, the terms "mounted" and "connected" should be understood broadly; a connection may be mechanical or electrical, or a communication between two elements, and may be direct; "upper", "lower", "left" and "right" are used only to indicate a relative positional relationship, and when the absolute position of the described object changes, the relative positional relationship may change accordingly;
second: in the drawings of the disclosed embodiments of the invention, only the structures related to the disclosed embodiments are shown, and other structures may follow common designs; features of the same embodiment and of different embodiments of the invention may be combined with each other where no conflict arises;
and finally: the above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that are within the spirit and principle of the present invention are intended to be included in the scope of the present invention.

Claims (6)

1. A distributed voice wake-up system based on multiple microphone array nodes, characterized by comprising a client, a resource management server and a recognition server connected in sequence, wherein the client is also connected with a microphone array for capturing wake-up speech, the microphone array is formed by a plurality of microphones in a distributed arrangement, and the recognition server is also connected with a sound processing module for recognizing and processing the wake-up speech;
the sound processing module comprises a sound channel connected with the recognition server, and the sound channel is connected with endpoint detection, feature extraction, an acoustic model, a speech model and recognition search.
2. The distributed voice wake-up system based on multiple microphone array nodes of claim 1, wherein: the client sends a connection request to the resource management server, and the resource management server searches all the recognition servers for a free one and then sends an allocation request to that recognition server.
3. The distributed voice wake-up system based on multiple microphone array nodes of claim 2, wherein: the recognition server searches for an idle connection and replies to the resource management server with an allocation-success message, and the resource management server returns the recognition server's information to the client.
4. The distributed voice wake-up system based on multiple microphone array nodes of claim 3, wherein: the client establishes a connection with the recognition server and starts the recognition operation.
5. The distributed voice wake-up system based on multiple microphone array nodes of claim 4, wherein: the wake-up speech information collected by the microphone array is sent to the recognition server through the client and enters the sound processing module through the sound channel.
6. The distributed voice wake-up system based on multiple microphone array nodes of claim 5, wherein: endpoint detection receives the wake-up speech information and deletes the noise, silence and leading segments at both ends of its speech frames to generate a wake-up speech frame segment;
the feature extraction is used for extracting a feature segment containing the wake word from the wake-up speech frame segment;
the acoustic model is specifically a wake-up model generated by training on speech data; it provides comparison samples for the feature segment obtained by feature extraction and judges whether the wake word meets the wake-up criterion;
the recognition search is used for acquiring, from the acoustic model, wake-up model information similar to the feature segment, comparing it with the speech model, and waking the system according to the result.
Application CN202110346067.0A, priority date 2021-03-31, filing date 2021-03-31: Distributed voice awakening system based on multiple microphone array nodes (Active; granted as CN113129905B)

Priority Applications (1)

Application Number: CN202110346067.0A
Priority Date: 2021-03-31
Filing Date: 2021-03-31
Title: Distributed voice awakening system based on multiple microphone array nodes (granted as CN113129905B)

Publications (2)

CN113129905A: published 2021-07-16
CN113129905B: published 2022-10-04

Family

ID=76774342

Family Applications (1)

CN202110346067.0A (Active): Distributed voice awakening system based on multiple microphone array nodes; priority date 2021-03-31; filing date 2021-03-31

Country Status (1)

CN: CN113129905B

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6505161B1 (en) * 2000-05-01 2003-01-07 Sprint Communications Company L.P. Speech recognition that adjusts automatically to input devices
US20020156900A1 (en) * 2001-03-30 2002-10-24 Brian Marquette Protocol independent control module
US20130029684A1 (en) * 2011-07-28 2013-01-31 Hiroshi Kawaguchi Sensor network system for acuiring high quality speech signals and communication method therefor
CN103824560A (en) * 2014-03-18 2014-05-28 上海言海网络信息技术有限公司 Chinese speech recognition system
CN108922536A (en) * 2018-06-28 2018-11-30 深圳市沃特沃德股份有限公司 The method and system of voice wake-up processor work
CN110610711A (en) * 2019-10-12 2019-12-24 深圳市华创技术有限公司 Full-house intelligent voice interaction method and system of distributed Internet of things equipment
CN110767220A (en) * 2019-10-16 2020-02-07 腾讯科技(深圳)有限公司 Interaction method, device, equipment and storage medium of intelligent voice assistant

Also Published As

CN113129905B: published 2022-10-04

Similar Documents

Publication Publication Date Title
US9230547B2 (en) Metadata extraction of non-transcribed video and audio streams
CN110047481B (en) Method and apparatus for speech recognition
WO2020211354A1 (en) Speaker identity recognition method and device based on speech content, and storage medium
CN112037791B (en) Conference summary transcription method, apparatus and storage medium
CN112151015B (en) Keyword detection method, keyword detection device, electronic equipment and storage medium
CN108735200B (en) Automatic speaker labeling method
CN103730115A (en) Method and device for detecting keywords in voice
US20230089308A1 (en) Speaker-Turn-Based Online Speaker Diarization with Constrained Spectral Clustering
CN110866234B (en) Identity verification system based on multiple biological characteristics
CN111402892A (en) Conference recording template generation method based on voice recognition
CN116312552B (en) Video speaker journaling method and system
US10847154B2 (en) Information processing device, information processing method, and program
CN112259085A (en) Two-stage voice awakening algorithm based on model fusion framework
CN115457938A (en) Method, device, storage medium and electronic device for identifying awakening words
CN113129905B (en) Distributed voice awakening system based on multiple microphone array nodes
CN103247316B (en) The method and system of index building in a kind of audio retrieval
CN111221987A (en) Hybrid audio tagging method and apparatus
CN115831124A (en) Conference record role separation system and method based on voiceprint recognition
CN107180629B (en) Voice acquisition and recognition method and system
CN109345652A (en) Work attendance device and implementation method based on speech recognition typing text
CN112466287B (en) Voice segmentation method, device and computer readable storage medium
TWI769520B (en) Multi-language speech recognition and translation method and system
CN115050372A (en) Audio segment clustering method and device, electronic equipment and medium
CN110580907B (en) Voice recognition method and system for multi-person speaking scene
JP2000010578A (en) Voice message transmission/reception system, and voice message processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Distributed Voice Wakeup System Based on Multiple Microphone Array Nodes

Effective date of registration: 20231019

Granted publication date: 20221004

Pledgee: Shenzhen Rural Commercial Bank Co.,Ltd. Xixiang Branch

Pledgor: Shenzhen Yuliang Technology Co.,Ltd.

Registration number: Y2023980061858