CN113129905A - Distributed voice awakening system based on multiple microphone array nodes - Google Patents

Distributed voice awakening system based on multiple microphone array nodes

Info

Publication number: CN113129905A
Application number: CN202110346067.0A
Authority: CN (China)
Prior art keywords: awakening; voice; microphone array; identification; server
Legal status: Granted; currently active
Other languages: Chinese (zh)
Other versions: CN113129905B
Inventor: 廖奎华
Current Assignee: Shenzhen Yuliang Technology Co ltd
Original Assignee: Shenzhen Yuliang Technology Co ltd
Application filed by Shenzhen Yuliang Technology Co ltd
Priority to CN202110346067.0A
Publication of CN113129905A
Application granted
Publication of CN113129905B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 Training, enrolment or model building
    • G10L17/20 Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase

Abstract

The invention discloses a distributed voice wake-up system based on multiple microphone array nodes, and relates in particular to the field of voice wake-up systems. The system comprises a client, a resource management server and a recognition server connected in sequence; the client is also connected with a microphone array for capturing wake-up speech, the microphone array is formed by a plurality of microphones in a distributed arrangement, and the recognition server is also connected with a sound processing module for recognizing and processing the wake-up speech; the sound processing module includes a sound channel connected to the recognition server. According to the invention, by deploying a certain number of microphone arrays to capture the wake-up keyword and wake the system, the efficiency of the voice wake-up system can be greatly improved, the wake-up probability can be guaranteed under different use environments, false wake-ups are reduced, and the practicality of the system is improved.

Description

Distributed voice awakening system based on multiple microphone array nodes
Technical Field
The invention relates to the field of voice wake-up systems, and in particular to a distributed voice wake-up system based on multiple microphone array nodes.
Background
Speech recognition refers to converting a speech signal into a character string, or recognizing its linguistic meaning, by analyzing the speech signal and matching the analysis against a database of patterns.
In speech recognition technology, a speech recognition model analyzes the input speech data, extracts features, and measures their similarity against a previously collected database of speech models, converting the most similar match into text or a command.
Speech recognition technology is a type of pattern recognition process. Because each person's voice, pronunciation, and intonation are different, conventional speech recognition techniques collect speech data from as many people as possible, extract common features from them, and generate reference patterns.
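As a rough illustration of this reference-pattern matching step (the cosine-similarity measure, the shape of the feature vectors and the example labels are assumptions for illustration, not details from the patent), a minimal sketch:

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two feature vectors."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def best_match(features: np.ndarray, reference_patterns: dict) -> tuple:
        """Return the label of the most similar reference pattern and its similarity score."""
        best_label, best_score = None, -1.0
        for label, pattern in reference_patterns.items():
            score = cosine_similarity(features, pattern)
            if score > best_score:
                best_label, best_score = label, score
        return best_label, best_score

    # e.g. best_match(extracted_feature_vector, {"turn on the light": ref_a, "play music": ref_b})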
However, existing voice recognition systems acquire the wake-up voice through a single path, so an effective wake-up command may not be captured and the wake-up success rate is inconsistent; the device deviates greatly across different use environments and has low practicality.
Disclosure of Invention
In order to achieve the above object, the invention provides the following technical solution: a distributed voice wake-up system based on multiple microphone array nodes comprises a client, a resource management server and a recognition server connected in sequence, wherein the client is further connected with a microphone array for capturing wake-up speech, the microphone array is formed by a plurality of microphones in a distributed arrangement, and the recognition server is further connected with a sound processing module for recognizing and processing the wake-up speech;
The sound processing module comprises a sound channel connected with the recognition server, and the sound channel is connected with endpoint detection, feature extraction, an acoustic model, a speech model and recognition search.
In a preferred embodiment, the client sends a connection request to the resource management server, which finds a free recognition server among all the recognition servers and then sends an allocation request to that server.
In a preferred embodiment, the recognition server looks for a free connection and replies to the resource management server with an allocation-success message, and the resource management server returns the recognition server's information to the client.
In a preferred embodiment, the client establishes a connection with the recognition server and starts the recognition operation.
In a preferred embodiment, the wake-up voice information collected by the microphone array is sent to the recognition server through the client and enters the sound processing module through the sound channel.
In a preferred embodiment, endpoint detection receives the wake-up speech information and deletes the noise, silence and leading segments at both ends of its speech frames to generate a wake-up speech frame segment;
the feature extraction is used for extracting a feature segment containing the wake word from the wake-up speech frame segment;
the acoustic model is specifically a wake-up model generated by training on speech data; it provides comparison samples for the feature segment obtained by feature extraction and judges whether the wake word meets the wake-up criterion;
the recognition search is used to retrieve matching features from the acoustic model.
The technical effects and advantages of the invention are as follows:
by deploying a certain number of microphone arrays to capture the wake-up keyword and wake the system, the efficiency of the voice wake-up system can be greatly improved, the wake-up probability can be guaranteed under different use environments, false wake-ups are reduced, and the practicality of the system is improved.
Drawings
FIG. 1 is a schematic diagram of the system framework of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The distributed voice wake-up system based on multiple microphone array nodes shown in fig. 1 comprises a client, a resource management server and a recognition server connected in sequence, wherein the client is further connected with a microphone array for capturing wake-up speech, the microphone array is formed by a plurality of microphones in a distributed arrangement, and the recognition server is further connected with a sound processing module for recognizing and processing the wake-up speech;
the sound processing module comprises a sound channel connected with the recognition server, and the sound channel is connected with endpoint detection, feature extraction, an acoustic model, a speech model and recognition search;
the client sends a connection request to the resource management server, and the resource management server searches all the recognition servers for an idle one and then sends an allocation request to that recognition server;
the recognition server searches for an idle connection and replies to the resource management server with an allocation-success message, and the resource management server returns the recognition server's information to the client;
the client establishes a connection with the recognition server and starts the recognition operation;
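The allocation handshake just described can be sketched as follows; the class names, the in-process call structure and the busy flag are assumptions for illustration, since the patent does not specify a wire protocol.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class RecognitionServer:
        server_id: str
        busy: bool = False

        def allocate(self) -> bool:
            """Handle an allocation request: succeed only if a free connection is available."""
            if not self.busy:
                self.busy = True
                return True
            return False

    @dataclass
    class ResourceManagementServer:
        servers: List[RecognitionServer] = field(default_factory=list)

        def handle_connection_request(self) -> Optional[RecognitionServer]:
            """Find an idle recognition server, forward the allocation request, return its info."""
            for server in self.servers:
                if not server.busy and server.allocate():
                    return server   # allocation succeeded; the client can now connect to this server
            return None             # no idle recognition server available

    # Usage: the client asks the resource manager, then connects to the returned server.
    manager = ResourceManagementServer([RecognitionServer("rec-1"), RecognitionServer("rec-2")])
    assigned = manager.handle_connection_request()
    if assigned is not None:
        print(f"client connects to {assigned.server_id} and starts recognition")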
the wake-up speech information collected by the microphone array is sent to the recognition server through the client and enters the sound processing module through the sound channel;
endpoint detection receives the wake-up speech information and deletes the noise, silence and leading segments at both ends of its speech frames to generate a wake-up speech frame segment;
the feature extraction is used for extracting a feature segment containing the wake word from the wake-up speech frame segment;
the acoustic model is specifically a wake-up model generated by training on speech data; it provides comparison samples for the feature segment obtained by feature extraction and judges whether the wake word meets the wake-up criterion;
the recognition search is used for acquiring, from the acoustic model, wake-up model information similar to the feature segment, comparing it with the speech model, and waking the system according to the result;
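A compact sketch of this trim-then-score flow follows; the energy-based endpoint rule, the threshold value and the helper names (extract_features, wake_model_score) are assumptions, not details from the patent.

    import numpy as np

    def trim_endpoints(frames: np.ndarray, energy_threshold: float = 0.01) -> np.ndarray:
        """Endpoint detection: drop low-energy (noise/silence) frames at both ends."""
        energies = (frames ** 2).mean(axis=1)
        voiced = np.where(energies > energy_threshold)[0]
        if voiced.size == 0:
            return frames[:0]                     # nothing but silence or noise
        return frames[voiced[0]:voiced[-1] + 1]   # keep the span between first and last voiced frame

    def detect_wake_word(frames: np.ndarray, extract_features, wake_model_score,
                         threshold: float = 0.8) -> bool:
        """Trim endpoints, extract the feature segment, and score it against the wake-up model."""
        segment = trim_endpoints(frames)
        if segment.size == 0:
            return False
        features = extract_features(segment)
        return wake_model_score(features) >= threshold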
on this basis, the microphone arrays connected to the client can be placed uniformly throughout the operating site, and the number of deployed microphone arrays grows linearly with the size of the site, the number of people in the scene and the complexity of the operating instructions to be handled;
let the size of the scene be A, the number of people in the scene be α, the number of operation instructions to be received be β, and the number of microphone arrays be B, where B = (α + β) × (1 + C);
here C is the number of standby recognition servers, and the factor (1 + C) is used instead of C to avoid a zero factor when no standby servers are available;
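A worked example of this sizing rule (the variable names follow the formula above; the concrete numbers are purely illustrative):

    def microphone_array_count(people: int, instructions: int, standby_servers: int) -> int:
        """B = (alpha + beta) * (1 + C), the sizing rule stated above."""
        return (people + instructions) * (1 + standby_servers)

    # e.g. 5 people in the scene, 3 operation instructions, 1 standby recognition server -> 16 arrays
    print(microphone_array_count(people=5, instructions=3, standby_servers=1))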
the client sends a connection request to the resource management server; the resource management server searches all the recognition servers for a free one and then sends an allocation request to it; the recognition server searches for a free connection and replies to the resource management server with an allocation-success message; the resource management server returns the recognition server's information to the client; the client then establishes a connection with the recognition server and starts the recognition operation;
when the microphone array acquires speech information, the information is transmitted to the recognition server and segmented into speech frames; endpoint detection deletes the noise frames, silent frames and leading sections at both ends of the speech frames to generate wake-up speech frame segments suitable for recognition;
the wake word is then read from the wake-up speech frame segment through feature extraction, and the acoustic model and the speech model are used to recognize the wake word and complete the wake-up of the system.
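Tying the sketches above together, an end-to-end flow might look like the following; it reuses the hypothetical handle_connection_request and detect_wake_word helpers from the earlier sketches and is not part of the patent itself.

    def run_wake_pipeline(manager, microphone_frames, extract_features, wake_model_score) -> bool:
        """Allocate a recognition server, then score the captured frames against the wake-up model."""
        server = manager.handle_connection_request()    # allocation handshake (see earlier sketch)
        if server is None:
            return False                                # no idle recognition server available
        woken = detect_wake_word(microphone_frames,     # endpoint trim + feature scoring (see earlier sketch)
                                 extract_features, wake_model_score)
        if woken:
            print(f"system woken via recognition server {server.server_id}")
        return woken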
The points to be finally explained are: first, in the description of the present application, it should be noted that, unless otherwise specified and limited, the terms "mounted" and "connected" should be understood broadly; a connection may be mechanical or electrical, or a communication between two elements, and may be direct; "upper", "lower", "left" and "right" are used only to indicate a relative positional relationship, and when the absolute position of the described object changes, the relative positional relationship may change accordingly;
second: in the drawings of the disclosed embodiments of the invention, only the structures related to the disclosed embodiments are shown, and other structures may follow common designs; features of the same embodiment and of different embodiments of the invention may be combined with each other where no conflict arises;
and finally: the above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that are within the spirit and principle of the present invention are intended to be included in the scope of the present invention.

Claims (6)

1. A distributed voice wake-up system based on multiple microphone array nodes, characterized by comprising a client, a resource management server and a recognition server connected in sequence, wherein the client is also connected with a microphone array for capturing wake-up speech, the microphone array is formed by a plurality of microphones in a distributed arrangement, and the recognition server is also connected with a sound processing module for recognizing and processing the wake-up speech;
the sound processing module comprises a sound channel connected with the recognition server, and the sound channel is connected with endpoint detection, feature extraction, an acoustic model, a speech model and recognition search.
2. The distributed voice wake-up system based on multiple microphone array nodes of claim 1, wherein: the client sends a connection request to the resource management server, and the resource management server searches all the recognition servers for a free one and then sends an allocation request to that recognition server.
3. The distributed voice wake-up system based on multiple microphone array nodes of claim 2, wherein: the recognition server searches for an idle connection and replies to the resource management server with an allocation-success message, and the resource management server returns the recognition server's information to the client.
4. The distributed voice wake-up system based on multiple microphone array nodes of claim 3, wherein: the client establishes a connection with the recognition server and starts the recognition operation.
5. The distributed voice wake-up system based on multiple microphone array nodes of claim 4, wherein: the wake-up speech information collected by the microphone array is sent to the recognition server through the client and enters the sound processing module through the sound channel.
6. The distributed voice wake-up system based on multiple microphone array nodes of claim 5, wherein: endpoint detection receives the wake-up speech information and deletes the noise, silence and leading segments at both ends of its speech frames to generate a wake-up speech frame segment;
the feature extraction is used for extracting a feature segment containing the wake word from the wake-up speech frame segment;
the acoustic model is specifically a wake-up model generated by training on speech data; it provides comparison samples for the feature segment obtained by feature extraction and judges whether the wake word meets the wake-up criterion;
the recognition search is used for acquiring, from the acoustic model, wake-up model information similar to the feature segment, comparing it with the speech model, and waking the system according to the result.
Application CN202110346067.0A, priority date 2021-03-31, filing date 2021-03-31: Distributed voice awakening system based on multiple microphone array nodes (Active; granted as CN113129905B)

Priority Applications (1)

Application Number: CN202110346067.0A
Priority Date: 2021-03-31
Filing Date: 2021-03-31
Title: Distributed voice awakening system based on multiple microphone array nodes (granted as CN113129905B)

Publications (2)

CN113129905A: published 2021-07-16
CN113129905B: published 2022-10-04

Family

ID=76774342

Family Applications (1)

CN202110346067.0A (Active): Distributed voice awakening system based on multiple microphone array nodes; priority date 2021-03-31; filing date 2021-03-31

Country Status (1)

CN: CN113129905B

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6505161B1 (en) * 2000-05-01 2003-01-07 Sprint Communications Company L.P. Speech recognition that adjusts automatically to input devices
US20020156900A1 (en) * 2001-03-30 2002-10-24 Brian Marquette Protocol independent control module
US20130029684A1 (en) * 2011-07-28 2013-01-31 Hiroshi Kawaguchi Sensor network system for acuiring high quality speech signals and communication method therefor
CN103824560A (en) * 2014-03-18 2014-05-28 上海言海网络信息技术有限公司 Chinese speech recognition system
CN108922536A (en) * 2018-06-28 2018-11-30 深圳市沃特沃德股份有限公司 The method and system of voice wake-up processor work
CN110610711A (en) * 2019-10-12 2019-12-24 深圳市华创技术有限公司 Full-house intelligent voice interaction method and system of distributed Internet of things equipment
CN110767220A (en) * 2019-10-16 2020-02-07 腾讯科技(深圳)有限公司 Interaction method, device, equipment and storage medium of intelligent voice assistant

Also Published As

CN113129905B: published 2022-10-04

Similar Documents

Publication Publication Date Title
US9230547B2 (en) Metadata extraction of non-transcribed video and audio streams
CN110047481B (en) Method and apparatus for speech recognition
WO2020211354A1 (en) Speaker identity recognition method and device based on speech content, and storage medium
CN112037791B (en) Conference summary transcription method, apparatus and storage medium
CN112151015B (en) Keyword detection method, keyword detection device, electronic equipment and storage medium
CN108735200B (en) Automatic speaker labeling method
CN103730115A (en) Method and device for detecting keywords in voice
US20230089308A1 (en) Speaker-Turn-Based Online Speaker Diarization with Constrained Spectral Clustering
CN110866234B (en) Identity verification system based on multiple biological characteristics
CN111402892A (en) Conference recording template generation method based on voice recognition
CN116312552B (en) Video speaker journaling method and system
US10847154B2 (en) Information processing device, information processing method, and program
CN112259085A (en) Two-stage voice awakening algorithm based on model fusion framework
CN115457938A (en) Method, device, storage medium and electronic device for identifying awakening words
CN113129905B (en) Distributed voice awakening system based on multiple microphone array nodes
CN103247316B (en) The method and system of index building in a kind of audio retrieval
CN111221987A (en) Hybrid audio tagging method and apparatus
CN115831124A (en) Conference record role separation system and method based on voiceprint recognition
CN107180629B (en) Voice acquisition and recognition method and system
CN109345652A (en) Work attendance device and implementation method based on speech recognition typing text
CN112466287B (en) Voice segmentation method, device and computer readable storage medium
TWI769520B (en) Multi-language speech recognition and translation method and system
CN115050372A (en) Audio segment clustering method and device, electronic equipment and medium
CN110580907B (en) Voice recognition method and system for multi-person speaking scene
JP2000010578A (en) Voice message transmission/reception system, and voice message processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Distributed Voice Wakeup System Based on Multiple Microphone Array Nodes

Effective date of registration: 20231019

Granted publication date: 20221004

Pledgee: Shenzhen Rural Commercial Bank Co.,Ltd. Xixiang Branch

Pledgor: Shenzhen Yuliang Technology Co.,Ltd.

Registration number: Y2023980061858