CN110858476A - Sound collection method and device based on microphone array


Info

Publication number: CN110858476A (granted publication: CN110858476B)
Application number: CN201810974352.5A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: voice, channel, speaker, speakers, data stream
Inventor: 王峰 (Wang Feng)
Assignee (original and current): Beijing Zidong Cognitive Technology Co Ltd
Application filed by Beijing Zidong Cognitive Technology Co Ltd
Legal status: Granted; Active

Classifications

    • G PHYSICS > G10 MUSICAL INSTRUMENTS; ACOUSTICS > G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G10L15/05 Word boundary detection
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides a sound collection method and device based on a microphone array. The method comprises the following steps: collecting the voice in an environment together with information about the speakers in that environment using a microphone array to obtain multi-channel voice, where the speaker information includes the direction and number of speakers; converting the multi-channel voice into single-channel voice; performing sentence segmentation on the single-channel voice to obtain a voice-segment data stream containing sounds of a preset type; matching the voice-segment data stream with each speaker to obtain a single separated voice for each speaker; and synthesizing the single separated voice matched with each speaker into that speaker's own voice segments. The technical scheme provided by the invention is broadly applicable, suits both near-field and far-field voice environments, detects voice with high accuracy, and can separate the voices of multiple speakers so that each speaker corresponds to exactly one separated voice segment.

Description

Sound collection method and device based on microphone array
Technical Field
The invention relates to the field of automatic computer information processing, and in particular to a sound collection method and device based on a microphone array.
Background
Speech is one of the most natural and effective means of human information exchange. However, captured speech signals inevitably suffer from environmental noise, room reverberation and interference from other speakers, which seriously degrade speech quality. Speech enhancement and separation, applied as a pre-processing stage, is an effective way to suppress such interference.
Speech separation means extracting the desired speech data from a mixture of sounds; it mainly studies how to effectively select and track particular sounds in a complex acoustic environment. Its research goal is to correctly distinguish noise from the target speech of interest, emphasize the target speech, and attenuate or eliminate the noise. Signal processing experts, artificial intelligence experts and audiologists have studied this problem for decades, but the proposed methods remain unsatisfactory.
At present, speech separation mainly relies on methods such as computational auditory scene analysis and non-negative matrix factorization, which are simple to implement. These methods, however, have serious limitations: they suit few scenarios, their performance degrades rapidly in the presence of noise, they ignore the characteristics of speech and thereby damage it, and they do not consider far-field speech environments.
As speech technology develops, it is being applied in ever more complex environments, and speech separation is likewise expected to work well in far-field, noisy acoustic environments.
Therefore, the invention provides a sound collection method and device based on a microphone array to solve the defects of the prior art.
Disclosure of Invention
The invention aims to provide a sound collection method and device based on a microphone array that solve the problems of existing voice separation methods.
According to an aspect of the present invention, there is provided a sound collecting method based on a microphone array, including:
acquiring voice in an environment and information of a speaker in the environment by using a microphone array to obtain multi-channel voice; the speaker information includes: the direction and number of speakers;
converting the multi-channel speech into single-channel speech;
performing sentence segmentation on the single-channel voice to obtain a voice segmented data stream containing preset type sounds;
matching the voice segment data stream with each speaker to obtain single separated voice of each speaker;
and respectively synthesizing the single separated voice matched with each speaker into respective voice sections.
Further, converting the multi-channel speech into single-channel speech, comprising:
receiving the multi-channel voice;
performing voice enhancement on far-field voice by using microphone array direction-finding and beamforming techniques;
and converting the enhanced channel voices corresponding to all the microphones into single-channel voice.
Further, sentence segmentation is performed on the single-channel speech to obtain a speech segment data stream containing a preset type of sound, including:
detecting each frame of voice of the single-channel voice according to a pre-established neural network;
and performing sentence segmentation on the voice frame in the threshold range in the single-channel voice to obtain a voice segmented data stream containing preset type voice.
Further, matching the voice segment data stream with each speaker to obtain a single separated voice of each speaker, comprising: separating the voice segmented data stream based on the number of the speakers to obtain a plurality of single-separated voices respectively corresponding to the number of the speakers;
and matching the single separated voice corresponding to each speaker based on the voice production directions of all the speakers.
According to another aspect of the present invention, a microphone array based sound collection device is disclosed, comprising:
the microphone array acquisition module is used for acquiring voices in an environment and information of speakers in the environment by using a microphone array to obtain multi-channel voices; the speaker information includes: the direction and number of speakers;
the voice conversion module is used for converting the multi-channel voice into single-channel voice;
the voice detection module is used for segmenting the single-channel voice to obtain a voice segmented data stream containing preset type sounds;
the voice separation module is used for matching the voice segmented data stream with each speaker to obtain single separation voice of each speaker;
and the voice synthesis module is used for respectively synthesizing the single separated voice matched with each speaker into respective voice sections.
Further, the voice conversion module includes:
the voice receiving submodule is used for receiving the multi-channel voice;
the voice enhancement sub-module is used for carrying out voice enhancement on far-field voice by utilizing microphone array direction-finding and beamforming techniques;
and the voice conversion submodule is used for converting the channel voice corresponding to all the microphones after enhancement into single-channel voice.
Further, the voice detection module comprises:
the voice detection submodule is used for detecting each frame of voice of the single-channel voice according to a pre-established neural network;
and the voice segmentation submodule is used for segmenting the voice frames in the threshold range in the single-channel voice to obtain a voice segmentation data stream containing preset type voice.
Further, the voice separation module includes:
a voice separation submodule, configured to separate the voice segment data stream based on the number of the speakers to obtain a plurality of single-separated voices corresponding to the number of the speakers;
and the voice matching sub-module is used for matching the single separated voice corresponding to each speaker based on the voice production directions of all the speakers.
The technical scheme provided by the invention collects the voice in the environment together with speaker information using a microphone array to obtain multi-channel voice. Microphone array voice enhancement automatically identifies, locks onto and enhances each speaker's voice signal by analyzing the far-field voice signals, automatically suppresses surrounding random noise and background noise, and improves the accuracy of the voice signal output at the receiving end. The multi-channel voice is then converted into single-channel voice. Cutting the preset sound type out of the voice to form a voice-segment data stream preserves the continuity of communication: the starting point of each continuous voice signal is identified automatically, silent sections are removed, and the input is split into a plurality of sentences. Finally, the matched voice is synthesized into voice segments, ensuring that each voice segment after separation contains only one speaker.
The voice separation system relies only on microphone array equipment, making it portable, easy to store and practical, and avoiding the high cost and low efficiency of traditional cloud-server-based voice separation. The invention can solve the problem of voice separation in scenarios such as a user's daily life, study and meetings, and is of great significance for the development of voice separation.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
fig. 2 is a flow chart of far-field speech separation provided in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, the present invention provides a sound collection method based on a microphone array, which comprises the following steps:
step 1, collecting voice in an environment and information of a speaker in the environment by using a microphone array to obtain multi-channel voice; the speaker information includes: the direction and number of speakers;
step 2, converting the multi-channel voice into single-channel voice;
step 3, sentence segmentation is carried out on the single-channel voice to obtain a voice segmentation data stream containing preset type sounds;
step 4, matching the voice segment data stream with each speaker to obtain single separated voice of each speaker;
and 5, respectively synthesizing the single separated voice matched with each speaker into respective voice sections.
In the embodiment of the application, a microphone array collects the voice in the environment together with speaker information to obtain multi-channel voice. Microphone array voice enhancement automatically identifies, locks onto and enhances each speaker's voice signal by analyzing the far-field voice signals, automatically suppresses surrounding random noise and background noise, and improves the accuracy of the voice signal output at the receiving end. The multi-channel voice is then converted into single-channel voice. Cutting the preset sound type out of the voice to form a voice-segment data stream preserves the continuity of communication: the starting point of each continuous voice signal is identified automatically, silent sections are removed, and the input is split into a plurality of sentences. Finally, the matched voice is synthesized into voice segments, ensuring that each voice segment after separation contains only one speaker.
In some embodiments of the present application, a microphone array is used to collect voices in an environment and speaker information in the environment, so as to obtain multi-channel voices, which specifically includes:
the microphone array receives voice signal input continuously, voice directions are searched for in a 360-degree plane in real time by utilizing a microphone array multi-speaker direction-finding technology, voice direction finding under a scene that a plurality of speakers sound at the same time can be achieved by the technology, and the direction of each speaker and the number of speakers are output.
In some embodiments of the present application, converting the multi-channel speech to single-channel speech includes:
receiving the multi-channel voice;
performing voice enhancement on far-field voice by using microphone array direction-finding and beamforming techniques;
and converting the enhanced channel voices corresponding to all the microphones into single-channel voice.
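The beam step is likewise unspecified in the patent; the simplest possibility is a delay-and-sum beamformer, sketched below under the assumptions of known integer-sample steering delays and a target arriving simultaneously at all microphones:

```python
import numpy as np

def delay_and_sum(channels, delays, fs):
    """Steer toward a known direction by delaying each channel (whole
    samples only, for simplicity) and averaging: the aligned target adds
    coherently while uncorrelated noise averages down."""
    out = np.zeros(channels.shape[1])
    for ch, tau in zip(channels, delays):
        out += np.roll(ch, int(round(tau * fs)))
    return out / len(channels)

rng = np.random.default_rng(0)
fs = 16000
target = np.sin(2 * np.pi * 300 * np.arange(fs) / fs)
# Four microphones: same target plus independent noise, zero inter-mic delay
mics = np.stack([target + 0.5 * rng.standard_normal(fs) for _ in range(4)])
mono = delay_and_sum(mics, delays=[0, 0, 0, 0], fs=fs)
print(np.var(mono - target))  # residual noise power, roughly 0.25 / 4
```

Averaging N aligned channels leaves the target untouched while reducing uncorrelated noise power by roughly a factor of N, which is the enhancement the fusion into single-channel voice relies on.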
In some embodiments of the present application, performing sentence segmentation on the single-channel speech to obtain a speech segment data stream containing a preset type of sound, includes:
detecting each frame of voice of the single-channel voice according to a pre-established neural network;
and performing sentence segmentation on the voice frame in the threshold range in the single-channel voice to obtain a voice segmented data stream containing preset type voice.
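The frame-wise detection and threshold-based sentence segmentation described above can be sketched as follows. The per-frame speech scores would come from the pre-established neural network; here they are mocked by a plain list, and the threshold and minimum-run length are assumed values:

```python
def segment_speech(scores, on_thresh=0.5, min_run=5):
    """Split a stream of per-frame speech scores (assumed in [0, 1]) into
    (start, end) frame segments: a segment opens once `min_run` consecutive
    frames exceed the threshold, and closes when frames drop below it."""
    segments, start, run = [], None, 0
    for i, s in enumerate(scores):
        if s >= on_thresh:
            run += 1
            if start is None and run >= min_run:
                start = i - min_run + 1      # speech starting point
        else:
            if start is not None:
                segments.append((start, i))  # keep speech, drop silence
            start, run = None, 0
    if start is not None:
        segments.append((start, len(scores)))
    return segments

# Silence, speech, silence, a short utterance, silence
scores = [0.1] * 10 + [0.9] * 20 + [0.1] * 10 + [0.8] * 8 + [0.0] * 4
print(segment_speech(scores))  # [(10, 30), (40, 48)]
```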
In some embodiments of the present application, matching the speech segment data stream with each speaker to obtain a single isolated speech for each speaker comprises: separating the voice segmented data stream based on the number of the speakers to obtain a plurality of single-separated voices respectively corresponding to the number of the speakers;
and matching the single separated voice corresponding to each speaker based on the voice production directions of all the speakers.
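Matching each single separated voice to a speaker by sounding direction could, assuming the speaker directions from the direction-finding step are available, be a greedy nearest-angle assignment (an illustrative sketch, not the patented procedure):

```python
def match_voices_to_speakers(voice_doas, speaker_doas):
    """Assign each separated voice to the speaker whose known direction is
    closest (angular distance on a 360-degree circle), so every separated
    segment is attributed to exactly one speaker."""
    def ang_dist(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)

    assignment = {}
    free = set(range(len(speaker_doas)))
    for v, vd in enumerate(voice_doas):
        best = min(free, key=lambda s: ang_dist(vd, speaker_doas[s]))
        assignment[v] = best
        free.remove(best)
    return assignment

# Two separated voices at 10° and 350°, speakers known at 355° and 15°;
# the distance wraps correctly across 0°
print(match_voices_to_speakers([10.0, 350.0], [355.0, 15.0]))  # {0: 1, 1: 0}
```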
In some embodiments of the present application, the predetermined type of sound is a human voice and the speaker is a human.
Voices in the environment and information about the people in it are acquired with a microphone array to obtain multi-channel voice; the information about the people comprises the direction and number of people speaking. Far-field voice is enhanced using microphone array direction-finding and beamforming techniques, and the enhanced channel voices from all microphones are converted into single-channel voice. Voice/non-voice detection uses a trained neural network to classify each frame as voice or non-voice; if the proportion of voice frames within a short stretch of audio exceeds a preset threshold, the first of those frames is judged to be the voice starting point. Only the voice after the starting point is saved and non-human sound is discarded, yielding a voice-segment data stream containing only human voice. The voice-segment data stream is then separated according to the number of people, and each single separated voice obtained in real time is assigned to its speaker, guaranteeing that each voice segment contains only one speaker and that no cross-aliasing errors occur. Finally, the single separated voice of each person is synthesized into voice segments, giving a number of voice segments equal to the number of people.
In other embodiments of the present application, the speaker is a musical instrument, such as a violin, an accordion, a flute, or an erhu. The method separates the sound of each instrument from the environmental sound, distinguishes and matches the sound of each instrument by its timbre, and synthesizes each instrument's sound to form a plurality of segments, one per instrument.
In other embodiments of the present application, the speaker is an animal. The method separates each animal's sound from the environmental sound, distinguishes the sounds and assigns them to the corresponding animals, and finally synthesizes each animal's sound to form one voice segment per animal.
Fig. 2 shows an embodiment of the present invention, applied to a far-field acoustic environment:
in a far-field environment, a plurality of users communicate at different positions, the environment contains background noises with different degrees, and the users can realize real-time and continuous separation of voice under zero operation.
In fig. 2, the microphone array receives the voices in the environment. At a given moment, the microphone array direction-finding module finds every direction that contains voice at that moment and records these as the speaker directions and the number of speakers; the speaker directions steer the beams, while the number of speakers is sent to the voice separation module to set the number of output voices. The microphone array beam module performs beamforming in each detected direction, obtains the enhanced voice from each direction, and fuses it into single-channel voice. Voice/non-voice detection then decomposes the continuous voice signal into a voice-segment data stream, further filtering out non-voice and noise and improving system efficiency. The voice separation module separates the mixed voice into as many voices as there are speakers. The tracking module assigns each real-time separated voice segment to its speaker using a similarity calculation, ensuring that after separation no speaker's voice contains the voice of any other speaker. The voice-segment data stream is synthesized into rhythmic, continuous voice using voice synthesis technology, which can be output to the user or uploaded to a server by the device. While processing the voice signal, the device displays its progress in real time.
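The similarity calculation used by the tracking module is not specified in the patent; one common, assumed choice is cosine similarity between fixed-length voice embeddings of each separated segment and a stored profile per speaker, as in this sketch (all names and vectors are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def track(segment_embeddings, speaker_profiles):
    """Assign each real-time separated segment to the speaker whose voice
    profile it resembles most, so each speaker's stream stays pure."""
    return [
        max(range(len(speaker_profiles)),
            key=lambda s: cosine_similarity(e, speaker_profiles[s]))
        for e in segment_embeddings
    ]

profiles = [[1.0, 0.0, 0.2], [0.1, 1.0, 0.0]]   # two enrolled speakers
segments = [[0.9, 0.1, 0.3], [0.0, 0.8, 0.1]]   # two separated segments
print(track(segments, profiles))  # [0, 1]
```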
According to the above working mode and principle, voice separation for multiple users at long distances can be realized.
Based on the same inventive concept, the invention also provides a sound collection device based on a microphone array, which comprises:
the microphone array acquisition module is used for acquiring voices in an environment and information of speakers in the environment by using a microphone array to obtain multi-channel voices; the speaker information includes: the direction and number of speakers;
the voice conversion module is used for converting the multi-channel voice into single-channel voice;
the voice detection module is used for segmenting the single-channel voice to obtain a voice segmented data stream containing preset type sounds;
the voice separation module is used for matching the voice segmented data stream with each speaker to obtain single separation voice of each speaker;
and the voice synthesis module is used for respectively synthesizing the single separated voice matched with each speaker into respective voice sections.
Preferably, the voice conversion module includes:
the voice receiving submodule is used for receiving the multi-channel voice;
the voice enhancement sub-module is used for carrying out voice enhancement on far-field voice by utilizing microphone array direction-finding and beamforming techniques;
and the voice conversion submodule is used for converting the channel voice corresponding to all the microphones after enhancement into single-channel voice.
Preferably, the voice detection module includes:
the voice detection submodule is used for detecting each frame of voice of the single-channel voice according to a pre-established neural network;
and the voice segmentation submodule is used for segmenting the voice frames in the threshold range in the single-channel voice to obtain a voice segmentation data stream containing preset type voice.
Preferably, the voice separation module includes:
a voice separation submodule, configured to separate the voice segment data stream based on the number of the speakers to obtain a plurality of single-separated voices corresponding to the number of the speakers;
and the voice matching sub-module is used for matching the single separated voice corresponding to each speaker based on the voice production directions of all the speakers.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A sound collection method based on a microphone array is characterized by comprising the following steps:
acquiring voice in an environment and information of a speaker in the environment by using a microphone array to obtain multi-channel voice; the speaker information includes: the direction and number of speakers;
converting the multi-channel speech into single-channel speech;
performing sentence segmentation on the single-channel voice to obtain a voice segmented data stream containing preset type sounds;
matching the voice segment data stream with each speaker to obtain single separated voice of each speaker;
and respectively synthesizing the single separated voice matched with each speaker into respective voice sections.
2. The method of claim 1, wherein converting the multi-channel speech to single-channel speech comprises:
receiving the multi-channel voice;
performing voice enhancement on far-field voice by using microphone array direction-finding and beamforming technology;
and converting the channel voice corresponding to all the enhanced microphones into single-channel voice.
3. The method of claim 2, wherein performing sentence-segmentation on the single-channel speech to obtain a speech segmented data stream containing a preset type of sound comprises:
detecting each frame of voice of the single-channel voice according to a pre-established neural network;
and performing sentence segmentation on the voice frame in the threshold range in the single-channel voice to obtain a voice segmented data stream containing preset type voice.
4. The method of claim 3, wherein matching the stream of speech segments to each speaker results in a single separate speech for each speaker, comprising: separating the voice segmented data stream based on the number of the speakers to obtain a plurality of single-separated voices respectively corresponding to the number of the speakers;
and matching the single separated voice corresponding to each speaker based on the voice production directions of all the speakers.
5. A sound collection device based on a microphone array, comprising:
the microphone array acquisition module is used for acquiring voices in an environment and information of speakers in the environment by using a microphone array to obtain multi-channel voices; the speaker information includes: the direction and number of speakers;
the voice conversion module is used for converting the multi-channel voice into single-channel voice;
the voice detection module is used for segmenting the single-channel voice to obtain a voice segmented data stream containing preset type sounds;
the voice separation module is used for matching the voice segmented data stream with each speaker to obtain single separation voice of each speaker;
and the voice synthesis module is used for respectively synthesizing the single separated voice matched with each speaker into respective voice sections.
6. The apparatus of claim 5, wherein the voice conversion module comprises:
the voice receiving submodule is used for receiving the multi-channel voice;
the voice enhancement sub-module is used for carrying out voice enhancement on far-field voice by utilizing microphone array direction-finding and beamforming technology;
and the voice conversion submodule is used for converting the channel voice corresponding to all the microphones after enhancement into single-channel voice.
7. The apparatus of claim 5, wherein the voice detection module comprises:
the voice detection submodule is used for detecting each frame of voice of the single-channel voice according to a pre-established neural network;
and the voice segmentation submodule is used for segmenting the voice frames in the threshold range in the single-channel voice to obtain a voice segmentation data stream containing preset type voice.
8. The apparatus of claim 5, wherein the voice separation module comprises:
a voice separation submodule, configured to separate the voice segment data stream based on the number of the speakers to obtain a plurality of single-separated voices corresponding to the number of the speakers; and the voice matching sub-module is used for matching the single separated voice corresponding to each speaker based on the voice production directions of all the speakers.
CN201810974352.5A 2018-08-24 2018-08-24 Sound collection method and device based on microphone array Active CN110858476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810974352.5A CN110858476B (en) 2018-08-24 2018-08-24 Sound collection method and device based on microphone array

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810974352.5A CN110858476B (en) 2018-08-24 2018-08-24 Sound collection method and device based on microphone array

Publications (2)

Publication Number Publication Date
CN110858476A true CN110858476A (en) 2020-03-03
CN110858476B CN110858476B (en) 2022-09-27

Family

ID=69635531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810974352.5A Active CN110858476B (en) 2018-08-24 2018-08-24 Sound collection method and device based on microphone array

Country Status (1)

Country Link
CN (1) CN110858476B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102388416A * 2010-02-25 2012-03-21 Panasonic Corporation Signal processing apparatus and signal processing method
CN106782563A * 2016-12-28 2017-05-31 Shanghai Baizhilong Network Technology Co., Ltd. Intelligent home voice interaction system
US20180082690A1 * 2012-11-09 2018-03-22 Mattersight Corporation Methods and system for reducing false positive voice print matching
CN107919133A * 2016-10-09 2018-04-17 Saiditing Co., Ltd. Speech enhancement system and speech enhancement method for a target object
CN108074576A * 2017-12-14 2018-05-25 Xunfei Zhiyuan Information Technology Co., Ltd. Speaker role separation method and system for interrogation scenarios


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963694A * 2020-07-20 2022-01-21 China Mobile (Suzhou) Software Technology Co., Ltd. Voice recognition method and device, electronic equipment and storage medium
CN111883135A * 2020-07-28 2020-11-03 Beijing SoundAI Technology Co., Ltd. Voice transcription method and device, and electronic equipment
CN112804401A * 2020-12-31 2021-05-14 PLA Strategic Support Force Information Engineering University Conference role determination and voice acquisition control method and device
CN113464858A * 2021-08-03 2021-10-01 Zhejiang Oufeikesi Traffic Technology Co., Ltd. Mobile emergency lighting control method and device
CN113464858B * 2021-08-03 2023-02-28 Zhejiang Oufeikesi Traffic Technology Co., Ltd. Mobile emergency lighting control method and device
CN113825082A * 2021-09-19 2021-12-21 Wuhan Zuodian Technology Co., Ltd. Method and device for reducing hearing aid delay
CN113825082B * 2021-09-19 2024-06-11 Wuhan Zuodian Technology Co., Ltd. Method and device for reducing hearing aid delay
WO2024099359A1 * 2022-11-09 2024-05-16 Beijing Youzhuju Network Technology Co., Ltd. Voice detection method and apparatus, electronic device and storage medium
CN115762525A * 2022-11-18 2023-03-07 Beijing Zhongke Yixin Technology Co., Ltd. Voice filtering and recording method and system based on omnidirectional voice acquisition
CN115762525B * 2022-11-18 2024-05-07 Beijing Zhongke Yixin Technology Co., Ltd. Voice filtering and recording method and system based on omnidirectional voice acquisition

Also Published As

Publication number Publication date
CN110858476B (en) 2022-09-27

Similar Documents

Publication Publication Date Title
CN110858476B (en) Sound collection method and device based on microphone array
Chen et al. Continuous speech separation: Dataset and analysis
US11132997B1 (en) Robust audio identification with interference cancellation
Cai et al. Sensor network for the monitoring of ecosystem: Bird species recognition
CN111508498B (en) Conversational speech recognition method, conversational speech recognition system, electronic device, and storage medium
CN109147796B (en) Speech recognition method, device, computer equipment and computer readable storage medium
CN111816218A (en) Voice endpoint detection method, device, equipment and storage medium
CN111429939B (en) Sound signal separation method of double sound sources and pickup
CN110111808B (en) Audio signal processing method and related product
Wang et al. Deep learning assisted time-frequency processing for speech enhancement on drones
CN108520756B (en) Method and device for separating speaker voice
CN113593601A (en) Audio-visual multi-modal voice separation method based on deep learning
US20240249714A1 (en) Multi-encoder end-to-end automatic speech recognition (asr) for joint modeling of multiple input devices
Wang et al. Attention-based fusion for bone-conducted and air-conducted speech enhancement in the complex domain
CN111429916B (en) Sound signal recording system
CN114333874A (en) Method for processing audio signal
CN113823303A (en) Audio noise reduction method and device and computer readable storage medium
CN111009259B (en) Audio processing method and device
CN117198324A (en) Bird sound identification method, device and system based on clustering model
Kamble et al. Teager energy subband filtered features for near and far-field automatic speech recognition
WO2022068675A1 (en) Speaker speech extraction method and apparatus, storage medium, and electronic device
Weber et al. Constructing a dataset of speech recordings with lombard effect
Yeow et al. Real-Time Sound Event Localization and Detection: Deployment Challenges on Edge Devices
CN115171716B (en) Continuous voice separation method and system based on spatial feature clustering and electronic equipment
Venkatesan et al. Analysis of monaural and binaural statistical properties for the estimation of distance of a target speaker

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant