CN110858476A - Sound collection method and device based on microphone array - Google Patents
- Publication number
- CN110858476A (application CN201810974352.5A)
- Authority
- CN
- China
- Prior art keywords
- voice
- channel
- speaker
- speakers
- data stream
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
- G10L15/05—Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Abstract
The invention provides a sound collection method and device based on a microphone array. The method comprises the following steps: acquiring voice in an environment, together with information about the speakers in the environment (their directions and their number), by using a microphone array, to obtain multi-channel voice; converting the multi-channel voice into single-channel voice; performing sentence segmentation on the single-channel voice to obtain a segmented voice data stream containing sounds of a preset type; matching the segmented voice data stream with each speaker to obtain a single separated voice for each speaker; and synthesizing the single separated voice matched to each speaker into its own voice section. The technical scheme provided by the invention is widely applicable, suits both near-field and far-field voice environments, detects voice with high accuracy, and separates the voices of multiple speakers so that each speaker corresponds to one separated voice section.
Description
Technical Field
The invention relates to the field of automatic computer information processing, and in particular to a sound collection method and device based on a microphone array.
Background
Voice is one of the most natural and effective means of exchanging information. While acquiring voice signals, people inevitably suffer environmental noise, room reverberation, and interference from other speakers, all of which seriously degrade voice quality. As a pre-processing stage, voice enhancement and separation is an effective way to suppress such interference.
Voice separation means extracting the desired voice data from a mixture of sounds; it mainly studies how to effectively select and track particular sounds in a complex acoustic environment. Its research goal is to correctly distinguish noise from the target voice of interest, to emphasize the target voice, and to attenuate or eliminate the noise. Signal processing experts, artificial intelligence researchers, and audiologists have studied this problem for decades, but the methods proposed so far remain unsatisfactory.
At present, voice separation mainly relies on methods such as computational auditory scene analysis and non-negative matrix factorization, which are simple to implement. These methods, however, have severe limitations: they suit few scenarios, their performance degrades rapidly in the presence of noise, they ignore the characteristics of voice and therefore damage it, and they do not consider far-field voice environments.
As voice technology develops, it is being applied in ever more complex environments, and voice separation is likewise expected to work well in far-field, noisy acoustic conditions.
Therefore, the invention provides a sound collection method and device based on a microphone array to remedy these defects of the prior art.
Disclosure of Invention
The invention aims to provide a sound collection method and device based on a microphone array that solve the problems of existing voice separation.
According to an aspect of the present invention, there is provided a sound collecting method based on a microphone array, including:
acquiring voice in an environment and information of a speaker in the environment by using a microphone array to obtain multi-channel voice; the speaker information includes: the direction and number of speakers;
converting the multi-channel speech into single-channel speech;
performing sentence segmentation on the single-channel voice to obtain a voice segmented data stream containing preset type sounds;
matching the voice segment data stream with each speaker to obtain single separated voice of each speaker;
and respectively synthesizing the single separated voice matched with each speaker into respective voice sections.
Further, converting the multi-channel speech into single-channel speech, comprising:
receiving the multi-channel voice;
performing voice enhancement on far-field voice by using microphone-array direction-finding and beamforming technology;
and converting the enhanced channel voices corresponding to all the microphones into single-channel voice.
Further, sentence segmentation is performed on the single-channel speech to obtain a speech segment data stream containing a preset type of sound, including:
detecting each frame of voice of the single-channel voice according to a pre-established neural network;
and performing sentence segmentation on the voice frame in the threshold range in the single-channel voice to obtain a voice segmented data stream containing preset type voice.
Further, matching the voice segment data stream with each speaker to obtain a single separated voice of each speaker, comprising: separating the voice segmented data stream based on the number of the speakers to obtain a plurality of single-separated voices respectively corresponding to the number of the speakers;
and matching the single separated voice corresponding to each speaker based on the voice production directions of all the speakers.
According to another aspect of the present invention, a microphone array based sound collection device is disclosed, comprising:
the microphone array acquisition module is used for acquiring voices in an environment and information of speakers in the environment by using a microphone array to obtain multi-channel voices; the speaker information includes: the direction and number of speakers;
the voice conversion module is used for converting the multi-channel voice into single-channel voice;
the voice detection module is used for segmenting the single-channel voice to obtain a voice segmented data stream containing preset type sounds;
the voice separation module is used for matching the voice segmented data stream with each speaker to obtain single separation voice of each speaker;
and the voice synthesis module is used for respectively synthesizing the single separated voice matched with each speaker into respective voice sections.
Further, the voice conversion module includes:
the voice receiving submodule is used for receiving the multi-channel voice;
the voice enhancement sub-module is used for performing voice enhancement on far-field voice by using microphone-array direction-finding and beamforming technology;
and the voice conversion submodule is used for converting the channel voice corresponding to all the microphones after enhancement into single-channel voice.
Further, the voice detection module comprises:
the voice detection submodule is used for detecting each frame of voice of the single-channel voice according to a pre-established neural network;
and the voice segmentation submodule is used for segmenting the voice frames in the threshold range in the single-channel voice to obtain a voice segmentation data stream containing preset type voice.
Further, the voice separation module includes:
a voice separation submodule, configured to separate the voice segment data stream based on the number of the speakers to obtain a plurality of single-separated voices corresponding to the number of the speakers;
and the voice matching sub-module is used for matching the single separated voice corresponding to each speaker based on the voice production directions of all the speakers.
The technical scheme provided by the invention acquires the voice in an environment, together with information about the speakers in it, by using a microphone array to obtain multi-channel voice. Microphone-array voice enhancement automatically identifies, locks onto, and enhances each speaker's voice signal by analyzing the far-field signals, automatically suppresses surrounding random noise and background noise, and improves the accuracy of the voice signal delivered at the receiving end. The multi-channel voice is then converted into single-channel voice, and sounds of the preset type are cut out of it to form a segmented voice data stream: the starting point of each continuous voice signal is identified automatically, silent sections are removed, and the input is split into separate utterances while the continuity of the conversation is preserved. Finally, the matched voices are synthesized into voice sections, guaranteeing that each section after separation contains only one speaker.
Because the voice separation relies only on microphone-array equipment, the device is easy to carry and store, convenient, and practical, and it avoids the high cost and low efficiency of traditional voice separation based on a cloud server. The invention can solve voice separation in scenarios such as a user's daily life, study, and meetings, and is of great significance for the development of voice separation.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
fig. 2 is a flow chart of far-field speech separation provided in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, the present invention provides a sound collection method based on a microphone array, which comprises the following steps:
step 1, collecting voice in an environment and information of a speaker in the environment by using a microphone array to obtain multi-channel voice; the speaker information includes: the direction and number of speakers;
step 2, converting the multi-channel voice into single-channel voice;
step 3, sentence segmentation is carried out on the single-channel voice to obtain a voice segmentation data stream containing preset type sounds;
step 4, matching the voice segment data stream with each speaker to obtain single separated voice of each speaker;
and 5, respectively synthesizing the single separated voice matched with each speaker into respective voice sections.
In the embodiment of the application, a microphone array is used to acquire the voice in an environment together with information about the speakers in it, obtaining multi-channel voice. Microphone-array voice enhancement automatically identifies, locks onto, and enhances the speakers' voice signals by analyzing the far-field signals, automatically suppresses surrounding random noise and background noise, and improves the accuracy of the voice signal output at the receiving end. The multi-channel voice is then converted into single-channel voice, and sounds of the preset type are cut out of it to form a segmented voice data stream: the starting point of each continuous voice signal is identified automatically, silent sections are removed, and the input is split into separate utterances while the continuity of the conversation is preserved. Finally, the matched voices are synthesized into voice sections, guaranteeing that each section after separation contains only one speaker.
In some embodiments of the present application, a microphone array is used to collect voices in an environment and speaker information in the environment, so as to obtain multi-channel voices, which specifically includes:
the microphone array receives voice signal input continuously, voice directions are searched for in a 360-degree plane in real time by utilizing a microphone array multi-speaker direction-finding technology, voice direction finding under a scene that a plurality of speakers sound at the same time can be achieved by the technology, and the direction of each speaker and the number of speakers are output.
In some embodiments of the present application, converting the multi-channel speech to single-channel speech includes:
receiving the multi-channel voice;
performing voice enhancement on far-field voice by using microphone-array direction-finding and beamforming technology;
and converting the channel voice corresponding to all the enhanced microphones into single-channel voice.
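The patent does not name a particular beamforming method for fusing the enhanced microphone channels into one stream. A minimal illustration is a frequency-domain delay-and-sum beamformer; the sketch below assumes the per-channel steering delays are already known from direction finding, and all names are our own.

```python
import numpy as np

def delay_and_sum(channels, fs, delays):
    """Steer toward one direction by advancing each channel by its known
    arrival delay so the target's wavefronts align, then averaging.
    `channels` is (n_mics, n_samples); `delays` holds seconds per channel."""
    n = channels.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for ch, tau in zip(channels, delays):
        # a fractional delay is a linear phase term in the frequency domain
        spectrum = np.fft.rfft(ch) * np.exp(2j * np.pi * freqs * tau)
        out += np.fft.irfft(spectrum, n=n)
    return out / len(channels)
```

Averaging aligned channels reinforces the target direction while uncorrelated noise from other directions partially cancels, which is the enhancement effect the description relies on.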
In some embodiments of the present application, performing sentence segmentation on the single-channel speech to obtain a speech segment data stream containing a preset type of sound, includes:
detecting each frame of voice of the single-channel voice according to a pre-established neural network;
and performing sentence segmentation on the voice frame in the threshold range in the single-channel voice to obtain a voice segmented data stream containing preset type voice.
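The patent's per-frame detector is a trained neural network. As a stand-in, the sketch below uses frame energy for the voice/non-voice decision but keeps the described thresholding logic: a segment opens once enough consecutive frames look voiced and closes after a run of unvoiced frames. The frame length and both thresholds are illustrative assumptions.

```python
import numpy as np

def frame_signal(x, frame_len):
    """Chop a mono signal into non-overlapping frames, dropping the tail."""
    n_frames = len(x) // frame_len
    return x[:n_frames * frame_len].reshape(n_frames, frame_len)

def segment_speech(x, frame_len=160, energy_thresh=0.01, min_voiced=3, hang=5):
    """Return (start, end) sample indices of voiced segments.  The energy
    test stands in for the patent's per-frame neural-network classifier."""
    frames = frame_signal(x, frame_len)
    voiced = (frames ** 2).mean(axis=1) > energy_thresh
    segments, start, run_v, run_u = [], None, 0, 0
    for i, v in enumerate(voiced):
        if v:
            run_v, run_u = run_v + 1, 0
            if start is None and run_v >= min_voiced:
                start = i - min_voiced + 1      # segment start point found
        else:
            run_u, run_v = run_u + 1, 0
            if start is not None and run_u >= hang:
                segments.append((start * frame_len, (i - hang + 1) * frame_len))
                start = None
    if start is not None:                        # stream ended mid-segment
        segments.append((start * frame_len, len(frames) * frame_len))
    return segments
```

Only samples inside the returned segments would be kept, matching the description's discarding of silent sections.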
In some embodiments of the present application, matching the speech segment data stream with each speaker to obtain a single isolated speech for each speaker comprises: separating the voice segmented data stream based on the number of the speakers to obtain a plurality of single-separated voices respectively corresponding to the number of the speakers;
and matching the single separated voice corresponding to each speaker based on the voice production directions of all the speakers.
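Matching each separated voice to a speaker by sound-source direction can be framed as choosing the assignment that minimizes total angular error. The brute-force permutation search below is our own illustration (adequate for the handful of speakers a meeting involves), not the patent's disclosed method.

```python
from itertools import permutations

def match_streams_to_speakers(stream_doas, speaker_doas):
    """Assign each separated stream (by its estimated direction of arrival,
    in degrees) to a known speaker direction, minimizing the summed
    angular error over all one-to-one assignments."""
    def ang_diff(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)                # wrap-around angular distance
    n = len(speaker_doas)
    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(ang_diff(stream_doas[i], speaker_doas[p])
                   for i, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best)       # best[i] = speaker index matched to stream i
```

For larger speaker counts a Hungarian-algorithm solver would replace the factorial search, but the objective stays the same.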
In some embodiments of the present application, the predetermined type of sound is a human voice and the speaker is a human.
A microphone array acquires the voices in an environment together with information about the people in it (the directions of the people speaking and their number), obtaining multi-channel voice. Far-field voice is enhanced using microphone-array direction-finding and beamforming technology, and the enhanced channel voices corresponding to all the microphones are converted into single-channel voice. Voice/non-voice detection uses a trained neural network to classify each frame; if the number of voice frames in a short stretch of voice exceeds a preset threshold, the first voice frame is judged to be a voice starting point. Only the voice after the starting point is saved, and non-human sounds are discarded, yielding a segmented voice data stream that contains only human voice. The segmented stream is separated according to the number of people, and each single separated voice obtained in real time is assigned to its speaker, so that each stretch of voice contains only one speaker and no cross-aliasing errors occur. Finally, each person's single separated voice is synthesized into voice sections, forming as many voice sections as there are people.
In other embodiments of the present application, the speaker is a musical instrument, such as a violin, an accordion, a flute, or an erhu. The method separates the sound of each instrument from the environmental sound, distinguishes and matches the sound of each instrument by its timbre, and synthesizes each instrument's sound to form voice sections corresponding to the instruments.
In other embodiments of the present application, the speaker is an animal. The method separates each animal's sound from the environmental sound, distinguishes the sounds and assigns them to the individual animals, and finally synthesizes each animal's sound to form one voice section per animal.
Fig. 2 shows an embodiment of the present invention, applied to a far-field acoustic environment:
in a far-field environment, a plurality of users communicate at different positions, the environment contains background noises with different degrees, and the users can realize real-time and continuous separation of voice under zero operation.
In fig. 2, a microphone array receives the voices in the environment. At a given moment, the microphone-array direction-finding module finds every direction that contains a voice at that moment and records them as the speaker directions and the number of speakers; the speaker directions steer the beams, and the number of speakers is sent to the voice separation module to set the number of output voices. The microphone-array beam module forms a beam toward each obtained voice direction, obtains the enhanced voice in each direction, and fuses them into single-channel voice. Voice/non-voice detection decomposes the continuous voice signal into a segmented voice data stream and further filters out non-voice and noise, improving system efficiency. The voice separation module separates the mixed voice into as many voices as there are speakers. The tracking module uses similarity calculation to assign each real-time separated voice segment to its speaker, ensuring that after separation no speaker's voice contains the voice of any other speaker. Finally, voice synthesis joins the segmented voice data stream into rhythmic, continuous voice, which can be output to the user through the device's loudspeaker or uploaded to a server. While processing the voice signal, the device displays the processing progress in real time.
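The Figure-2 data flow (direction finding steering the beams, the speaker count steering separation, and direction steering the matching) can be summarized as a pipeline whose stages are injected callables. Everything in this sketch, names included, is an illustrative assumption rather than the patent's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FarFieldPipeline:
    """Wires the stages of the far-field flow together.  Each stage is an
    injected callable, so trivial stand-ins can exercise the plumbing."""
    find_directions: Callable   # multichannel audio -> list of speaker DOAs
    beamform: Callable          # (audio, doas) -> single-channel stream
    segment: Callable           # single-channel stream -> voiced segments
    separate: Callable          # (segment, n_speakers) -> separated streams
    match: Callable             # (streams, doas) -> speaker index per stream

    def run(self, audio) -> List[list]:
        doas = self.find_directions(audio)          # directions + count
        mono = self.beamform(audio, doas)           # fuse enhanced beams
        per_speaker: List[list] = [[] for _ in doas]
        for seg in self.segment(mono):              # voiced segments only
            streams = self.separate(seg, len(doas))
            for spk, stream in zip(self.match(streams, doas), streams):
                per_speaker[spk].append(stream)     # one speaker per bucket
        return per_speaker
```

With trivial stand-in stages the pipeline routes each separated stream into its matched speaker's bucket, mirroring the guarantee that each output section contains only one speaker.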
Working in this way, the voices of multiple users can be separated at long distances.
Based on the same inventive concept, the invention also provides a sound collection device based on a microphone array, which comprises:
the microphone array acquisition module is used for acquiring voices in an environment and information of speakers in the environment by using a microphone array to obtain multi-channel voices; the speaker information includes: the direction and number of speakers;
the voice conversion module is used for converting the multi-channel voice into single-channel voice;
the voice detection module is used for segmenting the single-channel voice to obtain a voice segmented data stream containing preset type sounds;
the voice separation module is used for matching the voice segmented data stream with each speaker to obtain single separation voice of each speaker;
and the voice synthesis module is used for respectively synthesizing the single separated voice matched with each speaker into respective voice sections.
Preferably, the voice conversion module includes:
the voice receiving submodule is used for receiving the multi-channel voice;
the voice enhancement sub-module is used for performing voice enhancement on far-field voice by using microphone-array direction-finding and beamforming technology;
and the voice conversion submodule is used for converting the channel voice corresponding to all the microphones after enhancement into single-channel voice.
Preferably, the voice detection module includes:
the voice detection submodule is used for detecting each frame of voice of the single-channel voice according to a pre-established neural network;
and the voice segmentation submodule is used for segmenting the voice frames in the threshold range in the single-channel voice to obtain a voice segmentation data stream containing preset type voice.
Preferably, the voice separation module includes:
a voice separation submodule, configured to separate the voice segment data stream based on the number of the speakers to obtain a plurality of single-separated voices corresponding to the number of the speakers;
and the voice matching sub-module is used for matching the single separated voice corresponding to each speaker based on the voice production directions of all the speakers.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A sound collection method based on a microphone array is characterized by comprising the following steps:
acquiring voice in an environment and information of a speaker in the environment by using a microphone array to obtain multi-channel voice; the speaker information includes: the direction and number of speakers;
converting the multi-channel speech into single-channel speech;
performing sentence segmentation on the single-channel voice to obtain a voice segmented data stream containing preset type sounds;
matching the voice segment data stream with each speaker to obtain single separated voice of each speaker;
and respectively synthesizing the single separated voice matched with each speaker into respective voice sections.
2. The method of claim 1, wherein converting the multi-channel speech to single-channel speech comprises:
receiving the multi-channel voice;
performing voice enhancement on far-field voice by using microphone-array direction-finding and beamforming technology;
and converting the channel voice corresponding to all the enhanced microphones into single-channel voice.
3. The method of claim 2, wherein performing sentence-segmentation on the single-channel speech to obtain a speech segmented data stream containing a preset type of sound comprises:
detecting each frame of voice of the single-channel voice according to a pre-established neural network;
and performing sentence segmentation on the voice frame in the threshold range in the single-channel voice to obtain a voice segmented data stream containing preset type voice.
4. The method of claim 3, wherein matching the stream of speech segments to each speaker results in a single separate speech for each speaker, comprising: separating the voice segmented data stream based on the number of the speakers to obtain a plurality of single-separated voices respectively corresponding to the number of the speakers;
and matching the single separated voice corresponding to each speaker based on the voice production directions of all the speakers.
5. A sound collection device based on a microphone array, comprising:
the microphone array acquisition module is used for acquiring voices in an environment and information of speakers in the environment by using a microphone array to obtain multi-channel voices; the speaker information includes: the direction and number of speakers;
the voice conversion module is used for converting the multi-channel voice into single-channel voice;
the voice detection module is used for segmenting the single-channel voice to obtain a voice segmented data stream containing preset type sounds;
the voice separation module is used for matching the voice segmented data stream with each speaker to obtain single separation voice of each speaker;
and the voice synthesis module is used for respectively synthesizing the single separated voice matched with each speaker into respective voice sections.
6. The apparatus of claim 5, wherein the voice conversion module comprises:
the voice receiving submodule is used for receiving the multi-channel voice;
the voice enhancement sub-module is used for performing voice enhancement on far-field voice by using microphone-array direction-finding and beamforming technology;
and the voice conversion submodule is used for converting the channel voice corresponding to all the microphones after enhancement into single-channel voice.
7. The apparatus of claim 5, wherein the voice detection module comprises:
the voice detection submodule is used for detecting each frame of voice of the single-channel voice according to a pre-established neural network;
and the voice segmentation submodule is used for segmenting the voice frames within the threshold range in the single-channel voice to obtain a voice segmented data stream containing the preset type of voice.
8. The apparatus of claim 5, wherein the voice separation module comprises:
the voice separation submodule is used for separating the voice segmented data stream based on the number of the speakers to obtain a plurality of single separated voices respectively corresponding to the number of the speakers;
and the voice matching submodule is used for matching the single separated voice corresponding to each speaker based on the sound production directions of all the speakers.
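Claim 6 describes far-field enhancement by direction finding plus beamforming before the channels are merged. The patent does not specify the beamformer, so the following is an illustrative sketch only: a minimal far-field delay-and-sum beamformer for a linear array, with all function and parameter names hypothetical.

```python
import numpy as np

def delay_and_sum(channels, fs, mic_positions, doa_deg, c=343.0):
    """Steer a linear array toward doa_deg and sum the channels.

    channels: (num_mics, num_samples) array of time-domain signals
    fs: sampling rate in Hz
    mic_positions: mic x-coordinates (metres) along the array axis
    doa_deg: target direction measured from broadside, in degrees
    c: speed of sound in m/s
    """
    num_mics, n = channels.shape
    # Far-field delay of each mic relative to the array origin
    delays = np.asarray(mic_positions) * np.sin(np.deg2rad(doa_deg)) / c
    sample_shifts = np.round(delays * fs).astype(int)
    out = np.zeros(n)
    for ch, s in zip(channels, sample_shifts):
        out += np.roll(ch, -s)  # shift each channel to align the wavefronts
    return out / num_mics     # average into a single enhanced channel
```

Signals arriving from `doa_deg` add coherently after alignment while noise from other directions averages down, which is the enhancement effect the claim relies on.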
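Claim 7 classifies each frame of the single-channel voice with a pre-established neural network and then segments the frames within the threshold range. A minimal sketch of that segmentation logic, substituting a simple frame-energy detector for the neural network (all names and thresholds are hypothetical, not the patent's):

```python
import numpy as np

def segment_speech(signal, fs, frame_ms=20, threshold=0.01, min_frames=5):
    """Split a single-channel signal into speech segments.

    A stand-in energy detector replaces the patent's neural-network
    frame classifier; contiguous runs of at least min_frames voiced
    frames become one (start, end) segment in samples.
    """
    flen = int(fs * frame_ms / 1000)
    nframes = len(signal) // flen
    voiced = [np.mean(signal[i * flen:(i + 1) * flen] ** 2) > threshold
              for i in range(nframes)]
    segments, run_start = [], None
    for i, v in enumerate(voiced + [False]):  # sentinel closes the last run
        if v and run_start is None:
            run_start = i
        elif not v and run_start is not None:
            if i - run_start >= min_frames:
                segments.append((run_start * flen, i * flen))
            run_start = None
    return segments
```

In the patented method the per-frame decision would come from the trained network rather than an energy comparison; only the run-grouping step is sketched faithfully here.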
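Claims 4 and 8 match each separated voice to a speaker using the sound production directions obtained from the microphone array. A minimal sketch of such direction-based matching, assuming direction-of-arrival estimates in degrees for both the separated streams and the speakers (the function and its greedy strategy are illustrative assumptions, not the patent's specified algorithm):

```python
import numpy as np

def match_streams_to_speakers(stream_doas, speaker_doas):
    """Greedily assign each separated stream to the nearest free speaker.

    stream_doas: estimated direction (degrees) of each separated stream
    speaker_doas: direction (degrees) of each speaker from the array
    Returns a dict mapping stream index -> speaker index.
    """
    def ang_diff(a, b):
        # circular angular distance in degrees
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    assignment = {}
    free = set(range(len(speaker_doas)))
    for i, sd in enumerate(stream_doas):
        best = min(free, key=lambda j: ang_diff(sd, speaker_doas[j]))
        assignment[i] = best
        free.remove(best)  # one stream per speaker
    return assignment
```

A production system might instead solve the assignment globally (e.g. minimum-cost matching), but nearest-angle assignment shows the idea of pairing separated voices with speaker directions.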
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810974352.5A CN110858476B (en) | 2018-08-24 | 2018-08-24 | Sound collection method and device based on microphone array |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110858476A | 2020-03-03 |
CN110858476B | 2022-09-27 |
Family
ID=69635531
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810974352.5A Active CN110858476B (en) | 2018-08-24 | 2018-08-24 | Sound collection method and device based on microphone array |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110858476B (en) |
Worldwide applications: 2018-08-24 — CN CN201810974352.5A (patent CN110858476B, active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102388416A (en) * | 2010-02-25 | 2012-03-21 | 松下电器产业株式会社 | Signal processing apparatus and signal processing method |
US20180082690A1 (en) * | 2012-11-09 | 2018-03-22 | Mattersight Corporation | Methods and system for reducing false positive voice print matching |
CN107919133A (en) * | 2016-10-09 | 2018-04-17 | 赛谛听股份有限公司 | For the speech-enhancement system and sound enhancement method of destination object |
CN106782563A (en) * | 2016-12-28 | 2017-05-31 | 上海百芝龙网络科技有限公司 | A kind of intelligent home voice interactive system |
CN108074576A (en) * | 2017-12-14 | 2018-05-25 | 讯飞智元信息科技有限公司 | Inquest the speaker role's separation method and system under scene |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113963694A (en) * | 2020-07-20 | 2022-01-21 | 中移(苏州)软件技术有限公司 | Voice recognition method, voice recognition device, electronic equipment and storage medium |
CN111883135A (en) * | 2020-07-28 | 2020-11-03 | 北京声智科技有限公司 | Voice transcription method and device and electronic equipment |
CN112804401A (en) * | 2020-12-31 | 2021-05-14 | 中国人民解放军战略支援部队信息工程大学 | Conference role determination and voice acquisition control method and device |
CN113464858A (en) * | 2021-08-03 | 2021-10-01 | 浙江欧菲克斯交通科技有限公司 | Mobile emergency lighting control method and device |
CN113464858B (en) * | 2021-08-03 | 2023-02-28 | 浙江欧菲克斯交通科技有限公司 | Mobile emergency lighting control method and device |
CN113825082A (en) * | 2021-09-19 | 2021-12-21 | 武汉左点科技有限公司 | Method and device for relieving hearing aid delay |
CN113825082B (en) * | 2021-09-19 | 2024-06-11 | 武汉左点科技有限公司 | Method and device for relieving hearing aid delay |
WO2024099359A1 (en) * | 2022-11-09 | 2024-05-16 | 北京有竹居网络技术有限公司 | Voice detection method and apparatus, electronic device and storage medium |
CN115762525A (en) * | 2022-11-18 | 2023-03-07 | 北京中科艺杺科技有限公司 | Voice filtering and recording method and system based on omnibearing voice acquisition |
CN115762525B (en) * | 2022-11-18 | 2024-05-07 | 北京中科艺杺科技有限公司 | Voice filtering and recording method and system based on omnibearing voice acquisition |
Also Published As
Publication number | Publication date |
---|---|
CN110858476B (en) | 2022-09-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110858476B (en) | Sound collection method and device based on microphone array | |
Chen et al. | Continuous speech separation: Dataset and analysis | |
US11132997B1 (en) | Robust audio identification with interference cancellation | |
Cai et al. | Sensor network for the monitoring of ecosystem: Bird species recognition | |
CN111508498B (en) | Conversational speech recognition method, conversational speech recognition system, electronic device, and storage medium | |
CN109147796B (en) | Speech recognition method, device, computer equipment and computer readable storage medium | |
CN111816218A (en) | Voice endpoint detection method, device, equipment and storage medium | |
CN111429939B (en) | Sound signal separation method of double sound sources and pickup | |
CN110111808B (en) | Audio signal processing method and related product | |
Wang et al. | Deep learning assisted time-frequency processing for speech enhancement on drones | |
CN108520756B (en) | Method and device for separating speaker voice | |
CN113593601A (en) | Audio-visual multi-modal voice separation method based on deep learning | |
US20240249714A1 (en) | Multi-encoder end-to-end automatic speech recognition (asr) for joint modeling of multiple input devices | |
Wang et al. | Attention-based fusion for bone-conducted and air-conducted speech enhancement in the complex domain | |
CN111429916B (en) | Sound signal recording system | |
CN114333874A (en) | Method for processing audio signal | |
CN113823303A (en) | Audio noise reduction method and device and computer readable storage medium | |
CN111009259B (en) | Audio processing method and device | |
CN115171716B (en) | Continuous voice separation method and system based on spatial feature clustering and electronic equipment | |
CN117198324A (en) | Bird sound identification method, device and system based on clustering model | |
Kamble et al. | Teager energy subband filtered features for near and far-field automatic speech recognition | |
WO2022068675A1 (en) | Speaker speech extraction method and apparatus, storage medium, and electronic device | |
Weber et al. | Constructing a dataset of speech recordings with lombard effect | |
Yeow et al. | Real-Time Sound Event Localization and Detection: Deployment Challenges on Edge Devices | |
Venkatesan et al. | Analysis of monaural and binaural statistical properties for the estimation of distance of a target speaker |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||