CN109300475A - Microphone array sound pick-up method and device - Google Patents
Microphone array sound pick-up method and device
- Publication number
- CN109300475A (application CN201710608727.1A)
- Authority
- CN
- China
- Prior art keywords
- face
- recognition
- authorized person
- microphone array
- beam forming
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
- G06F21/32—User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/22—Interactive procedures; Man-machine interfaces
- G10L17/24—Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Theoretical Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Computer Security & Cryptography (AREA)
- General Physics & Mathematics (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Collating Specific Patterns (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The present invention discloses a microphone array sound pick-up method and device. The method includes: performing face recognition on panoramic video to capture the face of an authorized person and obtain the azimuth information of the authorized person's face; performing beamforming according to the azimuth information of the authorized person's face; and picking up sound using the beamformed microphone array. By using video recognition, the present invention reduces the difficulty of beamforming, addresses the problem of speech recognition under the cocktail party effect, and achieves highly directional sound pick-up.
Description
Technical field
The present invention relates to the field of speech recognition, and in particular to a microphone array sound pick-up method and device.
Background art
Voice access is currently a hot topic in the smart home field.
The human auditory system can distinguish and track the speech signal it is interested in within a noisy environment containing multiple talkers, and pick out the sound it needs. This discrimination ability is a perceptual ability specific to the human speech understanding mechanism, namely the human ability to separate speech, and is known as the "cocktail party effect".
Current speech recognition systems can achieve very high recognition rates on clean speech, but when the speech is contaminated by noise, system performance drops sharply.
Summary of the invention
In view of the above technical problem, the present invention provides a microphone array sound pick-up method and device that use video recognition to reduce the difficulty of beamforming and achieve highly directional sound pick-up.
According to one aspect of the present invention, a microphone array sound pick-up method is provided, comprising:
performing face recognition on panoramic video, capturing the face of an authorized person, and obtaining azimuth information of the authorized person's face;
performing beamforming according to the azimuth information of the authorized person's face; and
picking up sound using the beamformed microphone array.
In one embodiment of the present invention, picking up sound using the beamformed microphone array comprises:
separating signals according to the formed beam, and picking up only the speech signal from the direction of the authorized person's face.
In one embodiment of the present invention, the method further comprises:
performing joint authentication using face recognition and voiceprint recognition.
In one embodiment of the present invention, performing joint authentication using face recognition and voiceprint recognition comprises:
confirming the authorized person using face recognition; and
recognizing, by voiceprint recognition, a keyword spoken by the authorized person, so as to further confirm the authorized person.
In one embodiment of the present invention, after the joint authentication is passed, the method further comprises:
extracting a control instruction issued by the authorized person; and
parsing the control instruction, and carrying out the corresponding control action according to the parsed control instruction.
According to another aspect of the present invention, a microphone array sound pick-up device is provided, comprising:
a face recognition module, used to perform face recognition on panoramic video, capture the face of an authorized person, and obtain azimuth information of the authorized person's face;
a beamforming module, used to perform beamforming according to the azimuth information of the authorized person's face; and
a pick-up module, used to pick up sound using the beamformed microphone array.
In one embodiment of the present invention, the pick-up module is used to separate signals according to the formed beam and pick up only the speech signal from the direction of the authorized person's face.
In one embodiment of the present invention, the microphone array sound pick-up device performs joint authentication using face recognition and voiceprint recognition.
In one embodiment of the present invention, the face recognition module is further used to confirm the authorized person using face recognition, and the device further comprises:
a voiceprint recognition module, used to recognize, by voiceprint recognition, a keyword spoken by the authorized person, so as to further confirm the authorized person.
In one embodiment of the present invention, the device further comprises:
a voice control module, used to extract, after the joint authentication is passed, a control instruction issued by the authorized person, to parse the control instruction, and to carry out the corresponding control action according to the parsed control instruction.
By using video recognition, the present invention reduces the difficulty of beamforming, addresses the problem of speech recognition under the cocktail party effect, and achieves highly directional sound pick-up.
Brief description of the drawings
In order to explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a first embodiment of the microphone array sound pick-up method of the present invention.
Fig. 2 is a schematic diagram of face recognition in one embodiment of the present invention.
Fig. 3 is a schematic diagram comparing sound pick-up by a single microphone and by a microphone array in one embodiment of the present invention.
Fig. 4 is a schematic diagram of a second embodiment of the microphone array sound pick-up method of the present invention.
Fig. 5 is a schematic diagram of a first embodiment of the microphone array sound pick-up device of the present invention.
Fig. 6 is a schematic diagram of a second embodiment of the microphone array sound pick-up device of the present invention.
Fig. 7 is a schematic diagram of the operation of the microphone array sound pick-up device in one embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present invention, its application, or its use. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.
It should also be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.
Techniques, methods, and apparatus known to a person of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the granted specification.
In all examples shown and discussed herein, any specific value should be interpreted as merely illustrative rather than limiting. Other examples of the exemplary embodiments may therefore have different values.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item has been defined in one drawing, it need not be discussed further in subsequent drawings.
Fig. 1 is a schematic diagram of a first embodiment of the microphone array sound pick-up method of the present invention. Preferably, this embodiment may be performed by the microphone array sound pick-up device of the present invention. The method comprises the following steps:
Step 11: as shown in Fig. 2, perform face recognition on panoramic video and capture the face of the authorized person; confirm the authorized person using face recognition; obtain the azimuth information of the authorized person's face.
In one embodiment of the present invention, performing face recognition on panoramic video in step 11 may specifically comprise: performing face recognition on panoramic video from a VR camera (virtual reality camera).
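The patent does not say how a detected face is converted into azimuth information. The following is a minimal sketch under one stated assumption: the panoramic frame is equirectangular, so the horizontal pixel position maps linearly onto a full 360 degrees. The function name `face_azimuth_deg` and the convention that the centre column is 0 degrees are hypothetical.

```python
def face_azimuth_deg(face_center_x: float, frame_width: int) -> float:
    """Map a face's horizontal pixel position to an azimuth in degrees.

    Assumes an equirectangular panorama: column 0 is -180 degrees, the
    centre column is 0 degrees, and the right edge approaches +180 degrees.
    This mapping is an illustrative assumption, not taken from the patent.
    """
    if frame_width <= 0:
        raise ValueError("frame_width must be positive")
    if not 0 <= face_center_x < frame_width:
        raise ValueError("face_center_x must lie inside the frame")
    return (face_center_x / frame_width) * 360.0 - 180.0
```

In such a setup the centre of the face bounding box returned by any face detector would be passed in, and the resulting angle handed to the beamforming step. A real camera would also need calibrating against the microphone array's own reference direction.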
Step 12: perform beamforming according to the azimuth information of the authorized person's face.
Step 13: pick up sound using the beamformed microphone array.
A microphone array (Microphone Array) adds a spatial domain to the time and frequency domains and performs joint space-time-frequency processing on signals arriving from different spatial directions. Microphone arrays inherit the relevant algorithms of antenna arrays while also absorbing some single-microphone speech processing methods. A microphone array is spatially selective: it captures a high-quality signal from a specific direction while reducing noise and other interference. In addition, a microphone array does not need to restrict the speaker's movement; it can automatically detect, locate, and track a speaker within its reception area.
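The spatial selectivity described above can be illustrated with a delay-and-sum beamformer, which is one common beamforming technique; the patent does not name a specific algorithm. The sketch below assumes a far-field source, known 2-D microphone coordinates in metres, and a speed of sound of 343 m/s. The function names are hypothetical, and the delays are applied in the frequency domain, which treats each block as circular.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def steering_delays(mic_xy, azimuth_deg):
    """Per-microphone arrival-time offsets (s) for a far-field source."""
    theta = np.deg2rad(azimuth_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])  # unit vector toward source
    # A microphone with a larger projection onto the direction hears the source earlier.
    return -(np.asarray(mic_xy, dtype=float) @ direction) / SPEED_OF_SOUND

def delay_and_sum(signals, mic_xy, azimuth_deg, sample_rate):
    """signals: (n_mics, n_samples) array. Returns the beamformed mono signal."""
    signals = np.asarray(signals, dtype=float)
    n_samples = signals.shape[1]
    delays = steering_delays(mic_xy, azimuth_deg)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
    spectra = np.fft.rfft(signals, axis=1)
    # Undo each channel's arrival delay so the target direction adds coherently.
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)
```

Sound arriving from the steered azimuth is summed in phase, while sound from other directions is summed with mismatched phases and is attenuated. The azimuth would come from the face recognition step.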
In one embodiment of the present invention, the microphone array may use the spatial layout of a planar array or of a spatial topology array.
Fig. 3 is a schematic diagram comparing sound pick-up by a single microphone and by a microphone array in one embodiment of the present invention. As shown in Fig. 3, a single microphone picks up the desired speech together with the ambient noise, whereas a microphone array using beamforming picks up the desired speech (for example, the speech signal from the direction of the authorized person's face) while suppressing the ambient noise.
In one embodiment of the present invention, step 13 may specifically comprise: separating signals according to the formed beam, and picking up only the speech signal from the direction of the authorized person's face.
With the panoramic-video-assisted beamforming microphone array sound pick-up method provided by the above embodiment of the present invention, video recognition reduces the difficulty of beamforming, addresses the problem of speech recognition under the cocktail party effect, and achieves highly directional sound pick-up.
By incorporating a video channel, the above embodiment of the present invention simplifies the problem of steering indoor sound pick-up; combined with a spatial microphone (MIC) array and beamforming, it can achieve relatively good pick-up directivity and effectively reduce noise and other people's speech.
The above embodiment of the present invention reduces the difficulty of blind source separation. Blind speech separation refers to the process of recovering each source signal from the observed signals alone, based on the statistical properties of the input source speech signals, when the source speech signals and the transmission channel parameters are unknown. "Blind" has two meanings: first, the source speech signals cannot be observed; second, how the source signals are mixed is unknown.
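The two senses of "blind" can be written compactly. The notation below is a standard textbook formulation assuming an instantaneous linear mixture; it is not taken from the patent, and real rooms are better modelled by a convolutive mixture. Here $\mathbf{s}(t)$ are the unobserved sources, $\mathbf{A}$ is the unknown mixing matrix, $\mathbf{x}(t)$ are the microphone observations, and $\mathbf{W}$ is the separation matrix to be estimated.

```latex
\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t), \qquad
\mathbf{y}(t) = \mathbf{W}\,\mathbf{x}(t) \approx \mathbf{P}\,\mathbf{D}\,\mathbf{s}(t)
```

The sources can only be recovered up to an unknown permutation $\mathbf{P}$ and scaling $\mathbf{D}$. Knowing the target direction from video removes part of this ambiguity, which is one way to read the patent's claim that the separation becomes easier.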
The above embodiment of the present invention reduces the difficulty of blind source separation and improves the reliability of the system's speech signal separation, and can therefore improve the success rate of speech recognition.
Fig. 4 is a schematic diagram of a second embodiment of the microphone array sound pick-up method of the present invention. Preferably, this embodiment may be performed by the microphone array sound pick-up device of the present invention. The method comprises the following steps:
Step 41: as shown in Fig. 2, perform face recognition on panoramic video and capture the face of the authorized person; confirm the authorized person using face recognition; obtain the azimuth information of the authorized person's face.
Step 42: perform beamforming according to the azimuth information of the authorized person's face.
Step 43: pick up sound using the beamformed microphone array.
Step 44: recognize, by voiceprint recognition, the keyword spoken by the authorized person, so as to further confirm the authorized person. The above embodiment of the present invention can thus perform joint authentication using face recognition and voiceprint recognition.
In one embodiment of the present invention, voiceprint recognition may comprise:
Step 441, voice is collected.
Step 442, noise suppressed and speech detection are carried out.
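The speech detection part of step 442 is not detailed in the patent. As a rough illustration only, the sketch below uses a simple energy-based voice activity detector: a frame is treated as speech when its RMS level is within a threshold of the loudest frame. The function name, the 20 ms frame length, and the -35 dB threshold are all hypothetical choices.

```python
import numpy as np

def energy_vad(samples, sample_rate, frame_ms=20.0, threshold_db=-35.0):
    """Return one boolean per frame: True where the frame is likely speech.

    A frame counts as active when its RMS level, relative to the loudest
    frame in the recording, is above threshold_db. Both defaults are
    illustrative values, not taken from the patent.
    """
    samples = np.asarray(samples, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000.0)
    if frame_len == 0:
        return np.zeros(0, dtype=bool)
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        return np.zeros(0, dtype=bool)
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    peak = rms.max()
    if peak == 0.0:
        return np.zeros(n_frames, dtype=bool)
    level_db = 20.0 * np.log10(np.maximum(rms, 1e-12) / peak)
    return level_db > threshold_db
```

Only frames flagged as active would be passed on to feature extraction. Practical systems usually combine energy with spectral cues, since a pure energy threshold is fragile in noisy rooms.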
Step 443, feature extraction is carried out.
The task of feature extraction is to extract and select acoustic or linguistic features of the speaker's voiceprint that are highly separable and highly stable. Unlike speech recognition, the features used for voiceprint recognition must be speaker-specific ("personalized") features, whereas the features used for speech recognition must be features common across speakers.
Although most current voiceprint recognition systems use acoustic-level features, the features that characterize a person should be multi-level.
In one embodiment of the present invention, the above features may include: (1) acoustic features related to the anatomy of the human vocal mechanism (such as spectrum, cepstrum, formants, pitch, and reflection coefficients), as well as nasality, breathiness, hoarseness, laughter, and so on; (2) semantics, rhetoric, pronunciation, and speech habits, which are influenced by socioeconomic status, education level, birthplace, and so on; (3) personal characteristics or prosody influenced by one's parents, such as rhythm, speed, intonation, and volume.
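Of the acoustic features listed above, the cepstrum is simple to compute. The sketch below is a minimal illustration and not the patent's feature extractor: it computes the real cepstrum of one frame as the inverse FFT of the log magnitude spectrum. The Hann window and the log floor are assumptions made for the example.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of one frame: inverse FFT of the log magnitude spectrum."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))  # Hann window is an assumed choice
    magnitude = np.abs(np.fft.fft(windowed))
    log_magnitude = np.log(np.maximum(magnitude, 1e-10))  # floor avoids log(0)
    return np.fft.ifft(log_magnitude).real
```

The low-order coefficients mainly describe the vocal-tract envelope, while a peak at higher quefrency relates to pitch. Practical systems more often use mel-frequency cepstral coefficients, which add a perceptual filter bank before the logarithm.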
Step 444, sound modeling is carried out.
In one embodiment of the present invention, the features that an automatic voiceprint recognition model can use include: (1) acoustic features (cepstrum); (2) lexical features (speaker-dependent word n-grams and phoneme n-grams); (3) prosodic features (pitch and energy "gestures" described by n-grams); (4) language, dialect, and accent information; (5) channel information (which channel is used); and so on.
Step 445, identification matching is carried out.
In one embodiment of the present invention, the recognition matching in step 445 may use the following main classes of methods:
(1) Template matching: dynamic time warping (DTW) is used to align the training and test feature sequences; mainly used in fixed-phrase applications (usually text-dependent tasks).
(2) Nearest neighbor: all feature vectors are retained during training; at recognition time, the K nearest training vectors are found for each vector and the decision is made accordingly. Model storage and similarity computation are usually both very large.
(3) Neural networks: these take many forms, such as the multilayer perceptron and radial basis function (RBF) networks, and can be trained explicitly to distinguish a speaker from background speakers. The training cost is very high, and the model does not generalize well.
(4) Hidden Markov models (HMM): a single-state HMM, or a Gaussian mixture model (GMM), is usually used. This is a popular method with relatively good results.
(5) VQ clustering (such as LBG): the results are relatively good and the algorithmic complexity is low; combined with an HMM method, it can achieve even better results.
(6) Polynomial classifiers: these achieve higher accuracy, but model storage and computation are both relatively large.
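The template matching method in item (1) can be sketched with a basic dynamic time warping routine. This is a minimal textbook version, assuming Euclidean distance between frames and the unconstrained match, insert, and delete step pattern; the function names are hypothetical and the patent does not specify these details.

```python
import numpy as np

def _as_frames(seq):
    """Treat a 1-D sequence as frames of one feature each."""
    arr = np.asarray(seq, dtype=float)
    return arr.reshape(-1, 1) if arr.ndim == 1 else arr

def dtw_distance(reference, test):
    """Accumulated DTW cost between two feature sequences.

    reference: (n, d) frames, test: (m, d) frames (or 1-D sequences).
    """
    ref, tst = _as_frames(reference), _as_frames(test)
    n, m = len(ref), len(tst)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(ref[i - 1] - tst[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return float(cost[n, m])
```

For a fixed keyword, the enrolled utterance's feature sequence would serve as `reference` and the incoming utterance as `test`; a low cost suggests the same phrase from the same speaker. Because the warping absorbs differences in speaking rate, a slowed-down copy of the template still scores zero.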
Step 45: extract the control instruction issued by the authorized person.
Step 46: parse the control instruction, and carry out the corresponding control action according to the parsed control instruction.
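Steps 45 and 46 can be illustrated with a small keyword-based parser that runs on the text produced by speech recognition. The device names, action phrases, and the `ControlCommand` type below are hypothetical; the patent does not define a command vocabulary or grammar.

```python
from typing import NamedTuple, Optional

class ControlCommand(NamedTuple):
    device: str
    action: str

# Hypothetical vocabulary; a real system would define its own devices and actions.
DEVICES = ("light", "air conditioner", "television")
ACTIONS = {"turn on": "on", "switch on": "on", "turn off": "off", "switch off": "off"}

def parse_control_instruction(transcript: str) -> Optional[ControlCommand]:
    """Return the first (device, action) pair found in the transcript, else None."""
    text = " ".join(transcript.lower().split())  # normalise case and whitespace
    action = next((code for phrase, code in ACTIONS.items() if phrase in text), None)
    device = next((name for name in DEVICES if name in text), None)
    if action is None or device is None:
        return None
    return ControlCommand(device=device, action=action)
```

A caller would dispatch the returned command to the matching appliance and ignore utterances for which the parser returns `None`.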
In the above embodiment of the present invention, when several people in a test room are talking to one another at the same time, the VR camera can perform face recognition to confirm the authorized person; the beam is steered toward the authorized person's face; the authorized person speaks a keyword and a control word; voiceprint recognition is applied to the keyword to confirm authorization; control then begins: the control word is parsed and the control action is carried out.
Beamforming sound pick-up is a relatively mature technology, but for a speech recognition scenario such as the cocktail party effect the problem remains very difficult to solve.
The above embodiment of the present invention introduces face recognition on VR video to assist beamforming, which improves the reliability of authentication, reduces the difficulty of blind speech processing, and improves the accuracy of speech recognition.
The above embodiment of the present invention can perform biometric identification through face recognition and voiceprint recognition, achieving multi-factor authentication with both, thereby strengthening the security of the system.
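The joint authentication described above amounts to requiring both factors to pass. The sketch below assumes each recognizer returns a similarity score between 0 and 1; the thresholds and the names `AuthResult` and `joint_authenticate` are hypothetical and are not specified by the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuthResult:
    face_ok: bool
    voice_ok: bool

    @property
    def passed(self) -> bool:
        # Joint authentication: both factors must succeed.
        return self.face_ok and self.voice_ok

def joint_authenticate(face_score: float, voice_score: float,
                       face_threshold: float = 0.8,
                       voice_threshold: float = 0.7) -> AuthResult:
    """Compare each score with its (assumed) threshold and combine the results."""
    return AuthResult(face_ok=face_score >= face_threshold,
                      voice_ok=voice_score >= voice_threshold)
```

Control instructions would be extracted only when `passed` is true, which matches the order of the steps in this embodiment: face confirmation first, then keyword voiceprint confirmation, then control.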
Fig. 5 is a schematic diagram of a first embodiment of the microphone array sound pick-up device of the present invention. As shown in Fig. 5, the microphone array sound pick-up device may comprise a face recognition module 51, a beamforming module 52, and a pick-up module 53, wherein:
The face recognition module 51 is used, as shown in Fig. 2, to perform face recognition on panoramic video and capture the face of the authorized person; to confirm the authorized person using face recognition; and to obtain the azimuth information of the authorized person's face.
In one embodiment of the present invention, the face recognition module 51 may be implemented as a VR camera (virtual reality camera).
The beamforming module 52 is used to perform beamforming according to the azimuth information of the authorized person's face.
The pick-up module 53 is used to pick up sound using the beamformed microphone array.
In one embodiment of the present invention, the pick-up module 53 may specifically be implemented as a microphone array.
In one embodiment of the present invention, the microphone array may use the spatial layout of a planar array or of a spatial topology array.
In one embodiment of the present invention, the pick-up module 53 may specifically be used to separate signals according to the formed beam and pick up only the speech signal from the direction of the authorized person's face. The microphone array using beamforming in the above embodiment of the present invention can thus pick up the desired speech (for example, the speech signal from the direction of the authorized person's face) while suppressing ambient noise.
With the panoramic-video-assisted beamforming microphone array sound pick-up device provided by the above embodiment of the present invention, video recognition reduces the difficulty of beamforming, addresses the problem of speech recognition under the cocktail party effect, and achieves highly directional sound pick-up.
By incorporating a video channel, the above embodiment of the present invention simplifies the problem of steering indoor sound pick-up; combined with a spatial microphone (MIC) array and beamforming, it can achieve relatively good pick-up directivity and effectively reduce noise and other people's speech.
The above embodiment of the present invention reduces the difficulty of blind source separation and improves the reliability of the system's speech signal separation, and can therefore improve the success rate of speech recognition.
Fig. 6 is a schematic diagram of a second embodiment of the microphone array sound pick-up device of the present invention. Compared with the embodiment shown in Fig. 5, in the embodiment shown in Fig. 6 the microphone array sound pick-up device may further comprise a voiceprint recognition module 54 and a voice control module 55, wherein:
The face recognition module 51 is used to confirm the authorized person using face recognition.
The voiceprint recognition module 54 is used to recognize, by voiceprint recognition, the keyword spoken by the authorized person, so as to further confirm the authorized person.
The microphone array sound pick-up device of the above embodiment of the present invention thus performs joint authentication using face recognition and voiceprint recognition.
The voice control module 55 is used to extract, after the joint authentication is passed, the control instruction issued by the authorized person, to parse the control instruction, and to carry out the corresponding control action according to the parsed control instruction.
The above embodiment of the present invention can perform biometric identification through face recognition and voiceprint recognition, achieving multi-factor authentication with both, thereby strengthening the security of the system.
The above embodiment of the present invention introduces face recognition on VR video to assist beamforming, which improves the reliability of authentication, reduces the difficulty of blind speech processing, and improves the accuracy of speech recognition.
Fig. 7 is a schematic diagram of the operation of the microphone array sound pick-up device in one embodiment of the present invention. As shown in Fig. 7, the microphone array sound pick-up device may comprise a microphone array and a VR camera, wherein:
The VR camera performs face recognition and the microphone array performs voiceprint recognition authentication. Through joint authentication by face recognition and voiceprint recognition, the above embodiment of the present invention strengthens the security of the system.
The VR camera performs face recognition on panoramic video and captures the face of the authorized person, obtaining the face position (the azimuth information of the authorized person's face). The microphone array sound pick-up device of the present invention then uses the azimuth information of the authorized person's face to form a beam and separate signals, so that the microphone array can separate signals according to the formed beam and pick up only the speech signal from the direction of the authorized person's face.
The microphone array using beamforming in the above embodiment of the present invention can thus pick up the desired speech (for example, the speech signal from the direction of the authorized person's face) while suppressing ambient noise.
For example, when several people in a test room are talking to one another at the same time, the microphone array sound pick-up device of the present invention can use the VR camera to perform face recognition and confirm the authorized person; the beam is steered toward the authorized person's face; the authorized person speaks a keyword and a control word; voiceprint recognition is applied to the keyword to confirm authorization; control then begins: the control word is parsed and the control action is carried out.
The above embodiment of the present invention introduces face recognition on VR video to assist beamforming, which improves the reliability of authentication, reduces the difficulty of blind speech processing, and improves the accuracy of speech recognition.
The above embodiment of the present invention can perform biometric identification through face recognition and voiceprint recognition, achieving multi-factor authentication with both, thereby strengthening the security of the system.
The above embodiment of the present invention can reduce the difficulty of beamforming through video recognition, address the problem of speech recognition under the cocktail party effect, and achieve highly directional sound pick-up.
By incorporating a video channel, the above embodiment of the present invention simplifies the problem of steering indoor sound pick-up; combined with a spatial microphone (MIC) array and beamforming, it can achieve relatively good pick-up directivity and effectively reduce noise and other people's speech.
The above embodiment of the present invention reduces the difficulty of blind source separation and improves the reliability of the system's speech signal separation, and can therefore improve the success rate of speech recognition.
The functional units described above, such as the beamforming module 52, the voiceprint recognition module 54, and the voice control module 55, may be implemented as a general-purpose processor, a programmable logic controller (PLC), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof, for performing the functions described herein.
The present invention has thus been described in detail. To avoid obscuring the concept of the present invention, some details well known in the art have not been described. From the above description, a person skilled in the art can fully understand how to implement the technical solutions disclosed herein.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The description of the present invention has been given for purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to a person of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical application of the invention, and to enable a person of ordinary skill in the art to understand the invention and so design various embodiments, with various modifications, suited to particular uses.
Claims (10)
1. a kind of microphone array sound pick-up method characterized by comprising
Recognition of face is carried out using panoramic video, captures the face of donor, obtains the azimuth information of authorization human face;
According to the azimuth information of power human face, beam forming is carried out;
Pickup is carried out using the microphone array of beam forming.
2. the method according to claim 1, wherein the microphone array using beam forming carries out pickup
Include:
Signal is separated according to Wave beam forming, only picks up the voice signal in authorization human face orientation.
3. method according to claim 1 or 2, which is characterized in that further include:
Joint authentication is carried out using recognition of face and Application on Voiceprint Recognition.
4. according to the method described in claim 3, it is characterized in that, described carry out joint mirror using recognition of face and Application on Voiceprint Recognition
Power includes:
Donor is confirmed using recognition of face;
The keyword issued using Application on Voiceprint Recognition donor, to further confirm that donor.
5. The method according to claim 3, characterized in that, after the joint authentication is passed, the method further comprises:
extracting a control instruction issued by the authorized person; and
parsing the control instruction, and completing the corresponding control action according to the parsed control instruction.
6. A microphone array sound pick-up device, characterized by comprising:
a face recognition module, configured to perform face recognition on panoramic video, capture the face of an authorized person, and obtain azimuth information of the authorized person's face;
a beamforming module, configured to perform beamforming according to the azimuth information of the authorized person's face; and
a pickup module, configured to pick up sound using the beamformed microphone array.
7. The device according to claim 6, characterized in that
the pickup module is configured to separate signals according to the beamforming and to pick up only the voice signal from the direction of the authorized person's face.
8. The device according to claim 6 or 7, characterized in that
the microphone array sound pick-up device performs joint authentication using face recognition and voiceprint recognition.
9. The device according to claim 8, characterized in that:
the face recognition module is configured to confirm the authorized person using face recognition; and
the device further comprises a voiceprint recognition module, configured to recognize, using voiceprint recognition, a keyword spoken by the authorized person, to further confirm the authorized person.
10. The device according to claim 8, characterized by further comprising:
a voice control module, configured to, after the joint authentication is passed, extract a control instruction issued by the authorized person, parse the control instruction, and complete the corresponding control action according to the parsed control instruction.
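The three steps of the claimed method (face azimuth from panoramic video, beamforming toward that azimuth, pickup with the steered array) can be illustrated with a minimal sketch. The claims do not specify a beamforming algorithm, so the sketch below substitutes a plain time-domain delay-and-sum beamformer as one possible choice. The function names, the assumption that the panoramic frame spans 360 degrees horizontally, the far-field plane-wave model, and the speed-of-sound constant are all illustrative assumptions and do not come from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at room temperature (assumed value)


def face_azimuth_deg(face_center_x, frame_width):
    """Map the horizontal pixel of a detected face centre to an azimuth.

    Assumes (hypothetically) that the panoramic frame covers 360 degrees
    horizontally, so pixel columns map linearly onto [0, 360).
    """
    return (face_center_x / frame_width) * 360.0 % 360.0


def steering_delays(mic_positions, azimuth_deg, sample_rate):
    """Integer sample delays that align a far-field source at azimuth_deg.

    mic_positions is a list of (x, y) coordinates in metres. A microphone
    closer to the source hears the wavefront earlier, so it must be delayed
    by a larger amount to line up with the others.
    """
    theta = math.radians(azimuth_deg)
    ux, uy = math.cos(theta), math.sin(theta)  # unit vector toward the source
    lead = [(x * ux + y * uy) / SPEED_OF_SOUND for x, y in mic_positions]
    base = min(lead)  # shift so that every delay is non-negative
    return [round((t - base) * sample_rate) for t in lead]


def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay and average the result.

    Sound from the steered direction adds coherently; sound from other
    directions is misaligned and is attenuated by the averaging.
    """
    n = len(channels[0])
    out = [0.0] * n
    for samples, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += samples[i - d]
    return [v / len(channels) for v in out]
```

In use, the azimuth returned by `face_azimuth_deg` for the authorized person's face would be passed to `steering_delays`, and the resulting delays applied to each block of microphone samples with `delay_and_sum`. A practical system would more likely use fractional delays or a frequency-domain beamformer; integer delays are used here only to keep the sketch short.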
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710608727.1A CN109300475A (en) | 2017-07-25 | 2017-07-25 | Microphone array sound pick-up method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710608727.1A CN109300475A (en) | 2017-07-25 | 2017-07-25 | Microphone array sound pick-up method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109300475A true CN109300475A (en) | 2019-02-01 |
Family
ID=65167613
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710608727.1A Pending CN109300475A (en) | 2017-07-25 | 2017-07-25 | Microphone array sound pick-up method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109300475A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110301890A (en) * | 2019-05-31 | 2019-10-08 | 华为技术有限公司 | The method and device of apnea monitoring |
CN111688580A (en) * | 2020-05-29 | 2020-09-22 | 北京百度网讯科技有限公司 | Method and device for picking up sound by intelligent rearview mirror |
CN113301476A (en) * | 2021-03-31 | 2021-08-24 | 阿里巴巴新加坡控股有限公司 | Pickup device and microphone array structure |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030212552A1 (en) * | 2002-05-09 | 2003-11-13 | Liang Lu Hong | Face recognition procedure useful for audiovisual speech recognition |
CN102160398A (en) * | 2008-07-31 | 2011-08-17 | 诺基亚公司 | Electronic device directional audio-video capture |
CN103440686A (en) * | 2013-07-29 | 2013-12-11 | 上海交通大学 | Mobile authentication system and method based on voiceprint recognition, face recognition and location service |
CN104053088A (en) * | 2013-03-11 | 2014-09-17 | 联想(北京)有限公司 | Microphone array adjustment method, microphone array and electronic device |
CN104735598A (en) * | 2013-12-18 | 2015-06-24 | 刘璟锋 | Hearing aid system and voice acquisition method of hearing aid system |
CN105224850A (en) * | 2015-10-24 | 2016-01-06 | 北京进化者机器人科技有限公司 | Combined right-discriminating method and intelligent interactive system |
CN106097495A (en) * | 2016-06-03 | 2016-11-09 | 赵树龙 | A kind of intelligent voice control vocal print face authentication door access control system and method |
CN106599866A (en) * | 2016-12-22 | 2017-04-26 | 上海百芝龙网络科技有限公司 | Multidimensional user identity identification method |
CN106782585A (en) * | 2017-01-26 | 2017-05-31 | 芋头科技(杭州)有限公司 | A kind of sound pick-up method and system based on microphone array |
CN106887236A (en) * | 2015-12-16 | 2017-06-23 | 宁波桑德纳电子科技有限公司 | A kind of remote speech harvester of sound image combined positioning |
- 2017-07-25: CN application CN201710608727.1A filed, published as CN109300475A (status: Pending)
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030212552A1 (en) * | 2002-05-09 | 2003-11-13 | Liang Lu Hong | Face recognition procedure useful for audiovisual speech recognition |
CN102160398A (en) * | 2008-07-31 | 2011-08-17 | 诺基亚公司 | Electronic device directional audio-video capture |
CN104053088A (en) * | 2013-03-11 | 2014-09-17 | 联想(北京)有限公司 | Microphone array adjustment method, microphone array and electronic device |
CN103440686A (en) * | 2013-07-29 | 2013-12-11 | 上海交通大学 | Mobile authentication system and method based on voiceprint recognition, face recognition and location service |
CN104735598A (en) * | 2013-12-18 | 2015-06-24 | 刘璟锋 | Hearing aid system and voice acquisition method of hearing aid system |
CN105224850A (en) * | 2015-10-24 | 2016-01-06 | 北京进化者机器人科技有限公司 | Combined right-discriminating method and intelligent interactive system |
CN106887236A (en) * | 2015-12-16 | 2017-06-23 | 宁波桑德纳电子科技有限公司 | A kind of remote speech harvester of sound image combined positioning |
CN106097495A (en) * | 2016-06-03 | 2016-11-09 | 赵树龙 | A kind of intelligent voice control vocal print face authentication door access control system and method |
CN106599866A (en) * | 2016-12-22 | 2017-04-26 | 上海百芝龙网络科技有限公司 | Multidimensional user identity identification method |
CN106782585A (en) * | 2017-01-26 | 2017-05-31 | 芋头科技(杭州)有限公司 | A kind of sound pick-up method and system based on microphone array |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110301890A (en) * | 2019-05-31 | 2019-10-08 | 华为技术有限公司 | The method and device of apnea monitoring |
WO2020238954A1 (en) * | 2019-05-31 | 2020-12-03 | 华为技术有限公司 | Apnea monitoring method and device |
CN111688580A (en) * | 2020-05-29 | 2020-09-22 | 北京百度网讯科技有限公司 | Method and device for picking up sound by intelligent rearview mirror |
US11631420B2 (en) | 2020-05-29 | 2023-04-18 | Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. | Voice pickup method for intelligent rearview mirror, electronic device and storage medium |
CN113301476A (en) * | 2021-03-31 | 2021-08-24 | 阿里巴巴新加坡控股有限公司 | Pickup device and microphone array structure |
CN113301476B (en) * | 2021-03-31 | 2023-11-14 | 阿里巴巴(中国)有限公司 | Pickup device and microphone array structure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190201 |