CN109300475A - Microphone array sound pick-up method and device - Google Patents

Info

Publication number
CN109300475A
Authority
CN
China
Legal status
Pending
Application number
CN201710608727.1A
Other languages
Chinese (zh)
Inventor
施隆海
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Abstract

The present invention discloses a microphone array sound pick-up method and device. The method includes: performing face recognition on panoramic video to capture the face of an authorized person and obtain the azimuth information of the authorized person's face; performing beamforming according to that azimuth information; and picking up sound using the beamformed microphone array. By using video recognition, the invention reduces the difficulty of beamforming, addresses the problem of speech recognition under the cocktail party effect, and achieves highly directional sound pickup.

Description

Microphone array sound pick-up method and device
Technical field
The present invention relates to the field of speech recognition, and in particular to a microphone array sound pick-up method and device.
Background
Voice access for the smart home is currently a hot topic in the smart home field.
The human auditory system can distinguish and track the speech signal of interest in a noisy environment with multiple talkers and pick out the sound it needs. This discrimination ability is a perceptual capability specific to the human speech understanding mechanism, namely the human ability to separate speech, and is referred to as the "cocktail party effect".
Current speech recognition systems can achieve very high recognition rates on clean speech, but system performance declines sharply when the speech is contaminated by noise.
Summary of the invention
In view of the above technical problem, the present invention provides a microphone array sound pick-up method and device that use video recognition to reduce the difficulty of beamforming and achieve highly directional sound pickup.
According to one aspect of the present invention, a microphone array sound pick-up method is provided, comprising:
performing face recognition on panoramic video, capturing the face of an authorized person, and obtaining the azimuth information of the authorized person's face;
performing beamforming according to the azimuth information of the authorized person's face;
picking up sound using the beamformed microphone array.
In one embodiment of the invention, picking up sound using the beamformed microphone array includes:
separating signals according to the beamforming, and picking up only the speech signal from the direction of the authorized person's face.
In one embodiment of the invention, the method further includes:
performing joint authentication using face recognition and voiceprint recognition.
In one embodiment of the invention, performing joint authentication using face recognition and voiceprint recognition includes:
confirming the authorized person using face recognition;
recognizing, by voiceprint recognition, a keyword spoken by the authorized person, to further confirm the authorized person.
In one embodiment of the invention, after the joint authentication is passed, the method further includes:
extracting the control instruction issued by the authorized person;
parsing the control instruction, and performing the corresponding control action according to the parsed control instruction.
According to another aspect of the present invention, a microphone array sound pick-up device is provided, comprising:
a face recognition module, configured to perform face recognition on panoramic video, capture the face of an authorized person, and obtain the azimuth information of the authorized person's face;
a beamforming module, configured to perform beamforming according to the azimuth information of the authorized person's face;
a pickup module, configured to pick up sound using the beamformed microphone array.
In one embodiment of the invention, the pickup module is configured to separate signals according to the beamforming and pick up only the speech signal from the direction of the authorized person's face.
In one embodiment of the invention, the microphone array sound pick-up device performs joint authentication using face recognition and voiceprint recognition.
In one embodiment of the invention, in the device:
the face recognition module is configured to confirm the authorized person using face recognition;
a voiceprint recognition module is configured to recognize, by voiceprint recognition, a keyword spoken by the authorized person, to further confirm the authorized person.
In one embodiment of the invention, the device further includes:
a voice control module, configured to extract, after the joint authentication is passed, the control instruction issued by the authorized person, parse the control instruction, and perform the corresponding control action according to the parsed control instruction.
By using video recognition, the present invention reduces the difficulty of beamforming, addresses the problem of speech recognition under the cocktail party effect, and achieves highly directional sound pickup.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings in the following description show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a first embodiment of the microphone array sound pick-up method of the present invention.
Fig. 2 is a schematic diagram of face recognition in one embodiment of the present invention.
Fig. 3 is a schematic diagram comparing sound pickup by a single microphone and by a microphone array in one embodiment of the present invention.
Fig. 4 is a schematic diagram of a second embodiment of the microphone array sound pick-up method of the present invention.
Fig. 5 is a schematic diagram of a first embodiment of the microphone array sound pick-up device of the present invention.
Fig. 6 is a schematic diagram of a second embodiment of the microphone array sound pick-up device of the present invention.
Fig. 7 is a schematic diagram of the operation of the microphone array sound pick-up device in one embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in those embodiments. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present invention or its application or use. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.
It should also be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.
Techniques, methods, and apparatus known to a person of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be regarded as part of the granted specification.
In all examples shown and discussed here, any specific value should be construed as merely illustrative and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
Fig. 1 is a schematic diagram of a first embodiment of the microphone array sound pick-up method of the present invention. Preferably, this embodiment can be executed by the microphone array sound pick-up device of the present invention. The method includes the following steps:
Step 11: as shown in Fig. 2, perform face recognition on panoramic video and capture the face of the authorized person; confirm the authorized person using face recognition; obtain the azimuth information of the authorized person's face.
In one embodiment of the invention, performing face recognition on panoramic video in step 11 may specifically include: performing face recognition on panoramic video from a VR camera (virtual reality camera).
Step 12: perform beamforming according to the azimuth information of the authorized person's face.
Step 13: pick up sound using the beamformed microphone array.
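Step 11 has to turn a detected face position into an azimuth, but the text does not say how. The following is a minimal sketch of one possibility, assuming the panoramic frame is equirectangular (its columns span 360 degrees) and that a face detector has already supplied the horizontal pixel position of the face centre. The function name `face_azimuth_deg` and the sample numbers are illustrative assumptions, not part of the patent.

```python
def face_azimuth_deg(face_center_x: float, frame_width: int) -> float:
    """Map the horizontal pixel position of a detected face to an azimuth.

    Assumption: an equirectangular panorama whose columns span 360 degrees,
    with the left edge at -180 degrees and the centre column at 0 degrees.
    """
    if frame_width <= 0:
        raise ValueError("frame_width must be positive")
    if not 0 <= face_center_x <= frame_width:
        raise ValueError("face_center_x must lie inside the frame")
    return (face_center_x / frame_width) * 360.0 - 180.0


# Hypothetical detection: a face centred at column 2880 of a 3840-pixel-wide frame.
azimuth = face_azimuth_deg(2880, 3840)
print(azimuth)
```

In a real device the camera and the microphone array would also need a shared reference direction, which this sketch takes for granted.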
A microphone array adds a spatial domain to the time and frequency domains and performs joint space-time-frequency processing on signals arriving from different directions. Microphone arrays inherit the relevant algorithms of antenna arrays while also absorbing some single-microphone speech processing methods. A microphone array has spatial selectivity: it captures a high-quality signal from a specific direction while reducing noise and other interference. In addition, a microphone array does not restrict the movement of the speaker and can automatically detect, locate, and track a speaker within its reception area.
In one embodiment of the invention, the microphone array may adopt a planar-array or spatial-array topology.
Fig. 3 is a schematic diagram comparing sound pickup by a single microphone and by a microphone array in one embodiment of the present invention. As shown in Fig. 3, a single microphone picks up the desired speech together with the ambient noise, whereas a microphone array using beamforming suppresses the ambient noise while picking up the desired speech (for example, the speech signal from the direction of the authorized person's face).
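The advantage of the array over a single microphone can be illustrated with an idealized calculation. The derivation below is a sketch under assumptions that are not stated in the patent: N microphones, channels already time-aligned toward the talker, and noise that is zero-mean, uncorrelated between microphones, and of equal variance \sigma^2 at each microphone.

```latex
\begin{align}
x_i(t) &= s(t) + n_i(t), \qquad i = 1, \dots, N, \\
y(t) &= \frac{1}{N} \sum_{i=1}^{N} x_i(t)
      = s(t) + \frac{1}{N} \sum_{i=1}^{N} n_i(t), \\
\operatorname{Var}\!\left[ \frac{1}{N} \sum_{i=1}^{N} n_i(t) \right]
     &= \frac{\sigma^2}{N}, \\
G &= 10 \log_{10} \frac{\mathrm{SNR}_{\text{array}}}{\mathrm{SNR}_{\text{single}}}
   = 10 \log_{10} N \ \text{dB}.
\end{align}
```

Under these assumptions an 8-microphone array would gain about 9 dB over a single microphone. Real rooms, with reverberation and directional interferers, generally depart from this idealization.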
In one embodiment of the invention, step 13 may specifically include: separating signals according to the beamforming and picking up only the speech signal from the direction of the authorized person's face.
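The patent does not name a specific beamforming algorithm. As an illustration only, the sketch below uses delay-and-sum beamforming, one of the simplest techniques, for a uniform linear array and a far-field talker. The array geometry, the integer-sample delays, and all function names are assumptions made for this example; a practical system would likely use fractional delays or frequency-domain steering.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def steering_delays(num_mics: int, spacing_m: float,
                    azimuth_deg: float, sample_rate: int) -> List[int]:
    """Arrival delays in whole samples for a far-field source.

    Assumptions: uniform linear array, azimuth measured from broadside
    (0 = straight ahead). Delays are shifted so the smallest is zero.
    """
    theta = math.radians(azimuth_deg)
    raw = [m * spacing_m * math.sin(theta) / SPEED_OF_SOUND * sample_rate
           for m in range(num_mics)]
    earliest = min(raw)
    return [round(d - earliest) for d in raw]


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Advance each channel by its arrival delay, then average the channels."""
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[d + t] for ch, d in zip(channels, delays)) / len(channels)
            for t in range(usable)]


# Simulated example: a 440 Hz tone arriving from 30 degrees at a 4-microphone array.
sample_rate = 16000
delays = steering_delays(num_mics=4, spacing_m=0.05,
                         azimuth_deg=30.0, sample_rate=sample_rate)
source = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(200)]
# Each microphone hears the source delayed by its own arrival delay.
channels = [[source[t - d] if t >= d else 0.0 for t in range(len(source))]
            for d in delays]
enhanced = delay_and_sum(channels, delays)
print(delays, len(enhanced))
```

Signals from the steered direction add coherently, while sound from other directions is misaligned across channels and is attenuated by the averaging.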
The microphone array sound pick-up method with panoramic-video-assisted beamforming provided by the above embodiment of the present invention uses video recognition to reduce the difficulty of beamforming, addresses the problem of speech recognition under the cocktail party effect, and achieves highly directional sound pickup.
The above embodiment of the present invention incorporates a video channel, which simplifies the problem of steering indoor sound pickup. Combined with a spatial microphone (MIC) array and beamforming, it can achieve relatively good pickup directivity and effectively reduce noise and other people's speech.
The above embodiment of the present invention reduces the difficulty of blind source separation. Blind speech separation is the process of recovering each source signal from the observed signals alone, based on the statistical properties of the source speech signals, when the source speech signals and the transmission channel parameters are unknown. "Blind" has two meanings: first, the source speech signals cannot be observed; second, how the source signals are mixed is unknown.
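The two senses of "blind" can be written down with the standard instantaneous mixing model. This is a textbook formulation given here as a sketch; the symbols are introduced for illustration and do not appear in the patent, and speech in a reverberant room is usually better described by a convolutive mixture.

```latex
\begin{align}
\mathbf{x}(t) &= \mathbf{A}\,\mathbf{s}(t)
  && \text{observed microphone signals; sources } \mathbf{s}(t)
     \text{ and mixing matrix } \mathbf{A} \text{ both unknown}, \\
\mathbf{y}(t) &= \mathbf{W}\,\mathbf{x}(t)
  && \text{estimated sources, with } \mathbf{W} \text{ learned from }
     \mathbf{x}(t) \text{ alone}, \\
\mathbf{W}\mathbf{A} &\approx \mathbf{P}\,\mathbf{D}
  && \text{recovery only up to a permutation } \mathbf{P}
     \text{ and a scaling } \mathbf{D}.
\end{align}
```

Knowing the talker's azimuth from video removes part of this blindness: the direction of the desired source no longer has to be inferred from signal statistics.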
The above embodiment of the present invention reduces the difficulty of blind source separation and improves the reliability of the system's speech signal separation, and can therefore improve the success rate of speech recognition.
Fig. 4 is a schematic diagram of a second embodiment of the microphone array sound pick-up method of the present invention. Preferably, this embodiment can be executed by the microphone array sound pick-up device of the present invention. The method includes the following steps:
Step 41: as shown in Fig. 2, perform face recognition on panoramic video and capture the face of the authorized person; confirm the authorized person using face recognition; obtain the azimuth information of the authorized person's face.
Step 42: perform beamforming according to the azimuth information of the authorized person's face.
Step 43: pick up sound using the beamformed microphone array.
Step 44: recognize, by voiceprint recognition, a keyword spoken by the authorized person, to further confirm the authorized person. The above embodiment of the present invention can thus perform joint authentication using face recognition and voiceprint recognition.
In one embodiment of the invention, voiceprint recognition may include:
Step 441: collect speech.
Step 442: perform noise suppression and speech detection.
Step 443: perform feature extraction.
The task of feature extraction is to extract and select acoustic or linguistic features of the speaker's voiceprint that are highly separable and stable. Unlike speech recognition, voiceprint recognition must use "personalized" features, whereas the features used for speech recognition must be "common" features across speakers.
Although most current voiceprint recognition systems use acoustic-level features, the features that characterize an individual should be multi-level.
In one embodiment of the invention, these features may include: (1) acoustic features related to the anatomy of the human vocal mechanism (such as spectrum, cepstrum, formants, pitch, and reflection coefficients), as well as nasality, breathiness, hoarseness, laughter, and so on; (2) semantics, rhetoric, pronunciation, speech habits, and so on, which are influenced by socioeconomic status, education level, place of birth, and the like; (3) personal characteristics or parentally influenced traits such as prosody, rhythm, speed, intonation, and volume.
Step 444: perform acoustic modeling.
In one embodiment of the invention, the features that an automatic voiceprint recognition model can use include: (1) acoustic features (cepstrum); (2) lexical features (speaker-dependent word n-grams and phoneme n-grams); (3) prosodic features (pitch and energy "gestures" described with n-grams); (4) language, dialect, and accent information; (5) channel information (which channel is used); and so on.
Step 445: perform recognition matching.
In one embodiment of the invention, the recognition matching in step 445 may use methods from the following broad categories:
(1) Template matching: dynamic time warping (DTW) is used to align the training and test feature sequences; mainly used in fixed-phrase applications (usually text-dependent tasks).
(2) Nearest neighbor: all feature vectors are retained during training; during recognition, the K nearest training vectors are found for each vector and the decision is made accordingly. Model storage and similarity computation are usually both very large.
(3) Neural networks: these come in many forms, such as multilayer perceptrons and radial basis function (RBF) networks, and can be explicitly trained to distinguish a speaker from background speakers. The training cost is very large and the models generalize poorly.
(4) Hidden Markov model (HMM) methods: usually a single-state HMM or a Gaussian mixture model (GMM) is used; this is a popular approach that works relatively well.
(5) VQ clustering (such as LBG): works relatively well and has low algorithmic complexity; used together with HMM methods it can achieve even better results.
(6) Polynomial classifiers: have higher accuracy, but model storage and computation are both relatively large.
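Of the methods listed above, template matching with DTW is the easiest to show in a few lines. The sketch below is a plain textbook DTW over sequences of feature vectors, with Euclidean distance between frames. The function name and the toy one-dimensional "features" are assumptions for illustration; real systems would compare cepstral feature vectors and would normally add path constraints and length normalization.

```python
from typing import List, Sequence


def dtw_distance(seq_a: Sequence[Sequence[float]],
                 seq_b: Sequence[Sequence[float]]) -> float:
    """Dynamic time warping distance between two sequences of feature vectors."""

    def frame_cost(u: Sequence[float], v: Sequence[float]) -> float:
        # Euclidean distance between two feature frames.
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best accumulated cost of aligning seq_a[:i] with seq_b[:j].
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = frame_cost(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # seq_b frame repeated
                                    cost[i][j - 1],      # seq_a frame repeated
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Toy data: an enrolled keyword template, the same contour spoken more slowly,
# and a different contour.
enrolled = [[0.0], [1.0], [2.0], [1.0], [0.0]]
slower = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [0.0]]
different = [[2.0], [2.0], [0.0], [0.0]]
print(dtw_distance(enrolled, slower), dtw_distance(enrolled, different))
```

Because DTW lets frames repeat, the slower utterance aligns with the template at zero cost, while the different contour does not. This is why the method suits fixed-phrase, text-dependent tasks such as keyword confirmation.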
Step 45: extract the control instruction issued by the authorized person.
Step 46: parse the control instruction, and perform the corresponding control action according to the parsed control instruction.
In the above embodiment of the present invention, when several people are talking at the same time in a test room, face recognition can be performed with the VR camera to confirm the authorized person; the beam is steered toward the authorized person's face; the authorized person speaks a keyword and a control word; voiceprint recognition of the keyword confirms the authorization; control then begins: the control word is parsed and the control action is performed.
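The flow described above can be sketched as a small decision function. Everything here is hypothetical scaffolding: the `Observation` fields, the score threshold, the keyword, and the action table stand in for the outputs of real face recognition, voiceprint recognition, and speech recognition components, none of which the patent specifies at this level of detail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Observation:
    face_id: Optional[str]    # identity reported by face recognition, if any
    voiceprint_score: float   # similarity of the spoken keyword to the enrolled voiceprint
    transcript: str           # recognized text: keyword followed by a control word


def handle_utterance(obs: Observation, authorized_id: str, keyword: str,
                     score_threshold: float,
                     actions: Dict[str, Callable[[], str]]) -> str:
    """Perform a control action only if face and voiceprint checks both pass."""
    # First factor: the face must belong to the authorized person.
    if obs.face_id != authorized_id:
        return "rejected: face"
    # Second factor: the keyword must be present and match the enrolled voiceprint.
    words = obs.transcript.lower().split()
    if not words or words[0] != keyword or obs.voiceprint_score < score_threshold:
        return "rejected: voiceprint"
    # Authenticated: parse the control word and run the matching action.
    action = actions.get(" ".join(words[1:]))
    if action is None:
        return "rejected: unknown command"
    return action()


# Hypothetical usage with made-up identities, scores, and commands.
actions = {"lights on": lambda: "lights turned on"}
obs = Observation(face_id="alice", voiceprint_score=0.91, transcript="Hello lights on")
result = handle_utterance(obs, authorized_id="alice", keyword="hello",
                          score_threshold=0.8, actions=actions)
print(result)
```

Checking the face first mirrors the order in the text: the face both authorizes the person and supplies the direction in which the beam is steered before any keyword is heard.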
Beamforming-based sound pickup is a relatively mature technology, but speech recognition in cocktail-party scenarios remains very difficult to solve.
The above embodiment of the present invention introduces face recognition on VR video to assist beamforming, which improves the reliability of authentication, reduces the difficulty of blind speech processing, and improves the accuracy of speech recognition.
The above embodiment of the present invention can implement biometric identification through face recognition and voiceprint recognition, realizing multi-factor authentication by face recognition and voiceprint recognition and thereby strengthening the security of the system.
Fig. 5 is a schematic diagram of a first embodiment of the microphone array sound pick-up device of the present invention. As shown in Fig. 5, the microphone array sound pick-up device may include a face recognition module 51, a beamforming module 52, and a pickup module 53, wherein:
The face recognition module 51 is configured to, as shown in Fig. 2, perform face recognition on panoramic video and capture the face of the authorized person; confirm the authorized person using face recognition; and obtain the azimuth information of the authorized person's face.
In one embodiment of the invention, the face recognition module 51 may be implemented as a VR camera (virtual reality camera).
The beamforming module 52 is configured to perform beamforming according to the azimuth information of the authorized person's face.
The pickup module 53 is configured to pick up sound using the beamformed microphone array.
In one embodiment of the invention, the pickup module 53 may specifically be implemented as a microphone array.
In one embodiment of the invention, the microphone array may adopt a planar-array or spatial-array topology.
In one embodiment of the invention, the pickup module 53 may specifically be configured to separate signals according to the beamforming and pick up only the speech signal from the direction of the authorized person's face. In the above embodiment of the present invention, the microphone array using beamforming can pick up the desired speech (for example, the speech signal from the direction of the authorized person's face) while suppressing ambient noise.
The microphone array sound pick-up device with panoramic-video-assisted beamforming provided by the above embodiment of the present invention uses video recognition to reduce the difficulty of beamforming, addresses the problem of speech recognition under the cocktail party effect, and achieves highly directional sound pickup.
The above embodiment of the present invention incorporates a video channel, which simplifies the problem of steering indoor sound pickup. Combined with a spatial microphone (MIC) array and beamforming, it can achieve relatively good pickup directivity and effectively reduce noise and other people's speech.
The above embodiment of the present invention reduces the difficulty of blind source separation and improves the reliability of the system's speech signal separation, and can therefore improve the success rate of speech recognition.
Fig. 6 is a schematic diagram of a second embodiment of the microphone array sound pick-up device of the present invention. Compared with the embodiment shown in Fig. 5, in the embodiment shown in Fig. 6 the microphone array sound pick-up device may further include a voiceprint recognition module 54 and a voice control module 55, wherein:
The face recognition module 51 is configured to confirm the authorized person using face recognition.
The voiceprint recognition module 54 is configured to recognize, by voiceprint recognition, a keyword spoken by the authorized person, to further confirm the authorized person.
The microphone array sound pick-up device of the above embodiment of the present invention thus performs joint authentication using face recognition and voiceprint recognition.
The voice control module 55 is configured to extract, after the joint authentication is passed, the control instruction issued by the authorized person, parse the control instruction, and perform the corresponding control action according to the parsed control instruction.
The above embodiment of the present invention can implement biometric identification through face recognition and voiceprint recognition, realizing multi-factor authentication by face recognition and voiceprint recognition and thereby strengthening the security of the system.
The above embodiment of the present invention introduces face recognition on VR video to assist beamforming, which improves the reliability of authentication, reduces the difficulty of blind speech processing, and improves the accuracy of speech recognition.
Fig. 7 is a schematic diagram of the operation of the microphone array sound pick-up device in one embodiment of the present invention. As shown in Fig. 7, the microphone array sound pick-up device may include a microphone array and a VR camera, wherein:
The VR camera performs face recognition and the microphone array performs voiceprint recognition authentication. Through joint authentication by face recognition and voiceprint recognition, the above embodiment of the present invention strengthens the security of the system.
The VR camera performs face recognition on panoramic video, captures the face of the authorized person, and obtains the recognized face position (the azimuth information of the authorized person's face). The microphone array sound pick-up device of the present invention then uses this azimuth information to form a beam and separate signals, so that the microphone array picks up only the speech signal from the direction of the authorized person's face.
The microphone array using beamforming in the above embodiment of the present invention can thus pick up the desired speech (for example, the speech signal from the direction of the authorized person's face) while suppressing ambient noise.
For example, when several people are talking at the same time in a test room, the microphone array sound pick-up device of the present invention can perform face recognition with the VR camera to confirm the authorized person; the beam is steered toward the authorized person's face; the authorized person speaks a keyword and a control word; voiceprint recognition of the keyword confirms the authorization; control then begins: the control word is parsed and the control action is performed.
The above embodiment of the present invention introduces face recognition on VR video to assist beamforming, which improves the reliability of authentication, reduces the difficulty of blind speech processing, and improves the accuracy of speech recognition.
The above embodiment of the present invention can implement biometric identification through face recognition and voiceprint recognition, realizing multi-factor authentication by face recognition and voiceprint recognition and thereby strengthening the security of the system.
The above embodiment of the present invention can use video recognition to reduce the difficulty of beamforming, address the problem of speech recognition under the cocktail party effect, and achieve highly directional sound pickup.
The above embodiment of the present invention incorporates a video channel, which simplifies the problem of steering indoor sound pickup. Combined with a spatial microphone (MIC) array and beamforming, it can achieve relatively good pickup directivity and effectively reduce noise and other people's speech.
The above embodiment of the present invention reduces the difficulty of blind source separation and improves the reliability of the system's speech signal separation, and can therefore improve the success rate of speech recognition.
The functional units described above, such as the beamforming module 52, the voiceprint recognition module 54, and the voice control module 55, may be implemented as a general-purpose processor, a programmable logic controller (PLC), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof, for executing the functions described herein.
The present invention has now been described in detail. To avoid obscuring the concept of the present invention, some details well known in the art have not been described. From the above description, a person skilled in the art can fully understand how to implement the technical solutions disclosed herein.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The description of the present invention is given for the purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be obvious to a person of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical application of the invention, and to enable a person of ordinary skill in the art to understand the invention and thereby design various embodiments with various modifications suited to particular uses.

Claims (10)

1. A microphone array sound pickup method, characterized by comprising:
performing face recognition using panoramic video, capturing the face of an authorized person, and obtaining azimuth information of the authorized person's face;
performing beamforming according to the azimuth information of the authorized person's face;
performing sound pickup using the beamformed microphone array.
2. The method according to claim 1, characterized in that performing sound pickup using the beamformed microphone array comprises:
separating signals according to the beamforming, and picking up only the voice signal from the direction of the authorized person's face.
3. The method according to claim 1 or 2, characterized by further comprising:
performing joint authentication using face recognition and voiceprint recognition.
4. The method according to claim 3, characterized in that performing joint authentication using face recognition and voiceprint recognition comprises:
confirming the authorized person using face recognition;
recognizing, using voiceprint recognition, a keyword spoken by the authorized person, so as to further confirm the authorized person.
5. The method according to claim 3, characterized by further comprising, after the joint authentication is passed:
extracting a control instruction issued by the authorized person;
parsing the control instruction, and performing the corresponding control action according to the parsed control instruction.
6. A microphone array sound pickup device, characterized by comprising:
a face recognition module, configured to perform face recognition using panoramic video, capture the face of an authorized person, and obtain azimuth information of the authorized person's face;
a beamforming module, configured to perform beamforming according to the azimuth information of the authorized person's face;
a sound pickup module, configured to perform sound pickup using the beamformed microphone array.
7. The device according to claim 6, characterized in that
the sound pickup module is configured to separate signals according to the beamforming and to pick up only the voice signal from the direction of the authorized person's face.
8. The device according to claim 6 or 7, characterized in that
the microphone array sound pickup device performs joint authentication using face recognition and voiceprint recognition.
9. The device according to claim 8, characterized by further comprising:
the face recognition module, configured to confirm the authorized person using face recognition;
a voiceprint recognition module, configured to recognize, using voiceprint recognition, a keyword spoken by the authorized person, so as to further confirm the authorized person.
10. The device according to claim 8, characterized by further comprising:
a voice control module, configured to, after the joint authentication is passed, extract a control instruction issued by the authorized person, parse the control instruction, and perform the corresponding control action according to the parsed control instruction.
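
As a non-limiting illustration of the steps recited in claims 1 to 4, the following Python sketch maps a detected face position to an azimuth, steers a microphone array toward that azimuth, and gates pickup on a joint face and voiceprint check. The claims do not specify a beamforming algorithm, a panorama projection, or authentication thresholds. The delay-and-sum beamformer, the equirectangular mapping, the far-field plane-wave model, and all function names and threshold values below are assumptions made only for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def pixel_to_azimuth(face_center_x, frame_width):
    """Map a face's horizontal pixel position to an azimuth in radians.

    Assumption: a 360-degree equirectangular panorama whose left edge
    corresponds to azimuth 0.
    """
    return 2.0 * math.pi * (face_center_x / frame_width)


def steering_delays(mic_positions, azimuth, sample_rate):
    """Whole-sample delays that time-align a far-field source at `azimuth`.

    mic_positions is a list of (x, y) coordinates in metres.
    """
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    # A microphone further along the arrival direction hears the wavefront
    # earlier, so it must be delayed more to line up with the others.
    advance = [(x * ux + y * uy) / SPEED_OF_SOUND for x, y in mic_positions]
    latest = min(advance)
    return [round((a - latest) * sample_rate) for a in advance]


def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay and average the result."""
    n_samples = len(channels[0])
    output = [0.0] * n_samples
    for samples, delay in zip(channels, delays):
        for n in range(delay, n_samples):
            output[n] += samples[n - delay]
    return [value / len(channels) for value in output]


def joint_authentication(face_score, voiceprint_score,
                         face_threshold=0.8, voice_threshold=0.8):
    """Pass only if both recognizers agree (hypothetical scores in [0, 1])."""
    return face_score >= face_threshold and voiceprint_score >= voice_threshold
```

In this sketch, two microphones spaced 0.343 m apart along the x axis and sampled at 1000 Hz give steering delays of `[0, 1]` for a face at azimuth 0. An impulse arriving from that direction then adds coherently, while steering toward the opposite direction leaves it spread over two samples at half amplitude. This is the sense in which only the voice signal from the authorized person's direction is picked up.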

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710608727.1A CN109300475A (en) 2017-07-25 2017-07-25 Microphone array sound pick-up method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710608727.1A CN109300475A (en) 2017-07-25 2017-07-25 Microphone array sound pick-up method and device

Publications (1)

Publication Number Publication Date
CN109300475A (en) 2019-02-01

Family

ID=65167613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710608727.1A Pending CN109300475A (en) 2017-07-25 2017-07-25 Microphone array sound pick-up method and device

Country Status (1)

Country Link
CN (1) CN109300475A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030212552A1 (en) * 2002-05-09 2003-11-13 Liang Lu Hong Face recognition procedure useful for audiovisual speech recognition
CN102160398A (en) * 2008-07-31 2011-08-17 诺基亚公司 Electronic device directional audio-video capture
CN104053088A (en) * 2013-03-11 2014-09-17 联想(北京)有限公司 Microphone array adjustment method, microphone array and electronic device
CN103440686A (en) * 2013-07-29 2013-12-11 上海交通大学 Mobile authentication system and method based on voiceprint recognition, face recognition and location service
CN104735598A (en) * 2013-12-18 2015-06-24 刘璟锋 Hearing aid system and voice acquisition method of hearing aid system
CN105224850A (en) * 2015-10-24 2016-01-06 北京进化者机器人科技有限公司 Combined right-discriminating method and intelligent interactive system
CN106887236A (en) * 2015-12-16 2017-06-23 宁波桑德纳电子科技有限公司 A kind of remote speech harvester of sound image combined positioning
CN106097495A (en) * 2016-06-03 2016-11-09 赵树龙 A kind of intelligent voice control vocal print face authentication door access control system and method
CN106599866A (en) * 2016-12-22 2017-04-26 上海百芝龙网络科技有限公司 Multidimensional user identity identification method
CN106782585A (en) * 2017-01-26 2017-05-31 芋头科技(杭州)有限公司 A kind of sound pick-up method and system based on microphone array

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110301890A (en) * 2019-05-31 2019-10-08 华为技术有限公司 The method and device of apnea monitoring
WO2020238954A1 (en) * 2019-05-31 2020-12-03 华为技术有限公司 Apnea monitoring method and device
CN111688580A (en) * 2020-05-29 2020-09-22 北京百度网讯科技有限公司 Method and device for picking up sound by intelligent rearview mirror
US11631420B2 (en) 2020-05-29 2023-04-18 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Voice pickup method for intelligent rearview mirror, electronic device and storage medium
CN113301476A (en) * 2021-03-31 2021-08-24 阿里巴巴新加坡控股有限公司 Pickup device and microphone array structure
CN113301476B (en) * 2021-03-31 2023-11-14 阿里巴巴(中国)有限公司 Pickup device and microphone array structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190201