CN107369449A - A kind of efficient voice recognition methods and device - Google Patents
- Publication number
- CN107369449A (application number CN201710573521.XA)
- Authority
- CN
- China
- Prior art keywords
- voice
- time point
- image
- mouth opening
- sound source object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
Claims (10)
- 1. An effective speech recognition method, characterized by comprising: recording speech data of a sound source object while simultaneously acquiring facial image data of the sound source object; performing ASR recognition on the speech data to obtain an ASR recognition result, the ASR recognition result comprising several pieces of speech content and their corresponding speech recording time points; performing mouth-opening feature recognition on the facial image data of the sound source object to obtain several frames of mouth-opening images and the image acquisition time point corresponding to each frame of mouth-opening image; and determining, for each piece of speech content, whether the image acquisition time point of a corresponding mouth-opening image falls within a preset time range before and after the speech recording time point corresponding to that piece of speech content; if so, recording the corresponding speech content as effective speech.
- 2. The method according to claim 1, characterized in that the speech recording time point corresponding to each piece of speech content is: the time point at which recording of that piece of speech content starts, the time point in the middle of recording that piece of speech content, or the time point at which recording of that piece of speech content ends.
- 3. The method according to claim 1, characterized in that acquiring the facial image data of the sound source object specifically comprises: detecting, by a camera, the face of the sound source object; focusing on the face so that the face region occupies a preset value of the camera lens; and acquiring the facial image data of the sound source object.
- 4. The method according to claim 1, characterized in that performing mouth-opening feature recognition on the facial image data of the sound source object specifically comprises: locating the mouth-shape feature in the facial image data; and judging whether the ratio of the mouth-opening height to the lip height of the mouth shape is greater than or equal to a preset ratio; when it is greater than or equal to the preset ratio, identifying the facial image data as a mouth-opening image; wherein the mouth-opening height is the distance between the lower edge of the upper lip and the upper edge of the lower lip, and the lip height is the distance between the upper edge of the upper lip and the lower edge of the lower lip.
- 5. The method according to claim 1, characterized in that the preset time range before and after the speech recording time point corresponding to each piece of speech content is 1 second before and after that speech recording time point.
- 6. An effective speech recognition device, characterized by comprising: a recording device for recording speech data of a sound source object; a camera device for recording facial image data of the sound source object synchronously with the recording device; an ASR recognition device for performing ASR recognition on the speech data to obtain an ASR recognition result, the ASR recognition result comprising several pieces of speech content and their corresponding speech recording time points; an image detection device for performing mouth-opening feature recognition on the facial image data of the sound source object to obtain several frames of mouth-opening images and the image acquisition time point corresponding to each frame of mouth-opening image; and an effective speech extraction device for determining, for each piece of speech content, whether the image acquisition time point of a corresponding mouth-opening image falls within a preset time range before and after the speech recording time point corresponding to that piece of speech content, and, if so, recording the corresponding speech content as effective speech.
- 7. The device according to claim 6, characterized in that the speech recording time point corresponding to each piece of speech content is: the time point at which recording of that piece of speech content starts, the time point in the middle of recording that piece of speech content, or the time point at which recording of that piece of speech content ends.
- 8. The device according to claim 6, characterized in that the camera device is specifically configured to: detect the face of the sound source object; focus on the face so that the face region occupies a preset value of the camera device's lens; and acquire the facial image data of the sound source object.
- 9. The device according to claim 6, characterized in that the image detection device is specifically configured to: locate the mouth-shape feature in the facial image data; and judge whether the ratio of the mouth-opening height to the lip height of the mouth shape is greater than or equal to a preset ratio; when it is greater than or equal to the preset ratio, identify the facial image data as a mouth-opening image; wherein the mouth-opening height is the distance between the lower edge of the upper lip and the upper edge of the lower lip, and the lip height is the distance between the upper edge of the upper lip and the lower edge of the lower lip.
- 10. The device according to claim 6, characterized in that the preset time range before and after the speech recording time point corresponding to each piece of speech content is 1 second before and after that speech recording time point.
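The matching logic described in claims 1, 2, 4 and 5 can be sketched roughly as follows. This is an illustrative sketch only, not the patentee's implementation: the class names `SpeechSegment` and `MouthFrame`, the function names, and the default ratio of 0.3 are all assumptions introduced here (the claims only speak of a "preset ratio"). The 1-second window comes from claims 5 and 10. Lip edges are assumed to be given as pixel y-coordinates that increase downward, as produced by a typical face-landmark detector; the ASR engine and the landmark detector themselves are out of scope.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechSegment:
    """One piece of ASR-recognized speech content (hypothetical structure)."""
    text: str
    start: float  # seconds since recording began
    end: float


@dataclass
class MouthFrame:
    """Lip-edge positions for one video frame (hypothetical structure).

    The four values are y-coordinates in pixels, increasing downward.
    """
    timestamp: float  # seconds, on the same clock as the audio
    upper_lip_top: float
    upper_lip_bottom: float
    lower_lip_top: float
    lower_lip_bottom: float


def is_mouth_open(frame: MouthFrame, min_ratio: float = 0.3) -> bool:
    """Claim 4 test: mouth-opening height / lip height >= preset ratio.

    min_ratio=0.3 is an assumed placeholder, not a value from the patent.
    """
    opening = frame.lower_lip_top - frame.upper_lip_bottom
    lip_height = frame.lower_lip_bottom - frame.upper_lip_top
    if lip_height <= 0:
        return False  # degenerate landmarks: treat as not open
    return opening / lip_height >= min_ratio


def reference_time(segment: SpeechSegment, anchor: str = "start") -> float:
    """Claim 2: the time point may be the start, middle, or end of a segment."""
    if anchor == "start":
        return segment.start
    if anchor == "middle":
        return (segment.start + segment.end) / 2
    if anchor == "end":
        return segment.end
    raise ValueError(f"unknown anchor: {anchor!r}")


def effective_segments(
    segments: List[SpeechSegment],
    frames: List[MouthFrame],
    window: float = 1.0,  # seconds before and after, per claims 5 and 10
    anchor: str = "start",
    min_ratio: float = 0.3,
) -> List[SpeechSegment]:
    """Keep segments with a mouth-open frame within +/- window of their time point."""
    open_times = [f.timestamp for f in frames if is_mouth_open(f, min_ratio)]
    kept = []
    for segment in segments:
        t = reference_time(segment, anchor)
        if any(abs(frame_time - t) <= window for frame_time in open_times):
            kept.append(segment)
    return kept
```

For example, given one segment recorded at 1.0 s with an open-mouth frame at 1.4 s, and another segment at 10.0 s whose nearest frame shows a closed mouth, `effective_segments` would keep only the first. The linear scan over `open_times` is chosen for clarity; for long recordings a sorted list with binary search would be a natural substitute.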
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710573521.XA CN107369449B (en) | 2017-07-14 | 2017-07-14 | A kind of efficient voice recognition methods and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710573521.XA CN107369449B (en) | 2017-07-14 | 2017-07-14 | A kind of efficient voice recognition methods and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107369449A (en) | 2017-11-21 |
CN107369449B (en) | 2019-11-26 |
Family
ID=60307217
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710573521.XA, Active, CN107369449B (en) | A kind of efficient voice recognition methods and device | 2017-07-14 | 2017-07-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107369449B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108154878A (en) * | 2017-12-12 | 2018-06-12 | 北京小米移动软件有限公司 | Control the method and device of monitoring device |
CN109697976A (en) * | 2018-12-14 | 2019-04-30 | 北京葡萄智学科技有限公司 | A kind of pronunciation recognition methods and device |
WO2019227552A1 (en) * | 2018-06-01 | 2019-12-05 | 深圳市鹰硕技术有限公司 | Behavior recognition-based speech positioning method and device |
CN115250373A (en) * | 2022-06-30 | 2022-10-28 | 北京随锐会见科技有限公司 | Method for synchronously calibrating audio and video stream and related product |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0254409B1 (en) * | 1986-07-25 | 1991-10-30 | Smiths Industries Public Limited Company | Speech recognition apparatus and methods |
CN102013103A (en) * | 2010-12-03 | 2011-04-13 | 上海交通大学 | Method for dynamically tracking lip in real time |
CN202329640U (en) * | 2011-08-19 | 2012-07-11 | 广东好帮手电子科技股份有限公司 | System for applying auxiliary voice recognition technology by mouth shape in vehicular navigation |
CN103218842A (en) * | 2013-03-12 | 2013-07-24 | 西南交通大学 | Voice synchronous-drive three-dimensional face mouth shape and face posture animation method |
CN104409075A (en) * | 2014-11-28 | 2015-03-11 | 深圳创维-Rgb电子有限公司 | Voice identification method and system |
CN106155707A (en) * | 2015-03-23 | 2016-11-23 | 联想(北京)有限公司 | Information processing method and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN107369449B (en) | 2019-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107369449A (en) | A kind of efficient voice recognition methods and device | |
US7219062B2 (en) | Speech activity detection using acoustic and facial characteristics in an automatic speech recognition system | |
CN104937519B (en) | Device and method for controlling augmented reality equipment | |
CN105100356B (en) | The method and system that a kind of volume automatically adjusts | |
CN106782585A (en) | A kind of sound pick-up method and system based on microphone array | |
US11227638B2 (en) | Method, system, medium, and smart device for cutting video using video content | |
CN106294774A (en) | User individual data processing method based on dialogue service and device | |
CN109410957A (en) | Positive human-computer interaction audio recognition method and system based on computer vision auxiliary | |
CN109754801A (en) | A kind of voice interactive system and method based on gesture identification | |
CN109905764A (en) | Target person voice intercept method and device in a kind of video | |
CN109558788B (en) | Silence voice input identification method, computing device and computer readable medium | |
CN107545887A (en) | Phonetic order processing method and processing device | |
CN109935226A (en) | A kind of far field speech recognition enhancing system and method based on deep neural network | |
CN106157957A (en) | Audio recognition method, device and subscriber equipment | |
WO2017219450A1 (en) | Information processing method and device, and mobile terminal | |
CN107863098A (en) | A kind of voice identification control method and device | |
CN106648760A (en) | Terminal and method thereof for cleaning background application programs based on face recognition | |
Valbonesi et al. | Multimodal signal analysis of prosody and hand motion: Temporal correlation of speech and gestures | |
CN105869636A (en) | Speech recognition apparatus and method thereof, smart television set and control method thereof | |
US11819996B2 (en) | Expression feedback method and smart robot | |
JPH1173297A (en) | Recognition method using timely relation of multi-modal expression with voice and gesture | |
CN107274895A (en) | A kind of speech recognition apparatus and method | |
Gebre et al. | The gesturer is the speaker | |
CN108600511A (en) | The control system and method for intelligent sound assistant's equipment | |
CN105227744B (en) | The method and apparatus of recording call content in communication terminal |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: Room 402, Building 33 Guangshun Road, Changning District, Shanghai, 2003; Applicant after: SHANGHAI MROBOT TECHNOLOGY Co.,Ltd.; Address before: Room 402, Building 33 Guangshun Road, Changning District, Shanghai, 2003; Applicant before: SHANGHAI MUYE ROBOT TECHNOLOGY Co.,Ltd. |
GR01 | Patent grant | |
CP01 | Change in the name or title of a patent holder | Address after: 200335 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai; Patentee after: Shanghai zhihuilin Medical Technology Co.,Ltd.; Address before: 200335 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai; Patentee before: Shanghai Zhihui Medical Technology Co.,Ltd. Address after: 200335 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai; Patentee after: Shanghai Zhihui Medical Technology Co.,Ltd.; Address before: 200335 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai; Patentee before: SHANGHAI MROBOT TECHNOLOGY Co.,Ltd. |
TR01 | Transfer of patent right | Effective date of registration: 20210812; Address after: 200335 Room 401, floor 4, building 2, No. 33, Guangshun Road, Changning District, Shanghai; Patentee after: Noah robot technology (Shanghai) Co.,Ltd.; Address before: 200335 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai; Patentee before: Shanghai zhihuilin Medical Technology Co.,Ltd. |