KR20170052082A - Method and apparatus for voice recognition based on infrared detection - Google Patents
- Publication number
- KR20170052082A (application number KR1020150154045A)
- Authority
- KR
- South Korea
- Prior art keywords
- speech
- image
- data
- speech recognition
- vocal
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
Abstract
The present invention relates to an infrared detection-based speech recognition method and apparatus. The method comprises: receiving vocal data, including vocal image data and vocal sound data, from a vocalizing source; processing the vocal image data into a vocal airflow image based on infrared detection and a vocal mouth image based on visible-light detection; generating a speech recognition speech feature based on the vocal sound data and a speech recognition image feature based on the vocal image data; and outputting a speech recognition result through pattern recognition based on the speech recognition speech feature and the speech recognition image feature.
Description
BACKGROUND OF THE INVENTION 1. Field of the Invention [0001] The present invention relates to an infrared detection-based speech recognition method and apparatus, and more particularly, to an infrared detection-based speech recognition method and apparatus capable of detecting a speaker's utterance by means of infrared rays.
As speech recognition has come into wide everyday use, speech recognition technology has improved greatly. In general, speech recognition captures and analyzes a human voice so that a device such as a computer can understand human language. That is, speech recognition has been developed as a new input method that lets users operate devices more conveniently.
Speech recognition has been developed toward recognizing the human voice ever more accurately, and methods that analyze human utterances as acoustic data have accordingly been widely used. For example, one technique separates a speaker's speech from noise so that the speech can be recognized clearly in a noisy environment.
However, techniques that use only human speech for recognition are often limited in accuracy and in the conditions under which they can be used. Accordingly, visual data as well as acoustic data of the human voice are used for speech recognition. For example, visual data of the speaker's lip shape can be used together with the speaker's acoustic data.
Still, speech recognition that uses the speaker's lip shape as visual data has the drawback that the speaker must be photographed from the front. In addition, speech recognition that uses only acoustic data has a very low success rate when the signal-to-noise ratio (SNR) is very low, for example when the environment is very noisy or the speaker's voice is very quiet.
Accordingly, there is a growing need for a method that uses, for speech recognition, not only the speaker's sound data but also the airflow generated from the speaker's mouth.
[Related Technical Literature]
A speech recognition apparatus and a speech recognition method thereof (Korean Patent Publication No. 10-2014-0024536)
SUMMARY OF THE INVENTION It is an object of the present invention to provide an infrared detection-based speech recognition method and apparatus capable of recognizing speech even in a noisy environment or when the speaker's voice output is low.
Another object of the present invention is to provide an infrared detection-based speech recognition method and apparatus capable of recognizing speech by using an infrared camera to detect the airflow generated from the speaker's mouth.
The problems of the present invention are not limited to the above-mentioned problems, and other problems not mentioned can be clearly understood by those skilled in the art from the following description.
According to an aspect of the present invention, there is provided an infrared detection-based speech recognition method including: receiving vocal data, including vocal image data and vocal sound data, from a vocalizing source; processing the vocal image data into a vocal airflow image based on infrared detection and a vocal mouth image based on visible-light detection; generating a speech recognition speech feature based on the vocal sound data and a speech recognition image feature based on the vocal image data; and outputting a speech recognition result through pattern recognition based on the speech recognition speech feature and the speech recognition image feature.
According to still another aspect of the present invention, the step of generating the speech recognition image feature includes calculating, through infrared detection of the vocal airflow image, at least one of the velocity, amount, and pressure of the vocal airflow propagated from the vocalizing source.
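The text does not say how velocity, amount, or pressure are computed from the vocal airflow image. As one possible reading, the sketch below derives two proxies from per-frame binary airflow masks: the amount as the count of airflow pixels, and the velocity as the frame-to-frame displacement of the airflow front. It assumes the camera views the speaker in profile with the mouth at column 0 and the airflow moving toward higher columns, a known frame interval, and a known metres-per-pixel scale. All function names are hypothetical, and pressure is not estimated.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[bool]]  # True where a pixel is classified as vocal airflow


def airflow_amount(mask: Mask) -> int:
    """Proxy for airflow amount: number of airflow pixels in one frame."""
    return sum(1 for row in mask for px in row if px)


def front_position(mask: Mask) -> int:
    """Farthest airflow column from the mouth (assumed at column 0), or -1 if none."""
    cols = [c for row in mask for c, px in enumerate(row) if px]
    return max(cols) if cols else -1


def airflow_velocity(masks: List[Mask], frame_interval_s: float,
                     metres_per_px: float) -> List[float]:
    """Speed of the airflow front between consecutive frames, in m/s.

    Negative values mean the front receded; 0.0 is reported when either
    frame contains no airflow pixels.
    """
    speeds: List[float] = []
    for prev, cur in zip(masks, masks[1:]):
        p, c = front_position(prev), front_position(cur)
        if p < 0 or c < 0:
            speeds.append(0.0)
        else:
            speeds.append((c - p) * metres_per_px / frame_interval_s)
    return speeds
```

With frames 0.1 s apart and 1 cm per pixel, a front that advances two columns between frames corresponds to roughly 0.2 m/s.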
According to another aspect of the present invention, the infrared detection detects far-infrared rays.
According to another aspect of the present invention, the step of generating the speech recognition image feature includes identifying a speech segment and distinguishing phonemes based on the speech recognition image feature.
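One simple way to identify speech segments from an image feature, assumed here rather than taken from the patent, is hysteresis thresholding of a per-frame airflow intensity: a segment opens when the intensity reaches an upper threshold and closes when it falls below a lower one. The thresholds and `min_len` are hypothetical tuning parameters, and distinguishing the phonemes inside each segment would be a separate classification step.

```python
from typing import List, Optional, Sequence, Tuple


def speech_segments(intensity: Sequence[float], on: float, off: float,
                    min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) frame pairs, end exclusive, where airflow indicates speech.

    A segment opens when intensity reaches `on` and closes when it drops below
    `off`; choosing off < on (hysteresis) keeps brief dips from splitting a segment.
    Segments shorter than `min_len` frames are discarded.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(intensity):
        if start is None and v >= on:
            start = i
        elif start is not None and v < off:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a segment that is still open at the last frame.
    if start is not None and len(intensity) - start >= min_len:
        segments.append((start, len(intensity)))
    return segments
```

For the intensity sequence `[0, 0, 5, 6, 3, 6, 0, 0, 7, 7]` with `on=4` and `off=2`, the dip to 3 does not end the first segment, so the result is `[(2, 6), (8, 10)]`.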
According to another aspect of the present invention, the pattern recognition is based only on the speech recognition image feature when no vocal sound data is detected or when the proportion of noise in the vocal sound data is larger than that of the speaker's voice.
According to another aspect of the present invention, there is provided an infrared detection-based speech recognition apparatus including: a receiving unit that receives vocal data, including vocal image data and vocal sound data, from a vocalizing source; a processing unit that processes the vocal image data into a vocal airflow image based on infrared detection and a vocal mouth image based on visible-light detection, generates a speech recognition speech feature based on the vocal sound data, and generates a speech recognition image feature based on the vocal image data; and an output unit that outputs a speech recognition result through pattern recognition based on the speech recognition speech feature and the speech recognition image feature.
According to another aspect of the present invention, the processing unit calculates, through infrared detection of the vocal airflow image, at least one of the velocity, amount, and pressure of the vocal airflow propagated from the vocalizing source.
According to another aspect of the present invention, the infrared detection detects far-infrared rays.
According to still another aspect of the present invention, the processing unit identifies a speech segment and distinguishes phonemes based on the speech recognition image feature.
According to another aspect of the present invention, the pattern recognition is based only on the speech recognition image feature when no vocal sound data is detected or when the proportion of noise in the vocal sound data is larger than that of the speaker's voice.
According to an aspect of the present invention, there is provided a computer-readable medium storing instructions for performing an infrared detection-based speech recognition method, the method including: receiving vocal data, including vocal image data and vocal sound data, from a vocalizing source; processing the vocal image data into a vocal airflow image based on infrared detection and a vocal mouth image based on visible-light detection; generating a speech recognition speech feature based on the vocal sound data and a speech recognition image feature based on the vocal image data; and outputting a speech recognition result through pattern recognition based on the speech recognition speech feature and the speech recognition image feature.
The details of other embodiments are included in the detailed description and drawings.
The present invention provides an infrared detection-based speech recognition method and apparatus capable of recognizing speech even in a noisy environment or in a case where the speech output of a speaking person is low.
The present invention provides an infrared detection-based speech recognition method and apparatus capable of detecting speech by detecting an air current generated from a mouth of a speaking person using an infrared camera.
The effects according to the present invention are not limited by the contents exemplified above, and more various effects are included in the specification.
FIG. 1 shows a schematic configuration of an infrared detection-based speech recognition module according to an embodiment of the present invention.
FIG. 2 illustrates a procedure for recognizing a speaker's speech according to an infrared detection-based speech recognition method according to an embodiment of the present invention.
FIG. 3 illustrates exemplary photographing for infrared detection-based speech recognition according to an embodiment of the present invention.
FIG. 4 illustrates an exemplary configuration and flow of a speech recognition process according to an embodiment of the present invention.
FIG. 5 illustrates exemplary infrared detection results for individual phonemes in a speech recognition process according to an embodiment of the present invention.
The advantages and features of the present invention, and the manner of achieving them, will become apparent with reference to the embodiments described below in conjunction with the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art, and the invention is defined only by the scope of the claims.
Like reference numerals refer to like elements throughout the specification unless otherwise specified.
The features of the various embodiments of the present invention may be partially or wholly coupled or combined with one another and may technically interoperate and be driven together in various ways, as those skilled in the art will fully understand; the embodiments may be carried out independently of one another or together in association.
In the present specification, speech recognition basically means an operation in which an electronic device interprets a voice uttered by a speaking person and recognizes the contents as text. Specifically, when a waveform of a voice uttered by a speaker is input to the electronic device, a voice feature including pattern information of the voice can be obtained by analyzing the voice waveform. Accordingly, the voice feature is compared with the previously learned acoustical and linguistic statistical data, so that the text with the highest likelihood of matching with the input voice can be recognized.
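The matching step described above, in which a speech feature is compared with previously learned statistics and the text with the highest likelihood is chosen, can be illustrated with a toy classifier. This sketch assumes each candidate label has been summarized by a single diagonal Gaussian over the feature vector; the models, labels, and function names are hypothetical, and practical recognizers use far richer acoustic and language models.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical learned statistics: label -> (per-dimension means, per-dimension variances).
Models = Dict[str, Tuple[Sequence[float], Sequence[float]]]


def log_likelihood(x: Sequence[float], means: Sequence[float],
                   variances: Sequence[float]) -> float:
    """Log-density of feature vector x under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        for xi, mu, var in zip(x, means, variances)
    )


def recognize(feature: Sequence[float], models: Models) -> str:
    """Return the label whose model assigns the feature the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(feature, *models[label]))


if __name__ == "__main__":
    models = {"a": ([1.0, 0.0], [0.5, 0.5]), "o": ([0.0, 1.0], [0.5, 0.5])}
    print(recognize([0.9, 0.1], models))  # closest to the "a" model
```

Working in log-likelihoods keeps the per-dimension terms additive and avoids numerical underflow when feature vectors are long.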
In the present specification, a speaker is a user who generates vocal data to be subjected to speech recognition.
In the present specification, vocal data is data uttered by the speaker (the vocalizing source) and propagated to the speech recognition module; it includes vocal image data and vocal sound data. More specifically, vocal data can be propagated to the infrared detection-based speech recognition module in image form as well as in the form of sound.
In the present specification, vocal image data is the part of the vocal data that can be obtained through photographing equipment such as a camera, and means the result of photographing the speaker while vocalizing. The vocal image data includes a vocal mouth image based on visible-light detection and a vocal airflow image based on infrared detection.
In the present specification, a vocal airflow image is an image of the airflow generated from the speaker's mouth during vocalization, that is, of the air stream around the speaker's mouth as captured by a camera. The vocal airflow image may be the result of photographing the airflow around the speaker's mouth with an infrared detecting device.
In the present specification, a vocal mouth image is an image of the speaker's mouth shape taken during vocalization, that is, an image of the change in mouth shape that can be photographed in the visible-light region.
In the present specification, vocal sound data is the sound-form data transmitted to the speech recognition module and includes all sound other than the vocal image data. That is, the vocal sound data may include not only the speaker's voice but also noise arising between the vocalizing source and the speech recognition module.
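To make the data definitions above concrete, the following sketch models vocal data as Python dataclasses. The class and field names, the list-of-rows frame representation, and the default frame and sample rates are illustrative assumptions, not values from the specification.

```python
from dataclasses import dataclass, field
from typing import List

Frame = List[List[float]]  # one image frame as rows of pixel values


@dataclass
class VocalImageData:
    """Image part of the vocal data (illustrative field names)."""
    airflow_frames: List[Frame] = field(default_factory=list)  # vocal airflow image (infrared)
    mouth_frames: List[Frame] = field(default_factory=list)    # vocal mouth image (visible light)
    frame_rate_hz: float = 30.0


@dataclass
class VocalSoundData:
    """Everything captured acoustically: the speaker's voice plus any noise."""
    samples: List[float] = field(default_factory=list)
    sample_rate_hz: int = 16000


@dataclass
class VocalData:
    """Vocal data received from the vocalizing source."""
    image: VocalImageData
    sound: VocalSoundData

    def duration_s(self) -> float:
        """Length of the sound track in seconds."""
        return len(self.sound.samples) / self.sound.sample_rate_hz
```

Keeping the two image streams in separate fields mirrors the split between the vocal airflow image and the vocal mouth image, so each can be fed to its own feature extractor.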
In the present specification, the speech recognition speech feature is vocal sound data processed by the speech recognition module for speech recognition, that is, data processed or extracted in order to convert the vocal sound data into text. For example, the speech recognition speech feature is the result of analyzing the voice level, waveform, and the like, and includes the time at which speech is uttered, the boundaries between phonemes, and predicted phonemes.
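As an illustration of extracting timing-related speech features from the vocal sound data, the sketch below computes per-frame energy, flags frames whose energy clearly exceeds a noise floor as voiced, and gives a crude SNR estimate. It assumes the first few frames contain noise only and uses a hypothetical default factor of 4 above the noise floor; neither choice comes from the patent.

```python
import math
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean power of each non-overlapping frame; a trailing partial frame is dropped."""
    n_frames = len(samples) // frame_len
    return [
        sum(s * s for s in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def voiced_frames(energies: Sequence[float], noise_level: float,
                  factor: float = 4.0) -> List[int]:
    """Indices of frames whose energy exceeds `factor` times the noise floor."""
    return [i for i, e in enumerate(energies) if e > factor * noise_level]


def estimate_snr_db(energies: Sequence[float], noise_frames: int) -> float:
    """Crude SNR in dB, assuming the first `noise_frames` frames hold noise only."""
    noise = sum(energies[:noise_frames]) / noise_frames
    rest = energies[noise_frames:]
    if not rest:
        raise ValueError("need at least one frame after the noise-only frames")
    signal = sum(rest) / len(rest) - noise  # remove the noise share of the later frames
    if signal <= 0:
        return float("-inf")
    if noise == 0:
        return float("inf")
    return 10 * math.log10(signal / noise)
```

The voiced-frame indices give the time at which speech is uttered, and a negative SNR (noise power above voice power) is the kind of condition under which the summary says recognition may fall back to the image feature alone.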
In the present specification, the speech recognition image feature is vocal image data processed by the speech recognition module for speech recognition, that is, data processed or extracted in order to convert the vocal image data into text. For example, the speech recognition image feature includes the mouth shape captured in the image, a predicted phoneme corresponding to that mouth shape, the infrared-detected flow of the vocal airflow propagating from the vocalizing source, the phoneme intervals derived from that flow, and the corresponding predicted phonemes.
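A minimal sketch of one frame-level speech recognition image feature follows. It assumes a thermal frame stored as rows of temperatures, two hypothetical rectangular regions (one just in front of the mouth, one background), and four hypothetical lip landmarks taken from the visible-light mouth image. It yields how much warmer the region in front of the mouth is than the background as an airflow cue, and the lip gap divided by mouth width as a mouth-shape cue; mapping these to predicted phonemes is left to pattern recognition.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # thermal frame as rows of temperatures
Box = Tuple[int, int, int, int]    # (row0, row1, col0, col1), end-exclusive
Point = Tuple[float, float]        # (x, y) pixel coordinates


def roi_mean(frame: Frame, box: Box) -> float:
    """Mean value inside a rectangular region of the frame."""
    r0, r1, c0, c1 = box
    values = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(values) / len(values)


def airflow_intensity(ir_frame: Frame, mouth_front: Box, background: Box) -> float:
    """How much warmer the region in front of the mouth is than the background."""
    return roi_mean(ir_frame, mouth_front) - roi_mean(ir_frame, background)


def mouth_opening_ratio(upper: Point, lower: Point, left: Point, right: Point) -> float:
    """Vertical lip gap divided by mouth width; 0.0 if the width is degenerate."""
    width = abs(right[0] - left[0])
    return abs(lower[1] - upper[1]) / width if width else 0.0


def image_feature(ir_frame: Frame, mouth_front: Box, background: Box,
                  lips: Tuple[Point, Point, Point, Point]) -> List[float]:
    """Per-frame feature vector: [airflow intensity, mouth opening ratio]."""
    return [airflow_intensity(ir_frame, mouth_front, background),
            mouth_opening_ratio(*lips)]
```

Computing this vector for every captured frame gives a time series that segment detection and phoneme prediction can consume.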
In the present specification, pattern recognition collectively refers to intelligent methods in which a computing device applies a probabilistic method or an artificial neural network to the speech recognition speech feature and the speech recognition image feature in order to recognize the pronunciation of phonemes.
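The definition above leaves the choice of model open. As a minimal sketch, assuming each modality has already produced a log-probability per candidate phoneme, the two can be combined by a weighted sum (a common late-fusion scheme, not necessarily the inventors' method). The function also applies the fallback stated in the summary: it uses the image feature alone when no vocal sound data was detected or when noise power exceeds the speaker's voice power. `fuse_scores`, `voice_weight`, and the noise-to-voice ratio input are hypothetical.

```python
from typing import Dict, Optional


def fuse_scores(voice: Optional[Dict[str, float]], image: Dict[str, float],
                noise_to_voice_ratio: float, voice_weight: float = 0.5) -> str:
    """Return the phoneme label with the best combined log-probability."""
    # Fallback: image feature only if no voice was detected or noise dominates the voice.
    if voice is None or noise_to_voice_ratio > 1.0:
        return max(image, key=image.get)
    common = voice.keys() & image.keys()  # labels scored by both modalities
    if not common:
        raise ValueError("voice and image scores share no labels")
    return max(common,
               key=lambda k: voice_weight * voice[k] + (1 - voice_weight) * image[k])
```

With voice scores `{"a": -0.1, "o": -2.3}` and image scores `{"a": -0.9, "o": -0.5}`, a quiet environment yields "a", while a noise-to-voice ratio above 1 switches to the image scores and yields "o".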
Various embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
FIG. 1 shows a schematic configuration of an infrared detection-based speech recognition module according to an embodiment of the present invention.
Referring to FIG. 1, the infrared detection-based
The receiving
The
The
Each of the components of the infrared detection-based
FIG. 2 illustrates a procedure for recognizing a speaker's speech according to an infrared detection-based speech recognition method according to an embodiment of the present invention. For convenience of explanation, the procedure is described with reference to FIG. 1.
The infrared detection-based speech recognition method according to the present invention is initiated by the receiving
The receiving
The
The
In the case where the receiving
The
The
The
The
The
More specifically, the
FIG. 3 illustrates exemplary photographing for infrared detection-based speech recognition according to an embodiment of the present invention.
Referring to FIG. 3, the infrared ray-based photographing
The voicing air current 220 is detected by infrared rays, and the brightness may be displayed differently depending on the temperature. The voicing
The photographing
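The description notes that the voicing air current 220 is displayed with a brightness that varies with temperature. A minimal sketch of turning such a frame into an airflow mask follows, assuming an 8-bit image whose brightness maps linearly onto the camera's temperature span and a hypothetical threshold `delta` above ambient temperature. Real thermal cameras often need a nonlinear, camera-specific calibration, so the linear mapping is only a placeholder.

```python
from typing import List, Sequence


def to_temperature(frame: Sequence[Sequence[int]], t_min: float, t_max: float,
                   levels: int = 256) -> List[List[float]]:
    """Map brightness 0..levels-1 linearly onto the temperature span [t_min, t_max]."""
    scale = (t_max - t_min) / (levels - 1)
    return [[t_min + px * scale for px in row] for row in frame]


def airflow_mask(temps: Sequence[Sequence[float]], ambient: float,
                 delta: float) -> List[List[bool]]:
    """Mark pixels at least `delta` degrees warmer than ambient as exhaled airflow."""
    return [[t >= ambient + delta for t in row] for row in temps]
```

For a 20 to 40 degree span, brightness 51 maps to 24 degrees, which passes a 22-degree ambient with a 1.5-degree margin, while brightness 0 (20 degrees) does not.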
FIG. 4 illustrates an exemplary configuration and a flow of a speech recognition process according to an exemplary embodiment of the present invention.
Referring to FIG. 4, the voiced
The voicing
The
The
FIG. 5 illustrates exemplary infrared detection results for individual phonemes in a speech recognition process according to an embodiment of the present invention.
Referring to FIGS. 5(a) to 5(c), images are captured at regular time intervals while the speaker says "gomawaru". Accordingly, the infrared detection-based photographing
Accordingly, the first to third
The
The
The
The infrared detection-based
In this specification, each block or each step may represent a part of a module, segment or code that includes one or more executable instructions for executing the specified logical function (s). It should also be noted that in some alternative embodiments, the functions mentioned in the blocks or steps may occur out of order. For example, two blocks or steps shown in succession may in fact be performed substantially concurrently, or the blocks or steps may sometimes be performed in reverse order according to the corresponding function.
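Because generating the speech recognition speech feature and generating the speech recognition image feature do not depend on each other, they are an example of steps that may run substantially concurrently. A minimal sketch using Python's standard `concurrent.futures.ThreadPoolExecutor` follows; the two callables stand in for hypothetical feature extractors. For CPU-heavy extraction in CPython, a `ProcessPoolExecutor` would usually be a better fit because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def extract_concurrently(voice_fn: Callable[[], A],
                         image_fn: Callable[[], B]) -> Tuple[A, B]:
    """Run two independent feature-extraction callables at the same time."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        voice_future = pool.submit(voice_fn)  # e.g. speech recognition speech feature
        image_future = pool.submit(image_fn)  # e.g. speech recognition image feature
        # result() blocks until each task finishes and re-raises any exception it hit.
        return voice_future.result(), image_future.result()
```

The results come back in a fixed (voice, image) order regardless of which task finishes first, so the downstream pattern recognition step does not need to know how they were scheduled.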
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in a RAM memory, a flash memory, a ROM memory, an EPROM memory, an EEPROM memory, a register, a hard disk, a removable disk, a CD-ROM or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor, which is capable of reading information from, and writing information to, the storage medium. Alternatively, the storage medium may be integral with the processor. The processor and the storage medium may reside within an application specific integrated circuit (ASIC). The ASIC may reside within the user terminal. Alternatively, the processor and the storage medium may reside as discrete components in a user terminal.
Although the embodiments of the present invention have been described in detail with reference to the accompanying drawings, the present invention is not limited to those embodiments, and various changes and modifications may be made without departing from the technical idea of the present invention. Therefore, the embodiments disclosed herein are intended to illustrate, not limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The above-described embodiments should thus be understood as illustrative in all respects and not restrictive. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within the scope of their equivalents should be construed as falling within the scope of the present invention.
100 infrared detection-based speech recognition module
110 receiving unit
120 processing unit
130 output unit
210 infrared detection-based photographing unit
220 vocal airflow
230 photographing range
311 vocal sound data
312 speaker's voice
313 noise
315 voice features
321 vocal image data
322 vocal airflow image
323 vocal mouth image
325 image features
400 pattern recognition module
410 speech recognition result
511 first vocal airflow
513 second vocal airflow
515 third vocal airflow
521 first mouth shape
523 second mouth shape
525 third mouth shape
Claims (11)
Processing the vocal image data into a vocal airflow image based on infrared detection and a vocal mouth image based on visible-light detection;
Generating a speech recognition speech feature based on the vocal sound data and generating a speech recognition image feature based on the vocal image data; and
And outputting a speech recognition result through pattern recognition based on the speech recognition speech feature and the speech recognition image feature.
Wherein the step of generating the speech recognition image feature includes calculating, through infrared detection of the vocal airflow image, at least one of a velocity, an amount, and a pressure of the vocal airflow propagated from the vocalizing source.
Wherein the infrared detection detects far-infrared rays.
Wherein the step of generating the speech recognition image features comprises the steps of identifying a speech segment and distinguishing phonemes based on the speech recognition image feature.
Wherein the pattern recognition module performs recognition based only on the speech recognition image feature when no vocal sound data is detected or when the proportion of noise in the vocal sound data is larger than that of the speaker's voice.
A processor that processes the vocal image data into a vocal airflow image based on infrared detection and a vocal mouth image based on visible-light detection, generates a speech recognition speech feature based on the vocal sound data, and generates a speech recognition image feature based on the vocal image data; and
And an output unit for outputting a speech recognition result through pattern recognition based on the speech recognition speech feature and the speech recognition image feature.
Wherein the processor calculates, through infrared detection of the vocal airflow image, at least one of a velocity, an amount, and a pressure of the vocal airflow propagated from the vocalizing source.
Wherein the infrared detection detects far-infrared rays.
Wherein the processor identifies a speech section and distinguishes phonemes based on the speech recognition image feature.
Wherein the pattern recognition module performs recognition based only on the speech recognition image feature when no vocal sound data is detected or when the proportion of noise in the vocal sound data is larger than that of the speaker's voice.
Processing the vocal image data into a vocal airflow image based on infrared detection and a vocal mouth image based on visible-light detection;
Generating a speech recognition speech feature based on the vocal sound data and generating a speech recognition image feature based on the vocal image data; and
And outputting a speech recognition result through pattern recognition based on the speech recognition speech feature and the speech recognition image feature.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150154045A KR20170052082A (en) | 2015-11-03 | 2015-11-03 | Method and apparatus for voice recognition based on infrared detection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150154045A KR20170052082A (en) | 2015-11-03 | 2015-11-03 | Method and apparatus for voice recognition based on infrared detection |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20170052082A (en) | 2017-05-12 |
Family
ID=58740263
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150154045A KR20170052082A (en) | 2015-11-03 | 2015-11-03 | Method and apparatus for voice recognition based on infrared detection |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20170052082A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110689464A (en) * | 2019-10-09 | 2020-01-14 | 重庆医药高等专科学校 | Mouth shape recognition-based English pronunciation quality assessment method |
- 2015-11-03: KR application KR1020150154045A filed (published as KR20170052082A); status unknown
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108305615B (en) | Object identification method and device, storage medium and terminal thereof | |
US6185529B1 (en) | Speech recognition aided by lateral profile image | |
US10482872B2 (en) | Speech recognition apparatus and speech recognition method | |
JP4795919B2 (en) | Voice interval detection method | |
Luettin | Visual speech and speaker recognition | |
EP2562746A1 (en) | Apparatus and method for recognizing voice by using lip image | |
JP4715738B2 (en) | Utterance detection device and utterance detection method | |
JP2007264473A (en) | Voice processor, voice processing method, and voice processing program | |
JPH11219421A (en) | Image recognizing device and method therefor | |
JP2010256391A (en) | Voice information processing device | |
JP2008310382A (en) | Lip reading device and method, information processor, information processing method, detection device and method, program, data structure, and recording medium | |
JP5040778B2 (en) | Speech synthesis apparatus, method and program | |
KR101187600B1 (en) | Speech Recognition Device and Speech Recognition Method using 3D Real-time Lip Feature Point based on Stereo Camera | |
JP2018013549A (en) | Speech content recognition device | |
JP2007199552A (en) | Device and method for speech recognition | |
KR20170073113A (en) | Method and apparatus for recognizing emotion using tone and tempo of voice signal | |
JPH0792988A (en) | Speech detecting device and video switching device | |
An et al. | Detecting laughter and filled pauses using syllable-based features. | |
WO2020079733A1 (en) | Speech recognition device, speech recognition system, and speech recognition method | |
US20150039314A1 (en) | Speech recognition method and apparatus based on sound mapping | |
WO2020250828A1 (en) | Utterance section detection device, utterance section detection method, and utterance section detection program | |
KR102265874B1 (en) | Method and Apparatus for Distinguishing User based on Multimodal | |
JP4775961B2 (en) | Pronunciation estimation method using video | |
KR20170052082A (en) | Method and apparatus for voice recognition based on infrared detection | |
JP2005276230A (en) | Image recognition apparatus |