WO2004090867A1 - Change Information Recognition Device and Change Information Recognition Method - Google Patents
- Publication number
- WO2004090867A1 (PCT/JP2004/005155)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- change
- information
- change information
- basic
- recognition
- Prior art date
Links
- 230000008859 change Effects 0.000 title claims abstract description 580
- 238000000034 method Methods 0.000 title claims description 94
- 238000003384 imaging method Methods 0.000 claims description 21
- 238000010586 diagram Methods 0.000 description 51
- 239000013598 vector Substances 0.000 description 24
- 238000012545 processing Methods 0.000 description 20
- 230000008569 process Effects 0.000 description 18
- 238000001514 detection method Methods 0.000 description 8
- 230000009466 transformation Effects 0.000 description 8
- 230000008921 facial expression Effects 0.000 description 6
- 238000004458 analytical method Methods 0.000 description 5
- 230000006870 function Effects 0.000 description 5
- 230000008451 emotion Effects 0.000 description 4
- 238000004364 calculation method Methods 0.000 description 3
- 238000011156 evaluation Methods 0.000 description 3
- 238000001228 spectrum Methods 0.000 description 3
- 239000011159 matrix material Substances 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 238000000513 principal component analysis Methods 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 239000006185 dispersion Substances 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 230000003595 spectral effect Effects 0.000 description 1
- 230000008685 targeting Effects 0.000 description 1
- 238000012549 training Methods 0.000 description 1
- 238000000844 transformation Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Definitions
- the present invention relates to a change information recognition device and a change information recognition method for recognizing the change state of a recognition target, such as the movement of a mouth, the voice emitted from the mouth, or a human motion.
- the face direction detection device disclosed in Japanese Patent Laid-Open Publication No. H10-27474516 detects the position of the mouth and then the direction of the face, but it cannot detect the movement of the mouth itself. It therefore cannot recognize human speech.
- an object of the present invention is to provide a change information recognition device and a change information recognition method capable of accurately recognizing the change state of a recognition target and thereby recognizing, for example, words spoken by a person.
- the change information recognition device includes: change information acquisition means for acquiring change information of a recognition target; basic change information storage means for storing in advance the basic change states of the recognition target as basic change information; and change state comparison means for recognizing the change state of the recognition target by comparing the change information acquired by the change information acquisition means with the basic change information stored in advance.
- the recognition target is recognized by comparing the change information acquired by the change information acquisition means with the basic change information stored in advance. For example, if the recognition target, such as a person's mouth, frequently moves with a certain correlation, this movement is stored in advance as basic change information and compared with the acquired change information. This makes it possible to reliably recognize the recognition target.
- the basic change information is stored in the basic change information storage unit as a change information unit obtained by dividing the change state of the recognition target into basic units.
- the change information acquisition means may be imaging means that captures images of the recognition target, and the change information may be information on the image change around the mouth caused by speech in the images obtained by the imaging means.
- alternatively, the change information acquisition means may be sound collection means that acquires the sound produced by the recognition target, and the change information may be information on changes in the frequency components of the sound acquired by the sound collection means.
- in this case, the sound produced by the recognition target, for example the content of a person's utterance, can be recognized with high accuracy.
- the change information acquisition means may also be imaging means that captures images of the recognition target, with the change information being information on the image change accompanying the movement of the recognition target in the obtained images.
- likewise, the change information acquisition means may be imaging means that captures images of the recognition target, with the change information being information on the image change accompanying rotation of the recognition target.
- because the change information is a change in the image caused by the movement or rotation of the recognition target, the change can be recognized with high accuracy.
- preferably, the basic change information described above is basic change sequence information set as a sequence of information in a feature space, and the device further includes change information projection means that creates projection change information by projecting the change sequence information of the recognition target acquired by the change information acquisition means onto the feature space. The change state of the recognition target is then recognized by comparing the projection change information with the basic change sequence information, instead of comparing the raw change information with the basic change information.
- the change information is thus handled as an information sequence in the feature space, and the acquired change sequence information is compared with the preset basic change sequence information. This makes it possible to quantitatively determine the similarity (degree of fit) between the acquired change sequence information and the basic change sequence information.
- preferably, the change state comparison means recognizes the change state of the recognition target by comparing the continuity of the projection change information with the continuity of the basic change sequence information. By comparing the continuity of the change sequence information projected as projection change information with the basic change sequence information in this way, a recognition target that moves while changing can be recognized accurately.
- Information that can be represented in multi-dimensions can be represented as a point in a multi-dimensional space. This information can be projected as a point on a lower dimensional space, and this lower dimensional space is defined as a feature space.
- one still image that can be expressed as one point in a multidimensional space can be projected as one point in a three-dimensional space (feature space).
- a plurality of continuous images can be represented as lines (projected trajectories) in a three-dimensional space (feature space).
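As an illustrative aside (not part of the patent disclosure), the projection of image frames to points in a low-dimensional feature space can be sketched with principal component analysis, which the description mentions elsewhere; the function names, dimensions, and random stand-in data below are assumptions made for the sketch.

```python
import numpy as np

def fit_feature_space(frames, dim=3):
    """Fit a PCA-style projection from flattened image frames to a
    low-dimensional feature space (illustrative sketch only)."""
    X = np.asarray(frames, dtype=float)          # shape: (n_frames, n_pixels)
    mean = X.mean(axis=0)
    # Principal axes via SVD of the centred data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:dim]                              # top `dim` principal axes
    return mean, basis

def project(frames, mean, basis):
    """Project frames into the feature space; a clip becomes a trajectory."""
    X = np.asarray(frames, dtype=float)
    return (X - mean) @ basis.T                   # shape: (n_frames, dim)

# A sequence of images (here random stand-ins) becomes a line of points.
rng = np.random.default_rng(0)
clip = rng.random((8, 64))                        # 8 frames, 64 "pixels" each
mean, basis = fit_feature_space(clip, dim=3)
trajectory = project(clip, mean, basis)
print(trajectory.shape)                           # (8, 3)
```

Each frame maps to one 3-D point, and the eight points together form the projected trajectory described above.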
- the basic change sequence information is set as a tube-shaped region in the feature space, and the projected trajectory in the feature space obtained from the projection change information is compared with it (for example, by checking whether the projected trajectory is contained in the tube-shaped region), making it possible to recognize the change state of the recognition target.
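A minimal sketch of the containment test just described, assuming the tube-shaped region is approximated by a representative trajectory plus a fixed radius (the function name, the nearest-point distance rule, and the radius are all assumptions, not specified by the patent):

```python
import numpy as np

def inside_tube(trajectory, representative, radius):
    """Return True if every point of `trajectory` lies within `radius`
    of the nearest point on the representative trajectory — a simple
    stand-in for the tube-shaped-region test (illustrative only)."""
    traj = np.asarray(trajectory, dtype=float)
    rep = np.asarray(representative, dtype=float)
    # Distance from each trajectory point to its nearest representative point.
    d = np.linalg.norm(traj[:, None, :] - rep[None, :, :], axis=2).min(axis=1)
    return bool((d <= radius).all())

rep = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
near = rep + 0.05                                  # stays close to the tube axis
far = rep + np.array([0.0, 1.0, 0.0])              # drifts outside the radius
print(inside_tube(near, rep, radius=0.2))          # True
print(inside_tube(far, rep, radius=0.2))           # False
```

A trajectory that stays inside the tube is taken to match the basic change, matching the comparison described above.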
- the change information recognition device preferably further includes learning means that creates the basic change sequence information based on the change information of the recognition target acquired by the change information acquisition means.
- the change information recognition method includes: a change information acquisition step of acquiring change information of a recognition target; a basic change information storage step of storing in advance the basic change states of the recognition target as basic change information; and a change state comparison step of recognizing the change state of the recognition target by comparing the change information acquired in the change information acquisition step with the basic change information stored in advance.
- a change information unit obtained by dividing the change state of the recognition target into basic units is stored in advance as the basic change information. In this way, by storing the basic change information as information divided into change information units, the recognition target can be recognized with higher accuracy.
- preferably, the change information acquired in the change information acquisition step is information on images of the recognition target, the basic change information stored in advance in the basic change information storage step is also image information, and the change state comparison step recognizes the change state of the recognition target by comparing the image information of the change information with that of the basic change information. Using image information as the change information in this way allows changes in the recognition target to be recognized with high accuracy.
- alternatively, the change information acquired in the change information acquisition step may be information on the sound produced by the recognition target, with the basic change information stored in advance in the basic change information storage step also being sound information; the change state of the recognition target is then preferably recognized by comparing the frequency components of the change information and the basic change information. Using sound information as the change information in this way allows the sound produced by the recognition target, for example the content of a human utterance, to be recognized with high accuracy.
- FIG. 1 is a block diagram of the change information recognition device according to the first embodiment of the present invention.
- FIG. 2 is a diagram schematically illustrating an image of a face including a mouth.
- FIG. 3A is a diagram schematically showing a template showing a state (first example) of a continuous change in the shape of the mouth.
- FIG. 3B is a diagram schematically showing a template indicating a state of a continuous change in the shape of the mouth (second example).
- FIGS. 4A to 4H are diagrams schematically showing a state of a continuous change in the shape of a mouth in an image.
- FIGS. 5A to 5H are diagrams schematically showing the shape of the mouth in the image and the position corresponding to the shape of the mouth in the template.
- FIGS. 6A to 6F are diagrams schematically showing conventional mouth-shape templates.
- FIG. 7 is a flowchart showing the procedure of the change information recognition method according to the first embodiment.
- FIG. 8A is a diagram schematically showing a position in a still image that can be recognized as a mouth.
- FIG. 8B is a diagram schematically showing positions that can be recognized as mouths in a moving image.
- FIG. 9 is a block diagram of a change information recognition device according to the second embodiment.
- FIG. 10A is a diagram schematically showing a continuous mouth change pattern over time.
- FIG. 10B is a diagram schematically showing the change in the first half of the change pattern in FIG. 10A.
- FIG. 10C is a diagram schematically showing the change in the latter half of the change pattern in FIG. 10A.
- FIG. 11 is a flowchart showing a main part of the procedure of the change information recognition method according to the second embodiment.
- FIGS. 12A to 12H are diagrams schematically showing a continuous change state in an image including a mouth.
- FIG. 13 is a block diagram of the change information recognition device according to the third embodiment.
- FIG. 14 is a table showing the correspondence between pronunciation changes and the symbols assigned to them.
- FIG. 15 is a diagram schematically showing a mouth deformation process from the shape of the mouth emitting the vowel “A” to the shape of the mouth emitting the vowel “I”.
- FIG. 16A is a diagram showing the sound of "good morning" as text.
- FIG. 16B is a diagram schematically showing the shape of the mouth corresponding to each of the sounds in FIG. 16A.
- FIG. 16C is a diagram showing symbols corresponding to the sound changes in FIG. 16B.
- FIG. 17 is a flowchart showing the main part of the procedure of the deformation information recognition method according to the third embodiment.
- FIG. 18A is a diagram schematically showing a change in the shape of the mouth that changes from “a” to “i”.
- FIG. 18B is a diagram schematically showing a change in the shape of the mouth that changes from “i” to “u”.
- FIG. 19 is a block diagram of the change information recognition device according to the fourth embodiment.
- FIG. 21 is a diagram schematically showing voice change information cut out with a fixed frame length at a fixed frame interval.
- FIGS. 22A to 22H are graphs showing waveforms of voice change units generated from voice change information cut out at eight times T1 to T8.
- FIGS. 23A to 23D are explanatory diagrams showing portions that match the voice change unit graphs of FIGS. 22A to 22H.
- FIG. 24 is a flowchart showing the procedure of the deformation information recognition method according to the fourth embodiment.
- FIG. 25 is a flowchart showing the procedure of voice recognition according to the fourth embodiment.
- FIG. 26 is a block diagram of the change information recognition device according to the fifth embodiment.
- FIG. 27 is a diagram schematically illustrating an example of an image at a time of a basic body change unit used for gesture recognition.
- FIGS. 28A to 28J show examples of images at the time of the basic body change unit used for pedestrian recognition.
- FIG. 29 is a block diagram of the change information recognition device according to the sixth embodiment.
- FIGS. 30A to 30L are diagrams schematically showing images when the head of the doll rotates.
- FIG. 31 is a block diagram of the change information recognition device according to the seventh embodiment.
- FIG. 32 is a diagram schematically showing a moving image of the mouth movement from uttering "n" to uttering "a".
- FIG. 33 is a diagram showing a graph in the feature space created based on the moving image shown in FIG. 32.
- FIG. 34 is a diagram showing a graph in a feature space created based on a moving image containing other deformation patterns.
- FIG. 35 is a diagram showing the curves obtained when the mouth movements of multiple people during a certain utterance are projected onto the feature space.
- FIG. 36 shows the hypertube generated in the feature space.
- FIG. 37 is a diagram showing a feature space in which a plurality of hypertubes are arranged.
- FIG. 38 is a flowchart showing a procedure for generating a hypertube.
- FIG. 39 is a diagram showing three trajectories formed by connecting points where the three-dimensional feature vectors are plotted in the feature space.
- FIG. 40 is a diagram showing three trajectories and a representative trajectory formed based on those trajectories.
- FIG. 41 is a diagram for explaining a procedure for obtaining a hyperplane generated when a representative trajectory is obtained.
- FIG. 42 is a diagram showing a hyperplane for explaining a procedure for obtaining a representative trajectory.
- FIG. 43 is a flowchart showing the procedure of the change information recognition method according to the seventh embodiment.
- FIG. 44A is a diagram showing the trajectory of a hypertube HT representing a certain deformation and the trajectory L of an input sequence.
- FIG. 44B is a diagram in which the horizontal axis runs from the start point (0) to the end point (1) of the hypertube and the vertical axis shows the distance from the representative trajectory.
- FIG. 45 is a block diagram of the change information recognition device according to the eighth embodiment.
- FIG. 46 is a diagram illustrating a state in which a moving image is captured without moving the cutout window for a predetermined time.
- FIG. 47A is a diagram illustrating a moving image in which the mouth is tracked by moving the cutout window.
- FIG. 47B is a diagram showing a trajectory in the feature space and a hypertube corresponding to FIG. 47A.
- FIG. 1 is a block configuration diagram of a change information recognition device according to the present embodiment.
- the change information recognition device 1 includes a sequence information storage device 11, a basic change information storage device 12, and a change state comparison device 13.
- the sequence information storage device 11 is connected to a camera (not shown) serving as an imaging device, which is change information acquisition means of the present invention.
- the camera images the face including the mouth to be recognized.
- the captured image of the mouth of the person is output to the sequence information storage device 11 at regular intervals.
- the sequence information storage device 11 stores, as sequence information J11, a plurality of images output over a fixed time.
- the sequence information is output from the sequence information storage device 11, and the basic change information from the basic change information storage device 12, to the change state comparison device 13.
- the change state comparison device 13 detects a change in the shape of the mouth by comparing the series information and the basic change information, and detects a portion corresponding to the mouth. Further, the change state comparison device 13 is connected to an output device (not shown), and outputs the position of a portion corresponding to the detected mouth to the output device as position information J12 of change information. In addition, the change state comparison device 13 detects the portion corresponding to the mouth and also detects the movement of the mouth. The detected mouth movement is also output to an output device (not shown) as symbol information J13 corresponding to the change information.
- the sequence information storage device 11 of the change information recognition device 1 according to the present embodiment sequentially receives images such as the image G1 of a face F including a mouth M shown in FIG. 2.
- the sequence information storage device 11 stores these images.
- a plurality of images, for example eight, are combined as sequence information and output to the change state comparison device 13.
- the basic change information storage device 12 stores a plurality of pieces of image information representing patterns of mouth changes.
- each pattern holds the image after a certain time has elapsed (3), the image after a further time has elapsed (4), and so on.
- in the first change pattern, the shape of the mouth M opened wide (the shape of the mouth when uttering the vowel "a") changes into the horizontally elongated shape (the shape of the mouth when uttering the vowel "i").
- in the second change pattern, the shape of the mouth M opened vertically (the shape of the mouth when uttering the vowel "a") changes into the rounded shape of the mouth when uttering the vowel "o".
- the change state comparison device 13 receives the sequence information consisting of eight images from the sequence information storage device 11, and the templates P1 and P2 showing the first and second change patterns from the basic change information storage device 12.
- the movement of the template P1 can be seen to coincide with the movement appearing in the captured image sequence. From this, the portion indicated by the broken line B in FIGS. 5D to 5G can be recognized as the portion corresponding to the mouth.
- a plurality of mouth templates T1 to T6 are prepared, and an image captured by the imaging device is raster-scanned.
- portions corresponding to the templates T1 to T6 are detected as mouths.
- inconveniences such as false detections (for example, a part of the wall background in the image with a shape similar to a mouth being detected as a mouth) and missed detections can occur.
- FIG. 7 is a flowchart showing the procedure of the change information recognition method according to the present embodiment.
- the position of the mouth as the recognition target is detected from the motion in the moving image rather than from a still image.
- the conventional recognition method using a template based on a still image recognizes many mouth candidates C1, C2, ....
- in the recognition method according to the present embodiment, since the change of the mouth M is detected from a plurality of images captured at fixed time intervals, the mouth M can be reliably recognized, as shown in FIG. 8B. Moreover, since the movement of the mouth M is tracked through the changes seen across the multiple images, even the movement of the mouth M can be detected.
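As an illustrative sketch of this embodiment's sequence-template comparison (not the patent's actual implementation): a template that is itself a short stack of frames is slid over every image position, and the position whose patch sequence best matches the template sequence is taken as the mouth. The grey-scale frames, sum-of-squared-differences score, and all names below are assumptions.

```python
import numpy as np

def match_change_template(frames, template):
    """Slide a sequence template over every image position and return the
    (row, col) whose patch sequence best matches the template sequence.
    Illustrative sketch: frames and template are stacks of grey images."""
    F = np.asarray(frames, dtype=float)    # (t, H, W)
    T = np.asarray(template, dtype=float)  # (t, h, w)
    t, H, W = F.shape
    _, h, w = T.shape
    best, best_pos = np.inf, None
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            patch = F[:, r:r + h, c:c + w]
            score = np.sum((patch - T) ** 2)   # summed over all frames
            if score < best:
                best, best_pos = score, (r, c)
    return best_pos

rng = np.random.default_rng(1)
template = rng.random((8, 4, 4))               # 8-frame change pattern
frames = rng.random((8, 16, 16))
frames[:, 5:9, 7:11] = template                # embed the pattern at (5, 7)
print(match_change_template(frames, template)) # (5, 7)
```

Because the score is accumulated over all eight frames, a static background patch that merely resembles a mouth shape in one frame does not match, which mirrors the false-detection argument above.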
- FIG. 9 is a block configuration diagram of the change information recognition device according to the present embodiment.
- the change information recognition device 2 includes a sequence information storage device 21, a basic change information storage device 22, and a change state comparison device 23 similar to those in the first embodiment.
- the sequence information storage device 21 and the change state comparison device 23 have the same configuration as the sequence information storage device 11 and the change state comparison device 13 in the above embodiment, respectively.
- the basic change information storage device 22, however, has a different configuration from the basic change information storage device 12 in the first embodiment.
- the basic change information storage device 22 includes a plurality of basic change information unit storage devices 24A, 24B, .... Each of the basic change information unit storage devices 24A, 24B, ... stores a change information unit (change pattern) obtained by dividing the change state of the recognition target into basic units.
- FIG. 11 is a flowchart showing a main part of the procedure of the change information recognition method according to the present embodiment.
- the sequence information J21, consisting of, for example, eight images, is acquired and output to the change state comparison device 23.
- the change state comparison device 23 detects each change information unit from the output sequence information (S21). In addition, templates P3 and P4 representing the basic change information units stored in the basic change information unit storage devices 24 are output from the basic change information storage device 22 to the change state comparison device 23.
- the change state comparison device 23 compares the detected change information units with the templates P3 and P4 of the basic change information units output from the basic change information storage device 22, and detects a series of changes taking the continuity of the units into account (S22). For example, assume that the sequence information J21 output from the sequence information storage device 21 consists of the eight images shown in FIGS. 12A to 12H.
- the change state comparison device 23 compares the images of the sequence information J21 with the templates P3 and P4 output from the basic change information storage device 22 to detect a series of changes in the sequence information.
- the shape of the mouth M surrounded by the solid line R in the images shown in FIGS. 12A to 12D shows the same change as that of the template P3 shown in FIG. 10B.
- the shape of the mouth M surrounded by the broken line B in the images shown in FIGS. 12D to 12G shows the same change as that of the template P4 shown in FIG. 10C. From this, it is recognized that the sequence information J21 contains a mouth, as the recognition target, undergoing the shape change shown in FIG. 10A.
- the mouth recognized in this way as the recognition target is output from the change state comparison device 23 to an output device (not shown) as position information J22 of the change information units.
- the symbol information J23 corresponding to the change information units is output from the change state comparison device 23 to an output device (not shown).
- change information J24 is acquired from the position information of the change information units, and symbol information J25 corresponding to the change information is acquired from the symbol information J23 corresponding to the change information units.
- FIG. 13 is a block configuration diagram of the change information recognition device according to the present embodiment.
- the change information recognition device according to the present embodiment can be used as a mouth deformation recognition device.
- the change information recognition device 3 according to the present embodiment includes a moving image storage device 31, a mouth basic deformation information storage device 32, and a mouth deformation state comparison device 33.
- the moving image storage device 31 is connected to a moving image pickup device (not shown).
- This moving image capturing apparatus captures a moving image of a face including a mouth serving as a recognition target.
- the moving image capturing apparatus outputs the moving image information J31 of the captured moving image to the moving image storage device 31.
- the mouth basic deformation information storage device 32 has a plurality of mouth basic deformation unit storage devices 34A, 34B, ..., in which the patterns of possible human mouth movements are stored in advance.
- mouth basic deformation units are stored in advance in each of the mouth basic deformation unit storage devices 34A, 34B, .... These mouth basic deformation units will be described later.
- the moving image storage device 31 outputs the mouth deformation unit information to the mouth deformation state comparison device 33, and the mouth basic deformation information storage device 32 outputs the mouth basic deformation unit information.
- the mouth deformation state comparison device 33 recognizes the movement of the mouth by comparing the mouth deformation unit information with the mouth basic deformation unit information.
- the mouth deformation state comparison device 33 is connected to an output device (not shown), and outputs the position of the mouth deformation unit output from the moving image storage device 31 to the output device as mouth deformation unit position information J32, together with the symbol information J33 corresponding to the mouth deformation unit.
- the mouth basic deformation unit storage devices 34A, 34B, ... in the mouth basic deformation information storage device 32 store moving images showing mouth deformation patterns together with the symbol information corresponding to each deformation.
- the shape of the mouth when a person speaks is mainly determined by the vowels and the moraic nasal (in the case of Japanese). The vowels are the five sounds "a", "i", "u", "e", and "o"; adding the moraic nasal "n", every utterance can be expressed as a transition from one of these six sounds to one of the other five.
- FIG. 14 is a table showing the assignment of symbols for all combinations from the above six sounds to the other five sounds.
- as shown in FIG. 14, "a" is assigned 1, "i" is 2, "u" is 3, "e" is 4, "o" is 5, and "n" is 0. The deformation from "a" to "i", for example, is then represented by the symbol "12".
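The symbol assignment of FIG. 14 can be illustrated directly. This is only a sketch: the dictionary mirrors the table in the description, while the function name is invented for the example.

```python
# Symbol assignment from the table in FIG. 14: each of the five vowels
# and the moraic nasal "N" gets one digit; a deformation is the pair.
SYMBOLS = {"N": "0", "A": "1", "I": "2", "U": "3", "E": "4", "O": "5"}

def deformation_symbol(start, end):
    """Encode a mouth deformation such as "A" -> "I" as the symbol "12"."""
    return SYMBOLS[start] + SYMBOLS[end]

print(deformation_symbol("A", "I"))  # 12
```

Six starting sounds times five differing target sounds gives the thirty deformation symbols of the table.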
- FIG. 17 is a flowchart showing a main part of a procedure of the deformation information recognition method according to the present embodiment.
- a moving image of a face including a mouth captured by the moving image capturing device is output from the moving image capturing device to the moving image storage device 31 and stored in the moving image storage device 31.
- in the mouth basic deformation information storage device 32, the change information corresponding to each mouth basic deformation unit and the symbol corresponding to that unit are stored in advance in the respective mouth basic deformation unit storage devices 34A, 34B, ....
- the moving image storage device 31 outputs the moving image to the mouth deformation state comparison device 33, and the mouth basic deformation information storage device 32 outputs the deformation information and the symbols corresponding to the mouth basic deformation units.
- when it is determined that the image is not detected at the same position, the process returns to step S31 and repeats the same processing.
- the digit at the end of the symbol corresponding to the first deformation is compared with the digit at the start of the symbol corresponding to the second deformation, and it is determined whether the two are the same (S33). When they are not the same, the process returns to step S31 and repeats the same processing.
- when they are the same, it can be concluded that the sound corresponding to that shared digit was uttered at the time position where the first and second deformations connect.
- for example, the digit at the end of the symbol corresponding to the first deformation and the digit at the start of the symbol corresponding to the second deformation are both "2", i.e. the same. In such a case, it can be determined that the first and second deformations occur continuously.
- the mouth position information J34 is obtained from the mouth deformation unit position information J32
- the utterance word information J35 is obtained from the symbol information J33 corresponding to the mouth deformation unit.
- the deformation of the mouth caused by utterance is divided into units, each being a transformation from the mouth shape corresponding to one of six types of sounds, namely the five vowels and the syllabic nasal, to one of the five mouth shapes of the other sounds. For this reason, the position of the mouth can be detected from the input moving image, and which sound was pronounced at which time can be reliably recognized, so the device can be used as an utterance recognition device. Also, by recognizing the pronounced sounds continuously, the spoken word can be recognized.
- In the present embodiment, the mouth basic deformation units are created from the six sounds of the five vowels and the syllabic nasal, but mouth basic deformation units can also be created for all fifty sounds of the Japanese syllabary. In this case, since there are 68 sounds in Japanese, including voiced and semi-voiced sounds, 67 × 68 mouth basic deformation units are used.
- FIG. 19 is a block diagram of the change information recognition device according to the present embodiment.
- the change information recognition device according to the present embodiment can be used as a voice change recognition device.
- the change information recognition device 4 according to the present embodiment includes a waveform analysis device 41, a voice waveform storage device 42, a voice information storage device 43, and a voice change comparison device 44.
- the waveform analyzer 41 is connected to a microphone (not shown), which serves as, for example, a voice acquisition means.
- the microphone acquires voice when a person speaks.
- the microphone outputs the acquired voice information J41 to the waveform analyzer 41.
- Waveform analyzer 41 analyzes the output speech information J41 by, for example, performing a wavelet transform.
- the waveform obtained by the wavelet analysis is output to the voice waveform storage device 42, which stores it.
- the voice information storage device 43 includes voice change unit storage devices 45A, 45B, and so on.
- the voice change unit storage devices 45A, 45B ... store basic voice change units pre-stored as voice change units, and symbols corresponding thereto.
- the basic voice change unit represents a change from the frequency waveform when an arbitrary phoneme is uttered to the frequency waveform when another arbitrary phoneme is uttered.
- This basic voice change unit has, for example, a frequency spectrum shown in FIG. 20A or 20B.
- voice waveform information is output from the voice waveform storage device 42, and a basic voice change unit and a symbol corresponding thereto are output from the voice information storage device 43.
- the voice change comparison device 44 recognizes the voice by comparing the voice waveform information with the basic voice change units. Further, the voice change comparison device 44 is connected to an output device (not shown), and outputs information J42 corresponding to the detected voice change unit (hereinafter referred to as the "voice change unit corresponding symbol") to the output device.
- a wavelet analysis is performed on the voice information J41 obtained by the microphone to create voice change units.
- a voice change is recognized. For example, assume that voice change information is acquired from a microphone and output to the waveform analyzer 41.
- the voice change information is cut out at a certain frame interval, for example 10 msec, with a certain frame length, for example 30 msec, to create a plurality of voice change units.
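- The cutting described above can be sketched as follows; only the 10 msec interval and 30 msec frame length come from the text, while the 16 kHz sampling rate and the function name are assumptions made for illustration.

```python
import numpy as np

# Sketch: cut a voice signal into overlapping voice change units,
# one frame every 10 ms, each frame 30 ms long.

def cut_into_frames(signal, sample_rate=16000, interval_ms=10, length_ms=30):
    step = sample_rate * interval_ms // 1000   # samples between frame starts
    size = sample_rate * length_ms // 1000     # samples per frame
    frames = [signal[s:s + size]
              for s in range(0, len(signal) - size + 1, step)]
    return np.array(frames)

signal = np.zeros(16000)            # one second of (dummy) audio
frames = cut_into_frames(signal)
print(frames.shape)                 # (98, 480): 98 overlapping 30 ms frames
```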
- FIGS. 22A to 22H show waveforms of voice change units created from the voice change information cut out at eight times from t1 to t8.
- these voice change units are compared with the basic voice change unit shown in FIG. 20 stored in the voice information storage device 43.
- when a voice change unit matches a basic voice change unit, it can be determined that the voice represented by that basic voice change unit is being uttered.
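- The comparison of a cut-out voice change unit against the stored basic voice change units can be sketched as a nearest-neighbour search; the Euclidean distance measure and the toy four-sample "waveforms" below are assumptions for illustration.

```python
import numpy as np

# Sketch: match an observed voice change unit against stored basic voice
# change units by Euclidean distance between frequency waveforms.

basic_units = {
    "a->i": np.array([1.0, 0.8, 0.3, 0.1]),
    "u->e": np.array([0.2, 0.5, 0.9, 0.7]),
}

def match_unit(observed):
    """Return the symbol of the closest basic voice change unit."""
    return min(basic_units, key=lambda s: np.linalg.norm(observed - basic_units[s]))

observed = np.array([0.9, 0.7, 0.35, 0.15])
print(match_unit(observed))  # a->i
```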
- FIG. 24 is a flowchart showing the procedure of the change information recognition method according to the present embodiment.
- the audio information obtained from a microphone (not shown) is output to the waveform analysis device 41 of the change information recognition device 4.
- the waveform analyzer 41 performs wavelet analysis on these sounds and divides them into, for example, the eight waveforms shown in FIG. 22 to create a sound change unit consisting of eight frames.
- the waveform analyzer 41 outputs the created voice change units to the voice waveform storage device 42.
- the voice waveform storage device 42 stores these voice change units.
- the voice information storage device 43 stores a plurality of basic voice change units. The voice waveform storage device 42 then outputs the stored voice change units to the voice change comparison device 44, and the voice information storage device 43 outputs the stored basic voice change units and their corresponding symbols to the voice change comparison device 44.
- the voice change comparison device 44 compares the voice change units output from the voice waveform storage device 42 with the basic voice change units output from the voice information storage device 43.
- a plurality of change patterns including the two change patterns shown in FIGS. 20A and 20B are stored. Their number is determined based on the number of phonemes: for example, if the number of phonemes is n, the number of change patterns can be n × (n − 1).
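- A minimal check of this count:

```python
# The number of change patterns for n phonemes, as stated in the text:
# every ordered pair of distinct phonemes, i.e. n * (n - 1).

def change_pattern_count(n_phonemes: int) -> int:
    return n_phonemes * (n_phonemes - 1)

print(change_pattern_count(6))   # 30: the six mouth shapes of the earlier embodiment
print(change_pattern_count(29))  # 812: the 29-phoneme example in the text
```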
- a matching waveform is found, and the time (or frame number) at which it was found is stored (S45).
- FIG. 25 is a flowchart showing the procedure of speech recognition according to the present embodiment.
- the voice change unit corresponding symbol information J42 (FIG. 19) is obtained from the input voice change unit and output to an output device (not shown) (S52).
- the output device refers to the voice change unit corresponding symbol information J42 output from the voice change comparison device 44 and determines whether, among the symbols corresponding to the plurality of output voice change units, the first voice change unit corresponding symbol and the second voice change unit corresponding symbol are temporally continuous (S53).
- Next, it is determined whether the symbol indicating the end of the first voice change unit corresponding symbol and the symbol indicating the start of the second voice change unit corresponding symbol are the same (S54). For example, if the first voice change unit corresponding symbol changes from phoneme A to phoneme B, and the second voice change unit corresponding symbol changes from phoneme B to phoneme C, it is determined that the symbol indicating the end of the first voice change unit corresponding symbol and the symbol indicating the start of the second voice change unit corresponding symbol match.
- Conversely, if the first voice change unit corresponding symbol changes from phoneme A to phoneme B and the second voice change unit corresponding symbol changes from phoneme A to phoneme C, it is determined that the symbol indicating the end of the first voice change unit corresponding symbol and the symbol indicating the start of the second voice change unit corresponding symbol do not match.
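- Chaining these determinations into word recognition can be sketched as follows; representing each symbol as a (start, end) phoneme pair is an assumed encoding for illustration, not the patent's actual format.

```python
# Sketch: chain temporally continuous voice change unit symbols into a
# phoneme sequence. A symbol is modelled as a (start, end) phoneme pair.

def chain_symbols(symbols):
    """Return the phoneme sequence if each end matches the next start, else None."""
    for (_start, end), (next_start, _end) in zip(symbols, symbols[1:]):
        if end != next_start:
            return None                      # not continuous: no word recognized
    return symbols[0][0] + "".join(end for _start, end in symbols)

# A->B then B->C chains into the utterance "ABC"; A->B then A->C does not chain.
print(chain_symbols([("A", "B"), ("B", "C")]))  # ABC
print(chain_symbols([("A", "B"), ("A", "C")]))  # None
```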
- the utterance word J 43 (FIG. 19) can be recognized.
- the acquired voice is divided into voice change units, and the voice is detected by comparing them with the basic voice change units. For this reason, the spoken word and the like can be reliably recognized.
- the change information is targeted for recognition of a body change accompanying movement and deformation of each part of the body, such as gesture recognition, pedestrian recognition, and facial expression recognition.
- FIG. 26 is a block diagram of the change information recognition device according to the present embodiment.
- the change information recognition device 5 includes a moving image storage device 51, a body change information storage device 52, and a body change comparison device 53.
- the change information recognition device 5 can be used as a body change recognition device, a pedestrian recognition device, a facial expression recognition device, or the like.
- the moving image storage device 51 is connected to a moving image pickup device (not shown). This moving image capturing apparatus captures a moving image of a human body as a recognition target, and the moving image capturing apparatus outputs the captured moving image information J51 to a moving image storage device 51.
- the body change information storage device 52 has a plurality of body change unit storage devices 54A, 54B, ... in which patterns that the motion of the human body can take are stored in advance.
- the body change unit storage devices 54A, 54B ... store in advance the basic body change units representing the movement of the human body.
- the body change comparison device 53 receives the body change unit information from the moving image storage device 51 and the basic body change unit information from the body change information storage device 52.
- the body change comparison device 53 recognizes the movement of the human body by comparing the body change unit information with the basic body change unit information.
- the body change comparison device 53 is connected to an output device (not shown), and the position of the human body on the image output from the moving image storage device 51 is used as position information J 52 of the body change unit. Output to the output device.
- it outputs the symbol information J53 corresponding to the body change unit to the output device.
- the position information J54 of the body is obtained from the position information J52 of the body change unit, and the identification information J55 of the body movement is obtained from the symbol information J53 corresponding to the body change unit.
- the body change unit storage devices 54A, 54B, ... store the shapes of basic body change units showing changes in the hands, feet, arms, and so on, corresponding to moving images showing human movement, together with their respective symbol information.
- the position of the human body and its body movement can be recognized by the same method as the change information recognition method by the change information recognition device 3 according to the third embodiment.
- Specifically, the mouth basic deformation information storage device 32 in the third embodiment is replaced with the body change information storage device 52 of the present embodiment, and the mouth deformation state comparison device 33 is replaced with the body change comparison device 53. By performing the processing in the same procedure as in the flowchart shown in FIG. 17, the position of the body change unit and the body movement can be recognized.
- the operation up to this point is shown continuously.
- the images of the pedestrians shown in FIGS. 28A to 28J are stored in advance in the body change unit storage devices 54 A, 54 B,... In the body change information storage device 52.
- a human walking motion (body motion) can be recognized by the same method as the change information recognition device 3 according to the third embodiment. Also in this case, the position and the body motion of the body change unit can be recognized by performing the process according to the same procedure as the flowchart shown in FIG. In this way, the position of the pedestrian and the motion status of the pedestrian can be identified based on the moving image output from the video imaging device.
- the change information recognizing device 5 can also recognize changes in human facial expressions.
- Human emotions include joy, anger, sorrow, and pleasure, and a person takes a facial expression corresponding to one of these emotions or an expressionless face. Thus, from the five face images corresponding to joy, anger, sorrow, pleasure, and expressionlessness, by preparing 5 × 4 change information units, such as the change from expressionless to joy, the position of the face in the moving image and the facial expression can also be recognized.
- FIG. 29 is a block diagram of the change information recognition device according to the present embodiment.
- the change information recognition device 6 according to the present embodiment includes a moving image storage device 61, a rotation information storage device 62, and a rotating object comparison device 63.
- the change information recognition device 6 according to the present embodiment can be used as a rotating object recognition device.
- the moving image storage device 61 is connected to a moving image pickup device (not shown).
- This moving image capturing apparatus captures a moving image of a rotating recognition object, which is a recognition object, for example, a human head.
- This moving image pickup device outputs the picked-up moving image information J61 to the moving image storage device 61.
- the rotation information storage device 62 has a plurality of rotation unit storage devices 64A, 64B ... in which the rotation pattern of the rotating recognition object is stored in advance.
- the rotation unit storage devices 64A, 64B, ... store in advance basic rotation units representing the rotation of the recognition target.
- the rotating object comparison device 63 receives the rotation unit information from the moving image storage device 61 and the basic rotation unit information from the rotation information storage device 62.
- the rotating object comparison device 63 recognizes, for example, a change due to the rotation of the human head by comparing the rotation unit information with the basic rotation unit information.
- the rotating object comparison device 63 is connected to an output device (not shown), and outputs the position of the person's head on the image output from the moving image storage device 61 to the output device as the position information J62 of the rotation unit. At the same time, the symbol information J63 corresponding to the rotation unit is output to the output device.
- the position information J64 of the rotating object is obtained from the position information J62 of the rotation unit, and the rotation identification information J65 is obtained from the symbol information J63 corresponding to the rotation unit.
- the rotation unit storage devices 64A, 64B, ... store the shapes of the rotation change units indicating changes in head direction, corresponding to moving images showing the rotation of a human head, together with their respective symbol information.
- FIGS. 30A to 30L schematically show images of a doll's head rotating. Of these, the rotation starting from 0 degrees and reaching 120 degrees, shown in FIGS. 30A to 30E, is the first rotation; the rotation starting from 120 degrees and continuing to 240 degrees is the second rotation; and the rotation starting from 240 degrees and reaching 360 degrees (0 degrees), returning to FIG. 30A through FIGS. 30I to 30L, is the third rotation. Conversely, the rotations in the reverse direction, starting from FIG. 30A, are the fourth to sixth rotations.
- the images from the first rotation to the sixth rotation and the symbols corresponding thereto are stored in the rotation unit storage devices 64 A, 64 B.
- For the rotating recognition target object, the position of the rotation unit and its rotation operation can be recognized by the same method as the change information recognition method by the change information recognition device 3 according to the third embodiment. Specifically, the mouth basic deformation information storage device 32 in the third embodiment is replaced with the rotation information storage device 62 of the present embodiment, and the mouth deformation state comparison device 33 is replaced with the rotating object comparison device 63; by performing the processing in the same procedure as in the flowchart shown in FIG. 17, the position of the rotation unit and the rotation operation can be recognized. [0099] In this way, based on the moving image output from the moving image capturing apparatus, the position of the rotating recognition target object and the symbol corresponding to the rotation unit can be obtained, and it is possible to identify from them what kind of rotation state it is.
- FIG. 31 is a block diagram of the change information recognition device according to the present embodiment.
- the change information recognition device 7 according to the present embodiment includes a learning device 71 and a recognition device 72.
- the learning device 71 includes a feature space generation device 73 and a projection device 74.
- the projection device 74 is used in both the learning device 71 and the recognition device 72.
- the learning sequence information J71 prepared in advance is input to the feature space generation device 73 in the learning device 71.
- the projection device 74 is connected to the feature space generation device 73 and a moving image capturing device (not shown).
- the feature space generation device 73 outputs feature space generation information for generating a feature space to the projection device 74.
- a moving image capturing apparatus (not shown) captures a moving image of a face including a mouth serving as a recognition target, and a moving image of the captured face is output from the moving image capturing apparatus as recognition sequence information J72.
- the projection device 74 generates a projection trajectory obtained by projecting the moving image onto the feature space, based on the moving image of the face (recognition sequence information J72) output from the moving image capturing device.
- the learning device 71 is provided with a hypertube generation device 75 for generating a tubular model (hereinafter referred to as “hypertube”) described later in the feature space.
- a hypertube storage device 76 used for each of the learning device 71 and the recognition device 72 is provided.
- the recognition device 72 is provided with a sequence comparison device 77 that recognizes a change in the hypertube in the feature space.
- the projecting device 74 outputs the projected locus of the moving image to the hypertube generating device 75 and the sequence comparing device 77 as projected locus information.
- the hypertube generation device 75 generates a hypertube in the feature space from the projection trajectory information of the moving image output from the projection device 74, and outputs it as hypertube information.
- the hypertube storage device 76 stores the hypertube information output from the hypertube generation device 75 and a symbol corresponding to each hypertube. Further, the hypertube storage device 76 outputs the stored hypertube information and the symbol corresponding to each hypertube to the sequence comparison device 77.
- the sequence comparison device 77 compares the projection trajectory output from the projection device 74 with the hypertube information output from the hypertube storage device 76, thereby obtaining the position of the change information unit and the corresponding symbol. These are then output to an output device (not shown) as the change information unit position information J73 and the change information corresponding symbol information J74, respectively.
- a predetermined feature amount is extracted from an image and is represented by a feature space.
- In a feature space, for example, when the feature of one image is represented by a three-dimensional vector, one image is represented as one point in a three-dimensional space.
- For example, a moving image showing the movement of the mouth uttering "a" from "n", shown in FIG. 32, is input, and the images that compose the moving image are projected into a three-dimensional space.
- the nine images representing this moving image are expressed as a trajectory in the feature space that connects the points of each image in temporal order.
- the feature quantity is not particularly limited.
- for example, the projection components onto the space (eigenspace) spanned by the eigenvectors corresponding to the largest eigenvalues obtained by principal component analysis can be used.
- When an image is regarded as a vector, a 16-by-16-pixel gray-scale image can be represented as a 256-dimensional (16 × 16) vector with each element holding a gray value. Therefore, a large number of images expressed as vectors are used as the learning sequence information J71, the variance-covariance matrix of the vectors in the learning sequence information J71 is calculated, and its eigenvectors and eigenvalues are found.
- As the learning sequence information J71, a plurality of images representing moving images of the 30-pattern deformation units shown in FIG. are prepared for a number of people.
- For example, in the deformation unit that is one of these patterns, represented by the symbol 12, the transformation in which the shape of the mouth changes continuously from the mouth shape when one sound is pronounced to the maximally deformed mouth shape when "a" is pronounced is represented by several successive images.
- learning sequence information J71 for a plurality of persons is prepared, and a feature space (eigenspace) is obtained from this learning sequence information J71.
- the feature space obtained here is a space in which a mouth image using the learning sequence information J71 can be expressed as a smaller amount of information.
- images that are deformed with only a slight difference in appearance are projected close to each other in the feature space.
- Figure 34 shows a three-dimensional feature space.
- a polygonal line showing the transformation "n" → "a", a polygonal line showing "n" → "i", a polygonal line showing "n" → "u", a polygonal line showing "n" → "e", and a polygonal line showing "n" → "o" are shown.
- a tube-like model can be constructed.
- This tube-shaped model can be referred to as a hypertube HT.
- Since this hypertube HT represents the same deformation, it can be considered a model that stochastically expresses the variation caused by individual differences.
- As the representative trajectory CC for generating the hypertube HT, the average of a plurality of trajectories C1 to C6 representing the same deformation, as shown in FIG. 35, can be used.
- another suitable calculation method can be adopted.
- the radius of the circle representing the dispersion is obtained by calculating the variance σ² of the distances to the points on each trajectory, measured in the direction perpendicular to the traveling direction at each point on the representative trajectory, and fitting a normal distribution; the 95% point, 1.96σ, can be used as the radius, or the radius can be obtained using another method.
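- Under the stated normal-distribution assumption, the radius at one point of the representative trajectory can be sketched as follows; the sample perpendicular distances are invented for illustration.

```python
import numpy as np

# Sketch: tube radius at one point of the representative trajectory as the
# 95% point (1.96 * sigma) of a normal distribution fitted to the
# perpendicular distances from the surrounding trajectories.

distances = np.array([0.8, 1.1, 0.9, 1.3, 1.0, 0.95])

# The representative trajectory is the mean of the trajectories, so the
# distances are treated as zero-mean deviations.
sigma = np.sqrt(np.mean(distances ** 2))
radius = 1.96 * sigma

print(round(radius, 2))  # 2.0
```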
- hypertubes generated in this way are arranged in the feature space.
- one hypertube corresponds to one deformation.
- Fig. 37 shows the feature space where multiple hypertubes are arranged.
- Hypertube HT1 represents a certain deformation A, for example the transformation from "a" to "i" when speaking. Hypertube HT2 represents another deformation B, for example the transformation from "u" to "e" when speaking.
- a, b, and c correspond to the projection trajectory information of the moving image output from the projection device 74.
- FIG. 38 is a flowchart showing a procedure for generating a hyper tube in the change information recognition method according to the present embodiment.
- the learning device 71 learns a great deal of change information in advance to generate a hypertube.
- the recognition device 72 uses the hypertube to recognize change information.
- The moving image serving as the learning sequence information J71 for generating a hypertube consists of images, arranged in chronological order, that change continuously from the mouth shape when one sound is pronounced to the mouth shape when another sound is pronounced.
- learning sequence information J71 for all the deformations to be recognized is prepared for a plurality of people, for example, for each change of 300 people (S71).
- 30 patterns are required for the change unit.
- the number of patterns can be appropriately defined depending on the object to be recognized and the details of the recognition. For example, when speech recognition is performed using the change information recognition device 7 as a speech recognition device, if the number of phonemes is 29, there are 812 (29 × 28) possible deformation patterns.
- a feature space is generated from the learning sequence information J71 prepared for the 300 people by using the feature space generation device 73 (S72).
- the generation of the feature space is performed as follows. In general, if the color and gray value of each pixel of an image are treated as they are, problems often arise: the amount of information is too large, the calculation takes too much time, or extra information unnecessary for recognition is included. Therefore, it is common to extract some feature amount from the image and process that instead. In this embodiment, the gray value of each pixel is used as the feature value.
- An image is represented as an n-dimensional vector having the gray value (feature amount) of each pixel as an element, that is, as a point in an n-dimensional space. If the feature amount extracted from the image is represented by an m-dimensional vector, with m < n, then one image having an n-dimensional amount of information is reduced to m dimensions by the feature extraction and can be represented as a point in the m-dimensional space.
- the feature space generation device 73 obtains the m axes spanning this m-dimensional space.
- the eigenspace is used as the feature space.
- The eigenspace is the space spanned by the m eigenvectors corresponding to the largest eigenvalues among the eigenvector-eigenvalue pairs obtained by principal component analysis.
- One image is regarded as an n-dimensional vector, and the vector of that image is denoted here by x. The variance-covariance matrix is calculated from the vectors x of the multiple images by the following equation (1): Σ = (1/N) Σ_{i=1..N} (x_i − m_x)(x_i − m_x)^T ... (1), where N is the number of images and m_x is the average vector of the plurality of vectors x.
- the projection device 74 generates a projection trajectory obtained by projecting the moving image onto the feature space.
- Specifically, using the eigenvectors u described above, the n-dimensional vector x of an image in the learning sequence information J71 is converted into the m-dimensional feature vector y by the following equation (3): y = [u_1, ..., u_m]^T (x − m_x) ... (3). Here, with m = 3, y is the three-dimensional feature vector.
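- The eigenspace construction and projection of equations (1) to (3) can be sketched with NumPy as follows; the image size (8 × 8 pixels), the number of learning images, and the random data are stand-in assumptions for real learning sequence data.

```python
import numpy as np

# Sketch of the eigenspace construction: image vectors, variance-covariance
# matrix, top-m eigenvectors, and the projection y = U^T (x - m_x).

rng = np.random.default_rng(0)
X = rng.random((100, 64))                  # 100 images, n = 64 pixels each

m_x = X.mean(axis=0)                       # average vector m_x
cov = np.cov(X - m_x, rowvar=False)        # equation (1): variance-covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order

m = 3                                      # keep the m largest eigenvalues
U = eigvecs[:, np.argsort(eigvals)[::-1][:m]]

def project(x):
    """Equation (3): the m-dimensional feature vector of an image vector x."""
    return U.T @ (x - m_x)

y = project(X[0])
print(y.shape)  # (3,)
```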
- the feature space generation device 73 outputs the feature space to the hypertube generation device 75, and the projection device 74 outputs the three-dimensional feature vectors to the hypertube generation device 75.
- the hypertube generator 75 generates a hypertube based on the output feature space and the three-dimensional feature vectors. Since one image is projected onto one point in the three-dimensional feature space, an image sequence representing a series of deformations can be represented as a trajectory of points in the three-dimensional feature space.
- the projection device 74 outputs a plurality of three-dimensional feature vector sequences, corresponding in number to the learning sequence information J71.
- the hypertube generator 75 classifies the plurality of three-dimensional feature vector sequences into deformation units in the learning sequence information J71 before projection (S73).
- the 3D feature vector sequence classified for each deformation unit in the learning sequence information J71 before projection is plotted in the feature space for each of these deformation units.
- the locus is determined (S74). These trajectories are represented by, for example, curves C1 to C6 shown in FIG.
- a representative trajectory representing the plurality of trajectories is obtained (S75).
- the representative trajectory can be obtained by various methods. Here, a method using the average of a plurality of obtained trajectories will be described. Since each trajectory represents the same type of deformation, the trajectories drawn in the feature space are roughly similar. However, even when representing the same type of deformation, the number and arrangement of the three-dimensional feature vectors constituting each sequence are not necessarily the same.
- Figure 39 shows an example of three trajectories formed by connecting the points where the three-dimensional feature vectors are plotted in the feature space.
- three trajectories C11 to C13 showing the same deformation are shown; the trajectory C11 is formed by connecting the six points P11 to P16 plotted in the feature space, the trajectory C12 by connecting the five points P21 to P25, and the trajectory C13 by connecting the five points P31 to P35.
- the points on the trajectories C11 to C13 are re-plotted so that each trajectory is composed of the same number of points.
- As a method of re-plotting the points on the trajectories C11 to C13, there are various methods such as using spline curves, but here the trajectories C11 to C13 are simply divided equally into the same number of points.
- As a result, points P41 to P47 are arranged on the trajectory C11, points P51 to P57 on the trajectory C12, and points P61 to P67 on the trajectory C13.
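- The re-plotting and averaging steps can be sketched as follows; linear interpolation along arc length is one simple choice (the text mentions spline curves as an alternative), and the toy three-dimensional trajectories are invented for illustration.

```python
import numpy as np

# Sketch: re-plot each trajectory with the same number of points by linear
# interpolation along arc length, then average the re-plotted trajectories
# into the representative trajectory.

def resample(points, k):
    points = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])        # arc length at each point
    targets = np.linspace(0.0, t[-1], k)               # k evenly spaced positions
    return np.column_stack([np.interp(targets, t, points[:, d])
                            for d in range(points.shape[1])])

trajectories = [
    [(0, 0, 0), (1, 0, 0), (2, 1, 0), (3, 1, 1), (4, 2, 1), (5, 2, 2)],  # 6 points
    [(0, 0, 0), (2, 1, 0), (3, 1, 1), (5, 2, 2)],                        # 4 points
]
k = 7
representative = np.mean([resample(tr, k) for tr in trajectories], axis=0)
print(representative.shape)  # (7, 3)
```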
- the variance of the distance to the surrounding trajectory for each point of the representative trajectory is obtained (S76).
- This variance assumes a hyperplane in a direction orthogonal to the traveling direction of the representative trajectory CC at each point P71 to P77 on the representative trajectory CC, and it can be obtained as the variance of the distances to the points where this hyperplane intersects each trajectory C11 to C13.
- In the three-dimensional feature space, this is not a hyperplane but a two-dimensional plane. This point will be described with reference to FIG. 41.
- In FIG. 42, a circle E2, whose radius is the value obtained by inputting the obtained σ(x) as the argument of equation (4), is set on the hyperplane SP2.
- Hyperplanes SP1 and SP3 to SP7 are obtained by the same processing, and circles E1 and E3 to E7 are found as shown in FIG.
- the circles E1 to E7, whose radii are values of the function with the variance as an argument, are set at the respective points P71 to P77 (S77), and these circles E1 to E7 are connected.
- the hypertube HT as shown in FIG. 36 can be generated. [0130] After the hypertube is generated in this way, recognition processing can be performed. Next, the process of change recognition using the hypertube will be described.
- FIG. 43 is a flowchart showing the procedure of the change information recognition method according to the present embodiment.
- a moving image of a face including the mouth of a person to be recognized is captured and output to the projection device 74 as recognition sequence information.
- a window having an appropriate size is set for the input moving image (S81).
- a part of the output moving image is cut out in accordance with the window (S82).
- the size of the clipped window is appropriately enlarged or reduced, and finally the size of the moving image is adjusted to the size of the learning images (the images used when creating the learning sequence information J71) (S83).
- the moving image in the window whose size has been adjusted is mapped as a trajectory on the feature space generated by the feature space generation device 73 by the same procedure as that used to create the hypertube, and the input sequence trajectory is calculated.
- the input sequence trajectory thus generated is output to sequence comparison device 77.
- a plurality of hypertubes stored in the hypertube storage device 76, together with the symbols corresponding to those hypertubes, are also output to the sequence comparison device 77.
- the input sequence trajectory output from the projection device 74 and the hypertube output from the hypertube storage device 76 are compared, and the fitness of both is determined (S85).
- the fitness of the two can be determined as follows.
- the hypertube HT is a model that stochastically represents the individual differences that occur for the same deformation. Since this model can be regarded as a probability density function in which the radius of the circle expresses the variation at each position on the representative trajectory CC, the fitness between the input sequence trajectory and the hypertube can be calculated as a probability.
- FIG. 44A shows the trajectory of the hypertube HT representing a certain deformation together with the trajectory IL of the input sequence.
- the hypertube HT has a representative trajectory CC.
- taking the start point of the hypertube as 0 and the end point as 1 on the horizontal axis, the distance from the representative trajectory CC can be plotted as a graph against that axis.
- This graph can be regarded as simply the hypertube stretched out horizontally.
- the radius of the hypertube at position x on the representative trajectory CC is defined as a function ρ(x) on the domain 0 ≤ x ≤ 1, and the distance from position x on the representative trajectory CC to the input sequence trajectory IL is denoted f(x).
- the fitness S i between the hypertube i and the input sequence can then be expressed by the following equation (5).
- N(0, 1)(·) is the normal probability density function with mean 0 and variance 1. According to equation (5), the fitness of the input sequence trajectory IL with the hypertube HT can be obtained.
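Equation (5) itself is not reproduced in this text, so the following sketch rests on an assumption: that the fitness S i averages, over positions x in [0, 1] along the representative trajectory, the standard normal density evaluated at the normalized distance f(x)/ρ(x). Under that assumption:

```python
import math

def fitness(rho, f, n=100):
    """Assumed reading of equation (5): average over x in [0, 1] of the
    standard normal density N(0, 1) applied to the normalized distance
    f(x) / rho(x).

    rho : callable, tube radius at position x on the representative trajectory
    f   : callable, distance from position x to the input sequence trajectory
    """
    def npdf(z):
        # standard normal probability density, mean 0 and variance 1
        return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

    xs = [(i + 0.5) / n for i in range(n)]  # midpoint samples of [0, 1]
    return sum(npdf(f(x) / rho(x)) for x in xs) / n
```

An input trajectory lying on the tube centre (f(x) = 0) scores the maximum 1/√(2π) ≈ 0.399; a trajectory far outside the tube relative to its radius scores near zero, so a larger value means a better fit.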
- Such a fitness is obtained for a plurality of hypertubes HT. It is determined whether the fitness between every hypertube HT and the input sequence trajectory IL has been calculated (S86); if not, the process returns to step S85 to calculate the fitness between another hypertube HT and the input sequence trajectory IL.
- the hypertubes HT whose fitness with the input sequence trajectory IL exceeds a predetermined threshold are selected (S87), and those hypertubes HT and their corresponding symbols are stored.
- the input sequence is the part of the input moving image clipped to fit the window, so the window is moved or scaled and the same series of processes is repeated for the other parts of the input moving image. It is therefore determined whether the above processing has been performed for all areas of the input moving image (S88). If an area remains unprocessed, the cutout window is moved or scaled (S89) and the process returns to step S82 to repeat the same processing.
- the change information corresponding symbol information J74 (FIG. 31) corresponding to the selected hypertube HT and the change information unit position information J73 of the window W at that time are output to an output device (not shown) (S90). In this way, hypertubes are generated from the learning sequence information and placed in the feature space, and the fitness with the input sequence is calculated, so that the type of deformation can be detected.
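Steps S81 to S90 can be summarized as a scan loop. The sketch below is hypothetical glue code, not the patented devices: `crop`, `resize`, `project`, and the per-hypertube fitness functions are supplied by the caller.

```python
def scan_image_sequence(frames, windows, learn_size, crop, resize,
                        project, hypertubes, threshold):
    """Scan the input moving image with cut-out windows (S81-S89) and
    report the hypertubes whose fitness exceeds the threshold (S87, S90).

    frames     : the frames of the input moving image
    windows    : iterable of (x, y, w, h) windows to try        # S81, S89
    crop       : cuts the window out of one frame               # S82
    resize     : scales a cut-out frame to learn_size           # S83
    project    : maps a resized clip to a feature-space trajectory  # S84
    hypertubes : list of (symbol, fitness_fn) pairs             # S85-S86
    """
    results = []
    for win in windows:
        clip = [resize(crop(fr, win), learn_size) for fr in frames]
        traj = project(clip)
        for symbol, fit in hypertubes:
            score = fit(traj)
            if score > threshold:                      # S87
                results.append((symbol, win, score))   # S90: symbol + window position
    return results
```

Each result pairs a recognized deformation symbol with the window position where it was found, mirroring the symbol information J74 and position information J73 above.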
- in other words, when a certain moving image is input, the moving image is represented as a trajectory in the feature space, and an evaluation value between that trajectory and each hypertube is calculated.
- the above has described the case of recognizing speech from the movement of the mouth using the hypertubes obtained from the learning sequence information, but the same applies to other deformation information: recognition can be performed for any deformation for which a hypertube has been generated.
- when the change information is a voice change obtained from a voice acquisition means, a change in frequency from "A" to "I" or from "A" to "U" can be represented by a hypertube.
- when the change information is a change in a gesture in the moving image captured by the moving image capturing means, the change from the closed state to the open state can be represented by a hypertube.
- when the change information is a change in the walking state of a pedestrian captured by the moving image capturing means, the deformation in one walking motion can be represented by a hypertube.
- when the change information is a change in a facial expression captured by the moving image capturing means, the change from a neutral expression to a joyful expression can be represented by a hypertube.
- when the change information is a change of a rotating object imaged by the moving image capturing means, the change when the face direction turns from the 0 degree state to the 90 degree state can be represented by a hypertube.
- FIG. 45 is a block diagram of the change information recognition device according to the present embodiment. As shown in FIG. 45, the change information recognition device 8 according to the present embodiment differs from the seventh embodiment mainly in that it is provided with a trajectory continuity storage device 88 and a partial sequence cutout device 89.
- the continuity storage device 88 stores the continuity of the trajectory corresponding to the representative trajectory in the hypertube.
- the continuity of the trajectory is judged by whether the amount of change in the trajectory is equal to or less than a predetermined threshold; the trajectory is judged continuous when the amount of change is at or below that threshold.
- the continuity storage device 88 is connected to the partial sequence cutout device 89, and the continuity of the trajectory stored in the continuity storage device 88 is output to the partial sequence cutout device 89.
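The continuity criterion stored here, that a trajectory is continuous when every frame-to-frame change stays at or below a threshold, can be sketched as:

```python
import numpy as np

def is_continuous(traj, threshold):
    """A trajectory is judged continuous when the amount of change between
    every pair of successive points is at or below the threshold.

    traj : (N, d) sequence of points (window positions or feature vectors)
    """
    traj = np.asarray(traj, dtype=float)
    steps = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    return bool(np.all(steps <= threshold))
```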
- the learning sequence information is output to the learning device 81 as in the seventh embodiment.
- a feature space is generated in the feature space generation device 83 from the output learning sequence information J71, a hypertube is generated in the hypertube generation device 85, and the generated hypertube is stored in the hypertube storage device 86.
- in the seventh embodiment, recognition sequence information based on a moving image captured by a moving image capturing unit (not shown) was output to the recognition device 82 directly; this embodiment differs in that respect.
- here, the input sequence information J82, composed of images captured by the moving image capturing unit (not shown), is cut into partial sequence information J83 by the partial sequence cutout device 89.
- the continuity of the trajectory is output to the partial sequence cutout device 89, and the partial sequence cutout device 89 cuts out the input sequence information J82 based on that continuity to generate the partial sequence information J83.
- that is, the input sequence information is cut out so as to form partial sequence information of the moving image that corresponds to the continuity of the change of the trajectory.
- when the recognition target is moving, as shown in FIG. 47A, if the window W correctly tracks the mouth M as the recognition target, then, as shown in FIG. 47B, the trajectory C projected into the feature space has a high fitness with the specific hypertube HT and draws a smooth curve in the feature space.
- if the frame rate is about the same as that of a normal television signal (for example, 30 Hz), the movement of objects in the scene between adjacent frames is slight and the change is not very sharp. The position of the tracked window W therefore usually draws a smooth trajectory as well.
- accordingly, by moving the window W so that the continuity of the trajectory of the hypertube in the feature space and the continuity of the trajectory of the window W in the input sequence information are satisfied simultaneously, the moving mouth M can be detected and tracked while it moves, and its deformation can be detected at the same time.
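One way to read this tracking rule is as a joint cost over candidate window placements: the chosen move must keep both the window's image-plane trajectory and the projected feature-space trajectory smooth. The sketch below is an assumption about how such a tracker could be wired, with `project` standing in for the projection device:

```python
import numpy as np

def choose_next_window(prev_pos, prev_feat, candidates, project,
                       alpha=1.0, beta=1.0):
    """Pick the candidate window whose move keeps both continuities:
    a small jump of the window in the image (trajectory of W) and a
    small jump of the projected point in the feature space.

    prev_pos   : (x, y) of the current window
    prev_feat  : feature-space point of the current window's contents
    candidates : list of ((x, y), image_patch) pairs to evaluate
    project    : maps a patch to a feature-space point (hypothetical)
    """
    best, best_cost = None, float("inf")
    for pos, patch in candidates:
        feat = project(patch)
        cost = (alpha * np.linalg.norm(np.subtract(pos, prev_pos))
                + beta * np.linalg.norm(np.subtract(feat, prev_feat)))
        if cost < best_cost:
            best, best_cost = (pos, feat), cost
    return best
```

In practice the fitness with the hypertube would also enter the score; it is omitted here for brevity.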
- the partial sequence cutout device 89 thus generates the partial sequence information J83 and outputs it to the projection device 84.
- the projection device 84 projects the partial sequence information J83 into the feature space to generate the trajectory information J84 of the partial sequence, and outputs it to the sequence comparison device 87.
- the trajectory of the partial sequence output from the projection device 84 and the hypertube output from the hypertube storage device 86 are compared, and their fitness is determined in the same manner as in the seventh embodiment. Then, as in the seventh embodiment, the symbol information J86 corresponding to the selected hypertube HT and the window position information J85 at that time are output to an output device (not shown). In this manner, the position of the recognition target in the moving image and the type of its deformation can be detected.
- since continuity storing means for storing the continuity of the trajectory is provided, the deformation and the position of a recognition target moving in the moving image can be reliably recognized.
- in the above embodiments, as in FIGS. 1 and 12, a human mouth has been described as an example for ease of explanation, but the invention can easily be applied to other objects.
- according to the present invention, a change information recognition device and a change information recognition method can be provided that accurately recognize the change state of an object to be recognized and can recognize, for example, words spoken by a person.
- the present invention can be used for a change information recognition device and a change information recognition method that recognize the change state of an object to be recognized, such as a mouth movement, a voice emitted from the mouth, or a human motion.
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Social Psychology (AREA)
- Psychiatry (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Description
Claims
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP04726766.1A EP1619660B1 (en) | 2003-04-09 | 2004-04-09 | Change information recognition device and change information recognition method |
US11/240,598 US7302086B2 (en) | 2003-04-09 | 2005-10-03 | Change information recognition apparatus and change information recognition method |
US11/976,691 US7508959B2 (en) | 2003-04-09 | 2007-10-26 | Change information recognition apparatus and change information recognition method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2003105649A JP4075670B2 (ja) | 2003-04-09 | 2003-04-09 | 変化情報認識装置および変化情報認識方法 |
JP2003-105649 | 2003-04-09 |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/240,598 Continuation US7302086B2 (en) | 2003-04-09 | 2005-10-03 | Change information recognition apparatus and change information recognition method |
US11/976,691 Continuation US7508959B2 (en) | 2003-04-09 | 2007-10-26 | Change information recognition apparatus and change information recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2004090867A1 true WO2004090867A1 (ja) | 2004-10-21 |
Family
ID=33156887
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2004/005155 WO2004090867A1 (ja) | 2003-04-09 | 2004-04-09 | 変化情報認識装置および変化情報認識方法 |
Country Status (5)
Country | Link |
---|---|
US (2) | US7302086B2 (ja) |
EP (2) | EP1881484B1 (ja) |
JP (1) | JP4075670B2 (ja) |
DE (1) | DE602004022472D1 (ja) |
WO (1) | WO2004090867A1 (ja) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006293455A (ja) * | 2005-04-06 | 2006-10-26 | Dainippon Printing Co Ltd | 不適正データ確認システム |
US7551801B2 (en) * | 2006-04-17 | 2009-06-23 | Honda Motor Co., Ltd. | Classification of composite actions involving interaction with objects |
JP2008310382A (ja) * | 2007-06-12 | 2008-12-25 | Omron Corp | 読唇装置および方法、情報処理装置および方法、検出装置および方法、プログラム、データ構造、並びに、記録媒体 |
JP4922095B2 (ja) * | 2007-08-01 | 2012-04-25 | 日本放送協会 | 感情表現抽出処理装置及びプログラム |
KR100978929B1 (ko) * | 2008-06-24 | 2010-08-30 | 한국전자통신연구원 | 기준 제스처 데이터 등록방법, 이동단말의 구동방법 및이를 수행하는 이동단말 |
US8903130B1 (en) * | 2011-05-09 | 2014-12-02 | Google Inc. | Virtual camera operator |
JP5837860B2 (ja) * | 2012-06-11 | 2015-12-24 | Kddi株式会社 | 動き類似度算出装置、動き類似度算出方法およびコンピュータプログラム |
US10518480B2 (en) | 2018-04-02 | 2019-12-31 | Nanotronics Imaging, Inc. | Systems, methods, and media for artificial intelligence feedback control in additive manufacturing |
US11084225B2 (en) | 2018-04-02 | 2021-08-10 | Nanotronics Imaging, Inc. | Systems, methods, and media for artificial intelligence process control in additive manufacturing |
CN115210781A (zh) * | 2021-01-26 | 2022-10-18 | 京东方科技集团股份有限公司 | 控制方法、电子设备及存储介质 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05249990A (ja) * | 1992-03-04 | 1993-09-28 | Sony Corp | パターンマッチング方法およびパターン認識装置 |
JPH07146938A (ja) * | 1993-11-25 | 1995-06-06 | Omron Corp | 信号波形データ比較装置およびその方法 |
JPH07261789A (ja) * | 1994-03-22 | 1995-10-13 | Mitsubishi Electric Corp | 音声認識の境界推定方法及び音声認識装置 |
JPH08187368A (ja) * | 1994-05-13 | 1996-07-23 | Matsushita Electric Ind Co Ltd | ゲーム装置、入力装置、音声選択装置、音声認識装置及び音声反応装置 |
JPH11353468A (ja) * | 1998-06-11 | 1999-12-24 | Nippon Hoso Kyokai <Nhk> | 発話速度計測システム、方法および記録媒体 |
JP2000099071A (ja) * | 1998-09-18 | 2000-04-07 | Nippon Telegr & Teleph Corp <Ntt> | 音声認識装置及びその方法 |
JP2001147697A (ja) * | 1999-11-19 | 2001-05-29 | Matsushita Electric Ind Co Ltd | 音響データ分析方法及びその装置 |
JP2001209814A (ja) * | 2000-01-24 | 2001-08-03 | Sharp Corp | 画像処理装置 |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07306692A (ja) * | 1994-05-13 | 1995-11-21 | Matsushita Electric Ind Co Ltd | 音声認識装置及び音声入力装置 |
JPH09305195A (ja) | 1996-05-13 | 1997-11-28 | Omron Corp | 音声認識装置および音声認識方法 |
JP3112254B2 (ja) * | 1997-03-04 | 2000-11-27 | 富士ゼロックス株式会社 | 音声検出装置 |
JPH10274516A (ja) | 1997-03-31 | 1998-10-13 | Victor Co Of Japan Ltd | 顔の方向検出装置 |
US6219539B1 (en) * | 1997-04-08 | 2001-04-17 | Nortel Networks Corporation | Systems and methods for implementing private wireless communications |
US6219639B1 (en) * | 1998-04-28 | 2001-04-17 | International Business Machines Corporation | Method and apparatus for recognizing identity of individuals employing synchronized biometrics |
JP3893763B2 (ja) * | 1998-08-17 | 2007-03-14 | 富士ゼロックス株式会社 | 音声検出装置 |
JP3403363B2 (ja) | 1999-11-01 | 2003-05-06 | 株式会社国際電気通信基礎技術研究所 | 3次元連続動作の検定装置 |
JP2002197465A (ja) | 2000-03-31 | 2002-07-12 | Fujitsu Ltd | 自動口形状検出装置とそれを用いた自動単語認識装置 |
US7076429B2 (en) * | 2001-04-27 | 2006-07-11 | International Business Machines Corporation | Method and apparatus for presenting images representative of an utterance with corresponding decoded speech |
-
2003
- 2003-04-09 JP JP2003105649A patent/JP4075670B2/ja not_active Expired - Fee Related
-
2004
- 2004-04-09 WO PCT/JP2004/005155 patent/WO2004090867A1/ja active Application Filing
- 2004-04-09 EP EP07021669A patent/EP1881484B1/en not_active Expired - Fee Related
- 2004-04-09 DE DE602004022472T patent/DE602004022472D1/de not_active Expired - Lifetime
- 2004-04-09 EP EP04726766.1A patent/EP1619660B1/en not_active Expired - Fee Related
-
2005
- 2005-10-03 US US11/240,598 patent/US7302086B2/en not_active Expired - Fee Related
-
2007
- 2007-10-26 US US11/976,691 patent/US7508959B2/en not_active Expired - Fee Related
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05249990A (ja) * | 1992-03-04 | 1993-09-28 | Sony Corp | パターンマッチング方法およびパターン認識装置 |
JPH07146938A (ja) * | 1993-11-25 | 1995-06-06 | Omron Corp | 信号波形データ比較装置およびその方法 |
JPH07261789A (ja) * | 1994-03-22 | 1995-10-13 | Mitsubishi Electric Corp | 音声認識の境界推定方法及び音声認識装置 |
JPH08187368A (ja) * | 1994-05-13 | 1996-07-23 | Matsushita Electric Ind Co Ltd | ゲーム装置、入力装置、音声選択装置、音声認識装置及び音声反応装置 |
JPH11353468A (ja) * | 1998-06-11 | 1999-12-24 | Nippon Hoso Kyokai <Nhk> | 発話速度計測システム、方法および記録媒体 |
JP2000099071A (ja) * | 1998-09-18 | 2000-04-07 | Nippon Telegr & Teleph Corp <Ntt> | 音声認識装置及びその方法 |
JP2001147697A (ja) * | 1999-11-19 | 2001-05-29 | Matsushita Electric Ind Co Ltd | 音響データ分析方法及びその装置 |
JP2001209814A (ja) * | 2000-01-24 | 2001-08-03 | Sharp Corp | 画像処理装置 |
Non-Patent Citations (7)
Title |
---|
ABE Y. ET AL.: "Jotikai sen'i sokubakugata HMM ni yoru on'in kijutsu", THE ACOUSTICAL SOCIETY OF JAPAN HEISEI 5 NENDO SHUKI KENKYU HAPPYOKAI, 5 October 1993 (1993-10-05), pages 9 - 10, XP002984729 * |
FUNAYAMA T. ET AL.: "Fukusu no doteki na ami no model no kyocho to sono kaobuhin chushutsu eno oyo", THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS GIJUTSU KENKYU HOKOKU UPATTERN NINSHIKI.RIKAI], vol. 95, no. 446, 22 December 1995 (1995-12-22), pages 15 - 22, XP002984730 * |
KADOMARU T. ET AL.: "Koshin keijo henka no tokucho o riyoshita wasoku henka joho chushutsu ni kansuru kisoteki kento", THE ACOUSTICAL SOCIETY OF JAPAN 2000 NEN SHUKI KENKYU HAPPYOKAI KOEN RONBUNSHU I, 20 September 2000 (2000-09-20), pages 253 - 254, XP002984733 * |
SAGAYAMA S.: "Topikkusu 16 DPvs.HMM", NIPPON ONKYO GAKKAISHI, vol. 57, no. 1, 25 December 2000 (2000-12-25), pages 68, XP002984728 * |
See also references of EP1619660A4 * |
SUGAWARA K. ET AL.: "Gazo joho o toriireta tango ninshiki system no jitsujikan jitsugen", THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS GIJUTSU KENKYU HOKOKU UPATTERN NINSHIKI.MEDIA RIKAI], vol. 99, no. 710, 17 March 2000 (2000-03-17), pages 57 - 63, XP002984732 * |
TAMOTO M. ET AL.: "Onsei no onkyoteki tokusei o mochiita kaohoko ninshiki", THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS GIJUTSU KENKYU HOKOKU UONSEI], vol. 96, no. 364, 15 November 1996 (1996-11-15), pages 1 - 4, XP002984731 * |
Also Published As
Publication number | Publication date |
---|---|
EP1881484B1 (en) | 2009-08-05 |
JP4075670B2 (ja) | 2008-04-16 |
US20080056582A1 (en) | 2008-03-06 |
US20060029277A1 (en) | 2006-02-09 |
EP1881484A1 (en) | 2008-01-23 |
US7302086B2 (en) | 2007-11-27 |
DE602004022472D1 (de) | 2009-09-17 |
EP1619660B1 (en) | 2014-05-07 |
EP1619660A1 (en) | 2006-01-25 |
US7508959B2 (en) | 2009-03-24 |
JP2004310606A (ja) | 2004-11-04 |
EP1619660A4 (en) | 2007-02-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7508959B2 (en) | Change information recognition apparatus and change information recognition method | |
US7720775B2 (en) | Learning equipment and learning method, and robot apparatus | |
Matthews et al. | Extraction of visual features for lipreading | |
US11783615B2 (en) | Systems and methods for language driven gesture understanding | |
Ong et al. | Automatic sign language analysis: A survey and the future beyond lexical meaning | |
Azar et al. | Trajectory-based recognition of dynamic Persian sign language using hidden Markov model | |
JP2005044330A (ja) | 弱仮説生成装置及び方法、学習装置及び方法、検出装置及び方法、表情学習装置及び方法、表情認識装置及び方法、並びにロボット装置 | |
KR20010062767A (ko) | 정보 처리 장치, 정보 처리 방법 및 저장 매체 | |
Hassanat | Visual speech recognition | |
Kaluri et al. | An enhanced framework for sign gesture recognition using hidden Markov model and adaptive histogram technique. | |
Er-Rady et al. | Automatic sign language recognition: A survey | |
Yang et al. | Modeling dynamics of expressive body gestures in dyadic interactions | |
CN115169507B (zh) | 类脑多模态情感识别网络、识别方法及情感机器人 | |
Shinde et al. | Real time two way communication approach for hearing impaired and dumb person based on image processing | |
Ballard et al. | A multimodal learning interface for word acquisition | |
JP4518094B2 (ja) | 変化情報認識装置および変化情報認識方法 | |
KR101621304B1 (ko) | 마우스맵을 이용한 능동형태모델 기반 입술 형태 추정 방법 및 시스템 | |
Yousfi et al. | Automatic speech recognition for the holy Qur ‘an, A review | |
JPH08115408A (ja) | 手話認識装置 | |
Goutsu et al. | Multi-modal gesture recognition using integrated model of motion, audio and video | |
KR100795947B1 (ko) | 치열영상을 이용한 생체인식 시스템과 그 인식 방법 및이를 기록한 기록매체 | |
Gopinath et al. | A Survey on Hand Gesture Recognition Using Machine Learning | |
Nakamura et al. | Multimodal concept and word learning using phoneme sequences with errors | |
Kirandziska et al. | Comparing emotion recognition from voice and facial data using time invariant features | |
Demidenko et al. | Developing Automatic Markerless Sign Language Gesture Tracking and Recognition System |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2004726766 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11240598 Country of ref document: US |
|
WWP | Wipo information: published in national office |
Ref document number: 2004726766 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 11240598 Country of ref document: US |