WO2002042242A2 - Candidate level multi-modal integration system - Google Patents
- Publication number
- WO2002042242A2 (PCT/EP2001/013414)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- characterization
- modal
- signals
- candidate
- sensing
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
- G06F18/256—Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
Definitions
- the invention relates to the field of integrating signals representing sensed data from multiple sensing modalities, also known as multi-modal integration; and in particular to the integration of data from multiple sensing modalities that preprocess data and make at least a tentative characterization or labeling of that data.
- Fig. 1 gives a conceptual view of decision level integration as applied to taking data from a "scene" 101.
- the scene is sensed and processed by at least two separate modules at 102 and 103.
- Each module includes a sensing operation 104, a feature extraction operation 105, and a recognition operation 106.
- Each module yields a uni-modal ("UM") "decision" 107, which characterizes or labels data gathered from the scene.
- characterization is intended to be a generic term, which includes both the concepts of "decision" and "label."
- Feature extraction 105 normally involves applying a mathematical transformation or predetermined algorithm to the data acquired in the sensing step.
- Recognition 106 normally involves a type of processing that requires some training, for instance through use of a neural network.
- a multi-modal integration unit ("MMI") applies multi-modal heuristics and/or rules to decide how to yield a final multi-modal decision, which characterizes or labels some aspect of the scene based on the disparate data gathered and processed in the processes 102 and 103.
- Decision level integration has the advantage of simplicity of implementation. It can incorporate uni-modal systems that are independently studied, developed, and updated; these systems thus can operate as pre-processors. Also, the communication channels between the uni-modal systems and the MMI are one-way and require little bandwidth.
- Decision-level integration is limited in the level of cooperation that can be implemented between different modalities.
- correlation between modalities is not fully exploited; information from one modality therefore cannot be used to improve decisions made on the others. For instance, when the decisions from two redundant modalities disagree, the more confident one is taken and the other discarded, yielding no overall improvement and possibly degrading the results obtained with a single modality because of the competition between them.
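The winner-take-all behavior described above can be sketched as a toy routine. The modality names and confidence values below are illustrative assumptions, not taken from the patent:

```python
# Decision-level fusion sketch: each modality delivers a single (label, confidence)
# decision, and the integrator simply keeps the most confident one. The losing
# modality's evidence plays no further role, which is the limitation described
# above. Modality names and numbers are illustrative assumptions.

def decision_level_fusion(decisions):
    """Pick the uni-modal decision with the highest confidence."""
    return max(decisions, key=lambda d: d[1])

audio_decision = ("speaker_A", 0.55)   # hypothetical audio-modality output
video_decision = ("speaker_B", 0.60)   # hypothetical video-modality output

final = decision_level_fusion([audio_decision, video_decision])
# video wins outright; the audio evidence for speaker_A is discarded
```

Even a strong second-place opinion from the other modality never reaches the final answer, which is exactly the lost-correlation problem the candidate-level approach addresses.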
- the independent uni-modal systems create sets of characterization pairs, each pair including a respective candidate characterization and confidence level.
- the MMI receives and processes the sets of characterization pairs and supplies at least one final characterization of the signals.
- the final characterization is chosen from at least one of the characterization pairs.
- the object is achieved in that the MMI receives candidate characterizing signals from the uni-modal contributors and provides at least one control signal thereto.
- the control signal controls processing and/or sensing.
- the control signal is derived from the candidate characterizing signals.
- the object is achieved in a training method.
- the method includes a training phase and a normal operation phase.
- candidate characterization signals and ground truths are received.
- the candidate characterization signals are from a plurality of previously trained sensing devices, which devices include trained processors, and the candidate characterization signals result from an initial physical reality setting.
- training parameters are tuned to achieve ground truths about the physical reality, by evaluating optimization criteria and the candidate characterization signals.
- further candidate characterization signals are received from the plurality of previously trained sensing devices.
- a tentative final characterization signal is created.
- at least one control signal is fed back to the at least one of the sensing devices.
- the control signal is adapted to cause a change in training and/or performance of a sensing device.
- the steps of the normal operation phase are repeated until a characterization criterion is met.
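The two phases above can be sketched roughly as follows. The parameter representation, the counting-based tuning rule, and the "output stopped changing" criterion are all illustrative assumptions; the patent does not fix a particular optimization scheme:

```python
# Rough sketch of the two-phase method: a training phase tunes integration
# parameters against ground truths, then a normal-operation loop creates a
# tentative final characterization, feeds a control signal back, and repeats
# until a characterization criterion is met.

def train(mmi_params, candidate_signal_sets, ground_truths):
    """Tune parameters so the fused output tends toward the ground truths."""
    for _signals, truth in zip(candidate_signal_sets, ground_truths):
        # placeholder update: count how often each label was the ground truth
        mmi_params[truth] = mmi_params.get(truth, 0) + 1
    return mmi_params

def operate(mmi_params, get_signals, send_control, max_iters=10):
    """Normal operation: fuse, feed control back, repeat until criterion met."""
    previous = None
    for _ in range(max_iters):
        signals = get_signals()  # candidate pairs from the sensing devices
        # tentative final characterization: best candidate weighted by training
        tentative = max(signals, key=lambda s: s[1] * mmi_params.get(s[0], 1))
        if tentative == previous:        # criterion: output stopped changing
            return tentative
        send_control(tentative)          # control signal back to the devices
        previous = tentative
    return previous
```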
- the object of the invention is achieved in a uni-modal sensing device which provides characterization information upwards to a multi-modal integration unit, and receives multi-modal contextual information down from the multi-modal integration unit.
- Fig. 2 shows the general concept of feature level processing.
- a scene is presented.
- at 202 at least two different types of sensing occur.
- all sensed data is subjected to some type of feature extraction to yield a feature vector.
- the feature vector is then processed to yield some kind of multi-modal recognition at 204, with a multi-modal decision being output at 205.
- feature extraction typically results from applying some sort of mathematical transformation or predefined algorithm to the sensed data, while recognition is usually an operation requiring some kind of training, such as use of a neural network.
- Fig. 1 shows a prior art multi-modal integration architecture.
- Fig. 2 shows a prior art multi-modal integration architecture.
- Fig. 3 shows a system in accordance with the invention.
- Fig. 4 shows a system in accordance with the invention.
- Fig. 5 illustrates a list of symbols used to explain the invention.
- Fig. 6 is a flowchart describing development of a candidate list.
- Fig. 7 is a schematic diagram of a super-system including several MMI devices.
- Fig. 8 is a flow chart describing operation of an MMI.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Application areas
- sensing video data may include gathering and characterizing information about any number of things such as fingerprints and facial images including: feature positions, feature appearance, and profile shapes.
- One camera may be used to gather more than one type of data about a scene, with different processing modules within a connected processor using the data in different ways.
- the modules that gather different types of information from an image then effectively become different sensing devices, even though they may physically be housed within a single processor.
- signals from other types of sensors may need to be combined.
- Other types of sensors that might be useful in multi-modal integration applications include infrared and range sensors.
- user entry devices including keyboards and pointer devices such as mice, stylus type sensors, track balls, and so forth, can be used as uni-modal sensing devices.
- Other areas where multi-modal integration may be useful include acoustic localization via microphone arrays and the use of echo cancellation by direct input of a known source of audio/noise. Even text data might be used in some applications.
- FIG. 3 shows an architecture of a system in accordance with the invention. Again, there is a scene 101, which is sensed by sensors 301, 301', and 302. These sensors are shown to be a microphone and a video camera, but they might be any sensors appropriate to a desired application area, including user entry devices such as keyboards, mice, touch screens, or any other user entry device. At 303, 304, and 305, features are extracted from signals derived from the sensors.
- the extracted features are processed and recognized.
- candidate decisions are presented to the MMI 317.
- control signals in the form of multi-modal contextual information, are provided back down to boxes 306, 307, and 308.
- two sets of features are shown as being extracted from the video data, at 304 and 305. For instance, facial feature data might be extracted at 305, while gesture feature data might be extracted at 304. Boxes 305 and 308 function together as a separate sensing device from boxes 304 and 307.
- the video camera 302 is actually connected to two sensing devices. In other words, a single sensing element can be connected with any number of sensing devices.
- the plurality of microphones in the array 301 and 301' function together with a single pair of boxes 303 and 306. Boxes 303 and 306 thus function together as a third sensing device, for instance to collect position data.
- more than one sensing element can feed a single sensing device.
- Additional sensing devices might be added, whether coupled to the existing sensing elements or to additional sensing elements. There can be any number of sensing elements and sensing devices.
- control data fed back at 309, 311, and 313 will affect the performance and/or training of the respective sensing devices. For instance, control signals to a video sensing device might bias what part of the picture the sensing device looks at.
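One concrete reading of that feedback is a control signal that selects a region of interest in the frame. The region format `(row0, row1, col0, col1)` and the toy frame layout are assumptions made purely for illustration:

```python
# Sketch: a control signal from the MMI biases which part of the picture a
# video sensing device processes, here by cropping to a window. The region
# tuple format and frame representation are illustrative assumptions.

def crop_to_region(frame, region):
    """frame: 2-D list of pixels; region: (row0, row1, col0, col1) window."""
    r0, r1, c0, c1 = region
    return [row[c0:c1] for row in frame[r0:r1]]

frame = [[(r, c) for c in range(8)] for r in range(6)]   # toy 6x8 "image"
roi = crop_to_region(frame, (2, 5, 1, 4))  # MMI steers attention to this window
# roi is the 3x3 window the sensing device would now concentrate on
```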
- in Fig. 3 the sensing devices are shown in the same processor 316 with the MMI 317.
- in Fig. 4 an alternative embodiment is shown, where the sensing devices 416, 417, and 418 are housed separately from the MMI 417.
- the connections 409-414 that supply the candidate decisions are now external leads.
- Boxes 303-305 do feature extraction on the data received from the scene.
- the output of boxes 303-305 will be in the form of feature vectors per formula (3) from Fig. 5.
- Boxes 306-308 produce candidate lists in accordance with the invention.
- the field of discriminating functions is well-developed, for instance as described in K.
- the discriminating functions will normally be probability distributions, denominated "P" herein.
- P probability distributions
- those of ordinary skill in the art will be able to devise other discriminating functions in accordance with the needs of whatever application area is chosen.
- each sensing device should produce a candidate list per formula (1) of Fig. 5, where the variable in that formula represents a candidate from a uni-modal sensing device.
- Fig. 6 is a flow-chart showing more of the operation of the individual recognition units, 306-308 within the sensing devices. The labels of the flowchart make reference to the formula numbers from Fig. 5.
- the list of multi-modal contextual information of Fig. 5 in the form of an initialized list of default values for formula (2) is received from the MMI on lines 313, 311, and 309.
- formula (5) is applied to get the candidates (1).
- Formula (5) expresses multiplication of the results of formula (2), received from the MMI, with a probability based discriminating function, per formula (4).
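Under the assumption that both the contextual information (2) and the discriminating function (4) assign a probability to each candidate, the multiplication in formula (5) might be sketched as follows; candidate labels and probability values are illustrative:

```python
# Sketch of formula (5): a candidate's score is the product of the multi-modal
# contextual information for it (formula (2), supplied by the MMI) and the
# uni-modal discriminating function's probability for it (formula (4)).
# All labels and numbers are illustrative assumptions.

def score_candidates(contextual, discriminating):
    """Multiply the MMI's contextual prior with the uni-modal probability."""
    return {c: contextual.get(c, 1.0) * discriminating[c] for c in discriminating}

contextual = {"cat": 0.8, "dog": 0.2}       # formula (2), from the MMI
discriminating = {"cat": 0.4, "dog": 0.6}   # formula (4), from the features

scores = score_candidates(contextual, discriminating)
# the context shifts the ranking: "cat" (0.8 * 0.4) now beats "dog" (0.2 * 0.6)
```

This is the point where information from the other modalities, routed through the MMI, re-ranks a uni-modal device's own candidates.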
- some criterion is evaluated. The criterion could be that some fixed number of iterations have been completed, or that no change in the candidate list (6) has been achieved since the last iteration, or any other suitable criterion devised by the skilled artisan.
- the current list of candidate pairs per formula (6) is sent to the MMI 317, 417.
- the candidate pair list includes the candidates from formula (1) together with the confidence level from formula (4).
- the candidate pair list is an example of the term "characterization pairs" used elsewhere herein, and is provided to the MMI on lines 310, 312, and 314.
- new multi-modal contextual information is received from the MMI at 606 in the form of formula (2), based on the new proposed candidate list and control is returned to 602.
- a final set of candidates in the form of formula (6) is sent to the MMI.
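The Fig. 6 loop can be sketched as: start from default contextual information, apply the formula (5) product, exchange the resulting candidate list with the MMI, and repeat until a stopping criterion holds. Here the criterion is "no change since the last iteration" and the MMI exchange is passed in as a function; both are illustrative assumptions:

```python
# Sketch of a recognition unit's iteration (Fig. 6). The exchange_with_mmi
# callable stands in for sending the candidate pairs (6) and receiving new
# contextual information (2); its behavior here is a mock.

def recognition_unit_loop(discriminating, exchange_with_mmi, max_iters=5):
    contextual = {c: 1.0 for c in discriminating}    # default formula (2) values
    previous = None
    for _ in range(max_iters):
        # formula (5): contextual prior times the discriminating probability
        candidates = {c: contextual[c] * discriminating[c] for c in discriminating}
        if candidates == previous:                   # criterion: no change
            break
        previous = candidates
        contextual = exchange_with_mmi(candidates)   # send list (6), get new (2)
    return previous
```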
- the MMI 317, 417 in turn performs an evaluation of all the combinations of candidates from the uni-modal sensing devices.
- Fig. 8 shows a flowchart of the operation of the MMI.
- the candidate pair lists, per formula (6), are received from the uni-modal sensing devices.
- Each uni-modal sensing device k produces a list of candidate pairs, per formula (6).
- a list of combinations of uni-modal candidates is formed as expressed in formula (7). The total number of combinations is L and the index numbering the combinations is c.
- Each combination of candidates normally includes one uni-modal candidate from each of the uni-modal sensing devices.
- Each combination of uni-modal candidates is used to create a multi-modal characterization c of the scene.
- the multi-modal characterization may be the same as one of the characterizations (1) coming from the uni- modal sensing devices. Alternatively, the multi-modal characterization may characterize some combination pattern derived from the patterns recognized by the uni-modal devices.
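Forming the combination list of formula (7) amounts to taking a Cartesian product of the per-device candidate lists, one candidate from each uni-modal sensing device per combination. The device names and candidates below are made up for illustration:

```python
# Sketch of forming formula (7): every combination takes exactly one candidate
# from each uni-modal sensing device. Candidates are illustrative assumptions.
import itertools

audio_candidates = ["hello", "yellow"]     # candidates from a speech device
gesture_candidates = ["wave", "point"]     # candidates from a gesture device

combinations = list(itertools.product(audio_candidates, gesture_candidates))
# L = 2 * 2 = 4 combinations, indexed by c = 0 .. L-1
```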
- the multi-modal characterizations are analyzed according to a multi-modal discriminating function (8).
- This function evaluates the product of a) the super-multi-modal contextual information P(c), and b) the product of the probabilities, per formula (4), of all of the uni-modal decisions making up the combination.
- the super-multi-modal contextual information P(c) will first be initialized to some default value.
- the value of P(c) can then be modified based on information received at a higher level from the MMI. This modified value will then be supplied as new super-multi-modal contextual information from the higher level.
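Under the assumption that the multi-modal discriminating function (8) reduces to P(c) times the product of the uni-modal confidences of formula (4), a combination could be scored as follows; the confidence values are illustrative:

```python
# Sketch of scoring one combination per formula (8): the super-multi-modal
# contextual information P(c) multiplied by the product of the per-modality
# confidences. P(c) starts at a default and is later refined from above.
import math

def combination_score(p_c, unimodal_confidences):
    """P(c) times the product of the per-modality confidences."""
    return p_c * math.prod(unimodal_confidences)

p_c = 1.0                                    # default before higher-level feedback
score = combination_score(p_c, [0.32, 0.7])  # one confidence per modality
```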
- super-candidates are chosen to be supplied from the MMI. These are a subset {c} of the possible combinations (7).
- the super-candidates will be provided as another list of characterization pairs. This time the characterization pairs will have the format of formula (9).
- a criterion is tested. This criterion may be a number of iterations, lack of change of the output (2) since the last iteration, lack of change of the multi-modal candidate pairs (9) since the last iteration, or any other suitable criterion devised by the skilled artisan. If the criterion is not met, then the multi-modal contextual information, per formula (2), is sent to the individual uni-modal devices at 804. The values sent to a uni-modal device will typically vary according to what type of data that device is gathering.
- Fig. 7 shows a system with a super-MMI 701. In this case, there are three MMIs 702-704, each of which corresponds to the MMI 317, 417 discussed before.
- Each MMI is coupled with a plurality of uni-modal sensing devices 705.
- the MMIs 702-704 send super-candidate lists, i.e. characterization pairs, per formula (9) via 707 to the super-MMI 701 and receive super-multi-modal contextual information P(c) via 706 from the super-MMI 701.
- the super-MMI may produce further characterization pairs at 708, and can therefore be part of a super-super-MMI system, with another level of hierarchy.
- the super-MMI 701 operates analogously to the MMI, treating the MMIs the way the MMIs treat uni-modal sensing devices.
- in Fig. 7 there are three MMIs (702-704), each with three uni-modal sensing devices (705).
- the super-MMI might be coupled with at least one MMI and at least one free-standing uni-modal sensing device.
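Because the super-MMI treats each MMI's characterization pairs (formula (9)) the way an MMI treats uni-modal candidate pairs, one fusion routine can, in principle, be reused at every level of the hierarchy. The sketch below scores combinations by a plain product of confidences, an illustrative simplification of formula (8):

```python
# Recursive-hierarchy sketch: the same fuse() routine serves both the MMI level
# (fusing uni-modal candidate pairs) and the super-MMI level (fusing the MMIs'
# characterization pairs). All labels and confidences are illustrative.
import itertools
import math

def fuse(pair_lists):
    """Combine candidate-pair lists from the level below into one ranked list."""
    scored = [(tuple(cand for cand, _ in combo),
               math.prod(conf for _, conf in combo))
              for combo in itertools.product(*pair_lists)]
    return sorted(scored, key=lambda pair: -pair[1])

# uni-modal level -> MMI level
mmi_output = fuse([[("hello", 0.6)], [("wave", 0.7)]])
# MMI level -> super-MMI level: the very same routine, one rung higher
super_output = fuse([mmi_output, [(("point",), 0.5)]])
```

The same call could be stacked again for the super-super-MMI level the text mentions.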
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020027009315A KR20020070491A (en) | 2000-11-22 | 2001-11-16 | Candidate level multi-modal integration system |
EP01989488A EP1340187A2 (en) | 2000-11-22 | 2001-11-16 | Candidate level multi-modal integration system |
JP2002544381A JP2004514970A (en) | 2000-11-22 | 2001-11-16 | Candidate level multimodal integration system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US71825500A | 2000-11-22 | 2000-11-22 | |
US09/718,255 | 2000-11-22 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002042242A2 true WO2002042242A2 (en) | 2002-05-30 |
WO2002042242A3 WO2002042242A3 (en) | 2002-11-28 |
Family
ID=24885400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2001/013414 WO2002042242A2 (en) | 2000-11-22 | 2001-11-16 | Candidate level multi-modal integration system |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1340187A2 (en) |
JP (1) | JP2004514970A (en) |
KR (1) | KR20020070491A (en) |
WO (1) | WO2002042242A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007072425A2 (en) | 2005-12-20 | 2007-06-28 | Koninklijke Philips Electronics, N.V. | Device for detecting and warning of a medical condition |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5586215A (en) * | 1992-05-26 | 1996-12-17 | Ricoh Corporation | Neural network acoustic and visual speech recognition system |
EP0921509A2 (en) * | 1997-10-16 | 1999-06-09 | Navigation Technologies Corporation | System and method for updating, enhancing or refining a geographic database using feedback |
US6009199A (en) * | 1996-07-12 | 1999-12-28 | Lucent Technologies Inc. | Classification technique using random decision forests |
- 2001-11-16 WO PCT/EP2001/013414 patent/WO2002042242A2/en not_active Application Discontinuation
- 2001-11-16 KR KR1020027009315A patent/KR20020070491A/en active IP Right Grant
- 2001-11-16 EP EP01989488A patent/EP1340187A2/en not_active Withdrawn
- 2001-11-16 JP JP2002544381A patent/JP2004514970A/en active Pending
Non-Patent Citations (10)
Title |
---|
"DECODING OF A CONSISTENT MESSAGE USING BOTH SPEECH AND HANDWRITING RECOGNITION" IBM TECHNICAL DISCLOSURE BULLETIN, IBM CORP. NEW YORK, US, vol. 36, no. 1, 1993, pages 415-418, XP000333898 ISSN: 0018-8689 * |
A. JAIN ET AL.: "A Multimodal Biometric System using Fingerprints, Face and Speech" 2ND INT. CONF. ON AUDIO- AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, [Online] 23 - 24 March 1999, pages 182-187, XP002211742 Washington D.C., USA Retrieved from the Internet: <URL:http://www.cse.msu.edu/publications/tech/TR/MSU-CPS-98-32.ps.Z> [retrieved on 2002-08-29] cited in the application * |
A. ROSS ET AL: "Information Fusion in Biometrics" PROC. INT. CONF. ON AUDIO- AND VIDEO-BASED PERSON AUTHENTICATION (AVBPA), [Online] 6 June 2001 (2001-06-06) - 8 June 2001 (2001-06-08), pages 354-359, XP002211743 Halmstad, Sweden Retrieved from the Internet: <URL:http://www.cse.msu.edu/publications/tech/TR/MSU-CSE-01-18.ps> [retrieved on 2002-08-29] * |
AALO V A ET AL: "Multilevel quantisation and fusion scheme for the decentralised detection of an unknown signal" IEE PROCEEDINGS: RADAR, SONAR & NAVIGATION, INSTITUTION OF ELECTRICAL ENGINEERS, GB, vol. 141, no. 1, 1 February 1994 (1994-02-01), pages 37-44, XP006002055 ISSN: 1350-2395 * |
BORGHYS D ET AL: "MULTILEVEL DATA FUSION FOR THE DETECTION OF TARGETS USING MULTISPECTRAL IMAGE SEQUENCES" OPTICAL ENGINEERING, SOC. OF PHOTO-OPTICAL INSTRUMENTATION ENGINEERS. BELLINGHAM, US, vol. 37, no. 2, 1 February 1998 (1998-02-01), pages 477-484, XP000742662 ISSN: 0091-3286 * |
DASARATHY B V: "FUSION STRATEGIES FOR ENHANCING DECISION RELIABILITY IN MULTISENSORENVIRONMENTS" OPTICAL ENGINEERING, SOC. OF PHOTO-OPTICAL INSTRUMENTATION ENGINEERS. BELLINGHAM, US, vol. 35, no. 3, 1 March 1996 (1996-03-01), pages 603-616, XP000597449 ISSN: 0091-3286 * |
MCCULLOUGH C L ET AL: "Multi-level sensor fusion for improved target discrimination" DECISION AND CONTROL, 1996., PROCEEDINGS OF THE 35TH IEEE CONFERENCE ON KOBE, JAPAN 11-13 DEC. 1996, NEW YORK, NY, USA,IEEE, US, 11 December 1996 (1996-12-11), pages 3674-3675, XP010214092 ISBN: 0-7803-3590-2 * |
PAVLOVIC V ET AL: "MULTIMODAL SPEAKER DETECTION USING ERROR FEEDBACK DYNAMIC BAYESIAN NETWORKS" PROCEEDINGS 2000 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. CVPR 2000. HILTON HEAD ISLAND, SC, JUNE 13-15, 2000, PROCEEDINGS OF THE IEEE COMPUTER CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, LOS ALAMITOS, CA: IEEE COMP. SOC, vol. 2 OF 2, 13 June 2000 (2000-06-13), pages 34-41, XP001035625 ISBN: 0-7803-6527-5 * |
SERPICO S B ET AL: "Structured neural networks for the classification of multisensor remote-sensing images" GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 1993. IGARSS '93. BETTER UNDERSTANDING OF EARTH ENVIRONMENT., INTERNATIONAL TOKYO, JAPAN 18-21 AUG. 1993, NEW YORK, NY, USA,IEEE, 18 August 1993 (1993-08-18), pages 907-909, XP010114470 ISBN: 0-7803-1240-6 * |
TSE MIN CHEN ET AL: "A generalized look-ahead method for adaptive multiple sequential data fusion and decision making" MULTISENSOR FUSION AND INTEGRATION FOR INTELLIGENT SYSTEMS, 1999. MFI '99. PROCEEDINGS. 1999 IEEE/SICE/RSJ INTERNATIONAL CONFERENCE ON TAIPEI, TAIWAN 15-18 AUG. 1999, PISCATAWAY, NJ, USA,IEEE, US, 15 August 1999 (1999-08-15), pages 199-204, XP010366571 ISBN: 0-7803-5801-5 * |
Also Published As
Publication number | Publication date |
---|---|
WO2002042242A3 (en) | 2002-11-28 |
KR20020070491A (en) | 2002-09-09 |
JP2004514970A (en) | 2004-05-20 |
EP1340187A2 (en) | 2003-09-03 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| AK | Designated states | Kind code of ref document: A2; Designated state(s): JP KR |
| AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR |
| WWE | Wipo information: entry into national phase | Ref document number: 2001989488; Country of ref document: EP |
| ENP | Entry into the national phase | Ref country code: JP; Ref document number: 2002 544381; Kind code of ref document: A; Format of ref document f/p: F |
| WWE | Wipo information: entry into national phase | Ref document number: 1020027009315; Country of ref document: KR |
| WWP | Wipo information: published in national office | Ref document number: 1020027009315; Country of ref document: KR |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| AK | Designated states | Kind code of ref document: A3; Designated state(s): JP KR |
| AL | Designated countries for regional patents | Kind code of ref document: A3; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR |
| WWP | Wipo information: published in national office | Ref document number: 2001989488; Country of ref document: EP |
| WWW | Wipo information: withdrawn in national office | Ref document number: 2001989488; Country of ref document: EP |