KR20020070491A

KR20020070491A - Candidate level multi-modal integration system

Info

Publication number: KR20020070491A
Application number: KR1020027009315A
Authority: KR
Inventors: Antonio Colmenarez; Srinivas Gutta
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2000-11-22
Filing date: 2001-11-16
Publication date: 2002-09-09
Also published as: WO2002042242A3; JP2004514970A; WO2002042242A2; EP1340187A2

Abstract

A multi-modal system includes sensing devices and a multi-modal integration unit. The sensing devices provide lists of candidate pairs, each pair including a candidate characterization and a probability indicating the confidence in that candidate. The multi-modal integration unit receives the lists from the sensing devices and, in response, provides multi-modal context information to each sensing device. The sensing devices then use the new information from the multi-modal integration unit to alter their sensing or other performance, iteratively providing new lists of candidate pairs. A super-system comprising a hierarchy of such multi-modal integration units can be constructed.

Description

Candidate level multi-modal integration system

Background of the invention

The field of multi-modal integration has numerous applications, which are discussed at greater length in the detailed description. Several approaches have been taken in prior art attempts at multi-modal integration. One example is A. Jain et al., "A multimodal biometric system using fingerprint, face and speech", presented at the Second International Conference on Audio- and Video-based Biometric Person Authentication (Washington, USA, March 1999). The general approach of that work will be called "decision level integration" herein. Fig. 1 shows a conceptual diagram of decision level integration applied to data taken from a "scene" 101. The scene is sensed and processed by at least two separate modules at 102 and 103. Each module includes a sensing operation 104, a feature extraction operation 105, and a recognition operation 106. Each module yields a uni-modal ("UM") "decision" 107 that characterizes or labels the data gathered from the scene. Herein, the term "characterization" is intended as a general term that covers both the concept of a "decision" and that of a "label".

Feature extraction 105 generally involves applying a mathematical transformation or a predetermined algorithm to the data obtained in the sensing step. Recognition 106 generally involves a type of processing that requires some training, such as through the use of a neural network. Later, a multi-modal integration unit ("MMI") applies multi-modal heuristics and/or rules to determine how to produce a final multi-modal decision, which characterizes or labels some aspect of the scene based on the disparate data gathered and processed in processes 102 and 103.

Decision level integration has the advantage of being simple to implement. It can incorporate uni-modal systems that are researched, developed, and updated independently. These systems can therefore operate as pre-processors. In addition, the communication channels between the uni-modal systems and the MMI are one-directional and of low bandwidth.

However, decision level integration is limited in the level of cooperation that can be achieved between different modalities. In general, the correlation between modalities is not fully exploited, so information from one modality cannot be used to improve the decisions made on the others. For instance, when the decisions from two redundant modalities disagree, the most reliable one is taken and the other is discarded. Because the modalities compete with one another, there is no overall improvement over the results obtained with one modality, and there may even be a degradation.
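The limitation described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `decision_level_integrate`, the `(label, reliability)` tuple layout, and the sample values are hypothetical and do not come from the cited prior art.

```python
from typing import List, Tuple

# One decision per modality: (label, reliability). Layout is illustrative only.
Decision = Tuple[str, float]


def decision_level_integrate(decisions: List[Decision]) -> str:
    """Return the label of the most reliable uni-modal decision.

    All other decisions are discarded, so no modality can help another.
    """
    if not decisions:
        raise ValueError("at least one uni-modal decision is required")
    label, _ = max(decisions, key=lambda d: d[1])
    return label


# Two redundant modalities that disagree about who is in the scene.
face = ("alice", 0.55)
voice = ("bob", 0.60)
result = decision_level_integrate([face, voice])
```

In this sketch the face module's evidence plays no part in the outcome once the voice module is judged more reliable, which is the lack of cooperation the passage refers to.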

Field of the invention

The invention relates to the field of integrating signals representing data sensed from multiple sensing modalities, also known as multi-modal integration, and in particular to the integration of data from multiple sensing modalities that preprocess the data and make at least a tentative characterization or labeling of that data.

Fig. 1 shows a prior art multi-modal integration architecture.

Fig. 2 shows a prior art multi-modal integration architecture.

Fig. 3 shows a system according to the invention.

Fig. 4 shows a system according to the invention.

Fig. 5 shows a list of symbols used in describing the invention.

Fig. 6 is a flow chart illustrating the development of a candidate list.

Fig. 7 is a schematic diagram of a super-system including several MMI devices.

Fig. 8 is a flow chart illustrating the operation of the MMI.

It is an object of the invention to enhance multi-modal integration by improving the cooperative use of data among the uni-modal contributors to a multi-modal system, while retaining the advantages of preprocessing by independent uni-modal systems.

This object is achieved in that independent uni-modal systems generate sets of characterization pairs, each pair including a respective candidate characterization and a confidence level. The MMI receives and processes the sets of characterization pairs and supplies at least one final characterization of the signals. The final characterization is selected from at least one of the characterization pairs.

Alternatively, the object is achieved in that the MMI receives candidate characterization signals from the uni-modal contributors and supplies at least one control signal to them. The control signal controls processing and/or sensing, and is derived from the candidate characterization signals.

In a second alternative, the object is achieved with a training method. The method includes a training phase and a normal operating phase.

In the training phase, candidate characterization signals and ground truths are received. The candidate characterization signals are received from a plurality of previously trained sensing devices, which include trained processors, and arise from an initial physical reality setting. Training parameters are then adjusted, by evaluating optimality criteria and the candidate characterization signals, so as to arrive at the ground truth regarding the physical reality.

In the normal operating phase, further candidate characterization signals are received from the plurality of previously trained sensing devices. A tentative final characterization signal is generated. At least one control signal is then fed back to at least one of the sensing devices. The control signal is adapted to cause a change in the training and/or performance of the sensing device. The steps of the normal operating phase are repeated until a characterization criterion is met.

Additionally, the object of the invention is achieved in a uni-modal sensing device that provides characterization information upward to a multi-modal integration unit and receives multi-modal context information downward from the multi-modal integration unit.

In a related field, greater cooperation between uni-modal devices is achieved, but without preprocessing. This field will be called "feature level integration" herein. An example of this field can be found in US Pat. No. 5,586,215. Fig. 2 shows the general concept of feature level processing. Again, the scene appears at 101. At 202, at least two different types of sensing occur. Then, at 203, all of the sensed data undergo some type of feature extraction to yield a feature vector. The feature vector is then subjected to some kind of multi-modal processing at 204, and a multi-modal decision is output at 205. As before, feature extraction typically results from applying some kind of mathematical transformation or predetermined algorithm to the sensed data, while recognition is usually an operation that requires some kind of training, such as the use of a neural network.

Areas of application

There are numerous areas of application for making multi-modal decisions. One is lip-reading, where audio data for phoneme recognition are used in combination with video data in an effort to understand a speaker. Similarly, identification of a person may involve a combination of audio and video data.

Another area in which multi-modal information may be integrated is image processing, where different image aspects, for instance local or global 2-D shape, color characterizations, gray level appearance, and textural properties, may all be considered in characterizing an image. In general, sensing video data can involve gathering and characterizing information about a large number of things, for instance fingerprints and facial images, including feature locations, feature appearance, and profile shapes.

One camera may be used to gather one or more types of data about a scene, with different processing modules in a connected processor using the data in different ways. The modules that gather different types of information from the image then effectively become different sensing devices, even though they may be physically housed within a single processor.

In other applications, signals from other types of sensors may need to be combined. Other types of sensors that may be useful in multi-modal integration applications include infrared and range sensors. In addition, user entry devices, including keyboards and pointer devices such as mice, stylus type sensors, and track balls, can be used as uni-modal sensing devices. Other areas in which multi-modal integration may be useful include acoustic localization through microphone arrays and the use of echo cancellation by direct input of a known audio/noise source. Even text data may be used in some applications.

In general, one of ordinary skill in the art can devise a large number of applications for multi-modal integration.

Architecture of the system according to the invention

Figs. 3 and 4 are shown in relation to a system, such as a video conferencing system, having two video uni-modal systems and one audio uni-modal system. However, the concepts of candidate level integration are equally applicable to many other applications, for instance those listed in the preceding section.

Fig. 3 shows the architecture of a system according to the invention. Again, there is a scene 101, which is sensed by sensors 301, 301' and 302. These sensors are shown as microphones and a video camera, but they may be any sensors suited to the desired area of application, including user entry devices such as keyboards, mice, touch screens, or any other user entry device. At 303, 304 and 305, features are extracted from the signals derived from the sensors. At 306, 307 and 308, the extracted features are processed and recognized. At 310, 312 and 314, candidate decisions are presented to the MMI 317. At 309, 311 and 313, control signals are provided back down to boxes 306, 307 and 308 in the form of multi-modal context information.

In this system, two sets of features are shown being extracted from the video data, at 304 and 305. For instance, facial feature data may be extracted at 305, while gesture feature data may be extracted at 304. Boxes 305 and 308 function together as a sensing device separate from boxes 304 and 307. Thus the video camera 302 is in fact connected to two sensing devices. In other words, a single sensing element can be connected to a large number of sensing devices.

In contrast with the video portion of the system, the multiple microphones 301 and 301' in the array function with a single pair of boxes 303 and 306. Boxes 303 and 306 thus function together as a third sensing device, for instance to gather location data. More than one sensing element can therefore feed a single sensing device.

Additional sensing devices may be added, whether coupled to the existing sensing elements or to additional sensing elements. There may be a large number of sensing elements and sensing devices.

The control data fed back at 309, 311 and 313 will affect the performance and/or training of the respective sensing devices. For instance, the control signals to a video sensing device can bias which part of the picture the sensing device looks at.

In Fig. 3, the sensing devices are shown within the same processor 316 as the MMI 317. Fig. 4 shows an alternative embodiment in which the sensing devices 416, 417 and 418 are housed separately from the MMI 417. The connections 409-414 that supply the candidate decisions are now external leads.

Boxes 303-305 perform feature extraction on the data received from the scene. The output of boxes 303-305 will be in the form of feature vectors per formula (3) in Fig. 5.

Boxes 306-308 generate candidate lists in accordance with the invention.

In the prior art, generally only a single decision is made, based on the value of a discriminant function or functions. The field of discriminant functions is well developed, as described for instance in K. Fukunaga, Introduction to Statistical Pattern Recognition (2nd edition, Academic Press, 10/99). If a single discriminant function is applied, it will typically have multiple local maxima. The decision will then be the largest of those local maxima. If multiple discriminant functions are applied, or if a single discriminant function is applied repeatedly to several parts of the data, the decision will be the largest value received from all the functions or applications of the function.

In the preferred embodiment, the discriminant functions will generally be probability distributions, denoted "P" herein. However, those of ordinary skill in the art may devise other discriminant functions according to the needs of whatever area of application is chosen.

In accordance with the invention, it is preferred to supply a candidate list from the sensing devices on lines 310, 312 or 314. Each sensing device will generate a candidate list per formula (1) of Fig. 5, where

the candidate symbol of formula (1) is a variable representing a candidate from a uni-modal sensing device,

k is an index variable that numbers the uni-modal sensing devices,

i is an index variable, and

M_k is the number of candidates to be generated for uni-modal sensing device number k.
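The difference between a single prior art decision and a candidate list of length M_k can be sketched in Python as follows. This is a minimal sketch, assuming that a sensing device already holds a probability-like score per candidate; the names `CandidatePair` and `top_candidates` and the sample scores are hypothetical and are not taken from Fig. 5.

```python
from typing import Dict, List, Tuple

# (candidate characterization, confidence): an illustrative layout only.
CandidatePair = Tuple[str, float]


def top_candidates(scores: Dict[str, float], m_k: int) -> List[CandidatePair]:
    """Keep the m_k highest-scoring candidates rather than only the maximum."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:m_k]


# Hypothetical discriminant values for one uni-modal sensing device k.
scores = {"alice": 0.50, "bob": 0.30, "carol": 0.15, "dave": 0.05}

single_decision = top_candidates(scores, 1)  # prior art: one decision
candidate_list = top_candidates(scores, 3)   # candidate level: M_k = 3
```

Keeping the runners-up is what allows a later stage to prefer, say, "bob" when another modality supports that candidate.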

Fig. 6 is a flow chart showing more of the operation of the individual recognition units 306-308 within the sensing devices. The labels in the flow chart refer to formula numbers from Fig. 5.

At 601, a list of multi-modal context information, in the form of an initialized list of default values per formula (2) of Fig. 5, is received from lines 313, 311 and 309. At 602, formula (5) is applied to obtain the candidates (1). Formula (5) represents the product of the probability-based discriminant function per formula (4) and the results of formula (2) received from the MMI. At 603, some criterion is evaluated. The criterion may be that some fixed number of iterations has been completed, that no change in the candidate list (6) has occurred since the last iteration, or any other suitable criterion devised by the skilled artisan. If the criterion is not met, then at 604 the current list of candidate pairs per formula (6) is sent to the MMI 317, 417. The candidate pair list contains the candidates from formula (1) together with the confidence levels from formula (4). The candidate pair list is an example of the term "characterization pairs" used elsewhere herein, and is provided to the MMI on lines 310, 312 and 314. At 606, new multi-modal context information, based on the newly proposed candidate list, is received from the MMI in the form of formula (2), and control returns to 602.
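The loop of steps 601 to 606 can be sketched in Python as follows. This is a sketch under stated assumptions, not the formulas of Fig. 5: the context information is modeled as a per-candidate weight, the discriminant as a dictionary of likelihoods, and the MMI as a callable `get_context`; all of these names and the sample values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

CandidatePair = Tuple[str, float]  # (candidate, confidence), illustrative


def recognize(likelihood: Dict[str, float],
              get_context: Callable[[List[CandidatePair]], Dict[str, float]],
              m_k: int = 2,
              max_iters: int = 10) -> List[CandidatePair]:
    # Step 601: start from default (uniform) context information.
    context = {c: 1.0 for c in likelihood}
    previous: List[CandidatePair] = []
    for _ in range(max_iters):
        # Step 602: weight each likelihood by the context value for it.
        scored = {c: p * context.get(c, 1.0) for c, p in likelihood.items()}
        ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
        current = ranked[:m_k]
        # Step 603: stop once the candidate list no longer changes.
        if current == previous:
            break
        previous = current
        # Steps 604 and 606: send the list up, receive new context back.
        context = get_context(current)
    # Step 605: the final candidate list goes to the MMI.
    return previous


def stub_mmi(pairs: List[CandidatePair]) -> Dict[str, float]:
    """Hypothetical MMI whose other modalities disfavor 'alice'."""
    return {"alice": 0.5, "bob": 1.0, "carol": 1.0}


final_list = recognize({"alice": 0.5, "bob": 0.4, "carol": 0.1}, stub_mmi)
```

With uniform context the device ranks "alice" first; after one round of feedback "bob" moves to the top, and the loop stops when the list repeats.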

If the criterion is met, then at 605 the final set of candidates, in the form of formula (6), is sent to the MMI. The MMI 317, 417 in turn performs an evaluation of all combinations of candidates from the uni-modal sensing devices. Fig. 8 shows a flow chart of the operation of the MMI.

At 801, candidate pair lists per formula (6) are received from the uni-modal sensing devices. Each uni-modal sensing device k generates a list of candidate pairs per formula (6).

At 802, a list of combinations of uni-modal candidates is formed as shown in formula (7). The total number of combinations is L, and the index that numbers the combinations is c.

Each combination of candidates generally includes one uni-modal candidate from each of the uni-modal sensing devices. Each combination of uni-modal candidates is used to generate a multi-modal characterization c^* of the scene. The multi-modal characterization may be the same as one of the characterizations (1) coming from the uni-modal sensing devices. Alternatively, the multi-modal characterization may characterize some combination pattern derived from the patterns recognized by the uni-modal devices.
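Forming one combination per choice of a candidate from each device amounts to a Cartesian product over the candidate lists, which can be sketched in Python as follows. The device names and candidate values are hypothetical; only the structure (one candidate per device, L combinations in total) comes from the description.

```python
from itertools import product
from math import prod

# Hypothetical candidate pair lists from two uni-modal sensing devices.
lists = {
    "face": [("alice", 0.6), ("bob", 0.4)],
    "voice": [("bob", 0.7), ("alice", 0.2), ("carol", 0.1)],
}

# One combination takes exactly one candidate pair from each device.
combinations = list(product(*lists.values()))

# The total number of combinations L is the product of the list lengths M_k.
L = prod(len(pairs) for pairs in lists.values())
```

Here L is 2 x 3 = 6, so the cost of evaluating every combination grows multiplicatively with the number of devices and the length of each list.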

The multi-modal characterizations are analyzed according to a multi-modal discriminant function (8). This function evaluates the product of a) the super multi-modal context information P(c), and b) the product of all the probabilities of all the uni-modal decisions per formula (4) with a probability function applied to each combination. Analogously to the uni-modal systems, the super multi-modal context information P(c) will first be initialized to some default value. Advantageously, the value of P(c) can then be modified based on information received from a level higher than the MMI. The modified value will then be supplied from the higher level as new super multi-modal context information.
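The product structure described for the multi-modal discriminant function can be sketched in Python as follows. Formula (8) itself appears only in Fig. 5, so this is an illustration of the prose, not a reconstruction of the formula: `prior` stands in for P(c), `compat` for the probability function applied to each combination, and both, along with the sample values, are hypothetical.

```python
from itertools import product
from math import prod
from typing import Callable, List, Sequence, Tuple

CandidatePair = Tuple[str, float]
Labels = Tuple[str, ...]


def score_combinations(lists: Sequence[List[CandidatePair]],
                       prior: Callable[[Labels], float],
                       compat: Callable[[Labels], float]
                       ) -> List[Tuple[Labels, float]]:
    """Score every combination as prior * compat * product of confidences."""
    scored = []
    for combo in product(*lists):
        labels = tuple(label for label, _ in combo)
        confidence = prod(conf for _, conf in combo)
        scored.append((labels, prior(labels) * compat(labels) * confidence))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


face = [("alice", 0.6), ("bob", 0.4)]
voice = [("bob", 0.7), ("alice", 0.3)]
ranked = score_combinations(
    [face, voice],
    prior=lambda labels: 1.0,  # default context value
    compat=lambda labels: 1.0 if len(set(labels)) == 1 else 0.1,
)
```

In this example the face device alone would choose "alice", yet the agreeing combination ("bob", "bob") scores highest, which shows how one modality's runner-up can win once the other modality is taken into account.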

Based on the analysis described in formula (8), super candidates are selected to be supplied from the MMI. These are a subset {c^*} of the possible combinations (7). The super candidates will be provided as another list of characterization pairs. These characterization pairs will have the format of formula (9).

Then, at 803, a criterion is tested. This criterion may be a number of iterations, a lack of change in the output (2) since the last iteration, a lack of change in the multi-modal candidate pairs (9) since the last iteration, or any other suitable criterion devised by the skilled artisan. If the criterion is not met, multi-modal context information per formula (2) is sent to the individual uni-modal devices at 804. The values sent to a uni-modal device will typically vary according to the type of data that the device gathers.
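One way the MMI might derive device-specific context information from its scored combinations is sketched below in Python. The description does not say how the values of formula (2) are computed, so the normalized marginal used here is purely an assumption for illustration; the function name `context_for_device` and the sample scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Tuple[str, ...]


def context_for_device(scored: List[Tuple[Labels, float]],
                       k: int) -> Dict[str, float]:
    """Assumed rule: sum combination scores per candidate of device k."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, score in scored:
        totals[labels[k]] += score
    norm = sum(totals.values())
    if norm == 0.0:
        # No evidence either way: fall back to neutral context values.
        return {label: 1.0 for label in totals}
    return {label: total / norm for label, total in totals.items()}


# Hypothetical scored combinations of (face label, voice label).
scored = [
    (("bob", "bob"), 0.28),
    (("alice", "alice"), 0.18),
    (("alice", "bob"), 0.042),
    (("bob", "alice"), 0.012),
]
face_context = context_for_device(scored, 0)
```

Each device receives values over its own candidates only, which is consistent with the statement that the values sent will vary with the type of data the device gathers.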

Fig. 7 shows a system having a super MMI 701. In this case there are three MMIs 702-704, each corresponding to the MMI 317, 417 discussed before. Each MMI is coupled with a plurality of uni-modal sensing devices 705. The MMIs 702-704 send super candidate lists, i.e. characterization pairs per formula (9), to the super MMI 701 via 707, and receive super multi-modal context information P(c) from the super MMI 701 via 706. The super MMI can generate further characterization pairs at 708, and thus can itself be part of a super-super-MMI system with a further level of hierarchy. The super MMI 701 operates analogously to an MMI, treating the MMIs as the MMIs treat the uni-modal sensing devices.

In Fig. 7 there are three MMIs 702-704, each having three uni-modal sensing devices 705. However, one of ordinary skill in the art will understand that there may be other numbers of components. For instance, a super MMI may be coupled with at least one MMI and at least one free-standing uni-modal sensing device. Alternatively, there may be two MMIs, each fed by two uni-modal sensing devices, and so on.
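Because every level consumes and produces lists of characterization pairs, the hierarchy can be sketched in Python with a single function reused at each level. This is a simplified illustration: the agreement rule inside `integrate` is a hypothetical stand-in for the discriminant function, the context feedback path is omitted, and all values are made up.

```python
from itertools import product
from math import prod
from typing import Dict, List, Sequence, Tuple

CandidatePair = Tuple[str, float]


def integrate(child_lists: Sequence[List[CandidatePair]],
              top: int = 2) -> List[CandidatePair]:
    """Combine the children's candidate lists into a new list of pairs."""
    scored: Dict[str, float] = {}
    for combo in product(*child_lists):
        labels = {label for label, _ in combo}
        if len(labels) == 1:  # illustrative rule: all children must agree
            scored[labels.pop()] = prod(conf for _, conf in combo)
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top]


# Two MMIs, each fed by two hypothetical uni-modal sensing devices.
mmi_a = integrate([[("alice", 0.6), ("bob", 0.4)],
                   [("alice", 0.5), ("bob", 0.5)]])
mmi_b = integrate([[("bob", 0.8), ("alice", 0.2)],
                   [("bob", 0.6), ("alice", 0.4)]])

# The super MMI treats the MMI outputs exactly like sensing device outputs.
super_out = integrate([mmi_a, mmi_b])
```

Since the output of `integrate` has the same shape as its inputs, further levels, or a mix of MMIs and free-standing sensing devices, can be stacked without changing the interface.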

From reading the present disclosure, other modifications will be apparent to persons skilled in the art. Such modifications may involve other features which are already known in the design, manufacture and use of systems for recognition of sensed data, and which may be used instead of or in addition to features already described herein. Although claims have been formulated in this application to particular combinations of features, it should be understood that the scope of the disclosure of the present application also includes any novel feature or novel combination of features disclosed herein, either explicitly or implicitly, or any generalization thereof, whether or not it mitigates any or all of the same technical problems as does the present invention. The applicants hereby give notice that new claims may be formulated to such features during the prosecution of the present application or of any further application derived therefrom.

The words "comprising", "comprise" or "comprises" as used herein should not be viewed as excluding additional elements. The singular article "a" or "an" as used herein should not be viewed as excluding a plurality of elements.

Claims

1. A multi-modal integration unit (317, 417, 702, 703, 704) comprising:

means (310, 312, 314, 410, 412, 414) for receiving, from each of a plurality of sensing devices (416-418, 303-308), a respective set (6) of characterization pairs, each pair including a respective candidate characterization (1) and a respective confidence indication (4) relating to that candidate characterization, the characterization pairs resulting from preprocessing in the sensing devices; and

means for processing the sets of characterization pairs and for supplying (315) at least one final characterization (9) of signals received at the sensing devices, the final characterization being selected from at least one of the characterization pairs.

2. A data processing system comprising:

a plurality of sensing devices comprising:

at least one sensing element (301, 301′, 302) adapted to receive input signals indicative of physical reality, and

a plurality of processors or processes (303-308, 416-418) adapted to supply, for each sensing device, a respective characterization signal that characterizes the input signals, wherein the respective characterization signal from each sensing device comprises a set of characterization pairs, each characterization pair comprising a respective candidate characterization and a respective confidence indication for that candidate characterization; and

a multi-modal integration unit as claimed in claim 1.

3. The system of claim 2, wherein the multi-modal integration unit employs a discriminant function to process the sets of characterization pairs.

4. The system of claim 3, wherein the discriminant function is a probability distribution.
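For illustration only, and not as part of the claims, the candidate-level integration of claims 1 to 4 can be sketched as follows. The sketch assumes a particular discriminant function, a normalized product of the per-device confidences, which treats the sensing devices as independent; the claims do not prescribe this choice. The names `integrate` and `floor` are hypothetical.

```python
from math import prod


def integrate(candidate_lists, floor=1e-6):
    """Select a final characterization from per-device (candidate, confidence) lists.

    candidate_lists: one list of (candidate, confidence) pairs per sensing device.
    floor: assumed confidence for a candidate that a device did not list.
    """
    # Every candidate proposed by any device is eligible as the final result.
    candidates = {c for pairs in candidate_lists for c, _ in pairs}
    tables = [dict(pairs) for pairs in candidate_lists]

    # Assumed discriminant: product of per-device confidences.
    scores = {c: prod(t.get(c, floor) for t in tables) for c in candidates}

    # Normalize so the scores form a probability distribution over candidates.
    total = sum(scores.values())
    posterior = {c: s / total for c, s in scores.items()}

    best = max(posterior, key=posterior.get)
    return best, posterior


face = [("alice", 0.6), ("bob", 0.4)]
voice = [("bob", 0.7), ("carol", 0.3)]
best, posterior = integrate([face, voice])
print(best)
```

Because the final characterization is drawn from the union of the candidate lists, a candidate ranked second by one device can still win when the other devices support it, which is what distinguishes this from integrating single decisions.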

5. A super-system comprising:

at least one system as claimed in claim 2, wherein the at least one final characterization from each system comprises a further set of respective characterization pairs (9);

if there is only one such system, at least one other uni-modal sensing device; and

at least one super multi-modal integration unit (701) adapted to receive and process the further sets of characterization pairs, along with any signals from the at least one other sensing device, and to supply a super-final characterization (708) of at least one of the signals, the super-final characterization being selected from at least one characterization pair of the further sets of characterization pairs.

6. A multi-modal integration unit (702-704, 317, 417) comprising:

means (314, 312, 310, 410, 412, 414) for receiving respective candidate characterization signals (1), (6) from each of a plurality of sensing devices (303-308, 416-418) having preprocessing capability, the candidate characterization signals characterizing physical reality;

means (409, 411, 413, 309, 311, 313) for supplying at least one control signal to the sensing devices for controlling processing and/or sensing at the sensing devices; and

means for processing the candidate characterization signals to derive therefrom at least one final characterization signal and the at least one control signal.

7. A data processing system comprising:

a plurality of sensing devices, each comprising:

at least one sensing element (301, 301′, 302) adapted to receive input signals indicative of physical reality, and

at least one processor or process (303-308, 416-418) adapted to supply (414, 412, 410) at least one candidate characterization signal, representing a characterization of the physical reality, based on the input signals, and to receive (413, 411, 409) control signals for controlling processing and/or sensing; and

the multi-modal integration unit of claim 6.

8. The system of claim 7, wherein the control signals relate to biasing the selection of a physical entity.

9. The system of claim 7, wherein the control signals take the form of feedback from the multi-modal integration unit to the sensing devices.

10. The system of claim 7, wherein each candidate characterization signal comprises a respective candidate list ((1)) from each sensing device.

11. The system of claim 10, wherein each candidate list is a set of characterization pairs (6), each characterization pair comprising a respective candidate characterization (1) and a respective confidence indication (4) relating to that candidate characterization.

12. The system of claim 7, wherein the multi-modal integration unit employs a discriminant function for processing the respective candidate characterization signals.

13. The system of claim 12, wherein the discriminant function is a probability distribution.
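For illustration only, and not as part of the claims, the feedback arrangement of claims 6 to 13 can be sketched as an iterative loop. The sketch assumes that the control signal is the integration unit's current distribution over candidates, that each sensing device uses it to reweight its own raw scores, and that iteration stops once one candidate exceeds a confidence threshold. The names `rescore`, `fuse`, `run_feedback`, `threshold`, and `max_rounds` are hypothetical.

```python
from math import prod


def rescore(raw_scores, bias):
    """Sensing-device side: reweight raw candidate scores by the fed-back bias."""
    weighted = {c: s * bias.get(c, 1.0) for c, s in raw_scores.items()}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


def fuse(lists, floor=1e-6):
    """Integration-unit side: normalized product of per-device confidences (assumed)."""
    candidates = {c for scores in lists for c in scores}
    scores = {c: prod(s.get(c, floor) for s in lists) for c in candidates}
    total = sum(scores.values())
    return {c: v / total for c, v in scores.items()}


def run_feedback(raw_by_device, threshold=0.9, max_rounds=10):
    """Alternate device rescoring and integration until one candidate dominates."""
    bias = {}  # no multi-modal context is available on the first round
    for round_no in range(1, max_rounds + 1):
        lists = [rescore(raw, bias) for raw in raw_by_device]
        posterior = fuse(lists)
        best = max(posterior, key=posterior.get)
        if posterior[best] >= threshold:
            break
        bias = posterior  # control signal fed back to every sensing device
    return best, posterior[best], round_no


devices = [{"alice": 0.5, "bob": 0.5}, {"alice": 0.4, "bob": 0.6}]
best, confidence, rounds = run_feedback(devices)
print(best, rounds)
```

In this toy setting the feedback only sharpens an existing preference; in a real device the control signal could instead change what is sensed, for example by restricting processing to part of the input.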

14. A super-system comprising:

at least one system as claimed in claim 12;

if there is only one such system, at least one other uni-modal sensing device; and

at least one super multi-modal integration unit (701) adapted to receive and process the at least one final characterization signal (9), 707 from the at least one system, along with any signals from the at least one other uni-modal sensing device, and to derive therefrom at least one super-final characterization signal.

15. A sensing device suitable for use in a multi-modal integration system as claimed in claim 7, the sensing device comprising:

connection means (409-414, 309-314) for bidirectional communication with a multi-modal integration unit (317, 417, 702-704); and

at least one processor or process (303-308, 416-418) adapted to:

receive signals indicative of physical reality from at least one sensing element (301, 301′, 302),

receive control signals (409, 411, 413, 309, 311, 313) from the multi-modal integration unit for controlling processing and/or sensing, and

supply to the multi-modal integration unit, in response to the control signals, signals (410, 412, 414, 310, 312, 314) representing a list of candidate characterizations of the physical reality.

16. The device of claim 15, wherein the signals indicative of physical reality comprise video data.

17. The device of claim 16, wherein the control signals bias the video data toward a portion of the field of view.

18. A method of training a data processing system, comprising performing the following operations in at least one data processing device:

performing a training phase comprising:

receiving candidate characterization signals from a plurality of previously trained sensing devices, the devices comprising trained processors, the candidate characterization signals being derived from an initial set of physical realities;

retrieving signals representing ground truth about the physical realities; and

adjusting training parameters to achieve the ground truth by evaluating an optimality criterion and the candidate characterization signals; and

after completing the training phase, performing a normal operation phase comprising:

receiving further candidate characterization signals from the plurality of previously trained sensing devices;

generating a provisional final characterization signal;

feeding back at least one control signal to at least one of the sensing devices, the control signal being adapted to cause a change in training and/or performance of the at least one sensing device; and

repeating the receiving, generating, and feeding back until a characterization criterion is met.
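For illustration only, and not as part of the claims, the training phase of claim 18 can be sketched as a parameter search. The sketch assumes that the training parameters are one weight per sensing device in a weighted product of confidences, that the optimality criterion is accuracy against ground truth, and that the search is a small grid search. None of these choices comes from the claims, and the names `fuse`, `train`, and `grid` are hypothetical. The normal operation phase with feedback is not shown.

```python
from itertools import product


def fuse(lists, weights, floor=1e-6):
    """Pick the candidate with the highest weighted product of confidences."""
    candidates = {c for pairs in lists for c, _ in pairs}
    tables = [dict(pairs) for pairs in lists]

    def score(candidate):
        s = 1.0
        for table, weight in zip(tables, weights):
            s *= table.get(candidate, floor) ** weight
        return s

    # Sorting makes tie-breaking deterministic.
    return max(sorted(candidates), key=score)


def train(examples, grid=(0.0, 0.5, 1.0)):
    """Choose per-device weights that best reproduce the ground truth.

    examples: list of (candidate_lists, ground_truth) from pre-trained devices.
    """
    n_devices = len(examples[0][0])

    def accuracy(weights):
        hits = sum(fuse(lists, weights) == truth for lists, truth in examples)
        return hits / len(examples)

    return max(product(grid, repeat=n_devices), key=accuracy)


# Toy data: the first device is reliable, the second is misleading.
examples = [
    ([[("a", 0.6), ("b", 0.4)], [("a", 0.1), ("b", 0.9)]], "a"),
    ([[("a", 0.3), ("b", 0.7)], [("a", 0.8), ("b", 0.2)]], "b"),
]
weights = train(examples)
print(weights)
```

On the toy data the search assigns the unreliable device a lower weight than the reliable one, which is the intended effect of adjusting the integration parameters against ground truth.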