CN101625675B - Information processing device, information processing method and computer program - Google Patents

Information processing device, information processing method and computer program

Info

Publication number
CN101625675B
CN101625675B · CN2009101588467A · CN200910158846A
Authority
CN
China
Prior art keywords
information
target
user
probability
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009101588467A
Other languages
Chinese (zh)
Other versions
CN101625675A (en)
Inventor
泽田务
小川浩明
山田敬一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corp filed Critical Sony Corp
Publication of CN101625675A
Application granted
Publication of CN101625675B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/06 Decision making techniques; Pattern matching strategies
    • G10L 17/10 Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Collating Specific Patterns (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Image Analysis (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

The invention discloses an information processing apparatus, an information processing method, and a computer program. The information processing apparatus includes an information input unit, an event detection unit, and an information integration processing unit. Observed values including user identification data are obtained based on image and sound information from information input units such as a video camera and microphones. Target data in which a plurality of user confidences are set is updated, and users are thereby identified. The user identification information in the observed values is used to update the joint probability of candidate data associating targets with the respective users, and the updated probability values are used to calculate the user confidences corresponding to the targets.

Description

Information processing apparatus, information processing method, and computer program
Technical field
The present invention relates to an information processing apparatus, an information processing method, and a computer program. More particularly, the present invention relates to an information processing apparatus that receives information input from the outside world, for example information such as images and sound, and analyzes the external environment based on the input information, specifically analyzing the position and identity of a speaking person. The present invention further relates to an information processing method for executing such analysis processing in the information processing apparatus, and to a computer program for causing the information processing apparatus to execute the analysis processing.
Background art
A system that performs interactive processing, such as communication, between a person and an information processing apparatus such as a PC or a robot is called a man-machine interaction system. In a man-machine interaction system, image information or sound information is input to the information processing apparatus such as a PC or a robot, and the apparatus performs analysis based on the input information in order to recognize human actions, for example a person's motion and speech.
When a person communicates information, the person uses not only language but also various channels, such as gaze and facial expression, as communication channels. If a machine could analyze all such channels, communication between a person and a machine could reach the same level as communication between people. An interface that analyzes input information from such multiple channels (also referred to as modalities or modes) is called a multimodal interface, and multimodal interfaces have been actively developed and researched in recent years.
For example, when image information captured by a video camera and sound information acquired by microphones are input and analyzed, inputting a large amount of information from a plurality of video cameras and a plurality of microphones arranged at different points is effective for performing more detailed analysis.
As a specific system, for example, the following system is conceivable. A system can be realized in which an information processing apparatus (a television set) receives, via a video camera and microphones, the images and sound of the users in front of the television set (father, mother, sister, and brother), analyzes, for example, where each user is located and which user is speaking, and performs processing corresponding to the analyzed information, such as zooming the camera in on the speaking user or responding accurately to the speaking user.
Most conventional man-machine interaction systems perform processing for deterministically integrating information from multiple channels (modalities) and determining where each of a plurality of users is located, who the users are, and who issued a signal. Examples of related art disclosing such systems include Japanese Unexamined Patent Application Publication No. 2005-271137 and Japanese Unexamined Patent Application Publication No. 2002-264051.
However, the deterministic integration method used in conventional systems, which handles uncertain and asynchronous data input from microphones and video cameras, lacks robustness, and only data of low accuracy is obtained with this method. In an actual system, the sensor information that can be acquired in a real environment, that is, the images input from a camera and the sound information input from microphones, is uncertain data containing various kinds of extraneous information such as noise and unneeded information. When image analysis and speech analysis are performed, it is important to perform processing for efficiently integrating effective information from such sensor information.
Summary of the invention
It is therefore an object of the present invention to provide an information processing apparatus, an information processing method, and a computer program for improving robustness and performing highly accurate analysis in a system that analyzes input information from multiple channels (modalities or modes), specifically a system that performs processing for identifying the persons around it, by performing probabilistic processing on the uncertain information contained in various kinds of input information such as images and sound and integrating the information into information estimated to be of higher accuracy.
It is a further object of the present invention to provide an information processing apparatus, an information processing method, and a computer program for statistically integrating the uncertain and asynchronous position information and identification information provided by a plurality of modalities. When hypothesizing where a plurality of targets are located and who they are, the joint probability of the user identifiers of all the targets can be calculated while excluding the independence between the targets. The information processing apparatus, information processing method, and computer program can thereby improve the estimation performance for user identification and perform highly accurate analysis.
A first embodiment of the present invention is an information processing apparatus that includes a plurality of information input units, an event detection unit, and an information integration processing unit.
The plurality of information input units are provided for inputting information including image information or sound information of a real space.
The event detection unit is provided for generating event information, including estimated identification information of users present in the real space, by analyzing the information input from the information input units.
The information integration processing unit is provided for setting probability distribution data of hypotheses concerning the identification information of the users and executing processing for identifying the users present in the real space by updating and selecting the hypotheses based on the event information.
Based on the user identification information included in the event information, the information integration processing unit executes processing for updating target data including user confidence information, which indicates which user corresponds to each target set as a candidate event generation source.
In the processing for updating the target data, the information integration processing unit executes processing for calculating the user confidences by applying the constraint that the same user does not exist simultaneously as a plurality of targets.
In the information processing apparatus of the embodiment of the invention, the information integration processing unit updates the joint probability of candidate data in which the targets are associated with the respective users, based on the user identification information included in the event information. The information integration processing unit then applies the updated values of the joint probability to the processing for calculating the user confidences corresponding to the targets, and executes such processing.
Further, in the information processing apparatus of the embodiment of the invention, the information integration processing unit marginalizes the joint probability values updated based on the user identification information included in the event information, to calculate the confidence of the user identifier corresponding to each target.
Further, in the information processing apparatus of the embodiment of the invention, the information integration processing unit performs initial setting of the joint probability of the candidate data in which the targets are associated with the respective users, under the constraint that the same user identifier (uID) is not assigned to a plurality of targets. Specifically, the probability value P(Xu) of the joint probability is set to P(Xu) = 0.0 for candidate data in which the same user identifier (uID) is assigned to different targets, and to 0.0 < P(Xu) ≤ 1.0 for the other candidate data.
In the information processing apparatus of the embodiment of the invention, the information integration processing unit additionally executes exception setting processing. That is, the exception setting is as follows: for unregistered users to which the user identifier [uID = unknown] is assigned, the probability value of the joint probability is kept at 0.0 < P(Xu) ≤ 1.0 even when the same identifier [uID = unknown] is assigned to different targets. A sketch of this initialization appears below.
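The following Python sketch is illustrative only and not part of the patent disclosure: it enumerates candidate joint states Xu for n targets over the registered user identifiers plus an assumed UNKNOWN marker, zeroes the states that duplicate a registered uID, and spreads the remaining mass uniformly. All names (`init_joint_probability`, `UNKNOWN`) are hypothetical.

```python
from itertools import product

UNKNOWN = -1  # assumed marker for an unregistered user (uID = unknown)

def init_joint_probability(n_targets, user_ids):
    """Initial P(Xu) over joint assignments Xu = (xu_1, ..., xu_n).

    States assigning the same registered uID to two targets get
    P(Xu) = 0.0; repeated UNKNOWN labels are allowed, implementing
    the exception setting for unregistered users.
    """
    states = list(product(user_ids + [UNKNOWN], repeat=n_targets))

    def is_valid(state):
        registered = [u for u in state if u != UNKNOWN]
        return len(registered) == len(set(registered))

    valid = [s for s in states if is_valid(s)]
    mass = 1.0 / len(valid)
    return {s: (mass if is_valid(s) else 0.0) for s in states}

# Example: two targets, registered users uID = 0, 1, 2
P = init_joint_probability(2, [0, 1, 2])
assert P[(0, 0)] == 0.0             # same registered uID on both targets
assert P[(UNKNOWN, UNKNOWN)] > 0.0  # exception: unknown may repeat
```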
Further, in the information processing apparatus of the embodiment of the invention, the information integration processing unit may delete the candidate data in which the same user identifier (uID) is assigned to different targets, retain only the other candidate data, and set only the retained candidate data as the object of updating based on the event information.
In the information processing apparatus of the embodiment of the invention, the information integration processing unit further calculates the probability values using the following formula:

P(Xu^t | θ^t, zu^t, Xu^{t-1}) = R × P(θ^t, zu^t | Xu^t) × P(Xu^{t-1} | Xu^t) × P(Xu^t) / P(Xu^{t-1})

where R represents a normalization term. This formula is derived under the following assumptions:

the probability P(θ^t, zu^t | Xu^t), in which the observed value (zu^t) is the event information corresponding to the identification information obtained at time t and the target (θ) is set as the event generation source, is assumed to be non-uniform when the processing for calculating the joint probability is executed; and

the target information [Xu], which represents the states {xu^t_1, xu^t_2, ..., xu^t_n} of the user identification information included in the target data at time t, is assumed to be uniform.
In the information processing apparatus of the embodiment of the invention, the information integration processing unit further executes marginalization processing on the probability values P(Xu), using the formula P(xu_i) = Σ_{Xu: Xu_i = xu_i} P(Xu), in which the joint probability represents the confidence corresponding to each target. In this formula, i represents the target identifier (tID) for which the probability, that is, the user confidence of a user identifier, is calculated. The information integration processing unit uses this formula to calculate the probability representing the confidence of the user identifier corresponding to each target.
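Continuing the assumed joint table `P` from the sketch above, the following lines illustrate one identification-event update (the renormalization plays the role of the R term) followed by the marginalization P(xu_i) = Σ_{Xu: Xu_i = xu_i} P(Xu); the likelihood values are invented for illustration and do not reproduce the exact patented computation.

```python
def update_joint(P, target_idx, likelihood):
    """Reweight each joint state by the likelihood of the uID it
    assigns to the event-source target, then renormalize (term R)."""
    updated = {s: p * likelihood.get(s[target_idx], 0.0) for s, p in P.items()}
    total = sum(updated.values())
    return {s: p / total for s, p in updated.items()}

def marginal(P, target_idx):
    """P(xu_i): marginalize the joint table onto one target's uID."""
    out = {}
    for state, p in P.items():
        out[state[target_idx]] = out.get(state[target_idx], 0.0) + p
    return out

# A speaker-identification event pointing strongly at uID = 1 for tID = 0
P = update_joint(P, target_idx=0,
                 likelihood={0: 0.1, 1: 0.8, 2: 0.1, UNKNOWN: 0.05})
print(marginal(P, target_idx=0))  # user confidences for target tID = 0
```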
Further, in the information processing apparatus of the embodiment of the invention, the information integration processing unit executes processing for marginalizing the joint probability values set for candidate data that include a target to be deleted onto the candidate data retained after the target deletion, and executes processing for normalizing the total of the joint probability values set for all candidate data to 1 (one).
Further, in the information processing apparatus of the embodiment of the invention, when a target is newly generated and added, the information integration processing unit executes processing for assigning the states corresponding to the number of users to the candidate data that increase through the target addition. The information integration processing unit then executes processing for distributing the joint probability values set for the existing candidate data to the added candidate data, and subsequently executes processing for normalizing the total of the joint probability values set for all candidate data to 1 (one).
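The bookkeeping for target deletion and addition described above can be pictured as follows, again as an assumption-level sketch reusing the joint table of the previous sketches: deletion marginalizes the deleted component out of every state, addition expands each state with the candidate user labels, and both operations end with renormalization to 1.

```python
def delete_target(P, target_idx):
    """Delete one target: marginalize its uID component out of every
    state, then renormalize the total probability to 1."""
    out = {}
    for state, p in P.items():
        reduced = state[:target_idx] + state[target_idx + 1:]
        out[reduced] = out.get(reduced, 0.0) + p
    total = sum(out.values())
    return {s: p / total for s, p in out.items()}

def add_target(P, user_ids):
    """Add one target: expand each state with every candidate uID,
    distributing the existing mass, and renormalize to 1."""
    out = {}
    labels = user_ids + [UNKNOWN]
    for state, p in P.items():
        for u in labels:
            new_state = state + (u,)
            registered = [x for x in new_state if x != UNKNOWN]
            if len(registered) == len(set(registered)):  # keep the constraint
                out[new_state] = out.get(new_state, 0.0) + p / len(labels)
    total = sum(out.values())
    return {s: p / total for s, p in out.items()}
```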
A second embodiment of the present invention is an information processing method executed in an information processing apparatus. The method includes the following steps: inputting, by an information input unit, information including image information or sound information of a real space to an event detection unit; and causing the event detection unit to generate event information, including estimated identification information of users present in the real space, by analyzing the information input from the information input unit. The method further includes executing information integration processing, in which an information integration processing unit sets probability distribution data of hypotheses concerning the user identification information and executes processing for identifying the users present in the real space by updating and selecting the hypotheses based on the event information. Here, the information integration processing step includes a substep of executing, based on the user identification information included in the event information, processing for updating target data. In this case, the target data is target data including user confidence information, which indicates which user corresponds to each target set as a candidate event generation source. In the processing for updating the target data, the information integration processing unit also executes processing for calculating the user confidences by applying the constraint that the same user does not exist simultaneously as a plurality of targets.
In the information processing method of the embodiment of the invention, the information integration step updates the joint probability of candidate data in which the targets are associated with the respective users, based on the user identification information included in the event information. The information integration step then applies the updated values of the joint probability to the processing for calculating the user confidences corresponding to the targets, and executes such processing.
An embodiment of the present invention is a computer program for causing an information processing apparatus to execute information processing. The computer program includes the following steps: inputting, by an information input unit, information including image information or sound information of a real space to an event detection unit; and generating event information, including estimated identification information of users present in the real space, by analyzing the information input from the information input unit. The computer program further includes executing information integration processing, in which an information integration processing unit sets probability distribution data of hypotheses concerning the user identification information and executes processing for identifying the users present in the real space by updating and selecting the hypotheses based on the event information. Here, the information integration processing step includes executing, based on the user identification information included in the event information, processing for updating target data. In this case, the target data is target data including user confidence information, which indicates which user corresponds to each target set as a candidate event generation source. In the processing for updating the target data, the information integration processing unit also executes processing for calculating the user confidences by applying the constraint that the same user does not exist simultaneously as a plurality of targets.
In the computer program of the embodiment of the invention, the information integration step updates the joint probability of candidate data in which the targets are associated with the respective users, based on the user identification information included in the event information, applies the updated values of the joint probability to the processing for calculating the user confidences corresponding to the targets, and executes such processing.
The computer program according to the embodiment of the invention is, for example, a computer program that can be provided to a general-purpose computer system capable of executing various program codes, via a storage medium provided in a computer-readable format or via a communication medium. By providing such a program in a computer-readable format, processing corresponding to the program is realized on the computer system.
Other objects, features, and advantages of the present invention will be apparent from the more detailed description based on the embodiments of the invention described later and the accompanying drawings. In this specification, a system is a logical group configuration of a plurality of devices and is not limited to a system in which the devices of the respective configurations are provided in the same housing.
According to any embodiment of the present invention, event information including user identification data is input based on image information or sound information acquired by a video camera or microphones, and updating of target data in which a plurality of user confidences are set is executed to generate user identification information.
The joint probability of candidate data in which the targets are associated with the respective users is updated based on the user identification information included in the event information, and the updated values of the joint probability are then used for calculating the user confidences corresponding to the targets. Therefore, the processing for user identification can be performed efficiently and with high accuracy, without making erroneous hypotheses such as mistaking different targets for the same user.
Description of drawings
Fig. 1 is a diagram illustrating an overview of the processing executed by the information processing apparatus according to an embodiment of the invention;
Fig. 2 is a diagram illustrating the structure and processing of the information processing apparatus according to the embodiment;
Figs. 3A and 3B are diagrams for explaining an example of information generated by the sound event detection unit 122 or the image event detection unit 112 and input to the sound/image integration processing unit 131;
Figs. 4A to 4C are diagrams illustrating a basic processing example applying a particle filter;
Fig. 5 is a diagram illustrating the particle structure set in the processing example;
Fig. 6 is a diagram illustrating the target data structure of each target included in each particle;
Fig. 7 is a flowchart for explaining the processing sequence executed by the sound/image integration processing unit 131;
Fig. 8 is a diagram illustrating details of the processing for calculating a target weight [W_tID];
Fig. 9 is a diagram illustrating details of the processing for calculating a particle weight [W_pID];
Fig. 10 is a diagram illustrating details of the processing for calculating a particle weight [W_pID];
Fig. 11 is a diagram illustrating a processing example for calculating the prior probability P when the number of targets is n = 2 (target IDs tID = 0 to 1) and the number of registered users is k = 3 (user IDs uID = 0 to 2);
Fig. 12 is a diagram illustrating a processing example for calculating the state transition probability P when the number of targets is n = 2 (tID = 0 to 1) and the number of registered users is k = 3 (uID = 0 to 2);
Fig. 13 is a diagram illustrating an example of the transition of the probability values, that is, the user confidences, of the user identifiers (uID = 0 to 2) corresponding to the target IDs (tID = 2, 1, 0) when observed information is observed in order in a processing example that maintains the independence between the targets;
Fig. 14 is a diagram illustrating the marginalization result obtained through the processing shown in Fig. 13;
Fig. 15 is a diagram illustrating an example of initial setting performed under the constraint that "the same user identifier (uID) is not assigned to a plurality of targets" when the number of targets is n = 3 (tID = 0 to 2) and the number of registered users is k = 3 (uID = 0 to 2);
Fig. 16 is a diagram illustrating an example of the analysis processing according to the embodiment of the invention, in which the independence between the targets is excluded and the constraint that "the same user identifier (uID) is not assigned to a plurality of targets" is applied;
Fig. 17 is a diagram illustrating the marginalization result obtained by the processing shown in Fig. 16;
Fig. 18 is a diagram illustrating a processing example of deleting states in which at least one xu (user identifier) overlaps another xu;
Fig. 19 is a diagram illustrating the process for deleting a target in the sound/image integration processing unit 131;
Fig. 20 is a diagram illustrating a processing example when a target (tID = 0) is deleted from three targets (tID = 0, 1, 2);
Fig. 21 is a diagram illustrating the processing for generating a new target in the sound/image integration processing unit 131;
Fig. 22 is a diagram illustrating a processing example when a target (tID = 3) is newly generated and added to two targets (tID = 1, 2); and
Fig. 23 is a flowchart illustrating the processing sequence when the analysis processing in which the independence between the targets is excluded is executed.
Embodiments
Hereinafter, an information processing apparatus, an information processing method, and a computer program according to embodiments of the present invention are described in detail with reference to the accompanying drawings. The embodiments of the invention are based on the configuration disclosed in Japanese Patent Application No. 2007-193930, an earlier application filed by the same applicant as the present application. With respect to the configuration disclosed in Japanese Patent Application No. 2007-193930, the embodiments of the invention further improve the estimation performance for identifying users by excluding the independence between the targets.
Hereinafter, the embodiments of the invention are described in the following order:
(1) processing for obtaining user position information and user identification information by updating hypotheses based on event information input; and
(2) a processing example that improves the estimation performance for user identification by excluding the independence between the targets.
Regarding (1), the embodiments of the invention are configured in a manner similar to that disclosed in Japanese Patent Application No. 2007-193930. Item (2) is the improvement that constitutes the advantage of the embodiments of the invention.
(1) Processing for finding the positions of users and identifying the users by updating hypotheses based on event information input
First, an overview of the processing executed by the information processing apparatus according to the first embodiment of the invention is described with reference to Fig. 1. The information processing apparatus 100 of the present embodiment includes sensors for inputting environment information, such as a video camera 21 and a plurality of microphones 31 to 34. The information processing apparatus 100 acquires image information and sound information through these sensors and then analyzes the environment information based on the input information. Specifically, the information processing apparatus 100 analyzes the positions of a plurality of users 1 to 4, denoted by reference numerals 11 to 14, and identifies the users at those positions.
In the example shown in the figure, for example when the users 1 to 4 (11 to 14) are the father, mother, sister, and brother of a family, the information processing apparatus 100 analyzes the image information and sound information input from the video camera 21 and the plurality of microphones 31 to 34. The information processing apparatus 100 then identifies the positions where the four users 1 to 4 are present and which of the father, mother, sister, and brother the user at each position is. The identification results can be used for various kinds of processing, for example zooming the camera in on the speaking user or having the television set respond to the speaking user.
The main processing of the information processing apparatus 100 according to this embodiment is user identification processing, executed as processing for identifying the positions of users and identifying the users based on the input information from the plurality of information input units (the video camera 21 and the microphones 31 to 34). The processing that uses the identification results is not specifically limited. The image information or sound information input from the video camera 21 or the plurality of microphones 31 to 34 contains various kinds of uncertain information. The information processing apparatus 100 according to this embodiment performs probabilistic processing on the uncertain information contained in these kinds of input information and integrates the input information into information estimated to be highly accurate. Robustness is improved through this estimation processing, and highly accurate analysis is performed.
A configuration example of the information processing apparatus 100 is shown in Fig. 2. The information processing apparatus 100 has, as input devices, an image input unit (video camera) 111 and a plurality of sound input units (microphones) 121a to 121d. Image information is input from the image input unit (video camera) 111, and sound information is input from the sound input units (microphones) 121. The information processing apparatus 100 performs analysis based on these kinds of input information. The respective sound input units (microphones) 121a to 121d are arranged at various positions as shown in Fig. 1.
The sound information input from the microphones 121a to 121d is input to a sound/image integration processing unit 131 via a sound event detection unit 122. The sound event detection unit 122 analyzes and integrates the sound information input from the plurality of sound input units (microphones) 121a to 121d arranged at a plurality of different positions. Specifically, based on the sound information input from the sound input units (microphones) 121a to 121d, the sound event detection unit 122 generates position information indicating where the sound was produced and user identification information indicating which user produced the sound, and inputs this information to the sound/image integration processing unit 131.
The specific processing executed by the information processing apparatus 100 is, for example, processing for identifying which of the users 1 to 4 spoke at which position in an environment in which a plurality of users are present, as shown in Fig. 1; in other words, processing for executing user position identification and user identification, and for specifying the event generation source, such as the person who spoke.
The sound event detection unit 122 analyzes the sound information input from the plurality of sound input units (microphones) 121a to 121d arranged at a plurality of different positions and generates the position information of the sound generation source as probability distribution data. Specifically, the sound event detection unit 122 generates expectation value and variance data N(m_e, σ_e) concerning the sound source direction. The sound event detection unit 122 also generates user identification information based on comparison processing with the characteristic information of the user voices registered in advance. The identification information is also generated as a probabilistic estimated value. Characteristic information concerning the voices of the plurality of users to be verified is registered in advance in the sound event detection unit 122. The sound event detection unit 122 executes comparison processing between the input sound and the registered voices, performs processing for judging which user's voice the input sound is likely to be, and calculates posterior probabilities or scores for all the registered users.
In this way, the sound event detection unit 122 analyzes the sound information input from the plurality of sound input units (microphones) 121a to 121d arranged at a plurality of different positions, generates integrated sound event information from the probability distribution data of the position information of the sound generation source and the user identification information including probabilistic estimated values, and inputs the integrated sound event information to the sound/image integration processing unit 131.
On the other hand, the image information input from the image input unit (video camera) 111 is input to the sound/image integration processing unit 131 via an image event detection unit 112. The image event detection unit 112 analyzes the image information input from the image input unit (video camera) 111, extracts the faces of people included in the image, and generates the position information of the faces as probability distribution data. Specifically, the image event detection unit 112 generates expectation value and variance data N(m_e, σ_e) concerning the positions and directions of the faces. The image event detection unit 112 also generates user identification information based on comparison processing with the characteristic information of the user faces registered in advance. The identification information is also generated as a probabilistic estimated value. Characteristic information concerning the faces of the plurality of users to be verified is registered in advance in the image event detection unit 112. The image event detection unit 112 executes comparison processing between the characteristic information of the face images extracted from the face areas of the input image and the registered characteristic information. The image event detection unit 112 then performs processing for judging which user's face the image of each face area is likely to be, and calculates posterior probabilities or scores for all the registered users.
Conventionally known techniques are applied to the speaker identification, face detection, and face identification processing executed in the sound event detection unit 122 and the image event detection unit 112. For example, the techniques disclosed in the following documents can be applied as the face detection and face identification processing:
Kotaro Sabe and Ken-ichi Hidai, "Learning of an Actual Time Arbitrary Posture and Face Detector Using a Pixel Difference Characteristic", Tenth Image Sensing Symposium Lecture Proceedings, pp. 547-552, 2004; and
Japanese Unexamined Patent Application Publication No. 2004-302644, entitled "Face Identification Apparatus, Face Identification Method, Recording Medium, and Robot Apparatus".
The sound/image integration processing unit 131 executes processing based on the input information from the sound event detection unit 122 or the image event detection unit 112; that is, the unit 131 determines where each of a plurality of users is present, who the users are, and who issued a signal such as speech. This processing is described in detail later. Based on the input information from the sound event detection unit 122 or the image event detection unit 112, the sound/image integration processing unit 131 outputs the following (a) and (b) to a processing determination unit 132:
(a) [target information] as estimated information indicating where each of the plurality of users is present and who the users are; and
(b) [signal information] indicating the event generation source, such as the speaking user.
The processing determination unit 132 receives the results of these kinds of identification processing and executes processing using the identification results; for example, processing such as zooming the camera in on the speaking user or having the television set respond to the speaking user.
As described above, the sound event detection unit 122 generates the position information of the sound generation source as probability distribution data, specifically, expectation value and variance data N(m_e, σ_e) concerning the sound source direction. The sound event detection unit 122 also generates user identification information based on comparison processing with the characteristic information of the user voices registered in advance, and inputs the user identification information to the sound/image integration processing unit 131. The image event detection unit 112 extracts the faces of people included in the image and generates the position information of the faces as probability distribution data, specifically, expectation value and variance data N(m_e, σ_e) concerning the positions and directions of the faces. The image event detection unit 112 also generates user identification information based on comparison processing with the characteristic information of the user faces registered in advance, and inputs the user identification information to the sound/image integration processing unit 131.
An example of the information generated by the sound event detection unit 122 or the image event detection unit 112 and input to the sound/image integration processing unit 131 is described with reference to Figs. 3A and 3B. Fig. 3A shows an example of a real environment including a video camera and microphones, identical to the real environment explained with reference to Fig. 1. A plurality of users 1 to k (201 to 20k) are present in the real environment. In this environment, when a certain user speaks, sound is input through the microphones. The video camera continuously captures images.
The information generated by the sound event detection unit 122 and the image event detection unit 112 and input to the sound/image integration processing unit 131 is substantially the same kind of information and includes the two kinds of information shown in Fig. 3B, namely:
(a) user position information; and
(b) user identification information (face identification information or speaker identification information).
These two kinds of information are generated for each event. When sound information is input from the sound input units (microphones) 121a to 121d, the sound event detection unit 122 generates (a) user position information and (b) user identification information based on the sound information and inputs the information to the sound/image integration processing unit 131. The image event detection unit 112 generates (a) user position information and (b) user identification information based on the image information input from the image input unit (video camera) 111, for example at fixed frame intervals set in advance, and inputs the information to the sound/image integration processing unit 131. In this example, one video camera is set as the image input unit (video camera) 111, and the images of a plurality of users are captured by this camera. In this case, the image event detection unit 112 generates (a) user position information and (b) user identification information for each of the plurality of users included in the image and inputs the information to the sound/image integration processing unit 131.
The following processing performed by the sound event detection unit 122 will be described: generating (a) user position information and (b) user identification information (speaker identification information) based on the sound information input from the sound input units (microphones) 121a to 121d.
Processing in which the sound event detection unit 122 generates (a) user position information: The sound event detection unit 122 generates, based on the sound information input from the sound input units (microphones) 121a to 121d, estimated information concerning the position of the user who produced the analyzed voice, that is, the [speaker]. In other words, the sound event detection unit 122 generates the position where the speaker is estimated to be as Gaussian (normal) distribution data N(m_e, σ_e) including an expectation value (mean) [m_e] and variance information [σ_e].
Processing in which the sound event detection unit 122 generates (b) user identification information (speaker identification information):
The sound event detection unit 122 estimates who the speaker is, based on the sound information input from the sound input units (microphones) 121a to 121d, through comparison processing between the input sound and the voice characteristic information of the users 1 to k registered in advance. Specifically, the sound event detection unit 122 calculates the probability that the speaker is each of the users 1 to k. The values calculated in this way are set as (b) user identification information (speaker identification information). For example, the sound event detection unit 122 generates data in which the highest score is assigned to the user whose registered voice characteristics are closest to the characteristics of the input sound and the lowest score (for example, 0) is assigned to the user whose voice characteristics are least similar to those of the input sound, and sets these data as (b) user identification information (speaker identification information).
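As a concrete picture of the sound event information described here, the following sketch packages a speaker-direction Gaussian N(m_e, σ_e) together with normalized speaker scores; the class name, field names, and the simple score normalization are assumptions for illustration, not the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class SoundEvent:
    mean: float                       # m_e: expected sound source direction
    variance: float                   # σ_e: variance of the direction estimate
    speaker_scores: Dict[int, float]  # uID -> probabilistic score

def make_sound_event(direction_mean, direction_var, raw_scores):
    """Build (a) user position information as N(m_e, σ_e) and
    (b) speaker identification information as normalized scores."""
    total = sum(raw_scores.values())
    scores = {uid: s / total for uid, s in raw_scores.items()}
    return SoundEvent(direction_mean, direction_var, scores)

# Hypothetical event: user uID = 1 scores highest against the input voice
event = make_sound_event(0.35, 0.02, {0: 0.2, 1: 2.5, 2: 0.4})
```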
Processing for generating the following two kinds of information based on the image information input from the image input unit (video camera) 111 is now described:
(a) user position information; and
(b) user identification information (face identification information).
Processing in which the image event detection unit 112 generates (a) user position information:
The image event detection unit 112 generates estimated information concerning the position of each face included in the image information input from the image input unit (video camera) 111. In other words, the image event detection unit 112 generates the position where each face detected from the image is estimated to be as Gaussian (normal) distribution data N(m_e, σ_e) including an expectation value (mean) [m_e] and variance information [σ_e].
Processing in which the image event detection unit 112 generates (b) user identification information (face identification information): Based on the image information input from the image input unit (video camera) 111, the image event detection unit 112 detects the faces included in the image information and estimates whose face each face is through comparison processing between the faces in the input image information and the face characteristic information of the users 1 to k registered in advance. Specifically, the image event detection unit 112 calculates the probability that each extracted face is each of the users 1 to k. The values calculated in this way are set as (b) user identification information (face identification information). For example, the image event detection unit 112 generates data in which the highest score is assigned to the user whose registered face characteristics are closest to the characteristics of the face included in the input image and the lowest score (for example, 0) is assigned to the user whose face characteristics are least similar, and sets these data as (b) user identification information (face identification information).
When a plurality of faces are detected from the image captured by the camera, the image event detection unit 112 generates (a) user position information and (b) user identification information (face identification information) for each detected face and inputs the information to the sound/image integration processing unit 131.
One video camera is used as the image input unit 111 in this example; however, images captured by a plurality of video cameras may also be used. In that case, the image event detection unit 112 generates (a) user position information and (b) user identification information (face identification information) for each face included in each image captured by each camera and inputs the information to the sound/image integration processing unit 131.
The processing executed by the sound/image integration processing unit 131 will now be described.
As described above, the two kinds of information shown in Fig. 3B, that is, (a) user position information and (b) user identification information (face identification information or speaker identification information), are sequentially input to the sound/image integration processing unit 131 from the sound event detection unit 122 or the image event detection unit 112. Various settings are possible for the input timing of these kinds of information. For example, in one possible setting, the sound event detection unit 122 generates and inputs the respective kinds of information (a) and (b) as sound event information when new sound is input, and the image event detection unit 112 generates and inputs the respective kinds of information (a) and (b) as image event information in fixed frame period units.
The processing executed by the sound/image integration processing unit 131 is described with reference to Figs. 4A to 4C and subsequent figures.
The sound/image integration processing unit 131 sets probability distribution data of hypotheses concerning the positions and identification information of the users and updates the hypotheses based on the input information, thereby performing processing such that only more probable hypotheses remain. As a method for this processing, the sound/image integration processing unit 131 executes processing applying a particle filter.
The processing applying a particle filter is processing in which a large number of particles corresponding to various hypotheses, in this example hypotheses concerning the positions and identities of the users, are set, and the weights of more probable particles are increased based on the two kinds of information shown in Fig. 3B input from the sound event detection unit 122 or the image event detection unit 112, that is, (a) user position information and (b) user identification information (face identification information or speaker identification information).
A basic processing example applying a particle filter is described with reference to Figs. 4A to 4C. The example shown in Figs. 4A to 4C is a processing example of estimating the position corresponding to a certain user by using a particle filter; specifically, processing for estimating the position where a user 301 is present within a one-dimensional area on a certain straight line.
The initial hypothesis (H) is the uniform particle distribution data shown in Fig. 4A. Then, image data 302 is acquired, and presence probability distribution data of the user 301 based on the acquired image is obtained as the data shown in Fig. 4B. The particle distribution data shown in Fig. 4A is updated based on this probability distribution data, and the updated hypothesis probability distribution data shown in Fig. 4C is obtained. Such processing is executed repeatedly based on the input information to obtain more probable position information of the user.
Details of processing using a particle filter are described, for example, in [D. Schulz, D. Fox, and J. Hightower, "People Tracking with Anonymous and ID-sensors Using Rao-Blackwellised Particle Filters", Proc. of the International Joint Conference on Artificial Intelligence (IJCAI-03)]. A minimal sketch of this one-dimensional example follows.
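The following Python sketch of the processing in Figs. 4A to 4C assumes a Gaussian observation likelihood and simple resampling; the constants and helper names are illustrative only.

```python
import math
import random

def gaussian(x, mean, sigma):
    return math.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Fig. 4A: initial hypothesis H, particles uniform on a line segment
particles = [random.uniform(0.0, 10.0) for _ in range(500)]

def update(particles, obs_mean, obs_sigma):
    # Fig. 4B: weight each particle by the likelihood of the observation
    weights = [gaussian(p, obs_mean, obs_sigma) for p in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Fig. 4C: resample so that more probable hypotheses survive
    return random.choices(particles, weights=weights, k=len(particles))

# Repeated for each observation to sharpen the position estimate
particles = update(particles, obs_mean=6.2, obs_sigma=0.8)
```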
The processing example shown in Figs. 4A to 4C is a processing example in which the input information is image data only and is used only for the position of the user 301. Each particle therefore has information concerning only the position of the user 301.
On the other hand, the processing according to this embodiment is processing for distinguishing the positions of a plurality of users and who the plurality of users are, based on the two kinds of information shown in Fig. 3B input from the sound event detection unit 122 or the image event detection unit 112, that is, (a) user position information and (b) user identification information (face identification information or speaker identification information).
Therefore, in the processing applying the particle filter in this embodiment, the sound/image integration processing unit 131 sets a large number of particles corresponding to hypotheses concerning where the users are and who the users are, and updates the particles based on the two kinds of information shown in Fig. 3B input from the sound event detection unit 122 or the image event detection unit 112.
The particle structure set in this processing example is described with reference to Fig. 5.
The sound/image integration processing unit 131 has m particles, m being a number set in advance; these are the particles 1 to m shown in Fig. 5. A particle ID (pID = 1 to m) is set for each particle as an identifier.
A plurality of targets, which are virtual objects corresponding to the positions and to the objects to be identified, are set for each particle. In this example, a plurality of targets corresponding to virtual users, the number of which is equal to or greater than the number of users estimated to be present in the real space, are set for each particle. In each of the m particles, data equivalent to the number of targets is held in target units. In the example shown in Fig. 5, n targets are included in one particle. The target data structure of each target included in each particle is shown in Fig. 6.
The target data of each target included in each particle is described with reference to Fig. 6. Fig. 6 shows the target data structure of one target (target ID: tID = n) 311 included in the particle 1 (pID = 1) shown in Fig. 5. The target data of the target 311 includes the following data, as shown in Fig. 6:
(a) the probability distribution of the position corresponding to the target [Gaussian distribution: N(m_{1n}, σ_{1n})]; and
(b) user confidence information (uID) indicating who the target is, that is, uID_{1n1} = 0.0, uID_{1n2} = 0.1, ..., and uID_{1nk} = 0.5.
Incidentally, [m_{1n}, σ_{1n}] in the Gaussian distribution N(m_{1n}, σ_{1n}) described in (a) means the Gaussian distribution as the presence probability distribution corresponding to the target tID = n in the particle pID = 1.
The (1n1) included in [uID_{1n1}] in the user confidence information (uID) shown in (b) means the probability that the user of the target tID = n in the particle pID = 1 is the user 1. In other words, the data of target ID = n mean that the probability that the user is the user 1 is 0.0, the probability that the user is the user 2 is 0.1, ..., and the probability that the user is the user k is 0.5.
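The target data of Fig. 6 can be pictured as the following assumed structures, one position Gaussian plus a uID confidence table per target, held per particle; all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Target:
    mean: float                       # m: expected position of the target
    sigma: float                      # σ: variance of the position estimate
    uid_confidence: Dict[int, float]  # uID -> confidence that the target is that user

@dataclass
class Particle:
    weight: float
    targets: List[Target] = field(default_factory=list)

# Target tID = n of particle pID = 1 in the example of Fig. 6,
# for users 1, 2, ..., k (here k = 3)
target_n = Target(mean=2.4, sigma=0.5,
                  uid_confidence={1: 0.0, 2: 0.1, 3: 0.5})
```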
Referring back to Fig. 5, the description of the particles set by the sound/image integration processing unit 131 continues. As shown in Fig. 5, the sound/image integration processing unit 131 sets m particles (pID = 1 to m), m being a number set in advance. For each of the targets (tID = 1 to n) estimated to be present in the real space, each particle has the following target data: (a) the probability distribution of the position corresponding to the target [Gaussian distribution: N(m, σ)]; and (b) user confidence information (uID) indicating who the target is.
The event information shown in Fig. 3B, that is, (a) user position information and (b) user identification information (face identification information or speaker identification information), is input to the sound/image integration processing unit 131 from the sound event detection unit 122 or the image event detection unit 112, and the sound/image integration processing unit 131 performs processing for updating the m particles (pID = 1 to m).
The sound/image integration processing unit 131 executes the processing for updating the particles, generates (a) target information as estimated information indicating where each of the plurality of users is present and who the users are, and (b) signal information indicating the event generation source such as the speaking user, and outputs the information to the processing determination unit 132.
As shown in the target information 305 at the right end of Fig. 5, the target information is generated as weighted sum data of the data of the respective targets (tID = 1 to n) included in the respective particles (pID = 1 to m). The weights of the respective particles are described later.
The target information 305 is information indicating (a) the positions of the targets (tID = 1 to n) corresponding to the virtual users set in advance by the sound/image integration processing unit 131 and (b) who the targets are (which of uID1 to uIDk each target is). The target information is sequentially updated in accordance with the updating of the particles. For example, when the users 1 to k move in the actual environment, the respective users 1 to k converge as data corresponding to k targets selected out of the n targets (tID = 1 to n).
For example, the user confidence information (uID) included in the data of the top target 1 (tID = 1) in the target information 305 shown in Fig. 5 has the highest probability for the user 2 (uID_{12} = 0.7). Therefore, the data of the target 1 (tID = 1) are estimated to correspond to the user 2. The (12) in the data [uID_{12} = 0.7] indicating the user confidence information (uID) indicates the probability corresponding to the user confidence information (uID) of the user 2 for the target with target ID = 1.
The data of the top target 1 (tID = 1) in the target information 305 correspond to the user 2 with the highest probability. The position of the user 2 is estimated to be within the range indicated by the presence probability distribution data included in the data of the top target 1 (tID = 1) in the target information 305.
In this way, the target information 305 indicates, for each of the targets (tID = 1 to n) initially set as virtual objects (virtual users), the respective kinds of information: (a) the position of the target; and (b) who the target is (which of uID1 to uIDk the target is). Therefore, when the users 1 to k move, the k pieces of target information of the respective targets (tID = 1 to n) converge so as to correspond to those users. A sketch of this weighted-sum generation follows.
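Generating the target information 305 as a particle-weighted sum can be sketched as follows, reusing the assumed `Particle` and `Target` structures from the previous sketch and assuming the particle weights sum to 1.

```python
def target_information(particles, n_targets, user_ids):
    """Weighted sum, over all particles, of each target's data (Fig. 5, 305)."""
    info = []
    for t in range(n_targets):
        mean = sum(p.weight * p.targets[t].mean for p in particles)
        # simplified: the variance is combined by the same weighted sum
        sigma = sum(p.weight * p.targets[t].sigma for p in particles)
        uid = {u: sum(p.weight * p.targets[t].uid_confidence.get(u, 0.0)
                      for p in particles)
               for u in user_ids}
        info.append(Target(mean=mean, sigma=sigma, uid_confidence=uid))
    return info
```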
When the number of targets (tID=1 to n) is larger than the number of users k, there are targets that correspond to no user. For example, in the bottom target (tID=n) in the target information 305, the user confidence information (uID) is at most 0.5 and the probability distribution data has no large peak. Such data is judged to be data not corresponding to any specific user. Processing for deleting such targets may be performed. The processing for deleting targets is described later.
As explained above, the sound/image synthesis processing unit 131 performs processing to update the particles on the basis of the input information, generates (a) target information, which is estimation information indicating where the plural users are present respectively and who those users are, and (b) signal information indicating the event generation source such as the speaking user, and outputs the information to the processing determination unit 132.
The target information is the information explained with reference to the target information 305 shown in Fig. 5.
Besides the target information, the sound/image synthesis processing unit 131 also generates and outputs signal information indicating the event generation source such as the speaking user. The signal information indicating the event generation source is, for a sound event, data indicating who spoke (i.e., the speaker), and, for an image event, data indicating whose face the face included in the image is. In this example, the signal information in the case of an image event consequently coincides with the signal information obtained from the user confidence information (uID) in the target information.
As described above, the event information shown in Fig. 3B, that is, user position information and user identification information (face identification information or speaker identification information), is input to the sound/image synthesis processing unit 131 from the sound event detecting unit 122 or the image event detecting unit 112; the sound/image synthesis processing unit 131 generates (a) target information, which is estimation information indicating where the plural users are present respectively and who those users are, and (b) signal information indicating the event generation source such as the speaking user, and outputs the information to the processing determination unit 132. This processing is described below with reference to Fig. 7 and subsequent figures.
Fig. 7 is a flowchart for explaining the processing sequence executed by the sound/image synthesis processing unit 131. First, in step S101, the event information shown in Fig. 3B, that is, user position information and user identification information (face identification information or speaker identification information), is input to the sound/image synthesis processing unit 131 from the sound event detecting unit 122 or the image event detecting unit 112.
When the acquisition of the event information succeeds, the sound/image synthesis processing unit 131 proceeds to step S102. When the acquisition of the event information fails, the sound/image synthesis processing unit 131 proceeds to step S121. The processing in step S121 is described later.
When the acquisition of the event information succeeds, the sound/image synthesis processing unit 131 performs particle update processing based on the input information in step S102 and subsequent steps. Before the particle update processing, in step S102, the sound/image synthesis processing unit 131 sets a hypothesis of the event generation source in each of the m particles (pID=1 to m) shown in Fig. 5. The event generation source is, for example, the speaking user in the case of a sound event, and the user having the extracted face in the case of an image event.
In the example shown in Fig. 5, hypothesis data (tID=xx) indicating the event generation source is shown at the bottom of each particle. In the example shown in Fig. 5, hypotheses indicating which of targets 1 to n is the event generation source are set for the respective particles as follows:
tID=2 for particle 1 (pID=1),
tID=n for particle 2 (pID=2), ..., and
tID=n for particle m (pID=m).
In the example shown in Fig. 5, the target data set as the event generation source hypothesis is enclosed by double lines and indicated for each particle.
The setting of the event generation source hypothesis is performed every time the particle update processing based on an input event is executed.
In other words, the sound/image synthesis processing unit 131 sets an event generation source hypothesis for each of the particles 1 to m. Under these hypotheses, the event information shown in Fig. 3B, that is, (a) user position information and (b) user identification information (face identification information or speaker identification information), is input to the sound/image synthesis processing unit 131 from the sound event detecting unit 122 or the image event detecting unit 112 as an event, and the sound/image synthesis processing unit 131 performs processing to update the m particles (pID=1 to m).
When the particle update processing is performed, the event generation source hypotheses set for the particles 1 to m are reset and new hypotheses are set for the particles 1 to m. As the form of setting the hypotheses, either of the following methods can be adopted:
(1) random setting; and
(2) setting according to an internal model of the sound/image synthesis processing unit 131.
The number of particles m is set larger than the number of targets n. Therefore, plural particles are set with hypotheses in which the same target is the event generation source. For example, when the number of targets n is 10, the number of particles m is set to, for example, about 100 to 1000.
A specific processing example of the hypothesis setting according to (2), that is, setting according to the internal model of the sound/image synthesis processing unit 131, is described below.
First, the sound/image synthesis processing unit 131 calculates the weights [W_tID] of the respective targets by comparing the target data included in the particles held by the sound/image synthesis processing unit 131 with the event information acquired from the sound event detecting unit 122 or the image event detecting unit 112, that is, the two kinds of information shown in Fig. 3B: (a) user position information and (b) user identification information (face identification information or speaker identification information). The sound/image synthesis processing unit 131 sets the event generation source hypotheses for the respective particles (pID=1 to m) on the basis of the calculated weights [W_tID] of the respective targets. A specific processing example is described below.
In the initial state, the event generation source hypotheses set for the respective particles (pID=1 to m) are set equally. In other words, when m particles (pID=1 to m) each having n targets (tID=1 to n) are set, the initial hypothesis targets (tID=1 to n) of the event generation source are distributed equally among the particles (pID=1 to m) such that m/n particles have target 1 (tID=1) as the event generation source, m/n particles have target 2 (tID=2) as the event generation source, ..., and m/n particles have target n (tID=n) as the event generation source.
In step S101 shown in Fig. 7, the sound/image synthesis processing unit 131 acquires the event information from the sound event detecting unit 122 or the image event detecting unit 112, that is, the two kinds of information shown in Fig. 3B: (a) user position information and (b) user identification information (face identification information or speaker identification information). When the acquisition of the event information succeeds, in step S102, the sound/image synthesis processing unit 131 sets hypothesis targets (tID=1 to n) of the event generation source for the respective m particles (pID=1 to m).
Details of the setting of the hypothesis targets for the particles in step S102 are as follows. First, the sound/image synthesis processing unit 131 compares the event information input in step S101 with the target data included in the particles held by the sound/image synthesis processing unit 131, and calculates target weights [W_tID] of the respective targets by using the comparison results.
Details of the processing for calculating the target weights [W_tID] are explained with reference to Fig. 8. The calculation of the target weights is performed as processing for calculating n target weights corresponding to the respective targets 1 to n set for the particles, as shown at the right end of Fig. 8. In calculating the n target weights, the sound/image synthesis processing unit 131 first calculates likelihoods, which are indicator values of the similarity between the input event information shown in Fig. 8 (1), that is, the event information input to the sound/image synthesis processing unit 131 from the sound event detecting unit 122 or the image event detecting unit 112, and the respective target data of the respective particles.
The likelihood calculation example shown in Fig. 8 (2) is an example in which an event-target likelihood is calculated by comparing the input event information (1) with one piece of target data (tID=n) of particle 1.
Fig. 8 shows an example of comparison with one piece of target data. However, the same likelihood calculation processing is performed on the respective target data of the respective particles.
The likelihood calculation processing (2) shown at the bottom of Fig. 8 is described below.
As shown in Fig. 8 (2), in the likelihood calculation processing, the sound/image synthesis processing unit 131 first individually calculates:
(a) an inter-Gaussian-distribution likelihood [DL] as similarity data between the event related to the user position information and the target data, and
(b) an inter-user-confidence-information (uID) likelihood [UL] as similarity data between the event related to the user identification information (face identification information or speaker identification information) and the target data.
The processing for calculating (a) the inter-Gaussian-distribution likelihood [DL] as similarity data between the event related to the user position information and the target data is described first.
The Gaussian distribution corresponding to the user position information in the input event information shown in Fig. 8 (1) is represented as N(m_e, σ_e). The Gaussian distribution corresponding to the user position information of a certain target included in a certain particle of the internal model held by the sound/image synthesis processing unit 131 is represented as N(m_t, σ_t). In the example shown in Fig. 8, the Gaussian distribution included in the target data of target n (tID=n) of particle 1 (pID=1) is represented as N(m_t, σ_t).
The inter-Gaussian-distribution likelihood [DL], which is an index for judging the similarity between the Gaussian distributions of these two data, is calculated by the following equation:

DL = N(m_t, σ_t + σ_e) | x = m_e

This equation gives the value at the position x = m_e of a Gaussian distribution with center m_t and variance σ_t + σ_e.
The processing for calculating (b) the inter-user-confidence-information (uID) likelihood [UL] as similarity data between the event related to the user identification information (face identification information or speaker identification information) and the target data is described below.
The confidence values (scores) of the respective users 1 to k in the user confidence information (uID) in the input event information shown in Fig. 8 (1) are represented as P_e[i], where "i" is a variable corresponding to the user identifiers 1 to k.
The confidence values (scores) of the respective users 1 to k in the user confidence information (uID) of a certain target included in a certain particle of the internal model held by the sound/image synthesis processing unit 131 are represented as P_t[i]. In the example shown in Fig. 8, the confidence values (scores) of the respective users 1 to k in the user confidence information (uID) included in the target data of target n (tID=n) of particle 1 (pID=1) are represented as P_t[i].
The inter-user-confidence-information (uID) likelihood [UL], which is an index for judging the similarity between the user confidence information (uID) of these two data, is calculated by the following equation:

UL = Σ P_e[i] × P_t[i]

This equation calculates the sum of the products of the confidence values (scores) of the corresponding users included in the user confidence information (uID) of the two data. The value of this sum is the inter-user-confidence-information (uID) likelihood [UL].
Alternatively, the maximum of the products of corresponding values, that is, UL = max(P_e[i] × P_t[i]), may be calculated and used as the inter-user-confidence-information (uID) likelihood [UL].
By using these two likelihoods, that is, the inter-Gaussian-distribution likelihood [DL] and the inter-user-confidence-information (uID) likelihood [UL], the event-target likelihood [L_(pID,tID)], which is an index of the similarity between the input event information and a target (tID) included in a certain particle (pID), is calculated. In other words, the event-target likelihood [L_(pID,tID)] is calculated by applying a weight (α = 0 to 1) according to the following equation:
[L_(pID,tID)] = UL^α × DL^(1-α)

where α = 0 to 1.
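Continuing the hypothetical sketch introduced earlier (one-dimensional positions, with variances used directly in place of standard deviations), the two likelihoods and their weighted combination might be computed as follows:

```python
import math

def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density at x of a Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def position_likelihood(m_e: float, var_e: float,
                        m_t: float, var_t: float) -> float:
    """DL: the value at x = m_e of N(m_t, var_t + var_e)."""
    return gaussian_pdf(m_e, m_t, var_t + var_e)

def uid_likelihood(pe: list, pt: list) -> float:
    """UL: sum over users of Pe[i] * Pt[i]."""
    return sum(a * b for a, b in zip(pe, pt))

def event_target_likelihood(m_e, var_e, pe, target, alpha=0.5) -> float:
    """L = UL^alpha * DL^(1 - alpha), with alpha in [0, 1]."""
    dl = position_likelihood(m_e, var_e, target.mean, target.var)
    ul = uid_likelihood(pe, target.uid_probs)
    return (ul ** alpha) * (dl ** (1.0 - alpha))
```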
The event-target likelihood [L_(pID,tID)] is calculated for each target of each particle. The target weights [W_tID] of the respective targets are calculated on the basis of the event-target likelihoods [L_(pID,tID)].
The weight [α] applied to the calculation of the event-target likelihood [L_(pID,tID)] may be a predetermined fixed value, or may be set to change according to the input event. For example, in the case where the input event is an image, when the face detection succeeds and position information can be acquired but the face identification fails, α may be set to 0 so that the contribution of the inter-user-confidence-information (uID) likelihood (UL) becomes 1; the event-target likelihood [L_(pID,tID)] is then calculated from the inter-Gaussian-distribution likelihood [DL] alone, and the target weights [W_tID] are also calculated from the inter-Gaussian-distribution likelihood [DL] alone.
Likewise, in the case where the input event is sound, when the speaker identification succeeds and speaker information can be acquired but the acquisition of position information fails, α may be set to 1 so that the contribution of the inter-Gaussian-distribution likelihood [DL] becomes 1; the event-target likelihood [L_(pID,tID)] is then calculated from the inter-user-confidence-information (uID) likelihood [UL] alone, and the target weights [W_tID] are also calculated from the inter-user-confidence-information (uID) likelihood [UL] alone.
The formula for calculating the target weights [W_tID] on the basis of the event-target likelihoods [L_(pID,tID)] is as follows:
[Equation 1]

W_tID = Σ_(pID=1 to m) W_pID × L_(pID,tID)
In the formula, [W_pID] is the particle weight set for each particle. The processing for calculating the particle weights [W_pID] is described later. In the initial state, a uniform value is set as the particle weight [W_pID] for all the particles (pID=1 to m).
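Under the same assumptions, and reusing event_target_likelihood from the sketch above, Equation 1 amounts to a weighted accumulation over all particles:

```python
def target_weights(particles, m_e, var_e, pe, alpha=0.5):
    """W_tID = sum over pID of W_pID * L_(pID,tID) (Equation 1)."""
    n = len(particles[0].targets)
    weights = [0.0] * n
    for p in particles:
        for tid, tgt in enumerate(p.targets):
            weights[tid] += p.weight * event_target_likelihood(
                m_e, var_e, pe, tgt, alpha)
    return weights
```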
The processing in step S102 of the flow shown in Fig. 7, that is, the generation of the event generation source hypotheses corresponding to the respective particles, is performed on the basis of the target weights [W_tID] calculated from the event-target likelihoods [L_(pID,tID)]. As the target weights [W_tID], n data corresponding to the targets 1 to n (tID=1 to n) set for the particles are calculated.
The event generation source hypothesis targets corresponding to the respective m particles (pID=1 to m) are set so as to be distributed according to the ratio of the target weights [W_tID].
For example, when n is 4 and the target weights [W_tID] calculated for targets 1 to 4 (tID=1 to 4) are:
target 1: target weight = 3;
target 2: target weight = 2;
target 3: target weight = 1; and
target 4: target weight = 4,
the event generation source hypothesis targets of the m particles are set as follows: 30% of the m particles have target 1 as the event generation source hypothesis, 20% have target 2, 10% have target 3, and 40% have target 4. In other words, the event generation source hypothesis targets set for the particles are distributed according to the ratio of the target weights.
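A minimal sketch of this weight-proportional hypothesis setting follows, using random sampling as one possible way to realize the stated ratios (the patent does not prescribe the sampling mechanism):

```python
import random

def assign_hypotheses(particles, weights):
    """Set each particle's event generation source hypothesis, drawing
    target IDs with probability proportional to the target weights."""
    tids = list(range(len(weights)))
    for p in particles:
        p.hypothesis_tid = random.choices(tids, weights=weights, k=1)[0]
```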
After setting the hypotheses, the sound/image synthesis processing unit 131 proceeds to step S103 of the flow shown in Fig. 7. In step S103, the sound/image synthesis processing unit 131 calculates weights corresponding to the respective particles, that is, particle weights [W_pID]. As described above, a uniform value is initially set as the particle weight [W_pID] for each particle, but the value is updated according to the event input.
Details of the processing for calculating the particle weights [W_pID] are explained with reference to Fig. 9 and Fig. 10. The particle weight [W_pID] is equivalent to an index for judging the correctness of the hypothesis of each particle for which an event generation source hypothesis target has been generated. The particle weight [W_pID] is calculated as an event-target likelihood, that is, the similarity between the input event and the event generation source hypothesis target set for each of the m particles (pID=1 to m).
Fig. 9 shows the event information 401 input to the sound/image synthesis processing unit 131 from the sound event detecting unit 122 or the image event detecting unit 112, and the particles 411 to 413 held by the sound/image synthesis processing unit 131. In each of the particles 411 to 413, the hypothesis target set in the processing described above, that is, the setting of the event generation source hypothesis in step S102 of the flowchart shown in Fig. 7, is set. In the example shown in Fig. 9, the hypothesis targets are set as follows:
target 2 (tID=2) 421 for particle 1 (pID=1) 411;
target n (tID=n) 422 for particle 2 (pID=2) 412; and
target n (tID=n) 423 for particle m (pID=m) 413.
In the example shown in Fig. 9, the particle weights [W_pID] of the respective particles correspond to the following event-target likelihoods:
particle 1: the event-target likelihood between the event information 401 and target 2 (tID=2) 421;
particle 2: the event-target likelihood between the event information 401 and target n (tID=n) 422; and
particle m: the event-target likelihood between the event information 401 and target n (tID=n) 423.
Fig. 10 shows an example of the processing for calculating the particle weight [W_pID] for particle 1 (pID=1). The processing for calculating the particle weight [W_pID] shown in Fig. 10 (2) is the same likelihood calculation processing as that explained with reference to (2) in Fig. 8. In this example, the processing is performed as the calculation of an event-target likelihood, which is an index of the similarity between (1) the input event information and the single hypothesis target selected from the particle.
As explained with reference to (2) in Fig. 8, the likelihood calculation processing (2) shown at the bottom of Fig. 10 is processing for individually calculating (a) an inter-Gaussian-distribution likelihood [DL] as similarity data between the event related to the user position information and the target data, and (b) an inter-user-confidence-information (uID) likelihood [UL] as similarity data between the event related to the user identification information (face identification information or speaker identification information) and the target data.
The processing for calculating (a) the inter-Gaussian-distribution likelihood [DL] as similarity data between the event related to the user position information and the hypothesis target is described below.
The Gaussian distribution corresponding to the user position information in the input event information is represented as N(m_e, σ_e), and the Gaussian distribution corresponding to the user position information of the hypothesis target selected from the particle is represented as N(m_t, σ_t). The inter-Gaussian-distribution likelihood [DL] is calculated by the following equation:

DL = N(m_t, σ_t + σ_e) | x = m_e

This equation gives the value at the position x = m_e of a Gaussian distribution with center m_t and variance σ_t + σ_e.
The processing for calculating (b) the inter-user-confidence-information (uID) likelihood [UL] as similarity data between the event related to the user identification information (face identification information or speaker identification information) and the hypothesis target is as follows.
The confidence values (scores) of the respective users 1 to k in the user confidence information (uID) in the input event information are represented as P_e[i], where "i" is a variable corresponding to the user identifiers 1 to k. The confidence values (scores) of the respective users 1 to k in the user confidence information (uID) of the hypothesis target selected from the particle are represented as P_t[i]. The inter-user-confidence-information (uID) likelihood is calculated by the following equation:

UL = Σ P_e[i] × P_t[i]

This equation calculates the sum of the products of the confidence values (scores) of the corresponding users included in the user confidence information (uID) of the two data. The value of this sum is the inter-user-confidence-information (uID) likelihood [UL].
The particle weight [W_pID] is calculated by using the two likelihoods, that is, the inter-Gaussian-distribution likelihood [DL] and the inter-user-confidence-information (uID) likelihood [UL]. In other words, the particle weight [W_pID] is calculated by applying a weight (α = 0 to 1) according to the following equation:

[W_pID] = UL^α × DL^(1-α)

where α = 0 to 1.
The particle weight [W_pID] is calculated for the hypothesis target of each particle in this way.
As in the processing for calculating the event-target likelihood [L_(pID,tID)] described above, the weight [α] applied to the calculation of the particle weight [W_pID] may be a predetermined fixed value, or may be set to change according to the input event. For example, in the case where the input event is an image, when the face detection succeeds and position information can be acquired but the face identification fails, α may be set to 0 so that the contribution of the inter-user-confidence-information (uID) likelihood (UL) becomes 1, and the particle weight [W_pID] is calculated from the inter-Gaussian-distribution likelihood [DL] alone. Likewise, in the case where the input event is sound, when the speaker identification succeeds and speaker information can be acquired but the acquisition of position information fails, α may be set to 1 so that the contribution of the inter-Gaussian-distribution likelihood [DL] becomes 1, and the particle weight [W_pID] is calculated from the inter-user-confidence-information (uID) likelihood [UL] alone.
In this way, the calculation of the particle weights [W_pID] corresponding to the respective particles in step S103 of the flowchart of Fig. 7 is performed as the processing explained with reference to Fig. 9 and Fig. 10. Subsequently, in step S104, the sound/image synthesis processing unit 131 performs processing for resampling the particles on the basis of the particle weights [W_pID] of the respective particles set in step S103.
The particle resampling is performed as processing for selecting particles out of the m particles according to the particle weights [W_pID]. Specifically, suppose the number of particles m is 5 and the particle weights are set as follows:
particle 1: particle weight [W_pID] = 0.40;
particle 2: particle weight [W_pID] = 0.10;
particle 3: particle weight [W_pID] = 0.25;
particle 4: particle weight [W_pID] = 0.05; and
particle 5: particle weight [W_pID] = 0.20.
In this case, particle 1 is resampled with a probability of 40% and particle 2 is resampled with a probability of 10%. In practice, m is as large as 100 to 1000, and the result of the resampling contains particles at distribution ratios corresponding to the particle weights. Through this processing, a larger number of particles with large particle weights [W_pID] remain. The total number of particles [m] does not change even after the resampling. After the resampling, the weights [W_pID] of the particles are reset, and the processing is repeated from step S101 according to the input of a new event.
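The resampling step could be realized, for example, by simple multinomial resampling; the patent does not name a specific resampling algorithm, so the following is only one plausible sketch:

```python
import copy
import random

def resample(particles):
    """Draw m particles in proportion to their weights (multinomial
    resampling), then reset every weight to the uniform value 1/m."""
    m = len(particles)
    drawn = random.choices(particles,
                           weights=[p.weight for p in particles], k=m)
    new_particles = [copy.deepcopy(p) for p in drawn]  # duplicates allowed
    for p in new_particles:
        p.weight = 1.0 / m
    return new_particles
```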
In step S105, the sound/image synthesis processing unit 131 performs processing to update the target data (user positions and user confidences) included in the respective particles. As explained above with reference to Fig. 6 and other figures, each target consists of the following data:
(a) user position: a probability distribution [Gaussian distribution: N(m_t, σ_t)] of the position corresponding to the target; and
(b) user confidence: probability values (scores) that the target is each of the users 1 to k, as user confidence information (uID) indicating who the target is: Pt[i] (i = 1 to k), that is:
uID_t1 = Pt[1],
uID_t2 = Pt[2],
..., and
uID_tk = Pt[k].
The target data update in step S105 is executed for each of (a) the user position and (b) the user confidence. The processing for updating (a) the user position is described first.

Update of the user position: the update of the user position is performed as update processing in two stages, namely:
(a1) update processing applied to all targets of all particles; and
(a2) update processing applied to the event generation source hypothesis target set for each particle.
The update processing (a1) applied to all targets of all particles is executed for all targets, that is, both the targets selected as event generation source hypothesis targets and the other targets.
This processing is performed on the basis of the assumption that the variance of the user position expands as time elapses. The user position is updated by using a Kalman filter according to the time elapsed since the last update processing and the position information of the event.
The example of the update processing under the one dimension positional information situation is described.To be [dt] at first, and calculate the prediction distribution of the customer location of all targets after dt since the time representation in the past of update processing last time.
In other words, the expectation value (mean) [m_t] and the variance [σ_t²] of the Gaussian distribution N(m_t, σ_t) as user position variance information are updated as follows:

m_t = m_t + xc × dt
σ_t² = σ_t² + σc² × dt

where:
m_t is the predicted expectation value (predicted state),
σ_t² is the predicted covariance (predicted estimate covariance),
xc is the movement information (control model), and
σc² is the noise (process noise).
When the processing is performed under the condition that the user does not move, xc can be set to 0 in the update processing. Through this calculation, the Gaussian distribution N(m_t, σ_t) as the user position information included in all targets is updated.
For the target set as the event generation source hypothesis of each particle, the update processing is performed by using the Gaussian distribution N(m_e, σ_e) indicating the user position included in the event information input from the sound event detecting unit 122 or the image event detecting unit 112.
The Kalman gain is represented as K, the observed value (observed state) included in the input event information N(m_e, σ_e) is represented as m_e, and the observed variance (observed covariance) included in the input event information N(m_e, σ_e) is represented as σ_e². The update processing is performed as follows:

K = σ_t² / (σ_t² + σ_e²)
m_t = m_t + K(m_e - m_t)
σ_t² = (1 - K)σ_t²
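Assuming the one-dimensional, variance-based sketch used so far, the prediction step and this Kalman update can be written directly from the equations above:

```python
def predict_position(tgt, dt, xc=0.0, proc_var=0.01):
    """Prediction: m_t <- m_t + xc*dt, var_t <- var_t + proc_var*dt.
    With a non-moving user, xc is set to 0; proc_var is illustrative."""
    tgt.mean += xc * dt
    tgt.var += proc_var * dt

def correct_position(tgt, m_e, var_e):
    """Kalman update with the observation N(m_e, var_e):
       K     = var_t / (var_t + var_e)
       m_t   <- m_t + K * (m_e - m_t)
       var_t <- (1 - K) * var_t
    """
    k_gain = tgt.var / (tgt.var + var_e)
    tgt.mean += k_gain * (m_e - tgt.mean)
    tgt.var *= (1.0 - k_gain)
```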
The processing for updating (b) the user confidence, performed as the processing for updating the target data, is described next.
Besides the user position information, the target data includes probability values (scores) [Pt[i] (i = 1 to k)] that the target is each of the users 1 to k, as user confidence information (uID) indicating who the target is. In step S105, the sound/image synthesis processing unit 131 also performs processing to update this user confidence information (uID).
The user confidence information (uID) [Pt[i] (i = 1 to k)] of the targets included in the particles is updated by applying an update ratio [β] with a value range of 0 to 1 set in advance, using the posterior probabilities of all registered users and the user confidence information (uID) [Pe[i] (i = 1 to k)] included in the event information input from the sound event detecting unit 122 or the image event detecting unit 112.
The update of the user confidence information (uID) [Pt[i] (i = 1 to k)] of the targets is performed according to the following equation:

Pt[i] = (1 - β) × Pt[i] + β × Pe[i]

where i = 1 to k and β = 0 to 1. The update ratio [β] is a value in the range of 0 to 1 and is set in advance.
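In the running sketch, this update is a one-line blend of the held belief and the observed scores:

```python
def update_uid(tgt, pe, beta=0.3):
    """Pt[i] <- (1 - beta) * Pt[i] + beta * Pe[i] for i = 1..k.
    beta = 0.3 is an arbitrary illustrative value."""
    tgt.uid_probs = [(1.0 - beta) * pt + beta * p_obs
                     for pt, p_obs in zip(tgt.uid_probs, pe)]
```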
In step S105, the sound/image synthesis processing unit 131 generates target information on the basis of the following data included in the updated target data and the particle weights [W_pID], and outputs the target information to the processing determination unit 132:
(a) user position: a probability distribution [Gaussian distribution N(m_t, σ_t)] of the position corresponding to the target, and
(b) user confidence: probability values (scores) that the target is each of the users 1 to k, as user confidence information (uID) indicating who the target is: Pt[i] (i = 1 to k), that is:
uID_t1 = Pt[1],
uID_t2 = Pt[2],
..., and
uID_tk = Pt[k].
The target information is generated on the basis of these kinds of data and the particle weights [W_pID], and is output to the processing determination unit 132.
As explained with reference to Fig. 5, the target information is generated as weighted-sum data of the data of the respective targets (tID=1 to n) included in the particles (pID=1 to m). The target information is the data shown in the target information 305 at the right end of Fig. 5.
The target information is generated as information including:
(a) the user position information, and
(b) the user confidence information
of the respective targets (tID=1 to n).
For example, the user position information in the target information corresponding to the target (tID=1) is expressed by the following equation:

[Equation 2]

Σ_(i=1 to m) W_i × N(m_i1, σ_i1)

In the formula, W_i denotes the particle weight [W_pID].
The user confidence information in the target information corresponding to the target (tID=1) is expressed by the following equations:

[Equation 3]

Σ_(i=1 to m) W_i × uID_i11
Σ_(i=1 to m) W_i × uID_i12
...
Σ_(i=1 to m) W_i × uID_i1k

In the formulas, W_i denotes the particle weight [W_pID].
The sound/image synthesis processing unit 131 calculates these kinds of target information for each of the n targets (tID=1 to n), and outputs the calculated target information to the processing determination unit 132.
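A rough sketch of generating this target information as particle-weighted sums follows. Note that Equation 2 defines a weighted mixture of Gaussians; the sketch collapses it to a single weighted mean and variance purely for brevity:

```python
def target_information(particles):
    """Weighted sums over particles of each target's data
    (cf. Equations 2 and 3)."""
    n = len(particles[0].targets)
    k = len(particles[0].targets[0].uid_probs)
    info = []
    for tid in range(n):
        mean = sum(p.weight * p.targets[tid].mean for p in particles)
        var = sum(p.weight * p.targets[tid].var for p in particles)
        uid = [sum(p.weight * p.targets[tid].uid_probs[i]
                   for p in particles) for i in range(k)]
        info.append({"position": (mean, var), "uid_probs": uid})
    return info
```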
The processing in step S106 shown in Fig. 7 is described next.
In step S106, the sound/image synthesis processing unit 131 calculates, for each of the n targets (tID=1 to n), the probability that the target is the event generation source, and outputs the probabilities to the processing determination unit 132 as signal information.
As explained above, the signal information indicating the event generation source is, for a sound event, data indicating who spoke (i.e., the speaker), and, for an image event, data indicating whose face the face included in the image is.
The sound/image synthesis processing unit 131 calculates the probability that each target is the event generation source on the basis of the number of event generation source hypothesis targets set in the particles.
In other words, the probability that each target (tID=1 to n) is the event generation source is represented as P(tID=i), where "i" is 1 to n. In this case, the probabilities that the respective targets are the event generation source are calculated as:
P(tID=1): (the number of particles to which tID=1 is allocated as the hypothesis)/m,
P(tID=2): (the number of particles to which tID=2 is allocated as the hypothesis)/m,
..., and
P(tID=n): (the number of particles to which tID=n is allocated as the hypothesis)/m.
The sound/image synthesis processing unit 131 outputs the information generated through this calculation, that is, the probability that each target is the event generation source, to the processing determination unit 132 as signal information.
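In the sketch, these probabilities are obtained simply by counting hypothesis assignments:

```python
def signal_information(particles):
    """P(tID = i) = (number of particles hypothesizing tID = i) / m."""
    m = len(particles)
    counts = [0] * len(particles[0].targets)
    for p in particles:
        counts[p.hypothesis_tid] += 1
    return [c / m for c in counts]
```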
When the processing in step S106 is completed, the sound/image synthesis processing unit 131 returns to step S101 and shifts to a standby state for the input of event information from the sound event detecting unit 122 or the image event detecting unit 112.
Steps S101 to S106 in the flow shown in Fig. 7 have been explained. Even when the sound/image synthesis processing unit 131 cannot acquire the event information shown in Fig. 3B from the sound event detecting unit 122 or the image event detecting unit 112 in step S101, the update of the data of the targets included in the particles is still executed in step S121. This update is processing that takes into account the change of the user positions with the elapse of time.
This target update processing is the same as the update processing (a1) applied to all targets of all particles described for step S105. It is performed on the basis of the assumption that the variance of the user position expands as time elapses, and the user position is updated by using a Kalman filter according to the time elapsed since the last update processing and the position information of the event.
The example of the update processing under the one dimension positional information situation is described.To be [dt] at first, and calculate the prediction distribution of the customer location of all targets after dt since the time representation in the past of update processing last time.In other words, be described below and upgrade Gaussian distribution N (m as the customer location deviation information t, σ t) expectation value (mean value) [m t] and variance [σ t].
m t=m t+xc×dt
σ t 2=σ t 2c 2×dt
m tIt is prediction expectation value (predicted state)
σ t 2It is prediction covariance (prediction estimate covariance)
Xc is mobile message (controlling models)
σ c 2Be noise (process noise).
When the calculation is performed under the condition that the user does not move, xc can be set to 0 in the update processing. Through this calculation, the Gaussian distribution N(m_t, σ_t) as the user position information included in all targets is updated.
The user confidence information (uID) included in the targets of the particles is not updated unless the posterior probabilities or scores [Pe] of all registered users for the event can be acquired from the event information.
When the processing in step S121 is completed, the sound/image synthesis processing unit 131 returns to step S101 and shifts to a standby state for the input of event information from the sound event detecting unit 122 or the image event detecting unit 112.
The processing executed by the sound/image synthesis processing unit 131 has been explained with reference to Fig. 7. The sound/image synthesis processing unit 131 repeatedly executes the processing according to the flow shown in Fig. 7 every time event information is input from the sound event detecting unit 122 or the image event detecting unit 112. Through the repeated processing, the weights of particles whose hypothesis targets have higher reliability increase, and particles with larger weights remain through the resampling processing based on the particle weights. As a result, data with high reliability, similar to the event information input from the sound event detecting unit 122 or the image event detecting unit 112, remains. Finally, information with high reliability, namely the following information, is generated and output to the processing determination unit 132:
(a) target information, which is estimation information indicating where the plural users are present respectively and who those users are, and
(b) signal information indicating the event generation source such as the speaking user.
(2) Processing example for improving the estimation performance of user identification by eliminating the independence between targets
The description given above for [(1) Processing for finding the positions of users and identifying the users through hypothesis updating based on event information input] essentially corresponds to the description of Japanese Patent Application 2007-1930, an earlier application filed by the same applicant as the present applicant.
The processing described above includes processing for identifying users, that is, determining who the users are, processing for estimating the user positions, processing for identifying the event generation source, and the like, by analyzing information input through plural channels (also called modalities or modals), specifically, image information acquired via a camera and acoustic information acquired via a microphone.
However, in the processing described above, the targets set in the particles are updated while the independence between the targets is maintained. In other words, each target is updated independently, regardless of the updates of the other target data. In such processing, the update is performed without excluding events that are in fact impossible.
Specifically, in some cases, the target update is performed on the basis of an estimate that different targets belong to the same user. During the estimation processing, no processing is performed to exclude states in which more than one target has the same user identity.
A processing example that performs analysis with high accuracy while eliminating the independence between targets is described below. In other words, when estimating where the targets are and who the targets are, asynchronous position information and identification information from plural channels (modalities, modals) are integrated stochastically, and the estimation performance of user identification can be improved by eliminating the independence between targets and handling the simultaneous occurrence probability (joint probability) of the user identities of all targets across the plurality of targets.
When the processing for finding the positions of users and identifying the users is formulated, the processing can be performed as processing for generating target information {position, ID}, as explained in (1) described above, and can be described as a system for estimating the probability [P] in the following mathematical formula (Formula 1):

P(X_t, θ_t | z_t, X_(t-1)) ..... (Formula 1)
Here P(a|b) represents the probability that state a occurs when input b is obtained. The parameters included in the formula above are as follows:
t: time,
X_t = {x_t^1, x_t^2, ..., x_t^θ, ..., x_t^n}: target information of n persons, where x = {xp, xu}: target information {position, ID},
z_t = {zp_t, zu_t}: the observed value {position, ID} at time t, and
θ_t: the state (θ = 1 to n) in which the observed value z_t at time t is the generation source of the target information x_tθ of the target [θ].
Further, z_t = {zp_t, zu_t} is the observed value {position, ID} at time t and corresponds to the event information in the processing of (1) described above. In other words, zp_t is the user position information (position) included in the event information, for example the user position information represented by the Gaussian distribution shown in (a) of Fig. 8 (1). zu_t is the user identification information (ID) included in the event information; for example, it corresponds to the user identification information represented as confidence values for the respective users 1 to k shown in (b) of Fig. 8 (1).
The probability P expressed by Formula 1 above, that is, P(X_t, θ_t | z_t, X_(t-1)), represents, when the two inputs on the right side of the formula are obtained, namely the observed value [z_t] at time t (input 1) and the target information [X_(t-1)] at the previous observation time t-1 (input 2), the occurrence probability of the two states represented on the left side of the formula: the state in which the observed value [z_t] at time t is the generation source of the target information [xθ] (θ = 1 to n) (state 1), and the state in which the target information [X_t] is generated at time t.
The processing for finding the positions of users and identifying the users can thus be performed as processing for generating target information {position, ID}, as explained in (1) described above, and can be described as a system for estimating the probability [P] in the formula above (Formula 1).
If the probability formula above (Formula 1) is now factorized with respect to θ, it can be transformed as follows:

P(X_t, θ_t | z_t, X_(t-1)) = P(X_t | θ_t, z_t, X_(t-1)) × P(θ_t | z_t, X_(t-1))

Here, the first half and the latter half of the factorization result are represented as (Formula 2) and (Formula 3), respectively. In other words, P(X_t | θ_t, z_t, X_(t-1)) is represented as (Formula 2), and P(θ_t | z_t, X_(t-1)) is represented as (Formula 3). Therefore, (Formula 1) = (Formula 2) × (Formula 3).
The formula above (Formula 3), that is, P(θ_t | z_t, X_(t-1)), has the following inputs:
the observed value [z_t] at time t (input 1), and
the target information [X_(t-1)] at the previous observation time t-1 (input 2).
When these inputs are obtained, the state [θ_t] is the state in which the generation source of the observed value [z_t] is [xθ] (state 1). This formula is a formula for calculating the probability that this state occurs.
In the processing of (1) described above, the probability is estimated through processing utilizing a particle filter. Specifically, it is estimated, for example, through processing applying a Rao-Blackwellised particle filter.
On the other hand, above-mentioned formula (formula 2) that is P (X t| θ t, z t, X T-1) have a following input:
Observed value [z at time t t] (input 1),
Target information [X at t-1 observing time last time T-1] (input 2), and
Observed value [z t] the generation source be the probability [θ of [x θ] t].
When obtaining these inputs, obtain dbjective state [X at time t (state) t].This formula is the formula that is used to represent the probability that state mentioned above can take place.
In order to estimate the state occurrence probability represented by the formula above (Formula 2), that is, P(X_t | θ_t, z_t, X_(t-1)), the target information [X_t], which is expressed as the estimated state value, is extended into target information [Xp_t] corresponding to the position information and target information [Xu_t] corresponding to the user identification information. This extension allows the formula above (Formula 2) to be expressed as follows:

P(X_t | θ_t, z_t, X_(t-1)) = P(Xp_t, Xu_t | θ_t, zp_t, zu_t, Xp_(t-1), Xu_(t-1))

where:
zp_t: the position information included in the observed value [z_t] at time t, and
zu_t: the user identification information included in the observed value [z_t] at time t.
If the target information [Xp_t] corresponding to the position information and the target information [Xu_t] corresponding to the user identification information are independent of each other, the extended equation of Formula 2 above can be expressed as the product of two formulas as follows:

P(X_t | θ_t, z_t, X_(t-1))
= P(Xp_t, Xu_t | θ_t, zp_t, zu_t, Xp_(t-1), Xu_(t-1))
= P(Xp_t | θ_t, zp_t, Xp_(t-1)) × P(Xu_t | θ_t, zu_t, Xu_(t-1))

Here, the first half and the latter half of the product are represented as (Formula 4) and (Formula 5), respectively. In other words, P(Xp_t | θ_t, zp_t, Xp_(t-1)) is represented as (Formula 4), and P(Xu_t | θ_t, zu_t, Xu_(t-1)) is represented as (Formula 5). The product can then be expressed as (Formula 2) = (Formula 4) × (Formula 5).
The target information updated with the position-related observed value [zp_t] in the formula above (Formula 4), that is, P(Xp_t | θ_t, zp_t, Xp_(t-1)), is only the target information [xp_tθ] relating to the position of the specific target (θ). Since the pieces of target information [xp_tθ] relating to the positions of the respective targets θ = 1 to n, namely xp_t^1, xp_t^2, ..., xp_t^n, are distinct, the formula above (Formula 4), that is, P(Xp_t | θ_t, zp_t, Xp_(t-1)), can be expanded as follows:

P(Xp_t | θ_t, zp_t, Xp_(t-1))
= P(xp_t^1, xp_t^2, ..., xp_t^n | θ_t, zp_t, xp_(t-1)^1, xp_(t-1)^2, ..., xp_(t-1)^n)
= P(xp_t^1 | xp_(t-1)^1) × P(xp_t^2 | xp_(t-1)^2) × ... × P(xp_tθ | zp_t, xp_(t-1)θ) × ... × P(xp_t^n | xp_(t-1)^n)
Therefore, Formula 4 can be expanded into a product of probability values of the respective targets (θ = 1 to n), so that only the target information [xp_tθ] relating to the position of the specific target (θ) is affected by the update with the observed value [zp_t].
Further, in the processing of (1) described above, the value corresponding to Formula 4 is estimated by using a Kalman filter.
In the processing of (1) described above, the update of the user positions included in the target data set in the particles can be performed as update processing in two stages, that is:
(a1) update processing applied to all targets of all particles, and
(a2) update processing applied to the event generation source hypothesis target set for each particle.
The processing (a1), that is, the update processing applied to all targets of all particles, is executed for all targets: both the targets selected as event generation source hypothesis targets and the other targets. This processing is performed on the basis of the assumption that the variance of the user position expands as time elapses, and the user position is updated by using a Kalman filter according to the time elapsed since the last update processing and the position information of the event. In other words, it can be represented by the formula P(xp_t | xp_(t-1)).
In this probability calculation, only the movement model (temporal decay) is utilized in the estimation processing using the Kalman filter.
Further, the update processing (a2) applied to the event generation source hypothesis target set for each particle is performed by using the Gaussian distribution N(m_e, σ_e) indicating the user position included in the event information input from the sound event detecting unit 122 or the image event detecting unit 112. In other words, it can be represented by the formula P(xp_t | zp_t, xp_(t-1)). In this probability calculation, both the movement model and the observation model are utilized in the estimation processing using the Kalman filter.
Next, the formula corresponding to the user identification information (ID), obtained by expanding Formula 2 above, that is, (Formula 5), is analyzed. The formula is as follows:

P(Xu_t | θ_t, zu_t, Xu_(t-1)) ... (Formula 5)
In this formula (Formula 5), the target information updated with the observed value [zu_t] corresponding to the user identification information (ID) is only the target information [xu_tθ] relating to the user identification information of the specific target (θ).
Here, if the pieces of target information [xu_tθ] relating to the user identification information of the respective targets θ = 1 to n, namely xu_t^1, xu_t^2, ..., xu_t^n, are independent of each other, the formula above (Formula 5), that is, P(Xu_t | θ_t, zu_t, Xu_(t-1)), can be expanded as follows:

P(Xu_t | θ_t, zu_t, Xu_(t-1))
= P(xu_t^1, xu_t^2, ..., xu_t^n | θ_t, zu_t, xu_(t-1)^1, xu_(t-1)^2, ..., xu_(t-1)^n)
= P(xu_t^1 | xu_(t-1)^1) × P(xu_t^2 | xu_(t-1)^2) × ... × P(xu_tθ | zu_t, xu_(t-1)θ) × ... × P(xu_t^n | xu_(t-1)^n)
Therefore, Formula 5 can be expanded into a product of probability values of the respective targets (θ = 1 to n), so that only the target information [xu_tθ] relating to the user identification information of the specific target (θ) is affected by the update with the observed value [zu_t].
Further, the target update processing based on the user identification information is performed as follows through the processing explained in (1) described above.
Each target set in the particles includes probability values (scores) Pt[i] (i = 1 to k) that the target is each of the users 1 to k, as user confidence information (uID) indicating who the target is.
The target update based on the user identification information included in the event information is set such that, as long as there is no observed value, the probability values remain unchanged. In other words, the probability is represented by the formula P(xu_t | xu_(t-1)), which leaves the values unchanged as long as there is no observed value.
The user confidence information (uID) Pt[i] (i = 1 to k) of the targets included in the particles is updated by using an update ratio [β] with a value range of 0 to 1 set in advance. Here, the update ratio [β] is determined in advance on the basis of the posterior probabilities of the respective registered users among all registered users and the user confidence information (uID) Pe[i] (i = 1 to k) included in the event information input from the sound event detecting unit 122 or the image event detecting unit 112.
The update of the user confidence information (uID) Pt[i] (i = 1 to k) of the targets is performed according to the following formula:

Pt[i] = (1 - β) × Pt[i] + β × Pe[i]

where i = 1 to k and β = 0 to 1. Here, the update ratio [β] is a value in the range of 0 to 1 and is set in advance.
This processing can be represented by the following probability calculation formula:

P(xu_t | zu_t, xu_(t-1))
The target update processing based on the user identification information explained in (1) described above is equivalent to performing the estimation processing of the probability P corresponding to the user identification information (ID) in the following formula (Formula 5), obtained by expanding Formula 2 above:

P(Xu_t | θ_t, zu_t, Xu_(t-1)) ... (Formula 5)

It is therefore equivalent to the estimation processing of the probability P of Formula 5. However, in (1) described above, the user identification information (ID) is processed while the independence between the targets is maintained.
Therefore, in some cases, even when the same user identifier (uID: user ID) is determined to be the most probable user identifier for a plurality of different targets, the determination is updated as it is. In other words, in some cases the estimation processing updates a state in which, for example, a plurality of targets correspond to the same user, even though such a state rarely occurs in practice.
In addition, the processing is performed under the assumption that the user identifiers (uID: user IDs) of the targets are independent of one another. Therefore, the target information updated with the observed value [zu_t] corresponding to the user identification information is only the target information [xu_tθ] of the specific target (θ). Hence, there is a demand for updating the user identification information (uID) of all targets with the observed value [zu_t].
In this way, in (1) described above, the analysis processing is performed while the independence between the targets is maintained. Therefore, the estimation processing is performed without excluding events that do not occur in practice, targets are updated unnecessarily, and a decline in the efficiency and accuracy of the user identification estimation processing may occur.
An embodiment of the invention that overcomes the drawbacks described above is explained below. In this embodiment, the independence between targets is eliminated, and processing for updating a plurality of target data in relation to one another on the basis of one piece of observed data is performed. Performing such processing makes it possible to avoid updates to states that cannot occur in practice, thereby realizing efficient analysis with high accuracy.
In the information processing apparatus according to the embodiment of the invention, the sound/image synthesis processing unit 131 performs processing to update, on the basis of the user identification information included in the event information, target data including user confidence information that indicates which user corresponds to the target serving as the event generation source. To perform this processing, the simultaneous occurrence probability (joint probability) of the candidate data in which the targets correspond to the respective users is updated on the basis of the user identification information included in the event information. Then, processing is performed to calculate the user confidences corresponding to the targets by applying the updated simultaneous occurrence probability values.
Because probability of happening (joint probability) when being directed against all target processing user totem informations (ID) through the independence between the eliminating target is so can improve the estimated performance of ID.Hereinafter will be described can be by the processing of synthetic processing unit 131 execution of sound/image.
(A) Processing for eliminating the independence between targets in user identification (uID) estimation
The sound/image integration processing unit 131 executes processing in which the independence between the items of target information [X^u_t] corresponding to user identification information is eliminated, using the aforementioned formula (Formula 5). That is, the following formula is applied:
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
To recapitulate briefly the series of steps by which formula (Formula 5) is derived: as stated above, when the probability that each target is the event generation source (that is, the signal information) is expressed as P, the processing for calculating the probability P can be formulated and represented as follows:
P(X_t, θ_t | z_t, X_{t-1}) … (Formula 1)
Further, when formula (Formula 1) is factorized with respect to θ, the formula can be transformed as follows:
P(X_t, θ_t | z_t, X_{t-1}) = P(X_t | θ_t, z_t, X_{t-1}) × P(θ_t | z_t, X_{t-1})
Here, the first and second factors of this factorization are denoted (Formula 2) and (Formula 3), respectively. In other words, P(X_t | θ_t, z_t, X_{t-1}) is expressed as (Formula 2) and P(θ_t | z_t, X_{t-1}) as (Formula 3). Therefore, (Formula 1) = (Formula 2) × (Formula 3).
Formula (Formula 3), that is, P(θ_t | z_t, X_{t-1}), has the following inputs:
the observed value [z_t] at time t (input 1); and
the target information [X_{t-1}] at the preceding observation time [t-1] (input 2).
Given these inputs, it is the formula for calculating the probability of the state (state 1) in which [θ_t] indicates that the generation source of the observed value [z_t] is [x_θ].
On the other hand, the aforementioned formula (Formula 2), that is, P(X_t | θ_t, z_t, X_{t-1}), has the following inputs:
the observed value [z_t] at time t (input 1);
the target information [X_{t-1}] at the preceding observation time [t-1] (input 2); and
[θ_t], indicating that the generation source of the observed value [z_t] is [x_θ] (input 3).
Given these inputs, the target state [X_t] at time t is obtained. This is the formula expressing the probability that this state occurs.
If the target information [Xp_t] corresponding to position information and the target information [X^u_t] corresponding to user identification information are assumed to be independent of each other, the aforementioned formula (Formula 2) can be written as the following product:
P(X_t | θ_t, z_t, X_{t-1})
= P(Xp_t, X^u_t | θ_t, zp_t, z^u_t, Xp_{t-1}, X^u_{t-1})
= P(Xp_t | θ_t, zp_t, Xp_{t-1}) × P(X^u_t | θ_t, z^u_t, X^u_{t-1})
Here, the first and second factors of this product are denoted (Formula 4) and (Formula 5), respectively. In other words, P(Xp_t | θ_t, zp_t, Xp_{t-1}) is expressed as (Formula 4) and P(X^u_t | θ_t, z^u_t, X^u_{t-1}) as (Formula 5). The product can then be expressed as (Formula 2) = (Formula 4) × (Formula 5).
In this way, the formula corresponding to user identification information (uID), that is, the formula (Formula 5) obtained by expanding the aforementioned formula (Formula 2), is analyzed. This formula is:
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
In this formula (Formula 5), the target information updated by the observed value [z^u_t] corresponding to user identification information (uID) is only the target information [x^u_{t,θ}] of the specific target (θ).
The formula (Formula 5), P(X^u_t | θ_t, z^u_t, X^u_{t-1}), can be expanded as follows:
P(X^u_t | θ_t, z^u_t, X^u_{t-1})
= P(x^u_{t,1}, x^u_{t,2}, …, x^u_{t,n} | θ_t, z^u_t, x^u_{t-1,1}, x^u_{t-1,2}, …, x^u_{t-1,n})
Here, the target update processing does not assume independence between the targets within the target information [X^u_t] corresponding to user identification information. In other words, the processing considers the simultaneous occurrence probability (joint probability), which is the probability that any plurality of events occur together. Bayes' theorem is applied to this processing. According to Bayes' theorem, when the probability that event x occurs (the prior probability) is defined as P(x) and the probability that event x holds after event z has occurred (the posterior probability) as P(x|z), the following formula holds:
P(x|z) = P(z|x)P(x)/P(z)
Bayes' theorem, that is, P(x|z) = P(z|x)P(x)/P(z), is applied to expand the aforementioned formula (Formula 5), P(X^u_t | θ_t, z^u_t, X^u_{t-1}), corresponding to user identification information (uID).
The result of the expansion is:
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) = P(θ_t, z^u_t, X^u_{t-1} | X^u_t) P(X^u_t) / P(θ_t, z^u_t, X^u_{t-1}) … (Formula 6)
In formula (Formula 6), the parameters have the following meanings:
θ_t: the state (θ = 1 to n) in which the observed value z_t at time t is generated from the target information x_θ of target [θ]; and
z^u_t: the user identification information included in the observed value [z_t] at time t.
When these parameters θ_t and z^u_t depend only on the target information [X^u_t] at time t corresponding to user identification information (and not on the target information [X^u_{t-1}]), formula (Formula 6) can be further expanded as follows:
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) = P(θ_t, z^u_t, X^u_{t-1} | X^u_t) P(X^u_t) / P(θ_t, z^u_t, X^u_{t-1})
= P(θ_t, z^u_t | X^u_t) P(X^u_{t-1} | X^u_t) P(X^u_t) / (P(θ_t, z^u_t) P(X^u_{t-1})) … (Formula 7)
User identification (uID) estimation is executed by calculating this formula (Formula 7). Further, when the user confidence (uID) of an individual target i, that is, the probability of x^u_i (uID), is needed, it is obtained by marginalizing the simultaneous occurrence probability (joint probability) over the user identifiers (uID) of the other targets. For example, it is calculated with the following formula:
P(x^u_i) = Σ_{X^u} P(X^u), where the sum runs over all X^u whose component for target i equals x^u_i
Specific processing examples using these formulas will be described later.
Hereinafter, as examples of processing to which the above formula (Formula 7) is applied, the following examples will be described:
(a) an analysis processing example in which the independence between targets is maintained;
(b) an analysis processing example according to the embodiment of the invention in which the independence between targets is eliminated; and
(c) an analysis processing example according to the embodiment of the invention in which the independence between targets is eliminated and the presence of unregistered users is taken into account.
These processing examples will now be described. Processing example (a) is described here for comparison with processing example (b) according to the embodiment of the invention.
(a) Analysis processing example in which the independence between targets is maintained
First, the analysis processing example in which the independence between targets is maintained will be described. As stated above, Bayes' theorem is applied to expand the formula (Formula 5) corresponding to user identification information (uID):
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
The following formula (Formula 7) is thereby obtained:
P(X^u_t | θ_t, z^u_t, X^u_{t-1})
= P(θ_t, z^u_t, X^u_{t-1} | X^u_t) P(X^u_t) / P(θ_t, z^u_t, X^u_{t-1})
= P(θ_t, z^u_t | X^u_t) P(X^u_{t-1} | X^u_t) P(X^u_t) / (P(θ_t, z^u_t) P(X^u_{t-1})) … (Formula 7)
Here, if P(X^u_t), P(θ_t, z^u_t) and P(X^u_{t-1}) in formula (Formula 7) are assumed to be uniform prior probabilities, formulas (Formula 5) and (Formula 7) can be written as follows:
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
= P(θ_t, z^u_t | X^u_t) P(X^u_{t-1} | X^u_t) P(X^u_t) / (P(θ_t, z^u_t) P(X^u_{t-1})) … (Formula 7)
∝ P(θ_t, z^u_t | X^u_t) × P(X^u_{t-1} | X^u_t) … (Formula 8) × (Formula 9)
where '∝' denotes 'proportional to'.
Accordingly, formulas (Formula 5) and (Formula 7) can be expressed as the following formula (Formula 10):
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
= R × P(θ_t, z^u_t | X^u_t) P(X^u_{t-1} | X^u_t) … (Formula 10)
where R denotes a normalization term.
That is, (Formula 5) = (Formula 10) = R × (Formula 8) × (Formula 9).
Here, formula (Formula 8) is expressed as:
P(θ_t, z^u_t | X^u_t) … (Formula 8)
Given the target information [X^u_t] corresponding to user identification information at time t, formula (Formula 8) is the probability that, with respect to the user identification information included in the target information, the observed value [z^u_t] is observed information from the specific target (θ). This probability is defined as the [prior probability P] of the observed value.
Formula (Formula 9) is expressed as:
P(X^u_{t-1} | X^u_t) … (Formula 9)
Given the target information [X^u_t] corresponding to user identification information at time t, formula (Formula 9) is the probability of having obtained the target information [X^u_{t-1}] corresponding to user identification information at the preceding observation time [t-1]. This probability is defined as the [state transition probability P].
In other words, the following relation is obtained:
(Formula 5) = R × ([prior probability P]) × ([state transition probability P])
For example, when the target information [X^u_t] in the computation formula (Formula 8) of the [prior probability P] of the observed value is written out per target as [x^u_{t,1}, x^u_{t,2}, …, x^u_{t,θ}, …, x^u_{t,n}], formula (Formula 8) can be expressed as:
P(θ_t, z^u_t | X^u_t) = P(θ_t, z^u_t | x^u_{t,1}, x^u_{t,2}, …, x^u_{t,θ}, …, x^u_{t,n})
In this formula, the prior probability P of the observed value is set to P = A when x^u_{t,θ} = z^u_t, and to P = B in the other cases.
Here, the probabilities A and B are set such that A > B.
Figure 11 shows a processing example for calculating the prior probability P when the number of targets is two (n = 2; target IDs tID = 0 to 1) and the number of registered users is three (k = 3; user IDs uID = 0 to 2).
For example, entry 501, located near the center of Figure 11, that is, P(θ_t, z^u_t | x^u_{t,0}, x^u_{t,1}) = P(0,2 | 2,1), shows the following probability:
given x^u_{t,0} = 2 (target tID = 0 corresponds to uID = 2) and x^u_{t,1} = 1 (target tID = 1 corresponds to uID = 1),
the probability that, according to θ_t = 0 and z^u_t = 2, the observed information z^u_t of uID = 2 is obtained from target tID = 0.
In this case, the parameters satisfy x^u_{t,θ} = x^u_{t,0} = 2 and z^u_t = 2, so that x^u_{t,θ} = z^u_t holds.
Therefore, the prior probability P is expressed as:
P(θ_t, z^u_t | x^u_{t,0}, x^u_{t,1}) = P(0,2 | 2,1) = A
Likewise, entry 502, that is, P(θ_t, z^u_t | x^u_{t,0}, x^u_{t,1}) = P(1,0 | 0,2), shows the following probability:
given x^u_{t,0} = 0 (target tID = 0 corresponds to uID = 0) and x^u_{t,1} = 2 (target tID = 1 corresponds to uID = 2), the probability that, according to θ_t = 1 and z^u_t = 0, the observed information z^u_t of uID = 0 is obtained from target tID = 1.
In this case, the parameters satisfy x^u_{t,θ} = x^u_{t,1} = 2 and z^u_t = 0, so that x^u_{t,θ} = z^u_t does not hold.
Therefore, the prior probability P is expressed as:
P(θ_t, z^u_t | x^u_{t,0}, x^u_{t,1}) = P(1,0 | 0,2) = B
The state transition probability P is represented by the following formula (Formula 9):
P(X^u_{t-1} | X^u_t) … (Formula 9)
When the user identifier (uID) does not change for any target, the state transition probability is set to P = C; in the other cases it is set to P = D.
Here, the probabilities C and D are set such that C > D.
Figure 12 shows an example of calculating the state transition probability under this setting when the number of targets is two (n = 2; tID = 0 to 1) and the number of registered users is three (k = 3; uID = 0 to 2).
Entry 511 shown in Figure 12, that is, P(x^u_{t-1,0}, x^u_{t-1,1} | x^u_{t,0}, x^u_{t,1}) = P(0,1 | 0,1), shows the following probability:
given x^u_{t,0} = 0 (tID = 0 corresponds to uID = 0 at time t) and x^u_{t,1} = 1 (tID = 1 corresponds to uID = 1 at time t), the probability that x^u_{t-1,0} = 0 (tID = 0 was uID = 0 at time t-1) and x^u_{t-1,1} = 1 (tID = 1 was uID = 1 at time t-1).
In this case, for all targets, there is no change between the user identifier (uID) at time t and the user identifier (uID) at time t-1. Therefore, the state transition probability P becomes P = C.
Entry 512 shown in Figure 12, that is, P(x^u_{t-1,0}, x^u_{t-1,1} | x^u_{t,0}, x^u_{t,1}) = P(0,1 | 2,2), shows the following probability:
given x^u_{t,0} = 2 (tID = 0 corresponds to uID = 2 at time t) and x^u_{t,1} = 2 (tID = 1 corresponds to uID = 2 at time t), the probability that x^u_{t-1,0} = 0 (tID = 0 was uID = 0 at time t-1) and x^u_{t-1,1} = 1 (tID = 1 was uID = 1 at time t-1).
In entry 512, the state is not one in which the user identifiers (uID) at time t and at time t-1 are unchanged for all targets; the user identifier changes for at least one target. Therefore, the state transition probability is set to P = D.
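As an aid to understanding, the two probability tables just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names are chosen for explanation, and the concrete values of A, B, C and D are the illustrative ones used later in the Figure 13 example.

```python
# Sketch of the observation prior (Formula 8) and the state transition
# probability (Formula 9); A, B, C, D are illustrative values with A > B, C > D.
A, B = 0.8, 0.2   # prior probability of the observed value
C, D = 1.0, 0.0   # state transition probability

def prior_probability(theta, zu, xu):
    """P(theta_t, zu_t | Xu_t): A if the observed uID equals the uID
    currently assigned to the source target theta, otherwise B."""
    return A if xu[theta] == zu else B

def transition_probability(xu_prev, xu):
    """P(Xu_{t-1} | Xu_t): C if no target changed its uID, otherwise D."""
    return C if xu_prev == xu else D

# Entry 501 of Figure 11: theta=0, zu=2, Xu=(2,1) -> x_theta equals zu -> A
assert prior_probability(0, 2, (2, 1)) == A
# Entry 502 of Figure 11: theta=1, zu=0, Xu=(0,2) -> x_theta differs -> B
assert prior_probability(1, 0, (0, 2)) == B
# Entry 511 of Figure 12: no uID changes -> C
assert transition_probability((0, 1), (0, 1)) == C
# Entry 512 of Figure 12: at least one uID changes -> D
assert transition_probability((0, 1), (2, 2)) == D
```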
Figure 13 shows an example that uses formula (Formula 10), expressed as follows:
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
= R × P(θ_t, z^u_t | X^u_t) P(X^u_{t-1} | X^u_t) … (Formula 10) = R × (Formula 8) × (Formula 9)
In this formula, the probability values, that is, the user confidences of the user IDs (uID = 0 to 2) for the respective target IDs (tID = 0, 1, 2), are set to be uniform as the initial values before any observed value is obtained as event information (Figure 13(a)).
The probabilities are then set as follows:
probabilities A = 0.8 and B = 0.2 for the [prior probability P] represented by the aforementioned formula (Formula 8); and
probabilities C = 1.0 and D = 0.0 for the [state transition probability P] represented by the aforementioned formula (Formula 9).
In other words, the probabilities are set as follows.
The [prior probability P] represented by the aforementioned formula (Formula 8) is expressed as:
P(θ_t, z^u_t | X^u_t) = P(θ_t, z^u_t | x^u_{t,1}, x^u_{t,2}, …, x^u_{t,θ}, …, x^u_{t,n})
In this formula, the prior probability P of the observed value is set to P = A = 0.8 when x^u_{t,θ} = z^u_t, and to P = B = 0.2 in the other cases.
Further, the [state transition probability P] represented by the aforementioned formula (Formula 9) is expressed as:
P(X^u_{t-1} | X^u_t)
In this formula, the state transition probability P is set to P = C = 1.0 when, for all targets, there is no change between the user identifier (uID) at time t and the user identifier (uID) at time t-1. By contrast, in the other cases, the state transition probability P is set to P = D = 0.0.
Under the above probability settings, the series of observed information 'θ = 0, z^u = 0' and 'θ = 1, z^u = 1' is observed successively at two observation times.
Figure 13 shows an example of how the probabilities of the user IDs (uID = 0 to 2) for the target IDs (tID = 0, 1, 2), that is, the user confidences (uID), change.
The probabilities for the data corresponding to all user IDs (0 to 2) of all target IDs (0, 1, 2) are calculated as simultaneous occurrence probabilities (joint probabilities).
Here, 'θ = 0, z^u = 0' indicates that the observed information [z^u] corresponds to user identifier uID = 0 and originates from target θ = 0.
'θ = 1, z^u = 1' indicates that the observed information [z^u] corresponds to user identifier uID = 1 and originates from target θ = 1.
As shown in row (a), the initial state, of Figure 13, the candidates for the user IDs (uID = 0 to 2) corresponding to the three target IDs (tID = 0, 1, 2) range over tID 0, 1, 2 = (0, 0, 0) to (2, 2, 2); there are 27 different candidate data.
For each of these 27 different candidate data, a simultaneous occurrence probability (joint probability) is calculated as the user confidence corresponding to all user IDs (0 to 2) of all target IDs (0, 1, 2).
In the initial stage, the joint probabilities of the 27 different candidate data are set to be uniform. Since there are 27 candidate data in total, the probability P of each candidate is set to P = 1.0/27 = 0.037037.
Row (b) of Figure 13 shows the change in the user confidences calculated as simultaneous occurrence probabilities (joint probabilities) (the confidences of all user IDs (0 to 2) for all target IDs (0, 1, 2)) when the observed information [θ = 0, z^u = 0] is observed.
The observed information [θ = 0, z^u = 0] is observed information from target tID = 0 corresponding to uID = 0.
Based on this observed information, among the 27 candidates, the probabilities P (joint probabilities) of the candidate data in which uID = 0 is set to tID = 0 increase, while the probabilities P of the other candidate data decrease.
The probability calculation is executed according to the following formula:
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
= R × P(θ_t, z^u_t | X^u_t) P(X^u_{t-1} | X^u_t) … (Formula 10) = R × (Formula 8) × (Formula 9)
In this formula, the probability calculation is executed on the basis of the following settings: probabilities A = 0.8 and B = 0.2 for the prior probability P represented by the aforementioned formula (Formula 8), and probabilities C = 1.0 and D = 0.0 for the state transition probability P represented by the aforementioned formula (Formula 9).
As shown in Figure 13(b), this calculation yields the following probabilities:
probability P = 0.074074 for the candidates in which uID = 0 is set to tID = 0; and
probability P = 0.018519 for the other candidates.
Row (c) of Figure 13 further shows the change in the user confidences calculated as simultaneous occurrence probabilities (joint probabilities) when the observed information [θ = 1, z^u = 1] is observed.
The observed information [θ = 1, z^u = 1] is observed information from target tID = 1 corresponding to uID = 1.
Based on this observed information, among the 27 candidates, the probabilities P (joint probabilities) of the candidate data in which uID = 1 is set to tID = 1 increase, while the probabilities P of the other candidate data decrease.
As shown in Figure 13(c), the result can be classified into three different probability values (joint probabilities). The candidates with the highest probability satisfy both conditions, uID = 0 set to tID = 0 and uID = 1 set to tID = 1, and obtain probability P = 0.148148. The candidates with the second-highest probability satisfy only one of the conditions, uID = 0 set to tID = 0 or uID = 1 set to tID = 1, and obtain probability P = 0.037037. The candidates with the lowest probability satisfy neither condition, that is, uID = 0 is not set to tID = 0 and uID = 1 is not set to tID = 1, and obtain probability P = 0.009259.
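The numbers shown in Figure 13 can be reproduced with the following sketch, which enumerates the 27 candidates and applies formula (Formula 10) under the settings A = 0.8, B = 0.2, C = 1.0 and D = 0.0. Because C = 1.0 and D = 0.0 force each candidate to keep its uID assignment between observation times, one update step reduces to multiplying each candidate's joint probability by A or B and renormalizing. The tuple representation of candidate data is an assumption made for illustration.

```python
from itertools import product

A, B = 0.8, 0.2                     # observation prior (Formula 8)
n_targets, n_users = 3, 3

# All 27 candidates: xu[i] is the uID assigned to target tID=i.
candidates = list(product(range(n_users), repeat=n_targets))
joint = {xu: 1.0 / len(candidates) for xu in candidates}   # uniform start

def update(joint, theta, zu):
    """One step of Formula 10 with C=1.0, D=0.0 (uID assignments persist)."""
    new = {xu: p * (A if xu[theta] == zu else B) for xu, p in joint.items()}
    total = sum(new.values())       # the normalization term R
    return {xu: p / total for xu, p in new.items()}

joint = update(joint, theta=0, zu=0)   # observation "theta=0, zu=0"
print(joint[(0, 0, 0)])   # 0.074074... (uID=0 set to tID=0)
print(joint[(1, 0, 0)])   # 0.018519... (other candidates)

joint = update(joint, theta=1, zu=1)   # observation "theta=1, zu=1"
print(joint[(0, 1, 0)])   # 0.148148... (both conditions satisfied)
print(joint[(0, 0, 0)])   # 0.037037... (one condition satisfied)
print(joint[(1, 0, 0)])   # 0.009259... (neither condition satisfied)
```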
Figure 14 shows the marginalization results obtained through the processing shown in Figure 13.
Rows (a) to (c) of Figure 14 correspond to rows (a) to (c) of Figure 13, respectively. In other words, they correspond to the results (b) and (c) obtained by updating in order from the initial state (Figure 14(a)) on the basis of the two pieces of observed information. The data shown in Figure 14 comprise the probabilities calculated from the results shown in Figure 13, from 'the probability P that tID = 0 corresponds to uID = 0' through 'the probability P that tID = 2 corresponds to uID = 2' (nine probabilities in total).
The probabilities shown in Figure 14 are obtained by arithmetic addition, that is, marginalization, of the probability values of the corresponding data among the 27 different data. For example, the following formula can be applied to this calculation:
P(x^u_i) = Σ_{X^u} P(X^u), where the sum runs over all candidates X^u whose component for target i equals x^u_i
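A minimal sketch of this marginalization, assuming the same tuple representation of candidate data as in the sketch above:

```python
from itertools import product

def marginal(joint, target, uid):
    """P(xu_target = uid): add up (marginalize) the joint probabilities of
    every candidate in which the given target carries the given uID."""
    return sum(p for xu, p in joint.items() if xu[target] == uid)

# The uniform joint over 27 candidates reproduces the initial state of
# Figure 14(a): each of the nine marginals equals 1/3.
joint = {xu: 1.0 / 27 for xu in product(range(3), repeat=3)}
print(marginal(joint, target=0, uid=0))   # 0.333333...
```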
As shown in Figure 14(a), in the initial state the nine probabilities, from 'the probability P that tID = 0 corresponds to uID = 0' through 'the probability P that tID = 2 corresponds to uID = 2', are uniform and set to P = 0.333333.
The bottom of Figure 14(a) shows these probabilities as graph data.
Figure 14(b) shows the update result when the observed information [θ = 0, z^u = 0] is observed, that is, the data from 'the probability P that tID = 0 corresponds to uID = 0' through 'the probability P that tID = 2 corresponds to uID = 2'.
In this case, only the value of 'the probability P that tID = 0 corresponds to uID = 0' is raised. Under the influence of this setting, the following two probabilities decrease:
the probability P that tID = 0 corresponds to uID = 1; and
the probability P that tID = 0 corresponds to uID = 2.
By contrast, the probabilities of the other targets, tID = 1 and tID = 2, are not affected at all. That is, the settings of the following probabilities do not change at all from the initial state:
the probability P that tID = 1 corresponds to uID = 0;
the probability P that tID = 1 corresponds to uID = 1;
the probability P that tID = 1 corresponds to uID = 2;
the probability P that tID = 2 corresponds to uID = 0;
the probability P that tID = 2 corresponds to uID = 1; and
the probability P that tID = 2 corresponds to uID = 2.
This behavior results from the analysis processing in which the independence between targets is maintained.
Figure 14(c) shows the update result when the observed information [θ = 1, z^u = 1] is observed, that is, the data from 'the probability P that tID = 0 corresponds to uID = 0' through 'the probability P that tID = 2 corresponds to uID = 2'.
In this case, only the value of 'the probability P that tID = 1 corresponds to uID = 1' is raised. Under the influence of this update, the following two probabilities decrease:
the probability P that tID = 1 corresponds to uID = 0; and
the probability P that tID = 1 corresponds to uID = 2.
The probabilities of the other targets, tID = 0 and tID = 2, are entirely unaffected and do not change from the probabilities shown in row (b). This again comes from the analysis processing that maintains the independence between targets.
By further acquiring observed information and comparing, then classifying the targets according to the aforementioned target weights, the candidates with high probability can be retained. However, because the independence between targets is maintained, this processing is inefficient.
(b) Analysis processing example according to the embodiment of the invention in which the independence between targets is eliminated
Next, the analysis processing example according to the embodiment of the invention in which the independence between targets is eliminated will be described.
In the example described hereinafter, processing is executed under the constraint that the same user identifier (uID), that is, the same user identification information, is not assigned to different targets.
The sound/image integration processing unit 131 updates, on the basis of the user identification information, the simultaneous occurrence probabilities (joint probabilities) of the candidate data establishing correspondences between targets and the respective users. Here, the user identification information is the observed value included in the event information. The updated joint probability values are then used to calculate the user confidence corresponding to each target.
As is apparent from Figures 13 and 14, which illustrate the processing that maintains the independence between targets, the independence between targets with respect to user identifiers (uID) is not eliminated even when joint probabilities are used, as the marginalization results shown in Figure 14 demonstrate.
In other words, although, as in the result of Figure 14(b), a high probability arises that the user corresponding to target tID = 0 is 'user 0', no processing reflects that result on targets tID = 1, 2. This is because the processing maintains the independence between targets.
From the determination that target tID = 0 is user 0 with high probability, it can be estimated that targets tID = 1, 2 are not user 0. Using this estimation to update the user confidence of each target therefore makes the processing more effective.
A processing example for efficient, highly accurate analysis in which the independence between targets is eliminated is described below.
Bayes' theorem is applied to expand the aforementioned formula (Formula 5) corresponding to user identification information (uID):
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
The following formula (Formula 7) is thereby obtained:
P(X^u_t | θ_t, z^u_t, X^u_{t-1})
= P(θ_t, z^u_t, X^u_{t-1} | X^u_t) P(X^u_t) / P(θ_t, z^u_t, X^u_{t-1})
= P(θ_t, z^u_t | X^u_t) P(X^u_{t-1} | X^u_t) P(X^u_t) / (P(θ_t, z^u_t) P(X^u_{t-1})) … (Formula 7)
If only P(θ_t, z^u_t) is assumed to be uniform in formula (Formula 7), formula (Formula 5) can be written as follows:
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
∝ P(θ_t, z^u_t | X^u_t) P(X^u_{t-1} | X^u_t) P(X^u_t) / P(X^u_{t-1})
where '∝' denotes 'proportional to'.
Accordingly, formulas (Formula 5) and (Formula 7) can be expressed as the following formula (Formula 11):
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
= R × P(θ_t, z^u_t | X^u_t) P(X^u_{t-1} | X^u_t) P(X^u_t) / P(X^u_{t-1}) … (Formula 11)
where R denotes a normalization term.
Further, in formula (Formula 11), the prior probabilities P(X^u_t) and P(X^u_{t-1}) are used to express the constraint that 'the same user identifier (uID) is not assigned to a plurality of targets', as follows. Constraint 1: if, in P(X^u) = P(x^u_1, x^u_2, …, x^u_n), there is at least one x^u that coincides with another x^u (user identifier uID), the probabilities are set to P(X^u_t) = P(X^u_{t-1}) = NG (P = 0.0); in the other cases, they are set to P(X^u_t) = P(X^u_{t-1}) = OK (0.0 < P ≤ 1.0). Probability settings of these kinds are applied.
Figure 15 shows an example of the initial-state settings according to the above constraint when the number of targets is n = 3 (tID = 0 to 2) and the number of registered users is k = 3 (uID = 0 to 2).
This initial state corresponds to the initial state of Figure 13(a). In other words, it represents the simultaneous occurrence probabilities (joint probabilities) for the data corresponding to all user IDs (0 to 2) of all target IDs (0, 1, 2).
In the example shown in Figure 15, if, in P(X^u) = P(x^u_1, x^u_2, …, x^u_n), there is at least one x^u that coincides with another x^u (user identifier uID), the joint probability is set to P = 0 (NG). Every candidate other than these is given a probability value greater than zero (0.0 < P ≤ 1.0), that is, P = OK, as its joint probability.
In this way, the sound/image integration processing unit 131 performs the initial setting of the simultaneous occurrence probabilities (joint probabilities) for the candidate data associating targets with the respective users, based on the constraint that the same user identifier (uID) is not assigned to a plurality of targets.
The probability value P(X^u) of the candidate data in which the same user identifier (uID) is set to different targets is P(X^u) = 0.0, and the probability values of the other candidate data satisfy 0.0 < P(X^u) ≤ 1.0.
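A sketch of this constrained initial setting is given below; the helper name is an assumption, and the duplicate-uID check implements Constraint 1 directly.

```python
from itertools import product

def constrained_initial_joint(n_targets, n_users):
    """Initial joint probabilities under Constraint 1: candidates assigning
    the same uID to two targets get P = 0.0 (NG); the remaining candidates
    share the probability mass uniformly (OK)."""
    candidates = list(product(range(n_users), repeat=n_targets))
    valid = {xu for xu in candidates if len(set(xu)) == len(xu)}
    return {xu: (1.0 / len(valid) if xu in valid else 0.0)
            for xu in candidates}

joint = constrained_initial_joint(3, 3)
print(joint[(0, 1, 2)])   # 0.166667... (one of the six OK permutations)
print(joint[(0, 0, 1)])   # 0.0         (duplicated uID, NG)
```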
Figures 16 and 17 show an analysis processing example according to the embodiment of the invention. The processing of this embodiment eliminates the independence between targets and uses the constraint that 'the same user identifier (uID) is not assigned to a plurality of targets'. These figures correspond to Figures 13 and 14, which illustrate the aforementioned processing example that maintains the independence between targets.
The example processing shown in Figures 16 and 17 is processing in which the independence between targets is eliminated. This processing uses the formula (Formula 11) generated from the aforementioned formula (Formula 5) corresponding to user identification information (uID). Formula 11 is expressed as follows:
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
= R × P(θ_t, z^u_t | X^u_t) P(X^u_{t-1} | X^u_t) P(X^u_t) / P(X^u_{t-1}) … (Formula 11)
Processing is executed with this formula under the constraint that the same user identifier (uID), that is, the same user identification information, is not assigned to different targets.
That is, in formula (Formula 11), if, in P(X^u) = P(x^u_1, x^u_2, …, x^u_n), there is at least one x^u that coincides with another x^u (user identifier uID), the probabilities are set to P(X^u_t) = P(X^u_{t-1}) = NG (P = 0.0). In the other cases, the probabilities are set to P(X^u_t) = P(X^u_{t-1}) = OK (0.0 < P ≤ 1.0).
The processing is thus executed using probabilities of these kinds.
The aforementioned formula (Formula 11) differs from the formula (Formula 10) used in Figures 13 and 14, which illustrate the example processing that maintains the independence between targets. Formula (Formula 10) is expressed as:
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
= R × P(θ_t, z^u_t | X^u_t) P(X^u_{t-1} | X^u_t) … (Formula 10)
Formula (Formula 11) can be expressed as:
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
= R × P(θ_t, z^u_t | X^u_t) P(X^u_{t-1} | X^u_t) P(X^u_t) / P(X^u_{t-1}) … (Formula 11)
= R × (Formula 8) × (Formula 9) × (P(X^u_t) / P(X^u_{t-1}))
The example processing shown in Figures 16 and 17 has the same condition settings as the example processing described earlier with reference to Figures 13 and 14, except that P = 0 (NG) is set whenever, in P(X^u) = P(x^u_1, x^u_2, …, x^u_n), there is at least one x^u that coincides with another x^u (user identifier uID).
In other words, in the [prior probability P] represented by the aforementioned formula (Formula 8), that is, P(θ_t, z^u_t | X^u_t) = P(θ_t, z^u_t | x^u_{t,1}, x^u_{t,2}, …, x^u_{t,θ}, …, x^u_{t,n}), the probabilities are set as follows:
prior probability P = A = 0.8 when the observed value satisfies x^u_{t,θ} = z^u_t; and
prior probability P = B = 0.2 in the other cases.
Further, in the [state transition probability P] represented by the aforementioned formula (Formula 9), that is, P(X^u_{t-1} | X^u_t), the probabilities are set as follows:
state transition probability P = C = 1.0 when the user identifier (uID) does not change between time t and time t-1 for any target; and
state transition probability P = D = 0.0 in the other cases.
Figures 16 and 17 show an example of how the probability values of the user IDs (uID = 0 to 2) for the target IDs (tID = 0, 1, 2), that is, the user confidences (uID), change when the series of observed information 'θ = 0, z^u = 0' and 'θ = 1, z^u = 1' is observed successively at two observation times under the above probability settings.
The user confidences for the data corresponding to all user IDs (0 to 2) of all target IDs (0, 1, 2) are calculated as simultaneous occurrence probabilities (joint probabilities).
As stated above, 'θ = 0, z^u = 0' indicates observed information [z^u] corresponding to user identifier uID = 0 from target θ = 0.
'θ = 1, z^u = 1' indicates observed information [z^u] corresponding to user identifier uID = 1 from target θ = 1.
As shown in row (a), the initial state, of Figure 16, the candidates for the user IDs (uID = 0 to 2) corresponding to the three target IDs (tID = 0, 1, 2) range over tID 0, 1, 2 = (0, 0, 0) to (2, 2, 2).
There are 27 different candidate data.
For each of these 27 different candidate data, a simultaneous occurrence probability (joint probability) is calculated as the user confidence corresponding to all user IDs (0 to 2) of all target IDs (0, 1, 2).
The probabilities (user confidences) differ from those of the initial state of Figure 13(a) described above. That is, whenever there is one x^u that coincides with another x^u (user identifier uID), the probability is set to P = 0. In the example shown in the figure, the probability value P = 0.166667 is set to each of the other candidates.
Row (b) of Figure 16 shows the change in the user confidences calculated as simultaneous occurrence probabilities (joint probabilities) (the confidences of all user IDs (0 to 2) for all target IDs (0, 1, 2)) when the observed information [θ = 0, z^u = 0] is observed.
The observed information [θ = 0, z^u = 0] is observed information from target tID = 0 corresponding to uID = 0.
Based on this observed information, among the 27 candidates, excluding the candidates set to P = 0 (NG) in the initial state, the probabilities P (joint probabilities) of the candidate data in which uID = 0 is set to tID = 0 increase, while the probabilities P of the other candidate data decrease.
Among the candidates to which probability P = 0.166667 was set in the initial state, the probability P of the candidates in which uID = 0 is set to tID = 0 is raised to P = 0.333333, while the probability of each of the other remaining candidates is lowered to P = 0.083333.
Row (c) of Figure 16 shows the change in the user confidences calculated as simultaneous occurrence probabilities (joint probabilities) (the confidences of all user IDs (0 to 2) for all target IDs (0, 1, 2)) when the observed information [θ = 1, z^u = 1] is observed.
The observed information [θ = 1, z^u = 1] is observed information from target tID = 1 corresponding to uID = 1.
Based on this observed information, among the 27 candidates, excluding the candidates set to P = 0 (NG) in the initial state, the probabilities P (joint probabilities) of the candidate data in which uID = 1 is set to tID = 1 increase, while the probabilities P of the other candidate data decrease.
As shown in Figure 16(c), the result can be classified into four different probability values.
The candidates with the highest probability were not set to P = 0 (NG) in the initial state, and have uID = 0 set to tID = 0 and uID = 1 set to tID = 1. The joint probability of these candidates is P = 0.592593.
The candidates with the second-highest probability were not set to P = 0 (NG), and satisfy only one of the conditions, uID = 0 set to tID = 0 or uID = 1 set to tID = 1. The joint probability of these candidates is P = 0.148148.
The candidates with the third-highest probability were not set to P = 0 (NG), and have neither uID = 0 set to tID = 0 nor uID = 1 set to tID = 1. The joint probability of these candidates is P = 0.037037.
The candidates with the lowest probability were set to P = 0 (NG) in the initial state. The joint probability of these candidates is P = 0.0.
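The numbers shown in Figure 16 can be reproduced by combining the constrained initial state with the same multiplicative update step, as in the following sketch under the illustrative settings already noted (A = 0.8, B = 0.2, C = 1.0, D = 0.0). Candidates violating Constraint 1 start at P = 0.0 and therefore remain at 0.0 through every update.

```python
from itertools import product

A, B = 0.8, 0.2

candidates = list(product(range(3), repeat=3))
valid = {xu for xu in candidates if len(set(xu)) == 3}
joint = {xu: (1.0 / len(valid) if xu in valid else 0.0)
         for xu in candidates}            # Figure 16(a): P = 1/6 or 0.0

def update(joint, theta, zu):
    new = {xu: p * (A if xu[theta] == zu else B) for xu, p in joint.items()}
    total = sum(new.values())
    return {xu: p / total for xu, p in new.items()}

joint = update(joint, theta=0, zu=0)   # Figure 16(b)
print(joint[(0, 1, 2)])   # 0.333333... (uID=0 set to tID=0)
print(joint[(1, 0, 2)])   # 0.083333... (other valid candidates)

joint = update(joint, theta=1, zu=1)   # Figure 16(c)
print(joint[(0, 1, 2)])   # 0.592593... (both conditions satisfied)
print(joint[(0, 2, 1)])   # 0.148148... (one condition satisfied)
print(joint[(1, 0, 2)])   # 0.037037... (neither condition satisfied)
print(joint[(0, 0, 0)])   # 0.0         (NG in the initial state)
```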
Figure 17 shows the marginalization results obtained through the processing shown in Figure 16.
Rows (a) to (c) of Figure 17 correspond to rows (a) to (c) of Figure 16, respectively. In other words, they correspond to the results (b) and (c) obtained through successive updates from the initial state (Figure 17(a)) on the basis of the two pieces of observed information. The data shown in Figure 17 comprise the probabilities calculated from the results shown in Figure 16, from 'the probability P that tID = 0 corresponds to uID = 0' through 'the probability P that tID = 2 corresponds to uID = 2' (nine probabilities in total).
The probabilities shown in Figure 17 are obtained by arithmetic addition, that is, marginalization, of the probability values of the corresponding data among the 27 different data enumerated in Figure 16. For example, the following formula can be applied to this calculation:
P(x^u_i) = Σ_{X^u} P(X^u), where the sum runs over all candidates X^u whose component for target i equals x^u_i
As shown in Figure 17(a), in the initial state the nine probabilities, from 'the probability P that tID = 0 corresponds to uID = 0' through 'the probability P that tID = 2 corresponds to uID = 2', are uniform and set to P = 0.333333.
The bottom of Figure 17(a) shows these probabilities as graph data.
The result obtained in the initial state is similar to the result of the previously described example processing, illustrated in Figure 14(a), in which the independence of the respective targets is maintained.
Figure 17(b) shows the update result when the observed information [θ = 0, z^u = 0] is observed, that is, the data from 'the probability P that tID = 0 corresponds to uID = 0' through 'the probability P that tID = 2 corresponds to uID = 2'.
In this case, the value of 'the probability P that tID = 0 corresponds to uID = 0' is raised. Under the influence of this setting, the following two probabilities decrease:
the probability P that tID = 0 corresponds to uID = 1; and
the probability P that tID = 0 corresponds to uID = 2.
Moreover, in this processing example, with respect to tID = 1:
the probability P of uID = 0 decreases;
the probability P of uID = 1 increases; and
the probability P of uID = 2 increases.
With respect to tID = 2:
the probability P of uID = 0 decreases;
the probability P of uID = 1 increases; and
the probability P of uID = 2 increases.
Thus, the probabilities (user confidences) of the targets (tID = 1, 2) different from the target (tID = 0) hypothesized to have produced the observed information 'θ = 0, z^u = 0' also change.
This differs from what is observed in Figure 14(b). That is, in Figure 14(b) the update changes only the data probabilities of tID = 0 and leaves the data probabilities of tID = 1, 2 as before, whereas in Figure 17(b) all the data of tID = 0, 1, 2 are updated.
The processing described above with reference to Figures 13 and 14 is a processing example that maintains the independence of the respective targets. By contrast, the processing shown in Figures 16 and 17 is a processing example that eliminates the independence of the respective targets. In other words, one observed datum affects not only the data corresponding to one target but also the data corresponding to the other targets.
In the processing example of Figures 16 and 17, formula (Formula 11),
P(X^u_t | θ_t, z^u_t, X^u_{t-1}) … (Formula 5)
= R × P(θ_t, z^u_t | X^u_t) P(X^u_{t-1} | X^u_t) P(X^u_t) / P(X^u_{t-1}) … (Formula 11),
is applied with Constraint 1: if, in P(X^u) = P(x^u_1, x^u_2, …, x^u_n), there is at least one x^u that coincides with another x^u (user identifier uID), the probabilities are set to P(X^u_t) = P(X^u_{t-1}) = NG (P = 0.0); in the other cases, the probabilities are set to P(X^u_t) = P(X^u_{t-1}) = OK (0.0 < P ≤ 1.0).
As a result, as shown in Figure 17(b), both the probability (user confidence) of the target (tID = 0) hypothesized to have produced the observed information 'θ = 0, z^u = 0' and the probabilities (user confidences) of the other targets (tID = 1, 2) are changed. Therefore, the probability (user confidence) representing which user corresponds to each target can be updated efficiently and with high accuracy.
Figure 17(c) shows the update result when the observed information [θ = 1, z^u = 1] is observed, that is, the data from 'the probability P that tID = 0 corresponds to uID = 0' through 'the probability P that tID = 2 corresponds to uID = 2'.
In this case, the update raises the value of 'the probability P that tID = 1 corresponds to uID = 1'. Under the influence of this update, the following two probabilities decrease:
the probability P that tID = 1 corresponds to uID = 0; and
the probability P that tID = 1 corresponds to uID = 2.
Moreover, in this processing example, with respect to tID = 0:
the probability P of uID = 0 increases;
the probability P of uID = 1 decreases; and
the probability P of uID = 2 increases.
With respect to tID = 2:
the probability P of uID = 0 increases;
the probability P of uID = 1 decreases; and
the probability P of uID = 2 increases.
Thus, the probabilities (user confidences) of the targets (tID = 0, 2) different from the target (tID = 1) hypothesized to have produced the observed information 'θ = 1, z^u = 1' also change.
In the processing example described with reference to Figures 15 to 17, the update processing is applied to all target data using the constraint defined as Constraint 1: if, in P(X^u) = P(x^u_1, x^u_2, …, x^u_n), there is at least one x^u that coincides with another x^u (user identifier uID), the probabilities are set to P(X^u_t) = P(X^u_{t-1}) = NG (P = 0.0), and in the other cases to P(X^u_t) = P(X^u_{t-1}) = OK (0.0 < P ≤ 1.0). In this embodiment, the update processing is applied to all target data using this constraint. However, the invention is not limited to such a constraint; instead, the processing may be designed as follows.
In P(X^u) = P(x^u_1, x^u_2, …, x^u_n), the states in which at least one x^u coincides with another x^u (user identifier uID) are deleted from the target data, and the processing is executed only on the remaining target data. Such processing reduces the number of states of [X^u] from k^n to kPn, the number of permutations of the k user identifiers taken n at a time. The processing efficiency can therefore be improved.
A data deletion processing example will be described with reference to Figure 18. For example, the left side of Figure 18 lists the 27 different candidates, tID 0, 1, 2 = (0, 0, 0) to (2, 2, 2), of the user IDs (uID = 0 to 2) corresponding to the three target IDs (tID = 0, 1, 2). Removing from these 27 candidate data [P(X^u) = P(x^u_1, x^u_2, x^u_3)] the states in which at least one x^u coincides with another x^u (user identifier uID) yields the six data 0 to 5 listed on the right side of Figure 18.
Alternatively, the sound/image integration processing unit 131 may be designed to execute processing in which the candidate data in which the same user identifier (uID) is set to different targets are deleted as described above, the remaining data are left as they are, and only the remaining candidate data are provided as the objects of updating based on event information.
Even when the processing is executed only on these six data as the objects of updating, the same results as those described with reference to Figures 16 and 17 can still be obtained.
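A sketch of this state reduction, using permutations to keep only the assignments with no duplicated uID (the variable names are illustrative):

```python
from itertools import permutations, product

n_targets, n_users = 3, 3

all_states = list(product(range(n_users), repeat=n_targets))   # k^n = 27
reduced = list(permutations(range(n_users), n_targets))        # kPn = 6

print(len(all_states), len(reduced))   # 27 6
# Working directly on the six permutation states avoids storing and
# updating the 21 candidates whose probability is pinned at 0.0.
```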
(c) Analysis processing example in which the independence between targets is eliminated and the presence of unregistered users is taken into account
Next, a processing example will be described for the case in which the presence of unregistered users is taken into account in the above-described analysis processing example [(b)], in which the independence between targets is eliminated.
In the above-described analysis processing example [(b)], when the number of registered users is 'k' and their respective user identifiers (uID) are set to uID = 1 to k, the processing is executed with respect to each of the k registered users 1 to k.
In actual processing, however, images and voices of unregistered users may sometimes be obtained as observed information, in addition to the images and voices of the registered users. The number of unregistered users may be one, two, or more; in other words, unlike the registered users, the number of unregistered users cannot be specified in advance.
Moreover, in general, identification devices (such as face identification devices and speaker identification devices) may be unable to distinguish different unregistered persons from one another. In this case, the user identifier cannot be analyzed; that is, the identification device merely outputs the same observed value, 'uID = unknown'.
In this case, for the Constraint 1 defined in the aforementioned analysis processing example [(b)] in which the independence between targets is eliminated (that is, Constraint 1: in P(X^u) = P(x^u_1, x^u_2, …, x^u_n), if there is at least one x^u that coincides with another x^u (uID), the probabilities are set to P(X^u_t) = P(X^u_{t-1}) = NG (0.0), and in the other cases to P(X^u_t) = P(X^u_{t-1}) = OK (0.0 < P ≤ 1.0)), the direct application of Constraint 1 leads to an undesired result.
That is, a plurality of unregistered users may appear. If the unregistered users are regarded as the same user ('unknown'), the above constraint sets the case in which a plurality of identical user identifiers (uID = unknown) coincide to P(X^u_t) = P(X^u_{t-1}) = NG (0.0). Consequently, states that may actually arise would be excluded.
Therefore, an exception rule is added to the above Constraint 1. In other words, Constraint 1 is defined as follows:
in P(X^u) = P(x^u_1, x^u_2, …, x^u_n), if there is at least one x^u that coincides with another x^u (uID), the probabilities are P(X^u_t) = P(X^u_{t-1}) = NG (0.0), and in the other cases P(X^u_t) = P(X^u_{t-1}) = OK (0.0 < P ≤ 1.0), with the following exception: if x^u = unknown, Constraint 1 is not applied.
Using the constraint with this exception allows the aforementioned analysis processing example [(b)], in which the independence between targets is eliminated, to be applied even in environments in which unregistered users may appear.
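A sketch of Constraint 1 with the exception rule is given below; the UNKNOWN sentinel standing for the 'uID = unknown' output of the identification devices is an assumption made for illustration.

```python
UNKNOWN = -1   # illustrative sentinel for "uID = unknown"

def violates_constraint(xu):
    """Constraint 1 with the exception rule: a duplicated registered uID
    makes the candidate NG (P = 0.0), but any number of targets may share
    the 'unknown' identifier, since several unregistered users may coexist."""
    known = [u for u in xu if u != UNKNOWN]
    return len(set(known)) != len(known)

assert violates_constraint((0, 0, 1))                  # duplicated uID: NG
assert not violates_constraint((0, 1, 2))              # all distinct: OK
assert not violates_constraint((UNKNOWN, UNKNOWN, 0))  # exception: OK
```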
[Target deletion and generation processing]
For example, when the number of events input from the image event detection unit 112 exceeds the number of targets, a new target is set. Specifically, such a situation corresponds, for example, to a case in which a face not present before appears in an image frame captured by the camera or the like. In this case, a new target is set for each particle, and this target is designated as the target to be updated in correspondence with the new event. Conversely, for example, when no definite peak is detected in the user position information included in a target, processing is executed for deleting such data, which do not yet indicate a specific user position.
Thus, in this system, the number of targets increases or decreases as targets are deleted or generated. As the number of targets increases or decreases, the state [X^u] also changes, so there is a requirement to recalculate the probability values. Specific processing examples for target deletion and target generation are described below.
(Target deletion)
In the information processing apparatus according to the embodiment of the invention, the sound/image integration processing unit 131 executes processing for generating target information based on the target data updated by the update processing to which the particle weights [W_pID] are applied, and outputs the target information to the processing determination unit 132. The sound/image integration processing unit 131 generates, for example, the target information 520 shown in Figure 19. The target information is generated as information including, for each target (tID = 1 to n), (a) user position information and (b) user confidence information.
The sound/image integration processing unit 131 pays attention to the user position information in the target information generated from the updated targets in this manner. The user position information is set as a Gaussian distribution N(m, σ). When no definite peak is detected in the Gaussian distribution, the user position information is not effective information indicating a specific user position. The sound/image integration processing unit 131 selects a target whose distribution data show no such peak as the object of deletion.
For example, the target information 520 shown in Figure 19 includes the three pieces of target information 521, 522 and 523 of targets 1, 2 and n. The sound/image integration processing unit 131 compares the peak of the Gaussian distribution data indicating the user position in each piece of target information with a preset threshold 531. The sound/image integration processing unit 131 designates as the object of deletion any data whose peak is not equal to or higher than the threshold 531, that is, the target information 523 in the example of Figure 19.
In this example, the target (tID = n) is selected as the object of deletion and is deleted from the particles. When the maximum value of the Gaussian distribution (probability density distribution) indicating the user position is below the deletion threshold, the target having that Gaussian distribution is deleted from all particles. The threshold applied may be a fixed value, or it may be varied per target; for example, it may be set lower for a target that is an interaction partner, so that the interaction partner target is not deleted easily.
When a certain target is deleted in this way, the probability values associated with the target are marginalized. Figure 20 shows an example of deleting a target (tID = 0) from three targets (tID = 0, 1, 2).
The column on the left side of Figure 20 lists, as 0 to 26, the 27 example candidate data in which the uIDs corresponding to the three targets (tID = 0, 1, 2) are set. When target 0 is deleted from these target data, the data are marginalized into the nine data shown in the column on the right side of Figure 20, that is, the nine combinations tID 1, 2 = (0, 0) to (2, 2). In this case, from the 27 data before marginalization, the data matching each combination tID 1, 2 = (0, 0) to (2, 2) are selected to generate the nine data after marginalization. For example, the single combination tID 1, 2 = (0, 0) is generated from the three data tID = (0, 0, 0), (1, 0, 0) and (2, 0, 0).
The distribution of probability values in target data deletion processing will now be described. For example, the single combination tID 1, 2 = (0, 0) is generated from the three data tID = (0, 0, 0), (1, 0, 0) and (2, 0, 0). The probability values P set to these three data tID = (0, 0, 0), (1, 0, 0) and (2, 0, 0) are marginalized and set as the probability value of tID 1, 2 = (0, 0).
Accordingly, when deleting a target, the sound/image integration processing unit 131 executes processing for marginalizing the joint probability values set to the candidate data including the target to be deleted into the candidate data remaining after the deletion of the target. Subsequently, the sound/image integration processing unit 131 executes processing for normalizing the joint probability values set to all the candidate data so that they sum to 1 (one).
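A sketch of this deletion-by-marginalization processing (Figure 20), again assuming the tuple representation of candidate data:

```python
from itertools import product

def delete_target(joint, target):
    """Marginalize the deleted target out of the joint distribution and
    renormalize: each surviving combination accumulates the probabilities
    of all full candidates that project onto it."""
    reduced = {}
    for xu, p in joint.items():
        key = xu[:target] + xu[target + 1:]   # drop the deleted component
        reduced[key] = reduced.get(key, 0.0) + p
    total = sum(reduced.values())             # normalize to 1.0
    return {xu: p / total for xu, p in reduced.items()}

joint = {xu: 1.0 / 27 for xu in product(range(3), repeat=3)}
joint = delete_target(joint, target=0)        # 27 states -> 9 states
print(len(joint), joint[(0, 0)])              # 9 0.111111...
```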
(Target generation)
Processing for generating a new target in the sound/image integration processing unit 131 will be described with reference to Figure 21. The generation of a new target is executed, for example, when event generation source hypotheses are set for the respective particles.
As shown in Figure 21, while calculating the event-target likelihoods between an event and the corresponding n existing targets, the sound/image integration processing unit 131 provisionally generates a new transient target 551 as the (n+1)-th target, in which 'position information' and 'identification information' are set in a uniformly distributed manner (a 'Gaussian distribution with sufficiently large variance' and a 'uID distribution in which all Pt[i] are equal').
After setting the provisional new target (tID = n+1), the sound/image integration processing unit 131 sets event generation source hypotheses based on the input of the new event. In this processing, the sound/image integration processing unit 131 calculates the likelihood between the input event information and each target, and calculates the target weight [W_tID] of each target. In this case, the sound/image integration processing unit 131 also calculates the likelihood between the input event information and the transient target (tID = n+1) shown in Figure 21, and calculates the target weight (W_{n+1}) of the provisional (n+1)-th target.
When the target weight (W_{n+1}) of the provisional (n+1)-th target is judged to be greater than the target weights (W_1 to W_n) of the existing n targets, the sound/image integration processing unit 131 sets the new target for all particles.
When a new target is generated, data are added to each state: the states corresponding to the number of users are assigned to the added data, and the probability values set to the existing target data are distributed to them with respect to the new target.
Figure 22 shows a processing example in which a new target (tID = 3) is generated and added to two targets (tID = 1, 2).
The column on the left side of Figure 22 lists the nine target data (0, 0) to (2, 2) representing the uID candidates corresponding to the two targets (tID = 1, 2). Additional target data provided with the states for the new target are added to these target data. This processing permits the setting of the 27 target data (0 to 26) listed on the right side of Figure 22.
The distribution of probability values in target data addition processing will now be described.
For example, the three data tID = (0, 0, 0), (0, 0, 1) and (0, 0, 2) are generated from the target data tID 1, 2 = (0, 0). The probability value P set to tID 1, 2 = (0, 0) is distributed evenly to these three data [tID = (0, 0, 0), (0, 0, 1), (0, 0, 2)].
Further, when processing is executed according to a constraint such as the constraint that 'the same user identifier (uID) is not assigned to a plurality of targets', the corresponding prior probabilities are reduced and the number of states decreases. Moreover, when the total probability of the respective target data, that is, the overall simultaneous occurrence probability (joint probability), is not [1], normalization processing is executed to adjust the total probability to [1].
Accordingly, when a new target is generated and added to the existing targets, the sound/image integration processing unit 131 executes processing for assigning the states corresponding to the number of users to the additional candidate data provided by the added target, and for distributing the probability values set to the existing candidate data to the additional candidate data. Subsequently, the sound/image integration processing unit 131 executes processing for normalizing the total of the joint probability values set to all the candidate data to 1 (one).
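A sketch of this target generation processing (Figure 22): the probability of each existing candidate is split evenly over the states added for the new target, followed by the normalization step. The helper name is an assumption.

```python
from itertools import product

def add_target(joint, n_users):
    """Expand each candidate with one extra component for the new target,
    splitting its probability evenly over the n_users added states."""
    expanded = {}
    for xu, p in joint.items():
        for uid in range(n_users):
            expanded[xu + (uid,)] = p / n_users
    total = sum(expanded.values())            # normalize to 1.0
    return {xu: p / total for xu, p in expanded.items()}

# Figure 22: nine candidates over two targets grow to 27 over three targets.
joint = {xu: 1.0 / 9 for xu in product(range(3), repeat=2)}
joint = add_target(joint, n_users=3)
print(len(joint), joint[(0, 0, 0)])           # 27 0.037037...
```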
Referring now to the process flow diagram shown in Figure 23 the processing sequence when getting rid of the analyzing and processing of independence between the target is described.
Processing shown in Figure 23 is to handle sequence.That is the synthetic processing unit 131 of the sound/image in the signal conditioning package 100 shown in Fig. 2 receives the input of event information from sound event detecting unit 122 and image event detecting unit 112.Here, event information comprises the customer position information shown in Fig. 3 B and these two kinds of information of user totem information (face's identification information or speaker identification information).Then, the synthetic processing unit 131 of sound/image is responded input and is handled definite unit 132 so that following information is outputed to:
(a) [target information] is as showing where a plurality of users are present in respectively and whose estimated information the user is; And
(b) [signal message] shows that incident the source takes place like the speech user.
First, in step S201, the sound/image synthesis processing unit 131 receives inputs of the following information from the sound event detecting unit 122 and the image event detecting unit 112:
(a) user position information;
(b) user identification information (face identification information or speaker identification information); and
(c) face attribute information (face attribute scores).
The process proceeds to step S202 when the event information is successfully acquired, or to step S221 when the acquisition of the event information fails. The processing in step S221 will be described later.
If the event information is successfully acquired, the sound/image synthesis processing unit 131 carries out particle update processing based on the input information in step S202 and the subsequent steps. In step S202, before the particle update processing, it is determined whether there is a request to set a new target for each particle.
For example, when the number of events input from the image event detecting unit 112 is larger than the number of targets, there is a request to set a new target. Specifically, when a new face that was not present in the image frame photographed by the camera appears, it is desirable to set a new target. In this case, the process proceeds to step S203, and a new target is set for each particle. This target is defined as a target to be updated in correspondence with the new event. In addition, when the new target data are generated, as described with reference to Figure 20, data for a new state are added, states corresponding to the number of users are assigned to the added data, and probability values are distributed from the existing target data through the process for setting probability values.
Then, in step S204, hypotheses of the event generation source are set for the m particles 1 to m (pID=1 to m) set in the sound/image synthesis processing unit 131. The event generation source is, for example, the speaking user in the case of a sound event, and the user having the extracted face in the case of an image event.
After the hypothesis setting in step S204, the process proceeds to step S205, in which the sound/image synthesis processing unit 131 calculates the weight corresponding to each particle, that is, the particle weight [W_pID]. As described above, a uniform value is initially set as the particle weight [W_pID] for each particle, but the value is updated according to event inputs.
The details of the processing for calculating the particle weight [W_pID] have been described above with reference to Fig. 9 and Fig. 10. The particle weight [W_pID] is equivalent to an index for judging the correctness of the hypothesis of each particle for which an event generation source hypothesis target is set. The particle weight [W_pID] is calculated as an event-target likelihood, that is, the similarity between the input event and the event generation source hypothesis target set for each of the m particles (pID=1 to m).
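As a rough illustration of how such an event-target likelihood might be computed, the sketch below scores a one-dimensional event against one hypothesis target as the product of a Gaussian position likelihood and the overlap between the event's user identification scores and the target's user confidences. The one-dimensional position, the multiplicative combination, and all names are assumptions of this sketch, not details taken from the embodiment.

import math

def event_target_likelihood(event_pos, event_uid_scores,
                            target_mean, target_sigma, target_uid_conf):
    # Position term: Gaussian likelihood N(m_t, sigma_t) of the observed
    # event position under the target's position distribution.
    pos_like = math.exp(-(event_pos - target_mean) ** 2 /
                        (2.0 * target_sigma ** 2)) / (
        math.sqrt(2.0 * math.pi) * target_sigma)
    # Identification term: overlap between the event's uID scores and the
    # target's user confidences Pt[i].
    uid_like = sum(score * target_uid_conf.get(uid, 0.0)
                   for uid, score in event_uid_scores.items())
    return pos_like * uid_like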
Subsequently, in step S206, the sound/image synthesis processing unit 131 carries out particle resampling processing based on the particle weights [W_pID] of the respective particles set in step S205.
The particle resampling processing is executed as processing for selecting particles from the m particles according to the particle weights [W_pID].
Specifically, when the number of particles m is 5, suppose the particle weights are set as follows:
Particle 1: particle weight [W_pID] = 0.40;
Particle 2: particle weight [W_pID] = 0.10;
Particle 3: particle weight [W_pID] = 0.25;
Particle 4: particle weight [W_pID] = 0.05; and
Particle 5: particle weight [W_pID] = 0.20.
In this case, particle 1 is resampled with a probability of 40%, and particle 2 is resampled with a probability of 10%. In practice, m is as large as 100 to 1000, and the result of the resampling contains particles at a distribution ratio corresponding to the particle weights.
Through this processing, a large number of particles having large particle weights [W_pID] remain. Even after the resampling, the total number of particles [m] remains unchanged. After the resampling, the weights [W_pID] of the respective particles are reset, and the processing is repeated from step S201 according to the input of a new event.
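The resampling step itself is standard weight-proportional selection with replacement; a minimal sketch (names invented here) follows. With the five example weights above, each of the m draws selects particle 1 with probability 0.40 and particle 4 with probability 0.05.

import random

def resample(particles, weights):
    # Select m particles with replacement, with selection probability
    # proportional to the particle weights W_pID; m stays constant.
    m = len(particles)
    survivors = random.choices(particles, weights=weights, k=m)
    # After resampling, the weights of the particles are reset to a
    # uniform value.
    return survivors, [1.0 / m] * m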
In step S207, the sound/image synthesis processing unit 131 carries out processing for updating the target data (user position and user confidence) included in each particle. As explained above with reference to Fig. 6 and other figures, each target includes the following data:
(a) user position: a probability distribution [Gaussian distribution: N(m_t, σ_t)] of the position corresponding to each target; and
(b) user confidence: probability values (scores) Pt[i] (i=1 to k) that each target is each of the users 1 to k; these probability values (scores) are used as user confidence information (uID) showing who each target is, that is:
uID_t1 = Pt[1]
uID_t2 = Pt[2]
...
uID_tk = Pt[k].
The target data update in step S207 is executed for each of (a) the user position and (b) the user confidence.
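As a minimal illustration of the per-target data listed above, one might hold it in a structure such as the following; the class and field names are invented for this sketch and are not taken from the embodiment.

from dataclasses import dataclass, field

@dataclass
class Target:
    mean: float                 # m_t of the Gaussian position estimate N(m_t, sigma_t)
    sigma: float                # sigma_t of the Gaussian position estimate
    uid_conf: dict = field(default_factory=dict)  # Pt[i] for users i = 1 to k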
First, the processing for updating (a) the user position is described.
The user position is updated in a manner similar to that of step S105 in the flowchart of Fig. 7 described in the aforementioned section [(1) Processing for updating the user position and identifying the user through hypothesis updating based on event information input]. In other words, the update of the user position is executed as update processing in two stages:
(a1) update processing applied to all targets of all particles; and
(a2) update processing applied to the event generation source hypothesis targets set for the respective particles.
Next, the processing for updating (b) the user confidence is carried out, that is, the aforementioned processing using formula (11). As described above, this is processing that excludes the independence between targets and uses formula (11), which is generated based on the corresponding formula (formula 5). The formulas are as follows:
P(Xu_t | θ_t, zu_t, Xu_t-1) ... (formula 5)
= R × P(θ_t, zu_t | Xu_t) P(Xu_t-1 | Xu_t) P(Xu_t) / P(Xu_t-1) ... (formula 11)
In addition, the above formulas are used together with the restriction that the same user identifier (ID), that is, the same user identification information, is not assigned to different targets, and the processing is executed under this restriction.
In addition, as described with reference to Figures 15 to 17, the joint probability allowing the data of combinations of all IDs and all targets is calculated, and the processing for updating the joint probability is carried out based on the observation values input as event information, to calculate the user confidence information (uID) representing who each target is.
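In the spirit of formula (11), the following sketch performs one such update of the joint probability: every joint state is reweighted by the likelihood of the observed identification scores for the target hypothesized as the event generation source, and the normalization term R is realized by renormalizing at the end. Treating the remaining factors of formula (11) as uniform is a simplification of this sketch, and the names are invented, as in the earlier sketches.

def update_joint(joint_prob, source_tid, uid_scores):
    # Reweight each joint state Xu by the observation likelihood of the
    # user assigned to the hypothesized generation-source target.
    updated = {}
    for ids, p in joint_prob.items():
        updated[ids] = p * uid_scores.get(ids[source_tid], 0.0)
    # Normalization term R: adjust the total joint probability to 1
    # (fall back to the prior if every state was zeroed).
    total = sum(updated.values())
    return {ids: p / total for ids, p in updated.items()} if total else joint_prob

Because the joint states couple all targets, this single observation changes the confidences of every target at once, which is exactly the effect of excluding the independence between targets.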
In addition, as described above with reference to Figure 17, the probability values of the plural candidate data are added together, that is, marginalized, to obtain the user confidence of the user identifier corresponding to each target (tID). The following formula is used for the calculation:
P(xu_i) = Σ_{Xu = xu_i} P(Xu)
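A direct rendering of this marginalization in Python, assuming, as in the earlier sketches, a dictionary from identifier tuples to probabilities with identifiers numbered 0 to k-1:

def user_confidence(joint_prob, tid, num_users):
    # P(xu_i) = sum of P(Xu) over all joint states Xu in which target
    # tID = tid is assigned the user identifier xu_i.
    conf = [0.0] * num_users
    for ids, p in joint_prob.items():
        conf[ids[tid]] += p
    return conf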
The target information including the user confidence information obtained as described above and the user position information is output to the processing determination unit 132.
In step S208, the sound/image synthesis processing unit 131 calculates, for each of the n targets (tID=1 to n), the probability that the target is the event generation source, and outputs these probabilities to the processing determination unit 132 as signal information.
As explained above, [signal information] showing the event generation source is, for a sound event, data showing who spoke, that is, the [speaker], and, for an image event, data showing whose face the face included in the image is.
The sound/image synthesis processing unit 131 calculates the probability that each target is the event generation source based on the number of event generation source hypothesis targets set in the respective particles.
In other words, the probability that each target (tID=1 to n) is the event generation source is expressed as P(tID=i), where i=1 to n.
In this case, the probability that each target is the event generation source is calculated as follows:
P(tID=1) = (number of hypothesis targets assigned tID=1)/m,
P(tID=2) = (number of hypothesis targets assigned tID=2)/m,
..., and
P(tID=n) = (number of hypothesis targets assigned tID=n)/m.
The sound/image synthesis processing unit 131 outputs the information generated by this calculation, that is, the probability that each target is the event generation source, to the processing determination unit 132 as [signal information].
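Computed from a per-particle list of generation-source hypotheses, this is a simple counting operation; the sketch below (names invented here) returns P(tID=i) for i = 0 to n-1.

from collections import Counter

def signal_information(hypotheses, num_targets):
    # hypotheses[p] is the tID hypothesized as the event generation source
    # by particle p; P(tID=i) = (number of such particles) / m.
    m = len(hypotheses)
    counts = Counter(hypotheses)
    return [counts.get(tid, 0) / m for tid in range(num_targets)]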
When the processing in step S208 is completed, the sound/image synthesis processing unit 131 returns to step S201 and shifts to a standby state for event information inputs from the sound event detecting unit 122 and the image event detecting unit 112.
Steps S201 to S208 of the flow shown in Figure 23 have been described above. Even when the sound/image synthesis processing unit 131 cannot acquire the event information shown in Fig. 3B from the sound event detecting unit 122 or the image event detecting unit 112, the data of the targets included in the respective particles are updated in step S221. This update is processing that takes into account changes in the user positions according to the elapse of time.
This target update processing is the same as the update processing (a1) applied to all targets of all particles explained in connection with step S207. This processing is carried out based on the hypothesis that the variance of the user position expands as time elapses. The user position is updated by using a Kalman filter according to the time elapsed since the last update processing and the position information.
This processing is carried out in a manner similar to the processing in step S121 in the flowchart of Fig. 7 described in the aforementioned section [(1) Processing for updating the user position and identifying the user through hypothesis updating based on event information input].
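A minimal sketch of this time-based prediction step: the mean of the position estimate is kept while its variance grows with the elapsed time dt. The per-unit-time drift parameter is an assumption of this sketch, not a value taken from the embodiment.

def predict_position(mean, sigma, dt, drift_sigma=0.1):
    # With no new observation, the Gaussian position estimate N(m_t, sigma_t)
    # keeps its mean while its variance expands with the elapsed time.
    new_variance = sigma ** 2 + (drift_sigma ** 2) * dt
    return mean, new_variance ** 0.5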
When the processing in step S221 is completed, it is determined in step S222 whether a target needs to be deleted. If deletion is requested, the target is deleted in step S223. The deletion of a target is executed as processing for deleting data in which no specific user position is obtained, for example, when no peak is detected in the user position information included in the target. When no such target exists, the deletion processing is unnecessary.
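When a target is deleted, the joint probability table described earlier must shrink accordingly: the deleted target's slot is removed from every state, the probabilities of states that become identical are summed (marginalized), and the total is renormalized. A sketch under the same dictionary representation as the earlier examples:

def delete_target(joint_prob, tid):
    # Marginalize out target tid: drop its slot from every joint state and
    # accumulate the probabilities of states that coincide afterwards.
    reduced = {}
    for ids, p in joint_prob.items():
        kept = ids[:tid] + ids[tid + 1:]
        reduced[kept] = reduced.get(kept, 0.0) + p
    # Renormalize the total joint probability to 1.
    total = sum(reduced.values())
    return {ids: p / total for ids, p in reduced.items()} if total else reduced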
After the processing in steps S222 to S223, the sound/image synthesis processing unit 131 returns to step S201 and shifts to a standby state for event information inputs from the sound event detecting unit 122 and the image event detecting unit 112.
The processing executed by the sound/image synthesis processing unit 131 has been described above with reference to Figure 23. The sound/image synthesis processing unit 131 repeatedly executes the processing according to the flowchart shown in Figure 23 every time event information is input from the sound event detecting unit 122 and the image event detecting unit 112. Through the repeated processing, the weights of particles in which more reliable targets are set as hypothesis targets increase, and through the resampling processing based on the particle weights, particles having larger weights remain.
As a result, the remaining data are highly reliable data similar to the event information input from the sound event detecting unit 122 or the image event detecting unit 112. Finally, highly reliable information, that is, the following information, is generated and output to the processing determination unit 132:
(a) [target information]: estimation information showing where each of the plural users is present and who each user is; and
(b) [signal information]: information showing the event generation source, such as the speaking user.
By carrying out the processing according to the present invention that excludes the independence between targets, the user confidences of all targets can be updated with a single observation value. Therefore, the processing for identifying users can be realized efficiently and with high accuracy from a single observation value.
The present invention has been described in detail with reference to the specific embodiment. However, it is obvious that those skilled in the art can make corrections to and substitutions of the embodiment without departing from the spirit of the present invention. In other words, the present invention has been disclosed in the form of examples and should not be interpreted restrictively. To judge the gist of the present invention, the claims should be taken into account.
The series of processing explained in this specification can be executed by hardware, software, or a combination of hardware and software. When the processing is executed by software, a program recording the processing sequence is installed in a memory of a computer incorporated in dedicated hardware and executed, or the program is installed in and executed by a general-purpose computer capable of executing various kinds of processing. For example, the program can be recorded in a recording medium in advance. Besides being installed in a computer from the recording medium, the program can also be received through a network such as a LAN (Local Area Network) or the Internet and installed in a recording medium such as a built-in hard disk.
The various kinds of processing described in this specification are not only executed in time series according to the description, but may also be executed in parallel or individually according to the processing capability of the apparatus that executes the processing, or as necessary. In this specification, a system is a configuration of a logical set of plural apparatuses and is not limited to a system in which apparatuses having individual configurations are provided in the same housing.
As described above, according to the embodiment of the present invention, event information including user identification information based on image information acquired by a camera or acoustic information acquired by a microphone is input, and the target data in which plural user confidences are set are updated to generate user identification information. The joint probability allowing the target candidate data corresponding to the respective users is updated based on the user identification information included in the event information, and the updated joint probability values are used to calculate the user confidences corresponding to the targets. Therefore, the processing for identifying users can be executed efficiently and with high accuracy, without mistaking the same user for different targets.
The present application contains subject matter related to that disclosed in Japanese Priority Patent Application JP 2008-177609 filed with the Japan Patent Office on July 8, 2008, the entire content of which is hereby incorporated by reference.
It should be understood by those skilled in the art that various modifications, combinations, sub-combinations, and alterations may occur depending on design requirements and other factors insofar as they are within the scope of the appended claims or the equivalents thereof.

Claims (12)

1. An information processing apparatus comprising:
a plurality of information input units that input information including image information or acoustic information in a real space;
an event detection unit that generates event information including estimated user identification information of users present in the real space by analyzing the information including the image information or the acoustic information in the real space input from the information input units; and
an information synthesis processing unit that sets probability distribution data of hypotheses concerning the user identification information and executes processing for identifying the users present in the real space by updating and selecting the hypotheses based on the event information, wherein
the information synthesis processing unit executes, based on the user identification information included in the event information, processing for updating target data including user confidence information, the user confidence information showing which of the users corresponds to a target set as an event generation source, and
the information synthesis processing unit executes the processing for calculating the user confidence by applying, to the processing for updating the target data, a restriction that the same user does not simultaneously exist in plural targets.
2. The information processing apparatus according to claim 1, wherein:
the information synthesis processing unit updates, based on the user identification information included in the event information, the joint probability allowing the target candidate data corresponding to the respective users, and
the information synthesis processing unit applies the updated value of the joint probability to the processing for calculating the user confidence of the user identifier corresponding to each target, and executes the processing.
3. The information processing apparatus according to claim 2, wherein:
the information synthesis processing unit marginalizes, based on the user identification information included in the event information, the updated value of the joint probability to calculate the user confidence of the user identifier corresponding to each target.
4. The information processing apparatus according to claim 2, wherein:
the information synthesis processing unit performs initial setting of the joint probability allowing the target candidate data corresponding to the respective users based on a restriction that the same user identifier (ID) is not assigned to plural targets, wherein
the probability value P(Xu) of the joint probability of candidate data in which the same user identifier (ID) is set to different targets is
P(Xu) = 0.0; and
the probability value of the other target data is
0.0 < P(Xu) ≤ 1.0.
5. The information processing apparatus according to claim 4, wherein:
the information synthesis processing unit executes the following exception setting processing: for an unregistered user for which the user identifier ID-unknown is set, even when the same identifier ID-unknown is set to different targets, the probability value P(Xu) of the joint probability remains 0.0 < P(Xu) ≤ 1.0.
6. The information processing apparatus according to claim 1, wherein:
the information synthesis processing unit deletes candidate data in which the same user identifier (ID) is set to different targets and retains only the other candidate data, and
only the retained candidate data are provided as objects of the update based on the event information.
7. The information processing apparatus according to claim 2, wherein:
the information synthesis processing unit uses the probability value calculated by the following formula:
P(Xu_t | θ_t, zu_t, Xu_t-1)
= R × P(θ_t, zu_t | Xu_t) P(Xu_t-1 | Xu_t) P(Xu_t) / P(Xu_t-1),
where R represents a normalization term, and wherein
the formula is established under the following hypotheses:
the probability P(θ_t, zu_t), in which the observation value zu_t is the event information corresponding to the identification information acquired at time t and θ is set as the generation-source target, is assumed to be non-uniform when the processing for calculating the joint probability is executed; and
the target information Xu, which shows the user identification information states {xu_t^1, xu_t^2, ..., xu_t^n} included in the target data at time t, is assumed to be uniform.
8. The information processing apparatus according to claim 7, wherein:
the information synthesis processing unit executes the marginalization of the probability value P(Xu) by using the following formula so as to express, as a probability, the user confidence of the user identifier corresponding to each target:
P(xu_i) = Σ_{Xu = xu_i} P(Xu)
where i represents the target identifier (tID) for which the probability of the user confidence of the user identifier is calculated, and
the information synthesis processing unit uses the formula to express, as a probability, the user confidence of the user identifier corresponding to each target.
9. The information processing apparatus according to claim 2, wherein:
the information synthesis processing unit executes processing so that the values of the joint probability set for candidate data including a target to be deleted are marginalized to the candidate data remaining after the deletion of the target, and
executes processing so that the total value of the joint probability set for all the candidate data is normalized to 1 (one).
10. The information processing apparatus according to claim 2, wherein:
when an additional target is generated and added to the candidate data, the information synthesis processing unit:
executes processing so that states corresponding to the number of users are assigned to the additional candidate data added by the generated target, and the values of the joint probability set for the existing candidate data are distributed to the additional candidate data; and
executes processing so that the total value of the joint probability set for all the candidate data is normalized to 1 (one).
11. An information processing method executed in an information processing apparatus having information input units and an event detection unit, comprising the steps of:
inputting information by the information input units to the event detection unit, wherein the information includes image information or acoustic information in a real space;
allowing the event detection unit to generate event information by analyzing the information including the image information or the acoustic information in the real space input from the information input units, wherein the event information includes estimated user identification information of users present in the real space; and
executing information synthesis processing, wherein an information synthesis processing unit sets probability distribution data of hypotheses concerning the user identification information and executes processing for identifying the users present in the real space by updating and selecting the hypotheses based on the event information, wherein
the information synthesis processing step includes the step of executing, based on the user identification information included in the event information, processing for updating target data, the target data including user confidence information showing which of the users corresponds to a target set as an event generation source, and
the information synthesis processing unit executes the processing for calculating the user confidence by applying, to the processing for updating the target data, a restriction that the same user does not simultaneously exist in plural targets.
12. The information processing method according to claim 11, wherein:
the information synthesis step updates, based on the user identification information included in the event information, the joint probability allowing the target candidate data corresponding to the respective users; and
the information synthesis step applies the updated value of the joint probability to the processing for calculating the user confidence corresponding to each target, and executes the processing.