CN103310180B

CN103310180B - The system and method for detection arbitrary object in the target image

Info

Publication number: CN103310180B
Application number: CN201210057842.1A
Authority: CN
Inventors: 刘媛; 师忠超; H.关; 刘殿超; 刘童
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2012-03-07
Filing date: 2012-03-07
Publication date: 2016-06-29
Anticipated expiration: 2032-03-07
Also published as: CN103310180A

Abstract

The invention discloses the method and system of a kind of detection one or more arbitrary object in the target image, the method includes: Visual Feature Retrieval Process step, for extracting the visual signature of described target image；Positional information obtains step, and for obtaining the positional information of described target image, the geographical position that described positional information is positioned at when being taken to described target image is relevant；And arbitrary object determines step, for the impact on the probability that described arbitrary object occurs based on described visual signature and described positional information, determine in described target image whether there is described arbitrary object.The invention also discloses a kind of system and method creating image data base.

Description

The system and method for detection arbitrary object in the target image

Technical field

The present invention relates to image procossing and area of pattern recognition, and more particularly, to the system and method for a kind of detection arbitrary object in the target image and create the system and method for image data base.

Background technology

The arbitrary object detecting such as pedestrian, vehicle, animal etc. in image/video is widely used in fields such as video monitoring, robotics, intelligent transportation, medical image, virtual reality technologies, is also the important research direction in computer vision and area of pattern recognition.

Although the detection of the arbitrary object of such as pedestrian and tracking technique have studied more than ten years, but at present still without a standard, healthy and strong, accurate, high performance or real-time arbitrary object detection and track algorithm.Due to intrinsic some characteristics of pedestrian, the complexity of application scenarios, influencing each other between person to person or human and environment so that the detection of pedestrian and tracking are one of challenges of being most difficult in computer vision research field.

Prior art there is much work solve the detection of arbitrary object of such as pedestrian.In the prior art, it is common to use visual signature detects the arbitrary object in image as chief source of information.Visual signature includes such as color, brightness, skirt response, texture and shape, and these visual signatures can catch the profile variation of arbitrary object to a certain extent.But, in view of the change of the posture of clutter, noise and arbitrary object and light conditions, it is difficult to obtain the testing result of arbitrary object exactly.

Such as, in the prior art, U.S. Patent application/patent (disclosed in the 4 days October in 2007 of Shashua et al.) US2007/0230792A1, on the August in 2008 5 of the Sung et al. (Granted publication) US7409091B2, on the August in 2008 26 of the Ogasawara (Granted publication) US7418112B2 and on the November in 2009 3 of the Iwasaki et al. (Granted publication) US7613325B2 disclose how and are efficiently used visual signature, how to develop the classification of more visual signature, how to combine visual appearance and movable information etc..Specifically, as it is shown in figure 1, Fig. 1 illustrates the block diagram of existing pedestrian detecting system.Wherein, this existing pedestrian detecting system 100 includes target image receptor 101, for receiving the target image detecting pedestrian；Feature extractor 102, for extracting the visual signature of this target image；Pedestrian detector 103, for the visual signature according to this target image extracted detects in this target image whether there is pedestrian；Display 104, is used for showing testing result.But, these applications/patents all pertain only to the application of the visual signature in pedestrian detection.

Additionally, in the prior art, United States Patent (USP) on the April in 2005 12 of the OttoDufek (Granted publication) US6879284B2, on the October in 2011 18 of the Chan-YoungChoi (Granted publication) US8041503B2 disclose how that use location context is to the fixed buildings detecting in image, street and river, and they detect such as fixing in this position building, street and river by the location context of concrete longitude and latitude.These patents are only for such as building, street and river immovable in a certain position.It is known that once be aware of the positional information including longitude and latitude, according to satellite map, it is not necessary to can learn whether this position exists this fixed buildings etc. by additional means.But, these patents are not related to detect the detection being likely to the object of random appearance, such as pedestrian, vehicle or animal in a certain position.

The arbitrary object of such as pedestrian, vehicle or animal all cannot be detected by prior art accurately.Accordingly, it would be desirable to a kind of technology that can detect arbitrary object in image/video exactly.

Summary of the invention

In order to solve above-mentioned the problems of the prior art, according to an aspect of the present invention, it is provided that the method for a kind of detection one or more arbitrary object in the target image, including: Visual Feature Retrieval Process step, for extracting the visual signature of described target image；Positional information obtains step, and for obtaining the positional information of described target image, the geographical position that described positional information is positioned at when being taken to described target image is relevant；And arbitrary object determines step, for the impact on the probability that described arbitrary object occurs based on described visual signature and described positional information, determine in described target image whether there is described arbitrary object.

So, use positional information and consider the positional information impact on the probability that arbitrary object occurs, it is possible to determining whether target image exists described arbitrary object more accurately.

In a preferred embodiment, the method can also include environmental information and obtain step, for obtaining the environmental information of described target image, and environmental correclation when described environmental information and described target image are taken.Described arbitrary object determines that step is also based on the impact on the probability that described arbitrary object occurs of the described environmental information, determines in described target image whether there is arbitrary object.

So, not only use positional information and consider the positional information impact on the probability that arbitrary object occurs, also use environmental information and consider the environmental information impact on the probability that arbitrary object occurs, it is possible to determining whether target image exists described arbitrary object more accurately.

In a preferred embodiment, the method also includes: image data base receiving step, for receiving image data base, wherein include multiple image and the arbitrary object relevant to the plurality of image exists information, visual signature, positional information and/or environmental information in described image data base.Can there is information, visual signature, positional information and/or environmental information based on the arbitrary object relevant to the plurality of image and determine described visual signature and described positional information and/or the environmental information impact on the probability that described arbitrary object occurs.

Therefore, described visual signature and described positional information and/or the environmental information impact on the probability that described arbitrary object occurs can be assisted in by image data base, help the concrete impact judging certain visual signature, certain positional information and/or certain environmental information to the probability that arbitrary object occurs, and the judgement speed of arbitrary object can be improved.

In a preferred embodiment, described arbitrary object determines that step may include that locality condition probability obtains step, the positional information of the multiple images for including based on the described positional information of described target image and described data base obtains the locality condition probability of this target image, to represent in each image with the positional information identical with target image in image data base, to there is the probability of arbitrary object；Vision posterior probability obtains step, the visual signature of multiple images, positional information and/or environmental information for including based on described target image and described data base obtains the vision posterior probability of this target image, with in representing existence arbitrary object in image data base and there is each image of the positional information identical with target image and/or environmental information, there is the probability of the visual signature identical with target image；Arbitrary object occurs that probability obtains step, probability occurs for the arbitrary object obtaining this target image by this locality condition probability is multiplied by this vision posterior probability.If described arbitrary object probability occurs more than first threshold, then described arbitrary object determines that step may determine that there is one or more arbitrary object in described target image.

So, it is multiplied by vision posterior probability by locality condition probability to obtain the arbitrary object of target image probability occurs so that probability occurs in the arbitrary object more simply, more intuitively calculating target image, founding mathematical models of being also more convenient for.

In a preferred embodiment, described locality condition probability obtains step and may include that the total quantity obtaining the image in the plurality of image of described image data base with the positional information identical with described target image, as the first quantity；Obtain and there is the positional information identical with described target image in the plurality of image of described image data base and there is the quantity of image of arbitrary object, as the second quantity；Based on described first quantity and described second quantity, it is thus achieved that described locality condition probability, wherein, having the image of the positional information identical with target image described in is its positional information and the distance of the positional information of the target image image less than Second Threshold.

In a preferred embodiment, described vision posterior probability obtains step and may include that the visual signature based on the multiple images in described image data base, by cluster process, the plurality of image is divided into multiple class so that less with the distance of the visual signature of each image of other apoplexy due to endogenous wind than them in the visual signature distance each other of each image of each apoplexy due to endogenous wind；Visual signature according to described target image, is attributed to an apoplexy due to endogenous wind of the plurality of apoplexy due to endogenous wind by described target image；In obtaining in the image that the apoplexy due to endogenous wind that is attributed to target image in described image data base includes, there is the image of the positional information identical with target image and/or identical environmental information, there is the quantity of the image of arbitrary object, as the 3rd quantity；Acquisition has in the image of the positional information identical with target image and/or identical environmental information in described image data base, there is the quantity of the image of arbitrary object, as the 4th quantity；Sum based on described 3rd quantity and described 4th quantity and the plurality of class, it is thus achieved that described vision posterior probability.The described image with the positional information identical with target image and/or identical environmental information can be the positional information of its positional information and/or environmental information and target image and/or the distance of the environmental information image less than the 3rd threshold value.

In a preferred embodiment, described image data base can be through what following steps created: collects multiple sample image；Extract the text message of the plurality of sample image, described text message can include at least one in the header file of the word around described sample image and described sample image, and described text message may indicate that the arbitrary object of described sample image exists at least one in information, positional information, environmental information；Extract the visual signature of the plurality of sample image；Based on described arbitrary object exist in information, visual signature, positional information, environmental information one or more, at least one existence in information, visual signature, positional information, environmental information with described arbitrary object by described each sample image is associated.

So, it is possible to use the Internet or the large nuber of images of existing existence in other networks (such as picture sharing website) so that without image pickup extraly, it is possible to easily create this image data base, save substantial amounts of human and material resources and financial resources.

According to a further aspect in the invention, it is provided that the system of a kind of detection one or more arbitrary object in the target image, including: Visual Feature Retrieval Process device, extracts the visual signature of described target image；Positional information obtains device, it is thus achieved that the positional information of described target image, the geographical position that described positional information is positioned at when being taken to described target image is relevant；And arbitrary object determines device, the impact on the probability that described arbitrary object occurs based on described visual signature and described positional information, determine in described target image whether there is described arbitrary object.

According to a further aspect in the invention, it is provided that a kind of method creating image data base, including: collect multiple sample image；Extract the text message of the plurality of sample image, wherein, described text message includes at least one in the header file of the word around described sample image and described sample image, and described text message indicates the arbitrary object of described sample image to there is at least one in information, positional information, environmental information；Extract the visual signature of the plurality of sample image；Based on described arbitrary object exist in information, visual signature, positional information, environmental information one or more, at least one existence in information, visual signature, positional information, environmental information with described arbitrary object by described each sample image is associated, wherein, the geographical position that described positional information is positioned at when being taken to described sample image is relevant, and environmental correclation when described environmental information is taken with described sample image.

According to a further aspect in the invention, it is provided that a kind of system creating image data base, including: collect the device of multiple sample image；Extract the device of the text message of the plurality of sample image, wherein, described text message includes at least one in the header file of the word around described sample image and described sample image, and described text message indicates the arbitrary object of described sample image to there is at least one in information, positional information, environmental information；Extract the device of the visual signature of the plurality of sample image；Based on described arbitrary object exist in information, visual signature, positional information, environmental information one or more, described each sample image is existed with described arbitrary object at least one in information, visual signature, positional information, environmental information device of being associated, wherein, the geographical position that described positional information is positioned at when being taken to described sample image is relevant, and environmental correclation when described environmental information is taken with described sample image.

To sum up, the technology of the present invention can detect the arbitrary object in image or video more accurately than prior art.

Accompanying drawing explanation

Fig. 1 illustrates the block diagram of existing pedestrian detecting system；

Fig. 2 illustrates the block diagram of the system of detection according to an embodiment of the invention one or more arbitrary object in the target image；

Fig. 3 illustrates the block diagram of the system of the one or more arbitrary object in the target image of detection according to another embodiment of the present invention；

Fig. 4 (a)-4 (g) illustrates visual signature according to another embodiment of the present invention, positional information and the environmental information schematic diagram on the impact of the probability that arbitrary object occurs；

Fig. 5 illustrates that the arbitrary object in system according to another embodiment of the present invention determines the block diagram of device；

Fig. 6 illustrates the flow chart of the method for the one or more arbitrary object in the target image of detection according to another embodiment of the present invention；

Fig. 7 illustrates the block diagram of the system of establishment image data base according to another embodiment of the present invention；And

Fig. 8 illustrates the flow chart of the method for establishment image data base according to another embodiment of the present invention；

Fig. 9 illustrates a kind of exemplary hardware schematic diagram when applying the technology of the present invention；

Figure 10 illustrates the structural representation of the personal computer in Fig. 9；

Figure 11 illustrates the another kind of exemplary hardware schematic diagram when applying the technology of the present invention；And

Figure 12 illustrates the structural representation of the vehicle in Figure 11.

Detailed description of the invention

Referring now specifically to specific embodiments of the invention, illustrate the example of specific embodiments of the invention in the accompanying drawings.Although the present invention will be described in conjunction with following specific embodiment, but it is not used to invention is limited to embodiment described.On the contrary, embodiment described is for covering replacement, amendment and the equivalent that can include in the spirit and scope of the present invention being defined by the following claims.

Fig. 2 illustrates the block diagram of the system 200 of detection according to an embodiment of the invention one or more arbitrary object in the target image.This system 200 includes: Visual Feature Retrieval Process device 201, extracts the visual signature of described target image；Positional information obtains device 202, is configured to obtain the positional information of described target image, and the geographical position that described positional information is positioned at when being taken to described target image is relevant；And arbitrary object determines device 203, it is configured to the impact on the probability that described arbitrary object occurs based on described visual signature and described positional information, determines in described target image whether there is described arbitrary object.

Should be noted that, " arbitrary object " occurred in this specification refers to and is likely to the object of random appearance, such as pedestrian, vehicle or animal etc. in a certain position, rather than the such as fixed buildings, street, river etc. fixing in a certain position and certainly exist.

Specifically, Visual Feature Retrieval Process device 201 extracts the visual signature V of described target image.Normally, the visual signature V extracted is a vector, for instance include mentioned in the background section color, brightness, skirt response, texture, shape ... } vector.Certainly, the method extracting visual signature is not limited to this, but can also include in other parameters or these parameters one or more.Because extracting visual signature is technology well known in the art, therefore it is not repeated herein.

Positional information obtains device 202 and obtains the positional information L of described target image, and the geographical position that described positional information L is positioned at when being taken to described target image is relevant.This positional information L can be accurate global positioning system (GPS) information including longitude, latitude coordinate, can also be only to this target image captured by time the geographical position that is positioned at substantially relevant information, such as " so-and-so doorway, skifield ", " so-and-so road and so-and-so crossing, road " etc., or other information that the geographical position that is positioned at when can be to this shooting is relevant, such as with a certain GPS location at a distance of can be considered as being positioned at this GPS location less than the positional information of certain distance, and it is not limited to accurate GPS information.

Arbitrary object determines device 203 impact on the probability that described arbitrary object occurs based on described visual signature V and described positional information L, determines whether there is described arbitrary object in described target image.

At this, disclose many for estimating the visual signature V method on the impact of the probability that arbitrary object occurs in the prior art, for instance, in which type of color, brightness, skirt response, texture, when shape etc., there is more likely to be arbitrary object, be not repeated herein.And for the positional information L impact on the probability that arbitrary object occurs, below by for example bright.Such as, by positional information L, can estimate which country (Asian countries, American States, African country etc.) this image is in when being taken, the facial appearance of the people of country variant, profile, dress ornament, personage's background etc. are all different, therefore positional information is likely in the image of impact shooting the visual signature of pedestrian or non-pedestrian, and contributes to the visual signature V combining position information L by image and judge whether pedestrian occurs more accurately.Again such as, by positional information L, it is possible to estimate whether to be in when this image is taken highway or urban road, on a highway, the probability that pedestrian occurs is relatively low, and on urban road, the probability that pedestrian occurs is higher, and therefore positional information L is likely to affect the probability that pedestrian occurs.Again such as, by positional information L, can estimate that this image when being taken is in city, rural area or desert etc., the probability that pedestrian occurs in city could possibly be higher than the probability that pedestrian occurs in rural area, and the probability that pedestrian occurs in city or rural area could possibly be higher than the probability that pedestrian occurs in desert.In addition, those skilled in the art are it is contemplated that relevant other positional informationes of physical location residing when being taken to described target image and the impact on the probability that pedestrian occurs of these positional informationes, and differ a citing at this.

Visible, the probability that pedestrian occurs is had certain impact by positional information time captured by image.Except the pedestrian of example, other arbitrary object of such as vehicle, animal etc. for occurring at random, the technology of the present invention can be suitable for too.Therefore, positional information L during by considering captured by target image impact on the probability that arbitrary object occurs on the impact of the probability that arbitrary object occurs the visual signature V of based target image, it is possible to determine whether there is described arbitrary object in described target image more accurately.

Fig. 3 illustrates the block diagram of the system 300 of the one or more arbitrary object in the target image of detection according to another embodiment of the present invention.

This system 300 includes: Visual Feature Retrieval Process device 301, is configured to extract the visual signature V of described target image；Positional information obtains device 302, is configured to obtain the positional information L of described target image, and the geographical position that described positional information L is positioned at when being taken to described target image is relevant.The details that this Visual Feature Retrieval Process device 301 obtains device 302 with positional information is similar with this Visual Feature Retrieval Process device 201 in Fig. 2 and positional information acquisition device 202, is not repeated herein.

Alternatively, this system 300 can also include environmental information and obtain device 304, for obtaining the environmental information E of described target image.Environmental correclation when described environmental information E and described target image are taken.The environmental factorss such as time when this environmental information E can include that such as target image is taken, season, weather.

Described arbitrary object determines that device 303 is except the impact on the probability that described arbitrary object occurs based on described visual signature V and described positional information L, it is additionally based upon the described environmental information E impact on the probability that described arbitrary object occurs, determines in described target image whether there is arbitrary object.

Such as, the impact of the probability that described arbitrary object occurs is included by described environmental information E: such as, if environmental information E instruction is the temporal information in daytime or evening, it is likely to affect the visual signature V (change due to such as brightness, colourity) of pedestrian or non-pedestrian in the image shot, and contributes to judging whether pedestrian occurs by the visual signature V of image more accurately in conjunction with this environmental information E；Such as, if environmental information E instruction is the season information in winter or summer, it is possible to the visual signature V (due to the change of such as dressing, change of background color etc.) of pedestrian or non-pedestrian in the image of impact shooting；Such as, if environmental information E instruction is the Weather information of fine day or rainy day, then be likely to pedestrian or non-pedestrian in the image of impact shooting visual signature V (due to for example whether hold up an umbrella, brightness, the change of colourity, the change of dressing, background color change etc.).In addition, those skilled in the art are it is contemplated that the other environmental information of environmental correclation when being taken with described target image, and differ a citing at this.

Therefore, beyond positional information L time captured by target image, environmental information E when further contemplating captured by the target image impact on the probability that arbitrary object occurs, and the impact that the visual signature V of based target image is on the probability that arbitrary object occurs, it is possible to determine whether described target image exists described arbitrary object more accurately.Certainly, it is considered to environmental information is not necessary, but can be by considering the purpose that environmental information reaches to determine more accurately the appearance of arbitrary object.

With reference to Fig. 4 (a)-4 (g), Fig. 4 (a)-4 (g) illustrates visual signature V according to another embodiment of the present invention, the positional information L and the environmental information E schematic diagram on the impact of the probability that arbitrary object occurs.

In Fig. 4 (a)-4 (d), V represents visual signature, and L represents positional information, and E represents environmental information, and P represents whether arbitrary object occurs.

See Fig. 4 (a), can be seen that, in the conventional technology, generally merely with the visual signature V of image, the impact that arbitrary object occurs is judged the appearance of arbitrary object, this judgement is not accurate enough, because along with differences such as the position of image taking and/or environment, the probability that its arbitrary object occurs also is different.And in Fig. 4 (b), illustrate the system 200 such as Fig. 2, merely with visual signature V and positional information L, arbitrary object is occurred that the P impact produced determines whether arbitrary object more accurately, wherein, the visual signature V of image can be produced impact (such as by the positional information L of image, Asian countries is different from the visual signature of Hesperian pedestrian), arbitrary object can also be occurred that P directly produces impact (probability that the probability that such as, the pedestrian on expressway occurs occurs) less than pedestrian on avenue.In Fig. 4 (c), it is shown that only environmental information E is on the impact of visual signature V (such as, winter or the impact on the dressing of pedestrian, background color in the summer), and arbitrary object is occurred that P creates impact again by visual signature V.In Fig. 4 (d), illustrate system 300 as shown in Figure 3, utilize visual signature V, arbitrary object is occurred that the impact producing P determines whether arbitrary object by positional information L and environmental information E three, wherein, the visual signature V of image can be produced impact (such as by the positional information L of image, Asian countries is different from the visual signature of Hesperian pedestrian), arbitrary object can also be occurred that P directly produces impact (such as, the probability that the probability that pedestrian on expressway occurs occurs less than pedestrian on avenue), visual signature V is also had impact (such as by environmental information E, environment is winter or the dressing to pedestrian in the summer, the impact of background color, with reference to Fig. 4 (e) and Fig. 4 (g)；Environment is daytime or the impact on brightness, colourity in the evening, with reference to Fig. 4 (e) and Fig. 4 (f)).Namely, environmental information E can be passed through arbitrary object is occurred by the impact (thus affecting indirectly arbitrary object further P occur) of visual signature V, positional information L by the impact (thus affecting indirectly arbitrary object further P occur) of visual signature V, positional information L, and the impact of P is occurred by the impact of P and visual signature V in arbitrary object, determine whether arbitrary object more accurately.

Additionally, alternatively, in order to help the judgement speed specifically affecting and improving arbitrary object judging certain visual signature, certain positional information and/or certain environmental information to the probability that arbitrary object occurs, this system 300 can also include image data base and receive device 305, is used for receiving image data base.Described image data base can include multiple image and the arbitrary object relevant to the plurality of image exists information P1-Pa, visual signature V1-Va, positional information L1-La and/or environmental information E1-Ea, wherein, a is the sum of multiple images in image data base.Can based on there is information P1-Pa to the plurality of (i.e. a) arbitrary object that image is relevant, visual signature V1-Va, positional information L1-La and/or environmental information E1-Ea determine described visual signature and described positional information and/or the environmental information impact on the probability that described arbitrary object occurs.

It is to say, the technology of the present invention can assist in described visual signature and described positional information and/or environmental information to the impact of the probability that described arbitrary object occurs the judgement speed improving arbitrary object by image data base.Certainly, this image data base not necessarily, does not use this image data base can also to realize the purpose of the present invention.

First, this image data base have collected multiple sample image, even can also be continuously increased new sample image.

Wherein, at least can there is two ways to create this image data base.

The first optional mode is to utilize the large nuber of images existed in the Internet or other networks.Along with being continuously increased of user in the Internet or other networks (especially picture sharing website), increasing image enters in the Internet.Utilize the large nuber of images of existing existence in the Internet or other networks (such as picture sharing website) so that without image pickup extraly, it is possible to easily create this image data base, save substantial amounts of human and material resources and financial resources.

Specifically, the text message of the plurality of sample image is extracted.Described text message includes at least one in the header file of the word around described sample image and described sample image, and described text message indicates the arbitrary object of described sample image to there is at least one in information, positional information, environmental information.

Such as, image peripheral in webpage generally enclose word (such as, the explanation of picture, the title of picture, label, keyword etc.), sometimes, the header file of the file of image also has many about this information, arbitrary object can be extracted from these words and/or header file and there is at least one in information P, positional information L, environmental information E.Such as, if the word of image peripheral and/or header file occur " man ", " woman ", " people ", " people ", " pedestrian ", " walking " etc., there is information P in the arbitrary object that can obtain this image, such as P=1 refers to arbitrary object in image, P=0 then represents in image does not have arbitrary object.Such as, if the word of image peripheral and/or header file occur such as " so-and-so doorway, skifield " of GPS information, longitude and latitude information, street address information or even hint position, " so-and-so road and so-and-so crossing, road " etc., the positional information L of this image can be obtained.Such as, if " daytime ", " afternoon " (can represent daytime), " snow ", " skiing " (can represent winter or snowy day) occur in the word of image peripheral and/or header file, then the environmental information E of this image can be obtained.Certainly, by text message obtain the arbitrary object of each image exist information P, positional information L and environmental information E method also a variety of, differ a citing at this.

Then, each visual signature V of these sample images is extracted.

Then, based on described arbitrary object exist in information, visual signature, positional information, environmental information one or more, at least one existence in information, visual signature, positional information, environmental information with the arbitrary object obtained by each sample image is associated, and these are associatedly stored in image data base, to form image data base.

Another optional mode is, utilizes the vehicle in moving to shoot substantial amounts of image.Generally, be mounted with GPS device (being used for obtaining comparatively accurate GPS geography information) in vehicle, timer device (is used for obtaining temporal information during shooting, thus the environmental information in season inferring daytime or evening, winter or summer etc.), environmental sensor (such as, weather sensor, temperature sensor etc. are used for inferring the environmental information of such as weather condition, temperature conditions etc.), thus obtaining the positional information L and environmental information E of the image of each shooting.Then, each visual signature V of these sample images is extracted.Furthermore it is possible to sense whether there is pedestrian's (that is, arbitrary object exists information P) by such as infrared ray sensor etc..Certainly, except the vehicle in movement, it is also possible to utilize the pedestrian of walking itself, or other other objects of movement on street carry out this shooting and sensing.

So, due to by this and some sensor of GPS, it is possible to create image data base more accurately, so that by utilizing this chart database to determine in target image whether there is arbitrary object more accurately.

Certainly, the method creating image data base is not limited to above two, along with the development of technology, it is also possible to the method having more establishment image data base.

Below, introduce whether the image data base created exists the application in arbitrary object in the present invention sets the goal image really.

Specifically, the respective arbitrary object of multiple images passed through image data base includes can there is information (i.e. P, P=1 refers to arbitrary object in image), visual signature (parameter V), positional information (parameter L) and/or environmental information (parameter E), estimate probability that under the visual signature identical with target image arbitrary object exists (such as Pr (P=1 | V), it refers to when there is this visual signature V, there is the probability that arbitrary object exists, P=1 refers to arbitrary object in image), probability (the such as Pr (P=1 | L) that arbitrary object exists under the positional information identical with target image, it refers to when this positional information L, there is the probability that arbitrary object exists), and/or the probability that under the environmental information identical with target image, arbitrary object exists (such as Pr (P=1 | E), it refers to when this environmental information E, there is the probability that arbitrary object exists).

So, arbitrary object determine device 303 can by except based on described visual signature V and described positional information E on the impact of the probability that described arbitrary object occurs (such as Pr (P=1 | V) and Pr (P=1 | L)), it is additionally based upon described environmental information E to the impact of the probability that described arbitrary object occurs (such as Pr (P=1 | E)), determine described target image exists arbitrary object probability P r (P=1 | V, L, E).

Specifically, for instance, the visual signature at target image is V, and positional information is L, when environmental information is E, there is the probability of arbitrary object or probability is Pr (P=1 | V, L, E) in described target image.Specifically, assume the multiple images in image data base have in n the image of the visual signature V identical with target image, positional information L and environmental information E, it is m that its arbitrary object exists the quantity of the image of information (P=1), so, this target image exists the probability of arbitrary object or probability be Pr (P=1 | V, L, E) can be m/n, wherein, m is zero or positive integer, and n is positive integer.

Certainly, estimation Pr (P=1 | V, L, E) can also have additive method.

Other two kinds of methods of this specification also estimation Pr (P=1 | V, L, E) described in detail below.

First, by the implication of conditional probability, Pr (P=1 | V, L, E) is expanded into:

\Pr (P = 1 | V, L, E) = \frac{\Pr (P = 1, V, L, E)}{Σ_{P^{'} &Element; {0,1}} \Pr (P^{'}, V, L, E)} &Proportional; \Pr (P = 1, V, L, E)

Formula (1)

Wherein, ∑_{P ' ∈ { 0,1}}Pr (P ', V, L, E) i.e. multiple images in image data base described above have the probability of the visual signature V identical with target image, positional information L and environmental information E.And Pr (P=1, V, L, E) i.e. multiple images in image data base described above have visual signature V, the positional information L and environmental information E identical with target image and its arbitrary object exists the probability of information (P=1).

Therefore,It is proportional to Pr's (P=1, V, L, E).

Then, by Bayesian network model (those skilled in the art should understand that this Bayesian network model, be not repeated herein), above-mentioned formula (1) is expanded into:

Pr (P=1, V, L, E)=Pr (P=1 | L) Pr (V | P=1, L, E) Pr (L) formula (2)

Wherein, no matter either with or without arbitrary object in image, Pr (L) is constant.

It was therefore concluded that:

Pr (P=1 | V, L, E) ∝ Pr (P=1 | L) Pr (V | P=1, L, E) formula (3)

Wherein, Pr (P=1 | L) it is referred to as locality condition probability, represent in each image with the positional information L identical with target image in image data base, there is the probability of arbitrary object, and Pr (V | P=1, L, E) it is referred to as vision prior probability, in representing the existence arbitrary object (P=1) in image data base and there is each image of the positional information L identical with target image and/or environmental information E, there is the probability of the visual signature V identical with target image.

Describe below with reference to Fig. 5 and how specifically to calculate locality condition probability P r (P=1 | L) and vision prior probability Pr (V | P=1, L, E).

Fig. 5 illustrates that in system 300 according to another embodiment of the present invention, arbitrary object determines the block diagram of device 303.

This arbitrary object determines that device 303 includes locality condition probability and obtains device 3031, the positional information of the multiple images for including based on described positional information L and the described data base of described target image obtains the locality condition probability P r (P=1 | L) of this target image, to represent in each image with the positional information identical with target image in image data base, to there is the probability of arbitrary object；Vision posterior probability obtains device 3032, the visual signature V of the multiple images for including based on described target image and described data base, positional information L and/or environmental information E obtain this target image vision posterior probability Pr (V | P=1, L, E), with in representing existence arbitrary object in image data base and there is each image of the positional information identical with target image and/or environmental information, there is the probability of the visual signature identical with target image；Arbitrary object occurs that probability obtains device 3033, for by this locality condition probability is multiplied by this vision posterior probability, namely Pr (P=1 | L) Pr (V | P=1, L, E), obtain the arbitrary object of this target image and probability Pr (P=1 | V, L, E) occurs.If described arbitrary object occur probability Pr (P=1 | V, L, E) more than first threshold (such as, this first threshold can be determined by the mode of the method for empirical statistics or machine learning), then described arbitrary object determines that step determines there is one or more arbitrary object in described target image.

Specifically, described locality condition probability obtains device and can also include: obtain the total quantity of the image in the plurality of image of described image data base with the positional information L identical with described target image, device (not shown) as the first quantity (such as, being denoted as N (L))；Obtain and there is the positional information L identical with described target image in the plurality of image of described image data base and there is the quantity of image of arbitrary object (P=1), device (not shown) as the second quantity (such as, being denoted as PN (L))；Based on described first quantity N (L) and described second quantity PN (L), it is thus achieved that the device (not shown) of described locality condition probability.

Such as, equation below is used to calculate initial locality condition probability initPr (P=1 | L).

initPr (P = 1 | L) = \frac{PN (L)}{N (L)}

Formula (4)

The scope of this value initPr (P=1 | L) is between 0 to 1.

But, due to locality condition probability P r (P=1 | L) will with vision posterior probability Pr (V | P=1, L, E) it is multiplied, therefore, in order to avoid this locality condition probability take approximate 0 this cross fractional value thus causing that the result of product is for approximate 0, hence with logarithmic function, final Pr (P=1 | L) is revised as:

Finally

\Pr (P = 1 | L) = \frac{1}{1 + e^{- (initPr (P = 1 | L) - 0.5)}}

Formula (5)

Final position conditional probability Pr (P=1 | L) is made to be restricted near 0.5, to avoid excessive (such as 1) or the value of too small (such as 0).

It will be apparent that above-mentioned logarithmic function is only example, and and unrestricted.Other functions can be used, even do not use any function, it is also possible to realize the purpose of the present invention.

It is described below how computation vision posterior probability Pr (V | P=1, L, E).

This vision posterior probability obtains device 3032 and includes: based on the visual signature of the multiple images in described image data base, by cluster process, the plurality of image is divided into multiple class so that the device (not shown) that visual signature distance each other at each image of each apoplexy due to endogenous wind is less with the distance of the visual signature of each image of other apoplexy due to endogenous wind than them；Visual signature according to described target image, is attributed to the device (not shown) of an apoplexy due to endogenous wind of the plurality of apoplexy due to endogenous wind by described target image；In obtaining in the image that the apoplexy due to endogenous wind that is attributed to target image in described image data base includes, there is the image of the positional information identical with target image and/or identical environmental information, there is the quantity of the image of arbitrary object, as the device (not shown) of the 3rd quantity；Acquisition has in the image of the positional information identical with target image and/or identical environmental information in described image data base, there is the quantity of the image of arbitrary object, as the device (not shown) of the 4th quantity；Sum based on described 3rd quantity and described 4th quantity and the plurality of class, it is thus achieved that the device (not shown) of described vision posterior probability.

Specifically, as previously mentioned, vision posterior probability Pr (V | P=1, L, E) represent existence arbitrary object (i.e. P=1) in image data base and there is each image of the positional information L identical with target image and/or environmental information E in, there is the probability of the visual signature V identical with target image.

Carry out as vision posterior probability obtains device 3032, computation vision posterior probability Pr (V | P=1, L, E) time, first based on visual signature (the such as V1, V2 ... Va of the multiple images in described image data base, a is the sum of all images in image data base), by clustering (Cluster) process, the plurality of image is divided into multiple class (such as, k class) so that less with the distance of the visual signature of each image of other apoplexy due to endogenous wind than them in the visual signature distance each other of each image of each apoplexy due to endogenous wind.In one example, use KMeans of the prior art (K average) method to carry out cluster process so that each cluster itself is compact as much as possible, and separates as much as possible between respectively clustering.The known clustering algorithm of those skilled in the art can also use additive method, such as, K-MEANS algorithm, K-MEDOIDS algorithm, CLARANS algorithm, BIRCH algorithm, CURE algorithm, CHAMELEON algorithm, STING algorithm, CLIQUE algorithm, WAVE-CLUSTER algorithm etc., do not describe in detail one by one at this.

Secondly, the visual signature V according to described target image, described target image is attributed to an apoplexy due to endogenous wind in the plurality of class (k class), for instance class k1.Therefore, in the present embodiment, it is considered to that there is the visual signature V identical with target image at all images of k1 apoplexy due to endogenous wind.The method of this classification also has a lot.Such as, first, selecting a center image at each apoplexy due to endogenous wind, this center image is bordering on the average of each corresponding apoplexy due to endogenous wind image coordinate most.Then, calculating the distance between target image and each center image, if the distance between target image and certain center image is minimum, then the class k1 that this center image is attributed to is considered as then the class that target image is attributed to.

Then, in obtaining in the class k1 that is attributed to target image in the described image data base image included, there is the image of the positional information L identical with target image and/or identical environmental information E, there is the quantity of the image of arbitrary object (P=1), as the 3rd quantity PN (VLE).It is to say, the 3rd quantity PN (VLE) represents existence arbitrary object (i.e. P=1) in image data base, has the visual signature V identical with target image and has the quantity of image of the positional information L identical with target image and/or environmental information E.

Then, it is thus achieved that have in the image of the positional information L identical with target image and/or identical environmental information E in described image data base, there is the quantity of the image of arbitrary object (P=1), as the 4th quantity PN (LE).It is to say, the 4th quantity PN (LE) indicates that the existence arbitrary object (i.e. P=1) in image data base and has the quantity of image of the positional information L identical with target image and/or environmental information E.

Sum (i.e. k) based on described 3rd quantity PN (VLE) and described 4th quantity PN (LE) and the plurality of class, it is thus achieved that described vision posterior probability Pr (V | P=1, L, E).Specifically, it is possible to use smooth (Laplaciansmoothing) algorithm of Laplce calculates this vision posterior probability Pr (V | P=1, L, E), it may be assumed that

\Pr (V | P = 1, L, E) = \frac{1 + PN (VLE)}{k + PN (LE)}

Formula (6).

Certainly, Laplce's smoothing algorithm is only example as used herein, it is also possible to calculate this vision posterior probability Pr (V | P=1, L, E) with other algorithms.

Therefore, a kind of example embodiment of acquisition locality condition probability described in detail above and vision posterior probability.As it has been described above, by the locality condition probability of acquisition is multiplied by this vision posterior probability, namely Pr (P=1 | L) Pr (V | P=1, L, E), obtain the arbitrary object of this target image occur probability Pr (P=1 | V, L, E).As mentioned above, if described arbitrary object occur probability Pr (P=1 | V, L, E) more than first threshold (such as, this first threshold can be determined by the mode of the method for empirical statistics or machine learning), then described arbitrary object determines that step may determine that there is one or more arbitrary object in described target image.

Occur beyond probability Pr (P=1 | V, L, E) except above example mode obtains arbitrary object, it is also possible to obtain by other means.Such as, according to naive Bayesian theorem, it is possible to derive following equation:

\Pr (P = 1 | V, L, E) = \frac{\Pr (P = 1, V, L, E)}{Σ_{P^{'} &Element; {0,1}} \Pr (P^{'}, V, L, E)} &Proportional; \Pr (P = 1, V, L, E)

Formula (7)

Thus learning, this arbitrary object occurs that probability Pr (P=1 | V, L, E) is proportional to distribution probability Pr (P=1, V, L, E).Bayesian network model according to Fig. 4 (d), obtains:

Pr (P=1, V, L, E)=Pr (P=1 | L) Pr (V | P=1, L, E) Pr (L) formula (8)

Consider that positional information L and environmental information E is generally independent from one another.Therefore above-mentioned formula (8) can be derived as:

Pr (P=1, V, L, E)=Pr (P=1 | L) Pr (V | P=1, L) Pr (V | P=1, E) Pr (L)

Formula (9)

Due to no matter, whether image occurs that arbitrary object, Pr (L) are all equal, therefore, it can derive:

Pr (P=1 | V, L, E) ∝ Pr (P=1 | L) Pr (V | P=1, L) Pr (V | P=1, E)

Formula (10)

Wherein, Pr (P=1 | L), Pr (V | P=1, L) and Pr (V | P=1, E) can be referred to as location-prior probability, locality condition probability, environmental condition probability.Wherein location-prior probability P r (P=1 | L) represents in the image in image data base with positional information L, there is the probability of arbitrary object (P=1), locality condition probability P r (V | P=1, L) in representing and there is positional information L in image data base and there is the image of arbitrary object (P=1), there is the probability of visual signature V, and environmental condition probability P r (V | P=1, E), in representing and there is environmental information E in image data base and there is the image of arbitrary object (P=1), there is the probability of visual signature V.Therefore, arbitrary object occurs that probability Pr (P=1 | V, L, E) can obtain by calculating above three probability.

For the circular of these three probability, those skilled in the art can be drawn by above-mentioned instruction, is therefore not repeated herein.

Then, equally, if arbitrary object computed as above occur probability Pr (P=1 | V, L, E) more than first threshold (such as, this first threshold can be determined by the mode of the method for empirical statistics or machine learning), then described arbitrary object determines that step may determine that there is one or more arbitrary object in described target image.

Obtain arbitrary object and occur that the mode of probability Pr (P=1 | V, L, E) is not limited to above-mentioned several method.Those skilled in the art according to art technology knowledge it is contemplated that additive method calculates or estimates that probability Pr (P=1 | V, L, E) occurs in this arbitrary object.

Certainly, it is described above considering the example of environmental information E, but system as shown in Figure 2 200 does not use environmental information E, when there is the probability of arbitrary object in determining described target image, use formula Pr (P=1 | V, L).The various computing formula that those skilled in the art can be able to derive completely when not using environmental information E in teachings above, are not repeated herein.

Should be noted that, the visual signature identical with target image mentioned in this specification, the positional information identical with target image, and the environmental information identical with described target image is all not necessarily accurately equal to and the visual signature of described target image, positional information or environmental information, and can be the visual signature with target image, positional information, environmental information differs the visual signature in certain threshold value, positional information, environmental information, or, this visual signature with target image, visual signature in the same class (words if, with clustering method) at positional information or environmental information place, positional information or environmental information, above-mentioned these can be considered as the visual signature identical with target image, positional information, or environmental information, etc..

Fig. 6 illustrates the flow chart of the method 600 of the one or more arbitrary object in the target image of detection according to another embodiment of the present invention.

The method 600 includes Visual Feature Retrieval Process step 601, for extracting the visual signature of described target image；Positional information obtains step 602, and for obtaining the positional information of described target image, the geographical position that described positional information is positioned at when being taken to described target image is relevant；And arbitrary object determines step 603, for the impact on the probability that described arbitrary object occurs based on described visual signature and described positional information, determine in described target image whether there is described arbitrary object.

Fig. 7 illustrates the block diagram of the system 700 of establishment image data base according to another embodiment of the present invention.

This system 700 includes the device 701 collecting multiple sample image；Extract the device 702 of the text message of the plurality of sample image, wherein, described text message includes at least one in the header file of the word around described sample image and described sample image, and described text message indicates the arbitrary object of described sample image to there is at least one in information, positional information, environmental information；Extract the device 703 of the visual signature of the plurality of sample image；Based on described arbitrary object exist in information, visual signature, positional information, environmental information one or more, described each sample image is existed with described arbitrary object at least one in information, visual signature, positional information, environmental information device 704 of being associated.The geographical position that described positional information is positioned at when being taken to described sample image is relevant, and environmental correclation when described environmental information is taken with described sample image.

Fig. 8 illustrates the flow chart of the method 800 of establishment image data base according to another embodiment of the present invention.

The method 800 includes collecting multiple sample image (step 801)；Extract the text message (step 802) of the plurality of sample image, wherein, described text message includes at least one in the header file of the word around described sample image and described sample image, and described text message indicates the arbitrary object of described sample image to there is at least one in information, positional information, environmental information；Extract the visual signature (step 803) of the plurality of sample image；Based on described arbitrary object exist in information, visual signature, positional information, environmental information one or more, be there is at least one in information, visual signature, positional information, environmental information with described arbitrary object in described each sample image and is associated (step 804).The geographical position that described positional information is positioned at when being taken to described sample image is relevant, and environmental correclation when described environmental information is taken with described sample image.

Fig. 9 illustrates a kind of exemplary hardware schematic diagram when applying the technology of the present invention.First, the technology of the present invention can apply on non-moving device, such as personal computer 901.This personal computer 901 is communicated with multiple servers 905 (1)-905 (N) by network 902.

Figure 10 illustrates the structural representation of the personal computer 901 in Fig. 9.This personal computer 901 generally can include CPU (CPU) 9011, for realizing the detection of the present invention one or more arbitrary object in the target image or creating the technology of image data base；Memorizer 9012；Hard disk 9013；Display unit 9014, for browsing arbitrary object detection or creating the result of image data base；Web service 9015, for receiving data via network 902 from data server 905 (1)-905 (N).The result of arbitrary object detection or establishment image data base can be sent to multiple server 905 (1)-905 (N) by network 902 by this personal computer 901.

Figure 11 illustrates the another kind of exemplary hardware schematic diagram when applying the technology of the present invention.The technology of the present invention can also be applied on the device of movement, such as vehicle 1101.This vehicle 1101 can the signal of receiving world locational system (GPS) satellite 1102, and such as wireless network 1104 can be passed through communicate with each data server 905 (1)-905 (N).

Figure 12 illustrates the structural representation of the vehicle 1101 in Figure 11.This vehicle 1101 can include microprocessor 11011, for realizing the detection of the present invention one or more arbitrary object in the target image or creating the technology of image data base；Memorizer 11012；Hard disk 11013；Display unit 11014, for browsing arbitrary object detection or creating the result of image data base；Web service 11015, for receiving data via wireless network 1104 from data server 905 (1)-905 (N)；Video camera 11016, is used for shooting digital photos and digital video alternatively；GPS unit 11017, for based on the signal from gps satellite 1102, it is determined that the current geographic position of this vehicle 1101；One or more environmental sensors 11018 (alternatively), are used for detecting environmental information, such as time, season and weather etc..The result of arbitrary object detection or establishment image data base can be sent to each data server 905 (1)-905 (N) via wireless network 1104 by this vehicle 1101 alternatively.

Although it has been illustrated and described that several embodiments of this total inventive concept, it will be appreciated by those skilled in the art that the principle and spirit that are changed the inventive concept total without departing from this in these embodiments, the scope of this total inventive concept is limited by claims and its equivalent.It should be appreciated by those skilled in the art that in the scope of claims or its equivalent, it is possible to need based on design and other factors carry out various amendment, combination, sub-portfolio and change.

Claims

1. a method for detection one or more arbitrary object in the target image, including:

Visual Feature Retrieval Process step, for extracting the visual signature of described target image；

Positional information obtains step, and for obtaining the positional information of described target image, the geographical position that described positional information is positioned at when being taken to described target image is relevant；And

Arbitrary object determines step, for the impact on the probability that described arbitrary object occurs based on described visual signature and described positional information, determines in described target image whether there is described arbitrary object.

2. method according to claim 1, also includes:

Environmental information obtains step, for obtaining the environmental information of described target image, and environmental correclation when described environmental information and described target image are taken；

Wherein, described arbitrary object determines that step is additionally based upon the impact on the probability that described arbitrary object occurs of the described environmental information, determines in described target image whether there is arbitrary object.

3. the method according to claim 1 or 2, also includes:

Image data base receiving step, is used for receiving image data base, wherein includes multiple image and the arbitrary object relevant to the plurality of image exists information, visual signature, positional information and/or environmental information in described image data base,

Wherein, there is information, visual signature, positional information and/or environmental information based on the arbitrary object relevant to the plurality of image and determine described visual signature and described positional information and/or the environmental information impact on the probability that described arbitrary object occurs.

4. method according to claim 3, wherein, described arbitrary object determines that step includes:

Locality condition probability obtains step, the positional information of the multiple images for including based on the described positional information of described target image and described data base obtains the locality condition probability of this target image, to represent in each image with the positional information identical with target image in image data base, to there is the probability of arbitrary object；

Vision posterior probability obtains step, the visual signature of multiple images, positional information and/or environmental information for including based on described target image and described data base obtains the vision posterior probability of this target image, with in representing existence arbitrary object in image data base and there is each image of the positional information identical with target image and/or environmental information, there is the probability of the visual signature identical with target image；

Arbitrary object occurs that probability obtains step, probability occurs for the arbitrary object obtaining this target image by this locality condition probability is multiplied by this vision posterior probability,

Wherein, if described arbitrary object probability occurs more than first threshold, then described arbitrary object determines that step determines there is one or more arbitrary object in described target image.

5. method according to claim 4, wherein, described locality condition probability obtains step and includes:

Obtain the total quantity of the image in the plurality of image of described image data base with the positional information identical with described target image, as the first quantity；

Obtain and there is the positional information identical with described target image in the plurality of image of described image data base and there is the quantity of image of arbitrary object, as the second quantity；

Based on described first quantity and described second quantity, it is thus achieved that described locality condition probability,

Wherein, having the image of the positional information identical with target image described in is its positional information and the distance of the positional information of the target image image less than Second Threshold.

6. method according to claim 4, wherein, described vision posterior probability obtains step and includes:

Visual signature based on the multiple images in described image data base, by cluster process, the plurality of image is divided into multiple class so that less with the distance of the visual signature of each image of other apoplexy due to endogenous wind than them in the visual signature distance each other of each image of each apoplexy due to endogenous wind；

Visual signature according to described target image, is attributed to an apoplexy due to endogenous wind of the plurality of apoplexy due to endogenous wind by described target image；

In obtaining in the image that the apoplexy due to endogenous wind that is attributed to target image in described image data base includes, there is the image of the positional information identical with target image and/or identical environmental information, there is the quantity of the image of arbitrary object, as the 3rd quantity；

Acquisition has in the image of the positional information identical with target image and/or identical environmental information in described image data base, there is the quantity of the image of arbitrary object, as the 4th quantity；

Sum based on described 3rd quantity and described 4th quantity and the plurality of class, it is thus achieved that described vision posterior probability,

Wherein, having the image of the positional information identical with target image and/or identical environmental information described in is the positional information of its positional information and/or environmental information and target image and/or the distance of the environmental information image less than the 3rd threshold value.

7. method according to claim 3, wherein, described image data base creates as follows:

Collect multiple sample image；

Extract the text message of the plurality of sample image, wherein, described text message includes at least one in the header file of the word around described sample image and described sample image, and described text message indicates the arbitrary object of described sample image to there is at least one in information, positional information, environmental information；

Extract the visual signature of the plurality of sample image；

Based on described arbitrary object exist in information, visual signature, positional information, environmental information one or more, at least one existence in information, visual signature, positional information, environmental information with described arbitrary object by described each sample image is associated.

8. a system for detection one or more arbitrary object in the target image, including:

Visual Feature Retrieval Process device, extracts the visual signature of described target image；

Positional information obtains device, it is thus achieved that the positional information of described target image, the geographical position that described positional information is positioned at when being taken to described target image is relevant；And

Arbitrary object determines device, the impact on the probability that described arbitrary object occurs based on described visual signature and described positional information, determines in described target image whether there is described arbitrary object.

9. the method creating image data base, including:

Collect multiple sample image；

Extract the visual signature of the plurality of sample image；

Based on described arbitrary object exist in information, visual signature, positional information, environmental information one or more, at least one existence in information, visual signature, positional information, environmental information with described arbitrary object by described each sample image is associated,

Wherein, the geographical position that described positional information is positioned at when being taken to described sample image is relevant, and environmental correclation when described environmental information is taken with described sample image.

10. create a system for image data base, including:

Collect the device of multiple sample image；

Extract the device of the text message of the plurality of sample image, wherein, described text message includes at least one in the header file of the word around described sample image and described sample image, and described text message indicates the arbitrary object of described sample image to there is at least one in information, positional information, environmental information；

Extract the device of the visual signature of the plurality of sample image；

Based on described arbitrary object exist in information, visual signature, positional information, environmental information one or more, described each sample image is existed with described arbitrary object at least one in information, visual signature, positional information, environmental information device of being associated