WO2016139964A1 - Region-of-interest extraction device and region-of-interest extraction method - Google Patents
Region-of-interest extraction device and region-of-interest extraction method
- Publication number
- WO2016139964A1 (PCT/JP2016/050344)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- interest
- image
- region
- partial
- degree
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20076—Probabilistic image processing
Definitions
- the present invention relates to a technique for extracting a region of interest from an image.
- an attention area is an image area expected to be noticed by a human, or an image area that should be noticed.
- attention area detection is also called salient region detection (Saliency Detection), objectness detection (Objectness Detection), foreground detection (Foreground Detection), attention detection (Attention Detection), and the like.
- in a learning-based algorithm, the pattern of the region to be detected is learned from a large number of images of the learning target, and the attention area is detected based on the learning result.
- it is described that a feature type is learned and determined in advance from a plurality of training images, and that features of each part of the target image data, for which a saliency is to be calculated, are extracted based on the determined feature type and the target image data.
- Non-Patent Document 1 models information transmitted to the brain when a region called a receptive field in a retinal ganglion cell in the retina of an eye is stimulated by light.
- the receptive field is composed of a central area and a peripheral area.
- a model is constructed that quantifies the places where the signal becomes strong (the places that draw attention) as a result of stimulation of the central and peripheral areas.
- a model-based algorithm can detect a region of interest without prior knowledge, but has the drawbacks that constructing the model is difficult and the detection accuracy of the region of interest is insufficient. Therefore, neither method can accurately extract the attention area without limiting the detection target.
- when a plurality of regions are detected from one image, neither the learning-based nor the model-based algorithm can determine which region is more important or of greater interest to people. When a plurality of areas are detected, it is desirable to rank them by degree of interest.
- the present invention has been made in view of the above circumstances, and an object of the present invention is to provide a technique capable of accurately extracting a region of interest from an image and calculating the degree of interest.
- an image similar to a partial region extracted from an input image is searched from an image database, and the interest level of the partial region is obtained using a search result.
- the attention area extracting apparatus includes extracting means that extracts one or more partial areas from an input image; search means that, for each partial area extracted by the extracting means, searches an image database storing a plurality of images for similar images; and interest level determination means that determines the interest level of each partial area based on the search results obtained by the search means.
- a partial area is an image area in the input image that is expected to be noticed by humans or that should be noticed, that is, a candidate attention area.
- the extraction of the partial area by the extracting means can be performed using any existing method.
- the extraction means extracts the partial region by, for example, a region of interest extraction method using a learning-based or model-based algorithm.
- the image database is a device that stores a plurality of image data in a searchable manner.
- the image database may be constructed integrally with the attention area extraction device, or may be constructed separately from the attention area extraction device.
- the image database can be constructed in a storage device included in the attention area extraction device.
- the image database can be constructed in another device that can be accessed by the attention area extracting device via the communication network.
- the creator / manager of the image database does not have to be the same as the creator / manager of the attention area extracting apparatus.
- an image database managed by a third party and published on the Internet can be used.
- the search means searches the image database for images similar to the partial area extracted by the extracting means and acquires the search results. Specifically, the search means creates a query for obtaining images similar to the partial area, transmits the query to the image database, and acquires the response to the query from the image database. Any existing technique can be used for the similar-image search in the image database. For example, similar images can be found using an algorithm that calculates similarity by comparing whole images, a whole image against a part of an image, or a part of an image against another part.
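As a rough illustration, this query-and-search exchange can be sketched as follows; the feature vectors, the cosine-similarity criterion, and the in-memory database are assumptions for the sketch, since the document does not fix a particular search algorithm.

```python
# Minimal sketch of the similar-image search step. The image database is
# modelled as a table of feature vectors; similarity is cosine similarity.
from math import sqrt

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search_similar(query_feature, database, threshold=0.5):
    """Return (image_id, similarity) pairs at or above threshold,
    sorted by decreasing similarity."""
    hits = [(img_id, cosine_similarity(query_feature, feat))
            for img_id, feat in database.items()]
    return sorted([(i, s) for i, s in hits if s >= threshold],
                  key=lambda h: h[1], reverse=True)

# Illustrative database and query (a partial region's feature vector).
db = {"img1": [1.0, 0.1, 0.0], "img2": [0.9, 0.2, 0.1], "img3": [0.0, 1.0, 0.9]}
results = search_similar([1.0, 0.0, 0.0], db)
```

In a real system the query would travel over a network to the image database, but the shape of the result (hits with similarity scores) is the same.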
- the interest level determination means determines the interest level for each partial region based on the search result by the search means.
- the degree of interest is an index representing the level of interest that a human is expected to have in the partial area, or the level of interest that the human should have in the partial area.
- a high degree of interest in a partial area means that a human should have a higher interest in the partial area or a higher interest in the partial area.
- the degree of interest may be determined for all people, for a group of people (people with specific attributes), or for specific individuals.
- the interest level determination means preferably determines the interest level of the partial area using statistical information of an image similar to the partial area searched by the search means (hereinafter also simply referred to as a similar image).
- the statistical information is information obtained by performing statistical processing on the information obtained as a result of the search.
- for example, the number of images similar to the partial area can be adopted as the statistical information, and the degree of interest can be determined to be higher as the number of similar images increases, because an object with more instances stored in the image database is considered more likely to be noticed.
- the number of similar images may also be considered to represent the probability (confidence) that the region extracted by the extracting means is truly an attention region. Since a partial region with few similar images is unlikely to be a genuine region of interest, it is also preferable that the interest level determination means not determine an interest level for a partial region whose number of similar images is less than a threshold.
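A small sketch of this filtering rule; the hit counts, the normalisation by a maximum hit count, and the threshold of 3 are illustrative assumptions, not values from the document.

```python
# Regions with fewer similar-image hits than MIN_HITS are treated as
# likely false detections and receive no interest score at all.
MIN_HITS = 3

def interest_from_hits(hit_count, max_hits=100):
    """Return an interest score in [0, 1], or None when the region is
    judged unlikely to be a true attention region."""
    if hit_count < MIN_HITS:
        return None  # too few matches: do not score this region
    return min(hit_count / max_hits, 1.0)

scores = {region: interest_from_hits(hits)
          for region, hits in {"region_a": 40, "region_b": 1}.items()}
```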
- tag information associated with similar images can be employed as statistical information.
- the tag information is information representing the content and characteristics of image data specified by a natural language and stored in association with image data in the image database.
- the tag information may be embedded and stored in the image data, or may be stored as a file different from the image data.
- Tag information may be added in any way. For example, tag information may be manually input by a human, or tag information may be automatically added by image processing by a computer.
- it is preferable that the interest level determination means determine the interest level of the partial region to be higher as the convergence of the meanings of the tag information associated with the similar images is higher.
- the convergence of the meanings of the tag information is preferably judged by natural language processing. For example, even if the wording of two pieces of tag information differs, it is preferable to judge their meanings as close when they express the same or a similar concept.
- as statistical information, the average, mode, median, intermediate value, variance, standard deviation, and the like of the similarity between the partial area and its similar images can be adopted.
- as statistical information, not only the similarity of the similar images but also their size (area or number of pixels), position within the image, color, and the like can be employed.
- as the size of a similar image, the size of the entire similar image or the size of the region similar to the partial region (in absolute terms, or relative to the entire image) can be adopted.
- the interest level determination means can determine the interest level based on the average value, mode value, median value, intermediate value, variance, standard deviation, and the like of these pieces of information.
- meta information includes attribute information about the image itself (size, color space, etc.) and the image's shooting conditions (shooting date, shutter speed, aperture, ISO sensitivity, metering mode, presence or absence of flash, focal length, shooting position, etc.).
- the interest level determination means may determine the interest level based on the meta information.
- the interest level determination means can determine the interest level of the partial area based on the size or position of the partial area.
- the size of the partial area may be an absolute size or a relative size with respect to the input image.
- the degree of interest determination means may determine the degree of interest higher as the size of the partial area is larger, or may determine the degree of interest as higher as the size of the partial area is smaller.
- the degree-of-interest determination means may determine the degree of interest higher as the partial area is closer to the center of the input image, or may determine the degree of interest higher as the partial area is closer to the periphery of the input image.
- the interest level determination means preferably determines the interest level in consideration of the size or position of the partial area and also the type of the object included in the partial area.
- the interest level determination means obtains a plurality of interest levels based on the plurality of information described above, and determines the final interest level by integrating the plurality of interest levels.
- the method of integrating a plurality of interest levels is not particularly limited. For example, the product or weighted average of all the interest levels can be used as the final interest level.
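The two integration options mentioned here, a product and a weighted average, can be sketched as follows; the individual scores and the weights are illustrative.

```python
# Two ways to integrate individual interest levels into a final score,
# as the text suggests: a product and a weighted average.
def integrate_product(scores):
    total = 1.0
    for s in scores:
        total *= s
    return total

def integrate_weighted_average(scores, weights):
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

r = [0.8, 0.5, 1.0]                      # illustrative individual interest levels
product = integrate_product(r)           # 0.8 * 0.5 * 1.0
weighted = integrate_weighted_average(r, [1, 1, 2])
```

A product penalises a region that scores poorly on any single criterion, while a weighted average lets some criteria dominate; which behaviour is wanted depends on the application.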
- it is also preferable that the attention area extracting apparatus further include calculation criterion acquisition means that receives an input of a calculation criterion for the interest level, and that the interest level determination means calculate the final interest level based on a first interest level calculated according to a predetermined calculation criterion and a second interest level calculated according to the calculation criterion acquired by the calculation criterion acquisition means.
- the predetermined calculation standard is a calculation standard for interest level for general humans, that is, a general-purpose calculation standard.
- the calculation criterion acquired by the calculation criterion acquisition means is preferably a criterion suited to the situation: for example, a criterion tailored to the user who views the image, or to the application that uses the extracted attention areas.
- the attention area extracting apparatus further includes integration means for integrating a plurality of adjacent partial areas as one partial area among the partial areas included in the input image.
- here, “adjacent partial areas” covers both the case where the partial areas touch each other and the case where the distance between them is within a predetermined distance (number of pixels). The predetermined distance may be determined according to the size of the partial areas, the type of object they contain, and the like.
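As a sketch of this integration step, nearby bounding boxes can be merged like this; the box representation (x1, y1, x2, y2), the gap measure, and the 10-pixel threshold are assumptions for illustration.

```python
# Merge axis-aligned boxes (x1, y1, x2, y2) whose gap is at most max_gap
# pixels; touching or overlapping boxes have a gap of 0.
def gap(a, b):
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return max(dx, dy)

def merge_close(boxes, max_gap=10):
    merged = []
    for box in boxes:
        for i, m in enumerate(merged):
            if gap(box, m) <= max_gap:
                # replace the kept box with the bounding box of the pair
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged

# Illustrative regions: the first two are 5 px apart, the third is distant.
regions = [(0, 0, 20, 20), (25, 0, 40, 20), (200, 200, 220, 220)]
merged = merge_close(regions)
```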
- the attention area extracting apparatus preferably further includes output means for outputting the position of the partial area included in the input image and the degree of interest for each partial area.
- the position of a partial area can be output, for example, by superimposing a frame line indicating its location on the input image, or by displaying the partial area in a color or brightness different from that of the other areas.
- the degree of interest may be output by displaying a numerical value of the degree of interest or by displaying a marker having a color or size corresponding to the degree of interest.
- the output means may also output only those partial areas whose interest level is equal to or higher than a threshold, displaying neither the interest level nor the area itself for partial areas below the threshold.
- the present invention can be understood as an attention area extracting device including at least a part of the above means.
- the present invention can also be understood as an attention area extraction method or interest level calculation method.
- it can also be understood as a computer program for causing a computer to execute each step of these methods, or as a computer-readable storage medium that stores the program non-transitorily.
- FIG. 1A and FIG. 1B are respectively a diagram illustrating a hardware configuration and a functional block of the attention area extraction device according to the first embodiment.
- FIG. 2 is a flowchart showing a flow of attention area extraction processing in the first embodiment.
- FIGS. 3A and 3B are diagrams illustrating an example of an input image and an example of attention areas extracted from it, respectively.
- FIG. 4 is a conceptual diagram illustrating interest level calculation of a region of interest.
- FIGS. 5A and 5B are diagrams illustrating an example of a similar image search result and an example of interest level calculation based on the search result, respectively.
- FIG. 6A and FIG. 6B are a flowchart showing a flow of interest level output processing and an example of interest level output, respectively.
- FIG. 7 is a flowchart showing a flow of attention area extraction processing in the second embodiment.
- FIG. 8 is a diagram illustrating functional blocks of the attention area extraction device according to the third embodiment.
- FIG. 9 is a flowchart showing a flow of attention area extraction processing in the third embodiment.
- FIG. 10 is a diagram illustrating functional blocks of the attention area extraction device according to the fourth embodiment.
- FIG. 11 is a flowchart showing a flow of attention area extraction processing in the fourth embodiment.
- FIGS. 12A and 12B are diagrams showing the attention areas before and after the integration processing in the fourth embodiment, respectively.
- the attention area extraction device according to this embodiment can accurately extract attention areas from an input image and calculate the interest level of each attention area by performing similar-image searches against an image database. Searching the image database makes it possible to use information that cannot be obtained from the input image alone, enabling highly accurate attention area extraction and interest level calculation.
- FIG. 1A is a diagram illustrating a hardware configuration of the attention area extracting device 10 according to the present embodiment.
- the attention area extraction device 10 includes an image input unit 11, a calculation device 12, a storage device 13, a communication device 14, an input device 15, and an output device 16.
- the image input unit 11 is an interface that receives image data from the camera 20. In this embodiment, the image data is directly received from the camera 20, but the image data may be received via the communication device 14, or the image data may be received via a recording medium.
- the arithmetic device 12 is a general-purpose processor such as a CPU (Central Processing Unit), and executes a program stored in the storage device 13 to realize functions to be described later.
- CPU Central Processing Unit
- the storage device 13 includes a main storage device and an auxiliary storage device, stores a program executed by the arithmetic device 12, and stores image data and temporary data during execution of the program.
- the communication device 14 is a device for the attention area extraction device 10 to communicate with an external computer. The form of communication may be wired or wireless, and the communication standard may be arbitrary. In the present embodiment, the attention area extraction device 10 accesses the image database 30 via the communication device 14.
- the input device 15 includes a keyboard and a mouse, and is a device for a user to input an instruction to the attention area extraction device.
- the output device 16 includes a display device, a speaker, and the like, and is a device for the attention area extraction device to output to the user.
- the image database 30 is a computer including an arithmetic device and a storage device, and stores a plurality of image data so as to be searchable.
- the image database 30 may be composed of one computer or a plurality of computers.
- Various attribute information is associated with the image data stored in the image database 30 in addition to the image data itself (color information for each pixel).
- a data file of image data can include various attribute information according to the Exif format.
- the image database 30 can store attribute information stored in a file different from the data file of the image data in association with the image data.
- attribute information includes, for example, the image size, color space, shooting conditions (shooting date, shutter speed, aperture, ISO sensitivity, metering mode, presence or absence of flash, focal length, shooting position, etc.), and information describing the image's content and features in natural language (tag information). These pieces of attribute information are meta information about the image data.
- the image database 30 is open to the public via a public network such as the Internet, and accepts registration and search of image data.
- who registers images in the image database 30 and how many images are registered are not particularly limited.
- an image of an object that should be noted by the user of the attention area extraction device 10 may be registered.
- in this case, the registered images are suited to the attention area extraction process, and therefore the number of registered images need not be very large.
- a third party general user or a search service provider may register the image.
- in that case, the registered images may not all be suited to the attention area extraction process, so it is preferable that many images be registered in the image database 30.
- the arithmetic device 12 implements a function as shown in FIG. 1B by executing a program. That is, the arithmetic device 12 provides the functions of the region extraction unit 110, the similar image search unit 120, the interest level calculation unit 130, and the output unit 140. The processing content of each part will be described below.
- FIG. 2 is a flowchart showing the flow of attention area extraction processing executed by the attention area extraction device 10.
- the attention area extraction device 10 acquires an image (input image).
- the input image may be acquired from the camera 20 via the image input unit 11, acquired from another computer via the communication device 14, or acquired from a storage medium via the storage device 13. Also good.
- FIG. 3A is a diagram illustrating an example of the input image 400.
- the region extraction unit 110 extracts a region of interest (partial region) from the input image.
- the attention area extraction algorithm used by the area extraction unit 110 is not particularly limited, and any existing algorithm including a learning base algorithm and a model base algorithm can be employed. Further, the algorithm to be employed need not be limited to one, and the attention area may be extracted according to a plurality of algorithms. Note that it is preferable to use a model-based extraction algorithm because a learning-based extraction algorithm can extract only learned objects.
- FIG. 3B is a diagram illustrating an example of a region of interest extracted from the input image 400.
- four attention areas 401 to 404 are extracted from the input image 400.
- area 401 is a vehicle, area 402 is a person, and area 403 is a road sign.
- the region 404 is not originally a region of interest, but is a region erroneously detected as a region of interest by the region extraction unit 110.
- the similar image search unit 120 performs similar-image search processing for each of the attention areas extracted in step S20, and the interest level of each attention area is calculated based on the search results (loop L1). More specifically, in step S30, the similar image search unit 120 issues a query to the image database 30 for images similar to each attention area and acquires the search results from the image database 30. When the image database 30 receives the search query, it searches the database for images similar to the query image (the image of the attention area) and transmits the search results. Any known similar-image search algorithm can be adopted in the image database 30.
- the image database 30 transmits the similar image obtained by the search and its attribute information to the attention area extracting apparatus 10 as a search result.
- the interest level calculation unit 130 of the attention area extraction device 10 calculates the interest degree of the attention area based on the search result obtained from the image database 30.
- the interest level calculation unit 130 calculates a plurality of individual interest levels (R1 to R4) based on the search results and integrates these scores to obtain the final interest level (total interest level) R.
- the individual interest level is an interest level evaluated from different viewpoints.
- these include the interest level based on the number of similar images matching the search (R1), the interest level based on the average similarity of the similar images (R2), the interest level based on the relative size of the similar region within the image (R3), and the interest level based on the convergence of the meanings of the tag information (R4).
- the total interest level R is determined based on the individual interest levels R1 to R4; for example, it may be obtained as their average (including a weighted average) or as their maximum or minimum value.
- the individual interest levels shown here are examples, and values determined from the search results according to criteria other than the above may be used.
- the degree of interest does not necessarily need to be calculated only from the search result, and may be calculated in consideration of the extraction region itself or the input image, for example.
- FIG. 5A is a diagram illustrating an example of a search result in step S30.
- for each image whose similarity to the search image is greater than or equal to a predetermined threshold, the search result shows the image number 501, the similarity 502, the overall size 503 of the similar image, the size 504 of the region within the similar image that is similar to the attention area, and the tag information 505 stored in association with the similar image. Information other than these may also be included in the search result.
- FIG. 5B is a diagram illustrating an example of interest level calculation performed by the interest level calculation unit 130.
- the degree of interest R1 based on the number of similar images that match the search is calculated higher as the number of search hits increases. As a result, the degree of interest is calculated higher for objects that are registered in the image database 30 more frequently.
- the number of search hits used for calculating the interest level R1 may be the total number of similar images sent from the image database 30, or the number of search results whose similarity 502 is greater than or equal to a predetermined threshold. It may be.
- the degree of interest R2 based on the average similarity of similar images is calculated higher as the average similarity of similarities 502 of similar images included in the search result is higher. Even if the number of search hits is large, if the similarity of the similar images is low, the object is not necessarily an object of high interest. Therefore, the accuracy of interest calculation can be improved by considering the average similarity.
- although the average similarity is used here for calculating the interest level R2, other statistics such as the mode, median, intermediate value, variance, or standard deviation may be used instead.
- the interest level R3, based on the relative size of the similar region within the similar image, is calculated to be higher as the average ratio of the size 504 of the similar region to the overall size 503 of the similar image increases. As a result, objects that appear larger within images receive a higher interest level. Note that R3 may also be obtained by another criterion based on the overall size 503 and the similar-region size 504.
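A minimal sketch of this R3 computation, with illustrative pixel counts:

```python
# R3: average ratio of the similar region's size (504) to the overall
# size (503) of the similar image, over all search hits.
def interest_relative_size(hits):
    """hits: list of (similar_region_pixels, whole_image_pixels)."""
    ratios = [region / whole for region, whole in hits]
    return sum(ratios) / len(ratios)

# Two illustrative search hits: the similar region covers 50% and 30%
# of the respective similar images.
r3 = interest_relative_size([(5000, 10000), (3000, 10000)])
```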
- the degree of interest R4 based on the convergence of the meaning of the tag information is calculated higher as the convergence of the meaning of the tag information included in the search result is higher. As a result, the degree of interest is calculated higher for objects to which many people have tag information having the same meaning.
- the convergence of meaning is preferably determined by natural language processing, and it is preferable that the convergence of meaning is determined to be high if the concept is the same or a similar concept even if the wording of the tag information is different.
- for example, the interest level calculation unit 130 can divide the meanings of the tag information included in the search result into several categories and obtain, as the interest level R4, the ratio of the number of elements in the largest category to the total.
- in the example of tag information shown in FIG. 5A, “automobile” and “car” are the same concept and can be classified into the same category. Since “sports car” is a subordinate concept of “automobile” and “car”, it can be classified into the same category as well. On the other hand, “park” is a different concept from “automobile” and is therefore classified into a different category. Since “motor show” is a concept related to “automobile”, it may be classified into the same category or into a different one. If “motor show” is classified into the same category as “automobile” and the search results are the five items shown in FIG. 5A, four of the five tags fall into the largest category.
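Following the tag example above, R4 can be sketched as the share of the largest meaning category; the synonym table stands in for real natural language processing and is an assumption of the sketch.

```python
# R4: group tags into meaning categories, then take the ratio of the
# largest category to the total number of tags.
SYNONYMS = {"automobile": "car", "car": "car", "sports car": "car",
            "motor show": "car", "park": "park"}  # illustrative grouping

def interest_tag_convergence(tags):
    counts = {}
    for tag in tags:
        category = SYNONYMS.get(tag, tag)
        counts[category] = counts.get(category, 0) + 1
    return max(counts.values()) / len(tags)

r4 = interest_tag_convergence(
    ["automobile", "car", "sports car", "park", "motor show"])
```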
- An example in which the tag information is a single word has been shown, but the tag information may also be expressed as a sentence; in that case, its meaning can be estimated by natural language processing.
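The category-based computation of R4 described above can be sketched as follows; the category map is a hypothetical stand-in for the natural language processing step, and, following the text's example, "motor show" is merged into the "car" category (it could equally be kept separate):

```python
from collections import Counter

# Hypothetical category map: tags expressing the same or a similar
# concept are mapped to one category label.
CATEGORY = {
    "automobile": "car",
    "car": "car",
    "sports car": "car",   # subordinate concept of "car"
    "motor show": "car",   # related concept, merged here by assumption
    "park": "park",
}

def interest_r4(tags):
    """Degree of interest R4: ratio of the number of tags in the
    largest semantic category to the total number of tags."""
    if not tags:
        return 0.0
    counts = Counter(CATEGORY.get(tag, tag) for tag in tags)
    return max(counts.values()) / len(tags)

# Five tags, four of which converge on the "car" concept:
print(interest_r4(["automobile", "car", "sports car", "motor show", "park"]))  # 0.8
```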
- the interest level calculation unit 130 calculates the total interest level R as described above based on the individual interest levels R1 to R4.
- The individual interest levels R1 to R4 are calculated as large values for regions that a typical human observer would be expected to notice. That is, R1 to R4 are general-purpose interest levels applicable to people in general, and the total interest level R calculated from them is likewise general-purpose.
- In step S50, the output unit 140 outputs the position of each attention area in the input image and the interest level of each attention area.
- The output unit 140 does not output all of the attention areas extracted in step S20; it outputs only those attention areas whose interest level is greater than or equal to a predetermined threshold Th R.
- FIG. 6A is a flowchart for explaining the output process in step S50 in more detail.
- The output unit 140 repeats the following process (loop L2) for every attention area extracted in step S20. First, the output unit 140 determines whether the interest level calculated for the attention area is greater than or equal to the threshold Th R (S51). If it is (S51-YES), the position of the region of interest and its interest level are output (S52); if it is below the threshold Th R (S51-NO), they are not output.
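The thresholded output of loop L2 (steps S51 and S52) amounts to a simple filter. A minimal sketch, with hypothetical (position, interest level) pairs where a position is an (x1, y1, x2, y2) rectangle:

```python
def output_attention_areas(areas, th_r):
    """Step S50 sketch: keep only the attention areas whose interest
    level is at or above the threshold Th_R (steps S51/S52)."""
    return [(pos, r) for pos, r in areas if r >= th_r]

areas = [((10, 10, 50, 50), 0.9), ((80, 20, 120, 60), 0.3)]
print(output_attention_areas(areas, th_r=0.5))  # only the 0.9 area remains
```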
- FIG. 6B is a diagram illustrating an example of the output of the position of the region of interest and the degree of interest in the present embodiment.
- Among the attention regions 401 to 404 shown in FIG. , the attention regions 401 to 403 have interest levels equal to or higher than the threshold Th R. Therefore, the positions of the attention areas 401 to 403 are displayed with frames surrounding those areas. In addition, beside the attention areas 401 to 403, their interest levels are displayed as numerical values in the interest level display sections 601 to 603. The attention area 404 is not displayed because its interest level is below the threshold Th R. Note that this is merely one example of a display format.
- For example, the position of the attention area can also be indicated by displaying the attention area and the rest of the image with different luminance or colors.
- the degree of interest does not need to be displayed numerically.
- For example, the degree of interest can be indicated by the color or shape of a symbol, or by varying the thickness or color of the frame that indicates the region of interest.
- <Effects of this embodiment> By extracting regions of interest from an input image using information related to the images contained in the image database, extraction can be performed with higher accuracy than when regions of interest are extracted from the input image alone.
- In addition, various objects can be extracted as attention areas, without the extraction being limited to predetermined types of objects.
- the extraction accuracy can be improved by using the search result of the image database.
- FIG. 7 is a flowchart showing a flow of attention area extraction processing in the present embodiment.
- The difference is that a step of comparing the number of retrieved similar images with a threshold Th N is added after the similar image retrieval step S30. If the number of retrieved similar images is greater than or equal to the threshold Th N (S35-YES), the interest level calculation unit 130 calculates the interest level for the attention area as in the first embodiment (S40). If the number is below the threshold Th N (S35-NO), no interest level is calculated for this attention area.
- In this way, no interest level is calculated for regions for which the search hits few similar images. A region with few similar images can be regarded as one that requires little attention, and the above determination can also be viewed as a check of whether the extraction accuracy of the attention area extraction process in step S20 is at or above a threshold.
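The second embodiment's gate on the similar-image hit count (step S35) can be sketched as follows; the `search` and `calc_interest` callables are hypothetical placeholders for the similar image search unit and interest level calculation unit:

```python
def score_regions(regions, search, calc_interest, th_n):
    """Second-embodiment sketch: for each candidate region, run the
    similar image search (S30) and compute an interest level (S40)
    only when the number of hits reaches the threshold Th_N (S35)."""
    results = {}
    for region in regions:
        hits = search(region)                       # step S30
        if len(hits) >= th_n:                       # step S35
            results[region] = calc_interest(hits)   # step S40
    return results

# Toy stand-ins: "car" hits many database images, "smudge" hits few,
# so no interest level is computed for "smudge".
demo = score_regions(
    ["car", "smudge"],
    search=lambda r: ["img"] * (100 if r == "car" else 2),
    calc_interest=lambda hits: min(1.0, len(hits) / 100),
    th_n=10,
)
print(demo)  # {'car': 1.0}
```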
- the extraction accuracy does not necessarily have to be evaluated by the number of similar image search hits, but may be evaluated by other criteria.
- the extraction accuracy and interest level of the region extracted by the conventional attention region extraction process (S20) are calculated based on different criteria using the similar image search results.
- the degree of interest is calculated as a general-purpose measure for all human beings.
- In the third embodiment, the attention area extracting apparatus 310 receives an interest level calculation criterion determined from prior knowledge, and thereby also obtains an interest level specialized for the user.
- FIG. 8 is a diagram showing functional blocks realized by the arithmetic device 12 of the attention area extracting device 310 according to the present embodiment executing a program.
- The functional blocks of the attention area extraction device 310 are basically the same as those in the first embodiment (FIG. 1B), but the interest level calculation unit 130 includes a general interest level calculation unit 131, an interest level calculation criterion acquisition unit 132, a specific interest level calculation unit 133, and an interest level integration unit 134.
- FIG. 9 is a flowchart showing the flow of attention area extraction processing executed by the attention area extraction device 310 according to this embodiment.
- the same processes as those in the first embodiment (FIG. 2) are denoted by the same reference numerals, and detailed description thereof is omitted.
- the interest level calculation reference acquisition unit 132 acquires a reference for calculating the interest level (specific interest level) for a specific user or application.
- The calculation criterion for the specific interest level varies with the user or application that uses the processing results of the attention area extraction device 310. For example, if there is prior knowledge that a certain user is particularly interested in a specific object, the interest level of that object should be calculated to be high for this user. Likewise, if the application prompts the user to pay attention to easily overlooked objects, the interest level should be calculated to be high for objects that are hard to see because they are small or similar in color to their surroundings in the input image.
- The interest level calculation criterion acquisition unit 132 may receive the calculation criterion itself from the outside, or it may acquire information specifying the user or application and then obtain the corresponding interest level calculation criterion by itself. In the latter case, the unit 132 either stores an interest level calculation criterion for each user or application, or queries an external device for it. In FIG. 9, the interest level calculation criterion is acquired after step S20, but it may instead be acquired before the input image acquisition step S10 or the attention area extraction step S20.
- As in the first embodiment, the interest level calculation unit 130 calculates an interest level for each of the attention areas extracted from the input image in loop L1. In the present embodiment, however, the specific calculation method differs from that of the first embodiment and is described below.
- In step S30, the similar image search unit 120 searches the image database 30 for images similar to the attention area and acquires the search results.
- In step S41, the general interest level calculation unit 131 calculates a general interest level using the search results and a predetermined calculation criterion. This process is the same as the interest level calculation process (S40) in the first embodiment.
- In step S42, the specific interest level calculation unit 133 uses the search results obtained by the similar image search unit 120 and the calculation criterion acquired by the interest level calculation criterion acquisition unit 132 to calculate an interest level for the specific user or application (a specific interest level).
- This process is the same as the process by the general interest level calculation unit 131 except that the calculation criteria are different.
- the specific interest level calculation unit 133 may calculate a plurality of individual interest levels according to different criteria, and may calculate the specific interest level by integrating the plurality of individual interest levels.
- The interest level integration unit 134 integrates the general interest level calculated by the general interest level calculation unit 131 and the specific interest level calculated by the specific interest level calculation unit 133, and calculates the final interest level.
- the method of integration may be arbitrary. For example, an average (simple average or weighted average) of general interest and specific interest may be used as the final interest. The weight in the weighted average may be fixed or may be changed according to the user or application.
- Alternatively, the interest level integration unit 134 may determine the final interest level as a weighted average of the individual interest levels obtained in calculating the general interest level and the specific interest level.
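A weighted average, one of the integration options described above, could look like the following sketch; the weight value is an assumption and, as the text notes, may be fixed or varied per user or application:

```python
def integrate_interest(general, specific, w_general=0.5):
    """Interest level integration unit 134 sketch: weighted average
    of the general and specific interest levels."""
    return w_general * general + (1.0 - w_general) * specific

# With equal weights the result is the simple average; raising
# w_general biases the final level toward the general interest level.
print(round(integrate_interest(0.6, 0.9), 3))                  # 0.75
print(round(integrate_interest(0.6, 0.9, w_general=0.8), 3))   # 0.66
```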
- the output process (S50) after the interest level for each region of interest is calculated is the same as in the first embodiment.
- In the present embodiment, by using a user's interest tendencies, the degree of interest can be calculated to be higher for objects that interest that user. Further, when a user has difficulty perceiving specific colors, the degree of interest of objects having those colors can be calculated to be high. If the application is intended to detect objects that are hard to see, the degree of interest can be calculated to be higher the smaller the region of interest is in the input image. In addition, when applied to moving images, the degree of interest of an object that suddenly appears (an object that did not exist in the previous frame) can be calculated to be high, or, conversely, to be low.
- In the present embodiment, the general interest level and an interest level specialized for a specific user or application are calculated and integrated to obtain the final interest level, so an interest level suited to the user or application can be calculated.
- Note that, within the interest level calculation unit 130, the general interest level calculation unit 131 and the interest level integration unit 134 can be omitted.
- In the fourth embodiment, the region of interest output processing differs from that of the first to third embodiments.
- the attention areas adjacent to each other in the input image are integrated and output as one attention area.
- FIG. 10 is a diagram showing functional blocks realized by the arithmetic device 12 of the attention area extracting device 410 according to the present embodiment executing a program.
- the attention area extraction device 410 includes an area integration unit 150 in addition to the functions of the first embodiment.
- FIG. 11 is a flowchart showing the flow of attention area extraction processing executed by the attention area extraction device 410 according to this embodiment.
- The region integration unit 150 integrates multiple attention regions based on their positional relationship. For example, if the distance between attention areas is equal to or smaller than a predetermined threshold Th D, the region integration unit 150 integrates those attention areas.
- The distance between attention areas may be defined as the distance between their centers (in number of pixels), or as the distance between their closest boundary portions.
- the threshold value Th D may be a fixed value, or may vary according to the size of the attention area and the object type in the attention area.
- FIG. 12A is a diagram showing the attention areas 1201 to 1203 extracted from the input image 1200 in step S20. While the attention area 1201 is far from the other attention areas, the attention areas 1202 and 1203 are close to each other. Therefore, the region integration unit 150 integrates the attention areas 1202 and 1203.
- FIG. 12B shows an image 1200 after the integration process. As shown in the figure, the attention area 1202 and the attention area 1203 are integrated into one attention area 1204.
- Here, the integrated attention area 1204 is the minimum rectangle containing the attention areas 1202 and 1203, but the integrated attention area may also be generated by a different method.
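A minimal sketch of the distance-based integration and minimum-bounding-rectangle merge described above; regions are hypothetical (x1, y1, x2, y2) rectangles, the center distance is one of the two distance definitions the text allows, and a single pairwise pass stands in for the region integration unit 150:

```python
def center_distance(a, b):
    """Euclidean distance between the centers of two rectangles."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

def merge(a, b):
    """Minimum rectangle enclosing both attention regions,
    as in the integrated region 1204 of FIG. 12B."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

def integrate(regions, th_d):
    """Merge the first pair of regions whose center distance is at or
    below Th_D; the real unit may iterate and use other criteria."""
    out = list(regions)
    for i in range(len(out)):
        for j in range(i + 1, len(out)):
            if center_distance(out[i], out[j]) <= th_d:
                merged = merge(out[i], out[j])
                return [r for k, r in enumerate(out) if k not in (i, j)] + [merged]
    return out

# The two nearby regions are merged; the distant one stays separate.
print(integrate([(0, 0, 10, 10), (100, 100, 120, 120), (115, 110, 135, 130)], th_d=30))
```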
- Note that attention regions with a low interest level need not be integrated; the regions may be integrated only when their interest levels satisfy a predetermined relationship (for example, when the average interest level is equal to or greater than a threshold). That is, the region integration unit 150 may decide whether to integrate based on the interest levels of the attention regions in addition to the distance between them. Further, the region integration unit 150 may integrate three or more attention regions into one region.
- When integrating multiple attention areas, the region integration unit 150 also determines the interest level of the attention area after integration.
- The interest level of the integrated attention area is preferably, for example, the average or maximum of the interest levels of the attention areas that were integrated, but it may be determined by other methods.
- The interest level output process for the attention areas in step S50 is the same as in the first embodiment, except that it is performed on the attention areas after integration.
- In the present embodiment, the number of attention areas to be output can be reduced by integrating multiple attention areas that are close to one another. Further, in deciding whether to integrate regions, using the interest level obtained from the image database search results allows the regions to be integrated more appropriately.
- the image database is configured as a device different from the attention area extraction device.
- the image database may be configured integrally with the attention region extraction device.
- the image data included in the image database may be registered by the manufacturer of the attention area extracting device, or may be registered by the user.
- the attention area extraction apparatus may use a plurality of image databases including an image database inside the apparatus and an image database outside the apparatus.
- The methods of calculating the degree of interest described above are examples; in the present invention, the calculation method is not particularly limited as long as the degree of interest is calculated using the results of searching for images similar to the region of interest.
- the degree of interest is preferably calculated using statistical information of the search results.
- The statistical information of the search results includes the number of search hits, statistics of the similarity, the sizes of the similar images, the position of the region similar to the query image within each similar image, the convergence of the meanings indicated by the tag information, and so on.
- The degree of interest can also be calculated based on statistics of the meta information.
- A statistic is a quantity obtained by applying statistical processing to multiple data values; typical examples include the average value, mode, median value, intermediate value, variance, and standard deviation.
- the interest level of the attention area can be calculated using information other than the result of the similar image search. For example, it can be calculated based on the size and color of the attention area itself, the position of the attention area in the input image, and the like.
- the input image is assumed to be a still image, but the input image may be a moving image (a plurality of still images).
- the region extraction unit 110 may extract the attention region using an existing algorithm that extracts the attention region from the moving image.
- The interest level calculation unit 130 can also calculate the interest level in consideration of temporal changes in the position of the attention area. For example, the moving speed and moving direction of the attention area can be considered. The degree of interest may be calculated to be higher, or lower, as the moving speed of the region of interest increases.
- the interest level may be calculated based on the moving direction itself, or the interest level may be calculated based on variations in the moving direction.
- the attention area extraction apparatus can be implemented as an arbitrary information processing apparatus (computer) such as a desktop computer, a notebook computer, a slate computer, a smartphone, a mobile phone, a digital camera, or a digital video camera.
- Region-of-interest extraction device
- 20 Camera
- 30 Image database
- 110 Region extraction unit
- 120 Similar image search unit
- 130 Interest level calculation unit
- 140 Output unit
- 150 Region integration unit
- 400 Input image
- 401, 402, 403, 404 Attention areas
- 601, 602, 603 Interest level display sections
- 1200 Input image
- 1201, 1202, 1203 Attention areas (before integration processing)
- 1204 Attention area (after integration processing)
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Library & Information Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Geometry (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE112016001039.7T DE112016001039T5 (de) | 2015-03-05 | 2016-01-07 | Vorrichtung und Verfahren zur Extraktion eines interessierenden Bereichs |
US15/683,997 US20170352162A1 (en) | 2015-03-05 | 2017-08-23 | Region-of-interest extraction device and region-of-interest extraction method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510098283.2A CN105989174B (zh) | 2015-03-05 | 2015-03-05 | 关注区域提取装置以及关注区域提取方法 |
CN201510098283.2 | 2015-03-05 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/683,997 Continuation US20170352162A1 (en) | 2015-03-05 | 2017-08-23 | Region-of-interest extraction device and region-of-interest extraction method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016139964A1 true WO2016139964A1 (ja) | 2016-09-09 |
Family
ID=56849320
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2016/050344 WO2016139964A1 (ja) | 2015-03-05 | 2016-01-07 | 注目領域抽出装置および注目領域抽出方法 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20170352162A1 (de) |
CN (1) | CN105989174B (de) |
DE (1) | DE112016001039T5 (de) |
WO (1) | WO2016139964A1 (de) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- WO2017203705A1 (ja) * | 2016-05-27 | 2017-11-30 | 楽天株式会社 | Image processing device, image processing method, and image processing program |
- JP6948128B2 (ja) | 2017-01-13 | 2021-10-13 | キヤノン株式会社 | Video surveillance device, control method therefor, and system |
US10810773B2 (en) * | 2017-06-14 | 2020-10-20 | Dell Products, L.P. | Headset display control based upon a user's pupil state |
- JP6907774B2 (ja) * | 2017-07-14 | 2021-07-21 | オムロン株式会社 | Object detection device, object detection method, and program |
- CN111666952B (zh) * | 2020-05-22 | 2023-10-24 | 北京腾信软创科技股份有限公司 | Salient region extraction method and system based on tag context |
- CN113656395B (zh) * | 2021-10-15 | 2022-03-15 | 深圳市信润富联数字科技有限公司 | Data quality governance method, apparatus, device, and storage medium |
US20230368108A1 (en) * | 2022-05-11 | 2023-11-16 | At&T Intellectual Property I, L.P. | Method and system for assessment of environmental and/or social risks |
- CN114840700B (zh) * | 2022-05-30 | 2023-01-13 | 来也科技(北京)有限公司 | Image retrieval method, apparatus, and electronic device for realizing IA by combining RPA and AI |
US11941043B2 (en) * | 2022-07-25 | 2024-03-26 | Dell Products L.P. | System and method for managing use of images using landmarks or areas of interest |
US12032612B2 (en) | 2022-07-25 | 2024-07-09 | Dell Products L.P. | System and method for managing storage and use of biosystem on a chip data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- JP2010122931A (ja) * | 2008-11-20 | 2010-06-03 | Nippon Telegr & Teleph Corp <Ntt> | Similar region search method, similar region search device, and similar region search program |
- WO2013031096A1 (ja) * | 2011-08-29 | 2013-03-07 | パナソニック株式会社 | Image processing device, image processing method, program, and integrated circuit |
- JP2014063377A (ja) * | 2012-09-21 | 2014-04-10 | Nikon Systems Inc | Image processing device and program |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5893095A (en) * | 1996-03-29 | 1999-04-06 | Virage, Inc. | Similarity engine for content-based retrieval of images |
US6175829B1 (en) * | 1998-04-22 | 2001-01-16 | Nec Usa, Inc. | Method and apparatus for facilitating query reformulation |
- EP1293925A1 (de) * | 2001-09-18 | 2003-03-19 | Agfa-Gevaert | Verfahren zur Bestimmung von Röntgenbildern |
US8467631B2 (en) * | 2009-06-30 | 2013-06-18 | Red Hat Israel, Ltd. | Method and apparatus for identification of image uniqueness |
- WO2011140786A1 (zh) * | 2010-10-29 | 2011-11-17 | 华为技术有限公司 | Method and system for extracting and associating objects of interest in video |
EP2810218A4 (de) * | 2012-02-03 | 2016-10-26 | See Out Pty Ltd | Benachrichtigungs- und privatsphärenverwaltung von onlinefotos und -videos |
- CN104217225B (zh) * | 2014-09-02 | 2018-04-24 | 中国科学院自动化研究所 | Visual object detection and annotation method |
- 2015-03-05 CN CN201510098283.2A patent/CN105989174B/zh active Active
- 2016-01-07 WO PCT/JP2016/050344 patent/WO2016139964A1/ja active Application Filing
- 2016-01-07 DE DE112016001039.7T patent/DE112016001039T5/de active Pending
- 2017-08-23 US US15/683,997 patent/US20170352162A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
CN105989174A (zh) | 2016-10-05 |
CN105989174B (zh) | 2019-11-01 |
US20170352162A1 (en) | 2017-12-07 |
DE112016001039T5 (de) | 2018-01-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
- WO2016139964A1 (ja) | Region-of-interest extraction device and region-of-interest extraction method | |
US11657084B2 (en) | Correlating image annotations with foreground features | |
- CN107251045B (zh) | Object recognition device, object recognition method, and computer-readable storage medium | |
- WO2019218824A1 (zh) | Movement trajectory acquisition method, device, storage medium, and terminal | |
- CN107784282B (zh) | Object attribute recognition method, apparatus, and system | |
US10872424B2 (en) | Object tracking using object attributes | |
US9323785B2 (en) | Method and system for mobile visual search using metadata and segmentation | |
- JP5763965B2 (ja) | Information processing device, information processing method, and program | |
- CN112348117B (zh) | Scene recognition method and apparatus, computer device, and storage medium | |
US20240330372A1 (en) | Visual Recognition Using User Tap Locations | |
- WO2017045443A1 (zh) | Image retrieval method and system | |
- JP5963609B2 (ja) | Image processing device and image processing method | |
US20160026854A1 (en) | Method and apparatus of identifying user using face recognition | |
US20130243249A1 (en) | Electronic device and method for recognizing image and searching for concerning information | |
US20190114780A1 (en) | Systems and methods for detection of significant and attractive components in digital images | |
- WO2018121287A1 (zh) | Target re-identification method and device | |
US20160148070A1 (en) | Image processing apparatus, image processing method, and recording medium | |
- WO2020052513A1 (zh) | Image recognition and pedestrian re-identification method and apparatus, and electronic and storage devices | |
- CN107644105A (zh) | Question search method and device | |
US11921774B2 (en) | Method for selecting image of interest to construct retrieval database and image control system performing the same | |
- CN107203638B (zh) | Surveillance video processing method, apparatus, and system | |
- JP7351344B2 (ja) | Learning device, learning method, inference device, inference method, and program | |
Mu et al. | Finding autofocus region in low contrast surveillance images using CNN-based saliency algorithm | |
US20230131717A1 (en) | Search processing device, search processing method, and computer program product | |
- JP2015165433A (ja) | Information processing device, information processing method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16758660; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 112016001039; Country of ref document: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 16758660; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: JP |