WO2016139964A1 - Attention area extraction device and attention area extraction method - Google Patents

Attention area extraction device and attention area extraction method

Info

Publication number
WO2016139964A1
Authority
WO
WIPO (PCT)
Prior art keywords
interest
image
region
partial
degree
Prior art date
Application number
PCT/JP2016/050344
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
翔 阮
安田 成留
艶萍 呂
湖川 盧
Original Assignee
OMRON Corporation (オムロン株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OMRON Corporation
Priority to DE112016001039.7T (published as DE112016001039T5)
Publication of WO2016139964A1
Priority to US15/683,997 (published as US20170352162A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20076 Probabilistic image processing

Definitions

  • the present invention relates to a technique for extracting a region of interest from an image.
  • An attention area is an image area expected to be noticed by a human, or an image area that should be noticed.
  • Attention area detection is also called salient area detection (Saliency Detection), objectness detection (Objectness Detection), foreground detection (Foreground Detection), attention detection (Attention Detection), and the like.
  • In a learning-based algorithm, the pattern of regions to be detected is learned from a large number of training images, and attention areas are detected based on the learning result.
  • For example, a method is described in which a feature type is learned and determined in advance from a plurality of training images, and features of each part of the target image (the image for which saliency is to be calculated) are extracted based on the determined feature type.
  • Non-Patent Document 1 models information transmitted to the brain when a region called a receptive field in a retinal ganglion cell in the retina of an eye is stimulated by light.
  • the receptive field is composed of a central area and a peripheral area.
  • Based on this, a model is constructed that quantifies where the signal becomes strong (the places that draw attention) due to stimulation of the central and peripheral areas.
  • A model-based algorithm can detect attention areas without prior knowledge, but it has the drawbacks that the model is difficult to construct and the detection accuracy is insufficient. Therefore, with either method, the attention area cannot be accurately extracted without limiting the detection target.
  • With either the learning-based or the model-based algorithm, when a plurality of regions are detected from one image, it is not possible to determine which region is more important or of greater interest to people. When a plurality of areas are detected, it is desirable to rank them by degree of interest.
  • the present invention has been made in view of the above circumstances, and an object of the present invention is to provide a technique capable of accurately extracting a region of interest from an image and calculating the degree of interest.
  • an image similar to a partial region extracted from an input image is searched from an image database, and the interest level of the partial region is obtained using a search result.
  • The attention area extracting apparatus includes extracting means for extracting one or more partial areas from an input image; search means for searching, for each partial area extracted by the extracting means, for similar images in an image database storing a plurality of images; and interest level determination means for determining the interest level of each partial area based on the search results obtained by the search means.
  • The partial area is an image area expected to be noticed by humans in the input image, or a candidate for an image area that should be noticed, that is, a candidate attention area.
  • the extraction of the partial area by the extracting means can be performed using any existing method.
  • the extraction means extracts the partial region by, for example, a region of interest extraction method using a learning-based or model-based algorithm.
  • the image database is a device that stores a plurality of image data in a searchable manner.
  • the image database may be constructed integrally with the attention area extraction device, or may be constructed separately from the attention area extraction device.
  • the image database can be constructed in a storage device included in the attention area extraction device.
  • the image database can be constructed in another device that can be accessed by the attention area extracting device via the communication network.
  • the creator / manager of the image database does not have to be the same as the creator / manager of the attention area extracting apparatus.
  • an image database managed by a third party and published on the Internet can be used.
  • The search means searches the image database for images similar to the partial area extracted by the extracting means and acquires the search results. Specifically, the search means creates a query for obtaining images similar to the partial area, transmits the query to the image database, and acquires the response to the query from the image database. The similar-image search in the image database can use any existing technique. For example, similarity can be calculated by comparing whole images with whole images, a whole image with a part, or parts with parts.
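As a concrete illustration of the search means, the sketch below implements one of the simplest whole-image comparisons, a color-histogram intersection. The patent does not prescribe any particular similarity algorithm, so the histogram approach, the function names, and the threshold value are assumptions for illustration only.

```python
# Hypothetical similar-image search: compare a query region against a
# database of images using color-histogram intersection. This is a
# stand-in for whatever algorithm the image database actually uses.

def color_histogram(pixels, bins=4):
    """Quantize (r, g, b) pixels into a normalized histogram."""
    hist = [0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist]

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def search_similar(query_pixels, database, threshold=0.5):
    """Return (image_id, similarity) pairs at or above the threshold,
    best match first, mimicking a query/response round trip."""
    q = color_histogram(query_pixels)
    hits = []
    for image_id, pixels in database.items():
        s = histogram_similarity(q, color_histogram(pixels))
        if s >= threshold:
            hits.append((image_id, s))
    return sorted(hits, key=lambda t: t[1], reverse=True)
```

For example, querying a two-image database with a mostly red region returns only the red image, since the blue image's histogram intersection falls below the threshold.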
  • the interest level determination means determines the interest level for each partial region based on the search result by the search means.
  • the degree of interest is an index representing the level of interest that a human is expected to have in the partial area, or the level of interest that the human should have in the partial area.
  • A high degree of interest in a partial area means that a human is expected to have, or should have, a higher interest in that partial area.
  • The degree of interest may be defined with respect to all people, with respect to a group of people (people having specific attributes), or with respect to a specific individual.
  • the interest level determination means preferably determines the interest level of the partial area using statistical information of an image similar to the partial area searched by the search means (hereinafter also simply referred to as a similar image).
  • the statistical information is information obtained by performing statistical processing on the information obtained as a result of the search.
  • For example, the number of images similar to the partial area can be adopted as the statistical information, and the degree of interest can be set higher as the number of similar images increases. This is because an object with more matches stored in the image database is considered more likely to be noticed.
  • The number of similar images can also be regarded as representing the probability (confidence) that the region extracted by the extracting means really is an attention area. Since a partial region with few similar images is unlikely to be a true attention area, it is also preferable that the interest level determination means skip the interest level determination for partial regions whose number of similar images is below a threshold value.
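The two ideas above (raise interest with the hit count, and skip regions whose hit count is below a threshold) can be sketched as follows. The cutoff of 3 hits and the saturating linear mapping are hypothetical, since the patent leaves the exact formula open.

```python
def interest_from_hit_count(num_hits, min_hits=3, saturation=50):
    """Hypothetical mapping from similar-image count to an interest
    level R1 in [0, 1]. Regions with fewer than `min_hits` similar
    images are treated as likely false detections and return None,
    mirroring the suggestion to skip low-hit regions."""
    if num_hits < min_hits:
        return None  # not judged to be an attention region
    # More hits in the image database -> higher interest, capped at 1.0.
    return min(num_hits / saturation, 1.0)
```

With these illustrative parameters, 25 hits map to an interest level of 0.5, and anything at or above 50 hits saturates at 1.0.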
  • tag information associated with similar images can be employed as statistical information.
  • The tag information is information, expressed in a natural language, that describes the content and characteristics of image data and is stored in association with the image data in the image database.
  • the tag information may be embedded and stored in the image data, or may be stored as a file different from the image data.
  • Tag information may be added in any way. For example, tag information may be manually input by a human, or tag information may be automatically added by image processing by a computer.
  • It is preferable that the interest level determination means set the degree of interest of the partial region higher as the convergence of meaning of the tag information associated with the similar images is higher.
  • The convergence of meaning of the tag information is preferably evaluated by natural language processing. For example, even if the wording of two tags differs, they should be judged close in meaning if they express the same or a similar concept.
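A minimal sketch of judging tag-meaning convergence, assuming a toy synonym table in place of real natural language processing; the table entries and groupings are illustrative and not part of the patent.

```python
# Toy concept table standing in for natural-language processing; tags
# that map to the same concept are treated as having the same meaning.
CONCEPT_OF = {
    "automobile": "car", "car": "car", "sports car": "car",
    "motor show": "car",  # related concept, grouped with "car" here
    "park": "park",
}

def tag_convergence(tags):
    """Ratio of tags falling into the largest concept category: a
    simple measure of how consistently similar images are tagged."""
    counts = {}
    for tag in tags:
        concept = CONCEPT_OF.get(tag, tag)
        counts[concept] = counts.get(concept, 0) + 1
    return max(counts.values()) / len(tags)
```

Five tags of which four map to the same concept yield a convergence of 4/5 = 0.8; a single tag trivially yields 1.0.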
  • As the statistical information, an average value, mode, median, intermediate value, variance, standard deviation, or the like of the similarity between the partial area and its similar images can also be adopted.
  • As statistical information, not only the similarity of the similar images but also their size (area or number of pixels), position in the image, color, and the like can be employed.
  • As the size of a similar image, the size of the entire similar image, or the size of the region within it that is similar to the partial region (as an absolute size, or a size relative to the entire image), can be adopted.
  • the interest level determination means can determine the interest level based on the average value, mode value, median value, intermediate value, variance, standard deviation, and the like of these pieces of information.
  • Meta information includes attribute information about the image itself (size, color space, etc.) and the shooting conditions (shooting date, shutter speed, aperture, ISO sensitivity, metering mode, presence/absence of flash, focal length, shooting position, etc.).
  • the interest level determination means may determine the interest level based on the meta information.
  • the interest level determination means can determine the interest level of the partial area based on the size or position of the partial area.
  • the size of the partial area may be an absolute size or a relative size with respect to the input image.
  • The interest level determination means may determine the degree of interest higher as the size of the partial area is larger, or conversely higher as the size is smaller.
  • the degree-of-interest determination means may determine the degree of interest higher as the partial area is closer to the center of the input image, or may determine the degree of interest higher as the partial area is closer to the periphery of the input image.
  • the interest level determination means preferably determines the interest level in consideration of the size or position of the partial area and also the type of the object included in the partial area.
  • The interest level determination means may obtain a plurality of interest levels based on the plural kinds of information described above and determine the final interest level by integrating them.
  • The method of integrating the plurality of interest levels is not particularly limited; for example, the product or a weighted average of all the individual interest levels can be used as the final interest level.
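The weighted-average option mentioned above can be sketched as follows; the equal default weights are an assumption, and the patent equally permits a product or other combination.

```python
def total_interest(scores, weights=None):
    """Combine individual interest levels (e.g. R1..R4, each in [0, 1])
    into one final score by weighted average. Equal weights by default;
    the weighting itself is illustrative, not prescribed."""
    if weights is None:
        weights = [1.0] * len(scores)
    assert len(weights) == len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

For example, equally weighting scores of 0.8, 0.6, 0.4, and 0.2 gives 0.5, while weighting the first of two scores three times as heavily shifts the result toward it.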
  • It is also preferable that the attention area extracting apparatus further include calculation criterion acquisition means for receiving an input of an interest level calculation criterion, and that the interest level determination means calculate the final interest level based on a first interest level calculated according to a predetermined calculation criterion and a second interest level calculated according to the criterion acquired by the calculation criterion acquisition means.
  • the predetermined calculation standard is a calculation standard for interest level for general humans, that is, a general-purpose calculation standard.
  • The calculation criterion acquired by the calculation criterion acquisition means is preferably a criterion suited to the situation, for example, a criterion matching the user viewing the image, or a criterion matching the application that uses the extracted attention areas.
  • It is preferable that the attention area extracting apparatus further include integration means for integrating, among the partial areas included in the input image, a plurality of mutually close partial areas into one partial area.
  • Here, “partial regions are close to each other” covers both the case where the partial regions are adjacent and the case where the distance between them is within a predetermined distance (number of pixels). The predetermined distance may be determined according to the size of the partial areas, the type of objects they contain, and the like.
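One way to realize such integration means is to merge bounding boxes that overlap or lie within the predetermined distance of each other. The greedy merge below is a sketch under that assumption, not the patent's prescribed method; boxes are (x1, y1, x2, y2) tuples.

```python
def boxes_are_close(a, b, max_gap=10):
    """True if axis-aligned boxes (x1, y1, x2, y2) overlap or lie
    within `max_gap` pixels of each other on both axes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return (ax1 - max_gap <= bx2 and bx1 - max_gap <= ax2 and
            ay1 - max_gap <= by2 and by1 - max_gap <= ay2)

def merge_close_boxes(boxes, max_gap=10):
    """Greedily merge mutually close boxes into their common bounding
    box until no close pair remains."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_are_close(boxes[i], boxes[j], max_gap):
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```

Two boxes separated by a 5-pixel gap merge into one, while a distant box is left alone.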
  • the attention area extracting apparatus preferably further includes output means for outputting the position of the partial area included in the input image and the degree of interest for each partial area.
  • The position of a partial area can be output, for example, by superimposing on the input image a frame line indicating its location, or by displaying the partial area in a color or brightness different from the other areas.
  • the degree of interest may be output by displaying a numerical value of the degree of interest or by displaying a marker having a color or size corresponding to the degree of interest.
  • The output means may also output only partial areas whose interest level is equal to or greater than a threshold, suppressing the interest level display or the area display for partial areas below the threshold.
  • the present invention can be understood as an attention area extracting device including at least a part of the above means.
  • the present invention can also be understood as an attention area extraction method or interest level calculation method.
  • The present invention can also be understood as a computer program for causing a computer to execute each step of these methods, or as a computer-readable storage medium storing the program non-temporarily.
  • FIG. 1A and FIG. 1B are respectively a diagram illustrating a hardware configuration and a functional block of the attention area extraction device according to the first embodiment.
  • FIG. 2 is a flowchart showing a flow of attention area extraction processing in the first embodiment.
  • FIGS. 3A and 3B are diagrams illustrating an example of an input image and an example of attention areas extracted from the input image, respectively.
  • FIG. 4 is a conceptual diagram illustrating interest level calculation of a region of interest.
  • FIGS. 5A and 5B are diagrams illustrating an example of a similar image search result and an example of interest level calculation based on the search result, respectively.
  • FIG. 6A and FIG. 6B are a flowchart showing a flow of interest level output processing and an example of interest level output, respectively.
  • FIG. 7 is a flowchart showing a flow of attention area extraction processing in the second embodiment.
  • FIG. 8 is a diagram illustrating functional blocks of the attention area extraction device according to the third embodiment.
  • FIG. 9 is a flowchart showing a flow of attention area extraction processing in the third embodiment.
  • FIG. 10 is a diagram illustrating functional blocks of the attention area extraction device according to the fourth embodiment.
  • FIG. 11 is a flowchart showing a flow of attention area extraction processing in the fourth embodiment.
  • FIGS. 12A and 12B are diagrams showing the attention areas before and after the integration processing in the fourth embodiment, respectively.
  • The attention area extraction device can accurately extract attention areas from an input image and calculate the interest level of each attention area by performing similar-image searches on an image database. Searching the image database makes available information that cannot be obtained from the input image alone, enabling highly accurate attention area extraction and interest level calculation.
  • FIG. 1A is a diagram illustrating a hardware configuration of the attention area extracting device 10 according to the present embodiment.
  • the attention area extraction device 10 includes an image input unit 11, a calculation device 12, a storage device 13, a communication device 14, an input device 15, and an output device 16.
  • the image input unit 11 is an interface that receives image data from the camera 20. In this embodiment, the image data is directly received from the camera 20, but the image data may be received via the communication device 14, or the image data may be received via a recording medium.
  • the arithmetic device 12 is a general-purpose processor such as a CPU (Central Processing Unit), and executes a program stored in the storage device 13 to realize functions to be described later.
  • the storage device 13 includes a main storage device and an auxiliary storage device, stores a program executed by the arithmetic device 12, and stores image data and temporary data during execution of the program.
  • the communication device 14 is a device for the attention area extraction device 10 to communicate with an external computer. The form of communication may be wired or wireless, and the communication standard may be arbitrary. In the present embodiment, the attention area extraction device 10 accesses the image database 30 via the communication device 14.
  • the input device 15 includes a keyboard and a mouse, and is a device for a user to input an instruction to the attention area extraction device.
  • the output device 16 includes a display device, a speaker, and the like, and is a device for the attention area extraction device to output to the user.
  • the image database 30 is a computer including an arithmetic device and a storage device, and stores a plurality of image data so as to be searchable.
  • the image database 30 may be composed of one computer or a plurality of computers.
  • Various attribute information is associated with the image data stored in the image database 30 in addition to the image data itself (color information for each pixel).
  • a data file of image data can include various attribute information according to the Exif format.
  • the image database 30 can store attribute information stored in a file different from the data file of the image data in association with the image data.
  • Attribute information includes, for example, the image size, color space, shooting conditions (shooting date, shutter speed, aperture, ISO sensitivity, metering mode, presence/absence of flash, focal length, shooting position, etc.), and information describing the image content and features in natural language (tag information). These pieces of attribute information are meta information about the image data.
  • the image database 30 is open to the public via a public network such as the Internet, and accepts registration and search of image data.
  • Who registers images in the image database 30 and the number of registered images are not particularly limited.
  • an image of an object that should be noted by the user of the attention area extraction device 10 may be registered.
  • In this case, the registered images are well suited to the attention area extraction process, so the number of registered images need not be very large.
  • a third party general user or a search service provider may register the image.
  • In that case, the registered images are not necessarily suited to the attention area extraction process, so it is preferable that many images be registered in the image database 30.
  • the arithmetic device 12 implements a function as shown in FIG. 1B by executing a program. That is, the arithmetic device 12 provides the functions of the region extraction unit 110, the similar image search unit 120, the interest level calculation unit 130, and the output unit 140. The processing content of each part will be described below.
  • FIG. 2 is a flowchart showing the flow of attention area extraction processing executed by the attention area extraction device 10.
  • the attention area extraction device 10 acquires an image (input image).
  • the input image may be acquired from the camera 20 via the image input unit 11, acquired from another computer via the communication device 14, or acquired from a storage medium via the storage device 13. Also good.
  • FIG. 3A is a diagram illustrating an example of the input image 400.
  • the region extraction unit 110 extracts a region of interest (partial region) from the input image.
  • the attention area extraction algorithm used by the area extraction unit 110 is not particularly limited, and any existing algorithm including a learning base algorithm and a model base algorithm can be employed. Further, the algorithm to be employed need not be limited to one, and the attention area may be extracted according to a plurality of algorithms. Note that it is preferable to use a model-based extraction algorithm because a learning-based extraction algorithm can extract only learned objects.
  • FIG. 3B is a diagram illustrating an example of a region of interest extracted from the input image 400.
  • Four attention areas 401 to 404 are extracted from the input image 400: area 401 is a vehicle, area 402 is a person, and area 403 is a road sign.
  • the region 404 is not originally a region of interest, but is a region erroneously detected as a region of interest by the region extraction unit 110.
  • In loop L1, the similar image search unit 120 performs a similar image search for each of the attention areas extracted in step S20, and the interest level of each attention area is calculated based on the search result. More specifically, in step S30 the similar image search unit 120 issues a query to the image database 30 for images similar to each attention area and acquires the search result from the image database 30. Upon receiving the search query, the image database 30 searches the database for images similar to the search image (the image of the attention area) included in the query and transmits the search result. Any known similar-image search algorithm can be adopted in the image database 30.
  • the image database 30 transmits the similar image obtained by the search and its attribute information to the attention area extracting apparatus 10 as a search result.
  • the interest level calculation unit 130 of the attention area extraction device 10 calculates the interest degree of the attention area based on the search result obtained from the image database 30.
  • In the present embodiment, the interest level calculation unit 130 calculates a plurality of individual interest levels (R1 to R4) based on the search result and integrates these scores into a final interest level (total interest level) R.
  • The individual interest levels are interest levels evaluated from different viewpoints.
  • In the present embodiment, they include the interest level R1 based on the number of similar images matching the search, the interest level R2 based on the average similarity of the similar images, the interest level R3 based on the relative size of the similar region within each similar image, and the interest level R4 based on the convergence of meaning of the tag information.
  • The total interest level R is determined based on the individual interest levels R1 to R4; for example, it may be obtained as their average (including a weighted average), maximum, or minimum.
  • The individual interest levels shown here are examples; values determined according to other criteria based on the search result may also be used.
  • Furthermore, the degree of interest need not be calculated solely from the search result; it may also take into account, for example, the extracted region itself or the input image.
  • FIG. 5A is a diagram illustrating an example of a search result in step S30.
  • In this example, the search result includes, for each image whose similarity to the search image (the attention area image) is equal to or greater than a predetermined threshold, the image number 501, the similarity 502, the overall size 503 of the similar image, the size 504 of the region within the similar image that resembles the attention area, and the tag information 505 stored in association with the similar image; information other than these may also be included in the search result.
  • FIG. 5B is a diagram illustrating an example of interest level calculation performed by the interest level calculation unit 130.
  • The interest level R1 based on the number of similar images is calculated higher as the number of search hits increases. As a result, objects registered more frequently in the image database 30 receive a higher degree of interest.
  • The number of search hits used for calculating the interest level R1 may be the total number of similar images returned by the image database 30, or the number of search results whose similarity 502 is equal to or greater than a predetermined threshold.
  • The interest level R2 based on the average similarity of the similar images is calculated higher as the average of the similarities 502 of the similar images in the search result is higher. Even when the number of search hits is large, if the similarity of the hits is low the object is not necessarily of high interest; considering the average similarity therefore improves the accuracy of the interest calculation.
  • Although the average similarity is used here for calculating R2, other statistics such as the mode, median, intermediate value, variance, or standard deviation may be used instead.
  • The interest level R3 based on the relative size of the similar region within the similar image is calculated higher as the average, over the search results, of the ratio of the similar-region size 504 to the overall similar-image size 503 increases. As a result, objects that occupy a larger part of an image receive a higher degree of interest. Note that R3 may also be obtained by other criteria based on the overall similar-image size 503 and the similar-region size 504.
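The R3 computation described here reduces to averaging area ratios over the search results. A sketch, where the tuple layout stands in for the size columns 503 and 504 of FIG. 5A:

```python
def interest_from_relative_size(results):
    """R3: mean ratio of the matched region's area to the whole
    similar image's area, over all search results. `results` is a
    list of (image_w, image_h, region_w, region_h) tuples; the layout
    is an assumption for illustration."""
    ratios = [(rw * rh) / (iw * ih) for iw, ih, rw, rh in results]
    return sum(ratios) / len(ratios)
```

For example, a region filling a quarter of one 100x100 hit and half of one 200x100 hit averages to 0.375.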
  • The interest level R4 based on the convergence of meaning of the tag information is calculated higher as that convergence in the search result is higher. As a result, objects to which many people attach tags with the same meaning receive a higher degree of interest.
  • The convergence of meaning is preferably determined by natural language processing; even if the wording of the tags differs, the convergence should be judged high if the tags express the same or a similar concept.
  • The interest level calculation unit 130 can, for example, divide the tag information included in the search result into several categories by meaning and take the ratio of the number of elements in the largest category as the interest level R4.
  • In the example of tag information shown in FIG. 5A, “automobile” and “car” express the same concept and can be classified into the same category. Since “sports car” is a subordinate concept of “automobile” and “car”, it can also be classified into that same category. On the other hand, “park” is a concept different from “automobile” and the like, and is therefore classified into a different category. “Motor show” is a concept related to “automobile” and the like, so it may be classified into the same category as “automobile” or into a different one. If “motor show” is also classified into the same category as “automobile” and the search results are the five items shown in FIG. 5A, four of the five tags fall into the largest category, so the interest level R4 is calculated as 4/5 = 0.8.
  • Although the case where the tag information is a single word has been shown, tag information may also be expressed as a sentence, in which case its meaning can be estimated by natural language processing.
  • the interest level calculation unit 130 calculates the total interest level R as described above based on the individual interest levels R1 to R4.
  • The individual interest levels R1 to R4 take large values for regions that a typical human is assumed to notice. That is, R1 to R4 are general-purpose interest levels for people in general, and the total interest level R calculated from them is likewise a general-purpose interest level.
  • In step S50, the output unit 140 outputs the position of each attention area in the input image and its interest level.
  • the output unit 140 does not output all the attention areas extracted in step S20, but outputs attention areas whose interest level is greater than or equal to a predetermined threshold Th R among these attention areas.
  • FIG. 6A is a flowchart for explaining the output process in step S50 in more detail.
  • The output unit 140 repeats the following process (loop L2) for every attention region extracted in step S20. First, the output unit 140 determines whether the interest level calculated for the attention region is greater than or equal to the threshold Th R (S51). If it is (S51-YES), the position of the attention region and its interest level are output (S52); if the interest level is less than the threshold Th R (S51-NO), neither the position nor the interest level is output.
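The loop L2 described above (threshold check S51, output S52) might be sketched as follows; the bounding-box representation and the returned list are assumptions for illustration:

```python
def output_regions(regions, th_r):
    """Output only the attention regions whose interest level is at or
    above the threshold Th R (loop L2 / steps S51-S52, sketched).

    regions: list of (position, interest) pairs, where position is an
    assumed bounding box (x, y, w, h).
    """
    results = []
    for position, interest in regions:            # loop L2
        if interest >= th_r:                      # S51
            results.append((position, interest))  # S52: output
        # S51-NO: region and interest level are suppressed
    return results

regions = [((10, 10, 50, 50), 0.9), ((80, 20, 30, 30), 0.4)]
print(output_regions(regions, th_r=0.5))
# [((10, 10, 50, 50), 0.9)]
```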
  • FIG. 6B is a diagram illustrating an example of the output of the position of the region of interest and the degree of interest in the present embodiment.
  • Among the attention regions 401 to 404 shown in FIG., the attention regions 401 to 403 have interest levels equal to or higher than the threshold Th R. The positions of the attention regions 401 to 403 are therefore indicated by frames surrounding those regions. In addition, beside the attention regions 401 to 403, their interest levels are displayed as numerical values in the interest level display sections 601 to 603. The attention region 404 is not displayed because its interest level is less than the threshold Th R. Note that this is merely one example of a display.
  • For example, the position of an attention region can also be indicated by rendering the attention region and the rest of the image with different luminance or color.
  • The interest level need not be displayed numerically.
  • For example, the interest level can be indicated by the color or shape of a symbol, or by changing the thickness or color of the frame that marks the attention region.
  • <Effect of This Embodiment> By extracting attention regions from an input image with the help of information about the images contained in the image database, extraction can be performed with higher accuracy than when attention regions are extracted from the input image alone.
  • In addition, the objects that can be extracted are not limited to predetermined types; various objects can be extracted as attention regions.
  • In this way, the extraction accuracy can be improved by using the search results of the image database.
  • FIG. 7 is a flowchart showing a flow of attention area extraction processing in the present embodiment.
  • The difference from the first embodiment is that a step comparing the number of retrieved similar images with a threshold Th N is added after the similar image retrieval step S30. If the number of retrieved similar images is greater than or equal to the threshold Th N (S35-YES), the interest level calculation unit 130 calculates the interest level for the attention region as in the first embodiment (S40); if the number is less than Th N (S35-NO), no interest level is calculated for this attention region.
  • That is, no interest level is calculated for a region whose similar image search returns few hits. A small number of similar-image hits suggests a region that does not need much attention, and the above determination can also be regarded as checking whether the extraction accuracy of the attention region extraction in step S20 is at or above a threshold.
  • the extraction accuracy does not necessarily have to be evaluated by the number of similar image search hits, but may be evaluated by other criteria.
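The Th N gating of this embodiment (steps S30, S35, S40) can be sketched as below; `search_similar` and `calc_interest` are hypothetical stand-ins for the similar image search unit 120 and the interest level calculation unit 130:

```python
def process_regions(regions, search_similar, calc_interest, th_n):
    """Embodiment-2 flow (sketch): skip interest calculation for regions
    whose similar-image search returns fewer than Th N hits.

    search_similar: callable region -> list of similar-image results
    calc_interest:  callable (region, results) -> interest level
    """
    interests = {}
    for i, region in enumerate(regions):
        results = search_similar(region)                   # S30
        if len(results) >= th_n:                           # S35-YES
            interests[i] = calc_interest(region, results)  # S40
        # S35-NO: interest level not calculated for this region
    return interests

# Hypothetical stubs for illustration only
fake_db = {"region_a": ["img"] * 5, "region_b": ["img"] * 1}
out = process_regions(
    ["region_a", "region_b"],
    search_similar=lambda r: fake_db[r],
    calc_interest=lambda r, res: len(res) / 10,
    th_n=3,
)
# out == {0: 0.5}  (region_b falls below Th N and is skipped)
```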
  • In the first and second embodiments, the extraction accuracy and the interest level of the regions extracted by the conventional attention region extraction process (S20) are calculated on different criteria using the similar image search results.
  • In those embodiments, the interest level is calculated as a general-purpose measure applicable to people in general.
  • In the present embodiment, the attention region extraction device 310 receives an interest level calculation criterion determined from prior knowledge, and thereby also obtains an interest level specialized for a particular user.
  • FIG. 8 is a diagram showing functional blocks realized by the arithmetic device 12 of the attention area extracting device 310 according to the present embodiment executing a program.
  • The functional blocks of the attention region extraction device 310 are basically the same as in the first embodiment (FIG. 1B), but the interest level calculation unit 130 comprises a general interest level calculation unit 131, an interest level calculation criterion acquisition unit 132, a specific interest level calculation unit 133, and an interest level integration unit 134.
  • FIG. 9 is a flowchart showing the flow of attention area extraction processing executed by the attention area extraction device 310 according to this embodiment.
  • The same processes as in the first embodiment (FIG. 2) are denoted by the same reference numerals, and detailed descriptions thereof are omitted.
  • The interest level calculation criterion acquisition unit 132 acquires a criterion for calculating the interest level for a specific user or application (the specific interest level).
  • The calculation criterion for the specific interest level varies with the user or application that consumes the output of the attention region extraction device 310. For example, if there is prior knowledge that a certain user is particularly interested in a specific object, the interest level of that object should be calculated to be high for that user. Likewise, if the application is meant to draw the user's attention to easily overlooked objects, the interest level should be calculated to be high for objects that are hard to see, for example because they are small or similar in color to their surroundings in the input image.
  • The interest level calculation criterion acquisition unit 132 may receive the calculation criterion itself from outside, or it may acquire information identifying the user or application and then obtain the corresponding criterion on its own. In the latter case, the unit 132 either stores an interest level calculation criterion for each user or application, or queries an external device for one. Although in FIG. 9 the criterion is acquired after step S20, it may instead be acquired before the input image acquisition step S10 or the attention region extraction step S20.
  • As in the first embodiment, the interest level calculation unit 130 calculates an interest level for each attention region extracted from the input image in loop L1. The specific calculation method, however, differs from the first embodiment and is described below.
  • In step S30, the similar image search unit 120 searches the image database 30 for images similar to the attention region and acquires the search results.
  • In step S41, the general interest level calculation unit 131 calculates a general interest level using the search results and a predetermined calculation criterion. This process is the same as the interest level calculation process (S40) in the first embodiment.
  • In step S42, the specific interest level calculation unit 133 calculates the interest level for the specific user or application (the specific interest level) using the search results obtained by the similar image search unit 120 and the calculation criterion acquired by the interest level calculation criterion acquisition unit 132.
  • This process is the same as that performed by the general interest level calculation unit 131 except that the calculation criterion differs.
  • The specific interest level calculation unit 133 may also calculate a plurality of individual interest levels according to different criteria and obtain the specific interest level by integrating them.
  • The interest level integration unit 134 integrates the general interest level calculated by the general interest level calculation unit 131 and the specific interest level calculated by the specific interest level calculation unit 133 to obtain the final interest level.
  • The integration method may be arbitrary. For example, an average (simple or weighted) of the general and specific interest levels may be used as the final interest level. The weight in a weighted average may be fixed, or may be changed according to the user or application.
  • Alternatively, the interest level integration unit 134 may determine the final interest level as a weighted average of the individual interest levels obtained in the course of calculating the general interest level and the specific interest level.
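As one concrete example of the integration step, a weighted average of the general and specific interest levels might look like the following; the particular weight value is an assumption and could instead depend on the user or application:

```python
def integrate_interest(general, specific, weight=0.5):
    """Final interest level as a weighted average of the general and
    specific interest levels (one possible integration method).

    weight: contribution of the general interest level, in [0, 1];
    the remainder (1 - weight) goes to the specific interest level.
    """
    return weight * general + (1.0 - weight) * specific

final = integrate_interest(0.8, 0.4, weight=0.25)
# 0.25 * 0.8 + 0.75 * 0.4, i.e. approximately 0.5
```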
  • The output process (S50) after the interest level of each attention region has been calculated is the same as in the first embodiment.
  • According to this embodiment, the user's tendencies of interest can be exploited so that objects the user cares about receive higher interest levels. If it is difficult for the user to perceive specific colors, objects of those colors can be assigned higher interest levels. If the application is intended to detect objects that are hard to see, a higher interest level can be calculated the smaller the attention region is in the input image. Furthermore, when applied to moving images, a high interest level can be calculated for an object that appears suddenly (an object not present in the previous frame), or conversely for objects satisfying the opposite condition.
  • In the present embodiment, the general interest level and an interest level specialized for a specific user or application are calculated and then integrated into the final interest level.
  • When only the specific interest level is needed, the general interest level calculation unit 131 and the interest level integration unit 134 can be omitted from the interest level calculation unit 130.
  • In the present embodiment, the attention region output processing differs from that of the first to third embodiments.
  • Specifically, attention regions that are adjacent to each other in the input image are integrated and output as a single attention region.
  • FIG. 10 is a diagram showing functional blocks realized by the arithmetic device 12 of the attention area extracting device 410 according to the present embodiment executing a program.
  • the attention area extraction device 410 includes an area integration unit 150 in addition to the functions of the first embodiment.
  • FIG. 11 is a flowchart showing the flow of attention area extraction processing executed by the attention area extraction device 410 according to this embodiment.
  • The region integration unit 150 integrates a plurality of attention regions based on their positional relationship. For example, if the distance between attention regions is equal to or less than a predetermined threshold Th D, the region integration unit 150 integrates them.
  • The distance between attention regions may be defined as the distance between their centers (in pixels), or as the distance between their closest boundary points.
  • The threshold Th D may be a fixed value, or may vary with the size of the attention regions and the type of object they contain.
  • FIG. 12A shows the attention regions 1201 to 1203 extracted from the input image 1200 in step S20. While the attention region 1201 is far from the other attention regions, the attention regions 1202 and 1203 are close to each other. The region integration unit 150 therefore integrates the attention region 1202 and the attention region 1203.
  • FIG. 12B shows the image 1200 after the integration process. As shown in the figure, the attention region 1202 and the attention region 1203 have been integrated into a single attention region 1204.
  • In this example, the integrated attention region 1204 is the minimum rectangle containing the attention regions 1202 and 1203, but the integrated region may be generated in a different way.
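The distance test against Th D and the minimum-rectangle merge described above can be sketched as follows; representing regions as (x1, y1, x2, y2) boxes and measuring the distance between centers are illustrative assumptions:

```python
def merge_boxes(a, b):
    """Minimum rectangle containing boxes a and b (one way to form the
    integrated attention region, as in FIG. 12B). Boxes: (x1, y1, x2, y2)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def center_distance(a, b):
    """Distance between box centers in pixels (one possible definition
    of the inter-region distance)."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

def maybe_integrate(a, b, th_d):
    """Integrate two attention regions when their distance is <= Th D,
    otherwise leave them separate (returns None)."""
    if center_distance(a, b) <= th_d:
        return merge_boxes(a, b)
    return None

merged = maybe_integrate((10, 10, 30, 30), (35, 12, 60, 40), th_d=50)
# merged == (10, 10, 60, 40)
```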
  • Attention regions with low interest levels need not be integrated; the regions may be integrated only when their interest levels satisfy a predetermined relationship (for example, when the average interest level is equal to or greater than a threshold). That is, the region integration unit 150 may decide whether to integrate based not only on the distance between the attention regions but also on their interest levels. The region integration unit 150 may also integrate three or more attention regions into a single region.
  • When integrating a plurality of attention regions, the region integration unit 150 also determines the interest level of the integrated attention region.
  • The interest level of the integrated region is preferably, for example, the average or maximum of the interest levels of the regions that were merged, but it may be determined by other methods.
  • The interest level output processing for attention regions in step S50 is the same as in the first embodiment, except that it is performed on the attention regions after integration.
  • According to this embodiment, integrating a plurality of mutually close attention regions keeps the number of output attention regions manageable. Moreover, using the interest level derived from the image database search results when deciding whether to integrate allows the regions to be merged more appropriately.
  • In the above description, the image database is configured as a device separate from the attention region extraction device.
  • However, the image database may instead be configured integrally with the attention region extraction device.
  • The image data contained in the image database may be registered by the manufacturer of the attention region extraction device or by the user.
  • The attention region extraction device may also use a plurality of image databases, including databases both inside and outside the device.
  • The interest level calculation methods described above are only examples; in the present invention, the calculation method is not particularly limited as long as the interest level is calculated using the results of searching for images similar to the attention region.
  • The interest level is preferably calculated using statistical information about the search results.
  • Search result statistics include the number of search hits, statistics of the similarity scores, the sizes of the similar images, the positions within the similar images of the regions similar to the query image, the convergence of the meanings indicated by the tag information, and so on.
  • In other words, the interest level can be calculated based on statistics of the meta-information attached to the search results.
  • A statistic is a quantity obtained by statistical processing of a plurality of data values; typical examples include the mean, mode, median, midrange, variance, and standard deviation.
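As a sketch of such meta-information statistics (not part of the patent), summary statistics of the similarity scores returned by a search could be computed with the standard library:

```python
import statistics

def similarity_stats(scores):
    """Summary statistics of similarity scores from a similar-image
    search, as one example of the meta-information statistics on which
    an interest level could be based."""
    return {
        "hits": len(scores),                      # number of search hits
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.pstdev(scores),       # population std. dev.
    }

print(similarity_stats([0.9, 0.8, 0.7, 0.8]))
```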
  • The interest level of an attention region can also be calculated using information other than similar image search results, for example the size and color of the attention region itself or its position within the input image.
  • In the above description, the input image is assumed to be a still image, but the input image may also be a moving image (a sequence of still images).
  • In that case, the region extraction unit 110 may extract attention regions using an existing algorithm for extracting attention regions from moving images.
  • The interest level calculation unit 130 can also take temporal changes in the position of an attention region into account, for example its moving speed and moving direction. The interest level may be calculated to be higher, or lower, as the moving speed of the attention region increases.
  • The interest level may be based on the moving direction itself, or on the variation of the moving direction.
  • The attention region extraction device can be implemented as an arbitrary information processing device (computer), such as a desktop computer, a notebook computer, a slate computer, a smartphone, a mobile phone, a digital camera, or a digital video camera.
  • Attention region extraction device
  • 20: Camera
  • 30: Image database
  • 110: Region extraction unit
  • 120: Similar image search unit
  • 130: Interest level calculation unit
  • 140: Output unit
  • 150: Region integration unit
  • 400: Input image
  • 401, 402, 403, 404: Attention regions
  • 601, 602, 603: Interest level display sections
  • 1200: Input image
  • 1201, 1202, 1203: Attention regions (before integration)
  • 1204: Attention region (after integration)

PCT/JP2016/050344 2015-03-05 2016-01-07 注目領域抽出装置および注目領域抽出方法 WO2016139964A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
DE112016001039.7T DE112016001039T5 (de) 2015-03-05 2016-01-07 Vorrichtung und Verfahren zur Extraktion eines interessierenden Bereichs
US15/683,997 US20170352162A1 (en) 2015-03-05 2017-08-23 Region-of-interest extraction device and region-of-interest extraction method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510098283.2A CN105989174B (zh) 2015-03-05 2015-03-05 关注区域提取装置以及关注区域提取方法
CN201510098283.2 2015-03-05

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/683,997 Continuation US20170352162A1 (en) 2015-03-05 2017-08-23 Region-of-interest extraction device and region-of-interest extraction method

Publications (1)

Publication Number Publication Date
WO2016139964A1 true WO2016139964A1 (ja) 2016-09-09

Family

ID=56849320

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/050344 WO2016139964A1 (ja) 2015-03-05 2016-01-07 注目領域抽出装置および注目領域抽出方法

Country Status (4)

Country Link
US (1) US20170352162A1 (de)
CN (1) CN105989174B (de)
DE (1) DE112016001039T5 (de)
WO (1) WO2016139964A1 (de)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017203705A1 (ja) * 2016-05-27 2017-11-30 楽天株式会社 画像処理装置、画像処理方法及び画像処理プログラム
JP6948128B2 (ja) 2017-01-13 2021-10-13 キヤノン株式会社 映像監視装置及びその制御方法及びシステム
US10810773B2 (en) * 2017-06-14 2020-10-20 Dell Products, L.P. Headset display control based upon a user's pupil state
JP6907774B2 (ja) * 2017-07-14 2021-07-21 オムロン株式会社 物体検出装置、物体検出方法、およびプログラム
CN111666952B (zh) * 2020-05-22 2023-10-24 北京腾信软创科技股份有限公司 一种基于标签上下文的显著区域提取方法及系统
CN113656395B (zh) * 2021-10-15 2022-03-15 深圳市信润富联数字科技有限公司 数据质量治理方法、装置、设备及存储介质
US20230368108A1 (en) * 2022-05-11 2023-11-16 At&T Intellectual Property I, L.P. Method and system for assessment of environmental and/or social risks
CN114840700B (zh) * 2022-05-30 2023-01-13 来也科技(北京)有限公司 结合rpa和ai实现ia的图像检索方法、装置及电子设备
US11941043B2 (en) * 2022-07-25 2024-03-26 Dell Products L.P. System and method for managing use of images using landmarks or areas of interest
US12032612B2 (en) 2022-07-25 2024-07-09 Dell Products L.P. System and method for managing storage and use of biosystem on a chip data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010122931A (ja) * 2008-11-20 2010-06-03 Nippon Telegr & Teleph Corp <Ntt> 類似領域検索方法、類似領域検索装置、類似領域検索プログラム
WO2013031096A1 (ja) * 2011-08-29 2013-03-07 パナソニック株式会社 画像処理装置、画像処理方法、プログラム、集積回路
JP2014063377A (ja) * 2012-09-21 2014-04-10 Nikon Systems Inc 画像処理装置およびプログラム

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5893095A (en) * 1996-03-29 1999-04-06 Virage, Inc. Similarity engine for content-based retrieval of images
US6175829B1 (en) * 1998-04-22 2001-01-16 Nec Usa, Inc. Method and apparatus for facilitating query reformulation
EP1293925A1 (de) * 2001-09-18 2003-03-19 Agfa-Gevaert Verfahren zur Bestimmung von RÖntgenbildern
US8467631B2 (en) * 2009-06-30 2013-06-18 Red Hat Israel, Ltd. Method and apparatus for identification of image uniqueness
WO2011140786A1 (zh) * 2010-10-29 2011-11-17 华为技术有限公司 一种视频兴趣物体提取与关联的方法及系统
EP2810218A4 (de) * 2012-02-03 2016-10-26 See Out Pty Ltd Benachrichtigungs- und privatsphärenverwaltung von onlinefotos und -videos
CN104217225B (zh) * 2014-09-02 2018-04-24 中国科学院自动化研究所 一种视觉目标检测与标注方法

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010122931A (ja) * 2008-11-20 2010-06-03 Nippon Telegr & Teleph Corp <Ntt> 類似領域検索方法、類似領域検索装置、類似領域検索プログラム
WO2013031096A1 (ja) * 2011-08-29 2013-03-07 パナソニック株式会社 画像処理装置、画像処理方法、プログラム、集積回路
JP2014063377A (ja) * 2012-09-21 2014-04-10 Nikon Systems Inc 画像処理装置およびプログラム

Also Published As

Publication number Publication date
CN105989174A (zh) 2016-10-05
CN105989174B (zh) 2019-11-01
US20170352162A1 (en) 2017-12-07
DE112016001039T5 (de) 2018-01-04

Similar Documents

Publication Publication Date Title
WO2016139964A1 (ja) 注目領域抽出装置および注目領域抽出方法
US11657084B2 (en) Correlating image annotations with foreground features
CN107251045B (zh) 物体识别装置、物体识别方法及计算机可读存储介质
WO2019218824A1 (zh) 一种移动轨迹获取方法及其设备、存储介质、终端
CN107784282B (zh) 对象属性的识别方法、装置及系统
US10872424B2 (en) Object tracking using object attributes
US9323785B2 (en) Method and system for mobile visual search using metadata and segmentation
JP5763965B2 (ja) 情報処理装置、情報処理方法、及びプログラム
CN112348117B (zh) 场景识别方法、装置、计算机设备和存储介质
US20240330372A1 (en) Visual Recognition Using User Tap Locations
WO2017045443A1 (zh) 一种图像检索方法及系统
JP5963609B2 (ja) 画像処理装置、画像処理方法
US20160026854A1 (en) Method and apparatus of identifying user using face recognition
US20130243249A1 (en) Electronic device and method for recognizing image and searching for concerning information
US20190114780A1 (en) Systems and methods for detection of significant and attractive components in digital images
WO2018121287A1 (zh) 目标再识别方法和装置
US20160148070A1 (en) Image processing apparatus, image processing method, and recording medium
WO2020052513A1 (zh) 图像识别和行人再识别方法及装置,电子和存储设备
CN107644105A (zh) 一种搜题方法及装置
US11921774B2 (en) Method for selecting image of interest to construct retrieval database and image control system performing the same
CN107203638B (zh) 监控视频处理方法、装置及系统
JP7351344B2 (ja) 学習装置、学習方法、推論装置、推論方法、及び、プログラム
Mu et al. Finding autofocus region in low contrast surveillance images using CNN-based saliency algorithm
US20230131717A1 (en) Search processing device, search processing method, and computer program product
JP2015165433A (ja) 情報処理装置、情報処理方法、及びプログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16758660

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 112016001039

Country of ref document: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16758660

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP