WO2013175608A1

WO2013175608A1 - Image analysis device, image analysis system, and image analysis method

Info

Publication number: WO2013175608A1
Application number: PCT/JP2012/063322
Authority: WO
Inventors: 渡邉裕樹; 廣池敦
Original assignee: 株式会社日立製作所
Priority date: 2012-05-24
Filing date: 2012-05-24
Publication date: 2013-11-28
Also published as: SG11201407749TA; JPWO2013175608A1; CN104321802A; US20150286896A1; CN104321802B; US9665798B2; JP5857124B2

Abstract

The purpose of the present invention is to provide an image analysis technique capable of rapidly detecting a detection target in image data. The image analysis device generates metadata for a query image containing the detection target, uses the metadata to narrow down in advance the image data to be searched, and then performs object detection.

Description

Image analysis apparatus, image analysis system, and image analysis method

The present invention relates to a technique for detecting a specific object included in image data.

With the development of IT infrastructure for individuals and enterprises, large amounts of multimedia data (documents, video and images, audio, various log data, etc.) have accumulated in large-scale storage. To extract information efficiently from such large volumes of accumulated data, various information retrieval techniques have been devised and put into practical use for each type of media data.

One example of information retrieval for multimedia data is the detection of objects or specific regions contained in an image. Object detection and region identification in an image correspond to morphological analysis in document analysis (dividing a document into words and determining the part of speech of each), and are important elemental techniques for analyzing the meaning of an image.

As a method for detecting an object in an image, the method of Non-Patent Document 1 is widely known and has been commercialized as a face region detection function in digital cameras and monitoring systems. In the method of Non-Patent Document 1, a large number of sample images of the detection target are collected, and a plurality of discriminators based on the luminance values of the image are generated by machine learning. These discriminators are connected to form a discriminator for partial regions of an image, and the object region is identified by searching the partial regions of the image.
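As a generic illustration of connected luminance-based discriminators (not a reproduction of Non-Patent Document 1, whose exact features and training are not given here), the sketch below evaluates a scanning window against a chain of threshold stages using rectangle luminance sums from an integral image. The feature `left_right_contrast`, the stage thresholds, and all function names are hypothetical; in a real detector the stages and thresholds would be learned from many samples.

```python
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)
IntegralImage = List[List[int]]
Stage = Tuple[Callable[[IntegralImage, Window], int], int]  # (feature, threshold)


def integral_image(gray: Sequence[Sequence[int]]) -> IntegralImage:
    """Summed-area table with a leading zero row and column."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: IntegralImage, x: int, y: int, w: int, h: int) -> int:
    """Sum of luminance values inside a rectangle, in constant time."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def left_right_contrast(ii: IntegralImage, win: Window) -> int:
    """Hypothetical two-rectangle feature: left half minus right half."""
    x, y, w, h = win
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def cascade_accepts(ii: IntegralImage, win: Window, stages: List[Stage]) -> bool:
    """Accept a window only if every stage passes; most windows are rejected early."""
    for feature, threshold in stages:
        if feature(ii, win) < threshold:
            return False
    return True
```

Because each stage can reject a window immediately, most partial regions cost only one or two cheap rectangle sums, which is why such detectors are fast once trained. The cost that the following paragraphs object to lies in the training: the samples and stages must be prepared per target.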

At present, the most common detection target is the human face. However, when targeting the wide range of content held in storage, it is desirable to detect various objects such as vehicles, animals, buildings, figures, and other articles. Furthermore, processing large-scale data requires improved analysis processing efficiency.

Regarding the improvement of analysis processing efficiency, Patent Document 1 below discloses a technique that uses an object existence probability to limit the region in which image processing for detecting an object region is performed. The method of Patent Document 1 determines the region for image processing using static information about the imaging system, such as focal length and resolution, and assumes that the shooting environment and shooting device are limited, as with an in-vehicle camera. It is therefore considered effective in environments where structured data is managed.

Patent Document 1: JP 2010-003254 A

The technique described in Patent Document 1 assumes that the shooting environment is constrained to some extent and that the data subjected to image processing is structured. In general, however, the shooting environment and the position of the subject cannot always be predicted in advance. Moreover, in environments where the data to be processed is generated ad hoc, the data is not structured. In such environments, the technique described in Patent Document 1 is not considered effective for reducing the time required to detect an object.

The technique described in Non-Patent Document 1 is effective when the detection target is determined in advance, as in face detection. In applications where the user specifies detection targets one after another, however, collecting samples and performing machine learning each time is not realistic in terms of processing time.

The present invention has been made in view of the above-described problems, and an object of the present invention is to provide an image analysis technique that can detect a detection target from image data at high speed.

The image analysis apparatus according to the present invention generates metadata of a query image including a detection target, and performs object detection after narrowing down image data to be searched using this metadata in advance.

The image analysis apparatus according to the present invention can extract an image including an arbitrary object from a large amount of image data at high speed.

Problems, configurations, and effects other than those described above will become apparent from the following description of embodiments.

FIG. 1 is a configuration diagram of an image analysis system 100 according to Embodiment 1.
FIG. 2 is a diagram showing the configuration of the image database 105 and an example of its data.
FIG. 3 is a diagram showing the data flow of the procedure for generating metadata of a query image designated by the user and narrowing down the object detection targets using the metadata.
FIG. 4 is a flowchart illustrating the process in which the image analysis system 100 identifies an object region in an image.
FIG. 5 is a diagram illustrating the procedure in which the metadata generation unit 108 generates metadata of a query image.
FIG. 6 is a flowchart illustrating the processing procedure in which the metadata generation unit 108 generates metadata of the query image 301.
FIG. 7 is a diagram illustrating the object region detection method in step S407 of FIG. 4.
FIG. 8 is a flowchart illustrating the process in which the object region detection unit 110 detects an object.
FIG. 9 is a diagram illustrating the processing sequence between the functional units in the process in which the image analysis system 100 identifies an object region in an image.
FIG. 10 is a diagram showing a configuration example of an operation screen used to acquire images containing a designated object from the image database.
FIG. 11 is a diagram illustrating an example of expanding bibliographic information.
FIG. 12 is a flowchart illustrating the procedure for expanding bibliographic information.
FIG. 13 is a Venn diagram of analysis targets illustrating how the bibliographic information expansion process reduces detection omissions.
FIG. 14 is a chart showing the relationship between image analysis processing time and coverage.
FIG. 15 is a diagram illustrating a method of improving object detection accuracy by expanding the templates used when searching for images similar to a query image.
FIG. 16 is a schematic diagram of a content cloud system 1600 according to Embodiment 4.

<Embodiment 1: System configuration>
FIG. 1 is a configuration diagram of an image analysis system 100 according to the first embodiment of the present invention. The image analysis system 100 is a system for searching for an image including an arbitrary object designated by a user from a large number of images. The image analysis system 100 includes an image / document storage device 101, an input device 102, a display device 103, a data storage device 104, an image database 105, and an image analysis device 106.

The image / document storage device 101 is a storage medium for storing image data and can be configured using an external hard disk drive or a storage system connected via a network, such as NAS (Network Attached Storage) or SAN (Storage Area Network). The image data analyzed by the image analysis system 100 is assumed to be large-scale, for example several hundred thousand images.

The input device 102 is an input interface, such as a mouse, keyboard, or touch device, for transmitting user operations to the image analysis device 106. The display device 103 is an output interface such as a liquid crystal display and is used to display the image analysis results of the image analysis device 106, for interactive operation with the user, and the like. The data storage device 104 is storage that records the analysis results of the image analysis device 106 so that they can be used by higher-level applications.

The image database 105 is a database management system for accumulating images. The image database 105 not only temporarily stores the data to be analyzed but is also used in the analysis processing itself, as a dictionary for generating metadata. Details will be described later with reference to FIG. 2.

The image analysis device 106 is a device that detects an object included in a query image designated by a user from image data stored in the image database 105. The image analysis apparatus 106 includes an image / document input unit 107, a metadata generation unit 108, an analysis target determination unit 109, an object region detection unit 110, an operation information input unit 111, and a data output unit 112.

The image / document input unit 107 reads image data to be stored in the image database 105, together with its related bibliographic information, from the image / document storage device 101, and stores them in the image database 105 in association with each other. It also reads a query image containing the object to be detected from the image / document storage device 101 and passes it to the metadata generation unit 108 and the object region detection unit 110.

The metadata generation unit 108 automatically generates metadata of the query image by image recognition processing that uses the image database 105 as a dictionary. Here, metadata is data at a higher level of abstraction than the image data, for example words describing the image, the creation time, and the creation location. For simplicity, the following description assumes that “metadata = word”, but the metadata generation unit 108 can generate various kinds of metadata. Each item of metadata is assigned a reliability. The generated metadata is sent to the analysis target determination unit 109. The procedure for generating metadata will be described later with reference to FIG. 5.

The analysis target determination unit 109 searches the bibliographic information stored in the image database 105 using the metadata generated by the metadata generation unit 108 as a search key, and acquires a list of image data whose bibliographic information matches the search key. The metadata used as the search key may be selected automatically according to its reliability, or may be selected by the user from metadata candidates. When the user selects the metadata, the user and the image analysis device 106 interact: a list of metadata candidates, the number of search results, and the like are presented to the user via the data output unit 112, and the operation information input unit 111 receives search parameters such as the designation of the metadata to use as the search key and a threshold value. The resulting image list is sent to the object region detection unit 110 as the candidates for analysis.

The object region detection unit 110 uses image analysis processing to identify the coordinates of the region of an image in which the specified object appears. The object to be detected is not fixed and can be specified by the user each time. Objects of multiple concepts (for example, human faces, cars, cats, and star marks) can also be detected simultaneously. The analysis result, consisting of the coordinates of the object's rectangular region (for example, [horizontal coordinate of the upper left corner of the rectangle, vertical coordinate of the upper left corner of the rectangle, horizontal coordinate of the lower right corner of the rectangle, vertical coordinate of the lower right corner of the rectangle]) and a reliability representing “object-likeness”, is sent to the data output unit 112. At this time, the metadata generated by the metadata generation unit 108 can be output in association with the detected object as its semantic information.

The operation information input unit 111 receives user operations from the input device 102 and passes the corresponding signals to the image analysis device 106. The data output unit 112 receives the list of images subject to image analysis, the image analysis results, and the like, and outputs them to the display device 103 and the data storage device 104.

FIG. 2 is a diagram illustrating the configuration of the image database 105 and an example of its data. A table format is shown here as an example, but any data format may be used. The image database 105 stores image feature amounts and bibliographic information in association with each other, and includes an image ID field 1051, an image data field 1052, an image feature amount field 1053, and a bibliographic information field 1054.

The image ID field 1051 holds the identification number of each image. The image data field 1052 holds the image data in binary format and is used when the user checks the analysis results. The image feature amount field 1053 holds fixed-length numeric vector data obtained by digitizing features of the image itself, such as color and shape. The bibliographic information field 1054 holds the bibliographic information associated with the image (text, category classification, date and time, location, etc.). The bibliographic information field may be divided into multiple fields as necessary.
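A record of the image database 105 could be modeled roughly as follows. This is a minimal in-memory sketch, assuming Python dataclasses; the class names `ImageRecord` and `ImageDatabase`, the dictionary layout of the bibliographic field, and the length check are illustrative and not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImageRecord:
    image_id: int                # image ID field 1051
    image_data: bytes            # image data field 1052 (binary)
    feature: List[float]         # image feature amount field 1053 (fixed length)
    # bibliographic information field 1054, split into named sub-fields
    bibliographic: Dict[str, str] = field(default_factory=dict)


class ImageDatabase:
    """Hypothetical in-memory stand-in for the image database 105."""

    def __init__(self, feature_dim: int) -> None:
        self.feature_dim = feature_dim
        self.records: Dict[int, ImageRecord] = {}

    def register(self, record: ImageRecord) -> None:
        # Feature vectors must share one fixed length so distances are comparable.
        if len(record.feature) != self.feature_dim:
            raise ValueError(
                f"expected a {self.feature_dim}-dimensional feature, got {len(record.feature)}"
            )
        self.records[record.image_id] = record
```

Using a dictionary for the bibliographic field mirrors the note that the field "may be divided into a plurality of fields as necessary": text, category, date, and location can each become a key.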

<Embodiment 1: Operation of each part>
The overall configuration of the image analysis system 100 has been described above. In the following, the operation principle of the image analysis system 100 is outlined, and the detailed operation of each functional unit is described.

The image analysis system 100 uses image recognition processing to search the image database 105 for image data containing the object in the query image specified by the user. The simplest approach would be to perform object detection on every image in the image database 105. However, object detection is usually slow, so running it over a large image set for every query is not practical.

For example, if image recognition takes 0.5 seconds per image, analyzing 1 million images takes about 140 hours. If the detection target is fixed, for example to “a person's front face”, the analysis can be performed once when the database is constructed and the results reused to shorten subsequent processing. When detecting an arbitrary object specified by the user rather than a fixed one, however, the analysis must be performed after the user specifies the detection target, so the response time becomes a problem.
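The arithmetic behind the "about 140 hours" figure, and the effect of narrowing down, can be checked with a short calculation. The 0.5 s per image and 1 million images come from the text; the narrowed-down set of 10,000 images is a hypothetical value chosen only for comparison.

```python
SECONDS_PER_IMAGE = 0.5  # per-image recognition time from the example in the text


def detection_hours(num_images: int, seconds_per_image: float = SECONDS_PER_IMAGE) -> float:
    """Total detection time in hours when every image is analyzed in turn."""
    return num_images * seconds_per_image / 3600.0


full_scan = detection_hours(1_000_000)  # 500,000 s, roughly 138.9 h ("about 140 hours")
narrowed = detection_hours(10_000)      # hypothetical narrowed-down set, roughly 1.4 h

if __name__ == "__main__":
    print(f"full scan: {full_scan:.1f} h, narrowed: {narrowed:.1f} h")
```

Because the cost is linear in the number of images analyzed, any reduction in the candidate set translates directly into a proportional reduction in response time.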

Therefore, the image analysis system 100 automatically generates the metadata of the detection target object, and uses the metadata to narrow down the image data to be subjected to the object detection process in advance, thereby reducing the processing time.

FIG. 3 is a diagram showing the data flow of the procedure in which the image analysis apparatus 106 generates metadata of a query image specified by the user and narrows down the object detection targets based on the metadata. Assume that image data and bibliographic information have already been registered in the image database 105 as shown in FIG. 2.

The query image 301 is a query image input by the user via the image / document input unit 107. Here, it is assumed that the query image 301 contains only one object (a star shape).

The metadata generation unit 108 generates the metadata 302 of the query image 301 (S301). The metadata 302 is output as a list with scores (= metadata reliability). Details of the generation of the metadata 302 will be described later with reference to FIG. 5.

The analysis target determination unit 109 uses the metadata generated by the metadata generation unit 108 as a search key, searches the image database 105 for bibliographic information matching the search key, and acquires the set 303 of image data that satisfies the condition (S302). An OR search for the three words “star”, “pentagon”, and “celestial object” is shown here as an example, but AND searches can be combined as necessary. An image having metadata similar to that of the detection target image is likely to contain the object.
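The OR and AND searches over bibliographic information can be sketched with Python sets. This assumes each image's bibliographic information has already been reduced to a set of words; the function name `narrow_down` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Set


def narrow_down(biblio: Dict[int, Set[str]], keys: Iterable[str], mode: str = "or") -> Set[int]:
    """Return IDs of images whose bibliographic words match the search keys.

    mode="or": at least one key must appear; mode="and": every key must appear.
    """
    wanted = set(keys)
    if mode == "or":
        return {image_id for image_id, words in biblio.items() if words & wanted}
    if mode == "and":
        return {image_id for image_id, words in biblio.items() if wanted <= words}
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical bibliographic words per image ID.
biblio = {
    1: {"star", "night"},
    2: {"pentagon", "shape"},
    3: {"cat"},
    4: {"star", "celestial object"},
}
or_hits = narrow_down(biblio, ["star", "pentagon", "celestial object"])      # {1, 2, 4}
and_hits = narrow_down(biblio, ["star", "celestial object"], mode="and")     # {4}
```

An OR search widens the candidate set and reduces detection omissions, while an AND search shrinks it and saves processing time, which is the trade-off the text alludes to.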

The object region detection unit 110 identifies the region of each image in the image set 303 in which an object similar to the object in the query image 301 exists (S303). The processing time of this step increases with the number of images in the image set 303. Details of the object detection will be described later with reference to FIG. 7.

For each image in the image set 303, the detection result 304 includes, for example, the number of detected objects, the position of each object (the dotted rectangles in the detection result 304), and the reliability of “object-likeness” (shown as percentages in the detection result 304). The metadata generated in step S301 may be linked to each detected object and output as its semantic information. The data output unit 112 displays the detection result 304 on the screen of the display device 103 or outputs the data to the data storage device 104.

As illustrated in FIG. 3, the image analysis system 100 can reduce processing time by first using the metadata of the query image 301 to narrow down to images likely to contain the detection target, and then performing object detection.

On the other hand, the “look” of an image and its bibliographic information do not necessarily match semantically. For example, as with the image 305, the bibliographic information may match the search key while the “look” of the object differs; or, as with the image 306, the image may contain an object with a similar “look” while its bibliographic information contains no word that matches the condition. The former causes wasted image analysis processing and increases processing time, and the latter causes detection omissions. A method for reducing detection omissions is described in Embodiment 2.

FIG. 4 is a flowchart for explaining processing in which the image analysis system 100 specifies an object region in an image. Hereinafter, each step of FIG. 4 will be described.

(FIG. 4: Step S401)
The image / document input unit 107 registers the received image data and bibliographic information in the image database 105. The image database 105 extracts image feature amounts from image data and registers them in association with bibliographic information. The process of extracting the image feature amount may be configured to be performed by the image / document input unit 107. This step may be performed in advance before performing step S402 and subsequent steps, and need not be performed every time this flowchart is performed.

(FIG. 4: Steps S402 to S403)
The image / document input unit 107 acquires a query image containing the detection target (S402). The metadata generation unit 108 generates metadata of the query image (S403). Details will be described later with reference to FIG. 5.

(FIG. 4: Step S404)
From the metadata generated by the metadata generation unit 108 in step S403, the analysis target determination unit 109 determines the metadata to be used for narrowing down the image data subject to object detection. Specifically, the metadata may be selected mechanically according to its reliability (for example, automatically selecting a predetermined range in descending order of reliability), or it may be presented to the user via the data output unit 112 so that the user can select it.
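Mechanical selection in descending order of reliability might look like the sketch below. It assumes the generated metadata arrives as a word-to-score mapping; the parameters `top_k` and `min_score` are hypothetical stand-ins for the "predetermined range".

```python
from typing import Dict, List, Optional


def select_keys(scored: Dict[str, float], top_k: int = 3,
                min_score: Optional[float] = None) -> List[str]:
    """Pick search keys in descending order of metadata reliability."""
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    if min_score is not None:
        # Optionally also require a minimum reliability.
        ranked = [kv for kv in ranked if kv[1] >= min_score]
    return [word for word, _ in ranked[:top_k]]


scores = {"star": 0.9, "pentagon": 0.7, "celestial object": 0.5, "night": 0.2}
keys = select_keys(scores, top_k=3)  # ["star", "pentagon", "celestial object"]
```

In the interactive variant, the same ranked list would instead be shown to the user, who picks the keys.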

(FIG. 4: Step S405)
The analysis target determination unit 109 searches the bibliographic information stored in the image database 105 using the metadata selected in step S404 as a search key, and acquires a set of image data that matches the search key. This set of images is a target of object detection processing.

(FIG. 4: Steps S406 to S408)
The image analysis apparatus 106 performs step S407 for each item of image data in the image set acquired in step S405. In step S407, the object region detection unit 110 extracts regions similar to the object in the query image from each image in the image set acquired in step S405. The object region extraction method will be described later with reference to FIG. 7.

(FIG. 4: Step S409)
The data output unit 112 outputs the object regions detected by the object region detection unit 110. The detection results may be output in the order processed, or sorted by the number of detected objects or by reliability before output. As shown in the detection result 304 of FIG. 3, supplementary information such as the number of detected objects, the detection reliability, and rectangles indicating the detected object regions may also be output. The results may be displayed on screen via the display device 103, or output as data describing the detection results and each item of supplementary information.
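Sorting the output by the number of detected objects and then by reliability could be sketched as follows. This assumes a confidence value where higher means more object-like, such as the percentages in the detection result 304; note that this is the opposite direction from a distance-based reliability. `Detection` and `rank_images` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    box: Tuple[int, int, int, int]  # [x1, y1, x2, y2]
    confidence: float               # assumed: higher means more object-like


def rank_images(results: Dict[int, List[Detection]]) -> List[int]:
    """Order image IDs by detection count, breaking ties by best confidence."""
    def key(item):
        _, detections = item
        best = max((d.confidence for d in detections), default=0.0)
        return (len(detections), best)

    ranked = sorted(results.items(), key=key, reverse=True)
    return [image_id for image_id, _ in ranked]


results = {
    10: [Detection((0, 0, 5, 5), 80.0)],
    11: [Detection((0, 0, 5, 5), 60.0), Detection((5, 5, 9, 9), 90.0)],
    12: [],
    13: [Detection((1, 1, 4, 4), 95.0)],
}
order = rank_images(results)  # [11, 13, 10, 12]
```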

(FIG. 4: Step S410)
If there are no more objects to be detected (no further instruction from the user), the process ends. If another object is to be detected, for example when the query image contains another object or the user specifies a new query image, the process returns to step S402 and the same processing is repeated.

FIG. 5 is a diagram illustrating a procedure in which the metadata generation unit 108 generates the metadata of the query image. Hereinafter, each step shown in FIG. 5 will be described.

(FIG. 5: Step S501)
The metadata generation unit 108 searches the image database 105 for similar images using the query image 301 as a search key. Similar image search extracts information such as the color and shape of the image itself as high-dimensional vectors and evaluates the similarity between images by the distance between their vectors. This yields a set 501 of images whose “look” is similar to the query image 301. Because the image database 105 holds images and bibliographic information in association with each other, a bibliographic information set 502 is obtained from the similar image set 501.
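A brute-force version of the similar image search and the lookup of the associated bibliographic information is sketched below, assuming Euclidean distance between feature vectors. A production system would more likely use an index for approximate nearest-neighbor search, and the function names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def similar_images(query: Sequence[float], features: Dict[int, Sequence[float]],
                   k: int) -> List[Tuple[int, float]]:
    """Return the k image IDs closest to the query feature, with their distances."""
    scored = sorted((math.dist(query, vec), image_id) for image_id, vec in features.items())
    return [(image_id, dist) for dist, image_id in scored[:k]]


def bibliographic_set(hits: List[Tuple[int, float]], biblio: Dict[int, str]) -> List[str]:
    """Collect the bibliographic information associated with the similar images."""
    return [biblio[image_id] for image_id, _ in hits]


features = {1: [0.0, 0.0], 2: [1.0, 1.0], 3: [5.0, 5.0]}
biblio = {1: "a red circle", 2: "a yellow star", 3: "a cat"}
hits = similar_images([0.9, 0.9], features, k=2)  # image 2 first, then image 1
texts = bibliographic_set(hits, biblio)
```

The linear scan makes the cost explicit: one distance computation per registered image, which is cheap compared with the sliding-window object detection performed later.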

(FIG. 5: Step S502: Procedure 1)
The metadata generation unit 108 extracts characteristic words contained in the set of bibliographic information. Structured data such as image classification codes is desirable as bibliographic information, but even when only a document such as explanatory text is attached, that document is likely to contain words expressing the meaning of the image. In this step, therefore, the metadata generation unit 108 decomposes each item of bibliographic information into atomic data (minimum constituent units), for example decomposing a document into words, and treats the results as metadata. In this way, the metadata of the query image 301 can be generated.

(FIG. 5: Step S502: Procedure 2)
The metadata generation unit 108 counts how often the metadata generated in procedure 1 appears in the bibliographic information, and uses this appearance frequency to calculate the score of each item of metadata. In the simplest case, the appearance frequency itself is used as the score and the metadata is sorted in descending order of score; alternatively, an evaluation index that weights the appearance frequency may be used as the score.
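Procedures 1 and 2 can be approximated with a regular-expression tokenizer and `collections.Counter`. The tokenizer is only a crude stand-in for morphological analysis (it splits lowercase ASCII words and does not filter stop words such as "the"), and the sample texts are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List


def decompose(text: str) -> List[str]:
    """Procedure 1 (simplified): split a document into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_metadata(biblio_texts: Iterable[str]) -> Counter:
    """Procedure 2: appearance frequency of each word across the bibliographic set."""
    counts: Counter = Counter()
    for text in biblio_texts:
        counts.update(decompose(text))
    return counts


counts = count_metadata(["A star in the night sky", "Star-shaped pentagon", "red star"])
top = counts.most_common(1)  # [("star", 3)]
```

Using the raw count as the score corresponds to the "simplest" option in the text; the two score calculation examples that follow replace it with weighted indices.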

(FIG. 5: Step S502: Example 1 of score calculation method)
TF-IDF (Term Frequency-Inverse Document Frequency) can be used as the metadata score. TF-IDF is an evaluation index obtained by multiplying the frequency tf(t) of metadata t by its inverse document frequency idf(t). With N the number of records in the database and df(t) the number of records in the entire database whose bibliographic information contains metadata t, the inverse document frequency idf(t) is expressed by Equation 1.
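Equation 1 itself is not reproduced in this text. Assuming the common form idf(t) = log(N / df(t)) with a natural logarithm (the original base and any smoothing term are unknown), the score can be sketched as follows; the function names are hypothetical.

```python
import math
from typing import Dict


def tf_idf(tf: float, df: int, n_records: int) -> float:
    """tf(t) * idf(t), assuming idf(t) = log(N / df(t))."""
    if df <= 0 or n_records <= 0:
        raise ValueError("df(t) and N must be positive")
    return tf * math.log(n_records / df)


def score_metadata(tf_counts: Dict[str, float], df_counts: Dict[str, int],
                   n_records: int) -> Dict[str, float]:
    """Score every candidate metadata word found in the similar-image bibliography."""
    return {t: tf_idf(c, df_counts[t], n_records) for t, c in tf_counts.items()}


# Hypothetical counts: "the" appears in every record, "star" in only 10 of 1000.
scores = score_metadata({"star": 3, "the": 5}, {"star": 10, "the": 1000}, n_records=1000)
```

The idf factor suppresses words that appear everywhere in the database: a word with df(t) = N gets a score of zero however often it occurs among the similar images.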

(FIG. 5: Step S502: Example 2 of score calculation method)
A probabilistic evaluation index may also be used as the metadata score. For example, to evaluate metadata t, let q(t) be the probability that the bibliographic information of an image drawn at random from the entire database contains metadata t, and let p(t) be the probability that the bibliographic information of an image drawn at random from the set of similar image search results contains metadata t. The measure kl(t) of the difference between the probability distributions p(t) and q(t), shown in Equations 2 to 4, can then be used as the metadata score.
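Equations 2 to 4 are not reproduced in this text. As one plausible reading, the sketch below treats "contains t" versus "does not contain t" as a Bernoulli distribution and uses the Kullback-Leibler divergence between Bernoulli(p(t)) and Bernoulli(q(t)). This specific form is an assumption, not the patent's stated formula.

```python
import math


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), natural logarithm."""
    q = min(max(q, eps), 1.0 - eps)  # keep q strictly inside (0, 1) to avoid log(0)
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def kl_score(hits_in_results: int, n_results: int, hits_in_db: int, n_db: int) -> float:
    p = hits_in_results / n_results  # p(t): share of similar-image results containing t
    q = hits_in_db / n_db            # q(t): share of the whole database containing t
    return bernoulli_kl(p, q)


# Hypothetical counts: t is in 5 of 10 similar images but only 100 of 1000 overall.
score = kl_score(5, 10, 100, 1000)
```

A word that is as common among the similar images as in the whole database scores zero. This divergence also rewards words that are unusually rare among the similar images, so a practical variant might keep only words with p(t) > q(t).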

FIG. 6 is a flowchart for explaining a processing procedure in which the metadata generation unit 108 generates metadata of the query image 301. Hereinafter, each step of FIG. 6 will be described.

(FIG. 6: Steps S601 to S602)
The metadata generation unit 108 calculates the image feature amount of the query image 301 (S601) and performs a similar image search using that feature amount as a search key (S602). The smaller the distance between the feature vectors of two images, the more similar the images are; the images sorted by distance are output as the search result.

(FIG. 6: Steps S603 to S607)
The metadata generation unit 108 performs steps S604 to S606 for each similar image obtained in step S602.

(FIG. 6: Steps S604 to S605)
The metadata generation unit 108 reads the bibliographic information associated with the similar image obtained in step S602 from the image database 105 (S604), decomposes it into atomic data, and uses the results as metadata (S605). For example, when the bibliographic information is a document, morphological analysis is performed to decompose it into words. For efficiency, the decomposition may be performed in advance when the document is registered in the image database 105.

(FIG. 6: Step S606)
The metadata generation unit 108 counts how often the metadata generated in step S605 appears in the bibliographic information read in step S604, and accumulates the frequency of each item of metadata over the loop of steps S603 to S607. To reflect image similarity in the metadata frequency, each count may be weighted according to the similarity before being added to the cumulative frequency.

(FIG. 6: Step S608)
The metadata generation unit 108 calculates a score for each item of metadata using the cumulative frequency obtained in steps S603 to S607. The score calculation method is as described with reference to FIG. 5.

(FIG. 6: Step S609)
The metadata generation unit 108 sorts the metadata in the order of the scores calculated in step S608, excludes metadata below the threshold value, and outputs the result.
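Steps S603 to S609 can be condensed into one function. This is a sketch under stated assumptions: the similar images from step S602 arrive as (distance, bibliographic text) pairs, the similarity weight 1 / (1 + distance) is a hypothetical choice, the regex tokenizer stands in for morphological analysis, and the score is simply the weighted cumulative frequency.

```python
import re
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    """Crude stand-in for morphological analysis (S605)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def generate_metadata(similar: Sequence[Tuple[float, str]],
                      threshold: float) -> List[Tuple[str, float]]:
    cumulative: Dict[str, float] = defaultdict(float)
    for distance, text in similar:          # loop S603 to S607
        weight = 1.0 / (1.0 + distance)     # hypothetical similarity weighting
        for word in tokenize(text):         # S605: decompose into atomic data
            cumulative[word] += weight      # S606: weighted cumulative frequency
    # S608: the simplest score is the weighted frequency itself.
    ranked = sorted(cumulative.items(), key=lambda kv: kv[1], reverse=True)
    # S609: keep the sorted metadata whose score reaches the threshold.
    return [(word, score) for word, score in ranked if score >= threshold]


metadata = generate_metadata([(0.0, "star pentagon"), (1.0, "star night")], threshold=0.8)
```

In this example "star" receives weight 1.0 from the closest image and 0.5 from the farther one, so it outranks "pentagon" (1.0), while "night" (0.5) falls below the threshold and is dropped.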

FIG. 7 is a diagram illustrating the object region detection method in step S407 of FIG. 4. In this method, an image of the object to be detected is used as a template, and the region of an image in which the object exists is found by detecting regions that match the template.

First, an image feature amount of a typical image (template) of an object to be detected is extracted and stored in the template database 704 in advance. The template image here corresponds to the query image 301. For example, when it is desired to detect a plurality of objects, the template database 704 can hold a plurality of templates (detection target images) corresponding to the respective objects. The template held in the template database 704 is reset every time the object to be detected changes.

When an input image 701 (an image in the image database 105) in which objects are to be detected is given, the object region detection unit 110 varies the position and size of the scanning window 702 to extract object candidate regions 703. Next, for every candidate region 703, the template whose feature vector is closest to the feature vector of the candidate region 703 is searched for among the templates in the template database 704. If the distance between the feature vectors of the found template and the candidate region 703 is equal to or less than a predetermined threshold, the candidate region 703 is determined to contain the template's object and is added to the detection result. The distance between the feature vectors of the nearest template and the candidate region 703 can be used as the reliability of the detection result.
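The nearest-template test for each candidate region can be sketched as below. It assumes feature vectors have already been extracted for each candidate region 703 and for each template in the template database 704, and uses Euclidean distance; `detect_objects` and the sample vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # [x1, y1, x2, y2]


def detect_objects(candidates: Sequence[Tuple[Box, Sequence[float]]],
                   templates: Dict[str, Sequence[float]],
                   threshold: float) -> List[Tuple[Box, str, float]]:
    """Keep candidates whose nearest template lies within the distance threshold."""
    results: List[Tuple[Box, str, float]] = []
    if not templates:
        return results
    for box, feature in candidates:
        # Nearest-neighbor search over all templates.
        name, dist = min(((n, math.dist(feature, t)) for n, t in templates.items()),
                         key=lambda nd: nd[1])
        if dist <= threshold:
            # The distance doubles as the reliability: smaller means a closer match.
            results.append((box, name, dist))
    return results


templates = {"star": [1.0, 0.0], "cat": [0.0, 1.0]}
candidates = [((0, 0, 10, 10), [0.9, 0.1]),
              ((5, 5, 15, 15), [0.5, 0.5]),
              ((2, 2, 4, 4), [0.0, 0.95])]
found = detect_objects(candidates, templates, threshold=0.2)
```

Holding several templates at once is what allows multiple objects to be detected in a single pass, as described for the template database 704.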

FIG. 8 is a flowchart for explaining processing in which the object region detection unit 110 detects an object. Hereinafter, each step of FIG. 8 will be described.

(FIG. 8: Step S800)
The object region detection unit 110 calculates the feature amount of each template and registers it in the template database. When detection is performed on multiple input images 701 using the same templates, this step needs to be performed only once.

(FIG. 8: Step S801)
The object area detection unit 110 extracts candidate areas 703 in the input image 701. The candidate area 703 is mechanically extracted by moving or resizing the scan window step by step.
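The step-by-step movement and resizing of the scanning window can be sketched as a generator of candidate rectangles. The square windows, the growth factor `scale`, and the stride `step_ratio` are hypothetical parameters; the boxes use the same [x1, y1, x2, y2] layout as the detection results.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def candidate_windows(img_w: int, img_h: int, min_size: int,
                      scale: float = 1.5, step_ratio: float = 0.5) -> List[Box]:
    """Enumerate square scanning windows over all positions and sizes."""
    if min_size <= 0 or scale <= 1.0 or step_ratio <= 0:
        raise ValueError("min_size and step_ratio must be positive and scale > 1")
    boxes: List[Box] = []
    size = min_size
    while size <= min(img_w, img_h):
        step = max(1, int(size * step_ratio))  # stride grows with the window size
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                boxes.append((x, y, x + size, y + size))
        size = max(size + 1, int(size * scale))  # always grow, even for tiny sizes
    return boxes


boxes = candidate_windows(4, 4, min_size=2, scale=2.0, step_ratio=1.0)
```

The number of windows grows quickly with image size, which is why per-image detection is slow and why narrowing down the image set beforehand pays off.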

(FIG. 8: Steps S802 to S806)
The object region detection unit 110 performs steps S803 to S805 for every candidate region 703.

(FIG. 8: Step S803)
The object region detection unit 110 calculates the reliability of the candidate region 703. As the reliability, for example, the distance between the feature amount of the template and that of the candidate region 703 can be used, as described with reference to FIG. 7.

(FIG. 8: Steps S804 to S805)
If the reliability of the candidate region 703 obtained in step S803 is equal to or less than the predetermined threshold, the process proceeds to step S805; otherwise, step S805 is skipped (S804). Because the reliability here is a feature distance, a smaller value indicates a closer match. The object region detection unit 110 adds each candidate region 703 whose reliability is equal to or less than the predetermined threshold to the detection result list (S805).

(FIG. 8: Step S807)
The object region detection unit 110 outputs the detection result list and ends this processing flow. Each detection result is output as a pair of coordinate information in the input image 701 (for example, [horizontal coordinate of the upper left corner of the rectangle, vertical coordinate of the upper left corner of the rectangle, horizontal coordinate of the lower right corner of the rectangle, vertical coordinate of the lower right corner of the rectangle]) and the reliability.

FIG. 9 is a diagram for explaining the processing sequence between the functional units in the processing in which the image analysis system 100 specifies the object region in the image. Hereinafter, each step of FIG. 9 will be described.

(FIG. 9: Steps S901 to S902)
The user inputs images to be stored in the image database 105, together with their associated documents, via the input device 102 (S901). Each pair of image and document is sent to the image database 105 via the image analysis device 106. The image database 105 extracts a feature amount from each image received from the image analysis device 106 and registers it in association with the bibliographic information obtained from the document (S902). Steps S901 to S902 correspond to step S401 in FIG. 4.

(FIG. 9: Steps S903 to S906)
The user inputs an image of the object to be detected (query image) via the input device 102 (S903). The image analysis apparatus 106 requests the image database 105 to perform a similar image search using the query image as the search key (S904). The image database 105 extracts an image feature amount from the query image, uses it to search for images similar to the query image, and returns the similar images and their bibliographic information to the image analysis device 106 (S905). The image analysis device 106 generates metadata of the query image from the bibliographic information received from the image database 105 and calculates a score for each metadata item (S906).

(FIG. 9: Steps S907 to S908)
The image analysis device 106 presents the metadata generated in step S906 and their scores to the user via the display device 103 or the data storage device 104 (S907). Referring to the metadata and their scores, the user selects the metadata to be used for narrowing down the images to be searched (S908). Alternatively, step S908 may be omitted and the image analysis apparatus 106 may select metadata automatically, for example in descending order of score.
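
Scoring and automatic selection might look like the sketch below. It assumes, following the claims, that a metadata item's score is derived from how often it appears in the bibliographic information of the similar images; the exact formula (here, the fraction of similar images containing the term) and the function names are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def score_metadata(similar_bibliographic_info: Iterable[Set[str]]) -> Dict[str, float]:
    """Score each term by the fraction of similar images whose bibliographic info contains it."""
    bibs = list(similar_bibliographic_info)
    if not bibs:
        return {}
    # Each image is a set of terms, so it counts a given term at most once.
    counts = Counter(term for bib in bibs for term in bib)
    return {term: count / len(bibs) for term, count in counts.items()}


def select_top_metadata(scores: Dict[str, float], k: int) -> List[str]:
    """Automatic selection when S908 is omitted: the k highest-scoring terms."""
    # Sort by descending score; break ties alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]
```

For example, with four similar images of which three mention "star", "star" scores 0.75 and is selected first.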

(FIG. 9: Steps S909 to S910)
Using the metadata selected by the user in step S908 as a search key, the image analysis apparatus 106 requests the image database 105 to search for images whose bibliographic information matches the search key (S909). The image database 105 searches for bibliographic information matching the search query and returns the associated images to the image analysis device 106 (S910).
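
The narrowing step (S909 to S910) reduces to a filter over bibliographic information. In the sketch below, the image database is modeled as a hypothetical dictionary from image IDs to sets of bibliographic terms. The text does not say whether several selected metadata items are combined with AND or OR, so both are offered through an assumed `match_all` flag.

```python
from typing import Dict, List, Set


def narrow_down(database: Dict[str, Set[str]], keys: Set[str], match_all: bool = False) -> List[str]:
    """Return IDs of images whose bibliographic terms match the selected metadata."""
    hits = []
    for image_id, terms in database.items():
        # AND: every key must appear; OR: at least one key must appear.
        matched = keys <= terms if match_all else bool(keys & terms)
        if matched:
            hits.append(image_id)
    return sorted(hits)
```

Only the images returned here are passed to object detection (S911), which is where the speed-up of the first embodiment comes from.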

(FIG. 9: Step S911)
The image analysis apparatus 106 performs processing to detect the object included in the query image on each image obtained in step S910, and specifies regions similar to the query image. Each detection result consists of the coordinates of the rectangular area of the object in the image (for example, [horizontal coordinate of the upper-left corner of the rectangle, vertical coordinate of the upper-left corner, horizontal coordinate of the lower-right corner, vertical coordinate of the lower-right corner]) and a reliability value representing its "object-likeness". The detection results are output via the data output unit 112.

FIG. 10 is a diagram illustrating a configuration example of an operation screen used for acquiring an image including a specified object from the image database 105. This screen can be provided on the display device 103. The user operates the cursor 1006 displayed on the screen using the input device 102 to send operation information to the operation information input unit 111.

The operation screen of FIG. 10 includes a query image input area 1001, a similar image search button 1002, a metadata generation button 1003, a similar image display area 1004, a metadata display area 1007, a detection target number display area 1008, an expected processing time display area 1009, a detection start / interrupt button 1010, and a detection result display area 1011.

First, the user inputs a query image stored in the image / document storage device 101 into the query image input area 1001. As an input method, for example, a dialog for specifying a file path in the file system or an intuitive drag-and-drop operation may be used.

When the user clicks the similar image search button 1002, the image analysis device 106 acquires images similar to the query image from the image database 105 and displays them in the similar image display area 1004. The image analysis apparatus 106 generates metadata of the query image using the bibliographic information of the similar images displayed in the similar image display area 1004. The metadata may be generated using all the similar images, or the user may review the similar images and specify which of them to use, for example with the check boxes 1005. In the example illustrated in FIG. 10, the rightmost similar image is unchecked, which specifies that it is not to be used when generating metadata.

When the metadata generation button 1003 is clicked, the metadata generation unit 108 generates metadata using the bibliographic information attached to the selected similar images and displays it in the metadata display area 1007. For each metadata item, the metadata display area 1007 displays the number of images whose bibliographic information contains that item. If bibliographic information can be searched sufficiently fast, the number of images matched when the bibliographic information is searched with each metadata item alone may also be displayed.

The user selects the metadata to be used for narrowing down the images subject to object detection, taking into account the metadata itself, its score, the number of matching images, and so on. For example, the user selects metadata using the check boxes 1012. Each time a check box 1012 is clicked, the image analysis device 106 searches the bibliographic information and displays, in the detection target number display area 1008, the number of images whose bibliographic information contains the selected metadata. The expected processing time for performing object detection on that number of images is also displayed in the expected processing time display area 1009; this time can be estimated from the number of images to be processed. This helps the user select metadata effectively.
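
Since the text only states that processing time can be estimated from the number of target images, a linear model is the simplest reading. The sketch below assumes a fixed, hypothetical per-image detection cost `seconds_per_image`, and it assumes that an image counts as a target when its bibliographic information contains any of the selected terms.

```python
from typing import Dict, Set, Tuple


def estimate_detection_load(
    bibliographic_info: Dict[str, Set[str]],
    selected_metadata: Set[str],
    seconds_per_image: float,
) -> Tuple[int, float]:
    """Return (number of detection targets, expected processing time in seconds)."""
    # Assumption: an image is a target if it shares at least one term with the selection.
    count = sum(1 for terms in bibliographic_info.values() if terms & selected_metadata)
    return count, count * seconds_per_image
```

Re-running this on every check-box click is what keeps areas 1008 and 1009 in sync with the selection; in practice the per-image cost could be measured from earlier runs.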

When the detection start / interrupt button 1010 is clicked, the analysis target determination unit 109 uses the metadata selected by the above operations to acquire the image set subject to object detection, and the object region detection unit 110 performs object detection on that image set. Because the detection processing performed by the object region detection unit 110 is independent for each image, results can be displayed in the detection result display area 1011 one image at a time as processing completes, and processing can be started or interrupted whenever the detection start / interrupt button 1010 is clicked.
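
Because each image is processed independently, incremental display and interruption fit naturally into a generator that checks a stop flag before each image. In this sketch the `threading.Event` stands in for the interrupt button; `detect_incrementally` and the `detect` callback are hypothetical names.

```python
import threading
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

Result = TypeVar("Result")


def detect_incrementally(
    image_ids: Iterable[str],
    detect: Callable[[str], Result],
    stop: threading.Event,
) -> Iterator[Tuple[str, Result]]:
    """Yield (image_id, result) per image; stop before the next image once `stop` is set."""
    for image_id in image_ids:
        if stop.is_set():  # interrupt requested, e.g. via the start / interrupt button
            return
        yield image_id, detect(image_id)  # each image is processed independently
```

A caller can render each yielded result as soon as it arrives. Resuming after an interruption would mean calling the generator again on the images not yet processed.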

<Embodiment 1: Summary>
As described above, the image analysis system 100 according to the first embodiment performs object detection only on image data whose bibliographic information contains the metadata of the query image. This effectively narrows down the images subject to detection from a large image collection, so that images containing the object designated by the user can be found quickly.

The image analysis system 100 according to the first embodiment can be used, for example, in graphic trademark search or examination, to determine whether a graphic to be newly registered is already used in a registered graphic trademark. In this case, an image classification code or an explanatory note can be used as the bibliographic information needed for generating metadata.

The image analysis system 100 according to the first embodiment can also be applied to an auction site or a shopping site, making it possible to search quickly for products containing a pattern or mark designated by the user. In this case, the product title or the seller's comments can be used as the bibliographic information of the image.

The image analysis system 100 according to the first embodiment can also be applied to video content, making it possible to find scenes in which a celebrity or landmark appears, together with its position within the frame image. In this case, closed captions or speech converted to text can be used as the bibliographic information of the image.

<Embodiment 2>
In the image analysis system 100 described in the first embodiment, the analysis target determination unit 109 narrows down the images subject to detection by searching bibliographic information. Consequently, an image that actually contains the object designated by the user but has insufficient bibliographic information is not subjected to detection processing and does not appear in the analysis results. The following describes a method for reducing such omissions from the detection targets by expanding the bibliographic information. Since the other configurations are substantially the same as in the first embodiment, the description focuses on the differences.

FIG. 11 is a diagram for explaining an example of extending bibliographic information. For comparison, FIG. 11A is a conceptual diagram of the search when bibliographic information is not expanded, and FIG. 11B is a conceptual diagram of the search when bibliographic information is expanded as in the second embodiment.

As illustrated in FIG. 11A, to find images containing the object included in the query image 301, the image analysis system 100 described in the first embodiment searches the bibliographic information using the metadata "star". As a result, an image whose bibliographic information includes "star", such as the image 1101, is subjected to object detection processing, but an image whose bibliographic information does not include "star", such as the image 1102, is not a detection target. Therefore, although the image 1102 actually contains a region similar to the query image 301, it is not detected.

Therefore, in the second embodiment, metadata is also generated for the images stored in the image database 105, as shown in FIG. The metadata may be generated by the same method used for the metadata of the query image 301. The newly generated metadata is registered in the image database 105 as additional bibliographic information, and when the image analysis device 106 narrows down the images subject to object detection, this additional bibliographic information is also searched. As a result, an image such as the image 1103, whose original bibliographic information does not include "star", can also become a detection target.

In general, an image showing multiple objects has more variation in "look" than an image showing a single object, because the layout of the objects varies; it is therefore less likely to be found as an image similar to the query image. Conversely, if an image containing multiple objects is nevertheless found with high similarity, transferring the bibliographic information of the similar image to it is considered to cause little loss of information.

FIG. 12 is a flowchart for explaining the procedure for extending bibliographic information. The metadata generation unit 108 performs this processing for every image registered in the image database 105, repeating steps S1201 to S1204. The processing may be performed, for example, during periods of low system load, or immediately after an image is first registered in the image database 105. Hereinafter, each step of FIG. 12 will be described.

(FIG. 12: Step S1202)
The metadata generation unit 108 generates metadata for each image in the image database 105 using the existing bibliographic information held in the image database 105. The metadata is generated by the same procedure as in FIG. 6; however, the procedure may be modified, for example, by making the similarity threshold stricter than in FIG. 6, or by using an image feature amount that does not change even when the object layout changes.

(FIG. 12: Step S1203)
The metadata generation unit 108 registers the metadata generated in step S1202 in the image database 105 as additional bibliographic information.
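
A minimal sketch of the FIG. 12 loop follows. It makes several assumptions not present in the text: `find_similar` is a hypothetical stand-in for the (stricter) similar image search, and a term is adopted only if at least `min_votes` similar images carry it, which is one possible reading of "the same procedure as in FIG. 6". Additions are returned separately rather than written back inside the loop, so that every image is expanded from the original bibliographic information only.

```python
from collections import Counter
from typing import Callable, Dict, List, Set


def expand_bibliographic_info(
    bibliographic_info: Dict[str, Set[str]],
    find_similar: Callable[[str], List[str]],
    min_votes: int = 2,
) -> Dict[str, Set[str]]:
    """Return additional bibliographic terms for every registered image."""
    additional: Dict[str, Set[str]] = {}
    for image_id, own_terms in bibliographic_info.items():  # S1201-S1204: all images
        neighbours = [n for n in find_similar(image_id) if n != image_id]
        # S1202: count how many similar images carry each term (existing info only).
        votes = Counter(term for n in neighbours for term in bibliographic_info[n])
        new_terms = {term for term, c in votes.items() if c >= min_votes} - own_terms
        additional[image_id] = new_terms  # S1203: register as additional bibliographic info
    return additional
```

The result would then be stored alongside, not in place of, the original bibliographic information, so that the original and additional information can be searched separately.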

FIG. 13 shows Venn diagrams of the analysis targets, illustrating how bibliographic information expansion reduces detection omissions. FIG. 13A is the Venn diagram when only existing bibliographic information is used, and FIG. 13B is the Venn diagram when expanded bibliographic information is used.

In FIG. 13A, the set 1301 represents all images registered in the image database 105. When searching for the object designated by the user without using the image analysis system 100, the entire set 1301 is subject to image analysis.

The set 1302 is the set of images containing a region of the "star shape" designated by the user; ideally, the image analysis system 100 should output exactly this set.

The set 1303 is an image set obtained as a result of the bibliographic information search performed by the image analysis system 100 using the automatically generated metadata “star” as a query. The image analysis system 100 sets this set as an object detection processing target.

The set 1304 is the set of images that contain a "star-shaped figure" but whose bibliographic information does not include "star"; these images are therefore not subjected to detection processing and constitute detection omissions.

The set 1305 is the set of images that are detection targets and contain a "star-shaped figure", and that can therefore be detected. Whether they are actually detected, however, depends on the recognition performance of the object detector itself. A method for improving the performance of the object detector is described in the third embodiment with reference to FIG. 15.

The set 1306 is the set of images whose bibliographic information includes "star" but that do not actually contain a region similar to the "star-shaped figure" designated by the user; detection processing on these images is therefore unnecessary.

As shown in FIG. 13B, when bibliographic information is expanded, the set of images whose bibliographic information includes "star" grows. Because the set is expanded based on the results of the similar image search, the added region has a high proportion of images containing a "star-shaped figure". As a result, object detection takes longer, but detection omissions are reduced.

FIG. 14 is a chart showing the relationship between image analysis processing time and coverage. The horizontal axis represents processing time, and the vertical axis represents coverage, that is, the percentage of the set 1302 of FIG. 13 that has been processed. The processing time is normalized so that processing all images to be searched, that is, the set 1301 of FIG. 13, takes 100.

In FIG. 14, it is assumed that the set 1305 accounts for 60% of the set 1302 and the set 1304 for 40%. With the bibliographic information extended, these proportions are assumed to become 80% and 20%, respectively. It is further assumed that performing object detection on the set 1305 takes one tenth of the processing time needed to perform object detection on all images.

A straight line 1401 shows how coverage changes when the entire image set (set 1301) is analyzed. When images are extracted from the image database 105 at random and processed, coverage increases linearly.

A broken line 1402 shows how coverage changes when the analysis targets are narrowed down using metadata by the method described in the first embodiment. The segment up to point 1404 corresponds to detection processing on the narrowed-down images, and the segment after point 1404 to applying detection processing to the remaining images. At point 1404, 60% coverage is achieved in one tenth of the processing time of the straight line 1401.

A broken line 1403 shows how coverage changes when the analysis targets are narrowed down using the expanded bibliographic information by the method described in the second embodiment. The segment up to point 1405 corresponds to detection processing on the narrowed-down images, and the segment after point 1405 to applying detection processing to the remaining images. Because there are more detection targets, reaching point 1405 takes longer, but the coverage achieved at that point is higher.
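
The broken lines of FIG. 14 can be reproduced with a two-phase piecewise-linear model. This is a sketch under stated assumptions: coverage is taken to grow linearly within the narrowed phase and then linearly over the remaining images until it reaches 100% at time 100. The straight line 1401 is simply coverage = t. The 60% coverage at time 10 comes from the text, but the time of 15 used for point 1405 is a made-up value, since the text only says that reaching it takes longer.

```python
def coverage(t: float, narrowed_time: float, narrowed_coverage: float) -> float:
    """Coverage (%) after processing time t, on the FIG. 14 scale where 100 = all images.

    Assumes linear growth within the narrowed phase and, afterwards, linear growth
    over the remaining images up to 100% at t = 100.
    """
    if not 0 < narrowed_time < 100:
        raise ValueError("narrowed_time must be strictly between 0 and 100")
    if not 0 <= t <= 100:
        raise ValueError("t must be between 0 and 100")
    if t <= narrowed_time:
        return narrowed_coverage * t / narrowed_time
    remaining = (t - narrowed_time) / (100 - narrowed_time)
    return narrowed_coverage + (100 - narrowed_coverage) * remaining


# Broken line 1402, point 1404: 60% coverage after one tenth of the time.
print(coverage(10, narrowed_time=10, narrowed_coverage=60))  # 60.0
# Broken line 1403, point 1405: 80% coverage at a hypothetical time of 15.
print(coverage(15, narrowed_time=15, narrowed_coverage=80))  # 80.0
```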

As shown in FIG. 14, processing time and coverage are in a trade-off relationship, so whether to expand the bibliographic information should be decided according to the application. When examining a graphic trademark, for example, it is sufficient to find at least one similar image, so a more responsive system can be obtained by performing detection processing after sufficiently narrowing down the processing targets. If higher coverage is desired, processing may first be performed using only the original bibliographic information, and the additional bibliographic information may then be used as needed.

<Embodiment 2: Summary>
As described above, the image analysis system 100 according to the second embodiment generates metadata for the images stored in the image database 105, adds it to the image database 105 as new bibliographic information, and then performs the same processing as in the first embodiment. As a result, images that would not be detected using only the existing bibliographic information can also be processed.

<Embodiment 3>
In the third embodiment of the present invention, a method for improving the accuracy of object detection by using intermediate data from the processing of the image analysis system 100 is described. This method uses a plurality of templates with the object detection method described in FIG. 7. Since the other configurations are the same as in the first and second embodiments, the following description focuses on using, as templates, the similar images found when generating metadata for a query image.

FIG. 15 is a diagram for explaining a technique for increasing the accuracy of object detection by expanding the templates with the images found when searching for images similar to a query image. Hereinafter, the template expansion in the third embodiment will be described with reference to FIG.

In the object detection method described in FIG. 7, the object region is specified by examining the similarity between partial regions of the image and the template. Therefore, when only the query image 301 is used as the template, a star shape whose appearance differs greatly, such as the image 1505, cannot be detected. Nor can a "sun figure" like the image 1506 or a "planet figure" like the image 1507 be detected, even though they share the concept of "star".

Therefore, the image analysis apparatus 106 according to the third embodiment uses intermediate data generated during processing as additional templates. Specifically, the image group 1501, obtained as images similar to the query image 301 when generating the metadata of the query image 301, is used as templates in the object detection process. That is, not only the object designated by the user but also objects similar to it become detection targets. In this way, the third embodiment expands the templates of the object to be detected using the similar images obtained by the similar image search performed for metadata generation.

The image analysis apparatus 106 searches for images similar to the query image 301 by the method described in FIG. 6 (S601 to S602). Although the appearance of the resulting image group 1501 does not exactly match the query image 301, it is close to the object specified by the user and is therefore considered suitable as templates for the subsequent object detection. These similar images are thus registered in the template database 1504. The template database 1504 temporarily holds the templates used for object detection and is reset every time the query image 301 changes.

The image analysis apparatus 106 generates metadata of the query image 301 and the image group 1501 according to the method described in FIG. Assume, for example, that the metadata "star" is generated as a result.

The image analysis device 106 then searches for bibliographic information that matches the metadata "star". As a result, the image groups 1505 to 1507, including the image 1503, corresponding to the concept of "star" are obtained. As described with reference to FIG. 13, the results of a bibliographic information search may include images containing multiple objects or a lot of noise, so the templates may also be selected through interactive user operation, for example on an operation screen such as that shown in FIG. For example, the images obtained by the bibliographic information search can first be displayed on the operation screen, and the user can select from them the images to be used as templates in the object detection process.

The image analysis apparatus 106 performs object detection on the image groups 1505 to 1507 using the plurality of templates stored in the template database 1504. For example, the image group 1501 is used as templates in addition to the query image 301, and if the user designates the image 1503 as a template on the operation screen, it is used as well. This makes it possible to detect star-shaped regions that do not necessarily look similar to the query image 301 (for example, sun-type or Saturn-type figures such as the image 1503).
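
The sketch below illustrates a template store that is reset when the query changes, together with detection against several templates. The text does not specify how the per-template results are combined. Here a candidate's reliability is assumed to be its distance to the closest template (Euclidean). `TemplateDatabase` and `detect_with_templates` are hypothetical names, and templates are represented by feature vectors rather than images.

```python
import math
from typing import List, Optional, Sequence, Tuple

BBox = Tuple[int, int, int, int]
Vector = Sequence[float]


class TemplateDatabase:
    """Temporary template store, reset whenever the query image changes."""

    def __init__(self) -> None:
        self._query_id: Optional[str] = None
        self._templates: List[List[float]] = []

    def set_query(self, query_id: str, query_feature: Vector) -> None:
        if query_id != self._query_id:  # new query: discard the old templates
            self._query_id = query_id
            self._templates = [list(query_feature)]

    def add(self, feature: Vector) -> None:
        """Add an extra template, e.g. a similar image or a user-selected image."""
        self._templates.append(list(feature))

    def templates(self) -> List[List[float]]:
        return list(self._templates)


def detect_with_templates(
    candidates: List[Tuple[BBox, Vector]],
    templates: List[List[float]],
    threshold: float,
) -> List[Tuple[BBox, float]]:
    """Keep candidates whose distance to the closest template is <= threshold."""
    hits = []
    for bbox, feature in candidates:
        distance = min(math.dist(feature, t) for t in templates)  # best-matching template
        if distance <= threshold:
            hits.append((bbox, distance))
    return hits
```

Taking the minimum distance means a region needs to resemble only one template, which is what allows sun-type or Saturn-type figures to be found once such images are added as templates.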

<Embodiment 3: Summary>
As described above, the image analysis apparatus 106 according to the third embodiment uses the similar images obtained when generating the metadata of the query image 301, or images obtained when searching the bibliographic information, as additional templates for object detection. This makes it possible to detect objects that look different but share a common concept.

<Embodiment 4>
In the fourth embodiment of the present invention, a configuration example in which the image analysis system 100 is incorporated into a content cloud system is described. An overview of the content cloud system is given first, followed by a description of how the image analysis system 100 is incorporated into it as an analysis module. The configuration of the image analysis system 100 is the same as in the first to third embodiments.

FIG. 16 is a schematic diagram of a content cloud system 1600 according to the fourth embodiment. The content cloud system 1600 includes an Extract Transform Load (ETL) module 1603, a content storage 1604, a search engine 1605, a metadata server 1606, and a multimedia server 1607. The content cloud system runs on a general-purpose computer with one or more CPUs, memories, and storage devices, and is itself composed of various modules. Each module may also be executed on an independent computer; in that case, each storage is connected to the modules via a network or the like, and the system is realized by distributed processing with data communication over that network.

The application program 1608 sends a request to the content cloud system 1600 via a network or the like, and the content cloud system 1600 sends information corresponding to the request to the application 1608.

The content cloud system 1600 accepts as input data 1601 in any format, such as video, image, document, or audio data. The data 1601 may be, for example, a graphic trademark and its gazette document, website images and HTML documents, or video data with closed captions or audio, and may be structured or unstructured. Data input to the content cloud system 1600 is temporarily stored in the storage 1602.

The ETL 1603 monitors the storage 1602. When data 1601 is stored in the storage 1602, the ETL 1603 runs the information extraction processing module 16031 appropriate to that data, and archives the extracted information (metadata) in the content storage 1604.

The information extraction processing module 16031 includes, for example, a text indexing module and an image recognition module. Examples of metadata include time, an N-gram index, image recognition results (object names and region coordinates in the image), image feature amounts and related words, and speech recognition results. Any program that extracts some information (metadata) can be used as the information extraction processing module 16031, and known techniques can be adopted, so its description is omitted here. If necessary, the metadata may be reduced in size with a data compression algorithm. After the ETL 1603 extracts the information, the data file name, data registration date, original data type, metadata text information, and the like may also be registered in a relational database (RDB).
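
The ETL flow can be pictured as a polling pass over the storage 1602. This sketch is illustrative only: it polls instead of using event-driven monitoring, models the storages as plain dictionaries, and selects hypothetical extractor functions by data type; none of these names come from the text. In the configuration described below, the image analysis system 100 would be one such extractor.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Extractor = Callable[[bytes], Dict[str, Any]]


def run_etl_pass(
    storage: Dict[str, Tuple[str, bytes]],
    extractors: Dict[str, List[Extractor]],
    content_storage: Dict[str, Dict[str, Any]],
    processed: Set[str],
) -> List[str]:
    """One polling pass: extract metadata from newly stored data and archive it."""
    archived = []
    for data_id, (data_type, payload) in storage.items():
        if data_id in processed:  # already handled in an earlier pass
            continue
        metadata: Dict[str, Any] = {"data_type": data_type}
        for extract in extractors.get(data_type, []):  # modules chosen by data type
            metadata.update(extract(payload))
        # Keep the unprocessed data together with the extracted metadata.
        content_storage[data_id] = {"raw": payload, "metadata": metadata}
        processed.add(data_id)
        archived.append(data_id)
    return archived
```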

The content storage 1604 stores the information extracted by the ETL 1603 and the unprocessed data 1601 temporarily stored in the storage 1602.

When the application program 1608 makes a request, such as a text search, the search engine 1605 performs the search based on the index created by the ETL 1603 and transmits the results to the application program 1608. Known techniques can be applied to the algorithm of the search engine 1605. The search engine 1605 may include modules that search not only text but also data such as images and audio.

The metadata server 1606 manages the metadata stored in the RDB. For example, it is assumed that the data file name, data registration date, original data type, metadata text information, etc. extracted by the ETL 1603 are registered in the RDB. When there is a request from the application 1608, the metadata server 1606 transmits information in the RDB to the application 1608 in accordance with the request.

The multimedia server 1607 associates the pieces of metadata extracted by the ETL 1603 with one another and stores them in graph form. As an example of such association mapping, for the speech recognition result "apple" stored in the content storage 1604, the correspondences with the original audio file, image data, related words, and so on can be expressed as a network. When the application 1608 makes a request, the multimedia server 1607 transmits the corresponding meta information to the application 1608. For example, for a request for "apple", it provides meta information linked in the network graph, such as images containing an apple, the average market price, and song titles by an artist, based on the constructed graph structure.

In the content cloud system 1600, the image analysis system 100 functions as the information extraction processing module 16031 in the ETL 1603. The image / document storage device 101 and the data storage device 104 in FIG. 1 correspond respectively to the storage 1602 and the content storage 1604 in FIG. The image analysis device 106 corresponds to the information extraction processing module 16031. When a plurality of information extraction processing modules 16031 are incorporated in the ETL 1603, they may share the resources of one computer, or each module may run on an independent computer. The image database 105 in FIG. 1 corresponds to the dictionary data 16032 that the ETL 1603 requires when extracting information.

<Embodiment 4: Summary>
As described above, the image analysis system 100 according to the present invention can be applied as a component of the content cloud system 1600. By generating metadata that can be used in common across different media data, the content cloud system 1600 can integrate information across media, which is expected to provide users with information of higher added value.

The present invention is not limited to the above-described embodiments and includes various modifications. The above embodiments have been described in detail to make the present invention easy to understand, and the invention is not necessarily limited to configurations having all the components described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of a given embodiment. Further, for part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

Some or all of the above components, functions, processing units, processing means, and the like may be realized in hardware, for example by designing them as integrated circuits. The above components, functions, and the like may also be realized in software, by a processor interpreting and executing programs that implement the respective functions. Information such as programs, tables, and files for realizing each function can be stored in a recording device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD.

100: image analysis system, 101: image / document input device, 102: input device, 103: display device, 104: data storage device, 105: image database, 106: image analysis device, 107: image / document input unit, 108: metadata generation unit, 109: analysis target determination unit, 110: object region detection unit, 111: operation information input unit, 112: data output unit, 1600: content cloud system, 1602: storage, 1603: ETL module, 1604: content storage, 1605: search engine, 1606: metadata server, 1607: multimedia server, 1608: application program.

Claims

An image analysis apparatus comprising:
an image input unit that receives query image data including an image of an object to be detected;
a metadata generation unit that generates metadata of the query image data using an image database that holds image data in association with bibliographic information;
an analysis target determining unit that extracts, from the image data held in the image database, one or more pieces of image data whose bibliographic information matches the metadata;
an object region detection unit that detects a region including an image of the object in the one or more pieces of image data extracted by the analysis target determining unit; and
an output unit that outputs a result detected by the object region detection unit.
The image analysis apparatus according to claim 1, wherein the metadata generation unit searches the image data stored in the image database for data similar to the query image data, and generates the metadata using the bibliographic information of the image data obtained as a result.
The image analysis apparatus according to claim 2, wherein
the metadata generation unit calculates a score of the metadata using the frequency with which the metadata appears in the bibliographic information of the image data obtained as a result of the search; and
the analysis target determining unit uses the score to determine the metadata to be used as a search key when extracting image data whose bibliographic information matches.
The image analysis apparatus according to claim 3, wherein the analysis target determining unit extracts the image data associated with the bibliographic information matching the metadata, using as search keys the metadata within a predetermined range in descending order of score.
The image analysis apparatus according to claim 3, wherein the analysis target determining unit receives a metadata designation specifying which of the metadata is to be used to extract image data whose bibliographic information matches, and extracts the image data associated with the bibliographic information matching the designated metadata.
The image analysis apparatus according to claim 5, further comprising:
a display unit that displays the number of pieces of the image data on which the object region detection unit is to detect a region including an image of the object, and the detection processing time therefor, wherein
the analysis target determining unit recalculates the number and the detection processing time each time the metadata designation is received, and reflects the result of the recalculation on the display unit.
The metadata generation unit
Receiving a similar-image designation that designates, from among the image data obtained as a result of the search, image data to be used together with the query image data in generating the metadata;
The image analysis apparatus according to claim 2, wherein the image data stored in the image database are searched for data similar to the query image data and for data similar to the image data designated by the similar-image designation, and the metadata is generated using the bibliographic information of the image data obtained as a result of the search.
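The expanded search of claim 7 can be sketched as the union of two similarity searches: one for the query and one for each image the user designates from the first result list. The sketch assumes each stored image is represented by a feature vector and uses brute-force Euclidean nearest-neighbour search; `knn` and `expanded_search` are hypothetical names, and a production system would likely use an index rather than a linear scan.

```python
import math


def knn(query, database, k):
    """Brute-force k-nearest-neighbour search by Euclidean distance."""
    return sorted(database, key=lambda img: (math.dist(query, database[img]), img))[:k]


def expanded_search(query_vec, designated_ids, database, k):
    """Union of the hits for the query and for each designated similar image,
    keeping first-seen order and dropping duplicates."""
    results = knn(query_vec, database, k)
    for img in designated_ids:
        for hit in knn(database[img], database, k):
            if hit not in results:
                results.append(hit)
    return results


db = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 2.0), "e": (0.0, 3.0)}
first = knn((0.0, 0.9), db, k=2)  # ['a', 'c']
# The user designates "c" from the first result list as an additional example.
print(expanded_search((0.0, 0.9), ["c"], db, k=2))  # ['a', 'c', 'e']
```

Image "e" is only reached through the designated image, so its bibliographic information now contributes to the metadata generated for the query.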
The object region detection unit
Calculating an inter-vector distance between a feature vector of a partial region of the image data and a feature vector of the query image data;
The image analysis apparatus according to claim 1, wherein whether the object included in the query image data is included in the partial region is determined based on whether the inter-vector distance is within a predetermined range.
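The distance test of claim 8 can be sketched as a sliding-window search. The claims do not specify the feature extractor, the window size, or the distance measure; this sketch assumes a grayscale image stored as nested lists, uses the window's raw pixel values as a toy stand-in for a real feature vector, and applies Euclidean distance with a hypothetical `threshold` as the "predetermined range".

```python
import math


def patch_feature(image, top, left, h, w):
    """Toy feature vector: the window's raw pixel values, flattened.
    A real system would use a learned or hand-crafted descriptor instead."""
    return [image[r][c] for r in range(top, top + h) for c in range(left, left + w)]


def detect(image, query_feat, h, w, threshold, stride=1):
    """Slide an h x w window over the image and report windows whose feature
    lies within `threshold` (Euclidean) of the query feature."""
    hits = []
    for top in range(0, len(image) - h + 1, stride):
        for left in range(0, len(image[0]) - w + 1, stride):
            d = math.dist(patch_feature(image, top, left, h, w), query_feat)
            if d <= threshold:
                hits.append((top, left, d))
    return hits


img = [[0, 0, 0],
       [0, 9, 9],
       [0, 9, 9]]
print(detect(img, [9, 9, 9, 9], h=2, w=2, threshold=0.5))  # [(1, 1, 0.0)]
```

Unlike a trained cascade of discriminators, this comparison needs only one query example, which is why the detection target can be supplied at query time.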
The output unit
The image analysis apparatus according to claim 1, wherein the number of the objects detected in the image data by the object region detection unit is output together with the result detected by the object region detection unit.
The output unit
The image analysis apparatus according to claim 1, wherein the detection reliability of the object detected in the image data by the object region detection unit is output together with a result detected by the object region detection unit.
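The outputs of claims 9 and 10 can be sketched as a small result record. The claims state that a count and a detection reliability are output but do not define the reliability; this sketch assumes detections arrive as `(top, left, distance)` tuples and maps distance to reliability with the hypothetical rule `1 - distance / threshold`, so an exact match scores 1.0 and a match at the threshold scores 0.0.

```python
def summarize(hits, threshold):
    """Package detections for output: the object count (claim 9) plus a
    per-region reliability (claim 10). Mapping distance d to 1 - d/threshold
    is a hypothetical choice; the claims do not define the reliability."""
    detections = [
        {"region": (top, left), "reliability": 1.0 - d / threshold}
        for top, left, d in hits
    ]
    return {"count": len(detections), "detections": detections}


print(summarize([(1, 1, 0.0), (0, 2, 2.5)], threshold=10.0))
```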
The metadata generation unit
Generating metadata for the image data held in the image database using other image data held in the image database, and adding the generated metadata to the bibliographic information;
The analysis target determining unit
The image analysis apparatus according to claim 1, wherein the bibliographic information to which the metadata has been added is used to extract one or more of the image data held in the image database whose bibliographic information matches the metadata.
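Claim 11 enriches the stored bibliographic information in advance, so images with sparse or missing tags can still be reached by metadata search. One way to sketch this, assuming feature vectors per image and tag-list bibliographic information, is nearest-neighbour tag voting among the other stored images. The voting rule, `k`, and `min_votes` are hypothetical choices, not taken from the claims.

```python
import math
from collections import Counter


def enrich(database, bibliographic, k, min_votes):
    """Add to each image's tag list any tag carried by at least `min_votes`
    of its k nearest *other* images. Votes use the original tags only, so
    newly added tags do not propagate further in the same pass."""
    enriched = {img: list(tags) for img, tags in bibliographic.items()}
    for img, vec in database.items():
        others = sorted(
            (o for o in database if o != img),
            key=lambda o: (math.dist(vec, database[o]), o),
        )[:k]
        votes = Counter(t for o in others for t in set(bibliographic.get(o, [])))
        current = enriched.setdefault(img, [])
        for tag, n in sorted(votes.items()):
            if n >= min_votes and tag not in current:
                current.append(tag)
    return enriched


db = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0), "d": (0.5, 0.5)}
bib = {"a": ["car"], "b": ["car", "red"], "c": ["car"], "d": []}
print(enrich(db, bib, k=2, min_votes=2)["d"])  # ['car']
```

The untagged image "d" inherits "car" from its two nearest neighbours, while "red" (one vote) is not propagated.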
The object region detection unit
Among the one or more image data extracted by the analysis target determining unit,
The image analysis apparatus, wherein a region including the image of the object and a region including the image of an object contained in the image data obtained as a result of the search performed by the metadata generation unit are detected.
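Claim 12 widens detection from one query object to several: the object in the query image plus objects in similar images found by the metadata search. A minimal sketch, assuming per-region feature vectors are already computed, accepts a region when its Euclidean distance to any query feature is within a hypothetical `threshold`. The function name `detect_multi` is illustrative.

```python
import math


def detect_multi(region_feats, query_feats, threshold):
    """Return {region: distance} for regions within `threshold` of ANY query
    feature (the query image's object or objects from the search results)."""
    hits = {}
    for region, feat in region_feats.items():
        d = min(math.dist(feat, q) for q in query_feats)
        if d <= threshold:
            hits[region] = d
    return hits


regions = {"r1": (0.0, 0.0), "r2": (5.0, 5.0), "r3": (20.0, 20.0)}
queries = [(0.0, 0.5), (5.0, 5.0)]  # query object, object from a search result
print(detect_multi(regions, queries, threshold=1.0))  # {'r1': 0.5, 'r2': 0.0}
```

Region "r2" would be missed with the query feature alone; the extra example from the search results catches appearance variations of the same object.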
The object region detection unit
Receiving a detection target designation that designates, from among the image data obtained as a result of the search performed by the metadata generation unit, image data containing an object to be detected together with the object included in the query image data;
Among the one or more image data extracted by the analysis target determining unit,
The image analysis apparatus according to claim 12, wherein a region including the image of the object and a region including the image of an object contained in the image data designated by the detection target designation are detected.
An image analysis system comprising:
The image analysis apparatus according to claim 1; and
An image database that stores image data and its bibliographic information in association with each other;
Wherein the metadata generation unit generates metadata of the query image data using the image database.
An image input step of receiving query image data including an image of an object to be detected;
A metadata generation step of generating metadata of the query image data using an image database that stores image data and bibliographic information in association with each other;
An analysis target determining step of extracting one or more of the image data held in the image database whose bibliographic information matches the metadata;
An object region detection step for detecting a region including an image of the object among the one or more image data extracted in the analysis target determination step;
An output step of outputting a result detected in the object region detection step;
An image analysis method characterized by comprising:
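The method of claim 15 can be traced end to end with a compact sketch that chains a similarity search, frequency-based metadata scoring, metadata-based narrowing, and distance-threshold region detection in one hypothetical function `analyze`. Assumptions not found in the claims: the query is given as a feature vector in the same space as the stored image vectors and the precomputed per-region features, bibliographic information is a tag list, and `k`, `top_k`, and `threshold` are illustrative parameters.

```python
import math
from collections import Counter


def analyze(query_vec, db_vecs, bib, region_feats, k=2, top_k=1, threshold=1.0):
    # Metadata generation step: find similar images, score their tags by frequency.
    similar = sorted(db_vecs, key=lambda i: (math.dist(query_vec, db_vecs[i]), i))[:k]
    counts = Counter(t for i in similar for t in set(bib.get(i, [])))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    keys = {t for t, _ in ranked[:top_k]}
    # Analysis target determining step: keep images whose tags match a key.
    targets = sorted(i for i, tags in bib.items() if keys & set(tags))
    # Object region detection step: only the narrowed targets are scanned.
    results = {}
    for img in targets:
        hits = [r for r, f in region_feats.get(img, {}).items()
                if math.dist(f, query_vec) <= threshold]
        if hits:
            results[img] = sorted(hits)
    # Output step.
    return keys, targets, results


db_vecs = {"a": (0.0, 0.0), "b": (0.2, 0.0), "c": (9.0, 9.0), "d": (8.0, 8.0)}
bib = {"a": ["harbor"], "b": ["harbor"], "c": ["forest"], "d": ["harbor", "night"]}
region_feats = {
    "a": {"r0": (0.0, 0.1), "r1": (5.0, 5.0)},
    "b": {"r0": (3.0, 3.0)},
    "c": {"r0": (0.0, 0.0)},
    "d": {"r0": (0.5, 0.0)},
}
print(analyze((0.0, 0.0), db_vecs, bib, region_feats))
```

Image "c" contains a matching region but is never scanned because its bibliographic information does not match the generated metadata. This shows both the speed-up from narrowing and the recall it can cost when metadata are incomplete, a gap that the enrichment of claim 11 is meant to reduce.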