WO2014092548A1 - A method and system for identifying multiple entities in images - Google Patents

A method and system for identifying multiple entities in images

Info

Publication number
WO2014092548A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
visual
sub
feature
visual features
Application number
PCT/MY2013/000255
Other languages
French (fr)
Inventor
Tan Sieow YEEK
Bong Chin WEI
Dickson Lukose
Original Assignee
Mimos Berhad
Application filed by Mimos Berhad
Publication of WO2014092548A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]


Abstract

The present invention provides a method for identifying multiple entities in a learned image. The present invention utilizes a visual knowledge-base storing multiple pre-defined visual features of various entities. The learned image is sectioned into a plurality of image sub-sections, and visual features information is thereafter extracted from each of the plurality of sub-section images. The extracted visual features information of the sub-section images is compared with that stored in the knowledge-base. The visual similarity between the extracted visual features information of the sub-section images and the stored visual features is rated. Based on the visual similarity rate, entities of the image can thereby be identified.

Description

A Method and System for Identifying Multiple Entities in Images
Field of the Invention
[0001] The present invention relates to an image processing method. In particular, the present invention proposes a method and system for identifying and extracting multiple entities in an image.
Background
[0002] Understanding the semantic meaning and content of an image remains one of the most challenging problems in image processing. Most image processing systems are only able to perform low-level visual context processing of an image and are unable to perform high-level semantic conceptualization.
[0003] In order to be capable of high-level image semantic understanding, an image processing system must first be able to identify all entities available in the image. Thereafter, further processing methods may be applied to the identified entities so as to achieve a higher level of image semantic understanding.
[0004] Identification of all entities available in an image is one of the fundamental challenges of an image processing system. An image processing system may not be able to recognize an entity as accurately as humans do. For example, while humans can identify an entity regardless of its position, size or orientation, an image processing system usually cannot. Further, two or more different entities often differ so little in their visual features that even humans can hardly differentiate them. For example, sea and sky share many similar visual features and are often misinterpreted as one single object.
[0005] US patent no. US 8,150,165 discloses a method for visual recognition of one object in an image. The method includes extracting unique points of an object to be learned. An icon corresponding to each of the unique points is extracted. The size and orientation of the icon correspond to the size and angle of the unique point. After extraction of the various icons, an object becomes a collection of icons. Each of these icons is normalized to a constant size so that it can be compared with other icons. The icons are then used to recognize a target object from a large number of training images.
Summary
[0006] In one aspect of the present invention, a method for identifying multiple entities in an image is provided. The method comprises constructing a visual knowledge-base storing multiple visual features of various entities, generating a plurality of sub-section images from the image, extracting visual features information from each of the plurality of sub-section images, comparing the extracted visual features information of the sub-section images with the visual features stored in the knowledge-base, rating the visual similarity between the extracted visual features information of the sub-section images and the stored visual features, and identifying the image sub-sections having high visual similarity with the stored visual features as entities of the image.
[0007] In one embodiment, constructing the knowledge-base further comprises constructing a visual-feature ontology containing a list of pre-defined entity visual concepts and properties, wherein each concept and property is associated with a range of feature signature values, preparing a training image dataset containing the various entity concepts pre-defined in the visual-feature ontology, extracting a plurality of visual features from the training image dataset, generating feature signatures based on the extracted visual features, creating a plurality of instances based on the training image dataset and adding the instances into the visual-feature ontology under the pre-defined concepts, correspondingly adding the feature signatures into the created instances, wherein the feature signatures are held as a range of values, and obtaining the visual knowledge-base.
[0008] In a further embodiment, comparing the extracted visual features information of the sub-section images further comprises generating a plurality of feature signatures of the image sub-sections based on the extracted visual features information, retrieving the feature signatures stored in the visual knowledge-base, and comparing and rating the similarity between the feature signatures of the image sub-sections and the feature signatures stored under one specific pre-defined concept in the visual knowledge-base.
[0009] In one embodiment of the present invention, identifying the sub-section image which has high visual similarity with the stored visual features further comprises receiving the visual similarity rate of the image sub-sections with respect to one specific concept, comparing the visual similarity rates of the image sub-sections to a pre-defined threshold value, and identifying the image sub-sections having visual similarity rates exceeding the pre-defined threshold value as entities having the specific concept.
[0010] In another embodiment of the present invention, generating a plurality of sub-section images from the image further comprises generating a plurality of first bounding boxes within the image, generating a plurality of second bounding boxes within the first bounding boxes, and sectioning the image according to all generated bounding boxes.
[0011] The bounding boxes are generated at random locations and scales, wherein the second bounding boxes are smaller than the first bounding boxes. It is preferable that the bounding boxes are generated based on a pre-defined scale.
[0012] Another aspect of the present invention provides a system for identifying multiple entities in an image. The system comprises a feature conceptualization module for constructing a knowledge-base storing a plurality of visual features of various entities, a location and scale sub-image generator module operable for generating a plurality of sub-section images from the image, a concept-feature extractor module operable for extracting visual features information from each of the plurality of image sub-sections and comparing and rating the visual similarity between the extracted visual features and the visual features stored in the knowledge-base, and a concept-feature analyzer module operable for identifying the sub-section images which have high visual similarity with the stored visual features as entities of the image.
[0013] In a further embodiment of the present invention, the system further comprises a feature extraction method library storing feature extraction methods operable for extracting visual features information from the image.
Brief Description of the Drawings
[0014] This invention will be described by way of non-limiting embodiments of the present invention, with reference to the accompanying drawings, in which:
[0015] Fig. 1 illustrates a block diagram of a system for identifying multiple entities in an image in accordance with one embodiment of the present invention;
[0016] Fig. 2 illustrates a flow diagram of a method for identifying multiple entities in an image in accordance with one embodiment of the present invention;
[0017] Fig. 3A illustrates a detailed process performed to construct the concept-feature knowledge-base in accordance with an embodiment of the present invention;
[0018] Fig. 3B illustrates the design of the concept-feature ontology in accordance with one embodiment of the present invention;
[0019] Fig. 3C illustrates an exemplary design of concept-feature knowledge-base 330 in a field of transportation;
[0020] Fig. 4 illustrates an example method performed so as to generate multiple sub-sections of a learned image;
[0021] Fig. 4A illustrates a pyramid whose base scales are referred to during the generation of multiple bounding boxes according to one embodiment of the present invention;
[0022] Fig. 5 illustrates a detailed process performed by the concept-feature extractor of Fig. 1 in accordance with one embodiment of the present invention; and
[0023] Fig. 6 illustrates a detailed process performed by the concept-feature analyzer of Fig. 1 in accordance with one embodiment of the present invention.
Detailed Description
[0024] The following descriptions of a number of specific and alternative embodiments are provided to aid understanding of the inventive features of the present invention. It shall be apparent to one skilled in the art, however, that this invention may be practiced without such specific details. Some of the details may not be described at length so as not to obscure the invention. For ease of reference, common reference numerals are used throughout the figures when referring to the same or similar features common to the figures.
[0025] The present invention is directed to a system and method for identifying multiple entities in a learned image, regardless of the positions and scales (sizes) of the entities. A plurality of unique visual features carried by an entity is extracted, stored, and then used for entity identification. The position, scale and orientation of the entity are also extracted.
[0026] Fig. 1 illustrates a block diagram of a system 100 in accordance with one embodiment of the present invention. The system 100 performs identification of multiple entities in a learned image. The system 100 comprises a feature conceptualization module 101, a knowledge-base 102, a location and scale sub-section image generator 103, a concept-feature extractor 104, and a concept-feature analyzer 105. The feature conceptualization module 101 extracts a plurality of visual features from various entities of a training image dataset and stores them in the knowledge-base 102. These extracted visual features are later used for identifying image entities in the learned image.
[0027] Still referring to Fig. 1, the location and scale sub-section image generator 103, the concept-feature extractor 104, and the concept-feature analyzer 105 are modules operable to perform the entity identification of the learned image. The identification method is elaborated in greater detail below.
[0028] In a further embodiment of the present invention, the system 100 may further comprise a feature extraction-method library. The feature extraction-method library is configured to store a plurality of methods operable for extracting visual features information from an image. The feature extraction method may be any extraction method known in the art.
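By way of illustration, the feature extraction-method library can be thought of as a registry mapping method names to extraction functions. The following is a minimal Python sketch of that idea; the color-histogram extractor and all identifiers are illustrative assumptions, not the patent's own API.

```python
import numpy as np

def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Normalized per-channel color histogram of an H x W x C image."""
    channels = [np.histogram(image[..., c], bins=bins, range=(0, 255))[0]
                for c in range(image.shape[-1])]
    hist = np.concatenate(channels).astype(float)
    return hist / hist.sum()

# The library itself: any extraction method known in the art could be
# registered here alongside the illustrative histogram extractor.
FEATURE_EXTRACTION_LIBRARY = {
    "color_histogram": color_histogram,
}

def extract_visual_features(image: np.ndarray) -> np.ndarray:
    """Concatenate the outputs of every registered extraction method
    into one feature-signature vector for the given image region."""
    return np.concatenate([method(image)
                           for method in FEATURE_EXTRACTION_LIBRARY.values()])
```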
[0029] Fig. 2 illustrates a flow diagram of a method 200 for identifying multiple entities in an image according to an embodiment of the present invention. At step 202, the concept-feature knowledge-base 102 is constructed by the feature conceptualization module 101. The concept-feature knowledge-base 102 contains a concept-feature ontology which comprises a list of pre-defined entity concepts that are ready to hold multiple visual feature signatures of various entities as ranges of values. At step 204, a learned image is processed by the location and scale sub-section image generator 103 so as to generate multiple sub-sections of the learned image. The sub-sections may be in various scales. At step 206, a plurality of visual feature signatures is extracted from the sub-sections by the concept-feature extractor 104.
[0030] Still referring to Fig. 2, at step 208, the concept-feature extractor 104 retrieves the feature signatures stored in the concept knowledge-base 102 and compares them with the feature signatures of each sub-section. When the feature signatures of one sub-section are similar to a particular concept pre-defined in the concept-feature ontology of the concept knowledge-base, at step 210, the concept-feature extractor 104 rates the concept similarity between the sub-section and the particular concept. Thereby, the concept similarity of each sub-section is accordingly rated. Thereafter, at step 212, the concept-feature analyzer 105 receives the set of sub-sections rated with concept similarity and, based on the concept similarity rates, all entity concepts in the learned image are accordingly identified at step 214.
[0031] The detailed process performed at each step of the method 200 illustrated in Fig. 2 is elaborated in greater detail below.
[0032] Fig. 3A illustrates a detailed process 300 performed at step 202 of Fig. 2 to construct the concept-feature knowledge-base in another embodiment of the present invention. The construction of the concept-feature knowledge-base begins with the construction of the concept-feature ontology at step 302. The concept-feature ontology contains a list of pre-defined entity concepts. Each concept may have a plurality of entities, each of which is associated with a range of feature signatures. As such, the concept-feature ontology holds visual features information of various entities as ranges of values. Fig. 3B illustrates the design of the concept-feature ontology in accordance with one embodiment of the present invention.
[0033] Referring back to Fig. 3A, at step 304, a training image dataset containing the various entities pre-defined in the concept-feature ontology is prepared. At step 306, a plurality of visual features is extracted from the training image dataset. The visual feature extraction is performed according to the extraction methods stored in the feature extraction-method library. The visual feature extraction method may be any extraction method known in the art.
[0034] Based on the extracted visual features, feature signatures are accordingly generated at step 308. Meanwhile, at step 310, a plurality of entities of the training images is created and respectively added into the concept-feature ontology under the suitable pre-defined concepts. At step 312, the entities are marked with the corresponding feature signatures. The entities are configured to hold a range of feature signatures. When all concepts pre-defined earlier in the concept ontology have been associated with corresponding entities and a range of feature signatures thereof, the concept-feature ontology construction process is completed at step 314, and the concept-feature knowledge-base is ready for use.
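As a concrete illustration of process 300, the sketch below builds such a knowledge-base, assuming each training entity yields one signature vector (e.g. from the extract_visual_features sketch above) and that each instance holds its signature as a per-dimension (min, max) range; the 10% tolerance band used to form the range is an assumption, since the patent does not fix a particular scheme.

```python
import numpy as np

def build_knowledge_base(training_set: dict[str, list[np.ndarray]]) -> dict:
    """Map each pre-defined concept to instances holding signature ranges."""
    kb = {}
    for concept, signatures in training_set.items():
        instances = []
        for i, sig in enumerate(signatures, start=1):
            # Hold the feature signature as a range of values: here an
            # assumed 10% tolerance band around the observed signature.
            tolerance = 0.1 * np.abs(sig) + 1e-3
            instances.append({
                "instance": f"{concept.capitalize()}-Sig {i}",
                "min": sig - tolerance,
                "max": sig + tolerance,
            })
        kb[concept] = instances
    return kb
```

For example, build_knowledge_base({"car": [sig1, sig2], "bus": [sig3]}) would yield instances Car-Sig 1, Car-Sig 2 and Bus-Sig 1, each holding a range of feature signature values.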
[0035] Fig. 3C illustrates an exemplary design of a concept-feature knowledge-base 330 in the field of transportation, comprising ground transportation and air transportation. The ground transportation comprises car, bus and lorry as its concepts, while the air transportation comprises airplane and helicopter as its concepts. The concept of car comprises a plurality of car instances, i.e., Car-Sig 1, Car-Sig 2, ..., Car-Sig n. Each of the car instances is associated with a plurality of feature signatures. Similarly, each of the bus, lorry, airplane and helicopter concepts comprises a plurality of corresponding instances that are associated with a plurality of feature signatures.
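Rendered as nested data, the hierarchy of Fig. 3C might look as follows; only the concept layout follows the figure, and the instance entries are placeholders for the signature ranges built above.

```python
# A toy rendering of the concept-feature knowledge-base 330 of Fig. 3C.
TRANSPORTATION_KB = {
    "ground transportation": {
        "car": ["Car-Sig 1", "Car-Sig 2", "Car-Sig n"],
        "bus": ["Bus-Sig 1", "Bus-Sig 2", "Bus-Sig n"],
        "lorry": ["Lorry-Sig 1", "Lorry-Sig 2", "Lorry-Sig n"],
    },
    "air transportation": {
        "airplane": ["Airplane-Sig 1", "Airplane-Sig 2", "Airplane-Sig n"],
        "helicopter": ["Helicopter-Sig 1", "Helicopter-Sig 2", "Helicopter-Sig n"],
    },
}
```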
[0036] Fig. 4 illustrates an example method 400 performed at step 204 of Fig. 2 so as to generate multiple sub-sections of a learned image. The method 400 starts at step 402, at which a plurality of first bounding boxes is generated within the learned image. The location and the scale of the first bounding boxes are randomly determined. Subsequently, at step 404, a plurality of second bounding boxes is generated within the first bounding boxes generated at step 402. The second bounding boxes are generated based on a pre-defined scale and are smaller than the first bounding boxes. After the bounding boxes are generated, at step 406, the learned image is sectioned according to all generated bounding boxes. As a result, a plurality of image sub-sections is obtained.
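A minimal sketch of method 400, assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples; the box counts, the minimum first-box size, and the 0.5 nesting scale are illustrative parameters, not values taken from the patent.

```python
import random

def generate_subsections(width: int, height: int,
                         n_first: int = 10, n_second: int = 2,
                         scale: float = 0.5) -> list[tuple[int, int, int, int]]:
    """Generate nested bounding boxes over a width x height learned image."""
    boxes = []
    for _ in range(n_first):
        # Step 402: first bounding boxes at random locations and scales.
        bw = random.randint(width // 4, width)
        bh = random.randint(height // 4, height)
        x1 = random.randint(0, width - bw)
        y1 = random.randint(0, height - bh)
        boxes.append((x1, y1, x1 + bw, y1 + bh))
        for _ in range(n_second):
            # Step 404: smaller second boxes nested inside each first box,
            # generated on a pre-defined scale.
            sw, sh = int(bw * scale), int(bh * scale)
            sx = random.randint(x1, x1 + bw - sw)
            sy = random.randint(y1, y1 + bh - sh)
            boxes.append((sx, sy, sx + sw, sy + sh))
    # Step 406: the learned image is then sectioned (cropped) to each box.
    return boxes
```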
[0037] In a further embodiment of the present invention, subsequent bounding boxes may be further generated after the second bounding boxes. It is preferable that the subsequent bounding boxes are formed within the previously generated bounding boxes. For example, a plurality of third bounding boxes may be generated within the second bounding boxes; a plurality of fourth bounding boxes may be generated within the third bounding boxes, and so on. One skilled in the art will understand that the number of generated bounding boxes may be varied as desired.
[0038] It is preferable that the bounding boxes are generated according to a pre-defined scale. In one embodiment, the first bounding boxes are generated at a scale according to the scale of a first base 420 of a pyramid 421 depicted in Fig. 4A. When second bounding boxes are to be generated subsequently, the scale of the second bounding boxes is in accordance with the scale of a second base 422 that is formed when the pyramid is horizontally cross-sectioned at a height of t measured from the first base 420. Similarly, the scale of the third bounding boxes is in accordance with the scale of a third base 423 that is formed when the pyramid is horizontally cross-sectioned at a height of 2t measured from the first base 420 or, equivalently, at a height of t measured from the second base 422. If a plurality of subsequent bounding boxes is desired, they shall be generated in this pyramidal scaling manner.
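Because the cross-section of a pyramid shrinks linearly with height, this scaling rule reduces to a one-line formula; the pyramid height H and step height t below are implementation parameters, and the example values are illustrative.

```python
def pyramid_scale(k: int, t: float, pyramid_height: float) -> float:
    """Linear scale factor of the k-th base relative to the first base 420
    (k = 0), taken where the pyramid is cross-sectioned at height k * t."""
    h = k * t
    if not 0 <= h < pyramid_height:
        raise ValueError("cross-section height must lie within the pyramid")
    return (pyramid_height - h) / pyramid_height

# With H = 1.0 and t = 0.25: second boxes are 75% and third boxes 50%
# of the first boxes' linear size.
assert pyramid_scale(1, 0.25, 1.0) == 0.75
assert pyramid_scale(2, 0.25, 1.0) == 0.5
```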
[0039] Fig. 5 illustrates a detailed process 500 performed by the concept-feature extractor at step 206 and step 208 of Fig. 2 in accordance with one embodiment of the present invention. The process 500 extracts and rates the feature signatures of the plurality of sub-sections of the learned image obtained at step 204 of Fig. 2. The process 500 starts at step 502, at which the plurality of sub-sections of the learned image is received. At step 504, the visual features of the sub-sections are extracted according to the extraction methods stored in the feature extraction-method library. At step 506, the feature signatures of the sub-sections are accordingly generated.
[0040] Still referring to Fig. 5, at step 508, the concept-feature extractor 104 retrieves the feature signatures of each pre-defined entity stored in the concept knowledge-base 102 and compares them with the feature signatures of each sub-section. Each feature signature from the concept-feature KB is registered to a unique entity that belongs to a specific concept. The concept-feature KB holds a large number of concepts, and each concept holds a list of entities registered with a feature signature. At step 510, based on the feature signature similarity between one sub-section and one particular pre-defined entity contained in the concept knowledge-base, the concept similarity of the one sub-section with respect to the pre-defined concept to which the one particular pre-defined entity belongs is evaluated. By evaluating the feature signature matching rate over the entity list carried by each of the concepts, a Concept Similarity (CS) rate for each concept is determined. The CS rate indicates how similar the content of a sub-section image is to a concept in the KB. In other words, every concept in the KB is evaluated with a CS rate with respect to a sub-section image. The concept having the highest CS rate for a sub-section image is taken, subject to a high threshold. As provided in the processes in Fig. 4, multiple sub-section images are generated, and each sub-section carries a concept. Accordingly, an image that contains multiple sub-sections is also identified with multiple concepts. These identified concepts are herein referred to as entities.
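One plausible reading of this rating step, reusing the instance-range layout of the build_knowledge_base sketch above: count the fraction of a concept's entity instances whose stored range contains the sub-section's signature. The matching rule is an assumption; the patent does not state a formula for the CS rate.

```python
import numpy as np

def rate_concept_similarity(signature: np.ndarray, kb: dict) -> dict[str, float]:
    """Concept Similarity (CS) rate of one sub-section for every concept."""
    cs_rates = {}
    for concept, instances in kb.items():
        # An instance matches when the signature lies within its stored
        # (min, max) range in every feature dimension.
        matches = sum(
            bool(np.all((inst["min"] <= signature) & (signature <= inst["max"])))
            for inst in instances
        )
        cs_rates[concept] = matches / len(instances) if instances else 0.0
    return cs_rates
```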
[0041] Fig. 6 illustrates a detailed process 600 performed by the concept-feature analyzer at steps 212 and 214 of Fig. 2. At step 602, the concept-feature analyzer receives a set of image sub-sections rated with concept similarity with respect to a specific pre-defined concept. At step 604, the concept similarity rate of each sub-section is compared with a pre-defined threshold value. If the concept similarity rate of one sub-section exceeds the threshold value, the sub-section is deemed to have that specific concept. At step 606, all sub-sections whose concept similarity rates exceed the threshold value are selected, and the entity concepts of the sub-sections are thereby identified.
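A short sketch of process 600, assuming each rated sub-section is a (box, cs_rates) pair as produced by the rate_concept_similarity sketch above; the threshold value is left as a caller-chosen parameter.

```python
def select_entities(rated_subsections: list, threshold: float) -> list:
    """Steps 604-606: keep (box, concept) pairs whose CS rate exceeds
    the pre-defined threshold value."""
    selected = []
    for box, cs_rates in rated_subsections:
        for concept, rate in cs_rates.items():
            if rate > threshold:
                selected.append((box, concept))
    return selected
```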
[0042] The above description illustrates various embodiments of the present invention along with examples of how aspects of the present invention may be implemented. While specific embodiments have been described and illustrated, it is understood that many changes, modifications, variations and combinations thereof could be made to the present invention without departing from the scope of the present invention. The above examples, embodiments, instruction semantics, and drawings should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of the present invention as defined by the following claims.

Claims

Claims
1. A method for identifying multiple entities in an image, comprising: constructing a visual knowledge-base for storing multiple visual features of various entities; generating a plurality of sub-section images from the image; extracting visual features information from each of the plurality of sub-section images; comparing the extracted visual features information of the sub-section images with the visual features stored in the knowledge-base; rating the visual similarity between the extracted visual features information of the sub-section images and the stored visual features; and identifying the image sub-sections having high visual similarity with the stored visual features as entities of the image.
2. The method of claim 1, wherein the constructing the knowledge-base further comprises: constructing a visual-feature ontology containing a list of pre-defined entity visual concepts and properties, wherein each concept and property is associated with a range of feature signature values; preparing a training image dataset containing the various entity concepts pre-defined in the visual-feature ontology; extracting a plurality of visual features from the training image dataset; generating feature signatures based on the extracted visual features; creating a plurality of instances based on the training image dataset and adding the instances into the visual-feature ontology under the pre-defined concepts; correspondingly adding the feature signatures into the created instances, wherein the feature signatures are held as a range of values; and obtaining the visual knowledge-base.
3. The method of claim 2, wherein the comparing the extracted visual features information of the sub-section images further comprises: generating a plurality of feature signatures of the image sub-sections based on the extracted visual features information; retrieving the feature signatures stored in the visual knowledge-base; and comparing and rating the similarity between the feature signatures of the image sub-sections and the feature signatures stored under one specific pre-defined concept in the visual knowledge-base.
4. The method of claim 1, wherein the identifying the sub-section image which has high visual similarity with the stored visual features further comprises: receiving the visual similarity rate of the image sub-sections with respect to one specific concept; comparing the visual similarity rates of the image sub-sections to a pre-defined threshold value; and identifying the image sub-sections having visual similarity rates exceeding the pre-defined threshold value as entities having the specific concept.
5. The method of claim 1, wherein the generating a plurality of sub-section images from the image further comprises: generating a plurality of first bounding boxes within the image; generating a plurality of second bounding boxes within the first bounding boxes; and sectioning the image according to all generated bounding boxes.
6. The method of claim 5, wherein the bounding boxes are generated at random locations and scales.
7. The method of claim 5, wherein the second bounding boxes are smaller than the first bounding boxes.
8. The method of claim 5, wherein the bounding boxes are generated based on a pre-defined scale.
9. A system for identifying multiple entities in an image, comprising: a feature conceptualization module for constructing a knowledge-base storing a plurality of visual features of various entities; a location and scale sub-image generator module operable for generating a plurality of sub-section images from the image; a concept-feature extractor module operable for extracting visual features information from each of the plurality of image sub-sections and comparing and rating the visual similarity between the extracted visual features and the visual features stored in the knowledge-base; and a concept-feature analyzer module operable for identifying the sub-section images which have high visual similarity with the stored visual features as entities of the image.
10. The system of claim 9, further comprising a feature extraction method library storing feature extraction methods operable for extracting visual features information from the image.
PCT/MY2013/000255 2012-12-13 2013-12-12 A method and system for identifying multiple entities in images WO2014092548A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2012005404 2012-12-13
MYPI2012005404A MY172808A (en) 2012-12-13 2012-12-13 A method and system for identifying multiple entities in images

Publications (1)

Publication Number Publication Date
WO2014092548A1 (en) 2014-06-19

Family

ID=50179897

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2013/000255 WO2014092548A1 (en) 2012-12-13 2013-12-12 A method and system for identifying multiple entities in images

Country Status (2)

Country Link
MY (1) MY172808A (en)
WO (1) WO2014092548A1 (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1739593A1 (en) * 2005-06-30 2007-01-03 Xerox Corporation Generic visual categorization method and system
US20070258648A1 (en) * 2006-05-05 2007-11-08 Xerox Corporation Generic visual classification with gradient components-based dimensionality enhancement
US20100226564A1 (en) * 2009-03-09 2010-09-09 Xerox Corporation Framework for image thumbnailing based on visual similarity

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HICHEM BANNOUR ET AL: "Towards ontologies for image interpretation and annotation", CONTENT-BASED MULTIMEDIA INDEXING (CBMI), 2011 9TH INTERNATIONAL WORKSHOP ON, IEEE, 13 June 2011 (2011-06-13), pages 211 - 216, XP032004787, ISBN: 978-1-61284-432-9, DOI: 10.1109/CBMI.2011.5972547 *
LI-JIA LI ET AL: "Building and using a semantivisual image hierarchy", 2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 13-18 JUNE 2010, SAN FRANCISCO, CA, USA, IEEE, PISCATAWAY, NJ, USA, 13 June 2010 (2010-06-13), pages 3336 - 3343, XP031725856, ISBN: 978-1-4244-6984-0 *
NAKUL VERMA ET AL: "Learning hierarchical similarity metrics", COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2012 IEEE CONFERENCE ON, IEEE, 16 June 2012 (2012-06-16), pages 2280 - 2287, XP032232337, ISBN: 978-1-4673-1226-4, DOI: 10.1109/CVPR.2012.6247938 *

Also Published As

Publication number Publication date
MY172808A (en) 2019-12-12

Similar Documents

Publication Publication Date Title
US10936911B2 (en) Logo detection
US10796244B2 (en) Method and apparatus for labeling training samples
CN108491794B (en) Face recognition method and device
CN105164700B (en) Detecting objects in visual data using a probabilistic model
US8538164B2 (en) Image patch descriptors
WO2017017682A1 (en) Data fusion and classification with imbalanced datasets background
CN108304859B (en) Image identification method and cloud system
US9977995B2 (en) Image clustering method, image clustering system, and image clustering server
US9471828B2 (en) Accelerating object detection
CN108874889B (en) Target body retrieval method, system and device based on target body image
WO2018067080A1 (en) A marine vessel identification method
CN113221918B (en) Target detection method, training method and device of target detection model
CN109284700B (en) Method, storage medium, device and system for detecting multiple faces in image
US20140169680A1 (en) Image Object Recognition Based on a Feature Vector with Context Information
KR101545809B1 (en) Method and apparatus for detection license plate
CN115713669B (en) Image classification method and device based on inter-class relationship, storage medium and terminal
KR102134640B1 (en) Data processing method, apparatus, object detecting method using the processed data
CN111241930A (en) Method and system for face recognition
WO2014092548A1 (en) A method and system for identifying multiple entities in images
Unlu et al. Ordered minimum distance bag-of-words approach for aerial object identification
CN111194004B (en) Base station fingerprint positioning method, device and system and computer readable storage medium
CN105160333B (en) A kind of model recognizing method and identification device
CN114419428A (en) Target detection method, target detection device and computer readable storage medium
CN110032933B (en) Image data acquisition method and device, terminal and storage medium
CN106446902B (en) Non-legible image-recognizing method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13831949

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13831949

Country of ref document: EP

Kind code of ref document: A1