CN102222239A - Labelling image scene clustering method based on vision and labelling character related information - Google Patents

Labelling image scene clustering method based on vision and labelling character related information Download PDF

Info

Publication number
CN102222239A
Authority
CN
China
Prior art keywords
image
vision
emd
scene
labelling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011101487603A
Other languages
Chinese (zh)
Other versions
CN102222239B (en)
Inventor
刘咏梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201110148760.3A priority Critical patent/CN102222239B/en
Publication of CN102222239A publication Critical patent/CN102222239A/en
Application granted granted Critical
Publication of CN102222239B publication Critical patent/CN102222239B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention provides an annotated-image scene clustering method based on correlated visual and annotation-word information. The method comprises the following steps: segmenting the training images and the test images respectively with the NCut (Normalized Cut) image segmentation algorithm; constructing a visual nearest-neighbor graph G=(V, E) over all training images {J1, ..., Jl} ∈ Ctrain used for learning, wherein each image in the training set has a group of initial normalized annotation-word weight vectors; propagating the annotation words of each training image among its visual nearest neighbors, each receiving image accepting them according to the normalized EMD (Earth Mover's Distance) between the images; for each training image, renormalizing the accumulated annotation-word weights; after the visual features of an image have been converted into a group of weighted annotation words, performing scene semantic clustering with a PLSA (Probabilistic Latent Semantic Analysis) model; learning the visual space of each scene semantic with a Gaussian mixture model; and performing scene classification with the visual features. The invention improves the coupling precision between the visual features of an image and its annotation words, and can be directly used for automatic semantic annotation of images.

Description

Annotated-image scene clustering method based on correlated visual and annotation-word information
Technical field
The present invention relates to an image processing method, and in particular to a method for automatically classifying the scene of an image to be analyzed.
Background technology
In image understanding tasks such as automatic semantic annotation of images, classifying unannotated images by their visual features requires that the semantic scene classes remain consistent in their visual distribution. On the one hand, the semantic content an image can express is very rich; placed in different contexts, the same image may present different aspects of information. On the other hand, owing to their limited descriptive power, visual features suffer from notable semantic ambiguity, and visually similar images cannot guarantee consistent semantic content.
As a simple and efficient way of describing the high-level semantic content of an image, annotation words provide a large number of reliable learning samples for discovering the correlation between image annotation words and visual content. However, the inherent ambiguity of annotation words (polysemy, synonymy) also limits the image clustering performance achievable from annotation-word information alone.
Summary of the invention
The object of the present invention is to provide an annotated-image scene clustering method based on correlated visual and annotation-word information that improves the coupling precision between the visual features of an image and its annotation words, and can be directly used for automatic semantic annotation of images.
The object of the present invention is achieved as follows:
Step 1: segment the training images and the test images respectively with the NCut (Normalized Cut) image segmentation algorithm to obtain visual descriptions of the image regions;
Step 2: construct the visual nearest-neighbor graph G=(V, E) of all training images {J1, ..., Jl} ∈ Ctrain used for learning; each vertex of the vertex set V corresponds to one image, and the edge set E represents the visual distances between images; the visual distance between two images adopts the Earth Mover's Distance (EMD), a similarity measure based on integrated multi-region matching, and the weight of the edge connecting two vertices is the EMD visual distance between the corresponding images;
Step 3: in the training image set, each image has a group of initial normalized annotation-word weight vectors;
Step 4: propagate the annotation words of each training image among its visual nearest neighbors; the receiving images accept them according to the normalized EMD distance between the images; the EMD distance is normalized as

Emd_nor = e^(-Emd/δ)

wherein Emd represents the EMD distance between two images, Emd_nor denotes the normalized EMD distance, and δ is an empirical parameter, taken as the variance of the EMD distances over the training image set;
Step 5: for each training image, renormalize the accumulated annotation-word weights; the initial annotation-word weight vector is normalized by counting the frequency with which each annotation word occurs in the image;
Step 6: after the visual features of an image have been converted into a group of weighted annotation words, perform scene semantic clustering with the PLSA model;
Step 7: learn the visual space of each scene semantic with a Gaussian mixture model (GMM);
Step 8: for a test image, perform scene classification with its visual features, and directly use the annotation words obtained for that scene semantic.
To relieve the overly complicated relationship between vision and annotation words, and to reduce annotation-word ambiguity, the present invention attempts to build a description from image pixels up to local regions representing surface materials, and then to transition from the scene semantic classes of images to the annotation-word distribution representing their high-level semantic content, forming a multi-level connection between low-level visual features and annotation words. In establishing this multi-level connection, the present invention makes full use of both the annotation words and the visual features of the training images, preserves the high-level semantic consistency of the image clusters, and solves the problem of weighting visual versus semantic information in the clustering process in a natural and effective manner.
Given the good performance of Probabilistic Latent Semantic Analysis (PLSA) within the bag-of-words model for automatic extraction of semantic classes, we adopt the PLSA model to extract the scene semantics of images. The present method, however, differs from applying PLSA clustering to the learning image set using visual features alone: because the objects of study are annotated images, which carry annotation-word information in addition to visual features, and because annotation words are particularly important in semantic clustering, the present invention combines both visual and annotation-word information in the PLSA scene clustering, increasing the soundness of the PLSA model for scene clustering.
The present invention exploits the correlation between visual features and annotation words by converting the visual features of an image into a group of weighted annotation words, where each weight represents the degree of correlation between the visual features and the annotation word. An innovative reliability-propagation scheme is adopted: the annotation words of each image are propagated to its visually adjacent images, the amount of information propagated is determined by the visual difference between adjacent images, and the images accepting the annotation words receive them according to the correlation between the annotation words. Annotation words thus accumulate within visually similar images, converting the visual features into a group of weights representing the degree of correlation between the image and each annotation word. The benefits of this approach are that it settles the weighting of annotation words against visual information in the clustering process, alleviates the sparsity of the annotation-word distribution, and allows the PLSA model to extract the scene semantic classes of the images in a more natural and reasonable manner.
Description of drawings
Fig. 1 is the flow chart of the annotated-image scene clustering method based on correlated visual and annotation-word information.
Embodiment
The present invention is described in more detail below with reference to the accompanying drawing:
Step 1: segment the training images (the annotated images used for learning) and the test images respectively with the NCut (Normalized Cut) image segmentation algorithm to obtain visual descriptions of the image regions.
Step 2: construct the visual nearest-neighbor graph G=(V, E) of all training images {J1, ..., Jl} ∈ Ctrain. Each vertex of the vertex set V corresponds to one image, and the edge set E represents the visual distances between images. For the visual distance between two images we adopt the Earth Mover's Distance (EMD), a similarity measure based on integrated multi-region matching; the weight of the edge connecting two vertices is the EMD visual distance between the corresponding images.
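Step 2 can be sketched in code. The patent's EMD operates on multi-region image descriptions; as a simplified, hypothetical stand-in, the sketch below computes a 1-D EMD between normalized feature histograms (the L1 distance between their cumulative distributions) and builds a brute-force visual nearest-neighbor graph from the pairwise distances:

```python
# Simplified sketch of step 2: 1-D EMD between normalized histograms and a
# k-nearest-neighbor graph. The histograms and k are illustrative only; the
# patent uses a multi-region EMD over segmented image descriptions.
from itertools import accumulate

def emd_1d(h1, h2):
    """EMD between two normalized 1-D histograms with the same bin count:
    the L1 distance between their cumulative distributions."""
    c1, c2 = list(accumulate(h1)), list(accumulate(h2))
    return sum(abs(a - b) for a, b in zip(c1, c2))

def knn_graph(histograms, k=2):
    """Edges (i, j, emd) linking each image to its k visually nearest neighbors."""
    edges = []
    for i, hi in enumerate(histograms):
        dists = sorted(
            (emd_1d(hi, hj), j) for j, hj in enumerate(histograms) if j != i
        )
        edges.extend((i, j, d) for d, j in dists[:k])
    return edges

# Three toy "images": the first two are visually close, the third is distant.
hists = [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.0, 0.0, 1.0]]
edges = knn_graph(hists, k=1)
```

The edge weights produced here are the raw EMD values; step 4 below maps them to propagation confidences.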
Step 3: in the training image set, each image has a group of initial normalized annotation-word weight vectors. The initial annotation-word weight vector is normalized by counting the frequency with which each annotation word occurs in the image.
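Under the reading above, the initial weight vector of step 3 is just the normalized occurrence frequency of each annotation word attached to the image. A minimal sketch (tag lists are illustrative):

```python
# Sketch of step 3: initial normalized annotation-word weight vector,
# computed as the relative frequency of each word among the image's tags.
from collections import Counter

def initial_tag_weights(tags):
    """Map each annotation word of one image to its normalized frequency."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

weights = initial_tag_weights(["sky", "sea", "sky", "beach"])
```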
Step 4: propagate the annotation words of each training image among its visual nearest neighbors; the receiving images accept them according to the normalized EMD distance between the images. The EMD distance is normalized as in formula (1):

Emd_nor = e^(-Emd/δ)    (1)

wherein Emd represents the EMD distance between two images, Emd_nor denotes the normalized EMD distance, and δ is an empirical parameter, taken as the variance of the EMD distances over the training image set. During the propagation of the annotation words representing semantic classes, each image does not passively receive the words; rather, it uses the visual distance between the receiving and propagating images as the confidence of the propagation, an active reception scheme.
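Formula (1) maps an EMD value to a confidence in (0, 1]: identical images get weight 1 and the weight decays exponentially with visual distance. A direct sketch, with δ taken as the variance of the training-set EMD values as the text specifies (the sample EMD values are illustrative):

```python
# Sketch of formula (1): Emd_nor = exp(-Emd / delta), with delta set to the
# variance of the EMD distances over the training image set.
import math
from statistics import pvariance

def emd_nor(emd, delta):
    """Normalized EMD confidence: 1 for identical images, decaying toward 0."""
    return math.exp(-emd / delta)

train_emds = [0.1, 0.4, 0.9, 1.4, 1.5]  # illustrative pairwise EMD values
delta = pvariance(train_emds)
w = emd_nor(0.4, delta)
```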
Step 5: for each training image, renormalize the accumulated annotation-word weights, i.e. divide by the sum of the normalized EMD distances between this image and every image (including the image itself), the normalized EMD distance of an image to itself being specified as 1.
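Steps 4 and 5 together can be sketched as follows, under an assumed reading of the text: each image accumulates its neighbors' tag weights scaled by the normalized EMD confidence, then divides by the sum of those confidences, with the image's own confidence fixed at 1:

```python
# Sketch of steps 4-5 (assumed interpretation): weighted accumulation of
# propagated annotation words followed by renormalization by the total
# confidence mass, including the image's self-confidence of 1.
def propagate_tags(own_tags, neighbor_tags, neighbor_conf):
    """own_tags: {word: weight} of this image;
    neighbor_tags: list of {word: weight}, one dict per visual neighbor;
    neighbor_conf: normalized EMD confidence exp(-Emd/delta) per neighbor."""
    acc = dict(own_tags)  # self-confidence is specified as 1
    for tags, conf in zip(neighbor_tags, neighbor_conf):
        for word, weight in tags.items():
            acc[word] = acc.get(word, 0.0) + conf * weight
    total_conf = 1.0 + sum(neighbor_conf)  # includes the image itself
    return {word: w / total_conf for word, w in acc.items()}

out = propagate_tags({"sky": 1.0}, [{"sky": 0.5, "sea": 0.5}], [0.5])
```

Note how a word absent from the image ("sea") gains a nonzero weight through a visually similar neighbor, which is exactly how the method alleviates annotation sparsity.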
Step 6: after the visual features of an image have been converted into a group of weighted annotation words, perform scene semantic clustering with the PLSA model. For the problem of choosing the number of clusters in this model, this patent proposes a solution: first choose a relatively large number of clusters, so that the frequency of the annotation words appearing in each cluster highlights the semantic emphasis of that scene class; then judge the degree of semantic similarity from the annotation-word information in the clustering results and merge clusters that are semantically and visually consistent, which effectively solves the problem of choosing the number of clusters.
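The PLSA step can be sketched with the standard EM updates for P(z|d) and P(w|z); this is not the patent's code, just a minimal pure-Python PLSA run on a tiny weighted document-word matrix (here, images as documents and weighted annotation words as the vocabulary):

```python
# Minimal PLSA sketch for step 6. E-step: P(z|d,w) ∝ P(z|d)P(w|z);
# M-step: reweight P(z|d) and P(w|z) by the expected counts.
import random

def _normalize(v):
    s = sum(v)
    return [x / s for x in v]

def plsa(counts, n_topics, n_iter=50, seed=0):
    """counts: per-document word weights (rows of equal length).
    Returns (p_z_d, p_w_z): topic mixture per document, word dist per topic."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [_normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(n_iter):
        new_zd = [[1e-12] * n_topics for _ in range(n_docs)]
        new_wz = [[1e-12] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]  # E-step
                s = sum(post) or 1e-12
                for z in range(n_topics):
                    r = counts[d][w] * post[z] / s
                    new_zd[d][z] += r  # M-step accumulators
                    new_wz[z][w] += r
        p_z_d = [_normalize(row) for row in new_zd]
        p_w_z = [_normalize(row) for row in new_wz]
    return p_z_d, p_w_z

# Four toy "images" over four annotation words, forming two scene groups.
docs = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]]
p_z_d, p_w_z = plsa(docs, n_topics=2)
```

The patent's cluster-merging refinement (choosing a large topic count, then merging semantically and visually consistent clusters) would operate on the resulting p_w_z distributions and is not shown here.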
Step 7: learn the visual space of each scene semantic with a Gaussian mixture model (GMM).
Step 8: for a test image, perform scene classification with its visual features; the annotation words obtained for that scene semantic can then be used directly.
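Steps 7 and 8 can be sketched as follows. The patent learns a full Gaussian mixture per scene class; for brevity this sketch fits the one-component special case (a single Gaussian per scene) on 1-D visual features, which are illustrative, and classifies a test image by maximum likelihood:

```python
# Simplified sketch of steps 7-8: fit one Gaussian per scene class on 1-D
# visual features (the one-component special case of a GMM) and classify a
# test feature by the highest log-likelihood.
import math
from statistics import mean, pvariance

def fit_gaussian(samples):
    """Maximum-likelihood mean and variance (variance floored for stability)."""
    return mean(samples), max(pvariance(samples), 1e-6)

def log_likelihood(x, params):
    mu, var = params
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def classify(x, scene_models):
    """Return the scene whose model gives x the highest likelihood."""
    return max(scene_models, key=lambda s: log_likelihood(x, scene_models[s]))

scene_models = {
    "beach": fit_gaussian([0.10, 0.20, 0.15, 0.12]),
    "forest": fit_gaussian([0.80, 0.90, 0.85, 0.95]),
}
label = classify(0.17, scene_models)
```

In the full method, the predicted scene's annotation words (from step 6) are then attached to the test image directly, which is what makes the method usable for automatic semantic annotation.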

Claims (3)

1. An annotated-image scene clustering method based on correlated visual and annotation-word information, characterized in that:
Step 1: segment the training images and the test images respectively with the NCut image segmentation algorithm to obtain visual descriptions of the image regions;
Step 2: construct the visual nearest-neighbor graph G=(V, E) of all training images {J1, ..., Jl} ∈ Ctrain; each vertex of the vertex set V corresponds to one image, and the edge set E represents the visual distances between images; the visual distance between two images adopts the EMD similarity measure based on integrated multi-region matching, and the weight of the edge connecting two vertices is the EMD visual distance between the corresponding images;
Step 3: in the training image set, each image has a group of initial normalized annotation-word weight vectors; the initial annotation-word weight vector is normalized by counting the frequency with which each annotation word occurs in the image;
Step 4: propagate the annotation words of each training image among its visual nearest neighbors; the receiving images accept them according to the normalized EMD distance between the images; the EMD distance is normalized as

Emd_nor = e^(-Emd/δ)

wherein Emd represents the EMD distance between two images, Emd_nor denotes the normalized EMD distance, and δ is an empirical parameter, taken as the variance of the EMD distances over the training image set;
Step 5: for each training image, renormalize the accumulated annotation-word weights;
Step 6: after the visual features of an image have been converted into a group of weighted annotation words, perform scene semantic clustering with the PLSA model;
Step 7: learn the visual space of each scene semantic with a Gaussian mixture model;
Step 8: for a test image, perform scene classification with its visual features, and directly use the annotation words obtained for that scene semantic.
2. The annotated-image scene clustering method based on correlated visual and annotation-word information according to claim 1, characterized in that: renormalizing the accumulated annotation-word weights comprises dividing by the sum of the normalized EMD distances between this image and every image, including the image itself, the normalized EMD distance of an image to itself being specified as 1.
3. The annotated-image scene clustering method based on correlated visual and annotation-word information according to claim 1 or 2, characterized in that: performing scene semantic clustering with the PLSA model comprises first choosing a relatively large number of clusters, then judging the degree of semantic similarity from the annotation-word information in the clustering results, and merging clusters that are semantically and visually consistent.
CN201110148760.3A 2011-06-03 2011-06-03 Labelling image scene clustering method based on vision and labelling character related information Expired - Fee Related CN102222239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110148760.3A CN102222239B (en) 2011-06-03 2011-06-03 Labelling image scene clustering method based on vision and labelling character related information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110148760.3A CN102222239B (en) 2011-06-03 2011-06-03 Labelling image scene clustering method based on vision and labelling character related information

Publications (2)

Publication Number Publication Date
CN102222239A true CN102222239A (en) 2011-10-19
CN102222239B CN102222239B (en) 2014-03-26

Family

ID=44778786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110148760.3A Expired - Fee Related CN102222239B (en) 2011-06-03 2011-06-03 Labelling image scene clustering method based on vision and labelling character related information

Country Status (1)

Country Link
CN (1) CN102222239B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867192A (en) * 2012-09-04 2013-01-09 北京航空航天大学 Scene semantic shift method based on supervised geodesic propagation
CN102968620A (en) * 2012-11-16 2013-03-13 华中科技大学 Scene recognition method based on layered Gaussian hybrid model
CN103324940A (en) * 2013-05-02 2013-09-25 广东工业大学 Skin pathological image feature recognition method based on multi-example multi-label study
CN103377381A (en) * 2012-04-26 2013-10-30 富士通株式会社 Method and device for identifying content attribute of image
CN106469437A (en) * 2015-08-18 2017-03-01 联想(北京)有限公司 Image processing method and image processing apparatus
CN111061890A (en) * 2019-12-09 2020-04-24 腾讯云计算(北京)有限责任公司 Method for verifying labeling information, method and device for determining category
CN111191027A (en) * 2019-12-14 2020-05-22 上海电力大学 Generalized zero sample identification method based on Gaussian mixture distribution (VAE)
CN114898426A (en) * 2022-04-20 2022-08-12 国网智能电网研究院有限公司 Synonym label aggregation method, device, equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182443B (en) * 2016-12-08 2020-08-07 广东精点数据科技股份有限公司 Automatic image labeling method and device based on decision tree

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059872A1 (en) * 2006-09-05 2008-03-06 National Cheng Kung University Video annotation method by integrating visual features and frequent patterns
CN101295360A (en) * 2008-05-07 2008-10-29 清华大学 Semi-supervision image classification method based on weighted graph

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059872A1 (en) * 2006-09-05 2008-03-06 National Cheng Kung University Video annotation method by integrating visual features and frequent patterns
CN101295360A (en) * 2008-05-07 2008-10-29 清华大学 Semi-supervision image classification method based on weighted graph

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
P. Duygulu, K. Barnard, J. F. G. de Freitas, D. A. Forsyth: "Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary", Proceedings of the 7th European Conference on Computer Vision, Part IV, 2002, pp. 97-112 *
Yu Linsen, Zhang Tianwen: "Image clustering algorithm based on correlated visual and annotation information", Acta Electronica Sinica (《电子学报》) *
Wei Xinlu: "Research and application of methods for semantic annotation of natural images", China Master's Theses Full-text Database, Information Science and Technology series *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103377381B (en) * 2012-04-26 2016-09-28 富士通株式会社 The method and apparatus identifying the contents attribute of image
CN103377381A (en) * 2012-04-26 2013-10-30 富士通株式会社 Method and device for identifying content attribute of image
CN102867192B (en) * 2012-09-04 2016-01-06 北京航空航天大学 A kind of Scene Semantics moving method propagated based on supervision geodesic line
CN102867192A (en) * 2012-09-04 2013-01-09 北京航空航天大学 Scene semantic shift method based on supervised geodesic propagation
CN102968620B (en) * 2012-11-16 2015-05-20 华中科技大学 Scene recognition method based on layered Gaussian hybrid model
CN102968620A (en) * 2012-11-16 2013-03-13 华中科技大学 Scene recognition method based on layered Gaussian hybrid model
CN103324940A (en) * 2013-05-02 2013-09-25 广东工业大学 Skin pathological image feature recognition method based on multi-example multi-label study
CN106469437A (en) * 2015-08-18 2017-03-01 联想(北京)有限公司 Image processing method and image processing apparatus
CN106469437B (en) * 2015-08-18 2020-08-25 联想(北京)有限公司 Image processing method and image processing apparatus
CN111061890A (en) * 2019-12-09 2020-04-24 腾讯云计算(北京)有限责任公司 Method for verifying labeling information, method and device for determining category
CN111061890B (en) * 2019-12-09 2023-04-07 腾讯云计算(北京)有限责任公司 Method for verifying labeling information, method and device for determining category
CN111191027A (en) * 2019-12-14 2020-05-22 上海电力大学 Generalized zero sample identification method based on Gaussian mixture distribution (VAE)
CN111191027B (en) * 2019-12-14 2023-05-30 上海电力大学 Generalized zero sample identification method based on Gaussian mixture distribution (VAE)
CN114898426A (en) * 2022-04-20 2022-08-12 国网智能电网研究院有限公司 Synonym label aggregation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN102222239B (en) 2014-03-26

Similar Documents

Publication Publication Date Title
CN102222239B (en) Labelling image scene clustering method based on vision and labelling character related information
CN101777059B (en) Method for extracting landmark scene abstract
CN110674407B (en) Hybrid recommendation method based on graph convolution neural network
CN101894275B (en) Weakly supervised method for classifying SAR images
CN101894134B (en) Spatial layout-based fishing webpage detection and implementation method
CN103268635B (en) The segmentation of a kind of geometric grid model of place and semanteme marking method
CN102663382B (en) Video image character recognition method based on submesh characteristic adaptive weighting
CN109710701A (en) A kind of automated construction method for public safety field big data knowledge mapping
CN101923653B (en) Multilevel content description-based image classification method
CN103678680B (en) Image classification method based on area-of-interest multi dimensional space relational model
CN109919159A (en) A kind of semantic segmentation optimization method and device for edge image
CN106101222A (en) The method for pushing of information and device
CN102663431A (en) Image matching calculation method on basis of region weighting
CN101751438A (en) Theme webpage filter system for driving self-adaption semantics
CN111539452B (en) Image recognition method and device for multi-task attribute, electronic equipment and storage medium
CN106599051A (en) Method for automatically annotating image on the basis of generation of image annotation library
CN110334578A (en) Image level marks the Weakly supervised method for automatically extracting high score remote sensing image building
CN103413142A (en) Remote sensing image land utilization scene classification method based on two-dimension wavelet decomposition and visual sense bag-of-word model
CN101398846A (en) Image, semantic and concept detection method based on partial color space characteristic
CN102142089A (en) Semantic binary tree-based image annotation method
CN109376352A (en) A kind of patent text modeling method based on word2vec and semantic similarity
CN104142995A (en) Social event recognition method based on visual attributes
CN104751175A (en) Multi-label scene classification method of SAR (Synthetic Aperture Radar) image based on incremental support vector machine
CN103810303A (en) Image search method and system based on focus object recognition and theme semantics
CN102129568A (en) Method for detecting image-based spam email by utilizing improved gauss hybrid model classifier

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140326

Termination date: 20200603

CF01 Termination of patent right due to non-payment of annual fee