CN108062421A - A multi-scale semantic retrieval method for large-scale pictures - Google Patents

A multi-scale semantic retrieval method for large-scale pictures

Info

Publication number
CN108062421A
CN108062421A CN201810020300.4A CN201810020300A CN108062421A CN 108062421 A CN108062421 A CN 108062421A CN 201810020300 A CN201810020300 A CN 201810020300A CN 108062421 A CN108062421 A CN 108062421A
Authority
CN
China
Prior art keywords
picture
network
vector
text
pictures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810020300.4A
Other languages
Chinese (zh)
Inventor
田腾飞
李仁勇
崇志宏
张云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Focus Technology Co Ltd
Original Assignee
Southeast University
Focus Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University and Focus Technology Co Ltd
Priority to CN201810020300.4A
Publication of CN108062421A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

A large-scale picture semantic retrieval method trains a network with an unsupervised deep learning model to obtain the feature vectors of pictures, and considers the semantic relations between the text descriptions of pictures, to realize retrieval over large-scale picture sets. For the feature vectors of pictures, a generative adversarial network composed of a 4-6 layer discriminator network and a 4-6 layer generator network is used to extract picture features. For the text of pictures, the distributed representation of word vectors is used to obtain a text vector for each picture, with word embeddings describing the semantic information of the picture. The retrieved pictures are clustered so that only one item of each class of goods is shown to the user, which reduces the time the user spends looking for goods. The text description vector of each picture is obtained from the trained word vectors; the text vector and the picture vector are concatenated as the feature representation of the picture; the pictures are then clustered by k-means++.

Description

A multi-scale semantic retrieval method for large-scale pictures
Technical field
The present invention relates to large-scale picture semantic retrieval, and in particular to a multi-scale semantic retrieval method for large-scale e-commerce pictures.
Background art
Existing picture retrieval techniques fall into two main groups: text-based picture retrieval and content-based picture retrieval. Text-based retrieval describes a picture through text, such as its author, period, school and size; this cannot express the semantic similarity between pictures. Content-based retrieval analyses and retrieves pictures by their color, texture, layout and so on, but the features have to be extracted by hand, which costs manpower and material resources. In recent years deep learning has achieved considerable success in computer vision, which makes it a promising basis for picture retrieval.
For example, the retrieval method disclosed in CN106777177A receives a retrieval request sent by a client, the request including a target picture; parses the target picture to extract text information and image features; matches the text information against the text information of each preset picture in a preset picture set to determine a first similarity; in response to the first similarity exceeding a preset first threshold, matches the image features against the image features of the preset picture and, based on the matching result, decides whether the preset picture is an identical picture; and obtains the related information of the identical picture and sends the identical picture together with that information to the client as the retrieval result, so that the client can display it.
CN105760390A discloses a picture retrieval system that runs in an electronic device and comprises: a picture acquisition module for obtaining a picture to be identified; a picture processing module for pre-processing that picture; a feature extraction module for extracting the image features of the picture; and a retrieval module for retrieving, according to the extracted image features, pictures that match the picture to be identified from a preset cloud storage.
Summary of the invention
Existing methods represent semantics incompletely and require a large investment of manpower and material resources. To overcome these shortcomings, the present invention proposes a large-scale picture semantic retrieval technique that considers the relations between pictures at multiple scales: an unsupervised deep learning model is trained to obtain the feature vectors of pictures, and the semantic relations between the text descriptions of the pictures are also taken into account to realize retrieval over large-scale picture sets. Pictures do not need to be labeled, which reduces manual work, and the semantic relations between pictures are still considered. The method combines the advantages of text-based and content-based picture retrieval.
The technical solution adopted by the present invention for the large-scale picture retrieval problem is a large-scale picture semantic retrieval method in which an unsupervised deep learning model is trained to obtain the feature vectors of pictures, and the semantic relations between the text descriptions of pictures are considered, to realize retrieval over large-scale picture sets;
For the feature vectors of pictures, a generative adversarial network composed of a 4-6 layer discriminator network and a 4-6 layer generator network is used to extract picture features; see Fig. 3 for a generative adversarial network composed of a 5-layer discriminator network and a 5-layer generator network;
For the text of pictures, the distributed representation of word vectors is used to obtain a text vector for each picture, that is, word embeddings describe the semantic information of the picture; see Fig. 7 in the embodiment;
The retrieved pictures are clustered with a clustering method so that only one item of each class of goods is shown to the user (see Fig. 4 in the embodiment), which reduces the time the user spends looking for goods; the clustering method used is k-means++;
After the picture vectors are obtained, the similarity to the query picture is computed, and the pictures whose similarity exceeds 0.5 are taken as candidates;
The text description vector of each picture is then obtained from the trained word vectors; the text vector and the picture vector are concatenated as the feature representation of the picture; the pictures are then clustered by k-means++, and one picture from each cluster is presented to the user. A user who wants to see all pictures in the cluster of a displayed picture clicks that picture to see them all.
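The candidate selection and concatenation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not name the similarity measure, so cosine similarity is assumed here, and the function names and the toy gallery are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def select_candidates(query_vec, gallery, threshold=0.5):
    """Return ids of gallery pictures whose similarity to the query exceeds the threshold."""
    return [pid for pid, vec in gallery.items()
            if cosine_similarity(query_vec, vec) > threshold]

def joint_feature(image_vec, text_vec):
    """Concatenate the picture vector and the text description vector."""
    return list(image_vec) + list(text_vec)

# Toy picture vectors (hypothetical data, two dimensions for readability).
gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
candidates = select_candidates([1.0, 0.1], gallery)
```

Only the candidates that pass the 0.5 threshold would then be given a joint feature and passed on to clustering.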
Further, the discriminator network of the generative adversarial network is used to obtain the feature representation of a picture, and similar pictures are then found through the similarity between features (see Figs. 5 and 6). Meanwhile, word vectors are used to obtain the vector representation of the picture's text description. The picture vector and the text description vector are then concatenated as the representation of the picture, the pictures are clustered with k-means, and one picture from each class is shown to the user.
The implementation is divided into two steps, training and the production environment. The training step trains the generative adversarial network on the TensorFlow platform. The discriminator network used in training is a convolutional neural network, and the generator network is a deconvolutional neural network;
The network is typically a generative adversarial network composed of a 5-layer discriminator network and a 5-layer generator network. In this network, the input of the generator is a 100-dimensional random vector and its output is a 64*64*3 picture; the input of the discriminator is a 64*64*3 picture and its output is a number between 0 and 1 representing the probability that the picture is a real picture;
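One way a 5-layer generator can reach a 64*64*3 output is sketched below as a shape calculation. The kernel size, stride, padding, starting resolution and channel counts are assumptions borrowed from common DCGAN-style designs; the text specifies only the 100-dimensional input, the 5 layers and the 64*64*3 output.

```python
def transposed_conv_out(size, kernel=4, stride=2, padding=1):
    """Output width of a transposed convolution with explicit padding."""
    return (size - 1) * stride - 2 * padding + kernel

def generator_shapes(start=4, channels=(512, 256, 128, 64, 3)):
    """Feature-map shapes after each of the 5 generator layers.

    Layer 1 is assumed to project the 100-dimensional noise vector to a
    start*start*channels[0] tensor; layers 2-5 are transposed convolutions
    that each double the spatial resolution.
    """
    shapes = [(start, start, channels[0])]
    size = start
    for ch in channels[1:]:
        size = transposed_conv_out(size)
        shapes.append((size, size, ch))
    return shapes

shapes = generator_shapes()
```

Under these assumptions the spatial size goes 4, 8, 16, 32, 64, and the discriminator would mirror the same sequence in reverse with strided convolutions.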
In training, the adversarial game is formed by minimizing the loss on real pictures and the loss on generated pictures respectively. Batch normalization is used in the network to mitigate exploding and vanishing gradients during training, and the fully connected layers are removed to improve the convergence speed of the network. After training, the output of the penultimate layer of the discriminator is taken as the feature of a picture, and the pictures with higher similarity are selected according to the feature similarity between pictures.
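The two losses can be illustrated with the usual binary cross-entropy formulation for generative adversarial networks. This is a sketch, not the patent's exact objective: the text says only that the loss on real pictures and the loss on generated pictures are minimized, and the non-saturating generator loss used here is an assumption.

```python
import math

def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one discriminator output in (0, 1)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """Loss on real pictures (target 1) plus loss on generated pictures (target 0)."""
    real_loss = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_loss = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_loss + fake_loss

def generator_loss(d_fake):
    """Non-saturating generator loss: generated pictures should score as real."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)
```

Here `d_real` and `d_fake` stand for the discriminator outputs on a batch of real and generated pictures; the generator's loss falls as the discriminator assigns its pictures a higher probability of being real.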
Compared with one-hot encoding, the distributed representation of word vectors can express the semantic similarity between pictures. In the training of the word vectors, the text descriptions of the goods corresponding to the pictures are the input, and the output is the vector corresponding to each word. The vectors of the words contained in the text description of each picture are then added together to obtain the semantic representation of that picture.
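Summing the word vectors of a description can be sketched as follows. The lookup table, the whitespace tokenization and the handling of unknown words are assumptions made for illustration; Chinese product descriptions would need a word segmenter first, and the text does not say how out-of-vocabulary words are treated.

```python
def description_vector(text, word_vectors, dim):
    """Add up the vectors of the words in a text description.

    word_vectors is a hypothetical dict mapping a word to its trained vector.
    Words without a vector are skipped (an assumption).
    """
    total = [0.0] * dim
    for word in text.split():
        vec = word_vectors.get(word)
        if vec is None:
            continue
        total = [t + v for t, v in zip(total, vec)]
    return total

# Toy two-dimensional word vectors (hypothetical data).
word_vectors = {"red": [1.0, 0.0], "dress": [0.0, 2.0]}
vec = description_vector("red dress unknown", word_vectors, dim=2)
```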
The purpose of the clustering is that, when results are shown to the user, only one picture of each class is displayed, which reduces the user's search burden. Compared with k-means, k-means++ initializes the cluster centers so that they tend to be far apart from one another, which improves on the k-means method.
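The k-means++ initialization can be sketched as below: the first center is drawn uniformly, and each further center is drawn with probability proportional to its squared distance from the nearest center already chosen. The function names, the fixed random seed and the toy points are assumptions for illustration; the ordinary k-means iterations would follow this seeding step.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng=None):
    """Choose k initial centers from points using k-means++ seeding."""
    rng = rng or random.Random(0)
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # Squared distance of every point to its nearest chosen center.
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0.0:
            # All points coincide with a center; fall back to a uniform draw.
            centers.append(list(rng.choice(points)))
            continue
        r = rng.uniform(0.0, total)
        cumulative = 0.0
        for p, w in zip(points, d2):
            cumulative += w
            if cumulative > r:
                centers.append(list(p))
                break
    return centers

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers = kmeans_pp_init(points, 2)
```

Points that already serve as a center have zero weight and are never drawn again, which is what pushes the initial centers apart.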
Advantageous effects of the present invention: the discriminator network of the generative adversarial network is used to obtain the feature representation of a picture, and similar pictures are found through the similarity between features. Meanwhile, word vectors are used to obtain the vector representation of the picture's text description. The picture vector and the text description vector are concatenated as the representation of the picture, the pictures are clustered with k-means, and one picture from each class is shown to the user. The invention considers the semantic features of pictures at multiple scales. Compared with earlier methods it does not need extensive manual participation: picture features are obtained automatically by deep learning, and the semantic features of the picture descriptions are also taken into account. It is suitable for multi-scale semantic retrieval over pictures on the order of ten million. The feature representation of a picture is more diverse and can better abstract its deep features. In particular, because the picture features are extracted by unsupervised learning, the method remains applicable to large-scale picture sets.
Description of the drawings
Fig. 1 is the framework of the generative adversarial network.
Fig. 2 is the flow chart of the whole system.
Fig. 3 is the specific implementation of the generator network.
Fig. 4 is the flow chart of keyword search results.
Fig. 5 is the flow chart of the generator network.
Fig. 6 is the flow chart of the discriminator network.
Fig. 7 is the generation flow of the text description vector.
Specific embodiment
The present invention is further described below with reference to the accompanying drawings. As shown in the figures, the implementation is divided into two parts, training and the production environment. The training part mainly trains the generative adversarial network, using the TensorFlow platform. The discriminator network is a convolutional neural network and the generator network is a deconvolutional neural network. Each iteration uses 64 pictures. The main framework is shown in Fig. 2.
After training is completed, the trained model is used to set up a standard TensorFlow model server. In practical use, one picture or a batch of pictures can be sent to this server at a time to obtain the picture vectors.
After the picture vectors are obtained, the similarity to the query picture is computed, and the pictures whose similarity exceeds 0.5 are taken as candidates. The text description vector of each picture is then obtained from the trained word vectors. The text vector and the picture vector are concatenated as the feature representation of the picture. The pictures are then clustered by k-means++, and one picture from each cluster is presented to the user; a user who wants to see all pictures in that cluster clicks the picture to see them all. See Figs. 3-7 for the flow.
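The display step, one picture per cluster with the full cluster available on click, might look like the sketch below. The text does not say how the displayed picture is chosen within a cluster; picking the member closest to the cluster mean is an assumption, and the function name and toy data are hypothetical.

```python
def cluster_representatives(features, labels):
    """Pick one picture per cluster and keep the full membership lists.

    features maps a picture id to its joint feature vector; labels maps a
    picture id to its cluster id. The representative is the member nearest
    to the cluster mean (an assumed selection rule).
    """
    members = {}
    for pid, label in labels.items():
        members.setdefault(label, []).append(pid)
    representatives = {}
    for label, pids in members.items():
        dim = len(features[pids[0]])
        centroid = [sum(features[p][i] for p in pids) / len(pids)
                    for i in range(dim)]
        representatives[label] = min(
            pids,
            key=lambda p: sum((x - c) ** 2 for x, c in zip(features[p], centroid)),
        )
    return representatives, members

# Toy joint features and cluster labels (hypothetical data).
features = {"a": [0.0, 0.0], "b": [0.0, 2.0], "c": [0.0, 1.0], "d": [10.0, 10.0]}
labels = {"a": 0, "b": 0, "c": 0, "d": 1}
representatives, members = cluster_representatives(features, labels)
```

The `members` mapping is what a click on a displayed picture would expand into.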
Referring to Fig. 3, the generative adversarial network is composed of a 5-layer discriminator network and a 5-layer generator network;
Referring to Fig. 4, the retrieved pictures are clustered with a clustering method so that only one item of each class of goods is shown to the user, which reduces the time the user spends looking for goods; the clustering method used is k-means++;
After the picture vectors are obtained, the similarity to the query picture is computed, and the pictures whose similarity exceeds 0.5 are taken as candidates;
Referring to Figs. 5 and 6, word vectors are meanwhile used to obtain the vector representation of the picture's text description; the picture vector and the text description vector are then concatenated as the representation of the picture, the pictures are clustered with k-means, and one picture from each class is shown to the user.
Referring to Fig. 7, the distributed representation of word vectors is used to obtain the text vector of a picture, with word embeddings describing the semantic information of the picture.
The present invention is not limited to the embodiments described above; other designs obtained by using structures identical or similar to those of the above embodiments fall within the protection scope of the present invention.

Claims (5)

1. A large-scale picture semantic retrieval method, characterized in that an unsupervised deep learning model is trained to obtain the feature vectors of pictures, and the semantic relations between the text descriptions of pictures are considered, to realize retrieval over large-scale picture sets;
For the feature vectors of pictures, a generative adversarial network composed of a 4-6 layer discriminator network and a 4-6 layer generator network is used to extract picture features;
For the text of pictures, the distributed representation of word vectors is used to obtain a text vector for each picture, with word embeddings describing the semantic information of the picture;
The retrieved pictures are clustered with a clustering method so that only one item of each class of goods is shown to the user, which reduces the time the user spends looking for goods; the clustering method used is k-means++;
After the picture vectors are obtained, the similarity to the query picture is computed, and the pictures whose similarity exceeds 0.5 are taken as candidates;
The text description vector of each picture is then obtained from the trained word vectors; the text vector and the picture vector are concatenated as the feature representation of the picture; the pictures are then clustered by k-means++, and one picture from each cluster is presented to the user; a user who wants to see all pictures in the cluster of a displayed picture clicks that picture to see them all.
2. The large-scale picture semantic retrieval method according to claim 1, characterized in that, in processing the feature vectors of pictures, the discriminator network of the generative adversarial network is used to obtain the feature representation of a picture, and similar pictures are then found through the similarity between features; meanwhile, word vectors are used to obtain the vector representation of the picture's text description; the picture vector and the text description vector are then concatenated as the representation of the picture, the pictures are clustered with k-means, and one picture from each class is shown to the user.
3. The large-scale picture semantic retrieval method according to claim 2, characterized in that the implementation is divided into two steps, training and the production environment; the training step trains the generative adversarial network; the TensorFlow platform is used in training, the discriminator network used in training is a convolutional neural network, and the generator network is a deconvolutional neural network;
A 5-layer discriminator network and a 5-layer generator network are used; in the generative adversarial network composed of them, the input of the generator is a 100-dimensional random vector and its output is a 64*64*3 picture; the input of the discriminator is a 64*64*3 picture and its output is a number between 0 and 1 representing the probability that the picture is a real picture;
In training, the adversarial game is formed by minimizing the loss on real pictures and the loss on generated pictures respectively; batch normalization is used in the network to mitigate exploding and vanishing gradients during training, and the fully connected layers are removed to improve the convergence speed of the network; after training, the output of the penultimate layer of the discriminator is taken as the feature of a picture, and the pictures with higher similarity are selected according to the feature similarity between pictures.
4. The large-scale picture semantic retrieval method according to claim 1, characterized in that, in the training of the word vectors, the text descriptions of the goods corresponding to the pictures are the input, and the output is the vector corresponding to each word; the vectors of the words contained in the text description of each picture are then added together to obtain the semantic representation of that picture.
5. The large-scale picture semantic retrieval method according to claim 1, characterized in that the purpose of the clustering is that, when results are shown to the user, only one picture of each class is displayed, which reduces the user's search burden;
Compared with k-means, k-means++ initializes the cluster centers so that they tend to be far apart from one another, which improves on the k-means method;
After training is completed, the trained model is used to set up a standard TensorFlow model server; in practical use, one picture or a batch of pictures can be sent to this server at a time to obtain the picture vectors.
CN201810020300.4A 2018-01-09 2018-01-09 A multi-scale semantic retrieval method for large-scale pictures Pending CN108062421A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810020300.4A 2018-01-09 2018-01-09 A multi-scale semantic retrieval method for large-scale pictures

Publications (1)

Publication Number Publication Date
CN108062421A (en) 2018-05-22

Family

ID=62141120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810020300.4A A multi-scale semantic retrieval method for large-scale pictures (Pending, CN108062421A (en)) 2018-01-09 2018-01-09

Country Status (1)

Country Link
CN (1) CN108062421A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102253996A (en) * 2011-07-08 2011-11-23 北京航空航天大学 Multi-visual angle stagewise image clustering method
CN106126581A (en) * 2016-06-20 2016-11-16 复旦大学 Cartographical sketching image search method based on degree of depth study
CN106997380A (en) * 2017-03-21 2017-08-01 北京工业大学 Imaging spectrum safe retrieving method based on DCGAN depth networks
CN107154023A (en) * 2017-05-17 2017-09-12 电子科技大学 Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution
CN107330364A (en) * 2017-05-27 2017-11-07 上海交通大学 A kind of people counting method and system based on cGAN networks
US20170365038A1 (en) * 2016-06-16 2017-12-21 Facebook, Inc. Producing Higher-Quality Samples Of Natural Images

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘玉杰: "基于条件生成对抗网络的手绘图像检索", 《计算机辅助设计与图形学学报》 *
樊雷: "一种基于TensorFlow的DCGAN模型实现", 《电脑知识与技术》 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108829847B (en) * 2018-06-20 2020-11-17 山东大学 Multi-modal modeling method based on translation and application thereof in commodity retrieval
CN108829847A (en) * 2018-06-20 2018-11-16 山东大学 Commodity search method and system based on multi-modal shopping preferences
CN108932660A (en) * 2018-07-26 2018-12-04 北京旷视科技有限公司 A kind of commodity using effect analogy method, device and equipment
CN109063772A (en) * 2018-08-02 2018-12-21 广东工业大学 A kind of image individuation semantic analysis, device and equipment based on deep learning
CN109063772B (en) * 2018-08-02 2022-05-10 广东工业大学 Image personalized semantic analysis method, device and equipment based on deep learning
CN109584257A (en) * 2018-11-28 2019-04-05 中国科学院深圳先进技术研究院 A kind of image processing method and relevant device
CN109584257B (en) * 2018-11-28 2022-12-09 中国科学院深圳先进技术研究院 Image processing method and related equipment
CN111339340A (en) * 2018-12-18 2020-06-26 顺丰科技有限公司 Training method of image description model, image searching method and device
CN109901835A (en) * 2019-01-25 2019-06-18 北京三快在线科技有限公司 Method, apparatus, equipment and the storage medium of layout element
CN110059217A (en) * 2019-04-29 2019-07-26 广西师范大学 A kind of image text cross-media retrieval method of two-level network
CN110059217B (en) * 2019-04-29 2022-11-04 广西师范大学 Image text cross-media retrieval method for two-stage network
CN113656582A (en) * 2021-08-17 2021-11-16 北京百度网讯科技有限公司 Training method of neural network model, image retrieval method, device and medium
CN115186119A (en) * 2022-09-07 2022-10-14 深圳市华曦达科技股份有限公司 Picture processing method and system based on picture and text combination and readable storage medium
CN115186119B (en) * 2022-09-07 2022-12-06 深圳市华曦达科技股份有限公司 Picture processing method and system based on picture and text combination and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180522