CN108062421A

CN108062421A - A multi-scale semantic retrieval method for large-scale images

Info

Publication number: CN108062421A
Application number: CN201810020300.4A
Authority: CN
Inventors: 田腾飞; 李仁勇; 崇志宏; 张云
Original assignee: Southeast University; Focus Technology Co Ltd
Current assignee: Southeast University; Focus Technology Co Ltd
Priority date: 2018-01-09
Filing date: 2018-01-09
Publication date: 2018-05-22

Abstract

A large-scale image semantic retrieval method trains a network with an unsupervised deep learning model to obtain image feature vectors, and takes into account the semantic relations between the text descriptions of images, to achieve retrieval over large image collections. For the image feature vectors, a generative adversarial network composed of a 4-6 layer discriminator network and a 4-6 layer generator network is used to extract image features. For the image text, distributed word-vector representations are used to obtain the vector for the image, and word embeddings describe the semantic information of the image. The retrieved images are clustered with a clustering method so that only one item from each class of goods is shown to the user, reducing the time the user spends searching for goods. The text description vector of each image is then obtained from the trained word vectors; the text vector and the image vector are concatenated to form the feature representation of the image; the images are then clustered with k-means++.

Description

A multi-scale semantic retrieval method for large-scale images

Technical field

The present invention is a large-scale image semantic retrieval technique, in particular a multi-scale semantic retrieval method for large-scale e-commerce images.

Background art

Existing image retrieval techniques fall into two main categories: text-based image retrieval and content-based image retrieval. Text-based retrieval describes the features of an image through textual description. Content-based image retrieval analyzes and retrieves images by their color, texture, layout and so on. Text-based retrieval describes an image by its author, date, school and size, which cannot capture the semantic similarity between images. Content-based image retrieval requires image features to be extracted by hand, which demands additional manpower and material resources. In recent years deep learning has achieved considerable success in computer vision, and using deep learning for image retrieval is a promising approach.

For example, in the retrieval method disclosed in CN106777177A, a retrieval request sent by a client is received, the retrieval request including a target image; the target image is parsed to extract text information and image features; the text information is matched against the text information of each preset image in a preset image set to determine a first similarity and, in response to the first similarity exceeding a preset first threshold, the image features are matched against the image features of the preset image; based on the matching result, it is determined whether the preset image is an identical image; the associated information of the identical image is obtained, and the identical image and the associated information are sent to the client as the retrieval result, so that the client displays the retrieval result.

CN105760390A discloses an image retrieval system that runs on an electronic device and includes: an image acquisition module for obtaining an image to be identified; an image processing module for preprocessing the image to be identified; a feature extraction module for extracting the image features of the image to be identified; and a retrieval module for retrieving, from a preset cloud storage and according to the extracted image features, images that match the image to be identified.

Summary of the invention

To overcome the shortcomings of existing methods, namely incomplete semantic representation and the need for substantial manpower and material resources, the present invention proposes a large-scale image semantic retrieval technique that considers the relations between images at multiple scales. A network is trained with an unsupervised deep learning model to obtain image feature vectors, and the semantic relations between the text descriptions of images are taken into account, to achieve retrieval over large image collections. Images need not be annotated, which reduces manual effort, and the semantic relations between images are taken into account as well. The method of the present invention combines the advantages of text-based image retrieval and content-based image retrieval.

The technical solution adopted by the present invention to solve the large-scale image retrieval problem is as follows: a large-scale image semantic retrieval method that trains a network with an unsupervised deep learning model to obtain image feature vectors, and takes into account the semantic relations between the text descriptions of images, to achieve retrieval over large image collections;

For the image feature vectors, a generative adversarial network composed of a 4-6 layer discriminator network and a 4-6 layer generator network is used to extract image features; see Fig. 3, which shows a generative adversarial network composed of a 5-layer discriminator network and a 5-layer generator network;

For the image text, distributed word-vector representations are used to obtain the vector for the image, and word embeddings describe the semantic information of the image; see Fig. 7 in the embodiment;

The retrieved images are clustered with a clustering method so that only one item from each class of goods is shown to the user (see Fig. 4 in the embodiment), reducing the time the user spends searching for goods; the clustering method used is k-means++;

After the image vectors are obtained, the similarity between each image and the query image is computed, and images with a similarity greater than 0.5 are selected as candidates;

The text description vector of each image is then obtained from the trained word vectors; the text vector and the image vector are concatenated to form the feature representation of the image; the images are then clustered with k-means++, and one image from each cluster is presented to the user; if the user wants to view all images in the cluster containing that image, clicking the image displays all of them.
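The candidate selection and feature concatenation steps described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not name the similarity measure, so cosine similarity is an assumption, and the function names and toy vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros).

    Assumption: the source only says "similarity"; cosine is one common choice.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def select_candidates(query_vec, image_vecs, threshold=0.5):
    """Return ids of images whose similarity to the query exceeds the threshold."""
    return [img_id for img_id, vec in image_vecs.items()
            if cosine_similarity(query_vec, vec) > threshold]

def joint_feature(image_vec, text_vec):
    """Concatenate the image vector and the text description vector."""
    return list(image_vec) + list(text_vec)

# Toy data (hypothetical): 3-dimensional image vectors, 2-dimensional text vectors.
query = [1.0, 0.0, 1.0]
image_vecs = {"a": [1.0, 0.1, 0.9], "b": [0.0, 1.0, 0.0], "c": [0.5, 0.5, 0.5]}
text_vecs = {"a": [0.2, 0.4], "b": [0.9, 0.1], "c": [0.3, 0.3]}

candidates = select_candidates(query, image_vecs)
features = {i: joint_feature(image_vecs[i], text_vecs[i]) for i in candidates}
```

Here image "b" is orthogonal to the query and is filtered out, while "a" and "c" pass the 0.5 threshold and receive a concatenated feature that would then be handed to the clustering step.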

Further, the discriminator network of the generative adversarial network is used to obtain the feature representation of an image, and similar images are then found through the similarity between features (see Figs. 5 and 6); meanwhile, word vectors are used to obtain the vector representation of the image's text description; the image vector and the text description vector are then concatenated to represent the image, the images are clustered with k-means, and one image from each class is shown to the user.

The implementation is divided into two steps, training and production; the training step trains the generative adversarial network; training uses the TensorFlow platform, the discriminator network used in training is a convolutional neural network, and the generator network is a deconvolutional neural network;

Typically, the generative adversarial network above is composed of a 5-layer discriminator network and a 5-layer generator network; in this network, the input of the generator network is a 100-dimensional random vector and its output is a 64*64*3 image; the input of the discriminator network is a 64*64*3 image and its output is a number between 0 and 1 representing the probability that the image is a real image;

During training, the adversarial game is formed by separately minimizing the loss on real images and the loss on generated images; batch normalization is used in the network to address exploding and vanishing gradients during training, and fully connected layers are removed to improve the convergence speed of the network; after training, the output of the second-to-last layer of the discriminator network is used as the feature of an image, and the images with higher similarity are selected according to the feature similarity between images.
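The text does not give the loss formulas. As a hedged illustration, the sketch below uses the standard binary cross-entropy GAN losses, which is an assumption: the discriminator is penalized on real images (target 1) and on generated images (target 0), and the generator is rewarded when the discriminator scores its output as real. The function names are hypothetical, and the inputs are discriminator outputs strictly between 0 and 1.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Sum of the loss on real images and the loss on generated images.

    d_real: discriminator outputs D(x) for real images, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) for generated images, each in (0, 1)
    """
    real_loss = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_loss = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_loss + fake_loss

def generator_loss(d_fake):
    """Non-saturating generator loss: low when D(G(z)) is close to 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# A confident discriminator has a lower loss than an unsure one,
# and the generator's loss falls as its images fool the discriminator.
confident = discriminator_loss([0.9, 0.8], [0.1, 0.3])
unsure = discriminator_loss([0.6, 0.5], [0.4, 0.5])
fooled = generator_loss([0.9])
caught = generator_loss([0.1])
```

The two networks pull in opposite directions on the same quantity D(G(z)), which is the adversarial game the paragraph above refers to.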

In training the word vectors, the text descriptions of the goods corresponding to the images serve as input, and the output is the vector corresponding to each word; the vectors of the words contained in the text description of each image are then summed to obtain the semantic representation of that image.

Compared with one-hot representation, the distributed word-vector representation described above can express the semantic similarity between images.
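The summation of word vectors into a single semantic representation can be illustrated with a short sketch. The word vectors here are toy values rather than trained ones, the whitespace tokenization is an assumption (Chinese product descriptions would first need word segmentation), and skipping out-of-vocabulary words is a choice the text does not specify.

```python
def text_vector(description, word_vectors, dim):
    """Sum the vectors of the words in a description.

    Assumption: words missing from the vocabulary are skipped.
    """
    total = [0.0] * dim
    for word in description.split():
        vec = word_vectors.get(word)
        if vec is None:
            continue
        total = [t + v for t, v in zip(total, vec)]
    return total

# Toy 3-dimensional word vectors (hypothetical values).
word_vectors = {
    "red": [1.0, 0.0, 0.5],
    "cotton": [0.0, 2.0, 0.5],
    "shirt": [0.5, 0.5, 1.0],
}

vec = text_vector("red cotton shirt", word_vectors, dim=3)
# vec == [1.5, 2.5, 2.0]
```

Unlike one-hot vectors, which are mutually orthogonal, dense vectors of this kind place descriptions that share or resemble words close together, which is what lets the summed vector carry semantic similarity.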

The purpose of the clustering described above is that, when results are shown to the user, only one image from each class is displayed, reducing the user's search burden. Compared with k-means, k-means++ initializes the cluster centers so that they are far apart from one another, which improves on the k-means method.
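The k-means++ initialization mentioned above can be sketched in plain Python. This shows only the seeding step of the standard k-means++ algorithm, not the patent's own code: the first center is drawn uniformly, and each later center is drawn with probability proportional to its squared distance from the nearest center already chosen. The function names are hypothetical, and the sketch assumes the data contains at least k distinct points.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """Choose k initial cluster centers using k-means++ seeding."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(squared_distance(p, c) for c in centres) for p in points]
        # Far-away points are more likely to be picked; chosen points have weight 0.
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres

# Two tight groups of points far from each other (toy data).
points = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]
centres = kmeans_pp_init(points, 2, random.Random(0))
```

Because points already chosen have zero weight, the same point is never selected twice, and with data like the above the second center very likely lands in the group opposite the first, which is the spreading effect the paragraph describes.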

Advantageous effects of the present invention: the discriminator network of the generative adversarial network is used to obtain the feature representation of an image, and similar images are then found through the similarity between features. Meanwhile, word vectors are used to obtain the vector representation of the image's text description. The image vector and the text description vector are then concatenated to represent the image, the images are clustered with k-means, and one image from each class is shown to the user. The present invention considers the semantic features of images at multiple scales. Compared with earlier methods, it does not require extensive manual involvement: image features are obtained automatically by deep learning, and the semantic features of the image descriptions are taken into account as well. It is suitable for multi-scale semantic retrieval over image collections on the order of ten million. The feature representation of an image is more diverse and better abstracts the deep features of the image. In particular, because image features are extracted by unsupervised learning, the method remains generally applicable to large-scale image collections.

Description of the drawings

Fig. 1 is the framework of the generative adversarial network.

Fig. 2 is the flow chart of the whole system.

Fig. 3 is the specific implementation of the generator network.

Fig. 4 is the flow chart of keyword search results.

Fig. 5 is the flow chart of the generator network.

Fig. 6 is the flow chart of the discriminator network.

Fig. 7 is the generation process of the text description vectors.

Specific embodiment

The present invention is further described below with reference to the accompanying drawings. As shown in the figures, the implementation is divided into two parts, training and production. The training part mainly trains the generative adversarial network, using the TensorFlow platform. The discriminator network is a convolutional neural network and the generator network is a deconvolutional neural network. Each training iteration uses 64 images. The overall framework is shown in Fig. 2.
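To make the stated input and output sizes concrete, the sketch below walks through one way a 5-layer generator could turn a 100-dimensional vector into a 64*64*3 image, and a 5-layer discriminator could reduce such an image to a single probability. The layer layout is an assumption borrowed from the common DCGAN design (a 4*4 projection followed by stride-2 transposed convolutions that double the spatial size); the channel counts are hypothetical and are not given in the text.

```python
def generator_shapes(start=4, channels=(512, 256, 128, 64, 3), stride=2):
    """Output shape (height, width, channels) after each of 5 generator layers.

    Assumed layout: layer 1 projects the 100-dim noise vector and reshapes it
    to start*start*channels[0]; layers 2-5 are stride-2 transposed convolutions.
    """
    shapes = [(start, start, channels[0])]
    size = start
    for ch in channels[1:]:
        size *= stride  # each stride-2 transposed convolution doubles H and W
        shapes.append((size, size, ch))
    return shapes

def discriminator_shapes(size=64, channels=(64, 128, 256, 512), stride=2):
    """Output shape after each of 5 discriminator layers (assumed layout)."""
    shapes = []
    for ch in channels:
        size //= stride  # each stride-2 convolution halves H and W
        shapes.append((size, size, ch))
    shapes.append((1,))  # layer 5: one number in (0, 1), probability of "real"
    return shapes

gen = generator_shapes()
disc = discriminator_shapes()
# gen  -> [(4, 4, 512), (8, 8, 256), (16, 16, 128), (32, 32, 64), (64, 64, 3)]
# disc -> [(32, 32, 64), (16, 16, 128), (8, 8, 256), (4, 4, 512), (1,)]
```

With a batch of 64 images per iteration, each of these shapes gains a leading batch dimension of 64. Under these assumed channel counts, the second-to-last discriminator layer, whose output serves as the image feature, would flatten to 4*4*512 = 8192 values.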

After training is complete, the trained model is obtained and used to set up a standard TensorFlow model server. In practical use, one image or a batch of images can be sent to this server each time to obtain the image vectors.

After the image vectors are obtained, the similarity between each image and the query image is computed, and images with a similarity greater than 0.5 are selected as candidates. The text description vector of each image is then obtained from the trained word vectors. The text vector and the image vector are concatenated to form the feature representation of the image. The images are then clustered with k-means++, and one image from each cluster is presented to the user; if the user wants to view all images in the cluster containing that image, clicking the image displays all of them. See the flows in Figs. 3-7.
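The display step, one image per cluster with the rest revealed on click, can be sketched as below. It assumes cluster labels have already been produced by the clustering step. The text does not say how the displayed image is chosen within a cluster, so taking the first member is an assumption, and all function names are hypothetical.

```python
from collections import defaultdict

def group_by_cluster(image_ids, labels):
    """Map each cluster label to the list of image ids assigned to it."""
    clusters = defaultdict(list)
    for img_id, label in zip(image_ids, labels):
        clusters[label].append(img_id)
    return dict(clusters)

def representatives(clusters):
    """Pick one image per cluster to display (assumption: the first member)."""
    return {label: members[0] for label, members in clusters.items()}

def expand(clusters, reps, clicked_id):
    """Return every image in the cluster whose displayed image was clicked."""
    for label, rep in reps.items():
        if rep == clicked_id:
            return clusters[label]
    return []

# Toy result set: four candidate images assigned to two clusters.
clusters = group_by_cluster(["a", "b", "c", "d"], [0, 1, 0, 1])
reps = representatives(clusters)
shown = list(reps.values())            # images initially shown to the user
opened = expand(clusters, reps, "b")   # user clicks image "b"
```

The user initially sees only "a" and "b"; clicking "b" reveals the full cluster containing "b" and "d".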

With reference to Fig. 3, the generative adversarial network is composed of a 5-layer discriminator network and a 5-layer generator network;

With reference to Fig. 4, the retrieved images are clustered with a clustering method so that only one item from each class of goods is shown to the user, reducing the time the user spends searching for goods; the clustering method used is k-means++;

See Figs. 5 and 6; meanwhile, word vectors are used to obtain the vector representation of the image's text description; the image vector and the text description vector are then concatenated to represent the image, the images are clustered with k-means, and one image from each class is shown to the user.

With reference to Fig. 7, distributed word-vector representations are used to obtain the vector for the image, and word embeddings describe the semantic information of the image.

The present invention is not limited to the embodiments described above; other structural designs obtained by adopting structures identical or similar to those of the above embodiments fall within the protection scope of the present invention.

Claims

1. A large-scale image semantic retrieval method, characterized in that a network is trained with an unsupervised deep learning model to obtain image feature vectors, and the semantic relations between the text descriptions of images are taken into account, to achieve retrieval over large image collections;

For the image feature vectors, a generative adversarial network composed of a 4-6 layer discriminator network and a 4-6 layer generator network is used to extract image features;

For the image text, distributed word-vector representations are used to obtain the vector for the image, and word embeddings describe the semantic information of the image;

The retrieved images are clustered with a clustering method so that only one item from each class of goods is shown to the user, reducing the time the user spends searching for goods; the clustering method used is k-means++;

The text description vector of each image is then obtained from the trained word vectors; the text vector and the image vector are concatenated to form the feature representation of the image; the images are then clustered with k-means++, and one image from each cluster is presented to the user; if the user wants to view all images in the cluster containing that image, clicking the image displays all of them.

2. The large-scale image semantic retrieval method according to claim 1, characterized in that, in processing the image feature vectors, the discriminator network of the generative adversarial network is used to obtain the feature representation of an image, and similar images are then found through the similarity between features; meanwhile, word vectors are used to obtain the vector representation of the image's text description; the image vector and the text description vector are then concatenated to represent the image, the images are clustered with k-means, and one image from each class is shown to the user.

3. The large-scale image semantic retrieval method according to claim 2, characterized in that the implementation is divided into two steps, training and production; the training step trains the generative adversarial network; training uses the TensorFlow platform, the discriminator network used in training is a convolutional neural network, and the generator network is a deconvolutional neural network;

A 5-layer discriminator network and a 5-layer generator network are used; in the generative adversarial network composed of the 5-layer discriminator network and the 5-layer generator network, the input of the generator network is a 100-dimensional random vector and its output is a 64*64*3 image; the input of the discriminator network is a 64*64*3 image and its output is a number between 0 and 1 representing the probability that the image is a real image;

During training, the adversarial game is formed by separately minimizing the loss on real images and the loss on generated images; batch normalization is used in the network to address exploding and vanishing gradients during training, and fully connected layers are removed to improve the convergence speed of the network; after training, the output of the second-to-last layer of the discriminator network is used as the feature of an image, and the images with higher similarity are selected according to the feature similarity between images.

4. The large-scale image semantic retrieval method according to claim 1, characterized in that, in training the word vectors, the text descriptions of the goods corresponding to the images serve as input, and the output is the vector corresponding to each word; the vectors of the words contained in the text description of each image are then summed to obtain the semantic representation of that image.

5. The large-scale image semantic retrieval method according to claim 1, characterized in that the purpose of the clustering is that, when results are shown to the user, only one image from each class is displayed, reducing the user's search burden;

Compared with k-means, k-means++ initializes the cluster centers so that they are far apart from one another, which improves on the k-means method;

After training is complete, the trained model is obtained and used to set up a standard TensorFlow model server; in practical use, one image or a batch of images can be sent to this server each time to obtain the image vectors.