CN110334363A - Descriptor translation and similarity measurement method based on a hybrid encoder - Google Patents
Descriptor translation and similarity measurement method based on a hybrid encoder
- Publication number: CN110334363A (application CN201910630989.7A)
- Authority
- CN
- China
- Prior art keywords
- descriptor
- translation
- similarity
- reconstruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/22 — Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
- G06F40/194 — Handling natural language data; Text processing; Calculation of difference between files
- G06F40/58 — Handling natural language data; Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Abstract
A descriptor translation and similarity measurement method based on a hybrid encoder, relating to image retrieval and descriptor translation. Several kinds of hand-crafted or learned descriptors are extracted from a training image set to prepare the training set of the descriptor translator, and a hybrid autoencoder is trained on paired features. The hybrid encoder is trained to map from a source feature to a target feature; the decoder is trained jointly on the two paths of reconstruction and translation, with a dedicated encoder per descriptor and a shared decoder, yielding a translation loss and a reconstruction loss. The translation loss and the reconstruction loss are used to measure the similarity between the two kinds of descriptor. A descriptor to be translated is fed into the corresponding trained translator, which performs the descriptor translation; the confidence of the result is measured by the similarity. The method breaks through the retrieval barrier between retrieval systems built on different descriptors and provides a convenient and efficient transfer platform between heterogeneous systems.
Description
Technical field
The present invention relates to image retrieval and descriptor translation, and more particularly to a descriptor translation and similarity measurement method based on a hybrid encoder.
Background technique
In an era of rapidly developing computer technology, and especially with the popularity of social networking and microblogging sites such as Flickr, heterogeneous data such as images, video, audio, and text grow at an astonishing rate every day. For example, Facebook has more than one billion registered users, who upload more than one billion pictures per month; users of the photo-sharing site Flickr uploaded 728 million pictures in 2015, an average of about two million per day; and the back-end systems of Taobao, the largest e-commerce platform in China, store more than 28.6 billion pictures. Faced with this mass of pictures rich in visual information, how to query and retrieve the images a user needs, or is interested in, conveniently, quickly, and accurately from such immense image libraries has become a research hotspot in multimedia information retrieval. Content-based image retrieval gives full play to the advantage computers have at repetitive tasks, and frees people from manual annotation, which would otherwise consume large amounts of manpower, material, and financial resources. After more than a decade of development, content-based image retrieval has been widely applied in search engines, e-commerce, medicine, the textile and leather industries, and many other aspects of daily life. Image retrieval can be divided into two classes by the way image content is described: text-based image retrieval (TBIR, Text Based Image Retrieval) and content-based image retrieval (CBIR, Content Based Image Retrieval).
Text-based image retrieval dates back to the 1970s. It describes image content by means of text annotations, forming for each image keywords that describe its content, such as the objects or scenes in the image. The annotation can be done manually, or semi-automatically with the help of image recognition technology. At retrieval time, a user supplies query keywords according to his or her interest, the retrieval system finds the pictures labeled with those keywords, and finally returns the query results to the user. Because it is easy to implement, and because human intervention during annotation keeps precision relatively high, this text-description-based retrieval mode is still in use today on some small and medium scale picture-search sites. Its drawbacks, however, are equally obvious. First, it requires manual annotation, so it is only applicable to small-scale image data; completing the same process on large-scale image data would consume enormous manpower and financial resources, and newly arriving images can never be stored without human intervention. Second, "a picture is worth a thousand words": for precise queries, users sometimes find it difficult to depict the image they really want with a few brief keywords. Third, manual annotation is inevitably influenced by the annotators' knowledge, language habits, and subjective judgment, which causes discrepancies between the text descriptions of the same picture.
Image descriptors are a data type that image retrieval must handle, and they are the basis of most existing visual search systems. In a typical setup, a visual search system can only handle predefined features extracted offline from an image collection. Such a setup prevents a given visual feature from being reused across different systems. Moreover, when a visual search system is upgraded, time-consuming steps are needed to extract new features and build the corresponding indexes, while the previous features and indexes are discarded. Breaking through such a setup would be very useful in any case.
Summary of the invention
The object of the present invention is to solve the problem that different descriptors used in different retrieval systems cannot be used interchangeably, by providing a descriptor translation and similarity measurement method based on a hybrid encoder.
The present invention comprises the following steps:
1) extracting several different kinds of descriptor from a training image set to prepare the training set of the descriptor translator, and training a hybrid autoencoder on paired features;
2) training the hybrid encoder to map from the source feature to the target feature, training the decoder jointly on the two paths of reconstruction and translation, with a dedicated encoder per descriptor and a shared decoder, and obtaining a translation loss and a reconstruction loss;
3) measuring the similarity between the two kinds of descriptor using the translation loss and reconstruction loss obtained in step 2);
4) feeding the descriptor to be translated into the corresponding translator, performing the descriptor translation with the trained translator, and measuring the confidence of the result by the similarity of step 3).
In step 1), the several kinds of descriptor may include hand-crafted descriptors or learned descriptors; any descriptor extracted from an image can serve as an object of translation.
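Step 1) only requires that, for each training image, the two kinds of descriptor are extracted from the same image, so that the features come in pairs. A minimal NumPy sketch of this pairing follows; the random vectors, names, and dimensions are illustrative stand-ins for real hand-crafted or learned descriptors, not part of the patent:

```python
import numpy as np

def build_paired_training_set(n_images=1000, dim_src=128, dim_tgt=512, seed=0):
    """Simulate step 1): extract one source descriptor and one target
    descriptor per image, kept paired by image index. Random vectors
    stand in for real descriptors (e.g. a hand-crafted vs. a learned one)."""
    rng = np.random.default_rng(seed)
    V_s = rng.standard_normal((n_images, dim_src))  # source descriptors
    V_t = rng.standard_normal((n_images, dim_tgt))  # target descriptors
    return V_s, V_t

V_s, V_t = build_paired_training_set()
# Row i of V_s and row i of V_t describe the same image.
```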
In step 2), the specific method for training the hybrid encoder to map from the source feature to the target feature may be as follows: first, the encoders Es and Et encode the source descriptor Vs and the target descriptor Vt, yielding codes zs and zt respectively; then one shared decoder decodes zs and zt, yielding the translated descriptor Vst and the reconstructed descriptor Vtt respectively. The training loss function is the L2 norm between the reconstructed descriptor and the target descriptor plus the L2 norm between the translated descriptor and the target descriptor, i.e. L = ‖Vtt − Vt‖₂ + ‖Vst − Vt‖₂, where the L2 norm between the reconstructed descriptor and the target descriptor, ‖Vtt − Vt‖₂, is called the reconstruction loss, and the L2 norm between the translated descriptor and the target descriptor, ‖Vst − Vt‖₂, is called the translation loss.
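The encoder/decoder arrangement and the two losses described above can be sketched as follows. This is an illustrative NumPy implementation: the patent text does not specify the network architecture, so linear encoders and decoder, and the dimensions, are assumptions, not the patented implementation itself.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_s, dim_t, dim_z = 128, 512, 64  # assumed descriptor and code sizes

# Encoders E_s and E_t are specific to each descriptor; decoder D is shared.
E_s = rng.standard_normal((dim_s, dim_z)) * 0.05
E_t = rng.standard_normal((dim_t, dim_z)) * 0.05
D = rng.standard_normal((dim_z, dim_t)) * 0.05

def hybrid_autoencoder_losses(V_s, V_t):
    """Forward pass for one batch: encode V_s and V_t into z_s and z_t,
    decode both with the shared decoder, and return the translation loss
    ||V_st - V_t||_2 and the reconstruction loss ||V_tt - V_t||_2,
    averaged over the batch."""
    z_s = V_s @ E_s          # source code z_s = E_s(V_s)
    z_t = V_t @ E_t          # target code z_t = E_t(V_t)
    V_st = z_s @ D           # translated descriptor
    V_tt = z_t @ D           # reconstructed descriptor
    trans_loss = np.linalg.norm(V_st - V_t, axis=1).mean()
    rec_loss = np.linalg.norm(V_tt - V_t, axis=1).mean()
    return trans_loss, rec_loss

batch_s = rng.standard_normal((16, dim_s))
batch_t = rng.standard_normal((16, dim_t))
trans_loss, rec_loss = hybrid_autoencoder_losses(batch_s, batch_t)
total_loss = trans_loss + rec_loss  # the training objective of step 2)
```

In a real system the parameters would be fitted by gradient descent on `total_loss`; only the forward pass and the two loss terms are shown here.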
In step 3), the specific method for measuring the similarity between two kinds of descriptor using the translation loss and reconstruction loss obtained in step 2) may be as follows: a directed similarity is first constructed from the difference between the translation loss and the reconstruction loss, and is then normalized to obtain the final similarity.
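One way to realize the directed similarity just described: the smaller the gap between a pair's translation loss and its reconstruction loss, the more similar the two descriptors are taken to be. The sign convention and the min-max normalization below are assumptions, since the patent text does not reproduce the original formulas:

```python
import numpy as np

def directed_similarity(trans_loss, rec_loss):
    """Step 3): a directed similarity built from the difference between
    the translation loss and the reconstruction loss; the smaller the
    gap, the higher the similarity."""
    return -(trans_loss - rec_loss)

def normalize(sims):
    """Min-max normalization to [0, 1]. This particular normalization
    is an assumption; the patent text does not give the formula."""
    sims = np.asarray(sims, dtype=float)
    lo, hi = sims.min(), sims.max()
    return (sims - lo) / (hi - lo) if hi > lo else np.ones_like(sims)

# (translation loss, reconstruction loss) for three descriptor pairs.
pairs = [(0.9, 0.5), (0.6, 0.5), (1.5, 0.5)]
raw = [directed_similarity(t, r) for t, r in pairs]
sims = normalize(raw)
# The pair whose translation loss is closest to its reconstruction
# loss ends up with the highest normalized similarity.
```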
In step 4), the descriptor to be translated is fed into the corresponding translator, the trained translator performs the descriptor translation, and the confidence of the result is measured by the similarity of step 3). The specific method is:
(1) selecting the descriptor translator according to the target descriptor and the source descriptor;
(2) translating the source descriptor into the target descriptor;
(3) measuring the confidence of the translation result according to the similarity between the descriptors;
(4) completing the associated downstream task with the translated descriptor.
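The four sub-steps above can be strung together as a small sketch. The translator registry, the descriptor names, the linear translator weights, and the 0.5 confidence threshold are all illustrative assumptions, not part of the patent:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical registry of trained translators, keyed by
# (source descriptor name, target descriptor name), plus the
# pairwise similarities precomputed in step 3).
W = rng.standard_normal((128, 512)) * 0.05
translators = {("sift_fv", "cnn_rmac"): lambda v: v @ W}
similarities = {("sift_fv", "cnn_rmac"): 0.83}

def translate_with_confidence(v_src, src_name, tgt_name, threshold=0.5):
    """Steps (1)-(4): select the translator for this (source, target)
    pair, translate, attach the precomputed similarity as the
    confidence of the result, and gate downstream use on a threshold."""
    translator = translators[(src_name, tgt_name)]    # (1) select
    v_tgt = translator(v_src)                         # (2) translate
    confidence = similarities[(src_name, tgt_name)]   # (3) confidence
    usable = confidence >= threshold                  # (4) gate downstream use
    return v_tgt, confidence, usable

v_src = rng.standard_normal(128)
v_tgt, confidence, usable = translate_with_confidence(v_src, "sift_fv", "cnn_rmac")
```

The confidence lets a system decide, before running the downstream task, whether a given descriptor pair is suitable for translation at all.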
Compared with the prior art, the present invention has the following prominent advantages:
The invention breaks through the retrieval barrier between retrieval systems built on different descriptors, guarantees that different retrieval systems can interoperate, provides a transfer platform between heterogeneous systems, and offers a more convenient and efficient way to upgrade a retrieval system. The proposed scheme yields both a descriptor translator and the confidence of the descriptors after translation. The translator provides a conversion model between different descriptors, and the confidence indicates in advance whether two different descriptors are suitable for translation between them. The descriptor translation method with a hybrid encoder proposed by the present invention is more stable than the traditional multi-layer perceptron algorithm, and the degree of convertibility between 16 different features has been demonstrated.
Detailed description of the invention
Fig. 1 is flow chart of the invention.
Specific embodiment
The following embodiment describes the present invention in detail in conjunction with the accompanying drawing.
Referring to Fig. 1, the present invention comprises the following steps:
1) extracting different hand-crafted or learned features (descriptors) from a training image set to prepare the training set of the descriptor translator, and training a hybrid autoencoder on paired features;
2) training the hybrid encoder to map from the source feature to the target feature, training the decoder jointly on the two paths of reconstruction and translation, with a dedicated encoder per descriptor and a shared decoder;
3) measuring the similarity between the two kinds of descriptor using the translation error and reconstruction error of step 2);
4) feeding the descriptor to be translated into the corresponding translator, performing the descriptor translation with the trained translator, and measuring the confidence of the result by the similarity of step 3).
In step 2), the specific method for training the hybrid encoder to map from the source feature to the target feature may be as follows: first, the encoders Es and Et encode the source descriptor Vs and the target descriptor Vt, yielding codes zs and zt respectively; then one shared decoder decodes zs and zt, yielding the translated descriptor Vst and the reconstructed descriptor Vtt respectively. The training loss function is the L2 norm between the reconstructed descriptor and the target descriptor plus the L2 norm between the translated descriptor and the target descriptor, i.e. L = ‖Vtt − Vt‖₂ + ‖Vst − Vt‖₂, where ‖Vtt − Vt‖₂ is called the reconstruction loss and ‖Vst − Vt‖₂ is called the translation loss.
In step 3), the specific method for measuring the similarity between two kinds of descriptor using the translation loss and reconstruction loss obtained in step 2) may be as follows: a directed similarity is first constructed from the difference between the translation loss and the reconstruction loss, and is then normalized to obtain the final similarity.
In step 4), the specific method for performing the descriptor translation with the trained translator is: (1) selecting the descriptor translator according to the target descriptor and the source descriptor; (2) translating the source descriptor into the target descriptor; (3) measuring the confidence of the translation result according to the similarity between the descriptors; (4) completing the associated downstream task with the translated descriptor.
The present invention was evaluated on the classical retrieval datasets Holidays, Oxford5k, and Paris6k, with corresponding translation experiments conducted for 16 kinds of descriptor, as shown in Table 1.
Table 1
Table 1 gives the errors of the descriptors before and after translation. It can be seen from Table 1 that the proposed method completes the translation task on the great majority of descriptors with good performance.
Claims (5)
1. A descriptor translation and similarity measurement method based on a hybrid encoder, characterized by comprising the following steps:
1) extracting several different kinds of descriptor from a training image set to prepare the training set of the descriptor translator, and training a hybrid autoencoder on paired features;
2) training the hybrid encoder to map from the source feature to the target feature, training the decoder jointly on the two paths of reconstruction and translation, with a dedicated encoder per descriptor and a shared decoder, and obtaining a translation loss and a reconstruction loss;
3) measuring the similarity between the two kinds of descriptor using the translation loss and reconstruction loss obtained in step 2);
4) feeding the descriptor to be translated into the corresponding translator, performing the descriptor translation with the trained translator, and measuring the confidence of the result by the similarity of step 3).
2. The descriptor translation and similarity measurement method based on a hybrid encoder according to claim 1, characterized in that in step 1) the several kinds of descriptor include hand-crafted descriptors or learned descriptors, and any descriptor extracted from an image can serve as an object of translation.
3. The descriptor translation and similarity measurement method based on a hybrid encoder according to claim 1, characterized in that in step 2) the specific method for training the hybrid encoder to map from the source feature to the target feature is: first, the encoders Es and Et encode the source descriptor Vs and the target descriptor Vt, yielding codes zs and zt respectively; then one shared decoder decodes zs and zt, yielding the translated descriptor Vst and the reconstructed descriptor Vtt respectively; the training loss function is the L2 norm between the reconstructed descriptor and the target descriptor plus the L2 norm between the translated descriptor and the target descriptor, the former being called the reconstruction loss and the latter the translation loss.
4. The descriptor translation and similarity measurement method based on a hybrid encoder according to claim 1, characterized in that in step 3) the specific method for measuring the similarity between two kinds of descriptor using the translation loss and reconstruction loss obtained in step 2) is: a directed similarity is first constructed from the difference between the translation loss and the reconstruction loss, and is then normalized to obtain the final similarity.
5. The descriptor translation and similarity measurement method based on a hybrid encoder according to claim 1, characterized in that in step 4) the descriptor to be translated is fed into the corresponding translator, the trained translator performs the descriptor translation, and the confidence of the result is measured by the similarity of step 3), the specific method being:
(1) selecting the descriptor translator according to the target descriptor and the source descriptor;
(2) translating the source descriptor into the target descriptor;
(3) measuring the confidence of the translation result according to the similarity between the descriptors;
(4) completing the associated downstream task with the translated descriptor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910630989.7A | 2019-07-12 | 2019-07-12 | Descriptor translation and similarity measurement method based on a hybrid encoder
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910630989.7A | 2019-07-12 | 2019-07-12 | Descriptor translation and similarity measurement method based on a hybrid encoder
Publications (1)
Publication Number | Publication Date
---|---
CN110334363A | 2019-10-15
Family
ID=68146680
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201910630989.7A (Pending) | Descriptor translation and similarity measurement method based on a hybrid encoder | 2019-07-12 | 2019-07-12
Country Status (1)
Country | Link
---|---
CN | CN110334363A
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN107622311A | 2017-10-09 | 2018-01-23 | 深圳市唯特视科技有限公司 | Robot imitation learning method based on contextual translation
CN107967262A | 2017-11-02 | 2018-04-27 | 内蒙古工业大学 | Mongolian-Chinese machine translation method based on neural networks
Non-Patent Citations (2)
Title
---
Jie Hu et al.: "Towards Visual Feature Translation", The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Xiangwen Zhang et al.: "Asynchronous Bidirectional Decoding for Neural Machine Translation", The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)
Similar Documents
Publication | Title
---|---
CN110232152B | Content recommendation method, device, server and storage medium
CN106776849B | Method for quickly searching scenic spots by using pictures and tour guide system
US8577882B2 | Method and system for searching multilingual documents
Stefanini et al. | Artpedia: A new visual-semantic dataset with visual and contextual sentences in the artistic domain
WO2019169872A1 | Method and device for searching for content resource, and server
CN110399515B | Picture retrieval method, device and system
CN110516096A | Compositing-aware digital image search
KR100471927B1 | Web-based image data search system and method
JP2009537901A | Annotation by search
JP4699954B2 | Multimedia data management method and apparatus
US11568018B2 | Utilizing machine-learning models to generate identifier embeddings and determine digital connections between digital content items
CN103226547A | Method and device for producing verse for a picture
WO2023108980A1 | Information push method and device based on text adversarial samples
CN108491543A | Image search method, image storage method and image retrieval system
WO2021159812A1 | Cancer staging information processing method and apparatus, and storage medium
CN102508901A | Content-based massive image search method and system
CN107391599B | Image retrieval method based on style features
CN112948601A | Cross-modal hash retrieval method based on controlled semantic embedding
CN114637886A | Machine vision system based on multiple protocols
CN112989811B | History book reading auxiliary system based on BiLSTM-CRF and control method thereof
CN110442736B | Semantically enhanced subspace cross-media retrieval method based on quadratic discriminant analysis
Poornima et al. | Multi-modal features and correlation incorporated Naive Bayes classifier for a semantic-enriched lecture video retrieval system
CN110334363A | Descriptor translation and similarity measurement method based on a hybrid encoder
Kim et al. | Towards a fairer landmark recognition dataset
CN103092935A | Approximate-copy image detection method based on SIFT quantization
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2019-10-15