CN113360696A - Image pairing method, device, equipment and storage medium

Info

Publication number: CN113360696A
Application number: CN202110695932.2A
Authority: CN
Inventors: 龚震霆
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-06-23
Filing date: 2021-06-23
Publication date: 2021-09-07

Abstract

The disclosure provides an image pairing method, apparatus, device and storage medium. It relates to the field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, and can be used in smart city scenes. One embodiment of the method comprises: training an initial network model by using images in an image material library to obtain a pre-training model; acquiring a first image set and a first paired image set, wherein the first paired image set is obtained by pairing images in the first image set; and training the pre-training model by using the first image set and the first paired image set to obtain a pairing model. The embodiment learns the features of paired images, thereby improving the accuracy of the pairing model.

Description

Image pairing method, device, equipment and storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, and more particularly to an image pairing method, apparatus, device, and storage medium, which can be used in smart city scenes.

Background

With the continuous development of social networks, images account for a growing share of how people communicate. For various reasons, a user may want to find another image that can be paired with a current image. For example, young couples often wish to select a pair of associated images as couple avatars to indicate their relationship.

In the prior art, paired images are usually selected from a group or pair of images found on an internet platform, or are obtained through manual creation, design, photography and the like. However, existing paired images may not meet the clearly differentiated needs of users.

Disclosure of Invention

The disclosure provides an image pairing method, apparatus, device and storage medium.

According to a first aspect of the present disclosure, there is provided a training method of a pairing model, including: training an initial network model by using images in an image material library to obtain a pre-training model; acquiring a first image set and a first paired image set, wherein the first paired image set is obtained by pairing images in the first image set; and training the pre-training model by using the first image set and the first paired image set to obtain a pairing model.

According to a second aspect of the present disclosure, there is provided an image pairing method including: acquiring an image to be paired; extracting features of the image to be paired by using a pairing model to obtain pairing features of the image to be paired, wherein the pairing model is obtained by training through the method described in any implementation of the first aspect; and searching a pre-constructed pairing feature library to obtain a target paired image matching the pairing features.

According to a third aspect of the present disclosure, there is provided a training apparatus for a pairing model, including: a first training module configured to train an initial network model by using images in an image material library to obtain a pre-training model; a first obtaining module configured to obtain a first image set and a first paired image set, wherein the first paired image set is obtained by pairing images in the first image set; and a second training module configured to train the pre-training model by using the first image set and the first paired image set to obtain a pairing model.

According to a fourth aspect of the present disclosure, there is provided an image pairing apparatus comprising: a second acquisition module configured to acquire an image to be paired; a second extraction module configured to extract features of the image to be paired by using a pairing model to obtain pairing features of the image to be paired, wherein the pairing model is obtained by training through the method described in any implementation of the first aspect; and a retrieval module configured to search a pre-constructed pairing feature library to obtain a target paired image matching the pairing features.

According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect or the second aspect.

According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method as described in any one of the implementation manners of the first or second aspect.

According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method as described in any of the implementations of the first or second aspect.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;

FIG. 2 is a flow diagram of one embodiment of a method of training a pairing model according to the present disclosure;

FIG. 3 is a flow diagram of another embodiment of a training method of a pairing model according to the present disclosure;

FIG. 4 is a flow diagram of yet another embodiment of a training method of a pairing model according to the present disclosure;

FIG. 5 is a flow chart illustrating the decomposition of the image set obtaining step of the training method of the pairing model shown in FIG. 4;

FIG. 6 is a flow diagram of one embodiment of an image pairing method according to the present disclosure;

FIG. 7 is a schematic diagram of an embodiment of a training apparatus for a pairing model according to the present disclosure;

FIG. 8 is a schematic structural diagram of one embodiment of an image pairing apparatus according to the present disclosure;

FIG. 9 is a block diagram of an electronic device for implementing a method of training a pairing model of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the image pairing method or image pairing apparatus of the present disclosure may be applied.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or transmit information or the like. Various client applications may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the above-described electronic devices, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. This is not particularly limited herein.

The server 105 may provide various services. For example, the server 105 may analyze and process an image to be paired acquired from the terminal devices 101, 102, 103, and generate a processing result (e.g., a target paired image).

The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not particularly limited herein.

It should be noted that the image pairing method provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the image pairing apparatus is generally disposed in the server 105.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method of training a pairing model according to the present disclosure is shown. The method for training the pairing model comprises the following steps:

step 201, training the initial network model by using the images in the image material library to obtain a pre-training model.

In this embodiment, an execution subject of the training method of the pairing model (e.g., the server 105 shown in fig. 1) may train the initial network model by using images in the image material library to obtain a pre-training model. The image material library may be constructed in advance and stores relevant material pictures from the internet. The execution subject may perform self-supervised learning on the initial network model by using the images in the image material library to obtain the pre-training model. Self-supervised learning refers to automatically training a preset initial model on a large amount of sample data, without manually labeling the samples, to obtain a pre-training model. The specific manner of self-supervised learning is prior art and is not described herein again.

The initial network model may be a Res2Net50 network, that is, a network 50 layers deep. Res2Net50 is an improved version of ResNet50 (Residual Network): its residual bottleneck block processes features at multiple scales, which reduces the number of parameters in the bottleneck block and lowers the computational cost. Of course, other initial network models may be selected according to the actual situation, and the present disclosure does not specifically limit this. In addition, the loss function in this step is the cross-entropy loss, and the model is trained on the image material library data set to obtain the pre-training model.

Note that the loss function is a function that maps a random event or a value of a random variable related thereto to a non-negative real number to represent a loss of the random event, and is used for parameter estimation of a model in machine learning.
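As an illustration of the cross-entropy loss named above, the following is a minimal sketch in plain Python. The function names and the example scores are assumptions made for illustration; the disclosure does not specify an implementation.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability that the model assigns to the target class."""
    return -math.log(softmax(logits)[target])

# Hypothetical scores for three classes: the loss is small when the
# highest-scoring class is the target and larger when it is not.
loss_good = cross_entropy([4.0, 0.5, 0.1], target=0)
loss_bad = cross_entropy([4.0, 0.5, 0.1], target=2)
```

The loss is always non-negative, which matches the description of a loss function as a mapping to a non-negative real number.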

At step 202, a first set of images and a first set of paired images are obtained.

In this embodiment, the execution subject may obtain a first image set and a first paired image set, where the first paired image set is obtained by pairing images in the first image set.

It should be noted that the first image set may be a data set obtained by crawling pictures from a search engine for a specific scene, and the first paired image set is obtained by pairing the images in the first image set, so that it contains the images in the first image set that were successfully paired.

For example, when the application scene is training a pairing model for couple avatars, the first image set may include all pictures related to "couple avatar" on the search engine; all images in the first image set are then paired, and all successfully paired images constitute the first paired image set.

Step 203, training the pre-training model by using the first image set and the first paired image set to obtain a pairing model.

In this embodiment, the execution subject may train the pre-training model obtained in step 201 by using the first image set and the first paired image set obtained in step 202, so as to obtain a pairing model.

The pre-training model is obtained by training on images in the image material library and is not tied to any specific business scene. The first image set and the first paired image set, by contrast, are constructed for a specific scene. The pre-training model can therefore be adjusted and trained in a targeted manner for that scene by using the first image set and the first paired image set, yielding a trained pairing model on which image pairing is then based.

The training method of the pairing model provided by this embodiment first trains an initial network model by using images in an image material library to obtain a pre-training model; then obtains a first image set and obtains a first paired image set by pairing images in the first image set; and finally trains the pre-training model by using the first image set and the first paired image set to obtain a pairing model. The pairing model trained in this way can quickly and accurately obtain a target paired image that pairs with an image to be paired, improving the efficiency and accuracy of image pairing.

In the technical solution of the present disclosure, the acquisition, storage, application and the like of users' personal information all comply with relevant laws and regulations and do not violate public order and good customs.

With continued reference to fig. 3, fig. 3 illustrates a flow 300 of another embodiment of a training method of a pairing model according to the present disclosure. The method for training the pairing model comprises the following steps:

step 301, training the initial network model by using the images in the image material library to obtain a pre-training model.

Step 302, a first image set and a first paired image set are obtained.

The steps 301-.

Step 303, fine tuning the pre-training model by using the first image set to obtain a fine-tuned model.

In this embodiment, an executing entity (for example, the server 105 shown in fig. 1) of the training method for the matching model may perform fine adjustment on the pre-training model by using the first image set, so as to obtain a fine-adjusted model.

In some optional implementations of this embodiment, the fine-tuning the pre-training model by using the first image set to obtain a fine-tuned model includes: clustering the first image set to obtain a clustered image set; setting a loss function for the pre-training model; and fine-tuning parameters of the pre-training model based on the clustered image set and the loss function to obtain a fine-tuned model.

In this implementation, N clustering centers may be selected, where N is an integer greater than 1; then, the images in the first image set are clustered based on the N clustering centers, and each cluster is used as a category, so that a clustered image set containing N categories is constructed. For example, assuming that N is 1000, the images in the first image set are clustered based on 1000 cluster centers, thereby constructing a clustered image set containing 1000 categories.
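The clustering step can be sketched as follows. The disclosure does not name a clustering algorithm, so k-means is used here as an assumption; the function names, the initialization from the first N vectors, and the toy 2-dimensional features are likewise illustrative only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the cluster center closest to the point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))

def kmeans_labels(features, n_clusters, n_iters=10):
    """Assign each feature vector to one of n_clusters categories."""
    # Assumed initialization: use the first n_clusters vectors as centers.
    centers = [list(f) for f in features[:n_clusters]]
    labels = [0] * len(features)
    for _ in range(n_iters):
        labels = [nearest(f, centers) for f in features]
        for c in range(n_clusters):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy example with N = 2; each cluster index serves as a category label.
features = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
pseudo_labels = kmeans_labels(features, n_clusters=2)
```

The resulting cluster indices play the role of category labels for the fine-tuning step, so no manual labeling of the first image set is needed.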

Then, ArcFace (Additive Angular Margin) Loss is selected as the loss function. ArcFace Loss is a loss function that uses a margin to enlarge the distance between different classes. Finally, the parameters of the pre-training model are fine-tuned based on the clustered image set and the ArcFace Loss to obtain the fine-tuned model.
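The additive angular margin can be illustrated with a simplified sketch of how the class scores are modified before the cross-entropy is applied. The margin of 0.5 and scale of 64 are illustrative values not taken from the disclosure, the function names are hypothetical, and the handling of angles that exceed pi after adding the margin is omitted for brevity.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def arcface_logits(embedding, class_weights, target, margin=0.5, scale=64.0):
    """Scaled cosine scores with an additive angular margin on the target class."""
    e = l2_normalize(embedding)
    logits = []
    for idx, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
        if idx == target:
            # Penalize the target class by widening its angle by the margin.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits

# Toy example: the embedding points exactly at the class-0 weight vector.
logits = arcface_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], target=0)
```

Because the target score is reduced by the margin, the model has to pull same-class embeddings closer together to reach the same loss, which is what enlarges the distance between classes.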

Step 304, adding a full connection layer to the fine-tuned model to obtain a target model.

In this embodiment, the execution subject may add a full connection layer to the fine-tuned model to obtain the target model. Specifically, the execution subject adds a fully connected embedding layer with 256 floating-point dimensions after the Res2Net50 network to obtain the target model. The full connection layer maps the learned "distributed feature representation" to the sample label space.
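The shape of this added layer can be sketched in plain Python as a matrix-vector product. The 2048-dimensional backbone feature is an assumption about the pooled output of a 50-layer residual network, and the random initialization and function names are hypothetical.

```python
import random

def make_fc_layer(in_dim, out_dim=256, seed=0):
    """Create weights and biases for a fully connected layer (random init assumed)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def fc_forward(x, layer):
    """Compute y = W x + b, one output value per weight row."""
    weights, bias = layer
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b for row, b in zip(weights, bias)]

# Assumed backbone output: a 2048-dimensional pooled feature vector.
backbone_feature = [0.5] * 2048
embedding = fc_forward(backbone_feature, make_fc_layer(2048))
```

The 256 floating-point values produced here correspond to the embedding that later serves as the pairing feature.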

Step 305, taking each pair of paired images in the first paired image set as input, and training the target model to obtain a pairing model.

In this embodiment, the execution subject may take each pair of paired images in the first paired image set as input to train the target model, thereby obtaining the pairing model. Specifically, metric learning is used: each pair of paired images in the first paired image set is used as input, and Estimated Maximum Loss is used as the loss function to train the target model, so that the pairing model is obtained. The specific method of metric learning is prior art and is not described herein again.
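The disclosure names Estimated Maximum Loss without defining it, so the sketch below substitutes a standard contrastive loss purely to illustrate how paired inputs drive metric learning. It is not the loss used by the disclosure; the function names and the margin value are assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, is_pair, margin=1.0):
    """Pull paired embeddings together; push unpaired ones at least margin apart."""
    d = euclidean(emb_a, emb_b)
    if is_pair:
        return d ** 2  # paired images: any distance is penalized
    return max(0.0, margin - d) ** 2  # unpaired: penalized only when too close

# Hypothetical 2-dimensional embeddings.
loss_pair_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_pair=True)
loss_nonpair_close = contrastive_loss([0.0, 0.0], [0.5, 0.0], is_pair=False)
loss_nonpair_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_pair=False)
```

Whatever the exact loss, the effect sought by metric learning is the same: embeddings of images that belong to one pair end up close together, so that a nearest-neighbor search in embedding space can find a partner image.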

As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, in the training method of the pairing model in this embodiment, the pre-training model is fine-tuned by using the first image set, so as to obtain a fine-tuned model, then the full connection layer is added to the fine-tuned model, so as to obtain the target model, and then each pair of pairing images in the first pairing image set is used as input to train the target model, so as to obtain the pairing model. The image features extracted based on the matching model are more accurate and vivid, and the accuracy of the matching result is further improved.

With continued reference to fig. 4, fig. 4 illustrates a flow 400 of yet another embodiment of a training method of a pairing model according to the present disclosure. The method for training the pairing model comprises the following steps:

step 401, training the initial network model by using the images in the image material library to obtain a pre-training model.

At step 402, a first image set and a first paired image set are obtained.

The steps 401-.

Step 403, performing data cleaning and deduplication on the first image set and the first paired image set, respectively.

In this embodiment, an execution subject of the training method of the pairing model (e.g., the server 105 shown in fig. 1) may perform data cleaning and deduplication on the first image set and the first paired image set, respectively. Since low-quality images may degrade the user experience, the images in the first image set and the first paired image set may need to be cleaned; for example, images whose length and/or width is smaller than a preset value may be removed.

Furthermore, the sensitive images are not allowed to appear, so the sensitive images in the cleaned first image set and the first paired image set can be filtered by using the existing sensitive classifier.

Finally, since images of the same content but different sizes may be repeatedly downloaded while the images are acquired, deduplication techniques may be used to deduplicate the images in the first set of images and the first set of paired images.

Through the above operations, the images remaining in the first image set and the first paired image set are of high quality, which improves the user experience.
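The size filter and the deduplication of same-content images at different sizes can be sketched as follows. The disclosure does not name a deduplication technique, so an average hash over a sampled 8 x 8 grid is used here as an assumption. Images are represented as 2-D lists of grayscale values, all names are hypothetical, and the sensitive-image filter is left out.

```python
def average_hash(pixels, size=8):
    """Sample a size x size grid and threshold each sample at the mean brightness."""
    h, w = len(pixels), len(pixels[0])
    samples = [pixels[(r * h) // size][(c * w) // size]
               for r in range(size) for c in range(size)]
    mean = sum(samples) / len(samples)
    return tuple(1 if s > mean else 0 for s in samples)

def clean_and_deduplicate(images, min_side=8):
    """Drop images that are too small, then drop repeats with the same hash."""
    kept, seen = [], set()
    for name, pixels in images:
        h, w = len(pixels), len(pixels[0])
        if h < min_side or w < min_side:
            continue  # cleaning: below the preset size
        key = average_hash(pixels)
        if key in seen:
            continue  # deduplication: same content already kept
        seen.add(key)
        kept.append(name)
    return kept

def gradient(size, horizontal=True):
    """Hypothetical test image: a brightness ramp across or down the image."""
    return [[((c if horizontal else r) * 256) // size for c in range(size)]
            for r in range(size)]

images = [
    ("a_small.jpg", gradient(16)),
    ("a_large.jpg", gradient(32)),               # same content, different size
    ("tiny.jpg", gradient(4)),                   # below the minimum side length
    ("b.jpg", gradient(16, horizontal=False)),   # different content
]
kept = clean_and_deduplicate(images)  # ["a_small.jpg", "b.jpg"]
```

Exact hash equality is a simplification; a practical system would more likely compare hashes by Hamming distance with a tolerance so that recompressed copies are also caught.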

Step 404, fine-tuning the pre-training model by using the first image set to obtain a fine-tuned model.

Step 405, adding a full connection layer to the fine-tuned model to obtain a target model.

Step 406, taking each pair of paired images in the first paired image set as input, and training the target model to obtain a pairing model.

Steps 404-406 are substantially the same as steps 303-305 of the foregoing embodiment; for the specific implementation, reference may be made to the foregoing description of steps 303-305, which is not repeated here.

Step 407, extracting the pairing features of each pair of paired images in the first paired image set by using the pairing model to obtain a pairing feature library.

In this embodiment, the execution subject may extract the pairing features of each pair of paired images in the first paired image set by using the pairing model to obtain the pairing feature library. After the pairing model is trained, the output of its final fully connected embedding layer with 256 floating-point dimensions is the couple-avatar feature to be acquired. In this embodiment, therefore, the pairing model can be used to extract the pairing features of each pair of paired images in the first paired image set, thereby obtaining the pairing feature library.
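Building the library can be sketched as a mapping from image identifier to extracted feature. The dictionary layout, the function names, and the stand-in feature extractor are assumptions; in practice the extractor would be the trained pairing model producing 256-dimensional features.

```python
def build_pairing_feature_library(paired_images, extract_feature):
    """Map every image that appears in a pair to its pairing feature."""
    library = {}
    for pair in paired_images:
        for image_id in pair:
            library[image_id] = extract_feature(image_id)
    return library

def toy_extractor(image_id):
    """Hypothetical stand-in for the pairing model: a 1-dimensional 'feature'."""
    return [float(len(image_id))]

first_paired_image_set = [("boy_01.jpg", "girl_01.jpg"), ("cat_a.png", "cat_b.png")]
feature_library = build_pairing_feature_library(first_paired_image_set, toy_extractor)
```

Because the features are computed once, offline, a later pairing request only needs one forward pass for the query image followed by a lookup in this library.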

As can be seen from fig. 4, compared with the embodiment corresponding to fig. 3, the training method of the pairing model in this embodiment highlights the steps of performing data cleaning and deduplication on the first image set and the first pairing image set, so that the images in the first image set and the first pairing image set are both high quality, thereby improving the use experience of the user; in addition, the pairing model is used for extracting the pairing features of each pair of pairing images in the first pairing image set, so that a pairing feature library is obtained, pairing retrieval is carried out based on the pairing feature library, and the accuracy and the efficiency of the pairing result are improved.

With continued reference to FIG. 5, FIG. 5 illustrates a decomposition flow 500 of the image set obtaining step of the training method of the pairing model shown in FIG. 4. The image set obtaining step is decomposed as follows:

step 501, obtaining a second image set downloaded by the user based on the search behavior of the user.

In this embodiment, the second image set downloaded by the user may be obtained based on the search behavior of the user. For example, when a user searches for a related keyword such as "lovers' avatar" using a search function of a search engine, browses a picture search result under a picture tab page, and clicks to download, the executing body records the behavior process of the user as a search behavior. Then, based on the search behavior of the user, the execution subject can download the lover avatar downloaded by the user, thereby obtaining a second image set.

Step 502, a third image set matching a preset keyword is obtained from a specified data source.

In this embodiment, a third image set matching a preset keyword may be obtained from a specified data source. For example, keywords such as "couple avatar" may be preset based on the usage scenario. Then, the execution subject may crawl articles or messages that contain the preset keywords and that were edited and pushed by existing websites or posted on those websites by users, and download all the accompanying images in those articles or messages, thereby obtaining a third image set.

Step 503, combining the second image set and the third image set to obtain the first image set.

In this embodiment, the second image set obtained in step 501 and the third image set obtained in step 502 may be combined to obtain the first image set. The merged first set of images includes all relevant images obtained based on user behavior and existing website resources.

Step 504, labeling and pairing the images in the second image set and the third image set respectively to obtain a second paired image set and a third paired image set.

In this embodiment, the labeling and pairing may be performed on the second image set obtained in step 501 and the images in the third image set obtained in step 502, so as to obtain a corresponding second paired image set including all successfully paired images in the second image set and a corresponding third paired image set including all successfully paired images in the third image set, where the labeling manner may be manual labeling.

Step 505, combining the second paired image set and the third paired image set to obtain the first paired image set.

In this embodiment, the second paired image set and the third paired image set obtained in step 504 may be combined to obtain the first paired image set.

It should be noted that, in the present disclosure, the execution sequence of the steps 503 and 504-505 is not specifically limited, that is, the step 503 may be executed before the steps 504-505, may be executed after the steps 504-505, or may even be executed simultaneously with the steps 504-505.

As can be seen from fig. 5, the image set obtaining method obtains the first image set based on the search behavior of users and existing website resources, and then obtains the first paired image set containing the paired images in the first image set. This way of capturing and processing data ensures the richness of the images in the image sets.
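The two merge steps (503 and 505) can be sketched as an order-preserving union. The file names are hypothetical, and dropping entries that appear in both sources is an assumption; the disclosure only states that the sets are combined.

```python
def merge_unique(*image_sets):
    """Concatenate image sets, preserving order and dropping repeated entries."""
    merged, seen = [], set()
    for image_set in image_sets:
        for item in image_set:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

second_images = ["u1.jpg", "u2.jpg", "u3.jpg"]   # downloaded via user search behavior
third_images = ["w1.jpg", "w2.jpg", "u1.jpg"]    # crawled from a specified data source
second_pairs = [("u1.jpg", "u2.jpg")]            # labeled pairs within the second set
third_pairs = [("w1.jpg", "w2.jpg")]             # labeled pairs within the third set

first_image_set = merge_unique(second_images, third_images)        # step 503
first_paired_image_set = merge_unique(second_pairs, third_pairs)   # step 505
```

The two merges are independent of each other, which is consistent with the statement that step 503 may run before, after, or concurrently with steps 504-505.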

With continued reference to fig. 6, a flow 600 of one embodiment of an image pairing method according to the present disclosure is shown. The image pairing method comprises the following steps:

step 601, obtaining an image to be paired.

In the present embodiment, an execution subject of the image pairing method (for example, the server 105 shown in fig. 1) may acquire an image to be paired.

The image to be paired may be an image that a user inputs into a search engine in order to find an image that pairs with it.

Step 602, extracting features of the images to be paired by using the pairing model, and obtaining pairing features of the images to be paired.

In this embodiment, the execution subject may extract features of the image to be paired by using the pairing model, so as to obtain the pairing features of the image to be paired. The pairing model can be obtained by training through the method described in the above embodiments. The image to be paired acquired in step 601 is input into the pairing model, which extracts its features and thereby yields the pairing features of the image to be paired.

Step 603, retrieving in a pre-constructed pairing feature library to obtain a target pairing image matched with the pairing feature.

In this embodiment, the execution subject may search a pre-constructed pairing feature library to obtain a target paired image matching the pairing features obtained in step 602. Since the pairing feature library stores the features of all paired images in the acquired paired image set, it can be searched to obtain the target paired image matching the pairing features.

Optionally, a forward index may be established in advance for the paired feature library, so as to improve the retrieval efficiency when retrieving in the paired feature library.
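The retrieval step can be sketched as a nearest-neighbor search over the library. Cosine similarity and a linear scan are assumptions, since the disclosure does not specify the similarity measure or the index structure, and the identifiers and 2-dimensional features are toy values.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_target_paired_image(query_feature, feature_library):
    """Return the id of the library image whose feature is most similar to the query."""
    return max(
        feature_library,
        key=lambda image_id: cosine_similarity(query_feature, feature_library[image_id]),
    )

# Hypothetical library mapping image id to pairing feature.
feature_library = {"a.jpg": [1.0, 0.0], "b.jpg": [0.0, 1.0], "c.jpg": [0.7, 0.7]}
match_1 = retrieve_target_paired_image([0.9, 0.1], feature_library)
match_2 = retrieve_target_paired_image([0.1, 0.9], feature_library)
```

A linear scan costs time proportional to the library size for every query, which is why building an index in advance, as the text suggests, becomes worthwhile for large libraries.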

The image pairing method provided by the embodiment of the disclosure extracts features of the image to be paired through the pairing model to obtain its pairing features, and then searches a pre-constructed pairing feature library to obtain a target paired image matching those pairing features. This improves the efficiency and accuracy of image pairing and thereby the user experience.

With further reference to fig. 7, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for training a pairing model, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.

As shown in fig. 7, the training apparatus 700 for the pairing model of the present embodiment may include: a first training module 701, a first obtaining module 702, and a second training module 703. The first training module 701 is configured to train the initial network model by using images in the image material library to obtain a pre-training model; the first obtaining module 702 is configured to obtain a first image set and a first paired image set, wherein the first paired image set is obtained by pairing images in the first image set; and the second training module 703 is configured to train the pre-training model by using the first image set and the first paired image set to obtain a pairing model.

In the present embodiment, for the specific processing of the first training module 701, the first obtaining module 702, and the second training module 703 of the training apparatus 700 and the technical effects thereof, reference may be made to the descriptions of steps 201-203 in the embodiment corresponding to fig. 2, which are not repeated herein.

In some optional implementations of this embodiment, the second training module includes: a fine tuning sub-module configured to perform fine tuning on the pre-trained model using the first image set to obtain a fine-tuned model; the adding submodule is configured to add a full connection layer in the trimmed model to obtain a target model; and the training sub-module is configured to train the target model by taking each pair of paired images in the first paired image set as input, so as to obtain a paired model.

In some optional implementations of this embodiment, the fine tuning sub-module includes: the clustering unit is configured to cluster the first image set to obtain a clustered image set; a setting unit configured to set a loss function for the pre-training model; and the fine tuning unit is configured to perform fine tuning on the parameters of the pre-training model based on the clustered image set and the loss function to obtain a fine-tuned model.

In some optional implementations of this embodiment, the first obtaining module includes: an obtaining sub-module configured to obtain, based on search behavior of a user, a second image set downloaded by the user; an acquisition sub-module configured to acquire, from a specified data source, a third image set matching a preset keyword; and a first merging sub-module configured to merge the second image set and the third image set to obtain the first image set.

In some optional implementations of this embodiment, the first obtaining module further includes: a pairing sub-module configured to annotate and pair the images in the second image set and in the third image set, respectively, to obtain a second paired image set and a third paired image set; and a second merging sub-module configured to merge the second paired image set and the third paired image set to obtain the first paired image set.
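As a non-limiting sketch, the two merging sub-modules could be expressed as below. The representation of an image as a file-name string and of a pair as a 2-tuple, as well as the function name `merge_sets`, are assumptions made for illustration. The check reflects the statement that the first paired image set is obtained by pairing images in the first image set.

```python
def merge_sets(second_images, third_images, second_pairs, third_pairs):
    """Merge both sources into the first image set and the first paired image set."""
    first_images = list(second_images) + list(third_images)
    first_pairs = list(second_pairs) + list(third_pairs)

    # Every paired image must come from the merged image set.
    known = set(first_images)
    for img_a, img_b in first_pairs:
        if img_a not in known or img_b not in known:
            raise ValueError(f"pair ({img_a}, {img_b}) references an unknown image")
    return first_images, first_pairs


# Hypothetical file names: "s*" from user downloads, "t*" from keyword search.
first_images, first_pairs = merge_sets(
    ["s1.jpg", "s2.jpg"],
    ["t1.jpg", "t2.jpg"],
    [("s1.jpg", "s2.jpg")],
    [("t1.jpg", "t2.jpg")],
)
```

Duplicates are deliberately left in place here, since cleaning and deduplication are described as a separate module.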

In some optional implementations of this embodiment, the training apparatus for the pairing model further includes: a deduplication module configured to perform data cleansing and deduplication on the first image set and the first paired image set, respectively.
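One possible reading of the deduplication module is sketched below. The disclosure does not state how duplicates are detected; this sketch assumes exact byte-level duplicates identified by a SHA-256 digest, and treats empty files as the only "dirty" data. A practical system might instead use perceptual hashing or feature similarity. The function names and sample data are hypothetical.

```python
import hashlib


def deduplicate_images(images):
    """`images` maps name -> bytes. Returns (kept images, name -> canonical name)."""
    canonical = {}  # digest -> first name seen with that content
    remap = {}      # every usable name -> its canonical name
    for name, data in images.items():
        if not data:  # cleaning step (assumed): drop empty files
            continue
        digest = hashlib.sha256(data).hexdigest()
        remap[name] = canonical.setdefault(digest, name)
    kept = {name: images[name] for name in canonical.values()}
    return kept, remap


def deduplicate_pairs(pairs, remap):
    """Rewrite pairs onto canonical names; drop broken, self, and repeated pairs."""
    seen, cleaned = set(), []
    for img_a, img_b in pairs:
        if img_a not in remap or img_b not in remap:
            continue
        img_a, img_b = remap[img_a], remap[img_b]
        key = frozenset((img_a, img_b))  # assumption: pairs are unordered
        if img_a == img_b or key in seen:
            continue
        seen.add(key)
        cleaned.append((img_a, img_b))
    return cleaned


images = {"a.jpg": b"AAA", "a_copy.jpg": b"AAA", "b.jpg": b"BBB", "broken.jpg": b""}
pairs = [("a.jpg", "b.jpg"), ("a_copy.jpg", "b.jpg"), ("b.jpg", "broken.jpg")]
kept, remap = deduplicate_images(images)
clean_pairs = deduplicate_pairs(pairs, remap)
```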

In some optional implementations of this embodiment, the training apparatus for the pairing model further includes: a first extraction module configured to extract, by using the pairing model, the pairing features of each pair of paired images in the first paired image set to obtain a pairing feature library.
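The layout of the pairing feature library is not specified in the disclosure. A minimal sketch, assuming each library entry stores the feature of one image together with its annotated partner, might look as follows. The `extract` callable stands in for the trained pairing model, and `toy_extract` with its feature values is purely hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_feature_library(paired_images, extract):
    """Create two entries per pair so that either image can retrieve the other."""
    library = []
    for img_a, img_b in paired_images:
        library.append(
            {"feature": l2_normalize(extract(img_a)), "image": img_a, "partner": img_b}
        )
        library.append(
            {"feature": l2_normalize(extract(img_b)), "image": img_b, "partner": img_a}
        )
    return library


def toy_extract(image_id):
    """Hypothetical stand-in for the pairing model's feature extractor."""
    table = {"boy.png": [3.0, 4.0], "girl.png": [0.0, 2.0]}
    return table[image_id]


library = build_feature_library([("boy.png", "girl.png")], toy_extract)
```

Normalizing the stored features is a design choice of this sketch; it lets a later similarity search reduce to a dot product.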

With further reference to fig. 8, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an image pairing apparatus, which corresponds to the method embodiment shown in fig. 6, and which is particularly applicable to various electronic devices.

As shown in fig. 8, the image pairing apparatus 800 of the present embodiment may include: a second obtaining module 801, a second extraction module 802, and a retrieval module 803. The second obtaining module 801 is configured to obtain an image to be paired; the second extraction module 802 is configured to extract features of the image to be paired by using the pairing model, so as to obtain pairing features of the image to be paired; and the retrieval module 803 is configured to perform retrieval in a pre-constructed pairing feature library to obtain a target paired image matching the pairing features.
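For illustration, the retrieval module could be sketched as a brute-force nearest-neighbor search. The disclosure does not specify the similarity measure or the index structure; cosine similarity, the library entry layout, and all names and values below are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_partner(query_feature, library, top_k=1):
    """Rank library entries by similarity to the query and return their partners."""
    ranked = sorted(
        library,
        key=lambda entry: cosine(query_feature, entry["feature"]),
        reverse=True,
    )
    return [entry["partner"] for entry in ranked[:top_k]]


# Hypothetical pre-constructed pairing feature library.
library = [
    {"feature": [1.0, 0.0], "partner": "cat_right.png"},
    {"feature": [0.0, 1.0], "partner": "moon.png"},
]

# Hypothetical pairing feature of the image to be paired.
result = retrieve_partner([0.9, 0.1], library)
```

A linear scan is adequate only for small libraries; a larger library would typically call for an approximate nearest-neighbor index, which is outside the scope of this sketch.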

In the image pairing apparatus 800 of the present embodiment, for the specific processing of the second obtaining module 801, the second extraction module 802, and the retrieval module 803, and the technical effects thereof, reference may be made to the related descriptions of steps 601 to 603 in the embodiment corresponding to fig. 6, which are not repeated herein.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. Various programs and data required for the operation of the device 900 can also be stored in the RAM 903. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.

The computing unit 901 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the methods and processes described above, such as the training method of the pairing model or the image pairing method. For example, in some embodiments, the training method of the pairing model or the image pairing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method of the pairing model or the image pairing method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the training method of the pairing model or the image pairing method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method of training a pairing model, comprising:

training an initial network model by using images in an image material library to obtain a pre-trained model;

acquiring a first image set and a first paired image set, wherein the first paired image set is obtained by pairing images in the first image set;

and training the pre-trained model by using the first image set and the first paired image set to obtain the pairing model.

2. The method of claim 1, wherein the training the pre-trained model by using the first image set and the first paired image set to obtain the pairing model comprises:

fine-tuning the pre-trained model by using the first image set to obtain a fine-tuned model;

adding a fully connected layer to the fine-tuned model to obtain a target model;

and training the target model by taking each pair of paired images in the first paired image set as input, to obtain the pairing model.

3. The method of claim 2, wherein the fine-tuning the pre-trained model using the first image set to obtain a fine-tuned model comprises:

clustering the first image set to obtain a clustered image set;

setting a loss function for the pre-training model;

and fine-tuning the parameters of the pre-training model based on the clustered image set and the loss function to obtain a fine-tuned model.

4. The method of any of claims 1-3, wherein the acquiring a first image set and a first paired image set comprises:

obtaining a second image set downloaded by the user based on the search behavior of the user;

acquiring a third image set matched with a preset keyword from a specified data source;

and combining the second image set and the third image set to obtain the first image set.

5. The method of claim 4, wherein the acquiring a first image set and a first paired image set further comprises:

annotating and pairing the images in the second image set and in the third image set, respectively, to obtain a second paired image set and a third paired image set;

and combining the second paired image set and the third paired image set to obtain the first paired image set.

6. The method according to any one of claims 1-5, wherein prior to the training of the pre-trained model with the first set of images and the first set of paired images, the method further comprises:

and respectively carrying out data cleaning and de-duplication on the first image set and the first pairing image set.

7. The method of claim 6, further comprising:

and extracting the pairing features of each pair of pairing images in the first pairing image set by using the pairing model to obtain a pairing feature library.

8. An image pairing method comprising:

acquiring an image to be paired;

extracting the features of the images to be paired by using a pairing model to obtain the pairing features of the images to be paired, wherein the pairing model is obtained by training according to the method of any one of claims 1 to 7;

and searching in a pre-constructed pairing feature library to obtain a target pairing image matched with the pairing feature.

9. A training apparatus for a pairing model, comprising:

a first training module configured to train an initial network model by using images in an image material library to obtain a pre-trained model;

a first obtaining module configured to obtain a first image set and a first paired image set, wherein the first paired image set is obtained by pairing images in the first image set;

a second training module configured to train the pre-trained model using the first image set and the first paired image set to obtain the pairing model.

10. The apparatus of claim 9, wherein the second training module comprises:

a fine-tuning sub-module configured to fine-tune the pre-trained model using the first image set to obtain a fine-tuned model;

an adding sub-module configured to add a fully connected layer to the fine-tuned model to obtain a target model;

a training sub-module configured to train the target model with each pair of paired images in the first paired image set as input to obtain the pairing model.

11. The apparatus of claim 10, wherein the fine tuning sub-module comprises:

a clustering unit configured to cluster the first image set to obtain a clustered image set;

a setting unit configured to set a loss function for the pre-trained model;

and a fine-tuning unit configured to fine-tune the parameters of the pre-trained model based on the clustered image set and the loss function to obtain the fine-tuned model.

12. The apparatus of any of claims 9-11, wherein the first obtaining module comprises:

an obtaining sub-module configured to obtain a second image set downloaded by a user based on a search behavior of the user;

an acquisition submodule configured to acquire a third image set matching a preset keyword from a specified data source;

a first merging sub-module configured to merge the second image set and the third image set to obtain the first image set.

13. The apparatus of claim 12, wherein the first obtaining module further comprises:

a pairing sub-module configured to annotate and pair the images in the second image set and in the third image set, respectively, to obtain a second paired image set and a third paired image set;

a second merging sub-module configured to merge the second set of paired images and the third set of paired images to obtain the first set of paired images.

14. The apparatus of any of claims 9-13, further comprising:

a deduplication module configured to perform data cleansing and deduplication on the first image set and the first paired image set, respectively.

15. The apparatus of claim 14, the apparatus further comprising:

a first extraction module configured to extract a pairing feature of each pair of pairing images in the first pairing image set by using the pairing model to obtain a pairing feature library.

16. An image pairing apparatus comprising:

a second obtaining module configured to obtain an image to be paired;

a second extraction module configured to extract features of the image to be paired by using a pairing model, so as to obtain pairing features of the image to be paired, wherein the pairing model is obtained by training according to the method of any one of claims 1 to 7;

and a retrieval module configured to perform retrieval in a pre-constructed pairing feature library to obtain a target paired image matching the pairing features.

17. A terminal device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.

19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.