CN110858220A - Method, device, storage medium and processor for determining image characteristics

Info

Publication number: CN110858220A
Application number: CN201810909747.7A
Authority: CN
Inventors: 郑赟; 张严浩; 潘攀; 任小枫
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2018-08-10
Filing date: 2018-08-10
Publication date: 2020-03-03

Abstract

The invention discloses a method, a device, a storage medium and a processor for determining image features. The method comprises: receiving an image to be queried; and determining the image features of the image to be queried according to an image feature model, wherein the image feature model is obtained through machine learning training using multiple groups of triplet image samples, and each of the triplet image samples comprises: a query image, and an image positive sample and an image negative sample of the query image, wherein the image positive sample and the image negative sample are determined based on click behavior on images. The invention solves the technical problem that image features cannot be accurately determined.

Description

Method, device, storage medium and processor for determining image characteristics

Technical Field

The invention relates to the field of computers, in particular to a method, a device, a storage medium and a processor for determining image characteristics.

Background

The main challenge in image search is the large difference between the images provided by consumers and those provided by sellers. Seller images are usually high-quality images taken with high-end cameras in a controlled environment. The query image provided by a consumer, however, is usually captured with a low-end mobile phone camera in an uncontrolled environment, and may suffer from uneven illumination, large-area blurring, and a cluttered background.

Because the image quality of seller images and buyer images differs greatly, the image features of seller images and buyer images cannot be aligned, and thus the image features cannot be accurately determined.

No effective solution has yet been proposed for the problem that image features cannot be accurately determined.

Disclosure of Invention

The embodiment of the invention provides a method, a device, a storage medium and a processor for determining image characteristics, which at least solve the technical problem that the image characteristics cannot be determined accurately.

According to an aspect of an embodiment of the present invention, there is provided a method of determining image features, including: receiving an image to be queried; and determining the image features of the image to be queried according to an image feature model, wherein the image feature model is obtained through machine learning training using multiple groups of triplet image samples, and each of the triplet image samples comprises: a query image, and an image positive sample and an image negative sample of the query image, wherein the image positive sample and the image negative sample are determined based on click behavior on images.

According to another aspect of the embodiments of the present invention, there is also provided a method of determining image features, including: receiving a request and an image to be queried, wherein the request is used for acquiring the features of the image to be queried; and, in response to the request, feeding back the image features of the image to be queried, wherein the image features are determined according to an image feature model, the image feature model is obtained through machine learning training using multiple groups of triplet image samples, and each of the triplet image samples comprises: a query image, and an image positive sample and an image negative sample of the query image, wherein the image positive sample and the image negative sample are determined based on click behavior on images.

According to another aspect of the embodiments of the present invention, there is also provided an apparatus for determining image features, including: a receiving module, configured to receive an image to be queried; and a determining module, configured to determine the image features of the image to be queried according to an image feature model, wherein the image feature model is obtained through machine learning training using multiple groups of triplet image samples, and each of the triplet image samples comprises: a query image, and an image positive sample and an image negative sample of the query image, wherein the image positive sample and the image negative sample are determined based on click behavior on images.

According to another aspect of the embodiments of the present invention, there is also provided a storage medium including a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to execute the method for determining image features described above.

According to another aspect of the embodiments of the present invention, there is also provided a processor for executing a program, where the program executes the method for determining image features described above.

In the embodiment of the invention, the image positive sample and the image negative sample of a query image are determined in advance based on click behavior on images, and the query image, its image positive sample and its image negative sample are used as a triplet image sample. An image feature model can then be obtained through machine learning training on multiple groups of such predetermined triplet image samples. When the image features of an image to be queried need to be determined, the received image is processed by the image feature model to obtain its image features. This achieves the purpose of determining the image features of the image to be queried, realizes the technical effect of determining them accurately, and solves the technical problem that image features cannot be accurately determined.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 shows a block diagram of a hardware structure of a computer terminal (or mobile device) for implementing a method of determining image features;

FIG. 2 is a flowchart of a method of determining image features according to embodiment 1 of the present invention;

FIG. 3 is a flowchart of a method of determining image features according to embodiment 1 of the present invention;

FIG. 4 is a flowchart of a method of determining image features according to embodiment 1 of the present invention;

FIG. 5 is a flowchart of a method of determining image features according to embodiment 1 of the present invention;

FIG. 6 is a flowchart of a method of determining image features according to embodiment 1 of the present invention;

FIG. 7 is a flowchart of a method of determining image features according to embodiment 1 of the present invention;

FIG. 8 is a schematic view of an apparatus for determining an image feature according to embodiment 2 of the present invention;

FIG. 9 is a block diagram of a computer terminal according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

First, some of the terms appearing in the description of the embodiments of the present application are explained as follows:

Triplet loss: a triplet is formed as follows: a sample is randomly selected from the training data set and called the Anchor (denoted x_a); then a sample belonging to the same class as the Anchor and a sample belonging to a different class from the Anchor are randomly selected, called the Positive (denoted x_p) and the Negative (denoted x_n) respectively, thereby forming an (Anchor, Positive, Negative) triplet. The objective of the triplet loss is to learn feature representations such that the distance between x_a and x_p is as small as possible, the distance between x_a and x_n is as large as possible, and the distance between x_a and x_n exceeds the distance between x_a and x_p by at least a minimum margin.
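By way of illustration only, the objective described above is commonly written as a hinge loss. The formula below is the widely used textbook form and is not quoted from this application; the embedding function f, the distance d and the margin m are assumed notation.

```latex
\mathcal{L}(x_a, x_p, x_n)
  = \max\bigl(0,\; d\bigl(f(x_a), f(x_p)\bigr) - d\bigl(f(x_a), f(x_n)\bigr) + m\bigr),
  \qquad m > 0
```

Under this form the loss is zero exactly when d(f(x_a), f(x_n)) is at least d(f(x_a), f(x_p)) + m, that is, when the Negative is farther from the Anchor than the Positive by at least the margin.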

Positive and negative samples: among the elements (samples) of a triplet, a positive sample is a sample of the same class as the query image (query), and a negative sample is a sample of a different class from the query image.

Example 1

According to an embodiment of the present invention, an embodiment of a method of determining image features is provided. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one given here.

The method provided by embodiment 1 of the present application can be executed in a mobile terminal, a computer terminal or a similar computing device. FIG. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing the method of determining image features. As shown in FIG. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), and a memory 104 for storing data. In addition, the computer terminal 10 may further include: a transmission module, a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).

The memory 104 may be used to store software programs and modules of application software, such as the program instructions/data storage device corresponding to the method for determining image features in the embodiment of the present invention. The processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, it implements the above method for determining image features. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission module is used for receiving or sending data through a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission module includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission module may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.

The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).

It should be noted here that in some alternative embodiments, the computer device (or mobile device) shown in FIG. 1 described above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that FIG. 1 is only one specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.

In the above operating environment, the present application provides a method of determining image characteristics as shown in fig. 2. Fig. 2 is a flowchart of a method of determining image features according to embodiment 1 of the present invention. As shown in fig. 2, the method may include the steps of:

Step S202, receiving an image to be queried.

Step S204, determining the image features of the image to be queried according to the image feature model, wherein the image feature model is obtained through machine learning training using multiple groups of triplet image samples, and each of the triplet image samples comprises: a query image, and an image positive sample and an image negative sample of the query image, wherein the image positive sample and the image negative sample are determined based on click behavior on images.

In the above step S202, the image to be queried is an image used for image search, and may be a low-quality image captured by a terminal such as a mobile phone. For example, the image to be queried may be an image captured by a buyer with a smart terminal (such as a mobile phone), that is, an image used to find images similar to it.

In step S204, the positive image sample of the query image may be an image with a high similarity to the query image, and the negative image sample of the query image may be an image with a low similarity to the query image.

As an alternative embodiment, as shown in fig. 3, the determining the positive image sample and the negative image sample based on the clicking behavior of the image in step S204 includes:

Step S212, acquiring a click operation on the search images;

Step S214, determining that an image among the search images that receives the click operation is an image positive sample;

Step S216, determining an image negative sample based on the images among the search images that do not receive the click operation.

It should be noted that the search images here may be a plurality of images returned by a search under a predetermined search condition. The user then selects from the search images the image that is wanted, that is, an image similar to the query image. An image among the search images that receives a click operation can therefore be determined to be an image with high similarity to the query image, that is, an image positive sample of the query image; an image among the search images that does not receive a click operation is determined to be an image with low similarity to the query image, that is, an image negative sample of the query image.
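As a non-limiting sketch of steps S212 to S216, the following Python code splits the search images of one search session into positive and negative candidates according to whether they were clicked, and forms triplets from them. The `SearchSession` record, its field names and the function names are hypothetical and introduced only for illustration; the application does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class SearchSession:
    """One image search (hypothetical record layout)."""
    query_id: str                 # the query image
    result_ids: Tuple[str, ...]   # search images returned for the query
    clicked_ids: FrozenSet[str]   # search images that received a click


def split_by_clicks(session: SearchSession) -> Tuple[List[str], List[str]]:
    """Return (positive candidates, negative candidates) for one session."""
    positives = [r for r in session.result_ids if r in session.clicked_ids]
    negatives = [r for r in session.result_ids if r not in session.clicked_ids]
    return positives, negatives


def build_triplets(session: SearchSession) -> List[Tuple[str, str, str]]:
    """Form (query, positive, negative) triplets from clicked and un-clicked images."""
    positives, negatives = split_by_clicks(session)
    return [(session.query_id, p, n) for p in positives for n in negatives]


session = SearchSession("q1", ("a", "b", "c", "d"), frozenset({"b"}))
triplets = build_triplets(session)
```

In this sketch every un-clicked image is kept as a negative candidate; any further filtering of the candidates would be applied before the triplets are used for training.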

Alternatively, the search image may be an image of a seller, that is, an image of a commodity provided by the seller.

As an alternative embodiment, in step S214, determining that the image receiving the click operation among the search images is the image positive sample includes: filtering the images that received the click operation by a multi-feature fusion method to obtain the image positive sample. Alternatively, in step S216, determining the image negative sample based on the images that did not receive the click operation among the search images includes: filtering the images that did not receive the click operation by a multi-feature fusion method to obtain the image negative sample. The features used in the multi-feature fusion include at least two of: local features, version features, and training features.

According to the embodiment of the invention, the images that received the click operation and the images that did not can each be filtered by a multi-feature fusion method using at least two of the local features, version features, training features and the like, so as to obtain the image positive sample and the image negative sample. Filtering by multi-feature fusion allows noisy images to be removed more accurately.
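One possible way to realize such filtering is sketched below, under stated assumptions: each image carries several named feature vectors, the fused distance is a weighted sum of per-feature Euclidean distances, and an un-clicked image is kept as a negative only if its fused distance to the query is at least a threshold. The weighted-sum form, the feature names, the weights and the threshold are all illustrative assumptions, not details given in the application.

```python
import math
from typing import Dict, List, Sequence

# Feature name -> vector, e.g. "local", "version", "pretrained" (names are illustrative).
Features = Dict[str, Sequence[float]]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fused_distance(query: Features, candidate: Features,
                   weights: Dict[str, float]) -> float:
    """Weighted sum of per-feature distances (assumed fusion rule)."""
    return sum(w * euclidean(query[name], candidate[name])
               for name, w in weights.items())


def filter_negatives(query: Features, unclicked: Dict[str, Features],
                     weights: Dict[str, float], min_distance: float) -> List[str]:
    """Keep un-clicked images that are far enough from the query.

    Images closer than min_distance are treated as likely showing the same
    product as the query and are dropped from the negative candidates.
    """
    return [image_id for image_id, feats in unclicked.items()
            if fused_distance(query, feats, weights) >= min_distance]
```

The same fused distance could be used with an upper threshold to keep only those clicked images that are close enough to the query, which would correspond to filtering the image positive samples.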

Optionally, the image negative samples are shared among sets of triplet image samples.

According to the embodiment of the invention, the image negative samples are shared in the triple image samples, that is, a mode of combining one acquired image negative sample with a plurality of different image positive samples is adopted, so that more triple image samples are obtained for training the image feature model, and the image feature model obtained by training is more accurate due to the fact that the number of the training triple image samples is increased.
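The effect of sharing can be illustrated with the following sketch, in which every (query, positive) pair in a batch is combined with every negative in a shared pool. The function name, the identifiers and the rule of skipping a negative that coincides with the pair's own query or positive are assumptions made for illustration.

```python
from itertools import product
from typing import List, Tuple

Pair = Tuple[str, str]           # (query image id, positive image id)
Triplet = Tuple[str, str, str]   # (query, positive, negative)


def share_negatives(pairs: List[Pair], negative_pool: List[str]) -> List[Triplet]:
    """Combine every (query, positive) pair with every negative in the shared pool."""
    return [(q, p, n)
            for (q, p), n in product(pairs, negative_pool)
            if n != p and n != q]


pairs = [("q1", "p1"), ("q2", "p2")]
pool = ["n1", "n2", "n3"]
triplets = share_negatives(pairs, pool)  # 2 pairs x 3 shared negatives
```

With two pairs and three shared negatives the sketch yields six triplets, whereas pairing each query only with its own negatives would yield fewer.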

According to the embodiment of the invention, the image positive sample and the image negative sample of a query image are determined in advance based on click behavior on images, and the query image, its image positive sample and its image negative sample are used as a triplet image sample. An image feature model can then be obtained through machine learning training on multiple groups of such predetermined triplet image samples. When the image features of an image to be queried need to be determined, the received image is processed by the image feature model to obtain its image features. This achieves the purpose of determining the image features of the image to be queried, realizes the technical effect of determining them accurately, and solves the technical problem that image features cannot be accurately determined.

As an alternative embodiment, as shown in fig. 4, in the step S204, deriving the image feature model through machine learning training using multiple sets of triplet image samples includes:

Step S222, averaging the triplet losses obtained by training on the multiple groups of triplet image samples to obtain a loss average value;

Step S224, adjusting the image feature model according to the loss average value to obtain an updated image feature model.

With this embodiment of the invention, during training with multiple groups of triplet image samples, the triplet loss of each triplet can be determined and the loss average over the groups computed; the image feature model is then adjusted according to the loss average to complete its update. Compared with adjusting the image feature model at the level of individual triplets, as in the related art, adjusting it according to the loss average adjusts the image feature model at the level of the query image, which effectively improves the adjustment precision and makes the updated image feature model more accurate.
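A minimal sketch of step S222 is given below, assuming that the per-triplet losses have already been computed and that each loss is tagged with the identifier of its query image. Grouping by query before averaging is one reading of "adjusting at the level of the query image"; the function name and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def query_level_loss(losses: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average the triplet losses that belong to the same query image.

    losses: (query image id, triplet loss) pairs, one per triplet.
    Returns the loss average per query image.
    """
    per_query: Dict[str, List[float]] = defaultdict(list)
    for query_id, loss in losses:
        per_query[query_id].append(loss)
    return {q: sum(values) / len(values) for q, values in per_query.items()}


batch = [("q1", 0.3), ("q1", 0.0), ("q2", 0.2)]
averages = query_level_loss(batch)
```

In step S224 the loss average, rather than each individual triplet loss, would be the quantity used to adjust the model parameters, so a single noisy triplet has less influence on the update for its query.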

As an alternative embodiment, as shown in fig. 5, in step S204, determining the image feature of the image to be queried according to the image feature model includes:

Step S232, determining the initial features of the image to be queried according to the image feature model;

Step S234, performing filtering detection on the background of the image to be queried for which the initial features have been determined, using a predetermined background function, to obtain the image features of the image to be queried.

In the above embodiment of the present invention, the image to be queried may include a main part and a background part, wherein the information of the main part is the information that needs to be queried, and the information of the background part is information that does not need to be queried. The initial features of the image to be queried can be obtained according to the image feature model, but the initial features may include features of the background part of the image to be queried. In order to remove the features of the background part, a predetermined background function can be used to perform filtering detection on the background of the image to be queried, so as to accurately obtain the image features of the image to be queried.

It should be noted that the processing of filtering the background of the image to be queried after the initial features are determined may be integrated into the training process of the image feature model, or may be performed separately from it; either option may be chosen as needed. Preferably, this background filtering is integrated into the training process of the image feature model, so that detection and image features are trained jointly, which effectively improves training efficiency.

When filtering detection is performed on the background of the image to be queried, various functions may be used to filter the background. For example, the predetermined background function may be an approximate step function. Because the step function is not differentiable while end-to-end training requires differentiability, a differentiable approximation of the step function can be used; that is, the approximate step function used to filter the background of the image to be queried is differentiable.
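As an illustration, one common differentiable approximation of the step function is a scaled Sigmoid. The notation below (the step function h, the approximation σ_k and the steepness k) is assumed for this illustration and is not taken from the application.

```latex
h(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}
\qquad\approx\qquad
\sigma_k(x) = \frac{1}{1 + e^{-kx}},
\qquad
\frac{d\sigma_k}{dx} = k\,\sigma_k(x)\,\bigl(1 - \sigma_k(x)\bigr)
```

For any x other than 0, σ_k(x) approaches h(x) as k grows, while the derivative remains defined everywhere, which is what end-to-end training by gradient descent requires.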

In the above operating environment, the present application provides a method of determining image characteristics as shown in fig. 6. Fig. 6 is a flowchart of a method of determining image features according to embodiment 1 of the present invention. As shown in fig. 6, the method may include the steps of:

Step S602, receiving a request and an image to be queried, wherein the request is used for acquiring the features of the image to be queried;

Step S604, in response to the request, feeding back the image features of the image to be queried, wherein the image features are determined according to an image feature model, the image feature model is obtained through machine learning training using multiple groups of triplet image samples, and each of the triplet image samples comprises: a query image, and an image positive sample and an image negative sample of the query image, wherein the image positive sample and the image negative sample are determined based on click behavior on images.

In the embodiment of the invention, the image positive sample and the image negative sample of a query image are determined in advance based on click behavior on images, and the query image, its image positive sample and its image negative sample are used as a triplet image sample. An image feature model can then be obtained through machine learning training on multiple groups of such predetermined triplet image samples. When a request for acquiring the features of an image to be queried is received, the received image is processed by the image feature model in response to the request to obtain its image features, and the image features are fed back. This achieves the purpose of determining the image features of the image to be queried, realizes the technical effect of determining them accurately, and solves the technical problem that image features cannot be accurately determined.
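The request and feedback flow of steps S602 and S604 can be sketched as follows. The `FeatureRequest` and `FeatureResponse` records, the handler name and the `toy_model` stand-in are all hypothetical; in particular, `toy_model` merely computes two simple statistics so that the sketch runs, and does not represent the trained image feature model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = Sequence[Sequence[float]]  # rows of pixel intensities (illustrative)


@dataclass
class FeatureRequest:
    request_id: str
    image: Image          # the image to be queried


@dataclass
class FeatureResponse:
    request_id: str
    features: List[float]  # image features fed back to the requester


def handle_request(request: FeatureRequest,
                   model: Callable[[Image], Sequence[float]]) -> FeatureResponse:
    """Step S604: run the model on the received image and feed back its features."""
    return FeatureResponse(request.request_id, list(model(request.image)))


def toy_model(image: Image) -> List[float]:
    """Stand-in for a trained model: mean and maximum intensity."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]


response = handle_request(FeatureRequest("r1", [[0.0, 1.0], [1.0, 0.0]]), toy_model)
```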

As an alternative embodiment, as shown in fig. 7, the method for determining image features provided by the present application further includes:

Step S606, receiving a request for obtaining a triplet image sample;

Step S608, in response to the request, detecting click operations on the search images and feeding back a triplet image sample based on the click operations, wherein an image among the search images that receives a click operation is determined to be an image positive sample, and an image negative sample is determined based on the images among the search images that do not receive a click operation.

By adopting the embodiment of the invention, under the condition of obtaining the triple image sample, the request for obtaining the triple image sample can be received firstly, then the click operation on the search image is detected in response to the request, and the image which receives the click operation in the search image is determined as the image positive sample; and determining an image negative sample based on the image which does not receive the clicking operation in the searched image so as to determine a triple image sample, and feeding back according to the determined triple image sample. Here, the search image may be a plurality of images searched under a predetermined search condition.

The invention also provides a preferred embodiment, which provides a detection and feature joint training method based on the user click behavior.

To match buyer images with seller images, reliable image triplets must be formed. In the current image retrieval product, this means shortening the distance between the query image (query) and the image positive samples showing the same product, while increasing the distance between the query image and the image negative samples showing different products. With the triplet loss, the original image (i.e., the query image) is used as the input and the image features as the output, so that the image features of the original image are obtained.

It should be noted that the purpose of the triplet loss is to learn feature representations such that the distance between x_a and x_p is as small as possible, the distance between x_a and x_n is as large as possible, and the distance between x_a and x_n exceeds the distance between x_a and x_p by at least a minimum margin.

Optionally, the elements (samples) in the triplet include an image positive sample and an image negative sample, where the image positive sample is a sample of the same class as the query image (query), and the image negative sample is a sample of a different class from the query image.

Optionally, the main difficulty in triplet loss training is how to obtain useful training samples. In the related art, images of a category different from that of the query image are selected and used as image negative samples. With this approach, however, the resulting image negative samples may differ greatly from the query image visually, so the triplet loss easily becomes zero and contributes nothing to training.
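The following sketch illustrates why visually dissimilar negatives contribute nothing. It assumes the common hinge form of the triplet loss with a margin; the distances and the margin value are made-up numbers chosen only to show the two cases.

```python
def triplet_loss(d_ap: float, d_an: float, margin: float = 0.2) -> float:
    """Hinge-form triplet loss (assumed form).

    d_ap: distance between query and positive; d_an: distance between query and negative.
    """
    return max(0.0, d_ap - d_an + margin)


# Negative drawn from an unrelated category: already far from the query.
easy = triplet_loss(d_ap=0.3, d_an=2.5)

# Negative that looks similar to the query but shows a different product.
hard = triplet_loss(d_ap=0.3, d_an=0.4)
```

In the first case the loss is clipped to zero, so the triplet produces no gradient and the model is not adjusted; in the second case the loss is positive and the triplet contributes to training.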

It should be noted that the query image may first be fed into a Convolutional Neural Network (CNN), and the features extracted at the last layer of the CNN are denoted f_W(q), where W denotes the parameters of the CNN.

According to the image retrieval behavior observed for the product, for a given original image (namely a query image), more than 90% of users click on images of the same product. The seller images clicked by the user can therefore be regarded as image positive samples of the query image, and the images not clicked by the user are taken as image negative samples.

The image negative samples determined in this manner are images of other products that are similar to the query image but do not belong to the product shown in the query image. However, because the results are returned by a search engine, many images of the same product are usually returned while the user clicks only a small number of them, such as one or two. Many un-clicked images therefore still show the same product as the query image, and these un-clicked same-product images need to be filtered out.

In filtering the un-clicked images, a multi-feature fusion method can be adopted to determine the image negative samples of the query image. The distance between the query image and a candidate image negative sample is calculated as a combination of several features, such as local features, features of a previous version, and features pre-trained on ImageNet. The multi-feature fusion method allows noise images to be filtered out more accurately. It should be noted that a similar method is also used to filter the clicked images to obtain more accurate image positive samples.
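One possible reading of the multi-feature fusion filter is sketched below. It assumes that the fused distance is a weighted sum of per-feature Euclidean distances and that un-clicked candidates closer than a threshold are discarded as probable images of the same product. The weights, the threshold, the feature names and the toy vectors are all assumptions and are not taken from the description.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fused_distance(query_feats, cand_feats, weights):
    """Weighted sum of per-feature distances (assumed fusion rule)."""
    return sum(w * l2_distance(query_feats[name], cand_feats[name])
               for name, w in weights.items())

def filter_negatives(query_feats, candidates, weights, min_distance):
    """Keep un-clicked candidates whose fused distance is at least min_distance."""
    return [cid for cid, feats in candidates.items()
            if fused_distance(query_feats, feats, weights) >= min_distance]

# Hypothetical weights for the three feature types named in the text.
weights = {"local": 0.5, "previous_version": 0.25, "imagenet": 0.25}
query_feats = {name: [0.0, 0.0] for name in weights}
candidates = {
    "same_product": {name: [0.1, 0.0] for name in weights},     # fused distance about 0.1
    "similar_product": {name: [1.0, 0.0] for name in weights},  # fused distance 1.0
}
kept = filter_negatives(query_feats, candidates, weights, min_distance=0.5)
```

With these toy values only "similar_product" is kept as an image negative sample. The same routine with the comparison reversed could be used to keep only clicked images that are close enough to the query image.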

Optionally, to make fuller use of all available data in a minibatch and to further reduce the noise in the training images, all image negative samples in the minibatch may be shared, in addition to the previously generated triplets. The original triplet loss is thereby improved at the level of the query image rather than at the level of the individual triplet, which reduces the effect of noise on the query image. With the triplet loss, low-quality images from buyers and high-quality images from sellers can be mapped to the same space by a non-linear transformation, so that images of different quality can be matched more reliably.
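A query-level loss with shared negatives might look like the following sketch. It assumes a hinge-form triplet loss with Euclidean distance, averages the hinge terms of each query image over every image negative sample in the minibatch, and then averages over the query images. The margin, the averaging scheme and the toy features are assumptions rather than the formula of the embodiment.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query_level_loss(queries, positives, negatives, margin=0.2):
    """Average, per query, the hinge loss against all negatives shared in the minibatch."""
    per_query = []
    for q, p in zip(queries, positives):
        d_pos = l2_distance(q, p)
        hinge = [max(0.0, d_pos - l2_distance(q, n) + margin) for n in negatives]
        per_query.append(sum(hinge) / len(hinge))
    # Mean over query images gives the minibatch loss.
    return sum(per_query) / len(per_query)

# Toy minibatch with two query images; both negatives are shared by both queries.
queries = [[0.0, 0.0], [1.0, 1.0]]
positives = [[0.1, 0.0], [1.0, 1.1]]
negatives = [[0.2, 0.0], [1.0, 1.2]]
batch_loss = query_level_loss(queries, positives, negatives)
```

Because each query image is compared with every negative in the minibatch, a single noisy negative contributes only one term among several to that query's loss.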

Optionally, detection and features can be jointly trained to cope with complicated backgrounds.

A straightforward approach is to use an off-the-shelf object detection algorithm, such as the Faster Region-based Convolutional Network method (Faster R-CNN for short) or the Single Shot MultiBox Detector (SSD for short). However, performing detection as a separate stage in this way may not be optimal.

Optionally, feature learning and detection may be jointly optimized as a whole, with the detection mask represented by a step function. However, the step function M(X, Y) is not differentiable, and all components need to be differentiable in order to perform end-to-end training. To make it differentiable, the step function can be approximated by a sigmoid function with a sufficiently large scale factor. Another advantage of this approach is that no bounding boxes need to be annotated manually for training; only user click data is required to train the detection.
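The sigmoid approximation of a step-function mask can be sketched as follows. The rectangular box parameterization, the product of four sigmoids and the scale factor k are assumptions made for illustration; the description only states that a sufficiently steep sigmoid replaces the step function.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_box_mask(x, y, box, k=50.0):
    """Differentiable stand-in for a step mask that is 1 inside the box and 0 outside.

    box = (x_left, y_top, x_right, y_bottom); a larger k gives sharper edges.
    """
    x_left, y_top, x_right, y_bottom = box
    return (sigmoid(k * (x - x_left)) * sigmoid(k * (x_right - x))
            * sigmoid(k * (y - y_top)) * sigmoid(k * (y_bottom - y)))

box = (0.2, 0.2, 0.8, 0.8)                 # hypothetical box in normalized coordinates
inside = soft_box_mask(0.5, 0.5, box)      # close to 1
outside = soft_box_mask(0.0, 0.5, box)     # close to 0
```

Since the mask is a smooth function of the box coordinates, gradients of a feature loss can flow back to those coordinates, which is what makes training without bounding-box annotations conceivable in this sketch.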

According to the embodiment of the invention, more meaningful image negative samples are found by mining user click behavior, which increases training efficiency; and by jointly training detection and features, the influence of the background is effectively eliminated while the timeliness of determining the image features is improved.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

According to an embodiment of the present invention, there is also provided an apparatus for implementing the method for determining an image feature, as shown in fig. 8, the apparatus including: a receiving module 82 and a determining module 84.

The receiving module 82 is configured to receive an image to be queried. The determining module 84 is configured to determine image features of the image to be queried according to an image feature model, where the image feature model is obtained through machine learning training using multiple sets of triplet image samples, and each triplet image sample comprises: a query image, and an image positive sample and an image negative sample of the query image, where the image positive sample and the image negative sample are determined based on click behavior on images.

It should be noted here that the receiving module 82 and the determining module 84 correspond to steps S202 to S204 in embodiment 1, and the two modules are the same as the corresponding steps in the implementation example and application scenarios, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.

Example 3

The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.

Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.

In this embodiment, the computer terminal may execute the program code of the following steps in the method for determining image features: receiving an image to be queried; and determining image features of the image to be queried according to an image feature model, where the image feature model is obtained through machine learning training using multiple sets of triplet image samples, and each triplet image sample comprises: a query image, and an image positive sample and an image negative sample of the query image, where the image positive sample and the image negative sample are determined based on click behavior on images.

Alternatively, fig. 9 is a block diagram of a computer terminal according to an embodiment of the present invention. As shown in fig. 9, the computer terminal 10 includes: one or more processors 102 (only one shown), a memory 104, and a transmission module 106.

The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for determining image features in the embodiments of the present invention. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, thereby implementing the method for determining image features. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory may further include memory located remotely from the processor, which may be connected to the terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: receiving an image to be queried; and determining image features of the image to be queried according to an image feature model, where the image feature model is obtained through machine learning training using multiple sets of triplet image samples, and each triplet image sample comprises: a query image, and an image positive sample and an image negative sample of the query image, where the image positive sample and the image negative sample are determined based on click behavior on images.

Optionally, the processor may further execute the program code of the following steps: acquiring click operation on a search image; determining that an image receiving a click operation in the search images is the image positive sample; determining the image negative sample based on the image which does not receive the click operation in the search image.

Optionally, the processor may further execute the program code of the following steps: filtering the images that received a click operation by a multi-feature fusion method to obtain the image positive sample; or filtering the images that did not receive a click operation by a multi-feature fusion method to obtain the image negative sample; wherein the features in the multi-feature fusion include at least two of: local features, previous-version features, and pre-trained features.

Optionally, the processor may further execute the program code of the following steps: averaging the triplet losses obtained by training on the multiple sets of triplet image samples to obtain a loss average value; and adjusting the image feature model according to the loss average value to obtain an updated image feature model.

Optionally, the processor may further execute the program code of the following steps: determining initial features of the image to be queried according to the image feature model; and applying a preset background function to detect and filter out the background of the image to be queried whose initial features have been determined, to obtain the image features of the image to be queried.

The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: receiving a request and an image to be queried, where the request is used for acquiring the features of the image to be queried; and, in response to the request, feeding back image features of the image to be queried, where the image features are determined according to an image feature model, the image feature model is obtained through machine learning training using multiple sets of triplet image samples, and each triplet image sample comprises: a query image, and an image positive sample and an image negative sample of the query image, where the image positive sample and the image negative sample are determined based on click behavior on images.

Optionally, the processor may further execute the program code of the following steps: receiving a request for obtaining a triplet image sample; and, in response to the request, detecting a click operation on search images and feeding back a triplet image sample based on the click operation, wherein an image among the search images that received the click operation is determined as the image positive sample, and the image negative sample is determined based on the images among the search images that did not receive the click operation.

The embodiment of the invention provides a scheme for determining image characteristics. The method comprises the steps of determining an image positive sample and an image negative sample of a query image in advance based on the clicking behavior of the image, using the query image, the image positive sample of the query image and the image negative sample of the query image as triple image samples, and obtaining an image feature model through machine learning training according to a plurality of groups of predetermined triple image samples, so that the received image to be queried can be identified by using the image feature model under the condition that the image feature of the image to be queried needs to be determined, the image feature of the image to be queried is obtained, the purpose of determining the image feature of the image to be queried is achieved, the technical effect of accurately determining the image feature of the image to be queried is achieved, and the technical problem that the image feature cannot be accurately determined is solved.

It can be understood by those skilled in the art that the structure shown in fig. 9 is only an illustration, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 9 is a diagram illustrating a structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in FIG. 9, or have a different configuration than shown in FIG. 9.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.

Example 4

The embodiment of the invention also provides a storage medium. Alternatively, in this embodiment, the storage medium may be configured to store program codes executed by the method for determining image characteristics provided in embodiment 1.

Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: receiving an image to be queried; and determining image features of the image to be queried according to an image feature model, where the image feature model is obtained through machine learning training using multiple sets of triplet image samples, and each triplet image sample comprises: a query image, and an image positive sample and an image negative sample of the query image, where the image positive sample and the image negative sample are determined based on click behavior on images.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring click operation on a search image; determining that an image receiving a click operation in the search images is the image positive sample; determining the image negative sample based on the image which does not receive the click operation in the search image.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: filtering the images that received a click operation by a multi-feature fusion method to obtain the image positive sample; or filtering the images that did not receive a click operation by a multi-feature fusion method to obtain the image negative sample; wherein the features in the multi-feature fusion include at least two of: local features, previous-version features, and pre-trained features.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: averaging the triplet losses obtained by training on the multiple sets of triplet image samples to obtain a loss average value; and adjusting the image feature model according to the loss average value to obtain an updated image feature model.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: determining initial features of the image to be queried according to the image feature model; and applying a preset background function to detect and filter out the background of the image to be queried whose initial features have been determined, to obtain the image features of the image to be queried.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: receiving a request and an image to be queried, where the request is used for acquiring the features of the image to be queried; and, in response to the request, feeding back image features of the image to be queried, where the image features are determined according to an image feature model, the image feature model is obtained through machine learning training using multiple sets of triplet image samples, and each triplet image sample comprises: a query image, and an image positive sample and an image negative sample of the query image, where the image positive sample and the image negative sample are determined based on click behavior on images.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: receiving a request for obtaining a triplet image sample; and, in response to the request, detecting a click operation on search images and feeding back a triplet image sample based on the click operation, wherein an image among the search images that received the click operation is determined as the image positive sample, and the image negative sample is determined based on the images among the search images that did not receive the click operation.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims

1. A method of determining image features, comprising:

receiving an image to be queried;

determining image features of the image to be queried according to an image feature model, wherein the image feature model is obtained through machine learning training using multiple sets of triplet image samples, and each triplet image sample comprises: a query image, and an image positive sample and an image negative sample of the query image, wherein the image positive sample and the image negative sample are determined based on click behavior on images.

2. The method of claim 1, wherein determining the image positive and negative examples based on click behavior on an image comprises:

acquiring click operation on a search image;

determining that an image receiving a click operation in the search images is the image positive sample;

determining the image negative sample based on the image which does not receive the click operation in the search image.

3. The method of claim 2, wherein,

determining that an image of the search images that received a click operation is the image positive sample comprises: filtering the images that received a click operation by a multi-feature fusion method to obtain the image positive sample; or, determining the image negative sample based on the image of the search images that did not receive a click operation comprises: filtering the images that did not receive a click operation by a multi-feature fusion method to obtain the image negative sample;

wherein the features in the multi-feature fusion include at least two of: local features, previous-version features, and pre-trained features.

4. The method of claim 3, wherein the image negative examples are shared among the sets of triplet image samples.

5. The method of claim 1, wherein deriving the image feature model by machine learning training using sets of triplet image samples comprises:

averaging the triplet losses obtained by training on the multiple sets of triplet image samples to obtain a loss average value;

and adjusting the image feature model according to the loss average value to obtain an updated image feature model.

6. The method of claim 1, wherein determining image features of the image to be queried from an image feature model comprises:

determining initial features of the image to be queried according to the image feature model;

and applying a preset background function to detect and filter out the background of the image to be queried whose initial features have been determined, to obtain the image features of the image to be queried.

7. The method of claim 6, wherein the preset background function comprises an approximate step function.

8. The method of claim 7, wherein the approximate step function is differentiable.

9. A method of determining image features, comprising:

receiving a request and an image to be queried, wherein the request is used for acquiring the features of the image to be queried;

in response to the request, feeding back image features of the image to be queried, wherein the image features are determined according to an image feature model, the image feature model is obtained through machine learning training using multiple sets of triplet image samples, and each triplet image sample comprises: a query image, and an image positive sample and an image negative sample of the query image, wherein the image positive sample and the image negative sample are determined based on click behavior on images.

10. The method of claim 9, further comprising:

receiving a request for obtaining a triplet image sample;

in response to the request, detecting a click operation on search images and feeding back a triplet image sample based on the click operation, wherein an image among the search images that received the click operation is determined as the image positive sample, and the image negative sample is determined based on the images among the search images that did not receive the click operation.

11. An apparatus for determining image features, comprising:

a receiving module, configured to receive an image to be queried;

a determining module, configured to determine image features of the image to be queried according to an image feature model, wherein the image feature model is obtained through machine learning training using multiple sets of triplet image samples, and each triplet image sample comprises: a query image, and an image positive sample and an image negative sample of the query image, wherein the image positive sample and the image negative sample are determined based on click behavior on images.

12. A storage medium comprising a stored program, wherein the program, when executed, controls an apparatus in which the storage medium is located to perform the method of determining image features of any of claims 1 to 10.

13. A processor for executing a program, wherein the program when executed performs the method of determining image features of any of claims 1 to 10.