WO2020244437A1 - Image processing method, apparatus and computer device - Google Patents

Image processing method, apparatus and computer device

Info

Publication number
WO2020244437A1
WO2020244437A1 · PCT/CN2020/092834 · CN2020092834W
Authority
WO
WIPO (PCT)
Prior art keywords
hash
image
feature
conditional probability
probability distribution
Prior art date
Application number
PCT/CN2020/092834
Other languages
English (en)
French (fr)
Inventor
揭泽群
袁粒
冯佳时
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority to EP20818214.7A priority Critical patent/EP3982275A4/en
Publication of WO2020244437A1 publication Critical patent/WO2020244437A1/zh
Priority to US17/408,880 priority patent/US20210382937A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/532Query formulation, e.g. graphical querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9014Indexing; Data structures therefor; Storage structures hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • This application relates to the field of image retrieval technology, and in particular to an image processing method, device and computer equipment.
  • the commonly used image retrieval method is to describe the image content by extracting low-level features of the image, and then compare those features to determine whether images are similar.
  • the embodiments of the present application provide an image processing method, device, and computer equipment, which implement the training of the image hash model through the center collision method, which greatly improves the training efficiency and accuracy.
  • This application provides an image processing method, which is executed by an image processing system, and the method includes:
  • the target hash code is determined from the image hash code database according to the description information.
  • Each hash code in the image hash code database is obtained by applying an image hash model to an image; the image hash model is a mathematical model that projects similar images to the same center point in space;
  • the retrieval target is determined from the image database.
  • This application provides an image processing method, which is executed by an image processing system, and the method includes:
  • the first number of feature points and the second number of target feature center points are respectively mapped to the Hamming space to obtain the hash codes of the first number of training images and the hash center points of the second number of image classes;
  • Network training is performed using the hash conditional probability distribution and the ideal conditional probability distribution to obtain an image hash model and an image hash code database, the image hash code database including the target hash code of each of the first number of training images obtained by network training.
  • the present application also provides an image processing device, which includes:
  • the communication module is used to receive the description information of the retrieval target
  • the retrieval module is used to determine the target hash code from the image hash code database according to the description information.
  • Each hash code in the image hash code database is obtained by learning the image through the image hash model.
  • the image hash model is a mathematical model that makes similar images projected to the same center point in space;
  • the determining module is used to determine the retrieval target from the image library according to the target hash code and the correspondence between the image and the hash code.
  • the present application also provides an image processing device, the device includes: a first acquisition module configured to acquire a first number of training images;
  • the first processing module is configured to obtain, according to the convolutional neural network, the respective feature points of the first number of training images in the feature embedding space, and the respective feature center points of the second number of classes of images to which the first number of training images belong;
  • the second acquiring module is used to acquire the characteristic conditional probability distribution of the first number of characteristic points colliding with the corresponding characteristic center point, and the preset ideal conditional probability distribution;
  • the first network training module is configured to use the characteristic conditional probability distribution and the ideal conditional probability distribution to perform network training to obtain respective target characteristic center points of the second quantity class of images;
  • the mapping module is used to map the first number of feature points and the second number of target feature center points to the Hamming space respectively, to obtain the hash codes of the first number of training images and the hash center points of the second number of image classes;
  • the third obtaining module is used to obtain the hash conditional probability distribution of the first number of hash codes colliding with the corresponding hash center point;
  • the second network training module is configured to use the hash conditional probability distribution and the ideal conditional probability distribution for network training to obtain an image hash model and an image hash code database, the image hash code database including the target hash code of each of the first number of training images obtained by network training.
  • the present application also provides a storage medium on which a program is stored, and the program is executed by a processor to realize the steps of the above image processing method.
  • the application also provides a computer cluster, the computer cluster includes at least one computer device, and the computer device includes:
  • the memory is used to store a program that implements the above image processing method
  • the processor is used to load and execute the program stored in the memory to implement the steps of the above image processing method.
  • The following describes an image processing method provided by an embodiment of the present application.
  • This method is based on the principle that similar images are projected to the same center point in the space (for example, the feature space and/or Hamming space).
  • the image hash model trained on the images in the image library processes each image to obtain its hash code, thereby obtaining the image hash code database.
  • the above-mentioned hash codes are used instead of image features for retrieval, which improves retrieval efficiency.
  • the above-mentioned image hash model learns the similarity between image feature points and the center points of the image classes formed by similar images, rather than the pairwise similarity between images.
  • in this way, the global distribution of the images can be learned, which improves the quality of the hash codes and the retrieval accuracy.
  • the complexity of learning the image hash model is greatly reduced, the training time is shortened, and the learning efficiency is improved.
  • FIG. 1A is a schematic flowchart of an image processing method provided by an embodiment of this application.
  • FIG. 1B is a schematic flowchart of an image processing method provided by an embodiment of this application.
  • FIG. 2 is a schematic flowchart of another image processing method provided by an embodiment of the application.
  • FIG. 3 is a schematic diagram of the application of an image processing method provided by an embodiment of this application.
  • FIG. 4 is a schematic flowchart of another image processing method provided by an embodiment of this application.
  • FIG. 5a is a schematic diagram of an image retrieval application in the image processing method provided by an embodiment of this application.
  • Figure 5b is a schematic diagram of another image retrieval application in the image processing method provided by the embodiment of the application.
  • FIG. 6 is a schematic structural diagram of an image processing device provided by an embodiment of the application.
  • FIG. 7 is a schematic structural diagram of another image processing device provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of another image processing apparatus provided by an embodiment of the application.
  • FIG. 9 is a schematic structural diagram of another image processing device provided by an embodiment of the application.
  • FIG. 10 is a schematic diagram of the hardware structure of a computer device provided by an embodiment of the application.
  • the inventor of the present application found that in the application of the existing image retrieval methods based on hash algorithms, the time complexity for n images to be retrieved is O(log(n!)). In practical application scenarios, the value of n is often very large, which makes it difficult to learn the global distribution of the images. In response to this problem, the inventor of the present application improved the existing image database retrieval methods.
  • this application proposes to combine a convolutional neural network with a hash-algorithm-based image retrieval method, that is, to train a convolutional neural network to map images into low-dimensional features, and then convert those features into binary codes for retrieval, so as to improve both retrieval accuracy and retrieval efficiency.
  • this application proposes to design, on the basis of commonly used hash algorithms, a hash method based on learnable dynamic data center similarity to achieve a more accurate and efficient image processing solution, which can be used in image retrieval application scenarios.
  • This method is based on a simple geometric intuition, that is, when all similar data is projected to the same point in space (such as feature space and/or Hamming space), and dissimilar data is projected to different points in space,
  • the ideal hash codes of these data can be obtained, and the high-precision image database retrieval can be realized by using the hash codes in the ideal state.
  • this application can learn a series of center points in the feature space of the training image.
  • the center points can be called feature center points. These feature center points retain the semantic information of the image and the similarity with the original image.
  • the feature point of each training image can be collided with its corresponding feature center point (that is, the feature center point of the category of the image feature point), that is, the center collision.
  • the term "collision" is derived from the concept of a hash collision.
  • the collision between the feature point of the training image and the corresponding feature center point means that the similarity between the feature point and the feature center point reaches a preset threshold.
  • each type of image corresponds to a feature center point.
  • the feature point of an image to be retrieved can be directly compared with the feature center points to quickly identify the category of the image, and based on the category, the desired image can be quickly retrieved.
  • the present application can also project the obtained feature points and feature center points of the training images to the Hamming space through the hash layer to obtain the hash codes corresponding to each training image and the hash center points corresponding to the feature center points.
  • center collisions will also occur in Hamming space. In this way, the center collision occurs in the two spaces of the feature space and the Hamming space, and the consistency of the center similarity in the feature space and the Hamming space can be maintained.
  • This application applies the above concept to large-scale image database retrieval.
  • compared with the related hash-algorithm-based image database retrieval methods, such as the method described in the background art section above, this method can not only learn the global distribution of the data, but the center collision also allows similarity information to be learned from multiple data points at a time, which greatly improves learning efficiency, shortens training time, and further improves the efficiency and accuracy of image retrieval.
  • the image processing method provided in the embodiments of the present application can be applied to an image processing system.
  • the image processing system can be deployed in a cloud computing cluster (including at least one cloud computing device) and provided through artificial intelligence cloud services, generally called AIaaS (AI as a Service).
  • the image processing system can also be deployed in physical devices, such as terminals and/or servers, and provided to users in the form of clients.
  • the terminal may obtain the installation package of the image processing system, and then run the installation package, so as to deploy the client of the image processing system on the terminal.
  • the terminal runs the client to realize image processing.
  • the terminal runs the client and interacts with the server of the image processing system deployed on the server to implement image processing.
  • the method includes:
  • S102 The image processing system receives the description information of the retrieval target.
  • the retrieval target refers to the image to be retrieved.
  • the user in order to retrieve the image to be retrieved, the user can input the description information of the retrieval target through a graphical user interface (GUI), and the description information can be text or image. That is to say, the method supports searching images by text or searching images by image.
  • the description information of the target is text
  • the description information may be at least one sentence.
  • the descriptive information may be "The weather is very good today, and there is a golden retriever dog running wild on the grass."
  • the description information may also be at least one keyword.
  • the description information may be "sunny, grass, golden retriever, and wild”.
  • the description information of the target is an image
  • the image is specifically an image similar to the retrieval target, or even the same image as the retrieval target.
  • the description information can be a low-resolution image or an image with a watermark, and the user can input the description information to search to obtain a high-resolution image or an image without a watermark.
  • the user can input the above description information by voice.
  • the user can input sentences or keywords through voice, or input the address of an image.
  • the image processing system can convert speech into text, or obtain the input image according to the address of the input image in the speech, and then retrieve the image by text, or retrieve the image by image.
  • the application can also retrieve images through the image processing system.
  • the image processing system may also receive the description information of the retrieval target sent by the application.
  • S104 The image processing system determines the target hash code from the image hash code database according to the description information.
  • Each hash code in the image hash code database is obtained by learning an image through an image hash model, and the image hash model is a mathematical model that makes similar images projected to the same center point in space.
  • each type of image corresponds to a center point.
  • the center points can also be divided into different types. For example, when a similar image is projected into the feature space, the center point is the feature center point, and when the similar image is projected into the Hamming space, the center point is the hash center point. Based on this, in some implementations, each type of image may correspond to a feature center point and/or a hash center point.
  • the image processing system may determine a center point (such as a feature center point or a hash center point) corresponding to the description information based on the description information, and then determine the target hash code based on the center point.
  • the image processing system can determine the category to which the retrieval target belongs based on the description information, then determine the hash center point corresponding to the category, and then determine the target hash code from the image hash code database according to the hash center point.
  • the image processing system can determine the hash code corresponding to the image. Specifically, the image is input into the image hash model to obtain the hash code, which may be referred to as the reference hash code. The image processing system can then determine the target hash code matching the reference hash code from the image hash code database. Specifically, the image processing system may determine a hash code near the reference hash code (that is, a hash code whose distance from the reference hash code is within a preset range) as the target hash code.
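To make this matching step concrete, the following is a minimal sketch (an illustration under assumptions, not the patent's implementation; the function names, the integer encoding of hash codes, and the radius value are all invented for the example) of selecting target hash codes whose distance from the reference hash code is within a preset range:

```python
# Illustrative sketch: match a reference hash code against an image
# hash code database by Hamming distance (all names/values invented).

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as integers."""
    return bin(a ^ b).count("1")

def match_target_codes(reference: int, database: dict, radius: int = 2) -> list:
    """Return ids of images whose hash code lies within `radius` bits of `reference`."""
    return [img_id for img_id, code in database.items()
            if hamming_distance(reference, code) <= radius]

# Toy 8-bit hash code database: image id -> hash code.
db = {"img_a": 0b10110100, "img_b": 0b10110111, "img_c": 0b01001011}
print(match_target_codes(0b10110101, db))  # img_a and img_b are within 2 bits
```

In a real system the database would hold millions of codes, but the XOR-and-popcount distance stays cheap, which is the efficiency argument the patent makes for retrieving by hash code rather than by raw features.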
  • the image processing system determines the retrieval target from the image database according to the target hash code and the correspondence between the image and the hash code.
  • the hash codes in the image hash code database have a one-to-one correspondence with the images in the image library, and the image processing system can determine the image matching the target hash code from the image library according to the correspondence relationship as the retrieval target.
  • the image matching the target hash code may be an image corresponding to the target hash code.
  • the image processing system may also present the retrieval target through a GUI for the user to view.
  • the key to image processing lies in the image hash model.
  • the image hash model can be obtained through training.
  • the image processing system can construct an initial image hash model, which includes a feature extraction network and a feature embedding network.
  • the feature extraction network is used to extract image features.
  • the feature extraction network may be a convolutional neural network (CNN), a recurrent neural network (RNN), or a deep neural network (DNN), etc.
  • the feature embedding network includes a feature embedding layer and a hash layer, wherein the feature embedding layer is used to project the features extracted by the feature extraction network to the embedding space (ie feature space) to obtain feature points, and the hash layer is used to project the feature points Go to Hamming Space.
  • the hash layer may include multiple fully connected layers, and after the multiple fully connected layers, a hyperbolic tangent function may be applied to convert continuous vectors into binary vectors.
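As a toy sketch of the binarization step just described (assumed details: the fully connected layers are omitted, and sign thresholding stands in for the final conversion to a binary vector), a continuous vector can be squashed by the hyperbolic tangent and then thresholded:

```python
import math

def hash_layer(features):
    """Toy stand-in for the hash layer: tanh squashing followed by sign
    thresholding into a +1/-1 binary code (fully connected layers omitted)."""
    continuous = [math.tanh(x) for x in features]      # values in (-1, 1)
    return [1 if c >= 0 else -1 for c in continuous]   # binary hash code

print(hash_layer([2.3, -0.7, 0.1, -4.0]))  # -> [1, -1, 1, -1]
```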
  • the loss function (also referred to as the objective function) of the initial image hash model may be determined according to the first loss function of similar images projected to the feature space and/or the second loss function of similar images projected to the Hamming space. If the center collision is performed only in the feature space, the loss function of the initial image hash model is determined according to the above-mentioned first loss function; if the center collision is performed only in the Hamming space, the loss function is determined according to the above-mentioned second loss function; if the center collision is performed in both the feature space and the Hamming space, the loss function of the initial image hash model is determined according to both the first loss function and the second loss function.
  • the image processing system inputs the training images into the initial image hash model, and updates the parameters of the initial image hash model using the loss value determined by the loss function, so as to train the initial image hash model and obtain the image hash model.
  • the first loss function may be determined according to the characteristic condition probability distribution and the ideal condition probability distribution.
  • the characteristic conditional probability distribution is used to characterize the probability that a characteristic point is projected to a characteristic center point.
  • the second loss function may be determined according to the hash conditional probability distribution and the ideal conditional probability distribution.
  • the hash conditional probability distribution is used to characterize the probability that a hash code is projected to the hash center point.
  • the loss function of the initial image hash model is based on the first loss function of similar images being projected to the feature space, the second loss function of similar images being projected to the Hamming space, and a third loss function for binarizing the feature vectors of the feature space.
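Under this reading, the combined objective could be sketched as a weighted sum of the three terms; the weights and the exact form of the binarization (quantization) term below are assumptions for illustration, not taken from the patent:

```python
def quantization_loss(code):
    """Third (binarization) term: penalize continuous code entries far from +/-1.
    The form mean((|u| - 1)^2) is a common choice, assumed here for illustration."""
    return sum((abs(u) - 1.0) ** 2 for u in code) / len(code)

def total_loss(l_feature, l_hamming, l_quant, w1=1.0, w2=1.0, w3=0.1):
    """Weighted sum of the three loss terms (weights are illustrative)."""
    return w1 * l_feature + w2 * l_hamming + w3 * l_quant

print(round(quantization_loss([0.9, -1.0, 0.5]), 4))  # -> 0.0867
```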
  • the above mainly introduces the image processing method from the perspective of user interaction; the following introduces the image processing method from a technical perspective.
  • FIG. 1B is a schematic flowchart of an image processing method provided by an embodiment of this application.
  • the method provided in this implementation can be applied to a computer device.
  • the computer device can be a server or a terminal device.
  • the product type of the computer device is not limited in this application. This embodiment mainly describes the training process of the image hash model used in the image retrieval application scenario, that is, how to train the image hash model through the center collision method. As shown in Figure 1B, the method may include but is not limited to the following steps:
  • Step S11: Acquire the first number of training images;
  • the first number of training images can be recorded as {x_1, x_2, ..., x_i, ..., x_n}; that is, the first number can be recorded as n, and the specific value of n is not limited in this embodiment.
  • the image library composed of the n training images can be recorded as the training image library, and before model training, the second number of image categories contained in the n training images can be determined in advance; the second number can be recorded as z, and its specific value is not limited.
  • this application does not limit the method for determining the label of the training image. In practical applications, after a set of numbers of the training image label is determined, the label can be used for subsequent conditional probability calculations.
  • Step S12 according to the convolutional neural network, obtain respective feature points of the first number of training images in the feature embedding space, and respective feature center points of the second number of classes of images to which the first number of training images belong;
  • the first number of training images can be sequentially input to the convolutional neural network to obtain the image features of each training image, and then the image features are mapped to the feature embedding space to obtain the feature points corresponding to the first number of training images.
  • the convolutional neural network may include convolutional layers, pooling layers, fully connected layers, etc.; in this embodiment, multiple convolutional layers can be used to extract features of the training images to obtain image features, and the specific processing of the convolutional layers is not detailed here.
  • the process of mapping the extracted image features to the feature embedding space is actually a dimensionality reduction process, that is, the high-dimensional feature vector (composed of the extracted image features) is mapped to a low-dimensional space (that is, the feature space in this embodiment); this embodiment does not elaborate on the implementation process of this feature mapping.
  • optionally, this embodiment may adopt a word embedding method (e.g., an embedding module or a bag-of-words model) to convert the high-dimensional vector obtained by the convolutional neural network into a low-dimensional feature vector. The specific implementation of this word embedding method is not described in detail in this application, and the dimensionality reduction method proposed in this application is not limited to the word embedding method given here.
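As an illustration of the dimensionality reduction step (a plain linear projection is used here as a stand-in; the patent does not specify this form, and the matrix values are invented):

```python
def project(features, weight):
    """Map a high-dimensional feature vector into the low-dimensional feature
    embedding space by a linear projection (matrix values are illustrative)."""
    return [sum(w * f for w, f in zip(row, features)) for row in weight]

high_dim = [1.0, 2.0, 3.0, 4.0]           # e.g. features from the CNN
W = [[0.25, 0.25, 0.25, 0.25],            # 2 x 4 projection matrix
     [0.5, -0.5, 0.5, -0.5]]
print(project(high_dim, W))  # -> [2.5, -1.0]
```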
  • the feature points may be denoted as {v_1, v_2, ..., v_i, ..., v_n}; each feature point may be a multi-dimensional vector used to characterize the content of the corresponding training image, and this application does not limit the content of each feature point.
  • the feature center point obtained by the fully connected layer needs to be trained by the loss function (the objective function given in the following embodiment) so as to minimize the sum of its distances to the feature points of the training images of the same class; when this sum is minimized, the feature center point obtained can be considered the best feature center point of that class of images. For the network training process, refer to the description of the corresponding part of the following embodiment.
  • Step S13 Obtain the feature condition probability distribution of the first number of feature points colliding with the corresponding feature center point, and the preset ideal condition probability distribution;
  • V i may represent the i-th training feature points of the image
  • the center point C of k may represent the characteristics of the image class k
  • v i ) may be the conditional probability function, specifically showing the feature point i and V corresponding to the conditional probability of the feature center point c k possible collision, characterized in that the central point c k is refers to a characteristic having a center point of the image belongs to the class of the training image feature points of the v i
  • exp ⁇ may represent a natural number exponential base e function
  • D () may represent the distance function, such as Euclidean distance or other parameters calculated distance, and therefore, D (v i, c k ) may represent a particular feature may be a feature point feature and v i corresponding to the center point of a distance c k , D (v i, c m may represent a particular feature point corresponding to v i and the center point of the characteristic features possibly c m
  • the above formula (1) may specifically be a conditional probability function in the form of a normal function, whose maximum value corresponds to the feature center point c_k; the closer the feature point v_i of a training image is to the feature center point c_k, the larger the corresponding conditional probability. If the feature point v_i of a training image collides with the feature center point c_k, the corresponding conditional probability value reaches its maximum, indicating that the training image belongs to the class of images corresponding to that feature center point; conversely, the smaller the conditional probability of the feature point v_i of a training image with the feature center point c_k, the larger the gap between the training image and the feature center point of the corresponding class of images. In other words, the image classification of the training image may be wrong, and the training image may not belong to this class of images.
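As an illustration of the collision probability of formula (1), the conditional probability can be sketched as a softmax over negative distances between a feature point and every feature center point. The function name and the choice of squared Euclidean distance here are assumptions for illustration, not the patent's exact formulation.

```python
import numpy as np

def center_conditional_probability(v, centers):
    """Sketch of formula (1): the probability that feature point v collides
    with each feature center point, modeled as a softmax over negative
    squared Euclidean distances (an assumed, illustrative form)."""
    d = np.array([np.sum((v - c) ** 2) for c in centers])  # D(v, c_m) for each center
    weights = np.exp(-d)                                   # closer center -> larger weight
    return weights / weights.sum()                         # normalize into a distribution
```

The nearest center receives the largest probability, matching the "center collision" intuition described above.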
  • step S13 is not limited to the calculation method represented by formula (1).
  • if the label l_i of a certain training image is equal to the category k of the corresponding class of images, the feature point of the training image collides with the feature center point of this class of images, i.e., the training image belongs to this class of images. Therefore, the preset ideal conditional probability distribution p_0(c_k | v_i) may assign this case the maximum probability, and the feature conditional probability distribution p(c_k | v_i) is trained to approach p_0(c_k | v_i)
  • Step S14: Use the feature conditional probability distribution and the ideal conditional probability distribution to perform network training, to obtain the respective target feature center points of the second number of classes of images;
  • the present application may adjust the configuration parameters of the convolutional neural network according to the first similarity requirement between the feature conditional probability distribution and the ideal conditional probability distribution, to obtain the respective target feature center points of the second number of classes of images.
  • the KL divergence (Kullback–Leibler divergence)
  • the present application can perform center collision learning in the above manner to obtain more accurate characteristic center points of various training images, that is, determine the target characteristic center points of center collisions of various images.
  • the above method can be used as the objective function of the center collision learning, that is, the center collision learning is controlled by the objective function, where the objective function (i.e., the first loss function described above) can be, but is not limited to, formula (3):
  • B may represent the set of feature points corresponding to the training images; the specific calculation process of the KL divergence is not described in detail in this application.
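Although the patent leaves the KL divergence calculation undetailed, the first loss function of formula (3) rests on comparing the feature conditional probability distribution against the ideal one. A minimal sketch, assuming the KL(p_ideal || p_feature) direction and ignoring batch averaging over B:

```python
import numpy as np

def kl_divergence(p_ideal, p_feature, eps=1e-12):
    """KL(p_ideal || p_feature): the divergence minimized by the first loss
    function (formula (3)). The direction of the divergence and the eps
    smoothing are assumptions for this sketch."""
    p_ideal = np.asarray(p_ideal, dtype=float)
    p_feature = np.asarray(p_feature, dtype=float)
    mask = p_ideal > 0  # terms with zero ideal probability contribute nothing
    return float(np.sum(p_ideal[mask] * np.log(p_ideal[mask] / (p_feature[mask] + eps))))
```

Driving this value toward zero pushes each feature point to collide with the center point of its labeled class.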
  • this embodiment can adjust the configuration parameters of the convolutional neural network, such as the weight values, and then use the adjusted configuration parameters to reacquire new feature center points, that is, to update the z feature center points obtained above, until the newly obtained feature conditional probability distribution and the ideal conditional probability distribution meet the first similarity requirement; the feature center points finally obtained are used as the target feature center points.
  • Step S15: Map the first number of feature points and the second number of target feature center points to the Hamming space respectively, to obtain the respective hash codes of the first number of training images and the respective hash center points of the second number of classes of images;
  • in this embodiment, a hash layer may map the feature points corresponding to the first number of training images to the Hamming space, to obtain the hash code h_i of each training image; similarly, the z feature center points obtained above may be input to the hash layer and projected to the Hamming space to obtain the corresponding hash center points.
  • the loss function shown in the following formula (4) may be used to first binarize the obtained feature points and feature center points of each training image.
  • the loss function (that is, the third loss function described above) can be:
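Since the third (quantization) loss of formula (4) is not reproduced here, the following is a hedged sketch of a typical quantization loss: it penalizes entries of feature points and center points for being far from the binary values ±1, so that the subsequent projection to the Hamming space loses little information. The L1 form is an assumption, not the patent's exact formula.

```python
import numpy as np

def quantization_loss(points):
    """Hypothetical sketch of the third loss (formula (4) role): average
    distance of each entry from the nearest of {-1, +1}. Zero means the
    points are already binary."""
    points = np.asarray(points, dtype=float)
    return float(np.mean(np.abs(np.abs(points) - 1.0)))
```

A perfectly binarized set of feature points gives a loss of zero, while real-valued features near zero are penalized most.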
  • Step S16 Obtain the hash conditional probability distribution of the first number of hash codes colliding with the corresponding hash center point;
  • Step S17: Use the hash conditional probability distribution and the ideal conditional probability distribution for network training, to obtain an image hash model
  • the hash center collision learning is similar to the implementation method of the feature center collision learning.
  • the hash conditional probability configuration parameters are adjusted; the second similarity requirement may be that the degree of matching between the hash conditional probability distribution and the ideal conditional probability distribution reaches a similarity threshold.
  • h_i can represent the target hash code of the i-th training image
  • c_k^h can represent the hash center point of the k-th class of images
  • represents the configuration parameter of the hash conditional probability.
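By analogy with the feature-space case, the hash conditional probability can be sketched as a softmax over negative Hamming distances between a hash code and the hash center points, with the configuration parameter acting as a temperature-like scale. Both the function name and the temperature role of `beta` are assumptions for illustration.

```python
import math

def hash_conditional_probability(h, hash_centers, beta=1.0):
    """Sketch of the hash conditional probability: probability that hash
    code h collides with each hash center point, via a softmax over
    negative Hamming distances scaled by configuration parameter beta."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    weights = [math.exp(-beta * hamming(h, c)) for c in hash_centers]
    total = sum(weights)
    return [w / total for w in weights]
```

As in the feature space, the nearest hash center receives the highest collision probability.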
  • the objective function (i.e., the second loss function described above)
  • the objective function for realizing hash center collision learning can be the following formula (6), but it is not limited to this:
  • the global objective function for center collision learning in this application can be:
  • λ1 and λ2 can represent the weights of the corresponding objective function or loss function within the entire global objective function; their specific values are not limited. The realization process of how this application uses the global objective function to carry out network training and obtain the required image hash model is not further detailed.
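A minimal sketch of how the global objective could combine the three losses. How λ1 and λ2 are attached to the individual terms is an assumption here; the patent only states that they weight terms of the global objective.

```python
def global_objective(l_feature, l_hash, l_quant, lam1=1.0, lam2=1.0):
    """Sketch of the global center-collision objective: the feature-center
    collision loss (formula (3)), the hash-center collision loss
    (formula (6)), and the quantization loss (formula (4)), combined with
    assumed weight placement lam1 and lam2."""
    return l_feature + lam1 * l_hash + lam2 * l_quant
```

During training, minimizing this single scalar jointly drives feature-space collisions, Hamming-space collisions, and binarization.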
  • the objective function of the above formula (3) can be used to control the feature center collision learning to obtain the more accurate target feature centers of the various classes of training images; the quantization loss function of the above formula (4) is used to binarize the target feature centers and feature points, after which the processed target feature centers and feature points are mapped to the Hamming space; and the objective function of the above formula (6) can be used to control the hash center collision learning to obtain more accurate hash codes of the training images.
  • determine the hash function that yields the best feature center points and the best hash codes of the training images, and use it as the image hash model, so as to quickly and accurately obtain the hash code of an input image in practical applications.
  • this application learns the similarity between the feature points of the training images and the corresponding center points, which not only learns the global distribution of the images, but also greatly shortens the network training time, improving the learning efficiency and the accuracy of the image hash codes.
  • the refinement method of the image processing method may include:
  • Step S21: Obtain the first number of training images and their corresponding annotated labels
  • Step S22: Obtain, according to the obtained labels, an indicator function indicating whether training images are similar
  • Step S23 sequentially inputting the first number of training images into the convolutional neural network to obtain image features of each training image
  • Step S24 Map the respective image features of the first number of training images to the feature embedding space to obtain feature points of the corresponding training images;
  • Step S25 through learning of a fully connected layer of the convolutional neural network, the respective characteristic center points of the multiple types of images to which the first number of training images belong are obtained;
  • a parameter vector composed of 1s can be input to the fully connected layer of the convolutional neural network for learning, generating a feature center point c_j for each class of training images, and the feature center points of the various classes of training images thus obtained are used as vector elements to form a feature center point vector.
  • the size of this fully connected layer is z × d, where z represents the number of training image categories (that is, one class of training images corresponds to one feature center point), and d represents the dimension of the feature vector composed of the image features extracted by the convolutional layer.
  • the specific values of z and d are not limited in this application.
  • the multiple feature center points obtained in step S25 can be denoted as ⁇ c 1 , c 2 ,..., c z ⁇ in turn.
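The steps above can be sketched as a linear layer whose learned weights, when fed the all-ones parameter vector, yield the z center points {c_1, ..., c_z}. The sizes, random initialization, and the purely linear parameterization are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
z, d = 4, 8  # illustrative sizes: z image classes, d-dimensional image features

# The z x d fully connected layer is modeled by a weight matrix W; feeding
# the layer the all-ones parameter vector yields one d-dimensional feature
# center point c_j per class (exact parameterization in the patent may differ).
W = rng.normal(size=(z, d))
ones_input = np.ones(1)
centers = W * ones_input  # shape (z, d): the center points {c_1, ..., c_z}
```

In training, W is updated by the loss functions described above so each row converges to its class's target feature center.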
  • Step S26 acquiring the characteristic distance between each of the first number of characteristic points and the corresponding characteristic center point
  • Step S27 using the indicator function and the acquired feature distance to obtain the feature conditional probability of each of the first number of feature points and the corresponding feature center point;
  • Step S28 Determine the feature condition probability distribution of the first number of feature points colliding with the corresponding feature center point from the acquired feature condition probability
  • for step S26 to step S28, refer to the description of the corresponding part of the above embodiment.
  • Step S29: Acquire the first KL divergence between the obtained feature conditional probability distribution and the ideal conditional probability distribution
  • Step S210 Use the first KL divergence to adjust the configuration parameters of the convolutional neural network until the new first KL divergence obtained by using the adjusted configuration parameters meets the first similarity requirement;
  • the first similarity requirement in step S29 may be that the degree of matching between the actually obtained feature conditional probability distribution and the ideal conditional probability distribution is greater than a threshold, that is, the feature conditional probability distribution is very close to the ideal conditional probability distribution; it may even be required that the two distributions are the same.
  • Step S211: Take the respective feature center points of the second number of classes of images finally obtained as the target feature center points of the corresponding classes of images
  • Step S212 Binarize the first number of feature points and the second number of target feature center points respectively;
  • the above formula (4) can be used to implement step S212, but the binarization processing method is not limited to this.
  • Step S213 Map the first number of binarized feature points and the second number of binarized target feature center points to Hamming space respectively to obtain the hash codes of the corresponding training images and the hash center points of various images;
  • Step S214 Obtain the hash condition probability distribution that matches the hash code of each training image with the hash center point of the corresponding image;
  • step S214 may include:
  • the hash condition probability distribution of each of the first number of hash codes colliding with the corresponding hash center point is determined.
  • Step S215 According to the second similarity requirement of the hash conditional probability distribution and the ideal conditional probability distribution, adjust the hash conditional probability configuration parameters to obtain the image hash model and the respective target hash codes of the first number of training images ;
  • step S215 may specifically include:
  • the first number of hash codes finally obtained is used as the target hash code of the corresponding training image.
  • the center collision between the feature embedding space and the Hamming space ensures the consistency of the similarity of the center points in the two spaces, thereby ensuring the reliability and accuracy of image retrieval.
  • this application can also form a hash function from the formulas used in the above steps for image feature extraction, feature mapping in the feature embedding space, and hash layer processing, and use the finally learned hash function as the image hash model; the specific content will not be described in detail in this embodiment.
  • Step S216 using the target hash codes of the first number of training images to generate an image hash code database.
  • according to the above method, this embodiment can implement efficient coding of the training image library by means of the center collision hash, obtain the target hash code of each training image, and use these target hash codes to construct the image hash code database for image retrieval in actual application scenarios.
  • this application does not limit the storage method of each target hash code in the image hash code search library.
  • Different feature center points and their corresponding target hash codes can be used to generate multiple hash code groups, and
  • the multiple hash code groups constitute the image hash code search library, that is, the target hash codes in the image hash code search library are classified and stored, but it is not limited to this kind of hash code storage method.
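A hedged sketch of the classified storage just described: target hash codes are grouped by the center point (class) they collide with, forming hash code groups within the search library. The dictionary layout, identifiers, and example codes are invented for illustration only.

```python
# Image hash code search library, grouped by class (one group per center point).
hash_db = {}

def add_hash_code(class_id, image_id, code):
    """Store a target hash code in the group for its class."""
    hash_db.setdefault(class_id, []).append((image_id, code))

# Hypothetical entries: two images colliding with center 0, one with center 1.
add_hash_code(0, "img_001", "1010")
add_hash_code(0, "img_002", "1011")
add_hash_code(1, "img_003", "0101")
```

Grouped storage lets retrieval narrow the candidate set to one class's group before comparing individual codes.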
  • this application can extract the depth features of images through a deep learning network and obtain the image hash model through the center collision hashing method. It can be seen that for n training images and z center points, the time complexity of the learning method of this application is O(nz). Compared with related technologies, the learning efficiency is greatly improved, the training time is shortened, and the global distribution of the data can be learned, which improves the accuracy of the output of the obtained image hash model.
  • this embodiment mainly describes how the obtained image hash model and image hash code database are used in the image retrieval application scenario.
  • the method can include:
  • Step S31 obtaining a retrieval image
  • Step S32 input the retrieved image into the image hash model to obtain the hash code of the retrieved image
  • the implementation process of obtaining the corresponding hash code is shown in Figure 5a.
  • for the processing process of the hash function, refer to the description of the corresponding part of the above embodiment.
  • this application can also use the hash code of the retrieved image to update the image hash code database to expand the image hash code database and improve the reliability and accuracy of image retrieval.
  • Step S33 Obtain the Hamming distance between the hash code of the retrieved image and the hash code in the image hash code database;
  • Step S34 using the Hamming distance to obtain an image retrieval result.
  • this application may use the k-nearest neighbor method of Hamming distance to perform image retrieval, but it is not limited to this image retrieval method.
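The k-nearest-neighbor retrieval in Hamming space mentioned above can be sketched as follows; the database layout (a list of `(image_id, code)` pairs) and the function names are assumptions for illustration.

```python
def hamming_distance(a, b):
    """Hamming distance between two equal-length binary hash code strings."""
    return sum(x != y for x, y in zip(a, b))

def knn_retrieve(query_code, database, k=2):
    """Sketch of k-nearest-neighbor image retrieval: rank stored hash codes
    by Hamming distance to the query's hash code and return the k closest
    image ids."""
    ranked = sorted(database, key=lambda item: hamming_distance(query_code, item[1]))
    return [image_id for image_id, _ in ranked[:k]]
```

Because codes are short binary strings, each distance is a cheap bitwise comparison, which is what makes hash-based retrieval fast.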
  • the retrieved image can be directly input into the hash function obtained by the above learning, that is, the image hash model, to obtain the hash code of the retrieved image .
  • the retrieval result may include at least one image similar to the image to be queried, or at least one image not similar to the image to be queried, among other retrieval results, so as to meet the user's image retrieval requirements.
  • the hash codes of the images are not directly compared with each other to obtain the similarity between corresponding images; instead, the hash code of an image is compared with the hash center points of the various classes of images. This comparison can quickly and accurately determine the image class to which the image belongs, which greatly shortens the training time and improves the efficiency and accuracy of image classification and retrieval.
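The class decision by center comparison described above can be sketched as an argmin over Hamming distances to the hash center points, rather than over every stored image. The function name and string-code representation are assumptions for illustration.

```python
def classify_by_center(code, hash_centers):
    """Sketch of classifying an image via its hash code: compare the code
    against each class's hash center point and return the index of the
    nearest center in Hamming distance."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(range(len(hash_centers)), key=lambda k: hamming(code, hash_centers[k]))
```

Comparing against z centers instead of n images is the source of the O(nz) complexity noted earlier.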
  • the embodiment of the present application also provides an image processing device, the device including:
  • the communication module is used to receive the description information of the retrieval target
  • the retrieval module is used to determine the target hash code from the image hash code database according to the description information.
  • Each hash code in the image hash code database is obtained by learning the image through the image hash model.
  • the image hash model is a mathematical model that makes similar images projected to the same center point in space;
  • the determining module is used to determine the retrieval target from the image library according to the target hash code and the correspondence between the image and the hash code.
  • the description information is text
  • the retrieval module is specifically used for:
  • the target hash code is determined from the image hash code database according to the hash center point.
  • the description information is an image
  • the retrieval module is specifically used for:
  • the target hash code matching the reference hash code is determined from the image hash code database.
  • the communication module is specifically used for:
  • a graphical user interface (GUI)
  • the device also includes:
  • the presentation module is used to present the retrieval target through the GUI.
  • the device further includes:
  • the construction module is used to construct an initial image hash model.
  • the initial image hash model includes a feature extraction network and a feature embedding network.
  • the feature embedding network includes a feature embedding layer and a hash layer.
  • the loss function is determined according to the first loss function where the similar image is projected to the feature space and/or the second loss function where the similar image is projected to the Hamming space;
  • the training module is used to input the initial image hash model according to the training image, and update the parameters of the initial image hash model by the loss value determined by the loss function to train the initial image hash model to obtain the image hash model.
  • the first loss function is determined according to a characteristic conditional probability distribution and an ideal conditional probability distribution, and the characteristic conditional probability distribution is used to characterize the probability that a characteristic point is projected to a characteristic center point.
  • the second loss function is determined according to a hash conditional probability distribution and an ideal conditional probability distribution, and the hash conditional probability distribution is used to characterize the probability of a hash code being projected to the hash center point.
  • the loss function of the initial image hash model is determined according to the first loss function of projecting similar images to the feature space, the second loss function of projecting similar images to the Hamming space, and the third loss function of binarizing the feature vectors of the feature space.
  • FIG. 6 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application.
  • the apparatus may be applied to computer equipment. As shown in FIG. 6, the apparatus may include:
  • the first acquisition module 101 is configured to acquire the first number of training images
  • the first processing module 102 is configured to obtain, according to the convolutional neural network, the feature points corresponding to the first number of training images in the feature embedding space, and the respective feature center points of the second number of classes of images to which the first number of training images belong
  • the first processing module 102 may include:
  • the first processing unit 1021 is configured to input the first number of training images into a convolutional neural network to obtain respective image features of the first number of training images;
  • the feature mapping unit 1022 is configured to map each of the image features of the first number of training images to a feature embedding space to obtain feature points of the corresponding training images;
  • the second processing unit 1023 is configured to use the learning result of a fully connected layer of the convolutional neural network to obtain the respective characteristic center points of the second number category images to which the first number of training images belong.
  • the second acquisition module 103 is configured to acquire the characteristic conditional probability distribution of the collision of the first number of characteristic points with the corresponding characteristic center point, and the preset ideal conditional probability distribution;
  • the device may also include:
  • a labeling module configured to obtain respective labels of the first number of training images, wherein the labels of similar training images are the same;
  • An indication function acquisition module which is used to obtain an indication function similar to the training image according to the label
  • the aforementioned second acquisition module 103 may include:
  • the first acquiring unit is configured to acquire the characteristic distance between each of the first number of characteristic points and the corresponding characteristic center point;
  • the second acquiring unit is configured to acquire the characteristic conditional probability of each of the first number of characteristic points and the corresponding characteristic center point by using the indicator function and the acquired characteristic distance;
  • the first determining unit is configured to determine the characteristic condition probability distribution of the first number of characteristic points colliding with the corresponding characteristic center point from the acquired characteristic condition probability;
  • each group of similar training images is mapped to the corresponding center point, dissimilar training images are mapped to different center points, and the center points include the feature center points and the hash center points.
  • the first network training module 104 is configured to use the characteristic conditional probability distribution and the ideal conditional probability distribution to perform network training to obtain respective target characteristic center points of the second quantity class of images;
  • the first network training module 104 may be specifically configured to adjust the configuration parameters of the convolutional neural network according to the first similarity requirement between the feature conditional probability distribution and the ideal conditional probability distribution, to obtain the respective target feature center points of the second number of classes of images.
  • the first network training module 104 may include:
  • the third acquiring unit is configured to acquire the first KL divergence of the characteristic conditional probability distribution and the ideal conditional probability distribution
  • the first adjustment unit is configured to use the first KL divergence to adjust the configuration parameters of the convolutional neural network until the new first KL divergence obtained by using the adjusted configuration parameters meets the first similarity requirement;
  • the second determining unit is configured to use the respective characteristic center points of the images of the second quantity category finally obtained as the target characteristic center points of the corresponding images;
  • the mapping module 105 is configured to map the first number of feature points and the second number of target feature center points to the Hamming space respectively, to obtain the respective hash codes of the first number of training images and the respective hash center points of the second number of classes of images;
  • the third obtaining module 106 is configured to obtain the hash conditional probability distribution of the first number of hash codes colliding with the corresponding hash center point;
  • the third obtaining module 106 may include:
  • the fourth obtaining unit is configured to obtain the hash code distance between each of the first number of hash codes and the corresponding hash center point;
  • a fifth obtaining unit configured to obtain the hash conditional probability of each of the first number of hash codes and the corresponding hash center point by using the indicator function and the obtained hash code distance;
  • the third determining unit is configured to determine the hash condition probability distribution of each of the first number of hash codes colliding with the corresponding hash center point from the obtained hash condition probability.
  • the second network training module 107 is configured to use the hash conditional probability distribution and the ideal conditional probability distribution to perform network training to obtain an image hash model.
  • the second network training module 107 may be specifically configured to adjust the configuration parameters of the hash conditional probability according to the second similarity requirement of the hash conditional probability distribution and the ideal conditional probability distribution to obtain an image hash model .
  • the second network training module 107 may include:
  • the sixth acquiring unit is configured to acquire the second KL divergence of the hash conditional probability distribution and the ideal conditional probability distribution,
  • the second adjustment unit is configured to use the second KL divergence to adjust the hash conditional probability configuration parameters until the new second KL divergence obtained by using the adjusted configuration parameters meets the second similarity requirement;
  • the fourth determining unit is used to use the first number of hash codes finally obtained as the target hash code of the corresponding training image.
  • the device may further include:
  • the binarization processing module 108 is configured to perform binarization processing on the first number of feature points and the second number of target feature center points respectively;
  • mapping module 105 may be specifically configured to map the first number of binarized feature points and the second number of binarized target feature center points to Hamming space, respectively, to obtain the The respective hash codes of the first number of training images and the respective hash center points of the second number of classes of images.
  • the device may further include:
  • the target hash code acquisition module 109 is configured to adjust the hash conditional probability configuration parameters according to the second similarity requirement between the hash conditional probability distribution and the ideal conditional probability distribution, to obtain the respective target hash codes of the first number of training images;
  • the hash code database obtaining module 110 is used to form an image hash code database from the obtained first number of target hash codes.
  • the foregoing apparatus may further include:
  • the image acquisition module 111 is used to acquire retrieval images
  • the second processing module 112 is configured to input the search image into the image hash model to obtain the hash code of the search image
  • the Hamming distance obtaining module 113 is configured to obtain the Hamming distance between the hash code of the retrieved image and the hash code in the image hash code database;
  • the image retrieval module 114 is configured to use the Hamming distance to obtain an image retrieval result.
  • This application also provides an embodiment of a storage medium, on which a computer program is stored, and the computer program is executed by a processor to implement each step of the above-mentioned image processing method.
  • the embodiment of the present application also provides a schematic diagram of the hardware structure of a computer device.
  • the computer device may be a server that implements the above-mentioned image processing method, or a terminal device; the device type is not limited.
  • the computer device may include a communication interface 21, a memory 22, and a processor 23;
  • the communication interface 21, the memory 22, and the processor 23 may communicate with each other through a communication bus, and the number of the communication interface 21, the memory 22, the processor 23, and the communication bus may be at least one.
  • the communication interface 21 may be an interface of a communication module, such as an interface of a GSM module;
  • the processor 23 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
  • the memory 22 may include a high-speed RAM memory, and may also include a non-volatile memory, for example, at least one disk memory.
  • the memory 22 stores a program
  • the processor 23 calls the program stored in the memory 22 to implement the steps of the image processing method applied to the computer device.
  • the specific implementation process refer to the description of the corresponding part of the above method embodiment.
  • the steps of the method or algorithm described in the embodiments disclosed in this document can be directly implemented by hardware, a software module executed by a processor, or a combination of the two.
  • the software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Abstract

An image processing method, apparatus, and computer device. According to a convolutional neural network, the feature points of each training image in the feature embedding space, and the respective feature center points of the second number of classes of images to which the training images belong, are obtained; network training is performed by means of center collision to obtain the target feature center points corresponding to each class of images; afterwards, the obtained feature points and target feature center points are mapped to the Hamming space to obtain a first number of hash codes and a second number of hash center points, at which point network training can again be performed by means of center collision to obtain an image hash model. By learning the similarity between the hash codes of the training images, and the similarity between the feature points of the training images and the corresponding center points, not only is the global distribution of the images learned, but the network training time is also greatly shortened, improving the learning efficiency and the accuracy of the image hash codes.

Description

Image processing method, apparatus, and computer device
This application claims priority to the Chinese patent application No. 201910492017.6, filed with the China National Intellectual Property Administration on June 6, 2019 and entitled "Image processing method, apparatus and computer device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of image retrieval, and in particular to an image processing method, apparatus, and computer device.
Background
With the ever-increasing number of images on the Internet, how to quickly and accurately provide users with the image resources they need is becoming more and more important. The commonly used image retrieval method at present describes image content by extracting low-level features of the images, and then uses feature comparison to judge whether images are similar.
In order to improve retrieval accuracy, it is often necessary to extract hundreds or thousands of dimensions of image features, which requires a huge storage space to save the image feature library; moreover, the workload of each feature comparison is heavy, which greatly reduces the retrieval speed.
How to provide an efficient and accurate image retrieval method has become a technical problem to be solved urgently.
Summary
In view of this, the embodiments of this application provide an image processing method, apparatus, and computer device, which implement the training of an image hash model by means of center collision, greatly improving training efficiency and accuracy.
To achieve the above purpose, the embodiments of this application provide the following technical solutions:
This application provides an image processing method, executed by an image processing system, the method including:
receiving description information of a retrieval target;
determining a target hash code from an image hash code database according to the description information, where each hash code in the image hash code database is obtained by learning an image through an image hash model, and the image hash model is a mathematical model that causes similar images to be projected to the same center point in a space;
determining the retrieval target from an image library according to the target hash code and the correspondence between images and hash codes.
This application provides an image processing method, executed by an image processing system, the method including:
acquiring a first number of training images;
obtaining, according to a convolutional neural network, the feature points corresponding to the first number of training images in a feature embedding space, and the respective feature center points of a second number of classes of images to which the first number of training images belong;
acquiring a feature conditional probability distribution of the first number of feature points colliding with the corresponding feature center points, and a preset ideal conditional probability distribution;
performing network training by using the feature conditional probability distribution and the ideal conditional probability distribution, to obtain the respective target feature center points of the second number of classes of images;
mapping the first number of feature points and the second number of target feature center points to a Hamming space respectively, to obtain the respective hash codes of the first number of training images and the respective hash center points of the second number of classes of images;
acquiring a hash conditional probability distribution of the first number of hash codes colliding with the corresponding hash center points;
performing network training by using the hash conditional probability distribution and the ideal conditional probability distribution, to obtain an image hash model and an image hash code database, where the image hash code database includes the respective target hash codes, obtained by the network training, of the first number of training images.
本申请还提供了一种图像处理装置,所述装置包括:
通信模块,用于接收检索目标的描述信息;
检索模块,用于根据所述描述信息从图像哈希码数据库中确定目标哈希码,所述图像哈希码数据库中的每个哈希码是通过图像哈希模型对图像进行学习得到,所述图像哈希模型是使得相似图像投射到空间中同一中心点的数学模型;
确定模块,用于根据所述目标哈希码以及图像与哈希码的对应关系,从图像库中确定所述检索目标。
本申请还提供了一种图像处理装置,所述装置包括:第一获取模块,用于获取第一数量个训练图像;
第一处理模块,用于依据卷积神经网络,得到所述第一数量个训练图像在特征嵌入空间各自对应的特征点,及所述第一数量个训练图像所属第二数量类图像各自的特征中心点;
第二获取模块,用于获取第一数量个特征点与对应特征中心点碰撞的特征条件概率分布,及预设的理想条件概率分布;
第一网络训练模块,用于利用所述特征条件概率分布与所述理想条件概率分布进行网络训练,得到所述第二数量类图像各自的目标特征中心点;
映射模块,用于将所述第一数量个特征点及第二数量个目标特征中心点分别映射到汉明空间,得到所述第一数量个训练图像各自的哈希码及所述第二数量类图像各自的哈希中心点;
第三获取模块,用于获取第一数量个哈希码与对应哈希中心点碰撞的哈希条件概率分布;
第二网络训练模块,用于利用所述哈希条件概率分布与所述理想条件概率分布进行网络训练,得到图像哈希模型及图像哈希码数据库,所述图像哈希码数据库包括网络训练得到的所述第一数量个训练图像各自的目标哈希码。
本申请还提供了一种存储介质,其上存储有程序,所述程序被处理器执行,实现如上图像处理方法的各步骤。
本申请还提供了一种计算机集群,所述计算机集群包括至少一台计算机设备,所述计算机设备包括:
通信接口;
存储器,用于存储实现如上图像处理方法的程序;
处理器,用于加载并执行所述存储器存储的程序,实现如上图像处理方法的各步骤。
基于上述技术方案,本申请实施例提供的一种图像处理方法。该方法基于相似的图像投射到空间(例如是特征空间和/或汉明空间)中的同一中心点这一原理训练出的图像哈希模型对图像库中的图像进行处理,得到各图像的哈希码,从而获得图像哈希码数据库。在检索图像时,利用上述哈希码代替图像特征进行检索,提高了检索效率。
进一步地,上述图像哈希模型是学习图像特征点与相似图像形成的一类图像的中心点之间相似度,而不是各图像之间的相似度,一方面可以学习到图像的全局分布,可以提高哈希码的质量,提高检索精度。另一方面,大幅度降低了图像哈希模型学习的复杂度,缩短了训练时间,提高了学习效率。
附图说明
为了更清楚地说明本申请实施例或相关技术中的技术方案,下面将对实施例或相关技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据提供的附图获得其他的附图。
图1A为本申请实施例提供的一种图像处理方法的流程示意图;
图1B为本申请实施例提供的一种图像处理方法的流程示意图;
图2为本申请实施例提供的另一种图像处理方法的流程示意图;
图3为本申请实施例提供的一种图像处理方法的应用示意图;
图4为本申请实施例提供的又一种图像处理方法的流程示意图;
图5a为本申请实施例提供的图像处理方法中,一种图像检索应用示意图;
图5b为本申请实施例提供的图像处理方法中,另一种图像检索应用示意图;
图6为本申请实施例提供的一种图像处理装置的结构示意图;
图7为本申请实施例提供的另一种图像处理装置的结构示意图;
图8为本申请实施例提供的另一种图像处理装置的结构示意图;
图9为本申请实施例提供的又一种图像处理装置的结构示意图;
图10为本申请实施例提供的一种计算机设备的硬件结构示意图。
具体实施方式
本申请的发明人发现,已有的基于哈希算法的图像检索方法应用中,对于待检索的n个图像,其时间复杂度为O(log(n!)),在实际应用场景中,n的取值往往很大,导致很难学习到图像的全局分布,针对该问题,本申请的发明人不断对已有的图像库检索方法进行改善。
在研究过程中,发明人注意到,如卷积神经网络等深度学习算法,在如图像分类、物体识别、人脸识别等众多计算机视觉任务的准确度上,能够实现一个飞跃。其中,卷积神经网络依据其具备的特征提取能力,能够很好地适用于基于内容的图像检索(content-based image retrieval,CBIR)任务。所以,本申请提出将卷积神经网络与基于哈希算法的图像检索方法结合起来,即通过训练一个卷积神经网络将图像映射成低维度的特征,再将特征转化为二进制码进行检索,以达到提高检索精度及检索效率的目的。
并且,发明人注意到,相关的图像库检索方法中,无论是利用图像特征,直接计算图像之间的相似度,还是通过哈希码计算图像之间的相似度,进而实现模型训练,该过程都是样本图像之间的数据处理,需要花费大量时间进行学习,导致图像库检索的时间复杂度很大。
基于上述分析,本申请提出在常用的哈希算法基础上,设计一种基于可学习的动态数据中心相似性的哈希方法,实现更加精确且高效的图像处理方案,其可以用于图像检索应用场景。
该方法是基于一个简单的几何直觉,即当所有相似的数据被投射到空间(例如特征空间和/或汉明空间)中的相同点,不相似的数据被投射到空间中的不同点时,可以得到这些数据理想的哈希码,利用这些理想状态下的哈希码,可以实现高精度的图像库检索。
具体的,本申请可以在训练图像的特征空间学习一系列的中心点,该中心点可以称为特征中心点,这些特征中心点保留着图像的语意信息,及与原始图像之间的相似性,之后,可以将每个训练图像的特征点与其相应的特征中心点(即该图像特征点所属类别的特征中心点)碰撞,即中心碰撞。其中,碰撞是由撞库衍生而来,在本申请实施例中,训练图像的特征点与其相应的特征中心点碰撞是指特征点与特征中心点的相似度达到预设阈值。也就是说,每一类图像对应一个特征中心点,对于待检索的图像,可以直接将其特征点与特征中心点比较,快速识别待检索的图像所属类别,基于该类别可以快速检索到需要的图像。
同时,本申请还可以将得到的训练图像的特征点和特征中心点,通过哈希层投射到汉明空间,得到各训练图像对应的哈希码及特征中心点对应的哈希中心点。按照上述方式,中心碰撞也会发生在汉明空间。这样,中心碰撞发生在特征空间和汉明空间这两个空间中,可以保持中心相似性在特征空间与汉明空间的一致性。
本申请将上述构思应用于大规模的图像库检索中,相对于相关的基于哈希算法的图像库检索方法(如上述背景技术部分描述的方法),不仅能够学习到数据的全局分布,同时中心碰撞哈希每次可以从多个数据点学习相似度信息,极大提高了学习效率,缩短了训练时间,进而提高了图像检索的效率及准确性。
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请实施例提供的图像处理方法可以应用于图像处理系统。该图像处理系统可以部署在云计算集群(包括至少一台云计算设备)中,通过人工智能云服务(一般也被称作AIaaS(AI as a Service,中文为“AI即服务”))的方式提供给用户使用。具体地,用户通过浏览器在云计算集群中创建图像处理系统的实例,然后通过浏览器与图像处理系统的实例交互,实现图像处理。
在一些实现方式中,图像处理系统也可以部署在物理设备,如终端和/或服务器中,以客户端的形式提供给用户使用。具体地,终端可以获取图像处理系统的安装包,然后运行该安装包,以实现在终端上部署图像处理系统的客户端。终端运行该客户端,实现图像处理。当然,在一些情况下,终端运行客户端,并与服务器上部署的图像处理系统的服务端交互,实现图像处理。
为了使得本申请的技术方案更加清楚、易于理解,下面将从图像处理系统的角度对本申请提供的图像处理方法进行介绍。
参见图1A所示的图像处理方法的流程图,该方法包括:
S102:图像处理系统接收检索目标的描述信息。
检索目标是指需要检索的图像。在图像检索场景中,为了检索出需要检索的图像,用户可以通过图形用户界面(graphical user interface,GUI)输入上述检索目标的描述信息,该描述信息可以是文字,也可以是图像。也即该方法支持以文字检索图像,或者以图像检索图像。
目标的描述信息是文字时,该描述信息可以是至少一条语句。例如,该描述信息可以是“今天天气很好,草地上有一只金毛狗在欢快地撒野”。在一些实现方式中,描述信息也可以是至少一个关键词。例如,该描述信息可以是“晴天、草地、金毛狗、撒野”。
目标的描述信息是图像时,该图像具体是与检索目标相似的图像,甚至可以是与检索目标相同的图像。例如,描述信息可以是清晰度较低的图像或者带有水印的图像,用户可以输入该描述信息进行检索获得清晰度较高的图像,或者不带水印的图像。
还需要说明的是,在一些实现方式中,用户可以通过语音的方式输入上述描述信息。例如,用户可以通过语音输入语句或者关键词,或者输入图像的地址等。图像处理系统可以将语音转换为文字,或者根据语音中输入图像的地址获得输入图像,然后以文字检索图像,或者以图像检索图像。
在一些实现方式中,应用也可以通过该图像处理系统检索图像。具体地,图像处理系统也可以接收应用发送的、检索目标的描述信息。
S104:图像处理系统根据所述描述信息从图像哈希码数据库中确定目标哈希码。
所述图像哈希码数据库中的每个哈希码是通过图像哈希模型对图像进行学习得到,所述图像哈希模型是使得相似图像投射到空间中同一中心点的数学模型。
其中,每一类图像对应一个中心点。根据相似图像投射空间不同,中心点还可以分为不同类型。例如,相似图像投射到特征空间时,该中心点为特征中心点;相似图像投射到汉明空间时,该中心点为哈希中心点。基于此,在一些实现方式中,每一类图像可以对应一个特征中心点和/或一个哈希中心点。
图像处理系统可以基于描述信息确定与该描述信息相对应的中心点(例如特征中心点或哈希中心点),然后基于该中心点确定目标哈希码。
具体地,描述信息为文字时,图像处理系统可以根据描述信息确定检索目标所属的类别,然后确定与该类别相对应的哈希中心点,根据该哈希中心点从所述图像哈希码数据库中确定目标哈希码。
描述信息为图像时,图像处理系统可以确定该图像对应的哈希码,具体是将图像输入图像哈希模型,从而获得哈希码,该哈希码可以称为参考哈希码。然后图像处理系统可以从图像哈希码数据库中确定与所述参考哈希码匹配的目标哈希码。具体地,图像处理系统可以将参考哈希码附近的哈希码(即与参考哈希码的距离在预设范围以内的哈希码)确定为目标哈希码。
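作为示意,下面给出一段最简的Python草图(其中的函数名、阈值均为假设性示例,并非本申请的正式实现),演示如何以参考哈希码为中心,在图像哈希码数据库中查找汉明距离在预设范围内的目标哈希码:

```python
def hamming_distance(a, b):
    # a、b为等长的比特列表,汉明距离即对应位不同的个数
    return sum(x != y for x, y in zip(a, b))

def find_target_hash_codes(ref_code, hash_db, max_dist=1):
    # hash_db: {图像id: 哈希码};返回与参考哈希码距离不超过阈值的目标哈希码
    return {img_id: code for img_id, code in hash_db.items()
            if hamming_distance(ref_code, code) <= max_dist}

# 示例数据库与查询
db = {"img1": [0, 1, 1, 0], "img2": [1, 1, 1, 1], "img3": [0, 1, 0, 0]}
matches = find_target_hash_codes([0, 1, 1, 1], db, max_dist=1)
```

实际系统中,哈希码通常以整数位串存储,并可借助位运算(异或后统计1的个数)加速距离计算。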
S106:图像处理系统根据所述目标哈希码以及图像与哈希码的对应关系,从图像库中确定所述检索目标。
图像哈希码数据库中的哈希码与图像库中的图像是一一对应的,图像处理系统可以根据该对应关系,从图像库中确定与目标哈希码匹配的图像,作为检索目标。其中,与目标哈希码匹配的图像可以是与目标哈希码对应的图像。
在一些实现方式中,图像处理系统还可以通过GUI呈现所述检索目标,以便用户查看。
图1A所示实施例中,图像处理的关键即在于图像哈希模型。该图像哈希模型可以通过训练得到。具体地,图像处理系统可以构建一个初始图像哈希模型,该初始图像哈希模型包括特征提取网络和特征嵌入网络。
其中,特征提取网络用于提取图像的特征。该特征提取网络可以是卷积神经网络(convolutional neural networks,CNN)、循环神经网络(recurrent neural networks,RNN)或者深度神经网络(deep neural networks,DNN)等。
所述特征嵌入网络包括特征嵌入层和哈希层,其中,特征嵌入层用于将特征提取网络提取的特征投射到嵌入空间(即特征空间)得到特征点,哈希层用于将特征点投射到汉明空间。其中,哈希层可以包括多层全连接层,在多层全连接层之后还可以包括一个双曲正切函数,以将连续向量整合为二值化的向量。
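上述“多层全连接层之后接双曲正切”的哈希层结构,可以用如下Python草图示意(仅演示单层全连接的情形,权重与偏置均为假设的示例值):

```python
import math

def fully_connected(x, weights, bias):
    # 一层全连接:y = Wx + b,weights按行存储
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def hash_layer(feature, weights, bias):
    # 全连接之后接双曲正切,将连续向量各分量压向(-1, 1)
    h = fully_connected(feature, weights, bias)
    return [math.tanh(v) for v in h]

def binarize(h):
    # 取符号得到二值化的哈希码
    return [1 if v >= 0 else -1 for v in h]
```

双曲正切使输出落在(-1, 1)内,训练结束后再取符号,即可得到二值化的向量。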
所述初始图像哈希模型的损失函数(也可以称为目标函数)可以根据相似图像被投射到特征空间的第一损失函数和/或相似图像被投射到汉明空间的第二损失函数确定。如果仅在特征空间进行中心碰撞,则初始图像哈希模型的损失函数根据上述第一损失函数确定;如果仅在汉明空间进行中心碰撞,则初始图像哈希模型的损失函数根据上述第二损失函数确定;如果在特征空间和汉明空间均进行中心碰撞,则初始图像哈希模型的损失函数根据上述第一损失函数和第二损失函数确定。
接着,图像处理系统将训练图像输入所述初始图像哈希模型,通过所述损失函数确定的损失值更新所述初始图像哈希模型的参数,以训练所述初始图像哈希模型得到图像哈希模型。
需要说明的是,所述第一损失函数可以根据特征条件概率分布和理想条件概率分布确定。其中,所述特征条件概率分布用于表征一个特征点被投射到一个特征中心点的概率。
所述第二损失函数可以根据哈希条件概率分布和理想条件概率分布确定。其中,所述哈希条件概率分布用于表征一个哈希码被投射到哈希中心点的概率。
在一些可能的实现方式中,所述初始图像哈希模型的损失函数根据相似图像被投射到特征空间的第一损失函数、相似图像被投射到汉明空间的第二损失函数以及所述特征空间的特征向量进行二值化的第三损失函数确定。
以上主要从用户交互的角度对图像处理方法进行介绍,接下来从技术的角度对图像处理方法进行介绍。
参照图1B,为本申请实施例提供的一种图像处理方法的流程示意图,本实施提供的方法可以应用于计算机设备,该计算机设备可以是服务器或终端设备,本申请对该计算机设备的产品类型不作限定,且本实施例主要描述图像检索应用场景所使用的图像哈希模型的训练过程,即如何通过中心碰撞方式,训练得到图像哈希模型的实现过程,如图1B所示,该方法可以包括但并不局限于以下步骤:
步骤S11,获取第一数量个训练图像;
本申请对训练图像的来源不作限定,为了方便技术方案的描述,本实施例可以将第一数量个训练图像依次记为{x_1,x_2,…,x_i,…,x_n},可见,第一数量可以记为n,本实施例对n的具体数值大小不做限定。
在本实施例中,可以将n个训练图像组成的图像库记为训练图像库,且在模型训练之前,可以提前确定上述n个训练图像包含的第二数量个图像类别,该第二数量可以记为z,其具体数值不做限定。
另外,本申请还可以按照图像类别,对训练图像库中的每个训练图像进行标注,得到每个训练图像对应的标签l_i,因此,上述n个训练图像对应标注的标签可以记为{l_1,l_2,…,l_i,…,l_n},且对于上述n个训练图像中的相似训练图像,如x_i、x_j,两者标注的标签可以相同,即l_i=l_j,相应地,用于指示这两个训练图像是否相似的指示函数δ(l_i,l_j)=1;否则为0。可见,同类别训练图像可以标注相同的标签。
需要说明,本申请对上述训练图像的标签的确定方法不做限定,在实际应用中,确定训练图像标签的一组编号后,可以一直使用该标签进行后续条件概率的计算。
步骤S12,依据卷积神经网络,得到第一数量个训练图像在特征嵌入空间各自的特征点,及第一数量个训练图像所属第二数量类图像各自的特征中心点;
本实施例中,可以将第一数量个训练图像依次输入卷积神经网络,得到各训练图像的图像特征,再将图像特征映射到特征嵌入空间,得到第一数量个训练图像对应的特征点。其中,卷积神经网络可以包括卷积层、池化层、全连接层等,本实施例可以利用多层卷积层,实现对训练图像的特征提取,得到图像特征,具体卷积层的处理过程不做详述。
另外,将提取的图像特征映射到特征嵌入空间的过程实际上是一个降维过程,即将高维度的特征向量(即由提取的图像特征构成)映射到低维度空间(即本实施例中的特征空间),本实施例对该特征映射实现过程不做详述。
可选的,本实施例可以采用词嵌入方式,具体可以使用但并不局限于Embedding module(词嵌入模块)这种嵌入模型,将卷积神经网络得到的高维度的向量转换为低维度的特征向量,这种词嵌入方式的具体实现方法本申请不做详述,且对于本申请提出的特征向量降维处理的方法,并不局限于本申请给出的这种词嵌入方式。
按照上述方式,本实施例可以得到n个训练图像在特征嵌入空间对应的特征点,依次可以记为{v_1,v_2,…,v_i,…,v_n},每一个特征点可以是一个多维度向量,用来表征相应训练图像的内容,本申请对各特征点的内容不作限定。
对于上述特征中心点的获取过程,可以利用卷积神经网络的全连接层的学习结果得到,如可以将由1构成的参数向量输入全连接层,学习得到特征中心点,通常情况下,此时得到的特征中心点往往不够准确,还需要通过网络训练,进一步对特征中心点进行优化,具体实现过程可以参照下文实施例相应部分的描述。
需要说明,在多次网络训练过程中,需要经过全连接层得到的特征中心点,受到损失函数(如下文实施例给出的目标函数)最小化训练后,与同类训练图像的特征点的距离之和最小,可以认为本次得到的特征中心点即为该类图像的最佳特征中心点,网络训练过程可以参照下文实施例相应部分的描述。
步骤S13,获取第一数量个特征点与对应特征中心点碰撞的特征条件概率分布,及预设的理想条件概率分布;
本实施例按照上述方式得到每一类训练图像的特征中心点c_k,以及各训练图像的特征点v_i后,针对该类图像中的每一个训练图像的特征点v_i,可以利用以下公式(1),获取该特征点v_i与对应类图像的特征中心点c_k碰撞的特征条件概率分布,但并不局限于这种计算方式:

p(c_k|v_i) = exp{-D(v_i,c_k)/θ} / Σ_{m=1}^{z} exp{-D(v_i,c_m)/θ};    (1)

其中,v_i可以表示第i个训练图像的特征点,c_k可以表示第k类图像的特征中心点,p(c_k|v_i)可以是条件概率函数,具体可以表示特征点v_i与对应的可能的特征中心点c_k碰撞的条件概率,该特征中心点c_k是指具有该特征点v_i的训练图像所属图像类的特征中心点,exp{}可以表示以自然数e为底的指数函数,D()可以表示距离函数,如计算欧式距离或其他距离的函数,因此,D(v_i,c_k)具体可以表示特征点v_i与对应的可能的特征中心点c_k的特征距离,D(v_i,c_m)具体可以表示特征点v_i与对应的可能的特征中心点c_m的特征距离;∑()可以是求和函数,m可以为累加求和符号∑的动态编号,可以从[1,z]之间取值;δ(l_i,l_m)可以是指示标签l_i、l_m分别对应的训练图像是否相似的指示函数,如δ(l_i,l_m)=1可以表示这两个训练图像相似;δ(l_i,l_m)=0可以表示这两个训练图像不相似,该指示函数可以在模型训练前生成,如在对训练图像标注标签时生成;θ可以表示条件概率配置参数,本申请对θ的数值可以不做限定,且在实际应用中,可以通过改变θ的数值,调整上述条件概率函数的准确性。
本实施例中,上述公式(1)这一条件概率函数具体可以是正态函数,该正态函数的极大值可以对应特征中心点c_k,训练图像的特征点v_i越靠近该特征中心点c_k,对应的条件概率越大,若某训练图像的特征点v_i与特征中心点c_k碰撞,即中心碰撞,对应的条件概率可以是极大值,说明该训练图像确定属于该特征中心点对应的图像类;反之,某训练图像的特征点v_i与特征中心点c_k的条件概率越小,说明该训练图像与该特征中心点对应的图像类的差距越大,也就是说,上文对该训练图像的图像类别划分可能有误,该训练图像可能不属于该类图像。
可见,本申请可以利用上文得到的条件概率,实现对训练图像的准确分类,具体实现过程本实施例不做详述,对于如何利用公式(1)实现步骤S13的具体方法也不作详述,且步骤S13的实现方法也并不局限于公式(1)表示的计算方法。
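公式(1)所描述的“对负距离做指数归一化”的条件概率,可以用如下Python草图示意(距离函数取欧式距离的平方,θ为温度参数,均属示例性选择,并非本申请的正式实现):

```python
import math

def squared_distance(v, c):
    # 特征点v与中心点c的欧式距离平方
    return sum((a - b) ** 2 for a, b in zip(v, c))

def feature_conditional_probability(v, centers, theta=1.0):
    # p(c_k | v):特征点与各特征中心点碰撞的条件概率
    # 即对负距离做softmax式归一化,距离越近概率越大
    logits = [math.exp(-squared_distance(v, c) / theta) for c in centers]
    total = sum(logits)
    return [l / total for l in logits]
```

特征点越靠近某个中心点,对应的条件概率越接近1,这与上文“中心碰撞对应条件概率极大值”的描述一致。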
另外,结合上文对本申请技术构思的分析,理想情况下,上述训练图像库中,相似的训练图像可能被投射到对应类图像的特征中心点,不相似的训练图像可能被投射到不同类图像的特征中心点,由此得到的理想特征条件概率分布可以为:
p_0(c_k|v_i) = 1,当l_i = k;p_0(c_k|v_i) = 0,当l_i ≠ k。    (2)
如公式(2)所示,当某训练图像的标签l_i与对应的图像类别k相等时,该训练图像的特征点与该类图像的特征中心点碰撞,该训练图像属于该类图像。因此,若对所有特征点v_i均有p(c_k|v_i)=p_0(c_k|v_i),则说明所有相似训练图像的特征点均碰撞到相应的特征中心点。
步骤S14,利用特征条件概率分布与理想条件概率分布进行网络训练,得到第二数量类图像各自的目标特征中心点;
可选的,本申请可以按照特征条件概率分布与理想条件概率分布的第一相似要求,调整卷积神经网络的配置参数,得到第二数量类图像各自的目标特征中心点。
其中,对于特征条件概率分布与理想条件概率分布的相似度或者匹配程度,可以采用KL散度(Kullback–Leibler divergence)方式确定,该KL散度描述这两个概率分布的差异,具体计算原理不做详述。
具体的,本申请可以按照上述方式进行中心碰撞学习,获取各类训练图像更加准确的特征中心点,即确定各类图像的中心碰撞的目标特征中心点。且本实施例可以将上述方式作为中心碰撞学习的目标函数,即由该目标函数控制实现中心碰撞学习,其中,该目标函数(即上文所述的第一损失函数)可以为但并不局限于公式(3):
L_collapsing^v = Σ_{v_i∈B} KL(p_0(·|v_i) ‖ p(·|v_i));    (3)
上述公式(3)中,B可以表示各训练图像对应的特征点组成的集合,本申请对KL散度的具体计算过程不做详述。
由此可见,本申请利用上述目标函数进行中心碰撞学习时,每一次学习得到特征中心点后,若利用该特征中心点及各训练图像的特征点,按照上述方式得到的特征条件概率分布,与理想特征条件概率分布的匹配程度不满足第一条件,说明本次学习得到的特征中心点不够准确,所以,本实施例可以对上述卷积神经网络的配置参数(如权重值)进行调整,进而利用调整后的配置参数,重新获取新的特征中心点,即更新上述得到的z个特征中心点,直至新得到的特征条件概率分布与理想条件概率分布满足第一相似要求,并将最后得到的多个特征中心点作为目标特征中心点。
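上述以KL散度衡量特征条件概率分布与理想(one-hot)分布差异的训练目标,可以用如下Python草图示意(仅演示损失值的计算,不含反向传播与参数更新,函数名均为示例):

```python
import math

def kl_divergence(p0, p):
    # KL(p0 || p):p0为理想条件概率分布(one-hot),p为实际得到的条件概率分布
    return sum(a * math.log(a / b) for a, b in zip(p0, p) if a > 0)

def one_hot(k, z):
    # 理想分布:特征点只与其所属第k类的中心点碰撞
    return [1.0 if j == k else 0.0 for j in range(z)]

def collapsing_loss(prob_rows, labels, z):
    # 对每个训练图像的条件概率分布累加KL散度,得到中心碰撞学习的损失
    return sum(kl_divergence(one_hot(l, z), p) for p, l in zip(prob_rows, labels))
```

当各特征点的条件概率分布与理想分布完全一致时损失为0,偏离越大损失越大,训练即以最小化该损失为目标。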
步骤S15,将第一数量个特征点及第二数量个目标特征中心点分别映射到汉明空间,得到第一数量个训练图像各自的哈希码及第二数量类图像各自的哈希中心点;
本实施例可以通过哈希层,将第一数量个训练图像对应的特征点,映射到汉明空间,得到相应训练图像的哈希码h_i,同理,可以将上文得到的z个目标特征中心点输入哈希层,投射到汉明空间,得到对应的哈希中心点。
可选的,在实际应用中,由于通过哈希层的数据很难被完全整合为二值化向量,也就是说,将特征点及特征中心点输入哈希层,直接得到的哈希码和哈希中心点可能不都是二值数据,因此,本实施例可以利用如下公式(4)所示的损失函数,先将得到的各训练图像的特征点和特征中心点,进一步整合为二值化的数据,该损失函数(即上文所述的第三损失函数)可以为:
L_Q = Σ_{i=1}^{n} ‖|v_i| − 1‖_1 + Σ_{k=1}^{z} ‖|c_k| − 1‖_1;    (4)
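一种常见的量化(二值化)损失写法如下(L1形式,将向量各分量推向±1;这只是示例性选择,未必与本申请公式(4)的确切形式完全一致):

```python
def quantization_loss(vectors):
    # 对每个向量的每个分量,惩罚其与±1的偏离程度,
    # 使连续向量在训练后更容易被整合为二值化的数据
    return sum(abs(abs(x) - 1.0) for v in vectors for x in v)
```

分量恰为±1时损失为0;损失越小,取符号二值化时引入的误差越小。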
步骤S16,获取第一数量个哈希码与对应哈希中心点碰撞的哈希条件概率分布;
步骤S17,利用哈希条件概率分布与理想条件概率分布进行网络训练,得到图像哈希模型;
本实施例中,哈希中心碰撞学习与上述特征中心碰撞学习的实现方法类似,如按照哈希条件概率分布与所述理想条件概率分布的第二相似要求,调整哈希条件概率配置参数,该第二相似要求可以是哈希条件概率分布与所述理想条件概率分布的匹配程度达到相似阈值。
具体的,可以采用如下公式(5)来获取各训练图像的目标哈希码h_i与对应类图像的哈希中心点c_k^h匹配的哈希条件概率分布:

p(c_k^h|h_i) = exp{-D(h_i,c_k^h)/θ} / Σ_{m=1}^{z} exp{-D(h_i,c_m^h)/θ};    (5)

其中,h_i可以表示第i个训练图像的目标哈希码,c_k^h可以表示第k类图像的哈希中心点,θ表示哈希条件概率配置参数,本申请对其具体数值不做限定。
在汉明空间中,实现哈希中心碰撞学习的目标函数(即上文所述的第二损失函数)可以为如下公式(6),但并不局限于此:
L_collapsing^h = Σ_{i=1}^{n} KL(p_0(·|h_i) ‖ p(·|h_i));    (6)
可见,如图3所示,本申请进行中心碰撞学习的全局目标函数可以为:
L = L_collapsing^v + λ_1·L_collapsing^h + λ_2·L_Q;    (7)
其中,λ_1和λ_2可以表示相应目标函数或损失函数在整个全局目标函数中的权重,具体数值不做限定,本申请对如何利用全局目标函数实现网络训练,得到所需的图像哈希模型的实现过程不再详述。
因此,在本实施例实际应用中,将训练图像的图像特征映射到特征嵌入空间后,在该特征嵌入空间,可以利用上述公式(3)的目标函数,控制中心碰撞学习,得到各类训练图像的更准确的目标特征中心点;利用上述公式(4)的量化损失函数,实现对目标特征中心点及特征点的二值化处理,再将处理后的目标特征中心点及特征点映射到汉明空间后,在汉明空间,可以利用上述公式(6)的目标函数,控制哈希中心碰撞学习,得到各类训练图像更准确的哈希码,同时,确定得到训练图像最佳特征中心点和最佳哈希码所使用的哈希函数,并将其作为图像哈希模型,用来在实际应用中,快速且准确地获取输入图像的哈希码。
由此可见,相对于相关技术中学习各训练图像的哈希码之间的相似度的做法,本申请学习训练图像特征点与对应的中心点之间的相似度,不仅学习到图像的全局分布,且极大缩短了网络训练时间,提高了学习效率及图像哈希码的精度。
参照图2所示的本申请提供的一种图像处理方法的细化实施例的流程示意图,以及图3所示图像处理方法的应用示意图,该图像处理方法的细化方法可以包括:
步骤S21,获取第一数量个训练图像及其对应标注的标签;
步骤S22,依据获取的标签,得到训练图像相似的指示函数;
其中,关于训练图像的标签以及该指示函数δ(l_i,l_j)的确定方式,可以参照上述实施例相应部分的描述。
步骤S23,将第一数量个训练图像依次输入卷积神经网络,得到各训练图像的图像特征;
步骤S24,将第一数量个训练图像各自的图像特征映射到特征嵌入空间,得到相应训练图像的特征点;
步骤S25,经卷积神经网络的一全连接层的学习,得到第一数量个训练图像所属的多类图像各自的特征中心点;
本实施例中,继上文描述,如图3所示,可以将由1构成的参数向量输入卷积神经网络的全连接层进行学习,生成每一类训练图像的一个特征中心点c_j,并将得到的各类训练图像的特征中心点作为向量元素,构成特征中心点向量。
其中,这一层全连接层的尺寸为z×d,上述j=1、2、…、z,z表示上述n个训练图像存在的特征中心点的个数,也就是n个训练图像包含的训练图像类别数量,即一类训练图像对应有一个特征中心点,d表示卷积层提取得到的图像特征组成的特征向量的维度,本申请对z和d的具体数值不做限定。基于此,步骤S25得到的多个特征中心点依次可以记为{c_1,c_2,…,c_z}。
步骤S26,获取第一数量个特征点各自与对应特征中心点的特征距离;
步骤S27,利用指示函数及获取的特征距离,获取第一数量个特征点各自与对应特征中心点的特征条件概率;
步骤S28,由获取的特征条件概率,确定第一数量个特征点与对应特征中心点碰撞的特征条件概率分布;
结合上述公式(1)表示的条件概率的计算方法,即可得到第一数量个特征点与对应特征中心点碰撞的特征条件概率分布,因此,步骤S26~步骤S28的具体实现方法,可以参照上述实施例相应部分的描述。
步骤S29,获取的特征条件概率分布与理想条件概率分布的第一KL散度;
步骤S210,利用该第一KL散度,调整卷积神经网络的配置参数,直至利用调整后的配置参数得到的新的第一KL散度满足第一相似要求;
为了提高训练图像分类的准确性,本申请需要利用训练图像的特征点及特征中心点,得到的特征条件概率分布,能够与上述理想特征条件概率分布匹配,因此,步骤S29中的第一相似要求可以是实际获取的特征条件概率分布与理想特征条件概率分布的匹配程度大于阈值,即该特征条件概率分布与理想特征条件概率分布非常接近,甚至可以是要求该特征条件概率分布与理想特征条件概率分布相同。
步骤S211,将最后得到的第二数量类图像各自的特征中心点作为相应类图像的目标特征中心点;
步骤S212,分别对第一数量个特征点和所述第二数量个目标特征中心点进行二值化处理;
本实施例可以利用上述公式(4)实现步骤S212,但并不局限于这种二值化处理方法。
步骤S213,将第一数量个二值化特征点和第二数量个二值化目标特征中心点分别映射到汉明空间,得到相应训练图像的哈希码及各类图像的哈希中心点;
步骤S214,获取各训练图像的哈希码与对应类图像的哈希中心点匹配的哈希条件概率分布;
本实施例中,哈希条件概率分布的获取方式与特征条件概率分布的获取方式类似,因此,结合上述公式(5),步骤S214可以包括:
获取第一数量个哈希码各自与对应哈希中心点的哈希码距离;
利用指示函数及获取的哈希码距离,获取第一数量个哈希码各自与对应哈希中心点的哈希条件概率;
由获取的哈希条件概率,确定第一数量个哈希码各自与对应哈希中心点碰撞的哈希条件概率分布。
步骤S215,按照所述哈希条件概率分布与所述理想条件概率分布的第二相似要求,调整哈希条件概率配置参数,得到图像哈希模型及第一数量个训练图像各自的目标哈希码;
在汉明空间的网络训练过程,与上述特征嵌入空间的网络训练过程类似,因此,结合公式(6),步骤S215具体可以包括:
获取哈希条件概率分布与理想条件概率分布的第二KL散度,
利用第二KL散度,调整哈希条件概率配置参数,直至利用调整后的配置参数得到的新的第二KL散度满足第二相似要求,得到图像哈希模型;
将最后得到的第一数量个哈希码作为相应训练图像的目标哈希码。
如上述分析,在特征嵌入空间和汉明空间发生的中心碰撞,保证了这两个空间中的中心点相似性的一致性,进而保证了图像检索的可靠性和准确性。并且,本申请还可以利用上述步骤中进行图像特征提取、特征嵌入空间的特征映射以及哈希层的处理方法所使用的公式,构成哈希函数,并将最后学习得到的哈希函数作为图像哈希模型,其具体内容本实施例不做详述。
步骤S216,利用第一数量个训练图像的目标哈希码,生成图像哈希码数据库。
可见,本实施例可以按照上述方法,利用中心碰撞哈希实现对训练图像库的高效编码,得到各训练图像的目标哈希码,并利用这些训 练图像的目标哈希码,构建图像哈希码数据库,用来在实际应用场景中进行图像检索。
需要说明,本申请对图像哈希码检索库中各目标哈希码的存储方式不做限定,可以将不同特征中心点及其对应的目标哈希码,生成多个哈希码组,并由这多个哈希码组构成该图像哈希码检索库,即对该图像哈希码检索库中的目标哈希码进行分类存储,但并不局限于这种哈希码存储方式。
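上述“按中心点对目标哈希码分组存储”的组织方式,可以用如下Python草图示意(数据结构与字段名均为假设性示例,仅用于说明分类存储的思路):

```python
def build_hash_database(target_codes, center_ids):
    # target_codes: 各训练图像的目标哈希码列表
    # center_ids: 各训练图像所属类别(中心点)的编号列表
    # 返回 {中心点编号: [(图像编号, 哈希码), ...]} 形式的分组存储结构
    db = {}
    for img_id, (code, cid) in enumerate(zip(target_codes, center_ids)):
        db.setdefault(cid, []).append((img_id, code))
    return db
```

检索时可先定位与查询最接近的中心点,再仅在该组内比对哈希码,从而避免遍历整个数据库。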
综上,本申请可以通过深度学习网络提取图像的深度特征,并通过中心碰撞哈希法,得到图像哈希模型,可见,对于n个训练图像和z个中心点,本申请这种学习方法的时间复杂度是O(nz),相对于相关技术,极大提高了学习效率,缩短了训练时间,而且还能够学习到数据的全局分布,提高了所得图像哈希模型输出数据的准确性。
在上述各实施例的基础上,参照图4所示的另一种图像处理方法实施例的流程示意图,本实施例主要对所得图像哈希模型及图形哈希码数据库,在图像检索应用场景的使用方法进行说明,如图4所示,该方法可以包括:
步骤S31,获取检索图像;
步骤S32,将检索图像输入图像哈希模型,得到该检索图像的哈希码;
关于检索图像输入图像哈希模型(即图5a所示的哈希函数),得到对应的哈希码的实现过程如图5a所示,关于哈希函数的处理过程可以参照上述实施例相应部分的描述。
可选的,本申请还可以利用检索图像的哈希码,更新图像哈希码数据库,以扩充图像哈希码数据库,提高图像检索的可靠性及准确性。
步骤S33,获取检索图像的哈希码与图像哈希码数据库中的哈希码的汉明距离;
本实施例对汉明距离的具体获取方法不做详述。
步骤S34,利用所述汉明距离的大小,得到图像检索结果。
可选的,本申请可以利用汉明距离的k近邻法进行图像检索,但并不局限于这一种图像检索方式。
综上,参照图5b所示的图像检索的应用示意图,在图像实际检索应用中,可以将检索图像直接输入上述学习得到的哈希函数,即图像哈希模型,得到该检索图像的哈希码,之后,通过比对该检索图像的哈希码与图像哈希码检索库中的哈希码的汉明距离,再通过k近邻等方式进行图像检索,以得到用户所需的检索结果,如与待查询图像相似的至少一个图像,或与待查询图像不相似的至少一个图像,满足用户的图像检索需求。
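上述“计算汉明距离后取k近邻”的检索流程,可以用如下Python草图示意(函数名与k的取值均为示例性选择):

```python
def hamming(a, b):
    # 两个等长哈希码的汉明距离:对应位不同的个数
    return sum(x != y for x, y in zip(a, b))

def knn_retrieve(query_code, database, k=2):
    # database: {图像id: 哈希码}
    # 按与查询哈希码的汉明距离升序排序,返回前k个图像id
    ranked = sorted(database.items(), key=lambda item: hamming(query_code, item[1]))
    return [img_id for img_id, _ in ranked[:k]]
```

由于哈希码为短二值向量,该距离计算与排序远快于对高维实值特征做逐一比对。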
如上述分析,本实施例在图像检索过程中,并不是直接利用各图像哈希码,得到相应图像之间的相似度,而是将图像的哈希码与各类图像的哈希中心点进行比对,快速且准确地得到该图像所属图像类型,既极大缩短了训练时间,又提高了图像分类检索的效率及准确性。
根据本申请实施例提供的上述图像处理方法,本申请实施例还提供一种图像处理装置,所述装置包括:
通信模块,用于接收检索目标的描述信息;
检索模块,用于根据所述描述信息从图像哈希码数据库中确定目标哈希码,所述图像哈希码数据库中的每个哈希码是通过图像哈希模型对图像进行学习得到,所述图像哈希模型是使得相似图像投射到空间中同一中心点的数学模型;
确定模块,用于根据所述目标哈希码以及图像与哈希码的对应关系,从图像库中确定所述检索目标。
在一些实现方式中,所述描述信息为文字;
所述检索模块具体用于:
根据所述描述信息确定所述检索目标所属的类别;
确定与所述类别相对应的哈希中心点;
根据所述哈希中心点从所述图像哈希码数据库中确定目标哈希码。
在一些实现方式中,所述描述信息为图像;
所述检索模块具体用于:
根据所述描述信息确定参考哈希码;
从图像哈希码数据库中确定与所述参考哈希码匹配的目标哈希码。
在一些实现方式中,所述通信模块具体用于:
接收用户通过图形用户界面(GUI)输入的检索目标的描述信息;
所述装置还包括:
呈现模块,用于通过所述GUI呈现所述检索目标。
在一些实现方式中,所述装置还包括:
构建模块,用于构建初始图像哈希模型,所述初始图像哈希模型包括特征提取网络和特征嵌入网络,所述特征嵌入网络包括特征嵌入层和哈希层,所述初始图像哈希模型的损失函数根据相似图像被投射到特征空间的第一损失函数和/或相似图像被投射到汉明空间的第二损失函数确定;
训练模块,用于根据训练图像输入所述初始图像哈希模型,通过所述损失函数确定的损失值更新所述初始图像哈希模型的参数,以训练所述初始图像哈希模型得到图像哈希模型。
在一些实现方式中,所述第一损失函数根据特征条件概率分布和理想条件概率分布确定,所述特征条件概率分布用于表征一个特征点被投射到一个特征中心点的概率。
在一些实现方式中,所述第二损失函数根据哈希条件概率分布和理想条件概率分布确定,所述哈希条件概率分布用于表征一个哈希码被投射到哈希中心点的概率。
在一些实现方式中,所述初始图像哈希模型的损失函数根据相似图像被投射到特征空间的第一损失函数、相似图像被投射到汉明空间的第二损失函数以及所述特征空间的特征向量进行二值化的第三损失函数确定。
参照图6,为本申请实施例提供的一种图像处理装置的结构示意图,该装置可以应用于计算机设备,如图6所示,该装置可以包括:
第一获取模块101,用于获取第一数量个训练图像;
第一处理模块102,用于依据卷积神经网络,得到所述第一数量个训练图像在特征嵌入空间各自对应的特征点,及所述第一数量个训练图像所属第二数量类图像各自的特征中心点;
可选的,如图7所示,该第一处理模块102可以包括:
第一处理单元1021,用于将所述第一数量个训练图像输入卷积神经网络,得到所述第一数量个训练图像各自的图像特征;
特征映射单元1022,用于将所述第一数量个训练图像各自的所述图像特征映射到特征嵌入空间,得到相应训练图像的特征点;
第二处理单元1023,用于利用所述卷积神经网络的一全连接层的学习结果,得到所述第一数量个训练图像所属第二数量类图像各自的特征中心点。
第二获取模块103,用于获取第一数量个特征点与对应特征中心点碰撞的特征条件概率分布,及预设的理想条件概率分布;
可选的,该装置还可以包括:
标注模块,用于获取所述第一数量个训练图像各自标注的标签,其中,相似训练图像标注的标签相同;
指示函数获取模块,用于依据所述标签,得到训练图像相似的指示函数;
相应地,上述第二获取模块103可以包括:
第一获取单元,用于获取第一数量个特征点各自与对应特征中心点的特征距离;
第二获取单元,用于利用所述指示函数及获取的特征距离,获取所述第一数量个特征点各自与对应特征中心点的特征条件概率;
第一确定单元,用于由获取的所述特征条件概率,确定所述第一数量个特征点与对应特征中心点碰撞的特征条件概率分布;
本实施例中,上述理想条件概率分布结果可以表征:各相似的训练图像被映射到对应的中心点,各不相似的训练图像被映射到不同的中心点,所述中心点包括所述特征中心点和所述哈希中心点。
第一网络训练模块104,用于利用所述特征条件概率分布与所述理想条件概率分布进行网络训练,得到所述第二数量类图像各自的目标特征中心点;
可选的,该第一网络训练模块104具体可以用于按照所述特征条件概率分布与所述理想条件概率分布的第一相似要求,调整所述卷积神经网络的配置参数,得到所述第二数量类图像各自的目标特征中心点。
本实施例中,第一网络训练模块104可以包括:
第三获取单元,用于获取所述特征条件概率分布与所述理想条件概率分布的第一KL散度,
第一调整单元,用于利用所述第一KL散度,调整所述卷积神经网络的配置参数,直至利用调整后的配置参数得到的新的第一KL散度满足第一相似要求;
第二确定单元,用于将最后得到的第二数量类图像各自的特征中心点作为相应类图像的目标特征中心点;
映射模块105,用于将所述第一数量个特征点及第二数量个目标特征中心点分别映射到汉明空间,得到所述第一数量个训练图像各自的哈希码及所述第二数量类图像各自的哈希中心点;
第三获取模块106,用于获取第一数量个哈希码与对应哈希中心点碰撞的哈希条件概率分布;
可选的,该第三获取模块106可以包括:
第四获取单元,用于获取第一数量个哈希码各自与对应哈希中心点的哈希码距离;
第五获取单元,用于利用所述指示函数及获取的哈希码距离,获取所述第一数量个哈希码各自与对应哈希中心点的哈希条件概率;
第三确定单元,用于由获取的所述哈希条件概率,确定所述第一数量个哈希码各自与对应哈希中心点碰撞的哈希条件概率分布。
第二网络训练模块107,用于利用所述哈希条件概率分布与所述理想条件概率分布进行网络训练,得到图像哈希模型。
本实施例中,该第二网络训练模块107具体可以用于按照所述哈希条件概率分布与所述理想条件概率分布的第二相似要求,调整哈希条件概率配置参数,得到图像哈希模型。
可选的,该第二网络训练模块107可以包括:
第六获取单元,用于获取所述哈希条件概率分布与所述理想条件概率分布的第二KL散度,
第二调整单元,用于利用所述第二KL散度,调整哈希条件概率配置参数,直至利用调整后的配置参数得到的新的第二KL散度满足第二相似要求;
第四确定单元,用于将最后得到的第一数量个哈希码作为相应训练图像的目标哈希码。
可选的,在上述实施例的基础上,如图8所示,该装置还可以包括:
二值化处理模块108,用于分别对所述第一数量个特征点和所述第二数量个目标特征中心点进行二值化处理;
相应地,上述映射模块105具体可以用于将所述第一数量个二值化的特征点及所述第二数量个二值化的目标特征中心点,分别映射到汉明空间,得到所述第一数量个训练图像各自的哈希码及所述第二数量类图像各自的哈希中心点。
在上述各实施例的基础上,如图8所示,该装置还可以包括:
目标哈希码获取模块109,用于按照所述哈希条件概率分布与所述理想条件概率分布的第二相似要求,调整哈希条件概率配置参数,得到所述第一数量个训练图像各自的目标哈希码;
哈希码数据库获取模块110,用于由得到的第一数量个目标哈希码,构成图像哈希码数据库。
可选的,在上述各实施例的基础上,如图9所示,上述装置还可以包括:
图像获取模块111,用于获取检索图像;
第二处理模块112,用于将所述检索图像输入所述图像哈希模型,得到所述检索图像的哈希码;
汉明距离获取模块113,用于获取所述检索图像的哈希码与所述图像哈希码数据库中的哈希码的汉明距离;
图像检索模块114,用于利用所述汉明距离的大小,得到图像检索结果。
本申请还提供了一种存储介质的实施例,其上存储有计算机程序,该计算机程序被处理器执行,实现上述图像处理方法的各步骤,该图像处理方法的实现过程可以参照上述方法实施例的描述。
如图10所示,本申请实施例还提供了一种计算机设备的硬件结构示意图,该计算机设备可以是实现上述图像处理方法的服务器,也可以是终端设备等,本申请对该计算机设备的产品类型不作限定,如图10所示,该计算机设备可以包括通信接口21、存储器22和处理器23;
在本申请实施例中,通信接口21、存储器22、处理器23可以通过通信总线实现相互间的通信,且该通信接口21、存储器22、处理器23及通信总线的数量可以为至少一个。
可选的,通信接口21可以为通信模块的接口,如GSM模块的接口;
处理器23可能是一个中央处理器CPU,或者是特定集成电路ASIC(Application Specific Integrated Circuit),或者是被配置成实施本申请实施例的一个或多个集成电路。
存储器22可能包含高速RAM存储器,也可能还包括非易失性存储器(non-volatile memory),例如至少一个磁盘存储器。
其中,存储器22存储有程序,处理器23调用存储器22所存储的程序,以实现上述应用于计算机设备的图像处理方法的各步骤,具体实现过程可以参照上述方法实施例相应部分的描述。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似部分互相参见即可。对于实施例公开的装置、计算机设备而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。
专业人员还可以进一步意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
结合本文中所公开的实施例描述的方法或算法的步骤可以直接用硬件、处理器执行的软件模块,或者二者的结合来实施。软件模块可以置于随机存储器(RAM)、内存、只读存储器(ROM)、电可编程ROM、电可擦除可编程ROM、寄存器、硬盘、可移动磁盘、CD-ROM、或技术领域内所公知的任意其它形式的存储介质中。
对所公开的实施例的上述说明,使本领域专业技术人员能够实现或使用本申请。对这些实施例的多种修改对本领域的专业技术人员来说将是显而易见的,本文中所定义的一般原理可以在不脱离本申请的核心思想或范围的情况下,在其它实施例中实现。因此,本申请将不会被限制于本文所示的这些实施例,而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。

Claims (27)

  1. 一种图像处理方法,由图像处理系统执行,所述方法包括:
    接收检索目标的描述信息;
    根据所述描述信息从图像哈希码数据库中确定目标哈希码,所述图像哈希码数据库中的每个哈希码是通过图像哈希模型对图像进行学习得到,所述图像哈希模型是使得相似图像投射到空间中同一中心点的数学模型;
    根据所述目标哈希码以及图像与哈希码的对应关系,从图像库中确定所述检索目标。
  2. 根据权利要求1所述的方法,其特征在于,所述描述信息为文字;
    所述根据所述描述信息从图像哈希码数据库中确定目标哈希码,包括:
    根据所述描述信息确定所述检索目标所属的类别;
    确定与所述类别相对应的哈希中心点;
    根据所述哈希中心点从所述图像哈希码数据库中确定目标哈希码。
  3. 根据权利要求1所述的方法,其特征在于,所述描述信息为图像;
    所述根据所述描述信息从图像哈希码数据库中确定目标哈希码,包括:
    根据所述描述信息确定参考哈希码;
    从图像哈希码数据库中确定与所述参考哈希码匹配的目标哈希码。
  4. 根据权利要求1至3任一项所述的方法,其特征在于,所述接收检索目标的描述信息,包括:
    接收用户通过图形用户界面(GUI)输入的检索目标的描述信息;
    所述方法还包括:
    通过所述GUI呈现所述检索目标。
  5. 根据权利要求1至4任一项所述的方法,其特征在于,所述方法还包括:
    构建初始图像哈希模型,所述初始图像哈希模型包括特征提取网络和特征嵌入网络,所述特征嵌入网络包括特征嵌入层和哈希层,所述初始图像哈希模型的损失函数根据相似图像被投射到特征空间的第一损失函数和/或相似图像被投射到汉明空间的第二损失函数确定;
    根据训练图像输入所述初始图像哈希模型,通过所述损失函数确定的损失值更新所述初始图像哈希模型的参数,以训练所述初始图像哈希模型得到图像哈希模型。
  6. 根据权利要求5所述的方法,其特征在于,所述第一损失函数根据特征条件概率分布和理想条件概率分布确定,所述特征条件概率分布用于表征一个特征点被投射到一个特征中心点的概率。
  7. 根据权利要求5所述的方法,其特征在于,所述第二损失函数根据哈希条件概率分布和理想条件概率分布确定,所述哈希条件概率分布用于表征一个哈希码被投射到哈希中心点的概率。
  8. 根据权利要求5所述的方法,其特征在于,所述初始图像哈希模型的损失函数根据相似图像被投射到特征空间的第一损失函数、相似图像被投射到汉明空间的第二损失函数以及所述特征空间的特征向量进行二值化的第三损失函数确定。
  9. 一种图像处理方法,由图像处理系统执行,所述方法包括:
    获取第一数量个训练图像;
    依据卷积神经网络,得到所述第一数量个训练图像在特征嵌入空间各自的特征点,及所述第一数量个训练图像所属第二数量类图像各自的特征中心点;
    获取第一数量个特征点与对应特征中心点碰撞的特征条件概率分布,及预设的理想条件概率分布;
    利用所述特征条件概率分布与所述理想条件概率分布进行网络训练,得到所述第二数量类图像各自的目标特征中心点;
    将所述第一数量个特征点及第二数量个目标特征中心点分别映射到汉明空间,得到所述第一数量个训练图像各自的哈希码及所述第二数量类图像各自的哈希中心点;
    获取第一数量个哈希码与对应哈希中心点碰撞的哈希条件概率分布;
    利用所述哈希条件概率分布与所述理想条件概率分布进行网络训练,得到图像哈希模型。
  10. 根据权利要求9所述的方法,所述方法还包括:
    分别对所述第一数量个特征点和所述第二数量个目标特征中心点进行二值化处理;
    所述将所述第一数量个特征点及第二数量个目标特征中心点分别映射到汉明空间,得到所述第一数量个训练图像各自的哈希码及所述第二数量类图像各自的哈希中心点,包括:
    将所述第一数量个二值化的特征点及所述第二数量个二值化的目标特征中心点,分别映射到汉明空间,得到所述第一数量个训练图像各自的哈希码及所述第二数量类图像各自的哈希中心点。
  11. 根据权利要求9所述的方法,所述依据卷积神经网络,得到所述第一数量个训练图像在特征嵌入空间各自对应的特征点,及所述第一数量个训练图像所属第二数量类图像各自的特征中心点,包括:
    将所述第一数量个训练图像输入卷积神经网络,得到所述第一数量个训练图像各自的图像特征;
    将所述第一数量个训练图像各自的所述图像特征映射到特征嵌入空间,得到相应训练图像的特征点;
    利用所述卷积神经网络的一全连接层的学习结果,得到所述第一数量个训练图像所属第二数量类图像各自的特征中心点。
  12. 根据权利要求9所述的方法,所述方法还包括:
    获取所述第一数量个训练图像各自标注的标签,其中,相似训练图像标注的标签相同;
    依据所述标签,得到训练图像相似的指示函数;
    所述获取第一数量个特征点与对应特征中心点碰撞的特征条件概率分布,包括:
    获取第一数量个特征点各自与对应特征中心点的特征距离;
    利用所述指示函数及获取的特征距离,获取所述第一数量个特征点各自与对应特征中心点的特征条件概率;
    由获取的所述特征条件概率,确定所述第一数量个特征点与对应特征中心点碰撞的特征条件概率分布;
    所述获取第一数量个哈希码与对应哈希中心点碰撞的哈希条件概率分布,包括:
    获取第一数量个哈希码各自与对应哈希中心点的哈希码距离;
    利用所述指示函数及获取的哈希码距离,获取所述第一数量个哈希码各自与对应哈希中心点的哈希条件概率;
    由获取的所述哈希条件概率,确定所述第一数量个哈希码各自与对应哈希中心点碰撞的哈希条件概率分布。
  13. 根据权利要求9至12任一项所述的方法,所述利用所述特征条件概率分布与所述理想条件概率分布进行网络训练,得到所述第二数量类图像各自的目标特征中心点,包括:
    按照所述特征条件概率分布与所述理想条件概率分布的第一相似要求,调整所述卷积神经网络的配置参数,得到所述第二数量类图像各自的目标特征中心点;
    在训练得到所述图像哈希模型的过程中,所述方法还包括:
    按照所述哈希条件概率分布与所述理想条件概率分布的第二相似要求,调整哈希条件概率配置参数,得到所述第一数量个训练图像各自的目标哈希码;
    由得到的第一数量个目标哈希码,构成图像哈希码数据库。
  14. 根据权利要求13所述的方法,所述方法还包括:
    获取检索图像;
    将所述检索图像输入所述图像哈希模型,得到所述检索图像的哈希码;
    获取所述检索图像的哈希码与所述图像哈希码数据库中的哈希码的汉明距离;
    利用所述汉明距离的大小,得到图像检索结果。
  15. 根据权利要求9所述的方法,所述理想条件概率分布结果表征:各相似的训练图像被映射到对应的中心点,各不相似的训练图像被映射到不同的中心点,所述中心点包括所述特征中心点和所述哈希中心点。
  16. 根据权利要求13所述的方法,所述按照所述特征条件概率分布与所述理想条件概率分布的第一相似要求,调整所述卷积神经网络的配置参数,得到所述第二数量类图像各自的目标特征中心点,包括:
    获取所述特征条件概率分布与所述理想条件概率分布的第一KL散度,
    利用所述第一KL散度,调整所述卷积神经网络的配置参数,直至利用调整后的配置参数得到的新的第一KL散度满足第一相似要求;
    将最后得到的第二数量类图像各自的特征中心点作为相应类图像的目标特征中心点;
    所述按照所述哈希条件概率分布与所述理想条件概率分布的第二相似要求,调整哈希条件概率配置参数,得到所述第一数量个训练图像各自的目标哈希码,包括:
    获取所述哈希条件概率分布与所述理想条件概率分布的第二KL散度,
利用所述第二KL散度,调整哈希条件概率配置参数,直至利用调整后的配置参数得到的新的第二KL散度满足第二相似要求;
    将最后得到的第二数量类图像各自的哈希中心点作为相应类图像的目标哈希中心点。
  17. 一种图像处理装置,所述装置包括:
    通信模块,用于接收检索目标的描述信息;
    检索模块,用于根据所述描述信息从图像哈希码数据库中确定目标哈希码,所述图像哈希码数据库中的每个哈希码是通过图像哈希模型对图像进行学习得到,所述图像哈希模型是使得相似图像投射到空间中同一中心点的数学模型;
确定模块,用于根据所述目标哈希码以及图像与哈希码的对应关系,从图像库中确定所述检索目标。
  18. 根据权利要求17所述的装置,所述描述信息为文字;
    所述检索模块具体用于:
    根据所述描述信息确定所述检索目标所属的类别;
    确定与所述类别相对应的哈希中心点;
    根据所述哈希中心点从所述图像哈希码数据库中确定目标哈希码。
  19. 根据权利要求17所述的装置,所述描述信息为图像;
    所述检索模块具体用于:
    根据所述描述信息确定参考哈希码;
    从图像哈希码数据库中确定与所述参考哈希码匹配的目标哈希码。
  20. 根据权利要求17至19任一项所述的装置,所述通信模块具体用于:
    接收用户通过图形用户界面(GUI)输入的检索目标的描述信息;
    所述装置还包括:
    呈现模块,用于通过所述GUI呈现所述检索目标。
  21. 根据权利要求17至20任一项所述的装置,所述装置还包括:
    构建模块,用于构建初始图像哈希模型,所述初始图像哈希模型包括特征提取网络和特征嵌入网络,所述特征嵌入网络包括特征嵌入层和哈希层,所述初始图像哈希模型的损失函数根据相似图像被投射到特征空间的第一损失函数和/或相似图像被投射到汉明空间的第二损失函数确定;
    训练模块,用于根据训练图像输入所述初始图像哈希模型,通过所述损失函数确定的损失值更新所述初始图像哈希模型的参数,以训练所述初始图像哈希模型得到图像哈希模型。
  22. 根据权利要求21所述的装置,所述第一损失函数根据特征条件概率分布和理想条件概率分布确定,所述特征条件概率分布用于表征一个特征点被投射到一个特征中心点的概率。
  23. 根据权利要求21所述的装置,所述第二损失函数根据哈希条件概率分布和理想条件概率分布确定,所述哈希条件概率分布用于表征一个哈希码被投射到哈希中心点的概率。
  24. 根据权利要求21所述的装置,所述初始图像哈希模型的损失函数根据相似图像被投射到特征空间的第一损失函数、相似图像被投射到汉明空间的第二损失函数以及所述特征空间的特征向量进行二值化的第三损失函数确定。
  25. 一种图像处理装置,所述装置包括:
    第一获取模块,用于获取第一数量个训练图像;
    第一处理模块,用于依据卷积神经网络,得到所述第一数量个训练图像在特征嵌入空间各自对应的特征点,及所述第一数量个训练图像所属第二数量类图像各自的特征中心点;
    第二获取模块,用于获取第一数量个特征点与对应特征中心点碰撞的特征条件概率分布,及预设的理想条件概率分布;
    第一网络训练模块,用于利用所述特征条件概率分布与所述理想条件概率分布进行网络训练,得到所述第二数量类图像各自的目标特征中心点;
    映射模块,用于将所述第一数量个特征点及第二数量个目标特征中心点分别映射到汉明空间,得到所述第一数量个训练图像各自的哈希码及所述第二数量类图像各自的哈希中心点;
    第三获取模块,用于获取第一数量个哈希码与对应哈希中心点碰撞的哈希条件概率分布;
    第二网络训练模块,用于利用所述哈希条件概率分布与所述理想条件概率分布进行网络训练,得到图像哈希模型。
  26. 一种计算机集群,所述计算机集群包括至少一台计算机,所述至少一台计算机包括:
    通信接口;
    存储器,用于存储实现如权利要求1至8或者9至16任意一项所述的图像处理方法的程序;
    处理器,用于加载并执行所述存储器存储的程序,实现如权利要求1至8或者9至16任意一项所述的图像处理方法的各步骤。
  27. 一种包括指令的计算机程序产品,当其在计算机上运行时,使得所述计算机执行权利要求1至8或者9至16任意一项所述的图像处理方法。
PCT/CN2020/092834 2019-06-06 2020-05-28 图像处理方法、装置及计算机设备 WO2020244437A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20818214.7A EP3982275A4 (en) 2019-06-06 2020-05-28 IMAGE PROCESSING METHOD AND APPARATUS, AND COMPUTER DEVICE
US17/408,880 US20210382937A1 (en) 2019-06-06 2021-08-23 Image processing method and apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910492017.6 2019-06-06
CN201910492017.6A CN110188223B (zh) 2019-06-06 2019-06-06 图像处理方法、装置及计算机设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/408,880 Continuation US20210382937A1 (en) 2019-06-06 2021-08-23 Image processing method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2020244437A1 true WO2020244437A1 (zh) 2020-12-10

Family

ID=67720837

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092834 WO2020244437A1 (zh) 2019-06-06 2020-05-28 图像处理方法、装置及计算机设备

Country Status (4)

Country Link
US (1) US20210382937A1 (zh)
EP (1) EP3982275A4 (zh)
CN (1) CN110188223B (zh)
WO (1) WO2020244437A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254688A (zh) * 2021-04-28 2021-08-13 广东技术师范大学 一种基于深度哈希的商标检索方法
WO2022182445A1 (en) * 2021-12-03 2022-09-01 Innopeak Technology, Inc. Duplicate image or video determination and/or image or video deduplication based on deep metric learning with keypoint features
CN113761262B (zh) * 2021-09-03 2024-02-20 奇安信科技集团股份有限公司 图像的检索类别确定方法、系统以及图像检索方法

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188223B (zh) * 2019-06-06 2022-10-04 腾讯科技(深圳)有限公司 图像处理方法、装置及计算机设备
WO2021057046A1 (en) * 2019-09-24 2021-04-01 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image hash for fast photo search
CN111325096B (zh) * 2020-01-19 2021-04-20 北京字节跳动网络技术有限公司 直播流采样方法、装置及电子设备
CN111798435A (zh) * 2020-07-08 2020-10-20 国网山东省电力公司东营供电公司 图像处理方法、工程车辆侵入输电线路监测方法及系统
US11636285B2 (en) * 2020-09-09 2023-04-25 Micron Technology, Inc. Memory including examples of calculating hamming distances for neural network and data center applications
US11609853B2 (en) 2020-09-09 2023-03-21 Micron Technology, Inc. Memory controllers including examples of calculating hamming distances for neural network and data center applications
US11586380B2 (en) 2020-09-09 2023-02-21 Micron Technology, Inc. Memory systems including examples of calculating hamming distances for neural network and data center applications
CN114896434B (zh) * 2022-07-13 2022-11-18 之江实验室 一种基于中心相似度学习的哈希码生成方法及装置
CN115686868B (zh) * 2022-12-28 2023-04-07 中南大学 一种基于联邦哈希学习的面向跨节点多模态检索方法
US11983955B1 (en) * 2023-08-16 2024-05-14 Netskope, Inc. Image matching using deep learning image fingerprinting models and embeddings

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170147868A1 (en) * 2014-04-11 2017-05-25 Beijing Sesetime Technology Development Co., Ltd. A method and a system for face verification
CN108491528A (zh) * 2018-03-28 2018-09-04 苏州大学 一种图像检索方法、系统及装置
CN109145143A (zh) * 2018-08-03 2019-01-04 厦门大学 图像检索中的序列约束哈希算法
CN109241327A (zh) * 2017-07-03 2019-01-18 北大方正集团有限公司 图像检索方法及装置
CN110188223A (zh) * 2019-06-06 2019-08-30 腾讯科技(深圳)有限公司 图像处理方法、装置及计算机设备

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9916538B2 (en) * 2012-09-15 2018-03-13 Z Advanced Computing, Inc. Method and system for feature detection
JP6041439B2 (ja) * 2013-09-12 2016-12-07 Kddi株式会社 画像に基づくバイナリ特徴ベクトルを用いた画像検索装置、システム、プログラム及び方法
CN104123375B (zh) * 2014-07-28 2018-01-23 清华大学 数据搜索方法及系统
CN104346440B (zh) * 2014-10-10 2017-06-23 浙江大学 一种基于神经网络的跨媒体哈希索引方法
CN104376051A (zh) * 2014-10-30 2015-02-25 南京信息工程大学 随机结构保形哈希信息检索方法
CN104951559B (zh) * 2014-12-30 2018-06-15 大连理工大学 一种基于位权重的二值码重排方法
CN104715021B (zh) * 2015-02-27 2018-09-11 南京邮电大学 一种基于哈希方法的多标记学习的学习方法
CN109711422B (zh) * 2017-10-26 2023-06-30 北京邮电大学 图像数据处理、模型的建立方法、装置、计算机设备和存储介质
US20190171665A1 (en) * 2017-12-05 2019-06-06 Salk Institute For Biological Studies Image similarity search via hashes with expanded dimensionality and sparsification
CN109376256B (zh) * 2018-09-29 2021-03-26 京东方科技集团股份有限公司 图像搜索方法及装置
CN109299216B (zh) * 2018-10-29 2019-07-23 山东师范大学 一种融合监督信息的跨模态哈希检索方法和系统


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3982275A4


Also Published As

Publication number Publication date
CN110188223A (zh) 2019-08-30
EP3982275A1 (en) 2022-04-13
CN110188223B (zh) 2022-10-04
US20210382937A1 (en) 2021-12-09
EP3982275A4 (en) 2022-08-03

Similar Documents

Publication Publication Date Title
WO2020244437A1 (zh) 图像处理方法、装置及计算机设备
WO2022068196A1 (zh) 跨模态的数据处理方法、装置、存储介质以及电子装置
CN110162593B (zh) 一种搜索结果处理、相似度模型训练方法及装置
US11244205B2 (en) Generating multi modal image representation for an image
CN108694225B (zh) 一种图像搜索方法、特征向量的生成方法、装置及电子设备
CA2786727C (en) Joint embedding for item association
CN106202256B (zh) 基于语义传播及混合多示例学习的Web图像检索方法
CN105960647B (zh) 紧凑人脸表示
WO2019134567A1 (zh) 样本集的处理方法及装置、样本的查询方法及装置
CN106570141B (zh) 近似重复图像检测方法
US20230039496A1 (en) Question-and-answer processing method, electronic device and computer readable medium
WO2020114100A1 (zh) 一种信息处理方法、装置和计算机存储介质
CN110222218B (zh) 基于多尺度NetVLAD和深度哈希的图像检索方法
CN110858217A (zh) 微博敏感话题的检测方法、装置及可读存储介质
CN112948601B (zh) 一种基于受控语义嵌入的跨模态哈希检索方法
CN111985228A (zh) 文本关键词提取方法、装置、计算机设备和存储介质
US11120214B2 (en) Corpus generating method and apparatus, and human-machine interaction processing method and apparatus
CN114329029B (zh) 对象检索方法、装置、设备及计算机存储介质
Li et al. Hashing with dual complementary projection learning for fast image retrieval
CN112149410A (zh) 语义识别方法、装置、计算机设备和存储介质
CN114416979A (zh) 一种文本查询方法、设备和存储介质
CN112182262A (zh) 一种基于特征分类的图像查询方法
CN110209895B (zh) 向量检索方法、装置和设备
WO2012077818A1 (ja) ハッシュ関数の変換行列を定める方法、該ハッシュ関数を利用するハッシュ型近似最近傍探索方法、その装置及びそのコンピュータプログラム
WO2023155304A1 (zh) 关键词推荐模型训练方法、推荐方法和装置、设备、介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20818214

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2020818214

Country of ref document: EP