WO2017020741A1 - Image retrieval, image information acquisition, and image recognition method, apparatus, and system


Info

Publication number
WO2017020741A1
WO2017020741A1 (PCT application PCT/CN2016/091519)
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature value
feature
matching
local
Prior art date
Application number
PCT/CN2016/091519
Other languages
English (en)
French (fr)
Inventor
Yang Chuan (杨川)
Zhang Lun (张伦)
Chu Rufeng (楚汝峰)
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Yang Chuan (杨川)
Zhang Lun (张伦)
Chu Rufeng (楚汝峰)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Limited (阿里巴巴集团控股有限公司), Yang Chuan (杨川), Zhang Lun (张伦), and Chu Rufeng (楚汝峰)
Publication of WO2017020741A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval of still image data
    • G06F 16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583: Retrieval using metadata automatically derived from the content
    • G06F 16/5838: Retrieval using metadata automatically derived from the content, using colour

Definitions

  • the present application relates to an image retrieval technology, and in particular, to an image retrieval method and apparatus.
  • the present application also provides a method and device for acquiring image information, an image recognition method and device, an image recognition system, a method and device for calculating image feature values, and an electronic device.
  • Object recognition and visual search technology can greatly shorten the distance between the physical world and the data world, helping users to obtain information quickly and easily.
  • Image recognition technology can be used to find, for an image to be recognized that was captured by a camera or downloaded from the Internet, a matching image among pre-registered images, and thereby obtain information related to the image to be recognized (this process is commonly referred to as image recognition). For example, by searching with a book cover image, the name, author, and other details of the book can be obtained.
  • Image retrieval can be achieved by local feature matching. Since local features of images usually contain considerable redundancy and noise, a more compact and effective representation of local features is usually needed to meet practical requirements. In the widely used Bag of Words (BoW) image retrieval technology, the local features of an image are represented by "words".
  • Image retrieval based on the bag-of-words model involves two processes: indexing the image retrieval database and feature matching.
  • In the indexing stage, the local features of a subset of images are typically collected at random as training samples, and the k-means clustering algorithm is used to cluster them; each cluster center serves as a "word". For each local feature of a registered image in the image retrieval database, the "word" nearest in Euclidean distance is found, the index of that word is used as the quantized representation of the local feature, and an inverted-table index structure is built on this basis. In the feature matching (retrieval) stage, the "word" nearest in Euclidean distance to each local feature of the image to be identified is found, and the word's index is used to look up the corresponding registered images; statistical voting then yields the retrieval result for the image to be identified.
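For illustration, the quantize-and-vote pipeline described above can be sketched in pure Python (a toy example; the vocabulary, features, and image identifiers are made up and are not part of the application):

```python
import math

def nearest_word(feature, words):
    """Return the index of the cluster center ("word") closest in Euclidean distance."""
    best, best_d = 0, float("inf")
    for i, w in enumerate(words):
        d = math.dist(feature, w)
        if d < best_d:
            best, best_d = i, d
    return best

def retrieve(query_features, words, inverted_index):
    """Quantize each query feature to a word, then vote over the inverted table."""
    votes = {}
    for f in query_features:
        for image_id in inverted_index.get(nearest_word(f, words), []):
            votes[image_id] = votes.get(image_id, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)

# Toy vocabulary of 3 "words" and an inverted table: word index -> image ids.
words = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
inverted_index = {0: ["img_a"], 1: ["img_b"], 2: ["img_b"]}

# Both query features quantize to word 1, so "img_b" collects two votes.
ranking = retrieve([(0.9, 0.1), (1.1, -0.1)], words, inverted_index)
```

The quantization step is exactly where the precision loss criticized below occurs: two distinct features mapped to the same word become indistinguishable.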
  • Because the local features of an image are represented by "words", different image features may be quantized to the same "word", and the distance between word indexes does not reflect the true distance between features. For example, if three features are quantized to indexes 1, 5, and 100, the features corresponding to indexes 1 and 5 are not necessarily more similar to each other than to the feature corresponding to index 100. It is therefore difficult for existing image retrieval technology to match images accurately: a large number of mismatches may occur, which in turn requires filtering and re-ranking many candidate images and mismatched feature pairs, degrading retrieval performance.
  • To address the low matching precision of existing image retrieval technology, embodiments of the present application provide an image retrieval method and device that improve the accuracy of image retrieval matching.
  • the embodiment of the present application further provides a method and device for acquiring image information, an image recognition method and device, an image recognition system, a method and device for calculating image feature values, and an electronic device.
  • the application provides an image retrieval method, including:
  • a registration image that satisfies a preset condition is selected as a retrieval result of the image to be retrieved according to the matching result.
  • the feature value includes a binarized feature value.
  • Matching the feature value with the feature values of the registered images in the image retrieval database includes: matching by the Hamming distance method, and treating feature value pairs whose Hamming distance is less than a preset threshold as successfully matched feature value pairs.
  • the matching according to the Hamming distance method includes:
  • Matching is performed by using the binarized feature value as an index to query a hash table.
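Hamming-distance matching over binary codes can be sketched as follows (an illustrative toy in pure Python; the codes and threshold are invented for the example):

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as Python ints."""
    return bin(a ^ b).count("1")

def match_pairs(query_codes, registered_codes, threshold):
    """Return (query, registered) index pairs whose Hamming distance is below threshold."""
    return [(i, j)
            for i, q in enumerate(query_codes)
            for j, r in enumerate(registered_codes)
            if hamming(q, r) < threshold]

query = [0b10110010, 0b00001111]
registered = [0b10110011, 0b11110000]
pairs = match_pairs(query, registered, threshold=2)  # only the first pair differs by 1 bit
```

The XOR-and-popcount form is why binarized feature values match so cheaply compared with Euclidean distances over real vectors.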
  • the depth self-coding network model is pre-trained, including:
  • the reconstruction error after encoding and decoding the input data is minimized by the depth self-coding network model, and iterative training is performed until the deep self-coding network model converges.
  • the image retrieval database is pre-established by the following steps:
  • the feature values are stored in an image retrieval database, and a correspondence between the feature values and the registered images is established.
  • the size of the registered image is normalized in a preset manner.
  • the feature values are filtered according to the calculated distribution of the feature values
  • the storing the feature value in the image retrieval database comprises: storing the filtered feature value in an image retrieval database.
  • the filtering the feature values according to the calculated distribution of the feature values includes:
  • the feature value is selected according to the position distribution of the feature value in the registered image.
  • the calculating the feature value of the local feature by using the pre-trained depth self-coding network model comprises: calculating the feature value of the local feature after performing the culling operation by using the depth self-coding network model.
  • the matching the feature value with the feature value of the registered image in the image retrieval database includes:
  • the registration image that meets the preset condition includes:
  • The registration image that satisfies the preset condition is a registration image whose accumulated score from successful feature value matches is greater than a preset threshold.
  • The feature values of the image to be retrieved are matched against the registered-image feature values extracted from the image retrieval database, and the number of feature value pairs satisfying a preset rearrangement matching condition is recorded.
  • the selecting, by the matching result, the registration image that satisfies the preset condition as the retrieval result of the image to be retrieved includes: selecting the registration image after the performing the rearrangement operation as the retrieval result of the image to be retrieved.
  • A spatial-relationship consistency check is performed using a transformation model, and mismatched feature value pairs are removed from the feature value pairs satisfying the preset rearrangement matching condition.
  • Sorting the selected registration images according to the number of feature value pairs satisfying the preset rearrangement matching condition includes: sorting the selected registration images according to the number of such pairs remaining after the above culling operation.
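The passage does not fix a particular transformation model. As an illustration only, here is a consistency check using the simplest possible model, a pure translation, with a RANSAC-style hypothesize-and-count loop (a real system would typically fit an affine or homography model; all coordinates below are invented):

```python
import random

def filter_by_translation(matches, tol=2.0, iters=50, seed=0):
    """Keep matches consistent with the best translation hypothesis.

    matches: list of ((qx, qy), (rx, ry)) key-point coordinate pairs,
    query image point first, registered image point second.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (qx, qy), (rx, ry) = rng.choice(matches)
        dx, dy = rx - qx, ry - qy  # hypothesized translation from this one pair
        inliers = [m for m in matches
                   if abs((m[1][0] - m[0][0]) - dx) <= tol
                   and abs((m[1][1] - m[0][1]) - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Three matches share the translation (+10, +5); the last pair is a mismatch.
matches = [((0, 0), (10, 5)), ((3, 4), (13, 9)), ((7, 2), (17, 7)), ((1, 1), (40, 2))]
kept = filter_by_translation(matches)
```

The mismatched pair disagrees with the dominant translation and is culled before the re-ranking count.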
  • the local features of the image are extracted by using a SIFT algorithm, using an LBP algorithm, or using a convolutional neural network.
  • the step of extracting a local feature of the image to be retrieved and calculating the feature value of the local feature by using a pre-trained deep self-coding network model is performed on the client device;
  • The step of matching the feature value with the feature values of the registered images in the image retrieval database, and the step of selecting, according to the matching result, the registration image that satisfies the preset condition as the retrieval result of the image to be retrieved, are performed on the server device.
  • an image retrieval apparatus including:
  • a local feature extraction unit configured to extract local features of the image to be retrieved
  • An eigenvalue calculation unit configured to calculate a feature value of the local feature output by the local feature extraction unit by using a pre-trained depth self-coding network model
  • an eigenvalue matching unit configured to match the feature value output by the feature value calculation unit with the feature value of the registered image in the image retrieval database
  • the search result generating unit is configured to select, as the search result of the image to be retrieved, the registration image that satisfies the preset condition according to the matching result output by the feature value matching unit.
  • The feature value matching unit is specifically configured to perform matching by the Hamming distance method, treating feature value pairs whose Hamming distance is less than a preset threshold as successfully matched feature value pairs.
  • the device includes: a model training unit, configured to pre-train the deep self-coding network model;
  • the model training unit includes:
  • a sample selection subunit for selecting a sample image set
  • a sample feature extraction subunit configured to extract local features of the sample image in the sample image set
  • the iterative training sub-unit is configured to take the local feature as an input, and to minimize the reconstruction error after encoding and decoding the input data by the depth self-coding network model, and perform iterative training until the deep self-coding network model converges.
  • the device includes: a database establishing unit, configured to pre-establish the image retrieval database;
  • the database establishing unit includes:
  • a registration image selection subunit for selecting a registration image for constructing the image retrieval database
  • the feature value registration subunit is configured to store the feature value in an image retrieval database and establish a correspondence between the feature value and the registered image.
  • the database establishing unit further includes:
  • the feature value filtering subunit is configured to filter the feature values calculated by the registered image feature value calculation subunit according to the distribution of the feature values
  • the feature value registration subunit is specifically configured to store the feature value after the feature value screening subunit is filtered in an image retrieval database, and establish a correspondence between the feature value and the registration image.
  • the device includes:
  • a distance calculation unit configured to calculate a distance from a key point corresponding to the local feature to a center of the image to be retrieved
  • a local feature culling unit configured to remove local features whose key-point distance, as calculated by the distance calculation unit, is greater than a preset threshold
  • the feature value calculation unit is configured to calculate, by using the depth self-coding network model, a feature value of a local feature after the culling operation is performed by the local feature culling unit.
  • the search result generating unit includes:
  • Registering an image primary selection sub-unit configured to select a registration image that satisfies a preset condition according to a matching result output by the feature value matching unit;
  • a rearrangement filtering subunit configured to sort the selected registration images according to the number of feature value pairs recorded by the rearrangement matching subunit, and to select the top-ranked registration images as the retrieval result of the image to be retrieved.
  • the search result generating unit further includes:
  • a spatial consistency check sub-unit configured to remove a mismatched feature value pair from the feature value pairs obtained by the rearrangement matching sub-unit by performing a spatial relationship consistency check by using a transformation model
  • the rearrangement screening sub-unit is specifically configured to sort the selected registration image according to the number of feature value pairs satisfying the preset rearrangement matching condition after the spatial consistency check sub-unit performs the culling operation.
  • the local feature extraction unit and the feature value calculation unit are deployed on a client device;
  • the feature value matching unit and the search result generating unit are deployed on a server device.
  • the present application also provides a method for acquiring image information, including:
  • the feature value includes a binarized feature value.
  • the calculating the feature value of the local feature by using the pre-trained depth self-coding network model comprises: calculating the feature value of the local feature after performing the culling operation by using the depth self-coding network model.
  • the local features of the image to be identified are extracted by using a SIFT algorithm, using an LBP algorithm, or using a convolutional neural network.
  • the method is implemented on a mobile terminal device.
  • the application further provides an apparatus for acquiring image information, including:
  • a local feature extraction unit configured to extract local features of the image to be identified
  • An eigenvalue calculation unit configured to calculate a feature value of the local feature output by the local feature extraction unit by using a pre-trained depth self-coding network model
  • An eigenvalue sending unit configured to send the feature value output by the feature value calculating unit to a server that provides an image recognition service
  • an image information receiving unit configured to receive related information of the image to be recognized returned by the server.
  • an image recognition method including:
  • the registration information corresponding to the selected registration image is obtained and returned to the client.
  • the feature value includes: a binarized feature value.
  • Matching the feature value with the feature values of the registered images in the image retrieval database includes: matching by the Hamming distance method, and treating feature value pairs whose Hamming distance is less than a preset threshold as successfully matched feature value pairs.
  • the image retrieval database is pre-established by the following steps:
  • the feature values are stored in an image retrieval database, and a correspondence between the feature values and the registered images is established.
  • the feature values are filtered according to the calculated distribution of the feature values
  • the storing the feature value in the image retrieval database comprises: storing the filtered feature value in an image retrieval database.
  • the filtering the feature values according to the calculated distribution of the feature values includes:
  • the feature value is selected according to the position distribution of the feature value in the registered image.
  • the selecting, by the matching result, the registration image that satisfies the preset condition as the retrieval result of the image to be retrieved includes: selecting the registration image after the performing the rearrangement operation as the retrieval result of the image to be retrieved.
  • an image recognition apparatus including:
  • a feature value receiving unit configured to receive a feature value of the image to be recognized uploaded by the client, where the feature value is calculated by using a local feature of the image to be recognized, and using a pre-trained depth self-coding network model;
  • an eigenvalue matching unit configured to match the feature value received by the feature value receiving unit with the feature value of the registered image in the image retrieval database
  • a registration image selection unit configured to select a registration image that satisfies a preset condition according to the matching result
  • the image information sending unit is configured to acquire registration information corresponding to the selected registration image, and return the information to the client.
  • the present application provides an image recognition system, comprising: the apparatus for acquiring image information according to any one of the above, and the image recognition apparatus according to any one of the above.
  • the present application also provides a method for calculating image feature values, including:
  • the feature values of the local features are calculated using a pre-trained deep self-coding network model.
  • the feature value includes: a binarized feature value.
  • the depth self-coding network model is pre-trained, including:
  • the reconstruction error after encoding and decoding the input data is minimized by the depth self-coding network model, and iterative training is performed until the deep self-coding network model converges.
  • the local feature of the image of the feature value to be calculated is extracted by using a SIFT algorithm, using an LBP algorithm, or using a convolutional neural network.
  • the present application further provides an apparatus for calculating image feature values, including:
  • a local feature extraction unit configured to extract local features of the image of the feature value to be calculated
  • an eigenvalue calculation unit configured to calculate a feature value of the local feature output by the local feature extraction unit by using a pre-trained depth self-coding network model.
  • the application further provides an electronic device, including:
  • a memory for storing a program for acquiring image information which, when read and executed by the processor, performs the following operations: extracting local features of the image to be identified; calculating binarized feature values of the local features by using a pre-trained deep self-encoding network model; sending the binarized feature values to a server that provides an image recognition service; and receiving related information about the image to be recognized returned by the server.
  • In the technical solution provided by the present application, feature values of the local features of the image to be retrieved are calculated using a pre-trained deep self-encoding network model; the feature values are then matched against the feature values of the registered images in the image retrieval database, and a registration image that satisfies a preset condition is selected as the retrieval result of the image to be retrieved according to the matching result.
  • the above method provided by the present application combines the local features of the image and the depth self-encoding network model. Since the depth self-encoding network compresses and represents the local features, the distance information and the discriminating ability between the feature values can be effectively maintained. Therefore, the accuracy of image retrieval can be effectively improved, the workload of rearrangement filtering can be reduced, and retrieval efficiency can be improved.
  • Because the process of converting the local features of the image to be retrieved into "words" is a million-scale nearest-neighbor problem, it cannot be carried out on a typical mobile terminal device. In the technical solution of the present application, by contrast, the feature values of the image's local features are calculated with the deep self-encoding network model, which requires little storage space and can therefore run on the mobile terminal device. The mobile terminal can then upload image feature values directly to the server, reducing the load on the server.
  • The feature value output by the deep self-encoding network model may be a quantized, binarized feature value, giving a further compressed representation of the image features; for example, an image's features may be compressed into a binary code sequence of only a few kilobytes. On the one hand, the image retrieval database can then be scaled to millions or even hundreds of millions of images, and techniques such as hashing can conveniently be used to speed up retrieval; on the other hand, the amount of data the client uploads to the server is effectively reduced, saving network bandwidth and shortening transmission time, which makes it feasible to quantize image features directly on the mobile terminal device and upload the quantized feature data to the server.
  • FIG. 1 is a flow chart of an embodiment of an image retrieval method provided by the present application.
  • FIG. 2 is a schematic diagram of a deep self-encoding network provided by an embodiment of the present application.
  • FIG. 3 is a process flowchart of constructing an image retrieval database provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a process for selecting a registration image according to a matching result according to an embodiment of the present application
  • FIG. 5 is a schematic diagram of an embodiment of an image retrieval apparatus provided by the present application.
  • FIG. 6 is a flowchart of an embodiment of a method for acquiring image information provided by the present application.
  • FIG. 7 is a schematic diagram of an apparatus for acquiring image information provided by the present application.
  • FIG. 8 is a flowchart of an embodiment of an image recognition method provided by the present application.
  • FIG. 9 is a schematic diagram of an embodiment of an image recognition apparatus provided by the present application.
  • FIG. 10 is a schematic diagram of an embodiment of an image recognition system provided by the present application.
  • FIG. 11 is a flow chart of an embodiment of a method for calculating image feature values provided by the present application.
  • FIG. 12 is a schematic diagram of an apparatus embodiment for calculating image feature values provided by the present application.
  • FIG. 13 is a schematic diagram of an embodiment of an electronic device provided by the present application.
  • An image retrieval method and apparatus, a method and device for acquiring image information, an image recognition method and device, an image recognition system, a method and device for calculating image feature values, and an electronic device are explained in detail in the following embodiments.
  • the technical solution of the present application improves the accuracy of image retrieval by combining local features with a depth self-encoding network.
  • the deep self-encoding network model can be trained in advance, and the image retrieval database is constructed by using the trained deep self-coding network model. The two parts are sequentially described below.
  • the deep self-encoding network is a deep neural network.
  • FIG. 2 is a schematic diagram of a deep self-encoding network provided by the embodiment.
  • The network consists of 5 layers of neurons, including multiple hidden layers. Because the middle hidden layer, that is, layer 3 in the figure (commonly referred to as the coding layer), has fewer neurons than the input layer, the output of the coding layer is a compressed representation of the input data. The first layer is the input layer; the second and third layers each output a representation of the input signal (the encoding process), while layers 4 and 5 reconstruct the input data (the decoding process).
  • the training process of the deep self-encoding network is to minimize the reconstruction error after layer-by-layer coding and decoding of the input data.
  • the goal is to adjust the process of each layer parameter in an iterative manner using an algorithm such as gradient descent.
  • The deep self-encoding network model is trained as follows: randomly select a sample image set, extract the local features of the sample images, and, taking the local features as input, iteratively train with the objective of minimizing the reconstruction error after the network encodes and decodes the input data, until the model converges. The output of the model's coding layer is then the compressed feature value of the input local feature; for example, the input may be a 128-dimensional real vector and the output a 16-dimensional real vector.
  • a compressed representation of the input features can be obtained at the coding layer by performing several matrix multiplications. For example, in the deep self-coding network shown in FIG. 2, the above compression process can be performed. Implemented by two matrix multiplications.
  • The feature value output by the coding layer generally represents the input feature well: it captures the originally complex, highly redundant representation in a compact storage form while effectively preserving the distance information and discriminative power between feature values, which provides a guarantee for improved retrieval precision.
  • For example, 600,000 images are randomly selected, 200 local features are extracted from each image, and the resulting 200 × 600,000 = 12 million local features are used as input for iterative training, finally yielding the trained deep self-encoding network model.
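The training objective above can be sketched with a toy linear autoencoder trained by gradient descent (a minimal numpy illustration, not the application's multi-layer model; the dimensions, learning rate, and random data are invented, with 4-d inputs standing in for 128-d SIFT features):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # toy "local features"
W_enc = rng.normal(size=(4, 2)) * 0.1  # encoder: 4-d feature -> 2-d code
W_dec = rng.normal(size=(2, 4)) * 0.1  # decoder: 2-d code -> 4-d reconstruction

def loss(X, W_enc, W_dec):
    """Mean squared reconstruction error after encoding and decoding."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

initial = loss(X, W_enc, W_dec)
lr = 0.01
for _ in range(300):
    H = X @ W_enc                      # coding-layer output (compressed feature value)
    E = H @ W_dec - X                  # reconstruction error
    # Gradients of the mean squared error with respect to each weight matrix.
    grad_dec = H.T @ E * 2 / X.size
    grad_enc = X.T @ (E @ W_dec.T) * 2 / X.size
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final = loss(X, W_enc, W_dec)          # reconstruction error shrinks as training proceeds
```

After convergence, only the encoder (`W_enc` here, i.e. the path to the coding layer) is needed at retrieval time, which is why the compression can run on a mobile device.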
  • constraints may be added during the above training process to limit the output of the coding layer to binarized data.
  • During forward propagation, the output of the coding layer is rounded within the range [0, 1] to obtain a binary expression that is fed to the next layer, while during back propagation the gradient descent method still uses the coding layer's unquantized real-valued output. After training, the coding layer's output is rounded to 0s and 1s, producing a binary sequence, namely the binarized feature value described in the present application, also referred to as a quantized binary code or quantized feature value.
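The rounding and compact storage of the quantized code can be illustrated as follows (toy coding-layer activations, invented for the example):

```python
def binarize(code):
    """Round real coding-layer outputs in [0, 1] to a 0/1 sequence."""
    return [1 if v >= 0.5 else 0 for v in code]

def pack_bits(bits):
    """Pack a 0/1 sequence into an integer for compact storage and hashing."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

code = [0.92, 0.13, 0.61, 0.08]   # hypothetical coding-layer activations
bits = binarize(code)
packed = pack_bits(bits)          # 0b1010 as a single small integer
```

Packing is what shrinks a whole image's features to a binary code sequence of only a few kilobytes, as noted earlier.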
  • FIG. 2 is a schematic diagram of the depth self-encoding network according to the embodiment.
  • the number of network layers and the number of neurons shown in the figure are only schematic.
  • the image retrieval database is also referred to as a feature database for storing a large number of image features, so that when performing image retrieval, a registration matching the image to be retrieved can be found by matching the image to be retrieved with the image features in the library. image.
  • the image retrieval database may be constructed by using a pre-trained deep self-coding network model. The specific implementation may include the following steps 101-1 to 101-4, which will be described below with reference to FIG.
  • Step 101-1 Select a registration image for constructing the image retrieval database.
  • the image may be obtained from image materials provided by the Internet, various resource servers, and various applications, or may be acquired by photographing or the like.
  • These images are referred to as registered images because their feature values will be stored in the image retrieval database for retrieval matching, and the images themselves may become the images that match an image to be retrieved.
  • Step 101-2 Extract local features of the registered image.
  • The size of the registered images may first be normalized in a preset manner, for example by scaling each registered image so that all registered images reach a uniform specification, such as a length of 300 pixels.
  • The local features may be extracted using the SIFT (Scale-Invariant Feature Transform) algorithm, the LBP (Local Binary Patterns) algorithm, or a convolutional neural network. For example, for each registered image, 200 128-dimensional feature vectors can be extracted using the SIFT algorithm.
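Of the extractors named above, LBP is the simplest to show in a few lines. Below is a minimal pure-Python sketch of the textbook 3×3 LBP (for illustration only; the application does not specify this exact variant, and the tiny image is invented):

```python
def lbp_3x3(img):
    """Basic Local Binary Patterns: compare each interior pixel's 8 neighbours
    to the centre and pack the comparisons into an 8-bit code."""
    # Neighbour offsets, clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(img), len(img[0])
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            c = img[y][x]
            bits = 0
            for dy, dx in offsets:
                bits = (bits << 1) | (1 if img[y + dy][x + dx] >= c else 0)
            row.append(bits)
        codes.append(row)
    return codes

img = [[10, 10, 10],
       [10,  5, 10],
       [10, 10, 10]]
codes = lbp_3x3(img)   # single interior pixel; all 8 neighbours >= centre
```

In practice each key-point neighbourhood yields such codes, which are then histogrammed into the local feature vector fed to the autoencoder.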
  • Step 101-3 Calculate the feature value of the local feature by using the depth self-coding network model.
  • Each local feature acquired in step 101-2 is taken as input, and its feature value is calculated; the output of the coding layer of the deep self-encoding network model is the feature value of the local feature, which may be, for example, a real-valued vector of reduced dimensionality. If a binarization-related constraint was added when training the deep self-encoding network model, then this step yields the binarized feature value of each local feature.
  • Since each registered image usually has many local features, the feature values calculated in this step are correspondingly numerous. The feature values can be filtered according to their calculated distribution, and only the filtered feature values are registered (i.e., stored in the image retrieval database for matching retrieval). There are many ways to filter; here are two. These two methods can be used independently or in combination:
  • Step 101-4 Store the feature value in an image retrieval database, and establish a correspondence between the feature value and the registration image.
  • each registered image corresponds to a plurality of feature values
  • the step stores the feature values in an image retrieval database, and establishes a correspondence between the feature values and the registered images.
  • This process is also called the registration process of eigenvalues.
• If the output of the pre-trained depth self-encoding network model is a binarized feature value, the storage space required for the registered feature values of each registered image is effectively reduced, so that the image retrieval database can be expanded.
• In addition, each binarized feature value can be converted into an index to construct a Hash table, and the identifier of the corresponding registered image can be recorded in the Hash table entry for each registered binarized feature value, so that Hash techniques can be used for quick retrieval during the feature matching phase.
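A minimal sketch of such a Hash table: each binarized feature value serves as the key, and the identifiers of all registered images that registered it are recorded under that key. The data layout (image identifier plus a list of bit-string codes) is an illustrative assumption:

```python
def build_hash_table(registered):
    """registered: iterable of (image_id, [binary_code, ...]) pairs,
    where each binary_code is e.g. a bit string such as '1010'.

    Returns a dict mapping each code to the set of image identifiers
    that registered it, supporting O(1) lookup at matching time."""
    table = {}
    for image_id, codes in registered:
        for code in codes:
            table.setdefault(code, set()).add(image_id)
    return table

table = build_hash_table([("img1", ["1010", "0110"]), ("img2", ["1010"])])
print(table["1010"])  # both images registered this code
```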
  • FIG. 1 is a flowchart of an embodiment of an image retrieval method of the present application.
  • the method includes the following steps:
  • Step 101 Extract local features of the image to be retrieved.
• The local features are usually extracted first because, when an image is described by local features, it is decomposed into many parts, each corresponding to one feature vector. Such local features make effective use of the structural information of the image for matching and recognition, and to a certain extent provide relatively stable feature matching under changes of viewing angle and scale.
• In specific implementation, the local features of the image may be extracted using the SIFT algorithm, the LBP algorithm, or a convolutional neural network, obtaining feature vectors capable of characterizing the main features of the image.
• To effectively implement the present embodiment, the same local feature extraction method should be used for the image to be retrieved as was used for the registered images, for example the SIFT algorithm.
  • Step 102 Calculate the feature value of the local feature by using a pre-trained deep self-coding network model.
• Through local feature extraction, the image to be retrieved is transformed into a feature space of relatively high dimension, and a relatively large number of feature vectors are usually obtained.
  • a feature subset that best represents the image may be selected from the local feature space before calculating the feature values of the local features.
• The present embodiment therefore provides a preferred implementation: the distance to the center of the image to be retrieved is used as the criterion for local feature selection. Specifically, the distance from the key point corresponding to each local feature to the center of the image to be retrieved can be calculated; if the distance is greater than a preset threshold, indicating that the key point is far from the image center, the local feature corresponding to that key point may be removed, otherwise it is retained. In this manner, the number of local features of the image to be retrieved is reduced and retrieval efficiency is improved.
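The center-distance selection criterion above can be sketched as follows; the image size, the distance threshold, and the representation of key points as (x, y) tuples are illustrative assumptions:

```python
import math

def filter_by_center_distance(keypoints, features, width, height, threshold):
    """keypoints: list of (x, y) key-point coordinates, aligned with
    `features`. Drops local features whose key point lies farther than
    `threshold` pixels from the image center."""
    cx, cy = width / 2.0, height / 2.0
    kept = []
    for (x, y), feat in zip(keypoints, features):
        if math.hypot(x - cx, y - cy) <= threshold:
            kept.append(feat)
    return kept

# In a 100x100 image, a feature at the center is kept,
# one near the corner is removed.
print(filter_by_center_distance([(50, 50), (5, 5)], ["f1", "f2"],
                                100, 100, threshold=30))
```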
  • each local feature can be used as an input, and the feature values of each local feature can be calculated using the pre-trained depth self-coding neural network model.
• The feature value output by the deep self-encoding network model is a compressed representation of the input local feature and may be, for example, a dimension-reduced real-valued vector; if a binarization-related constraint is added when training the depth self-encoding network model, this step obtains a quantized binary code for each local feature.
  • Step 103 Match the feature value with a feature value of a registered image in an image retrieval database.
• This step may match the feature values against the registered feature values in the image retrieval database pairwise, one by one: for each pair of feature values to be matched, an index value characterizing the degree of difference of the pair is calculated, and the pair is determined to match successfully when the index value is less than a preset threshold.
• The index characterizing the degree of difference may be the Euclidean distance between the pair of feature values. Further, if the feature values output by the depth self-encoding network model are binarized feature values, the index may also be the Hamming distance between the pair.
  • the linear query method for calculating the Hamming distance or the Hash technique can be used for matching, and both methods can effectively improve the retrieval efficiency. The two methods are further explained below.
• The Hamming distance is the number of positions at which two equal-length strings differ. For a feature value pair represented as quantized binary codes of 0s and 1s, the two codes can be XORed and the number of 1 bits in the result counted to obtain the Hamming distance; for example, the Hamming distance between 1011101 and 1001001 is 2.
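The XOR-and-count computation can be written directly, reproducing the example from the text:

```python
def hamming_distance(a, b):
    """Hamming distance between two equal-length binary codes given as
    integers: XOR the codes, then count the 1 bits in the result."""
    return bin(a ^ b).count("1")

# Matches the document's example: 1011101 vs 1001001 differ in 2 positions.
print(hamming_distance(0b1011101, 0b1001001))  # → 2
```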
• In specific implementation, a threshold may be preset for the Hamming-distance-based linear query, and the feature values of the image to be retrieved are then matched one by one against the feature values of the registered images in the image retrieval database; if the Hamming distance of the currently matched feature value pair is less than the threshold, the pair is considered to match successfully.
• For example, if the length of the quantized binary code is 62 bits and the preset Hamming distance threshold is 4, then a feature value pair whose Hamming distance is less than or equal to 3, i.e., within the interval [0, 3], is determined to match successfully.
• Matching by querying the Hash table with the binarized feature value as index.
• If Hash technology is used for matching, the Hash table indexed by binarized feature values is usually established when constructing the image retrieval database; therefore, in this step the binarized feature value can be used directly as an index to query the Hash table.
• In specific implementation, a Hamming distance threshold may be preset; then, for each binarized feature value of the image to be retrieved (hereinafter, the binary code to be retrieved), all binary codes whose Hamming distance to it is less than the threshold are generated, each such code is converted into an index, and the Hash table is queried directly. If a registered image identifier is recorded in the Hash entry corresponding to an index, this generally indicates that a registered feature value matching the binary code to be retrieved has been found.
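A sketch of this neighbor-expansion query: every binary code within the preset Hamming distance of the code to be retrieved is generated by flipping bits, and each is looked up in the Hash table. The table is assumed to map integer codes to sets of registered image identifiers; code width and threshold are illustrative:

```python
from itertools import combinations

def codes_within_distance(code, n_bits, max_dist):
    """All integer codes whose Hamming distance to `code` is <= max_dist,
    generated by flipping every combination of up to max_dist bits."""
    result = [code]
    for d in range(1, max_dist + 1):
        for bits in combinations(range(n_bits), d):
            flipped = code
            for b in bits:
                flipped ^= 1 << b
            result.append(flipped)
    return result

def query(table, code, n_bits, max_dist):
    """Look up every neighboring code in the Hash table and collect the
    registered image identifiers recorded there."""
    matches = set()
    for c in codes_within_distance(code, n_bits, max_dist):
        matches.update(table.get(c, ()))
    return matches

table = {0b1010: {"imgA"}}
print(query(table, 0b1000, n_bits=4, max_dist=1))  # one bit away → found
```

Note that the number of neighbor codes grows combinatorially with the threshold, which is why the threshold is kept small in practice.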
  • the matching result can be recorded for the corresponding registered image.
• For example, the number of successful feature value matches of a registered image may be recorded: each time a registered feature value of that image matches successfully, the count is incremented by one. A matching score of the registered image may also be recorded.
• The score accumulation strategy may be further refined, accumulating different scores according to how closely the feature value pairs match: for example, when the Hamming distance is 0 or 1, a preset higher score can be accumulated; otherwise the preset lower score can be accumulated.
  • Step 104 Select, according to the matching result, a registration image that satisfies a preset condition as a retrieval result of the image to be retrieved.
  • This step selects a registration image that satisfies the preset condition as a retrieval result of the image to be retrieved according to the result of performing the matching operation in step 103.
  • the embodiment further provides a preferred embodiment for rearranging the registered image that satisfies the preset condition.
  • the entire process includes steps 104-1 through 104-3, which are described below in conjunction with FIG.
• Step 104-1 Select a registration image that satisfies a preset condition according to the matching result.
• If the number of successful feature value matches was recorded for the registered images in step 103, this step may select registered images satisfying the following conditions: the registered images ranked highest by number of successful feature value matches, or the registered images whose number of successful matches is greater than a preset threshold. If a cumulative matching score was recorded in step 103, this step may instead select: the registered images ranked by cumulative score from largest to smallest, or the registered images whose cumulative score is greater than a preset threshold.
• Step 104-2 For each selected registration image, match the feature values of the image to be retrieved against the registered feature values of that image extracted from the image retrieval database, and record the number of feature value pairs satisfying the preset rearrangement matching condition.
• This step performs a one-to-one match between the image to be retrieved and each registered image selected in step 104-1.
• In specific implementation, the one-to-one matching between the image to be retrieved and a registered image may be performed as follows: for each feature value of the image to be retrieved, calculate the index value characterizing its degree of difference from each registered feature value of the registered image, sort the index values in ascending order, and judge whether the preset rearrangement matching condition is satisfied: if the first index value is smaller than a preset first threshold, and the gap between the second and the first index value is greater than a preset second threshold, then the feature value of the image to be retrieved and the registered feature value corresponding to the first index value are considered to satisfy the preset rearrangement matching condition.
• The gap between the second and the first index value can be computed as either their difference or their ratio. A specific rearrangement matching condition is given above; in specific implementation, other rearrangement matching conditions can be preset, as long as the registered images selected in step 104-1 can be rearranged and filtered and the image retrieval accuracy improved.
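The rearrangement matching condition above (best index value under a first threshold, runner-up worse by a second threshold) can be sketched as follows, using the difference form of the gap; both thresholds are illustrative assumptions:

```python
def passes_rearrangement(index_values, first_threshold, margin):
    """index_values: degrees of difference between one query feature
    value and every registered feature value of one registered image.
    Accepts the match when the smallest value is below `first_threshold`
    and the second-smallest exceeds it by more than `margin`
    (a ratio test would be an equally valid form of the condition)."""
    if len(index_values) < 2:
        return False
    d = sorted(index_values)
    return d[0] < first_threshold and (d[1] - d[0]) > margin

print(passes_rearrangement([5, 1, 7], first_threshold=3, margin=2))  # clear best match
print(passes_rearrangement([2, 3], first_threshold=3, margin=2))     # ambiguous
```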
• The one-to-one matching process described above is performed for each selected registered image in turn, obtaining for each the number of feature value pairs satisfying the preset rearrangement matching condition.
• In specific implementation, on the basis of the foregoing processing, this step may further filter the feature value pairs satisfying the preset rearrangement matching condition for each selected registered image by a spatial relationship consistency check, thereby further improving the image retrieval precision. Since the successfully matched feature value pairs may contain mismatches caused by noise and the like, while the feature value pairs of two images of the same origin can be mapped to one another by a transformation model, this embodiment uses exactly this property to eliminate mismatched pairs (also called noise matching pairs).
• In specific implementation, the following operations may be performed: arbitrarily select 3 or 4 pairs from the feature value pairs satisfying the preset rearrangement matching condition (hereinafter, matching feature value pairs), and use the RANSAC algorithm to estimate a transformation model. By cyclically selecting different feature value pairs, different transformation models can be estimated, and the transformation model (also called a transformation matrix) that best fits all matching feature value pairs is selected from among them. Matching feature value pairs that do not fit the selected model within a preset tolerance can be regarded as noise matching pairs, and such pairs are eliminated from the feature value pairs satisfying the preset rearrangement matching condition.
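A deliberately simplified sketch of the RANSAC-style elimination: instead of fitting a full transformation matrix from 3 or 4 sampled pairs as the text describes, this stand-in estimates a pure translation from one sampled key-point correspondence and keeps the model with the most inliers; the tolerance and iteration count are illustrative assumptions:

```python
import random

def ransac_filter(pairs, iterations=100, tol=2.0, seed=0):
    """pairs: list of ((x1, y1), (x2, y2)) key-point correspondences
    between two images. Repeatedly samples a pair, hypothesizes a
    translation (dx, dy), counts pairs consistent with it within `tol`,
    and returns the largest consistent set; the rest are noise matches."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        (ax, ay), (bx, by) = rng.choice(pairs)
        dx, dy = bx - ax, by - ay
        inliers = [p for p in pairs
                   if abs((p[1][0] - p[0][0]) - dx) <= tol
                   and abs((p[1][1] - p[0][1]) - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Three pairs agree on a (10, 10) translation; the fourth is a mismatch.
pairs = [((0, 0), (10, 10)), ((1, 2), (11, 12)),
         ((5, 5), (15, 15)), ((3, 3), (30, 1))]
print(ransac_filter(pairs))
```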
  • Step 104-3 Sort the selected registration images according to the number of feature value pairs satisfying the preset rearrangement matching condition, and select the top registered image as the retrieval result of the image to be retrieved.
• Specifically, the selected registered images are sorted again in descending order of that number, and the top-ranked registered images are selected as the retrieval result of the image to be retrieved. Since the registered images selected in step 104-1 are matched one-to-one against the image to be retrieved again and the mismatched feature value pairs are eliminated, the retrieval result obtained after the above rearrangement operation is generally more accurate.
  • the image retrieval process is completed through the above steps 101 to 104, and the retrieval result of the image to be retrieved is acquired.
  • the technical solution provided by this embodiment can be implemented on a single device or in a system based on a C/S (client/server) architecture.
• For example, steps 101 and 102 described in this embodiment, i.e., extracting the local features of the image to be retrieved and calculating the feature values, may be performed on the client device, which then sends the calculated feature values to the server device; steps 103 and 104, i.e., feature value matching and selecting the retrieval result, may be performed by the server device after receiving the feature values.
• In summary, the image retrieval method provided by the present application combines the local features of the image with a depth self-encoding network model. Because the depth self-encoding network model effectively preserves the distance information between feature values and their discriminative ability during compressed representation of the local features, the accuracy of image retrieval can be effectively improved, the workload of rearrangement filtering reduced, and retrieval efficiency improved.
  • FIG. 5 is a schematic diagram of an embodiment of an image retrieval device of the present application. Since the device embodiment is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.
  • the device embodiments described below are merely illustrative.
  • An image retrieval apparatus of the embodiment includes: a local feature extraction unit 501 for extracting local features of an image to be retrieved; and a feature value calculation unit 502 for calculating the local feature by using a pre-trained depth self-coding network model Extracting the feature value of the local feature output by the unit; the feature value matching unit 503 is configured to match the feature value output by the feature value calculation unit with the feature value of the registered image in the image retrieval database; the retrieval result generating unit 504 is configured to: The registration image that satisfies the preset condition is selected as the retrieval result of the image to be retrieved according to the matching result output by the feature value matching unit.
• Optionally, the feature value matching unit is specifically configured to perform matching by the Hamming distance method, taking feature value pairs whose Hamming distance is less than a preset threshold as successfully matched feature value pairs.
  • the device includes: a model training unit, configured to pre-train the deep self-coding network model;
  • the model training unit includes:
  • a sample selection subunit for selecting a sample image set
  • a sample feature extraction subunit configured to extract local features of the sample image in the sample image set
• The iterative training sub-unit is configured to take the local features as input and, with the goal of minimizing the reconstruction error after the depth self-coding network model encodes and decodes the input data, perform iterative training until the deep self-coding network model converges.
  • the device includes: a database establishing unit, configured to pre-establish the image retrieval database;
  • the database establishing unit includes:
  • a registration image selection subunit for selecting a registration image for constructing the image retrieval database
  • the feature value registration subunit is configured to store the feature value in an image retrieval database and establish a correspondence between the feature value and the registered image.
  • the database establishing unit further includes:
  • the feature value filtering subunit is configured to filter the feature values calculated by the registered image feature value calculation subunit according to the distribution of the feature values
  • the feature value registration subunit is specifically configured to store the feature value after the feature value screening subunit is filtered in an image retrieval database, and establish a correspondence between the feature value and the registration image.
  • the feature value screening sub-unit is specifically configured to: select a feature value whose frequency is lower than a preset threshold in the registration image; and/or select a feature value according to a position distribution of the feature value in the registration image.
  • the device includes:
  • a distance calculation unit configured to calculate a distance from a key point corresponding to the local feature to a center of the image to be retrieved
• a local feature culling unit configured to remove the local features corresponding to key points whose distance, as calculated by the distance calculation unit, is greater than a preset threshold
  • the feature value calculation unit is configured to calculate, by using the depth self-coding network model, a feature value of a local feature after the culling operation is performed by the local feature culling unit.
  • the search result generating unit includes:
  • Registering an image primary selection sub-unit configured to select a registration image that satisfies a preset condition according to a matching result output by the feature value matching unit;
• a rearrangement filtering subunit configured to sort the selected registered images according to the number of feature value pairs recorded by the rearrangement matching subunit, and to select the top-ranked registered images as the retrieval result of the image to be retrieved.
  • the search result generating unit further includes:
  • a spatial consistency check sub-unit configured to remove a mismatched feature value pair from the feature value pairs obtained by the rearrangement matching sub-unit by performing a spatial relationship consistency check by using a transformation model
  • the rearrangement screening sub-unit is specifically configured to sort the selected registration image according to the number of feature value pairs satisfying the preset rearrangement matching condition after the spatial consistency check sub-unit performs the culling operation.
  • the local feature extraction unit and the feature value calculation unit are deployed on a client device;
• the feature value matching unit and the retrieval result generating unit are deployed on a server device.
• FIG. 6 is a flowchart of an embodiment of a method for acquiring image information provided by the present application. Parts that are the same as in the previously described embodiments will not be repeated; the following highlights the differences.
  • a method for obtaining image information provided by the present application includes:
  • Step 601 Extract local features of the image to be identified.
  • the image to be identified may include a cover image such as a CD, a book or a poster, and this step extracts a local feature of the image to be recognized.
  • local features of the image to be identified may be extracted by using a SIFT algorithm, using an LBP algorithm, or using a convolutional neural network.
  • Step 602 Calculate the feature value of the local feature by using a pre-trained deep self-coding network model.
• In specific implementation, the distance between the key point corresponding to each local feature and the center of the image to be identified may be calculated, the local features corresponding to key points whose distance is greater than a preset threshold removed, and the depth self-coding network model then used to calculate the feature values of the local features remaining after the culling operation.
• The feature values output by the depth self-encoding network model can be binarized feature values, thereby implementing further feature compression and quantization.
  • Step 603 Send the feature value to a server that provides an image recognition service.
  • Step 604 Receive related information of the to-be-identified image returned by the server.
• The server may use the image retrieval method provided by this application to find a registered image matching the image to be identified and return the corresponding registration information, which this step receives. For example, if the image to be identified is a book cover image, this step can receive the following information: title, author name, price, book reviews, online purchase URL, and so on.
• The method for acquiring image information provided by the present application can be compared with the existing image retrieval technology based on the bag-of-words model.
• In the existing bag-of-words-based image retrieval technology, on the one hand, the number of "words" (also called central feature vectors) serving as cluster centers is usually on the order of millions and requires a very large storage space; on the other hand, converting the local features of an image into "words" is a million-scale nearest-neighbor problem.
• These storage-space and performance requirements mean that calculating the feature values of the image to be recognized cannot be done on an ordinary mobile terminal.
• Consequently, a mobile terminal performing image recognition can only upload the image to be recognized, or a compressed version of it, to the server: the former is slow because of the large upload traffic, while the latter not only introduces extra codec time but also loses information, which can make the image recognition result inaccurate.
• In contrast, calculating the feature values of the local features of the image to be recognized with the deep self-coding network model generally requires only a few matrix multiplications; the requirements on storage space and computing performance are relatively low, so the method for obtaining image information provided by the present application can be implemented on a mobile terminal device. This makes it possible for the mobile terminal to upload only the feature values of the image to be recognized to the server, which also reduces the server's workload.
• Moreover, the feature values output by the depth self-encoding network model may be quantized binarized feature values, achieving a further compressed representation of the features of the image to be identified; this can effectively reduce the amount of data uploaded by the client to the server, save network bandwidth occupancy, and reduce data transfer time.
• For example, if the length of the binarized feature value output by the depth self-encoding network model is 62 bits, the function of directly quantizing the features of the image to be recognized and uploading the quantized feature data to the server can be implemented on the mobile terminal device.
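A sketch of how such 62-bit binarized feature values might be serialized for upload; the byte layout (big-endian, 8 bytes per code) is an illustrative assumption, not a protocol defined by this application:

```python
def pack_codes(codes, n_bits=62):
    """Pack a list of n_bits-wide integer feature codes into bytes for
    upload. 62-bit codes fit in 8 bytes each, so an image with 200
    local features uploads only 1600 bytes instead of a full image."""
    n_bytes = (n_bits + 7) // 8
    return b"".join(c.to_bytes(n_bytes, "big") for c in codes)

def unpack_codes(blob, n_bits=62):
    """Inverse of pack_codes, used on the server side."""
    n_bytes = (n_bits + 7) // 8
    return [int.from_bytes(blob[i:i + n_bytes], "big")
            for i in range(0, len(blob), n_bytes)]

codes = [5, 123456, (1 << 62) - 1]
print(len(pack_codes(codes)))  # → 24
```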
  • FIG. 7 is a schematic diagram of an apparatus for acquiring image information according to the present application. Since the device embodiment is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.
  • the device embodiments described below are merely illustrative.
  • An apparatus for acquiring image information includes: a local feature extracting unit 701, configured to extract local features of an image to be identified; and a feature value calculating unit 702, configured to calculate, by using a pre-trained deep self-coding network model a feature value of the local feature output by the local feature extraction unit; the feature value sending unit 703, configured to send the feature value output by the feature value calculation unit to a server that provides an image recognition service; and the image information receiving unit 704 is configured to receive The related information of the image to be recognized returned by the server.
  • the device includes:
• a distance calculation unit configured to calculate the distance from the key point corresponding to the local feature to the center of the image to be identified
• a local feature culling unit configured to remove the local features corresponding to key points whose distance, as calculated by the distance calculation unit, is greater than a preset threshold
  • the feature value calculation unit is configured to calculate, by using the depth self-coding network model, a feature value of a local feature after the culling operation is performed by the local feature culling unit.
  • the local feature extraction unit is specifically configured to extract a local feature of the to-be-identified image by using a SIFT algorithm, an LBP algorithm, or using a convolutional neural network.
• FIG. 8 is a flowchart of an embodiment of an image recognition method provided by the present application. Parts that are the same as in the previously provided embodiments will not be repeated; the following highlights the differences.
  • An image recognition method provided by the present application includes:
• Step 801 Receive a feature value of an image to be recognized uploaded by a client, where the feature value is calculated by a pre-trained depth self-coding network model taking the local features of the image to be recognized as input.
  • Step 802 Match the feature value with a feature value of a registered image in an image retrieval database.
  • the image retrieval database is pre-established by: selecting a registration image for constructing the image retrieval database; extracting local features of the registration image; and calculating features of the local feature by using the depth self-coding network model a value; storing the feature value in an image retrieval database, and establishing a correspondence between the feature value and the registered image.
• In specific implementation, after the feature values are calculated, they may be screened according to their distribution. For example, feature values whose frequency of occurrence in the registered image is lower than a preset threshold may be selected, or feature values may be selected according to their position distribution in the registered image, and only the screened feature values are stored in the image retrieval database.
• In specific implementation, the feature value uploaded by the client and the feature values of the registered images in the image retrieval database may both be binarized feature values; this step may then perform matching by the Hamming distance method, taking feature value pairs whose Hamming distance is less than a preset threshold as successfully matched pairs.
  • Step 803 Select a registration image that meets a preset condition according to the matching result.
  • This step selects the registered image that matches the image to be identified.
• The registered image that matches the image to be identified generally refers to a registered image with a relatively high degree of matching to the image to be identified, for example one that comes from the same source image as the image to be recognized.
• Images of the same source generally refer to images obtained from one original image through a series of changes (i.e., near-duplicate images); the series of changes may include adjusting resolution, adjusting shooting angle, adjusting brightness, adding a watermark, and the like.
  • a set of registered images satisfying the preset condition may be selected in the same manner as the previously provided image retrieval method embodiment, and then further filtered by rearrangement to find the image to be recognized. Match the registered image.
  • the rearranging operation includes: matching, for each selected registration image, the feature value of the image to be recognized and the registered image feature value extracted from the image retrieval database, and recording the predetermined rearrangement matching condition. The number of feature value pairs; sorting the selected registration images according to the number of feature value pairs satisfying the preset rearrangement matching condition, and selecting the registered image with the top ranking;
• To improve recognition accuracy, additional conditions for selecting the registered image may be added, for example: selecting a registered image whose cumulative matching score ranks first and whose score difference from the second-ranked image is greater than a preset threshold; or selecting a registered image whose number of successful feature value matches is greater than a preset threshold (for example, for a registered image containing 200 registered feature values, requiring at least 50 feature values to match successfully), and so on.
• Since the feature values of the image to be recognized uploaded by the client and the registered feature values stored in the image retrieval database preserve the discriminative power of the original image features, the image retrieval accuracy is high; provided the set of registered images in the image retrieval database is sufficiently large, this step can usually accurately find the registered image matching the image to be identified.
  • Step 804 Acquire registration information corresponding to the selected registration image, and return the information to the client.
  • the registration information typically includes information related to the image content, such as for a book cover image, the registration information may include information related to the book in the image, such as title, author name, price, book review, online purchase URL, and the like.
  • this step may extract corresponding registration information from the database according to the registration image selected in step 803.
  • the corresponding registration information record may be read according to the registration image identifier, and the registration information therein may be sent to the client.
• In summary, the image recognition method provided by the present application adopts an image retrieval technology combining image local features with a depth self-encoding network. Because the depth self-encoding network compresses and represents the local features while effectively maintaining the distance information between feature values and their discriminative ability, the accuracy of image retrieval can be effectively improved, so that the required registered image can usually be retrieved accurately and its registration information returned to the client.
• In addition, the feature values uploaded by the client and the feature values stored in the image retrieval database may be binarized feature values. Since a binarized feature value is a further quantized, compressed representation of the image features, the image recognition method provided by the present application has good scalability: on the one hand, the image retrieval database can be extended to a scale of millions or even billions; on the other hand, techniques such as Hashing can conveniently be used to speed up the retrieval process and improve retrieval performance.
  • FIG. 9 is a schematic diagram of an embodiment of an image recognition apparatus provided by the present application. Since the device embodiment is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.
  • the device embodiments described below are merely illustrative.
  • An image recognition apparatus of this embodiment includes: a feature value receiving unit 901, configured to receive feature values of an image to be recognized uploaded by a client, the feature values having been calculated with a pre-trained deep auto-encoding network model taking the local features of the image to be recognized as input; a feature value matching unit 902, configured to match the feature values received by the feature value receiving unit against feature values of registered images in an image retrieval database; a registration image selection unit 903, configured to select registration images that satisfy a preset condition according to the matching results; and an image information sending unit 904, configured to acquire registration information corresponding to the selected registration images and return it to the client.
  • the feature value matching unit is specifically configured to perform matching according to a Hamming distance method, and use a feature value pair whose Hamming distance is less than a preset threshold as a matching feature value pair.
  • the device includes: a database establishing unit, configured to pre-establish the image retrieval database;
  • the database establishing unit includes:
  • a registration image selection subunit for selecting a registration image for constructing the image retrieval database
  • the feature value registration subunit is configured to store the feature value in an image retrieval database and establish a correspondence between the feature value and the registered image.
  • the database establishing unit further includes:
  • the feature value filtering subunit is configured to filter the feature values calculated by the registered image feature value calculation subunit according to the distribution of the feature values
  • the feature value registration subunit is specifically configured to store the feature value after the feature value screening subunit is filtered in an image retrieval database, and establish a correspondence between the feature value and the registration image.
  • the feature value screening sub-unit is specifically configured to: select a feature value whose frequency is lower than a preset threshold in the registration image; and/or select a feature value according to a position distribution of the feature value in the registration image.
  • the registration image selection unit includes:
  • a registration image preselection subunit, configured to select registration images that satisfy a preset condition according to the matching results output by the feature value matching unit;
  • a re-ranking matching subunit, configured to pairwise match, for each registration image selected by the preselection subunit, the feature values of the image to be recognized against the registered image's feature values extracted from the image retrieval database, and to record the number of feature value pairs satisfying a preset re-ranking matching condition;
  • a re-ranking filtering subunit, configured to sort the selected registration images by the number of feature value pairs recorded by the re-ranking matching subunit and to select the top-ranked registration images as the retrieval result for the image to be identified.
  • FIG. 10 is a schematic diagram of an embodiment of an image recognition system provided by the present application.
  • the same portions of the present embodiment as those of the previously provided embodiments will not be described again, and the differences will be mainly described below.
  • the image recognition system provided by the present application includes: an apparatus 1001 for acquiring image information and an image recognition apparatus 1002.
  • the device for acquiring image information may be deployed on a desktop computer or may be deployed on a mobile terminal device, but is not limited to the above-mentioned devices listed herein, and may be any device capable of implementing the method for acquiring image information provided by the present application.
  • the image recognition device is usually deployed on a server, and may be any device capable of implementing the image recognition method provided by the present application.
  • FIG. 11 is a flowchart of an embodiment of a method for calculating image feature values provided by the present application. Portions identical to the previously provided embodiments will not be described again; the differences are mainly described below.
  • a method for calculating image feature values provided by the present application includes:
  • Step 1101: Extract local features of the image whose feature values are to be calculated.
  • The deep auto-encoding network model may be pre-trained; the training process includes: selecting a sample image set; extracting local features of the sample images in the sample image set; and, taking the local features as input and aiming to minimize the reconstruction error of the deep auto-encoding network model after encoding and decoding the input data, performing iterative training until the model converges.
  • This step may extract local features of the image whose feature values are to be calculated using the SIFT algorithm, the LBP algorithm, or a convolutional neural network.
  • Step 1102 Calculate the feature value of the local feature by using a pre-trained depth self-coding network model.
  • the feature value includes: a binarized feature value.
  • The above method for calculating image feature values adopts a deep auto-encoding network model, which not only reduces the dimensionality of the local features but also effectively preserves the distance information and discriminative power between feature values, thereby safeguarding image retrieval accuracy. In particular, when the feature values output by the deep auto-encoding network model are binarized feature values, it also provides the conditions for improving the scalability of the image retrieval database and improving retrieval efficiency.
  • Because the feature values are calculated with the deep auto-encoding network model, the requirements for storage space and computing performance are reduced, and the feature value calculation for the image to be recognized can be completed on the mobile terminal device, which helps reduce the load on the server. In particular, when the feature values output by the deep auto-encoding network model are binarized, the amount of data uploaded by the mobile terminal device is effectively reduced, upload time is shortened, and the user experience is improved.
  • FIG. 12 is a schematic diagram of an apparatus embodiment for calculating image feature values according to the present application. Since the device embodiment is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment. The device embodiments described below are merely illustrative.
  • An apparatus for calculating image feature values includes: a local feature extraction unit 1201, for extracting local features of the image whose feature values are to be calculated; and a feature value calculation unit 1202, for calculating, with a pre-trained deep auto-encoding network model, feature values of the local features output by the local feature extraction unit.
  • the device includes: a model training unit, configured to pre-train the deep self-coding network model;
  • the model training unit includes:
  • a sample selection subunit for selecting a sample image set
  • a sample feature extraction subunit configured to extract local features of the sample image in the sample image set
  • the iterative training sub-unit is configured to take the local feature as an input, and to minimize the reconstruction error after encoding and decoding the input data by the depth self-coding network model, and perform iterative training until the deep self-coding network model converges.
  • The local feature extraction unit is specifically configured to extract local features of the image whose feature values are to be calculated using the SIFT algorithm, the LBP algorithm, or a convolutional neural network.
  • the present application also provides an electronic device, such as the following. Please refer to FIG. 13, which shows a schematic diagram of an embodiment of an electronic device of the present application.
  • the electronic device includes: a display 1301; a processor 1302; a memory 1303;
  • the memory 1303 is configured to store a program for acquiring image information, and when executed by the processor, the program performs the following operations: extracting local features of the image to be identified; using a pre-trained depth self-coding network model calculation Deriving a binarized feature value of the local feature; transmitting the binarized feature value to a server that provides an image recognition service; and receiving related information of the image to be recognized returned by the server.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other medium that can be used to store information accessible by a computing device.
  • As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

Abstract

The present application discloses an image retrieval method and apparatus, a method and apparatus for acquiring image information, an image recognition method and apparatus, an image recognition system, a method and apparatus for calculating image feature values, and an electronic device. The image retrieval method includes: extracting local features of an image to be retrieved; calculating feature values of the local features using a pre-trained deep auto-encoding network model; matching the feature values against feature values of registered images in an image retrieval database; and selecting, according to the matching results, registered images that satisfy a preset condition as the retrieval result for the image to be retrieved. With this method, because the deep auto-encoding network preserves the distance information and discriminative power between feature values while producing a compressed representation of the local features, retrieval accuracy is effectively improved, the re-ranking and filtering workload is reduced, and retrieval efficiency is improved.

Description

Image retrieval, image information acquisition, and image recognition methods, apparatuses, and systems
Technical Field
The present application relates to image retrieval technology, and in particular to an image retrieval method and apparatus. The present application also provides a method and apparatus for acquiring image information, an image recognition method and apparatus, an image recognition system, a method and apparatus for calculating image feature values, and an electronic device.
Background Art
Object recognition and visual search technologies can greatly shorten the distance between the physical world and the data world, helping users obtain information quickly and conveniently. In the Internet field, for an image to be recognized that is captured by a camera or downloaded from the Internet, image retrieval technology can find a matching image among pre-registered images and further obtain related information about the image to be recognized (a process usually also called image recognition). For example, by retrieving a book cover image, the title, author, and other information about the book can be obtained.
Image retrieval can be implemented through local feature matching. Because the local features of an image usually contain considerable redundancy and noise, practical systems, for reasons of storage and retrieval performance, usually require a more compact and effective representation of the local features. In the currently popular bag-of-words image retrieval model, local image features are represented by "words".
Image retrieval based on the bag-of-words model consists of two processes: building an index for the image retrieval database, and feature matching. In the indexing stage, local features are usually sampled from a subset of images as training samples, the k-means clustering algorithm is used to cluster the training samples, and the cluster centers are taken as "words". Then, for each local feature of a registered image in the image retrieval database, the "word" with the smallest Euclidean distance is found, the index of that word is used as the quantized representation of the local feature, and an inverted-index structure is built on this basis. In the feature matching (retrieval) stage, the "word" closest in Euclidean distance to each local feature of the image to be recognized is found, the word's index is used to look up the corresponding registered images, and finally a statistical voting scheme produces the retrieval result for the image to be recognized.
As can be seen from the above description, because the bag-of-words model represents local image features by "words" in both the indexing and matching stages, and different image features may map to the same "word", the distance between word indices does not reflect the true distance between features. For example, if three features are quantized to indices 1, 5, and 100, the local features corresponding to 1 and 5 are not necessarily more similar to each other than to the one corresponding to 100. For these reasons, existing image retrieval technology achieves low matching accuracy between images and may produce a large number of false matches, which in turn requires filtering and re-ranking a large number of images and mismatched feature pairs, degrading retrieval performance.
Summary of the Invention
Embodiments of the present application provide an image retrieval method and apparatus that improve image retrieval matching accuracy, addressing the low matching accuracy achieved by existing image retrieval technology. Embodiments of the present application also provide a method and apparatus for acquiring image information, an image recognition method and apparatus, an image recognition system, a method and apparatus for calculating image feature values, and an electronic device.
The present application provides an image retrieval method, including:
extracting local features of an image to be retrieved;
calculating feature values of the local features using a pre-trained deep auto-encoding network model;
matching the feature values against feature values of registered images in an image retrieval database;
selecting, according to the matching results, registered images that satisfy a preset condition as the retrieval result for the image to be retrieved.
Optionally, the feature values include binarized feature values.
Optionally, matching the feature values against feature values of registered images in the image retrieval database includes: matching based on Hamming distance, and taking feature value pairs whose Hamming distance is less than a preset threshold as successfully matched feature value pairs.
Optionally, matching based on Hamming distance includes:
matching by linear scan with Hamming distance computation; or,
matching by querying a hash table indexed by binarized feature values.
Optionally, the deep auto-encoding network model is pre-trained by:
selecting a sample image set;
extracting local features of the sample images in the sample image set;
taking the local features as input and, aiming to minimize the reconstruction error of the deep auto-encoding network model after encoding and decoding the input data, performing iterative training until the deep auto-encoding network model converges.
Optionally, the image retrieval database is built in advance by the following steps:
selecting registered images for building the image retrieval database;
extracting local features of the registered images;
calculating feature values of the local features using the deep auto-encoding network model;
storing the feature values in the image retrieval database, and establishing correspondences between the feature values and the registered images.
Optionally, before extracting local features of the registered images, the following operation is performed:
normalizing the size of the registered images in a preset manner.
Optionally, after calculating feature values of the local features using the deep auto-encoding network model, the following step is performed:
filtering the feature values according to the distribution of the computed feature values;
storing the feature values in the image retrieval database includes: storing the filtered feature values in the image retrieval database.
Optionally, filtering the feature values according to the distribution of the computed feature values includes:
selecting feature values whose frequency of occurrence in the registered images is lower than a preset threshold; and/or,
selecting feature values according to their position distribution within the registered images.
Optionally, after extracting local features of the image to be retrieved, the following operations are performed:
calculating the distance from the keypoint corresponding to each local feature to the center of the image to be retrieved;
discarding the local features whose corresponding keypoints are farther than a preset threshold;
calculating feature values of the local features with the pre-trained deep auto-encoding network model includes: calculating, with the deep auto-encoding network model, feature values of the local features remaining after the discarding operation.
Optionally, matching the feature values against feature values of registered images in the image retrieval database includes:
matching the feature values pairwise, one by one, against feature values of registered images in the image retrieval database as follows: computing an index value characterizing the degree of difference of a candidate feature value pair, and judging the feature value pair successfully matched when the index value is less than a preset threshold.
Optionally, the registered images satisfying the preset condition include:
the top registered images ranked by the number of successfully matched feature values in descending order; or,
registered images whose number of successfully matched feature values is greater than a preset threshold; or,
the top registered images ranked by the cumulative score from successful feature value matches in descending order; or,
registered images whose cumulative score from successful feature value matches is greater than a preset threshold.
Optionally, after selecting registered images satisfying the preset condition according to the matching results, the following re-ranking operations are performed:
for each selected registered image, pairwise matching the feature values of the image to be retrieved against the registered image's feature values extracted from the image retrieval database, and recording the number of feature value pairs satisfying a preset re-ranking matching condition;
sorting the selected registered images by the number of feature value pairs satisfying the preset re-ranking matching condition, and selecting the top-ranked registered images;
selecting registered images satisfying the preset condition as the retrieval result for the image to be retrieved includes: taking the registered images selected after the re-ranking operations as the retrieval result for the image to be retrieved.
Optionally, after recording the number of feature value pairs satisfying the preset re-ranking matching condition, the following operation is performed:
removing mismatched feature value pairs from those satisfying the preset re-ranking matching condition, through a spatial-relationship consistency check based on a transformation model;
sorting the selected registered images by the number of feature value pairs satisfying the preset re-ranking matching condition includes: sorting the selected registered images by the number of such pairs remaining after the removal operation.
Optionally, local features of an image are extracted with the SIFT algorithm, the LBP algorithm, or a convolutional neural network.
Optionally, the steps of extracting local features of the image to be retrieved and calculating feature values of the local features with the pre-trained deep auto-encoding network model are performed on a client device;
the steps of matching the feature values against feature values of images in the image retrieval database and selecting, according to the matching results, registered images satisfying the preset condition as the retrieval result are performed on a server device.
Correspondingly, the present application also provides an image retrieval apparatus, including:
a local feature extraction unit for extracting local features of an image to be retrieved;
a feature value calculation unit for calculating, with a pre-trained deep auto-encoding network model, feature values of the local features output by the local feature extraction unit;
a feature value matching unit for matching the feature values output by the feature value calculation unit against feature values of registered images in an image retrieval database;
a retrieval result generation unit for selecting, according to the matching results output by the feature value matching unit, registered images satisfying a preset condition as the retrieval result for the image to be retrieved.
Optionally, when the feature values are binarized feature values, the feature value matching unit is specifically configured to match based on Hamming distance, taking feature value pairs whose Hamming distance is less than a preset threshold as successfully matched feature value pairs.
Optionally, the apparatus includes a model training unit for pre-training the deep auto-encoding network model;
the model training unit includes:
a sample selection subunit for selecting a sample image set;
a sample feature extraction subunit for extracting local features of the sample images in the sample image set;
an iterative training subunit for taking the local features as input and, aiming to minimize the reconstruction error of the deep auto-encoding network model after encoding and decoding the input data, performing iterative training until the deep auto-encoding network model converges.
Optionally, the apparatus includes a database building unit for building the image retrieval database in advance;
the database building unit includes:
a registered image selection subunit for selecting registered images for building the image retrieval database;
a registered image feature extraction subunit for extracting local features of the registered images;
a registered image feature value calculation subunit for calculating feature values of the local features using the deep auto-encoding network model;
a feature value registration subunit for storing the feature values in the image retrieval database and establishing correspondences between the feature values and the registered images.
Optionally, the database building unit further includes:
a feature value filtering subunit for filtering, according to the distribution of the feature values, the feature values computed by the registered image feature value calculation subunit;
the feature value registration subunit is specifically configured to store the feature values filtered by the feature value filtering subunit in the image retrieval database and to establish correspondences between the feature values and the registered images.
Optionally, the apparatus includes:
a distance calculation unit for calculating the distance from the keypoint corresponding to each local feature to the center of the image to be retrieved;
a local feature removal unit for discarding local features whose keypoint distance, as computed by the distance calculation unit, exceeds a preset threshold;
the feature value calculation unit is specifically configured to calculate, with the deep auto-encoding network model, feature values of the local features remaining after the removal operation performed by the local feature removal unit.
Optionally, the retrieval result generation unit includes:
a registered image preselection subunit for selecting registered images satisfying a preset condition according to the matching results output by the feature value matching unit;
a re-ranking matching subunit for pairwise matching, for each registered image selected by the preselection subunit, the feature values of the image to be retrieved against the registered image's feature values extracted from the image retrieval database, and recording the number of feature value pairs satisfying a preset re-ranking matching condition;
a re-ranking filtering subunit for sorting the selected registered images by the number of feature value pairs recorded by the re-ranking matching subunit and selecting the top-ranked registered images as the retrieval result for the image to be retrieved.
Optionally, the retrieval result generation unit further includes:
a spatial consistency check subunit for removing mismatched feature value pairs from those obtained by the re-ranking matching subunit, through a spatial-relationship consistency check based on a transformation model;
the re-ranking filtering subunit is specifically configured to sort the selected registered images by the number of feature value pairs satisfying the preset re-ranking matching condition remaining after the removal operation performed by the spatial consistency check subunit.
Optionally, the local feature extraction unit and the feature value calculation unit are deployed on a client device;
the feature value matching unit and the retrieval result generation unit are deployed on a server device.
In addition, the present application also provides a method for acquiring image information, including:
extracting local features of an image to be recognized;
calculating feature values of the local features using a pre-trained deep auto-encoding network model;
sending the feature values to a server providing an image recognition service;
receiving related information about the image to be recognized returned by the server.
Optionally, the feature values include binarized feature values.
Optionally, after extracting local features of the image to be recognized, the following operations are performed:
calculating the distance from the keypoint corresponding to each local feature to the center of the image to be recognized;
discarding the local features whose corresponding keypoints are farther than a preset threshold;
calculating feature values of the local features with the pre-trained deep auto-encoding network model includes: calculating, with the deep auto-encoding network model, feature values of the local features remaining after the discarding operation.
Optionally, local features of the image to be recognized are extracted with the SIFT algorithm, the LBP algorithm, or a convolutional neural network.
Optionally, the method is implemented on a mobile terminal device.
Correspondingly, the present application also provides an apparatus for acquiring image information, including:
a local feature extraction unit for extracting local features of an image to be recognized;
a feature value calculation unit for calculating, with a pre-trained deep auto-encoding network model, feature values of the local features output by the local feature extraction unit;
a feature value sending unit for sending the feature values output by the feature value calculation unit to a server providing an image recognition service;
an image information receiving unit for receiving related information about the image to be recognized returned by the server.
In addition, the present application also provides an image recognition method, including:
receiving feature values of an image to be recognized uploaded by a client, the feature values having been calculated with a pre-trained deep auto-encoding network model taking the local features of the image to be recognized as input;
matching the feature values against feature values of registered images in an image retrieval database;
selecting registered images satisfying a preset condition according to the matching results;
acquiring registration information corresponding to the selected registered images and returning it to the client.
Optionally, the feature values include binarized feature values.
Optionally, matching the feature values against feature values of registered images in the image retrieval database includes: matching based on Hamming distance, and taking feature value pairs whose Hamming distance is less than a preset threshold as successfully matched feature value pairs.
Optionally, the image retrieval database is built in advance by the following steps:
selecting registered images for building the image retrieval database;
extracting local features of the registered images;
calculating feature values of the local features using the deep auto-encoding network model;
storing the feature values in the image retrieval database, and establishing correspondences between the feature values and the registered images.
Optionally, after calculating feature values of the local features using the deep auto-encoding network model, the following step is performed:
filtering the feature values according to the distribution of the computed feature values;
storing the feature values in the image retrieval database includes: storing the filtered feature values in the image retrieval database.
Optionally, filtering the feature values according to the distribution of the computed feature values includes:
selecting feature values whose frequency of occurrence in the registered images is lower than a preset threshold; and/or,
selecting feature values according to their position distribution within the registered images.
Optionally, after selecting registered images satisfying the preset condition according to the matching results, the following re-ranking operations are performed:
for each selected registered image, pairwise matching the feature values of the image to be recognized against the registered image's feature values extracted from the image retrieval database, and recording the number of feature value pairs satisfying a preset re-ranking matching condition;
sorting the selected registered images by the number of feature value pairs satisfying the preset re-ranking matching condition, and selecting the top-ranked registered images;
selecting registered images satisfying the preset condition as the retrieval result includes: taking the registered images selected after the re-ranking operations as the retrieval result.
Correspondingly, the present application also provides an image recognition apparatus, including:
a feature value receiving unit for receiving feature values of an image to be recognized uploaded by a client, the feature values having been calculated with a pre-trained deep auto-encoding network model taking the local features of the image to be recognized as input;
a feature value matching unit for matching the feature values received by the feature value receiving unit against feature values of registered images in an image retrieval database;
a registered image selection unit for selecting registered images satisfying a preset condition according to the matching results;
an image information sending unit for acquiring registration information corresponding to the selected registered images and returning it to the client.
In addition, the present application also provides an image recognition system, including: the apparatus for acquiring image information according to any one of the above, and the image recognition apparatus according to any one of the above.
In addition, the present application also provides a method for calculating image feature values, including:
extracting local features of the image whose feature values are to be calculated;
calculating feature values of the local features using a pre-trained deep auto-encoding network model.
Optionally, the feature values include binarized feature values.
Optionally, the deep auto-encoding network model is pre-trained by:
selecting a sample image set;
extracting local features of the sample images in the sample image set;
taking the local features as input and, aiming to minimize the reconstruction error of the deep auto-encoding network model after encoding and decoding the input data, performing iterative training until the deep auto-encoding network model converges.
Optionally, local features of the image whose feature values are to be calculated are extracted with the SIFT algorithm, the LBP algorithm, or a convolutional neural network.
Correspondingly, the present application also provides an apparatus for calculating image feature values, including:
a local feature extraction unit for extracting local features of the image whose feature values are to be calculated;
a feature value calculation unit for calculating, with a pre-trained deep auto-encoding network model, feature values of the local features output by the local feature extraction unit.
In addition, the present application also provides an electronic device, including:
a display;
a processor;
a memory for storing a program for acquiring image information, the program, when read and executed by the processor, performing the following operations: extracting local features of an image to be recognized; calculating binarized feature values of the local features using a pre-trained deep auto-encoding network model; sending the binarized feature values to a server providing an image recognition service; and receiving related information about the image to be recognized returned by the server.
Compared with the prior art, the present application has the following advantages:
In the technical solution provided by the present application, local features of the image to be retrieved are extracted, feature values of the local features are calculated with a pre-trained deep auto-encoding network model, the feature values are then matched against feature values of registered images in an image retrieval database, and registered images satisfying a preset condition are selected as the retrieval result according to the matching results. The above method combines local image features with a deep auto-encoding network model; because the deep auto-encoding network preserves the distance information and discriminative power between feature values while producing a compressed representation of the local features, it effectively improves retrieval accuracy, reduces the re-ranking and filtering workload, and improves retrieval efficiency.
Moreover, compared with the bag-of-words model, converting the local features of the image to be retrieved into "words" is a million-scale nearest-neighbor problem that cannot be performed on an ordinary mobile terminal device, whereas the technical solution of this application calculates the feature values of local image features with a deep auto-encoding network model, which usually requires only a few matrix multiplications and has a small storage footprint, so it can be implemented on a mobile terminal device. This makes it possible for the mobile terminal to upload image feature values directly to the server, reducing the server's workload.
Further, the feature values output by the deep auto-encoding network model can be quantized binarized feature values, achieving a further compressed representation of the image features; for example, they can be compressed into a binary code sequence of only a few kilobytes. On the one hand, the image retrieval database can be scaled to millions or even hundreds of millions of images, and hashing and similar techniques can conveniently accelerate retrieval; on the other hand, the amount of data uploaded from client to server is effectively reduced, saving network bandwidth and transmission time, so that a mobile terminal device can directly quantize image features and upload the quantized feature data to the server.
Brief Description of the Drawings
FIG. 1 is a flowchart of an embodiment of an image retrieval method provided by the present application;
FIG. 2 is a schematic diagram of the deep auto-encoding network provided by an embodiment of the present application;
FIG. 3 is a flowchart of building the image retrieval database provided by an embodiment of the present application;
FIG. 4 is a flowchart of selecting registered images according to matching results provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an embodiment of an image retrieval apparatus provided by the present application;
FIG. 6 is a flowchart of an embodiment of a method for acquiring image information provided by the present application;
FIG. 7 is a schematic diagram of an embodiment of an apparatus for acquiring image information provided by the present application;
FIG. 8 is a flowchart of an embodiment of an image recognition method provided by the present application;
FIG. 9 is a schematic diagram of an embodiment of an image recognition apparatus provided by the present application;
FIG. 10 is a schematic diagram of an embodiment of an image recognition system provided by the present application;
FIG. 11 is a flowchart of an embodiment of a method for calculating image feature values provided by the present application;
FIG. 12 is a schematic diagram of an embodiment of an apparatus for calculating image feature values provided by the present application;
FIG. 13 is a schematic diagram of an embodiment of an electronic device provided by the present application.
Detailed Description
Many specific details are set forth in the following description to facilitate a full understanding of the present application. However, the present application can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; therefore, the present application is not limited by the specific implementations disclosed below.
The present application provides, respectively, an image retrieval method and apparatus, a method and apparatus for acquiring image information, an image recognition method and apparatus, an image recognition system, a method and apparatus for calculating image feature values, and an electronic device, each described in detail in the embodiments below.
The technical solution of the present application improves image retrieval accuracy by combining local features with a deep auto-encoding network. Before performing image retrieval operations, the deep auto-encoding network model can be trained in advance, and the trained model can be used to build the image retrieval database. These two parts are described in turn below.
1) Training the deep auto-encoding network model.
A deep auto-encoding network is a deep neural network. Please refer to FIG. 2, a schematic diagram of the deep auto-encoding network of this embodiment; the network consists of five layers of neurons and contains multiple hidden layers. Because the middle hidden layer, i.e., layer 3 in the figure (usually called the encoding layer), has fewer neurons than the input layer, the output of the encoding layer is usually a compressed representation of the input data. Layer 1 is the input layer; layers 2 and 3 each output a representation of the input signal (the encoding process); layers 4 and 5 reconstruct the input data (the decoding process). Training the deep auto-encoding network means iteratively adjusting the parameters of each layer with an algorithm such as gradient descent, aiming to minimize the reconstruction error after the input data is encoded and decoded layer by layer.
Specifically, in this embodiment the deep auto-encoding network model is trained as follows: randomly select a sample image set, extract local features of the sample images in the set, take the local features as input, and, aiming to minimize the reconstruction error after the model encodes and decodes the input data, train iteratively until the model converges. The training is then complete, and the output of the model's encoding layer is the compressed feature value of the input local feature; for example, the input may be a 128-dimensional real vector and the output a 16-dimensional real vector. In a concrete implementation, for a trained deep auto-encoding network model, the compressed representation of an input feature is obtained at the encoding layer with only a few matrix multiplications; for example, in the deep auto-encoding network of FIG. 2, this compression can be accomplished with two matrix multiplications.
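To illustrate the point that inference with a trained model reduces to a few matrix multiplications, the following sketch implements only the encoding half (input layer → layer 2 → encoding layer) in plain NumPy, with the rounding step that yields a binarized feature value. The random weights stand in for trained parameters, and the 128 → 64 → 16 layer sizes are illustrative assumptions (only the 128-dimensional input and 16-dimensional output are mentioned in the text).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(feature, params):
    """Compress a local feature to the encoding layer's output with
    two matrix multiplications (layer 1 -> 2, then layer 2 -> 3)."""
    (w1, b1), (w2, b2) = params
    h = sigmoid(feature @ w1 + b1)   # layer-2 representation
    return sigmoid(h @ w2 + b2)      # encoding-layer output in [0, 1]

rng = np.random.default_rng(0)
# Stand-in parameters: 128-d descriptor -> 64 -> 16 dimensions.
params = [(rng.standard_normal((128, 64)), np.zeros(64)),
          (rng.standard_normal((64, 16)), np.zeros(16))]

descriptor = rng.standard_normal(128)      # e.g. one SIFT descriptor
code = encode(descriptor, params)
binary_code = np.rint(code).astype(int)    # rounding gives the binarized feature value
print(code.shape, set(binary_code.tolist()) <= {0, 1})
```

With trained (rather than random) weights, `code` would be the 16-dimensional compressed feature value and `binary_code` its quantized binary form.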
Because a deep auto-encoding network model is used, while redundant information is removed from the raw features for dimensionality reduction, the feature values output by the encoding layer usually still represent the input features well; that is, an originally complex and highly redundant representation can be expressed in a compact stored form while the distance information and discriminative power between feature values are effectively preserved, safeguarding retrieval accuracy.
In one concrete example of this embodiment, 600,000 images were randomly selected, 200 local features were extracted from each image, and the resulting 200 × 600,000 = 120 million local features were used as input for iterative training, finally yielding the trained deep auto-encoding network model.
Preferably, to achieve further data compression and quantized representation, a constraint can be added to the training process so that the output of the encoding layer is restricted to binarized data. For example, during forward propagation, the encoding layer's real-valued outputs in the interval [0, 1] are rounded to obtain a binary representation that is fed to the next layer, while the parameter updates during back-propagation with gradient descent still use the unquantized real-valued outputs of the encoding layer. After multiple iterations in this manner, once the network converges, the trained deep auto-encoding network model is obtained. When the model is subsequently used to calculate feature values of local features, the encoding layer rounds its output to produce a binary sequence of 0s and 1s, i.e., the binarized feature value described in this application, also called the quantized binary code or quantized feature value.
It should be noted that FIG. 2 is a schematic diagram of the deep auto-encoding network of this embodiment; the number of layers and neurons shown are merely illustrative. In a concrete implementation, the number of layers of the deep auto-encoding network and the number of neurons in each layer can be set and adjusted as needed; the present application imposes no specific limitation on this.
2) Building the image retrieval database.
The image retrieval database, also called the feature database, stores a large number of image features so that, during retrieval, a registered image matching the image to be retrieved can be found by matching the image to be retrieved against the image features in the database. In the technical solution of this application, the pre-trained deep auto-encoding network model can be used to build the image retrieval database. A concrete implementation may include the following steps 101-1 to 101-4, described below with reference to FIG. 3.
Step 101-1: Select registered images for building the image retrieval database.
In a concrete implementation, the images may be obtained from the Internet, from resource servers, or from image material provided by various applications, or acquired by taking photographs. These images are called registered images because their feature values will be stored in the image retrieval database for retrieval matching, and the images themselves may become the images that match an image to be retrieved.
Step 101-2: Extract local features of the registered images.
To improve retrieval accuracy, before extracting local features of the registered images, their size can be normalized in a preset manner, for example by proportionally scaling all registered images to a uniform length of 300 pixels.
For each registered image, local features are extracted, for example with the SIFT (Scale-Invariant Feature Transform) algorithm, the LBP (Local Binary Patterns) algorithm, or a convolutional neural network. For example, 200 feature vectors of 128 dimensions may be extracted per registered image with SIFT.
Step 101-3: Calculate feature values of the local features using the deep auto-encoding network model.
Using the pre-trained deep auto-encoding network model, each local feature obtained in step 101-2 is taken as input and its feature value is calculated; the output of the model's encoding layer is the feature value of the local feature, for example a dimensionality-reduced real vector. If binarization constraints were added when training the model, this step yields the binarized feature value of each local feature.
Because the number of registered images is usually large — often millions, tens of millions, or more — the number of feature values computed in this step is correspondingly large. To improve retrieval efficiency, the feature values can be filtered according to their distribution, and only the filtered feature values are registered (i.e., stored in the image retrieval database for matching). Filtering can be done in several ways; two are listed here, which can be used independently or in combination:
1) Select feature values whose frequency of occurrence in the registered images is lower than a preset threshold. In a concrete implementation, the frequency with which a feature value occurs across all registered images, or a selected portion of them, can be counted; a feature value with a lower frequency usually carries more information and discriminates better against other images, so feature values whose frequency is below a preset threshold can be selected.
2) Select feature values according to their position distribution within the registered image. To reduce the number of registered feature values, a registered image can also be evenly divided into a series of image blocks, and a predetermined number of feature values selected from each block as representatives.
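The two filtering strategies above can be sketched as follows. The feature representation (codes as strings, features as (code, position) tuples), the 4 × 4 block grid, and the per-block quota are illustrative assumptions, not values fixed by the text.

```python
from collections import Counter

def filter_by_frequency(codes, max_freq):
    """Keep feature values whose frequency across registered images is below max_freq."""
    counts = Counter(codes)
    return [c for c in codes if counts[c] < max_freq]

def filter_by_position(features, img_w, img_h, grid=4, per_block=2):
    """Divide the image into a grid x grid layout of blocks and keep at most
    per_block feature values from each block as representatives."""
    kept, used = [], Counter()
    for code, (x, y) in features:
        block = (min(int(x * grid / img_w), grid - 1),
                 min(int(y * grid / img_h), grid - 1))
        if used[block] < per_block:
            used[block] += 1
            kept.append(code)
    return kept

codes = ["0110", "0110", "0110", "1001"]
print(filter_by_frequency(codes, max_freq=2))   # only the rare code survives
```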
Step 101-4: Store the feature values in the image retrieval database and establish correspondences between the feature values and the registered images.
After the computation (and filtering) of step 101-3, each registered image corresponds to multiple feature values. This step stores the feature values in the image retrieval database and establishes the correspondence between feature values and registered images, a process also called feature value registration.
If the pre-trained deep auto-encoding network model outputs binarized feature values, the storage required for each registered image's registered feature values is effectively reduced when building the database, so the image retrieval database can be scaled to millions or even hundreds of millions of images. In addition, the binarized feature values can be converted into indices to build a hash table: the hash entry corresponding to each registered binarized feature value can record the identifier of the corresponding registered image, enabling fast hash-based retrieval in the feature matching stage.
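A minimal sketch of the hash-table registration described above, using a plain dict keyed by the binary code; the image identifiers and codes are made up for illustration.

```python
from collections import defaultdict

def build_hash_index(registered):
    """Map each registered binarized feature value to the identifiers of
    the registered images that contain it."""
    index = defaultdict(list)
    for image_id, codes in registered.items():
        for code in codes:
            index[code].append(image_id)
    return index

index = build_hash_index({"book_1": ["0101", "1100"],
                          "book_2": ["0101", "0011"]})
print(index["0101"])   # both images registered this code
```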
This concludes the description of the training process of the deep auto-encoding network and the building process of the image retrieval database. Image retrieval can then be performed on this basis.
Please refer to FIG. 1, a flowchart of an embodiment of an image retrieval method of the present application. The method includes the following steps:
Step 101: Extract local features of the image to be retrieved.
For the image to be retrieved, local features are usually extracted first. Local features are used because, when an image is described with local features, each image is decomposed into many parts, each corresponding to a feature vector; such local features make effective use of the structural information of the image for matching and recognition, and to a certain extent provide relatively stable feature matching under changes in viewing angle and scale. In a concrete implementation, local features of the image can be extracted with the SIFT algorithm, the LBP algorithm, or a convolutional neural network, yielding feature vectors that characterize the main features of the image.
It should be noted that the same local feature extraction method should be used when training the deep auto-encoding network model, when building the image retrieval database, and when extracting local features of the image to be retrieved in this step — for example, SIFT in all three — for the image retrieval method provided by this application to be implemented effectively.
Step 102: Calculate feature values of the local features using the pre-trained deep auto-encoding network model.
Because of variations in position, scale, filter parameters, and so on, extracting local features often maps the image to be retrieved into a relatively high-dimensional feature space and usually yields a large number of feature vectors. To improve retrieval efficiency, before calculating the feature values of the local features, a feature subset that best characterizes the image can be selected from the local feature space.
Considering that the main information of an image is usually concentrated near the image center, this embodiment provides a preferred implementation: use the distance to the center of the image to be retrieved as one criterion for local feature selection. In a concrete implementation, each local feature usually corresponds to a keypoint in the image to be retrieved, and each keypoint has coordinates in the image, so the distance from the keypoint corresponding to a local feature to the image center can be computed. If the distance exceeds a preset threshold, the keypoint is far from the image center and its local feature can be discarded; otherwise the local feature is kept. This reduces the number of local features of the image to be retrieved and improves retrieval efficiency.
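The center-distance filtering just described might look like the following sketch; the 300 × 300 image size, the distance threshold, and the keypoint tuples are assumptions for illustration.

```python
import math

def filter_by_center_distance(keypoints, width, height, max_dist):
    """Discard local features whose keypoint lies farther than max_dist
    from the image center; keep the rest."""
    cx, cy = width / 2.0, height / 2.0
    return [(x, y, feat) for x, y, feat in keypoints
            if math.hypot(x - cx, y - cy) <= max_dist]

kps = [(150, 150, "f1"), (290, 290, "f2")]   # (x, y, descriptor label)
print(filter_by_center_distance(kps, 300, 300, max_dist=100))
```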
After the local features of the image to be retrieved are extracted in step 101 (and the discarding operation above is performed), each local feature can be taken as input to the pre-trained deep auto-encoding neural network model to compute its feature value. The feature value output by the model is a compressed representation of the input local feature, for example a dimensionality-reduced real vector; if binarization constraints were added when training the model, this step yields the quantized binary code of each local feature.
Step 103: Match the feature values against feature values of registered images in the image retrieval database.
After the image to be retrieved has been converted into feature values by the deep auto-encoding network model, this step can match the feature values pairwise, one by one, against the registered feature values in the image retrieval database. For each candidate feature value pair, an index value characterizing the degree of difference of the pair can be computed, and the pair is judged successfully matched when the index value is less than a preset threshold.
In a concrete implementation, the index characterizing the difference of a feature value pair may be the Euclidean distance between the pair. Further, if the feature values output by the deep auto-encoding network model are binarized feature values, the index may also be the Hamming distance between the pair.
For Hamming-distance matching of feature value pairs, a linear scan computing Hamming distances, or a hashing technique, can be used in a concrete implementation; both effectively improve retrieval efficiency. These two approaches are further described below.
1) Matching by linear scan with Hamming distance computation.
The Hamming distance is usually the number of positions at which two strings of the same length differ. For a feature value pair represented as quantized binary codes of 0s and 1s, the Hamming distance can be obtained by XOR-ing the codes and counting the resulting 1 bits; for example, the Hamming distance between 1011101 and 1001001 is 2.
In a concrete implementation, considering the complexity of image feature representation, a threshold can be preset for the Hamming-distance linear-scan matching according to the concrete application requirements; the feature values of the image to be retrieved are then matched pairwise, one by one, against the feature values of the registered images in the image retrieval database, and a candidate pair is judged successfully matched if its Hamming distance is below the threshold. For example, in one concrete example of this embodiment, the quantized binary codes are 62 bits long and the preset Hamming distance threshold is 4; if the Hamming distance of a candidate pair is at most 3, i.e., in the interval [0, 3], the pair can be judged successfully matched.
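The XOR-and-count Hamming distance, and the thresholded linear scan built on it, can be sketched as follows (codes are held as Python ints for the bitwise XOR):

```python
def hamming(a, b):
    """Hamming distance of two equal-length binary codes given as ints:
    XOR them and count the 1 bits in the result."""
    return bin(a ^ b).count("1")

def linear_scan(query, registered, threshold=4):
    """Return the registered codes whose Hamming distance to the query
    is below the preset threshold."""
    return [code for code in registered if hamming(query, code) < threshold]

assert hamming(0b1011101, 0b1001001) == 2   # the example from the text
print(linear_scan(0b1011101, [0b1001001, 0b0100010], threshold=4))
```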
2) Matching by querying a hash table indexed by binarized feature values.
If a hashing technique is used for matching, a hash table indexed by binarized feature values has usually already been built when constructing the image retrieval database, so in this step the binarized feature value can be used directly as an index to query the hash table.
In a concrete implementation, similar to the Hamming-distance linear scan above, a Hamming distance threshold can be preset; all binary codes whose Hamming distance to a binarized feature value of the image to be retrieved (hereafter the query binary code) is below the threshold are generated, each of these codes is converted into an index, and the hash table is queried directly. If the hash entry corresponding to some index records a registered image identifier, this usually means a registered feature value matching the query binary code has been found.
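A sketch of the hash-table lookup under a Hamming-distance threshold: every code within the radius of the query is generated by flipping combinations of bit positions and looked up directly. The 8-bit code length and the table contents are assumptions for illustration (the text's example uses 62-bit codes).

```python
from itertools import combinations

def codes_within(query, n_bits, radius):
    """Yield every n_bits-long code whose Hamming distance to query is <= radius."""
    for r in range(radius + 1):
        for bits in combinations(range(n_bits), r):
            flipped = query
            for b in bits:
                flipped ^= 1 << b   # flip one chosen bit
            yield flipped

def hash_lookup(query, table, n_bits=8, radius=3):
    """Collect registered image ids recorded under any near-enough code."""
    matches = []
    for code in codes_within(query, n_bits, radius):
        matches.extend(table.get(code, []))
    return matches

table = {0b10110010: ["poster_7"]}
print(hash_lookup(0b10110000, table))   # distance 1, found within radius 3
```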
For each successfully matched feature value pair found, a matching result can be recorded for the corresponding registered image. For example, the number of successfully matched feature values of the registered image can be recorded, incremented by one each time one of its registered feature values is successfully matched; or a matching score can be recorded for the registered image and accumulated on each successful match. In a concrete implementation, the score accumulation policy can be refined so that different scores are accumulated according to the degree of match of the feature value pair: for example, for a successfully matched binarized feature value pair, a preset higher score can be accumulated if the Hamming distance is 0 or 1, and a preset lower score if it is 2 or 3. The above lists some ways and policies for recording matching results; in a concrete implementation, the appropriate one can be adopted according to concrete needs.
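The score-accumulation policy just described (a higher score for Hamming distance 0–1, a lower one for 2–3) can be sketched as follows; the concrete score values are assumptions.

```python
from collections import defaultdict

def accumulate_scores(pairs, high=2.0, low=1.0):
    """pairs: (registered_image_id, hamming_distance) for each matched pair.
    Distances 0-1 earn the higher preset score, 2-3 the lower one."""
    scores = defaultdict(float)
    for image_id, dist in pairs:
        if dist <= 1:
            scores[image_id] += high
        elif dist <= 3:
            scores[image_id] += low
    return dict(scores)

print(accumulate_scores([("img_a", 0), ("img_a", 3), ("img_b", 2)]))
```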
Step 104: Select, according to the matching results, registered images satisfying a preset condition as the retrieval result for the image to be retrieved.
Based on the results of the matching operations in step 103, this step selects registered images satisfying a preset condition as the retrieval result for the image to be retrieved. Preferably, to further improve retrieval accuracy, this embodiment also provides a preferred implementation that re-ranks the registered images satisfying the preset condition. The whole process includes steps 104-1 to 104-3, described below with reference to FIG. 4.
Step 104-1: Select registered images satisfying a preset condition according to the matching results.
If step 103 recorded, for each registered image, the number of successfully matched feature values, this step can select registered images satisfying one of the following conditions: the top registered images ranked by the number of successfully matched feature values in descending order, or registered images whose number of successfully matched feature values exceeds a preset threshold. If step 103 recorded cumulative scores for the registered images, this step can select: the top registered images ranked by the cumulative score from successful feature value matches in descending order, or registered images whose cumulative score exceeds a preset threshold.
Step 104-2: For each selected registered image, pairwise match the feature values of the image to be retrieved against the registered image's feature values extracted from the image retrieval database, and record the number of feature value pairs satisfying a preset re-ranking matching condition.
Because the feature values obtained with the deep auto-encoding network model preserve the distance information of the original image features, to further improve retrieval accuracy this step performs one-to-one, image-to-image matching between the image to be retrieved and the registered images selected in step 104-1, thereby re-ranking and filtering the selected registered images.
In a concrete implementation, one-to-one matching between the image to be retrieved and a registered image can be performed as follows: for a given feature value of the image to be retrieved, compute the index value characterizing the degree of difference between it and each registered feature value of the registered image, and check whether the preset re-ranking matching condition holds: after sorting the index values in ascending order, if the first index value is less than a preset first threshold, and the difference between the second and the first index values is greater than a preset second threshold, then that feature value of the image to be retrieved and the registered feature value corresponding to the first index value can be regarded as a feature value pair satisfying the preset re-ranking matching condition, and the count of such pairs is accumulated for the registered image. The other feature values of the image to be retrieved are then matched in the same manner, in turn, until all feature values of the image to be retrieved have been processed.
In the above process, the difference between the second and the first index values can be computed as either their difference or their ratio. The above is one concrete re-ranking matching condition; in a concrete implementation, other re-ranking matching conditions can also be preset, as long as they re-rank and filter the registered images selected in step 104-1 and improve retrieval accuracy.
The one-to-one matching process above is performed in turn for the image to be retrieved against each registered image selected in step 104-1, yielding, for each selected registered image, the number of feature value pairs satisfying the preset re-ranking matching condition.
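The re-ranking matching condition above (best distance below one threshold, gap to the second-best above another) resembles a nearest-neighbor gap test and can be sketched as follows; the threshold values and binary codes are illustrative assumptions.

```python
def rerank_matches(query_codes, registered_codes, t1=3, t2=2):
    """Count query feature values whose best-matching registered feature value
    satisfies: best distance < t1 and (second best - best) > t2."""
    def hamming(a, b):
        return bin(a ^ b).count("1")

    count = 0
    for q in query_codes:
        dists = sorted(hamming(q, r) for r in registered_codes)
        if len(dists) >= 2 and dists[0] < t1 and dists[1] - dists[0] > t2:
            count += 1
    return count

print(rerank_matches([0b1111000], [0b1111001, 0b0000111], t1=3, t2=2))
```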
Preferably, on top of the above processing, this step can further filter, through a spatial-relationship consistency check, the feature value pairs of each selected registered image that satisfy the preset re-ranking matching condition, further improving retrieval accuracy. Among the successfully matched feature value pairs there may be false matches caused by noise and other factors, whereas the mutually corresponding feature value pairs between two images of the same source can be mapped onto each other by a transformation model; this embodiment exploits exactly this property to eliminate false matching pairs (also called noise matching pairs).
In a concrete implementation, the following operations can be performed for each registered image selected in step 104-1: arbitrarily select 3 or 4 pairs from the feature value pairs satisfying the preset re-ranking matching condition (hereafter the matched feature value pairs) and estimate a transformation model with the RANSAC algorithm; by cyclically selecting different feature value pairs, different transformation models can be estimated, and the transformation model (also called the transformation matrix) that best fits all matched feature value pairs is selected as the transformation model between the image to be retrieved and the registered image. Then the degree to which each matched feature value pair fits the transformation model is checked in turn; pairs whose fit does not satisfy a preset condition can be regarded as noise matching pairs, and such feature value pairs are removed from those satisfying the preset re-ranking matching condition.
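The spatial-consistency check can be illustrated with a simplified RANSAC-style sketch: an affine transformation is estimated by least squares from randomly chosen 3-pair samples of matched keypoint positions, the model fitting the most pairs is kept, and pairs with large residuals are discarded as noise matches. The iteration count, tolerance, and point data are illustrative assumptions, not parameters fixed by the text.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 3x2 matrix M such that [x, y, 1] @ M approximates dst."""
    A = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M

def ransac_filter(src, dst, iters=50, tol=3.0, seed=0):
    """Return a boolean mask of point pairs consistent with the best affine model."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        M = fit_affine(src[idx], dst[idx])
        pred = np.hstack([src, np.ones((len(src), 1))]) @ M
        err = np.linalg.norm(pred - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

src = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]], dtype=float)
dst = src + np.array([2.0, 3.0])   # a pure translation...
dst[4] = [80.0, 80.0]              # ...plus one noise match
print(ransac_filter(src, dst))     # the noise pair is flagged as an outlier
```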
Step 104-3: Sort the selected registered images by the number of feature value pairs satisfying the preset re-ranking matching condition, and select the top-ranked registered images as the retrieval result for the image to be retrieved.
The registered images selected in step 104-1 are sorted again, in descending order, by the number of feature value pairs obtained in step 104-2 that satisfy the preset re-ranking matching condition and were not removed, and the top-ranked registered images are selected as the retrieval result for the image to be retrieved. Because the registered images selected in step 104-1 have been matched one-to-one against the image to be retrieved once more, with mismatched feature value pairs removed, the retrieval result obtained after this re-ranking is usually more accurate.
It should be noted that the one-to-one re-ranking process above and the mechanism of removing noise matching pairs with a transformation model are preferred implementations of this embodiment; in other implementations, neither or only one of them may be adopted, and the technical solution of this application can still be realized.
Thus, steps 101 to 104 complete the image retrieval process and obtain the retrieval result for the image to be retrieved. It should be noted that the technical solution provided by this embodiment can be implemented either on a single device or in a system based on a C/S (client/server) architecture. In the latter case, steps 101 and 102 described in this embodiment, i.e., extracting local features of the image to be retrieved and calculating the feature values, can be performed on the client device, which sends the computed feature values to the server device, while steps 103 and 104, i.e., feature value matching and selecting the retrieval result, can be performed by the server device after receiving the feature values.
As can be seen from the above description, the image retrieval method provided by this application combines local image features with a deep auto-encoding network model. Because the deep auto-encoding network model preserves the distance information and discriminative power between feature values while producing a compressed representation of the local features, it effectively improves retrieval accuracy, reduces the re-ranking and filtering workload, and improves retrieval efficiency.
The above embodiment provides an image retrieval method; correspondingly, the present application also provides an image retrieval apparatus. Please refer to FIG. 5, a schematic diagram of an embodiment of an image retrieval apparatus of this application. Since the apparatus embodiment is essentially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment. The apparatus embodiments described below are merely illustrative.
An image retrieval apparatus of this embodiment includes: a local feature extraction unit 501 for extracting local features of an image to be retrieved; a feature value calculation unit 502 for calculating, with a pre-trained deep auto-encoding network model, feature values of the local features output by the local feature extraction unit; a feature value matching unit 503 for matching the feature values output by the feature value calculation unit against feature values of registered images in an image retrieval database; and a retrieval result generation unit 504 for selecting, according to the matching results output by the feature value matching unit, registered images satisfying a preset condition as the retrieval result for the image to be retrieved.
Optionally, when the feature values are binarized feature values, the feature value matching unit is specifically configured to match based on Hamming distance, taking feature value pairs whose Hamming distance is less than a preset threshold as successfully matched feature value pairs.
Optionally, the apparatus includes a model training unit for pre-training the deep auto-encoding network model;
the model training unit includes:
a sample selection subunit for selecting a sample image set;
a sample feature extraction subunit for extracting local features of the sample images in the sample image set;
an iterative training subunit for taking the local features as input and, aiming to minimize the reconstruction error of the deep auto-encoding network model after encoding and decoding the input data, performing iterative training until the deep auto-encoding network model converges.
Optionally, the apparatus includes a database building unit for building the image retrieval database in advance;
the database building unit includes:
a registered image selection subunit for selecting registered images for building the image retrieval database;
a registered image feature extraction subunit for extracting local features of the registered images;
a registered image feature value calculation subunit for calculating feature values of the local features using the deep auto-encoding network model;
a feature value registration subunit for storing the feature values in the image retrieval database and establishing correspondences between the feature values and the registered images.
Optionally, the database building unit further includes:
a feature value filtering subunit for filtering, according to the distribution of the feature values, the feature values computed by the registered image feature value calculation subunit;
the feature value registration subunit is specifically configured to store the feature values filtered by the feature value filtering subunit in the image retrieval database and to establish correspondences between the feature values and the registered images.
Optionally, the feature value filtering subunit is specifically configured to select feature values whose frequency of occurrence in the registered images is lower than a preset threshold, and/or to select feature values according to their position distribution within the registered images.
Optionally, the apparatus includes:
a distance calculation unit for calculating the distance from the keypoint corresponding to each local feature to the center of the image to be retrieved;
a local feature removal unit for discarding local features whose keypoint distance, as computed by the distance calculation unit, exceeds a preset threshold;
the feature value calculation unit is specifically configured to calculate, with the deep auto-encoding network model, feature values of the local features remaining after the removal operation performed by the local feature removal unit.
Optionally, the retrieval result generation unit includes:
a registered image preselection subunit for selecting registered images satisfying a preset condition according to the matching results output by the feature value matching unit;
a re-ranking matching subunit for pairwise matching, for each registered image selected by the preselection subunit, the feature values of the image to be retrieved against the registered image's feature values extracted from the image retrieval database, and recording the number of feature value pairs satisfying a preset re-ranking matching condition;
a re-ranking filtering subunit for sorting the selected registered images by the number of feature value pairs recorded by the re-ranking matching subunit and selecting the top-ranked registered images as the retrieval result for the image to be retrieved.
Optionally, the retrieval result generation unit further includes:
a spatial consistency check subunit for removing mismatched feature value pairs from those obtained by the re-ranking matching subunit, through a spatial-relationship consistency check based on a transformation model;
the re-ranking filtering subunit is specifically configured to sort the selected registered images by the number of feature value pairs satisfying the preset re-ranking matching condition remaining after the removal operation performed by the spatial consistency check subunit.
Optionally, the local feature extraction unit and the feature value calculation unit are deployed on a client device;
the feature value matching unit and the retrieval result generation unit are deployed on a server device.
In addition, the present application also provides a method for acquiring image information. Please refer to FIG. 6, a flowchart of an embodiment of a method for acquiring image information provided by this application; portions identical to the previously provided embodiments are not repeated, and the differences are mainly described below. A method for acquiring image information provided by this application includes:
Step 601: Extract local features of the image to be recognized.
The image to be recognized may include cover images of CDs, books, posters, and the like; this step extracts local features of the image to be recognized.
In a concrete implementation, local features of the image to be recognized can be extracted with the SIFT algorithm, the LBP algorithm, or a convolutional neural network.
Step 602: Calculate feature values of the local features using a pre-trained deep auto-encoding network model.
Before calculating the feature values of the local features with the pre-trained deep auto-encoding network model, the distance from the keypoint corresponding to each local feature to the center of the image to be recognized can first be computed, the local features whose keypoints are farther than a preset threshold can be discarded, and the deep auto-encoding network model can then be used to calculate feature values of the local features remaining after the discarding operation.
By adding binarization constraints during training of the deep auto-encoding network model, the feature values it outputs can be binarized feature values, achieving further feature compression and quantized representation.
Step 603: Send the feature values to a server providing an image recognition service.
Step 604: Receive related information about the image to be recognized returned by the server.
The server can use the image retrieval method provided by this application to find the registered image matching the image to be recognized and return the corresponding registration information, which is received in this step. For example, if the image to be recognized is a book cover image, this step may receive the following information: the title, author name, price, book reviews, online purchase URL, and so on.
The method for acquiring image information provided by this application is compared below with existing bag-of-words image retrieval technology. In the existing technology, on the one hand, the number of "words" serving as cluster centers (also called center feature vectors) is usually on the order of millions and requires very large storage; on the other hand, converting an image's local features into "words" is a million-scale nearest-neighbor problem. These storage and performance requirements mean the feature value calculation for the image to be recognized cannot be performed on an ordinary mobile terminal device; a mobile terminal device wishing to perform image recognition can only upload the image to be recognized, or a compressed version of it, to the server. The former is slow due to large upload traffic, and the latter not only introduces extra encoding/decoding time but also yields inaccurate recognition results due to information loss.
By contrast, because the method for acquiring image information provided by this application calculates the feature values of the local features of the image to be recognized with a deep auto-encoding network model, usually only a few matrix multiplications are needed, and the storage and computing requirements are relatively low, so it can be implemented on a mobile terminal device. This makes it possible for the mobile terminal device to upload the feature values of the image to be recognized directly to the server, reducing the server's workload.
Further, the feature values output by the deep auto-encoding network model can be quantized binarized feature values, achieving a further compressed representation of the features of the image to be recognized, effectively reducing the amount of data uploaded from client to server, saving network bandwidth, and reducing data transmission time. In one concrete example of this embodiment, the binarized feature values output by the deep auto-encoding network model are 62 bits long; for an image to be recognized containing 200 local features, after compression and quantization by the model, the image can be characterized by a binary code sequence of 200 × 62 bits = 12,400 bits = 1,550 bytes, i.e., about 1.5 KB. A mobile terminal device can thus directly quantize the features of the image to be recognized and upload the quantized feature data to the server.
The above embodiment provides a method for acquiring image information; correspondingly, the present application also provides an apparatus for acquiring image information. Please refer to FIG. 7, a schematic diagram of an embodiment of an apparatus for acquiring image information of this application. Since the apparatus embodiment is essentially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment. The apparatus embodiments described below are merely illustrative.
An apparatus for acquiring image information of this embodiment includes: a local feature extraction unit 701 for extracting local features of an image to be recognized; a feature value calculation unit 702 for calculating, with a pre-trained deep auto-encoding network model, feature values of the local features output by the local feature extraction unit; a feature value sending unit 703 for sending the feature values output by the feature value calculation unit to a server providing an image recognition service; and an image information receiving unit 704 for receiving related information about the image to be recognized returned by the server.
Optionally, the apparatus includes:
a distance calculation unit for calculating the distance from the keypoint corresponding to each local feature to the center of the image to be recognized;
a local feature removal unit for discarding local features whose keypoint distance, as computed by the distance calculation unit, exceeds a preset threshold;
the feature value calculation unit is specifically configured to calculate, with the deep auto-encoding network model, feature values of the local features remaining after the removal operation performed by the local feature removal unit.
Optionally, the local feature extraction unit is specifically configured to extract local features of the image to be recognized with the SIFT algorithm, the LBP algorithm, or a convolutional neural network.
In addition, the present application also provides an image recognition method. Please refer to FIG. 8, a flowchart of an embodiment of an image recognition method provided by this application; portions identical to the previously provided embodiments are not repeated, and the differences are mainly described below. An image recognition method provided by this application includes:
Step 801: Receive feature values of an image to be recognized uploaded by a client, the feature values having been calculated with a pre-trained deep auto-encoding network model taking the local features of the image to be recognized as input.
Step 802: Match the feature values against feature values of registered images in an image retrieval database.
The image retrieval database is built in advance by the following steps: selecting registered images for building the image retrieval database; extracting local features of the registered images; calculating feature values of the local features using the deep auto-encoding network model; and storing the feature values in the image retrieval database and establishing correspondences between the feature values and the registered images.
After calculating the feature values of the registered images' local features with the deep auto-encoding network model, the feature values can further be filtered according to the distribution of the computed feature values; for example, feature values whose frequency of occurrence in the registered images is below a preset threshold can be selected, or feature values can be selected according to their position distribution within the registered images, and the filtered feature values are stored in the image retrieval database.
The feature values uploaded by the client and the feature values of registered images in the image retrieval database can be binarized feature values; this step can then match based on Hamming distance, taking feature value pairs whose Hamming distance is less than a preset threshold as successfully matched feature value pairs.
Step 803: Select registered images satisfying a preset condition according to the matching results.
This step selects the registered image matching the image to be recognized. A registered image matching the image to be recognized usually means a registered image with a high degree of match to it, for example one that is the same image as the image to be recognized. "Same image" usually refers to an image obtained from the same original through a series of changes (i.e., a near-duplicate image); the series of changes may include adjusting the resolution, adjusting the shooting angle, adjusting the brightness, adding a watermark, and so on.
In a concrete implementation, the same approach as in the previously provided image retrieval method embodiment can be adopted: first select a group of registered images satisfying a preset condition, then filter further by re-ranking, thereby finding the registered image matching the image to be recognized. The re-ranking operations include: for each selected registered image, pairwise matching the feature values of the image to be recognized against the registered image's feature values extracted from the image retrieval database, and recording the number of feature value pairs satisfying a preset re-ranking matching condition; and sorting the selected registered images by that number and selecting the top-ranked registered images.
To improve recognition accuracy, additional conditions for selecting registered images can also be added, for example: select the registered image that ranks first by cumulative score from successful feature value matches and whose score exceeds the second-ranked image's by more than a preset threshold; or select registered images whose number of successfully matched feature values exceeds a preset threshold (for example, for a registered image containing 200 registered feature values, at least 50 feature values successfully matched). These are some example selection schemes; in a concrete implementation, they can be adjusted as needed.
由于客户端上传的待识别图像的特征值、以及图像检索数据库中存储的注册特征值都能够保持原始图像特征的分辨力,因此图像检索精确度高,在图像检索数据库中注册图像规模足够大的情况下,本步骤通常能够准确找到与所述待识别图像相匹配的注册图像。
步骤804、获取与所选注册图像对应的注册信息,并返回给所述客户端。
为了提供图像识别功能,在构建所述图像检索数据库时,通常也会存储与注册图像对应的信息,即本申请所述的注册信息。所述注册信息通常包括与图像内容有关的信息,例如对于图书封面图像,其注册信息可以包括与图像中图书相关的信息,例如书名、作者姓名、价格、书评、在线购买网址等。
具体实施时,本步骤可以根据步骤803选择的注册图像,从数据库中提取对应的注册信息。例如,可以根据注册图像标识,读取对应的注册信息记录,并将其中的注册信息发送给所述客户端。
本申请提供的图像识别方法,采用了将图像的局部特征和深度自编码网络相结合的图像检索技术,由于深度自编码网络在对局部特征进行压缩表示的过程中,可以有效保持特征值之间的距离信息和辨别能力,从而能够有效提升图像检索的精确度,因此通常可以准确地检索到所需的注册图像,并将所述注册图像的注册信息返回给客户端。
进一步地,客户端上传的特征值与图像检索数据库存储的特征值可以为二值化特征值,由于二值化特征值是图像特征的进一步量化压缩表示,使得本申请提供的图像识别方法具有良好的可扩展性:一方面,可以将图像检索数据库扩展到百万级甚至亿级的规模,另一方面,可以方便地利用Hash等技术加速检索过程,提高检索性能。
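以二值化特征值为索引、借助哈希表加速检索的思路,可以用如下极简示意代码说明(真实系统通常会采用多表局部敏感哈希等更复杂的结构,此处仅演示"以特征值为键建桶"这一基本想法):

```python
from collections import defaultdict

def build_index(registered):
    """registered 形如 [(二值化特征值, 注册图像ID), ...]。

    以特征值本身作为键建立哈希索引,持有相同特征值的注册项落入同一桶。
    """
    index = defaultdict(list)
    for value, image_id in registered:
        index[value].append(image_id)
    return index

def lookup(index, query_value):
    """精确命中查询:近似 O(1) 地取回持有相同特征值的注册图像ID。"""
    return index.get(query_value, [])

index = build_index([(0b1010, "img_a"), (0b1010, "img_b"), (0b0001, "img_c")])
print(lookup(index, 0b1010))  # ['img_a', 'img_b']
```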
在上述的实施例中,提供了一种图像识别方法,与之相对应的,本申请还提供一种图像识别装置。请参看图9,其为本申请提供的一种图像识别装置的实施例的示意图。由于装置实施例基本相似于方法实施例,所以描述得比较简单,相关之处参见方法实施例的部分说明即可。下述描述的装置实施例仅仅是示意性的。
本实施例的一种图像识别装置,包括:特征值接收单元901,用于接收客户端上传的待识别图像的特征值,所述特征值是以所述待识别图像的局部特征为输入,利用预先训练的深度自编码网络模型计算得到的;特征值匹配单元902,用于将所述特征值接收单元接收到的特征值与图像检索数据库中注册图像的特征值进行匹配;注册图像选择单元903,用于根据匹配结果选择满足预设条件的注册图像;图像信息发送单元904,用于获取与所选注册图像对应的注册信息,并返回给所述客户端。
可选的,所述特征值匹配单元具体用于,采用基于汉明距离的方式进行匹配,并将汉明距离小于预设阈值的特征值对作为匹配成功的特征值对。
可选的,所述装置包括:数据库建立单元,用于预先建立所述图像检索数据库;
所述数据库建立单元包括:
注册图像选择子单元,用于选取用于构建所述图像检索数据库的注册图像;
注册图像特征提取子单元,用于提取所述注册图像的局部特征;
注册图像特征值计算子单元,用于利用所述深度自编码网络模型计算所述局部特征的特征值;
特征值注册子单元,用于将所述特征值存储在图像检索数据库中,并建立所述特征值与注册图像之间的对应关系。
可选的,所述数据库建立单元还包括:
特征值筛选子单元,用于根据特征值的分布,对所述注册图像特征值计算子单元计算得到的特征值进行筛选;
所述特征值注册子单元具体用于,将所述特征值筛选子单元筛选后的特征值存储在图像检索数据库中,并建立所述特征值与注册图像之间的对应关系。
可选的,所述特征值筛选子单元具体用于,选择在注册图像中出现频率低于预设阈值的特征值;和/或,按照特征值在注册图像中的位置分布选择特征值。
可选的,所述注册图像选择单元包括:
注册图像初选子单元,用于根据所述特征值匹配单元输出的匹配结果选择满足预设条件的注册图像;
重排匹配子单元,用于针对所述注册图像初选子单元所选的每个注册图像,将所述待识别图像的特征值与从图像检索数据库中提取的注册图像特征值进行两两匹配,并记录满足预设重排匹配条件的特征值对的个数;
重排筛选子单元,用于根据所述重排匹配子单元记录的特征值对的个数对所选注册图像进行排序,并从中选择排序靠前的注册图像,作为所述待识别图像的检索结果。
此外,本申请还提供一种图像识别系统,请参考图10,其为本申请提供的一种图像识别系统的实施例的示意图。本实施例与之前提供的各实施例内容相同的部分不再赘述,下面重点描述不同之处。
本申请提供的图像识别系统包括:获取图像信息的装置1001和图像识别装置1002。所述获取图像信息的装置,可以部署于台式电脑,也可以部署于移动终端设备,但并不局限于此处列举的上述设备,可以是能够实现本申请所提供的获取图像信息方法的任何设备;所述图像识别装置,通常部署于服务器上,也可以是其他能够实现本申请所提供的图像识别方法的任何设备。
此外,本申请还提供一种用于计算图像特征值的方法,请参考图11,其为本申请提供的一种用于计算图像特征值的方法实施例的流程图,本实施例与之前提供的各实施例内容相同的部分不再赘述,下面重点描述不同之处。本申请提供的一种用于计算图像特征值的方法包括:
步骤1101、提取待计算特征值图像的局部特征。
在执行本步骤之前,可以预先训练所述深度自编码网络模型,训练过程包括:选择样本图像集;提取所述样本图像集中的样本图像的局部特征;以所述局部特征作为输入,以深度自编码网络模型对输入数据进行编解码后的重建误差最小为目标,进行迭代训练直至所述深度自编码网络模型收敛。
具体实施时,本步骤可以通过如下方式提取待计算特征值图像的局部特征:采用SIFT算法、采用LBP算法或者利用卷积神经网络。
步骤1102、采用预先训练的深度自编码网络模型计算所述局部特征的特征值。所述特征值包括:二值化特征值。
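"以局部特征为输入、以重建误差最小为目标训练自编码网络,再对编码取符号得到二值化特征值"这一流程,可以用如下单隐层的极简草图示意(numpy 实现;数据、网络规模、激活函数与训练细节均为示意用的假设,与本申请实际采用的深度网络结构无关):

```python
import numpy as np

rng = np.random.default_rng(0)

# 玩具数据:100 个 8 维的"局部特征描述子"
X = rng.normal(size=(100, 8))

# 单隐层自编码器:编码到 4 维,再解码回 8 维
W_enc = rng.normal(scale=0.1, size=(8, 4))
W_dec = rng.normal(scale=0.1, size=(4, 8))

def forward(X):
    H = np.tanh(X @ W_enc)   # 编码(连续值)
    R = H @ W_dec            # 解码重建
    return H, R

lr, errors = 0.01, []
for _ in range(200):         # 以重建误差最小为目标迭代训练
    H, R = forward(X)
    diff = R - X
    errors.append(float((diff ** 2).mean()))
    grad_dec = H.T @ diff / len(X)
    grad_H = diff @ W_dec.T * (1 - H ** 2)   # tanh 的导数为 1 - tanh^2
    grad_enc = X.T @ grad_H / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

H, _ = forward(X)
binary_codes = (H > 0).astype(np.uint8)   # 对编码取符号,得到二值化特征值
print(errors[0] > errors[-1])             # True:重建误差随训练下降
print(binary_codes.shape)                 # (100, 4)
```

真实的深度自编码网络会堆叠多个编码/解码层并端到端训练,但"编码—二值化—重建误差"三个环节与此草图一致。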
从服务端的角度,由于上述计算图像特征值的方法采用了深度自编码网络模型,不仅实现了局部特征的降维压缩,而且可以有效保持特征值之间的距离信息和辨别能力,从而为提高图像检索的精确度提供保障。特别地,当所述深度自编码网络模型输出的特征值为二值化特征值时,还可以为提高图像检索数据库的可扩展性和提高检索效率提供条件。
从客户端的角度,由于采用深度自编码网络模型计算特征值,降低了对存储空间和计算性能的要求,可以在移动终端设备上完成对待识别图像特征值的计算过程,有助于减轻服务端的工作压力;特别地,当深度自编码网络模型计算输出的特征值为二值化特征值时,可以有效减少移动终端设备上传的数据量,减少上传时间,改善用户的使用体验。
在上述的实施例中,提供了一种用于计算图像特征值的方法,与之相对应的,本申请还提供一种用于计算图像特征值的装置。请参看图12,其为本申请的一种用于计算图像特征值的装置实施例的示意图。由于装置实施例基本相似于方法实施例,所以描述得比较简单,相关之处参见方法实施例的部分说明即可。下述描述的装置实施例仅仅是示意性的。
本实施例的一种用于计算图像特征值的装置,包括:局部特征提取单元1201,用于提取待计算特征值图像的局部特征;特征值计算单元1202,用于采用预先训练的深度自编码网络模型计算所述局部特征提取单元输出的局部特征的特征值。
可选的,所述装置包括:模型训练单元,用于预先训练所述深度自编码网络模型;
所述模型训练单元包括:
样本选择子单元,用于选择样本图像集;
样本特征提取子单元,用于提取所述样本图像集中的样本图像的局部特征;
迭代训练子单元,用于以所述局部特征作为输入,以深度自编码网络模型对输入数据进行编解码后的重建误差最小为目标,进行迭代训练直至所述深度自编码网络模型收敛。
可选的,所述局部特征提取单元具体用于,采用SIFT算法、LBP算法或者利用卷积神经网络,提取待计算特征值图像的局部特征。
此外,本申请还提供了一种电子设备,所述电子设备实施例如下。请参考图13,其示出了本申请的一种电子设备的实施例的示意图。
所述电子设备,包括:显示器1301;处理器1302;存储器1303;
所述存储器1303用于存储获取图像信息的程序,所述程序在被所述处理器读取执行时,执行如下操作:提取待识别图像的局部特征;采用预先训练的深度自编码网络模型计算所述局部特征的二值化特征值;将所述二值化特征值发送给提供图像识别服务的服务端;接收所述服务端返回的所述待识别图像的相关信息。
本申请虽然以较佳实施例公开如上,但其并不是用来限定本申请,任何本领域技术人员在不脱离本申请的精神和范围内,都可以做出可能的变动和修改,因此本申请的保护范围应当以本申请权利要求所界定的范围为准。
在一个典型的配置中,计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。
内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体,可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带、磁带磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。
本领域技术人员应明白,本申请的实施例可提供为方法、系统或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。

Claims (46)

  1. 一种图像检索方法,其特征在于,包括:
    提取待检索图像的局部特征;
    采用预先训练的深度自编码网络模型计算所述局部特征的特征值;
    将所述特征值与图像检索数据库中注册图像的特征值进行匹配;
    根据匹配结果选择满足预设条件的注册图像作为所述待检索图像的检索结果。
  2. 根据权利要求1所述的图像检索方法,其特征在于,所述特征值包括二值化特征值。
  3. 根据权利要求2所述的图像检索方法,其特征在于,所述将所述特征值与图像检索数据库中注册图像的特征值进行匹配包括:采用基于汉明距离的方式进行匹配,并将汉明距离小于预设阈值的特征值对作为匹配成功的特征值对。
  4. 根据权利要求3所述的图像检索方法,其特征在于,所述采用基于汉明距离的方式进行匹配包括:
    采用计算汉明距离的线性查询方式进行匹配;或者,
    采用以二值化特征值为索引查询哈希表的方式进行匹配。
  5. 根据权利要求1所述的图像检索方法,其特征在于,预先训练所述深度自编码网络模型,包括:
    选择样本图像集;
    提取所述样本图像集中的样本图像的局部特征;
    以所述局部特征作为输入,以深度自编码网络模型对输入数据进行编解码后的重建误差最小为目标,进行迭代训练直至所述深度自编码网络模型收敛。
  6. 根据权利要求1所述的图像检索方法,其特征在于,所述图像检索数据库是采用如下步骤预先建立的:
    选取用于构建所述图像检索数据库的注册图像;
    提取所述注册图像的局部特征;
    利用所述深度自编码网络模型计算所述局部特征的特征值;
    将所述特征值存储在图像检索数据库中,并建立所述特征值与注册图像之间的对应关系。
  7. 根据权利要求6所述的图像检索方法,其特征在于,在所述提取所述注册图像的局部特征之前,执行下述操作:
    按照预设的方式对所述注册图像的尺寸进行归一化。
  8. 根据权利要求6所述的图像检索方法,其特征在于,在利用所述深度自编码网络模型计算所述局部特征的特征值之后,执行下述步骤:
    按照计算得到的特征值的分布,对特征值进行筛选;
    所述将所述特征值存储在图像检索数据库中包括:将筛选后的特征值存储在图像检索数据库中。
  9. 根据权利要求8所述的图像检索方法,其特征在于,所述按照计算得到的特征值的分布,对特征值进行筛选包括:
    选择在注册图像中出现频率低于预设阈值的特征值;和/或,
    按照特征值在注册图像中的位置分布选择特征值。
  10. 根据权利要求1所述的图像检索方法,其特征在于,在所述提取待检索图像的局部特征之后,执行下述操作:
    计算所述局部特征对应的关键点到所述待检索图像中心的距离;
    剔除所述距离大于预设阈值的关键点对应的局部特征;
    所述采用预先训练的深度自编码网络模型计算所述局部特征的特征值包括:采用所述深度自编码网络模型计算执行上述剔除操作后的局部特征的特征值。
  11. 根据权利要求1所述的图像检索方法,其特征在于,所述将所述特征值与图像检索数据库中注册图像的特征值进行匹配包括:
    将所述特征值与图像检索数据库中注册图像的特征值,采用如下方式逐一进行两两匹配:计算表征待匹配特征值对的差异程度的指标值,并当所述指标值小于预先设定的阈值时判定所述特征值对匹配成功。
  12. 根据权利要求1所述的图像检索方法,其特征在于,所述满足预设条件的注册图像包括:
    按照特征值匹配成功的个数从大到小排序靠前的注册图像;或者,
    特征值匹配成功的个数大于预设阈值的注册图像;或者,
    按照特征值匹配成功所得累计分值从大到小排序靠前的注册图像;或者,
    特征值匹配成功所得累计分值大于预设阈值的注册图像。
  13. 根据权利要求1所述的图像检索方法,其特征在于,在所述根据匹配结果选择满足预设条件的注册图像后,执行下述重排操作:
    针对每个所选注册图像,将所述待检索图像的特征值与从图像检索数据库中提取的注册图像特征值进行两两匹配,并记录满足预设重排匹配条件的特征值对的个数;
    根据满足预设重排匹配条件的特征值对的个数对所选注册图像进行排序,并从中选择排序靠前的注册图像;
    所述根据匹配结果选择满足预设条件的注册图像作为所述待检索图像的检索结果包括:将执行上述重排操作后所选注册图像作为所述待检索图像的检索结果。
  14. 根据权利要求13所述的图像检索方法,其特征在于,在所述记录满足预设重排匹配条件的特征值对的个数后,执行下述操作:
    通过利用变换模型进行空间关系一致性校验,从所述满足预设重排匹配条件的特征值对中剔除误匹配的特征值对;
    所述根据满足预设重排匹配条件的特征值对的个数对所选注册图像进行排序包括:根据执行上述剔除操作后的、满足预设重排匹配条件的特征值对的个数对所选注册图像进行排序。
  15. 根据权利要求1-14任一项所述的图像检索方法,其特征在于,通过如下方式提取图像的局部特征:采用SIFT算法、采用LBP算法或者利用卷积神经网络。
  16. 根据权利要求1所述的图像检索方法,其特征在于,所述提取待检索图像的局部特征、以及所述采用预先训练的深度自编码网络模型计算所述局部特征的特征值的步骤在客户端设备上执行;
    所述将所述特征值与图像检索数据库中注册图像的特征值进行匹配、以及所述根据匹配结果选择满足预设条件的注册图像作为所述待检索图像的检索结果的步骤在服务端设备上执行。
  17. 一种图像检索装置,其特征在于,包括:
    局部特征提取单元,用于提取待检索图像的局部特征;
    特征值计算单元,用于采用预先训练的深度自编码网络模型计算所述局部特征提取单元输出的局部特征的特征值;
    特征值匹配单元,用于将所述特征值计算单元输出的特征值与图像检索数据库中注册图像的特征值进行匹配;
    检索结果生成单元,用于根据所述特征值匹配单元输出的匹配结果选择满足预设条件的注册图像作为所述待检索图像的检索结果。
  18. 根据权利要求17所述的图像检索装置,其特征在于,当所述特征值为二值化特征值时,所述特征值匹配单元具体用于,采用基于汉明距离的方式进行匹配,并将汉明距离小于预设阈值的特征值对作为匹配成功的特征值对。
  19. 根据权利要求17所述的图像检索装置,其特征在于,包括:模型训练单元,用于预先训练所述深度自编码网络模型;
    所述模型训练单元包括:
    样本选择子单元,用于选择样本图像集;
    样本特征提取子单元,用于提取所述样本图像集中的样本图像的局部特征;
    迭代训练子单元,用于以所述局部特征作为输入,以深度自编码网络模型对输入数据进行编解码后的重建误差最小为目标,进行迭代训练直至所述深度自编码网络模型收敛。
  20. 根据权利要求17所述的图像检索装置,其特征在于,包括:数据库建立单元,用于预先建立所述图像检索数据库;
    所述数据库建立单元包括:
    注册图像选择子单元,用于选取用于构建所述图像检索数据库的注册图像;
    注册图像特征提取子单元,用于提取所述注册图像的局部特征;
    注册图像特征值计算子单元,用于利用所述深度自编码网络模型计算所述局部特征的特征值;
    特征值注册子单元,用于将所述特征值存储在图像检索数据库中,并建立所述特征值与注册图像之间的对应关系。
  21. 根据权利要求20所述的图像检索装置,其特征在于,所述数据库建立单元还包括:
    特征值筛选子单元,用于根据特征值的分布,对所述注册图像特征值计算子单元计算得到的特征值进行筛选;
    所述特征值注册子单元具体用于,将所述特征值筛选子单元筛选后的特征值存储在图像检索数据库中,并建立所述特征值与注册图像之间的对应关系。
  22. 根据权利要求17所述的图像检索装置,其特征在于,所述装置包括:
    距离计算单元,用于计算所述局部特征对应的关键点到所述待检索图像中心的距离;
    局部特征剔除单元,用于剔除所述距离计算单元计算得到的距离大于预设阈值的关键点对应的局部特征;
    所述特征值计算单元具体用于,采用所述深度自编码网络模型计算由所述局部特征剔除单元执行剔除操作后的局部特征的特征值。
  23. 根据权利要求17所述的图像检索装置,其特征在于,所述检索结果生成单元包括:
    注册图像初选子单元,用于根据所述特征值匹配单元输出的匹配结果选择满足预设条件的注册图像;
    重排匹配子单元,用于针对所述注册图像初选子单元所选的每个注册图像,将所述待检索图像的特征值与从图像检索数据库中提取的注册图像特征值进行两两匹配,并记录满足预设重排匹配条件的特征值对的个数;
    重排筛选子单元,用于根据所述重排匹配子单元记录的特征值对的个数对所选注册图像进行排序,并从中选择排序靠前的注册图像,作为所述待检索图像的检索结果。
  24. 根据权利要求23所述的图像检索装置,其特征在于,所述检索结果生成单元还包括:
    空间一致性校验子单元,用于通过利用变换模型进行空间关系一致性校验,从所述重排匹配子单元得到的特征值对中剔除误匹配的特征值对;
    所述重排筛选子单元具体用于,根据所述空间一致性校验子单元执行剔除操作后的、满足预设重排匹配条件的特征值对的个数对所选注册图像进行排序。
  25. 根据权利要求17所述的图像检索装置,其特征在于,所述局部特征提取单元以及所述特征值计算单元部署于客户端设备上;
    所述特征值匹配单元以及所述检索结果生成单元部署于服务端设备上。
  26. 一种获取图像信息的方法,其特征在于,包括:
    提取待识别图像的局部特征;
    采用预先训练的深度自编码网络模型计算所述局部特征的特征值;
    将所述特征值发送给提供图像识别服务的服务端;
    接收所述服务端返回的所述待识别图像的相关信息。
  27. 根据权利要求26所述的获取图像信息的方法,其特征在于,所述特征值包括二值化特征值。
  28. 根据权利要求26所述的获取图像信息的方法,其特征在于,在所述提取待识别图像的局部特征之后,执行下述操作:
    计算所述局部特征对应的关键点到所述待识别图像中心的距离;
    剔除所述距离大于预设阈值的关键点对应的局部特征;
    所述采用预先训练的深度自编码网络模型计算所述局部特征的特征值包括:采用所述深度自编码网络模型计算执行上述剔除操作后的局部特征的特征值。
  29. 根据权利要求26-28任一项所述的获取图像信息的方法,其特征在于,通过如下方式提取待识别图像的局部特征:采用SIFT算法、采用LBP算法或者利用卷积神经网络。
  30. 根据权利要求26所述的获取图像信息的方法,其特征在于,所述方法在移动终端设备上实施。
  31. 一种获取图像信息的装置,其特征在于,包括:
    局部特征提取单元,用于提取待识别图像的局部特征;
    特征值计算单元,用于采用预先训练的深度自编码网络模型计算所述局部特征提取单元输出的局部特征的特征值;
    特征值发送单元,用于将所述特征值计算单元输出的特征值发送给提供图像识别服务的服务端;
    图像信息接收单元,用于接收所述服务端返回的所述待识别图像的相关信息。
  32. 一种图像识别方法,其特征在于,包括:
    接收客户端上传的待识别图像的特征值,所述特征值是以所述待识别图像的局部特征为输入,利用预先训练的深度自编码网络模型计算得到的;
    将所述特征值与图像检索数据库中注册图像的特征值进行匹配;
    根据匹配结果选择满足预设条件的注册图像;
    获取与所选注册图像对应的注册信息,并返回给所述客户端。
  33. 根据权利要求32所述的图像识别方法,其特征在于,所述特征值包括:二值化特征值。
  34. 根据权利要求33所述的图像识别方法,其特征在于,所述将所述特征值与图像检索数据库中注册图像的特征值进行匹配包括:采用基于汉明距离的方式进行匹配,并将汉明距离小于预设阈值的特征值对作为匹配成功的特征值对。
  35. 根据权利要求32所述的图像识别方法,其特征在于,所述图像检索数据库是采用如下步骤预先建立的:
    选取用于构建所述图像检索数据库的注册图像;
    提取所述注册图像的局部特征;
    利用所述深度自编码网络模型计算所述局部特征的特征值;
    将所述特征值存储在图像检索数据库中,并建立所述特征值与注册图像之间的对应关系。
  36. 根据权利要求35所述的图像识别方法,其特征在于,在利用所述深度自编码网络模型计算所述局部特征的特征值之后,执行下述步骤:
    按照计算得到的特征值的分布,对特征值进行筛选;
    所述将所述特征值存储在图像检索数据库中包括:将筛选后的特征值存储在图像检索数据库中。
  37. 根据权利要求36所述的图像识别方法,其特征在于,所述按照计算得到的特征值的分布,对特征值进行筛选包括:
    选择在注册图像中出现频率低于预设阈值的特征值;和/或,
    按照特征值在注册图像中的位置分布选择特征值。
  38. 根据权利要求32所述的图像识别方法,其特征在于,在所述根据匹配结果选择满足预设条件的注册图像后,执行下述重排操作:
    针对每个所选注册图像,将所述待识别图像的特征值与从图像检索数据库中提取的注册图像特征值进行两两匹配,并记录满足预设重排匹配条件的特征值对的个数;
    根据满足预设重排匹配条件的特征值对的个数对所选注册图像进行排序,并从中选择排序靠前的注册图像;
    所述根据匹配结果选择满足预设条件的注册图像作为所述待识别图像的检索结果包括:将执行上述重排操作后所选注册图像作为所述待识别图像的检索结果。
  39. 一种图像识别装置,其特征在于,包括:
    特征值接收单元,用于接收客户端上传的待识别图像的特征值,所述特征值是以所述待识别图像的局部特征为输入,利用预先训练的深度自编码网络模型计算得到的;
    特征值匹配单元,用于将所述特征值接收单元接收到的特征值与图像检索数据库中注册图像的特征值进行匹配;
    注册图像选择单元,用于根据匹配结果选择满足预设条件的注册图像;
    图像信息发送单元,用于获取与所选注册图像对应的注册信息,并返回给所述客户端。
  40. 一种图像识别系统,其特征在于,包括:根据权利要求31所述的获取图像信息的装置,以及根据权利要求39所述的图像识别装置。
  41. 一种用于计算图像特征值的方法,其特征在于,包括:
    提取待计算特征值图像的局部特征;
    采用预先训练的深度自编码网络模型计算所述局部特征的特征值。
  42. 根据权利要求41所述的用于计算图像特征值的方法,其特征在于,所述特征值包括:二值化特征值。
  43. 根据权利要求41所述的用于计算图像特征值的方法,其特征在于,预先训练所述深度自编码网络模型,包括:
    选择样本图像集;
    提取所述样本图像集中的样本图像的局部特征;
    以所述局部特征作为输入,以深度自编码网络模型对输入数据进行编解码后的重建误差最小为目标,进行迭代训练直至所述深度自编码网络模型收敛。
  44. 根据权利要求41-43任一项所述的用于计算图像特征值的方法,其特征在于,通过如下方式提取待计算特征值图像的局部特征:采用SIFT算法、采用LBP算法或者利用卷积神经网络。
  45. 一种用于计算图像特征值的装置,其特征在于,包括:
    局部特征提取单元,用于提取待计算特征值图像的局部特征;
    特征值计算单元,用于采用预先训练的深度自编码网络模型计算所述局部特征提取单元输出的局部特征的特征值。
  46. 一种电子设备,其特征在于,包括:
    显示器;
    处理器;
    存储器,用于存储获取图像信息的程序,所述程序在被所述处理器读取执行时,执行如下操作:提取待识别图像的局部特征;采用预先训练的深度自编码网络模型计算所述局部特征的二值化特征值;将所述二值化特征值发送给提供图像识别服务的服务端;接收所述服务端返回的所述待识别图像的相关信息。
PCT/CN2016/091519 2015-08-06 2016-07-25 图像检索、获取图像信息及图像识别方法、装置及系统 WO2017020741A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510475003.5A CN106445939B (zh) 2015-08-06 2015-08-06 图像检索、获取图像信息及图像识别方法、装置及系统
CN201510475003.5 2015-08-06

Publications (1)

Publication Number Publication Date
WO2017020741A1 (zh)

Family

ID=57942389

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/091519 WO2017020741A1 (zh) 2015-08-06 2016-07-25 图像检索、获取图像信息及图像识别方法、装置及系统

Country Status (2)

Country Link
CN (1) CN106445939B (zh)
WO (1) WO2017020741A1 (zh)


Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169425A (zh) * 2017-04-26 2017-09-15 深圳美云智数科技有限公司 一种商品属性的识别方法及装置
US11354822B2 (en) * 2017-05-16 2022-06-07 Google Llc Stop code tolerant image compression neural networks
CN107239793B (zh) * 2017-05-17 2020-01-17 清华大学 多量化深度二值特征学习方法及装置
CN108805244A (zh) * 2017-07-23 2018-11-13 宁波亿诺维信息技术有限公司 二维码生成系统
CN110147459B (zh) * 2017-07-28 2021-08-20 杭州海康威视数字技术股份有限公司 一种图像检索方法、装置及电子设备
CN110019896B (zh) * 2017-07-28 2021-08-13 杭州海康威视数字技术股份有限公司 一种图像检索方法、装置及电子设备
CN107918636B (zh) * 2017-09-07 2021-05-18 苏州飞搜科技有限公司 一种人脸快速检索方法、系统
CN108090117B (zh) 2017-11-06 2019-03-19 北京三快在线科技有限公司 一种图像检索方法及装置,电子设备
CN110309336B (zh) * 2018-03-12 2023-08-08 腾讯科技(深圳)有限公司 图像检索方法、装置、系统、服务器以及存储介质
CN108921065A (zh) * 2018-06-21 2018-11-30 北京陌上花科技有限公司 建立特征数据库的方法和装置
CN109472279B (zh) * 2018-08-31 2020-02-07 杭州千讯智能科技有限公司 基于图像处理的物品识别方法和装置
CN110956190A (zh) * 2018-09-27 2020-04-03 深圳云天励飞技术有限公司 图像识别方法及装置、计算机装置和计算机可读存储介质
CN111078924B (zh) * 2018-10-18 2024-03-01 深圳云天励飞技术有限公司 图像检索方法、装置、终端及存储介质
CN109885709B (zh) * 2019-01-08 2022-12-23 五邑大学 一种基于自编码预降维的图像检索方法、装置和存储介质
CN109784295B (zh) * 2019-01-25 2020-12-25 佳都新太科技股份有限公司 视频流特征识别方法、装置、设备及存储介质
CN110069664B (zh) * 2019-04-24 2021-04-06 北京博视未来科技有限公司 动漫作品封面图提取方法及其系统
CN110298163B (zh) * 2019-06-06 2021-04-02 重庆大学 一种图像验证方法、装置及计算机可读存储介质
CN110287883A (zh) * 2019-06-26 2019-09-27 山东浪潮人工智能研究院有限公司 一种基于改进最近邻距离比值法进行人脸识别的方法
CN110738236B (zh) * 2019-09-16 2022-07-22 深圳市国信合成科技有限公司 图像匹配方法、装置、计算机设备及存储介质
CN110633384B (zh) * 2019-09-19 2022-05-17 哈尔滨工业大学(深圳) 基于汗孔和多图匹配的高分辨率指纹检索方法、装置、系统及存储介质
CN111125412A (zh) * 2019-12-25 2020-05-08 珠海迈科智能科技股份有限公司 一种基于特征的图像匹配方法及系统
CN111339343A (zh) * 2020-02-12 2020-06-26 腾讯科技(深圳)有限公司 图像检索方法、装置、存储介质及设备
CN111553372B (zh) * 2020-04-24 2023-08-08 北京搜狗科技发展有限公司 一种训练图像识别网络、图像识别搜索的方法及相关装置
CN111832494B (zh) * 2020-07-17 2024-03-05 中国联合网络通信集团有限公司 信息存储方法及设备
CN112529018A (zh) * 2020-12-22 2021-03-19 北京百度网讯科技有限公司 图像局部特征的训练方法、装置及存储介质
CN113254687B (zh) * 2021-06-28 2021-09-17 腾讯科技(深圳)有限公司 图像检索、图像量化模型训练方法、装置和存储介质
WO2023272659A1 (zh) * 2021-06-30 2023-01-05 东莞市小精灵教育软件有限公司 封面图像的识别方法、装置、存储介质及识别设备
CN113591937B (zh) * 2021-07-09 2023-09-26 国家电网有限公司 一种基于局部距离编码的电力系统关键节点识别方法
CN113806577A (zh) * 2021-08-24 2021-12-17 浙江大华技术股份有限公司 一种图像搜索方法、装置、存储介质和电子设备

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101030222A (zh) * 2007-03-22 2007-09-05 华为技术有限公司 模型检索装置及方法
CN103488664A (zh) * 2013-05-03 2014-01-01 中国传媒大学 一种图像检索方法
CN104679863A (zh) * 2015-02-28 2015-06-03 武汉烽火众智数字技术有限责任公司 一种基于深度学习的以图搜图方法和系统

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156464B (zh) * 2014-08-20 2018-04-27 中国科学院重庆绿色智能技术研究院 基于微视频特征数据库的微视频检索方法及装置
CN104239897B (zh) * 2014-09-04 2017-05-17 天津大学 一种基于自编码器词袋的视觉特征表示方法
CN104778671B (zh) * 2015-04-21 2017-09-22 重庆大学 一种基于sae和稀疏表示的图像超分辨率方法


Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273458A (zh) * 2017-06-01 2017-10-20 百度在线网络技术(北京)有限公司 深度模型训练方法及装置、图像检索方法及装置
EP3821402A4 (en) * 2018-05-04 2022-01-19 Beijing Ling Technology Co., Ltd. METHODS OF BOOK RECOGNITION AND BOOK READING DEVICE
WO2019210677A1 (en) 2018-05-04 2019-11-07 Beijing Ling Technology Co., Ltd. Method for Book Recognition and Book Reading Device
CN110880005A (zh) * 2018-09-05 2020-03-13 阿里巴巴集团控股有限公司 向量索引建立方法及装置和向量检索方法及装置
CN110880005B (zh) * 2018-09-05 2023-06-23 阿里巴巴集团控股有限公司 向量索引建立方法及装置和向量检索方法及装置
CN110069644A (zh) * 2019-04-24 2019-07-30 南京邮电大学 一种基于深度学习的压缩域大规模图像检索方法
CN110069644B (zh) * 2019-04-24 2023-06-06 南京邮电大学 一种基于深度学习的压缩域大规模图像检索方法
CN110348289A (zh) * 2019-05-27 2019-10-18 广州中国科学院先进技术研究所 一种基于二值图的手指静脉识别方法
CN110275970A (zh) * 2019-06-21 2019-09-24 北京达佳互联信息技术有限公司 图像检索的方法、装置、服务器及存储介质
CN111026896B (zh) * 2019-11-15 2023-09-01 浙江大华技术股份有限公司 特征值存储、处理方法、设备及存储装置
CN111026896A (zh) * 2019-11-15 2020-04-17 浙江大华技术股份有限公司 特征值存储、处理方法、设备及存储装置
CN111008210B (zh) * 2019-11-18 2023-08-11 浙江大华技术股份有限公司 商品识别方法、装置、编解码器及存储装置
CN111008210A (zh) * 2019-11-18 2020-04-14 浙江大华技术股份有限公司 商品识别方法、装置、编解码器及存储装置
CN111062478A (zh) * 2019-12-18 2020-04-24 天地伟业技术有限公司 基于神经网络的特征压缩算法
CN111159456A (zh) * 2019-12-30 2020-05-15 云南大学 基于深度学习与传统特征的多尺度服装检索方法及系统
CN111159456B (zh) * 2019-12-30 2022-09-06 云南大学 基于深度学习与传统特征的多尺度服装检索方法及系统
CN111159443B (zh) * 2019-12-31 2022-03-25 深圳云天励飞技术股份有限公司 一种图像特征值的搜索方法、装置及电子设备
CN111159443A (zh) * 2019-12-31 2020-05-15 深圳云天励飞技术有限公司 一种图像特征值的搜索方法、装置及电子设备
CN111753690A (zh) * 2020-06-15 2020-10-09 神思电子技术股份有限公司 一种菜品托盘识别方法及基于该方法的菜品识别方法
CN111753690B (zh) * 2020-06-15 2023-11-07 神思电子技术股份有限公司 一种菜品托盘识别方法及基于该方法的菜品识别方法
CN111985616B (zh) * 2020-08-13 2023-08-08 沈阳东软智能医疗科技研究院有限公司 一种图像特征提取方法、图像检索方法、装置及设备
CN111985616A (zh) * 2020-08-13 2020-11-24 沈阳东软智能医疗科技研究院有限公司 一种图像特征提取方法、图像检索方法、装置及设备
CN112347885B (zh) * 2020-10-27 2023-06-23 西安科技大学 一种基于自编码网络的铁谱图像智能识别方法
CN112347885A (zh) * 2020-10-27 2021-02-09 西安科技大学 一种基于自编码网络的铁谱图像智能识别方法
CN112257662A (zh) * 2020-11-12 2021-01-22 安徽大学 一种基于深度学习的压力足迹图像检索系统
CN113204665B (zh) * 2021-04-28 2023-09-22 北京百度网讯科技有限公司 图像检索方法、装置、电子设备及计算机可读存储介质
CN113204665A (zh) * 2021-04-28 2021-08-03 北京百度网讯科技有限公司 图像检索方法、装置、电子设备及计算机可读存储介质
CN113254702A (zh) * 2021-05-28 2021-08-13 浙江大华技术股份有限公司 一种视频录像检索的方法及装置
CN113642710A (zh) * 2021-08-16 2021-11-12 北京百度网讯科技有限公司 一种网络模型的量化方法、装置、设备和存储介质
CN113642710B (zh) * 2021-08-16 2023-10-31 北京百度网讯科技有限公司 一种网络模型的量化方法、装置、设备和存储介质
CN114610940B (zh) * 2022-03-15 2023-02-14 华南理工大学 基于局部随机敏感自编码器的哈希图像检索方法
CN114610940A (zh) * 2022-03-15 2022-06-10 华南理工大学 基于局部随机敏感自编码器的哈希图像检索方法
CN114978707A (zh) * 2022-05-24 2022-08-30 深圳市前海研祥亚太电子装备技术有限公司 设备的注册方法及其系统

Also Published As

Publication number Publication date
CN106445939B (zh) 2019-12-13
CN106445939A (zh) 2017-02-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16832229

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16832229

Country of ref document: EP

Kind code of ref document: A1