WO2021068524A1 - Image matching method, apparatus, computer device, and storage medium - Google Patents

Image matching method, apparatus, computer device, and storage medium

Info

Publication number
WO2021068524A1
WO2021068524A1 (PCT/CN2020/093343)
Authority
WO
WIPO (PCT)
Prior art keywords
image
visual
matched
sample
visual word
Prior art date
Application number
PCT/CN2020/093343
Other languages
English (en)
French (fr)
Inventor
张密
韩丙卫
唐文
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021068524A1 publication Critical patent/WO2021068524A1/zh


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of image recognition in the field of artificial intelligence, and in particular to an image matching method, device, computer equipment, and storage medium.
  • the embodiments of the present application provide an image matching method, device, computer equipment, and storage medium to solve the problem of low accuracy of image matching.
  • An image matching method, including:
  • obtaining an image to be matched, performing feature extraction on the image to be matched, and obtaining a depth feature to be matched and a plurality of visual features to be matched of the image to be matched;
  • calculating the feature similarity between the depth feature to be matched of the image to be matched and the sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to form a similar image set;
  • for each visual feature to be matched of the image to be matched, calculating the distance between the visual feature to be matched and each visual word in a preset inverted index table, and determining the visual word with the smallest distance from the visual feature to be matched as the visual word to be matched of that visual feature;
  • forming the visual words to be matched into a visual word set to be matched;
  • calculating the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set, and obtaining an image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to a visual word set composed of the visual words with the smallest distance from the sample visual features in a similar image;
  • forming the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
  • An image matching device, including:
  • a first feature extraction module, configured to obtain an image to be matched, perform feature extraction on the image to be matched, and obtain a depth feature to be matched and a plurality of visual features to be matched of the image to be matched;
  • a feature similarity calculation module, configured to calculate the feature similarity between the depth feature to be matched of the image to be matched and the sample depth feature of each sample image in a preset image depth feature library, and extract the sample images whose feature similarity is greater than a preset similarity threshold to form a similar image set;
  • a visual word to be matched determination module, configured to calculate, for each visual feature to be matched of the image to be matched, the distance between the visual feature to be matched and each visual word in a preset inverted index table, and determine the visual word with the smallest distance from the visual feature to be matched as the visual word to be matched of that visual feature;
  • a first composition module, configured to form the visual words to be matched into a visual word set to be matched;
  • an image co-occurrence ratio calculation module, configured to calculate the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set, and obtain an image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to a visual word set composed of the visual words with the smallest distance from the sample visual features in a similar image;
  • a second composition module, configured to form the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
  • A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
  • obtaining an image to be matched, performing feature extraction on the image to be matched, and obtaining a depth feature to be matched and a plurality of visual features to be matched of the image to be matched;
  • calculating the feature similarity between the depth feature to be matched and the sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to form a similar image set;
  • for each visual feature to be matched, calculating the distance between the visual feature to be matched and each visual word in a preset inverted index table, determining the visual word with the smallest distance from the visual feature to be matched as the visual word to be matched of that visual feature, and forming the visual words to be matched into a visual word set to be matched;
  • calculating the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set, and obtaining an image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to a visual word set composed of the visual words with the smallest distance from the sample visual features in a similar image;
  • forming the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
  • One or more readable storage media storing computer-readable instructions, the readable storage media including non-volatile readable storage media and volatile readable storage media, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to implement the following steps:
  • obtaining an image to be matched, performing feature extraction on the image to be matched, and obtaining a depth feature to be matched and a plurality of visual features to be matched of the image to be matched;
  • calculating the feature similarity between the depth feature to be matched and the sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to form a similar image set;
  • for each visual feature to be matched, calculating the distance between the visual feature to be matched and each visual word in a preset inverted index table, determining the visual word with the smallest distance from the visual feature to be matched as the visual word to be matched of that visual feature, and forming the visual words to be matched into a visual word set to be matched;
  • calculating the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set, and obtaining an image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to a visual word set composed of the visual words with the smallest distance from the sample visual features in a similar image;
  • forming the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
  • In the above image matching method, device, computer equipment, and storage medium, a similar image set resembling the image to be matched is first matched from a large number of sample images through the image depth feature library, and the inverted index table is then used to match, from the similar image set, a matched image group even more similar to the image to be matched, thereby further improving the accuracy of the image matching result.
  • FIG. 1 is a schematic diagram of an application environment of an image matching method in an embodiment of the present application;
  • FIG. 2 is an example diagram of an image matching method in an embodiment of the present application;
  • FIG. 3 is another example diagram of an image matching method in an embodiment of the present application;
  • FIG. 4 is another example diagram of an image matching method in an embodiment of the present application;
  • FIG. 5 is another example diagram of an image matching method in an embodiment of the present application;
  • FIG. 6 is another example diagram of an image matching method in an embodiment of the present application;
  • FIG. 7 is a functional block diagram of an image matching device in an embodiment of the present application;
  • FIG. 8 is another functional block diagram of the image matching device in an embodiment of the present application;
  • FIG. 9 is another functional block diagram of the image matching device in an embodiment of the present application;
  • FIG. 10 is a schematic diagram of a computer device in an embodiment of the present application.
  • the image matching method can be applied to the application environment as shown in FIG. 1.
  • the image matching method is applied to an image matching system.
  • the image matching system includes a client and a server as shown in FIG. 1.
  • the client and the server communicate through a network, and the system is used to solve the problem of low image matching accuracy.
  • the client, also called the user side, refers to the program that corresponds to the server and provides local services to the user.
  • the client can be installed on, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server can be implemented with an independent server or a server cluster composed of multiple servers.
  • an image matching method is provided, and the method is applied to the server in FIG. 1 as an example for description, including the following steps:
  • S10 Obtain a to-be-matched image, perform feature extraction on the to-be-matched image, and obtain a to-be-matched depth feature and a plurality of to-be-matched visual features of the to-be-matched image.
  • the image to be matched refers to the image on which matching is to be performed.
  • for example, the image to be matched may be a car insurance report image; after it is acquired, similar images of the same subject or the same scene need to be matched from a massive number of images.
  • after the image to be matched is acquired, feature extraction is performed on it to obtain the depth feature to be matched and multiple visual features to be matched of the image to be matched.
  • the depth feature to be matched refers to the deep feature of the image to be matched, and the depth feature to be matched is suitable for the matching of similar images.
  • the visual feature to be matched refers to the SIFT feature extracted from the image to be matched.
  • SIFT feature is a local feature of image extracted in scale space.
  • the SIFT feature is suitable for the matching of the same image elements.
  • preferably, in order to improve the matching accuracy and matching efficiency of subsequent images, in this embodiment feature extraction is performed on the image to be matched, 80 visual features to be matched are extracted from it, and each visual feature to be matched is a 128-dimensional vector.
  • performing feature extraction on the image to be matched includes performing visual feature extraction and depth feature extraction on the image to be matched.
  • optionally, ResNet50 can be selected as the feature extraction network, and the output of the final fully connected layer (2048 dimensions) is taken as the depth feature of the image to be matched; that is, a 2048-dimensional vector of the image to be matched represents its depth feature to be matched.
  • the SIFT algorithm or opencv-contrib can be used to extract visual features of the image to be matched to obtain the visual features of the image to be matched.
  • it should be noted that, in this embodiment, depth feature extraction and visual feature extraction on the image to be matched may be performed in either order: visual feature extraction may be performed first and depth feature extraction second, or depth feature extraction first and visual feature extraction second.
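  • To make the two-track extraction above concrete, the following is a minimal Python sketch, assuming PyTorch/torchvision for the ResNet50 depth feature and opencv-contrib for SIFT; the function name extract_features and the 80-keypoint cap are illustrative assumptions, not terms from the patent.

```python
import cv2
import numpy as np
import torch
from torchvision import models, transforms

# ResNet50 backbone; replacing the classifier head with Identity exposes the
# 2048-d pooled vector that the text uses as the depth feature.
resnet = models.resnet50(weights="IMAGENET1K_V1")
resnet.fc = torch.nn.Identity()
resnet.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_features(bgr_image: np.ndarray, n_keypoints: int = 80):
    """Return (2048-d depth feature, (n, 128) SIFT descriptors) for one image."""
    rgb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        depth = resnet(preprocess(rgb).unsqueeze(0)).squeeze(0).numpy()
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create(nfeatures=n_keypoints)  # 128-d local descriptors
    _, descriptors = sift.detectAndCompute(gray, None)
    return depth, descriptors
```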
  • S20 Calculate the feature similarity between the depth feature to be matched of the image to be matched and the sample depth feature of each sample image in the preset image depth feature library, and extract the sample images whose feature similarity is greater than the preset similarity threshold to form a similar image set.
  • the image depth feature library refers to a database that stores a large number of sample images and their corresponding sample depth features. Understandably, each sample image in the image depth feature library corresponds to a unique sample depth feature. Specifically, after the depth feature to be matched of the image to be matched is determined, it is compared one by one with the sample depth feature of each sample image in the preset image depth feature library, and the feature similarity between the depth feature to be matched and each sample depth feature is calculated.
  • optionally, methods such as the cosine similarity algorithm, Euclidean distance, or Manhattan distance can be used to calculate the similarity between the depth feature to be matched of the image to be matched and the sample depth feature of each sample image in the preset image depth feature library, obtaining the feature similarity between the depth feature to be matched and each sample depth feature.
  • the similar image set refers to several sample images whose feature similarity is greater than the similarity threshold selected from the image depth feature library.
  • the similarity threshold refers to the threshold used to evaluate whether the image to be matched and the sample image are similar images.
  • the similarity threshold can be 0.80, 0.85, or 0.90.
  • in this embodiment, the similarity threshold is set to 0.80; that is, the sample images whose feature similarity with the depth feature to be matched of the image to be matched is greater than 0.80 form the similar image set.
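  • As a hedged illustration of this step, the sketch below scores the query depth feature against every entry of a feature library held as a dict; the names find_similar_images and library are assumptions, and the 0.80 threshold follows this embodiment.

```python
import numpy as np

def find_similar_images(query_depth: np.ndarray,
                        library: dict[str, np.ndarray],
                        threshold: float = 0.80) -> dict[str, float]:
    """Return {sample image id: cosine similarity} for every sample image
    whose similarity with the query depth feature exceeds the threshold (S20)."""
    q = query_depth / np.linalg.norm(query_depth)
    similar = {}
    for image_id, feature in library.items():
        sim = float(np.dot(q, feature / np.linalg.norm(feature)))
        if sim > threshold:
            similar[image_id] = sim
    return similar
```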
  • S30 For each visual feature to be matched of the image to be matched, calculate the distance between the visual feature to be matched and each visual word in the preset inverted index table, and determine the visual word with the smallest distance from the visual feature to be matched as the visual word to be matched of that visual feature.
  • the inverted index table refers to an index table, built from a large number of sample images, that contains several visual words and the sample images corresponding to each visual word.
  • a visual word is a carrier that can be used to express image information.
  • specifically, in this embodiment, feature extraction is performed on a large number of acquired sample images to obtain the sample visual features of each sample image, and the sample visual features are then clustered to form the visual words.
  • preferably, when there are many visual words, to facilitate identifying or distinguishing different visual words in the inverted index table, a corresponding word sequence number can be set for each visual word in advance, with each sequence number corresponding to a unique visual word.
  • Arabic numerals can be used to indicate the word sequence number corresponding to each visual word.
  • specifically, the Euclidean distance can be used to calculate the distance between each visual feature to be matched of the image to be matched and each visual word in the preset inverted index table, and the visual word with the smallest distance from each visual feature to be matched is then taken as the visual word to be matched of that feature.
  • understandably, each visual feature to be matched corresponds to one visual word with the smallest distance, so the number of visual words to be matched equals the number of visual features to be matched.
  • in this embodiment, the image to be matched includes 80 visual features to be matched, so 80 visual words to be matched are obtained. It should be noted that the smallest-distance visual words corresponding to multiple visual features to be matched may be the same visual word.
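  • A minimal sketch of this nearest-word assignment, assuming the vocabulary is an (n_words, 128) NumPy array of cluster centres indexed by word sequence number; assign_visual_words is an illustrative name.

```python
import numpy as np

def assign_visual_words(descriptors: np.ndarray,
                        vocabulary: np.ndarray) -> np.ndarray:
    """Map each 128-d SIFT descriptor to the id of its nearest visual word
    under Euclidean distance (S30); returned ids may repeat."""
    # Squared distance via ||a||^2 + ||b||^2 - 2 a.b, which avoids
    # materialising an (n_desc, n_words, 128) array for large vocabularies.
    d2 = (np.sum(descriptors ** 2, axis=1)[:, None]
          + np.sum(vocabulary ** 2, axis=1)[None, :]
          - 2.0 * descriptors @ vocabulary.T)
    return d2.argmin(axis=1)  # one word sequence number per descriptor
```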
  • S40 Form the visual words to be matched into a visual word set to be matched. After the visual words to be matched are obtained according to step S30, they are combined to form the visual word set to be matched of the image to be matched. For example, if 80 visual words to be matched are obtained, the generated visual word set to be matched is a set including 80 visual words to be matched.
  • S50 Calculate the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set, and obtain the image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to the visual word set composed of the visual words with the smallest distance from the sample visual features in the similar image.
  • in this embodiment, the sample images in the image depth feature library and in the inverted index table are the same, and each sample image in the inverted index table has a determined corresponding sample visual word set.
  • it can be seen from step S20 that the similar image set consists of sample images selected from the image depth feature library that meet the set condition; that is, each similar image in the similar image set is contained in the sample images of the inverted index table. Therefore, after the visual word set to be matched of the image to be matched is determined, the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the inverted index table can be calculated directly, obtaining the image co-occurrence ratio value of each similar image and the image to be matched.
  • preferably, to improve the accuracy of the obtained image co-occurrence ratio values, the number of sample visual words contained in the sample visual word set of each similar image is the same as the number of visual words to be matched contained in the visual word set to be matched of the image to be matched.
  • specifically, the visual words to be matched contained in the visual word set to be matched are matched one by one against the sample visual words contained in the sample visual word set of each similar image in the similar image set; the successfully matched sample visual words are determined as the similar visual words of the corresponding similar image, and the proportion of the similar visual words in that image's sample visual word set is then calculated to obtain the image co-occurrence ratio value of the image to be matched and each sample image.
  • for example, if the visual word set to be matched of the image to be matched includes 80 visual words to be matched and the sample visual word set of a similar image includes 80 sample visual words, and after one-to-one matching 64 similar visual words are successfully matched, the image co-occurrence ratio value of the image to be matched and that sample image is 64/80 = 0.8.
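  • The co-occurrence computation reduces to set-membership counting; below is a minimal sketch under the assumption that both word sets are given as lists of word ids of equal length (80 in this embodiment), with an illustrative function name.

```python
def co_occurrence_ratio(words_to_match: list[int],
                        sample_words: list[int]) -> float:
    """Fraction of a similar image's sample visual words that also occur in
    the query's visual word set to be matched (S50/S503)."""
    query_set = set(words_to_match)
    matched = sum(1 for word in sample_words if word in query_set)
    return matched / len(sample_words)

# e.g. 64 of 80 sample words matched -> 64/80 = 0.8, as in the text's example
```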
  • S60 Combine similar images with an image co-occurrence ratio value greater than a preset co-occurrence ratio threshold to form a matching image group.
  • the co-occurrence ratio threshold refers to a threshold used to evaluate whether the image is similar to the image to be matched.
  • the co-occurrence ratio threshold may be 0.80, 0.85, or 0.90.
  • in this embodiment, the co-occurrence ratio threshold is set to 0.80; that is, the similar images whose image co-occurrence ratio value with the image to be matched is greater than 0.80 form the matched image group.
  • the matched image group refers to a group of images, selected from the similar image set using the inverted index table, with higher similarity to the image to be matched; it may include one or more images.
  • the image co-occurrence ratio value of each similar image and the image to be matched is compared with a preset co-occurrence ratio threshold.
  • the similar images whose image co-occurrence ratio value is greater than the co-occurrence ratio threshold are extracted to form a matched image group.
  • in this embodiment, an image to be matched is obtained and feature extraction is performed on it to obtain its depth feature to be matched and multiple visual features to be matched; the feature similarity between the depth feature to be matched and the sample depth feature of each sample image in the preset image depth feature library is calculated, and the sample images whose feature similarity is greater than the preset similarity threshold are extracted to form a similar image set; for each visual feature to be matched, the distance between it and each visual word in the preset inverted index table is calculated, and the visual word with the smallest distance is determined as the visual word to be matched of that feature; the visual words to be matched form the visual word set to be matched; the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set is calculated to obtain the image co-occurrence ratio value of each similar image and the image to be matched; and the similar images whose image co-occurrence ratio value is greater than the preset co-occurrence ratio threshold form a matched image group. A similar image set is thus first matched from a large number of sample images through the image depth feature library, and the inverted index table is then used to match, from the similar image set, a matched image group even more similar to the image to be matched, further improving the accuracy of the image matching result.
  • the image matching method also specifically includes the following steps:
  • S21 Obtain a sample image set, where the sample image set includes a plurality of sample images.
  • the sample image set refers to the image data used to build the inverted index table.
  • the sample image set includes a plurality of sample images.
  • the sample image set may be images collected in real time by the client using its image collection tool, or images collected and saved by the client in advance, or images directly uploaded or sent locally to the client.
  • the client sends the sample image set to the server, and the server obtains the sample image set.
  • S22 Perform feature extraction on each sample image to obtain sample depth features and multiple sample visual features of each sample image.
  • the feature extraction is performed on each sample image, and the sample depth feature and multiple sample visual features of each sample image are obtained.
  • the sample depth feature refers to the deep features of the sample image, and the sample depth feature is suitable for matching similar images.
  • the sample visual features are the SIFT features extracted from the sample images.
  • preferably, in order to improve the matching accuracy and matching efficiency of subsequent images, in this embodiment feature extraction is performed on each sample image, 80 sample visual features are extracted from each sample image, and each sample visual feature is a 128-dimensional vector.
  • the specific method and process of performing feature extraction on each sample image to obtain its sample depth feature and multiple sample visual features are the same as those of step S10, in which feature extraction is performed on the image to be matched to obtain its depth feature to be matched and multiple visual features to be matched, and are not repeated here.
  • S23 Perform clustering processing on the sample visual features of each sample image to generate a visual word dictionary, where the visual word dictionary includes multiple visual words.
  • the visual word dictionary refers to a dictionary library containing several visual words, formed by clustering the sample visual features of each sample image.
  • specifically, the K-Means clustering algorithm can be used to cluster the sample visual features of each sample image, generating multiple cluster centers numbered from 0 to n-1, where each cluster center corresponds to one visual word, thereby generating a visual word dictionary including multiple visual words.
  • preferably, in this embodiment, to improve the matching accuracy of subsequent images, the sample visual features of each sample image are clustered to generate 50,000 cluster centers (each a 128-dimensional vector); that is, the generated visual word dictionary includes 50,000 visual words.
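  • A sketch of this dictionary-building step, assuming scikit-learn; MiniBatchKMeans is swapped in here purely as a practical stand-in for plain K-Means at a 50,000-centre scale, and the function name is illustrative.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_visual_word_dictionary(per_image_descriptors: list[np.ndarray],
                                 n_words: int = 50000) -> np.ndarray:
    """Cluster the pooled 128-d sample SIFT descriptors into n_words cluster
    centres; row i of the result is the visual word with sequence number i."""
    stacked = np.vstack(per_image_descriptors)  # (total_descriptors, 128)
    kmeans = MiniBatchKMeans(n_clusters=n_words, batch_size=10_000,
                             random_state=0).fit(stacked)
    return kmeans.cluster_centers_              # (n_words, 128) vocabulary
```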
  • S24 For each sample visual feature of each sample image, calculate the distance between the sample visual feature and each visual word in the visual word dictionary, and determine the visual word with the smallest distance from the sample visual feature as the target visual word of that sample visual feature of the corresponding sample image. The specific method and process of determining the target visual word in this step are similar to those of determining the visual word to be matched of a visual feature to be matched in step S30, and are not repeated here.
  • S25 Form each target visual word of the sample visual feature into a target visual word set of the corresponding sample image.
  • the target visual word set refers to a word set composed of the visual words with the smallest distance from each sample visual feature of a sample image. Specifically, after the target visual words of the sample visual features are obtained according to step S24, they are combined to form the target visual word set of the corresponding sample image. Understandably, since each sample image includes 80 sample visual features, the obtained target visual word set of each sample image includes 80 target visual words.
  • S26 Based on the target visual word set of each sample image, establish the mapping relationship between each visual word and the corresponding sample images, and generate the inverted index table. Specifically, each visual word contained in the visual word dictionary is used as a primary key; then, according to the target visual words contained in the target visual word set of each sample image, the sample images corresponding to each visual word are determined and used as the primary key value of that visual word, thereby establishing the mapping relationship between each visual word and the corresponding sample images and generating the inverted index table.
  • in this embodiment, a sample image set including multiple sample images is obtained; feature extraction is performed on each sample image to obtain its sample depth feature and multiple sample visual features; the sample visual features of each sample image are clustered to generate a visual word dictionary including multiple visual words; for each sample visual feature of each sample image, the distance between the sample visual feature and each visual word in the visual word dictionary is calculated, and the visual word with the smallest distance is determined as the target visual word of that sample visual feature; the target visual words of the sample visual features form the target visual word set of the corresponding sample image; and based on the target visual word set of each sample image, the mapping relationship between each visual word and the corresponding sample images is established to generate the inverted index table.
  • by converting the sample visual features of the sample images into target visual word sets and establishing the mapping relationship between sample images and visual words to form the inverted index table, subsequent image matching can be performed directly based on the inverted index table.
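  • A minimal sketch of the index construction, assuming each sample image's target visual word set is available as a mapping from image id to word ids (the names are illustrative): each visual word acts as the primary key, and the images containing it form the key's value.

```python
from collections import defaultdict

def build_inverted_index(target_word_sets: dict[str, set[int]]) -> dict[int, set[str]]:
    """Invert per-image target visual word sets into word id -> sample image
    ids (S26/S261-S263): the word is the primary key, the images its value."""
    index: dict[int, set[str]] = defaultdict(set)
    for image_id, words in target_word_sets.items():
        for word in words:
            index[word].add(image_id)
    return dict(index)
```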
  • in an embodiment, the image to be matched includes image information to be matched; after the similar images whose image co-occurrence ratio value is greater than the preset co-occurrence ratio threshold are formed into a matched image group, the image matching method further specifically includes the following steps:
  • S70 Obtain the matching image information of each matching image in the matched image group.
  • the matched image information of the matched image refers to the image-related information carried by the matched image.
  • the matching image information may include the image ID, the acquisition time of the image, the source of the image, or the number of the image, and so on.
  • in a specific embodiment, the matching image information corresponding to different types of matching images may differ.
  • for example, if a matching image is related to a car insurance claim, its matching image information may be the case number, the image acquisition time, the reporter's mobile phone number, the insured, and so on; if a matching image is related to user information verification, its matching image information may be the user ID, the image acquisition time, the user's age, the user's address, and so on.
  • S80 Calculate the similarity between the to-be-matched image information of the to-be-matched image and the matched image information of each matched image to obtain an information similarity value.
  • the image information to be matched refers to the image-related information carried by the image to be matched.
  • the image information to be matched may include the image ID, the acquisition time of the image, the source of the image, or the number of the image, and so on.
  • different types of images to be matched correspond to different image information.
  • the character string matching method can be used to calculate the information similarity between the image information to be matched and the matching image information of each matching image to obtain the information similarity value of each matching image and the image to be matched.
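  • The text only states that a string matching method is used; as one hedged possibility, the sketch below scores two image-information records with difflib's Ratcliff/Obershelp ratio. The field joining and the dict layout are assumptions.

```python
import difflib

def info_similarity(query_info: dict, match_info: dict) -> float:
    """String similarity between the image information to be matched and one
    matching image's information (S80); 1.0 means identical records."""
    keys = sorted(set(query_info) | set(match_info))
    a = "|".join(str(query_info.get(k, "")) for k in keys)
    b = "|".join(str(match_info.get(k, "")) for k in keys)
    return difflib.SequenceMatcher(None, a, b).ratio()
```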
  • S90 Perform statistical analysis on the similarity value of each information, and use the matching image with the largest information similarity value as the target image.
  • the target image refers to the image with the highest similarity to the image to be matched. Specifically, after determining the information similarity value of each matching image and the image to be matched, statistical analysis is performed on the information similarity value of each matching image and the image to be matched, and the matching image with the largest information similarity value is taken as the target image.
  • in this embodiment, the matching image information of each matching image in the matched image group is obtained; the similarity between the image information to be matched of the image to be matched and the matching image information of each matching image is calculated to obtain information similarity values; statistical analysis is performed on the information similarity values, and the matching image with the largest information similarity value is taken as the target image, thereby ensuring the similarity between the generated target image and the image to be matched and further improving the accuracy of image matching.
  • in an embodiment, calculating the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set to obtain the image co-occurrence ratio value of each similar image and the image to be matched specifically includes the following steps:
  • S501 Obtain the visual word set to be matched.
  • S502 Perform one-to-one matching of each sample visual word in the sample visual word set of each similar image with each visual word to be matched in the visual word set to be matched to obtain a matching visual word of each similar image.
  • understandably, since the visual word set to be matched includes several visual words to be matched, and the sample visual word set of each similar image also includes several sample visual words, each sample visual word in the sample visual word set of each similar image needs to be matched one by one with each visual word to be matched in the visual word set to be matched, and the sample visual words that match a visual word to be matched are determined as matching visual words.
  • the regular matching method or the string matching method can be used to match each sample visual word in the sample visual word set of each similar image with each visual word to be matched in the visual word set to be matched, to obtain each Matching visual words of similar images.
  • S503 Calculate the proportion value of the matching visual words of each similar image in the corresponding sample visual word set, and obtain the image co-occurrence ratio value of each sample image and the image to be matched.
  • specifically, after the matching visual words of a similar image are obtained, the number of matching visual words is determined, and the ratio of the number of matching visual words to the number of sample visual words in the corresponding sample visual word set is calculated; by calculating this proportion for each similar image, the image co-occurrence ratio value of each sample image and the image to be matched is obtained.
  • for example, if the visual word set to be matched of an image A to be matched includes {a1, a2, a3, a4 ... a80}, a total of 80 visual words to be matched, and the sample visual word set of a similar image B includes {b1, b2, b3, b4 ... b80}, a total of 80 sample visual words, then after each sample visual word of similar image B is matched one by one against the visual words to be matched of image A and 60 matching visual words are obtained, the proportion of the matching visual words of similar image B in its sample visual word set gives an image co-occurrence ratio value of 60/80 = 0.75 for similar image B and the image to be matched.
  • in this embodiment, the visual word set to be matched is obtained; each sample visual word in the sample visual word set of each similar image is matched one by one with each visual word to be matched in the visual word set to be matched to obtain the matching visual words of each similar image; and the proportion of the matching visual words of each similar image in the corresponding sample visual word set is calculated to obtain the image co-occurrence ratio value of each sample image and the image to be matched, thereby further improving the accuracy of the obtained image co-occurrence ratio values.
  • in an embodiment, based on the target visual word set of each sample image, establishing the mapping relationship between each visual word and the corresponding sample images to generate the inverted index table specifically includes the following steps:
  • S261 Use each visual word as a primary key in the preset index table.
  • the preset index table refers to a table preset for storing visual words and sample images.
  • the preset index table may be an Excel table or the like.
  • each row in the preset index table is preset with a primary key cell and a primary key value cell corresponding to each primary key. Specifically, after determining each visual word, each visual word is first recorded in the primary key grid of each row in the preset index table, that is, each visual word is used as the primary key in the preset index table.
  • S262 Determine a sample image corresponding to each visual word based on the target visual word set of each sample image.
  • the sample image corresponding to each visual word is determined.
  • for example, if the target visual word set of sample image C includes 4 target visual words {a, b, c, d}, the target visual word set of sample image D includes 4 target visual words {a, c, f, h}, and the target visual word set of sample image F includes 4 target visual words {a, b, d, f}, then the sample images corresponding to visual word a are sample images C, D, and F; the sample images corresponding to visual word b are sample images C and F; the sample images corresponding to visual word c are sample images C and D; the sample images corresponding to visual word d are sample images C and F; the sample images corresponding to visual word f are sample images D and F; and the sample image corresponding to visual word h is sample image D.
  • S263 Use the sample image corresponding to each visual word as the primary key value of the corresponding visual word, and generate an inverted index table.
  • specifically, after the sample images corresponding to each visual word are determined, each sample image is recorded in the primary key value cell of the corresponding visual word; that is, the sample images corresponding to each visual word are used as the primary key value of that visual word, thereby generating the inverted index table. Understandably, after the inverted index table is generated, the corresponding visual words can be found directly from a sample image.
  • each visual word is used as the primary key in the preset index table; the sample image corresponding to each visual word is determined based on the target visual word set of each sample image; the sample corresponding to each visual word is determined The image is used as the primary key value of the corresponding visual word to generate an inverted index table; thus, the accuracy of the corresponding relationship between the sample image and the visual word is ensured, and the accuracy of subsequent image matching is improved.
  • an image matching device is provided, and the image matching device corresponds to the image matching method in the above-mentioned embodiment in a one-to-one correspondence.
  • the image matching device includes a first feature extraction module 10, a feature similarity calculation module 20, a to-be-matched visual word determination module 30, a first composition module 40, an image co-occurrence ratio calculation module 50, and a second composition module 60.
  • the detailed description of each functional module is as follows:
  • the first feature extraction module 10 is configured to obtain an image to be matched, perform feature extraction on the image to be matched, and obtain the depth feature to be matched and a plurality of visual features to be matched of the image to be matched;
  • the feature similarity calculation module 20 is used to calculate the feature similarity between the to-be-matched depth feature of the image to be matched and the sample depth feature of each sample image in the preset image depth feature library, and extract the feature similarity greater than the predetermined Set the sample images of similarity threshold to form a set of similar images;
  • the visual word to be matched determining module 30 is used to calculate, for each visual feature to be matched of the image to be matched, the distance between the visual feature to be matched and each visual word in the preset inverted index table, and to determine the visual word with the smallest distance from the visual feature to be matched as the visual word to be matched of that visual feature;
  • the first composition module 40 is used to compose the visual words to be matched into a visual word set to be matched;
  • the image co-occurrence ratio calculation module 50 is used to calculate the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set to obtain the image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to the visual word set composed of the visual words with the smallest distance from the sample visual features in the similar images;
  • the second composing module 60 is used for composing similar images whose co-occurrence ratio value is greater than a preset co-occurrence ratio threshold to form a matching image group.
  • the image matching device further includes:
  • the sample image set obtaining module 21 is used to obtain a sample image set, and the sample image set includes a plurality of sample images;
  • the second feature extraction module 22 is configured to perform feature extraction on each sample image to obtain sample depth features and multiple sample visual features of each sample image;
  • the clustering processing module 23 is used to perform clustering processing on each sample visual feature of each sample image to generate a visual word dictionary, the visual word dictionary including a plurality of visual words;
  • the target visual word determination module 24 is used to calculate, for each sample visual feature of each sample image, the distance between the sample visual feature and each visual word in the visual word dictionary, and to determine the visual word with the smallest distance from the sample visual feature as the target visual word of that sample visual feature of the corresponding sample image;
  • the third composition module 25 is used to compose each target visual word of the sample visual feature into a corresponding target visual word set of the sample image;
  • the inverted index table generating module 26 is used to establish a mapping relationship between each visual word and the corresponding sample image based on the target visual word set of each sample image, and generate an inverted index table.
  • the image matching device further includes:
  • the matching image information obtaining module 70 is used to obtain the matching image information of each matching image in the matching image group;
  • the similarity calculation module 80 is used to calculate the similarity between the to-be-matched image information of the image to be matched and the matching image information of each matched image to obtain the information similarity value;
  • the statistical analysis module 90 is used to perform statistical analysis on the similarity value of each information, and use the matching image with the largest information similarity value as the target image.
  • the image co-occurrence ratio calculation module 50 includes:
  • the visual word set acquisition unit to be matched is used to acquire the visual word set to be matched
  • the matching unit is used to match each sample visual word in the sample visual word set of each similar image with each visual word to be matched in the visual word set to be matched to obtain the matching visual word of each similar image;
  • the proportion value calculation unit is used to calculate the proportion value of the matching visual words of each similar image in the corresponding sample visual word set, and obtain the image co-occurrence ratio value of each sample image and the image to be matched.
  • the inverted index table generating module 26 includes:
  • the primary key determining unit is used to use each visual word as the primary key in the preset index table
  • the sample image unit is used to determine the sample image corresponding to each visual word based on the target visual word set of each sample image
  • the primary key value unit is used to use the sample image corresponding to each visual word as the primary key value of the corresponding visual word to generate an inverted index table.
  • Each module in the above-mentioned image matching device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a readable storage medium and an internal memory.
  • the readable storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer readable instructions in the readable storage medium.
  • the database of the computer device is used to store the data used in the image matching method in the foregoing embodiment.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by the processor to realize an image matching method.
  • the readable storage medium provided in this embodiment may be a non-volatile readable storage medium or a volatile readable storage medium.
  • a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
  • obtaining an image to be matched, performing feature extraction on the image to be matched, and obtaining a depth feature to be matched and a plurality of visual features to be matched of the image to be matched;
  • calculating the feature similarity between the depth feature to be matched and the sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to form a similar image set;
  • for each visual feature to be matched, calculating the distance between the visual feature to be matched and each visual word in a preset inverted index table, determining the visual word with the smallest distance from the visual feature to be matched as the visual word to be matched of that visual feature, and forming the visual words to be matched into a visual word set to be matched;
  • calculating the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set, and obtaining an image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to a visual word set composed of the visual words with the smallest distance from the sample visual features in a similar image;
  • forming the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
  • one or more readable storage media storing computer-readable instructions are provided, the readable storage media including non-volatile readable storage media and volatile readable storage media, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to execute the following steps:
  • obtaining an image to be matched, performing feature extraction on the image to be matched, and obtaining a depth feature to be matched and a plurality of visual features to be matched of the image to be matched;
  • calculating the feature similarity between the depth feature to be matched and the sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to form a similar image set;
  • for each visual feature to be matched, calculating the distance between the visual feature to be matched and each visual word in a preset inverted index table, determining the visual word with the smallest distance from the visual feature to be matched as the visual word to be matched of that visual feature, and forming the visual words to be matched into a visual word set to be matched;
  • calculating the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set, and obtaining an image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to a visual word set composed of the visual words with the smallest distance from the sample visual features in a similar image;
  • forming the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
  • a person of ordinary skill in the art can understand that all or part of the processes of the above method embodiments can be implemented by instructing the relevant hardware through computer-readable instructions. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium, and when executed, may include the flows of the embodiments of the above methods.
  • any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An image matching method, apparatus, computer device, and storage medium. The method includes: obtaining an image to be matched and performing feature extraction on it to obtain a depth feature to be matched and multiple visual features to be matched of the image (S10); calculating the feature similarity between the depth feature to be matched and the sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to form a similar image set (S20); for each visual feature to be matched, calculating the distance between it and each visual word in a preset inverted index table, and determining the visual word with the smallest distance as the visual word to be matched of that feature (S30); forming the visual words to be matched into the visual word set to be matched of the image (S40); calculating the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set to obtain the image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to the visual word set composed of the visual words with the smallest distance from the sample visual features in the similar image (S50); and forming the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group (S60). The method improves the accuracy of image matching results.

Description

Image matching method, apparatus, computer device, and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on October 11, 2019, with application number 201910964148.X and the invention title "Image matching method, apparatus, computer device, and storage medium", the entire content of which is incorporated herein by reference.
TECHNICAL FIELD
This application relates to the field of image recognition within artificial intelligence, and in particular to an image matching method, apparatus, computer device, and storage medium.
BACKGROUND
With the rapid development of Internet technology, digital information such as sound, images, video, and animation has expanded dramatically, and images, as media information with rich content and intuitive presentation, are applied in more and more technical fields. The inventors realized that, with the sharp growth in the number of images, how to match the target image a user needs from a large number of images has become an important problem to be solved urgently in the image field. Most traditional image matching techniques are based on textual descriptions of images, but different people's understanding of the same image content often differs considerably and is subjective, so image matching based on textual descriptions has low accuracy and cannot meet the needs of many practical applications.
SUMMARY
The embodiments of this application provide an image matching method, apparatus, computer device, and storage medium, to solve the problem of low image matching accuracy.
An image matching method, including:
obtaining an image to be matched, performing feature extraction on the image to be matched, and obtaining a depth feature to be matched and a plurality of visual features to be matched of the image to be matched;
calculating the feature similarity between the depth feature to be matched of the image to be matched and the sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to form a similar image set;
for each visual feature to be matched of the image to be matched, calculating the distance between the visual feature to be matched and each visual word in a preset inverted index table, and determining the visual word with the smallest distance from the visual feature to be matched as the visual word to be matched of that visual feature;
forming the visual words to be matched into a visual word set to be matched;
calculating the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set, and obtaining an image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to a visual word set composed of the visual words with the smallest distance from the sample visual features in a similar image;
forming the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
An image matching apparatus, including:
a first feature extraction module, configured to obtain an image to be matched, perform feature extraction on the image to be matched, and obtain a depth feature to be matched and a plurality of visual features to be matched of the image to be matched;
a feature similarity calculation module, configured to calculate the feature similarity between the depth feature to be matched of the image to be matched and the sample depth feature of each sample image in a preset image depth feature library, and extract the sample images whose feature similarity is greater than a preset similarity threshold to form a similar image set;
a visual word to be matched determination module, configured to calculate, for each visual feature to be matched of the image to be matched, the distance between the visual feature to be matched and each visual word in a preset inverted index table, and determine the visual word with the smallest distance from the visual feature to be matched as the visual word to be matched of that visual feature;
a first composition module, configured to form the visual words to be matched into a visual word set to be matched;
an image co-occurrence ratio calculation module, configured to calculate the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set, and obtain an image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to a visual word set composed of the visual words with the smallest distance from the sample visual features in a similar image;
a second composition module, configured to form the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
obtaining an image to be matched, performing feature extraction on the image to be matched, and obtaining a depth feature to be matched and a plurality of visual features to be matched of the image to be matched;
calculating the feature similarity between the depth feature to be matched of the image to be matched and the sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to form a similar image set;
for each visual feature to be matched of the image to be matched, calculating the distance between the visual feature to be matched and each visual word in a preset inverted index table, determining the visual word with the smallest distance from the visual feature to be matched as the visual word to be matched of that visual feature, and forming the visual words to be matched into a visual word set to be matched;
calculating the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set, and obtaining an image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to a visual word set composed of the visual words with the smallest distance from the sample visual features in a similar image;
forming the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
One or more readable storage media storing computer-readable instructions, the readable storage media including non-volatile readable storage media and volatile readable storage media, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to implement the following steps:
obtaining an image to be matched, performing feature extraction on the image to be matched, and obtaining a depth feature to be matched and a plurality of visual features to be matched of the image to be matched;
calculating the feature similarity between the depth feature to be matched of the image to be matched and the sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to form a similar image set;
for each visual feature to be matched of the image to be matched, calculating the distance between the visual feature to be matched and each visual word in a preset inverted index table, determining the visual word with the smallest distance from the visual feature to be matched as the visual word to be matched of that visual feature, and forming the visual words to be matched into a visual word set to be matched;
calculating the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set, and obtaining an image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to a visual word set composed of the visual words with the smallest distance from the sample visual features in a similar image;
forming the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
In the above image matching method, apparatus, computer device, and storage medium, a similar image set resembling the image to be matched is first matched from a large number of sample images through the image depth feature library, and the inverted index table is then used to match, from the similar image set, a matched image group even more similar to the image to be matched, thereby further improving the accuracy of the image matching result.
Details of one or more embodiments of this application are set forth in the following drawings and description, and other features and advantages of this application will become apparent from the specification, drawings, and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application environment of an image matching method in an embodiment of this application;
FIG. 2 is an example diagram of an image matching method in an embodiment of this application;
FIG. 3 is another example diagram of an image matching method in an embodiment of this application;
FIG. 4 is another example diagram of an image matching method in an embodiment of this application;
FIG. 5 is another example diagram of an image matching method in an embodiment of this application;
FIG. 6 is another example diagram of an image matching method in an embodiment of this application;
FIG. 7 is a functional block diagram of an image matching apparatus in an embodiment of this application;
FIG. 8 is another functional block diagram of an image matching apparatus in an embodiment of this application;
FIG. 9 is another functional block diagram of an image matching apparatus in an embodiment of this application;
FIG. 10 is a schematic diagram of a computer device in an embodiment of this application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are some rather than all of the embodiments of this application. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of this application.
The image matching method provided by the embodiments of this application can be applied in the application environment shown in FIG. 1. Specifically, the image matching method is applied in an image matching system that includes a client and a server as shown in FIG. 1; the client and the server communicate through a network, and the system is used to solve the problem of low image matching accuracy. The client, also called the user side, refers to the program that corresponds to the server and provides local services to the user. The client can be installed on, but not limited to, various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices. The server can be implemented as an independent server or a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, an image matching method is provided. Taking the method applied to the server in FIG. 1 as an example, it includes the following steps:
S10: Obtain an image to be matched, perform feature extraction on the image to be matched, and obtain a depth feature to be matched and multiple visual features to be matched of the image to be matched.
The image to be matched refers to the image on which matching is to be performed. For example, the image to be matched may be a car insurance report image; after it is acquired, similar images of the same subject or the same scene need to be matched from a massive number of images. After the image to be matched is acquired, feature extraction is performed on it to obtain its depth feature to be matched and multiple visual features to be matched. The depth feature to be matched refers to a deep feature of the image to be matched and is suitable for matching similar images. The visual features to be matched refer to the SIFT features extracted from the image to be matched; a SIFT feature is a local image feature extracted in scale space and is suitable for matching identical image elements. Preferably, in order to improve the matching accuracy and efficiency of subsequent images, in this embodiment feature extraction is performed on the image to be matched and 80 visual features to be matched are extracted, each being a 128-dimensional vector.
Specifically, performing feature extraction on the image to be matched includes performing visual feature extraction and depth feature extraction on it. Optionally, ResNet50 can be selected as the feature extraction network, and the output of the final fully connected layer (2048 dimensions) is taken as the depth feature; that is, a 2048-dimensional vector of the image to be matched represents its depth feature to be matched. In addition, the SIFT algorithm or opencv-contrib can be used to extract the visual features to be matched of the image to be matched.
It should be noted that, in this embodiment, depth feature extraction and visual feature extraction on the image to be matched may be performed in either order: visual feature extraction may be performed first and depth feature extraction second, or vice versa.
S20: Calculate the feature similarity between the depth feature to be matched of the image to be matched and the sample depth feature of each sample image in the preset image depth feature library, and extract the sample images whose feature similarity is greater than the preset similarity threshold to form a similar image set.
The image depth feature library refers to a database storing a large number of sample images and their corresponding sample depth features. Understandably, each sample image in the image depth feature library corresponds to a unique sample depth feature. Specifically, after the depth feature to be matched of the image to be matched is determined, it is compared one by one with the sample depth feature of each sample image in the preset image depth feature library, and the feature similarity between the depth feature to be matched and each sample depth feature is calculated. Optionally, methods such as the cosine similarity algorithm, Euclidean distance, or Manhattan distance can be used to calculate this similarity, obtaining the feature similarity between the depth feature to be matched and the sample depth feature of each sample image in the preset image depth feature library.
Further, after the feature similarity (for example, the cosine similarity) between the depth feature to be matched and each sample depth feature is determined, the sample images whose feature similarity is greater than the preset similarity threshold are extracted to form a similar image set. The similar image set refers to the sample images selected from the image depth feature library whose feature similarity is greater than the similarity threshold. The similarity threshold is used to evaluate whether the image to be matched and a sample image are similar images, and may be 0.80, 0.85, or 0.90. In this embodiment, the similarity threshold is set to 0.80; that is, the sample images whose feature similarity with the depth feature to be matched of the image to be matched is greater than 0.80 form the similar image set.
S30: For each visual feature to be matched of the image to be matched, calculate the distance between the visual feature to be matched and each visual word in the preset inverted index table, and determine the visual word with the smallest distance from the visual feature to be matched as the visual word to be matched of that visual feature.
The inverted index table refers to an index table, built from a large number of sample images, that contains several visual words and the sample images corresponding to each visual word. A visual word is a carrier that can be used to express image information. Specifically, in this embodiment, feature extraction is performed on a large number of acquired sample images to obtain the sample visual features of each sample image, and the sample visual features are then clustered to form the visual words. Preferably, when there are many visual words, to facilitate identifying or distinguishing different visual words in the inverted index table, a corresponding word sequence number can be set for each visual word in advance, with each sequence number corresponding to a unique visual word. Preferably, Arabic numerals can be used as the word sequence numbers.
Specifically, the Euclidean distance can be used to calculate the distance between each visual feature to be matched of the image to be matched and each visual word in the preset inverted index table, and the visual word with the smallest distance from each visual feature to be matched is then taken as the visual word to be matched of that feature. Understandably, each visual feature to be matched corresponds to one visual word with the smallest distance, so the number of visual words to be matched equals the number of visual features to be matched. In this embodiment, the image to be matched includes 80 visual features to be matched, so 80 visual words to be matched are obtained. It should be noted that the smallest-distance visual words corresponding to multiple visual features to be matched may be the same visual word.
S40: Form the visual words to be matched into a visual word set to be matched.
After the visual words to be matched are obtained according to step S30, they are combined to form the visual word set to be matched of the image to be matched. For example, if 80 visual words to be matched are obtained, the generated visual word set to be matched is a set including 80 visual words to be matched.
S50: Calculate the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set, and obtain the image co-occurrence ratio value of each similar image and the image to be matched, where the sample visual word set refers to the visual word set composed of the visual words with the smallest distance from the sample visual features in the similar image.
In this embodiment, the sample images in the image depth feature library and in the inverted index table are the same, and each sample image in the inverted index table has a determined corresponding sample visual word set. It can be seen from step S20 that the similar image set consists of sample images selected from the image depth feature library that meet the set condition; that is, each similar image in the similar image set is contained in the sample images of the inverted index table. Therefore, after the visual word set to be matched of the image to be matched is determined, the image co-occurrence ratio between it and the sample visual word set of each similar image in the inverted index table can be calculated directly, obtaining the image co-occurrence ratio value of each similar image and the image to be matched. Preferably, to improve the accuracy of the obtained values, the number of sample visual words in each similar image's sample visual word set is the same as the number of visual words to be matched in the visual word set to be matched of the image to be matched.
Specifically, the visual words to be matched contained in the visual word set to be matched are matched one by one against the sample visual words contained in the sample visual word set of each similar image in the similar image set; the successfully matched sample visual words are determined as the similar visual words of the corresponding similar image, and the proportion of the similar visual words in that image's sample visual word set is calculated to obtain the image co-occurrence ratio value of the image to be matched and each sample image. For example, if the visual word set to be matched includes 80 visual words to be matched and a similar image's sample visual word set includes 80 sample visual words, and after one-to-one matching 64 similar visual words are successfully matched, the image co-occurrence ratio value of the image to be matched and that sample image is 64/80 = 0.8.
S60: Form the similar images whose image co-occurrence ratio value is greater than the preset co-occurrence ratio threshold into a matched image group.
The co-occurrence ratio threshold is used to evaluate whether a similar image is similar to the image to be matched, and may optionally be 0.80, 0.85, or 0.90. In this embodiment, the co-occurrence ratio threshold is set to 0.80; that is, the similar images whose image co-occurrence ratio value with the image to be matched is greater than 0.80 form the matched image group. The matched image group refers to a group of images, selected from the similar image set using the inverted index table, with higher similarity to the image to be matched; it may include one or more images.
Specifically, after the image co-occurrence ratio value of each similar image and the image to be matched is determined, it is compared one by one with the preset co-occurrence ratio threshold, and the similar images whose image co-occurrence ratio value is greater than the co-occurrence ratio threshold are extracted to form the matched image group.
In this embodiment, an image to be matched is obtained and feature extraction is performed on it to obtain its depth feature to be matched and multiple visual features to be matched; the feature similarity between the depth feature to be matched and the sample depth feature of each sample image in the preset image depth feature library is calculated, and the sample images whose feature similarity is greater than the preset similarity threshold are extracted to form a similar image set; for each visual feature to be matched, the distance between it and each visual word in the preset inverted index table is calculated, and the visual word with the smallest distance is determined as the visual word to be matched of that feature; the visual words to be matched form the visual word set to be matched; the image co-occurrence ratio between the visual word set to be matched and the sample visual word set of each similar image in the similar image set is calculated to obtain the image co-occurrence ratio value of each similar image and the image to be matched; and the similar images whose image co-occurrence ratio value is greater than the preset co-occurrence ratio threshold form a matched image group. A similar image set is thus first matched from a large number of sample images through the image depth feature library, and the inverted index table is then used to match, from the similar image set, a matched image group even more similar to the image to be matched, further improving the accuracy of the image matching result.
In an embodiment, as shown in FIG. 3, before calculating, for each visual feature to be matched of the image to be matched, the distance between the visual feature to be matched and each visual word in the preset inverted index table, the image matching method further specifically includes the following steps:
S21: Obtain a sample image set, where the sample image set includes multiple sample images.
The sample image set refers to the image data used to build the inverted index table and includes multiple sample images. Optionally, the sample image set may be images collected in real time by the client using its image collection tool, images collected and saved by the client in advance, or images directly uploaded or sent locally to the client. The client sends the sample image set to the server, and the server thus obtains the sample image set.
S22: Perform feature extraction on each sample image to obtain a sample depth feature and multiple sample visual features of each sample image.
Feature extraction is performed on each sample image to obtain its sample depth feature and multiple sample visual features. The sample depth feature refers to a deep feature of the sample image and is suitable for matching similar images; the sample visual features are the SIFT features extracted from the sample image. Preferably, in order to improve the matching accuracy and efficiency of subsequent images, in this embodiment 80 sample visual features are extracted from each sample image, each being a 128-dimensional vector.
Specifically, the method and process of performing feature extraction on each sample image to obtain its sample depth feature and multiple sample visual features are the same as those of step S10 for the image to be matched, and are not repeated here.
S23: Perform clustering processing on the sample visual features of each sample image to generate a visual word dictionary, where the visual word dictionary includes multiple visual words.
The visual word dictionary refers to a dictionary library containing several visual words, formed by clustering the sample visual features of each sample image. Specifically, the K-Means clustering algorithm can be used to cluster the sample visual features of each sample image, generating multiple cluster centers numbered from 0 to n-1, where each cluster center corresponds to one visual word, thereby generating a visual word dictionary including multiple visual words. Preferably, in this embodiment, to improve the matching accuracy of subsequent images, the sample visual features are clustered into 50,000 cluster centers (each a 128-dimensional vector); that is, the generated visual word dictionary includes 50,000 visual words.
S24: For each sample visual feature of each sample image, calculate the distance between the sample visual feature and each visual word in the visual word dictionary, and determine the visual word with the smallest distance from the sample visual feature as the target visual word of that sample visual feature of the corresponding sample image.
Specifically, the method and process of determining the target visual word of a sample visual feature in this step are similar to those of determining the visual word to be matched of a visual feature to be matched in step S30, and are not repeated here.
S25: Form the target visual words of the sample visual features into the target visual word set of the corresponding sample image.
The target visual word set refers to a word set composed of the visual words with the smallest distance from each sample visual feature of a sample image. Specifically, after the target visual words are obtained according to step S24, they are combined to form the target visual word set of the corresponding sample image. Understandably, since each sample image includes 80 sample visual features, the obtained target visual word set of each sample image includes 80 target visual words.
S26:基于每一样本图像的目标视觉单词集,建立每一视觉单词与对应的样本图像之间的映射关系,生成倒排索引表。
具体地,将视觉单词词典中所包含的每一视觉单词作为主键,再根据每一样本图像的目标视觉单词集所包含的目标视觉单词,确定每一视觉单词对应的样本图像,并将每一视觉单词对应的样本图像作为对应的视觉单词的主键值,从而建立每一视觉单词与对应的样本图像之间的映射关系,生成倒排索引表。
In this embodiment, a sample image set including multiple sample images is acquired; feature extraction is performed on each sample image to obtain the sample depth feature and multiple sample visual features of each sample image; clustering processing is performed on each sample visual feature of each sample image to generate a visual word dictionary including multiple visual words; for each sample visual feature of each sample image, the distance between the sample visual feature of each sample image and each visual word in the visual word dictionary is calculated, and the visual word with the smallest distance to the sample visual feature of the sample image is determined as the target visual word of the sample visual feature of the corresponding sample image; the target visual words of the sample visual features are composed into the target visual word set of the corresponding sample image; and, based on the target visual word set of each sample image, a mapping relationship between each visual word and the corresponding sample images is established to generate the inverted index table. By converting the sample visual features of the sample images into target visual word sets and establishing the mapping relationship between sample images and visual words to form the inverted index table, subsequent image matching can be performed directly according to the inverted index table.
In an embodiment, as shown in FIG. 4, the image to be matched includes image information to be matched, and after the step of composing the similar images whose image co-occurrence ratio value is greater than the preset co-occurrence ratio threshold into the matched image group, the image matching method further specifically includes the following steps:
S70: Acquire the matched image information of each matched image in the matched image group.
Here, the matched image information of a matched image refers to image-related information carried by the matched image. For example, the matched image information may include an image ID, the acquisition time of the image, the source of the image, or the number of the image. In a specific embodiment, the matched image information corresponding to different types of matched images may differ. For example, if the matched image is an image related to an auto insurance claim, the matched image information of the matched image may be the case number, the acquisition time of the image, the reporting mobile phone number, the insured person, and so on. If the matched image is an image related to user information verification, the matched image information of the matched image may be the user ID, the acquisition time of the image, the user's age, the user's address, and so on.
S80: Calculate the similarity between the image information to be matched of the image to be matched and the matched image information of each matched image, to obtain information similarity values.
Here, the image information to be matched refers to image-related information carried by the image to be matched. Likewise, the image information to be matched may include an image ID, the acquisition time of the image, the source of the image, or the number of the image. In a specific embodiment, the image information corresponding to different types of images to be matched differs. Specifically, a string matching method may be used to calculate the information similarity between the image information to be matched of the image to be matched and the matched image information of each matched image, to obtain the information similarity value between each matched image and the image to be matched.
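By way of illustration only, one plausible string matching method is difflib's SequenceMatcher; the embodiment does not name a specific algorithm, so this choice, and the serialized information format shown, are assumptions of this sketch:

    from difflib import SequenceMatcher

    def info_similarity(query_info, matched_info):
        # query_info / matched_info: image information serialized as strings,
        # e.g. "case_no|acquisition_time|phone|insured" (illustrative format).
        # ratio() returns a similarity in [0, 1].
        return SequenceMatcher(None, query_info, matched_info).ratio()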
S90: Perform statistical analysis on each information similarity value, and take the matched image with the largest information similarity value as the target image.
Here, the target image refers to the image with the highest similarity to the image to be matched. Specifically, after the information similarity value between each matched image and the image to be matched is determined, statistical analysis is performed on the information similarity value between each matched image and the image to be matched, and the matched image with the largest information similarity value is taken as the target image.
In this embodiment, the matched image information of each matched image in the matched image group is acquired; the similarity between the image information to be matched of the image to be matched and the matched image information of each matched image is calculated to obtain information similarity values; and statistical analysis is performed on each information similarity value, with the matched image with the largest information similarity value taken as the target image; this guarantees the similarity between the generated target image and the image to be matched, thereby further improving the accuracy of image matching.
In an embodiment, as shown in FIG. 5, the step of calculating the image co-occurrence ratio between the set of visual words to be matched and the sample visual word set of each similar image in the similar image set in the inverted index table, to obtain the image co-occurrence ratio value between each similar image and the image to be matched, specifically includes the following steps:
S501: Acquire the set of visual words to be matched.
S502: Match each sample visual word in the sample visual word set of each similar image one by one against each visual word to be matched in the set of visual words to be matched, to obtain the matching visual words of each similar image.
It can be understood that, since the set of visual words to be matched includes a number of visual words to be matched and the sample visual word set of each similar image also includes a number of sample visual words, each sample visual word in the sample visual word set of each similar image needs to be matched one by one against each visual word to be matched in the set of visual words to be matched, and the visual words that match the visual words to be matched are determined as the matching visual words. Specifically, a regular-expression matching method or a string matching method may be used to match each sample visual word in the sample visual word set of each similar image one by one against each visual word to be matched in the set of visual words to be matched, to obtain the matching visual words of each similar image.
S503: Calculate the proportion of the matching visual words of each similar image in the corresponding sample visual word set, to obtain the image co-occurrence ratio value between each sample image and the image to be matched.
Specifically, after the matching visual words of a similar image are obtained, the number of matching visual words of the similar image is determined, and the ratio of the number of matching visual words of the similar image to the number of sample visual words in the corresponding sample visual word set is computed; by calculating the proportion of the matching visual words of each similar image in the corresponding sample visual word set, the image co-occurrence ratio value between each sample image and the image to be matched is obtained.
For example, if the set of visual words to be matched of image to be matched A includes the 80 visual words to be matched {a1, a2, a3, a4, ..., a80}, and the sample visual word set of similar image B includes the 80 sample visual words {b1, b2, b3, b4, ..., b80}, and 60 matching visual words of similar image B are obtained after each sample visual word in the sample visual word set of similar image B is matched one by one against each visual word to be matched in the set of visual words to be matched of image A, then the proportion of the matching visual words of similar image B in the corresponding sample visual word set is calculated, and the image co-occurrence ratio value between similar image B and the image to be matched is 60/80 = 0.75.
In this embodiment, the set of visual words to be matched is acquired; each sample visual word in the sample visual word set of each similar image is matched one by one against each visual word to be matched in the set of visual words to be matched, to obtain the matching visual words of each similar image; and the proportion of the matching visual words of each similar image in the corresponding sample visual word set is calculated, to obtain the image co-occurrence ratio value between each sample image and the image to be matched; this further improves the accuracy of the obtained image co-occurrence ratio values between each sample image and the image to be matched.
In an embodiment, as shown in FIG. 6, the step of establishing, based on the target visual word set of each sample image, a mapping relationship between each visual word and the corresponding sample images to generate the inverted index table specifically includes the following steps:
S261: Use each visual word as a primary key in a preset index table.
Here, the preset index table refers to a pre-created table used to store the visual words and the sample images. Optionally, the preset index table may be an Excel table or the like. In a specific embodiment, each row of the preset index table is pre-provided with a primary key cell and a primary key value cell corresponding to each primary key. Specifically, after each visual word is determined, each visual word is first recorded in the primary key cell of a row of the preset index table; that is, each visual word is used as a primary key in the preset index table.
S262: Based on the target visual word set of each sample image, determine the sample images corresponding to each visual word.
Specifically, the sample images corresponding to each visual word are determined based on the target visual word set of each sample image. For example, if the target visual word set of sample image C includes the 4 target visual words {a, b, c, d}, the target visual word set of sample image D includes the 4 target visual words {a, c, f, h}, and the target visual word set of sample image F includes the 4 target visual words {a, b, d, f}, then the sample images corresponding to visual word a are sample image C, sample image D, and sample image F; the sample images corresponding to visual word b are sample image C and sample image F; the sample images corresponding to visual word c are sample image C and sample image D; the sample images corresponding to visual word d are sample image C and sample image F; the sample images corresponding to visual word f are sample image D and sample image F; and the sample image corresponding to visual word h is sample image D.
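Running the earlier build_inverted_index sketch on this example data reproduces the mapping (the image IDs and word labels are the illustrative ones above):

    index = build_inverted_index({
        "C": {"a", "b", "c", "d"},
        "D": {"a", "c", "f", "h"},
        "F": {"a", "b", "d", "f"},
    })
    # index["a"] == {"C", "D", "F"}, index["b"] == {"C", "F"},
    # index["c"] == {"C", "D"},      index["d"] == {"C", "F"},
    # index["f"] == {"D", "F"},      index["h"] == {"D"}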
S263: Use the sample images corresponding to each visual word as the primary key values of the corresponding visual word, to generate the inverted index table.
Specifically, after the sample images corresponding to each visual word are determined, each of those sample images is recorded in the primary key value cell of the corresponding visual word; that is, the sample images corresponding to each visual word are used as the primary key values of the corresponding visual word, thereby generating the inverted index table. It can be understood that, after the inverted index table is generated, the sample images corresponding to a visual word can be looked up directly from that visual word.
In this embodiment, each visual word is used as a primary key in the preset index table; the sample images corresponding to each visual word are determined based on the target visual word set of each sample image; and the sample images corresponding to each visual word are used as the primary key values of the corresponding visual word, to generate the inverted index table; this guarantees the accuracy of the correspondence between sample images and visual words, and improves the accuracy of subsequent image matching.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
In an embodiment, an image matching device is provided, the image matching device corresponding one-to-one to the image matching method in the foregoing embodiments. As shown in FIG. 7, the image matching device includes a first feature extraction module 10, a feature similarity calculation module 20, a to-be-matched visual word determining module 30, a first composing module 40, an image co-occurrence ratio calculation module 50, and a second composing module 60. The functional modules are described in detail as follows:
a first feature extraction module 10, configured to acquire an image to be matched and perform feature extraction on the image to be matched, to obtain a depth feature to be matched and multiple visual features to be matched of the image to be matched;
a feature similarity calculation module 20, configured to calculate the feature similarity between the depth feature to be matched of the image to be matched and the sample depth feature of each sample image in a preset image depth feature library, and to extract the sample images whose feature similarity is greater than a preset similarity threshold to compose a similar image set;
a to-be-matched visual word determining module 30, configured to calculate, for each visual feature to be matched of the image to be matched, the distance between the visual feature to be matched and each visual word in a preset inverted index table, and to determine the visual word with the smallest distance to the visual feature to be matched as the visual word to be matched of the visual feature to be matched;
a first composing module 40, configured to compose the visual words to be matched into a set of visual words to be matched;
an image co-occurrence ratio calculation module 50, configured to calculate the image co-occurrence ratio between the set of visual words to be matched and the sample visual word set of each similar image in the similar image set, to obtain the image co-occurrence ratio value between each similar image and the image to be matched, wherein the sample visual word set refers to the visual word set composed of the visual words with the smallest distance to the sample visual features in the similar image; and
a second composing module 60, configured to compose the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
Preferably, as shown in FIG. 8, the image matching device further includes:
a sample image set acquisition module 21, configured to acquire a sample image set, the sample image set including multiple sample images;
a second feature extraction module 22, configured to perform feature extraction on each sample image, to obtain a sample depth feature and multiple sample visual features of each sample image;
a clustering processing module 23, configured to perform clustering processing on each sample visual feature of each sample image, to generate a visual word dictionary including multiple visual words;
a target visual word determining module 24, configured to calculate, for each sample visual feature of each sample image, the distance between the sample visual feature of each sample image and each visual word in the visual word dictionary, and to determine the visual word with the smallest distance to the sample visual feature of the sample image as the target visual word of the sample visual feature of the corresponding sample image;
a third composing module 25, configured to compose the target visual words of the sample visual features into the target visual word set of the corresponding sample image; and
an inverted index table generation module 26, configured to establish, based on the target visual word set of each sample image, a mapping relationship between each visual word and the corresponding sample images, to generate an inverted index table.
Preferably, as shown in FIG. 9, the image matching device further includes:
a matched image information acquisition module 70, configured to acquire the matched image information of each matched image in the matched image group;
a similarity calculation module 80, configured to calculate the similarity between the image information to be matched of the image to be matched and the matched image information of each matched image, to obtain information similarity values; and
a statistical analysis module 90, configured to perform statistical analysis on each information similarity value, and to take the matched image with the largest information similarity value as the target image.
Preferably, the image co-occurrence ratio calculation module 50 includes:
a to-be-matched visual word set acquisition unit, configured to acquire the set of visual words to be matched;
a matching unit, configured to match each sample visual word in the sample visual word set of each similar image one by one against each visual word to be matched in the set of visual words to be matched, to obtain the matching visual words of each similar image; and
a proportion calculation unit, configured to calculate the proportion of the matching visual words of each similar image in the corresponding sample visual word set, to obtain the image co-occurrence ratio value between each sample image and the image to be matched.
Preferably, the inverted index table generation module 26 includes:
a primary key determining unit, configured to use each visual word as a primary key in a preset index table;
a sample image determining unit, configured to determine, based on the target visual word set of each sample image, the sample images corresponding to each visual word; and
a primary key value unit, configured to use the sample images corresponding to each visual word as the primary key values of the corresponding visual word, to generate the inverted index table.
For specific limitations on the image matching device, reference may be made to the above limitations on the image matching method, which are not repeated here. Each module in the above image matching device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the above modules.
In an embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the readable storage medium. The database of the computer device is configured to store the data used in the image matching method of the foregoing embodiments. The network interface of the computer device is configured to communicate with an external terminal through a network connection. The computer-readable instructions, when executed by the processor, implement an image matching method. The readable storage medium provided in this embodiment may be a non-volatile readable storage medium or a volatile readable storage medium.
In an embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
acquiring an image to be matched, and performing feature extraction on the image to be matched, to obtain a depth feature to be matched and multiple visual features to be matched of the image to be matched;
calculating the feature similarity between the depth feature to be matched of the image to be matched and the sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to compose a similar image set;
for each visual feature to be matched of the image to be matched, calculating the distance between the visual feature to be matched and each visual word in a preset inverted index table, determining the visual word with the smallest distance to the visual feature to be matched as the visual word to be matched of the visual feature to be matched, and composing the visual words to be matched into a set of visual words to be matched;
calculating the image co-occurrence ratio between the set of visual words to be matched and the sample visual word set of each similar image in the similar image set, to obtain the image co-occurrence ratio value between each similar image and the image to be matched, where the sample visual word set refers to the visual word set composed of the visual words with the smallest distance to the sample visual features in the similar image; and
composing the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
In an embodiment, one or more readable storage media storing computer-readable instructions are provided, the readable storage media including non-volatile readable storage media and volatile readable storage media, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring an image to be matched, and performing feature extraction on the image to be matched, to obtain a depth feature to be matched and multiple visual features to be matched of the image to be matched;
calculating the feature similarity between the depth feature to be matched of the image to be matched and the sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to compose a similar image set;
for each visual feature to be matched of the image to be matched, calculating the distance between the visual feature to be matched and each visual word in a preset inverted index table, determining the visual word with the smallest distance to the visual feature to be matched as the visual word to be matched of the visual feature to be matched, and composing the visual words to be matched into a set of visual words to be matched;
calculating the image co-occurrence ratio between the set of visual words to be matched and the sample visual word set of each similar image in the similar image set, to obtain the image co-occurrence ratio value between each similar image and the image to be matched, where the sample visual word set refers to the visual word set composed of the visual words with the smallest distance to the sample visual features in the similar image; and
composing the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing relevant hardware through computer-readable instructions; the computer-readable instructions may be stored in a non-volatile computer-readable storage medium or a volatile readable storage medium, and the computer-readable instructions, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is used only as an example; in practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only used to illustrate the technical solutions of this application, not to limit them; although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments, or make equivalent replacements for some of the technical features therein; and these modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of this application, and should all be included within the scope of protection of this application.

Claims (20)

  1. An image matching method, comprising:
    acquiring an image to be matched, and performing feature extraction on the image to be matched, to obtain a depth feature to be matched and multiple visual features to be matched of the image to be matched;
    calculating a feature similarity between the depth feature to be matched of the image to be matched and a sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to compose a similar image set;
    for each visual feature to be matched of the image to be matched, calculating a distance between the visual feature to be matched and each visual word in a preset inverted index table, determining the visual word with the smallest distance to the visual feature to be matched as the visual word to be matched of the visual feature to be matched, and composing the visual words to be matched into a set of visual words to be matched;
    calculating an image co-occurrence ratio between the set of visual words to be matched and a sample visual word set of each similar image in the similar image set, to obtain an image co-occurrence ratio value between each similar image and the image to be matched, wherein the sample visual word set refers to the visual word set composed of the visual words with the smallest distance to the sample visual features in the similar image; and
    composing the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
  2. The image matching method according to claim 1, wherein before the calculating, for each visual feature to be matched of the image to be matched, the distance between the visual feature to be matched and each visual word in the preset inverted index table, the image matching method further comprises:
    acquiring a sample image set, the sample image set including multiple sample images;
    performing feature extraction on each sample image, to obtain a sample depth feature and multiple sample visual features of each sample image;
    performing clustering processing on each sample visual feature of each sample image, to generate a visual word dictionary, the visual word dictionary including multiple visual words;
    for each sample visual feature of each sample image, calculating a distance between the sample visual feature of each sample image and each visual word in the visual word dictionary, and determining the visual word with the smallest distance to the sample visual feature of the sample image as the target visual word of the sample visual feature of the corresponding sample image;
    composing the target visual words of the sample visual features into a target visual word set of the corresponding sample image; and
    establishing, based on the target visual word set of each sample image, a mapping relationship between each visual word and the corresponding sample images, to generate the inverted index table.
  3. The image matching method according to claim 1, wherein the image to be matched includes image information to be matched, and after the composing the similar images whose image co-occurrence ratio value is greater than the preset co-occurrence ratio threshold into the matched image group, the image matching method further comprises:
    acquiring matched image information of each matched image in the matched image group;
    calculating a similarity between the image information to be matched of the image to be matched and the matched image information of each matched image, to obtain information similarity values; and
    performing statistical analysis on each information similarity value, and taking the matched image with the largest information similarity value as a target image.
  4. The image matching method according to claim 1, wherein the calculating the image co-occurrence ratio between the set of visual words to be matched and the sample visual word set of each similar image, in the similar image set, in the inverted index table, to obtain the image co-occurrence ratio value between each similar image and the image to be matched, comprises:
    acquiring the set of visual words to be matched;
    matching each sample visual word in the sample visual word set of each similar image one by one against each visual word to be matched in the set of visual words to be matched, to obtain the matching visual words of each similar image; and
    calculating the proportion of the matching visual words of each similar image in the corresponding sample visual word set, to obtain the image co-occurrence ratio value between each sample image and the image to be matched.
  5. The image matching method according to claim 2, wherein the establishing, based on the target visual word set of each sample image, the mapping relationship between each visual word and the corresponding sample images, to generate the inverted index table, comprises:
    using each visual word as a primary key in a preset index table;
    determining, based on the target visual word set of each sample image, the sample images corresponding to each visual word; and
    using the sample images corresponding to each visual word as the primary key values of the corresponding visual word, to generate the inverted index table.
  6. An image matching device, comprising:
    a first feature extraction module, configured to acquire an image to be matched and perform feature extraction on the image to be matched, to obtain a depth feature to be matched and multiple visual features to be matched of the image to be matched;
    a feature similarity calculation module, configured to calculate a feature similarity between the depth feature to be matched of the image to be matched and a sample depth feature of each sample image in a preset image depth feature library, and to extract the sample images whose feature similarity is greater than a preset similarity threshold to compose a similar image set;
    a to-be-matched visual word determining module, configured to calculate, for each visual feature to be matched of the image to be matched, a distance between the visual feature to be matched and each visual word in a preset inverted index table, and to determine the visual word with the smallest distance to the visual feature to be matched as the visual word to be matched of the visual feature to be matched;
    a first composing module, configured to compose the visual words to be matched into a set of visual words to be matched;
    an image co-occurrence ratio calculation module, configured to calculate an image co-occurrence ratio between the set of visual words to be matched and a sample visual word set of each similar image in the similar image set, to obtain an image co-occurrence ratio value between each similar image and the image to be matched, wherein the sample visual word set refers to the visual word set composed of the visual words with the smallest distance to the sample visual features in the similar image; and
    a second composing module, configured to compose the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
  7. The image matching device according to claim 6, wherein the image matching device further comprises:
    a sample image set acquisition module, configured to acquire a sample image set, the sample image set including multiple sample images;
    a second feature extraction module, configured to perform feature extraction on each sample image, to obtain a sample depth feature and multiple sample visual features of each sample image;
    a clustering processing module, configured to perform clustering processing on each sample visual feature of each sample image, to generate a visual word dictionary, the visual word dictionary including multiple visual words;
    a target visual word determining module, configured to calculate, for each sample visual feature of each sample image, a distance between the sample visual feature of each sample image and each visual word in the visual word dictionary, and to determine the visual word with the smallest distance to the sample visual feature of the sample image as the target visual word of the sample visual feature of the corresponding sample image;
    a third composing module, configured to compose the target visual words of the sample visual features into a target visual word set of the corresponding sample image; and
    an inverted index table generation module, configured to establish, based on the target visual word set of each sample image, a mapping relationship between each visual word and the corresponding sample images, to generate the inverted index table.
  8. The image matching device according to claim 6, wherein the image matching device further comprises:
    a matched image information acquisition module, configured to acquire matched image information of each matched image in the matched image group;
    a similarity calculation module, configured to calculate a similarity between the image information to be matched of the image to be matched and the matched image information of each matched image, to obtain information similarity values; and
    a statistical analysis module, configured to perform statistical analysis on each information similarity value, and to take the matched image with the largest information similarity value as a target image.
  9. The image matching device according to claim 6, wherein the image co-occurrence ratio calculation module comprises:
    a to-be-matched visual word set acquisition unit, configured to acquire the set of visual words to be matched;
    a matching unit, configured to match each sample visual word in the sample visual word set of each similar image one by one against each visual word to be matched in the set of visual words to be matched, to obtain the matching visual words of each similar image; and
    a proportion calculation unit, configured to calculate the proportion of the matching visual words of each similar image in the corresponding sample visual word set, to obtain the image co-occurrence ratio value between each sample image and the image to be matched.
  10. The image matching device according to claim 7, wherein the inverted index table generation module comprises:
    a primary key unit, configured to use each visual word as a primary key in a preset index table;
    a determining unit, configured to determine, based on the target visual word set of each sample image, the sample images corresponding to each visual word; and
    an inverted index table generation unit, configured to use the sample images corresponding to each visual word as the primary key values of the corresponding visual word, to generate the inverted index table.
  11. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
    acquiring an image to be matched, and performing feature extraction on the image to be matched, to obtain a depth feature to be matched and multiple visual features to be matched of the image to be matched;
    calculating a feature similarity between the depth feature to be matched of the image to be matched and a sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to compose a similar image set;
    for each visual feature to be matched of the image to be matched, calculating a distance between the visual feature to be matched and each visual word in a preset inverted index table, determining the visual word with the smallest distance to the visual feature to be matched as the visual word to be matched of the visual feature to be matched, and composing the visual words to be matched into a set of visual words to be matched;
    calculating an image co-occurrence ratio between the set of visual words to be matched and a sample visual word set of each similar image in the similar image set, to obtain an image co-occurrence ratio value between each similar image and the image to be matched, wherein the sample visual word set refers to the visual word set composed of the visual words with the smallest distance to the sample visual features in the similar image; and
    composing the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
  12. The computer device according to claim 11, wherein before the calculating, for each visual feature to be matched of the image to be matched, the distance between the visual feature to be matched and each visual word in the preset inverted index table, the processor further implements the following steps when executing the computer-readable instructions:
    acquiring a sample image set, the sample image set including multiple sample images;
    performing feature extraction on each sample image, to obtain a sample depth feature and multiple sample visual features of each sample image;
    performing clustering processing on each sample visual feature of each sample image, to generate a visual word dictionary, the visual word dictionary including multiple visual words;
    for each sample visual feature of each sample image, calculating a distance between the sample visual feature of each sample image and each visual word in the visual word dictionary, and determining the visual word with the smallest distance to the sample visual feature of the sample image as the target visual word of the sample visual feature of the corresponding sample image;
    composing the target visual words of the sample visual features into a target visual word set of the corresponding sample image; and
    establishing, based on the target visual word set of each sample image, a mapping relationship between each visual word and the corresponding sample images, to generate the inverted index table.
  13. The computer device according to claim 11, wherein after the composing the similar images whose image co-occurrence ratio value is greater than the preset co-occurrence ratio threshold into the matched image group, the processor further implements the following steps when executing the computer-readable instructions:
    acquiring matched image information of each matched image in the matched image group;
    calculating a similarity between the image information to be matched of the image to be matched and the matched image information of each matched image, to obtain information similarity values; and
    performing statistical analysis on each information similarity value, and taking the matched image with the largest information similarity value as a target image.
  14. The computer device according to claim 11, wherein the calculating the image co-occurrence ratio between the set of visual words to be matched and the sample visual word set of each similar image, in the similar image set, in the inverted index table, to obtain the image co-occurrence ratio value between each similar image and the image to be matched, comprises:
    acquiring the set of visual words to be matched;
    matching each sample visual word in the sample visual word set of each similar image one by one against each visual word to be matched in the set of visual words to be matched, to obtain the matching visual words of each similar image; and
    calculating the proportion of the matching visual words of each similar image in the corresponding sample visual word set, to obtain the image co-occurrence ratio value between each sample image and the image to be matched.
  15. The computer device according to claim 12, wherein the establishing, based on the target visual word set of each sample image, the mapping relationship between each visual word and the corresponding sample images, to generate the inverted index table, comprises:
    using each visual word as a primary key in a preset index table;
    determining, based on the target visual word set of each sample image, the sample images corresponding to each visual word; and
    using the sample images corresponding to each visual word as the primary key values of the corresponding visual word, to generate the inverted index table.
  16. One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring an image to be matched, and performing feature extraction on the image to be matched, to obtain a depth feature to be matched and multiple visual features to be matched of the image to be matched;
    calculating a feature similarity between the depth feature to be matched of the image to be matched and a sample depth feature of each sample image in a preset image depth feature library, and extracting the sample images whose feature similarity is greater than a preset similarity threshold to compose a similar image set;
    for each visual feature to be matched of the image to be matched, calculating a distance between the visual feature to be matched and each visual word in a preset inverted index table, determining the visual word with the smallest distance to the visual feature to be matched as the visual word to be matched of the visual feature to be matched, and composing the visual words to be matched into a set of visual words to be matched;
    calculating an image co-occurrence ratio between the set of visual words to be matched and a sample visual word set of each similar image in the similar image set, to obtain an image co-occurrence ratio value between each similar image and the image to be matched, wherein the sample visual word set refers to the visual word set composed of the visual words with the smallest distance to the sample visual features in the similar image; and
    composing the similar images whose image co-occurrence ratio value is greater than a preset co-occurrence ratio threshold into a matched image group.
  17. The readable storage medium according to claim 16, wherein before the calculating, for each visual feature to be matched of the image to be matched, the distance between the visual feature to be matched and each visual word in the preset inverted index table, the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following steps:
    acquiring a sample image set, the sample image set including multiple sample images;
    performing feature extraction on each sample image, to obtain a sample depth feature and multiple sample visual features of each sample image;
    performing clustering processing on each sample visual feature of each sample image, to generate a visual word dictionary, the visual word dictionary including multiple visual words;
    for each sample visual feature of each sample image, calculating a distance between the sample visual feature of each sample image and each visual word in the visual word dictionary, and determining the visual word with the smallest distance to the sample visual feature of the sample image as the target visual word of the sample visual feature of the corresponding sample image;
    composing the target visual words of the sample visual features into a target visual word set of the corresponding sample image; and
    establishing, based on the target visual word set of each sample image, a mapping relationship between each visual word and the corresponding sample images, to generate the inverted index table.
  18. The readable storage medium according to claim 16, wherein the image to be matched includes image information to be matched, and after the composing the similar images whose image co-occurrence ratio value is greater than the preset co-occurrence ratio threshold into the matched image group, the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following steps:
    acquiring matched image information of each matched image in the matched image group;
    calculating a similarity between the image information to be matched of the image to be matched and the matched image information of each matched image, to obtain information similarity values; and
    performing statistical analysis on each information similarity value, and taking the matched image with the largest information similarity value as a target image.
  19. The readable storage medium according to claim 16, wherein the calculating the image co-occurrence ratio between the set of visual words to be matched and the sample visual word set of each similar image, in the similar image set, in the inverted index table, to obtain the image co-occurrence ratio value between each similar image and the image to be matched, comprises:
    acquiring the set of visual words to be matched;
    matching each sample visual word in the sample visual word set of each similar image one by one against each visual word to be matched in the set of visual words to be matched, to obtain the matching visual words of each similar image; and
    calculating the proportion of the matching visual words of each similar image in the corresponding sample visual word set, to obtain the image co-occurrence ratio value between each sample image and the image to be matched.
  20. The readable storage medium according to claim 17, wherein the establishing, based on the target visual word set of each sample image, the mapping relationship between each visual word and the corresponding sample images, to generate the inverted index table, comprises:
    using each visual word as a primary key in a preset index table;
    determining, based on the target visual word set of each sample image, the sample images corresponding to each visual word; and
    using the sample images corresponding to each visual word as the primary key values of the corresponding visual word, to generate the inverted index table.
PCT/CN2020/093343 2019-10-11 2020-05-29 Image matching method and apparatus, computer device, and storage medium WO2021068524A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910964148.X 2019-10-11
CN201910964148.XA CN110956195B (zh) 2019-10-11 2019-10-11 Image matching method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021068524A1 true WO2021068524A1 (zh) 2021-04-15

Family

ID=69976365

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093343 WO2021068524A1 (zh) 2019-10-11 2020-05-29 Image matching method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110956195B (zh)
WO (1) WO2021068524A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114676774A (zh) * 2022-03-25 2022-06-28 北京百度网讯科技有限公司 Data processing method and apparatus, device, and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956195B (zh) * 2019-10-11 2023-06-02 平安科技(深圳)有限公司 Image matching method and apparatus, computer device, and storage medium
CN111859004A (zh) * 2020-07-29 2020-10-30 书行科技(北京)有限公司 Method, apparatus and device for acquiring a retrieval image, and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102360435A (zh) * 2011-10-26 2012-02-22 西安电子科技大学 Objectionable image detection method based on latent topic analysis
CN103970769A (zh) * 2013-01-29 2014-08-06 华为技术有限公司 Image retrieval method and apparatus
CN106886783A (zh) * 2017-01-20 2017-06-23 清华大学 Image retrieval method and system based on regional features
CN108334644A (zh) * 2018-03-30 2018-07-27 百度在线网络技术(北京)有限公司 Image recognition method and apparatus
US20190206077A1 (en) * 2018-01-02 2019-07-04 Chung Ang University Industry Academic Cooperation Foundation Apparatus and method for re-identifying object in image processing
CN110956195A (zh) * 2019-10-11 2020-04-03 平安科技(深圳)有限公司 Image matching method and apparatus, computer device, and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102970548B (zh) * 2012-11-27 2015-01-21 西安交通大学 Image depth perception device
CN103714549B (zh) * 2013-12-30 2016-06-08 南京大学 Stereo image object segmentation method based on fast local matching
CN105005755B (zh) * 2014-04-25 2019-03-29 北京邮电大学 Three-dimensional face recognition method and system
CN106649490B (zh) * 2016-10-08 2020-06-16 中国人民解放军理工大学 Image retrieval method and apparatus based on depth features
US10592743B2 (en) * 2017-08-24 2020-03-17 International Business Machines Corporation Machine learning to predict cognitive image composition
CN108537837B (zh) * 2018-04-04 2023-05-05 腾讯科技(深圳)有限公司 Depth information determination method and related apparatus
CN108647307A (zh) * 2018-05-09 2018-10-12 京东方科技集团股份有限公司 Image processing method and apparatus, electronic device, and storage medium


Also Published As

Publication number Publication date
CN110956195B (zh) 2023-06-02
CN110956195A (zh) 2020-04-03

Similar Documents

Publication Publication Date Title
WO2021068524A1 (zh) Image matching method and apparatus, computer device, and storage medium
CN108595695B (zh) Data processing method and apparatus, computer device, and storage medium
EP3855324A1 (en) Associative recommendation method and apparatus, computer device, and storage medium
CN110866491B (zh) Target retrieval method and apparatus, computer-readable storage medium, and computer device
WO2022042123A1 (zh) Image recognition model generation method and apparatus, computer device, and storage medium
US9063954B2 (en) Near duplicate images
WO2021114810A1 (zh) Graph-structure-based official document recommendation method and apparatus, computer device, and medium
WO2021012382A1 (zh) Method and apparatus for configuring a chatbot, computer device, and storage medium
EP3890333A1 (en) Video cutting method and apparatus, computer device and storage medium
US11714921B2 (en) Image processing method with ash code on local feature vectors, image processing device and storage medium
WO2021082426A1 (zh) Face clustering method and apparatus, computer device, and storage medium
WO2021258848A1 (zh) Data dictionary generation method, data query method, apparatus, device, and medium
CN109325118B (zh) Unbalanced sample data preprocessing method and apparatus, and computer device
WO2020114100A1 (zh) Information processing method and apparatus, and computer storage medium
CN109271917B (zh) Face recognition method and apparatus, computer device, and readable storage medium
US11734341B2 (en) Information processing method, related device, and computer storage medium
CN110689323A (zh) Picture review method and apparatus, computer device, and storage medium
CN112926654A (zh) Pre-labeling model training and certificate pre-labeling methods, apparatus, device, and medium
WO2022105119A1 (zh) Method for generating training corpus for an intent recognition model, and related device
CN111832581A (zh) Lung feature recognition method and apparatus, computer device, and storage medium
CN111209061B (zh) User information filling method and apparatus, computer device, and storage medium
WO2021135063A1 (zh) Pathological data analysis method and apparatus, device, and storage medium
CN114547257B (zh) Similar case matching method and apparatus, computer device, and storage medium
CN110688516A (zh) Image retrieval method and apparatus, computer device, and storage medium
WO2022142032A1 (zh) Handwritten signature verification method and apparatus, computer device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20874181

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20874181

Country of ref document: EP

Kind code of ref document: A1