CN112148909B - Method and system for searching similar pictures - Google Patents


Publication number
CN112148909B
Authority
CN
China
Prior art keywords
picture
alternative
pictures
target
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010984382.1A
Other languages
Chinese (zh)
Other versions
CN112148909A (en)
Inventor
张景鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weimeng Chuangke Network Technology China Co Ltd
Original Assignee
Weimeng Chuangke Network Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weimeng Chuangke Network Technology China Co Ltd
Priority to CN202010984382.1A
Publication of CN112148909A
Application granted
Publication of CN112148909B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/535 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of big data, and in particular to a method and a system for searching similar pictures. The method comprises: inputting each picture into a classifier to obtain the feature vector of each picture and the probabilities that it belongs to different categories; selecting candidate categories for each picture according to those probabilities; searching for pictures whose candidate categories satisfy a preset condition with respect to the candidate categories of the target picture, as candidate pictures of the target picture; and selecting, as similar pictures of the target picture, the candidate pictures whose feature vectors are at a distance smaller than a preset distance threshold from the feature vector of the target picture. The method and the system can provide a fast search-by-image service over large volumes of pictures.

Description

Method and system for searching similar pictures
Technical Field
The invention relates to the technical field of big data, in particular to a method and a system for searching similar pictures.
Background
Search-by-image services are widely used in fields such as big data image retrieval, internet image material search, and shopping search. Search by image is a technique that takes a picture as input and finds similar pictures, giving users a way to retrieve related graphic and image material. It involves several disciplines, including databases, data caching, computer vision, image processing, and information retrieval. The key techniques are feature representation and similarity measurement.
When a search-by-image service runs over a large amount of data, the search is often slow because the amount of data to be searched is too large.
Disclosure of Invention
The invention aims to solve the technical problem of overcoming the defects of the prior art, and provides a method and a system for searching similar pictures, which can realize quick picture searching service aiming at pictures with large data volume.
To achieve the above technical object, in one aspect, the method for searching similar pictures provided by the present invention includes:
inputting each picture into a classifier to obtain the feature vector of each picture and the probabilities that it belongs to different categories;
selecting candidate categories for each picture from the categories according to the probabilities that the picture belongs to different categories;
searching for pictures whose candidate categories satisfy a preset condition with respect to the candidate categories of the target picture, as candidate pictures of the target picture;
and selecting, as similar pictures of the target picture, the candidate pictures whose feature vectors are at a distance smaller than a preset distance threshold from the feature vector of the target picture.
In another aspect, the system for searching similar pictures provided by the present invention comprises:
a classifier, configured to output, for each input picture, the feature vector of the picture and the probabilities that it belongs to different categories;
a matching unit, configured to select candidate categories for each picture from the categories according to the probabilities that the picture belongs to different categories;
a database, used for searching for pictures whose candidate categories satisfy a preset condition with respect to the candidate categories of the target picture, as candidate pictures of the target picture;
a selecting unit, configured to select, as similar pictures of the target picture, the candidate pictures whose feature vectors are at a distance smaller than a preset distance threshold from the feature vector of the target picture.
According to the invention, similar pictures are found through the probabilities that the pictures belong to different categories, so that the efficiency of searching the pictures can be improved. In addition, the similarity degree of the searched pictures can be improved by returning the similar pictures through the feature vectors of the pictures. Therefore, the invention can realize quick image searching service aiming at images with large data volume.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of an embodiment of the present invention;
FIG. 3 is a schematic flow chart of constructing and storing pictures by row keys according to an embodiment of the invention;
fig. 4 is a schematic flow chart of searching pictures in an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the method for searching similar pictures according to the present invention includes:
101. inputting each picture into a classifier to obtain the feature vector of each picture and the probabilities that it belongs to different categories;
102. selecting candidate categories for each picture from the categories according to the probabilities that the picture belongs to different categories;
103. searching for pictures whose candidate categories satisfy a preset condition with respect to the candidate categories of the target picture, as candidate pictures of the target picture;
104. selecting, as similar pictures of the target picture, the candidate pictures whose feature vectors are at a distance smaller than a preset distance threshold from the feature vector of the target picture.
Selecting candidate categories for each picture according to the probabilities that the picture belongs to different categories specifically comprises:
for the current picture:
1021. arranging the probabilities that the picture belongs to different categories in descending order;
1022. selecting the categories corresponding to the first N probabilities as the candidate categories of the picture, where N is greater than or equal to 1.
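Steps 1021 and 1022 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `top_n_categories` and the dictionary input format are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def top_n_categories(probs: Dict[str, float], n: int = 2) -> List[Tuple[str, float]]:
    """Return the n most probable (category, probability) pairs, highest first.

    `probs` maps a category name to the probability output by the classifier.
    The name and signature are hypothetical, chosen only for this sketch.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    # Step 1021: sort the probabilities in descending order.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    # Step 1022: keep the categories belonging to the first n probabilities.
    return ranked[:n]


example = {"cat": 0.62, "dog": 0.30, "fox": 0.08}
print(top_n_categories(example, n=2))
```

With 1000 categories a full sort is cheap, so no partial-selection optimization is attempted here.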
Before each picture is input into the classifier, the method further comprises:
105. training the classifier;
after candidate categories are selected for each picture, the method further comprises:
106. storing each picture in a database according to its candidate categories.
Storing each picture in the database according to its candidate categories specifically comprises:
for the current picture:
1061. concatenating the names of all candidate categories of the picture, in descending order of their probabilities, into a candidate identifier;
1062. storing the picture in the non-relational distributed database HBase, using the candidate identifier as the row key of the picture, and storing the feature vector and the picture ID of the picture in the column value corresponding to the row key.
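The row-key construction and storage in steps 1061 and 1062 can be illustrated with an in-memory stand-in for HBase. Everything named here (`make_row_key`, `InMemoryStore`, the `-` separator as a default) is an assumption for the sketch. A real HBase table holds one row per row key, and the text does not specify how several pictures sharing a key are laid out inside that row, so the list used below is only a simplification.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, List[float]]  # (picture ID, feature vector)


def make_row_key(categories: Sequence[str], sep: str = "-") -> str:
    """Step 1061: join category names, already ordered by descending probability."""
    return sep.join(categories)


class InMemoryStore:
    """Hypothetical stand-in for an HBase table, keyed by row key."""

    def __init__(self) -> None:
        self.rows: Dict[str, List[Row]] = defaultdict(list)

    def put(self, categories: Sequence[str], picture_id: str,
            feature: Sequence[float]) -> str:
        """Step 1062: store the picture ID and feature vector under the row key."""
        key = make_row_key(categories)
        self.rows[key].append((picture_id, list(feature)))
        return key

    def get(self, row_key: str) -> List[Row]:
        """Return every (picture ID, feature vector) stored under row_key."""
        return list(self.rows.get(row_key, []))


store = InMemoryStore()
store.put(["CLASS1", "CLASS2"], "a", [0.1, 0.2])
print(store.get("CLASS1-CLASS2"))
```

Because the key encodes the category order, `CLASS1-CLASS2` and `CLASS2-CLASS1` are distinct keys, which is what the reordered lookup described later relies on.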
Searching for pictures whose candidate categories satisfy the preset condition with respect to the candidate categories of the target picture, as candidate pictures of the target picture, specifically comprises:
when N = 1, or when N > 1 and the difference between the probabilities of sequentially adjacent candidate categories of the target picture is greater than a preset probability threshold:
1031a. searching the row keys of the pictures and determining the picture IDs whose row key is the same as that of the target picture;
1033a. obtaining the corresponding pictures as candidate pictures of the target picture according to the determined picture IDs;
when N > 1 and the difference between the probabilities of sequentially adjacent candidate categories of the target picture is less than or equal to the preset probability threshold:
1031b. concatenating the names of the candidate categories of the target picture in a changed order into a reordered identifier;
1032b. updating the row key of the target picture with the reordered identifier;
1033b. searching the row keys of the pictures and determining both the picture IDs whose row key is the same as the target picture's row key before the update and the picture IDs whose row key is the same as the target picture's row key after the update;
1034b. obtaining the corresponding pictures as candidate pictures of the target picture according to the determined picture IDs.
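The choice between the two branches above can be sketched as a function that returns the row keys to look up. This sketch covers only N = 1 and N = 2, the cases the application example spells out; how the order is changed for N > 2 is not stated in the text, so it is left out. The name `lookup_row_keys` is hypothetical.

```python
from typing import List, Sequence, Tuple


def lookup_row_keys(ranked: Sequence[Tuple[str, float]],
                    prob_threshold: float, sep: str = "-") -> List[str]:
    """Return the row keys to query for a target picture.

    `ranked` holds (category, probability) pairs in descending probability.
    Only N = 1 and N = 2 are handled in this sketch.
    """
    if len(ranked) not in (1, 2):
        raise ValueError("this sketch handles N = 1 or N = 2 only")
    names = [name for name, _ in ranked]
    original = sep.join(names)
    if len(ranked) == 1:
        return [original]
    gap = ranked[0][1] - ranked[1][1]
    if gap > prob_threshold:
        # Clear winner: query the target picture's own row key only.
        return [original]
    # Close call: also query the key with the two categories swapped.
    return [original, sep.join(reversed(names))]


print(lookup_row_keys([("CLASS1", 0.45), ("CLASS2", 0.40)], prob_threshold=0.2))
```

Returning a list of keys keeps the lookup itself unchanged: the caller issues one query per key and merges the picture IDs.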
Selecting, as similar pictures of the target picture, the candidate pictures whose feature vectors are at a distance smaller than the preset distance threshold from the feature vector of the target picture specifically comprises:
1041. acquiring the corresponding feature vectors according to the picture IDs of the candidate pictures of the target picture;
1042. calculating the distance between the feature vector of each candidate picture and the feature vector of the target picture;
1043. taking the candidate pictures whose distance is smaller than the preset distance threshold as similar pictures of the target picture.
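Steps 1041 to 1043 amount to a threshold filter over the candidates. The text does not name the distance metric, so Euclidean distance is assumed in the sketch below, and the function names are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similar_pictures(target: Sequence[float],
                     candidates: Dict[str, Sequence[float]],
                     max_distance: float) -> List[str]:
    """Return IDs of candidates strictly closer to the target than max_distance."""
    return [pid for pid, vec in candidates.items()
            if euclidean(vec, target) < max_distance]


candidates = {"p1": [3.0, 4.0], "p2": [0.1, 0.0], "p3": [1.0, 1.0]}
print(similar_pictures([0.0, 0.0], candidates, max_distance=1.5))
```

Because only the pictures sharing a row key are compared, the cost of this step depends on the size of the candidate set rather than on the size of the whole gallery.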
As shown in fig. 2, the system for searching similar pictures according to the present invention includes:
a classifier 21, configured to output, for each input picture, the feature vector of the picture and the probabilities that it belongs to different categories;
a matching unit 22, configured to select candidate categories for each picture from the categories according to the probabilities that the picture belongs to different categories;
a search unit 23, configured to search for pictures whose candidate categories satisfy a preset condition with respect to the candidate categories of the target picture, as candidate pictures of the target picture;
a selecting unit 24, configured to select, as similar pictures of the target picture, the candidate pictures whose feature vectors are at a distance smaller than a preset distance threshold from the feature vector of the target picture.
The matching unit 22 is specifically configured to:
for the current picture:
arrange the probabilities that the picture belongs to different categories in descending order;
select the categories corresponding to the first N probabilities as the candidate categories of the picture, where N is greater than or equal to 1.
The system further comprises a training unit 25, a storage unit 26 and a database 27, wherein:
the training unit 25 is configured to train the classifier;
the storage unit 26 is configured to store each picture in the database according to the candidate categories of the picture.
The database 27 is HBase;
the storage unit 26 is specifically configured to:
for the current picture:
concatenate the names of all candidate categories of the picture, in descending order of their probabilities, into a candidate identifier;
store the picture in the non-relational distributed database HBase, using the candidate identifier as the row key of the picture, and storing the feature vector and the picture ID of the picture in the column value corresponding to the row key.
The search unit 23 is specifically configured to:
when N = 1, or when N > 1 and the difference between the probabilities of sequentially adjacent candidate categories of the target picture is greater than a preset probability threshold:
search the row keys of the pictures and determine the picture IDs whose row key is the same as that of the target picture;
obtain the corresponding pictures as candidate pictures of the target picture according to the determined picture IDs;
when N > 1 and the difference between the probabilities of sequentially adjacent candidate categories of the target picture is less than or equal to the preset probability threshold:
concatenate the names of the candidate categories of the target picture in a changed order into a reordered identifier;
update the row key of the target picture with the reordered identifier;
search the row keys of the pictures and determine both the picture IDs whose row key is the same as the target picture's row key before the update and the picture IDs whose row key is the same as the target picture's row key after the update;
obtain the corresponding pictures as candidate pictures of the target picture according to the determined picture IDs.
The selecting unit 24 is specifically configured to:
acquiring corresponding feature vectors according to the picture IDs of the candidate pictures of the target picture;
respectively calculating the distance between the feature vector of each candidate picture and the feature vector of the target picture;
and taking the candidate picture with the distance smaller than a preset distance threshold as a similar picture of the target picture.
According to the invention, similar pictures are found through the probabilities that the pictures belong to different categories, so that the efficiency of searching the pictures can be improved. In addition, the similarity degree of the searched pictures can be improved by returning the similar pictures through the feature vectors of the pictures. Therefore, the invention can realize quick image searching service aiming at images with large data volume.
The following describes the above technical solution of the embodiment of the present invention in detail with reference to an application example:
in this embodiment, the classifier is a VGG16 network, a type of convolutional neural network, and the database is HBase. HBase is a distributed, column-oriented open source database, a distributed storage system for structured data.
As shown in fig. 4, the method for searching similar pictures according to the embodiment includes:
Index construction and picture storage:
Step1, training the VGG16 network;
in this embodiment, the VGG16 network classifies pictures into 1000 categories.
Step2, inputting all pictures, including the pictures to be searched and the target picture, into the VGG16 network, and outputting the feature vector of each picture and the probabilities that it belongs to different categories;
each feature vector generated by the VGG16 network is 512-dimensional, and there are 1000 categories. Thus, for each picture input to the VGG16 network, one 512-dimensional feature vector and 1000 probabilities are generated.
For example, the feature vector generated for picture a is A, and its class probabilities over the 1000 categories are P (representing 1000 probability values).
Step3, for the current picture, selecting its candidate categories from the 1000 categories according to the 1000 corresponding probabilities;
Step3.1, arranging the 1000 probabilities output for the current picture in descending order;
the 1000 probability values in P generated for picture a are sorted from largest to smallest;
Step3.2, selecting the categories corresponding to the first two probabilities as the candidate categories of the current picture;
in this embodiment, the selected candidate categories are the two categories with the highest probabilities, that is, the two categories to which the picture most likely belongs;
the categories corresponding to the first two probabilities in P are taken as the candidate categories of picture a, and their names are recorded as CLASS1 and CLASS2.
Step4, storing the current picture in HBase according to its candidate categories;
Step4.1, concatenating the names of all candidate categories of the picture, in descending order of their probabilities, into a candidate identifier;
CLASS1 and CLASS2 of picture a are concatenated in descending order of probability to form the candidate identifier CLASS1-CLASS2;
Step4.2, using the candidate identifier of the picture as its rowkey, and storing the feature vector and ID of the picture in the value (column value);
CLASS1-CLASS2 is taken as the rowkey of picture a, and A and a (representing the ID of picture a) are stored in the value of picture a.
Each picture is stored in HBase in the same way, so the maximum number of rowkeys is 1000 × 1000 = 1,000,000. If the picture categories in the gallery are uniformly distributed, then for a gallery of 100 million pictures each rowkey holds about 100 values on average.
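The arithmetic behind these figures can be checked with a short calculation. The helper name `row_key_stats` is hypothetical. Note that 1000 × 1000 is an upper bound: since the top two categories of a picture are always distinct, the number of ordered pairs that can actually occur is 1000 × 999 = 999,000, which leaves the average of roughly 100 pictures per rowkey essentially unchanged.

```python
from typing import Tuple


def row_key_stats(num_classes: int, top_n: int, num_pictures: int) -> Tuple[int, float]:
    """Return (upper bound on distinct row keys, average pictures per key).

    Assumes pictures are spread uniformly over the keys, as in the text.
    """
    max_keys = num_classes ** top_n
    return max_keys, num_pictures / max_keys


# 1000 categories, top-2 row keys, 100 million pictures.
print(row_key_stats(1000, 2, 100_000_000))  # (1000000, 100.0)
```

The same helper shows why the number of categories in the rowkey matters: with top-1 keys the average bucket grows to 100,000 pictures, while with top-3 keys most buckets would be nearly empty.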
If the number of pictures in a certain category is particularly large or particularly small, the number of categories used in the rowkey can be adjusted flexibly, changing the first two categories in Step3.2 to the first one category or the first three categories.
In this embodiment, the picture ID saved in the value may be the pID (process identifier) of the picture.
At this point, every picture has been stored and the index has been constructed. In this embodiment, the rowkey of HBase is used as the index.
The method for searching similar pictures according to the embodiment then performs the search:
Step5, searching HBase using the rowkey of the target picture, where the pictures whose candidate categories satisfy the preset condition are the candidate pictures of the target picture;
after indexing and storage are complete, when a user inputs a picture b to be searched (the target picture), the feature vector A2 of the picture and its first two categories CLASS1 and CLASS2 are obtained, that is, the rowkey of target picture b is CLASS1-CLASS2;
when the probability difference between the two candidate categories of the target picture is greater than a preset probability threshold:
Step5.1a, searching the rowkeys of the pictures and returning the picture IDs that have the same rowkey as the target picture;
a get is issued against HBase with CLASS1-CLASS2 to fetch the corresponding value, returning the IDs of all pictures whose rowkey is CLASS1-CLASS2;
Step5.2a, obtaining the corresponding pictures as candidate pictures of the target picture according to the determined picture IDs;
the candidate pictures of picture b are obtained according to the IDs returned in Step5.1a;
when the probability difference between the two candidate categories of the target picture is less than or equal to the preset probability threshold:
Step5.1b, concatenating the names of the two candidate categories of the target picture in swapped order into a reordered identifier;
Step5.2b, updating the rowkey with the reordered identifier of the target picture;
the rowkey of picture b is changed to CLASS2-CLASS1;
Step5.3b, searching the rowkeys of the pictures and determining both the picture IDs whose rowkey is the same as the target picture's rowkey before the update and the picture IDs whose rowkey is the same as the target picture's rowkey after the update;
first, a get is issued against HBase with CLASS1-CLASS2 to fetch the corresponding value, returning the IDs of all pictures whose rowkey is CLASS1-CLASS2; then a get is issued against HBase with CLASS2-CLASS1, returning the IDs of all pictures whose rowkey is CLASS2-CLASS1;
Step5.4b, obtaining the corresponding pictures as candidate pictures of the target picture according to the determined picture IDs;
the candidate pictures of picture b are obtained according to the IDs returned by the two lookups in Step5.3b.
If the number of candidate pictures returned in Step5 does not reach the preset number threshold, the method returns to Step3.2 and searches with CLASS1-CLASS3 as the rowkey of picture b, until the required number of candidate pictures has been returned.
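The fallback described above can be sketched as a loop that widens the lookup until enough candidates are found. The text names only CLASS1-CLASS3 as the next rowkey; continuing with CLASS1-CLASS4 and so on is an assumption of this sketch, as are the names `collect_candidates` and `fetch_ids`.

```python
from typing import Callable, List, Sequence


def collect_candidates(ranked_names: Sequence[str],
                       fetch_ids: Callable[[str], List[str]],
                       min_count: int, sep: str = "-") -> List[str]:
    """Query CLASS1-CLASS2, then CLASS1-CLASS3, ... until min_count IDs are found.

    `ranked_names` lists category names by descending probability.
    `fetch_ids` is a hypothetical lookup returning the picture IDs under a key.
    """
    found: List[str] = []
    seen = set()
    top = ranked_names[0]
    for other in ranked_names[1:]:
        for pid in fetch_ids(sep.join((top, other))):
            if pid not in seen:  # avoid returning a picture twice
                seen.add(pid)
                found.append(pid)
        if len(found) >= min_count:
            break
    return found


table = {"CLASS1-CLASS2": ["p1"], "CLASS1-CLASS3": ["p2", "p3"]}
names = ["CLASS1", "CLASS2", "CLASS3", "CLASS4"]
print(collect_candidates(names, lambda key: table.get(key, []), min_count=3))
```

If the ranked categories run out before `min_count` is reached, the sketch simply returns what it found; the text does not say what should happen in that case.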
In this embodiment, a small probability difference between the two candidate categories of the target picture indicates that the classification of the picture wavers between those two categories. Therefore, to increase the number of recalled pictures, the order of the two candidate categories can be swapped to form a modified rowkey, which is used to search again for additional pictures.
Step6, from the candidate pictures of the current target picture, selecting as similar pictures those whose feature vectors are at a distance smaller than a preset distance threshold from the feature vector of the target picture;
Step6.1, obtaining the feature vector of each candidate picture according to the ID of each candidate picture of the target picture;
Step6.2, calculating the distance between the feature vector of each candidate picture and the feature vector of the target picture;
Step6.3, taking the candidate pictures whose distance is smaller than the preset distance threshold as similar pictures of the target picture.
In this embodiment, similar pictures are selected according to the distance between feature vectors. Alternatively, a vectorized dot product can be computed directly; the picture IDs of the k largest dot-product results are then taken and the corresponding pictures are displayed.
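The dot-product alternative can be sketched as a top-k selection. The function name `top_k_by_dot_product` is hypothetical, and the sketch uses plain Python rather than a vectorized library. One caveat worth keeping in mind: the dot product equals cosine similarity only when the feature vectors have unit length, and the text does not say whether the vectors are normalized.

```python
import heapq
from typing import Dict, List, Sequence


def top_k_by_dot_product(target: Sequence[float],
                         candidates: Dict[str, Sequence[float]],
                         k: int) -> List[str]:
    """Return the IDs of the k candidates with the largest dot product to target."""
    def dot(u: Sequence[float], v: Sequence[float]) -> float:
        return sum(a * b for a, b in zip(u, v))

    # Score every candidate, then keep the k highest scores, best first.
    scored = ((dot(vec, target), pid) for pid, vec in candidates.items())
    return [pid for _, pid in heapq.nlargest(k, scored)]


candidates = {"p1": [0.9, 0.1], "p2": [0.2, 0.9], "p3": [0.5, 0.5]}
print(top_k_by_dot_product([1.0, 0.0], candidates, k=2))
```

Unlike the distance threshold in Step6.3, this variant always returns up to k pictures, however dissimilar the best candidates are.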
According to the embodiment, a classification convolutional neural network model is trained; the model then extracts a feature vector for each picture in the gallery, and the two most probable categories of each picture are used as its HBase rowkey. The target picture is then matched exactly on its first two categories, and the results obtained are compared by distance. The method can therefore provide a fast search-by-image service over large volumes of pictures.
The embodiment of the present invention provides a system for searching similar pictures, which can implement the method embodiment provided above, and specific functional implementation is referred to the description in the method embodiment and is not repeated herein.
It should be understood that the specific order or hierarchy of steps in the processes disclosed are examples of exemplary approaches. Based on design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate preferred embodiment of this invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, as used in the specification or claims, the term "including" is intended to be inclusive in a manner similar to the term "comprising," as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or the claims is intended to mean a "non-exclusive or".
Those of skill in the art will further appreciate that the various illustrative logical blocks, units, and steps described in connection with the embodiments of the invention may be implemented by electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, elements, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation is not to be understood as beyond the scope of the embodiments of the present invention.
The various illustrative logical blocks or units described in the embodiments of the invention may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described. A general purpose processor may be a microprocessor, but in the alternative, the general purpose processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. In an example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may reside in a user terminal. In the alternative, the processor and the storage medium may reside as distinct components in a user terminal.
In one or more exemplary designs, the above-described functions of the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media that facilitate transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. For example, such computer-readable media may include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store program code in the form of instructions or data structures and that can be read by a general or special purpose computer, or by a general or special purpose processor. Further, any connection is properly termed a computer-readable medium: for example, if the software is transmitted from a website, server, or other remote source via a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then those transmission media are also included in the definition of computer-readable medium. Disks and discs, as used here, include compact discs, laser discs, optical discs, DVDs, floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs usually reproduce data optically with lasers. Combinations of the above may also be included within the scope of computer-readable media.
The foregoing description of the embodiments has been provided to illustrate the general principles of the invention. It is not intended to limit the scope of the invention to the particular embodiments described; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A method of searching for similar pictures, the method comprising:
inputting each picture into a classifier to obtain a feature vector of each picture and the probabilities that the picture belongs to different categories;
selecting alternative categories for each picture from the categories according to the probabilities that the picture belongs to the different categories;
searching for pictures whose candidate categories meet a preset condition with respect to the candidate categories of the target picture, as candidate pictures of the target picture;
selecting, as similar pictures of the target picture, the alternative pictures for which the distance between their feature vector and the feature vector of the target picture is smaller than a preset distance threshold;
wherein the searching for pictures whose candidate categories meet the preset condition with respect to the candidate categories of the target picture, as candidate pictures of the target picture, specifically comprises:
when N = 1, or when N > 1 and the difference between the probabilities corresponding to sequentially adjacent categories among the candidate categories of the target picture is greater than a preset probability threshold:
searching the row keys of the pictures, and determining the picture IDs whose row key is the same as the row key of the target picture; wherein the row key of a picture is the alternative identification of the picture, the alternative identification is formed by splicing the names of the alternative categories of the picture in descending order of their corresponding probabilities, and N represents the number of alternative categories of a picture;
acquiring the corresponding pictures as alternative pictures of the target picture according to the determined picture IDs;
when N > 1 and the difference between the probabilities corresponding to sequentially adjacent categories among the candidate categories of the target picture is less than or equal to the preset probability threshold:
splicing the names of the alternative categories of the target picture in a changed order into a reordered identification;
updating the row key of the target picture with the reordered identification;
searching the row keys of the pictures, and respectively determining the picture IDs whose row key is the same as the row key of the target picture before the update and the picture IDs whose row key is the same as the row key of the target picture after the update;
and acquiring the corresponding picture as an alternative picture of the target picture according to the determined picture ID.
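The two search branches of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `_` separator, the in-memory dictionary standing in for the row-key index, and the rule that the "changed order" swaps the first adjacent pair of categories whose probabilities differ by no more than the threshold are all assumptions made for illustration, since the claim does not fix these details.

```python
SEPARATOR = "_"  # assumed delimiter; the claims do not specify one


def row_key(category_names):
    """Join category names that are already sorted by descending probability."""
    return SEPARATOR.join(category_names)


def candidate_ids(index, target, threshold):
    """Return picture IDs whose row key matches the target's row key(s).

    index:  dict mapping row key -> list of picture IDs (stand-in for the database)
    target: list of (category_name, probability), sorted by descending probability
    """
    names = [name for name, _ in target]
    keys = [row_key(names)]
    # Assumed reordering rule: swap the first adjacent pair whose
    # probabilities differ by no more than the threshold.
    for i in range(len(target) - 1):
        if target[i][1] - target[i + 1][1] <= threshold:
            swapped = names[:]
            swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
            keys.append(row_key(swapped))
            break
    ids = []
    for key in keys:
        ids.extend(index.get(key, []))
    return ids


# Hypothetical index and target pictures.
index = {"cat_dog": ["p1", "p2"], "dog_cat": ["p3"], "cat_fox": ["p4"]}
clear = candidate_ids(index, [("cat", 0.80), ("dog", 0.15)], threshold=0.1)
close = candidate_ids(index, [("cat", 0.48), ("dog", 0.45)], threshold=0.1)
```

In this sketch `clear` is looked up under the original row key only, while `close` is looked up under both the original and the swapped key, so pictures whose two leading categories were ranked in the opposite order are also returned as candidates.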
2. The method for searching for similar pictures according to claim 1, wherein the selecting of alternative categories for each picture from the categories according to the probabilities that the picture belongs to the different categories comprises:
for the current picture:
arranging the probabilities that the picture belongs to the different categories in descending order;
and selecting the categories corresponding to the first N probabilities as the alternative categories of the picture, wherein N is greater than or equal to 1.
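As a hedged illustration of this selection step, the following Python sketch sorts a picture's class probabilities in descending order and keeps the first N. The function name `top_n_categories` and the sample probabilities are hypothetical and do not come from the patent.

```python
def top_n_categories(probabilities, n):
    """Return the n most probable (category, probability) pairs, highest first.

    probabilities: dict mapping category name -> probability from the classifier
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical classifier output for one picture.
probs = {"dog": 0.15, "cat": 0.80, "fox": 0.05}
top2 = top_n_categories(probs, 2)
```

Here `top2` holds the pairs for `cat` and `dog`, in that order.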
3. The method of searching for similar pictures according to claim 2, further comprising, before inputting each picture to the classifier:
training the classifier;
after the alternative categories have been selected for each picture, the method further comprises:
and storing each picture into a database according to the alternative category of each picture.
4. The method for searching for similar pictures according to claim 3, wherein said storing each picture in a database according to the alternative category of each picture specifically comprises:
for the current picture:
splicing the names of all the alternative categories of the picture into an alternative identification in descending order of their corresponding probabilities;
and storing the picture into a non-relational distributed database HBase, taking the alternative identification as the row key of the picture, and storing the feature vector and the picture ID of the picture in a column value corresponding to the row key.
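The storage layout described in this claim can be sketched in Python with a nested dictionary standing in for the HBase table. This is only an approximation under stated assumptions: no HBase client is used, the `_` separator is assumed, and giving each picture its own column under the shared row key (keyed by picture ID) is one possible reading of "a column value corresponding to the row key", not something the claim spells out.

```python
def store_picture(table, picture_id, feature_vector, ranked_categories, separator="_"):
    """Store one picture under a row key built from its ranked category names.

    table:             dict of row key -> {picture ID -> feature vector}
    ranked_categories: list of (category_name, probability), highest probability first
    """
    key = separator.join(name for name, _ in ranked_categories)
    # Pictures with the same ranked categories share one row; each picture
    # occupies its own entry within that row (an assumed layout).
    table.setdefault(key, {})[picture_id] = list(feature_vector)
    return key


# Hypothetical pictures and feature vectors.
table = {}
store_picture(table, "p1", [0.1, 0.9], [("cat", 0.80), ("dog", 0.15)])
store_picture(table, "p2", [0.2, 0.8], [("cat", 0.70), ("dog", 0.20)])
store_picture(table, "p3", [0.9, 0.1], [("dog", 0.60), ("cat", 0.30)])
```

With this layout, pictures `p1` and `p2` end up in the row `cat_dog` and `p3` in the row `dog_cat`, so a single row-key lookup returns every picture with the same ranked categories.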
5. The method for searching for similar pictures according to claim 4, wherein the selecting, as similar pictures of the target picture, of the candidate pictures for which the distance between their feature vector and the feature vector of the target picture is smaller than the preset distance threshold specifically comprises:
acquiring corresponding feature vectors according to the picture IDs of the candidate pictures of the target picture;
respectively calculating the distance between the feature vector of each candidate picture and the feature vector of the target picture;
and taking the candidate picture with the distance smaller than a preset distance threshold as a similar picture of the target picture.
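The final filtering step can be sketched as follows. Euclidean distance is an assumption here, since the claim only requires some distance between feature vectors; the function name and the sample vectors are likewise hypothetical.

```python
import math


def similar_pictures(target_vector, candidates, max_distance):
    """Return IDs of candidates whose feature vector lies within max_distance.

    candidates: dict mapping picture ID -> feature vector
    """
    result = []
    for picture_id, vector in candidates.items():
        # Euclidean distance is assumed; the claim does not name a metric.
        if math.dist(target_vector, vector) < max_distance:
            result.append(picture_id)
    return result


# Hypothetical candidate pictures retrieved by the row-key search.
candidates = {"p1": [0.0, 0.0], "p2": [3.0, 4.0], "p3": [0.3, 0.4]}
similar = similar_pictures([0.0, 0.0], candidates, max_distance=1.0)
```

Because the candidates were already narrowed down by row key, the distance is computed only for a small subset of the stored pictures rather than for the whole database.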
6. A system for searching for similar pictures, the system comprising:
a classifier, configured to receive each picture as input and to output a feature vector of the picture and the probabilities that the picture belongs to different categories;
a matching unit, configured to select alternative categories for each picture from the categories according to the probabilities that the picture belongs to the different categories;
a searching unit, configured to search for pictures whose candidate categories meet a preset condition with respect to the candidate categories of the target picture, and to take these pictures as candidate pictures of the target picture;
a selecting unit, configured to select, as similar pictures of the target picture, the candidate pictures for which the distance between their feature vector and the feature vector of the target picture is smaller than a preset distance threshold;
the searching unit is specifically configured to:
when N = 1, or when N > 1 and the difference between the probabilities corresponding to sequentially adjacent categories among the candidate categories of the target picture is greater than a preset probability threshold:
searching the row keys of the pictures, and determining the picture IDs whose row key is the same as the row key of the target picture; wherein the row key of a picture is the alternative identification of the picture, the alternative identification is formed by splicing the names of the alternative categories of the picture in descending order of their corresponding probabilities, and N represents the number of alternative categories of a picture;
acquiring the corresponding pictures as alternative pictures of the target picture according to the determined picture IDs;
when N > 1 and the difference between the probabilities corresponding to sequentially adjacent categories among the candidate categories of the target picture is less than or equal to the preset probability threshold:
splicing the names of the alternative categories of the target picture in a changed order into a reordered identification;
updating the row key of the target picture with the reordered identification;
searching the row keys of the pictures, and respectively determining the picture IDs whose row key is the same as the row key of the target picture before the update and the picture IDs whose row key is the same as the row key of the target picture after the update;
and acquiring the corresponding picture as an alternative picture of the target picture according to the determined picture ID.
7. The system for searching for similar pictures according to claim 6, wherein the matching unit is specifically configured to:
for the current picture:
arranging the probabilities that the picture belongs to the different categories in descending order;
and selecting the categories corresponding to the first N probabilities as the alternative categories of the picture, wherein N is greater than or equal to 1.
8. The system for searching for similar pictures according to claim 7, further comprising a training unit, a storage unit, and a database, wherein:
the training unit is used for training the classifier;
the storage unit is used for storing each picture into the database according to the alternative category of each picture.
9. The system for searching for similar pictures according to claim 8, wherein said database is a non-relational distributed database HBase;
the storage unit is specifically configured to:
for the current picture:
splicing the names of all the alternative categories of the picture into an alternative identification in descending order of their corresponding probabilities;
and storing the picture into the HBase, taking the alternative identification as the row key of the picture, and storing the feature vector and the picture ID of the picture in a column value corresponding to the row key.
10. The system for searching for similar pictures according to claim 9, wherein said selecting unit is specifically configured to:
acquiring corresponding feature vectors according to the picture IDs of the candidate pictures of the target picture;
respectively calculating the distance between the feature vector of each candidate picture and the feature vector of the target picture;
and taking the candidate picture with the distance smaller than a preset distance threshold as a similar picture of the target picture.
CN202010984382.1A 2020-09-18 2020-09-18 Method and system for searching similar pictures Active CN112148909B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010984382.1A CN112148909B (en) 2020-09-18 2020-09-18 Method and system for searching similar pictures

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010984382.1A CN112148909B (en) 2020-09-18 2020-09-18 Method and system for searching similar pictures

Publications (2)

Publication Number Publication Date
CN112148909A CN112148909A (en) 2020-12-29
CN112148909B true CN112148909B (en) 2024-03-29

Family

ID=73893175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010984382.1A Active CN112148909B (en) 2020-09-18 2020-09-18 Method and system for searching similar pictures

Country Status (1)

Country Link
CN (1) CN112148909B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927131B (en) * 2021-01-16 2022-11-11 中建三局第一建设工程有限责任公司 Picture splicing method and device, computer equipment and storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101692224A (en) * 2009-07-08 2010-04-07 南京师范大学 High-resolution remote sensing image search method fused with spatial relation semantics
CN103679674A (en) * 2013-11-29 2014-03-26 航天恒星科技有限公司 Method and system for splicing images of unmanned aircrafts in real time
CN105427272A (en) * 2014-09-17 2016-03-23 富士通株式会社 Image processing device, image processing method and electronic device
CN105957063A (en) * 2016-04-22 2016-09-21 北京理工大学 CT image liver segmentation method and system based on multi-scale weighting similarity measure
CN106021362A (en) * 2016-05-10 2016-10-12 百度在线网络技术(北京)有限公司 Query picture characteristic representation generation method and device, and picture search method and device
CN107067020A (en) * 2016-12-30 2017-08-18 腾讯科技(上海)有限公司 Image identification method and device
CN108875818A (en) * 2018-06-06 2018-11-23 西安交通大学 Based on variation from code machine and confrontation network integration zero sample image classification method
EP3478728A1 (en) * 2016-06-30 2019-05-08 Konica Minolta Laboratory U.S.A., Inc. Method and system for cell annotation with adaptive incremental learning
CN109783671A (en) * 2019-01-30 2019-05-21 京东方科技集团股份有限公司 A kind of method, computer-readable medium and server to scheme to search figure
CN109902198A (en) * 2019-03-11 2019-06-18 京东方科技集团股份有限公司 A kind of method, apparatus and application system to scheme to search figure
CN110019913A (en) * 2018-06-01 2019-07-16 平安好房(上海)电子商务有限公司 Picture match method, user equipment, storage medium and device
CN110660023A (en) * 2019-09-12 2020-01-07 中国测绘科学研究院 Video stitching method based on image semantic segmentation
CN110851639A (en) * 2018-07-24 2020-02-28 浙江大华技术股份有限公司 Method and equipment for searching picture by picture
CN110942003A (en) * 2019-11-20 2020-03-31 中国建设银行股份有限公司 Personnel track searching method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7016884B2 (en) * 2002-06-27 2006-03-21 Microsoft Corporation Probability estimate for K-nearest neighbor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
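Automatic detection of malicious Android applications combining static and dynamic analysis (静动态结合的恶意Android应用自动检测技术); Huang Haohua et al.; Journal of Cyber Security (《信息安全学报》); 20171015; pp. 27-40 *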

Also Published As

Publication number Publication date
CN112148909A (en) 2020-12-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant