CN109697451A - Similar image clustering method and device, storage medium, electronic equipment - Google Patents


Publication number
CN109697451A
Authority
CN
China
Prior art keywords
image
value code
clustering method
described image
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710994492.4A
Other languages
Chinese (zh)
Other versions
CN109697451B (en)
Inventor
黄志标
安山
陈宇
貟雯婷
翁志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710994492.4A
Publication of CN109697451A
Application granted
Publication of CN109697451B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a similar image clustering method, a similar image clustering apparatus, a computer-readable storage medium and an electronic device, in the technical field of image processing. The method comprises: extracting image features of a plurality of images with a convolutional neural network model and performing hash mapping on the image features; clustering the image features with a strongly connected component clustering method to determine the cluster classes; and providing a class identifier for each image feature and obtaining the similar images from a database according to the class identifier. The disclosure can improve the efficiency of similar image clustering.

Description

Similar image clustering method and device, storage medium, electronic equipment
Technical field
The present disclosure relates to the technical field of image processing, and in particular to a similar image clustering method, a similar image clustering apparatus, a computer-readable storage medium and an electronic device.
Background
When images are stored in fields such as image retrieval, image copyright protection and intelligent video analysis, identical or similar images are often stored repeatedly. To avoid this, similar images can be clustered so that they can be handled according to the clustering result.
In the related art, similar images are mostly clustered with the K-means clustering method, the density-based clustering method DBSCAN (Density-Based Spatial Clustering of Applications with Noise) or a hierarchical clustering method. The K-means method randomly selects initial class centers from the samples, assigns the current sample to the class whose center has the smallest Euclidean distance to it, and then updates that class center to the mean of the feature vectors in the set corresponding to the class. The DBSCAN method estimates the density at each point from the number of sample points in a small neighborhood around it, and determines the classes around the core points. Hierarchical clustering either starts at the bottom, with every sample as its own class, and merges classes pairwise, or starts at the top, with all samples grouped into multiple classes, and iteratively splits the samples of each class to determine the number of classes.
These clustering methods may have the following problems: first, the number of cluster classes has to be specified manually in advance, so accuracy is poor and efficiency is low; second, when the number of samples is large and the feature vectors are high-dimensional, a large amount of memory is occupied, which can lead to insufficient memory.
It should be noted that the information disclosed in this Background section is only intended to aid understanding of the background of the disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The purpose of the disclosure is to provide a similar image clustering method, a similar image clustering apparatus, a computer-readable storage medium and an electronic device that overcome, at least to some extent, one or more problems caused by the limitations and defects of the related art.
Other features and advantages of the disclosure will become apparent from the following detailed description, or will be learned in part through practice of the disclosure.
According to one aspect of the disclosure, a similar image clustering method is provided, comprising:
Extracting image features of a plurality of images with a convolutional neural network model and performing hash mapping on the image features;
Clustering the image features with a strongly connected component clustering method to determine the cluster classes;
Providing a class identifier for each image feature and obtaining the similar images from a database according to the class identifier.
In an exemplary embodiment of the disclosure, performing hash mapping on the image features comprises:
Converting the image feature of each image into a binary code by a hash quantization coding method.
In an exemplary embodiment of the disclosure, clustering the image features with the strongly connected component clustering method comprises:
Computing the similarity between the plurality of images from the binary codes and sorting the binary codes;
Building a directed graph with the binary code of each image as a vertex;
Finding all strongly connected components in the directed graph to determine the number of cluster classes.
In an exemplary embodiment of the disclosure, computing the similarity between the plurality of images from the binary codes and sorting the binary codes comprises:
Computing the Hamming distance between the binary code of each image and a query binary code to obtain the similarity;
Sorting the binary codes according to the Hamming distance.
In an exemplary embodiment of the disclosure, building a directed graph with the binary code of each image as a vertex comprises:
When the Hamming distance between a binary code and the query binary code satisfies a preset condition, establishing a directed edge pointing from the query binary code to that binary code.
In an exemplary embodiment of the disclosure, finding all strongly connected components in the directed graph comprises:
When a preset data structure determines that the vertices belong to the same connected component, finding the strongly connected components in the directed graph with Tarjan's algorithm.
In an exemplary embodiment of the disclosure, the Hamming distance is negatively correlated with the similarity.
According to one aspect of the disclosure, a similar image clustering apparatus is provided, comprising:
A feature extraction module, configured to extract image features of a plurality of images with a convolutional neural network model and perform hash mapping on the image features;
A feature clustering module, configured to cluster the image features with a strongly connected component clustering method to determine the cluster classes;
An image acquisition module, configured to provide a class identifier for each image feature and obtain the similar images from a database according to the class identifier.
According to one aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the similar image clustering method of any of the above.
According to one aspect of the disclosure, an electronic device is provided, comprising:
A processor; and
A memory for storing instructions executable by the processor;
Wherein the processor is configured to perform the similar image clustering method of any of the above by executing the executable instructions.
In the similar image clustering method, similar image clustering apparatus, computer-readable storage medium and electronic device provided in the exemplary embodiments of the disclosure, image features of a plurality of images are extracted with a convolutional neural network model and hash-mapped; the image features are clustered with a strongly connected component clustering method; and a class identifier is provided for each image feature, according to which the similar images are obtained from a database. On the one hand, clustering the image features with the strongly connected component clustering method determines the number of cluster classes automatically, which avoids the manual specification of the number of clusters required in the related art and improves the accuracy and efficiency of clustering. On the other hand, extracting the image features with a convolutional neural network model and hash-mapping them reduces the size of the image features and therefore the memory they occupy.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the specification, serve to explain its principles. The drawings described below show only some embodiments of the disclosure; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 schematically shows a similar image clustering method in an exemplary embodiment of the disclosure;
Fig. 2 schematically shows a detailed flowchart of a similar image clustering method in an exemplary embodiment of the disclosure;
Fig. 3 schematically shows a block diagram of a similar image clustering apparatus in an exemplary embodiment of the disclosure;
Fig. 4 schematically shows a block diagram of an electronic device in an exemplary embodiment of the disclosure;
Fig. 5 schematically shows a program product in an exemplary embodiment of the disclosure.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the disclosure. Those skilled in the art will appreciate, however, that the technical solution of the disclosure may be practiced with one or more of the specific details omitted, or with other methods, components, devices, steps and so on. In other cases, well-known solutions are not shown or described in detail, to avoid obscuring aspects of the disclosure.
In addition, the drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated description is omitted. Some of the block diagrams shown in the drawings are functional entities that do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
This example embodiment first provides a similar image clustering method, which can be applied in fields such as image retrieval, image copyright protection on e-commerce platforms and intelligent video analysis. Referring to Fig. 1, the similar image clustering method may include the following steps:
Step S110. Extracting image features of a plurality of images with a convolutional neural network model and performing hash mapping on the image features;
Step S120. Clustering the image features with a strongly connected component clustering method to determine the cluster classes;
Step S130. Providing a class identifier for each image feature and obtaining the similar images from a database according to the class identifier.
In the similar image clustering method provided in this example embodiment, on the one hand, clustering the image features with the strongly connected component clustering method determines the number of cluster classes automatically, which avoids the manual specification of the number of clusters required in the related art and improves the accuracy and efficiency of clustering. On the other hand, extracting the image features with a convolutional neural network model and hash-mapping them reduces the size of the image features and therefore the memory they occupy.
Each step of the similar image clustering method in this example embodiment is explained in detail below.
First, in step S100 shown in Fig. 2, data can be input; the data may be, for example, an image sequence or a sequence of video frames.
Next, in step S110, image features of the plurality of images are extracted with a convolutional neural network model and hash mapping is performed on the image features.
When clustering or otherwise processing images, the storage the images require may be too large for all image data to be loaded into memory at once. To solve this problem, semantic features, visual features and the like can be extracted in step S111. Specifically, the image features of the plurality of images can be extracted with a convolutional neural network model, so that each image is represented by its image features, which reduces the memory the images occupy.
The basic structure of a convolutional neural network (CNN) model includes feature extraction layers and feature mapping layers. Because the feature detection layers of a CNN are learned from training data, using a CNN avoids explicit feature extraction; the features are learned implicitly from the training data. A convolutional neural network model can thus reduce the complexity of data reconstruction during feature extraction and classification.
The image features here may include local features and global features. For the image features to express the global and local features of an image clearly and accurately, a sequence-based convolutional neural network model needs to be trained in advance. For example, a sample (X, Yp) can be taken from the sample set and X fed into the network to compute the corresponding actual output; the difference between the actual output and the corresponding ideal output Yp is then computed, and the convolutional neural network model is trained by back-propagation, adjusting the weight matrix so as to minimize the error.
The image features of all images are then extracted with the convolutional neural network model. The extraction may proceed as follows: a three-channel color image with red (R), green (G) and blue (B) channels is input and passes through the convolutional layers, bias layers, down-sampling layers, activation layers and so on of the convolutional neural network model, which finally outputs a 1024-dimensional image feature. The image feature here is a floating-point feature.
After the floating-point feature of an image is obtained, a hash mapping operation can be performed on the image feature to map the floating-point feature to a point in Hamming space. Specifically, performing hash mapping on the image features in step S110 may include step S112:
Converting the image feature of each image into a binary code by a hash quantization coding method.
Hash quantization coding represents a d-dimensional floating-point feature as a binary string of 0 and 1 bits, so that the distance between features can be approximated by the distance between binary strings.
Hash mapping first requires learning a hash mapping model that maps the 1024-dimensional floating-point feature to a point in Hamming space. The point has 1024 dimensions, that is, it is a string of 1024 bits each equal to 0 or 1, and this string is called the binary code. Assuming there are N images, after feature extraction with the convolutional neural network model and the hash mapping operation, the N images are represented in memory by N binary codes.
As an example for step S110, a floating-point feature can be obtained for an 800*600 three-channel color image with the convolutional neural network model. The feature has 1024 dimensions and occupies 4096 bytes of memory; after hash mapping to a 1024-dimensional binary code, it occupies 1024 bits, i.e. 128 bytes. In the conventional approach, holding the data for 100 million images in memory requires at least 380 GB, whereas the binary code features occupy only a fraction of that (128 bytes instead of 4096 bytes per image, a 32-fold reduction), which reduces the memory consumed by image features.
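The memory figures above can be checked with a minimal sketch. It assumes that each bit of the binary code is 1 when the corresponding feature component is positive; the patent instead learns a hash mapping model, so the thresholding rule, the function name `binarize` and the random stand-in feature are illustrative assumptions only.

```python
import random
import struct


def binarize(feature):
    """Pack a float feature vector into an int whose bits form the binary code.

    Assumption: bit i is 1 when feature[i] > 0. This stands in for the
    learned hash mapping model, which the source does not specify.
    """
    code = 0
    for value in feature:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


random.seed(0)
# Hypothetical stand-in for a 1024-dimensional CNN feature.
feature = [random.uniform(-1.0, 1.0) for _ in range(1024)]

# Size of the feature when stored as 32-bit floats.
float_bytes = len(struct.pack(f"{len(feature)}f", *feature))
# Size of the binary code: one bit per dimension.
code = binarize(feature)
code_bytes = len(code.to_bytes(len(feature) // 8, "big"))

print(float_bytes, code_bytes)  # 4096 128
```

Storing the code as a single Python integer keeps the later XOR and bit-count steps simple; a production system would more likely use packed byte arrays.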
In step S120, the image features are clustered with the strongly connected component clustering method to determine the cluster classes.
If there is a reachable path between any two vertices of a directed graph, the graph is called a strongly connected graph. For example, in a directed graph G, if for two vertices vi and vj (vi > vj) there is a directed path from vi to vj and also a directed path from vj to vi, the two vertices are said to be strongly connected. If every two vertices of G are strongly connected, G is a strongly connected graph. A maximal strongly connected subgraph of a directed graph is called a strongly connected component.
Here the binary code features obtained from the hash mapping operation can be clustered with the strongly connected component clustering method. The principle of binary code clustering based on strongly connected components is that semantic similarity between images can be expressed as connectivity between points in graph theory: images of the same class belong to the same strongly connected component.
Specifically, clustering the image features with the strongly connected component clustering method may include steps S121 to S123, in which:
Step S121: computing the similarity between the plurality of images from the binary codes and sorting the binary codes;
Step S122: building a directed graph with the binary code of each image as a vertex;
Step S123: finding all strongly connected components in the directed graph to determine the number of cluster classes.
First, the similarity between the plurality of images can be computed from the binary codes to perform similarity retrieval, and the binary codes can be sorted; next, a directed graph is built with the binary code of each image as a vertex; finally, all strongly connected components are found in the directed graph to determine the number of cluster classes.
Next, each step of the clustering method is described in detail. Computing the similarity between the plurality of images from the binary codes and sorting the binary codes may include:
Computing the Hamming distance between the binary code of each image and a query binary code to obtain the similarity;
Sorting the binary codes according to the Hamming distance.
In this example, the Hamming distance is the number of positions at which two strings of equal length differ. For binary strings it can be obtained by XOR-ing the two strings and counting the 1 bits in the result. For example, the Hamming distance between 1011101 and 1001001 is 2, between 2143896 and 2233796 is 3, and between "toned" and "roses" is 3. Note that for 1024-bit binary codes the Hamming distance has a minimum of 0 and a maximum of 1024.
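The three worked examples above can be reproduced with a short sketch of the position-wise definition. The function name `hamming` is an illustrative choice, not something the source defines.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(x != y for x, y in zip(a, b))


print(hamming("1011101", "1001001"))  # 2
print(hamming("2143896", "2233796"))  # 3
print(hamming("toned", "roses"))      # 3
```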
The query binary code can be set according to user requirements; for example, it can be any one of the binary codes. The Hamming distance between the binary code of each image and the chosen query binary code is computed, and this distance expresses the similarity between each binary code and the query binary code. Similarity and Hamming distance are negatively correlated. Taking 1024-bit binary codes as an example, a Hamming distance of 0 between the binary codes of two images means their similarity is greatest, and a Hamming distance of 1024 means their similarity is smallest.
After the similarities are computed, all binary codes can be sorted by Hamming distance to improve the efficiency of the clustering operation. The Hamming distances can be sorted with heap sort or another algorithm, and the procedure can be implemented in program code.
For example, for N images, the Hamming distance between their binary codes, i.e. the number of differing bits among the 1024, is computed; the Hamming distance between 0101 and 1011, for instance, is 3. Computing the distances between N images has time complexity $O(N^2)$, but because the binary codes in this example consist of 0 and 1 bits, each distance computation is efficient.
Specifically, the Hamming distance can be computed with the hardware instruction __popcnt available on CPUs and GPUs. Let a and b be binary codes of length 1024; the Hamming distance between them is then $d_h(a, b) = \mathrm{popcnt}(a \oplus b)$, where $\oplus$ denotes bitwise XOR.
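The XOR-and-count formula can be sketched on Python integers, which hold 1024-bit codes directly. The helper name `hamming_bits` is hypothetical, and counting the 1 characters of the binary representation is a portable stand-in for the hardware popcount instruction the text mentions.

```python
def hamming_bits(a: int, b: int) -> int:
    """Hamming distance of two binary codes stored as non-negative ints."""
    # XOR leaves a 1 bit exactly where the two codes differ.
    return bin(a ^ b).count("1")


a = int("0101", 2)
b = int("1011", 2)
print(hamming_bits(a, b))  # 3

all_ones = (1 << 1024) - 1  # a 1024-bit code with every bit set
print(hamming_bits(0, all_ones))  # 1024
```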
After the Hamming distances between the N images are computed, the distances can be sorted in ascending order. Specifically, when a is used as the query binary code, computing the Hamming distance between a and the other binary codes in memory yields the K binary codes most similar to a, as well as all binary codes whose distance is less than H. K and H are controllable parameters for building the directed graph and can be set as needed. Adjusting these two parameters controls the total number of cluster classes finally obtained: in general, increasing K and H reduces the number of classes, and decreasing K and H increases it.
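The retrieval for one query code might look like the following sketch. It assumes the query itself is excluded from its own results and that ties are broken by index; the function names and the toy 4-bit codes are illustrative, not taken from the source.

```python
import heapq


def hamming_bits(a, b):
    """Hamming distance of two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def query(codes, q, k, h):
    """Return (indices of the k nearest codes, indices with distance < h).

    codes is a list of ints, q is the index of the query code.
    """
    dists = [(hamming_bits(codes[q], c), i)
             for i, c in enumerate(codes) if i != q]
    # heapq.nsmallest avoids fully sorting when only the top k are needed.
    top_k = [i for _, i in heapq.nsmallest(k, dists)]
    within_h = [i for d, i in sorted(dists) if d < h]
    return top_k, within_h


codes = [0b0000, 0b0001, 0b0011, 0b1111, 0b1110]
print(query(codes, 0, k=2, h=3))  # ([1, 2], [1, 2])
```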
In addition, building a directed graph with the binary code of each image as a vertex may include:
When the Hamming distance between a binary code and the query binary code satisfies a preset condition, establishing a directed edge pointing from the query binary code to that binary code.
When building the directed graph, it is first determined whether the Hamming distance between each binary code and the query binary code satisfies the preset condition, where the preset condition may be that the binary code ranks in the top K.
If the Hamming distance between a binary code and the query binary code ranks in the top K, a directed edge can be established, pointing from the query binary code to that binary code. A loop can apply these steps to all binary codes in memory to build the directed graph.
For example, for a query binary code a, if the distance between binary code b and a ranks in the top K, a directed edge from vertex a to vertex b is established. All binary codes in memory are traversed, and an edge is added for each binary code that ranks in the top K; repeating this builds the directed graph.
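The graph construction can be sketched as below, under the assumption that every binary code takes a turn as the query and that the preset condition is "ranks in the top K". The adjacency-list representation and the name `build_knn_digraph` are illustrative choices.

```python
import heapq


def build_knn_digraph(codes, k):
    """Build a directed graph with an edge q -> i for the k codes nearest to q.

    codes is a list of ints; the result maps each vertex index to a list
    of successor indices.
    """
    graph = {i: [] for i in range(len(codes))}
    for q, qcode in enumerate(codes):
        dists = [(bin(qcode ^ c).count("1"), i)
                 for i, c in enumerate(codes) if i != q]
        for _, i in heapq.nsmallest(k, dists):
            graph[q].append(i)  # edge points from the query to the neighbor
    return graph


codes = [0b0000, 0b0001, 0b1111, 0b1110]
print(build_knn_digraph(codes, k=1))
# {0: [1], 1: [0], 2: [3], 3: [2]}
```

Because the edges are directed, a being among the nearest neighbors of b does not imply the reverse, which is why strongly connected components rather than plain connected components define the clusters.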
In addition, finding all strongly connected components in the directed graph may include:
When a preset data structure determines that the vertices belong to the same connected component, finding the strongly connected components in the directed graph with Tarjan's algorithm.
In this example, the preset data structure may be a union-find (disjoint-set) data structure. When searching for the strongly connected components of the directed graph, the union-find structure can be used to determine whether two vertices belong to the same connected component, and Tarjan's algorithm can then be used to find all strongly connected components. The number of strongly connected components is the total number of cluster classes. Tarjan's algorithm is based on depth-first search of the graph; each strongly connected component is a subtree of the search tree. During the search, unprocessed nodes of the current search tree are pushed onto a stack, and on backtracking it can be determined whether the nodes from the top of the stack down to a given node form a strongly connected component. Searching for the strongly connected components with Tarjan's algorithm can be implemented in pseudocode, Pascal, C++ or another language.
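A minimal recursive sketch of Tarjan's algorithm follows. It omits the union-find pre-check described above and assumes the graph is given as a dictionary of successor lists; for very large graphs an iterative version would be needed to stay within Python's recursion limit.

```python
def tarjan_scc(graph):
    """Return the strongly connected components of a directed graph.

    graph maps each vertex to a list of successor vertices.
    """
    index = {}         # discovery order of each visited vertex
    low = {}           # lowest discovery index reachable from the vertex
    stack = []         # vertices visited but not yet assigned to a component
    on_stack = set()
    components = []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop the stack down to v.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(sorted(component))

    for v in graph:
        if v not in index:
            visit(v)
    return components


graph = {0: [1], 1: [0], 2: [3], 3: [2], 4: [0]}
print(tarjan_scc(graph))  # [[0, 1], [2, 3], [4]]
```

In the example, vertex 4 points to vertex 0 but nothing points back, so it forms a component of its own; the number of components, 3, is the number of cluster classes.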
Because the number of strongly connected components is determined automatically by the parameter K of the preceding steps, and that number is the total number of cluster classes, the manual step of specifying the number of cluster classes in the related art is avoided. This improves clustering efficiency and also avoids the errors of manual operation, improving accuracy.
Next, in step S130, a class identifier is provided for each image feature and the similar images are obtained from a database according to the class identifier.
After the image features are clustered by strongly connected components in step S120 to determine the cluster classes, a class identifier can be provided for the cluster class of each image feature to label the image. The class identifier is a globally unique identifier, for example a number or another identifier; a number is used as the example here. Specifically, after all strongly connected components of the directed graph are found, the images corresponding to all vertices of the same component are given the same number, which may have, for example, 11 digits.
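Assigning identifiers can be sketched as follows. The zero-padded running counter is an assumption made for illustration; the source only says that the identifier is globally unique and may be an 11-digit number.

```python
def assign_class_ids(components, width=11):
    """Give every vertex of a component the same zero-padded numeric label.

    components is a list of vertex lists, one per strongly connected
    component; width is the number of digits in each label.
    """
    ids = {}
    for n, component in enumerate(components, start=1):
        label = str(n).zfill(width)
        for vertex in component:
            ids[vertex] = label
    return ids


components = [[0, 1], [2, 3], [4]]
ids = assign_class_ids(components)
print(ids[0], ids[1], ids[4])  # 00000000001 00000000001 00000000003
```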
In this example, all strongly connected components can be located in the directed graph according to the category identifier, and all images in the database can then be traversed quickly by category identifier to find the similar images, so that the target task described in step S140 can be executed with the similar images. The target task may be copyright protection or key frame extraction. Looking up similar images by category identifier improves the efficiency of obtaining similar images.
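One way such a lookup could be realized is sketched below with Python's built-in sqlite3 module. The table name images, the column names image_id and category_id, and the helper names build_index and similar_images are all hypothetical; the disclosure does not specify a database schema.

```python
import sqlite3


def build_index(labels):
    """Store image -> category identifier pairs in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (image_id TEXT PRIMARY KEY, category_id TEXT)"
    )
    # An index on the identifier lets similar images be fetched directly.
    conn.execute("CREATE INDEX idx_category ON images (category_id)")
    conn.executemany("INSERT INTO images VALUES (?, ?)", list(labels.items()))
    conn.commit()
    return conn


def similar_images(conn, image_id):
    """Return the other images that share the category of image_id."""
    row = conn.execute(
        "SELECT category_id FROM images WHERE image_id = ?", (image_id,)
    ).fetchone()
    if row is None:
        return []
    rows = conn.execute(
        "SELECT image_id FROM images "
        "WHERE category_id = ? AND image_id != ? ORDER BY image_id",
        (row[0], image_id),
    ).fetchall()
    return [r[0] for r in rows]


conn = build_index(
    {"img1": "00000000001", "img2": "00000000001", "img3": "00000000002"}
)
```

With this layout, calling similar_images(conn, "img1") needs only the identifier of the query image rather than a pairwise comparison against every stored image.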
The present disclosure also provides a similar image clustering apparatus. Referring to Fig. 3, the similar image clustering apparatus 300 may include a feature extraction module 301, a feature clustering module 302, and an image acquisition module 303, wherein:
The feature extraction module 301 may be configured to extract image features of a plurality of images through a convolutional neural network model and to perform hash mapping on the image features;
The feature clustering module 302 may be configured to cluster the image features by a strongly connected component clustering method to determine cluster categories;
The image acquisition module 303 may be configured to provide a category identifier for each image feature and to obtain the similar images from a database according to the category identifier.
The specific details of each module of the above similar image clustering apparatus have been described in detail in the corresponding similar image clustering method, and are therefore not repeated here.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
In addition, although the steps of the method of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be executed in that particular order, or that all the steps shown must be executed, to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be merged into one step, and/or one step may be split into multiple steps.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. Accordingly, the technical solution according to embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes a number of instructions that cause a computing device (such as a personal computer, a server, a mobile terminal, or a network device) to execute the method according to embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
Those of ordinary skill in the art will understand that aspects of the present invention may be implemented as a system, a method, or a program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, and so on), or an embodiment combining hardware and software aspects, which may collectively be referred to here as a "circuit", "module", or "system".
The electronic device 600 according to this embodiment of the present invention is described below with reference to Fig. 4. The electronic device 600 shown in Fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 4, the electronic device 600 takes the form of a general-purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, and a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610).
The storage unit stores program code that can be executed by the processing unit 610, so that the processing unit 610 performs the steps of the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification. For example, the processing unit 610 may perform the steps shown in Fig. 1: step S110, extracting image features of a plurality of images through a convolutional neural network model and performing hash mapping on the image features; step S120, clustering the image features by a strongly connected component clustering method to determine cluster categories; and step S130, providing a category identifier for each image feature and obtaining the similar images from a database according to the category identifier.
The storage unit 620 may include readable media in the form of volatile storage units, such as a random access memory (RAM) unit 6201 and/or a cache memory unit 6202, and may further include a read-only memory (ROM) unit 6203.
The storage unit 620 may also include a program/utility 6204 having a set of (at least one) program modules 6205. Such program modules 6205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 600 may also communicate with one or more external devices 700 (such as a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (such as a router or a modem) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 650. The electronic device 600 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 660. As shown, the network adapter 660 communicates with the other modules of the electronic device 600 through the bus 630. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. Accordingly, the technical solution according to embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes a number of instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to execute the method according to embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, on which is stored a program product capable of implementing the method described above in this specification. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product comprising program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification.
Referring to Fig. 5, a program product 800 for implementing the above method according to an embodiment of the present invention is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer. However, the program product of the present invention is not limited to this. In this document, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In addition, the above drawings are merely schematic illustrations of the processing included in the method according to exemplary embodiments of the present invention, and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the chronological order of these processes. It is also easy to understand that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common knowledge or customary technical means in the art not disclosed by the present disclosure. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the claims.

Claims (10)

1. A similar image clustering method, characterized by comprising:
extracting image features of a plurality of images through a convolutional neural network model and performing hash mapping on the image features;
clustering the image features by a strongly connected component clustering method to determine cluster categories;
providing a category identifier for each image feature and obtaining the similar images from a database according to the category identifier.
2. The similar image clustering method according to claim 1, characterized in that performing hash mapping on the image features comprises:
converting the image features of each image into a binary code by a hash quantization coding method.
3. The similar image clustering method according to claim 2, characterized in that clustering the image features by the strongly connected component clustering method comprises:
calculating the similarity between the plurality of images from the binary codes and sorting the binary codes;
constructing a directed graph with the binary code of each image as a vertex;
searching for all strongly connected components in the directed graph to determine the number of cluster categories.
4. The similar image clustering method according to claim 3, characterized in that calculating the similarity between the plurality of images from the binary codes and sorting the binary codes comprises:
calculating the Hamming distance between the binary code corresponding to each image and a query binary code to obtain the similarity;
sorting the binary codes according to the Hamming distance.
5. The similar image clustering method according to claim 4, characterized in that constructing the directed graph with the binary code of each image as a vertex comprises:
when the Hamming distance between the binary code and the query binary code satisfies a preset condition, establishing a directed edge from the query binary code to the binary code.
6. The similar image clustering method according to claim 3, characterized in that searching for all strongly connected components in the directed graph comprises:
when it is determined by means of a preset data structure that the vertices belong to the same connected component, searching for the strongly connected components in the directed graph using the Tarjan algorithm.
7. The similar image clustering method according to claim 4, characterized in that the Hamming distance is negatively correlated with the similarity.
8. A similar image clustering apparatus, characterized by comprising:
a feature extraction module, configured to extract image features of a plurality of images through a convolutional neural network model and to perform hash mapping on the image features;
a feature clustering module, configured to cluster the image features by a strongly connected component clustering method to determine cluster categories;
an image acquisition module, configured to provide a category identifier for each image feature and to obtain the similar images from a database according to the category identifier.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the similar image clustering method according to any one of claims 1 to 7.
10. An electronic device, characterized by comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the similar image clustering method according to any one of claims 1 to 7 by executing the executable instructions.
CN201710994492.4A 2017-10-23 2017-10-23 Similar image clustering method and device, storage medium and electronic equipment Active CN109697451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710994492.4A CN109697451B (en) 2017-10-23 2017-10-23 Similar image clustering method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710994492.4A CN109697451B (en) 2017-10-23 2017-10-23 Similar image clustering method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN109697451A true CN109697451A (en) 2019-04-30
CN109697451B CN109697451B (en) 2022-01-07

Family

ID=66226804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710994492.4A Active CN109697451B (en) 2017-10-23 2017-10-23 Similar image clustering method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN109697451B (en)

Also Published As

Publication number Publication date
CN109697451B (en) 2022-01-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant