CN104298713B - Picture retrieval method based on fuzzy clustering - Google Patents

Picture retrieval method based on fuzzy clustering

Info

Publication number
CN104298713B
Authority
CN
China
Prior art keywords
picture
pictures
point
similarity
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410472785.2A
Other languages
Chinese (zh)
Other versions
CN104298713A (en)
Inventor
刘瑞
左源
张辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201410472785.2A
Publication of CN104298713A
Application granted
Publication of CN104298713B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a picture retrieval method based on fuzzy clustering, comprising the following steps: S11, build a feature value library for the pictures in a picture library and number each picture; S12, select from the picture library N pictures whose mutual distances all exceed a distance threshold A1, and classify the remaining pictures a first time to form N classes of pictures; S13, repeat step S12 on every class among the N classes that contains more pictures than a quantity threshold, until every class is smaller than the quantity threshold, obtaining M representative points; S14, assign every picture in the picture library, according to its similarity to the M representative points, to the picture set of the representative point it is most similar to; S15, for an input picture to be retrieved, extract its feature values, compute its similarity to every representative point, and retrieve within the picture sets of the several most similar representative points. While maintaining retrieval efficiency, the invention narrows the retrieval range and reduces the retrieval workload.

Description

Picture retrieval method based on fuzzy clustering
Technical field
The present invention relates to a picture retrieval method, and more particularly to a picture retrieval method based on fuzzy clustering, and belongs to the technical field of information retrieval.
Background
Pictures are one of the most important forms of multimedia information. Through rich visual features such as color, texture, and shape, they make abstract data concrete and present it to the public intuitively and vividly. As information spreads ever more conveniently over the internet and mobile terminals grow more capable, image information has become, after text, another major information carrier, widely used in computing fields such as information retrieval, data mining, and human-computer interaction. However, pictures carry complex information, depend strongly on context, are hard to abstract into high-level semantics, are expensive to process, and lack mature organizational structures and retrieval models for massive collections. Research on the processing, retrieval, analysis, organization, and management of picture information, especially for the massive picture collections on the internet, has therefore become a difficult research problem in computer science.
The basic model of existing picture retrieval compares the query picture with the pictures in the searched library one by one, sorts the results, and returns the closest pictures. This model has to traverse the whole picture library for every query. When there are too many retrieval requests, later visitors face long waits, and the waiting time grows further as the number of visitors increases. Retrieval results come from the collected picture library, so to meet the needs of different visitors, or to handle query pictures of different types, the library must be large enough to guarantee retrieval accuracy. An overly large library, however, multiplies the retrieval load and response time, so the requirement of real-time retrieval cannot be met.
To address these problems, Chinese invention patent application No. 201010195710.6 discloses an image retrieval method comprising a training part and a retrieval part. The training part comprises the following steps: extracting feature points; supplementing feature points and determining matching relationships; generating similar point sets; clustering the feature point sets; and generating a feature vector for each image in the image database. The retrieval part comprises the following steps: extracting the feature points of the picture to be retrieved and generating a feature point set; computing the distance from each feature point descriptor vector to each cluster center and assigning the feature point to the cluster at minimum distance; computing the frequency of each cluster to which the feature points of the picture to be retrieved belong; generating a feature vector from these cluster frequencies and the log-probabilities of the clusters, and normalizing it to unit length; and computing the Euclidean distance from the feature vector of the picture to be retrieved to the feature vector of each image in the picture library, and outputting the images at minimum distance as the retrieval result.
Summary of the invention
The technical problem to be solved by the invention is to provide a picture retrieval method based on fuzzy clustering.
To achieve the above object, the present invention adopts the following technical solutions:
A picture retrieval method based on fuzzy clustering comprises the following steps:
S11, build a feature value library for the pictures in the picture library, and number each picture;
S12, taking the numbers as the objects of operation, select from the picture library N pictures whose mutual distances all exceed a distance threshold A1, and classify the remaining pictures a first time to form N classes of pictures; wherein the process of selecting the N pictures from the picture library comprises the following steps: S121, choose an arbitrary picture P from the picture library and use it as the query to search the picture library, finding the picture Q1 with the largest similarity distance; S122, with picture Q1 as the query, split off the part SH1 of the picture set whose similarity distance to Q1 exceeds the distance threshold A1, and obtain the picture Q2 with the largest similarity distance; S123, repeat step S122, where each query picture is the least similar picture QN obtained in the previous iteration and the searched picture set is the set SHN obtained in the previous iteration, until SHN is empty; the resulting N pictures Q1 ... QN are the N representative points to be selected;
S13, perform step S12 on every class among the N classes that contains more pictures than the quantity threshold, where the mutual distances between the selected pictures all exceed a distance threshold A2, so that each such class forms a varying number of subclasses; continue performing step S12 on every subclass that contains more pictures than the quantity threshold, until all classes are smaller than the quantity threshold, obtaining M representative points;
S14, assign every picture in the picture library, according to its similarity to the M representative points, to the picture set represented by the representative point it is most similar to, completing the partition of the whole picture library into classes;
S15, for an input picture to be retrieved, extract its feature values, compute the similarity between the picture and every representative point, and sort the results by size; choose the several representative points with the closest similarity, retrieve within the picture sets they represent, merge the retrieval results, and return them to the user.
A picture retrieval method based on fuzzy clustering comprises the following steps:
S21, number the pictures in the picture library, map each picture to a feature value code, assign the codes to nodes using byte hashing, and store them in a distributed file system;
S22, randomly read one feature value code from the distributed file system as the initial point and assign one map function to each node; each map function finds the point with the largest similarity distance to the initial point, and the results are sent to a reduce function and merged, picking out the point Q1 in the whole picture library that is farthest from the initial point in similarity distance;
S23, with point Q1 as the new initial point, compute in each node the point with the largest similarity distance to Q1 and take the maximum at the reduce function, obtaining the picture set SH1 whose similarity distance to Q1 exceeds the distance threshold A1 and the least similar picture Q2; redistribute the feature value codes of the pictures in SH1 to the nodes and assign one map function to each node; continue as above to find the point Q3 farthest in similarity distance, where each initial point is the least similar picture QN obtained in the previous iteration and the searched picture set is the set SHN obtained in the previous iteration; repeat until SHN is empty, obtaining N representative points;
S24, assign one map function to each representative point; each map function divides the remaining pictures in the picture library into classes according to their similarity distance to the known representative points, and each class is mapped to one reduce function, which judges from the number of pictures in the class whether the class can be processed on a single node;
S25, for every class that cannot be processed on a single node, continue to find representative points using step S23, choosing the picture set SHN whose similarity distance to QN exceeds a distance threshold A2 as the searched picture set, until all classes can be processed on a single node, obtaining M representative points;
S26, collect all representative points and assign one map function to each; each map function computes the similarity distance between the remaining pictures in the picture library and its representative point to perform the final classification, and a reduce function merges pictures of the same class and saves them as a file;
S27, for an input picture to be retrieved, extract its feature values, compute the similarity between the picture and every representative point, and sort the results by size; choose the several representative points with the closest similarity, search the files they represent for the final result, and return it.
Preferably, retrieval within the picture sets represented by the chosen representative points comprises the following steps:
S151, assign one map function to each class of picture set, and assign the feature value codes of the pictures in each class to nodes using byte hashing;
S152, each map function computes the similarity distance between the query picture and the pictures of the picture set on its node, sorts them by distance, and sends the sorted result to the reduce function;
S153, the reduce function receives the sorted results sent by the map functions, then merges and sorts them to obtain the final picture retrieval result.
Preferably, when pictures are processed, only their numbers are operated on and the pictures themselves are not fetched; only after the retrieval results have been merged are the pictures fetched from the picture library according to the correspondence between pictures and numbers, and returned to the user.
Preferably, when the similarity distance between pictures is computed, each picture is represented by a combination of two feature values, and the geometric mean is used as the formula for combining the two feature values.
Preferably, the distance threshold A2 is any number smaller than the distance threshold A1.
In the picture retrieval method based on fuzzy clustering provided by the invention, representative points are selected and the pictures in the picture library are classified according to them. At retrieval time, only the similarity distances between the input picture and the representative points need to be computed; the classes of the several representative points at the smallest similarity distance are then searched further. While maintaining retrieval efficiency, this narrows the retrieval range, reduces the retrieval workload, and effectively meets users' demand for real-time retrieval.
Brief description of the drawings
Fig. 1 is a flow chart of the picture retrieval method based on fuzzy clustering provided by the present invention;
Fig. 2 is a flow chart of selecting N pictures from the picture library in an embodiment provided by the present invention.
Detailed description of the embodiments
The technical content of the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
A picture retrieval method based on fuzzy clustering comprises the following steps. First, an appropriate number of representative points is chosen according to the similarity computation model in use and how densely the pictures are distributed in the high-dimensional feature space. The representative points may themselves be pictures. Regions where pictures cluster more densely get more representative points, and regions where they cluster less densely get fewer; the representative points are spaced as far apart as the local density allows, so that the other pictures show a clear tendency toward one representative point when they are classified. Once the representative points have been chosen, the remaining pictures are divided into regions according to their distances to the representative points, forming a number of high-dimensional subspaces, that is, the classes of pictures. Finally, at retrieval time the input picture is assigned to several high-dimensional subspaces and retrieved within them, and the retrieval results are merged and returned to the user. This process is described in detail below with reference to Fig. 1.
S11, build a feature value library for the pictures in the picture library, and number each picture.
When the feature value library is built for the pictures in the picture library, each picture is represented by a combination of two feature values, so that enough information is covered to represent the picture content. In the embodiment provided by the present invention, the two feature values are CEDD and the edge histogram. This combination covers the three attributes of color, texture, and contour, works well for distinguishing the main object of a picture, and occupies little memory per feature value, which makes it easy to store. The feature value library is built on the basis of the CEDD and edge histogram combination, and every picture is numbered. In the embodiment provided by the present invention, only the numbers of the pictures are operated on during processing and the pictures themselves are not fetched; only after the final retrieval results have been merged are the pictures fetched from the picture library according to the correspondence between pictures and numbers, and returned to the user. For example, when the similarity distance between pictures is computed, only the feature values corresponding to the picture numbers are fetched and the pictures are not, which reduces operational complexity and improves retrieval efficiency.
S12, taking the numbers as the objects of operation, select from the picture library N pictures whose mutual distances all exceed the distance threshold A1, and classify the remaining pictures a first time to form N classes of pictures.
Based on the feature values stored in the feature value library, the mutual distance between pictures is computed using the geometric mean as the formula for combining the two feature values. The advantage of the geometric mean is that it avoids normalizing the feature values and, compared with a plain product, keeps the range of the combined value close to that of a single feature value, which makes distance values easier to compare. N pictures whose mutual distances all exceed the distance threshold A1 are selected from the picture library as N representative points. Taking these N pictures as references, the remaining pictures are classified a first time according to their similarity distances to the N pictures, forming N classes of picture sets. During this selection, pictures are represented by their numbers and are not fetched from the picture library, which reduces operational complexity and improves processing efficiency.
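The geometric-mean combination can be illustrated with a minimal sketch. The text does not state which metric is applied to each individual feature value, so the Euclidean distance used here is an assumption, as are the function names and the tiny example vectors.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def combined_distance(pic_a, pic_b):
    """Geometric mean of the two per-feature distances.

    Each picture is a (cedd_vector, edge_histogram_vector) pair; the
    per-feature metric (Euclidean) is an assumption of this sketch.
    """
    d_cedd = euclidean(pic_a[0], pic_b[0])
    d_edge = euclidean(pic_a[1], pic_b[1])
    return math.sqrt(d_cedd * d_edge)

a = ([0.0, 0.0], [0.0])
b = ([3.0, 4.0], [20.0])
print(combined_distance(a, b))  # sqrt(5 * 20) = 10.0
```

The result stays on the same scale as the individual distances (5 and 20 combine to 10), whereas a plain product would give 100.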
As shown in Fig. 2, the process of selecting N pictures from the picture library comprises the following steps:
S121, choose an arbitrary picture P from the picture library and use it as the query to search the picture library, finding the least similar picture Q1 (the one with the largest similarity distance).
To find the picture Q1 least similar to picture P, the mutual distance between pictures is computed from the feature values stored in the feature value library, using the geometric mean as the formula for combining the two feature values; the picture at the largest distance from P is picture Q1.
S122, with picture Q1 as the query, split off the part SH1 of the picture set whose similarity distance to Q1 exceeds the distance threshold A1, and obtain the least similar picture Q2.
S123, repeat step S122, where each query picture is the least similar picture QN obtained in the previous iteration and the searched picture set is the set SHN obtained in the previous iteration, until SHN is empty. The resulting N pictures Q1 ... QN are the N representative points to be selected.
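Steps S121 to S123 can be sketched as a greedy farthest-point loop. This is one reading of the steps, not a definitive implementation: the function name, the `dist` callback, and the 1-D toy features are assumptions, and the library is assumed to hold at least two pictures.

```python
def select_representatives(ids, dist, a1, start):
    """Pick picture ids whose pairwise distances all exceed a1."""
    # S121: the picture farthest from an arbitrary start picture becomes Q1.
    q = max((i for i in ids if i != start), key=lambda i: dist(start, i))
    reps = [q]
    candidates = [i for i in ids if i != q]
    while True:
        # S122: keep only candidates farther than a1 from the latest representative.
        candidates = [i for i in candidates if dist(q, i) > a1]
        if not candidates:          # S123: stop once SH_N is empty
            return reps
        q = max(candidates, key=lambda i: dist(q, i))
        reps.append(q)

# Toy library: picture number -> 1-D feature value (illustrative only).
features = {1: 0.0, 2: 1.0, 3: 5.0, 4: 5.5, 5: 10.0}
dist = lambda a, b: abs(features[a] - features[b])
print(select_representatives(list(features), dist, a1=3.0, start=2))  # [5, 1, 4]
```

Because every surviving candidate is farther than A1 from all representatives chosen so far, the returned pictures are pairwise more than A1 apart.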
S13, perform step S12 on every class among the N classes that contains more pictures than the quantity threshold H, this time requiring the mutual distances of the selected pictures to exceed the distance threshold A2, so that each such class forms a varying number of subclasses; continue performing step S12 on every class that contains more pictures than H, until all classes are smaller than H. This forms M classes of picture sets, that is, M representative points. The distance threshold A2 is any number smaller than the distance threshold A1, and A1 and A2 are set according to the distribution of the picture library and the system's requirements on retrieval accuracy and response time. Setting A1 and A2 appropriately adjusts the size and relative density of the picture classes and makes retrieval more flexible.
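The recursive subdivision of S13 might look like the following sketch. All names are hypothetical, the distance is a toy 1-D difference, and the stop rule for a class that cannot be split at A2 (all members within A2 of each other) is an assumption added to guarantee termination; the text does not discuss that case.

```python
def pick_reps(ids, dist, threshold):
    """Greedy farthest-point pick; returned ids are pairwise > threshold apart."""
    start = ids[0]
    q = max(ids[1:], key=lambda i: dist(start, i))
    reps, cand = [q], [i for i in ids if i != q]
    while True:
        cand = [i for i in cand if dist(q, i) > threshold]
        if not cand:
            return reps
        q = max(cand, key=lambda i: dist(q, i))
        reps.append(q)

def split(ids, dist, threshold):
    """Assign every id to its nearest representative."""
    reps = pick_reps(ids, dist, threshold)
    classes = {r: [] for r in reps}
    for i in ids:
        classes[min(reps, key=lambda r: dist(r, i))].append(i)
    return classes

def cluster(ids, dist, a1, a2, h):
    """S12 with a1, then S13: re-split any class larger than h using a2."""
    final = {}
    pending = list(split(ids, dist, a1).items())
    while pending:
        rep, members = pending.pop()
        sub = split(members, dist, a2) if len(members) > h else {rep: members}
        if len(sub) == 1:          # small enough, or not splittable at a2
            final.update(sub)
        else:
            pending.extend(sub.items())
    return final

values = [0, 1, 5, 6, 10, 100, 101, 200]        # toy 1-D "feature values"
dist = lambda a, b: abs(values[a] - values[b])
classes = cluster(list(range(len(values))), dist, a1=50, a2=3, h=3)
print(sorted(classes))  # [0, 3, 4, 6, 7]
```

The keys of the returned dictionary play the role of the M representative points; the dense region around values 0 to 10 ends up with three of them, while the sparse regions get one each.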
S14, assign every picture in the picture library, according to its similarity to the M representative points, to the picture set represented by the representative point it is most similar to, completing the partition of the whole library into classes.
For the M chosen representative points, the similarity distance from each remaining picture in the picture library to each of the M representative points is computed, and the pictures are assigned to the different picture sets according to these distances, completing the final partition of the whole picture library into classes.
S15, for an input picture to be retrieved, extract its feature values, compute the similarity between the picture and every representative point, and sort the results by size; choose the several representative points with the closest similarity, search the picture sets they represent for the final result, and return it.
After the user inputs a picture to be retrieved, the picture is represented by the combination of two feature values, and the similarity distance between the picture and each representative point is computed using the geometric mean as the combining formula; the representative points are then sorted by this distance. The several nearest representative points are chosen as required, and the query picture is searched for in the picture set of each of them. In the embodiment provided by the present invention, this search does not fetch pictures from the picture library: only the feature values corresponding to the picture numbers are fetched, the similarity distances are computed and sorted by size, and the results are merged. The pictures are then fetched from the picture library according to the correspondence between pictures and numbers and returned to the user.
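Steps S14 and S15 together can be sketched as follows. The function names, the parameters `n_reps` and `top_k`, and the scalar toy features are illustrative assumptions; a real system would use the combined feature distance instead of an absolute difference.

```python
def assign(ids, reps, dist):
    """S14: put every picture number into the set of its nearest representative."""
    sets = {r: [] for r in reps}
    for i in ids:
        sets[min(reps, key=lambda r: dist(r, i))].append(i)
    return sets

def retrieve(sets, dist_to_query, n_reps, top_k):
    """S15: search only the sets of the n_reps representatives nearest the query."""
    nearest = sorted(sets, key=dist_to_query)[:n_reps]
    candidates = [i for r in nearest for i in sets[r]]
    return sorted(candidates, key=dist_to_query)[:top_k]

features = {0: 0, 1: 1, 2: 5, 3: 6, 4: 10, 5: 100, 6: 101, 7: 200}
dist = lambda a, b: abs(features[a] - features[b])
sets = assign(list(features), [0, 3, 4, 6, 7], dist)

query_value = 7                                   # feature value of the query picture
to_query = lambda i: abs(features[i] - query_value)
print(retrieve(sets, to_query, n_reps=2, top_k=3))  # [3, 2, 4]
```

Only picture numbers are handled throughout; the pictures themselves would be fetched from the library after the merged list is final. Here the query touches 3 of the 8 pictures rather than the whole library.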
In the embodiment provided by the present invention, retrieval within the different picture sets is processed by a distributed cluster. The classes are largely independent of one another, so placing them sensibly on storage nodes ensures that each retrieval request is handled by only a few nodes, which improves the scalability of the system. In addition, the representative points of the classes lie at different distances from one another; classes whose representative points are close together are likely to be searched together in the same query and can be placed on the same node.
MapReduce is one of the mainstream distributed computing platforms. It decomposes a computation into two processing stages, map and reduce, which makes it much easier for users to deploy programs on a distributed cluster without understanding the principles and implementation of distributed computing. In the basic MapReduce flow, each data element is first processed individually; this step is called map and converts the raw input into preliminarily processed data. Because the data elements do not depend on one another in this step, they can be assigned to different nodes and processed in parallel. In Hadoop, the output of map is organized as key-value pairs; the keys are hashed, the pairs are assigned to the corresponding nodes, and after merging and sorting the data enters the reduce stage. The reduce stage merges or otherwise processes the data that share a key to obtain a single result per key, completing the operation. This flow ensures that no stage depends on a single node that all data must pass through, which would create a computing bottleneck.
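The map, group-by-key, and reduce flow described above can be imitated in a single process. This is a teaching sketch only: `run_mapreduce` is a hypothetical helper, not part of Hadoop or any other framework, and it performs no distribution or fault handling.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Single-process imitation of the map -> group-by-key -> reduce flow."""
    groups = defaultdict(list)
    for record in records:                 # map phase: records are independent
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: same key -> same group
    return {key: reducer(key, values) for key, values in groups.items()}

# Example: count pictures per class from (picture_number, class_id) records.
records = [(1, "a"), (2, "b"), (3, "a")]
counts = run_mapreduce(records,
                       mapper=lambda rec: [(rec[1], 1)],
                       reducer=lambda key, values: sum(values))
print(counts)  # {'a': 2, 'b': 1}
```

In a real cluster the grouping step is what the key hashing achieves: all pairs with the same key reach the same reduce task.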
The MapReduce model ensures reliable computation through feedback on each task: every node reports its running state at a fixed interval, and when the system loses contact with a node it reassigns that node's tasks to other nodes. Following the principle of data locality, the system sends the processing program to the node that stores the corresponding data wherever possible, which avoids overloading the network and improves efficiency.
In the embodiment provided by the present invention, the process of retrieving a picture within the different classes of picture sets is converted into the MapReduce processing model. MapReduce is a divide-and-conquer computing model composed of map and reduce stages, and the independence of retrieval in different picture sets suits it well: retrieval can be converted into several MapReduce tasks according to the classes of the chosen picture sets. After the conversion, retrieving a picture within each class of picture set comprises the following steps:
S151, assign one map function to each class of picture set, and assign the feature value codes of the pictures in each class to nodes using byte hashing.
When map functions are assigned to the picture sets, one map function can be assigned to each class of picture set; when the representative points of several classes lie close together, one map function can also be assigned to several classes of picture sets. In the embodiment provided by the present invention, one map function is assigned to each class of picture set.
S152, each map function computes the similarity distance between the query picture and the pictures of the picture set on its node, sorts them by distance, and sends the sorted result to the reduce function.
S153, the reduce function receives the sorted results sent by the map functions, then merges and sorts them to obtain the final picture retrieval result.
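A minimal sketch of S152 and S153, run in one process: each "map" call ranks one picture set, and the "reduce" call merges the already-sorted partial lists. The function names, the scalar features, and the use of `heapq.merge` for the merge step are assumptions of this sketch.

```python
import heapq

def map_search(picture_set, features, query):
    """S152: rank one picture set by distance to the query, as (distance, number)."""
    return sorted((abs(features[i] - query), i) for i in picture_set)

def reduce_merge(sorted_lists, top_k):
    """S153: merge the already-sorted partial results and keep the best top_k."""
    return [i for _, i in heapq.merge(*sorted_lists)][:top_k]

features = {2: 5, 3: 6, 4: 10}            # picture number -> toy feature value
chosen_sets = [[2, 3], [4]]               # picture sets of the chosen representatives
partials = [map_search(s, features, query=7) for s in chosen_sets]
print(reduce_merge(partials, top_k=2))    # [3, 2]
```

Since every map output is already sorted, the reduce side only needs a linear merge rather than a full re-sort.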
In the picture retrieval method based on fuzzy clustering provided by the present invention, representative point selection is completed inside each class and is entirely independent of the computation in other classes, which suits distributed computing. In the whole process, only the last step, in which every picture is assigned to a specific class by similarity computation, requires computation proportional to the product of the number of pictures and the number of classes; the rest of the computation is small, and the time complexity does not grow exponentially as the picture library grows. The method can therefore be used for retrieval over large picture libraries, effectively meets the needs of different visitors, and handles query pictures of different types.
In addition, the picture retrieval method based on fuzzy clustering provided by the present invention does not use class center points as the clustering criterion. Instead, several reference pictures that differ greatly from one another in the feature space determine the tendency of the remaining pictures. The number of iterations needed to choose the reference pictures depends on the chosen distance threshold relative to the sparsity of the picture library, not on the size of the library, and no class division is performed in each iteration. The final class of each picture is determined only after all reference pictures have been chosen. The reference pictures are closely related to the size and spatial sparsity of the classes they represent: regions where pictures are relatively dense get relatively more reference pictures, which keeps class sizes relatively uniform and divides the library according to sparsity. After the first clustering, the remaining clustering is carried out within classes, which meets the basic requirement of divide-and-conquer algorithms in distributed computing.
In another embodiment provided by the present invention, the process of selecting representative points in the different classes of picture sets is converted into the MapReduce processing model. MapReduce is a divide-and-conquer computing model composed of map and reduce stages, and the independence of representative point selection suits it well, so the process can be converted into several MapReduce tasks, specifically comprising the following steps:
S21, number the pictures in the picture library, map each picture to a feature value code, assign the codes to nodes using byte hashing, and store them in a distributed file system.
S22, randomly read one feature value code from the distributed file system as the initial point and assign one map function to each node; each map function finds the point with the largest similarity distance to the initial point, and the results are sent to a reduce function and merged, picking out the point Q1 in the whole picture library that is farthest from the initial point in similarity distance.
S23, with point Q1 as the new initial point, compute in each node the point with the largest similarity distance to Q1 and take the maximum at the reduce function, obtaining the picture set SH1 whose similarity distance to Q1 exceeds the distance threshold A1 and the least similar picture Q2; redistribute the feature value codes of the pictures in SH1 to the nodes and assign one map function to each node; continue as above to find the point Q3 farthest in similarity distance, where each initial point is the least similar picture QN obtained in the previous iteration and the searched picture set is the set SHN obtained in the previous iteration; repeat until SHN is empty, obtaining N representative points.
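The per-node maximum followed by a global maximum (S21 to S23) can be sketched in a single process. The text says only "byte hashing", so the use of CRC32 here is an assumption, as are the function names, the node count, and the one-byte toy feature codes.

```python
import zlib

def partition(codes, n_nodes):
    """S21: spread feature codes over nodes by hashing their bytes (CRC32 assumed)."""
    nodes = [[] for _ in range(n_nodes)]
    for pic_id, code in codes.items():
        nodes[zlib.crc32(code) % n_nodes].append(pic_id)
    return nodes

def map_farthest(node_ids, origin, dist):
    """map: farthest picture from origin within one node, as (distance, number)."""
    return max(((dist(origin, i), i) for i in node_ids), default=None)

def reduce_farthest(partials):
    """reduce: global farthest picture among the per-node winners."""
    return max(p for p in partials if p is not None)

def farthest(codes, origin, dist, n_nodes=3):
    nodes = partition(codes, n_nodes)
    return reduce_farthest(map_farthest(n, origin, dist) for n in nodes)[1]

codes = {0: b"\x00", 1: b"\x05", 2: b"\x0a", 3: b"\xc8"}   # toy one-byte codes
dist = lambda a, b: abs(codes[a][0] - codes[b][0])
print(farthest(codes, 0, dist))  # 3
```

The global result does not depend on how the hash spreads the codes, because the maximum of the per-node maxima equals the maximum over the whole library.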
S24, point one map function of distribution is represented to be each, each map functions according to remaining picture in picture library with it is known The similarity distance division classification of point is represented, same category is mapped at a reduce function, according to picture number in classification Size judge whether to perform with single node.
In the embodiment provided by the present invention, judging whether a class can run on a single node means checking whether the number of pictures in the class exceeds a set quantity threshold. When the number exceeds the threshold, the class cannot be processed on a single node and the method proceeds to step S25; when the number does not exceed the threshold, the class can be processed on a single node and needs no further division.
S25, for each class that cannot be processed on a single node, continue to find representative points using step S23, taking the picture set SHN whose similarity distance from QN exceeds the distance threshold A2 as the set to be searched, until every class can be processed on a single node, obtaining M representative points.
S26, collect all representative points and assign one map function to each; each map function computes the similarity distance between the remaining pictures in the picture library and its representative point, the pictures are then classified, and a reduce function merges pictures of the same class and saves them as a file.
S27, for an input picture to be retrieved, extract its feature values, compute the similarity between the picture and every representative point, and sort the results by magnitude; select the several representative points closest in similarity, search the files represented by those points for the final results, and return them.
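The farthest-point search of step S22 can be illustrated with a small single-process simulation of the map and reduce phases. This is a sketch under assumptions, not a Hadoop job: picture numbers are integers, `pic_id % num_nodes` stands in for the byte hashing mentioned above, Euclidean distance stands in for the similarity distance, and all function names are hypothetical.

```python
import math
from functools import reduce

def shard_by_hash(features, num_nodes):
    """Distribute {picture number: feature vector} over nodes (modulo as a stand-in hash)."""
    shards = [dict() for _ in range(num_nodes)]
    for pic_id, vec in features.items():
        shards[pic_id % num_nodes][pic_id] = vec
    return shards

def map_farthest(shard, query_vec):
    """Map phase: each node reports its local farthest picture as (distance, number)."""
    if not shard:
        return None
    return max((math.dist(query_vec, vec), pic_id) for pic_id, vec in shard.items())

def reduce_farthest(a, b):
    """Reduce phase: keep the larger of two local maxima, ignoring empty nodes."""
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)

def farthest_picture(shards, query_vec):
    local_results = (map_farthest(shard, query_vec) for shard in shards)
    return reduce(reduce_farthest, local_results, None)

features = {0: (0, 0), 1: (1, 0), 2: (10, 0), 3: (10, 1), 4: (5, 5)}
shards = shard_by_hash(features, num_nodes=2)
distance, q1 = farthest_picture(shards, features[0])
```

Because taking a maximum is associative, the local results can be combined in any order, which is what makes this step suitable for a reduce function.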
In the embodiment provided by the present invention, the retrieval process within the picture sets represented by the selected representative points is the same as steps S151-S153 described above and is not repeated here.
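For illustration, the sort-then-merge pattern of steps S151-S153 might look like the following single-process Python sketch. The shard layout, the use of Euclidean distance, and the names `map_rank`, `reduce_merge`, and `top_k` are assumptions made for the example.

```python
import heapq
import itertools
import math

def map_rank(shard, query_vec):
    """Map phase: rank the pictures on one node by distance to the query picture."""
    return sorted((math.dist(query_vec, vec), pic_id) for pic_id, vec in shard.items())

def reduce_merge(ranked_lists, top_k):
    """Reduce phase: merge the already sorted per-node lists and keep the best top_k."""
    merged = heapq.merge(*ranked_lists)
    return [pic_id for _, pic_id in itertools.islice(merged, top_k)]

shards = [{0: (0, 0), 2: (10, 0)}, {1: (1, 0), 3: (10, 1)}]
query = (9, 0)
result = reduce_merge([map_rank(shard, query) for shard in shards], top_k=3)
```

Since each node sends an already sorted list, the reduce side only needs a linear merge rather than a full re-sort.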
In summary, the picture retrieval method based on fuzzy clustering provided by the present invention selects an appropriate number of representative points according to the similarity calculation method used for the pictures and the density of their distribution in high-dimensional feature space. The feature values used cover three attributes of a picture (color, texture, and contour), distinguish the main object of a picture well, and occupy little memory per unit, which makes them easy to store. After the representative points are selected, the remaining pictures are assigned to different regions according to their distance from these points, forming individual high-dimensional subspaces, that is, picture sets of different classes. At retrieval time, the input picture is assigned to several high-dimensional subspaces, retrieval is performed within those subspaces, and the results are merged and returned to the user. Retrieval within the subspaces is processed on a distributed cluster, which effectively improves retrieval efficiency and meets users' need for real-time retrieval.
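The retrieval flow summarized above, in which a query is routed to several nearby classes rather than a single one, can be sketched as follows. The data layout, Euclidean distance, and the parameters `num_classes` and `top_k` are assumptions for the example; the sketch runs on one machine and omits the distributed processing.

```python
import math

def retrieve(query_vec, representatives, classes, features, num_classes=2, top_k=5):
    """Search only the classes whose representative points are closest to the query.

    representatives: picture numbers of the representative points
    classes: {representative number: [picture numbers in its class]}
    features: {picture number: feature vector}
    """
    # Rank representative points by distance to the query and keep the closest few.
    nearest = sorted(representatives,
                     key=lambda r: math.dist(query_vec, features[r]))[:num_classes]
    # Pool the pictures of the selected classes.
    candidates = {pic_id for r in nearest for pic_id in classes[r]}
    # Rank the pooled pictures and return the merged result.
    ranked = sorted(candidates, key=lambda pic_id: math.dist(query_vec, features[pic_id]))
    return ranked[:top_k]

features = {0: (0, 0), 1: (1, 0), 2: (10, 0), 3: (10, 1), 4: (5, 5)}
representatives = [3, 0, 4]
classes = {3: [2, 3], 0: [0, 1], 4: [4]}
hits = retrieve((9, 1), representatives, classes, features, num_classes=2, top_k=2)
```

Searching more than one class trades some extra work for robustness when the query lies near the boundary between two classes.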
The picture retrieval method based on fuzzy clustering provided by the present invention has been described in detail above. For those of ordinary skill in the art, any obvious modification made to it without departing from the true spirit of the present invention will constitute an infringement of the patent right of the present invention, and the corresponding legal liability will be borne.

Claims (6)

1. A picture retrieval method based on fuzzy clustering, characterized by comprising the following steps:
S11, build a feature value library for the pictures in the picture library and number each picture;
S12, taking the numbers as the objects of operation, select from the picture library N pictures whose mutual distances all exceed the distance threshold A1, and perform a first classification of the remaining pictures to form N classes of pictures; wherein the process of selecting N pictures from the picture library comprises the following steps: S121, arbitrarily select a picture P from the picture library and use it as the input for retrieval in the picture library, finding the picture Q1 with the greatest similarity distance; S122, use picture Q1 as the retrieval input, separate out the part SH1 of the picture set whose similarity distance from Q1 exceeds the distance threshold A1, and obtain the picture Q2 with the greatest similarity distance; S123, execute step S122 cyclically, where in each cycle the retrieval picture is the least similar picture QN obtained in the previous cycle and the picture set searched is the SHN obtained in the previous cycle, until SHN is empty; the resulting N pictures Q1……QN are the N representative points to be selected;
S13, perform step S12 on each class among the N classes whose number of pictures exceeds the quantity threshold, with the mutual distances between the selected pictures all exceeding the distance threshold A2, so that each such class forms a varying number of subclasses; continue to perform step S12 on subclasses whose number of pictures exceeds the quantity threshold until every class is smaller than the quantity threshold, obtaining M representative points;
S14, assign every picture in the picture library, according to its degree of similarity to the M representative points, to the picture set represented by the representative point with the highest similarity, completing the class division of the whole picture library;
S15, for an input picture to be retrieved, extract its feature values, compute the similarity between the picture and every representative point, and sort the results by magnitude; select the several representative points closest in similarity, retrieve within the picture sets represented by the selected representative points, and return the merged retrieval results to the user.
2. A picture retrieval method based on fuzzy clustering, characterized by comprising the following steps:
S21, number the pictures in the picture library and map each picture to a feature value code; distribute the codes to nodes using byte hashing, then store them in a distributed file system;
S22, randomly read one feature value code from the distributed file system as the initial point and assign one map function to each node; each map function finds the point with the greatest similarity distance from the initial point, and the results are sent to a reduce function to be merged, selecting the point Q1 in the whole picture library that is farthest from the initial point in similarity distance;
S23, take point Q1 as the new initial point; compute in each node the point with the greatest similarity distance from Q1 and merge the results at a reduce function, which takes the maximum, yielding the picture set SH1 whose similarity distance from Q1 exceeds the distance threshold A1 and the least similar picture Q2; redistribute the feature value codes of the pictures in SH1 to the nodes, assign one map function to each node, and continue to find the farthest point Q3 in the same way; in each cycle, the initial point is the least similar picture QN obtained in the previous cycle and the picture set searched is the SHN obtained in the previous cycle; repeat until SHN is empty, obtaining N representative points;
S24, assign one map function to each representative point; each map function divides the remaining pictures in the picture library into classes according to their similarity distance to the known representative points, and pictures of the same class are mapped to one reduce function; whether a class can be processed on a single node is judged from the number of pictures it contains;
S25, for each class that cannot be processed on a single node, continue to find representative points using step S23, taking the picture set SHN whose similarity distance from QN exceeds the distance threshold A2 as the set to be searched, until every class can be processed on a single node, obtaining M representative points;
S26, collect all representative points and assign one map function to each; each map function computes the similarity distance between the remaining pictures in the picture library and its representative point, the pictures are then classified, and a reduce function merges pictures of the same class and saves them as a file;
S27, for an input picture to be retrieved, extract its feature values, compute the similarity between the picture and every representative point, and sort the results by magnitude; select the several representative points closest in similarity, search the files represented by the selected representative points for the final results, and return them.
3. The picture retrieval method based on fuzzy clustering as claimed in claim 1 or 2, characterized in that the retrieval process within the picture sets represented by the selected representative points comprises the following steps:
S151, assign one map function to each class of picture set, and distribute the feature value codes of the pictures contained in each class to nodes using byte hashing;
S152, each map function computes the similarity distance between the retrieval picture and the pictures of the picture set on the same node, sorts them by distance, and sends the sorted result to the reduce function;
S153, the reduce function receives the sorted results sent by the map functions and merges and sorts them to obtain the final picture retrieval result.
4. The picture retrieval method based on fuzzy clustering as claimed in claim 1 or 2, characterized in that:
when processing pictures, only their corresponding numbers are operated on and the pictures themselves are not extracted; only after the retrieval results are merged are the pictures extracted from the picture library according to the correspondence between pictures and numbers and returned to the user.
5. The picture retrieval method based on fuzzy clustering as claimed in claim 1 or 2, characterized in that:
when calculating the similarity distance between pictures, each picture is represented by a combination of two kinds of feature values, and the geometric mean is used as the formula for combining the two kinds of feature values to calculate the similarity distance between pictures.
6. The picture retrieval method based on fuzzy clustering as claimed in claim 1 or 2, characterized in that:
the distance threshold A2 is any number smaller than the distance threshold A1.
CN201410472785.2A 2014-09-16 2014-09-16 A kind of picture retrieval method based on fuzzy clustering Active CN104298713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410472785.2A CN104298713B (en) 2014-09-16 2014-09-16 A kind of picture retrieval method based on fuzzy clustering

Publications (2)

Publication Number Publication Date
CN104298713A CN104298713A (en) 2015-01-21
CN104298713B true CN104298713B (en) 2017-12-08

Family

ID=52318438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410472785.2A Active CN104298713B (en) 2014-09-16 2014-09-16 A kind of picture retrieval method based on fuzzy clustering

Country Status (1)

Country Link
CN (1) CN104298713B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106557523B (en) * 2015-09-30 2020-05-12 佳能株式会社 Representative image selection method and apparatus, and object image retrieval method and apparatus
CN107122785B (en) * 2016-02-25 2022-09-27 中兴通讯股份有限公司 Text recognition model establishing method and device
CN107423309A (en) * 2016-06-01 2017-12-01 国家计算机网络与信息安全管理中心 Magnanimity internet similar pictures detecting system and method based on fuzzy hash algorithm
CN106528629B (en) * 2016-10-09 2018-04-03 深圳云天励飞技术有限公司 A kind of vector based on geometric space division searches for method and system generally
CN110502953A (en) * 2018-05-16 2019-11-26 杭州海康威视数字技术股份有限公司 A kind of iconic model comparison method and device
CN108830217B (en) * 2018-06-15 2021-10-26 辽宁工程技术大学 Automatic signature distinguishing method based on fuzzy mean hash learning
CN109783678B (en) * 2018-12-29 2021-07-20 深圳云天励飞技术有限公司 Image searching method and device
CN109766470A (en) * 2019-01-15 2019-05-17 北京旷视科技有限公司 Image search method, device and processing equipment
CN110083732B (en) * 2019-03-12 2021-08-31 浙江大华技术股份有限公司 Picture retrieval method and device and computer storage medium
CN109948734B (en) * 2019-04-02 2022-03-29 北京旷视科技有限公司 Image clustering method and device and electronic equipment
CN110069645A (en) * 2019-04-22 2019-07-30 北京迈格威科技有限公司 Image recommendation method, apparatus, electronic equipment and computer readable storage medium
CN110377781A (en) * 2019-06-06 2019-10-25 福建讯网网络科技股份有限公司 A kind of matched innovatory algorithm of application sole search
CN110942046B (en) * 2019-12-05 2023-04-07 腾讯云计算(北京)有限责任公司 Image retrieval method, device, equipment and storage medium
CN112328819B (en) * 2020-11-07 2023-08-18 嘉兴智设信息科技有限公司 Method for recommending similar pictures based on picture set
CN113360698A (en) * 2021-06-30 2021-09-07 北京海纳数聚科技有限公司 Picture retrieval method based on image-text semantic transfer technology
CN115129921B (en) * 2022-06-30 2023-05-26 重庆紫光华山智安科技有限公司 Picture retrieval method, apparatus, electronic device, and computer-readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004111931A2 (en) * 2003-06-10 2004-12-23 California Institute Of Technology A system and method for attentional selection
CN101211355A (en) * 2006-12-30 2008-07-02 中国科学院计算技术研究所 Image inquiry method based on clustering
CN101859326A (en) * 2010-06-09 2010-10-13 南京大学 Image searching method
CN103617217A (en) * 2013-11-20 2014-03-05 中国科学院信息工程研究所 Hierarchical index based image retrieval method and system

Also Published As

Publication number Publication date
CN104298713A (en) 2015-01-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant