CN108875834B - Image clustering method, device, computer equipment and storage medium - Google Patents

Info

Publication number
CN108875834B
Authority
CN
China
Prior art keywords
image
threshold value
list
classified
duty
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810653849.7A
Other languages
Chinese (zh)
Other versions
CN108875834A (en
Inventor
吴丽军
杨帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201810653849.7A
Publication of CN108875834A
Application granted
Publication of CN108875834B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques

Abstract

An embodiment of the invention discloses an image clustering method, a device, computer equipment, and a storage medium. The method includes the following steps: obtaining the similarity between an image to be classified and a sample image; comparing the similarity with the current on-duty threshold (the threshold currently in rotation) in a preset threshold list; and, when the similarity is greater than the current on-duty threshold, collecting the image to be classified into the cluster list characterized by the sample image. Because the similarity measures the degree of association between the image to be classified and the sample image, the classification result is tied directly to the sample image. Bounding that degree of association with a threshold makes the classification more accurate, replacing the sample image allows different categories to be separated and so increases the diversity of the classification, and the absence of manual intervention greatly improves classification efficiency.

Description

Image clustering method, device, computer equipment and storage medium
Technical field
Embodiments of the present invention relate to the field of model algorithms, and in particular to an image clustering method, device, computer equipment, and storage medium.
Background
As society becomes increasingly digitized, an enormous amount of data is uploaded to the Internet every day, most of it audio, video, and pictures. To make this material easier to manage and easier for other users to browse, the uploaded audio, video, and pictures need to be classified before storage.
In the prior art, audio, video, and pictures are usually classified in one of two ways. First, the server side relies on manual work for safety inspection and classification: a classification standard is set, and people classify the data according to it. Second, when uploading material from a terminal, the user assigns the uploaded data to one of a set of predefined categories; after receiving the upload, the server reads the user's category information and stores the data accordingly.
In the course of research, the inventors found that the prior-art classification methods are inefficient and that, because the classification process is influenced by subjective factors, classification accuracy is low.
Summary of the invention
Embodiments of the present invention provide an image clustering method, device, computer equipment, and storage medium that classify accurately and efficiently.
To solve the above technical problem, an embodiment of the invention adopts the following technical solution: an image clustering method is provided, including the following steps:
Obtaining the similarity between an image to be classified and a sample image;
Comparing the similarity with the current on-duty threshold (the threshold currently in rotation) in a preset threshold list;
When the similarity is greater than the current on-duty threshold, collecting the image to be classified into the cluster list characterized by the sample image.
Optionally, after the step of collecting the image to be classified into the cluster list characterized by the sample image when the similarity is greater than the current on-duty threshold, the method further includes the following steps:
Obtaining the number of images in the cluster list;
Comparing the number of images with a preset limit threshold;
When the number of images is greater than the limit threshold, releasing all images in the cluster list into the to-be-classified list.
Optionally, after the step of releasing all images in the cluster list into the to-be-classified list when the number of images is greater than the limit threshold, the method further includes the following steps:
Confirming the on-duty threshold in the position after the current on-duty threshold as the classification threshold, where that next on-duty threshold is greater than the current on-duty threshold;
Clustering the images in the to-be-classified list with the classification threshold as the qualifying condition.
Optionally, when the threshold in the threshold list has been rotated to the second-to-last entry and the number of screened images is still greater than the set limit threshold, the method further includes the following steps:
Obtaining the on-duty threshold with the largest value in the threshold list;
Clustering the images to be classified whose similarity is greater than that largest on-duty threshold into the cluster list characterized by the sample image.
Optionally, before the step of obtaining the similarity between the image to be classified and the sample image, the method further includes the following steps:
Obtaining the image to be classified;
Inputting the image to be classified into a preset PCA dimensionality reduction model, which reduces the image to be classified to a preset dimension.
Optionally, after the step of inputting the image to be classified into the preset PCA dimensionality reduction model and reducing it to the preset dimension, the method further includes the following steps:
Obtaining the dimension-reduced image to be classified;
Inputting the dimension-reduced image to be classified into a preset similarity judgment model, which calculates the degree of association between the dimension-reduced image to be classified and the sample image.
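The dimensionality reduction step can be illustrated with a minimal, dependency-free sketch. It assumes each image has already been turned into a numeric feature vector, and it fits PCA directly on the vectors it is given by power iteration; the patent instead speaks of a preset PCA model and does not specify how that model is built, so the function name `pca_reduce` and its parameters are hypothetical.

```python
import math
import random


def pca_reduce(vectors, k, iters=100, seed=0):
    """Project row vectors onto their top-k principal components.

    Illustrative only: uses power iteration with deflation, which is
    adequate for small inputs; a production system would use a library.
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # Sample covariance matrix (d x d) of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Random positive start vector, normalized to unit length.
        direction = [rng.random() + 0.5 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in direction))
        direction = [x / norm for x in direction]
        for _ in range(iters):
            product = [sum(cov[a][b] * direction[b] for b in range(d))
                       for a in range(d)]
            norm = math.sqrt(sum(x * x for x in product))
            if norm < 1e-12:  # no variance left in this direction
                break
            direction = [x / norm for x in product]
        eigenvalue = sum(direction[a] * cov[a][b] * direction[b]
                         for a in range(d) for b in range(d))
        components.append(direction)
        # Deflate: remove the component just found from the covariance.
        cov = [[cov[a][b] - eigenvalue * direction[a] * direction[b]
                for b in range(d)] for a in range(d)]
    # Coordinates of each centered vector along the k components.
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]
```

Calling `pca_reduce(features, 1)` on two-dimensional points that lie on a straight line returns their one-dimensional coordinates along that line, which is the behavior the reduced vectors are expected to have before they are passed to the similarity model.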
Optionally, before the step of obtaining the similarity between the image to be classified and the sample image, the method further includes the following steps:
Obtaining a user list, where the user list includes the traffic information of each user;
Sorting the users in the user list in descending order, with the traffic value as the qualifying condition;
Starting from the first position, confirming in turn the facial image of each user in the user list as the sample image.
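The sample selection steps above can be sketched as a descending sort followed by taking each user's facial image in order. The record layout (`traffic` and `face_image` keys) is an assumption made for illustration; the patent does not define the structure of the user list.

```python
def order_sample_images(users):
    """Return users' facial images, highest-traffic user first.

    `users` is a list of dicts with hypothetical keys 'traffic' (a number)
    and 'face_image' (any identifier for the user's facial image).
    """
    ranked = sorted(users, key=lambda user: user["traffic"], reverse=True)
    return [user["face_image"] for user in ranked]
```

Each image in the returned list would then serve as the sample image for one clustering pass, starting with the first.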
To solve the above technical problem, an embodiment of the present invention further provides an image clustering device, including:
An obtaining module, configured to obtain the similarity between an image to be classified and a sample image;
A processing module, configured to compare the similarity with the current on-duty threshold in a preset threshold list;
An execution module, configured to collect the image to be classified into the cluster list characterized by the sample image when the similarity is greater than the current on-duty threshold.
Optionally, the image clustering device further includes:
A first obtaining submodule, configured to obtain the number of images in the cluster list;
A first processing submodule, configured to compare the number of images with a preset limit threshold;
A first execution submodule, configured to release all images in the cluster list into the to-be-classified list when the number of images is greater than the limit threshold.
Optionally, the image clustering device further includes:
A first confirmation submodule, configured to confirm the on-duty threshold in the position after the current on-duty threshold as the classification threshold, where that next on-duty threshold is greater than the current on-duty threshold;
A first clustering submodule, configured to cluster the images in the to-be-classified list with the classification threshold as the qualifying condition.
Optionally, the image clustering device further includes:
A second obtaining submodule, configured to obtain the on-duty threshold with the largest value in the threshold list;
A second execution submodule, configured to cluster the images to be classified whose similarity is greater than that largest on-duty threshold into the cluster list characterized by the sample image.
Optionally, the image clustering device further includes:
A third obtaining submodule, configured to obtain the image to be classified;
A third execution submodule, configured to input the image to be classified into a preset PCA dimensionality reduction model, which reduces the image to be classified to a preset dimension.
Optionally, the image clustering device further includes:
A fourth obtaining submodule, configured to obtain the dimension-reduced image to be classified;
A fourth execution submodule, configured to input the dimension-reduced image to be classified into a preset similarity judgment model, which calculates the degree of association between the dimension-reduced image to be classified and the sample image.
Optionally, the image clustering device further includes:
A fifth obtaining submodule, configured to obtain a user list, where the user list includes the traffic information of each user;
A first sorting submodule, configured to sort the users in the user list in descending order, with the traffic value as the qualifying condition;
A fifth execution submodule, configured to confirm in turn, starting from the first position, the facial image of each user in the user list as the sample image.
To solve the above technical problem, an embodiment of the present invention further provides a computer device, including a memory and a processor, where the memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the image clustering method described above.
To solve the above technical problem, an embodiment of the present invention further provides a storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the image clustering method described above.
The beneficial effects of the embodiments of the present invention are as follows. The similarity between the image to be classified and the sample image is obtained through a similarity comparison model; the similarity is then compared with a preset threshold, and the image to be classified is classified according to the result: when the similarity is greater than the preset threshold, the image to be classified is collected into the cluster list characterized by the sample image. Because the similarity measures the degree of association between the image to be classified and the sample image, the classification result is tied directly to the sample image. Bounding that degree of association with a threshold makes the classification more accurate, replacing the sample image allows different categories to be separated and so increases the diversity of the classification, and the absence of manual intervention greatly improves classification efficiency.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the basic flow of the image clustering method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of controlling the number of images in the same classification category according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of replacing the current on-duty threshold according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of screening with the maximum threshold in the threshold list as the current on-duty threshold according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of reducing the dimension of the image to be classified according to an embodiment of the present invention;
Fig. 6 is a flowchart of calculating the degree of association with a model according to an embodiment of the present invention;
Fig. 7 is a schematic flowchart of screening sample images according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the basic structure of the image clustering device according to an embodiment of the present invention;
Fig. 9 is a block diagram of the basic structure of the computer device according to an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments.
Some of the processes described in the specification, the claims, and the above drawings contain multiple operations that appear in a particular order. It should be clearly understood, however, that these operations may be executed out of the order in which they appear herein, or in parallel. Operation numbers such as 101 and 102 serve only to distinguish the operations from one another and do not themselves represent any execution order. These processes may also include more or fewer operations, which may be executed sequentially or in parallel. Note that terms such as "first" and "second" herein are used to distinguish different messages, devices, modules, and so on; they do not represent a sequence, nor do they require that the "first" and "second" items be of different types.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Those skilled in the art will understand that "terminal" and "terminal device" as used herein include both devices with only a wireless signal receiver, which have no transmitting capability, and devices with receiving and transmitting hardware, which can perform two-way communication over a bidirectional communication link. Such devices may include: cellular or other communication devices, with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) devices, which may combine voice, data processing, fax, and/or data communication capabilities; PDAs (Personal Digital Assistants), which may include a radio frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar, and/or GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio frequency receiver. A "terminal" or "terminal device" as used herein may be portable, transportable, installed in a vehicle (air, sea, and/or land), or adapted and/or configured to run locally and/or in distributed form at any other location on Earth and/or in space. A "terminal" or "terminal device" as used herein may also be a communication terminal, an Internet access terminal, or a music/video playback terminal, for example a PDA, an MID (Mobile Internet Device), and/or a mobile phone with music/video playback functions, or a device such as a smart television or set-top box.
Referring to Fig. 1, Fig. 1 is a schematic diagram of the basic flow of the image clustering method of this embodiment.
As shown in Fig. 1, an image clustering method includes the following steps:
S1100, obtaining the similarity between an image to be classified and a sample image;
The image to be classified is an image stored in the database and awaiting classification, or an image uploaded by a user. In some embodiments, the image to be classified is obtained from video information: the video is split along the time axis into a number of frame images, and images to be classified are obtained from these frames by random or timed extraction.
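The two extraction modes mentioned above, timed and random, can be sketched as a choice of frame indices. This is only an illustration of the selection logic: the function names, the `fps` and `every_seconds` parameters, and the idea of working with indices rather than decoded frames are all assumptions, and decoding the video itself is left to whatever video library is in use.

```python
import random


def timed_frame_indices(total_frames, fps, every_seconds):
    """Pick one frame index every `every_seconds` seconds of video."""
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def random_frame_indices(total_frames, count, seed=None):
    """Pick `count` distinct frame indices at random, in ascending order."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_frames), min(count, total_frames)))
```

For a 100-frame clip at 25 frames per second, `timed_frame_indices(100, 25, 1)` selects frames 0, 25, 50, and 75.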
The sample image is a pre-stored classification reference photo; the name of the sample image, or classification information it already carries, serves as a classification category. For example, when the sample image is a facial image, the person's name, age, gender, attractiveness score ("face value"), or ethnicity can serve as a classification category, and depending on the application scenario, the category characterized by the sample image can be one of these reference attributes or a combination of several. Sample images are not limited to facial images; depending on the application environment, a sample image can also show a plant, scenery, an animal, an industrial product, and so on.
In some embodiments, the number of images to be classified is large (for example, 10,000, 1,000,000, or 90,000,000), so the number of sample images is not limited to one. Depending on the application, the number of sample images can be (but is not limited to) 1, 10, 100, 1000, or more. Steps S1100 to S1300 are therefore iterated in a loop, with one sample image replaced on each pass.
The similarity between the image to be classified and the sample image is calculated with a similarity calculation model from the prior art, which can be a CNN convolutional neural network model, a VGG convolutional neural network model, an Insightface face recognition model, or a k-means calculation model. The model is trained to convergence or constructed in advance; with the sample image as the reference and the image to be classified as the input, it outputs the similarity between the two.
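The patent does not say how the listed models turn two images into a single similarity score. One common arrangement, assumed here purely for illustration, is that the model produces a feature vector per image and the similarity is the cosine of the angle between the two vectors. The function name `cosine_similarity` and the vector inputs are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)
```

Under this assumption the score is 1 for vectors that point the same way and 0 for orthogonal vectors, so thresholds in the range 0.60 to 0.95 select increasingly close matches to the sample image.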
S1200, comparing the similarity with the current on-duty threshold in a preset threshold list;
In this embodiment a threshold list is provided. It contains multiple thresholds sorted in ascending order of value. To limit the number of images collected into each classification category, the thresholds in the list become the qualifying condition one after another (the condition being a comparison of the similarity against the current on-duty threshold), starting with the first. Within one iteration cycle (that is, for the same sample image), the threshold in the next position is therefore greater than the current one. The threshold currently serving as the qualifying condition is defined as the current on-duty threshold: because clustering is iterative, every threshold in the list may take its turn as the qualifying condition.
For example, suppose the thresholds in the threshold list are [0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]. If screening with 0.60 as the qualifying condition collects too many images into the classification category characterized by the sample image, accuracy will suffer, so the screening condition must be tightened: the next threshold after 0.60, namely 0.65, becomes the current on-duty threshold and screening is repeated. This continues until the number of images collected into the category satisfies the condition, or the accuracy satisfies a given condition, at which point image clustering with the current sample image as the classification category ends.
S1300, when the similarity is greater than the current on-duty threshold, collecting the image to be classified into the cluster list characterized by the sample image.
When the similarity is greater than the current on-duty threshold, the image to be classified is collected into the cluster list characterized by the sample image. For example, when the sample image is a facial image, the person's name, age, gender, attractiveness score ("face value"), or ethnicity can serve as a classification category, and depending on the application scenario, the category characterized by the sample image can be one of these reference attributes or a combination of several. Sample images are not limited to facial images; depending on the application environment, a sample image can also show a plant, scenery, an animal, an industrial product, and so on.
For example, suppose the category characterized by the sample image is the name of the person shown in it. If the comparison confirms that the similarity between the face in the image to be classified and the sample image is greater than the current on-duty threshold, the image to be classified is confirmed as belonging to the category named after that person.
In the above embodiment, the similarity between the image to be classified and the sample image is obtained through a similarity comparison model; the similarity is then compared with a preset threshold, and the image to be classified is classified according to the result: when the similarity is greater than the preset threshold, the image to be classified is collected into the cluster list characterized by the sample image. Because the similarity measures the degree of association between the image to be classified and the sample image, the classification result is tied directly to the sample image. Bounding that degree of association with a threshold makes the classification more accurate, replacing the sample image allows different categories to be separated and so increases the diversity of the classification, and the absence of manual intervention greatly improves classification efficiency.
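Steps S1100 to S1300 for a single sample image and a single threshold can be sketched as a filter. The sketch assumes the similarities have already been computed and are supplied as a mapping from image identifier to score; the names `collect_similar` and `similarity_to_sample` are hypothetical.

```python
def collect_similar(pending, similarity_to_sample, threshold):
    """Return the images whose similarity to the sample exceeds the threshold.

    `pending` is the to-be-classified list (image identifiers) and
    `similarity_to_sample` maps each identifier to its similarity score.
    The comparison is strict, matching "greater than" in step S1300.
    """
    return [image for image in pending
            if similarity_to_sample[image] > threshold]
```

The returned list plays the role of the cluster list characterized by the sample image; an image whose similarity exactly equals the threshold is not collected.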
In some embodiments, to prevent a poorly chosen threshold from concentrating a large number of images or video items to be classified in a single category, the number of images to be classified that are collected into the same classification category needs to be controlled. Referring to Fig. 2, Fig. 2 is a schematic flowchart of controlling the number of images in the same classification category in this embodiment.
As shown in Fig. 2, the following steps are further included after step S1300:
S1411, obtaining the number of images in the cluster list;
Images to be classified that satisfy the current on-duty threshold are stored in the cluster list corresponding to the current sample image, and the number of images in that cluster list, that is, the number of stored images to be classified, is obtained. In this embodiment the number of images to be classified is large (for example, 10,000, 1,000,000, or 90,000,000), which is why those satisfying the current on-duty threshold are stored in the cluster list.
S1412, comparing the number of images with a preset limit threshold;
After the number of images in the cluster list is obtained, it is compared with the preset limit threshold.
The limit threshold caps the number of images in the cluster list that meet the qualifying condition. In some embodiments the limit threshold is set to 500, but it is not limited to this value: depending on the application scenario, it can be set to 50, 100, 300, 900, 1500, 5000, or a larger or smaller value. The size of the limit threshold determines the recall size when the cluster list is used as a search condition. For example, suppose the cluster list holds 500 images and is named after the person in the sample image. When a user searches by that name, the recall size (the number of pictures shown in the search results) is the 500 images in the cluster list.
S1413, when the number of images is greater than the limit threshold, releasing all images in the cluster list into the to-be-classified list.
Once the comparison confirms that the number of images in the cluster list is greater than the set limit threshold, the images stored in the cluster list are released back into the to-be-classified list.
The images in the to-be-classified list are the images awaiting classification. By replacing the sample image again and again, qualifying images are taken out of the to-be-classified list and classified. Each time the sample image is replaced, all images in the to-be-classified list are images to be classified.
For example, suppose the limit threshold is 500 and the current on-duty threshold is 0.60. All pictures whose similarity to the sample image is greater than 0.60 are clustered into the category characterized by the current sample image. If more than 500 pictures in the cluster list have similarity greater than 0.60, the number of images in the category exceeds the preset value, and the screening condition must be tightened to keep the number in the cluster list within the set quantity, that is, less than or equal to 500.
In some embodiments, when the number of images is less than or equal to the limit threshold, the images to be classified are collected into the cluster list characterized by the sample image.
When, after the whole to-be-classified list has been evaluated, the number of images satisfying the current threshold is confirmed to be less than or equal to the limit threshold, the cluster list (the classification category characterized by the sample image) satisfies both the current on-duty threshold and the current limit threshold. Clustering has succeeded, and the method moves on to the next sample image for a new round of iteration.
Limiting both the number of images and their similarity in the cluster list characterized by a given sample image guarantees the recall size at retrieval time while keeping classification accuracy under control.
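Steps S1411 to S1413 can be sketched as one screening pass that either accepts the cluster or releases it. As before, the similarities are assumed to be precomputed and supplied as a mapping; the function name `screen_with_limit` and its return convention (cluster, remaining to-be-classified list) are hypothetical.

```python
def screen_with_limit(pending, similarities, threshold, limit):
    """Screen once with `threshold`; release the cluster if it is too large.

    Returns (cluster, remaining). If more than `limit` images pass the
    threshold, the cluster is released: an empty cluster is returned and
    every image stays in the to-be-classified list.
    """
    cluster = [image for image in pending if similarities[image] > threshold]
    if len(cluster) > limit:
        return [], list(pending)  # released back for a stricter threshold
    remaining = [image for image in pending
                 if similarities[image] <= threshold]
    return cluster, remaining
```

With a limit of 500 and a threshold of 0.60, for example, a pass that matches 501 images returns an empty cluster and leaves the to-be-classified list unchanged, while a pass that matches 500 or fewer returns them as the cluster for the current sample image.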
In some embodiments, if the number of images screened out with the current on-duty threshold exceeds the quantity allowed by the limit threshold, the current on-duty threshold needs to be replaced and the images to be classified screened again, until the number of screened images is within the limit threshold. Referring to Fig. 3, Fig. 3 is a schematic flowchart of replacing the current on-duty threshold in this embodiment.
As shown in Fig. 3, the following steps are further included after step S1413:
S1511, confirming the on-duty threshold in the position after the current on-duty threshold as the classification threshold, where that next on-duty threshold is greater than the current on-duty threshold;
If the number of images screened out with the current on-duty threshold exceeds the quantity allowed by the limit threshold, the current on-duty threshold needs to be replaced and the images to be classified screened again. The on-duty threshold is replaced in list order, one position at a time.
For example, suppose the thresholds in the threshold list are [0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]. If screening with 0.60 as the qualifying condition collects too many images into the classification category (cluster list) characterized by the sample image, accuracy will suffer, so the screening condition must be tightened: the next threshold after 0.60, namely 0.65, becomes the current on-duty threshold and screening is repeated. This continues until the largest threshold in the list, 0.95, is reached, or until the limit threshold is satisfied along with the current on-duty threshold (for example, 0.75). Accordingly, whenever the threshold is replaced, the next on-duty threshold is greater than the current one.
S1512, clustering the images in the to-be-classified list with the classification threshold as the qualifying condition.
The classification threshold is the newly determined on-duty threshold. A new round of screening is carried out with it as the qualifying condition, that is, steps S1411 to S1413 are executed again in a loop.
In some embodiments, when the thresholds in the threshold list have been replaced up to the second-to-last one and the number of selected images is still greater than the set limit threshold, the last threshold is called as the qualifying condition for screening. Regardless of whether the number of images selected in this round is less than or equal to the limit threshold, the selected images are placed in the cluster list represented by the current sample image. Refer to Fig. 4, which is a schematic flowchart of screening with the largest threshold in the threshold list as the current on-duty threshold.
As shown in Fig. 4, the following steps are further included after step S1512:
S1611, Obtain the on-duty threshold with the largest value in the threshold list;
When the current on-duty threshold has been replaced successively until the second-to-last on-duty threshold serves as the screening condition, and the number of selected images is still greater than the set limit threshold, the threshold in the last position of the threshold list (the one with the largest value) is obtained as the current on-duty threshold.
S1612, Cluster the images to be classified whose similarity is greater than the largest on-duty threshold into the cluster list represented by the sample image.
The similarity between each image to be classified and the sample image is compared with the largest on-duty threshold, and the images to be classified whose similarity is greater than that threshold are clustered into the cluster list represented by the sample image. That is, even when the number of images selected with the largest on-duty threshold is still greater than the limit threshold, all images to be classified that satisfy the screening condition of the largest on-duty threshold are clustered into the cluster list represented by the sample image.
With the above clustering method, either the number of images that satisfy the clustering condition is kept within the range of the limit threshold, or their similarity is greater than the largest threshold set; clustering succeeds as soon as either of these conditions is met. This effectively improves clustering accuracy, while also mitigating the problem that an excessively strict accuracy requirement leaves too few qualifying images and therefore insufficient recall during retrieval. Efficiency is improved, and the dual requirements of accuracy and retrieval recall are both taken into account.
For example, the clustering method of this embodiment is used to track and manage the data volume of internet celebrities. Suppose that in the current period the four internet celebrities A, B, C, and D have high online popularity, and their audio/video and images need to be tracked and classified. The audio/video and images uploaded by users in the database are classified, where audio/video is classified as images by extracting frame pictures from it.
During classification, the face image of A is first used as the sample image. The similarity comparison model determines, one by one, the similarity between the face image of A and the images and frame pictures stored in the database. Screening then starts from the smallest threshold in the threshold list (for example, 0.60). The first round yields 5000 qualifying images and frame pictures, which is greater than the set limit threshold (for example, 500), so a second round is needed. The second round uses a threshold of 0.65 and yields 3000 qualifying images and frame pictures. The threshold keeps being raised in this way until the number of qualifying images and frame pictures is less than or equal to 500. Assuming the threshold at that point is 0.85, the images with similarity greater than 0.85, and the audio/video corresponding to such frame pictures, are classified under the name of A. B, C, and D are clustered in the same way. Because D is the most popular, after the successive rounds of screening there are still 4000 qualifying images and frame pictures when 0.90 from the threshold list is used, and 1000 remain even when the largest threshold, 0.95, is used. In that case, these 1000 images and the audio/video corresponding to the frame pictures are clustered under the name of D.
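The interplay between the limit threshold and the fallback to the largest threshold can be sketched as follows. This is a minimal Python sketch under stated assumptions: `cluster_for_sample` is a hypothetical name, similarities are assumed to be precomputed values between 0 and 1 keyed by image identifier, and the release of images back into the to-be-classified list is modeled simply by re-screening the same dictionary.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_for_sample(
    similarities: Dict[str, float],
    threshold_list: Sequence[float],
    limit: int,
) -> Tuple[float, List[str]]:
    """Screen with ever larger thresholds until the result fits the limit.

    Returns the threshold that was finally used and the ids of the images
    placed in the cluster list of the sample image.
    """
    ordered = sorted(threshold_list)
    for position, threshold in enumerate(ordered):
        selected = [i for i, s in similarities.items() if s > threshold]
        is_last = position == len(ordered) - 1
        # Accept the round when it fits the limit threshold, or when the
        # largest threshold has been reached (accepted regardless of count).
        if len(selected) <= limit or is_last:
            return threshold, selected
    raise ValueError("threshold_list must not be empty")
```

With the threshold list [0.60, ..., 0.95] and a limit of 500, a sample whose candidates first drop to 500 or fewer at 0.85 would return at 0.85, while a sample that still has more than 500 candidates at 0.90 would return whatever exceeds 0.95.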
In some embodiments, to make similarity comparison more efficient, the images to be classified need to undergo dimensionality reduction. Refer to Fig. 5, which is a schematic flowchart of reducing the dimensionality of the images to be classified in this embodiment.
As shown in Fig. 5, the following steps are further included before step S1100:
S1011, Obtain the image to be classified;
In this embodiment, the images to be classified include the images stored in the database and the frame picture images extracted from audio/video. In some embodiments, when clustering uses a face image as the sample image, each extracted frame picture image needs to be input into a preset face recognition model to detect whether a face is present in it; when a face image is present, that frame picture can represent the audio/video. The face recognition model can be a CNN convolutional neural network model, a VGG convolutional neural network model, an insightface face recognition model, or a k-means computation model. The stored images to be classified are read in sequence.
S1012, Input the image to be classified into a preset PCA dimensionality reduction model, and reduce the image to be classified to a preset dimension.
The obtained image to be classified is input into the preset PCA dimensionality reduction model for image dimensionality reduction. Exploiting the high dimensionality of single-image data, the PCA dimensionality reduction model converts a single image into a data set in a high-dimensional space and performs nonlinear dimensionality reduction on it. The one-dimensional representation vector of the intrinsic structure of the high-dimensional data manifold is obtained and used as the feature expression vector of the image data.
The dimensionality reduction model in this embodiment is not limited to this; in some embodiments, a nonlinear dimensionality reduction method can be used to reduce the dimensionality of the images to be classified.
As a result of the dimensionality reduction, all images to be classified are reduced to 128 dimensions. The result is not limited to this, however: in different application environments it can also be (without limitation) 32, 64, 256, 512, or 1024 dimensions.
In some embodiments, the sample image also needs to undergo dimensionality reduction, to the same dimension as the images to be classified, so that the two can be compared.
The above embodiment improves the efficiency of similarity comparison by reducing all images to be classified to a uniform dimension.
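The embodiment does not specify how the preset PCA dimensionality reduction model is built. A minimal sketch, assuming each image has already been converted into a fixed-length feature vector and using standard (linear) principal component analysis computed with NumPy's singular value decomposition, might look like this. The names `fit_pca` and `reduce_dim` are hypothetical.

```python
import numpy as np


def fit_pca(features: np.ndarray, n_components: int):
    """Fit PCA on a (n_samples, n_features) matrix.

    Returns the feature mean and the top `n_components` principal
    directions. Requires n_components <= min(n_samples, n_features).
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of `vt` are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def reduce_dim(features: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project feature vectors onto the fitted principal directions."""
    return (features - mean) @ components.T
```

The same fitted `mean` and `components` would be applied to the sample image and to the images to be classified, so that both end up with the same dimension (for example, 128) before comparison.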
In some embodiments, the degree of association between the image to be classified and the sample image needs to be calculated by a model. Refer to Fig. 6, which is a flowchart of the method of calculating the degree of association by a model in this embodiment.
As shown in Fig. 6, the following steps are further included after step S1012:
S1021, Obtain the image to be classified after dimensionality reduction;
After the image to be classified has been reduced in dimension, the reduced image is obtained for similarity comparison.
S1022, Input the dimension-reduced image to be classified into a preset similarity judgment model, and calculate the degree of association between the dimension-reduced image to be classified and the sample image.
The dimension-reduced image to be classified is input into a preset similarity judgment model, which is a model trained to compare image similarity. Specifically, the similarity judgment model can be a CNN convolutional neural network model, a VGG convolutional neural network model, an insightface face recognition model, or a k-means computation model. The model is trained to convergence, or built, in advance. It takes the sample image as the reference and the image to be classified as input, and outputs the similarity between the sample image and the image to be classified. The similarity is normalized before output, so the output value lies between 0 and 1.
Using a model to calculate the similarity between the images to be classified and the sample image improves both the efficiency and the accuracy of the comparison.
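The embodiment relies on a trained model for this step. Purely to illustrate what a similarity normalized to the range 0 to 1 looks like, the sketch below substitutes cosine similarity between two dimension-reduced feature vectors, rescaled from [-1, 1] to [0, 1]. This stand-in is an assumption of the sketch, not the model described above, and the function name `normalized_similarity` is hypothetical.

```python
import numpy as np


def normalized_similarity(sample_vec, candidate_vec) -> float:
    """Cosine similarity between two vectors, rescaled to lie in [0, 1]."""
    a = np.asarray(sample_vec, dtype=float)
    b = np.asarray(candidate_vec, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # A zero vector carries no direction; treat it as not similar.
        return 0.0
    cosine = float(np.dot(a, b) / denom)
    # Map [-1, 1] to [0, 1] and clip away floating-point overshoot.
    return min(1.0, max(0.0, (cosine + 1.0) / 2.0))
```

Values produced this way can be compared directly against thresholds such as 0.60 or 0.95, although the numerical scale of a trained model's output would generally differ.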
In some embodiments, images are clustered in order to track the users who account for a high share of traffic on the platform; the sample images therefore need to be selected according to users' traffic information. Refer to Fig. 7, which is a schematic flowchart of selecting sample images in this embodiment.
As shown in Fig. 7, the following steps are further included before step S1100:
S1031, Obtain a user list, wherein the user list includes the traffic information of users;
In this embodiment, traffic information is used to classify and track information about internet-celebrity users on the platform, providing users with a more accurate method of classified retrieval and distribution.
Users of the platform generate download traffic when they fetch audio/video and images to view on the platform. The platform server records the traffic information of each user on a per-user basis; that is, the traffic generated by the pictures and audio/video that each user uploads is recorded under that user's account.
All user accounts on the platform are recorded in the user list, which stores each user's account information and the download traffic generated over a period of time by the content that account uploaded. The user list is not limited to recording this information, however; in some embodiments it also records the click count or page-view count of the content uploaded by the account.
S1032, Sort the users in the user list in descending order, using the traffic value as the qualifying condition;
The users in the user list are sorted by traffic value, in descending order. In some embodiments, the sorting can also be based on (without limitation) click-count information or page-view information.
S1033, Starting from the first position, confirm the face images of the users in the user list as the sample image in turn.
Following the sort order of the user list, and starting from the account ranked first, the face image of each user is determined as the sample image in turn. Any one or a combination of the following items from the user account can serve as the classification category represented by the sample image: real name, login name, appearance score, gender, age, ethnicity, stage name, account number, and other social accounts associated with the account.
In some embodiments, the clustering method clusters the audio/video and images of only some internet-celebrity users. Accordingly, only the face images of the user accounts ranked in the top ten or top thirty are designated as sample images, or the face images of user accounts whose traffic information exceeds a certain value (for example, 100 GB) are designated as sample images.
Obtaining sample images by means of traffic statistics makes it possible to cluster the faces that appear in popular audio/video or images, to track the internet celebrities with higher popularity, and to manage popular videos and the distribution of traffic effectively.
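Steps S1031 to S1033, together with the optional top-N and minimum-traffic restrictions, can be sketched as below. The `User` record, its field names, and the function `select_sample_users` are assumptions introduced for this sketch; the embodiment only states that the user list holds account information and traffic information.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class User:
    account: str       # hypothetical account identifier
    traffic_gb: float  # download traffic generated by the account's uploads


def select_sample_users(
    users: Sequence[User],
    top_n: Optional[int] = None,
    min_traffic_gb: Optional[float] = None,
) -> List[User]:
    """Sort users by traffic in descending order and keep the leading ones.

    `top_n` keeps only the first N accounts; `min_traffic_gb` keeps only
    accounts whose traffic exceeds the given value. Both are optional.
    """
    ranked = sorted(users, key=lambda u: u.traffic_gb, reverse=True)
    if min_traffic_gb is not None:
        ranked = [u for u in ranked if u.traffic_gb > min_traffic_gb]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked
```

The face image of each returned user, taken in order, would then serve as the sample image for one clustering pass.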
To solve the above technical problem, an embodiment of the present invention further provides an image clustering device.
Refer to Fig. 8, which is a schematic diagram of the basic structure of the image clustering device of this embodiment.
As shown in Fig. 8, an image clustering device comprises an acquisition module 2100, a processing module 2200, and an execution module 2300. The acquisition module 2100 is configured to obtain the similarity between an image to be classified and a sample image; the processing module 2200 is configured to compare the similarity with the current on-duty threshold in a preset threshold list; the execution module 2300 is configured to collect the image to be classified into the cluster list represented by the sample image when the similarity is greater than the current on-duty threshold.
The image clustering device obtains the similarity between the image to be classified and the sample image through a similarity comparison model. After obtaining the similarity, it compares the similarity with a preset threshold and classifies the image to be classified according to the comparison result: when the similarity is greater than the preset threshold, the image to be classified is collected into the cluster list represented by the current threshold. Because the similarity measures the degree of association between the image to be classified and the sample image, the classification result is directly tied to the sample image. At the same time, applying a threshold to the degree of association makes the classification result more accurate. Different categories can be separated by substituting different sample images, which improves the diversity of classification, and because no manual intervention is involved, classification efficiency is greatly improved.
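One way to picture the division of labor among the three modules is the small Python class below. It is a sketch only: the class name, the method names `acquire`, `process`, and `execute`, and the caller-supplied `similarity_fn` are all assumptions made for illustration and are not defined by the embodiment.

```python
from typing import Callable, Dict, List, Sequence


class ImageClusteringDevice:
    """Toy counterpart of modules 2100, 2200, and 2300."""

    def __init__(
        self,
        similarity_fn: Callable[[object, object], float],
        threshold_list: Sequence[float],
    ) -> None:
        self.similarity_fn = similarity_fn
        self.threshold_list = sorted(threshold_list)
        self.current_index = 0  # position of the current on-duty threshold

    @property
    def current_threshold(self) -> float:
        return self.threshold_list[self.current_index]

    def acquire(self, sample: object, candidates: Dict[str, object]) -> Dict[str, float]:
        """Acquisition module: similarity of each candidate to the sample."""
        return {cid: self.similarity_fn(sample, c) for cid, c in candidates.items()}

    def process(self, similarity: float) -> bool:
        """Processing module: compare a similarity with the on-duty threshold."""
        return similarity > self.current_threshold

    def execute(self, similarities: Dict[str, float]) -> List[str]:
        """Execution module: collect qualifying candidates into the cluster list."""
        return [cid for cid, s in similarities.items() if self.process(s)]
```

Advancing `current_index` and calling `execute` again would correspond to replacing the on-duty threshold and re-screening.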
In some embodiments, the image clustering device further comprises a first acquisition submodule, a first processing submodule, and a first execution submodule. The first acquisition submodule is configured to obtain the number of images in the cluster list; the first processing submodule is configured to compare the number of images with a preset limit threshold; the first execution submodule is configured to release all images in the cluster list into the to-be-classified list when the number of images is greater than the limit threshold.
In some embodiments, the image clustering device further comprises a first confirmation submodule and a first clustering submodule. The first confirmation submodule is configured to confirm that the on-duty threshold ranked immediately after the current on-duty threshold is the classification threshold, wherein that next on-duty threshold is greater than the current on-duty threshold; the first clustering submodule is configured to cluster the images in the to-be-classified list using the classification threshold as the qualifying condition.
In some embodiments, the image clustering device further comprises a second acquisition submodule and a second execution submodule. The second acquisition submodule is configured to obtain the on-duty threshold with the largest value in the threshold list; the second execution submodule is configured to cluster the images to be classified whose similarity is greater than the largest on-duty threshold into the cluster list represented by the sample image.
In some embodiments, the image clustering device further comprises a third acquisition submodule and a third execution submodule. The third acquisition submodule is configured to obtain the image to be classified; the third execution submodule is configured to input the image to be classified into a preset PCA dimensionality reduction model and reduce the image to be classified to a preset dimension.
In some embodiments, the image clustering device further comprises a fourth acquisition submodule and a fourth execution submodule. The fourth acquisition submodule is configured to obtain the image to be classified after dimensionality reduction; the fourth execution submodule is configured to input the dimension-reduced image to be classified into a preset similarity judgment model and calculate the degree of association between the dimension-reduced image to be classified and the sample image.
In some embodiments, the image clustering device further comprises a fifth acquisition submodule, a first sorting submodule, and a fifth execution submodule. The fifth acquisition submodule is configured to obtain a user list, wherein the user list includes the traffic information of users; the first sorting submodule is configured to sort the users in the user list in descending order using the traffic value as the qualifying condition; the fifth execution submodule is configured to confirm, starting from the first position, the face images of the users in the user list as the sample image in turn.
To solve the above technical problem, an embodiment of the present invention further provides a computer device. Refer to Fig. 9, which is a block diagram of the basic structure of the computer device of this embodiment.
Fig. 9 is a schematic diagram of the internal structure of the computer device. As shown in Fig. 9, the computer device comprises a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions. The database may store a control information sequence, and the computer-readable instructions, when executed by the processor, cause the processor to implement an image clustering method. The processor of the computer device provides computing and control capability and supports the operation of the whole computer device. The memory of the computer device may store computer-readable instructions which, when executed by the processor, cause the processor to perform an image clustering method. The network interface of the computer device is used to connect to and communicate with a terminal. Those skilled in the art will understand that the structure shown in Fig. 9 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In this embodiment, the processor executes the specific functions of the acquisition module 2100, the processing module 2200, and the execution module 2300 in Fig. 8, and the memory stores the program code and the various data required to execute these modules. The network interface is used for data transmission to and from a user terminal or server. The memory in this embodiment stores the program code and data required to execute all the submodules of the image clustering device, and the server can call its program code and data to execute the functions of all the submodules.
The computer device obtains the similarity between the image to be classified and the sample image through a similarity comparison model. After obtaining the similarity, it compares the similarity with a preset threshold and classifies the image to be classified according to the comparison result: when the similarity is greater than the preset threshold, the image to be classified is collected into the cluster list represented by the current threshold. Because the similarity measures the degree of association between the image to be classified and the sample image, the classification result is directly tied to the sample image. At the same time, applying a threshold to the degree of association makes the classification result more accurate. Different categories can be separated by substituting different sample images, which improves the diversity of classification, and because no manual intervention is involved, classification efficiency is greatly improved.
The present invention further provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the image clustering method of any of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), and so on.
It should be understood that although the steps in the flowcharts of the drawings are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless expressly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially; they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

Claims (12)

1. An image clustering method, characterized by comprising the following steps:
Obtaining the similarity between an image to be classified and a sample image;
Comparing the similarity with the current on-duty threshold in a preset threshold list;
When the similarity is greater than the current on-duty threshold, collecting the image to be classified into the cluster list represented by the sample image;
Obtaining the number of images in the cluster list;
Comparing the number of images with a preset limit threshold;
When the number of images is greater than the limit threshold, releasing all images in the cluster list into a to-be-classified list;
Confirming that the on-duty threshold ranked immediately after the current on-duty threshold is the classification threshold, wherein the next on-duty threshold is greater than the current on-duty threshold;
Clustering the images in the to-be-classified list using the classification threshold as the qualifying condition.
2. The image clustering method according to claim 1, characterized in that, when the thresholds in the threshold list have been replaced up to the second-to-last one and the number of selected images is greater than the set limit threshold, the method further comprises the following steps:
Obtaining the on-duty threshold with the largest value in the threshold list;
Clustering the images to be classified whose similarity is greater than the largest on-duty threshold into the cluster list represented by the sample image.
3. The image clustering method according to claim 1, characterized in that, before the step of obtaining the similarity between the image to be classified and the sample image, the method further comprises the following steps:
Obtaining the image to be classified;
Inputting the image to be classified into a preset PCA dimensionality reduction model, and reducing the image to be classified to a preset dimension.
4. The image clustering method according to claim 3, characterized in that, after the step of inputting the image to be classified into the preset PCA dimensionality reduction model and reducing the image to be classified to the preset dimension, the method further comprises the following steps:
Obtaining the image to be classified after dimensionality reduction;
Inputting the dimension-reduced image to be classified into a preset similarity judgment model, and calculating the degree of association between the dimension-reduced image to be classified and the sample image.
5. The image clustering method according to claim 1, characterized in that, before the step of obtaining the similarity between the image to be classified and the sample image, the method further comprises the following steps:
Obtaining a user list, wherein the user list includes the traffic information of users;
Sorting the users in the user list in descending order using the traffic value as the qualifying condition;
Starting from the first position, confirming the face images of the users in the user list as the sample image in turn.
6. An image clustering device, characterized by comprising:
An acquisition module, configured to obtain the similarity between an image to be classified and a sample image;
A processing module, configured to compare the similarity with the current on-duty threshold in a preset threshold list;
An execution module, configured to collect the image to be classified into the cluster list represented by the sample image when the similarity is greater than the current on-duty threshold;
A first acquisition submodule, configured to obtain the number of images in the cluster list;
A first processing submodule, configured to compare the number of images with a preset limit threshold;
A first execution submodule, configured to release all images in the cluster list into a to-be-classified list when the number of images is greater than the limit threshold;
A first confirmation submodule, configured to confirm that the on-duty threshold ranked immediately after the current on-duty threshold is the classification threshold, wherein the next on-duty threshold is greater than the current on-duty threshold;
A first clustering submodule, configured to cluster the images in the to-be-classified list using the classification threshold as the qualifying condition.
7. The image clustering device according to claim 6, characterized in that the image clustering device further comprises:
A second acquisition submodule, configured to obtain the on-duty threshold with the largest value in the threshold list;
A second execution submodule, configured to cluster the images to be classified whose similarity is greater than the largest on-duty threshold into the cluster list represented by the sample image.
8. The image clustering device according to claim 6, characterized in that the image clustering device further comprises:
A third acquisition submodule, configured to obtain the image to be classified;
A third execution submodule, configured to input the image to be classified into a preset PCA dimensionality reduction model and reduce the image to be classified to a preset dimension.
9. The image clustering device according to claim 8, characterized in that the image clustering device further comprises:
A fourth acquisition submodule, configured to obtain the image to be classified after dimensionality reduction;
A fourth execution submodule, configured to input the dimension-reduced image to be classified into a preset similarity judgment model and calculate the degree of association between the dimension-reduced image to be classified and the sample image.
10. The image clustering device according to claim 6, characterized in that the image clustering device further comprises:
A fifth acquisition submodule, configured to obtain a user list, wherein the user list includes the traffic information of users;
A first sorting submodule, configured to sort the users in the user list in descending order using the traffic value as the qualifying condition;
A fifth execution submodule, configured to confirm, starting from the first position, the face images of the users in the user list as the sample image in turn.
11. A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the image clustering method according to any one of claims 1 to 5.
12. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the image clustering method according to any one of claims 1 to 5.
CN201810653849.7A 2018-06-22 2018-06-22 Image clustering method, device, computer equipment and storage medium Active CN108875834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810653849.7A CN108875834B (en) 2018-06-22 2018-06-22 Image clustering method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108875834A CN108875834A (en) 2018-11-23
CN108875834B true CN108875834B (en) 2019-08-20

Family

ID=64294298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810653849.7A Active CN108875834B (en) 2018-06-22 2018-06-22 Image clustering method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108875834B (en)

Also Published As

Publication number Publication date
CN108875834A (en) 2018-11-23

Similar Documents

Publication Publication Date Title
CN108875834B (en) Image clustering method, device, computer equipment and storage medium
CN110263613A (en) Monitor video processing method and processing device
Ahmad et al. How deep features have improved event recognition in multimedia: A survey
CN108595461B (en) Interest exploration method, storage medium, electronic device and system
CN111914937A (en) Lightweight improved target detection method and detection system
CN102207954A (en) Electronic apparatus, content recommendation method and program therefor
CN111209440A (en) Video playing method, device and storage medium
CN1599904A (en) Adaptive environment system and method of providing an adaptive environment
CN110717058B (en) Information recommendation method and device and storage medium
CN106878767A (en) Video broadcasting method and device
CN106339507A (en) Method and device for pushing streaming media message
CN111460195B (en) Picture processing method and device, storage medium and electronic equipment
CN109195011B (en) Video processing method, device, equipment and storage medium
CN110162643A (en) Electron album report-generating method, device and storage medium
Voulodimos et al. Improving multi-camera activity recognition by employing neural network based readjustment
CN112765373A (en) Resource recommendation method and device, electronic equipment and storage medium
CN112487207A (en) Image multi-label classification method and device, computer equipment and storage medium
CN105791674A (en) Electronic device and focusing method
Papadopoulos et al. Automatic summarization and annotation of videos with lack of metadata information
CN109151488A (en) Method and system for real-time recommendation of live-streaming rooms based on user behavior
CN106407434A (en) Video pushing method and system
CN112328888A (en) Information recommendation method and device, server and storage medium
Cheng et al. Semantically-driven automatic creation of training sets for object recognition
DE112016004160T5 (en) UI for video summaries
CN114363565A (en) Video polling method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant