CN113672751B - Background similar picture clustering method and device, electronic equipment and storage medium

Info

Publication number: CN113672751B
Application number: CN202110729370.9A
Authority: CN
Inventors: 田春霖; 蒋泽锟; 严宋扬; 阮书宁
Original assignee: Xi'an Xinxin Information Technology Co ltd
Current assignee: Xi'an Xinxin Information Technology Co ltd
Priority date: 2021-06-29
Filing date: 2021-06-29
Publication date: 2022-07-01
Anticipated expiration: 2041-06-29
Also published as: CN113672751A

Abstract

The invention discloses a method and a device for clustering background-similar pictures, an electronic device, and a storage medium. The clustering method comprises the following steps: constructing an undirected graph G, wherein the undirected graph G is represented by an adjacency matrix and pictures are the nodes of the undirected graph G; removing all nodes with a core degree smaller than k0 from the undirected graph G to obtain a plurality of sub-graphs G1, wherein each sub-graph G1 is a strong relationship cluster and k0 is the turning point of an affinity-frequency relation graph; and dividing a first non-strong relationship node into the corresponding sub-graph G1 according to a high confidence threshold of the affinity-frequency relation graph, wherein the high confidence threshold is the highest point of the affinity-frequency relation graph. The invention incorporates a strong-relation subgraph algorithm to mine the strong associations between different pictures, uses an uncorrelated subgraph algorithm on the obtained adjacency matrix to find the strong relationships of the corresponding entities, clusters the pictures, and thereby addresses the problem of 'normal pictures'.

Description

Background similar picture clustering method and device, electronic equipment and storage medium

Technical Field

The invention belongs to the technical field of digital image processing, and particularly relates to a method and a device for clustering background similar pictures, an electronic device and a storage medium.

Background

At present, background clustering of pictures generally relies on common clustering algorithms, for example k-means, k-centers, spectral clustering, and affinity propagation, to process the feature information.

These common clustering algorithms cannot cope well with the problem of "normal pictures" in picture background clustering. "Normal pictures" are pictures taken while business is handled normally; they account for most of the pictures, do not belong to any fraudulent cluster, and can influence the final result to a great extent.

Disclosure of Invention

In order to solve the above problems in the prior art, the present invention provides a method and an apparatus for clustering background-similar pictures, an electronic device, and a storage medium. The technical problem to be solved by the invention is realized by the following technical scheme:

A clustering method for background-similar pictures comprises the following steps:

constructing an undirected graph G, wherein the undirected graph G is represented by an adjacency matrix, and pictures are nodes of the undirected graph G;

removing all nodes with a core degree smaller than k0 from the undirected graph G to obtain a plurality of sub-graphs G1, wherein each sub-graph G1 is a strong relationship cluster, and k0 is the turning point of an affinity-frequency relation graph;

and dividing a first non-strong relationship node into the corresponding sub-graph G1 according to a high confidence threshold of the affinity-frequency relation graph, wherein the high confidence threshold is the highest point of the affinity-frequency relation graph.

In an embodiment of the present invention, removing all nodes with a core degree smaller than k0 from the undirected graph G to obtain the sub-graphs G1 includes:

obtaining the affinity-frequency relation graph;

calculating the turning point of the affinity-frequency relation graph;

and removing from the undirected graph G all nodes whose core degree is smaller than the turning point of the affinity-frequency relation graph, to obtain the sub-graphs G1.

In one embodiment of the present invention, obtaining an affinity vs. frequency graph comprises:

counting the affinity of each node to the other nodes;

and obtaining the affinity-frequency relation graph according to all the affinities and their frequencies.

In one embodiment of the present invention, calculating the turning point of the affinity-frequency relation graph comprises:

calculating the turning point of the affinity-frequency relation graph by the Pettitt algorithm.

In an embodiment of the present invention, partitioning a first non-strong relationship node into the corresponding sub-graph G1 according to a high confidence threshold of the affinity and frequency relation graph includes:

finding the high confidence threshold in the affinity vs. frequency graph;

counting the nodes with the affinity greater than the high confidence threshold value in the first non-strong relationship nodes to obtain second non-strong relationship nodes;

calculating the affinities between the second non-strong relationship node and the nodes in every sub-graph G1 to obtain, for each sub-graph G1, the minimum affinity m between the second non-strong relationship node and the nodes of that sub-graph G1;

merging the second non-strong relationship node into the sub-graph G1 corresponding to the maximum affinity of all the minimum affinities m.

In an embodiment of the present invention, after dividing the first non-strong relationship node into the corresponding sub-graph G1 according to the high confidence threshold of the affinity-frequency relationship graph, the method further includes:

and obtaining the confidence of each node according to a confidence calculation formula.

In one embodiment of the present invention, the confidence calculation formula is:

wherein p represents the confidence, x represents the affinity with which the node belongs to the corresponding strong relationship cluster, and t₁ and t₂ are thresholds taken from the affinity-frequency relation graph: when a threshold t denotes the highest point or the turning point of the affinity-frequency relation graph, t takes the affinity value corresponding to that point.

An embodiment of the present invention further provides a device for clustering background similar pictures, including:

a construction module, configured to construct an undirected graph G, wherein the undirected graph G is represented by an adjacency matrix, and pictures are nodes of the undirected graph G;

a removing module, configured to remove all nodes with a core degree smaller than k0 in the undirected graph G to obtain a plurality of subgraphs G1, where the subgraph G1 is a strong relationship cluster, and k0 is a turning point of an affinity-frequency relationship graph;

and the clustering module is used for dividing a first non-strong relation node into the corresponding sub-graph G1 according to a high confidence threshold of the affinity and frequency relation graph, wherein the high confidence threshold is the highest point of the affinity and frequency relation graph.

An embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor, configured to implement the steps of the clustering method for background similar pictures according to any of the above embodiments when the computer program is executed.

An embodiment of the present invention further provides a storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of the clustering method for the background similar pictures described in any of the above embodiments are implemented.

The invention has the beneficial effects that:

The invention incorporates a strong-relation subgraph algorithm to mine the strong associations between different pictures, uses an uncorrelated subgraph algorithm on the obtained adjacency matrix to find the strong relationships of the corresponding entities, clusters the pictures, and thereby addresses the problem of 'normal pictures'.

The invention provides a clustering method for background-similar pictures based on strong-relation subgraphs, which, on the basis of the obtained adjacency matrix, can detect pictures with similar backgrounds produced in a variety of scenes and provide information that the user can refer to.

The present invention will be described in further detail with reference to the accompanying drawings and examples.

Drawings

FIG. 1 is a schematic flow chart of a method for clustering background-similar pictures according to an embodiment of the present invention;

FIG. 2 is a graph of affinity versus frequency according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a clustering device for background-similar pictures according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.

Example one

Referring to FIG. 1, FIG. 1 is a schematic flow chart of a method for clustering background-similar pictures according to an embodiment of the present invention. The embodiment of the invention provides a clustering method for background-similar pictures, which specifically comprises steps 1 to 4, wherein:

Step 1, constructing an undirected graph G, wherein the undirected graph G is represented by an adjacency matrix, and pictures are nodes of the undirected graph G.

Specifically, an adjacency matrix is obtained first. The adjacency matrix represents a graph that consists of nodes and edges and expresses the relationships between the nodes. Here each picture is a node of the graph and each edge is the degree of association between two pictures; the greater the degree of association, the greater the similarity between the pictures. This embodiment therefore uses the adjacency matrix to represent the undirected graph G.

The adjacency matrix may be a picture distance matrix or an affinity matrix.
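As a minimal sketch of how such an affinity matrix might be obtained, the following assumes that each picture is already described by a background feature vector and that affinity is computed as 1 / (1 + Euclidean distance). The function name `affinity_matrix`, the feature vectors, and this particular distance-to-affinity mapping are illustrative assumptions; the description only states that affinity decreases as distance grows.

```python
import math

def affinity_matrix(features):
    """Build a symmetric affinity matrix from per-picture feature vectors.

    Assumption: affinity = 1 / (1 + Euclidean distance), so it lies in
    (0, 1] and falls as distance grows. The exact mapping is hypothetical.
    """
    n = len(features)
    A = [[0.0] * n for _ in range(n)]  # diagonal is left at 0 (no self-edges)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(features[i], features[j])
            A[i][j] = A[j][i] = 1.0 / (1.0 + d)  # undirected: symmetric entries
    return A

# Three hypothetical pictures with 2-D background features.
features = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
A = affinity_matrix(features)
```

Because the graph is undirected, only the upper triangle is computed and then mirrored.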

Step 2, removing all nodes with a core degree smaller than k0 from the undirected graph G to obtain a plurality of sub-graphs G1, wherein each sub-graph G1 is a strong relationship cluster, and k0 is the turning point of the affinity-frequency relation graph.

In a particular embodiment, step 2 may specifically comprise steps 2.1 to 2.3, wherein:

Step 2.1, referring to FIG. 2, obtaining the affinity-frequency relation graph.

In this embodiment, step 2.1 may specifically include steps 2.11 to 2.12, wherein:

Step 2.11, counting the affinity from each node to the other nodes, wherein the affinity is an element of the adjacency matrix and is inversely proportional to the distance between the nodes.

Specifically, this embodiment counts the affinity between every two nodes in the adjacency matrix, that is, the affinity from each node to all the remaining nodes.

Step 2.12, obtaining the affinity-frequency relation graph according to all the affinities and their frequencies.

Specifically, the frequency is the number of occurrences of each element in the adjacency matrix. Referring to FIG. 2, once all the affinities and their frequencies have been obtained, the affinity-frequency relation graph can be constructed.
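The counting in steps 2.11 and 2.12 can be sketched as a simple histogram over the upper triangle of the adjacency matrix. Rounding affinities to one decimal place so that near-equal values fall into the same bin is an assumption made here for illustration, as are the function name `affinity_frequency` and the sample matrix.

```python
from collections import Counter

def affinity_frequency(A, decimals=1):
    """Return sorted (affinity, frequency) pairs for an affinity matrix.

    Each unordered node pair is counted once (upper triangle only).
    Assumption: affinities are rounded to `decimals` places to form bins.
    """
    counts = Counter()
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):
            counts[round(A[i][j], decimals)] += 1
    return sorted(counts.items())

# Hypothetical symmetric 4x4 affinity matrix.
A = [
    [0.00, 0.91, 0.12, 0.13],
    [0.91, 0.00, 0.11, 0.52],
    [0.12, 0.11, 0.00, 0.88],
    [0.13, 0.52, 0.88, 0.00],
]
hist = affinity_frequency(A)
```

Plotting the pairs in `hist` with affinity on the horizontal axis and frequency on the vertical axis would give a graph of the kind the description refers to.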

Step 2.2, calculating the turning point of the affinity-frequency relation graph.

Specifically, the turning point of the affinity-frequency relation graph, i.e., the threshold point 2 in FIG. 2, is calculated by the Pettitt algorithm.
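For orientation, a minimal sketch of the Pettitt change-point statistic is given below. It assumes the frequencies are fed to the test as a sequence ordered by affinity; the description does not specify the input ordering, and the significance (p-value) part of the test is omitted. The function name and the sample series are hypothetical.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def pettitt_change_point(series):
    """Return (t, K) for the Pettitt test on `series`.

    U_t = sum of sign(x_i - x_j) over i < t <= j (0-based), t = 1 .. n-1.
    K = max |U_t|, and t is the size of the first segment at that maximum,
    so the change lies between series[t - 1] and series[t].
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    best_t, best_k = 1, -1
    for t in range(1, n):
        u = sum(sign(series[i] - series[j])
                for i in range(t) for j in range(t, n))
        if abs(u) > best_k:
            best_t, best_k = t, abs(u)
    return best_t, best_k

freq = [1, 1, 1, 1, 5, 5, 5, 5]  # hypothetical frequencies ordered by affinity
t, K = pettitt_change_point(freq)  # t == 4, K == 16
```

This direct double sum is quadratic per split and is meant only to show the statistic; the affinity at position `t` would then serve as the turning point.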

Step 2.3, removing from the undirected graph G all nodes whose core degree is smaller than the turning point of the affinity-frequency relation graph, to obtain a plurality of sub-graphs G1.

Specifically, the core degree of each node in the undirected graph G is compared with the turning point of the affinity-frequency relation graph, and the nodes whose core degree is smaller than the turning point are removed. The removed nodes are the first non-strong relationship nodes, and the remaining nodes are strong relationship nodes. Among the remaining strong relationship nodes, each group of nodes interconnected by edges forms a sub-graph G1, and each sub-graph G1 is a strong relationship entity; that is, each sub-graph G1 is a strong relationship cluster.
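The removal step corresponds to extracting a k-core and then splitting it into connected components. The sketch below assumes the graph has already been reduced to unweighted edges (for example by keeping only pairs whose affinity is high enough) and that k0 is supplied as an integer; how the turning point is mapped onto a core degree is not spelled out in the description, so both are assumptions, as are the function names.

```python
def build_adj(nodes, edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def k_core_clusters(adj, k):
    """Return (clusters, removed) after pruning nodes with core degree < k."""
    alive = set(adj)
    changed = True
    while changed:  # repeat until every surviving node has >= k live neighbours
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    removed = sorted(set(adj) - alive)  # first non-strong relationship nodes
    clusters, seen = [], set()
    for start in sorted(alive):  # connected components of the k-core
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend((adj[v] & alive) - comp)
        seen |= comp
        clusters.append(sorted(comp))
    return clusters, removed

clique_a = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
clique_b = [(5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]
adj = build_adj(range(10), clique_a + clique_b + [(0, 4)])
clusters, removed = k_core_clusters(adj, k=3)
```

In this hypothetical graph the two four-node cliques survive as strong relationship clusters, while the pendant node 4 and the isolated node 9 are removed.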

Step 3, dividing the first non-strong relationship node into the corresponding sub-graph G1 according to the high confidence threshold of the affinity-frequency relation graph, wherein the high confidence threshold is the highest point of the affinity-frequency relation graph.

The strong relationship entities obtained by strong relationship mining are in fact individual high-confidence fraud clusters. Outside these entities there are a large number of non-strong relationship entities, which appear as nodes on the graph, so these nodes need to be selectively merged into the clusters.

In a particular embodiment, step 3 may specifically comprise steps 3.1 to 3.4, wherein:

Step 3.1, finding the high confidence threshold in the affinity-frequency relation graph.

Specifically, the highest point, i.e., the threshold point 1 in FIG. 2, is found in the affinity-frequency relation graph, and the affinity corresponding to this highest point serves as the high confidence threshold.
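Reading the highest point off the graph amounts to taking the affinity whose frequency is largest. A minimal sketch, assuming the graph is available as a list of (affinity, frequency) pairs; the function name and the sample values are hypothetical, and ties are resolved in favour of the first peak.

```python
def high_confidence_threshold(hist):
    """Return the affinity at which the frequency is highest.

    hist: list of (affinity, frequency) pairs.
    """
    affinity, _ = max(hist, key=lambda pair: pair[1])
    return affinity

hist = [(0.2, 4), (0.4, 9), (0.6, 5), (0.8, 2)]  # hypothetical graph data
threshold = high_confidence_threshold(hist)
```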

Step 3.2, counting the nodes whose affinity is greater than the high confidence threshold among the first non-strong relationship nodes, to obtain second non-strong relationship nodes.

In this embodiment, the first non-strong relationship nodes are the nodes removed in step 2.3, and these nodes need to be moved into the corresponding strong relationship clusters. Therefore, among all the first non-strong relationship nodes, the nodes whose affinity is greater than the high confidence threshold are identified and taken as the second non-strong relationship nodes.

Step 3.3, calculating the affinities between the second non-strong relationship node and the nodes in every sub-graph G1 to obtain, for each sub-graph G1, the minimum affinity m between the second non-strong relationship node and the nodes of that sub-graph G1.

Specifically, for each second non-strong relationship node, the affinity between that node and every node in each sub-graph G1 is calculated, which identifies the node of each sub-graph G1 that has the smallest affinity to it. The minimum affinity m is therefore the minimum of the affinities between one second non-strong relationship node and all nodes of one sub-graph G1.

Step 3.4, merging the second non-strong relationship node into the sub-graph G1 corresponding to the maximum of all the minimum affinities m.

Specifically, each second non-strong relationship node has one minimum affinity m with each sub-graph G1. The largest m is selected from all the minimum affinities m of that node, and the node is divided into the sub-graph G1 corresponding to the largest m, which completes the clustering of the second non-strong relationship nodes.
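Steps 3.2 to 3.4 can be sketched as a max-of-min assignment. The description does not say which affinity of a removed node is compared against the high confidence threshold; the sketch assumes it is the node's largest affinity to any clustered node. That filter, the function name `assign_to_cluster`, and the sample data are assumptions for illustration.

```python
def assign_to_cluster(node, clusters, A, threshold):
    """Return the index of the cluster to merge `node` into, or None.

    clusters: list of non-empty lists of node indices (the sub-graphs G1).
    A: affinity matrix. threshold: the high confidence threshold.
    Assumption: a node qualifies when its largest affinity to any clustered
    node exceeds `threshold`.
    """
    members = [v for c in clusters for v in c]
    if not members or max(A[node][v] for v in members) <= threshold:
        return None  # stays unclustered, e.g. a "normal picture"
    # Step 3.3: minimum affinity m between the node and each cluster.
    mins = [min(A[node][v] for v in c) for c in clusters]
    # Step 3.4: pick the cluster whose minimum affinity m is largest.
    return max(range(len(clusters)), key=lambda i: mins[i])

# Node 4 was removed earlier; its affinities to nodes 0..3 are hypothetical.
A = [[0.0] * 5 for _ in range(5)]
for v, a in enumerate([0.9, 0.3, 0.6, 0.7]):
    A[4][v] = A[v][4] = a
clusters = [[0, 1], [2, 3]]
target = assign_to_cluster(4, clusters, A, threshold=0.5)
```

Here node 4 has minimum affinities 0.3 and 0.6 to the two clusters, so it joins the second cluster even though its single strongest link (0.9) points into the first. Using the minimum favours clusters in which the node is close to every member.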

Step 4, obtaining the confidence of each node according to a confidence calculation formula.

The confidence represents the probability that the corresponding node belongs to a given cluster, that is, the probability that a given picture belongs to a given cluster. After the nodes have been merged, the confidence of each node is calculated according to the confidence calculation formula, wherein the confidence calculation formula is as follows:

According to the above formula, this embodiment can calculate the confidence value of each node, thereby giving more information to the user.

In this method, the threshold point 1 and the threshold point 2 are determined from the affinity-frequency relation graph, and the pictures are merged in turn into different clusters. Because the thresholds are derived from the affinity-frequency relation graph, the method adaptively reduces the interference of noise data on the result, in particular the interference of 'normal pictures', and improves the identification precision.

The picture background clustering method provided by the invention is mainly used for clustering fraudulent background pictures and comprises three main steps: mining strong relationships, merging nodes, and calculating confidence.

The invention processes the obtained adjacency matrix with an uncorrelated subgraph algorithm to obtain the strong relationship entities, and finally finds the most probable fraudulent picture clusters.

Example two

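Referring to FIG. 3, FIG. 3 is a schematic diagram of a clustering device for background-similar pictures according to an embodiment of the present invention. The clustering device for background-similar pictures comprises:

a construction module, configured to construct an undirected graph G, wherein the undirected graph G is represented by an adjacency matrix, and pictures are nodes of the undirected graph G;

a removing module, configured to remove all nodes with a core degree smaller than k0 from the undirected graph G to obtain a plurality of sub-graphs G1, wherein each sub-graph G1 is a strong relationship cluster, and k0 is the turning point of the affinity-frequency relation graph;

and a clustering module, configured to divide a first non-strong relationship node into the corresponding sub-graph G1 according to a high confidence threshold of the affinity-frequency relation graph, wherein the high confidence threshold is the highest point of the affinity-frequency relation graph.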

In an embodiment of the present invention, the removing module may be specifically configured to: obtain the affinity-frequency relation graph; calculate the turning point of the affinity-frequency relation graph; and remove from the undirected graph G all nodes whose core degree is smaller than the turning point of the affinity-frequency relation graph, to obtain a plurality of sub-graphs G1.

In an embodiment of the present invention, the clustering module may be specifically configured to: find the high confidence threshold in the affinity-frequency relation graph; count the nodes whose affinity is greater than the high confidence threshold among the first non-strong relationship nodes, to obtain second non-strong relationship nodes; calculate the affinities between the second non-strong relationship node and the nodes in every sub-graph G1 to obtain the minimum affinity m between the second non-strong relationship node and each sub-graph G1; and merge the second non-strong relationship node into the sub-graph G1 corresponding to the maximum of all the minimum affinities m.

The clustering device for background-similar pictures provided by this embodiment can execute the above method embodiment; its implementation principle and technical effect are similar and are not described again here.

Example three

Referring to FIG. 4, FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 1100 comprises a processor 1101, a communication interface 1102, a memory 1103, and a communication bus 1104, wherein the processor 1101, the communication interface 1102, and the memory 1103 communicate with one another through the communication bus 1104;

a memory 1103 for storing a computer program;

the processor 1101, when executing the computer program, implements the above method steps.

The processor 1101, when executing the computer program, implements the following steps:

Step 1, constructing an undirected graph G, wherein the undirected graph G is represented by an adjacency matrix, and pictures are nodes of the undirected graph G;

Step 2, removing all nodes with a core degree smaller than k0 from the undirected graph G to obtain a plurality of sub-graphs G1, wherein each sub-graph G1 is a strong relationship cluster, and k0 is the turning point of the affinity-frequency relation graph;

Step 3, dividing a first non-strong relationship node into the corresponding sub-graph G1 according to a high confidence threshold of the affinity-frequency relation graph, wherein the high confidence threshold is the highest point of the affinity-frequency relation graph.

The electronic device provided by the embodiment of the present invention can execute the above method embodiments, and the implementation principle and technical effect are similar, which are not described herein again.

Example four

Yet another embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:

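Step 1, constructing an undirected graph G, wherein the undirected graph G is represented by an adjacency matrix, and pictures are nodes of the undirected graph G;

Step 2, removing all nodes with a core degree smaller than k0 from the undirected graph G to obtain a plurality of sub-graphs G1, wherein each sub-graph G1 is a strong relationship cluster, and k0 is the turning point of the affinity-frequency relation graph;

Step 3, dividing a first non-strong relationship node into the corresponding sub-graph G1 according to a high confidence threshold of the affinity-frequency relation graph, wherein the high confidence threshold is the highest point of the affinity-frequency relation graph.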

The computer-readable storage medium provided by the embodiment of the present invention may implement the above method embodiments, and the implementation principle and technical effect are similar, which are not described herein again.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, an apparatus (device), or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects, all of which may generally be referred to herein as a "module" or "system". Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The computer program is stored or distributed on a suitable medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.

In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art can combine the various embodiments or examples described in this specification.

The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims

1. A clustering method of background similar pictures is characterized by comprising the following steps:

constructing an undirected graph G, wherein the undirected graph G is represented by an adjacency matrix and comprises nodes and edges, pictures are the nodes of the undirected graph G, the edges are the correlation degrees among the pictures, and the larger the correlation degree is, the larger the similarity among the pictures is;

removing all nodes with a core degree smaller than k0 from the undirected graph G to obtain a plurality of sub-graphs G1, wherein the removed nodes are first non-strong relationship nodes, each sub-graph G1 is a strong relationship cluster, and k0 is the turning point of an affinity-frequency relation graph;

dividing a first non-strong relationship node into the corresponding sub-graph G1 according to a high confidence threshold of the affinity-frequency relation graph, wherein the affinity is an element of the adjacency matrix, the affinity is inversely proportional to the distance between the nodes, the frequency is the number of occurrences of each element in the adjacency matrix, and the high confidence threshold is the highest point of the affinity-frequency relation graph;

obtaining the confidence of each node according to a confidence calculation formula, wherein the confidence calculation formula is as follows:

wherein p represents the confidence, x represents the affinity with which the node belongs to the corresponding strong relationship cluster, and t₁ and t₂ are thresholds taken from the affinity-frequency relation graph: when a threshold t denotes the highest point or the turning point of the affinity-frequency relation graph, t takes the affinity value corresponding to that point;

dividing a first non-strong relation node into the corresponding sub-graph G1 according to the high confidence threshold of the affinity and frequency relation graph, including:

finding the high confidence threshold in the affinity vs. frequency graph;

merging the second non-strong relationship node into the sub-graph G1 corresponding to the maximum affinity in all the minimum affinities m.

2. The method for clustering background similar pictures according to claim 1, wherein all nodes with a core degree smaller than k0 in the undirected graph G are removed to obtain a sub-graph G1, comprising:

obtaining a relation graph of affinity and frequency;

calculating the turning point of the affinity and frequency relation graph;

and all the nodes with the core degrees smaller than the turning points of the affinity and frequency relation graph in the undirected graph G are removed to obtain the sub-graphs G1.

3. The method of claim 2, wherein obtaining the affinity-frequency relationship graph comprises:

counting the affinity of each node to the other nodes;

4. The method of claim 2, wherein calculating the turning point of the affinity-frequency relationship graph comprises:

5. A background similar picture clustering device is characterized by comprising:

a construction module, configured to construct an undirected graph G, wherein the undirected graph G is represented by an adjacency matrix and comprises nodes and edges, pictures are the nodes of the undirected graph G, the edges are the correlation degrees among the pictures, and the larger the correlation degree is, the larger the similarity among the pictures is;

a removing module, configured to remove all nodes with a core degree smaller than k0 from the undirected graph G to obtain a plurality of sub-graphs G1, wherein the removed nodes are first non-strong relationship nodes, each sub-graph G1 is a strong relationship cluster, and k0 is the turning point of an affinity-frequency relation graph;

a clustering module, configured to divide a first non-strong relationship node into the corresponding sub-graph G1 according to a high confidence threshold of the affinity-frequency relation graph, wherein the affinity is an element of the adjacency matrix, the affinity is inversely proportional to the distance between the nodes, the frequency is the number of occurrences of each element in the adjacency matrix, and the high confidence threshold is the highest point of the affinity-frequency relation graph;

finding the high confidence threshold in the affinity-frequency relation graph;

calculating the affinities between the second non-strong relationship node and the nodes in every sub-graph G1 to obtain, for each sub-graph G1, the minimum affinity m between the second non-strong relationship node and the nodes of that sub-graph G1;

6. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of any one of claims 1-4 when executing the computer program.

7. A storage medium, characterized in that a computer program is stored in the storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-4.