CN112308162A - Image big data similarity comparison method and system - Google Patents
Image big data similarity comparison method and system
- Publication number
- CN112308162A (application number CN202011232317.XA)
- Authority
- CN
- China
- Prior art keywords
- image
- comparison
- big data
- images
- comparing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
The image big data similarity comparison method captures the image data to be compared by means of crawler capture technology and big data search; analyzes, manages, and stores the captured image data in a structured manner; and performs classified comparison according to the attributes of the images. In the comparison process, the features contained in each image are calculated to generate a set of fingerprints, and the fingerprints of the images are compared to judge their similarity. The method can improve image identification efficiency and, combined with big data, enables comparison of image similarity within a big data range.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method and a system for comparing image big data similarity.
Background
Existing awareness of, and functions for, image anti-infringement are imperfect: the intellectual property of image creators cannot be protected, users who may be using infringing images on websites are difficult to identify, and effective evidence cannot be collected. The reason is that, although many image similarity comparison algorithms exist, none provides detailed optimization for specific types of images such that the similarity comparison threshold reaches 95% or above. In addition, without an image comparison tool combined with big data, comparison of image similarity within a big data range cannot be achieved. As a result, image comparison has low efficiency and accuracy.
A search found the following Chinese invention patent: "An image similarity comparison method based on a perceptual hash algorithm" (application No. 202010177648.1, filing date 2020-03-13). That method compresses the image through a Discrete Cosine Transform (DCT) algorithm, reduces the size of the image through pHash, obtains the R, G, and B color channels of the image, calculates the average of the RGB values, and converts the image to grayscale. Fingerprints are extracted from each image by a color distribution method and a content feature method respectively. The original image is compressed into a small fixed-size grayscale image to determine a threshold, the image is converted into a black-and-white image to compare outlines, and the basic fingerprint, the color feature fingerprint, and the content feature fingerprint are compared in multiple dimensions to obtain a similarity result. The disadvantage of that method is that the basic fingerprint, the color feature fingerprint, and the content feature fingerprint all need to be compared in multiple dimensions, which is complex and inefficient.
Disclosure of Invention
1. Technical problem to be solved by the invention
The invention aims to solve the problem that existing image comparison is low in efficiency and accuracy.
2. Technical scheme
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
The image big data similarity comparison method comprises: capturing the image data to be compared by means of crawler capture technology and big data search; analyzing and managing the captured image data and then storing it in a structured manner; and performing classified comparison according to the attributes of the images, wherein the comparison process calculates the features contained in each image to generate a set of fingerprints and compares the fingerprints of the images to judge the similarity of the images.
Preferably, the method comprises the steps of:
S100, image capture: capturing the required image data from a specific website through crawler capture technology;
S200, analysis and management: performing multi-dimensional structured division on the captured image data;
S300, image comparison: calculating the features contained in each image, generating a set of fingerprints, and comparing the fingerprints of the images to judge the similarity of the images.
Preferably, in step S100, the image data required for crawler capture is selected from a high-definition image and a thumbnail image within 500 pixels.
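Preferably, in step S200, the multidimensional structured division of the image data specifically includes the following dimensions: objects (articles for daily use, vehicles, plants), emotional tone (sadness, joy), color (warm tones, cool tones, reddish, greenish), landscape (buildings, natural scenery), people (great figures, celebrities, men, women, crowds), cartoon and animation (cartoon characters, cartoon scenery), text (pictures containing text), artistic vision (artworks, figures), and the like.
Preferably, the image comparison in step S300 includes the following steps: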
S310, unifying the size: reducing each image to the same size to obtain N pixels;
S320, simplifying the colors: converting the reduced image into N-level grayscale;
S330, calculating the average: calculating the average gray level of all N pixels;
S340, gray level comparison: comparing the gray level of each pixel with the average in sequence, recording 1 when the gray level is greater than or equal to the average and 0 when it is less than the average;
S350, calculating the hash value: combining the results of the gray level comparison in sequence to obtain an N-bit number, which is the fingerprint of the image;
S360, comparing the similarity: comparing the fingerprints of different images and judging the coincidence rate.
Preferably, N is 64, and the method specifically comprises the steps of:
S310, unifying the size: reducing each image to the same size of 8x8 to obtain 64 pixels;
S320, simplifying the colors: converting the reduced image into 64-level grayscale;
S330, calculating the average: calculating the average gray level of all 64 pixels;
S340, gray level comparison: comparing the gray level of each pixel with the average in sequence, recording 1 when the gray level is greater than or equal to the average and 0 when it is less than the average;
S350, calculating the hash value: combining the results of the gray level comparison in sequence to obtain a 64-bit number, which is the fingerprint of the image;
S360, comparing the similarity: comparing the fingerprints of different images and judging the coincidence rate.
Preferably, in step S360, when the coincidence rate is greater than 90%, the images are judged to be highly similar; when the coincidence rate is between 70% and 90%, the images are judged to be similar; when the coincidence rate is between 50% and 70%, the images are judged to be generally similar; and when the coincidence rate is less than 50%, the images are judged to be not similar.
The image big data similarity comparison system comprises an image collection module, an image processing module and an image comparison module which are sequentially in communication connection, wherein the image collection module is used for capturing required image data from a specific website, the image processing module is used for analyzing and managing the captured image data, and the image comparison module is used for comparing images.
Preferably, the system further comprises an image database, a big data search module, and a big data comparison module which are sequentially in communication connection, wherein the image database is in communication connection with the image processing module; the image database is used for structured storage of the image data processed by the image processing module; the big data search module is used for quickly searching for and locating picture information in the image database; and the big data comparison module is used for comparing and looking up image data at the big data level.
3. Advantageous effects
Compared with the prior art, the technical scheme provided by the invention has the following beneficial effects:
The image big data similarity comparison method captures the image data to be compared by means of crawler capture technology and big data search; analyzes, manages, and stores the captured image data in a structured manner; and performs classified comparison according to the attributes of the images. In the comparison process, the features contained in each image are calculated to generate a set of fingerprints, and the fingerprints of the images are compared to judge their similarity. The method can improve image identification efficiency and, combined with big data, enables comparison of image similarity within a big data range.
Drawings
FIG. 1 is a flowchart of the image big data similarity comparison method according to the present invention;
FIG. 2 is a detailed flowchart of step S300 according to the present invention;
FIG. 3 is a schematic structural diagram of the system of the present invention.
The reference numerals in the schematic drawings illustrate:
100, image collection module; 200, image processing module; 300, image comparison module; 400, image database; 500, big data search module; 600, big data comparison module.
Detailed Description
To facilitate understanding, the invention is described more fully below with reference to the accompanying drawings, in which several embodiments of the invention are shown. The invention may, however, be embodied in many different forms and is not limited to the embodiments described herein; rather, these embodiments are provided so that the disclosure of the invention will be more thorough.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present; when an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present; the terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs; the terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention; as used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Example 1
Referring to fig. 1 to fig. 3, in the image big data similarity comparison method according to this embodiment, image data to be compared is captured by using a crawler capture technology and big data search, the captured image data is analyzed and processed, then is stored in a structured manner, and is compared in a classified manner according to attributes of the images, and the comparison process is to calculate features included in the images and generate a set of fingerprints, and compare the fingerprints of the images to determine the similarity of the images.
The method comprises the following steps:
S100, image capture: capturing the required image data from a specific website through crawler capture technology;
S200, analysis and management: performing multi-dimensional structured division on the captured image data;
S300, image comparison: calculating the features contained in each image, generating a set of fingerprints, and comparing the fingerprints of the images to judge the similarity of the images.
In step S100, the image data to be captured by the crawler is preferably selected from high-definition images and thumbnails within 500 pixels. Important picture data is captured from specified internet websites through crawler capture technology. The format and size of the pictures are specified, since pictures cannot be stored without limit, and a preliminary screening is performed. First, pictures in the mainstream formats (png, jpg, and the like) are saved. Second, with regard to picture size, high-definition pictures are mainly of 1080p resolution, while thumbnails within 500 pixels can be used with the algorithm to carry out picture comparison. The pictures are then saved to a database for data accumulation. Specifically, by analyzing the structure of the target website, the complete organization structure of the target data can be obtained, and the picture content mainly presented by the website can be found and collected.
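The preliminary screening described above might be sketched as follows. This is only an illustration under stated assumptions: the names `CrawledImage` and `keep_image` are hypothetical, "within 500 pixels" is read as a longest side of at most 500 pixels, and a 1080p picture is taken to be exactly 1920x1080. The patent does not fix any of these details, and the crawling itself is not shown.

```python
import posixpath
from dataclasses import dataclass
from urllib.parse import urlparse

# Assumed values for illustration only; the text names png and jpg as examples
# of mainstream formats and mentions 1080p pictures and 500-pixel thumbnails.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}
THUMBNAIL_MAX_SIDE = 500
HD_SIZE = (1920, 1080)


@dataclass
class CrawledImage:
    """Hypothetical record for one picture found by the crawler."""
    url: str
    width: int
    height: int


def extension_of(url: str) -> str:
    """Return the lowercase file extension of the URL path (query string ignored)."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def keep_image(img: CrawledImage) -> bool:
    """Preliminary screening: keep mainstream formats that are thumbnails or 1080p."""
    if extension_of(img.url) not in ALLOWED_EXTENSIONS:
        return False
    is_thumbnail = max(img.width, img.height) <= THUMBNAIL_MAX_SIDE
    is_hd = (img.width, img.height) == HD_SIZE
    return is_thumbnail or is_hd
```

A crawler built along these lines would call `keep_image` on each discovered picture before saving it to the database, so that storage does not grow without limit.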
In step S200, multi-dimensional structured division is performed on the image data, that is, the pictures are divided structurally along multiple dimensions. One dimension is whether the picture shows a living thing, which is then subdivided into animals and plants. Another is color, for example whether the overall tone is red or green. A further dimension is whether the picture is a scene, and if so whether it is a natural scene or an architectural scene. A structured classification tree is built in which the end of each branch is a refined label, such as green, plant, or grass, and each picture can carry several labels, according to which it is classified and stored. Classified storage facilitates later retrieval, comparison, and lookup: similar pictures can be compared and queried more quickly, the retrieval workload is greatly reduced, and the retrieval is given a direction.
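One way to picture the label-based classified storage is an inverted index from labels to picture identifiers. The sketch below is an assumption about how such storage could be organized, not the structure used by the invention; the class `TagIndex` and its methods are hypothetical, and how labels are assigned to pictures is outside its scope.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical inverted index: each label maps to the ids of pictures carrying it."""

    def __init__(self) -> None:
        self._by_tag = defaultdict(set)

    def add(self, image_id: str, tags: list) -> None:
        """Store one picture under every label it carries."""
        for tag in tags:
            self._by_tag[tag].add(image_id)

    def candidates(self, tags: list) -> set:
        """Return the ids of pictures that carry all of the given labels."""
        if not tags:
            return set()
        sets = [self._by_tag.get(tag, set()) for tag in tags]
        return set.intersection(*sets)
```

Restricting a later fingerprint comparison to `candidates(...)` is what narrows the search: only pictures sharing the query's labels need to be compared.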
The image comparison in step S300 includes the following steps:
S310, unifying the size: reducing each image to the same size to obtain N pixels;
S320, simplifying the colors: converting the reduced image into N-level grayscale;
S330, calculating the average: calculating the average gray level of all N pixels;
S340, gray level comparison: comparing the gray level of each pixel with the average in sequence, recording 1 when the gray level is greater than or equal to the average and 0 when it is less than the average;
S350, calculating the hash value: combining the results of the gray level comparison in sequence to obtain an N-bit number, which is the fingerprint of the image;
S360, comparing the similarity: comparing the fingerprints of different images and judging the coincidence rate.
In this embodiment, preferably, N is 64, and the method specifically includes the steps of:
S310, unifying the size: reducing each image to the same size of 8x8 to obtain 64 pixels;
S320, simplifying the colors: converting the reduced image into 64-level grayscale;
S330, calculating the average: calculating the average gray level of all 64 pixels;
S340, gray level comparison: comparing the gray level of each pixel with the average in sequence, recording 1 when the gray level is greater than or equal to the average and 0 when it is less than the average;
S350, calculating the hash value: combining the results of the gray level comparison in sequence to obtain a 64-bit number, which is the fingerprint of the image;
S360, comparing the similarity: comparing the fingerprints of different images and judging the coincidence rate.
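Steps S310 to S350 with N = 64 can be sketched in plain Python as follows. This is a minimal illustration, assuming the input is already an 8-bit grayscale picture given as a list of rows of values from 0 to 255, that reduction is done by simple block averaging, and that bits are combined row by row with the first pixel as the most significant bit. The function names `shrink` and `average_hash` are hypothetical, and the patent does not prescribe the resampling method or the bit order.

```python
def shrink(gray, size=8):
    """S310: reduce a 2D list of gray values to size x size by block averaging."""
    height, width = len(gray), len(gray[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    small = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        row = []
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [gray[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(gray, size=8, levels=64):
    """S310-S350: return the fingerprint of a grayscale image as an integer."""
    small = shrink(gray, size)
    step = 256 // levels  # 4 when reducing 256 gray values to 64 levels
    pixels = [int(v) // step for row in small for v in row]  # S320
    mean = sum(pixels) / len(pixels)  # S330
    fingerprint = 0
    for p in pixels:  # S340 and S350: one bit per pixel, combined in sequence
        fingerprint = (fingerprint << 1) | (1 if p >= mean else 0)
    return fingerprint
```

For example, a picture whose left half is black and whose right half is white yields the bit pattern 00001111 in every row. Because only the relation of each pixel to the average is kept, uniform changes in brightness tend to leave the fingerprint largely unchanged.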
In step S200, the multidimensional structured division of the image data specifically includes the following dimensions: objects (articles for daily use, vehicles, plants), emotional tone (sadness, joy), color (warm tones, cool tones, reddish, greenish), landscape (buildings, natural scenery), people (great figures, celebrities, men, women, crowds), cartoon and animation (cartoon characters, cartoon scenery), text (pictures containing text), artistic vision (artworks, figures), and the like.
In step S360, when the coincidence rate is greater than 90%, the images are judged to be highly similar; when the coincidence rate is between 70% and 90%, the images are judged to be similar; when the coincidence rate is between 50% and 70%, the images are judged to be generally similar; and when the coincidence rate is less than 50%, the images are judged to be not similar.
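The judgment in step S360 might be sketched as below. The text does not define the coincidence rate precisely, so this sketch assumes it is the fraction of bit positions at which two fingerprints agree (one minus the Hamming distance divided by the fingerprint length). It also assumes that the boundary values 90%, 70%, and 50% fall into the lower, middle, and lower-middle grades respectively, which the text leaves open. The function names and the English grade labels are illustrative.

```python
def coincidence_rate(fp_a: int, fp_b: int, bits: int = 64) -> float:
    """Fraction of bit positions at which the two fingerprints agree (assumed definition)."""
    differing = bin(fp_a ^ fp_b).count("1")  # Hamming distance
    return 1.0 - differing / bits


def similarity_grade(rate: float) -> str:
    """Map a coincidence rate in [0, 1] to one of the four grades of step S360."""
    if rate > 0.90:
        return "highly similar"
    if rate >= 0.70:  # assumption: exactly 90% and exactly 70% count as "similar"
        return "similar"
    if rate >= 0.50:  # assumption: exactly 50% counts as "generally similar"
        return "generally similar"
    return "not similar"
```

With 64-bit fingerprints, the "highly similar" grade corresponds to at most 6 differing bits under this definition, since 7 differing bits already give a rate of about 89%.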
The embodiment further comprises an image big data similarity comparison system, which is used for executing the method, and comprises an image collection module 100, an image processing module 200 and an image comparison module 300, which are sequentially in communication connection, wherein the image collection module 100 is used for capturing required image data from a specific website, the image processing module 200 is used for analyzing and managing the captured image data, and the image comparison module 300 is used for comparing images.
The system further comprises an image database 400, a big data search module 500, and a big data comparison module 600 which are sequentially in communication connection, wherein the image database 400 is in communication connection with the image processing module 200; the image database 400 is used for structured storage of the image data processed by the image processing module 200; the big data search module 500 is used for quickly searching for and locating picture information in the image database 400; and the big data comparison module 600 is used for comparing and looking up image data at the big data level.
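The cooperation of modules 400, 500, and 600 could be pictured with the following sketch. It is a hypothetical in-memory arrangement for illustration only: the class names, the record layout, and the use of labels to locate candidates are assumptions, and a real deployment at big data scale would use a proper database and search infrastructure rather than Python lists.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Hypothetical stored record: picture id, its labels, and its fingerprint."""
    image_id: str
    tags: frozenset
    fingerprint: int


@dataclass
class ImageDatabase:
    """Stands in for image database 400 (structured storage)."""
    records: list = field(default_factory=list)

    def store(self, record: ImageRecord) -> None:
        self.records.append(record)


class BigDataSearch:
    """Stands in for big data search module 500: locates records by label."""

    def __init__(self, db: ImageDatabase) -> None:
        self.db = db

    def locate(self, tags):
        wanted = set(tags)
        return [r for r in self.db.records if wanted <= r.tags]


class BigDataComparison:
    """Stands in for big data comparison module 600: ranks located records."""

    def __init__(self, search: BigDataSearch) -> None:
        self.search = search

    def most_similar(self, fingerprint: int, tags, bits: int = 64):
        scored = []
        for record in self.search.locate(tags):
            differing = bin(fingerprint ^ record.fingerprint).count("1")
            scored.append((1.0 - differing / bits, record.image_id))
        return sorted(scored, reverse=True)  # highest coincidence rate first
```

The design point this illustrates is the ordering of the modules: search narrows the stored pictures first, and fingerprint comparison then runs only on the located subset.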
The above-mentioned embodiments only express a certain implementation mode of the present invention, and the description thereof is specific and detailed, but not construed as limiting the scope of the present invention; it should be noted that, for those skilled in the art, without departing from the concept of the present invention, several variations and modifications can be made, which are within the protection scope of the present invention; therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (8)
1. An image big data similarity comparison method, characterized in that: the image data to be compared is captured through crawler capture technology and big data search; the captured image data is analyzed, managed, and stored in a structured manner; classified comparison is performed according to the attributes of the images; and the comparison process calculates the features contained in each image to generate a set of fingerprints and compares the fingerprints of the images to judge the similarity of the images.
2. The image big data similarity comparison method according to claim 1, comprising the steps of:
S100, image capture: capturing the required image data from a specific website through crawler capture technology;
S200, analysis and management: performing multi-dimensional structured division on the captured image data;
S300, image comparison: calculating the features contained in each image, generating a set of fingerprints, and comparing the fingerprints of the images to judge the similarity of the images.
3. The image big data similarity comparison method according to claim 2, characterized in that: in step S100, the image data to be captured by the crawler is preferably selected from high-definition images and thumbnails within 500 pixels.
4. The method according to claim 2, wherein the image comparison in step S300 includes the following steps:
S310, unifying the size: reducing each image to the same size to obtain N pixels;
S320, simplifying the colors: converting the reduced image into N-level grayscale;
S330, calculating the average: calculating the average gray level of all N pixels;
S340, gray level comparison: comparing the gray level of each pixel with the average in sequence, recording 1 when the gray level is greater than or equal to the average and 0 when it is less than the average;
S350, calculating the hash value: combining the results of the gray level comparison in sequence to obtain an N-bit number, which is the fingerprint of the image;
S360, comparing the similarity: comparing the fingerprints of different images and judging the coincidence rate.
5. The method according to claim 4, wherein N is 64, and the method specifically comprises the following steps:
S310, unifying the size: reducing each image to the same size of 8x8 to obtain 64 pixels;
S320, simplifying the colors: converting the reduced image into 64-level grayscale;
S330, calculating the average: calculating the average gray level of all 64 pixels;
S340, gray level comparison: comparing the gray level of each pixel with the average in sequence, recording 1 when the gray level is greater than or equal to the average and 0 when it is less than the average;
S350, calculating the hash value: combining the results of the gray level comparison in sequence to obtain a 64-bit number, which is the fingerprint of the image;
S360, comparing the similarity: comparing the fingerprints of different images and judging the coincidence rate.
6. The image big data similarity comparison method according to claim 4, wherein: in step S360, when the coincidence rate is greater than 90%, the images are judged to be highly similar; when the coincidence rate is between 70% and 90%, the images are judged to be similar; when the coincidence rate is between 50% and 70%, the images are judged to be generally similar; and when the coincidence rate is less than 50%, the images are judged to be not similar.
7. An image big data similarity comparison system for performing the method of any one of claims 1 to 6, characterized in that: the system comprises an image collection module (100), an image processing module (200), and an image comparison module (300) which are sequentially in communication connection, wherein the image collection module (100) is used for capturing the required image data from a specific website, the image processing module (200) is used for analyzing and managing the captured image data, and the image comparison module (300) is used for comparing images.
8. The image big data similarity comparison system according to claim 7, wherein: the system further comprises an image database (400), a big data search module (500), and a big data comparison module (600) which are sequentially in communication connection, wherein the image database (400) is in communication connection with the image processing module (200); the image database (400) is used for structured storage of the image data processed by the image processing module (200); the big data search module (500) is used for quickly searching for and locating picture information in the image database (400); and the big data comparison module (600) is used for comparing and looking up image data at the big data level.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011232317.XA CN112308162A (en) | 2020-11-06 | 2020-11-06 | Image big data similarity comparison method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112308162A (en) | 2021-02-02 |
Family
ID=74325180
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011232317.XA Pending CN112308162A (en) | 2020-11-06 | 2020-11-06 | Image big data similarity comparison method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112308162A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109213891A (en) * | 2018-08-20 | 2019-01-15 | 深圳市乐唯科技开发有限公司 | A method of using average hash algorithm search pictures |
CN110348277A (en) * | 2018-11-30 | 2019-10-18 | 浙江农林大学 | A kind of tree species image-recognizing method based under natural background |
CN111353552A (en) * | 2020-03-13 | 2020-06-30 | 杭州趣维科技有限公司 | Image similarity contrast method based on perceptual hash algorithm |
Non-Patent Citations (1)
Title |
---|
李玉香; 王孟玉; 涂宇晰: "基于python的网络爬虫技术研究" (Research on Python-based web crawler technology), 信息技术与信息化 (Information Technology and Informatization), no. 12, 25 December 2019 (2019-12-25), pages 149-151 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113139589A (en) * | 2021-04-12 | 2021-07-20 | 网易(杭州)网络有限公司 | Picture similarity detection method and device, processor and electronic device |
CN113139589B (en) * | 2021-04-12 | 2023-02-28 | 网易(杭州)网络有限公司 | Picture similarity detection method and device, processor and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||