CN112308162A - Image big data similarity comparison method and system - Google Patents

Image big data similarity comparison method and system

Info

Publication number
CN112308162A
Authority
CN
China
Prior art keywords
image
comparison
big data
images
comparing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011232317.XA
Other languages
Chinese (zh)
Inventor
罗敏刚
李中棠
朱永佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Softline Information Technology Co ltd
Original Assignee
Shanghai Softline Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Softline Information Technology Co ltd
Priority to CN202011232317.XA
Publication of CN112308162A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The image big data similarity comparison method captures the image data to be compared through crawler capture technology and big data search, analyzes and manages the captured image data and stores it in structured form, and then performs classified comparison according to the attributes of the images. The comparison process calculates the features contained in each image to generate a set of fingerprints, then compares the fingerprints of the images to judge their similarity. The method can improve image identification efficiency and, combined with big data, enables image similarity comparison across a big data range.

Description

Image big data similarity comparison method and system
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method and a system for comparing image big data similarity.
Background
Awareness of image infringement and the tools for preventing it are still immature. The intellectual property of image creators is poorly protected, users who may be using infringing images on websites are difficult to identify, and effective evidence cannot be collected. Many image similarity comparison algorithms exist, but none is optimized in detail for specific types of images to the point where the similarity comparison threshold reaches 95% or above. There is also no image comparison tool combined with big data, so image similarity cannot be compared across a big data range. As a result, image comparison is inefficient and inaccurate.
A search found the Chinese invention patent application "Image similarity comparison method based on a perceptual hash algorithm" (application No. 202010177648.1, application date 2020-03-13). That method compresses an image through a Discrete Cosine Transform (DCT) algorithm, reduces the size of the image through pHash to obtain the R, G and B color channels of the image, calculates the average value of RGB, and grays the image. It extracts fingerprints from each image using a color distribution method and a content feature method respectively, compresses the original image into a small fixed-size grayscale image to determine a threshold, converts the image into a black-and-white image to compare image outlines, and compares the basic fingerprint, the color feature fingerprint and the content feature fingerprint in multiple dimensions to obtain a similarity result. Its disadvantage is that the basic, color feature and content feature fingerprints must all be compared in multiple dimensions, which makes the method complex and inefficient.
Disclosure of Invention
1. Technical problem to be solved by the invention
The invention aims to solve the problem that existing image comparison is low in efficiency and accuracy.
2. Technical scheme
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
The image big data similarity comparison method captures the image data to be compared through crawler capture technology and big data search, analyzes and manages the captured image data and then stores it in structured form, and performs classified comparison according to the attributes of the images. During comparison, the features contained in each image are calculated to generate a set of fingerprints, and the fingerprints of the images are compared to judge the similarity of the images.
Preferably, the method comprises the steps of:
S100, image capture: capturing the required image data from a specific website through crawler capture technology;
S200, analysis and management: performing multi-dimensional structured division on the captured image data;
S300, image comparison: calculating the features contained in the images, generating a set of fingerprints, and comparing the fingerprints of the images to judge the similarity of the images.
Preferably, in step S100, the image data to be captured by the crawler is selected from high-definition images and thumbnails within 500 pixels.
Preferably, in step S200, the multi-dimensional structured division of the image data specifically includes the following dimensions: objects, emotional tone, color, scenery, people, cartoon animation, text, and artistic vision.
Preferably, the image comparison in step S300 includes the following steps:
S310, unifying the size: reducing the images to the same size to obtain N pixels;
S320, simplifying the colors: converting the reduced image into N-level grayscale;
S330, calculating the average: calculating the average gray level of all N pixels;
S340, gray level comparison: comparing the gray level of each pixel with the average in sequence, recording 1 when the gray level is greater than or equal to the average and 0 when it is less than the average;
S350, calculating the hash value: combining the results of the gray level comparison in sequence to obtain an N-bit number, which is the fingerprint of the image;
S360, comparing similarity: comparing the fingerprints of different images and judging the coincidence rate.
Preferably, N is 64, and the method specifically comprises the steps of:
S310, unifying the size: reducing the images to the same size of 8x8 to obtain 64 pixels;
S320, simplifying the colors: converting the reduced image into 64-level grayscale;
S330, calculating the average: calculating the average gray level of all 64 pixels;
S340, gray level comparison: comparing the gray level of each pixel with the average in sequence, recording 1 when the gray level is greater than or equal to the average and 0 when it is less than the average;
S350, calculating the hash value: combining the results of the gray level comparison in sequence to obtain a 64-bit number, which is the fingerprint of the image;
S360, comparing similarity: comparing the fingerprints of different images and judging the coincidence rate.
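Steps S310 to S350 with N = 64 can be sketched in Python as follows. This is only an illustrative sketch, not the applicant's implementation: the function names `shrink` and `average_hash`, the block-averaging downscale, the 0 to 255 grayscale input range, and the most-significant-bit-first ordering of the fingerprint are all assumptions that the text does not specify.

```python
from typing import List


def shrink(gray: List[List[int]], size: int = 8) -> List[List[float]]:
    """S310: reduce a grayscale grid to size x size by block averaging (assumed method)."""
    h, w = len(gray), len(gray[0])
    out = []
    for by in range(size):
        y0 = by * h // size
        y1 = max((by + 1) * h // size, y0 + 1)
        row = []
        for bx in range(size):
            x0 = bx * w // size
            x1 = max((bx + 1) * w // size, x0 + 1)
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(gray: List[List[int]], size: int = 8, levels: int = 64) -> int:
    """Return a (size*size)-bit fingerprint of a grayscale image with values 0..255."""
    small = shrink(gray, size)
    # S320: quantize each pixel to `levels` gray levels (0..levels-1).
    quantized = [int(v) * levels // 256 for row in small for v in row]
    # S330: average gray level of all pixels.
    mean = sum(quantized) / len(quantized)
    # S340 + S350: one bit per pixel, 1 if >= mean, combined in scan order.
    fingerprint = 0
    for v in quantized:
        fingerprint = (fingerprint << 1) | (1 if v >= mean else 0)
    return fingerprint


# A 16x16 test image: left half black, right half white.
image = [[0] * 8 + [255] * 8 for _ in range(16)]
print(hex(average_hash(image)))
```

Each row of the reduced image contributes eight bits, so an image that is dark on the left and bright on the right yields the repeating byte pattern 00001111.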
Preferably, in step S360, when the coincidence rate is greater than 90%, the images are judged to be highly similar; when the coincidence rate is between 70% and 90%, they are judged to be similar; when the coincidence rate is between 50% and 70%, they are judged to be generally similar; and when the coincidence rate is less than 50%, they are judged to be not similar.
The image big data similarity comparison system comprises an image collection module, an image processing module and an image comparison module which are sequentially in communication connection, wherein the image collection module is used for capturing required image data from a specific website, the image processing module is used for analyzing and managing the captured image data, and the image comparison module is used for comparing images.
Preferably, the system further includes an image database, a big data search module and a big data comparison module which are sequentially in communication connection. The image database is in communication connection with the image processing module and is used for structured storage of the image data processed by the image processing module; the big data search module is used for quickly searching and locating picture information in the image database; and the big data comparison module is used for comparing and searching image data at big data scale.
3. Advantageous effects
Compared with the prior art, the technical scheme provided by the invention has the following beneficial effects:
The image big data similarity comparison method captures the image data to be compared through crawler capture technology and big data search, analyzes and manages the captured image data and stores it in structured form, and then performs classified comparison according to the attributes of the images. The comparison process calculates the features contained in each image to generate a set of fingerprints, then compares the fingerprints of the images to judge their similarity. The method can improve image identification efficiency and, combined with big data, enables image similarity comparison across a big data range.
Drawings
FIG. 1 is a flowchart of a method for comparing similarity of big data of an image according to the present invention;
FIG. 2 is a detailed flowchart of step S300 according to the present invention;
FIG. 3 is a schematic structural diagram of the system of the present invention.
The reference numerals in the schematic drawings illustrate:
100. image collection module; 200. image processing module; 300. image comparison module; 400. image database; 500. big data search module; 600. big data comparison module.
Detailed Description
In order to facilitate an understanding of the invention, the invention will now be described more fully hereinafter with reference to the accompanying drawings, in which several embodiments of the invention are shown, but which may be embodied in many different forms and are not limited to the embodiments described herein, but rather are provided for the purpose of providing a more thorough disclosure of the invention.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present; when an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present; the terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs; the terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention; as used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Example 1
Referring to FIG. 1 to FIG. 3, in the image big data similarity comparison method of this embodiment, the image data to be compared is captured through crawler capture technology and big data search, the captured image data is analyzed and managed and then stored in structured form, and classified comparison is performed according to the attributes of the images. The comparison process calculates the features contained in each image to generate a set of fingerprints, and compares the fingerprints of the images to judge the similarity of the images.
The method comprises the following steps:
S100, image capture: capturing the required image data from a specific website through crawler capture technology;
S200, analysis and management: performing multi-dimensional structured division on the captured image data;
S300, image comparison: calculating the features contained in the images, generating a set of fingerprints, and comparing the fingerprints of the images to judge the similarity of the images.
In step S100, the image data to be captured by the crawler is preferably selected from high-definition images and thumbnails within 500 pixels. Image data in the important picture formats is captured from specified internet websites through crawler capture technology. The format and size of the pictures are specified so that pictures are not stored without limit, and preliminary screening is performed: for format, mainstream picture formats such as png and jpg are saved; for size, the method mainly targets high-definition pictures at 1080p resolution, while thumbnails within 500 pixels can also be compared by the algorithm. The screened pictures are then saved to a database for data accumulation. Specifically, by analyzing the structure of the target website, the complete organizational structure of the target data can be obtained, and the picture content that the website mainly presents can be found and collected.
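The preliminary screening described above might look like the following sketch. The function name `screen_image` and the set `ALLOWED_EXTENSIONS` are hypothetical, and the size rules are one possible reading of the text: "within 500 pixels" is taken to mean a longest side of at most 500 pixels, and "1080p" a shortest side of at least 1080 pixels. The patent does not define either precisely.

```python
import os
from typing import Optional

# Assumed set of "mainstream" formats; the text names only png and jpg.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def screen_image(filename: str, width: int, height: int) -> Optional[str]:
    """Return a size class for images worth saving, or None to discard."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return None  # not a mainstream format
    if max(width, height) <= 500:
        return "thumbnail"  # assumed reading of "within 500 pixels"
    if min(width, height) >= 1080:
        return "high_definition"  # assumed reading of "1080p resolution"
    return None  # neither a thumbnail nor a high-definition image


print(screen_image("photo.jpg", 1920, 1080))
print(screen_image("icon.PNG", 320, 240))
print(screen_image("banner.gif", 320, 240))
```

Screening before storage keeps the database from growing without limit, which is the stated purpose of specifying format and size.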
In step S200, multi-dimensional structured division is performed on the image data, that is, each picture is divided structurally along several dimensions. One dimension is whether the picture shows a living thing, further divided into animals and plants. Another is color: whether the overall tone is red or green. Another is whether the picture is a scene, and if so whether it is a natural scene or a building scene. A structured classification tree is built whose leaf nodes are refined labels, such as green, plant or grass. Each picture can carry several labels and is classified and stored accordingly. Classified storage makes later retrieval, comparison and search more convenient: similar pictures can be compared and queried more quickly, the retrieval workload is greatly reduced, and retrieval is given a direction.
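One way to picture the label-based storage is an inverted index from label to image identifiers, so that only pictures sharing the query's labels need to be fingerprint-compared. The class `TagIndex` and its methods are hypothetical names for illustration; the patent does not describe the storage layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TagIndex:
    """Inverted index: label -> set of image ids carrying that label."""

    def __init__(self) -> None:
        self._by_tag: Dict[str, Set[str]] = defaultdict(set)

    def add(self, image_id: str, tags: Iterable[str]) -> None:
        """Store an image under every one of its labels."""
        for tag in tags:
            self._by_tag[tag].add(image_id)

    def candidates(self, tags: Iterable[str]) -> Set[str]:
        """Return ids of images that carry all of the given labels."""
        sets = [self._by_tag.get(tag, set()) for tag in tags]
        if not sets:
            return set()
        return set.intersection(*sets)


index = TagIndex()
index.add("img1", ["green", "plant", "grass"])
index.add("img2", ["green", "building"])
print(sorted(index.candidates(["green"])))
print(sorted(index.candidates(["green", "plant"])))
```

Narrowing the candidate set by label before comparing fingerprints is what gives retrieval "a direction" and reduces the comparison workload.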
The image comparison in step S300 includes the following steps:
S310, unifying the size: reducing the images to the same size to obtain N pixels;
S320, simplifying the colors: converting the reduced image into N-level grayscale;
S330, calculating the average: calculating the average gray level of all N pixels;
S340, gray level comparison: comparing the gray level of each pixel with the average in sequence, recording 1 when the gray level is greater than or equal to the average and 0 when it is less than the average;
S350, calculating the hash value: combining the results of the gray level comparison in sequence to obtain an N-bit number, which is the fingerprint of the image;
S360, comparing similarity: comparing the fingerprints of different images and judging the coincidence rate.
In this embodiment, preferably, N is 64, and the method specifically includes the steps of:
S310, unifying the size: reducing the images to the same size of 8x8 to obtain 64 pixels;
S320, simplifying the colors: converting the reduced image into 64-level grayscale;
S330, calculating the average: calculating the average gray level of all 64 pixels;
S340, gray level comparison: comparing the gray level of each pixel with the average in sequence, recording 1 when the gray level is greater than or equal to the average and 0 when it is less than the average;
S350, calculating the hash value: combining the results of the gray level comparison in sequence to obtain a 64-bit number, which is the fingerprint of the image;
S360, comparing similarity: comparing the fingerprints of different images and judging the coincidence rate.
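The text does not define how the coincidence rate in S360 is computed. A common choice for bit fingerprints, assumed here, is the fraction of bit positions on which two fingerprints agree, which is one minus the normalized Hamming distance. The function name `coincidence_rate` is hypothetical.

```python
def coincidence_rate(fp_a: int, fp_b: int, n_bits: int = 64) -> float:
    """Fraction of the n_bits positions where two fingerprints have the same bit."""
    # XOR leaves a 1 exactly where the fingerprints differ.
    differing = bin(fp_a ^ fp_b).count("1")
    return (n_bits - differing) / n_bits


print(coincidence_rate(0b1111, 0b0000))  # 4 of 64 bits differ
```

With 64-bit fingerprints, four differing bits give 60/64, a coincidence rate of 93.75%.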
In step S200, the multi-dimensional structured division of the image data specifically includes the following dimensions: objects (articles for daily use, vehicles, plants), emotional tone (sadness, joy), color (warm tone, cool tone, reddish, greenish), scenery (buildings, natural landscape), people (great figures, stars, men, women, crowds), cartoon animation (cartoon characters, cartoon scenery), text (pictures containing text), artistic vision (artworks, graphics), and the like.
In step S360, when the coincidence rate is greater than 90%, the images are judged to be highly similar; when the coincidence rate is between 70% and 90%, they are judged to be similar; when the coincidence rate is between 50% and 70%, they are judged to be generally similar; and when the coincidence rate is less than 50%, they are judged to be not similar.
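The four similarity bands can be expressed as a small lookup. The text leaves the exact boundary values (50%, 70%, 90%) ambiguous; the sketch below assumes each band includes its lower bound, so exactly 90% counts as "similar" because only rates strictly greater than 90% are stated to be highly similar. The function name `classify_similarity` is hypothetical.

```python
def classify_similarity(rate: float) -> str:
    """Map a coincidence rate in [0, 1] to one of the four bands from step S360."""
    if rate > 0.90:
        return "highly similar"
    if rate >= 0.70:  # assumed: lower bound inclusive
        return "similar"
    if rate >= 0.50:  # assumed: lower bound inclusive
        return "generally similar"
    return "not similar"


for rate in (0.95, 0.80, 0.60, 0.30):
    print(rate, classify_similarity(rate))
```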
The embodiment further comprises an image big data similarity comparison system, which is used for executing the method, and comprises an image collection module 100, an image processing module 200 and an image comparison module 300, which are sequentially in communication connection, wherein the image collection module 100 is used for capturing required image data from a specific website, the image processing module 200 is used for analyzing and managing the captured image data, and the image comparison module 300 is used for comparing images.
The system further includes an image database 400, a big data search module 500 and a big data comparison module 600 which are sequentially in communication connection. The image database 400 is in communication connection with the image processing module 200 and is used for structured storage of the image data processed by the image processing module 200; the big data search module 500 is used for quickly searching and locating picture information in the image database 400; and the big data comparison module 600 is used for comparing and searching image data at big data scale.
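As a rough illustration of what the search and comparison modules do together, the sketch below scans stored fingerprints and returns those whose coincidence rate with a query exceeds a threshold, best match first. It assumes the database is a plain dictionary from image id to 64-bit fingerprint and that the coincidence rate is the fraction of matching bits; `find_similar` and its parameters are hypothetical, and a real deployment at big data scale would need an index rather than a linear scan.

```python
from typing import Dict, List, Tuple


def find_similar(query_fp: int, database: Dict[str, int],
                 min_rate: float = 0.90, n_bits: int = 64) -> List[Tuple[str, float]]:
    """Return (image_id, rate) pairs with rate > min_rate, highest rate first."""
    results = []
    for image_id, fp in database.items():
        differing = bin(query_fp ^ fp).count("1")
        rate = (n_bits - differing) / n_bits
        if rate > min_rate:
            results.append((image_id, rate))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


stored = {"a": 0xFFFF, "b": 0xFFFE, "c": 0x0000}
print(find_similar(0xFFFF, stored))
```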
The above-mentioned embodiments only express a certain implementation mode of the present invention, and the description thereof is specific and detailed, but not construed as limiting the scope of the present invention; it should be noted that, for those skilled in the art, without departing from the concept of the present invention, several variations and modifications can be made, which are within the protection scope of the present invention; therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (8)

1. A method for comparing image big data similarity, characterized in that: the image data to be compared is captured through crawler capture technology and big data search; the captured image data is analyzed and managed and then stored in structured form; classified comparison is performed according to the attributes of the images; the comparison process calculates the features contained in the images to generate a set of fingerprints; and the fingerprints of the images are compared to judge the similarity of the images.
2. The image big data similarity comparison method according to claim 1, comprising the steps of:
S100, image capture: capturing the required image data from a specific website through crawler capture technology;
S200, analysis and management: performing multi-dimensional structured division on the captured image data;
S300, image comparison: calculating the features contained in the images, generating a set of fingerprints, and comparing the fingerprints of the images to judge the similarity of the images.
3. The image big data similarity comparison method according to claim 2, characterized in that: in step S100, the image data to be captured by the crawler is preferably selected from high-definition images and thumbnails within 500 pixels.
4. The method according to claim 2, wherein the image comparison in step S300 includes the following steps:
S310, unifying the size: reducing the images to the same size to obtain N pixels;
S320, simplifying the colors: converting the reduced image into N-level grayscale;
S330, calculating the average: calculating the average gray level of all N pixels;
S340, gray level comparison: comparing the gray level of each pixel with the average in sequence, recording 1 when the gray level is greater than or equal to the average and 0 when it is less than the average;
S350, calculating the hash value: combining the results of the gray level comparison in sequence to obtain an N-bit number, which is the fingerprint of the image;
S360, comparing similarity: comparing the fingerprints of different images and judging the coincidence rate.
5. The method according to claim 4, wherein N is 64, and the method specifically comprises the following steps:
S310, unifying the size: reducing the images to the same size of 8x8 to obtain 64 pixels;
S320, simplifying the colors: converting the reduced image into 64-level grayscale;
S330, calculating the average: calculating the average gray level of all 64 pixels;
S340, gray level comparison: comparing the gray level of each pixel with the average in sequence, recording 1 when the gray level is greater than or equal to the average and 0 when it is less than the average;
S350, calculating the hash value: combining the results of the gray level comparison in sequence to obtain a 64-bit number, which is the fingerprint of the image;
S360, comparing similarity: comparing the fingerprints of different images and judging the coincidence rate.
6. The image big data similarity comparison method according to claim 4, wherein: in step S360, when the coincidence rate is greater than 90%, the images are judged to be highly similar; when the coincidence rate is between 70% and 90%, they are judged to be similar; when the coincidence rate is between 50% and 70%, they are judged to be generally similar; and when the coincidence rate is less than 50%, they are judged to be not similar.
7. An image big data similarity comparison system for performing the method of any one of claims 1 to 6, wherein: the system comprises an image collection module (100), an image processing module (200) and an image comparison module (300) which are sequentially in communication connection, wherein the image collection module (100) is used for capturing required image data from a specific website, the image processing module (200) is used for analyzing and managing the captured image data, and the image comparison module (300) is used for comparing images.
8. The image big data similarity comparison system according to claim 7, wherein: the system further comprises an image database (400), a big data search module (500) and a big data comparison module (600) which are sequentially in communication connection, wherein the image database (400) is in communication connection with the image processing module (200) and is used for structured storage of the image data processed by the image processing module (200), the big data search module (500) is used for quickly searching and locating picture information in the image database (400), and the big data comparison module (600) is used for comparing and searching image data at big data scale.
CN202011232317.XA 2020-11-06 2020-11-06 Image big data similarity comparison method and system Pending CN112308162A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011232317.XA CN112308162A (en) 2020-11-06 2020-11-06 Image big data similarity comparison method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011232317.XA CN112308162A (en) 2020-11-06 2020-11-06 Image big data similarity comparison method and system

Publications (1)

Publication Number Publication Date
CN112308162A true CN112308162A (en) 2021-02-02

Family

ID=74325180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011232317.XA Pending CN112308162A (en) 2020-11-06 2020-11-06 Image big data similarity comparison method and system

Country Status (1)

Country Link
CN (1) CN112308162A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109213891A (en) * 2018-08-20 2019-01-15 深圳市乐唯科技开发有限公司 A method of using average hash algorithm search pictures
CN110348277A (en) * 2018-11-30 2019-10-18 浙江农林大学 A kind of tree species image-recognizing method based under natural background
CN111353552A (en) * 2020-03-13 2020-06-30 杭州趣维科技有限公司 Image similarity contrast method based on perceptual hash algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李玉香; 王孟玉; 涂宇晰: "Research on Python-based web crawler technology" (基于python的网络爬虫技术研究), 信息技术与信息化, no. 12, 25 December 2019 (2019-12-25), pages 149-151 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139589A (en) * 2021-04-12 2021-07-20 网易(杭州)网络有限公司 Picture similarity detection method and device, processor and electronic device
CN113139589B (en) * 2021-04-12 2023-02-28 网易(杭州)网络有限公司 Picture similarity detection method and device, processor and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination