CN112733952A - Image data processing method and device, terminal equipment and readable storage medium - Google Patents

Image data processing method and device, terminal equipment and readable storage medium

Info

Publication number
CN112733952A
Authority
CN
China
Prior art keywords
image data
target image
preset
threshold
preset condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110071069.3A
Other languages
Chinese (zh)
Inventor
刘均
陶青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Golo Chelian Data Technology Co ltd
Original Assignee
Shenzhen Golo Chelian Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Golo Chelian Data Technology Co ltd filed Critical Shenzhen Golo Chelian Data Technology Co ltd
Priority to CN202110071069.3A priority Critical patent/CN112733952A/en
Publication of CN112733952A publication Critical patent/CN112733952A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of image processing and provides an image data processing method, an image data processing device, a terminal device and a readable storage medium. The method comprises: obtaining a plurality of first image data meeting a first preset condition; performing deduplication processing on the first image data to obtain target image data; obtaining information of the target image data and detecting, according to the information, whether the target image data meets a second preset condition; and generating an image data processing result according to the target image data meeting the second preset condition. Because the obtained first image data is deduplicated to obtain target image data, and the target image data whose quality information meets the preset condition is detected and stored, image data meeting the requirement is obtained, the redundancy of the image data is reduced, and the quality of the image data is improved.

Description

Image data processing method and device, terminal equipment and readable storage medium
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to an image data processing method, an image data processing apparatus, a terminal device, and a readable storage medium.
Background
In recent years, big data has developed vigorously in the internet and information industries and is widely applied in many fields such as society, the economy and daily life.
However, in the big data era, data of all types suffers from huge volume, disorder and uneven quality. In image processing applications, these problems prevent users from obtaining images that meet their needs.
Disclosure of Invention
The embodiments of the application provide an image data processing method and device, a terminal device and a readable storage medium, which can solve the problem that, in the big data era, image data is huge in volume, disordered and uneven in quality, so that image data meeting user requirements cannot be obtained.
In a first aspect, an embodiment of the present application provides an image data processing method, including:
acquiring a plurality of first image data meeting a first preset condition;
carrying out duplicate removal processing on the first image data to obtain target image data;
acquiring information of the target image data, and detecting whether the target image data meets a second preset condition or not according to the information;
and generating an image data processing result according to the target image data meeting the second preset condition.
In one embodiment, the performing the deduplication processing on the first image data to obtain target image data includes:
calculating and obtaining the similarity between every two first image data according to a preset algorithm, identifying and obtaining the first image data with the similarity meeting a third preset condition, and storing the first image data as the target image data;
carrying out deduplication processing on the first image data with the identified similarity not meeting a third preset condition, and taking the image data subjected to deduplication processing as the target image data and storing the target image data; wherein the preset algorithm comprises at least one of an MD5 algorithm and a perceptual hash algorithm.
In one embodiment, the calculating and obtaining the similarity between every two first image data according to a preset algorithm, and identifying and obtaining the first image data with the similarity meeting a third preset condition as the target image data and storing the target image data includes:
initializing each first image data into a picture object, and converting the picture object into an image character string;
computing a message digest value of the first image data from the image character string;
and identifying first image data whose message digest value differs from that of every other first image data, and storing it as the target image data.
In one embodiment, the calculating and obtaining the similarity between every two first image data according to a preset algorithm, and identifying and obtaining the first image data with the similarity meeting a third preset condition as the target image data and storing the target image data, further includes:
converting each first image data into gray image data, and constructing a hash value of the gray image data;
calculating and obtaining the Hamming distance between every two first image data according to the Hash value;
and identifying and obtaining first image data of which the Hamming distance with any one of the first image data is greater than or equal to a Hamming distance threshold value, and storing the first image data as the target image data.
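The grayscale hash and Hamming distance steps can be illustrated with a minimal sketch. The text does not say how the hash is constructed, so the average-hash scheme below (one bit per pixel, set when the pixel is at least the mean) is an assumption chosen for illustration, as are the function names and the representation of a grayscale image as a list of rows of integers.

```python
from typing import Dict, List


def average_hash(gray: List[List[int]]) -> int:
    """Build a hash with one bit per pixel: 1 if the pixel >= mean, else 0.

    Assumption: an average hash stands in for the unspecified hash.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits


def hamming_distance(h1: int, h2: int) -> int:
    """Count the bit positions in which the two hashes differ."""
    return bin(h1 ^ h2).count("1")


def find_non_duplicates(hashes: Dict[str, int], threshold: int) -> List[str]:
    """Return names whose distance to every other hash is >= threshold."""
    result = []
    for name, h in hashes.items():
        others = [v for k, v in hashes.items() if k != name]
        if all(hamming_distance(h, o) >= threshold for o in others):
            result.append(name)
    return result
```

In practice the images would first be shrunk to a small fixed size (for example 8x8) so that all hashes have the same length; that resizing step is omitted here.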
In one embodiment, acquiring information of the target image data, and detecting whether the target image data satisfies a second preset condition according to the information includes:
detecting whether the target image data meets a preset quality threshold value according to quality information;
detecting whether the target image data meeting a preset quality threshold is readable image data;
and when the target image data is detected to meet a preset quality threshold and is readable image data, judging that the target image data meets a second preset condition.
In one embodiment, the quality information includes a preset width, a preset height, a preset format and a preset memory; the preset quality threshold comprises a width threshold, a height threshold, a format threshold and a memory threshold;
the detecting whether the target image data meets a preset quality threshold according to the quality information includes:
detecting whether the preset width of the target image data is smaller than the width threshold value;
when detecting that the preset width of the target image data is greater than or equal to the width threshold, detecting whether the preset height of the target image data is less than the height threshold;
when detecting that the preset height of the target image data is greater than or equal to the height threshold, detecting whether a preset memory of the target image data is smaller than the memory threshold;
when detecting that the preset memory of the target image data is larger than or equal to the memory threshold, detecting whether the preset format of the target image data is the same as the format threshold;
and when detecting that the preset format of the target image data is the same as the format threshold, judging that the target image data meets a preset quality threshold.
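The sequential width, height, memory and format checks can be sketched as follows. This is only an illustration: the class names, field names and the use of bytes for the memory (file size) value are assumptions, and the format comparison is a plain string equality.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    """Quality information carried by one target image (hypothetical fields)."""
    width: int
    height: int
    size_bytes: int
    fmt: str


@dataclass
class QualityThreshold:
    """Preset quality threshold (hypothetical fields)."""
    min_width: int
    min_height: int
    min_size_bytes: int
    fmt: str


def meets_quality_threshold(info: ImageInfo, th: QualityThreshold) -> bool:
    """Check width, then height, then memory, then format, in that order."""
    if info.width < th.min_width:
        return False
    if info.height < th.min_height:
        return False
    if info.size_bytes < th.min_size_bytes:
        return False
    return info.fmt == th.fmt
```

Each check runs only if the previous one passed, which mirrors the order in which the conditions are listed.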
In one embodiment, the generating an image data processing result according to target image data satisfying a second preset condition includes:
renaming the target image data meeting the second preset condition;
and counting the number of the target image data meeting the second preset condition to generate an image data processing result.
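Renaming and counting could look like the sketch below. The naming scheme (`<prefix>_0001.<ext>`), the two-phase rename through temporary names, and the shape of the returned result are all assumptions made for illustration; the text only states that the images are renamed and counted.

```python
import os


def rename_and_count(directory: str, prefix: str) -> dict:
    """Rename every file in `directory` to <prefix>_NNNN<ext> and count them."""
    names = sorted(
        n for n in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, n))
    )
    staged = []
    # Phase 1: move to temporary names so that a final name can never
    # collide with a file that has not been renamed yet.
    for index, old in enumerate(names, start=1):
        ext = os.path.splitext(old)[1]
        tmp = f".tmp_{index:04d}{ext}"
        os.rename(os.path.join(directory, old), os.path.join(directory, tmp))
        staged.append((tmp, f"{prefix}_{index:04d}{ext}"))
    # Phase 2: move the temporary names to the final sequential names.
    for tmp, final in staged:
        os.rename(os.path.join(directory, tmp), os.path.join(directory, final))
    return {"keyword": prefix, "count": len(staged)}
```

The sketch assumes the directory holds only images that already met the second preset condition and contains no files whose names start with `.tmp_`.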
In a second aspect, an embodiment of the present application provides an image data processing apparatus, including:
the acquisition module is used for acquiring a plurality of first image data meeting a first preset condition;
the duplication removing processing module is used for carrying out duplication removing processing on the first image data to obtain target image data;
the detection module is used for acquiring the information of the target image data and detecting whether the target image data meets a second preset condition or not according to the information;
and the generating module is used for generating an image data processing result according to the target image data meeting the second preset condition.
In one embodiment, a deduplication processing module comprises:
the identification unit is used for calculating and obtaining the similarity between every two first image data according to a preset algorithm, identifying and obtaining the first image data with the similarity meeting a third preset condition, and storing the first image data as the target image data;
the duplication removing processing unit is used for carrying out duplication removing processing on the first image data with the identified similarity not meeting a third preset condition, and storing the image data after the duplication removing processing as the target image data; wherein the preset algorithm comprises at least one of an MD5 algorithm and a perceptual hash algorithm.
In one embodiment, an identification unit, comprising:
the initialization subunit is used for initializing each first image data into a picture object and converting the picture object into an image character string;
the first calculation subunit is used for computing a message digest value of the first image data from the image character string;
and the first identification subunit is used for identifying first image data whose message digest value differs from that of every other first image data, and storing it as the target image data.
In one embodiment, the identification unit further comprises:
a conversion subunit, configured to convert each of the first image data into grayscale image data, and construct a hash value of the grayscale image data;
the second calculating subunit is configured to calculate and obtain a hamming distance between every two pieces of the first image data according to the hash value;
and the second identification subunit is used for identifying and obtaining the first image data of which the Hamming distance from any one of the first image data is greater than or equal to the Hamming distance threshold value, and storing the first image data as the target image data.
In one embodiment, a detection module comprises:
the first detection unit is used for detecting whether the target image data meets a preset quality threshold value according to the quality information;
a second detection unit configured to detect whether target image data satisfying a preset quality threshold is readable image data;
the judging unit is used for judging that the target image data meets a second preset condition when the target image data is detected to meet a preset quality threshold and is readable image data.
In one embodiment, the quality information includes a preset width, a preset height, a preset format and a preset memory; the preset quality threshold comprises a width threshold, a height threshold, a format threshold and a memory threshold;
the first detection unit includes:
the first detection subunit is used for detecting whether the preset width of the target image data is smaller than the width threshold value;
a second detecting subunit, configured to detect whether a preset height of the target image data is smaller than the height threshold when it is detected that the preset width of the target image data is greater than or equal to the width threshold;
a third detecting subunit, configured to detect, when it is detected that the preset height of the target image data is greater than or equal to the height threshold, whether a preset memory of the target image data is smaller than the memory threshold;
a fourth detecting subunit, configured to detect whether a preset format of the target image data is the same as the format threshold when it is detected that a preset memory of the target image data is greater than or equal to the memory threshold;
and the judging subunit is used for judging that the target image data meets a preset quality threshold value when detecting that the preset format of the target image data is the same as the format threshold value.
In one embodiment, the generating module includes:
the renaming unit is used for renaming the target image data meeting the second preset condition;
and the generating unit is used for counting the number of the target image data meeting the second preset condition and generating an image data processing result.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the image data processing method according to any one of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the image data processing method according to any one of the first aspect.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the image data processing method according to any one of the first aspect.
The obtained first image data is subjected to duplicate removal processing to obtain target image data, and the target image data with the quality information meeting the preset condition is detected and stored to obtain the image data meeting the requirement, so that the redundancy degree of the image data is reduced, and the quality of the image data is improved.
It can be understood that, for the beneficial effects of the second aspect to the fifth aspect, reference may be made to the related description of the first aspect; details are not repeated here.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an image data processing method provided in an embodiment of the present application;
Fig. 2a, 2b, 2c and 2d are schematic diagrams of application scenarios for acquiring first image data according to an embodiment of the present application;
Fig. 3 is a flowchart illustrating step S102 of an image data processing method according to an embodiment of the present application;
Fig. 4 is a flowchart illustrating step S1021 of an image data processing method according to an embodiment of the present application;
Fig. 5 is another flowchart illustrating step S1021 of the image data processing method according to the embodiment of the present application;
Fig. 6 is a schematic diagram of an application scenario for generating a processing result of image data according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an image data processing apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," and the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The image data processing method provided by the embodiment of the application can be applied to terminal equipment such as a mobile phone, a tablet computer, a notebook computer and the like, and the embodiment of the application does not limit the specific type of the terminal equipment.
Fig. 1 shows a schematic flow chart of an image data processing method provided by the present application, which can be applied to the above-mentioned notebook computer by way of example and not limitation.
S101, acquiring a plurality of first image data meeting a first preset condition;
In a specific application, the first preset condition refers to the acquisition condition for the first image data set by the user. In this embodiment, the first preset condition is to acquire, from a third-party platform, a specified number of image data matching the required keywords input by the user.
In a specific application, the required keywords input by the user and the number specified by the user are first obtained. The third-party platform is then analyzed to determine its request mode, and data parsing is performed with the user's required keywords according to that request mode, so that first image data meeting the first preset condition is obtained and stored. It can be understood that when the number of image data detected on the third-party platform that meet the first preset condition is greater than or equal to the user-specified number, the user-specified number of first image data is obtained; when that number is smaller than the user-specified number, the number of first image data obtained is smaller than the user-specified number. The third-party platform includes, but is not limited to, an image database or a web page providing images. The data parsing can be done in various ways and can be set according to the user's requirements. In this embodiment, data parsing is performed by a Python crawler to obtain the first image data, using approaches that include but are not limited to regular expressions, XPath and bs4.
It can be understood that, when the first image data is acquired, the first image data carries its original name and quality information, where the quality information includes, but is not limited to, a preset width, a preset height, a preset format, and a preset memory.
As shown in fig. 2, a schematic diagram of an application scenario for acquiring first image data is provided.
In fig. 2a, the required key words input by the user include "cat", "computer", and "mouse", the designated number is 20, and the first image data may be obtained in the third-party platform through the regular expression, where the first image data includes all detected image data corresponding to the keyword "cat", all detected image data corresponding to the keyword "computer", and all detected image data corresponding to the keyword "mouse". Meanwhile, as shown in fig. 2b, a storage space corresponding to the keyword "cat" is established to store the image data corresponding to the keyword "cat", as shown in fig. 2c, a storage space corresponding to the keyword "computer" is established to store the image data corresponding to the keyword "computer", as shown in fig. 2d, a storage space corresponding to the keyword "mouse" is established to store the image data corresponding to the keyword "mouse".
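A rough sketch of the regular-expression route and the per-keyword storage space follows. It works on an HTML string that has already been fetched and makes no network request. The `<img src="...">` pattern, the function names and the directory layout are hypothetical; a real third-party platform may need a different pattern or a JSON interface.

```python
import re
from pathlib import Path
from typing import List

# Hypothetical pattern: double-quoted src attribute inside an <img> tag.
IMG_SRC = re.compile(r'<img[^>]+src="([^"]+)"', re.IGNORECASE)


def extract_image_urls(html: str, limit: int) -> List[str]:
    """Return at most `limit` distinct image URLs, in order of appearance."""
    unique = list(dict.fromkeys(IMG_SRC.findall(html)))
    return unique[:limit]


def storage_dir(root: str, keyword: str) -> Path:
    """Return the storage location for one keyword, e.g. <root>/cat."""
    return Path(root) / keyword
```

If the page holds fewer distinct images than the specified number, the function simply returns what it found, which matches the case where fewer first image data than requested are obtained.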
S102, carrying out duplicate removal processing on the first image data to obtain target image data.
In a specific application, the same third-party platform usually contains some image data with identical or highly similar content. Therefore, it is necessary to perform deduplication processing on the obtained first image data to obtain target image data, so as to reduce the redundancy of the image data. The deduplication method can be set according to the user's requirements. In this embodiment, the first image data is deduplicated by a preset algorithm. The preset algorithm includes, but is not limited to, at least one of the Message Digest Algorithm 5 (MD5) and a Perceptual Hash Algorithm (PHA).
It is understood that, when it is detected that a plurality of different key words are input by the user, the deduplication processing is performed separately for the first image data corresponding to each key word.
In one embodiment, after step S102, the method further includes:
and deleting the target image data when detecting that any one of the target image data does not meet the fourth preset condition.
In a specific application, the fourth preset condition is a recognition condition for detecting whether the target image data is image data corresponding to a key word required by the user, and may be specifically set according to the requirement of the user.
By way of example and not limitation, if the user specifies that the similarity between every two target image data is calculated by a perceptual hash algorithm, a second Hamming distance threshold is set accordingly, and the fourth preset condition is that the Hamming distance between any two target image data is smaller than or equal to the second Hamming distance threshold. Therefore, when it is detected that the Hamming distances between a certain target image data and all the remaining target image data are greater than the second Hamming distance threshold, that target image data is determined not to match the keywords required by the user and needs to be deleted. The second Hamming distance threshold should be greater than the Hamming distance threshold in step S10216.
For example, the requirement keyword input by the user is "cat", and the target image data obtained after the deduplication processing is performed on the first image data includes a, b, c, d, e, f, and g. When the hamming distances between the target image data c and the target image data a, b, d, e, f and g are all larger than the second hamming distance threshold value, it is determined that the target image data c may not be the image data corresponding to the requirement key word "cat", and the target image data c is deleted.
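The deletion rule in this example can be sketched as a filter over precomputed perceptual hashes. The function name and the representation of each hash as an integer are assumptions; how the hashes are produced is outside this sketch.

```python
from typing import Dict


def remove_unrelated(hashes: Dict[str, int], second_threshold: int) -> Dict[str, int]:
    """Drop every image whose Hamming distance to ALL other images
    is greater than `second_threshold`; keep the rest."""
    kept = {}
    for name, h in hashes.items():
        others = [v for k, v in hashes.items() if k != name]
        # An image is deleted only if it is far from every other image.
        if others and all(bin(h ^ o).count("1") > second_threshold for o in others):
            continue
        kept[name] = h
    return kept
```

A single image with no other image to compare against is kept, since there is nothing to compare it with; that choice is also an assumption.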
By way of example and not limitation, if the requirement key word set by the user includes a target category and it is set that the target category in each target image data is recognized by an image-based target detection algorithm, the fourth preset condition is correspondingly set that the target category in the target image data is the same as the target category included in the requirement key word. Therefore, when it is detected that the target category of a certain target image data is different from the target category of all target image data except the target image data, the target image data is judged to be image data which does not meet the key vocabulary required by the user, and the target image data needs to be deleted. For example, the target image data obtained after the deduplication processing is performed on the first image data includes a, b, c, d, e, f, and g. When the target type in the target image data c is identified and determined to be 'dog' through the image-based target detection algorithm, and the target types of the target image data a, b, d, e, f and g are 'cat', it is determined that the target image data c may not be the image data corresponding to the requirement key word 'cat', and the target image data c is deleted.
Or, directly comparing whether the target category in each target image data is the same as the target category contained in the requirement key vocabulary, and deleting the target image data when detecting that the category of certain target image data is different from the target category contained in the requirement key vocabulary.
S103, acquiring information of the target image data, and detecting whether the target image data meets a second preset condition or not according to the information.
In a specific application, under a normal condition, image data in a third-party platform has a characteristic of uneven quality (for example, it includes thumbnails, image data with a very small memory, image data that cannot be normally opened, and the like), and therefore, it is necessary to detect through information of target image data, determine whether the target image data satisfies a second preset condition, identify and determine the target image data satisfying the second preset condition, and delete the target image data that does not satisfy the second preset condition, so as to improve the quality of the image data. The second preset condition is a condition set by a user and used for detecting whether the information of the image data meets the requirements of the user. In this embodiment, the second preset condition is to detect whether the information of the target image data satisfies the corresponding information threshold, and to determine whether the target image data is readable image data.
And S104, generating an image data processing result according to the target image data meeting the second preset condition.
In specific application, all target image data which do not meet the second preset condition are deleted, the target image data which are detected to meet the second preset condition are sorted, and a corresponding image data processing result is generated.
As shown in fig. 3, in one embodiment, step S102 includes:
and S1021, calculating and obtaining the similarity between every two first image data according to a preset algorithm, identifying and obtaining the first image data with the similarity meeting a third preset condition, and storing the first image data as the target image data.
In a specific application, all the first image data are traversed by the preset algorithm, and the similarity between every two first image data is calculated. All first image data whose similarity meets the third preset condition are determined and stored as target image data. Any two first image data whose similarity does not meet the third preset condition are judged to be repeated images and are subjected to deduplication processing, and the deduplicated image data is stored as target image data. The third preset condition is the condition for detecting whether first image data is a repeated image, and it differs according to the preset algorithm used.
It can be understood that: when the similarity between a certain first image data and any one first image data is detected not to meet a third preset condition, the two first image data are both repeated images; and only when detecting that the similarity between a certain first image data and all other first image data except the first image data meets a third preset condition, judging that the first image data is a non-repetitive image, namely target image data.
For example, there are 8 pieces of first image data of a, b, c, d, e, f, g, and h. And when it is detected that the similarity between the first image data a and the first image data b does not meet a third preset condition and the similarity between the first image data a and the first image data c does not meet the third preset condition, determining that the first image data a, the first image data b and the first image data c are all repeated images. When it is detected that the similarity between the first image data h and the first image data a, b, c, d, e, f and g respectively satisfies a third preset condition, it is determined that the first image data h is a non-repetitive image, that is, target image data.
S1022, performing deduplication processing on the first image data with the identified similarity not meeting a third preset condition, and storing the image data subjected to deduplication processing as the target image data; wherein the preset algorithm comprises at least one of an MD5 algorithm and a perceptual hash algorithm.
In a specific application, determining that the first image data whose similarity does not satisfy the third preset condition are repeated images and performing deduplication processing on the repeated images includes: taking any one of the two or more pieces of first image data that are repeated images as target image data, storing the target image data, and deleting the remaining first image data in the repeated images.
As shown in fig. 4, in one embodiment, step S1021 includes:
S10211, initializing each of the first image data to a picture object, and converting the picture object into an image character string;
S10212, calculating the image character string to obtain an information abstract value of the first image data;
S10213, identifying and obtaining first image data with different information summary values from any first image data, and storing the first image data as the target image data.
In a specific application, determining, according to the MD5 algorithm, the first image data whose similarity satisfies the third preset condition includes: acquiring the path name of the first image data and initializing the first image data to obtain a picture object; converting the picture object into an image array with the numpy.array() method, taking the picture object as the input parameter; performing character string splicing with the hashlib.md5().update() method, taking the image array as the input parameter, to obtain an image character string; calculating the image character string with the hashlib.md5().hexdigest() method to obtain the information abstract value (message digest) of the first image data; and identifying the first image data whose information abstract value differs from that of every other first image data, and storing it as target image data. Correspondingly, any two or more pieces of first image data with the same information abstract value are identified and determined to be repeated images, and the repeated images need to be subjected to deduplication processing.
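A minimal sketch of digest-based deduplication is given below. It uses only the standard `hashlib` calls named above. As a simplifying assumption, it hashes raw bytes directly rather than a numpy array built from a picture object, and it keeps the first image of each group of equal digests (the text allows any one to be kept). The helper names `md5_digest` and `dedupe_by_digest` are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def md5_digest(image_bytes: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    hasher = hashlib.md5()
    hasher.update(image_bytes)
    return hasher.hexdigest()


def dedupe_by_digest(
    images: Iterable[Tuple[str, bytes]],
) -> Tuple[List[str], List[str]]:
    """Return (kept, removed) image names.

    The first image seen for each digest is kept as target image data;
    later images with the same digest are treated as repeated images.
    """
    first_seen: Dict[str, str] = {}
    kept: List[str] = []
    removed: List[str] = []
    for name, data in images:
        digest = md5_digest(data)
        if digest in first_seen:
            removed.append(name)
        else:
            first_seen[digest] = name
            kept.append(name)
    return kept, removed


# Assumed sample data: a, b and c carry identical bytes.
sample = [("a", b"cat"), ("b", b"cat"), ("c", b"cat"), ("d", b"dog"), ("e", b"emu")]
kept, removed = dedupe_by_digest(sample)
```

Because MD5 digests match only for byte-identical input, this approach detects exact copies only; visually similar but re-encoded images are the case the perceptual hash embodiment addresses.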
As shown in fig. 5, in an embodiment, step S1021 further includes:
S10214, converting each of the first image data into a gray image data, and constructing a hash value of the gray image data;
S10215, calculating and obtaining a Hamming distance between every two first image data according to the hash value;
S10216, identifying and obtaining first image data with the Hamming distance between the first image data and any one of the first image data larger than or equal to a Hamming distance threshold value, and saving the first image data as the target image data.
In a specific application, determining, according to a perceptual hash algorithm, the first image data whose similarity satisfies the third preset condition includes: scaling the first image data to a target size; converting the scaled first image data into gray image data; calculating the difference values between adjacent pixels in the gray image data; constructing a hash value of the gray image data according to the difference values between the adjacent pixels; comparing the "fingerprint" character strings of every two pieces of gray image data to determine pairs of gray image data with high similarity; calculating, according to the hash values of two pieces of gray image data with high similarity, the Hamming distance between the two pieces of first image data corresponding to them; and identifying the first image data whose Hamming distance to every other first image data is greater than or equal to a Hamming distance threshold, and storing it as target image data. Correspondingly, any two or more pieces of first image data whose Hamming distance is smaller than the Hamming distance threshold are identified and determined to be repeated images, and the repeated images need to be subjected to deduplication processing. Here, the Hamming distance is the number of positions at which the characters of the two character strings differ. The target size and the Hamming distance threshold may be set according to user requirements; for example, the target size is set to 9 × 8 px and the Hamming distance threshold is set to 5.
In a specific application, calculating the difference value between adjacent pixels in the gray image data includes: calculating the difference value between adjacent pixels according to their color intensities. In this embodiment, if the color intensity of the i-th pixel is greater than the color intensity of the (i+1)-th pixel, the difference value between the i-th pixel and the (i+1)-th pixel is set to 1; if the color intensity of the i-th pixel is less than or equal to the color intensity of the (i+1)-th pixel, the difference value between the i-th pixel and the (i+1)-th pixel is set to 0.
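The difference-value rule and the Hamming distance comparison can be sketched as follows. This is an illustrative sketch under stated assumptions: the image has already been scaled and converted to gray values, and it is passed in as rows of integers (8 rows of 9 pixels is assumed here, so that each row yields 8 difference bits and the hash has 64 bits). The function names are hypothetical, and the threshold of 5 is the example value from the text.

```python
from typing import Sequence


def difference_hash(gray: Sequence[Sequence[int]]) -> int:
    """Build a hash from adjacent-pixel comparisons, row by row.

    Each bit is 1 if the i-th pixel is brighter than the (i+1)-th pixel
    and 0 otherwise, as described in the text.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions at which the two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_repeated(hash_a: int, hash_b: int, threshold: int = 5) -> bool:
    """Treat two images as repeated when the distance is below the threshold."""
    return hamming_distance(hash_a, hash_b) < threshold


# Assumed 9-wide, 8-high gray grids for illustration.
descending = [list(range(8, -1, -1)) for _ in range(8)]  # every bit is 1
ascending = [list(range(9)) for _ in range(8)]           # every bit is 0
near_copy = [row[:] for row in descending]
near_copy[0][0] = 0                                      # flips exactly one bit

hash_desc = difference_hash(descending)
hash_asc = difference_hash(ascending)
hash_near = difference_hash(near_copy)
```

With these inputs, `hash_desc` and `hash_near` differ in a single bit and are treated as repeated, while `hash_desc` and `hash_asc` differ in all 64 bits and are not.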
For example, there are 8 pieces of first image data of a, b, c, d, e, f, g, and h. The information abstract values of the three first image data a, b and c are the same, and the information abstract values of the two first image data f and g are the same; taking a in the three first image data a, b and c as target image data, and correspondingly deleting b and c; g in the f and g first image data is taken as target image data, and f needs to be deleted correspondingly; the obtained target image data includes five first image data of a, d, e, g, and h.
In one embodiment, step S103 includes:
detecting whether the target image data meets a preset quality threshold value according to quality information;
detecting whether the target image data meeting a preset quality threshold is readable image data;
and when the target image data is detected to meet a preset quality threshold and is readable image data, judging that the target image data meets a second preset condition.
In specific application, acquiring quality information of target image data, detecting and determining whether the target image data is image data meeting a preset quality threshold according to the quality information of a target image, and deleting the target image data of which the quality information does not meet the preset quality threshold; and reading the target image data of which the quality information meets the preset quality threshold value, determining whether the target image data is readable image data, and judging that the target image data meets a second preset condition when the target image data of which the quality information meets the preset quality threshold value is the readable image data.
In one embodiment, the quality information includes a preset width, a preset height, a preset format and a preset memory; the preset quality threshold comprises a width threshold, a height threshold, a format threshold and a memory threshold.
In a specific application, the quality information of the image data includes, but is not limited to, a preset height, a preset width, a preset format, a preset memory, and the like of the image data. Correspondingly, the preset quality threshold includes, but is not limited to, a height threshold, a width threshold, a format threshold, and a memory threshold.
The detecting whether the target image data meets a preset quality threshold according to the quality information includes:
detecting whether the preset width of the target image data is smaller than the width threshold value;
when detecting that the preset width of the target image data is greater than or equal to the width threshold, detecting whether the preset height of the target image data is less than the height threshold;
when detecting that the preset height of the target image data is greater than or equal to the height threshold, detecting whether a preset memory of the target image data is smaller than the memory threshold;
when detecting that the preset memory of the target image data is larger than or equal to the memory threshold, detecting whether the preset format of the target image data is the same as the format threshold;
and when detecting that the preset format of the target image data is the same as the format threshold, judging that the target image data meets a preset quality threshold.
In a specific application, a width threshold, a height threshold, a memory threshold and a format threshold set by a user need to be acquired, and whether a preset width of target image data is smaller than the width threshold, whether a preset height of the target image data is smaller than the height threshold, whether a preset memory of the target image data is smaller than the memory threshold and whether a preset format of the target image data is the same as the format threshold are detected respectively; when detecting that the preset width of the target image data is larger than or equal to the width threshold, the preset height of the target image data is larger than or equal to the height threshold, the preset memory of the target image data is larger than or equal to the memory threshold, and the preset format of the target image data is the same as the format threshold, judging that the target image data meets the preset quality threshold. The format threshold is a condition for detecting whether the preset format of the target image data meets the format required by the user, and may be specifically set according to the user requirement, and includes, but is not limited to, jpeg, jpg, and gif.
It can be understood that when any one of the preset width of the target image data is smaller than the width threshold, the preset height of the target image data is smaller than the height threshold, the preset memory of the target image data is smaller than the memory threshold, and the preset format of the target image data is different from the format threshold is detected, it is determined that the target image data does not satisfy the preset quality threshold.
For example, the width threshold set by the user is 100 px, the height threshold is 120 px, the memory threshold is 200k, and the format threshold includes jpg and gif. When it is detected that the target image data is 120 × 120 px, its preset memory size is 220k, and its preset format is jpeg, the width, height, and memory all reach their thresholds, but the preset format jpeg is not among the formats in the format threshold, so it is determined that the target image data does not satisfy the preset quality threshold.
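The sequential checks described above can be sketched as follows. The class and field names are hypothetical, "k" is assumed to mean kilobytes, and formats are compared as plain lower-case strings, so that jpeg and jpg count as different formats, as in the example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class QualityInfo:
    width: int      # pixels
    height: int     # pixels
    size_kb: int    # assumed unit: kilobytes
    fmt: str        # e.g. "jpg", "jpeg", "gif"


@dataclass(frozen=True)
class QualityThreshold:
    min_width: int
    min_height: int
    min_size_kb: int
    formats: FrozenSet[str]


def meets_quality_threshold(info: QualityInfo, threshold: QualityThreshold) -> bool:
    """Check width, then height, then memory, then format, in that order."""
    if info.width < threshold.min_width:
        return False
    if info.height < threshold.min_height:
        return False
    if info.size_kb < threshold.min_size_kb:
        return False
    return info.fmt.lower() in threshold.formats


# Values taken from the example in the text.
user_threshold = QualityThreshold(100, 120, 200, frozenset({"jpg", "gif"}))
jpeg_image = QualityInfo(width=120, height=120, size_kb=220, fmt="jpeg")
jpg_image = QualityInfo(width=120, height=120, size_kb=220, fmt="jpg")

jpeg_ok = meets_quality_threshold(jpeg_image, user_threshold)
jpg_ok = meets_quality_threshold(jpg_image, user_threshold)
```

The jpeg image fails only at the last check, while an otherwise identical jpg image passes all four.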
In one embodiment, the step S104 includes:
renaming the target image data meeting the second preset condition;
and counting the number of the target image data meeting the second preset condition to generate an image data processing result.
In specific application, renaming operation is carried out on all target image data meeting the second preset condition, the number of the target image data meeting the second preset condition is counted, and an image data processing result is generated. The renaming rule can be specifically set according to the requirements of the user. Specifically, all target image data meeting the second preset condition may be renamed through the renaming function rename.
In a specific application, when a user inputs a plurality of different key words, the target image data corresponding to each key word and meeting the second preset condition needs to be renamed, the number of the target image data corresponding to each key word and meeting the second preset condition is counted, and an image data processing result is generated according to the number of the image data corresponding to each key word.
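The per-keyword renaming and counting can be sketched as follows. The text leaves the renaming rule to the user, so the pattern `<keyword>_<index><extension>` used here is purely an assumption, as are the function name `rename_and_count` and the shape of the result (a mapping from keyword to count). The standard `os.rename` call is used for the renaming step.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def rename_and_count(files_by_keyword: Dict[str, List[Path]]) -> Dict[str, int]:
    """Rename each keyword's files and return the number of files per keyword.

    Assumed naming rule: "<keyword>_<4-digit index><original extension>".
    The sketch does not guard against a target name that already exists.
    """
    counts: Dict[str, int] = {}
    for keyword, paths in files_by_keyword.items():
        for index, path in enumerate(paths, start=1):
            new_path = path.with_name(f"{keyword}_{index:04d}{path.suffix}")
            os.rename(path, new_path)
        counts[keyword] = len(paths)
    return counts


# Demonstration in a temporary folder with empty placeholder files.
with tempfile.TemporaryDirectory() as folder:
    root = Path(folder)
    downloads = {
        "cat": [root / "x1.jpg", root / "x2.jpg"],
        "mouse": [root / "y1.gif"],
    }
    for paths in downloads.values():
        for path in paths:
            path.write_bytes(b"")
    result = rename_and_count(downloads)
    renamed = sorted(p.name for p in root.iterdir())
```

The returned counts can then be compared against the quantity the user requested in order to decide whether further acquisition is needed.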
In specific application, when the quantity in the image data processing result is detected not to meet the requirement of a user, a notification of insufficient quantity of image data can be generated to prompt the user that the quantity of the image data obtained at the current moment is insufficient; when detecting an instruction that the user sets to continue to acquire image data, re-executing the operations from step S101 to step S104 on another third-party platform to implement image data processing to acquire the image data of the amount specified by the user.
As shown in fig. 6, a schematic diagram of an application scenario for generating image data processing results is provided.
In fig. 6, it is detected that the key words of the requirement input by the user include "cat", "computer", and "mouse", and therefore, the target image data satisfying the second preset condition corresponding to the "cat" needs to be renamed and the number of the target image data is counted, the target image data satisfying the second preset condition corresponding to the "computer" needs to be renamed and the number of the target image data is counted, the target image data satisfying the second preset condition corresponding to the "mouse" needs to be renamed and the number of the target image data is counted, and a corresponding image data processing result is obtained.
The obtained first image data is subjected to duplicate removal processing to obtain target image data, and the target image data with the quality information meeting the preset condition is detected and stored to obtain the image data meeting the requirement, so that the redundancy degree of the image data is reduced, and the quality of the image data is improved.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 7 shows a block diagram of an image data processing apparatus according to an embodiment of the present application, which corresponds to the image data processing method described in the above embodiment, and only shows portions related to the embodiment of the present application for convenience of description.
Referring to fig. 7, the image data processing apparatus 100 includes:
an obtaining module 101, configured to obtain a plurality of first image data that satisfy a first preset condition;
a duplicate removal processing module 102, configured to perform duplicate removal processing on the first image data to obtain target image data;
the detection module 103 is configured to acquire information of the target image data, and detect whether the target image data meets a second preset condition according to the information;
and the generating module 104 is configured to generate an image data processing result according to the target image data meeting the second preset condition.
In one embodiment, the deduplication processing module 102 includes:
the identification unit 1021 is used for calculating and obtaining the similarity between every two first image data according to a preset algorithm, identifying and obtaining the first image data with the similarity meeting a third preset condition, and storing the first image data as the target image data;
a deduplication processing unit 1022, configured to perform deduplication processing on the first image data whose identified similarity does not satisfy the third preset condition, and save the image data after the deduplication processing as the target image data; wherein the preset algorithm comprises at least one of an MD5 algorithm and a perceptual hash algorithm.
In one embodiment, the identification unit 1021 includes:
the initialization subunit is used for initializing each first image data into a picture object and converting the picture object into an image character string;
the first calculation subunit is configured to calculate the image character string to obtain an information digest value of the first image data;
and the first identification subunit is used for identifying and obtaining first image data with different information abstract values from any one of the first image data, and storing the first image data as the target image data.
In one embodiment, the identification unit 1021 further comprises:
a conversion subunit, configured to convert each of the first image data into grayscale image data, and construct a hash value of the grayscale image data;
the second calculating subunit is configured to calculate and obtain a hamming distance between every two pieces of the first image data according to the hash value;
and the second identification subunit is used for identifying and obtaining the first image data of which the Hamming distance from any one of the first image data is greater than or equal to the Hamming distance threshold value, and storing the first image data as the target image data.
In one embodiment, the detection module 103 includes:
the first detection unit is used for detecting whether the target image data meets a preset quality threshold value according to the quality information;
a second detection unit configured to detect whether target image data satisfying a preset quality threshold is readable image data;
the judging unit is used for judging that the target image data meets a second preset condition when the target image data is detected to meet a preset quality threshold and is readable image data.
In one embodiment, the quality information includes a preset width, a preset height, a preset format and a preset memory; the preset quality threshold comprises a width threshold, a height threshold, a format threshold and a memory threshold;
the first detection unit includes:
the first detection subunit is used for detecting whether the preset width of the target image data is smaller than the width threshold value;
a second detecting subunit, configured to detect whether a preset height of the target image data is smaller than the height threshold when it is detected that the preset width of the target image data is greater than or equal to the width threshold;
a third detecting subunit, configured to detect, when it is detected that the preset height of the target image data is greater than or equal to the height threshold, whether a preset memory of the target image data is smaller than the memory threshold;
a fourth detecting subunit, configured to detect whether a preset format of the target image data is the same as the format threshold when it is detected that a preset memory of the target image data is greater than or equal to the memory threshold;
and the judging subunit is used for judging that the target image data meets a preset quality threshold value when detecting that the preset format of the target image data is the same as the format threshold value.
In one embodiment, the generating module 104 includes:
the renaming unit is used for renaming the target image data meeting the second preset condition;
and the generating unit is used for counting the number of the target image data meeting the second preset condition and generating an image data processing result.
The obtained first image data is subjected to duplicate removal processing to obtain target image data, and the target image data with the quality information meeting the preset condition is detected and stored to obtain the image data meeting the requirement, so that the redundancy degree of the image data is reduced, and the quality of the image data is improved.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
Fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 8, the terminal device 8 of this embodiment includes: at least one processor 80 (only one shown in fig. 8), a memory 81, and a computer program 82 stored in the memory 81 and executable on the at least one processor 80, the processor 80 implementing the steps in any of the various image data processing method embodiments described above when executing the computer program 82.
The terminal device 8 may be a desktop computer, a notebook, a palm computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, a processor 80 and a memory 81. Those skilled in the art will appreciate that fig. 8 is merely an example of the terminal device 8 and does not constitute a limitation of the terminal device 8, which may include more or fewer components than those shown, or combine some components, or use different components, such as an input-output device, a network access device, and the like.
The processor 80 may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 81 may in some embodiments be an internal storage unit of the terminal device 8, such as a hard disk or a memory of the terminal device 8. In other embodiments, the memory 81 may also be an external storage device of the terminal device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and the like provided on the terminal device 8. Further, the memory 81 may also include both an internal storage unit and an external storage device of the terminal device 8. The memory 81 is used for storing an operating system, an application program, a boot loader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 81 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application provide a computer program product which, when run on a mobile terminal, enables the mobile terminal to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and can implement the steps of the method embodiments described above when the computer program is executed by a processor. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form, etc. The computer-readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/terminal apparatus, a recording medium, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. An image data processing method characterized by comprising:
acquiring a plurality of first image data meeting a first preset condition;
carrying out duplicate removal processing on the first image data to obtain target image data;
acquiring information of the target image data, and detecting whether the target image data meets a second preset condition or not according to the information;
and generating an image data processing result according to the target image data meeting the second preset condition.
2. The image data processing method of claim 1, wherein performing deduplication processing on the first image data to obtain target image data comprises:
calculating and obtaining the similarity between every two first image data according to a preset algorithm, identifying and obtaining the first image data with the similarity meeting a third preset condition, and storing the first image data as the target image data;
carrying out deduplication processing on the first image data with the identified similarity not meeting a third preset condition, and taking the image data subjected to deduplication processing as the target image data and storing the target image data; wherein the preset algorithm comprises at least one of an MD5 algorithm and a perceptual hash algorithm.
3. The image data processing method of claim 2, wherein the calculating according to a preset algorithm to obtain the similarity between every two first image data, and identifying and obtaining the first image data with the similarity satisfying a third preset condition as the target image data and storing the target image data comprises:
initializing each first image data into a picture object, and converting the picture object into an image character string;
calculating the image character string to obtain an information abstract value of the first image data;
and identifying and obtaining first image data with different information abstract values from any one first image data, and storing the first image data as the target image data.
4. The image data processing method of claim 2, wherein a similarity between every two pieces of the first image data is obtained by calculation according to a preset algorithm, and the obtained first image data having the similarity satisfying a third preset condition is identified and stored as the target image data, further comprising:
converting each first image data into gray image data, and constructing a hash value of the gray image data;
calculating and obtaining the Hamming distance between every two first image data according to the Hash value;
and identifying and obtaining first image data of which the Hamming distance with any one of the first image data is greater than or equal to a Hamming distance threshold value, and storing the first image data as the target image data.
5. The image data processing method of claim 1, wherein obtaining information of the target image data, and detecting whether the target image data satisfies a second preset condition according to the information comprises:
detecting whether the target image data meets a preset quality threshold value according to quality information;
detecting whether the target image data meeting a preset quality threshold is readable image data;
and when the target image data is detected to meet a preset quality threshold and is readable image data, judging that the target image data meets a second preset condition.
6. The image data processing method of claim 5, wherein the quality information includes a preset width, a preset height, a preset format, and a preset memory; the preset quality threshold comprises a width threshold, a height threshold, a format threshold and a memory threshold;
the detecting whether the target image data meets a preset quality threshold according to the quality information includes:
detecting whether the preset width of the target image data is smaller than the width threshold value;
when detecting that the preset width of the target image data is greater than or equal to the width threshold, detecting whether the preset height of the target image data is less than the height threshold;
when detecting that the preset height of the target image data is greater than or equal to the height threshold, detecting whether the preset memory of the target image data is smaller than the memory threshold;
when detecting that the preset memory of the target image data is greater than or equal to the memory threshold, detecting whether the preset format of the target image data is the same as the format threshold;
and when detecting that the preset format of the target image data is the same as the format threshold, determining that the target image data meets the preset quality threshold.
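The cascade in claim 6 checks width, then height, then memory, then format, and stops at the first failure. A minimal sketch follows; it assumes that "memory" means file size in bytes and that the "format threshold" is a single required format string compared exactly. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityInfo:
    width: int
    height: int
    size_bytes: int  # assumption: "memory" is the file size in bytes
    fmt: str


@dataclass(frozen=True)
class QualityThreshold:
    min_width: int
    min_height: int
    min_size_bytes: int
    required_format: str


def meets_quality_threshold(info: QualityInfo,
                            threshold: QualityThreshold) -> bool:
    """Apply the checks in the claimed order, failing fast."""
    if info.width < threshold.min_width:
        return False
    if info.height < threshold.min_height:
        return False
    if info.size_bytes < threshold.min_size_bytes:
        return False
    # Final step: the format must be the same as the required format.
    return info.fmt == threshold.required_format
```

For example, with a threshold of 640 x 480, 1024 bytes and format "jpg", an 800 x 600 JPEG of 2048 bytes passes, while the same image stored as "png" fails only at the last step.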
7. The image data processing method of any one of claims 1 to 6, wherein generating an image data processing result according to the target image data satisfying the second preset condition comprises:
renaming the target image data meeting the second preset condition;
and counting the number of the target image data meeting the second preset condition to generate an image data processing result.
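The renaming and counting steps of claim 7 can be sketched as below. The claim does not prescribe a naming scheme or a result format, so the sequential `image_0001` style names and the dictionary layout are assumptions, and `build_result` is a hypothetical name.

```python
import os
from typing import Dict, List


def build_result(names: List[str], prefix: str = "image") -> Dict[str, object]:
    """Pair each original file name with a new sequential name and count them.

    Assumption: new names keep the original extension and use a
    zero-padded running index.
    """
    renamed = []
    for index, old_name in enumerate(names, start=1):
        _, ext = os.path.splitext(old_name)
        renamed.append((old_name, f"{prefix}_{index:04d}{ext}"))
    return {"count": len(renamed), "renamed": renamed}
```

The sketch only computes the mapping; actually renaming files on disk (for example with `os.rename`) is left to the caller.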
8. An image data processing apparatus characterized by comprising:
an acquisition module, configured to acquire a plurality of first image data meeting a first preset condition;
a de-duplication processing module, configured to perform de-duplication processing on the first image data to obtain target image data;
a detection module, configured to acquire information of the target image data and detect, according to the information, whether the target image data satisfies a second preset condition;
and a generating module, configured to generate an image data processing result according to the target image data satisfying the second preset condition.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202110071069.3A 2021-01-19 2021-01-19 Image data processing method and device, terminal equipment and readable storage medium Pending CN112733952A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110071069.3A CN112733952A (en) 2021-01-19 2021-01-19 Image data processing method and device, terminal equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112733952A (en) 2021-04-30

Family

ID=75592479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110071069.3A Pending CN112733952A (en) 2021-01-19 2021-01-19 Image data processing method and device, terminal equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112733952A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination