CN113780042A - Picture set operation method, picture set labeling method and device - Google Patents

Info

Publication number
CN113780042A
Authority
CN
China
Prior art keywords
candidate
pictures
picture set
picture
similarity value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011241796.1A
Other languages
Chinese (zh)
Inventor
张建虎
赖荣凤
梅涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202011241796.1A
Publication of CN113780042A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a picture set operation method, a picture set labeling method and a picture set labeling device, and relates to the technical field of computers. One embodiment of the method comprises: acquiring a candidate picture set; wherein the candidate picture set comprises a plurality of candidate pictures; determining a first similarity value between any two candidate pictures in the candidate pictures according to the first characteristic values of the candidate pictures; according to the first similarity value, carrying out duplicate removal processing on the candidate picture set; determining a second similarity value between any two candidate pictures in the candidate picture set subjected to the deduplication processing according to second characteristic values of a plurality of candidate pictures in the candidate picture set subjected to the deduplication processing; and according to the second similarity value, performing deduplication again on the candidate pictures subjected to deduplication processing to generate a deduplication picture set. The embodiment can effectively reduce repeated storage of the same picture.

Description

Picture set operation method, picture set labeling method and device
Technical Field
The invention relates to the technical field of computers, in particular to a picture set operation method, a picture set labeling method and a picture set labeling device.
Background
Due to the needs of various applications, multiple pictures are often stored in the system. However, many pictures are stored repeatedly, which not only wastes disk space, but also has adverse effects on some applications. For example, a visual model is trained by using a picture set containing many repeated pictures as sample data, which may result in overfitting of the model to the data to a certain extent and reduce the generalization of the model.
Disclosure of Invention
In view of this, embodiments of the present invention provide a picture set operation method, a picture set labeling method, and a device, which can reduce storage of duplicate pictures.
In a first aspect, an embodiment of the present invention provides a method for operating a picture set, including:
acquiring a candidate picture set; wherein the candidate picture set comprises a plurality of candidate pictures;
determining a first similarity value between any two candidate pictures in the candidate pictures according to the first characteristic values of the candidate pictures;
according to the first similarity value, carrying out duplicate removal processing on the candidate picture set;
determining a second similarity value between any two candidate pictures in the candidate picture set subjected to the deduplication processing according to second characteristic values of a plurality of candidate pictures in the candidate picture set subjected to the deduplication processing;
and according to the second similarity value, performing deduplication again on the candidate pictures subjected to deduplication processing to generate a deduplication picture set.
Optionally,
the determining a first similarity value between any two candidate pictures in the plurality of candidate pictures according to the first characteristic values of the plurality of candidate pictures comprises:
for any two candidate pictures in the plurality of candidate pictures: calculating a distance value between first characteristic values of the two candidate pictures; determining the distance value as a first similarity value between the two candidate pictures;
the performing, according to the first similarity value, deduplication processing on the candidate picture set includes:
if the first similarity value is smaller than a first threshold value, determining one of the two candidate pictures as a repeated picture;
deleting the duplicate picture from the set of candidate pictures.
Optionally,
the determining a second similarity value between any two candidate pictures in the processed candidate picture set according to second characteristic values of a plurality of candidate pictures in the processed candidate picture set comprises:
for any two candidate pictures in the processed candidate picture set: calculating a distance value between second characteristic values of the two candidate pictures; determining the distance value as a second similarity value between the two candidate pictures;
the performing deduplication again on the de-duplicated candidate pictures according to the second similarity value comprises:
if the second similarity value is smaller than a second threshold value, determining one of the two candidate pictures as a similar picture;
receiving operation information of a user for the similar picture;
and if the operation information indicates that the similar pictures are deleted, deleting the similar pictures from the candidate picture set after the duplicate removal processing.
Optionally,
the determining one of the two candidate pictures as a similar picture comprises:
determining quality parameters of the two candidate pictures;
determining a candidate picture with a lower quality parameter as the similar picture.
Optionally,
the acquiring of the candidate picture set comprises:
acquiring at least one candidate video;
for each of the candidate videos: extracting a plurality of video frames from the candidate video;
and combining the plurality of video frames to generate the candidate picture set.
Optionally,
the de-duplicated picture set comprises a plurality of de-duplicated pictures, and the method further comprises:
acquiring an existing picture set; wherein the existing picture set comprises a plurality of existing pictures;
determining a third similarity value between the existing picture and the de-duplicated picture according to the first characteristic value of the existing picture and the first characteristic value of the de-duplicated picture;
deleting, from the de-duplicated picture set, the pictures that are similar to pictures in the existing picture set according to the third similarity value;
determining a fourth similarity value between the existing picture and the de-duplicated picture according to the second characteristic value of the existing picture and the second characteristic value of the de-duplicated picture in the processed de-duplicated picture set;
performing deletion processing on the processed de-duplicated picture set according to the fourth similarity value;
and adding the pictures remaining in the de-duplicated picture set to the existing picture set.
Optionally,
the deleting, from the de-duplicated picture set, of pictures similar to pictures in the existing picture set according to the third similarity value comprises:
if the third similarity value is smaller than a third threshold value, deleting the de-duplicated picture from the de-duplicated picture set.
Optionally,
the performing deletion processing on the processed de-duplicated picture set or the existing picture set according to the fourth similarity value comprises:
if the fourth similarity value is smaller than a fourth threshold value, determining one of the existing picture and the de-duplicated picture as a similar picture;
receiving operation information of a user for the similar picture;
and if the operation information indicates that the similar picture is to be deleted, deleting the similar picture from the processed de-duplicated picture set or the existing picture set.
In a second aspect, an embodiment of the present invention provides a method for labeling a picture set, including:
acquiring a first candidate picture set;
determining similar pictures of the first candidate picture set and the existing picture set according to the first characteristic values of the pictures in the first candidate picture set and the existing picture set, and deleting the similar pictures from the first candidate picture set to obtain a second candidate picture set;
according to the second candidate picture set and the second characteristic value of the pictures in the existing picture set, determining similar pictures of the second candidate picture set and the existing picture set, and deleting the similar pictures from the second candidate picture set to obtain a picture set to be labeled;
and labeling the pictures in the picture set to be labeled.
In a third aspect, an embodiment of the present invention provides a picture set operation apparatus, including:
the picture set acquisition module is used for acquiring a candidate picture set; wherein the candidate picture set comprises a plurality of candidate pictures;
the first similarity value determining module is used for determining a first similarity value between any two candidate pictures in the candidate pictures according to the first characteristic values of the candidate pictures;
the first duplicate removal module is used for carrying out duplicate removal processing on the candidate picture set according to the first similarity value;
the second similarity value determining module is used for determining a second similarity value between any two candidate pictures in the candidate picture set subjected to the deduplication processing according to second feature values of a plurality of candidate pictures in the candidate picture set subjected to the deduplication processing;
and the second duplicate removal module is used for carrying out duplicate removal processing on the candidate pictures subjected to the duplicate removal processing again according to the second similarity value to generate a duplicate removal picture set.
In a fourth aspect, an embodiment of the present invention provides a picture set labeling apparatus, including:
the picture set acquisition module is used for acquiring a first candidate picture set;
the first deleting module is used for determining similar pictures of the first candidate picture set and the existing picture set according to first characteristic values of pictures in the first candidate picture set and the existing picture set, and deleting the similar pictures from the first candidate picture set to obtain a second candidate picture set;
the second deleting module is used for determining similar pictures of the second candidate picture set and the existing picture set according to second characteristic values of pictures in the second candidate picture set and the existing picture set, and deleting the similar pictures from the second candidate picture set to obtain a picture set to be labeled;
and the picture labeling module is used for labeling the pictures in the picture set to be labeled.
In a fifth aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments described above.
In a sixth aspect, the present invention provides a computer-readable medium, on which a computer program is stored, where the computer program is executed by a processor to implement the method of any one of the above embodiments.
One embodiment of the above invention has the following advantages or benefits: the first characteristic value and the second characteristic value are different representation modes for the picture characteristics. The similarity degree between the two candidate pictures is determined according to the first characteristic value and the second characteristic value of the candidate pictures respectively, and the duplicate removal processing is performed on the candidate picture set twice, so that the duplicate storage of the same picture can be effectively reduced.
Further effects of the above-mentioned non-conventional optional implementations will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 2 is a schematic flowchart of a picture set operation method according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of another picture set operation method according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a picture set labeling method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the data flow of another picture set labeling method according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a picture set operation apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a picture set labeling apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 illustrates an exemplary system architecture 100 to which a picture set operation method, a picture set labeling method, or the corresponding apparatus according to an embodiment of the present invention may be applied.
As shown in fig. 1, the system architecture 100 may include acquisition devices 101, 102, 103, a network 104, and a server 105. The network 104 is used to provide a medium for communication links between the acquisition devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The acquisition devices 101, 102, 103 may be various types of terminal devices or servers. Terminal devices include, but are not limited to, smart phones, tablets, laptop computers, and desktop computers. The acquisition devices 101, 102, 103 interact with the server 105 over the network 104 to receive or send messages and the like. Various crawler tools, video capture tools, and the like may be deployed on the acquisition devices 101, 102, 103. The acquisition devices 101, 102, 103 crawl picture data or video data from a search engine or a particular website through the crawler tools.
The server 105 receives a plurality of candidate pictures acquired by the acquisition devices 101, 102, 103. The server 105 can perform deduplication processing on the plurality of candidate pictures twice, according to the first characteristic value and the second characteristic value of the candidate pictures respectively, so as to delete duplicate pictures or similar pictures from the candidate pictures.
It should be noted that the picture set operation method or the picture set labeling method provided by the embodiment of the present invention is generally executed by the server 105, and accordingly, the picture set operation apparatus or the picture set labeling apparatus is generally disposed in the server 105.
It should be understood that the number of acquisition devices, networks, and servers in FIG. 1 is merely illustrative. There may be any number of acquisition devices, networks, and servers, as required by the implementation.
An embodiment of the present invention provides a method for operating a picture set, as shown in fig. 2, including:
step 201: acquiring a candidate picture set; wherein the candidate picture set comprises a plurality of candidate pictures.
The candidate picture set is a picture set which needs to be subjected to deduplication processing. The candidate picture set can be obtained in various ways, such as an open source picture set, a network collected picture set, a third party purchased picture set, and the like. The method of the embodiment of the present invention does not limit how to obtain the candidate picture set.
In one embodiment of the present invention, the candidate picture set may be obtained by:
acquiring at least one candidate video;
for each candidate video: extracting a plurality of video frames from the candidate video;
and combining the plurality of video frames to generate a candidate picture set.
At least one candidate video may be obtained by a video crawler or from an existing video set. For each candidate video, a plurality of video frames are extracted, and the extracted video frames are combined to generate the candidate picture set. A video extraction program may be written, with an extraction time interval set in the program, to extract video frames from the candidate video. The video frames may also be extracted directly with an existing tool such as ffmpeg.
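As a minimal sketch of the extraction step, the following Python builds an ffmpeg command line that samples one frame per fixed time interval. The function name, the output file pattern, and the 5-second interval are illustrative assumptions; the description only states that an extraction interval may be set and that a tool such as ffmpeg may be used.

```python
from pathlib import Path


def build_frame_extraction_cmd(video_path, out_dir, interval_s=5):
    """Return an ffmpeg argument list that saves one frame every interval_s seconds.

    Hypothetical helper: the name, defaults, and file pattern are assumptions.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # One numbered JPEG per sampled frame, named after the source video.
    out_pattern = str(Path(out_dir) / (Path(video_path).stem + "_%06d.jpg"))
    # The fps filter with a rate of 1/interval emits one frame per interval.
    return ["ffmpeg", "-i", str(video_path), "-vf", f"fps=1/{interval_s}", out_pattern]


cmd = build_frame_extraction_cmd("clip.mp4", "frames", interval_s=5)
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would perform the extraction, assuming ffmpeg is installed and the output directory exists; the frames from all candidate videos can then be pooled into one candidate picture set.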
Step 202: determining a first similarity value between any two candidate pictures in the plurality of candidate pictures according to the first characteristic values of the candidate pictures.
The first characteristic value may be used to characterize feature information of the candidate picture. The first characteristic value may be a unique feature of the picture, such as an MD5 (Message-Digest Algorithm 5), SHA-224 (Secure Hash Algorithm 224), SHA-256 (Secure Hash Algorithm 256), SHA-384 (Secure Hash Algorithm 384), or SHA-512 (Secure Hash Algorithm 512) digest. If the first characteristic values of two candidate pictures are the same, the two candidate pictures are the same picture or very similar pictures. If the distance value between the first characteristic values of two candidate pictures is small, the two candidate pictures may be similar pictures.
The first characteristic value may also be a feature code of the picture, such as a CNN code, a hash code, a BOF code, or an HBP code. The distance value between the first characteristic values of two pictures measures how similar the two pictures are: the smaller the distance value, the more similar the two pictures.
In an embodiment of the present invention, determining a first similarity value between any two candidate pictures in a plurality of candidate pictures according to first feature values of the candidate pictures includes:
for any two candidate pictures in the multiple candidate pictures: calculating a distance value between first characteristic values of the two candidate pictures; the distance value is determined as a first similarity value between the two candidate pictures.
The first characteristic values of all candidate pictures in the candidate set can be determined, then the distance value between the first characteristic values of any two candidate pictures in the candidate set is calculated, and the distance value is determined as the first similarity value between the two candidate pictures. The first similarity value may be used to characterize a degree of similarity between two candidate pictures determined from the first feature value.
The calculation manner of the distance value between the first characteristic values of the two pictures can be determined according to the specific form of the first characteristic values. For example, if the first feature value is MD5, SHA-224, etc., the difference between the first feature values of the two pictures may be calculated to determine a similarity value for the two pictures. If the first characteristic value is hash coding, the Hamming distance between the first characteristic values of the two pictures can be calculated to determine the similarity value of the two pictures.
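The two kinds of first characteristic value mentioned above can be sketched as follows, under stated assumptions: the digest is computed over the raw picture bytes with Python's standard `hashlib`, and a hash code is represented as an integer whose bits are compared by Hamming distance. The function names are hypothetical, and the description does not prescribe this representation.

```python
import hashlib


def digest_feature(picture_bytes: bytes, algorithm: str = "sha256") -> str:
    """Digest of the raw picture bytes (e.g. "md5", "sha224", "sha256")."""
    return hashlib.new(algorithm, picture_bytes).hexdigest()


def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of bit positions in which two integer hash codes differ."""
    return bin(code_a ^ code_b).count("1")


# Identical bytes give identical digests; any change gives a different digest.
same = digest_feature(b"picture-bytes") == digest_feature(b"picture-bytes")
different = digest_feature(b"picture-bytes") != digest_feature(b"picture-bytes!")

# 0b1011 and 0b0001 differ in two bit positions.
distance = hamming_distance(0b1011, 0b0001)
print(same, different, distance)
```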
Step 203: performing deduplication processing on the candidate picture set according to the first similarity value.
A first threshold may be preset in the system according to specific requirements. The first threshold may take a value greater than zero and very close to zero. The first similarity value is compared with the first threshold. If the first similarity value is smaller than the first threshold, one of the two candidate pictures is determined to be a duplicate picture, and the candidate picture set is de-duplicated accordingly. For example, the duplicate picture may be deleted directly from the candidate picture set. Alternatively, the duplicate picture may be presented to the user, who decides whether to delete it.
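The threshold comparison in step 203 could look like the sketch below. It is an illustration under assumptions rather than the patented implementation: pictures are raw byte strings, the first characteristic value is a SHA-256 digest, the distance between two digests is taken as 0 when they are equal and 1 otherwise, and the first picture seen in each group of duplicates is the one kept.

```python
import hashlib
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")  # picture type
F = TypeVar("F")  # characteristic value type


def deduplicate(pictures: Sequence[P],
                feature: Callable[[P], F],
                distance: Callable[[F, F], float],
                threshold: float) -> List[P]:
    """Keep a picture only if it is at least `threshold` away from every kept picture."""
    kept: List[P] = []
    kept_features: List[F] = []
    for picture in pictures:
        value = feature(picture)
        if any(distance(value, other) < threshold for other in kept_features):
            continue  # treated as a duplicate of an already kept picture
        kept.append(picture)
        kept_features.append(value)
    return kept


def sha256_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def digest_distance(a: str, b: str) -> int:
    # Assumption: equal digests are at distance 0, different digests at distance 1.
    return 0 if a == b else 1


candidates = [b"cat-bytes", b"dog-bytes", b"cat-bytes"]
unique = deduplicate(candidates, sha256_digest, digest_distance, threshold=0.5)
print(unique)
```

The same `deduplicate` function accepts any feature and distance pair, so a hash code with a Hamming distance could be substituted without changing the loop.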
Step 204: determining a second similarity value between any two candidate pictures in the de-duplicated candidate picture set according to the second characteristic values of the candidate pictures in the de-duplicated candidate picture set.
The de-duplicated candidate picture set is the candidate picture set obtained after the deduplication processing in step 203.
The second characteristic value may be used to characterize feature information of the candidate picture. The second characteristic value may be a unique feature of the picture, such as an MD5, SHA-224, SHA-256, SHA-384, or SHA-512 digest. If the second characteristic values of two candidate pictures are the same, the two candidate pictures are the same picture or very similar pictures. If the distance value between the second characteristic values of two candidate pictures is small, the two candidate pictures may be similar pictures.
The second characteristic value may also be a feature code of the picture, such as a CNN code, a hash code, a BOF code, or an HBP code. The distance value between the second characteristic values of two pictures measures how similar the two pictures are: the smaller the distance value, the more similar the two pictures.
Note that the second characteristic value is different from the first characteristic value. According to the method provided by the embodiment of the invention, duplicate removal processing is carried out twice according to the first characteristic value and the second characteristic value, so that a better duplicate removal effect can be obtained for a plurality of pictures in the candidate picture set.
In an embodiment of the present invention, determining a second similarity value between any two candidate pictures in the processed candidate picture set according to second feature values of multiple candidate pictures in the processed candidate picture set includes:
for any two candidate pictures in the processed candidate picture set: calculating a distance value between the second characteristic values of the two candidate pictures; the distance value is determined as a second similarity value between the two candidate pictures.
The second characteristic values of all candidate pictures in the candidate set obtained after the deduplication processing in step 203 may be determined first; then the distance value between the second characteristic values of any two candidate pictures in that set is calculated, and the distance value is determined as the second similarity value between the two candidate pictures. The second similarity value may be used to characterize the degree of similarity between the two candidate pictures as determined from the second characteristic value.
The calculation manner of the distance value between the second characteristic values of the two pictures can be determined according to the specific form of the second characteristic values. If the second feature value is MD5, SHA-224, etc., the difference between the second feature values of the two pictures can be calculated to determine the similarity value of the two pictures. If the second characteristic value is hash code, the Hamming distance between the second characteristic values of the two pictures can be calculated to determine the similarity value of the two pictures.
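As an illustration of a hash code used as the second characteristic value, the sketch below computes a simple average hash over a small grayscale grid and compares codes by Hamming distance. Average hashing is a swapped-in example technique chosen here for brevity; the description names hash coding only generically. Representing a picture as a nested list of gray levels is also an assumption, and a real picture would first be decoded and downscaled.

```python
from typing import List


def average_hash(gray: List[List[int]]) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the mean, else 0."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    code = 0
    for p in pixels:
        code = (code << 1) | int(p > mean)
    return code


def hamming_distance(code_a: int, code_b: int) -> int:
    return bin(code_a ^ code_b).count("1")


original = [[10, 200], [220, 30]]
slightly_brighter = [[12, 198], [225, 28]]  # same content, small pixel changes
inverted = [[200, 10], [30, 220]]           # different content

near = hamming_distance(average_hash(original), average_hash(slightly_brighter))
far = hamming_distance(average_hash(original), average_hash(inverted))
print(near, far)
```

Unlike a cryptographic digest, this kind of code changes little when the pixels change little, which is what makes a small distance meaningful as a similarity value.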
Step 205: performing deduplication processing again on the de-duplicated candidate pictures according to the second similarity value, to generate a de-duplicated picture set.
A second threshold may be preset in the system according to specific requirements. The second threshold may take a value greater than zero and very close to zero. The second similarity value is compared with the second threshold. If the second similarity value is smaller than the second threshold, one of the two candidate pictures is determined to be a similar picture, and the de-duplicated candidate picture set is de-duplicated again accordingly. For example, the similar picture may be deleted directly from the de-duplicated candidate picture set. Alternatively, the similar picture may be presented to the user and the user's operation information for the similar picture is received; if the operation information indicates that the similar picture is to be deleted, the similar picture is deleted from the de-duplicated candidate picture set.
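The user-confirmed variant of step 205 might be sketched as follows. The picture identifiers, the integer hash codes, the threshold of 2, and the `confirm_delete` callback standing in for the user's operation information are all hypothetical; treating the second picture of each pair as the similar picture is likewise an assumption made only for this example.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple


def hamming_distance(code_a: int, code_b: int) -> int:
    return bin(code_a ^ code_b).count("1")


def find_similar_pairs(codes: Dict[str, int], threshold: int) -> List[Tuple[str, str]]:
    """Return every pair of picture ids whose codes are closer than the threshold."""
    return [(a, b) for a, b in combinations(sorted(codes), 2)
            if hamming_distance(codes[a], codes[b]) < threshold]


def apply_user_decisions(picture_ids: Set[str],
                         pairs: List[Tuple[str, str]],
                         confirm_delete: Callable[[str, str], bool]) -> Set[str]:
    """Delete the second picture of a pair only if the user confirms the deletion."""
    remaining = set(picture_ids)
    for keep_id, similar_id in pairs:
        if keep_id in remaining and similar_id in remaining:
            if confirm_delete(similar_id, keep_id):
                remaining.discard(similar_id)
    return remaining


codes = {"p1": 0b0110, "p2": 0b0111, "p3": 0b1001}
pairs = find_similar_pairs(codes, threshold=2)
# Stand-in for the user interface: this "user" approves every deletion.
remaining = apply_user_decisions(set(codes), pairs, lambda similar, other: True)
print(pairs, sorted(remaining))
```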
In one embodiment of the present invention, one of the two candidate pictures may be determined as a similar picture by:
determining quality parameters of two candidate pictures;
the candidate picture with the lower quality parameter is determined to be a similar picture.
The quality parameter of the picture may be used to characterize the picture quality or display effect in the candidate picture. The higher the picture quality parameter is, the better the picture quality of the picture or the display effect of the picture is. The quality parameters of the picture may be determined from picture attribute information including, but not limited to, sharpness, brightness, and completeness. When the candidate picture includes face information, the picture attribute information may further include a face size and a face angle.
Specifically, a correspondence between value ranges and quality scores can be set for each item of picture attribute information, so that a quality score can be determined from the attribute value. Different weights are set for different items of picture attribute information; a weight represents how much that attribute contributes to the picture quality. Finally, the weighted sum of the scores corresponding to the picture attributes is calculated as the quality parameter of the picture.
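The weighted-sum calculation can be sketched as below. The range tables, the scores, and the weights are invented for illustration only; the description leaves these values to the implementer.

```python
from math import inf
from typing import Dict, List, Tuple

# Hypothetical value-range tables: (exclusive upper bound, quality score).
SCORE_RANGES: Dict[str, List[Tuple[float, int]]] = {
    "sharpness": [(50, 1), (150, 2), (inf, 3)],
    "brightness": [(60, 1), (200, 3), (inf, 2)],
    "completeness": [(0.5, 1), (0.9, 2), (inf, 3)],
}
# Hypothetical weights expressing each attribute's contribution to quality.
WEIGHTS: Dict[str, float] = {"sharpness": 0.5, "brightness": 0.3, "completeness": 0.2}


def attribute_score(name: str, value: float) -> int:
    """Map an attribute value to the score of the first range that contains it."""
    for upper, score in SCORE_RANGES[name]:
        if value < upper:
            return score
    raise ValueError(f"no range covers {name}={value}")


def quality_parameter(attributes: Dict[str, float]) -> float:
    """Weighted sum of the per-attribute scores."""
    return sum(WEIGHTS[name] * attribute_score(name, attributes[name]) for name in WEIGHTS)


def pick_similar_picture(id_a: str, attrs_a: Dict[str, float],
                         id_b: str, attrs_b: Dict[str, float]) -> str:
    """Return the id of the picture with the lower quality parameter."""
    return id_a if quality_parameter(attrs_a) < quality_parameter(attrs_b) else id_b


picture_a = {"sharpness": 180, "brightness": 120, "completeness": 1.0}
picture_b = {"sharpness": 40, "brightness": 120, "completeness": 0.8}
similar = pick_similar_picture("a", picture_a, "b", picture_b)
print(similar)
```

Here picture `b` scores lower on sharpness and completeness, so it is the one marked as the similar picture and offered for deletion.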
In the embodiment of the invention, the first characteristic value and the second characteristic value are different characterization modes for the picture characteristics. The similarity degree between the two candidate pictures can be respectively determined according to the first characteristic value or the second characteristic value of the two candidate pictures. According to the method provided by the embodiment of the invention, duplicate removal processing is performed on the candidate picture set twice according to the first characteristic value and the second characteristic value of the candidate picture respectively, so that the problem of repeated storage of the same picture in the prior art can be solved.
In one embodiment of the invention, the first feature value is a unique feature of the picture, such as MD5, SHA-224, SHA-256, SHA-384, or SHA-512. The second characteristic value is characteristic coding of the picture, such as CNN coding, hash coding, BOF coding, HBP coding, or the like.
Almost identical pictures are deleted by means of the first characteristic value, and similar pictures are then deleted by means of the second characteristic value. By performing two deletion operations on the candidate picture set, the embodiment of the invention removes duplicate pictures and similar pictures from the candidate picture set more effectively.
In addition, computing and comparing first characteristic values is faster than computing and comparing second characteristic values. If the picture set contains a large number of pictures or has a high repetition rate, performing the deduplication operation according to the first characteristic value first can make the overall deduplication faster than performing it directly with the second characteristic value.
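A compact sketch of the two-pass ordering is given below, assuming a SHA-256 digest as the first characteristic value and an average hash as the second. Both choices, the grid representation of a picture, and the threshold are assumptions made for illustration. The function also reports how many pictures reach the second pass, which is the quantity the cheaper first pass is meant to reduce.

```python
import hashlib
from typing import Dict, List, Tuple

Grid = Tuple[Tuple[int, ...], ...]  # grayscale pixels, each in 0..255


def digest(picture: Grid) -> str:
    flat = bytes(p for row in picture for p in row)
    return hashlib.sha256(flat).hexdigest()


def average_hash(picture: Grid) -> int:
    pixels = [p for row in picture for p in row]
    mean = sum(pixels) / len(pixels)
    code = 0
    for p in pixels:
        code = (code << 1) | int(p > mean)
    return code


def two_stage_dedup(pictures: List[Grid], second_threshold: int) -> Tuple[List[Grid], int]:
    # Pass 1: drop exact duplicates with one dictionary lookup per picture.
    by_digest: Dict[str, Grid] = {}
    for picture in pictures:
        by_digest.setdefault(digest(picture), picture)
    survivors = list(by_digest.values())
    # Pass 2: drop near duplicates by comparing hash codes pairwise.
    kept: List[Grid] = []
    codes: List[int] = []
    for picture in survivors:
        code = average_hash(picture)
        if any(bin(code ^ other).count("1") < second_threshold for other in codes):
            continue
        kept.append(picture)
        codes.append(code)
    return kept, len(survivors)


a = ((10, 200), (220, 30))
a_copy = ((10, 200), (220, 30))
a_brighter = ((12, 198), (225, 28))
c = ((200, 10), (30, 220))
result, entered_second_pass = two_stage_dedup([a, a_copy, a_brighter, c], second_threshold=1)
print(len(result), entered_second_pass)
```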
Fig. 3 is a flowchart of a method for operating a picture set according to another embodiment of the present invention. As shown in fig. 3, the method includes:
Step 301: acquiring an existing picture set; wherein the existing picture set comprises a plurality of existing pictures.
The existing picture set is a picture set already stored in the system. The pictures in the deduplicated picture set are to be added to the existing picture set. The deduplicated picture set is the picture set generated by the method shown in fig. 2 and comprises a plurality of deduplicated pictures.
Step 302: determining a third similarity value between an existing picture and a deduplicated picture according to the first feature value of the existing picture and the first feature value of the deduplicated picture.
The detailed description of the first characteristic value is the same as the above method, and is not repeated herein.
A distance value between the first feature value of the existing picture and the first feature value of the deduplicated picture is calculated, and the distance value is determined as the third similarity value between the two pictures. The third similarity value characterizes the degree of similarity between the existing picture and the deduplicated picture as determined from the first feature value.
Step 303: deleting, according to the third similarity value, the pictures in the deduplicated picture set that are similar to pictures in the existing picture set.
In an embodiment of the present invention, deleting the pictures in the deduplicated picture set that are similar to pictures in the existing picture set according to the third similarity value includes:
if the third similarity value is smaller than the third threshold, deleting the deduplicated picture from the deduplicated picture set.
The third threshold may be preset in the system according to specific requirements and may take a value very close to zero. The third similarity value is compared with the third threshold. If the third similarity value is smaller than the third threshold, the deduplicated picture corresponding to the third similarity value is determined to be a repeated picture. The repeated picture can be deleted directly from the deduplicated picture set, or it can be presented to the user, who decides whether to delete it.
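Steps 302 and 303 can be sketched as follows for the case where the first feature value is an MD5 digest. The sketch assumes that two digests are either equal or unrelated, so a distance below a near-zero threshold is treated simply as digest equality; the function names and the representation of a picture set as a dict of file name to raw bytes are hypothetical.

```python
import hashlib


def first_feature(data: bytes) -> str:
    """First feature value of a picture: here, the MD5 digest of its bytes."""
    return hashlib.md5(data).hexdigest()


def remove_exact_matches(deduplicated, existing):
    """Drop deduplicated pictures whose digest already occurs in `existing`.

    Both arguments map a file name to the raw bytes of the picture
    (an assumed representation). Returns the processed deduplicated set.
    """
    existing_digests = {first_feature(data) for data in existing.values()}
    return {name: data for name, data in deduplicated.items()
            if first_feature(data) not in existing_digests}
```

Because the digests of the existing picture set are collected into a set once, each deduplicated picture is checked with a single lookup rather than being compared against every existing picture.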
Step 304: determining a fourth similarity value between the existing picture and a deduplicated picture in the processed deduplicated picture set according to the second feature value of the existing picture and the second feature value of that deduplicated picture.
The detailed description of the second characteristic value is the same as the above method, and is not repeated here.
A distance value between the second feature value of the existing picture and the second feature value of the deduplicated picture is calculated, and the distance value is determined as the fourth similarity value between the two pictures. The fourth similarity value characterizes the degree of similarity between the existing picture and the deduplicated picture as determined from the second feature value.
Step 305: performing deletion processing on the processed deduplicated picture set according to the fourth similarity value.
In an embodiment of the present invention, performing deletion processing on the processed deduplicated picture set or the existing picture set according to the fourth similarity value includes:
if the fourth similarity value is smaller than the fourth threshold, determining either the existing picture or the deduplicated picture as a similar picture;
receiving operation information of a user for the similar picture;
if the operation information indicates that the similar picture is to be deleted, deleting the similar picture from the processed candidate picture set or the existing picture set.
The fourth threshold may be preset in the system according to specific requirements and may take a value very close to zero. The fourth similarity value is compared with the fourth threshold. If the fourth similarity value is smaller than the fourth threshold, the deduplicated picture corresponding to the fourth similarity value is determined to be a similar picture. The similar picture can be deleted directly from the deduplicated picture set, or it can be presented to the user, who decides whether to delete it.
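Steps 304 and 305 can be sketched as follows. The sketch assumes that the second feature value of each picture has already been computed as an integer hash code and that the distance is the Hamming distance between two codes; the function names and the `confirm_delete` callback, which stands in for the user's operation information, are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


def review_similar(deduplicated, existing, threshold, confirm_delete):
    """Remove user-confirmed similar pictures from the deduplicated set.

    `deduplicated` and `existing` map a file name to an integer hash code.
    `confirm_delete(candidate_name, existing_name)` returns True when the
    user chooses to delete the candidate.
    """
    kept = {}
    for name, code in deduplicated.items():
        # First existing picture whose distance is below the fourth threshold.
        match = next((e_name for e_name, e_code in existing.items()
                      if hamming(code, e_code) < threshold), None)
        if match is not None and confirm_delete(name, match):
            continue  # similar picture deleted on the user's instruction
        kept[name] = code
    return kept
```

Passing `lambda candidate, match: True` as `confirm_delete` corresponds to deleting similar pictures directly without manual confirmation.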
Step 306: adding the plurality of pictures in the deduplicated picture set to the existing picture set.
In the embodiment of the invention, before the pictures in the deduplicated picture set are added to the existing picture set, the pictures in the deduplicated picture set that are similar to pictures in the existing picture set are removed in two passes, according to the first feature value and the second feature value of the pictures. The method of the embodiment of the invention thereby reduces the occurrence of repeated or similar pictures in the existing picture set.
In one embodiment of the invention, each existing picture in the existing picture set corresponds to a label; before the plurality of pictures in the deduplicated picture set are added to the existing picture set, the method further comprises:
labeling each deduplicated picture in the deduplicated picture set.
In the embodiment of the invention, deduplication is performed twice on the first candidate picture set, so that the pictures added to the existing picture set are neither repeated nor similar to the pictures already in it, and applications based on the existing picture set can therefore achieve better results. In addition, because deduplication is performed twice, fewer pictures need to be labeled, which reduces the workload of labeling the picture set and improves labeling efficiency.
Fig. 4 is a flowchart of a method for labeling a picture set according to another embodiment of the present invention. As shown in fig. 4, the method includes:
step 401: a first set of candidate pictures is obtained.
Step 402: according to the first characteristic values of the pictures in the first candidate picture set and the existing picture set, determining similar pictures of the first candidate picture set and the existing picture set, and deleting the similar pictures from the first candidate picture set to obtain a second candidate picture set.
The existing picture set is a picture set which is already stored in the system. Each existing picture in the existing picture set corresponds to a label.
The detailed description of the first feature value is the same as in the above method and is not repeated here. A distance value between the first feature value of an existing picture and the first feature value of a picture in the first candidate picture set is calculated, and the distance value is determined as the picture similarity value between the two pictures. The picture similarity value characterizes the degree of similarity, as determined from the first feature value, between the existing picture and the picture in the first candidate picture set.
Step 403: determining similar pictures of the second candidate picture set and the existing picture set according to second feature values of the pictures in the second candidate picture set and the existing picture set, and deleting the similar pictures from the second candidate picture set to obtain a picture set to be labeled.
The detailed description of the second feature value is the same as in the above method and is not repeated here. A distance value between the second feature value of an existing picture and the second feature value of a picture in the second candidate picture set is calculated, and the distance value is determined as the picture similarity value between the two pictures. The picture similarity value characterizes the degree of similarity, as determined from the second feature value, between the existing picture and the picture in the second candidate picture set.
Step 404: labeling the pictures in the picture set to be labeled.
In an embodiment of the present invention, after the pictures in the picture set to be labeled are labeled, the method further includes: adding the labeled pictures in the picture set to be labeled to the existing picture set.
Each existing picture in the existing picture set corresponds to a label, and the existing picture set can be used in various applications, such as training a visual model. On the one hand, through the two deduplication passes, only pictures not already contained in the existing picture set are added to it, which reduces the probability that the expanded existing picture set contains repeated or similar pictures, reduces the risk of the model overfitting the data, and improves the generalization of the visual model. On the other hand, only the pictures that remain after deduplication are labeled, rather than all candidate pictures, which improves the efficiency of labeling the picture set.
Fig. 5 is a schematic diagram of a data flow of another method for labeling a picture set according to an embodiment of the present invention. As shown in fig. 5, the embodiment of the present invention relates to four data sources in total, and a picture in at least one of the four data sources may be selected and used as a candidate picture set to collect the extended data.
Picture data of a search engine or a particular website may be collected manually or crawled with a crawler tool. The method can deduplicate the full crawled picture data directly, or deduplicate it after dividing it into blocks. If the repetition rate of the full picture data is high or the data volume is large, block processing is recommended, because dividing the data into blocks before deduplication increases the deduplication speed.
The full picture data is processed in blocks as follows:
1) Divide the full set of pictures into N groups, where N is an integer greater than 1.
2) Perform the deduplication operation on each group separately.
3) Merge the N deduplicated groups generated in step 2.
4) Deduplicate the merged pictures generated in step 3 again.
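The four block-processing steps above can be sketched as follows. This is a minimal illustration in which deduplication means keeping the first item for each distinct key; the function names are hypothetical, and `key` stands for whatever feature is being compared (for example a digest of the picture bytes).

```python
def dedupe(items, key):
    """Keep the first item for each distinct key, preserving order."""
    seen, kept = set(), []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            kept.append(item)
    return kept


def blockwise_dedupe(items, n_groups, key):
    """Deduplicate `items` by splitting them into `n_groups` blocks first."""
    if n_groups < 2:
        raise ValueError("n_groups must be an integer greater than 1")
    groups = [items[i::n_groups] for i in range(n_groups)]       # step 1
    deduped_groups = [dedupe(group, key) for group in groups]    # step 2
    merged = [item for group in deduped_groups for item in group]  # step 3
    return dedupe(merged, key)                                   # step 4
```

The groups in step 2 are independent of one another, so in practice they could be processed in parallel; the sketch processes them sequentially for simplicity.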
The picture deduplication method of the embodiment of the invention mainly comprises two parts: deduplicating the candidate picture set, and deduplicating the deduplicated picture set against the existing picture set. The deduplication logic for the candidate picture set is as follows:
S11: Detecting the integrity of the plurality of candidate pictures in the candidate picture set. Existing tools such as OpenCV or PIL can be used to detect picture integrity; pictures that cannot be read correctly and pictures that failed to be crawled are deleted.
S12: determining a first similarity value between any two candidate pictures in the candidate pictures according to the first characteristic values of the candidate pictures; and according to the first similarity value, carrying out de-duplication processing on the candidate picture set.
This step directly deletes the repeated pictures in the candidate picture set. The first feature value is a unique feature of each picture, such as MD5, and repeated pictures are deleted directly according to it. This step can be understood as deleting nearly identical pictures, and it does not require manual secondary verification.
S13: determining a second similarity value between any two candidate pictures in the candidate picture set subjected to the deduplication processing according to second characteristic values of a plurality of candidate pictures in the candidate picture set subjected to the deduplication processing; and according to the second similarity value, performing deduplication again on the candidate pictures subjected to deduplication processing to generate a deduplication picture set.
This step indirectly deletes the repeated pictures in the candidate picture set. The second feature value is chosen so that, under some distance measure, the features of similar pictures are closer than the features of dissimilar pictures. A CNN (convolutional neural network) encoding or a hash encoding of the picture may be selected as the second feature value. This example encodes the pictures with pHash.
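The property required of the second feature value can be illustrated with a small sketch. Note that this is not pHash: for brevity it uses the simpler average hash, which sets one bit per pixel depending on whether the pixel is brighter than the mean. The input is assumed to be an already downscaled grayscale grid given as a list of rows; the function names are hypothetical.

```python
def average_hash(gray):
    """Average hash of a small grayscale grid (list of rows of 0-255 ints).

    Each pixel contributes one bit: 1 if brighter than the mean, else 0.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    code = 0
    for p in pixels:
        code = (code << 1) | (1 if p > mean else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Distance between two hash codes: the number of differing bits."""
    return bin(a ^ b).count("1")


# A picture and a uniformly brightened copy hash to the same code,
# while a picture with the opposite light/dark layout is far away.
base = [[10, 200], [30, 220]]
brighter = [[20, 210], [40, 230]]
different = [[200, 10], [220, 30]]
```

With these inputs, `hamming(average_hash(base), average_hash(brighter))` is smaller than `hamming(average_hash(base), average_hash(different))`, which is the behavior a similarity threshold relies on.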
A similarity threshold is set, and the similar pictures are displayed to the user in some manner, so that the user can choose whether to confirm deletion manually. In this example, within a group of similar pictures, one picture remains in the original path and is renamed 'A_' + original name; each of the remaining similar pictures is renamed 'R_' + original name and moved to a specific folder. The user can define the specific folder and can choose whether to manually check and correct the result.
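The renaming scheme of this example can be sketched as a function that only plans the moves, leaving the file system untouched so that the plan can be reviewed first. The function name and the choice of keeping the first picture of each group are assumptions; the embodiment does not fix which picture of a group is retained here.

```python
from pathlib import Path


def plan_renames(similar_group, review_dir):
    """Plan the 'A_' / 'R_' renames for one group of similar pictures.

    `similar_group` is a list of paths judged similar to one another; the
    first path is kept in place (an assumption of this sketch).
    Returns a list of (source, destination) path pairs.
    """
    keep, *rest = [Path(p) for p in similar_group]
    # The retained picture stays in its original folder with an 'A_' prefix.
    plan = [(keep, keep.with_name("A_" + keep.name))]
    # The remaining pictures get an 'R_' prefix and go to the review folder.
    for path in rest:
        plan.append((path, Path(review_dir) / ("R_" + path.name)))
    return plan
```

Each planned pair could then be applied with `source.rename(destination)` once the user has checked the result.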
For the candidate pictures obtained through the video crawler branch in fig. 5, videos may be downloaded manually or crawled with a crawler tool. Frames are then extracted from each video, and the extracted video frames are used as a candidate picture set. This example extracts key frames using the ffmpeg tool. The deduplication logic for candidate picture sets is then applied to the candidate picture set of extracted video frames.
The candidate pictures obtained through the other source picture branch in fig. 5 include third-party purchased data, open source data, internal data, and the like. The deduplication logic for candidate picture sets is likewise applied to the candidate picture sets obtained from this branch.
Existing picture branch. Model iteration involves not only optimizing the model structure but also continually adding and deleting data, and the data required may differ between service scenarios. Candidate pictures from the picture crawler branch, the video crawler branch, or the other source picture branch can therefore be sampled according to certain rules to form a new data set. Since the existing data is assumed to have been deduplicated and cleaned previously, no deduplication operation is performed on this part of the data.
In fig. 5, the candidate picture sets from the data source branches are subjected to duplicate removal processing twice to generate duplicate removal picture sets. The pictures in the existing picture set are pictures already stored in the system. The existing picture set can be formed by collecting the four branch data. For the duplicate removal of the duplicate removal picture set and the existing picture set, the specific steps are as follows:
S21: Detecting the integrity of the plurality of deduplicated pictures in the deduplicated picture set. Some pictures may be accidentally damaged over time or when they are moved or copied. Existing tools such as OpenCV or PIL can be used to detect picture integrity, and pictures that cannot be read correctly are deleted.
S22: determining a third similarity value between the existing picture and the duplication-removing picture according to the first characteristic value of the existing picture and the first characteristic value of the duplication-removing picture; and deleting the pictures similar to the pictures in the existing picture set in the duplication removing picture set according to the third similarity value.
This step directly deletes from the deduplicated picture set the pictures that repeat pictures in the existing picture set. A first feature value, such as MD5, is calculated for each picture, where the processed pictures include those in the deduplicated picture set and those in the existing picture set. The deduplicated pictures that repeat pictures in the existing picture set are then deleted directly according to the first feature value. This step can be understood as deleting the pictures in the branch that are nearly identical to pictures in the existing picture set, and it does not require manual secondary verification.
S23: Determining a fourth similarity value between the existing picture and a deduplicated picture in the processed deduplicated picture set according to the second feature value of the existing picture and the second feature value of that deduplicated picture; and performing deletion processing on the processed deduplicated picture set according to the fourth similarity value to obtain a picture set to be labeled.
This step indirectly deletes from the deduplicated picture set the pictures that are similar to pictures in the existing picture set. Each picture is encoded as a second feature value, chosen so that, under some distance measure, the features of similar pictures are closer than the features of dissimilar pictures. CNN or hash encoding may be selected, for example; pHash is used in this example. The processed pictures include those in the deduplicated picture set and those in the existing picture set. A similarity threshold is set, and the pictures in the deduplicated picture set that are similar to pictures in the existing picture set are displayed to the user in some manner, so that the user can choose whether to confirm deletion manually. In this example, the pictures in the branch that are similar to pictures in the existing picture set are moved to a specific folder. Similar pictures from the deduplicated picture set are named 'R_' + original picture name, and the corresponding pictures in the existing picture set are named 'A_' + original picture name; the user can choose whether to manually check and correct the result.
S24: Labeling the pictures in the picture set to be labeled.
Only the newly added data is labeled, which reduces the labeling quantity and improves labeling efficiency. Direct manual labeling or algorithm-assisted semi-supervised labeling can be selected.
S25: Adding the labeled pictures in the picture set to be labeled to the existing picture set.
The existing picture set is thereby updated: the deduplicated branch data is added to it. When data is expanded again later, data already contained in the existing picture set can be filtered out, which reduces the amount of subsequent manual labeling.
In the embodiment of the invention, both the deduplication of the candidate picture sets from each branch and the deduplication of the deduplicated picture set against the existing picture set use the following three steps: filtering out incomplete pictures; directly deleting nearly identical pictures; and filtering similar pictures. The similarity threshold can be adjusted, the feature extraction algorithm can be selected, and secondary manual verification is optional. The pictures of each branch are compared with the existing pictures, and only the pictures not already in the existing picture library are retained and added to it, which reduces the storage of repeated pictures. In addition, only the retained pictures are labeled, which further reduces the labeling cost.
Fig. 6 is a schematic structural diagram of a picture set operating apparatus according to an embodiment of the present invention, including:
a picture set obtaining module 601, configured to obtain a candidate picture set; the candidate picture set comprises a plurality of candidate pictures;
a first similarity value determining module 602, configured to determine, according to first feature values of multiple candidate pictures, a first similarity value between any two candidate pictures in the multiple candidate pictures;
a first duplicate removal module 603, configured to perform duplicate removal processing on the candidate picture set according to the first similarity value;
a second similarity value determining module 604, configured to determine, according to second feature values of multiple candidate pictures in the candidate picture set after the deduplication processing, a second similarity value between any two candidate pictures in the candidate picture set after the deduplication processing;
a second deduplication module 605, configured to perform deduplication again on the candidate picture set after the deduplication processing according to the second similarity value, so as to generate a deduplicated picture set.
In an embodiment of the present invention, the first similarity value determining module 602 is specifically configured to, for any two candidate pictures in the multiple candidate pictures: calculating a distance value between first characteristic values of the two candidate pictures; determining the distance value as a first similarity value between two candidate pictures;
the first deduplication module 603 is specifically configured to determine one of the two candidate pictures as a duplicate picture if the first similarity value is smaller than the first threshold;
duplicate pictures are deleted from the candidate picture set.
In an embodiment of the present invention, the second similarity value determining module 604 is specifically configured to, for any two candidate pictures in the processed candidate picture set: calculating a distance value between second characteristic values of the two candidate pictures; determining the distance value as a second similarity value between the two candidate pictures;
the second duplicate removal module 605 is specifically configured to determine one of the two candidate pictures as a similar picture if the second similarity value is smaller than the second threshold;
receiving operation information of a user for similar pictures;
and if the operation information indicates that the similar pictures are deleted, deleting the similar pictures from the candidate picture set after the duplicate removal processing.
In an embodiment of the present invention, the second deduplication module 605 is specifically configured to determine quality parameters of two candidate pictures;
the candidate picture with the lower quality parameter is determined to be a similar picture.
In an embodiment of the present invention, the picture set obtaining module 601 is specifically configured to obtain at least one candidate video;
for each candidate video: extracting a plurality of video frames from the candidate video;
and combining the plurality of video frames to generate a candidate picture set.
In one embodiment of the invention, the deduplicated picture set comprises a plurality of deduplicated pictures;
the picture set obtaining module 601 is further configured to obtain an existing picture set; wherein the existing picture set comprises a plurality of existing pictures;
the first similarity value determining module 602 is further configured to determine a third similarity value between the existing picture and the duplicate removal picture according to the first feature value of the existing picture and the first feature value of the duplicate removal picture;
the first deduplication module 603 is further configured to delete, according to the third similarity value, pictures in the deduplication picture set that are similar to pictures in the existing picture set;
the second similarity value determining module 604 is further configured to determine a fourth similarity value between the existing picture and the duplicate removal picture according to the second feature value of the existing picture and the second feature value of the duplicate removal picture in the processed duplicate removal picture set;
the second deduplication module 605 is further configured to perform deletion processing on the processed deduplicated picture set according to the fourth similarity value;
the device further includes an expansion module 606, configured to add a plurality of pictures in the deduplicated picture set to the existing picture set.
In an embodiment of the present invention, the first deduplication module 603 is specifically configured to delete the deduplication picture from the deduplication picture set if the third similarity value is smaller than the third threshold.
In an embodiment of the present invention, the second deduplication module 605 is specifically configured to determine one of the existing picture or the deduplication picture as the similar picture if the fourth similarity value is smaller than the fourth threshold;
receiving operation information of a user for similar pictures;
and if the operation information indicates that the similar pictures are deleted, deleting the similar pictures from the processed candidate picture set or the existing picture set.
Fig. 7 is a schematic structural diagram of a picture set labeling apparatus according to an embodiment of the present invention, including:
a picture set obtaining module 701, configured to obtain a first candidate picture set;
a first deleting module 702, configured to determine, according to first feature values of pictures in the first candidate picture set and the existing picture set, similar pictures of the first candidate picture set and the existing picture set, and delete the similar pictures from the first candidate picture set to obtain a second candidate picture set;
a second deleting module 703, configured to determine, according to second feature values of pictures in the second candidate picture set and the existing picture set, similar pictures of the second candidate picture set and the existing picture set, and delete the similar pictures from the second candidate picture set to obtain a picture set to be labeled;
and the picture labeling module 704 is configured to label the pictures in the picture set to be labeled.
An embodiment of the present invention provides an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments described above.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU)801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM803, various programs and data necessary for the operation of the system 800 are also stored. The CPU801, ROM802, and RAM803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 801.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor comprising a sending module, an obtaining module, a determining module, and a first processing module. In some cases the names of these modules do not limit the modules themselves; for example, the sending module may also be described as a "module that sends a picture acquisition request to a connected server".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following:
acquiring a candidate picture set, wherein the candidate picture set comprises a plurality of candidate pictures;
determining a first similarity value between any two candidate pictures among the plurality of candidate pictures according to first feature values of the plurality of candidate pictures;
performing deduplication processing on the candidate picture set according to the first similarity value;
determining a second similarity value between any two candidate pictures in the candidate picture set subjected to the deduplication processing according to second feature values of a plurality of candidate pictures in the candidate picture set subjected to the deduplication processing;
and performing deduplication again on the candidate picture set subjected to the deduplication processing according to the second similarity value, to generate a deduplicated picture set.
According to the technical solution of the embodiments of the present invention, the first feature value and the second feature value of a picture characterize the feature information of the picture, and the degree of similarity between two candidate pictures can be determined from their first feature values or from their second feature values. Because the method deduplicates the candidate picture set twice, once according to the first feature values and once according to the second feature values of the candidate pictures, it can effectively reduce the cases in which duplicate or similar pictures remain in the candidate picture set.
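The two deduplication passes summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiments: feature values are assumed to be numeric vectors, the similarity value is assumed to be a Euclidean distance (smaller means more similar), and the names `dedup_pass`, `two_stage_dedup` and the threshold values are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Feature = Sequence[float]


def dedup_pass(pictures: List[str], features: Dict[str, Feature],
               threshold: float) -> List[str]:
    """Keep a picture only if it is at least `threshold` away from every kept picture."""
    kept: List[str] = []
    for pic in pictures:
        # Assumption: a distance below the threshold means "too similar".
        if all(math.dist(features[pic], features[k]) >= threshold for k in kept):
            kept.append(pic)
    return kept


def two_stage_dedup(pictures: List[str],
                    first_features: Dict[str, Feature],
                    second_features: Dict[str, Feature],
                    first_threshold: float,
                    second_threshold: float) -> List[str]:
    # First pass uses the first feature values.
    after_first = dedup_pass(pictures, first_features, first_threshold)
    # Second pass runs only on the survivors and uses the second feature values.
    return dedup_pass(after_first, second_features, second_threshold)


# Hypothetical feature values for four candidate pictures.
first = {"a": (0.0, 0.0), "b": (0.0, 0.01), "c": (5.0, 5.0), "d": (9.0, 9.0)}
second = {"a": (1.0, 0.0), "b": (1.0, 0.0), "c": (1.0, 0.1), "d": (0.0, 1.0)}
result = two_stage_dedup(["a", "b", "c", "d"], first, second, 0.1, 0.5)
print(result)
```

In this toy data, picture "b" is removed by the first pass and picture "c" by the second pass. The greedy comparison against already kept pictures is one simple way to guarantee that every remaining pair is at least a threshold apart; other pairwise strategies are equally possible.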
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A picture set operation method, comprising:
acquiring a candidate picture set, wherein the candidate picture set comprises a plurality of candidate pictures;
determining a first similarity value between any two candidate pictures among the plurality of candidate pictures according to first feature values of the plurality of candidate pictures;
performing deduplication processing on the candidate picture set according to the first similarity value;
determining a second similarity value between any two candidate pictures in the candidate picture set subjected to the deduplication processing according to second feature values of a plurality of candidate pictures in the candidate picture set subjected to the deduplication processing;
and performing deduplication again on the candidate picture set subjected to the deduplication processing according to the second similarity value, to generate a deduplicated picture set.
2. The method of claim 1, wherein
the determining a first similarity value between any two candidate pictures among the plurality of candidate pictures according to first feature values of the plurality of candidate pictures comprises:
for any two candidate pictures among the plurality of candidate pictures: calculating a distance value between the first feature values of the two candidate pictures; and determining the distance value as the first similarity value between the two candidate pictures;
the performing deduplication processing on the candidate picture set according to the first similarity value comprises:
if the first similarity value is smaller than a first threshold, determining one of the two candidate pictures as a duplicate picture;
and deleting the duplicate picture from the candidate picture set.
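The pairwise distance test of claim 2 can be illustrated with a short sketch. It assumes, purely for illustration, that the first feature value is an integer fingerprint (such as a perceptual hash) and that the distance is a Hamming distance; the claim itself does not fix either choice. The function names and the threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def find_duplicates(first_values: Dict[str, int], first_threshold: int) -> Set[str]:
    """Return the pictures marked as duplicates by the pairwise distance test."""
    duplicates: Set[str] = set()
    for (name_a, val_a), (name_b, val_b) in combinations(first_values.items(), 2):
        # Skip pairs in which one picture has already been marked for deletion.
        if name_a in duplicates or name_b in duplicates:
            continue
        # A distance below the first threshold marks the later picture as the duplicate.
        if hamming(val_a, val_b) < first_threshold:
            duplicates.add(name_b)
    return duplicates


def remove_duplicates(first_values: Dict[str, int], first_threshold: int) -> List[str]:
    duplicates = find_duplicates(first_values, first_threshold)
    return [name for name in first_values if name not in duplicates]


# Hypothetical 8-bit fingerprints.
values = {"p1": 0b11110000, "p2": 0b11110001, "p3": 0b00001111}
remaining = remove_duplicates(values, first_threshold=3)
print(remaining)
```

Which of the two pictures in a pair is treated as the duplicate is a design choice; this sketch simply keeps the one that appears first.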
3. The method of claim 1, wherein
the determining a second similarity value between any two candidate pictures in the candidate picture set subjected to the deduplication processing according to second feature values of a plurality of candidate pictures in the candidate picture set subjected to the deduplication processing comprises:
for any two candidate pictures in the candidate picture set subjected to the deduplication processing: calculating a distance value between the second feature values of the two candidate pictures; and determining the distance value as the second similarity value between the two candidate pictures;
the performing deduplication again on the candidate picture set subjected to the deduplication processing according to the second similarity value comprises:
if the second similarity value is smaller than a second threshold, determining one of the two candidate pictures as a similar picture;
receiving operation information of a user for the similar picture;
and if the operation information indicates that the similar picture is to be deleted, deleting the similar picture from the candidate picture set subjected to the deduplication processing.
4. The method of claim 3, wherein
the determining one of the two candidate pictures as a similar picture comprises:
determining quality parameters of the two candidate pictures;
and determining the candidate picture with the lower quality parameter as the similar picture.
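Claims 3 and 4 combine a quality comparison with a user decision. The sketch below illustrates that flow under assumptions: the quality parameter is taken to be a single number per picture (the claims do not say how it is computed), and the user's operation information is modeled as a hypothetical callback `user_wants_delete`.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def pick_similar(pair: Tuple[str, str], quality: Dict[str, float]) -> str:
    """Return the picture of the pair with the lower quality parameter.

    On a tie the second picture of the pair is returned (an arbitrary choice).
    """
    a, b = pair
    return a if quality[a] < quality[b] else b


def review_similar(pictures: List[str],
                   similar_pairs: Iterable[Tuple[str, str]],
                   quality: Dict[str, float],
                   user_wants_delete: Callable[[str], bool]) -> List[str]:
    """Delete a similar picture only when the user's operation asks for it."""
    to_delete: Set[str] = set()
    for pair in similar_pairs:
        candidate = pick_similar(pair, quality)
        if user_wants_delete(candidate):
            to_delete.add(candidate)
    return [p for p in pictures if p not in to_delete]


# Hypothetical data: "a" and "b" fell below the second threshold.
pictures = ["a", "b", "c"]
quality = {"a": 0.9, "b": 0.4, "c": 0.7}
confirmed = review_similar(pictures, [("a", "b")], quality, lambda p: True)
declined = review_similar(pictures, [("a", "b")], quality, lambda p: False)
print(confirmed, declined)
```

Keeping the human decision behind a callback makes it easy to swap in an interactive prompt or a batch review queue without changing the selection logic.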
5. The method of claim 1, wherein
the acquiring a candidate picture set comprises:
acquiring at least one candidate video;
for each candidate video: extracting a plurality of video frames from the candidate video;
and combining the plurality of video frames to generate the candidate picture set.
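Building the candidate picture set from videos, as in claim 5, can be sketched as follows. The sketch makes strong simplifying assumptions: each video is modeled as an already decoded sequence of frame identifiers, and frames are extracted by taking every `step`-th frame. Real decoding would require a video library, and the claim does not specify the sampling strategy; `extract_frames`, `build_candidate_set` and `step` are hypothetical names.

```python
from typing import Iterable, List, Sequence


def extract_frames(video: Sequence[str], step: int) -> List[str]:
    """Take every `step`-th frame of one (already decoded) video."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return list(video[::step])


def build_candidate_set(videos: Iterable[Sequence[str]], step: int) -> List[str]:
    """Combine the frames extracted from all candidate videos into one picture set."""
    candidate_set: List[str] = []
    for video in videos:
        candidate_set.extend(extract_frames(video, step))
    return candidate_set


# Two hypothetical videos represented by frame identifiers.
videos = [
    ["v1_f0", "v1_f1", "v1_f2", "v1_f3"],
    ["v2_f0", "v2_f1", "v2_f2"],
]
candidates = build_candidate_set(videos, step=2)
print(candidates)
```

Neighboring frames of a video tend to be near-identical, which is one reason a candidate set built this way benefits from the deduplication steps of claim 1.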
6. The method of claim 1, wherein the deduplicated picture set comprises a plurality of deduplicated pictures, and the method further comprises:
acquiring an existing picture set, wherein the existing picture set comprises a plurality of existing pictures;
determining a third similarity value between an existing picture and a deduplicated picture according to the first feature value of the existing picture and the first feature value of the deduplicated picture;
deleting, from the deduplicated picture set, pictures similar to pictures in the existing picture set according to the third similarity value;
determining a fourth similarity value between an existing picture and a deduplicated picture in the deduplicated picture set subjected to the deletion processing, according to the second feature value of the existing picture and the second feature value of the deduplicated picture;
performing deduplication processing on the deduplicated picture set subjected to the deletion processing according to the fourth similarity value;
and adding the pictures in the deduplicated picture set to the existing picture set.
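The expansion of an existing picture set described in claim 6 can be sketched as two filtering passes followed by a merge. As before, this is an illustration under assumptions: feature values are numeric vectors, similarity values are Euclidean distances, and all names and thresholds are hypothetical. For brevity the second pass deletes automatically, whereas claim 8 routes that decision through user operation information.

```python
import math
from typing import Dict, List, Sequence

Feature = Sequence[float]


def filter_against_existing(new_pics: List[str], existing: List[str],
                            features: Dict[str, Feature],
                            threshold: float) -> List[str]:
    """Drop every new picture that is closer than `threshold` to any existing picture."""
    return [
        p for p in new_pics
        if all(math.dist(features[p], features[e]) >= threshold for e in existing)
    ]


def expand_existing_set(existing: List[str], deduplicated: List[str],
                        first_features: Dict[str, Feature],
                        second_features: Dict[str, Feature],
                        third_threshold: float,
                        fourth_threshold: float) -> List[str]:
    # Pass 1: compare first feature values (third similarity value).
    survivors = filter_against_existing(deduplicated, existing,
                                        first_features, third_threshold)
    # Pass 2: compare second feature values (fourth similarity value).
    survivors = filter_against_existing(survivors, existing,
                                        second_features, fourth_threshold)
    # Add what is left to the existing picture set.
    return existing + survivors


# Hypothetical feature values.
first = {"e1": (0.0, 0.0), "n1": (0.0, 0.05), "n2": (3.0, 3.0), "n3": (7.0, 7.0)}
second = {"e1": (1.0, 0.0), "n1": (1.0, 0.0), "n2": (1.0, 0.2), "n3": (0.0, 1.0)}
expanded = expand_existing_set(["e1"], ["n1", "n2", "n3"], first, second, 0.1, 0.5)
print(expanded)
```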
7. The method of claim 6, wherein
the deleting, from the deduplicated picture set, pictures similar to pictures in the existing picture set according to the third similarity value comprises:
if the third similarity value is smaller than a third threshold, deleting, from the deduplicated picture set, the pictures similar to pictures in the existing picture set.
8. The method of claim 6, wherein
the performing deduplication processing on the deduplicated picture set subjected to the deletion processing according to the fourth similarity value comprises:
if the fourth similarity value is smaller than a fourth threshold, receiving operation information of a user for the deduplicated picture;
and if the operation information indicates that the deduplicated picture is to be deleted, deleting the deduplicated picture from the candidate picture set after the deletion processing.
9. A picture set labeling method, comprising:
acquiring a first candidate picture set;
determining pictures in the first candidate picture set that are similar to pictures in an existing picture set according to first feature values of the pictures in the first candidate picture set and the existing picture set, and deleting the similar pictures from the first candidate picture set to obtain a second candidate picture set;
determining pictures in the second candidate picture set that are similar to pictures in the existing picture set according to second feature values of the pictures in the second candidate picture set and the existing picture set, and deleting the similar pictures from the second candidate picture set to obtain a picture set to be labeled;
and labeling the pictures in the picture set to be labeled.
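The labeling method of claim 9 filters new pictures against the existing set before any labeling takes place. The following sketch shows that ordering; it assumes vector feature values and Euclidean distances, and models the labeling step as a hypothetical callback `label_fn`, since the claim does not specify how labels are produced.

```python
import math
from typing import Callable, Dict, List, Sequence

Feature = Sequence[float]


def drop_similar_to_existing(candidates: List[str], existing: List[str],
                             features: Dict[str, Feature],
                             threshold: float) -> List[str]:
    """Keep only candidates that are at least `threshold` away from all existing pictures."""
    return [
        c for c in candidates
        if all(math.dist(features[c], features[e]) >= threshold for e in existing)
    ]


def select_and_label(first_candidates: List[str], existing: List[str],
                     first_features: Dict[str, Feature],
                     second_features: Dict[str, Feature],
                     first_threshold: float, second_threshold: float,
                     label_fn: Callable[[str], str]) -> Dict[str, str]:
    # First feature values yield the second candidate picture set.
    second_candidates = drop_similar_to_existing(
        first_candidates, existing, first_features, first_threshold)
    # Second feature values yield the picture set to be labeled.
    to_label = drop_similar_to_existing(
        second_candidates, existing, second_features, second_threshold)
    # Only the remaining pictures are passed to the labeling step.
    return {pic: label_fn(pic) for pic in to_label}


# Hypothetical feature values and a placeholder labeling function.
first = {"old": (0.0, 0.0), "x": (0.0, 0.02), "y": (4.0, 4.0), "z": (8.0, 8.0)}
second = {"old": (1.0, 0.0), "x": (1.0, 0.0), "y": (0.9, 0.1), "z": (0.0, 1.0)}
labels = select_and_label(["x", "y", "z"], ["old"], first, second,
                          0.1, 0.5, lambda pic: "label:" + pic)
print(labels)
```

In this toy data only one of three new pictures reaches the labeling step, which illustrates how filtering first can reduce labeling effort.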
10. A picture set operation apparatus, comprising:
a picture set acquisition module, configured to acquire a candidate picture set, wherein the candidate picture set comprises a plurality of candidate pictures;
a first similarity value determining module, configured to determine a first similarity value between any two candidate pictures among the plurality of candidate pictures according to first feature values of the plurality of candidate pictures;
a first deduplication module, configured to perform deduplication processing on the candidate picture set according to the first similarity value;
a second similarity value determining module, configured to determine a second similarity value between any two candidate pictures in the candidate picture set subjected to the deduplication processing according to second feature values of a plurality of candidate pictures in the candidate picture set subjected to the deduplication processing;
and a second deduplication module, configured to perform deduplication again on the candidate picture set subjected to the deduplication processing according to the second similarity value, to generate a deduplicated picture set.
11. A picture set labeling apparatus, comprising:
a picture set acquisition module, configured to acquire a first candidate picture set;
a first deleting module, configured to determine pictures in the first candidate picture set that are similar to pictures in an existing picture set according to first feature values of the pictures in the first candidate picture set and the existing picture set, and to delete the similar pictures from the first candidate picture set to obtain a second candidate picture set;
a second deleting module, configured to determine pictures in the second candidate picture set that are similar to pictures in the existing picture set according to second feature values of the pictures in the second candidate picture set and the existing picture set, and to delete the similar pictures from the second candidate picture set to obtain a picture set to be labeled;
and a picture labeling module, configured to label the pictures in the picture set to be labeled.
12. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.
13. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-10.
CN202011241796.1A 2020-11-09 2020-11-09 Picture set operation method, picture set labeling method and device Pending CN113780042A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011241796.1A CN113780042A (en) 2020-11-09 2020-11-09 Picture set operation method, picture set labeling method and device

Publications (1)

Publication Number Publication Date
CN113780042A true CN113780042A (en) 2021-12-10

Family

ID=78835142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011241796.1A Pending CN113780042A (en) 2020-11-09 2020-11-09 Picture set operation method, picture set labeling method and device

Country Status (1)

Country Link
CN (1) CN113780042A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649759A (en) * 2016-12-26 2017-05-10 北京珠穆朗玛移动通信有限公司 Picture processing method and mobile terminal
CN107480203A (en) * 2017-07-23 2017-12-15 北京中科火眼科技有限公司 Image data cleaning method for deduplicating identical and similar pictures
CN109145127A (en) * 2018-06-20 2019-01-04 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN110032914A (en) * 2018-01-12 2019-07-19 北京京东尚科信息技术有限公司 Method and apparatus for labeling pictures
CN110941598A (en) * 2019-12-02 2020-03-31 北京锐安科技有限公司 Data deduplication method, device, terminal and storage medium
CN111061890A (en) * 2019-12-09 2020-04-24 腾讯云计算(北京)有限责任公司 Method for verifying labeling information, method and device for determining category
CN111325245A (en) * 2020-02-05 2020-06-23 腾讯科技(深圳)有限公司 Duplicate image recognition method and device, electronic equipment and computer-readable storage medium
CN111382305A (en) * 2018-12-29 2020-07-07 广州市百果园信息技术有限公司 Video duplicate removal method and device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372933A (en) * 2023-12-06 2024-01-09 南京智绘星图信息科技有限公司 Image redundancy removing method and device and electronic equipment
CN117372933B (en) * 2023-12-06 2024-02-20 南京智绘星图信息科技有限公司 Image redundancy removing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination