CN110414625B - Method and device for determining similar data, electronic equipment and storage medium - Google Patents


Info

Publication number
CN110414625B
Authority
CN
China
Prior art keywords
data
information
similarity
meta information
meta
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910722643.XA
Other languages
Chinese (zh)
Other versions
CN110414625A (en)
Inventor
李�根
李磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910722643.XA
Publication of CN110414625A
Application granted
Publication of CN110414625B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/75 Clustering; Classification
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Abstract

An embodiment of the disclosure provides a method, an apparatus, an electronic device, and a storage medium for determining similar data. The method comprises: acquiring first data and second data to be processed, wherein the first data and the second data are both images or both videos; extracting a first content feature of the first data and a second content feature of the second data; determining a first similarity of the first data and the second data according to the first content feature and the second content feature; acquiring first meta information of the first data and second meta information of the second data; and determining whether the first data and the second data are similar according to the first similarity, the first meta information, and the second meta information. Because the final determination combines the first similarity with the first meta information and the second meta information, it can be more accurate than a determination based only on the first similarity and a preset threshold.

Description

Method and device for determining similar data, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and an apparatus for determining similar data, an electronic device, and a storage medium.
Background
In the prior art, when images or videos are deduplicated, the similarity of two images or videos is usually determined from their content and then compared with a preset threshold to judge whether the two are similar. This approach is prone to misjudgment: for example, two dissimilar videos that both use the same special effect during shooting may be judged similar. The prior-art way of judging whether two images or videos are similar is therefore not accurate enough.
Disclosure of Invention
The purpose of this disclosure is to solve at least one of the above technical drawbacks and to improve the user experience. The technical scheme adopted by the disclosure is as follows:
in a first aspect, an embodiment of the present disclosure provides a method for determining similar data, where the method includes:
acquiring first data and second data to be processed, wherein the first data and the second data are both images or videos;
extracting a first content feature of the first data and a second content feature of the second data;
determining a first similarity of the first data and the second data according to the first content feature and the second content feature;
acquiring first meta information of first data and second meta information of second data;
and determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information.
In an optional embodiment of the first aspect, the first meta information and the second meta information include at least one of the following information:
special effects, tag information, association information of the data, and history information;
wherein the association information comprises at least one of:
audio, comment information, manipulated information.
In an optional embodiment of the first aspect, if the first meta-information and the second meta-information include the same type of associated information, determining whether the first data and the second data are similar according to the first similarity, the first meta-information, and the second meta-information includes:
determining a second similarity of the first data and the second data according to the same type of associated information;
and determining whether the first data and the second data are similar according to the first similarity, the second similarity and the first meta information and the second meta information except the associated information.
In an optional embodiment of the first aspect, determining whether the first data and the second data are similar according to the first similarity, the first meta information, and the second meta information includes:
generating feature vectors of the first similarity, each item of the first meta information, and each item of the second meta information, respectively;
and inputting the generated feature vectors into a classification model, and determining whether the first data and the second data are similar based on the output of the classification model.
In an optional embodiment of the first aspect, the obtaining the first meta information of the first data and the second meta information of the second data includes:
and when the first similarity is greater than the similarity threshold value, acquiring first meta information of the first data and second meta information of the second data.
In an optional embodiment of the first aspect, the method further comprises:
determining a first judgment result whether the first data and the second data are similar according to the first similarity and a similarity threshold;
after determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information, the method further comprises:
if the first judgment result is different from the second judgment result, adjusting a similarity threshold according to the first judgment result and the second judgment result;
and the second judgment result is a result of determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information.
In an embodiment of the first aspect, adjusting the similarity threshold according to the first determination result and the second determination result includes:
and if the first judgment result is that the first data is similar to the second data and the second judgment result is that the first data is not similar to the second data, the similarity threshold is increased.
In a second aspect, an embodiment of the present disclosure provides an apparatus for determining similar data, where the apparatus includes:
the data acquisition module is used for acquiring first data and second data to be processed, wherein the first data and the second data are both images or videos;
the content feature extraction module is used for extracting a first content feature of the first data and a second content feature of the second data;
the first similarity determining module is used for determining the first similarity of the first data and the second data according to the first content feature and the second content feature;
the meta-information acquisition module is used for acquiring first meta-information of the first data and second meta-information of the second data;
and the processing module is used for determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information.
In an alternative embodiment of the second aspect, the first meta-information and the second meta-information comprise at least one of the following information:
special effects, tag information, association information of the data, and history information;
wherein the association information comprises at least one of:
audio, comment information, manipulated information.
In an embodiment of the second aspect, if the first meta-information and the second meta-information include the same type of associated information, the processing module is specifically configured to, when determining whether the first data and the second data are similar according to the first similarity, the first meta-information, and the second meta-information:
determining a second similarity of the first data and the second data according to the same type of associated information;
and determining whether the first data and the second data are similar according to the first similarity, the second similarity and the first meta information and the second meta information except the associated information.
In an embodiment of the second aspect, when determining whether the first data and the second data are similar according to the first similarity, the first meta information, and the second meta information, the processing module is specifically configured to:
generating feature vectors of the first similarity, each item of the first meta information, and each item of the second meta information, respectively;
and inputting the generated feature vectors into a classification model, and determining whether the first data and the second data are similar based on the output of the classification model.
In an embodiment of the second aspect, when acquiring the first meta information of the first data and the second meta information of the second data, the meta information acquiring module is specifically configured to:
and when the first similarity is greater than the similarity threshold value, acquiring first meta information of the first data and second meta information of the second data.
In an optional embodiment of the second aspect, the processing module is further configured to:
determining a first judgment result whether the first data and the second data are similar according to the first similarity and a similarity threshold;
the apparatus further includes a threshold adjustment module specifically configured to:
after whether the first data and the second data are similar is determined according to the first similarity, the first meta information and the second meta information, if the first judgment result is different from the second judgment result, the similarity threshold value is adjusted according to the first judgment result and the second judgment result;
and the second judgment result is a result of determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information.
In an embodiment of the second aspect, when the threshold adjustment module adjusts the similarity threshold according to the first determination result and the second determination result, the threshold adjustment module is specifically configured to:
and if the first judgment result is that the first data is similar to the second data and the second judgment result is that the first data is not similar to the second data, the similarity threshold is increased.
In a third aspect, the present disclosure provides an electronic device comprising a processor and a memory;
a memory for storing computer operating instructions;
a processor for performing the method as shown in any of the first aspect of the embodiments of the present disclosure by invoking computer operational instructions.
In a fourth aspect, the present disclosure provides a computer readable storage medium having stored thereon at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement a method as shown in any one of the first aspect of embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure has the following beneficial effects:
in the embodiment of the present disclosure, after the first similarity between the first data and the second data is determined according to the first content feature and the second content feature, it may further be combined with the first meta information of the first data and the second meta information of the second data to finally determine whether the first data and the second data are similar.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings that are required to be used in the description of the embodiments of the present disclosure will be briefly described below.
FIG. 1 is a schematic flow chart diagram of a method for determining similar data in an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of an apparatus for determining similar data according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of an electronic device in an embodiment of the disclosure.
Detailed Description
Reference will now be made in detail to the embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the technical solutions of the present disclosure, and are not to be construed as limiting the present disclosure.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
The following describes the technical solutions of the present disclosure and how to solve the above technical problems in specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.
Embodiments of the present disclosure provide a method of determining similar data which, as shown in FIG. 1, may include:
step S110, obtaining first data and second data to be processed, where the first data and the second data are both images or both videos.
That is, the two items of data to be processed are of the same type, either both videos or both images: when the first data is a video, the second data is also a video, and when the first data is an image, the second data is also an image.
Step S120, a first content feature of the first data and a second content feature of the second data are extracted.
Step S130, determining a first similarity between the first data and the second data according to the first content feature and the second content feature.
The content feature is associated with the image; when the data is a video, the content feature of the video is associated with the video frame images in the video. For example, the content feature may include the type of object appearing in the image, text in the image, and so on. The content feature may be extracted with a pre-trained neural network model that takes the image as input and outputs a content feature vector of the image; the neural network model may be a bidirectional long short-term memory (BiLSTM) network.
In practical application, the content feature of the first data and the content feature of the second data may be vectorized to obtain a content feature vector of the first data and a content feature vector of the second data. Each component of the content feature vector of the first data corresponds to one item of the content feature of the first data, and each component of the content feature vector of the second data corresponds to one item of the content feature of the second data. The similarity between the two content feature vectors is then calculated and taken as the first similarity of the first data and the second data.
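The vector comparison described above can be sketched as follows. The disclosure does not name a specific similarity measure, so cosine similarity is used here purely as an assumption, and the function name `cosine_similarity` and the example vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical content feature vectors; in practice they would come from
# the pre-trained feature extraction model.
first_content_vec = [1.0, 0.0, 2.0]
second_content_vec = [1.0, 0.0, 2.0]
first_similarity = cosine_similarity(first_content_vec, second_content_vec)
```

Any other vector similarity (for example one derived from Euclidean distance) could be substituted without changing the rest of the method.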
In step S140, first meta information of the first data and second meta information of the second data are obtained.
In an optional embodiment of the disclosure, the first and second meta-information comprise at least one of:
special effects, tag information, association information of the data, and history information;
wherein the association information comprises at least one of:
audio, comment information, manipulated information.
The special effect indicates whether the first data and the second data have a special effect and, if so, which specific special effect it is. For example, if the first data and the second data are both videos and a "rabbit sticker" was added to a video during recording, the special-effect meta information of that video is "rabbit sticker".
The history information indicates whether the author of the first data or the author of the second data has carried other data between a certain time node and the current time node. For example, if the first data and the second data are both videos, the history information records whether the author who shot the video has carried other videos in that period and, if so, the identifiers of the carried videos.
The tag information is a tag that identifies information about the data; that is, some information about the data can be learned from the tag information. For example, when the first data and the second data are videos about a game, their tag information may include a tag identifying them as game videos and a tag identifying the specific game.
The association information of the data is information that further describes the data, such as audio, comment information, and manipulated information. The manipulated information describes operations performed on the data between a certain time node and the current time node. For example, when the data is a video, the manipulated information may record whether the video was edited into another video in that period and, if so, the time of the editing, the identifier of the resulting video, and so on.
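One possible in-memory layout for the meta information described above is sketched below. All class and field names (`MetaInfo`, `AssociationInfo`, `carried_ids`, and so on) are hypothetical; the disclosure lists the kinds of meta information but does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AssociationInfo:
    """Association information: audio, comments, manipulated information."""
    audio_id: Optional[str] = None
    comments: List[str] = field(default_factory=list)
    # Identifiers of videos this item was edited into (manipulated information).
    edited_into: List[str] = field(default_factory=list)


@dataclass
class MetaInfo:
    """Meta information of one image or video."""
    special_effect: Optional[str] = None  # None means no special effect
    tags: List[str] = field(default_factory=list)
    association: AssociationInfo = field(default_factory=AssociationInfo)
    # History information: identifiers of data the author has carried.
    carried_ids: List[str] = field(default_factory=list)


first_meta = MetaInfo(special_effect="rabbit sticker", tags=["game"])
second_meta = MetaInfo(special_effect="rabbit sticker", tags=["game"])
same_effect = first_meta.special_effect == second_meta.special_effect
```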
And step S150, determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information.
That is, after acquiring the first meta information of the first data and the second meta information of the second data, the acquired first meta information of the first data and the second meta information of the second data may be combined with the determined first similarity, and finally it may be determined whether the first data and the second data are similar.
In the embodiment of the disclosure, after the first similarity between the first data and the second data is determined according to the first content feature and the second content feature, the first meta information of the first data and the second meta information of the second data may further be combined to finally determine whether the first data and the second data are similar. Compared with the prior-art approach of deciding based only on the first similarity and a preset threshold, this can improve the accuracy of the determination.
In an optional embodiment of the present disclosure, if the first meta-information and the second meta-information include the same type of associated information, determining whether the first data and the second data are similar according to the first similarity, the first meta-information, and the second meta-information, includes:
determining a second similarity of the first data and the second data according to the same type of associated information;
and determining whether the first data and the second data are similar according to the first similarity, the second similarity and the first meta information and the second meta information except the associated information.
That the first meta information and the second meta information include the same type of associated information means that a type of associated information present in the first meta information is also present in the second meta information. For example, if the associated information in the first meta information includes audio and the associated information in the second meta information also includes audio, the two include the same type of associated information, namely audio.
Correspondingly, after the first meta information and the second meta information are determined to both include the same type of associated information, the second similarity of the first data and the second data can be determined according to the same type of associated information in the first meta information and the second meta information.
In practical application, the similarity of each type of associated information in the first meta information and the second meta information may be determined, and then the second similarity of the first data and the second data may be obtained based on the similarity of each type of associated information. For example, the associated information in the first meta information includes audio and comment information, and the associated information in the second meta information also includes audio and comment information, at this time, the similarity between the audio in the first meta information and the audio in the second meta information and the similarity between the comment information in the first meta information and the comment information in the second meta information may be determined, and then the second similarity between the first data and the second data may be determined based on the similarity between the audio and the similarity between the comment information. Of course, in practical applications, a total similarity may also be directly calculated based on each same type of associated information in the first meta information and the second meta information as the second similarity of the first data and the second data.
Accordingly, after determining the second similarity of the first data and the second data, the second similarity may be combined with the first similarity and the first meta information and the second meta information, excluding the association information, to further determine whether the first data and the second data are similar.
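The per-type computation of the second similarity can be sketched as below. This is only an illustration under stated assumptions: associated information is represented as a dictionary from type name to a collection of tokens, each type is compared with Jaccard similarity, and the per-type scores are averaged. None of these choices, nor the names `jaccard` and `second_similarity`, come from the disclosure.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def second_similarity(first_assoc, second_assoc):
    """Average the per-type similarity over the types both sides share.

    Returns None when the two sides share no type of associated information.
    """
    shared_types = sorted(set(first_assoc) & set(second_assoc))
    if not shared_types:
        return None
    scores = [jaccard(first_assoc[t], second_assoc[t]) for t in shared_types]
    return sum(scores) / len(scores)


# Hypothetical associated information of two videos.
first_assoc = {"audio": ["track-1"], "comment": ["funny", "cat"]}
second_assoc = {"audio": ["track-1"], "comment": ["funny", "dog"]}
score = second_similarity(first_assoc, second_assoc)
```

Here the audio matches exactly and one of three distinct comment tokens is shared, so the averaged score lies between the two per-type values.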
In an alternative embodiment of the present disclosure, determining whether the first data and the second data are similar according to the first similarity, the first meta information, and the second meta information includes:
generating feature vectors of the first similarity, each item of the first meta information, and each item of the second meta information, respectively;
and inputting the generated feature vectors into a classification model, and determining whether the first data and the second data are similar based on the output of the classification model.
In practical application, feature vectors of the first similarity, each item of the first meta information, and each item of the second meta information may be generated; that is, these are converted into feature-vector representations. The generated feature vectors are then input into the classification model, which determines from them whether the first data and the second data are similar. For example, if the first data and the second data are similar, the model may output information indicating that they are similar.
Alternatively, the feature vectors need not be generated in advance: the first similarity, each item of the first meta information, and each item of the second meta information may be input directly into the classification model, which generates the feature vectors itself and determines from them whether the first data and the second data are similar.
The embodiments of the present disclosure do not specifically limit the type of the classification model; it may be, for example, a decision tree classification model or a multivariate Bernoulli model.
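A minimal sketch of the feature-vector step followed by a classifier is given below. The feature layout, the dictionary-based meta information, and the hand-written two-level rule (which merely stands in for a trained decision tree) are all assumptions made for illustration, as are the 0.5 cut-off and the function names.

```python
def build_feature_vector(first_similarity, first_meta, second_meta):
    """Encode the first similarity and both meta records as numbers."""
    effect_a = first_meta.get("special_effect")
    effect_b = second_meta.get("special_effect")
    same_effect = effect_a is not None and effect_a == effect_b
    shared_tags = set(first_meta.get("tags", [])) & set(second_meta.get("tags", []))
    return [first_similarity, 1.0 if same_effect else 0.0, float(len(shared_tags))]


def rule_classifier(features):
    """Stand-in for a trained decision tree: returns True if judged similar."""
    similarity, same_effect, shared_tag_count = features
    if similarity < 0.5:  # illustrative cut-off, not from the disclosure
        return False
    if same_effect and shared_tag_count == 0:
        # High content similarity may be explained by the shared special
        # effect alone, so the pair is not treated as similar.
        return False
    return True


video_a = {"special_effect": "rabbit sticker", "tags": ["game"]}
video_b = {"special_effect": "rabbit sticker", "tags": ["cooking"]}
video_c = {"special_effect": "rabbit sticker", "tags": ["game"]}

a_vs_b = rule_classifier(build_feature_vector(0.7, video_a, video_b))
a_vs_c = rule_classifier(build_feature_vector(0.7, video_a, video_c))
```

In a real system the rule would be replaced by a model trained on labeled pairs; the sketch only shows how the inputs named in the text could be combined into one vector.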
In an optional embodiment of the present disclosure, the obtaining the first meta information of the first data and the second meta information of the second data includes:
and when the first similarity is greater than the similarity threshold value, acquiring first meta information of the first data and second meta information of the second data.
In practical application, after the first similarity of the first data and the second data is determined according to the first content feature and the second content feature, it may be determined whether the first similarity is greater than a similarity threshold (that is, a preliminary threshold-based judgment of whether the two items of data are similar). If the first similarity is greater than the similarity threshold, the first data and the second data are preliminarily judged to be similar. To make the result more accurate and reduce misjudgment, the first meta information of the first data and the second meta information of the second data may then be acquired and combined with the first similarity to further determine whether the first data and the second data are similar.
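The gating described above, where meta information is fetched only for pairs that pass the threshold, might look like the following sketch. The function `judge_pair` and its callable parameters are hypothetical, and treating a pair at or below the threshold as not similar is an assumption; the text only describes the case above the threshold.

```python
def judge_pair(first_similarity, threshold, fetch_meta, classify):
    """Fetch meta information and run the final check only above the threshold."""
    if first_similarity <= threshold:
        # Assumption: pairs that do not pass the threshold are not similar.
        return False
    first_meta, second_meta = fetch_meta()
    return classify(first_similarity, first_meta, second_meta)


fetch_calls = []


def fake_fetch_meta():
    """Stand-in for a meta information lookup; records each call."""
    fetch_calls.append(1)
    return {"tags": ["game"]}, {"tags": ["game"]}


def fake_classify(similarity, first_meta, second_meta):
    """Stand-in classifier: similar if the two items share any tag."""
    return bool(set(first_meta["tags"]) & set(second_meta["tags"]))


below = judge_pair(0.4, 0.6, fake_fetch_meta, fake_classify)
above = judge_pair(0.7, 0.6, fake_fetch_meta, fake_classify)
```

Deferring the lookup this way avoids fetching meta information for pairs whose content is already clearly different.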
In an optional embodiment of the disclosure, the method further comprises:
determining a first judgment result whether the first data and the second data are similar according to the first similarity and a similarity threshold;
after determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information, the method further comprises the following steps:
if the first judgment result is different from the second judgment result, adjusting a similarity threshold according to the first judgment result and the second judgment result;
and the second judgment result is a result of determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information.
In practical application, if the first judgment result, obtained from the first similarity and the similarity threshold, differs from the second judgment result, obtained from the first similarity, the first meta information, and the second meta information, the current value of the similarity threshold is not sufficiently accurate. In that case, the similarity threshold may be adjusted based on the second judgment result to improve the accuracy of subsequent judgments.
In an optional embodiment of the present disclosure, adjusting the similarity threshold according to the first determination result and the second determination result includes:
if the first judgment result is that the first data is similar to the second data and the second judgment result is that the first data is not similar to the second data, the similarity threshold is increased.
In practical applications, if the first judgment result is that the first data is similar to the second data, but the second judgment result is that the first data is not similar to the second data, the currently set value of the similarity threshold is too small, and it may be appropriately increased.
In an example, suppose the first similarity of the first data and the second data is 0.7 and the currently set value of the similarity threshold is 0.6. The first similarity is greater than the similarity threshold, so the first judgment result is that the first data and the second data are similar. If the second judgment result, determined based on the first similarity and the obtained first meta information and second meta information, is that the first data and the second data are not similar, the currently set value of the similarity threshold is too small and may be appropriately increased; for example, it may be set to 0.8.
Of course, in practical applications, it may also happen that the first judgment result is that the first data and the second data are not similar, but the second judgment result is that the first data and the second data are similar. This indicates that the currently set value of the similarity threshold is too large, and it may be appropriately reduced.
In an example, suppose the first similarity of the first data and the second data is 0.6 and the currently set value of the similarity threshold is 0.7. The first similarity is smaller than the similarity threshold, so the first judgment result is that the first data and the second data are not similar. If the second judgment result, determined based on the first similarity and the obtained first meta information and second meta information, is that the first data and the second data are similar, the currently set value of the similarity threshold is too large and may be appropriately reduced; for example, it may be set to 0.5.
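As an illustrative, non-limiting sketch, the adjustment described in the two examples above could be written as follows. The function name `adjust_threshold`, the fixed `step` parameter, and the clamping to the range [0, 1] are assumptions made for illustration; the disclosure does not prescribe how large the adjustment should be.

```python
def adjust_threshold(threshold, first_similar, second_similar, step=0.1):
    """Return an updated similarity threshold after comparing two judgments.

    first_similar:  result of comparing the first similarity to the threshold.
    second_similar: result that also takes the meta information into account.
    step:           hypothetical adjustment size; the source does not fix one.
    """
    if first_similar == second_similar:
        return threshold  # the judgments agree, so nothing needs correcting
    if first_similar and not second_similar:
        new_threshold = threshold + step  # threshold was too small
    else:
        new_threshold = threshold - step  # threshold was too large
    # Clamp to [0, 1], assuming similarities are normalised to that range,
    # and round to hide floating-point noise such as 0.49999999999999994.
    return round(min(1.0, max(0.0, new_threshold)), 6)


# Mirrors the two worked examples, with a step of 0.2 chosen to match them.
print(adjust_threshold(0.6, first_similar=True, second_similar=False, step=0.2))
print(adjust_threshold(0.7, first_similar=False, second_similar=True, step=0.2))
```

A fixed step is only one possible choice; the embodiments that follow derive the step from the matching meta information instead.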
In an optional embodiment of the present disclosure, adjusting the similarity threshold according to the first determination result and the second determination result further includes:
determining the meta information that is of the same type in the first data and the second data and whose similarity is greater than a set value;
and adjusting the similarity threshold based on the determined meta information.
Meta information that is of the same type and has a similarity greater than the set value means that the first data and the second data both contain meta information of a given type, and that the similarity between the two pieces of meta information of that type is greater than the set value (i.e., they are similar). For example, suppose the first data and the second data are two videos to be processed, both videos contain meta information of the special effect type, and in both cases the special effect is a rabbit-themed sticker. If the similarity between the rabbit stickers in the first data and the second data is greater than the set value, the special effect meta information is meta information of the same type with a similarity greater than the set value.
Further, the similarity threshold may be adjusted based on the meta information determined in this way. The specific manner of performing this adjustment may be preconfigured according to actual needs and is not limited by the embodiments of the present disclosure.
As an alternative embodiment, a mapping relationship between quantities and adjustment step sizes may be preconfigured. The number of pieces of meta information that are of the same type in the first data and the second data and have a similarity greater than the set value is then determined, the adjustment step size corresponding to that number is looked up, and the currently set value of the similarity threshold is adjusted by the determined step size.
In one example, suppose the mapping relationship between quantities and adjustment step sizes is as follows: the adjustment step size corresponding to the quantity 1 is 0.2, the adjustment step size corresponding to the quantity 2 is 0.3, and the currently set value of the similarity threshold is 0.6. If the number of pieces of meta information that are of the same type in the first data and the second data and have a similarity greater than the set value is 2, the adjustment step size corresponding to the quantity 2 is determined to be 0.3 based on the mapping relationship, and the currently set value of the similarity threshold is then adjusted to 0.9.
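A minimal sketch of the quantity-to-step lookup is given below for illustration only. The names `STEP_BY_COUNT` and `step_for_count` are hypothetical, and the fallback behaviour for quantities that have no configured entry is an assumption, since the disclosure only gives entries for the quantities 1 and 2. The sketch returns the step size only; how the step is applied to the threshold is left to the caller.

```python
# Illustrative mapping from the number of matching meta-information types
# to an adjustment step; 0.2 and 0.3 are the values used in the example.
STEP_BY_COUNT = {1: 0.2, 2: 0.3}


def step_for_count(count, mapping=STEP_BY_COUNT):
    """Return the adjustment step configured for `count` matching types."""
    # Assumption: a count with no exact entry falls back to the largest
    # configured count that does not exceed it; no match means no adjustment.
    eligible = [k for k in mapping if k <= count]
    if not eligible:
        return 0.0
    return mapping[max(eligible)]


print(step_for_count(2))
```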
As another alternative embodiment, a weight and a corresponding adjustment step size may be set for each type of meta information. When the number of pieces of meta information that are of the same type in the first data and the second data and have a similarity greater than the set value is 2 or more, reference meta information is determined based on the weights (i.e., the weights decide which piece of meta information supplies the adjustment step size), and the currently set value of the similarity threshold is then adjusted by the adjustment step size corresponding to the reference meta information. Of course, if there is only 1 such piece of meta information, the currently set value of the similarity threshold may be adjusted directly by the adjustment step size corresponding to that piece of meta information.
In an example, suppose the currently set value of the similarity threshold is 0.6; the weight of meta information of the special effect type is 0.6, with a corresponding adjustment step size of 0.3; and the weight of meta information of the comment information type is 0.3, with a corresponding adjustment step size of 0.2. If the meta information that is of the same type in the first data and the second data and has a similarity greater than the set value consists of the special effect meta information and the comment information meta information (i.e., the quantity is 2), then, because the weight of the special effect type is greater than the weight of the comment information type, the threshold is adjusted by the adjustment step size corresponding to the special effect type, and the currently set value of the similarity threshold can be adjusted to 0.9.
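For illustration, the weight-based selection of the reference meta information might be sketched as follows. The class `MetaTypeConfig`, the dictionary keys, and the function `reference_step` are hypothetical names; the weights and step sizes are those of the example above, and the threshold is increased because that is the direction used in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaTypeConfig:
    weight: float  # importance of this meta-information type
    step: float    # adjustment step associated with this type


# Values taken from the example above; the key names are illustrative.
CONFIG = {
    "special_effect": MetaTypeConfig(weight=0.6, step=0.3),
    "comment": MetaTypeConfig(weight=0.3, step=0.2),
}


def reference_step(matching_types, config=CONFIG):
    """Return the step of the highest-weight matching meta-information type."""
    if not matching_types:
        return 0.0  # nothing matched, so no adjustment is derived
    reference = max(matching_types, key=lambda t: config[t].weight)
    return config[reference].step


threshold = 0.6
step = reference_step(["special_effect", "comment"])
# Rounded to hide floating-point noise in 0.6 + 0.3.
print(round(threshold + step, 6))
```

With a single matching type, `max` simply returns that type, which corresponds to the case in which the step size of the only matching meta information is used directly.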
Based on the same principle as the method shown in fig. 1, an embodiment of the present disclosure further provides an apparatus 30 for determining similar data. As shown in fig. 2, the apparatus 30 for determining similar data may include a data obtaining module 310, a content feature extracting module 320, a first similarity determining module 330, a meta information obtaining module 340, and a processing module 350, wherein:
the data acquisition module is used for acquiring first data and second data to be processed, wherein the first data and the second data are both images or videos;
the content feature extraction module is used for extracting a first content feature of the first data and a second content feature of the second data;
the first similarity determining module is used for determining the first similarity of the first data and the second data according to the first content characteristics and the second content characteristics;
the meta information acquisition module is used for acquiring first meta information of the first data and second meta information of the second data;
and the processing module is used for determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information.
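The measure used by the first similarity determining module is not fixed by the description above. As one possible sketch, and purely as an assumption, the first similarity could be computed as the cosine similarity between two content feature vectors of equal length; the function name `cosine_similarity` is illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length content feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero feature vector as dissimilar
    return dot / (norm_u * norm_v)


first_feature = [0.2, 0.8, 0.1]
second_feature = [0.25, 0.7, 0.05]
print(cosine_similarity(first_feature, second_feature))
```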
In an optional embodiment of the disclosure, the first meta information and the second meta information include at least one of the following information:
special effects, tag information, data association information and historical information;
wherein the association information comprises at least one of:
audio, comment information, manipulated information.
In an optional embodiment of the present disclosure, if the first meta-information and the second meta-information include the same type of associated information, the processing module is specifically configured to, when determining whether the first data and the second data are similar according to the first similarity, the first meta-information, and the second meta-information:
determining a second similarity of the first data and the second data according to the same type of associated information;
and determining whether the first data and the second data are similar according to the first similarity, the second similarity and the first meta information and the second meta information except the associated information.
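As a hedged illustration of how a second similarity might be derived from associated information of the same type, the sketch below compares each shared type with the Jaccard index and averages the results. The Jaccard index, the averaging, and the names `jaccard` and `second_similarity` are assumptions; the disclosure does not specify how the second similarity is computed.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections (1.0 if both are empty)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def second_similarity(assoc_a, assoc_b):
    """Average Jaccard similarity over the association types both items share."""
    shared = sorted(set(assoc_a) & set(assoc_b))
    if not shared:
        return None  # no same-type associated information to compare
    return sum(jaccard(assoc_a[t], assoc_b[t]) for t in shared) / len(shared)


# Hypothetical associated information for two videos.
video_a = {"audio": ["track_17"], "comments": ["cute", "rabbit", "funny"]}
video_b = {"audio": ["track_17"], "comments": ["cute", "rabbit"]}
print(second_similarity(video_a, video_b))
```

Returning `None` when no association type is shared lets the caller fall back to the first similarity and the remaining meta information.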
In an optional embodiment of the present disclosure, when determining whether the first data and the second data are similar according to the first similarity, the first meta information, and the second meta information, the processing module is specifically configured to:
respectively generating feature vectors of the first similarity, each piece of first meta information and each piece of second meta information;
and inputting the generated feature vectors into a classification model, and determining whether the first data and the second data are similar based on the output of the classification model.
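The classification step can be sketched, under stated assumptions, with a small logistic-regression-style model written in plain Python. The feature construction below is a simplification: it collapses each pair of meta information fields into a single comparison feature rather than encoding every field separately. The function names, the field names `special_effect` and `tags`, and the weights in the usage example are hypothetical and are not trained values.

```python
import math


def build_features(first_similarity, meta_a, meta_b):
    """Concatenate the first similarity with simple per-type meta features."""
    effect_a = meta_a.get("special_effect")
    same_effect = 1.0 if effect_a is not None and effect_a == meta_b.get("special_effect") else 0.0
    tags_a, tags_b = set(meta_a.get("tags", [])), set(meta_b.get("tags", []))
    union = tags_a | tags_b
    tag_overlap = len(tags_a & tags_b) / len(union) if union else 0.0
    return [first_similarity, same_effect, tag_overlap]


def predict_similar(features, weights, bias, cutoff=0.5):
    """Logistic-regression style classifier with caller-supplied parameters."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability >= cutoff


# Hypothetical parameters chosen only to make the example concrete.
weights, bias = [4.0, 1.5, 1.5], -4.0
meta = {"special_effect": "rabbit_sticker", "tags": ["pet"]}
print(predict_similar(build_features(0.7, meta, meta), weights, bias))
print(predict_similar(build_features(0.7, {}, {}), weights, bias))
```

In practice the parameters would come from a model trained on labelled pairs; any classifier that accepts a numeric feature vector could take the place of `predict_similar`.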
In an optional embodiment of the present disclosure, when acquiring the first meta information of the first data and the second meta information of the second data, the meta information acquiring module is specifically configured to:
and when the first similarity is greater than the similarity threshold value, acquiring first meta information of the first data and second meta information of the second data.
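The gating described above can be illustrated with a short sketch in which meta information is fetched only for pairs that pass the threshold. The function `decide` and its callback parameters `fetch_meta` and `judge_with_meta` are hypothetical names introduced for illustration.

```python
def decide(first_similarity, threshold, fetch_meta, judge_with_meta):
    """Two-stage decision that fetches meta information only when needed."""
    if first_similarity <= threshold:
        return False  # content features alone rule the pair out
    # Only pairs that pass the threshold pay the cost of the meta lookup.
    meta_a, meta_b = fetch_meta()
    return judge_with_meta(first_similarity, meta_a, meta_b)


def fetch_example_meta():
    # Stand-in for a real lookup of the two items' meta information.
    return {"special_effect": "rabbit_sticker"}, {"special_effect": "rabbit_sticker"}


def same_effect(first_similarity, meta_a, meta_b):
    # Stand-in second-stage judgment: require matching special effects.
    return meta_a.get("special_effect") == meta_b.get("special_effect")


print(decide(0.7, 0.6, fetch_example_meta, same_effect))
print(decide(0.4, 0.6, fetch_example_meta, same_effect))
```

Deferring the lookup in this way avoids acquiring meta information for pairs whose content features already indicate that they are not similar.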
In an optional embodiment of the present disclosure, the processing module is further configured to:
determining a first judgment result whether the first data and the second data are similar according to the first similarity and a similarity threshold;
the apparatus further includes a threshold adjustment module specifically configured to:
after determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information, if the first judgment result is different from the second judgment result, adjusting a similarity threshold according to the first judgment result and the second judgment result;
wherein the second judgment result is the result of determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information.
In an optional embodiment of the present disclosure, when the threshold adjustment module adjusts the similarity threshold according to the first determination result and the second determination result, the threshold adjustment module is specifically configured to:
if the first judgment result is that the first data is similar to the second data and the second judgment result is that the first data is not similar to the second data, the similarity threshold is increased.
The apparatus for determining similar data according to the embodiments of the present disclosure may perform the method for determining similar data provided by the embodiments of the present disclosure, and its implementation principle is similar. The actions performed by the modules of the apparatus correspond to the steps of the method; for a detailed functional description of each module, reference may be made to the description of the corresponding method above, which is not repeated here.
Based on the same principle as the method shown in the embodiments of the present disclosure, embodiments of the present disclosure also provide an electronic device, which may include but is not limited to: a processor and a memory; a memory for storing computer operating instructions; and the processor is used for executing the method shown in the embodiment by calling the computer operation instruction.
Based on the same principle as the method shown in the embodiment of the present disclosure, a computer-readable storage medium is further provided in the embodiment of the present disclosure, where the computer-readable storage medium stores at least one instruction, at least one program, a code set, or a set of instructions, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the method shown in the embodiment, which is not described herein again.
In the embodiment of the present disclosure, referring to fig. 3, a schematic structural diagram of an electronic device 500 suitable for implementing an embodiment of the present disclosure is shown, where the electronic device 500 may be a terminal device or a server. The terminal device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle-mounted terminal (e.g., a car navigation terminal), etc., and a fixed terminal such as a digital TV, a desktop computer, etc., among others. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 3, electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself; for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other combinations of the features described above or their equivalents without departing from the spirit of the disclosure. One example is a technical solution formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.

Claims (8)

1. A method of determining similar data, comprising:
acquiring first data and second data to be processed, wherein the first data and the second data are both images or videos;
extracting a first content feature of the first data and a second content feature of the second data;
determining a first similarity of the first data and the second data according to the first content feature and the second content feature;
acquiring first meta information of the first data and second meta information of the second data, including: when the first similarity is larger than a similarity threshold value, acquiring first meta information of the first data and second meta information of the second data;
determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information;
wherein, if the first meta-information and the second meta-information include the same type of associated information, determining whether the first data and the second data are similar according to the first similarity, the first meta-information, and the second meta-information includes:
determining a second similarity of the first data and the second data according to the associated information of the same type;
determining whether the first data and the second data are similar according to the first similarity, the second similarity, and the first meta information and the second meta information except the associated information;
wherein the association information comprises at least one of: audio, comment information, manipulated information.
2. The method of claim 1, wherein the first and second meta-information comprises at least one of:
special effects, tag information, data association information and history information.
3. The method of claim 1, wherein the determining whether the first data and the second data are similar according to the first similarity, the first meta information, and the second meta information comprises:
respectively generating feature vectors of the first similarity, each piece of first meta information and each piece of second meta information;
inputting the generated feature vectors into a classification model, and determining whether the first data and the second data are similar based on the output of the classification model.
4. The method of claim 1, further comprising:
determining a first judgment result whether the first data and the second data are similar according to the first similarity and a similarity threshold;
after determining whether the first data and the second data are similar according to the first similarity, the first meta information, and the second meta information, the method further includes:
if the first judgment result is different from the second judgment result, adjusting the similarity threshold according to the first judgment result and the second judgment result;
wherein the second judgment result is the result of determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information.
5. The method of claim 4, wherein the adjusting the similarity threshold according to the first determination result and the second determination result comprises:
if the first judgment result is that the first data is similar to the second data and the second judgment result is that the first data is not similar to the second data, the similarity threshold is increased.
6. An apparatus for determining similar data, comprising:
the data acquisition module is used for acquiring first data and second data to be processed, wherein the first data and the second data are both images or videos;
the content feature extraction module is used for extracting a first content feature of the first data and a second content feature of the second data;
a first similarity determining module, configured to determine a first similarity between the first data and the second data according to the first content feature and the second content feature;
a meta information obtaining module, configured to obtain first meta information of the first data and second meta information of the second data, including: when the first similarity is larger than a similarity threshold value, acquiring first meta information of the first data and second meta information of the second data;
the processing module is used for determining whether the first data and the second data are similar according to the first similarity, the first meta information and the second meta information;
wherein, if the first meta information and the second meta information include the same type of associated information, the processing module is specifically configured to: determining a second similarity of the first data and the second data according to the same type of associated information; determining whether the first data and the second data are similar according to the first similarity, the second similarity, and the first meta information and the second meta information except the associated information; wherein the association information comprises at least one of: audio, comment information, manipulated information.
7. An electronic device, comprising:
a processor and a memory;
the memory for storing a computer program;
the processor configured to execute the method of any one of claims 1 to 5 by calling the computer program.
8. A computer-readable storage medium, characterized in that it stores a computer program which is loaded and executed by a processor to implement the method of any one of claims 1 to 5.
CN201910722643.XA 2019-08-06 2019-08-06 Method and device for determining similar data, electronic equipment and storage medium Active CN110414625B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910722643.XA CN110414625B (en) 2019-08-06 2019-08-06 Method and device for determining similar data, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110414625A CN110414625A (en) 2019-11-05
CN110414625B true CN110414625B (en) 2022-11-08

Family

ID=68366154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910722643.XA Active CN110414625B (en) 2019-08-06 2019-08-06 Method and device for determining similar data, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110414625B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851653A (en) * 2019-11-08 2020-02-28 上海摩象网络科技有限公司 Method and device for shooting material mark and electronic equipment
CN113065619A (en) * 2021-06-03 2021-07-02 明品云(北京)数据科技有限公司 Data processing method, data processing device, computer readable storage medium and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004265120A (en) * 2003-02-28 2004-09-24 Sony Corp Image processor and processing method, storage medium, and program
JP2005208686A (en) * 2004-01-19 2005-08-04 Nippon Telegr & Teleph Corp <Ntt> Weighted mata data management method, weighted mata data management apparatus, weighted mata data management program and recording medium with program recorded thereon
JP2015153021A (en) * 2014-02-12 2015-08-24 日本放送協会 Link information generation device and program

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9195640B1 (en) * 2009-01-12 2015-11-24 Sri International Method and system for finding content having a desired similarity
JP2011217197A (en) * 2010-03-31 2011-10-27 Sony Corp Electronic apparatus, reproduction control system, reproduction control method, and program thereof
US8311973B1 (en) * 2011-09-24 2012-11-13 Zadeh Lotfi A Methods and systems for applications for Z-numbers
US20160196478A1 (en) * 2013-09-03 2016-07-07 Samsung Electronics Co., Ltd. Image processing method and device
CA2885835A1 (en) * 2014-04-04 2015-10-04 Image Searcher, Inc. Image processing server
US9881084B1 (en) * 2014-06-24 2018-01-30 A9.Com, Inc. Image match based video search
CN107644364A (en) * 2017-09-18 2018-01-30 北京京东尚科信息技术有限公司 Object filter method and system
CN109857908B (en) * 2019-03-04 2021-04-09 北京字节跳动网络技术有限公司 Method and apparatus for matching videos

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《Quality assurance in large collections of video sequences》;L. Polok et al.;《2015 IEEE International Conference on Image Processing (ICIP)》;20151210;3580-3584 *
《Research on methods for acquiring cross-modal semantic information in image retrieval》;何宁;《China Master's Theses Full-text Database》;20131015;1-115 *

Also Published As

Publication number Publication date
CN110414625A (en) 2019-11-05

Similar Documents

Publication Publication Date Title
US11023716B2 (en) Method and device for generating stickers
CN109919244B (en) Method and apparatus for generating a scene recognition model
CN110365973B (en) Video detection method and device, electronic equipment and computer readable storage medium
CN109961032B (en) Method and apparatus for generating classification model
CN109684589B (en) Client comment data processing method and device and computer storage medium
CN110059623B (en) Method and apparatus for generating information
CN109934142B (en) Method and apparatus for generating feature vectors of video
CN111598006A (en) Method and device for labeling objects
CN113395538B (en) Sound effect rendering method and device, computer readable medium and electronic equipment
CN110414625B (en) Method and device for determining similar data, electronic equipment and storage medium
CN111897950A (en) Method and apparatus for generating information
CN113033677A (en) Video classification method and device, electronic equipment and storage medium
CN110347875B (en) Video scene classification method and device, mobile terminal and storage medium
CN110008926B (en) Method and device for identifying age
CN111726675A (en) Object information display method and device, electronic equipment and computer storage medium
CN109816023B (en) Method and device for generating picture label model
CN112954453B (en) Video dubbing method and device, storage medium and electronic equipment
CN113628097A (en) Image special effect configuration method, image recognition method, image special effect configuration device and electronic equipment
CN110765304A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN113033552B (en) Text recognition method and device and electronic equipment
CN110334763B (en) Model data file generation method, model data file generation device, model data file identification device, model data file generation apparatus, model data file identification apparatus, and model data file identification medium
CN114666622A (en) Special effect video determination method and device, electronic equipment and storage medium
CN115086700A (en) Push processing method, device, equipment and medium
CN110188833B (en) Method and apparatus for training a model
CN110189000B (en) Grading unification method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant