CN110414625A - Method and apparatus for determining similar data, electronic device, and storage medium - Google Patents

Info

Publication number
CN110414625A
Authority
CN
China
Prior art keywords
data
metadata
similarity
similar
judging result
Legal status
Granted
Application number
CN201910722643.XA
Other languages
Chinese (zh)
Other versions
CN110414625B (en)
Inventor
李�根
李磊
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910722643.XA
Publication of CN110414625A
Application granted
Publication of CN110414625B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/75 Clustering; Classification
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Abstract

Embodiments of the present disclosure provide a method and apparatus for determining similar data, an electronic device, and a storage medium. The method includes: acquiring first data and second data to be processed, where the first data and the second data are both images or both videos; extracting a first content feature of the first data and a second content feature of the second data; determining a first similarity between the first data and the second data according to the first content feature and the second content feature; acquiring first metadata of the first data and second metadata of the second data; and determining, according to the first similarity, the first metadata, and the second metadata, whether the first data and the second data are similar. In the embodiments of the present disclosure, after the first similarity is determined, the first metadata and the second metadata are also taken into account in the final decision on whether the first data and the second data are similar. Compared with deciding similarity from the first similarity and a preset threshold alone, this is more accurate and improves the accuracy of the determination.

Description

Method and apparatus for determining similar data, electronic device, and storage medium
Technical field
The present disclosure relates to the technical field of image processing, and in particular to a method and apparatus for determining similar data, an electronic device, and a storage medium.
Background
In the prior art, when images or videos are deduplicated, the similarity of two images or videos is usually determined from their content, and the determined similarity is then compared with a preset threshold to judge whether the two images or videos are similar. This approach is prone to misjudgment. For example, two dissimilar videos may have used the same special effect during shooting; when the prior-art scheme judges whether the two videos are similar, it may conclude that they are similar simply because both contain the same special effect, which is a judgment error. Prior-art methods for judging whether two images or videos are similar are therefore not accurate enough.
Summary of the invention
The purpose of the present disclosure is to solve at least one of the above technical defects and improve the user experience. The technical solutions adopted by the present disclosure are as follows:
In a first aspect, an embodiment of the present disclosure provides a method for determining similar data, the method including:
acquiring first data and second data to be processed, where the first data and the second data are both images or both videos;
extracting a first content feature of the first data and a second content feature of the second data;
determining a first similarity between the first data and the second data according to the first content feature and the second content feature;
acquiring first metadata of the first data and second metadata of the second data;
determining, according to the first similarity, the first metadata, and the second metadata, whether the first data and the second data are similar.
In an optional embodiment of the first aspect, the first metadata and the second metadata each include at least one of the following:
special effects, tag (label) information, associated information of the data, and history information;
where the associated information includes at least one of the following:
audio, comment information, and operation information.
In an optional embodiment of the first aspect, if the first metadata and the second metadata include associated information of the same type, determining, according to the first similarity, the first metadata, and the second metadata, whether the first data and the second data are similar includes:
determining a second similarity between the first data and the second data according to the associated information of the same type;
determining, according to the first similarity, the second similarity, and the parts of the first metadata and the second metadata other than the associated information, whether the first data and the second data are similar.
In an optional embodiment of the first aspect, determining, according to the first similarity, the first metadata, and the second metadata, whether the first data and the second data are similar includes:
generating feature vectors for the first similarity, each item of the first metadata, and each item of the second metadata, respectively;
inputting each generated feature vector into a classification model, and determining, based on the output of the classification model, whether the first data and the second data are similar.
In an optional embodiment of the first aspect, acquiring the first metadata of the first data and the second metadata of the second data includes:
acquiring the first metadata of the first data and the second metadata of the second data when the first similarity is greater than a similarity threshold.
In an optional embodiment of the first aspect, the method further includes:
determining, according to the first similarity and the similarity threshold, a first determination result indicating whether the first data and the second data are similar;
after determining, according to the first similarity, the first metadata, and the second metadata, whether the first data and the second data are similar, the method further includes:
if the first determination result differs from a second determination result, adjusting the similarity threshold according to the first determination result and the second determination result;
where the second determination result is the result, determined according to the first similarity, the first metadata, and the second metadata, of whether the first data and the second data are similar.
In an optional embodiment of the first aspect, adjusting the similarity threshold according to the first determination result and the second determination result includes:
increasing the similarity threshold if the first determination result is that the first data and the second data are similar and the second determination result is that they are not similar.
In a second aspect, an embodiment of the present disclosure provides an apparatus for determining similar data, the apparatus including:
a data acquisition module, configured to acquire first data and second data to be processed, where the first data and the second data are both images or both videos;
a content feature extraction module, configured to extract a first content feature of the first data and a second content feature of the second data;
a first similarity determination module, configured to determine a first similarity between the first data and the second data according to the first content feature and the second content feature;
a metadata acquisition module, configured to acquire first metadata of the first data and second metadata of the second data;
a processing module, configured to determine, according to the first similarity, the first metadata, and the second metadata, whether the first data and the second data are similar.
In an optional embodiment of the second aspect, the first metadata and the second metadata each include at least one of the following:
special effects, tag information, associated information of the data, and history information;
where the associated information includes at least one of the following:
audio, comment information, and operation information.
In an optional embodiment of the second aspect, if the first metadata and the second metadata include associated information of the same type, then when determining, according to the first similarity, the first metadata, and the second metadata, whether the first data and the second data are similar, the processing module is specifically configured to:
determine a second similarity between the first data and the second data according to the associated information of the same type;
determine, according to the first similarity, the second similarity, and the parts of the first metadata and the second metadata other than the associated information, whether the first data and the second data are similar.
In an optional embodiment of the second aspect, when determining, according to the first similarity, the first metadata, and the second metadata, whether the first data and the second data are similar, the processing module is specifically configured to:
generate feature vectors for the first similarity, each item of the first metadata, and each item of the second metadata, respectively;
input each generated feature vector into a classification model, and determine, based on the output of the classification model, whether the first data and the second data are similar.
In an optional embodiment of the second aspect, when acquiring the first metadata of the first data and the second metadata of the second data, the metadata acquisition module is specifically configured to:
acquire the first metadata of the first data and the second metadata of the second data when the first similarity is greater than a similarity threshold.
In an optional embodiment of the second aspect, the processing module is further configured to:
determine, according to the first similarity and the similarity threshold, a first determination result indicating whether the first data and the second data are similar;
the apparatus further includes a threshold adjustment module, specifically configured to:
after it has been determined, according to the first similarity, the first metadata, and the second metadata, whether the first data and the second data are similar, adjust the similarity threshold according to the first determination result and the second determination result if the two results differ;
where the second determination result is the result, determined according to the first similarity, the first metadata, and the second metadata, of whether the first data and the second data are similar.
In an optional embodiment of the second aspect, when adjusting the similarity threshold according to the first determination result and the second determination result, the threshold adjustment module is specifically configured to:
increase the similarity threshold if the first determination result is that the first data and the second data are similar and the second determination result is that they are not similar.
In a third aspect, the present disclosure provides an electronic device including a processor and a memory;
the memory is configured to store computer operation instructions;
the processor is configured to execute, by calling the computer operation instructions, the method shown in any embodiment of the first aspect of the embodiments of the present disclosure.
In a fourth aspect, the present disclosure provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the method shown in any embodiment of the first aspect of the embodiments of the present disclosure.
The technical solutions provided by the embodiments of the present disclosure have the following beneficial effects:
In the embodiments of the present disclosure, after the first similarity between the first data and the second data is determined according to the first content feature and the second content feature, the first metadata of the first data and the second metadata of the second data are further taken into account to finally determine whether the first data and the second data are similar. Compared with the prior art, which decides similarity only from the determined similarity and a preset threshold, this is more accurate and improves the accuracy of the determination.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present disclosure more clearly, the drawings used in describing the embodiments are briefly introduced below.
Fig. 1 is a schematic flowchart of a method for determining similar data according to an embodiment of the present disclosure;
Fig. 2 is a schematic structural diagram of an apparatus for determining similar data according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed description of the embodiments
Embodiments of the present disclosure are described in detail below, and examples of the embodiments are shown in the drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present disclosure, and are not to be construed as limiting it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present disclosure indicates the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof. When an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The term "and/or" used herein includes any and all combinations of one or more of the associated listed items.
The technical solutions of the present disclosure, and how they solve the above technical problems, are described in detail below with specific embodiments. These embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some of them. The embodiments of the present disclosure are described below with reference to the drawings.
An embodiment of the present disclosure provides a method for determining similar data. As shown in Fig. 1, the method may include:
Step S110: acquire first data and second data to be processed, where the first data and the second data are both images or both videos.
That is, the data to be processed are of the same type. In the embodiments of the present disclosure, the data to be processed may be videos or images: when the first data is a video, the second data is also a video, and when the first data is an image, the second data is also an image.
Step S120: extract a first content feature of the first data and a second content feature of the second data.
Step S130: determine a first similarity between the first data and the second data according to the first content feature and the second content feature.
The content feature is associated with the image; when the data is a video, the content feature of the video is associated with the video frame images in the video. For example, the content feature may include the types of objects appearing in an image and the text in the image. The content feature of an image can be extracted with a pre-trained neural network model that extracts image content features from images, yielding a content feature vector of the image; this model may be a bidirectional long short-term memory (LSTM) network.
In practical applications, the content features of the first data and the second data can be expressed as a content feature vector of the first data and a content feature vector of the second data. Each component of the first data's content feature vector corresponds to one content feature of the first data, and each component of the second data's content feature vector corresponds to one content feature of the second data. The similarity between the two content feature vectors is then calculated; this is the first similarity between the first data and the second data.
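The first-similarity computation in Steps S120 and S130 can be illustrated with a minimal sketch. It rests on assumptions the text does not fix: cosine similarity is used as the similarity measure, a bag of detected object labels and text tokens stands in for the neural-network content feature vector (the extractor itself is not reproduced), and a video's vector is taken as the mean of its per-frame vectors. The function names and the vocabulary are hypothetical.

```python
import math
from typing import Iterable, List, Sequence

Vector = List[float]


def bag_of_labels(labels: Iterable[str], vocab: Sequence[str]) -> Vector:
    """Hypothetical content feature: one component per vocabulary entry,
    counting how often that object type or text token was detected."""
    index = {term: i for i, term in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for label in labels:
        if label in index:
            vec[index[label]] += 1.0
    return vec


def mean_pool(frame_vectors: Sequence[Vector]) -> Vector:
    """Assumed video feature: component-wise mean of the per-frame vectors."""
    if not frame_vectors:
        raise ValueError("a video needs at least one frame vector")
    dim = len(frame_vectors[0])
    n = len(frame_vectors)
    return [sum(v[i] for v in frame_vectors) / n for i in range(dim)]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """First similarity, assumed here to be the cosine of the angle between vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # an all-zero vector is treated as dissimilar


if __name__ == "__main__":
    vocab = ["cat", "rabbit", "car", "text:hello"]
    video1 = mean_pool([bag_of_labels(["rabbit", "cat"], vocab),
                        bag_of_labels(["rabbit"], vocab)])
    video2 = mean_pool([bag_of_labels(["rabbit"], vocab)])
    print(round(cosine_similarity(video1, video2), 3))  # prints 0.894
```

Cosine similarity ignores vector length, so a video in which the same objects simply appear more often is not penalized; any other vector similarity could be substituted without changing the rest of the method.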
Step S140: acquire first metadata of the first data and second metadata of the second data.
In an optional embodiment of the present disclosure, the first metadata and the second metadata each include at least one of the following:
special effects, tag information, associated information of the data, and history information;
where the associated information includes at least one of the following:
audio, comment information, and operation information.
The special effect indicates whether the first data and the second data contain a special effect and, if so, which one. For example, if the first data and the second data are videos and a "rabbit sticker" was added to a video during recording, the special-effect metadata of that video is "rabbit sticker".
History information indicates whether the author of the first data or the author of the second data has reposted other data between some earlier point in time and the current time. For example, if the first data and the second data are videos, the history information records whether the author who shot a video has reposted other videos during that period and, if so, information such as the identifiers of the reposted videos.
Tag information is a label that identifies information about the data, so that some information about the data can be learned from its tags. For example, when the first data and the second data are videos about games, their tag information may include a label identifying the content as a game and a label identifying which specific game it is.
The associated information of the data is information that further describes the data, such as its audio, comment information, and operation information. Operation information records the operations performed on the data between some earlier point in time and the current time. For example, when the data is a video, its operation information may record whether the video was edited into other videos during that period and, if so, when it was edited into them and the identifiers of the resulting videos.
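The metadata fields listed above could be held in a small record type such as the following. This is an illustrative assumption, not a structure defined in the patent: the class and field names are hypothetical, audio is represented by a track ID string, and history information is reduced to the IDs of items the author reposted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MetaInfo:
    """Hypothetical container for the metadata of one image or video."""
    special_effects: Set[str] = field(default_factory=set)  # e.g. {"rabbit sticker"}
    tags: Set[str] = field(default_factory=set)             # e.g. {"game"}
    # Associated information keyed by type, e.g. "audio", "comments", "operations".
    associated: Dict[str, object] = field(default_factory=dict)
    # History information: IDs of other items the author reposted since some point in time.
    reposted_ids: List[str] = field(default_factory=list)

    def associated_types(self) -> Set[str]:
        """Types of associated information present on this item."""
        return set(self.associated)


def shared_associated_types(a: MetaInfo, b: MetaInfo) -> Set[str]:
    """Types of associated information that both items carry (same-type information)."""
    return a.associated_types() & b.associated_types()


# Example from the text: a video recorded with a "rabbit sticker" effect.
# The track ID and comment are hypothetical placeholders.
video_meta = MetaInfo(special_effects={"rabbit sticker"}, tags={"game"},
                      associated={"audio": "track-42", "comments": ["so cute"]})
```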
Step S150: determine, according to the first similarity, the first metadata, and the second metadata, whether the first data and the second data are similar.
That is, after the first metadata of the first data and the second metadata of the second data are acquired, they are combined with the determined first similarity to finally determine whether the first data and the second data are similar.
In the embodiments of the present disclosure, after the first similarity between the first data and the second data is determined according to the first content feature and the second content feature, the first metadata of the first data and the second metadata of the second data are further taken into account to finally determine whether the first data and the second data are similar. Compared with the prior art, which decides similarity only from the determined similarity and a preset threshold, this is more accurate and improves the accuracy of the determination.
In an optional embodiment of the present disclosure, if the first metadata and the second metadata include associated information of the same type, determining, according to the first similarity, the first metadata, and the second metadata, whether the first data and the second data are similar includes:
determining a second similarity between the first data and the second data according to the associated information of the same type;
determining, according to the first similarity, the second similarity, and the parts of the first metadata and the second metadata other than the associated information, whether the first data and the second data are similar.
Here, the first metadata and the second metadata including associated information of the same type means that the associated information in the first metadata and the associated information in the second metadata both contain information of a common type. For example, if the associated information of the first metadata includes audio and the associated information of the second metadata also includes audio, then the two include associated information of the same type, namely audio.
Accordingly, once it is determined that the first metadata and the second metadata include associated information of the same type, the second similarity between the first data and the second data can be determined from that same-type associated information.
In practical applications, a similarity can be determined for each type of associated information shared by the first metadata and the second metadata, and the second similarity between the first data and the second data is then obtained from these per-type similarities. For example, if the associated information in the first metadata includes audio and comment information, and the associated information in the second metadata also includes audio and comment information, the similarity between the two audios and the similarity between the two sets of comments can be determined, and the second similarity is then determined from the audio similarity and the comment similarity. Alternatively, a single overall similarity may be computed directly from all same-type associated information in the first metadata and the second metadata and used as the second similarity.
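A possible way to compute the second similarity from same-type associated information is sketched below. Assumptions, all hypothetical: audio is compared by exact track-ID match, comments by Jaccard similarity of their word sets, and the per-type similarities are combined by a plain average (a weighted combination would be equally consistent with the text). Types present in only one item are ignored, matching the "same type" condition.

```python
from typing import Any, Callable, Dict, List, Optional


def audio_similarity(a: Any, b: Any) -> float:
    # Assumption: audio is identified by a track ID, so similarity is an exact match.
    return 1.0 if a == b else 0.0


def comment_similarity(a: Any, b: Any) -> float:
    # Assumption: Jaccard similarity of the lower-cased word sets of all comments.
    words_a = {w for c in a for w in str(c).lower().split()}
    words_b = {w for c in b for w in str(c).lower().split()}
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


# Hypothetical per-type comparators; "operations" could be added the same way.
COMPARATORS: Dict[str, Callable[[Any, Any], float]] = {
    "audio": audio_similarity,
    "comments": comment_similarity,
}


def second_similarity(assoc1: Dict[str, Any], assoc2: Dict[str, Any]) -> Optional[float]:
    """Average the per-type similarities over associated-information types present
    in both items; return None when there is no same-type information to compare."""
    shared = sorted(set(assoc1) & set(assoc2) & set(COMPARATORS))
    if not shared:
        return None
    scores: List[float] = [COMPARATORS[t](assoc1[t], assoc2[t]) for t in shared]
    return sum(scores) / len(scores)
```

For example, two videos with the same audio track (similarity 1.0) and comments sharing two of five distinct words (similarity 0.4) get a second similarity of 0.7.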
Accordingly, after the second similarity between the first data and the second data is determined, it can be combined with the first similarity and with the parts of the first metadata and the second metadata other than the associated information to further determine whether the first data and the second data are similar.
In an optional embodiment of the present disclosure, determining, according to the first similarity, the first metadata, and the second metadata, whether the first data and the second data are similar includes:
generating feature vectors for the first similarity, each item of the first metadata, and each item of the second metadata, respectively;
inputting each generated feature vector into a classification model, and determining, based on the output of the classification model, whether the first data and the second data are similar.
In practical applications, feature vectors can be generated for the first similarity, each item of the first metadata, and each item of the second metadata, respectively; that is, the first similarity and each item of metadata are converted into feature-vector form. Each generated feature vector is then input into the classification model, which determines from its inputs whether the first data and the second data are similar. For example, if the first data and the second data are similar, the model may output information indicating that they are similar.
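Generating feature vectors for the first similarity and for each item of the first and second metadata could look like the sketch below. It assumes, hypothetically, that each metadata field is a set of strings encoded as a multi-hot vector over a fixed vocabulary; the field names and vocabularies are illustrative only. Classifiers that need a single fixed-length input can use the concatenation produced by `flatten`.

```python
from typing import Dict, Iterable, List, Sequence


def multi_hot(items: Iterable[str], vocab: Sequence[str]) -> List[float]:
    """1.0 for each vocabulary entry present in items, else 0.0."""
    present = set(items)
    return [1.0 if term in present else 0.0 for term in vocab]


def encode_pair(
    first_similarity: float,
    meta1: Dict[str, Iterable[str]],
    meta2: Dict[str, Iterable[str]],
    vocabs: Dict[str, Sequence[str]],
) -> List[List[float]]:
    """One vector for the first similarity, then one per metadata field of each item.
    Fields are visited in sorted order so the classifier always sees the same layout;
    a field missing from an item is encoded as all zeros."""
    vectors: List[List[float]] = [[first_similarity]]
    for meta in (meta1, meta2):
        for name in sorted(vocabs):
            vectors.append(multi_hot(meta.get(name, ()), vocabs[name]))
    return vectors


def flatten(vectors: List[List[float]]) -> List[float]:
    """Concatenate the per-item vectors into one classifier input."""
    return [x for vec in vectors for x in vec]
```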
Alternatively, instead of generating the feature vectors in advance, the first similarity, each item of the first metadata, and each item of the second metadata may be input directly into the classification model, which then generates the respective feature vectors itself and determines from them whether the first data and the second data are similar.
The embodiments of the present disclosure do not specifically limit the type of classification model; it may be, for example, a decision tree classification model or a multivariate Bernoulli model.
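Since the classifier type is left open, the sketch below shows one of the named options, a multivariate Bernoulli naive Bayes model with Laplace smoothing, in plain Python. The four binary features and the six labeled training pairs are hypothetical and chosen to mirror the background example: pairs whose content similarity is high only because they share a special effect, with no shared tags or audio, are labeled dissimilar.

```python
import math
from typing import Dict, List, Sequence


class BernoulliNaiveBayes:
    """Multivariate Bernoulli naive Bayes for 0/1 feature vectors."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing keeps every probability strictly in (0, 1)
        self.classes: List[int] = []
        self.log_prior: Dict[int, float] = {}
        self.log_on: Dict[int, List[float]] = {}   # log P(x_j = 1 | class)
        self.log_off: Dict[int, List[float]] = {}  # log P(x_j = 0 | class)

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[int]) -> "BernoulliNaiveBayes":
        n_features = len(X[0])
        self.classes = sorted(set(y))
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            probs = [
                (sum(r[j] for r in rows) + self.alpha) / (len(rows) + 2 * self.alpha)
                for j in range(n_features)
            ]
            self.log_on[c] = [math.log(p) for p in probs]
            self.log_off[c] = [math.log(1.0 - p) for p in probs]
        return self

    def predict(self, x: Sequence[int]) -> int:
        def log_posterior(c: int) -> float:
            # Absent features (x_j = 0) also contribute, which is what makes it "Bernoulli".
            return self.log_prior[c] + sum(
                on if xj else off
                for xj, on, off in zip(x, self.log_on[c], self.log_off[c])
            )
        return max(self.classes, key=log_posterior)


# Hypothetical features: [first_sim > 0.5, same effect, shared tag, same audio]
X_train = [
    [1, 0, 1, 1], [1, 0, 1, 1], [1, 1, 1, 1],  # labeled similar (1)
    [1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0],  # labeled dissimilar (0)
]
y_train = [1, 1, 1, 0, 0, 0]
model = BernoulliNaiveBayes().fit(X_train, y_train)
```

With this toy data, `model.predict([1, 1, 0, 0])` (high content similarity and the same effect, nothing else shared) returns 0, i.e. dissimilar, while `model.predict([1, 0, 1, 1])` returns 1. A decision tree classifier could take the same position in the pipeline.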
In an optional embodiment of the present disclosure, acquiring the first metadata of the first data and the second metadata of the second data includes:
acquiring the first metadata of the first data and the second metadata of the second data when the first similarity is greater than a similarity threshold.
In practical applications, after the first similarity between the first data and the second data is determined according to the first content feature and the second content feature, it can be checked whether the first similarity is greater than the similarity threshold (that is, the threshold-based way of deciding whether two pieces of data are similar). If it is, the first data and the second data are provisionally judged similar. To make the result more accurate and reduce misjudgments, the first metadata of the first data and the second metadata of the second data are then acquired and combined with the determined first similarity to further determine whether the first data and the second data are similar.
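The gating described above can be sketched as a two-stage check. This is a minimal sketch under assumptions: `fetch_metadata` and `refine` are hypothetical callables standing in for the metadata lookup and for any metadata-aware decision (such as the classification model described above), and pairs whose first similarity does not exceed the threshold are assumed to be dissimilar outright.

```python
from typing import Callable, Tuple, TypeVar

M = TypeVar("M")  # whatever type the metadata records have


def is_similar_two_stage(
    first_similarity: float,
    threshold: float,
    fetch_metadata: Callable[[], Tuple[M, M]],
    refine: Callable[[float, M, M], bool],
) -> bool:
    """Content similarity first; metadata is fetched only for candidate pairs."""
    # Stage 1: threshold-only check. Pairs at or below the threshold are
    # returned as dissimilar without ever fetching metadata.
    if first_similarity <= threshold:
        return False
    # Stage 2: only candidate pairs pay for the metadata lookup, and the
    # final answer comes from the metadata-aware decision.
    meta1, meta2 = fetch_metadata()
    return refine(first_similarity, meta1, meta2)


if __name__ == "__main__":
    fetched = []

    def fetch():
        fetched.append("called")
        return {"audio": "track-1"}, {"audio": "track-2"}

    def same_audio(sim, m1, m2):  # hypothetical refinement rule
        return m1["audio"] == m2["audio"]

    print(is_similar_two_stage(0.4, 0.6, fetch, same_audio), fetched)  # False []
    print(is_similar_two_stage(0.9, 0.6, fetch, same_audio), fetched)  # False ['called']
```

The design choice is about cost: content features are usually computed for every candidate pair anyway, while metadata may live in other services, so fetching it only above the threshold limits the lookups to pairs that could actually be duplicates.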
In an optional embodiment of the present disclosure, the method further includes:
determining, according to the first similarity and the similarity threshold, a first determination result indicating whether the first data and the second data are similar;
after determining, according to the first similarity, the first metadata, and the second metadata, whether the first data and the second data are similar, the method further includes:
if the first determination result differs from a second determination result, adjusting the similarity threshold according to the first determination result and the second determination result;
where the second determination result is the result, determined according to the first similarity, the first metadata, and the second metadata, of whether the first data and the second data are similar.
In practical applications, if the first determination result, obtained from the determined first similarity and the preset threshold, differs from the second determination result, obtained from the first similarity, the first metadata, and the second metadata, the current value of the similarity threshold is not accurate enough. The threshold can then be adjusted based on the second determination result to improve the accuracy of subsequent determinations.
In an optional embodiment of the present disclosure, adjusting the similarity threshold according to the first determination result and the second determination result includes:
increasing the similarity threshold if the first determination result is that the first data and the second data are similar and the second determination result is that they are not similar.
In practical applications, suppose the first judging result indicates that the first data and the second data are similar, but the second judging result indicates that they are dissimilar. This means the current value of the similarity threshold is too small, and it can be increased appropriately.
In one example, the first similarity between the first data and the second data is 0.7 and the current similarity threshold is 0.6. Since the first similarity is greater than the threshold, the first judging result is that the first data and the second data are similar. However, the second judging result is that the two are not similar; it is determined from the first similarity together with the obtained first and second meta-information. This indicates that the current threshold is too small, so it can be increased appropriately, for example to 0.8.
Of course, in practical applications the opposite can also occur: the first judging result is that the first data and the second data are not similar, while the second judging result is that they are similar. This indicates that the current value of the similarity threshold is too large, and it can be decreased appropriately.
In one example, the first similarity between the first data and the second data is 0.6 and the current similarity threshold is 0.7. Since the first similarity is less than the threshold, the first judging result is that the first data and the second data are not similar. However, the second judging result is that the two are similar; it is determined from the first similarity together with the obtained first and second meta-information. This indicates that the current threshold is too large, so it can be decreased appropriately, for example to 0.5.
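The adjustment rule above can be sketched as a small function. This is an illustrative sketch under stated assumptions, not the disclosed implementation. The fixed step of 0.2 is an assumption chosen because it reproduces both worked examples (0.6 to 0.8 and 0.7 to 0.5). Clamping the threshold to the range [0, 1] is also an assumption.

```python
def adjust_threshold(
    threshold: float, first_result: bool, second_result: bool, step: float = 0.2
) -> float:
    """Adjust the similarity threshold when the two judging results disagree.

    first_result: judgement from the first similarity versus the threshold alone.
    second_result: judgement from the first similarity plus meta-information.
    step: assumed fixed adjustment step; the text does not fix its value.
    """
    if first_result == second_result:
        return threshold  # the results agree, so the threshold is kept
    if first_result:
        # Threshold said "similar", meta-information said "dissimilar": threshold too small.
        return min(1.0, threshold + step)
    # Threshold said "dissimilar", meta-information said "similar": threshold too large.
    return max(0.0, threshold - step)


# The two worked examples from the text.
print(round(adjust_threshold(0.6, 0.7 > 0.6, False), 2))  # 0.8
print(round(adjust_threshold(0.7, 0.6 > 0.7, True), 2))   # 0.5
```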
In an optional embodiment of the disclosure, adjusting the similarity threshold according to the first judging result and the second judging result further includes:
determining the meta-information that is of the same type in the first data and the second data and whose similarity is greater than a set value;
adjusting the similarity threshold based on the determined meta-information of the same type whose similarity is greater than the set value.
Here, "meta-information of the same type whose similarity is greater than the set value" has a specific meaning. The first data and the second data must both have meta-information of a given type, and the similarity between those two pieces of meta-information must be greater than the set value (that is, they are similar). For example, suppose the first data and the second data are two videos to be processed, and both carry meta-information of the special-effect type. The special effect in each video is a rabbit sticker, and the similarity between the two rabbit stickers is greater than the set value. The special-effect meta-information is then meta-information of the same type in the first data and the second data whose similarity is greater than the set value.
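Identifying the qualifying meta-information types can be sketched as a set intersection followed by a per-type similarity check. This is a minimal sketch: meta-information is assumed to be a dictionary keyed by type. The per-type `similarity` function and the set value of 0.8 are hypothetical, and a real system would compare stickers, audio and similar items with a proper measure rather than the toy exact-match function used here.

```python
from typing import Callable


def matching_meta_types(
    meta1: dict,
    meta2: dict,
    similarity: Callable[[object, object], float],
    set_value: float = 0.8,  # hypothetical set value
) -> list:
    """Meta-information types present in both items whose values are more similar than set_value."""
    shared_types = sorted(set(meta1) & set(meta2))  # the same type must exist on both sides
    return [t for t in shared_types if similarity(meta1[t], meta2[t]) > set_value]


def exact_match(a: object, b: object) -> float:
    """Toy per-type similarity (hypothetical): 1.0 for identical values, otherwise 0.0."""
    return 1.0 if a == b else 0.0


video1 = {"special_effect": "rabbit_sticker", "comment": "so cute"}
video2 = {"special_effect": "rabbit_sticker", "audio": "song_a"}
print(matching_meta_types(video1, video2, exact_match))  # ['special_effect']
```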
Further, the similarity threshold can be adjusted based on the determined meta-information of the same type whose similarity is greater than the set value. The specific way of doing so can be configured in advance according to actual needs, and the embodiments of the disclosure do not limit it.
As an optional implementation, a mapping between quantities and adjustment steps can be configured in advance. The system counts the same-type meta-information items of the first data and the second data whose similarity is greater than the set value, and looks up the adjustment step for that count. The current value of the similarity threshold is then adjusted based on that step.
In one example, the mapping between quantities and adjustment steps is as follows: a quantity of 1 corresponds to a step of 0.2, and a quantity of 2 corresponds to a step of 0.3. The current value of the similarity threshold is 0.6. Suppose 2 same-type meta-information items of the first data and the second data have a similarity greater than the set value. The mapping gives an adjustment step of 0.3 for a quantity of 2, and the current value of the similarity threshold is then adjusted to 0.8.
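The quantity-to-step lookup can be sketched as a dictionary lookup. This is a hedged sketch: the mapping values come from the example above. The text does not state how the step is applied, so `raise_threshold` assumes the step is added to the current value and the result is capped at 1.0. It is also an assumption that counts beyond the configured table reuse the largest configured step.

```python
# Mapping from the example in the text: one matching type -> 0.2, two -> 0.3.
STEP_BY_COUNT = {1: 0.2, 2: 0.3}


def step_for_count(count: int, table: dict = STEP_BY_COUNT) -> float:
    """Adjustment step for the number of same-type meta-information items above the set value."""
    if count <= 0:
        return 0.0  # nothing matched, so no adjustment
    # Assumption: counts beyond the table reuse the step of the largest configured count.
    return table.get(count, table[max(table)])


def raise_threshold(current: float, count: int) -> float:
    """Assumption: the looked-up step is added to the current threshold, capped at 1.0."""
    return min(1.0, current + step_for_count(count))
```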
As another optional implementation, a weight and a corresponding adjustment step can be set for each type of meta-information. Suppose two or more same-type meta-information items of the first data and the second data have a similarity greater than the set value. A benchmark meta-information item is then selected based on the weights; this decides which item's adjustment step will be used to adjust the current similarity threshold. The current value of the similarity threshold is then adjusted by the adjustment step of the benchmark meta-information. If there is only one such meta-information item, the current value of the similarity threshold can be adjusted directly by the adjustment step of that item.
In one example, the current value of the similarity threshold is 0.6. Special-effect meta-information has a weight of 0.6 and an adjustment step of 0.3. Comment-information meta-information has a weight of 0.3 and an adjustment step of 0.2. The same-type meta-information items of the first data and the second data whose similarity is greater than the set value are the special-effect meta-information and the comment-information meta-information, a quantity of 2. Because the special-effect meta-information has the greater weight, the current threshold is adjusted by its corresponding step, that is, from 0.6 to 0.9.
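Selecting the benchmark meta-information by weight can be sketched with `max` over the matching types. This is a minimal sketch; the type names are hypothetical labels, and the weights and steps are taken from the example above. Ties between equal weights are resolved in favour of the first listed type, which is an assumption the text does not address.

```python
def benchmark_step(matching: list, weights: dict, steps: dict) -> float:
    """Adjustment step of the highest-weight matching meta-information type (the benchmark)."""
    if not matching:
        return 0.0  # nothing matched, so no adjustment
    # With a single match this simply returns that type's own step.
    benchmark = max(matching, key=lambda t: weights[t])
    return steps[benchmark]


# Weights and steps from the example in the text.
weights = {"special_effect": 0.6, "comment": 0.3}
steps = {"special_effect": 0.3, "comment": 0.2}
step = benchmark_step(["special_effect", "comment"], weights, steps)
print(round(0.6 + step, 2))  # 0.9
```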
Based on the same principle as the method shown in Fig. 1, an embodiment of the disclosure further provides a device 30 for determining similar data. As shown in Fig. 2, the device 30 may include a data acquisition module 310, a content feature extraction module 320, a first similarity determining module 330, a meta-information obtaining module 340 and a processing module 350, where:
the data acquisition module is configured to obtain first data and second data to be processed, where the first data and the second data are both images or both videos;
the content feature extraction module is configured to extract a first content feature of the first data and a second content feature of the second data;
the first similarity determining module is configured to determine a first similarity between the first data and the second data according to the first content feature and the second content feature;
the meta-information obtaining module is configured to obtain first meta-information of the first data and second meta-information of the second data;
the processing module is configured to determine whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information.
In an optional embodiment of the disclosure, the first meta-information and the second meta-information each include at least one of the following:
special effects, tag information, related information of the data, and historical information;
where the related information includes at least one of the following:
audio, comment information, and information about operations performed on the data.
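One possible in-memory representation of the meta-information categories listed above is a pair of dataclasses. This is a sketch only: the field names and types are hypothetical, chosen to mirror the list, and the text does not define a schema. The helper reports which related-information types are present on both items; that presence is the condition under which a second similarity is computed in the embodiment that follows.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RelatedInfo:
    audio: Optional[str] = None                          # e.g. an identifier of the background audio
    comments: List[str] = field(default_factory=list)    # comment information
    operations: List[str] = field(default_factory=list)  # operations performed on the data


@dataclass
class MetaInfo:
    special_effects: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    related: RelatedInfo = field(default_factory=RelatedInfo)
    history: List[str] = field(default_factory=list)


def shared_related_types(a: MetaInfo, b: MetaInfo) -> List[str]:
    """Names of related-information types that are present (non-empty) on both items."""
    names = ("audio", "comments", "operations")
    return [n for n in names if getattr(a.related, n) and getattr(b.related, n)]


m1 = MetaInfo(tags=["pets"], related=RelatedInfo(audio="song_a", comments=["cute"]))
m2 = MetaInfo(tags=["pets"], related=RelatedInfo(audio="song_b"))
print(shared_related_types(m1, m2))  # ['audio']
```

Audio counts as shared here even though the two values differ. The helper only checks that the same type exists on both items, and how similar the values are is left to the second-similarity step.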
In an optional embodiment of the disclosure, the first meta-information and the second meta-information may include related information of the same type. In that case, when determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information, the processing module is specifically configured to:
determine a second similarity between the first data and the second data according to the related information of the same type;
determine whether the first data and the second data are similar according to the first similarity, the second similarity, and the first meta-information and second meta-information other than the related information.
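The combination described above can be sketched as a weighted score. This is an assumption-laden sketch, not the disclosed method. Jaccard overlap stands in for the unspecified second-similarity measure and for the comparison of the remaining meta-information (e.g. tags). The three weights and the decision threshold are hypothetical values.

```python
def jaccard(a, b) -> float:
    """Overlap of two collections (assumed measure); 0.0 when both are empty."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def similar_with_related(
    first_sim: float,
    related1, related2,        # related information of the same type, e.g. comments
    other1, other2,            # remaining meta-information, e.g. tags
    weights=(0.5, 0.3, 0.2),   # hypothetical weights for the three signals
    threshold: float = 0.6,    # hypothetical decision threshold
) -> bool:
    """Combine the first similarity, second similarity and remaining meta-information."""
    second_sim = jaccard(related1, related2)  # second similarity from the related information
    other_sim = jaccard(other1, other2)
    w_first, w_second, w_other = weights
    score = w_first * first_sim + w_second * second_sim + w_other * other_sim
    return score > threshold


print(similar_with_related(0.8, ["nice", "cool"], ["cool", "nice"], ["cat"], ["dog"]))  # True
```

A weighted sum keeps each signal's influence explicit and easy to tune. The classification-model embodiment below is an alternative that would learn such weights from labelled pairs.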
In an optional embodiment of the disclosure, when determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information, the processing module is specifically configured to:
generate feature vectors of the first similarity, of each item of first meta-information and of each item of second meta-information, respectively;
input the generated feature vectors into a classification model, and determine whether the first data and the second data are similar based on the output of the classification model.
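The feature-vector-plus-classifier step can be sketched as follows. The text does not specify the encoding or the model, so everything here is assumed. Each item's meta-information is encoded as one presence flag per (hypothetical) type. The match features are an added assumption, since a linear model cannot otherwise tell whether two values are equal. A logistic regression with hand-set weights stands in for a trained classification model.

```python
import math

META_TYPES = ("special_effect", "tag", "audio")  # hypothetical meta-information types


def encode_meta(meta: dict) -> list:
    """Feature vector for one item's meta-information: a presence flag per type."""
    return [1.0 if t in meta else 0.0 for t in META_TYPES]


def match_features(meta1: dict, meta2: dict) -> list:
    """Extra (assumed) features: 1.0 where both items share the same value for a type."""
    return [1.0 if t in meta1 and t in meta2 and meta1[t] == meta2[t] else 0.0 for t in META_TYPES]


def logistic_model(x: list, weights: list, bias: float) -> float:
    """Stand-in classification model: logistic regression with given weights."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def classify(first_sim, meta1, meta2, weights, bias, cutoff=0.5) -> bool:
    """Concatenate the feature vectors and threshold the model's probability."""
    x = [first_sim] + encode_meta(meta1) + encode_meta(meta2) + match_features(meta1, meta2)
    return logistic_model(x, weights, bias) > cutoff


# Hand-set (not trained) weights: 1 similarity + 3 + 3 presence flags + 3 match flags.
WEIGHTS = [4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.5, 1.0, 1.0]
BIAS = -4.0
```

With these illustrative weights, a first similarity of 0.9 is not enough on its own. A matching special effect tips the decision to "similar", which mirrors how the meta-information is meant to refine the content-only judgement.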
In an optional embodiment of the disclosure, when obtaining the first meta-information of the first data and the second meta-information of the second data, the meta-information obtaining module is specifically configured to:
obtain the first meta-information of the first data and the second meta-information of the second data when the first similarity is greater than the similarity threshold.
In an optional embodiment of the disclosure, the processing module is further configured to:
determine, according to the first similarity and the similarity threshold, a first judging result indicating whether the first data and the second data are similar;
the device further includes a threshold adjustment module, which is specifically configured to:
after whether the first data and the second data are similar has been determined according to the first similarity, the first meta-information and the second meta-information, adjust the similarity threshold according to the first judging result and the second judging result if the two results differ;
where the second judging result is the result of determining, according to the first similarity, the first meta-information and the second meta-information, whether the first data and the second data are similar.
In an optional embodiment of the disclosure, when adjusting the similarity threshold according to the first judging result and the second judging result, the threshold adjustment module is specifically configured to:
increase the similarity threshold if the first judging result is that the first data and the second data are similar and the second judging result is that the first data and the second data are dissimilar.
The device for determining similar data of the embodiments of the disclosure can perform the method for determining similar data provided by the embodiments of the disclosure, and its implementation principle is similar. The actions performed by each module of the device correspond to the steps of the method in the embodiments of the disclosure. For a detailed functional description of each module, refer to the description of the corresponding method above, which is not repeated here.
Based on the same principle as the method shown in the embodiments of the disclosure, an embodiment of the disclosure further provides an electronic device. The electronic device may include, but is not limited to, a processor and a memory. The memory is configured to store computer operation instructions, and the processor is configured to execute the method shown in the embodiments by calling those instructions.
Based on the same principle as the method shown in the embodiments of the disclosure, an embodiment of the disclosure further provides a computer-readable storage medium. The medium stores at least one instruction, at least one program segment, a code set or an instruction set. These are loaded and executed by a processor to implement the method shown in the above embodiments. Details are not repeated here.
Referring now to Fig. 3, it shows a schematic structural diagram of an electronic device 500 suitable for implementing the embodiments of the disclosure; the electronic device may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals and fixed terminals. Mobile terminals include mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players) and vehicle-mounted terminals (for example, vehicle navigation terminals). Fixed terminals include digital TVs and desktop computers. The electronic device shown in Fig. 3 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the disclosure.
As shown in Fig. 3, the electronic device 500 may include a processing device 501 (for example, a central processing unit or a graphics processor). The processing device can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing device 501, the ROM 502 and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
In general, the following devices can be connected to the I/O interface 505:
- an input device 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer or gyroscope;
- an output device 507 including, for example, a liquid crystal display (LCD), speaker or vibrator;
- a storage device 508 including, for example, a magnetic tape or hard disk;
- a communication device 509.

The communication device 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. Although Fig. 3 shows the electronic device 500 with various devices, it is not required to implement or have all of the devices shown; more or fewer devices may alternatively be implemented or provided.
In particular, according to the embodiments of the disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 509, installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing device 501, the above functions defined in the method of the embodiments of the disclosure are performed.
It should be noted that the computer-readable medium described above in the disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two.

A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples include, but are not limited to:
- an electrical connection with one or more wires;
- a portable computer disk or a hard disk;
- a random access memory (RAM), a read-only memory (ROM), or an erasable programmable read-only memory (EPROM or flash memory);
- an optical fiber or a portable compact disc read-only memory (CD-ROM);
- an optical storage device or a magnetic storage device;
- any suitable combination of the above.

In the disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium. It can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.

Program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to electric wire, optical cable, RF (radio frequency), or any suitable combination of the above.
The computer-readable medium may be included in the electronic device described above, or it may exist separately without being assembled into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the method shown in the above embodiments.
Computer program code for performing the operations of the disclosure may be written in one or more programming languages or a combination thereof. These include object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute in any of the following ways:
- entirely on the user's computer;
- partly on the user's computer, as a stand-alone software package;
- partly on the user's computer and partly on a remote computer;
- entirely on a remote computer or server.

Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the disclosure. In this regard, each box in a flowchart or block diagram may represent a module, program segment or portion of code, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the boxes may occur in an order different from that shown in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. Each box in the block diagrams and/or flowcharts, and each combination of such boxes, may be implemented by a dedicated hardware-based system that performs the specified functions or operations. It may also be implemented by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not limit the unit itself; for example, a first acquiring unit may also be described as "a unit that obtains at least two Internet Protocol addresses".
The above description covers only the preferred embodiments of the disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the disclosure is not limited to technical solutions formed by the specific combination of the above technical features. It also covers other technical solutions formed by any combination of those features or their equivalents without departing from the concept of the disclosure. One example is a technical solution formed by replacing the above features with technical features of similar function disclosed in (but not limited to) the disclosure.

Claims (10)

1. A method for determining similar data, characterized by comprising:
obtaining first data and second data to be processed, wherein the first data and the second data are both images or both videos;
extracting a first content feature of the first data and a second content feature of the second data;
determining a first similarity between the first data and the second data according to the first content feature and the second content feature;
obtaining first meta-information of the first data and second meta-information of the second data;
determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information.
2. The method according to claim 1, characterized in that the first meta-information and the second meta-information each include at least one of the following:
special effects, tag information, related information of the data, and historical information;
wherein the related information includes at least one of the following:
audio, comment information, and information about operations performed on the data.
3. The method according to claim 2, characterized in that, if the first meta-information and the second meta-information include related information of the same type, the determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information comprises:
determining a second similarity between the first data and the second data according to the related information of the same type;
determining whether the first data and the second data are similar according to the first similarity, the second similarity, and the first meta-information and the second meta-information other than the related information.
4. The method according to claim 1, characterized in that the determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information comprises:
generating feature vectors of the first similarity, of each item of first meta-information and of each item of second meta-information, respectively;
inputting the generated feature vectors into a classification model, and determining whether the first data and the second data are similar based on the output of the classification model.
5. The method according to claim 1, characterized in that the obtaining first meta-information of the first data and second meta-information of the second data comprises:
obtaining the first meta-information of the first data and the second meta-information of the second data when the first similarity is greater than a similarity threshold.
6. The method according to claim 1, characterized in that the method further comprises:
determining, according to the first similarity and a similarity threshold, a first judging result indicating whether the first data and the second data are similar;
after the determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information, the method further comprises:
if the first judging result differs from a second judging result, adjusting the similarity threshold according to the first judging result and the second judging result;
wherein the second judging result is the result of determining, according to the first similarity, the first meta-information and the second meta-information, whether the first data and the second data are similar.
7. The method according to claim 6, characterized in that the adjusting the similarity threshold according to the first judging result and the second judging result comprises:
if the first judging result is that the first data and the second data are similar and the second judging result is that the first data and the second data are dissimilar, increasing the similarity threshold.
8. A device for determining similar data, characterized by comprising:
a data acquisition module, configured to obtain first data and second data to be processed, wherein the first data and the second data are both images or both videos;
a content feature extraction module, configured to extract a first content feature of the first data and a second content feature of the second data;
a first similarity determining module, configured to determine a first similarity between the first data and the second data according to the first content feature and the second content feature;
a meta-information obtaining module, configured to obtain first meta-information of the first data and second meta-information of the second data;
a processing module, configured to determine whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information.
9. An electronic device, characterized by comprising:
a processor and a memory;
the memory being configured to store a computer program;
the processor being configured to perform the method according to any one of claims 1 to 7 by calling the computer program.
10. A computer-readable storage medium, characterized in that the readable storage medium stores a computer program, and the computer program is loaded and executed by a processor to implement the method according to any one of claims 1 to 7.
CN201910722643.XA 2019-08-06 2019-08-06 Method and device for determining similar data, electronic equipment and storage medium Active CN110414625B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910722643.XA CN110414625B (en) 2019-08-06 2019-08-06 Method and device for determining similar data, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110414625A true CN110414625A (en) 2019-11-05
CN110414625B CN110414625B (en) 2022-11-08

Family

ID=68366154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910722643.XA Active CN110414625B (en) 2019-08-06 2019-08-06 Method and device for determining similar data, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110414625B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851653A (en) * 2019-11-08 2020-02-28 上海摩象网络科技有限公司 Method and device for shooting material mark and electronic equipment
CN113065619A (en) * 2021-06-03 2021-07-02 明品云(北京)数据科技有限公司 Data processing method, data processing device, computer readable storage medium and equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004265120A (en) * 2003-02-28 2004-09-24 Sony Corp Image processor and processing method, storage medium, and program
JP2005208686A (en) * 2004-01-19 2005-08-04 Nippon Telegr & Teleph Corp <Ntt> Weighted mata data management method, weighted mata data management apparatus, weighted mata data management program and recording medium with program recorded thereon
US20130243407A1 (en) * 2010-03-31 2013-09-19 Sony Corporation Electronic apparatus, reproduction control system, reproduction control method, and program therefor
US20130330008A1 (en) * 2011-09-24 2013-12-12 Lotfi A. Zadeh Methods and Systems for Applications for Z-numbers
JP2015153021A (en) * 2014-02-12 2015-08-24 日本放送協会 Link information generation device and program
US9195640B1 (en) * 2009-01-12 2015-11-24 Sri International Method and system for finding content having a desired similarity
CN105184212A (en) * 2014-04-04 2015-12-23 卡姆芬德公司 Image processing server
US20160196478A1 (en) * 2013-09-03 2016-07-07 Samsung Electronics Co., Ltd. Image processing method and device
CN107644364A (en) * 2017-09-18 2018-01-30 北京京东尚科信息技术有限公司 Object filter method and system
US9881084B1 (en) * 2014-06-24 2018-01-30 A9.Com, Inc. Image match based video search
CN109857908A (en) * 2019-03-04 2019-06-07 北京字节跳动网络技术有限公司 Method and apparatus for matching video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
L. Polok et al.: "Quality assurance in large collections of video sequences", 2015 IEEE International Conference on Image Processing (ICIP) *
He Ning (何宁): "Research on Methods for Acquiring Cross-Modal Semantic Information in Image Retrieval", China Master's Theses Full-text Database *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851653A (en) * 2019-11-08 2020-02-28 Shanghai Moxiang Network Technology Co Ltd Method and device for shooting material mark and electronic equipment
CN113065619A (en) * 2021-06-03 2021-07-02 Mingpinyun (Beijing) Data Technology Co Ltd Data processing method, data processing device, computer readable storage medium and equipment

Also Published As

Publication number Publication date
CN110414625B (en) 2022-11-08

Similar Documents

Publication Publication Date Title
CN108446387A (en) Method and apparatus for updating face registration library
CN109858445A (en) Method and apparatus for generating model
CN110399848A (en) Video cover generation method, device and electronic equipment
CN109086719A (en) Method and apparatus for outputting data
CN109919244B (en) Method and apparatus for generating a scene recognition model
CN108595634B (en) Short message management method and device and electronic equipment
CN110381368A (en) Video cover generation method, device and electronic equipment
CN110413812A (en) Neural network model training method, device, electronic equipment and storage medium
CN109977839A (en) Information processing method and device
CN109829432A (en) Method and apparatus for generating information
CN110059623B (en) Method and apparatus for generating information
CN110213614A (en) Method and apparatus for extracting key frames from a video file
CN110365973A (en) Video detection method, device, electronic equipment and computer-readable storage medium
CN109934191A (en) Information processing method and device
CN110222775A (en) Image processing method, device, electronic equipment and computer readable storage medium
CN110032978A (en) Method and apparatus for handling video
CN108446658A (en) Method and apparatus for recognizing facial images
CN110321447A (en) Method, apparatus, electronic equipment and storage medium for determining duplicate images
CN110502665A (en) Video processing method and device
CN111708944A (en) Multimedia resource identification method, device, equipment and storage medium
CN111488273A (en) Test verification method, test verification device, storage medium, and electronic apparatus
CN110414625A (en) Method, apparatus, electronic equipment and storage medium for determining similar data
CN109934142A (en) Method and apparatus for generating the feature vector of video
CN110097004B (en) Facial expression recognition method and device
CN110008926B (en) Method and device for identifying age

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant