CN110414625A - Method, apparatus, electronic device and storage medium for determining similar data - Google Patents
Method, apparatus, electronic device and storage medium for determining similar data
- Publication number
- CN110414625A, CN201910722643.XA
- Authority
- CN
- China
- Prior art keywords
- data
- meta-information
- similarity
- similar
- judgment result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Abstract
Embodiments of the present disclosure provide a method, apparatus, electronic device and storage medium for determining similar data. The method comprises: obtaining first data and second data to be processed, where the first data and the second data are both images or both videos; extracting a first content feature of the first data and a second content feature of the second data; determining a first similarity between the first data and the second data according to the first content feature and the second content feature; obtaining first meta-information of the first data and second meta-information of the second data; and determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information. Because the first meta-information and the second meta-information are combined with the first similarity when making the final determination, the judgment is more accurate than one based only on the first similarity and a preset threshold.
Description
Technical field
The present disclosure relates to the technical field of image processing, and in particular to a method, apparatus, electronic device and storage medium for determining similar data.
Background
In the prior art, when deduplicating images or videos, the similarity of two pictures or videos is usually determined from their content, and that similarity is then compared with a preset threshold to judge whether the two pictures or videos are similar. This approach is prone to misjudgment. For example, two videos may be dissimilar but use the same special effect during shooting; when the prior-art scheme judges whether they are similar, the shared special effect may lead it to conclude that the two videos are similar, which is a judgment error. The prior-art way of judging whether two pictures or videos are similar is therefore not accurate enough.
Summary of the invention
The present disclosure aims to solve at least one of the above technical defects and to improve the user experience. The technical solution adopted by the present disclosure is as follows:
In a first aspect, an embodiment of the present disclosure provides a method for determining similar data, the method comprising:
obtaining first data and second data to be processed, where the first data and the second data are both images or both videos;
extracting a first content feature of the first data and a second content feature of the second data;
determining a first similarity between the first data and the second data according to the first content feature and the second content feature;
obtaining first meta-information of the first data and second meta-information of the second data;
determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information.
In an optional embodiment of the first aspect, the first meta-information and the second meta-information include at least one of the following:
special effect, tag (label) information, associated information of the data, and historical information;
where the associated information includes at least one of the following:
audio, comment information, and operated-on information.
In an optional embodiment of the first aspect, if the first meta-information and the second meta-information include associated information of the same type, determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information comprises:
determining a second similarity between the first data and the second data according to the associated information of the same type;
determining whether the first data and the second data are similar according to the first similarity, the second similarity, and the first meta-information and second meta-information other than the associated information.
In an optional embodiment of the first aspect, determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information comprises:
generating feature vectors for the first similarity, each item of first meta-information and each item of second meta-information, respectively;
inputting each generated feature vector into a classification model, and determining whether the first data and the second data are similar based on the output of the classification model.
In an optional embodiment of the first aspect, obtaining the first meta-information of the first data and the second meta-information of the second data comprises:
obtaining the first meta-information of the first data and the second meta-information of the second data when the first similarity is greater than a similarity threshold.
In an optional embodiment of the first aspect, the method further comprises:
determining, according to the first similarity and the similarity threshold, a first judgment result indicating whether the first data and the second data are similar;
and, after determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information, further comprises:
if the first judgment result and the second judgment result differ, adjusting the similarity threshold according to the first judgment result and the second judgment result;
where the second judgment result is the result of determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information.
In an optional embodiment of the first aspect, adjusting the similarity threshold according to the first judgment result and the second judgment result comprises:
if the first judgment result is that the first data and the second data are similar and the second judgment result is that the first data and the second data are dissimilar, increasing the similarity threshold.
In a second aspect, an embodiment of the present disclosure provides an apparatus for determining similar data, the apparatus comprising:
a data acquisition module, configured to obtain first data and second data to be processed, where the first data and the second data are both images or both videos;
a content feature extraction module, configured to extract a first content feature of the first data and a second content feature of the second data;
a first similarity determining module, configured to determine a first similarity between the first data and the second data according to the first content feature and the second content feature;
a meta-information obtaining module, configured to obtain first meta-information of the first data and second meta-information of the second data;
a processing module, configured to determine whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information.
In an optional embodiment of the second aspect, the first meta-information and the second meta-information include at least one of the following:
special effect, tag information, associated information of the data, and historical information;
where the associated information includes at least one of the following:
audio, comment information, and operated-on information.
In an optional embodiment of the second aspect, if the first meta-information and the second meta-information include associated information of the same type, the processing module, when determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information, is specifically configured to:
determine a second similarity between the first data and the second data according to the associated information of the same type;
determine whether the first data and the second data are similar according to the first similarity, the second similarity, and the first meta-information and second meta-information other than the associated information.
In an optional embodiment of the second aspect, the processing module, when determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information, is specifically configured to:
generate feature vectors for the first similarity, each item of first meta-information and each item of second meta-information, respectively;
input each generated feature vector into a classification model, and determine whether the first data and the second data are similar based on the output of the classification model.
In an optional embodiment of the second aspect, the meta-information obtaining module, when obtaining the first meta-information of the first data and the second meta-information of the second data, is specifically configured to:
obtain the first meta-information of the first data and the second meta-information of the second data when the first similarity is greater than a similarity threshold.
In an optional embodiment of the second aspect, the processing module is further configured to:
determine, according to the first similarity and the similarity threshold, a first judgment result indicating whether the first data and the second data are similar;
the apparatus further comprises a threshold adjustment module, specifically configured to:
after it is determined whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information, adjust the similarity threshold according to the first judgment result and the second judgment result if the two results differ;
where the second judgment result is the result of determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information.
In an optional embodiment of the second aspect, the threshold adjustment module, when adjusting the similarity threshold according to the first judgment result and the second judgment result, is specifically configured to:
increase the similarity threshold if the first judgment result is that the first data and the second data are similar and the second judgment result is that the first data and the second data are dissimilar.
In a third aspect, the present disclosure provides an electronic device comprising a processor and a memory;
the memory is configured to store computer operation instructions;
the processor is configured to execute, by calling the computer operation instructions, the method shown in any embodiment of the first aspect of the embodiments of the present disclosure.
In a fourth aspect, the present disclosure provides a computer-readable storage medium storing at least one instruction, at least one program, a code set or an instruction set, where the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by a processor to implement the method shown in any embodiment of the first aspect of the embodiments of the present disclosure.
The technical solution provided by the embodiments of the present disclosure has the following beneficial effects:
In the embodiments of the present disclosure, after the first similarity between the first data and the second data is determined according to the first content feature and the second content feature, the first meta-information of the first data and the second meta-information of the second data are further taken into account to make the final determination of whether the first data and the second data are similar. Compared with the prior-art approach of deciding similarity based only on the first similarity and a preset threshold, this is more accurate and improves the accuracy of the judgment.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments are briefly introduced below.
Fig. 1 is a flow diagram of a method for determining similar data in an embodiment of the present disclosure;
Fig. 2 is a structural diagram of an apparatus for determining similar data in an embodiment of the present disclosure;
Fig. 3 is a structural diagram of an electronic device in an embodiment of the present disclosure.
Detailed description of embodiments
Embodiments of the present disclosure are described in detail below, and examples of the embodiments are shown in the accompanying drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the technical solutions of the present disclosure and are not to be construed as limiting it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an" and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present disclosure refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intermediate elements may be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes all or any unit and all combinations of one or more of the associated listed items.
The technical solutions of the present disclosure, and how they solve the above technical problems, are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present disclosure are described below with reference to the drawings.
An embodiment of the present disclosure provides a method for determining similar data. As shown in Fig. 1, the method may include:
Step S110: obtain first data and second data to be processed, where the first data and the second data are both images or both videos.
That is, the data to be processed are of the same type. In the embodiments of the present disclosure, the data to be processed may be videos or images: when the first data is a video, the second data is also a video, and when the first data is an image, the second data is also an image.
Step S120: extract a first content feature of the first data and a second content feature of the second data.
Step S130: determine a first similarity between the first data and the second data according to the first content feature and the second content feature.
The content feature is associated with the image; when the data is a video, the content feature of the video is associated with the video frame images in the video. For example, the content feature may include the types of objects appearing in the image and the text in the image. A pre-trained neural network model that extracts image content features from an image may be used to extract the content features of the image and obtain its content feature vector; the neural network model may be a bidirectional long short-term memory (LSTM) network.
In practical applications, the content features of the first data and of the second data may be represented as a content feature vector of the first data and a content feature vector of the second data. Each component of the content feature vector of the first data corresponds to one content feature of the first data, and each component of the content feature vector of the second data corresponds to one content feature of the second data. The similarity between the two content feature vectors is calculated and taken as the first similarity of the first data and the second data.
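The vector comparison described above can be sketched as follows. The disclosure does not name a particular similarity measure, so cosine similarity is used here purely as an assumption, and the feature vectors are hypothetical values rather than the output of any real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical content feature vectors of the first data and the second data.
first_features = [1.0, 0.0, 2.0]
second_features = [2.0, 0.0, 4.0]

# Under the cosine-similarity assumption, this value plays the role of the first similarity.
first_similarity = cosine_similarity(first_features, second_features)
```

Any other vector similarity (for example one derived from Euclidean distance) could be substituted without changing the rest of the method.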
Step S140: obtain first meta-information of the first data and second meta-information of the second data.
In an optional embodiment of the present disclosure, the first meta-information and the second meta-information include at least one of the following:
special effect, tag information, associated information of the data, and historical information;
where the associated information includes at least one of the following:
audio, comment information, and operated-on information.
The special effect indicates whether a special effect is present in the first data and the second data and, if so, which special effect it is. For example, if the first data and the second data are videos and a "rabbit sticker" was added to a video during recording, the special-effect meta-information of that video is "rabbit sticker". The historical information indicates whether the author of the first data or the author of the second data has reposted other data between a certain time node and the current time node. For example, when the first data and the second data are videos, the historical information records whether the author who shot the video reposted other videos between a certain time node and the current time node and, if so, the identifiers of the reposted videos. The tag information is an information label that identifies the data, that is, some information about the data can be learned from the tag information. For example, when the first data and the second data are both videos about games, their tag information may be a label identifying them as game videos and a label identifying which specific game is involved. The associated information of the data is information that further describes the data, such as the audio of the data, comment information and operated-on information. The operated-on information records how the data has been operated on between a certain time node and the current time node. For example, when the data is a video, its operated-on information may record whether the video was clipped into other videos during that period and, if so, the time at which it was clipped into them and the identifiers of the videos it was clipped into.
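One way to picture the kinds of meta-information listed above is as a small record type. The sketch below is illustrative only: the class names, field names and identifier formats are assumptions and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AssociatedInfo:
    """Associated information: audio, comments, and operated-on information."""
    audio_id: Optional[str] = None                           # hypothetical audio identifier
    comments: List[str] = field(default_factory=list)
    clipped_into: List[str] = field(default_factory=list)    # ids of videos this one was clipped into


@dataclass
class MetaInfo:
    """Meta-information of one image or video."""
    special_effect: Optional[str] = None                     # None means no special effect
    tags: List[str] = field(default_factory=list)
    associated: AssociatedInfo = field(default_factory=AssociatedInfo)
    reposted_ids: List[str] = field(default_factory=list)    # historical information about the author


first_meta = MetaInfo(special_effect="rabbit sticker", tags=["game"])
```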
Step S150: determine whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information.
That is, after the first meta-information of the first data and the second meta-information of the second data are obtained, they can be combined with the determined first similarity to make the final determination of whether the first data and the second data are similar.
In the embodiments of the present disclosure, after the first similarity between the first data and the second data is determined according to the first content feature and the second content feature, the first meta-information of the first data and the second meta-information of the second data are further taken into account to make the final determination of whether the first data and the second data are similar. Compared with the prior-art approach of deciding similarity based only on the first similarity and a preset threshold, this is more accurate and improves the accuracy of the judgment.
In an optional embodiment of the present disclosure, if the first meta-information and the second meta-information include associated information of the same type, determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information comprises:
determining a second similarity between the first data and the second data according to the associated information of the same type;
determining whether the first data and the second data are similar according to the first similarity, the second similarity, and the first meta-information and second meta-information other than the associated information.
Here, the first meta-information and the second meta-information including associated information of the same type means that the associated information in the first meta-information and the associated information in the second meta-information both contain information of the same type. For example, if the associated information of the first meta-information includes audio and the associated information of the second meta-information also includes audio, then the first meta-information and the second meta-information include associated information of the same type, and that type is audio.
Correspondingly, after it is determined that the first meta-information and the second meta-information include associated information of the same type, the second similarity of the first data and the second data can be determined according to the associated information of the same type in the first meta-information and the second meta-information.
In practical applications, a similarity can be determined for each type of associated information shared by the first meta-information and the second meta-information, and the second similarity of the first data and the second data can then be obtained from these per-type similarities. For example, if the associated information in the first meta-information includes audio and comment information, and the associated information in the second meta-information also includes audio and comment information, the similarity between the audio in the first meta-information and the audio in the second meta-information can be determined, as can the similarity between the comment information in the first meta-information and the comment information in the second meta-information; the second similarity of the first data and the second data is then determined from the audio similarity and the comment similarity. Alternatively, in practical applications, a single overall similarity can be calculated directly from all the types of associated information shared by the first meta-information and the second meta-information and used as the second similarity of the first data and the second data.
Correspondingly, after the second similarity of the first data and the second data is determined, the second similarity can be combined with the first similarity and with the first meta-information and second meta-information other than the associated information to further determine whether the first data and the second data are similar.
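The per-type approach described above can be sketched as follows. The disclosure specifies neither how each type of associated information is compared nor how the per-type similarities are combined, so this sketch assumes a Jaccard overlap for each shared type and a plain average across types; the dictionary layout and values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections treated as sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def second_similarity(first_assoc, second_assoc):
    """Average the per-type similarity over associated-information types present in both inputs.

    Returns None when the two inputs share no type of associated information.
    """
    shared_types = [t for t in first_assoc if t in second_assoc]
    if not shared_types:
        return None
    scores = [jaccard(first_assoc[t], second_assoc[t]) for t in shared_types]
    return sum(scores) / len(scores)


# Hypothetical associated information, keyed by type.
first_assoc = {"audio": ["track_42"], "comments": ["cute", "rabbit", "funny"]}
second_assoc = {"audio": ["track_42"], "comments": ["cute", "rabbit", "sad"]}

score = second_similarity(first_assoc, second_assoc)  # audio: 1.0, comments: 2/4, average: 0.75
```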
In an optional embodiment of the present disclosure, determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information comprises:
generating feature vectors for the first similarity, each item of first meta-information and each item of second meta-information, respectively;
inputting each generated feature vector into a classification model, and determining whether the first data and the second data are similar based on the output of the classification model.
In practical applications, feature vectors can be generated for the first similarity, each item of first meta-information and each item of second meta-information, that is, each of them is converted into a feature-vector representation. The generated feature vectors are then input into the classification model, which determines from them whether the first data and the second data are similar. For example, if the first data and the second data are similar, the model may output information indicating that the first data and the second data are similar.
Alternatively, in practical applications, the feature vectors need not be generated in advance. The first similarity, each item of first meta-information and each item of second meta-information can be input directly into the classification model, which generates their feature vectors itself and then determines from the generated feature vectors whether the first data and the second data are similar.
The embodiments of the present disclosure do not specifically limit which kind of classification model is used; it may be, for example, a decision tree classification model or a multivariate Bernoulli model.
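The flow above (encode the inputs as a feature vector, then pass it to a classifier) can be sketched as follows. Everything specific here is an assumption: the three features, the dictionary keys, and in particular the `classify` function, which is a hand-written two-level decision rule standing in for a trained decision tree and is not the model of the disclosure.

```python
def build_features(first_similarity, first_meta, second_meta):
    """Encode the first similarity and two meta-information dicts as a numeric vector."""
    effect_a = first_meta.get("effect")
    effect_b = second_meta.get("effect")
    same_effect = 1.0 if effect_a is not None and effect_a == effect_b else 0.0

    tags_a = set(first_meta.get("tags", []))
    tags_b = set(second_meta.get("tags", []))
    union = tags_a | tags_b
    tag_overlap = len(tags_a & tags_b) / len(union) if union else 0.0

    return [first_similarity, same_effect, tag_overlap]


def classify(features):
    """Stand-in for a trained classifier: returns True if the pair is judged similar."""
    similarity, same_effect, tag_overlap = features
    if similarity < 0.6:
        return False
    # High content similarity that may be explained by a shared special effect
    # is not trusted when the tags barely overlap.
    if same_effect == 1.0 and tag_overlap < 0.5:
        return False
    return True


features = build_features(
    0.7,
    {"effect": "rabbit sticker", "tags": ["game"]},
    {"effect": "rabbit sticker", "tags": ["cooking"]},
)
result = classify(features)
```

In a real system the rule in `classify` would be replaced by a model trained on labelled pairs; the sketch only shows where the first similarity and the meta-information enter the decision.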
In an optional embodiment of the present disclosure, obtaining the first meta-information of the first data and the second meta-information of the second data comprises:
obtaining the first meta-information of the first data and the second meta-information of the second data when the first similarity is greater than a similarity threshold.
In practical applications, after the first similarity between the first data and the second data is determined according to the first content feature and the second content feature, it can be judged whether the first similarity is greater than the similarity threshold (that is, whether the two pieces of data to be processed are similar is first decided by means of a threshold). If the first similarity is greater than the similarity threshold, the first data and the second data are provisionally judged to be similar. To make the result more accurate and reduce misjudgments, the first meta-information of the first data and the second meta-information of the second data can then be obtained and combined with the determined first similarity to further determine whether the first data and the second data are similar.
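The gating described above can be sketched as a two-stage check in which meta-information is fetched only for pairs that pass the threshold. The function names and the callables `fetch_meta` and `decide` are hypothetical, and the sketch assumes that a pair at or below the threshold is treated as not similar, which this embodiment implies but does not state.

```python
def is_similar(first_similarity, threshold, fetch_meta, decide):
    """Two-stage check: threshold first, meta-information only for candidate pairs."""
    if first_similarity <= threshold:
        return False  # assumption: no meta-information lookup below the threshold
    first_meta, second_meta = fetch_meta()
    return decide(first_similarity, first_meta, second_meta)


fetch_calls = []


def fetch_meta():
    """Hypothetical lookup; records each call so the gating is observable."""
    fetch_calls.append(1)
    return {"tags": ["game"]}, {"tags": ["game"]}


def decide(similarity, first_meta, second_meta):
    """Hypothetical second-stage rule: require identical tags."""
    return first_meta["tags"] == second_meta["tags"]


low = is_similar(0.4, 0.6, fetch_meta, decide)   # below threshold, no lookup
high = is_similar(0.7, 0.6, fetch_meta, decide)  # above threshold, one lookup
```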
In an optional embodiment of the present disclosure, the method further comprises:
determining, according to the first similarity and the similarity threshold, a first judgment result indicating whether the first data and the second data are similar;
and, after determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information, further comprises:
if the first judgment result and the second judgment result differ, adjusting the similarity threshold according to the first judgment result and the second judgment result;
where the second judgment result is the result of determining whether the first data and the second data are similar according to the first similarity, the first meta-information and the second meta-information.
In practical applications, if the first judgment result, obtained from the first similarity and the preset threshold, differs from the second judgment result, obtained from the first similarity, the first meta-information and the second meta-information, the current value of the similarity threshold is not accurate enough. The value of the similarity threshold can then be adjusted based on the second judgment result to improve the accuracy of the judgment.
In an optional embodiment of the present disclosure, adjusting the similarity threshold according to the first judgment result and the second judgment result comprises:
if the first judgment result is that the first data and the second data are similar and the second judgment result is that the first data and the second data are dissimilar, increasing the similarity threshold.
In practical applications, if the first judgment result is that the first data and the second data are similar but the second judgment result is that they are dissimilar, the current similarity threshold is set too low, and its value can be increased appropriately.
In one example, the first similarity of the first data and the second data is 0.7 and the current similarity threshold is 0.6. The first similarity is greater than the similarity threshold, so the first judgment result is that the first data and the second data are similar. However, the second judgment result, determined from the first similarity and the obtained first meta-information and second meta-information, is that the first data and the second data are not similar. This indicates that the current similarity threshold is set too low, and its value can be increased appropriately, for example to 0.8.
In practical applications, the opposite case may also occur: the first judgment result is that the first data and the second data are not similar, but the second judgment result is that they are similar. This indicates that the current similarity threshold is set too high, and its value can be reduced appropriately.
In one example, suppose the first similarity of the first data and the second data is 0.6 and the similarity threshold is currently set to 0.7. The first similarity is less than the similarity threshold, so the first judging result is that the first data and the second data are not similar. If the second judging result, determined from the first similarity and the acquired first metamessage and second metamessage, is that the first data and the second data are similar, the value currently set for the similarity threshold is too large and can be reduced appropriately, for example to 0.5.
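The two adjustment directions described above can be sketched in a few lines of Python. This is only an illustration: the function name `adjust_threshold` and the fixed `step` parameter are assumptions, since the text only says the threshold is increased or reduced "appropriately".

```python
def adjust_threshold(threshold, first_result, second_result, step=0.1):
    """Nudge the similarity threshold when the two judging results disagree.

    first_result:  threshold-only judgment (first similarity > threshold).
    second_result: judgment that also uses the metamessages.
    step is a hypothetical fixed increment, not a value given in the text.
    """
    if first_result == second_result:
        return threshold  # results agree: leave the threshold unchanged
    if first_result and not second_result:
        new_value = threshold + step  # threshold was too small
    else:
        new_value = threshold - step  # threshold was too large
    # Clamp to [0, 1] and round away floating-point noise.
    return round(min(1.0, max(0.0, new_value)), 4)
```

With `step=0.2`, calling `adjust_threshold(0.6, True, False, step=0.2)` reproduces the first example (0.6 raised to 0.8), and `adjust_threshold(0.7, False, True, step=0.2)` reproduces the second (0.7 lowered to 0.5).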
In an optional embodiment of the disclosure, adjusting the similarity threshold according to the first judging result and the second judging result further includes:
Determining the metamessages that are of the same type in the first data and the second data and whose similarity is greater than a set value;
Adjusting the similarity threshold based on the determined metamessages that are of the same type in the first data and the second data and whose similarity is greater than the set value.
Here, a metamessage of the same type whose similarity is greater than the set value means that the first data and the second data both have a metamessage of the same type, and the similarity between those same-type metamessages is greater than the set value (that is, they are similar). For example, suppose the first data and the second data are two videos to be processed, both videos have a metamessage of the special-effect type, the special-effect metamessage in each is a rabbit sticker, and the similarity between the rabbit stickers in the first data and the second data is greater than the set value. The special-effect metamessage is then a metamessage of the same type in the first data and the second data whose similarity is greater than the set value.
Further, the similarity threshold can be adjusted based on the determined metamessages that are of the same type in the first data and the second data and whose similarity is greater than the set value. The specific way of adjusting the similarity threshold based on these metamessages can be preconfigured according to actual needs and is not limited by the embodiments of the disclosure.
As one optional implementation, a mapping between different quantities and adjusting steps can be preconfigured. The adjusting step corresponding to the quantity of metamessages that are of the same type in the first data and the second data and whose similarity is greater than the set value is then determined, and the currently set value of the similarity threshold is adjusted based on the determined adjusting step.
In one example, suppose the mapping between quantities and adjusting steps is as follows: quantity 1 corresponds to an adjusting step of 0.2 and quantity 2 corresponds to an adjusting step of 0.3, and the similarity threshold is currently set to 0.6. If the determined quantity of metamessages that are of the same type in the first data and the second data and whose similarity is greater than the set value is 2, the mapping gives an adjusting step of 0.3 for quantity 2, and the currently set value of the similarity threshold is then adjusted to 0.8.
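The quantity-to-step mapping can be pictured as a small lookup table. The sketch below covers only the lookup; the names `STEP_BY_COUNT` and `step_for_count` are hypothetical, and the fallback for quantities missing from the table is an assumption that the text does not address.

```python
# Hypothetical table: quantity of matching metamessages -> adjusting step.
STEP_BY_COUNT = {1: 0.2, 2: 0.3}

def step_for_count(count, table=STEP_BY_COUNT):
    """Return the adjusting step configured for `count` matching metamessages.

    A quantity missing from the table falls back to the step of the largest
    configured quantity not exceeding it, or 0.0 when there is none. This
    fallback is an assumption, not something the source specifies.
    """
    eligible = [c for c in table if c <= count]
    return table[max(eligible)] if eligible else 0.0
```

For the mapping in the example, `step_for_count(2)` returns 0.3. How the step is then applied to the current threshold is left to the configured adjustment rule.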
As another optional implementation, a weight and a corresponding adjusting step can be set for each metamessage. When the quantity of metamessages that are of the same type in the first data and the second data and whose similarity is greater than the set value is greater than 2, a benchmark metamessage (that is, the metamessage whose adjusting step is used to adjust the currently set value of the similarity threshold) is determined based on the weight of each determined metamessage, and the currently set value of the similarity threshold is then adjusted based on the adjusting step corresponding to the benchmark metamessage. Of course, if only 1 such metamessage is determined, the currently set value of the similarity threshold can be adjusted directly based on the adjusting step corresponding to that metamessage.
In one example, suppose the similarity threshold is currently set to 0.6, the metamessage of the special-effect type has a weight of 0.6 and a corresponding adjusting step of 0.3, and the metamessage of the comment-information type has a weight of 0.3 and a corresponding adjusting step of 0.2. The metamessages that are of the same type in the first data and the second data and whose similarity is greater than the set value are the special-effect metamessage and the comment-information metamessage (that is, the quantity is 2). Because the weight of the special-effect metamessage is greater than that of the comment-information metamessage, the currently set value of the similarity threshold can be adjusted based on the adjusting step corresponding to the special-effect metamessage, that is, adjusted to 0.9.
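The weight-based selection of a benchmark metamessage can be sketched as follows. The `MetaType` class and `adjust_by_benchmark` function are hypothetical names, and adding the step to the threshold is an assumption chosen to match the 0.6 to 0.9 example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetaType:
    name: str      # metamessage type, e.g. "special_effect"
    weight: float  # configured weight of this type
    step: float    # configured adjusting step of this type

def adjust_by_benchmark(threshold, matched):
    """Adjust the threshold using the step of the highest-weight matched type."""
    if not matched:
        return threshold  # nothing matched: leave the threshold unchanged
    benchmark = max(matched, key=lambda m: m.weight)
    # Adding the step is an assumption consistent with the 0.6 -> 0.9 example.
    return round(threshold + benchmark.step, 4)

special_effect = MetaType("special_effect", weight=0.6, step=0.3)
comment = MetaType("comment", weight=0.3, step=0.2)
new_threshold = adjust_by_benchmark(0.6, [special_effect, comment])
```

Because `max` also works on a one-element list, the same function covers the case where only one metamessage matches.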
Based on the same principle as the method shown in Fig. 1, an embodiment of the disclosure further provides a device 30 for determining similar data. As shown in Fig. 2, the device 30 for determining similar data may include a data acquisition module 310, a content feature extraction module 320, a first similarity determining module 330, a metamessage obtaining module 340 and a processing module 350, wherein:
The data acquisition module is configured to obtain first data and second data to be processed, wherein the first data and the second data are both images or both videos;
The content feature extraction module is configured to extract a first content feature of the first data and a second content feature of the second data;
The first similarity determining module is configured to determine a first similarity of the first data and the second data according to the first content feature and the second content feature;
The metamessage obtaining module is configured to obtain a first metamessage of the first data and a second metamessage of the second data;
The processing module is configured to determine whether the first data and the second data are similar according to the first similarity, the first metamessage and the second metamessage.
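One way to picture how these modules fit together is the Python sketch below. It is only an illustration: the class name `SimilarDataDevice`, the injected callables and the use of cosine similarity for the first similarity are all assumptions, since the text does not fix a particular similarity measure or decision rule.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

class SimilarDataDevice:
    """Wires the modules together; each collaborator is injected as a callable."""

    def __init__(self, extract_features, get_meta, decide):
        self.extract_features = extract_features  # content feature extraction module
        self.get_meta = get_meta                  # metamessage obtaining module
        self.decide = decide                      # processing module

    def is_similar(self, first, second):
        # The data acquisition module is represented by the two arguments.
        f1 = self.extract_features(first)
        f2 = self.extract_features(second)
        first_similarity = cosine_similarity(f1, f2)  # first similarity determining module
        return self.decide(first_similarity, self.get_meta(first), self.get_meta(second))

# Toy collaborators, purely illustrative.
device = SimilarDataDevice(
    extract_features=lambda item: item["features"],
    get_meta=lambda item: item["meta"],
    decide=lambda sim, m1, m2: sim > 0.6 and m1.get("tag") == m2.get("tag"),
)
a = {"features": [1.0, 0.0], "meta": {"tag": "pets"}}
b = {"features": [1.0, 0.0], "meta": {"tag": "pets"}}
c = {"features": [0.0, 1.0], "meta": {"tag": "pets"}}
```

Here `device.is_similar(a, b)` is true, while `device.is_similar(a, c)` is false because the content features are orthogonal even though the tags match.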
In an optional embodiment of the disclosure, the first metamessage and the second metamessage include at least one of the following:
Special effect, tag information, related information of the data, historical information;
Wherein the related information includes at least one of the following:
Audio, comment information, information on operations performed on the data.
In an optional embodiment of the disclosure, if the first metamessage and the second metamessage include related information of the same type, the processing module, when determining whether the first data and the second data are similar according to the first similarity, the first metamessage and the second metamessage, is specifically configured to:
Determine a second similarity of the first data and the second data according to the related information of the same type;
Determine whether the first data and the second data are similar according to the first similarity, the second similarity, and the first metamessage and the second metamessage other than the related information.
In an optional embodiment of the disclosure, the processing module, when determining whether the first data and the second data are similar according to the first similarity, the first metamessage and the second metamessage, is specifically configured to:
Generate feature vectors for the first similarity, each first metamessage and each second metamessage respectively;
Input the generated feature vectors into a classification model, and determine whether the first data and the second data are similar based on the output of the classification model.
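The classification step can be illustrated with a deliberately small Python sketch. It simplifies the description: instead of separate feature vectors for each metamessage, it builds one flat vector of match flags, and it uses a hand-weighted linear classifier as a stand-in for a trained classification model. The names `META_KEYS`, `build_features`, `classify`, `WEIGHTS` and `BIAS` are all hypothetical.

```python
import math

# Hypothetical metamessage keys compared between the two items.
META_KEYS = ("special_effect", "tag", "audio")

def build_features(first_similarity, meta_a, meta_b, keys=META_KEYS):
    """Flatten the inputs into one vector: the content similarity followed by
    one flag per key (1.0 when both items carry the same value for that key)."""
    flags = [
        1.0 if key in meta_a and meta_a.get(key) == meta_b.get(key) else 0.0
        for key in keys
    ]
    return [first_similarity] + flags

def classify(features, weights, bias):
    """Stand-in linear classifier: similar when the sigmoid score exceeds 0.5."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score)) > 0.5

# Hand-picked parameters for illustration only; a real model would be trained.
WEIGHTS = [4.0, 2.0, 2.0, 2.0]
BIAS = -4.0

video_a = {"special_effect": "rabbit_sticker", "tag": "pets"}
video_b = {"special_effect": "rabbit_sticker", "tag": "travel"}
similar = classify(build_features(0.7, video_a, video_b), WEIGHTS, BIAS)
```

With these illustrative parameters, a content similarity of 0.7 alone is not enough to classify the pair as similar, but the matching special effect tips the decision.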
In an optional embodiment of the disclosure, the metamessage obtaining module, when obtaining the first metamessage of the first data and the second metamessage of the second data, is specifically configured to:
Obtain the first metamessage of the first data and the second metamessage of the second data when the first similarity is greater than the similarity threshold.
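This gating can be sketched as a lazy lookup in Python. The function name `judge_pair` and its callable parameters are hypothetical, and treating a pair whose first similarity does not exceed the threshold as not similar is an assumption made for the sketch.

```python
def judge_pair(first_similarity, threshold, fetch_meta, decide):
    """Return whether the pair is similar, fetching metamessages lazily.

    fetch_meta: zero-argument callable returning (first_meta, second_meta).
    decide:     callable combining the similarity with both metamessages.
    Treating a pair at or below the threshold as not similar is an assumption.
    """
    if first_similarity <= threshold:
        return False  # skip the (possibly expensive) metamessage lookup
    first_meta, second_meta = fetch_meta()
    return decide(first_similarity, first_meta, second_meta)
```

The benefit of this ordering is that metamessages are only retrieved for pairs that already pass the content-based check.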
In an optional embodiment of the disclosure, the processing module is further configured to:
Determine, according to the first similarity and the similarity threshold, a first judging result of whether the first data and the second data are similar;
The device further includes a threshold adjustment module, specifically configured to:
After whether the first data and the second data are similar has been determined according to the first similarity, the first metamessage and the second metamessage, adjust the similarity threshold according to the first judging result and the second judging result if the first judging result and the second judging result differ;
Wherein the second judging result is the result of determining whether the first data and the second data are similar according to the first similarity, the first metamessage and the second metamessage.
In an optional embodiment of the disclosure, the threshold adjustment module, when adjusting the similarity threshold according to the first judging result and the second judging result, is specifically configured to:
If the first judging result is that the first data and the second data are similar, and the second judging result is that the first data and the second data are dissimilar, increase the similarity threshold.
The device for determining similar data of the embodiments of the disclosure can perform the method for determining similar data provided by the embodiments of the disclosure, and its implementation principle is similar. The actions performed by the modules of the device for determining similar data in the embodiments of the disclosure correspond to the steps of the method for determining similar data in the embodiments of the disclosure. For a detailed functional description of each module of the device, refer to the description of the corresponding method for determining similar data above; details are not repeated here.
Based on the same principle as the method shown in the embodiments of the disclosure, an embodiment of the disclosure further provides an electronic device, which may include but is not limited to a processor and a memory. The memory is configured to store computer operation instructions; the processor is configured to perform the method shown in the embodiments by calling the computer operation instructions.
Based on the same principle as the method shown in the embodiments of the disclosure, an embodiment of the disclosure further provides a computer-readable storage medium. The computer-readable storage medium stores at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by a processor to implement the method shown in the above embodiments; details are not repeated here.
Referring now to Fig. 3, it shows a schematic structural diagram of an electronic device 500 suitable for implementing the embodiments of the disclosure. The electronic device may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as a mobile phone, a laptop computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable media player) and a vehicle-mounted terminal (for example, a vehicle navigation terminal), and fixed terminals such as a digital TV and a desktop computer. The electronic device shown in Fig. 3 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the disclosure.
As shown in Fig. 3, the electronic device 500 may include a processing unit (for example, a central processing unit or a graphics processor) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing unit 501, the ROM 502 and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
In general, the following devices can be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; output devices 507 including, for example, a liquid crystal display (LCD), a loudspeaker and a vibrator; storage devices 508 including, for example, a magnetic tape and a hard disk; and a communication device 509. The communication device 509 allows the electronic device 500 to communicate with other devices, wirelessly or by wire, to exchange data. Although Fig. 3 shows an electronic device 500 with various devices, it should be understood that it is not required to implement or have all of the devices shown; more or fewer devices may alternatively be implemented or provided.
In particular, according to the embodiments of the disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 509, installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing unit 501, the above functions defined in the method of the embodiments of the disclosure are performed.
It should be noted that the computer-readable medium of the disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus or device. In the disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and it can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
The computer-readable medium may be included in the electronic device, or it may exist separately without being assembled into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the method shown in the above embodiments.
Computer program code for performing the operations of the disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a part of code, which contains one or more executable instructions for implementing the specified logic function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the disclosure may be implemented in software or in hardware. The name of a unit does not, under certain circumstances, constitute a limitation on the unit itself; for example, a first acquiring unit may also be described as "a unit that obtains at least two internet protocol addresses".
The above description is only of the preferred embodiments of the disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the disclosure.
Claims (10)
1. A method for determining similar data, characterized by comprising:
Obtaining first data and second data to be processed, wherein the first data and the second data are both images or both videos;
Extracting a first content feature of the first data and a second content feature of the second data;
Determining a first similarity of the first data and the second data according to the first content feature and the second content feature;
Obtaining a first metamessage of the first data and a second metamessage of the second data;
Determining whether the first data and the second data are similar according to the first similarity, the first metamessage and the second metamessage.
2. The method according to claim 1, characterized in that the first metamessage and the second metamessage include at least one of the following:
Special effect, tag information, related information of the data, historical information;
Wherein the related information includes at least one of the following:
Audio, comment information, information on operations performed on the data.
3. The method according to claim 2, characterized in that, if the first metamessage and the second metamessage include related information of the same type, determining whether the first data and the second data are similar according to the first similarity, the first metamessage and the second metamessage comprises:
Determining a second similarity of the first data and the second data according to the related information of the same type;
Determining whether the first data and the second data are similar according to the first similarity, the second similarity, and the first metamessage and the second metamessage other than the related information.
4. The method according to claim 1, characterized in that determining whether the first data and the second data are similar according to the first similarity, the first metamessage and the second metamessage comprises:
Generating feature vectors for the first similarity, each first metamessage and each second metamessage respectively;
Inputting the generated feature vectors into a classification model, and determining whether the first data and the second data are similar based on the output of the classification model.
5. The method according to claim 1, characterized in that obtaining the first metamessage of the first data and the second metamessage of the second data comprises:
Obtaining the first metamessage of the first data and the second metamessage of the second data when the first similarity is greater than a similarity threshold.
6. The method according to claim 1, characterized in that the method further comprises:
Determining, according to the first similarity and a similarity threshold, a first judging result of whether the first data and the second data are similar;
After determining whether the first data and the second data are similar according to the first similarity, the first metamessage and the second metamessage, the method further comprises:
If the first judging result and a second judging result differ, adjusting the similarity threshold according to the first judging result and the second judging result;
Wherein the second judging result is the result of determining whether the first data and the second data are similar according to the first similarity, the first metamessage and the second metamessage.
7. The method according to claim 6, characterized in that adjusting the similarity threshold according to the first judging result and the second judging result comprises:
If the first judging result is that the first data and the second data are similar, and the second judging result is that the first data and the second data are dissimilar, increasing the similarity threshold.
8. A device for determining similar data, characterized by comprising:
A data acquisition module, configured to obtain first data and second data to be processed, wherein the first data and the second data are both images or both videos;
A content feature extraction module, configured to extract a first content feature of the first data and a second content feature of the second data;
A first similarity determining module, configured to determine a first similarity of the first data and the second data according to the first content feature and the second content feature;
A metamessage obtaining module, configured to obtain a first metamessage of the first data and a second metamessage of the second data;
A processing module, configured to determine whether the first data and the second data are similar according to the first similarity, the first metamessage and the second metamessage.
9. An electronic device, characterized by comprising:
A processor and a memory;
The memory, configured to store a computer program;
The processor, configured to perform the method according to any one of claims 1 to 7 by calling the computer program.
10. A computer-readable storage medium, characterized in that the readable storage medium stores a computer program, and the computer program is loaded and executed by a processor to implement the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910722643.XA CN110414625B (en) | 2019-08-06 | 2019-08-06 | Method and device for determining similar data, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910722643.XA CN110414625B (en) | 2019-08-06 | 2019-08-06 | Method and device for determining similar data, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110414625A (en) | 2019-11-05 |
CN110414625B CN110414625B (en) | 2022-11-08 |
Family
ID=68366154
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910722643.XA Active CN110414625B (en) | 2019-08-06 | 2019-08-06 | Method and device for determining similar data, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110414625B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110851653A (en) * | 2019-11-08 | 2020-02-28 | 上海摩象网络科技有限公司 | Method and device for shooting material mark and electronic equipment |
CN113065619A (en) * | 2021-06-03 | 2021-07-02 | 明品云(北京)数据科技有限公司 | Data processing method, data processing device, computer readable storage medium and equipment |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004265120A (en) * | 2003-02-28 | 2004-09-24 | Sony Corp | Image processor and processing method, storage medium, and program |
JP2005208686A (en) * | 2004-01-19 | 2005-08-04 | Nippon Telegr & Teleph Corp <Ntt> | Weighted meta data management method, weighted meta data management apparatus, weighted meta data management program and recording medium with program recorded thereon |
US20130243407A1 (en) * | 2010-03-31 | 2013-09-19 | Sony Corporation | Electronic apparatus, reproduction control system, reproduction control method, and program therefor |
US20130330008A1 (en) * | 2011-09-24 | 2013-12-12 | Lotfi A. Zadeh | Methods and Systems for Applications for Z-numbers |
JP2015153021A (en) * | 2014-02-12 | 2015-08-24 | 日本放送協会 | Link information generation device and program |
US9195640B1 (en) * | 2009-01-12 | 2015-11-24 | Sri International | Method and system for finding content having a desired similarity |
CN105184212A (en) * | 2014-04-04 | 2015-12-23 | 卡姆芬德公司 | Image processing server |
US20160196478A1 (en) * | 2013-09-03 | 2016-07-07 | Samsung Electronics Co., Ltd. | Image processing method and device |
CN107644364A (en) * | 2017-09-18 | 2018-01-30 | 北京京东尚科信息技术有限公司 | Object filter method and system |
US9881084B1 (en) * | 2014-06-24 | 2018-01-30 | A9.Com, Inc. | Image match based video search |
CN109857908A (en) * | 2019-03-04 | 2019-06-07 | 北京字节跳动网络技术有限公司 | Method and apparatus for matching video |
Non-Patent Citations (2)
Title |
---|
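L. POLOK ET AL.: "《Quality assurance in large collections of video sequences》", 《2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)》 * |
He Ning (何宁): "《Research on Methods for Acquiring Cross-Modal Semantic Information in Image Retrieval》", 《China Master's Theses Full-text Database》 * |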
Also Published As
Publication number | Publication date |
---|---|
CN110414625B (en) | 2022-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108446387A (en) | Method and apparatus for updating face registration library | |
CN109858445A (en) | Method and apparatus for generating model | |
CN110399848A (en) | Video cover generation method, device and electronic equipment | |
CN109086719A (en) | Method and apparatus for output data | |
CN109919244B (en) | Method and apparatus for generating a scene recognition model | |
CN108595634B (en) | Short message management method and device and electronic equipment | |
CN110381368A (en) | Video cover generation method, device and electronic equipment | |
CN110413812A (en) | Training method, device, electronic equipment and the storage medium of neural network model | |
CN109977839A (en) | Information processing method and device | |
CN109829432A (en) | Method and apparatus for generating information | |
CN110059623B (en) | Method and apparatus for generating information | |
CN110213614A (en) | The method and apparatus of key frame are extracted from video file | |
CN110365973A (en) | Detection method, device, electronic equipment and the computer readable storage medium of video | |
CN109934191A (en) | Information processing method and device | |
CN110222775A (en) | Image processing method, device, electronic equipment and computer readable storage medium | |
CN110032978A (en) | Method and apparatus for handling video | |
CN108446658A (en) | The method and apparatus of facial image for identification | |
CN110321447A (en) | Determination method, apparatus, electronic equipment and the storage medium of multiimage | |
CN110502665A (en) | Method for processing video frequency and device | |
CN111708944A (en) | Multimedia resource identification method, device, equipment and storage medium | |
CN111488273A (en) | Test verification method, test verification device, storage medium, and electronic apparatus | |
CN110414625A (en) | Determine method, apparatus, electronic equipment and the storage medium of set of metadata of similar data | |
CN109934142A (en) | Method and apparatus for generating the feature vector of video | |
CN110097004B (en) | Facial expression recognition method and device | |
CN110008926B (en) | Method and device for identifying age |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||