CN106294563A - Method and apparatus for processing multimedia data - Google Patents
- Publication number
- CN106294563A CN201610601570.5A
- Authority
- CN
- China
- Prior art keywords
- medium data
- information
- branch
- label information
- tree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/48—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Library & Information Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and apparatus for processing multimedia data, addressing the problem that manually annotating the label information of multimedia data consumes enormous manpower and time and has relatively low accuracy. The method includes: receiving multimedia data to be processed; determining, according to the feature information of the multimedia data and the feature information corresponding to each branch tree of a pre-generated tree structure, the coverage rate between the multimedia data and each branch tree, where the coverage rate represents the degree of similarity between the multimedia data and the branch tree; determining the branch trees whose coverage rate is greater than a first preset threshold and, from the judgment condition branches contained in each such branch tree, determining the judgment condition branch that the feature information of the multimedia data satisfies; and determining the value of the leaf node in that judgment condition branch as first-type label information of the multimedia data, so that the label information of the multimedia data can be determined quickly and accurately.
Description
Technical field
The present invention relates to the field of information technology, and in particular to a method and apparatus for processing multimedia data.
Background
Driven by information technology, multimedia data is growing explosively, and making proper use of multimedia data allows the services of an intelligent interactive system to achieve twice the result with half the effort. A user interacts through the human-computer interaction interface provided by the intelligent interactive system; the user is therefore both the service object of the intelligent interactive system and an important data source for it.
In a big-data setting, an intelligent interactive system can recommend, from massive amounts of multimedia data, the multimedia data that interests a user. The system makes these recommendations according to the label information of the multimedia data, so only with accurate label information can it accurately recommend suitable multimedia data to the user. In existing music players, music experts manually add label information to each piece of audio data (such as songs and operas) in the music library, so that the music player can recommend songs, operas, and other content of interest to its users according to the label information of each piece of audio data. However, manual annotation consumes enormous manpower and time, and its accuracy is relatively low.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for processing multimedia data, which address the problem that manually annotating the label information of multimedia data consumes enormous manpower and time and has relatively low accuracy.
In a first aspect, the present invention provides a method for processing multimedia data, the method including:
receiving multimedia data to be processed;
determining, according to the feature information of the multimedia data and the feature information corresponding to each branch tree of a pre-generated tree structure, the coverage rate between the multimedia data and each branch tree, where the coverage rate represents the degree of similarity between the multimedia data and the branch tree;
determining the branch trees whose coverage rate is greater than a first preset threshold and, from the judgment condition branches contained in each such branch tree, determining the judgment condition branch that the feature information of the multimedia data satisfies;
determining the value of the leaf node in that judgment condition branch as first-type label information of the multimedia data.
In a possible embodiment, determining, from the judgment condition branches contained in the branch tree, the judgment condition branch that the feature information of the multimedia data satisfies includes:
matching the feature information of the multimedia data against the judgment condition of each judgment condition branch in turn, in the priority order of the judgment condition branches;
if at least one item of feature information of the multimedia data matches the judgment condition of any judgment condition branch, determining that the feature information of the multimedia data satisfies that judgment condition branch.
In a possible embodiment, the method further includes:
determining, among the first-type label information of the multimedia data, the first-type label information that belongs to the same category and is mutually exclusive;
if the number of items of first-type label information that belong to the same category and are mutually exclusive is greater than 1, retaining only one of those items.
In a possible embodiment, the method further includes:
determining second-type label information of the multimedia data operated on by a user, according to network logs of the operations the user performed on multimedia data.
In a possible embodiment, determining the second-type label information of the multimedia data operated on by the user, according to the network logs of the operations the user performed on multimedia data, includes:
for each log set, determining in chronological order whether the multimedia data corresponding to the network logs of the same operation contains specific label information, where the number of network logs in the log set is greater than K, K is an integer greater than 0, the specific label information is the first-type label information contained in the multimedia data corresponding to at least K/A network logs in the log set, and A is a set second threshold;
if the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs all contains the specific label information, the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs all contains the specific label information, and the number of network logs that lie, in chronological order within the log set, between the network logs of the j-th determination and the network logs of the (j+1)-th determination is less than a set fourth threshold, determining the specific label information as second-type label information of the multimedia data corresponding to the network logs lying between the network logs of the j-th determination and those of the (j+1)-th determination, where j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a set third threshold.
In a possible embodiment, the method further includes:
recording, in the multimedia data to which the second-type label information has been added, the time information at which the second-type label information was added;
deleting the second-type label information from the multimedia data after the time information exceeds a set time threshold.
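The recording and expiry of second-type label information can be illustrated with a minimal sketch. The names `MediaItem`, `add_second_type_label`, and `expire_second_type_labels` are hypothetical, and the sketch assumes that "the time information exceeds the time threshold" means the time elapsed since the label was added exceeds a maximum age; the source does not state this precisely.

```python
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    """Hypothetical container for one piece of multimedia data and its labels."""
    first_type_labels: set = field(default_factory=set)
    # Maps each second-type label to the time (in seconds) at which it was added.
    second_type_labels: dict = field(default_factory=dict)


def add_second_type_label(item, label, now):
    """Add a second-type label and record when it was added."""
    item.second_type_labels[label] = now


def expire_second_type_labels(item, now, max_age):
    """Delete second-type labels older than max_age; return the deleted labels."""
    expired = [label for label, added_at in item.second_type_labels.items()
               if now - added_at > max_age]
    for label in expired:
        del item.second_type_labels[label]
    return expired
```

Under this assumption, first-type label information is never expired; only the behavior-derived second-type labels age out.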
In a possible embodiment, determining the coverage rate between the multimedia data and each branch tree, according to the feature information of the multimedia data and the feature information corresponding to each branch tree of the pre-generated tree structure, includes:
for the branch tree, determining the number M of items of feature information contained in the intersection of the feature information of the multimedia data and the feature information corresponding to the branch tree;
determining the number N1 of items of feature information contained in the union of the feature information of the multimedia data and the feature information corresponding to the branch tree, and determining the coverage rate between the multimedia data and the branch tree according to the ratio of M to N1; or determining the total N2 of the number of items of feature information of the multimedia data and the number of items of feature information corresponding to the branch tree, and determining the coverage rate between the multimedia data and the branch tree according to the ratio of M to N2.
In a second aspect, the present invention further provides an apparatus for processing multimedia data, the apparatus including:
a receiving module, configured to receive multimedia data to be processed;
a branch tree determining module, configured to determine, according to the feature information of the multimedia data and the feature information corresponding to each branch tree of a pre-generated tree structure, the coverage rate between the multimedia data and each branch tree, where the coverage rate represents the degree of similarity between the multimedia data and the branch tree;
a branch determining module, configured to determine the branch trees whose coverage rate is greater than a first preset threshold and, from the judgment condition branches contained in each such branch tree, determine the judgment condition branch that the feature information of the multimedia data satisfies;
a label determining module, configured to determine the value of the leaf node in that judgment condition branch as first-type label information of the multimedia data.
In a possible embodiment, the branch determining module is specifically configured to:
match the feature information of the multimedia data against the judgment condition of each judgment condition branch in turn, in the priority order of the judgment condition branches;
if at least one item of feature information of the multimedia data matches the judgment condition of any judgment condition branch, determine that the feature information of the multimedia data satisfies that judgment condition branch.
In a possible embodiment, the label determining module is further configured to:
determine, among the first-type label information of the multimedia data, the first-type label information that belongs to the same category and is mutually exclusive;
if the number of items of first-type label information that belong to the same category and are mutually exclusive is greater than 1, retain only one of those items.
In a possible embodiment, the label determining module is further configured to:
determine second-type label information of the multimedia data operated on by a user, according to network logs of the operations the user performed on multimedia data.
In a possible embodiment, the label determining module is specifically configured to:
for each log set, determine in chronological order whether the multimedia data corresponding to the network logs of the same operation contains specific label information, where the number of network logs in the log set is greater than K, K is an integer greater than 0, the specific label information is the first-type label information contained in the multimedia data corresponding to at least K/A network logs in the log set, and A is a set second threshold;
if the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs all contains the specific label information, the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs all contains the specific label information, and the number of network logs that lie, in chronological order within the log set, between the network logs of the j-th determination and the network logs of the (j+1)-th determination is less than a set fourth threshold, determine the specific label information as second-type label information of the multimedia data corresponding to the network logs lying between the network logs of the j-th determination and those of the (j+1)-th determination, where j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a set third threshold.
In a possible embodiment, the branch tree determining module is specifically configured to:
for the branch tree, determine the number M of items of feature information contained in the intersection of the feature information of the multimedia data and the feature information corresponding to the branch tree;
determine the number N1 of items of feature information contained in the union of the feature information of the multimedia data and the feature information corresponding to the branch tree, and determine the coverage rate between the multimedia data and the branch tree according to the ratio of M to N1; or determine the total N2 of the number of items of feature information of the multimedia data and the number of items of feature information corresponding to the branch tree, and determine the coverage rate between the multimedia data and the branch tree according to the ratio of M to N2.
In the multimedia data processing method and apparatus provided by the embodiments of the present invention, multimedia data to be processed is received; the coverage rate between the multimedia data and each branch tree is determined according to the feature information of the multimedia data and the feature information corresponding to each branch tree of a pre-generated tree structure; the branch trees whose coverage rate is greater than a first preset threshold are determined and, from the judgment condition branches contained in each such branch tree, the judgment condition branch that the feature information of the multimedia data satisfies is determined; and the value of the leaf node in that judgment condition branch is determined as first-type label information of the multimedia data, so that the label information of the multimedia data can be determined quickly and accurately. In addition, since more than one branch tree may have a coverage rate greater than the preset first threshold, more than one item of label information may be determined for the multimedia data, so that the label information covers the multimedia data more comprehensively and recommendations based on the label information are more accurate.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method for processing multimedia data provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of a tree structure provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a branch tree of a tree structure provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of another method for processing multimedia data provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of an apparatus for processing multimedia data provided by an embodiment of the present invention.
Detailed description
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings. It should be understood that the embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.
The embodiment shown in Fig. 1 provides a method for processing multimedia data, the method including:
S11: receiving multimedia data to be processed.
In this step, the received multimedia data may be uploaded by a user or obtained from a database; the embodiment of the present invention does not limit the manner in which the multimedia data is obtained.
Optionally, the multimedia data includes but is not limited to audio data (such as songs and operas) and video data (such as TV series and films).
Taking a song as the multimedia data, the feature information that characterizes the song includes: song title, singer name, instruments used, rhythm, beat, music type, popularity, lyricist, composer, key lyrics, and so on. For example, the sequence formed by the feature information of the song whose identifier (ID) is 001 is: [miss, Zheng Jun, popular, elopement, school, Bruce, rock and roll, Baidu, guitar, waist drum, saxophone, Chang'an].
S12: determining, according to the feature information of the multimedia data and the feature information corresponding to each branch tree of the pre-generated tree structure, the coverage rate between the multimedia data and each branch tree, where the coverage rate represents the degree of similarity between the multimedia data and the branch tree.
S13: determining the branch trees whose coverage rate is greater than the first preset threshold and, from the judgment condition branches contained in each such branch tree, determining the judgment condition branch that the feature information of the multimedia data satisfies.
S14: determining the value of the leaf node in that judgment condition branch as first-type label information of the multimedia data.
Specifically, S13 to S14 are performed for every branch tree whose coverage rate is greater than the preset first threshold. Since there may be more than one such branch tree, the number of items of label information of the multimedia data is at least one.
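Steps S12 to S14 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the classes `ConditionBranch` and `BranchTree`, the convention that a lower `priority` value is checked first, and the choice to stop at the first satisfied branch in each tree are all hypothetical. The coverage rate here uses the intersection-over-union ratio, which is only one of the options the description allows.

```python
from dataclasses import dataclass


@dataclass
class ConditionBranch:
    priority: int          # assumption: lower value is matched earlier
    condition: frozenset   # feature values that satisfy this branch
    leaf_value: str        # label stored at the leaf node of this branch


@dataclass
class BranchTree:
    features: frozenset    # feature information corresponding to this branch tree
    branches: list         # ConditionBranch objects contained in this branch tree


def coverage(item_features, tree_features):
    """Ratio of shared features to all features (intersection over union)."""
    union = item_features | tree_features
    return len(item_features & tree_features) / len(union) if union else 0.0


def first_type_labels(item_features, trees, threshold):
    """Collect one leaf value from every branch tree whose coverage exceeds threshold."""
    item_features = frozenset(item_features)
    labels = []
    for tree in trees:
        if coverage(item_features, tree.features) <= threshold:
            continue  # S13: only branch trees above the first preset threshold
        for branch in sorted(tree.branches, key=lambda b: b.priority):
            # A branch is satisfied when at least one feature matches its condition.
            if item_features & branch.condition:
                labels.append(branch.leaf_value)  # S14: leaf value becomes a label
                break
    return labels
```

Because each qualifying branch tree can contribute a label, a single item may receive several first-type labels, one per branch tree.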
In the embodiment of the present invention, multimedia data to be processed is received; the coverage rate between the multimedia data and each branch tree is determined according to the feature information of the multimedia data and the feature information corresponding to each branch tree of the pre-generated tree structure; the branch trees whose coverage rate is greater than the first preset threshold are determined and, from the judgment condition branches contained in each such branch tree, the judgment condition branch that the feature information of the multimedia data satisfies is determined; and the value of the leaf node in that judgment condition branch is determined as first-type label information of the multimedia data, so that the label information of the multimedia data can be determined quickly and accurately. In addition, since more than one branch tree may have a coverage rate greater than the preset first threshold, more than one item of label information may be determined for the multimedia data, so that the label information covers the multimedia data more comprehensively and recommendations based on the label information are more accurate.
In a possible embodiment, determining in S12 the coverage rate between the multimedia data and each branch tree, according to the feature information of the multimedia data and the feature information corresponding to each branch tree of the pre-generated tree structure, includes the following two possible implementations:
One: for each branch tree, determine the number M of items of feature information contained in the intersection of the feature information of the multimedia data and the feature information corresponding to the branch tree, and determine the number N1 of items of feature information contained in the union of the feature information of the multimedia data and the feature information corresponding to the branch tree; then determine the coverage rate between the multimedia data and the branch tree according to the ratio of M to N1.
For example, the ratio of M to N1 is used directly as the coverage rate between the multimedia data and the branch tree.
Two: for each branch tree, determine the number M of items of feature information contained in the intersection of the feature information of the multimedia data and the feature information corresponding to the branch tree, and determine the total N2 of the number of items of feature information of the multimedia data and the number of items of feature information corresponding to the branch tree; then determine the coverage rate between the multimedia data and the branch tree according to the ratio of M to N2.
For example, the ratio of M to N2 is used directly as the coverage rate between the multimedia data and the branch tree.
As another example, since different branch trees may correspond to different numbers of items of feature information, the value obtained by multiplying the ratio of M to N2 by 2 is used as the coverage rate between the multimedia data and the branch tree, in order to make the coverage rates of the multimedia data with different branch trees more comparable.
Of course, the embodiment of the present invention is not limited to the two ways above; other ways may also be used, and any way that can determine the degree of coverage between two sequences falls within the scope the invention is intended to protect.
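The three ratios described above can be computed side by side in a short sketch. The function name `coverage_variants` and the dictionary keys are hypothetical. As a point of reference, M/N1 is the Jaccard index of the two feature sets and 2M/N2 is the Sørensen-Dice coefficient; the plain ratio M/N2 reaches at most 0.5 for identical sets, which is why doubling it rescales the value to the range 0 to 1.

```python
def coverage_variants(item_features, tree_features):
    """Return the three coverage-rate candidates for two feature collections."""
    item, tree = set(item_features), set(tree_features)
    m = len(item & tree)          # M: size of the intersection
    n1 = len(item | tree)         # N1: size of the union
    n2 = len(item) + len(tree)    # N2: sum of the two feature counts
    return {
        "m_over_n1": m / n1 if n1 else 0.0,
        "m_over_n2": m / n2 if n2 else 0.0,
        "2m_over_n2": 2 * m / n2 if n2 else 0.0,
    }
```

For an item with features {a, b, c, d} and a branch tree with features {c, d, e}, M = 2, N1 = 5, and N2 = 7, giving 2/5, 2/7, and 4/7 respectively.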
In a possible embodiment, the method further includes:
determining, among the first-type label information of the multimedia data, the first-type label information that belongs to the same category and is mutually exclusive;
if the number of items of first-type label information that belong to the same category and are mutually exclusive is greater than 1, retaining only one of those items.
Specifically, the first-type label information of the multimedia data is filtered according to a set mutual exclusion rule: among the items of first-type label information that belong to the same category and are mutually exclusive, only one is retained, so that the first-type label information of the multimedia data is more accurate. For example, suppose the first-type label information obtained through S11 to S14 includes Korean, Japanese, and Chinese, which all belong to the language category and are mutually exclusive. Since only one language label can be added to the same multimedia data, one of these three items of first-type label information is selected and the other two are deleted.
Optionally, retaining only one item when the number of items of first-type label information that belong to the same category and are mutually exclusive is greater than 1 includes the following possible implementations:
Mode 1: if the number of items of first-type label information that belong to the same category and are mutually exclusive is greater than 1, randomly select one item from them and delete the others.
Mode 2: if the number of items of first-type label information that belong to the same category and are mutually exclusive is greater than 1, select one item from them according to the first-type label information of at least one other category, and delete the others.
In this mode, the selection is based on the first-type label information of other categories, so that the first-type label information that is retained is more accurate.
For example, if the label information of a song includes Korean, Japanese, and Chinese as first-type label information belonging to the language category and mutually exclusive, the selection can be made according to the first-type label information corresponding to the singer name of the song. Specifically: if the singer name is in Chinese, Chinese is selected from the first-type label information belonging to the language category; if the singer name is in Korean, Korean is selected; if the singer name is in Japanese, Japanese is selected. As another example, the selection can be made according to the first-type label information corresponding to the song title. Specifically: if the song title is in Chinese, Chinese is selected from the first-type label information belonging to the language category; if the song title is in Korean, Korean is selected; if the song title is in Japanese, Japanese is selected.
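The two retention modes can be sketched as one filter. Everything named here is hypothetical: `resolve_exclusive`, the `category_of` mapping from label to category, and the `hint` mapping, which stands in for whatever information from another category (for example, the language of the singer name) indicates the preferred label. When no usable hint exists, the sketch falls back to the random choice of mode 1.

```python
import random


def resolve_exclusive(labels, category_of, exclusive_categories, hint=None, rng=random):
    """Keep at most one label per mutually exclusive category.

    labels: list of first-type labels for one item.
    category_of: dict mapping each label to its category.
    exclusive_categories: categories whose labels exclude one another.
    hint: optional dict mapping a category to the preferred label (mode 2).
    """
    kept, groups = [], {}
    for label in labels:
        category = category_of.get(label)
        if category in exclusive_categories:
            groups.setdefault(category, []).append(label)
        else:
            kept.append(label)  # labels outside exclusive categories are untouched
    for category, candidates in groups.items():
        if len(candidates) == 1:
            kept.append(candidates[0])
        elif hint and hint.get(category) in candidates:
            kept.append(hint[category])          # mode 2: guided by another category
        else:
            kept.append(rng.choice(candidates))  # mode 1: random selection
    return kept
```

For the language example above, passing `hint={"language": "Chinese"}` keeps Chinese and drops Korean and Japanese, while labels of other categories pass through unchanged.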
Based on any of the embodiments above, in a possible embodiment, the method further includes:
determining second-type label information of the multimedia data operated on by a user, according to network logs of the operations the user performed on multimedia data.
Specifically, a modification (modify) rule based on user behavior is formed by analyzing the network logs, in order to determine the second-type label information of the multimedia data. As a result, recommending multimedia data to a user based on the label information of the multimedia data (including first-type label information and second-type label information) is more accurate.
Optionally, the second-type label information of the multimedia data operated on by the user is determined periodically, according to the network logs of the operations the user performed on multimedia data. That is, for each set period, the second-type label information of the multimedia data operated on by the user is determined according to the network logs within that period that record the operations the user performed on multimedia data. For example, the network logs of each day are aggregated to determine the second-type label information of the multimedia data operated on by the user.
Optionally, the network log includes but is not limited to at least one of the following items of information:
the identification information of the multimedia data operated on, the identification information of the operation performed, the time information at which the operation was performed, and the label information of the multimedia data operated on (including first-type label information and second-type label information).
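One way to picture such a network log is as a small record type, grouped by operation and ordered by time so that later analysis can examine each operation separately. The sketch below is only an assumption about representation: the class `LogRecord`, its field names, and the helper `group_by_operation` do not appear in the source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    media_id: str                   # identification of the multimedia data operated on
    operation: str                  # identification of the operation performed
    timestamp: float                # time at which the operation was performed
    labels: frozenset = frozenset() # label information of the multimedia data


def group_by_operation(records):
    """Group log records by operation, each group sorted chronologically."""
    groups = defaultdict(list)
    for record in sorted(records, key=lambda r: r.timestamp):
        groups[record.operation].append(record)
    return dict(groups)
```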
Optionally, the operations performed on multimedia data include but are not limited to the following: collect (favorite) operations, delete operations, play operations, and so on.
In a possible embodiment, determining the second-type label information of the multimedia data operated on by the user, according to the network logs of the operations the user performed on multimedia data, includes:
for each log set, determining in chronological order whether the multimedia data corresponding to the network logs of the same operation contains specific label information, where the number of network logs in the log set is greater than K, K is an integer greater than 0, the specific label information is the first-type label information contained in the multimedia data corresponding to at least K/A network logs in the log set, and A is a set second threshold;
if the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs all contains the specific label information, the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs all contains the specific label information, and the number of network logs that lie, in chronological order within the log set, between the network logs of the j-th determination and the network logs of the (j+1)-th determination is less than a set fourth threshold, determining the specific label information as second-type label information of the multimedia data corresponding to the network logs lying between the network logs of the j-th determination and those of the (j+1)-th determination, where j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a set third threshold.
Specifically, for each log set, it is determined in turn, in chronological order, whether the first-type label information of the multimedia data corresponding to each network log in the log set contains the specific label information. Further, if, after the j-th determination finds P1 consecutive network logs whose multimedia data contain the specific label information, there follow fewer than the fourth threshold consecutive network logs whose multimedia data do not contain the specific label information, and the (j+1)-th determination then finds P2 consecutive network logs whose multimedia data contain the specific label information, the specific label information is added to the second-type label information of the multimedia data corresponding to those fewer-than-fourth-threshold network logs. If, after the j-th determination finds P1 consecutive network logs whose multimedia data contain the specific label information, there follow a number of consecutive network logs greater than or equal to the fourth threshold whose multimedia data do not contain the specific label information, and the (j+1)-th determination then finds P2 consecutive network logs whose multimedia data contain the specific label information, no processing is performed; the method continues to determine whether the label information of the multimedia data corresponding to the following network logs contains the specific label information, and repeats the above process until the last network log in the log set is reached.
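The gap-filling rule described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `fill_gaps`, the boolean list `has_tag` (one flag per network log, in chronological order), and the parameter names `min_run` (standing in for the third threshold) and `max_gap` (standing in for the fourth threshold) are assumptions. The sketch also assumes that run lengths are measured on the original flags, which the text does not specify.

```python
from typing import List


def fill_gaps(has_tag: List[bool], min_run: int, max_gap: int) -> List[int]:
    """Return indices of logs whose media should receive the tag as a second-type label."""
    # Split the sequence into maximal runs: (flag, start, end) with end exclusive.
    runs = []
    i, n = 0, len(has_tag)
    while i < n:
        j = i
        while j < n and has_tag[j] == has_tag[i]:
            j += 1
        runs.append((has_tag[i], i, j))
        i = j

    filled = []
    # Runs alternate, so the neighbours of an untagged run are tagged runs.
    for k in range(1, len(runs) - 1):
        flag, start, end = runs[k]
        if flag:
            continue
        _, prev_start, prev_end = runs[k - 1]
        _, next_start, next_end = runs[k + 1]
        p1 = prev_end - prev_start   # length of the run found by the j-th determination
        p2 = next_end - next_start   # length of the run found by the (j+1)-th determination
        if p1 > min_run and p2 > min_run and (end - start) < max_gap:
            filled.extend(range(start, end))
    return filled
```

With `min_run=2` and `max_gap=2`, a single untagged log between two tagged runs of length 3 is filled, while a gap of three untagged logs is left untouched.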
Optionally, the network logs can be divided into multiple network log groups by time unit, according to the time information of the network logs; for example, they are divided in units of 1 hour. Then, from the resulting network log groups, those groups are selected in which the number of network logs is greater than K and the multimedia data operated on by at least K/A network logs contain the specific label information (these groups are the log sets). Alternatively, the network logs can be divided into multiple network log groups by number of network logs, according to the time information of the network logs; for example, they are divided in units of K. Then, from the resulting network log groups, those groups are selected in which the multimedia data operated on by at least K/A network logs contain the specific label information (these groups are the log sets).
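The time-based division can be sketched as below. The function name `select_log_sets`, the representation of a network log as a `(timestamp, labels)` pair, and the fixed 1-hour bucket are assumptions made for illustration; `k` and `a` stand for K and the second threshold A from the text.

```python
from collections import defaultdict
from datetime import datetime


def select_log_sets(logs, tag, k, a):
    """Group logs into 1-hour buckets and keep the buckets that qualify as log sets.

    logs: iterable of (timestamp: datetime, labels: set of str) pairs.
    A bucket qualifies if it holds more than k logs and at least k / a of them carry `tag`.
    """
    groups = defaultdict(list)
    for ts, labels in sorted(logs, key=lambda entry: entry[0]):
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        groups[bucket].append((ts, labels))

    selected = []
    for bucket in sorted(groups):
        entries = groups[bucket]
        tagged = sum(1 for _, labels in entries if tag in labels)
        if len(entries) > k and tagged >= k / a:
            selected.append(entries)
    return selected
```

Dividing by count instead of by time would replace the bucketing step with fixed-size slices of the chronologically sorted list; the selection test on the tagged count stays the same.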
Optionally, because not all user operations are regular, the second-type label information of the multimedia data can be determined based on a set operation, so that the second-type label information determined for the multimedia data is more accurate. For example, the second-type label information of the multimedia data is determined based on the play operation.
Optionally, to make the determined second-type label information more accurate, the method further includes:
For any multimedia data to which second-type label information has been added, if, in a set number of processed log sets, the number of occurrences of this multimedia data with a given second-type label information added and the total number of occurrences of this multimedia data do not satisfy a set constraint condition, deleting that second-type label information from this multimedia data.
Here, the total number of occurrences of this multimedia data is the sum of the number of occurrences without that second-type label information added and the number of occurrences with that second-type label information added.
Specifically, to judge whether the determined second-type label information is accurate, the set number of processed log sets is examined for any multimedia data to which any second-type label information has been added. If the number of occurrences of this multimedia data with this second-type label information added and the total number of occurrences of this multimedia data satisfy the set constraint condition, this multimedia data is considered to warrant this second-type label information; if they do not satisfy the constraint condition, this multimedia data is considered not to warrant this second-type label information.
Optionally, the set constraint condition is: the number of occurrences of this multimedia data with the given second-type label information added is greater than half of the total number of occurrences of this multimedia data.
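The optional "more than half" constraint can be written as a one-line check. The function and parameter names here are hypothetical; the sketch assumes the two counts have already been accumulated over the processed log sets.

```python
def keep_second_type_label(tagged_count: int, untagged_count: int) -> bool:
    """Return True if the label should stay on the multimedia data.

    tagged_count: occurrences of the item with the label added.
    untagged_count: occurrences of the item without the label added.
    """
    total = tagged_count + untagged_count
    # Keep the label only when tagged occurrences exceed half of all occurrences.
    return total > 0 and tagged_count > total / 2
```

An exact tie (for example 2 tagged out of 4) fails the check, because the condition in the text is strictly "greater than half".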
As an illustration, the network logs are cut by time, that is, divided into multiple log sets by time unit. For each log set, dependency extraction and analysis based on the temporal context is carried out: for the specific label information contained in the multimedia data operated on by the network logs in each log set, the maximal scene set Ctag is determined in chronological order, as follows:
1) In chronological order, the scene set is expanded starting from the first network log in this log set that includes the specific label information tag. Assuming that the first network log including tag is the gi-th network log in this log set, the gi-th network log is added to Ctag;
2) If the (gi+1)-th network log also contains tag, the (gi+1)-th network log is added to Ctag and this step is repeated; otherwise go to step 3);
3) Starting from gi+1, count forward the consecutive network logs that do not contain tag. If the sequence not containing tag is gi+1, gi+2, ..., gi+k, continue from gi+k+1 to count forward the consecutive network logs that contain tag, and suppose these run through the (gi+k+p)-th network log. If p > k/2, add network logs gi+1 through gi+k+p to Ctag; otherwise go to step 4);
4) Take the first network log containing tag from gi+k+1 onward as the initial record of this scene set; if this network log is not the last network log in the log set, return to step 2); otherwise go to step 5);
5) If the size of Ctag is greater than the minimum scene queue threshold Φ, then tag, together with the identification information of the first network log containing tag, is added to the characteristic information of the multimedia data operated on by every network log in Ctag that does not contain tag; otherwise this loop ends.
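Steps 1) to 5) leave some control flow open, so the following is only one possible reading, sketched under stated assumptions: a scene keeps absorbing a gap of k untagged logs whenever the tagged run after it has length p > k/2; otherwise the scene is closed and a new one starts at the next tagged log. The names `run_length`, `scene_gap_indices` and `min_scene_size` (standing in for Φ) are hypothetical.

```python
from typing import List


def run_length(flags: List[bool], start: int, value: bool) -> int:
    """Length of the run of `value` beginning at index `start` (0 if none)."""
    end = start
    while end < len(flags) and flags[end] == value:
        end += 1
    return end - start


def scene_gap_indices(has_tag: List[bool], min_scene_size: int) -> List[int]:
    """Return indices of untagged logs that fall inside a large enough scene set."""
    to_tag = []
    n = len(has_tag)
    i = 0
    while i < n:
        if not has_tag[i]:
            i += 1          # step 4: skip ahead to the next log containing the tag
            continue
        scene_start = i     # step 1: first tagged log opens the scene
        i += run_length(has_tag, i, True)   # step 2: absorb consecutive tagged logs
        gaps = []
        while i < n:        # step 3: try to bridge gaps
            k = run_length(has_tag, i, False)
            p = run_length(has_tag, i + k, True)
            if p == 0 or p <= k / 2:
                break
            gaps.extend(range(i, i + k))
            i += k + p
        # step 5: only scenes larger than the threshold propagate the tag
        if i - scene_start > min_scene_size:
            to_tag.extend(gaps)
    return to_tag
```

For the flags `T T F F T T F F F F T` and a threshold of 3, the first scene spans the first six logs and the two untagged logs at positions 2 and 3 receive the tag; the four-log gap that follows is not bridged because the run after it has length 1.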
Further, all the multimedia data to which tag has been added are counted. If the number of occurrences of a piece of multimedia information with tag added is greater than 1/2 of the total number of occurrences of this multimedia information, the tag attribute is added to this multimedia information, together with the current time as the modifytime attribute.
Optionally, the method further includes:
Recording, in the multimedia data to which second-type label information has been added, the time information at which the second-type label information was added;
After the time information exceeds a set time threshold, deleting the second-type label information from the multimedia data.
Specifically, in a continuous loop during global dynamic scene management, the time attribute modifytime added at this stage (i.e. the time information at which tag was added) is checked; if the time attribute of any multimedia information exceeds the validity-period threshold, the second-type label information added to this multimedia information and its time attribute are deleted.
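A minimal sketch of the expiry check is shown below. The keys `addtag` and `modifytime` follow the attribute names used in this description, but storing `modifytime` as a `datetime`, the function name `drop_expired_tags`, and the sample dates are assumptions for illustration.

```python
from datetime import datetime, timedelta


def drop_expired_tags(dynamic_tags, now, max_age):
    """Keep only second-type labels whose modifytime is within max_age of `now`."""
    return [t for t in dynamic_tags if now - t["modifytime"] <= max_age]


# Hypothetical data: one label added 20 days ago, one added yesterday.
tags = [
    {"addtag": "after supper", "modifytime": datetime(2016, 7, 1, 17, 20, 0)},
    {"addtag": "p.prand", "modifytime": datetime(2016, 7, 20, 11, 30, 10)},
]
fresh = drop_expired_tags(tags, datetime(2016, 7, 21), timedelta(days=7))
```

Because each label carries its own timestamp, removing the dictionary removes the label and its time attribute together, as the text requires.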
Based on any of the above embodiments, the embodiment of the present invention can generate the tree structure according to the following steps:
According to the label category to which the pre-configured label information of the sample data belongs, dividing the sample data into at least two data sets, each data set corresponding to one branch tree of the tree structure;
For each data set, dividing the sample data in the data set into at least one category group according to the feature category to which the characteristic information of the sample data in the data set belongs, and calculating the information gain ratio of each category group, the information gain ratio being determined based on the information entropy of the characteristic information of the sample data in the category group; selecting in turn the category group with the largest information gain ratio as the split attribute, and building the judgment-condition branches of the branch tree according to the characteristic information of the sample data contained in the category group with the largest information gain ratio, the leaf nodes of the judgment-condition branches being the label information of the sample data in the data set.
Here, the larger the information gain ratio of a category group, the higher the priority of the judgment-condition branch corresponding to that category group.
For example, for songs, the feature categories to which characteristic information belongs include, but are not limited to, an instrument class, a release-era class, a singer/album class, and so on. As another example, for opera, the feature categories include, but are not limited to, a singing-style class, an instrument class, a role class, and so on.
Optionally, when building the judgment-condition branches of the branch tree, if at least two category groups currently share the largest information gain ratio, one category group is selected from these category groups as the split attribute. For example, one category group is selected at random from them as the split attribute.
In one possible implementation, dividing the sample data into at least two data sets according to the label category to which the pre-configured label information of the sample data belongs specifically includes:
Determining the coverage rate of any two sample data according to the characteristic information of the two sample data;
If the coverage rate is greater than a set fifth threshold, merging the two sample data to form a data group, and returning to the step of determining the coverage rate, until all the determined coverage rates are less than or equal to the fifth threshold, whereupon each finally obtained data group is determined as one data set.
The above process is referred to as pre-pruning: the sample data are first taken as the leaf nodes of the tree structure and then pre-pruned, so that unrelated sample data are assigned to different branch trees, as follows:
a) Calculate the coverage rate (coverage_rate) between any two sample data.
Assume the characteristic information sequence of sample data 1 is L1 = [l11, l12, l13, ...] and the characteristic information sequence of sample data 2 is L2 = [l21, l22, l23, ...]. Then the coverage rate between sample data 1 and sample data 2 is coverage_rate_12 = 2 * len(L1 ∩ L2) / len(L1 + L2), where len(L1 ∩ L2) denotes the number of elements contained in L1 ∩ L2, and len(L1 + L2) denotes the sum of the number of elements contained in L1 and the number of elements contained in L2.
b) Merge any two sample data for which coverage_rate > ω (i.e. the fifth threshold).
c) Recalculate the coverage rates.
d) Repeat steps b and c until there are no more sample data or data groups that can be merged.
The pre-pruning process above reduces the number of initial branch trees, thereby reducing the amount of computation of the subsequent decision classification and improving its processing efficiency.
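Steps a) to d) can be sketched as a greedy merge loop. The coverage formula follows the text; everything else is an assumption made for illustration, in particular the names `coverage_rate` and `pre_prune`, the choice to represent a merged data group by the union of its members' characteristic information (the text does not say how a group's features are formed), and the sample features in the usage note.

```python
from itertools import combinations


def coverage_rate(features_1, features_2):
    """coverage_rate = 2 * len(L1 ∩ L2) / (len(L1) + len(L2))."""
    s1, s2 = set(features_1), set(features_2)
    total = len(s1) + len(s2)
    return 2 * len(s1 & s2) / total if total else 0.0


def pre_prune(samples, omega):
    """Merge samples/groups whose coverage rate exceeds omega until none can be merged.

    samples: dict mapping a sample name to its list of characteristic information.
    Returns a sorted list of sorted name lists, one per resulting data set.
    """
    groups = [({name}, set(features)) for name, features in samples.items()]
    merged = True
    while merged:                      # step d: repeat until nothing merges
        merged = False
        for a, b in combinations(range(len(groups)), 2):
            if coverage_rate(groups[a][1], groups[b][1]) > omega:   # step b
                names = groups[a][0] | groups[b][0]
                features = groups[a][1] | groups[b][1]   # assumption: union of features
                groups = [g for i, g in enumerate(groups) if i not in (a, b)]
                groups.append((names, features))
                merged = True
                break                  # step c: recompute coverage on the new groups
    return sorted(sorted(names) for names, _ in groups)
```

For three hypothetical samples `s1 = [piano, violin, soothing]`, `s2 = [piano, violin, harp]` and `s3 = [drums, bass]` with ω = 0.5, `s1` and `s2` merge (coverage 4/6) and `s3` stays in its own data set.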
Optionally, before the pre-pruning is carried out, the method further includes:
Normalizing the characteristic information of the sample data;
De-assimilating the normalized characteristic information of the sample data.
Here, normalization unifies the characteristic information that belongs to the same feature category across the characteristic information of all sample data; for example, the transliterated term for blues, "rhythm and blues", "Blues" and "R&B" are all normalized to the blues type. De-assimilation removes the characteristic information that is contained in all of the sample data.
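The two preprocessing steps can be sketched as follows. The alias table, the lower-casing of feature strings, and the function names `normalize` and `de_assimilate` are assumptions for illustration; the text only states that equivalent features are unified and that features shared by every sample are removed.

```python
def normalize(features, aliases):
    """Map each feature to its canonical name, dropping duplicates but keeping order."""
    result = []
    for feature in features:
        canonical = aliases.get(feature.lower(), feature.lower())
        if canonical not in result:
            result.append(canonical)
    return result


def de_assimilate(samples):
    """Remove the features that every sample contains (they cannot separate samples)."""
    if not samples:
        return {}
    common = set.intersection(*(set(f) for f in samples.values()))
    return {name: [f for f in feats if f not in common]
            for name, feats in samples.items()}


# Hypothetical alias table: several spellings collapse to one canonical feature.
ALIASES = {"rhythm and blues": "blues", "r&b": "blues"}
```

Running normalization first matters: two samples that list "R&B" and "Blues" only share a feature after both have been mapped to the same canonical name.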
In one possible implementation, determining the coverage rate of any two sample data according to the characteristic information of the two sample data includes:
For any two sample data, determining the number M of characteristic information items contained in the intersection of the characteristic information of the two sample data, and determining the total number N2 of characteristic information items of the two sample data; and determining the coverage rate of the two sample data according to the ratio of M to N2; or
For any two sample data, determining the number M of characteristic information items contained in the intersection of the characteristic information of the two sample data, and determining the number N2 of characteristic information items contained in the union of the characteristic information of the two sample data; and determining the coverage rate of the two sample data according to the ratio of M to N2.
For example, the ratio of M to N2 is taken as the coverage rate of the two sample data. As another example, the value obtained by multiplying the ratio of M to N2 by 2 is taken as the coverage rate of the two sample data.
In the embodiment of the present invention, the information gain ratio of each category group can be used to judge how well the characteristic information of the sample data in that category group discriminates between label information. The larger the information gain ratio of a category group, the stronger the ability of the characteristic information of the sample data in that category group to discriminate between label information. Selecting the split attribute by information gain ratio when building the judgment-condition branches overcomes the shortcoming of selecting the split attribute by information gain, which is biased toward categories with many attribute values.
In one possible implementation, the information gain ratio of each category group is calculated according to the following formula:
Here, A denotes the label set corresponding to each data set, C denotes each class of label in A, and ε is the average of SplitInfo(A, C) over all category groups; n is the number of category groups, m_n is the number of sample data in a category group, C denotes each class of characteristic information, c denotes a characteristic information item in C, num_c denotes the number of sample data containing the characteristic information c, and num_x denotes the number of sample data contained in the data set.
Here, E(c) = sum(-p(I) * log(p(I))), I = 1, 2, ..., X, where X denotes the number of data sets obtained by division according to the pre-configured classification rule. p(I) denotes the probability that the characteristic information c of the sample data occurs in the I-th data set. For example, if the number of sample data having the characteristic information c is s, and among these s sample data the number whose class is I is m, then p(I) = m/s.
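The entropy E(c) of a single characteristic information item can be computed directly from the classes of the samples that contain it. In this sketch the function name `feature_entropy` and the input representation (a list with one class name per sample containing c) are assumptions; base-2 logarithms are used because the worked example in this description uses log2.

```python
import math
from collections import Counter


def feature_entropy(classes_of_samples_with_feature):
    """E(c) = sum over classes I of -p(I) * log2(p(I)), with p(I) = m / s.

    s is the number of samples containing the feature and m is the number of
    those samples that belong to class I.
    """
    counts = Counter(classes_of_samples_with_feature)
    s = sum(counts.values())
    return sum(-(m / s) * math.log2(m / s) for m in counts.values())
```

A feature that occurs equally often in two classes has entropy 1, and a feature that occurs in only one class has entropy 0, which is why such a feature separates the classes cleanly.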
As an illustration, take audio data as an example and assume there are 8 sample data, denoted m1 to m8. First, the characteristic information of each sample data is obtained from its description information; after the obtained characteristic information is aggregated, the characteristic information sequence corresponding to each sample data is obtained, as follows:
m1: [blues, acoustic guitar, electric guitar, harmonica, harmony, chanting, improvised, original, Blues, comforting, tense, plaintive, relieving];
m2: [rock, passion, rhythm, drum kit, bass, solo, guitar];
m3: [Lu opera, young male role, young female role, clown role, zhuiqin, loud and clear];
m4: [light music, soothing, graceful, relaxed, comfortable, piano, cello, saxophone, harp, flute, clarinet, panpipes, violin, popular instruments, mandolin, harmonica, accordion, xylophone, triangle, hand bells, maracas, Mantovani];
m5: [Beijing opera, sheng, dan, jing, chou, Mei school, Ma school, jingbai (Beijing-dialect speech), Chinese fiddle];
m6: [Bandari, soothing, graceful, relaxed, comfortable, violin, piano, xylophone, triangle];
m7: [Beijing opera, gongs and drums, loud and clear, sheng, dan, jing];
m8: [Lu opera, gongs and drums, loud and clear, young female role, clown role].
The first element of each of the above feature sequences is the category to which the sample data belongs and does not take part in subsequent processing.
Then, the coverage rate between any two sample data is calculated and, based on the resulting coverage rates, the sample data are merged. Assume that the above 8 sample data are finally divided into three data sets: [(m1, m2), (m3, m5, m7, m8), (m4, m6)]. As shown in Figure 2, the whole decision tree is thus divided into three branch trees. For each branch tree, the information entropy E of each category group in this branch tree is calculated:
Taking the data set (m3, m5, m7, m8) as an example, the label information of these four sample data takes only two values, namely Lu opera and Beijing opera. In combination with the pre-configured knowledge table, the information entropy E of the characteristic information of the sample data in (m3, m5, m7, m8) is calculated, as follows:
E(young male role, young female role, clown role) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1;
E(sheng, dan, jing) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1;
E(zhuiqin) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1;
E(gongs and drums) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1;
E(loud and clear) = -(1/3)*log2(1/3) - (1/2)*log2(1/2) = 1.0266;
E(Ma school) = -(1/1)*log2(1/1) - (0/1)*log2(0/1) = 0.
Then, the information gain of each category group in the data set is calculated. Specifically, the characteristic information of the sample data in (m3, m5, m7, m8) is divided into a role class, a singing-style class and an instrument class, where:
Gain(instrument) = E(instrument) - (2/4)*E(zhuiqin) - (2/4)*E(gongs and drums) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) - 1/2 - 1/2 = 0;
Gain(singing style) = E(singing style) - (1/4)*E(Ma school) - (3/4)*E(loud and clear) = 1 - (1/4)*0 - (3/4)*1.0266 = 0.231;
Gain(role) = E(role) - (2/4)*E(young male role, young female role, clown role) - (2/4)*E(sheng, dan, jing) = 1 - (2/4)*1 - (2/4)*1 = 0.
Then, the information gain ratio of each category group is calculated, specifically:
GainRatio(instrument) = 0;
GainRatio(role) = 0;
GainRatio(singing style) = 0.3113 / (SplitInfo + ε) = 0.3113 / (2.8775 + 1.63) = 0.07.
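The arithmetic pattern of the worked example can be reproduced with two small helpers. This is a sketch under assumptions: the names `information_gain` and `gain_ratio` are hypothetical, the gain is taken as the group entropy minus the weighted feature entropies as in the Gain lines above, and the ratio divides by SplitInfo + ε as in the GainRatio line. The numeric inputs are copied from the example, not recomputed.

```python
def information_gain(group_entropy, weighted_feature_entropies):
    """Gain = E(group) - sum(weight_i * E(feature_i)).

    weighted_feature_entropies: iterable of (weight, entropy) pairs.
    """
    return group_entropy - sum(w * e for w, e in weighted_feature_entropies)


def gain_ratio(gain, split_info, epsilon):
    """GainRatio = Gain / (SplitInfo + epsilon); epsilon is the mean SplitInfo."""
    return gain / (split_info + epsilon)


# Role class from the example: 1 - (2/4)*1 - (2/4)*1
role_gain = information_gain(1.0, [(2 / 4, 1.0), (2 / 4, 1.0)])
# Singing-style class, using the figures quoted in the example.
singing_ratio = gain_ratio(0.3113, 2.8775, 1.63)
```

Adding ε to the denominator keeps the ratio bounded when SplitInfo is very small, which would otherwise inflate the ratio for category groups with little splitting information.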
Then, the attribute with the largest information gain ratio is selected in turn as the split attribute to build the judgment-condition branches of the branch tree; the resulting structure is shown in Figure 3. When building the judgment-condition branches of the branch tree, the termination condition is that all labels have been distinguished, or that the degree to which the label of any sample data covers the category group to which the label belongs reaches a preset coverage rate threshold.
The processing flow of the method provided by the embodiment of the present invention is shown in Figure 4 and includes:
1) Preprocessing. Specifically, when multimedia data is input, the description information of the multimedia data is preprocessed (including cleaning, aggregation and so on) to obtain the characteristic information of this multimedia data;
2) Discrimination stage based on the tree structure. Specifically, the label information of this multimedia data is determined according to its characteristic information, based on the pre-generated tree structure;
3) Processing based on modification rules. Specifically, the label information of this multimedia data is updated for the first time according to the modification rules, where the modification rules are obtained by analyzing network logs;
4) Processing based on mutual exclusion rules. Specifically, the label information of this multimedia data is updated a second time according to the mutual exclusion rules, and this multimedia data is stored in the database; optionally, the mutual exclusion table can be obtained by analyzing network logs or can be pre-configured;
5) Outputting resources. Specifically, according to the label information of the multimedia data and based on user preferences, multimedia data is recommended to the user from the database.
As an illustration, assume the characteristic information of the currently input multimedia data mi is [loud and clear, gongs and drums, sheng, dan, jing, chou, erhu]. mi is input into the decision tree model shown in Figure 3, and the coverage rate between mi and each of the three branch trees of the decision tree is calculated. Whenever a coverage rate is greater than the set first threshold, mi flows into that branch tree; a piece of multimedia data can thus belong to multiple branch trees, so that it can have label information of several different categories. Assume the calculation assigns mi only to the branch tree corresponding to (m3, m5, m7, m8). Then the characteristic information of mi in the singing-style class is first determined to be [loud and clear] and is compared with each first-layer judgment-condition branch of this branch tree; that is, its coverage rates with the first-layer judgment-condition branches [Ma school] and [loud and clear] are calculated. Since the coverage rate between mi and the right-hand [loud and clear] branch is greater than that with the left-hand branch, mi enters the right-hand branch. The subsequent branches are processed in the same way until mi reaches the Beijing opera leaf node, and the Beijing opera label information is added to mi.
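Routing an input into every branch tree whose coverage rate exceeds the first threshold can be sketched as below. The sketch uses the intersection-over-union variant of the coverage rate, which is one of the variants this description allows; the function names, the threshold value and the branch-tree feature sets in the usage note are hypothetical.

```python
def coverage(features, tree_features):
    """M / N1: size of the intersection divided by the size of the union."""
    a, b = set(features), set(tree_features)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def route_to_branch_trees(features, branch_trees, first_threshold):
    """Return the names of all branch trees whose coverage exceeds the threshold."""
    return [name for name, tree_features in branch_trees.items()
            if coverage(features, tree_features) > first_threshold]


# Hypothetical branch-tree feature sets, for illustration only.
branch_trees = {
    "opera": {"loud and clear", "gongs and drums", "sheng", "dan", "jing",
              "clown role", "zhuiqin"},
    "light music": {"piano", "violin", "soothing"},
}
mi = ["loud and clear", "gongs and drums", "sheng", "dan", "jing", "chou", "erhu"]
matched = route_to_branch_trees(mi, branch_trees, 0.3)
```

Because the function returns a list rather than a single best match, an input can flow into several branch trees and pick up labels of several categories, as the paragraph above describes.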
In the adaptive stage based on explicit user feedback, suppose mi appears in some scene set: [[t1, m1, g1], [t2, m2, g2], [t3, m3, g3], ..., [ti, mi, gi], ...], and assume the label information of mi is {tags: [tag1, tag2, ...], dinamictags: [{addtag: 'p.prand', modifytime: '11:30:10'}, {addtag: 'after supper', modifytime: '17:20:00'}]}. The number of network logs in which mi occurs is counted and denoted totlenum; the number of entries across all Ctag sets in which the tag label was added to mi is counted and denoted addnum. If addnum > (1/2) * totlenum, the label tag is added to mi, where tag here stands for 'p.prand' or 'after supper'.
The above method flow can be implemented as a software program, which can be stored in a storage medium; when the stored software program is invoked, the above method steps are performed.
Based on the same inventive concept, the embodiment of the present invention further provides an apparatus for processing multimedia data. Because the principle by which this apparatus solves the problem is similar to that of the above method for processing multimedia data, the implementation of this apparatus can refer to the implementation of the method, and repeated details are not described again.
In the embodiment shown in Figure 5, an apparatus for processing multimedia data is provided, including:
A receiving module 51, configured to receive multimedia data to be processed;
A branch tree determining module 52, configured to determine the coverage rate between the multimedia data and each branch tree according to the characteristic information of the multimedia data and the characteristic information corresponding to each branch tree of a pre-generated tree structure, where the coverage rate represents the degree of similarity between the multimedia data and each branch tree;
A branch determining module 53, configured to determine the branch trees whose coverage rate is greater than a preset first threshold and to determine, from the judgment-condition branches contained in those branch trees, the judgment-condition branches that the characteristic information of the multimedia data satisfies;
A label determining module 54, configured to determine the values of the leaf nodes of those judgment-condition branches as the first-type label information of the multimedia data.
Optionally, the branch determining module 53 is specifically configured to:
Match, in the priority order of the judgment-condition branches, the characteristic information of the multimedia data against the judgment condition of each judgment-condition branch in turn;
If at least one characteristic information item of the multimedia data matches the judgment condition of any judgment-condition branch, determine that the characteristic information of the multimedia data satisfies that judgment-condition branch.
Optionally, the label determining module 54 is further configured to:
Determine, among the first-type label information of the multimedia data, the first-type label information items that belong to the same category and are mutually exclusive;
If the number of first-type label information items that belong to the same category and are mutually exclusive is greater than 1, retain one of those first-type label information items.
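The mutual-exclusion step can be sketched as a filter over the first-type labels. The text does not say which of several mutually exclusive labels is retained, so keeping the first one encountered is an assumption of this sketch, as are the function name and the representation of the mutual exclusion table as a list of label sets.

```python
def resolve_mutual_exclusion(labels, exclusive_groups):
    """Keep at most one label from each group of mutually exclusive labels.

    labels: first-type labels in the order they were determined.
    exclusive_groups: list of sets; labels within one set exclude each other.
    Assumption: the earliest label of a group wins.
    """
    kept = []
    for label in labels:
        conflict = any(
            label in group and any(k in group for k in kept)
            for group in exclusive_groups
        )
        if not conflict:
            kept.append(label)
    return kept
```

For example, with the hypothetical exclusive group {"male voice", "female voice"}, the labels ["male voice", "female voice", "rock"] reduce to ["male voice", "rock"]. A different retention policy, such as keeping the label with the highest coverage rate, would only change how the winner within a group is chosen.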
Based on any of the above embodiments, optionally, the label determining module 54 is further configured to:
Determine the second-type label information of the multimedia data operated on by the user according to the network logs of the operations the user performs on multimedia data.
In one possible implementation, the label determining module 54 is specifically configured to:
For each log set, determine, in chronological order, whether the multimedia data corresponding to the network logs belonging to the same operation contain specific label information, where the number of network logs contained in the log set is greater than K, K is an integer greater than 0, the specific label information is first-type label information contained in the multimedia data corresponding to at least K/A network logs in the log set, and A is a set second threshold;
If the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs all contain the specific label information, the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs all contain the specific label information, and the number of network logs located, in chronological order within the log set, between the network logs found by the j-th determination and the network logs found by the (j+1)-th determination is less than a set fourth threshold, determine the specific label information as the second-type label information of the multimedia data corresponding to the network logs located between the network logs found by the j-th determination and the network logs found by the (j+1)-th determination, where j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a set third threshold.
Further, the label determining module 54 is further configured to:
Record, in the multimedia data to which second-type label information has been added, the time information at which the second-type label information was added;
After the time information exceeds a set time threshold, delete the second-type label information from the multimedia data.
Based on any of the above embodiments, optionally, the branch tree determining module 52 is specifically configured to:
For a branch tree, determine the number M of characteristic information items contained in the intersection of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree;
Determine the number N1 of characteristic information items contained in the union of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree, and determine the coverage rate between the multimedia data and the branch tree according to the ratio of M to N1; or determine the total N2 of the number of characteristic information items of the multimedia data and the number of characteristic information items corresponding to the branch tree, and determine the coverage rate between the multimedia data and the branch tree according to the ratio of M to N2.
Based on any of the above embodiments, optionally, the device further includes:
a modeling module 55, configured to generate the tree according to the following steps:
dividing the sample data into at least two data sets according to the label category to which the pre-configured label information of the sample data belongs, each data set corresponding to one branch tree of the tree;
for each data set, dividing the sample data in the data set into at least one category group according to the feature category to which the feature information of the sample data in the data set belongs, and calculating the information gain ratio of each category group, where the information gain ratio is determined based on the information entropy of the feature information of the sample data in the category group; and successively selecting the category group with the largest information gain ratio as the split attribute, and building the judgment-condition branches of the branch tree according to the feature information of the sample data contained in that category group, where the leaf nodes of the judgment-condition branches are the label information of the sample data in the data set.
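The split-attribute selection described above can be illustrated with the gain ratio used by C4.5-style decision trees. The text only states that the information gain ratio is derived from information entropy, so the exact formula below (information gain divided by split information) is an assumption. The dictionary layout of a sample and the names `entropy`, `gain_ratio` and `best_split` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(samples, feature, label_key="label"):
    """C4.5-style gain ratio of one feature category over labelled samples."""
    labels = [s[label_key] for s in samples]
    # Group the labels by the value each sample takes for this feature.
    groups = defaultdict(list)
    for s in samples:
        groups[s[feature]].append(s[label_key])
    n = len(samples)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information: entropy of the feature's own value distribution.
    split_info = entropy([s[feature] for s in samples])
    return gain / split_info if split_info > 0 else 0.0


def best_split(samples, features):
    """Pick the feature category with the largest gain ratio."""
    return max(features, key=lambda f: gain_ratio(samples, f))


samples = [
    {"genre": "rock", "length": "short", "label": "A"},
    {"genre": "rock", "length": "long", "label": "A"},
    {"genre": "pop", "length": "short", "label": "B"},
    {"genre": "pop", "length": "long", "label": "B"},
]
print(best_split(samples, ["genre", "length"]))
```

Repeating the selection on the remaining feature categories of each resulting subset would yield the successive judgment-condition branches, with the sample labels at the leaves.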
Optionally, the modeling module 55 is specifically configured to:
determine the coverage rate of any two pieces of sample data according to their feature information; and
if the coverage rate is greater than a set fifth threshold, merge the two pieces of sample data to form a data group, and return to the step of determining the coverage rate, until every determined coverage rate is less than or equal to the fifth threshold, and then determine each finally obtained data group as one data set.
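The merge loop above can be sketched as a simple agglomerative procedure. Two points are assumptions rather than statements from the text: the coverage rate is taken as intersection size over union size, and a merged data group is represented by the union of its members' features when it is compared again. The names `jaccard` and `merge_into_data_sets` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Intersection size over union size of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_into_data_sets(samples, threshold):
    """Merge samples until no pair of groups has coverage above threshold.

    samples   -- list of feature sets, one per piece of sample data
    threshold -- the "fifth threshold" of the text
    Returns a list of data sets, each a list of sample indices.
    """
    # Each group holds (member indices, combined feature set).
    groups = [([i], set(f)) for i, f in enumerate(samples)]
    merged = True
    while merged:
        merged = False
        for (i, (mi, fi)), (j, (mj, fj)) in combinations(enumerate(groups), 2):
            if jaccard(fi, fj) > threshold:
                # Merge group j into group i, then restart the comparison.
                groups[i] = (mi + mj, fi | fj)
                del groups[j]
                merged = True
                break
    return [members for members, _ in groups]


samples = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}, {"x", "y", "z"}]
print(merge_into_data_sets(samples, threshold=0.4))
```

The loop restarts after every merge, which mirrors "return to the step of determining the coverage rate"; it is quadratic per pass and intended only for illustration.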
In this embodiment, the receiving module 51, the branch tree determining module 52, the branch determining module 53, the label determining module 54 and the modeling module 55 are presented in the form of functional modules. A "module" here may refer to an application-specific integrated circuit (ASIC), a circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or another device capable of providing the above functions. In a simple embodiment, those skilled in the art will appreciate that the receiving module 51 and the modeling module 55 may be implemented by the processor, memory, input interface and the like of a computer device, and that the branch tree determining module 52, the branch determining module 53 and the label determining module 54 may be implemented by the processor, memory and the like of the computer device.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage and the like) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (11)
1. A method for processing multimedia data, characterized in that the method comprises:
receiving multimedia data to be processed;
determining a coverage rate between the multimedia data and each branch tree of a pre-generated tree according to feature information of the multimedia data and feature information corresponding to each branch tree, wherein the coverage rate represents the degree of similarity between the multimedia data and each branch tree;
determining a branch tree whose coverage rate is greater than a first preset threshold, and determining, from the judgment-condition branches contained in the branch tree, a judgment-condition branch that the feature information of the multimedia data satisfies; and
determining the value of the leaf node in the judgment-condition branch as first-type label information of the multimedia data.
2. The method according to claim 1, characterized in that determining, from the judgment-condition branches contained in the branch tree, the judgment-condition branch that the feature information of the multimedia data satisfies comprises:
matching, in order of priority of the judgment-condition branches, the feature information of the multimedia data against the judgment condition of each judgment-condition branch in turn; and
if at least one item of feature information of the multimedia data matches the judgment condition of any judgment-condition branch, determining that the feature information of the multimedia data satisfies that judgment-condition branch.
3. The method according to claim 1, characterized in that the method further comprises:
determining, among the first-type label information of the multimedia data, first-type label information that belongs to the same category and is mutually exclusive; and
if the number of items of first-type label information that belong to the same category and are mutually exclusive is greater than 1, retaining one item of that first-type label information.
4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
determining, according to network logs of operations performed by a user on multimedia data, second-type label information of the multimedia data operated on by the user.
5. The method according to claim 4, characterized in that determining, according to the network logs of operations performed by the user on multimedia data, the second-type label information of the multimedia data operated on by the user comprises:
for each log set, determining, in chronological order, whether the multimedia data corresponding to the network logs belonging to the same operation contains specific label information, wherein the number of network logs contained in the log set is greater than K, K is an integer greater than 0, the specific label information is first label information contained in the multimedia data corresponding to at least K/A network logs in the log set, and A is a set second threshold; and
if the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs all contain the specific label information, the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs all contain the specific label information, and the number of network logs lying, in chronological order within the log set, between the network logs found by the j-th determination and those found by the (j+1)-th determination is less than a set fourth threshold, determining the specific label information as the second-type label information of the multimedia data corresponding to the network logs lying between the network logs found by the j-th determination and those found by the (j+1)-th determination, where j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a set third threshold.
6. The method according to claim 5, characterized in that the method further comprises:
recording, in multimedia data to which second-type label information has been added, time information indicating when the second-type label information was added; and
deleting the second-type label information from the multimedia data after the time information exceeds a set time threshold.
7. The method according to claim 1, characterized in that determining the coverage rate between the multimedia data and each branch tree according to the feature information of the multimedia data and the feature information corresponding to each branch tree of the pre-generated tree comprises:
for the branch tree, determining the number M of feature items contained in the intersection of the feature information of the multimedia data and the feature information corresponding to the branch tree; and
determining the number N1 of feature items contained in the union of the feature information of the multimedia data and the feature information corresponding to the branch tree, and determining the coverage rate between the multimedia data and the branch tree according to the ratio of M to N1; or determining the total N2 of the number of feature items of the multimedia data and the number of feature items corresponding to the branch tree, and determining the coverage rate between the multimedia data and the branch tree according to the ratio of M to N2.
8. A device for processing multimedia data, characterized in that the device comprises:
a receiving module, configured to receive multimedia data to be processed;
a branch tree determining module, configured to determine a coverage rate between the multimedia data and each branch tree of a pre-generated tree according to feature information of the multimedia data and feature information corresponding to each branch tree, wherein the coverage rate represents the degree of similarity between the multimedia data and each branch tree;
a branch determining module, configured to determine a branch tree whose coverage rate is greater than a first preset threshold, and to determine, from the judgment-condition branches contained in the branch tree, a judgment-condition branch that the feature information of the multimedia data satisfies; and
a label determining module, configured to determine the value of the leaf node in the judgment-condition branch as first-type label information of the multimedia data.
9. The device according to claim 8, characterized in that the branch determining module is specifically configured to:
match, in order of priority of the judgment-condition branches, the feature information of the multimedia data against the judgment condition of each judgment-condition branch in turn; and
if at least one item of feature information of the multimedia data matches the judgment condition of any judgment-condition branch, determine that the feature information of the multimedia data satisfies that judgment-condition branch.
10. The device according to claim 8, characterized in that the label determining module is further configured to:
determine, among the first-type label information of the multimedia data, first-type label information that belongs to the same category and is mutually exclusive; and
if the number of items of first-type label information that belong to the same category and are mutually exclusive is greater than 1, retain one item of that first-type label information.
11. The device according to claim 8, characterized in that the branch tree determining module is specifically configured to:
for the branch tree, determine the number M of feature items contained in the intersection of the feature information of the multimedia data and the feature information corresponding to the branch tree; and
determine the number N1 of feature items contained in the union of the feature information of the multimedia data and the feature information corresponding to the branch tree, and determine the coverage rate between the multimedia data and the branch tree according to the ratio of M to N1; or determine the total N2 of the number of feature items of the multimedia data and the number of feature items corresponding to the branch tree, and determine the coverage rate between the multimedia data and the branch tree according to the ratio of M to N2.
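For illustration only, and not as part of the claims, the labelling flow recited in claims 1, 2, 8 and 9 can be sketched as follows. The classes `ConditionBranch` and `BranchTree`, the function names, the use of the intersection-over-union coverage variant, the convention that a lower `priority` value is checked first, and stopping at the first satisfied branch of each tree are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ConditionBranch:
    priority: int          # lower value is checked first (assumption)
    condition: frozenset   # feature values that satisfy this branch
    leaf_label: str        # value of the leaf node


@dataclass
class BranchTree:
    features: frozenset    # feature information corresponding to the tree
    branches: list         # list of ConditionBranch


def coverage(a, b):
    """Intersection size over union size (the M/N1 variant)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def first_type_labels(item_features, trees, threshold):
    """Return leaf labels from every branch tree whose coverage exceeds threshold."""
    item = frozenset(item_features)
    labels = []
    for tree in trees:
        if coverage(item, tree.features) <= threshold:
            continue
        # Check judgment-condition branches in priority order.
        for branch in sorted(tree.branches, key=lambda b: b.priority):
            if item & branch.condition:   # at least one feature matches
                labels.append(branch.leaf_label)
                break                     # first satisfied branch wins (assumption)
    return labels


music = BranchTree(
    features=frozenset({"guitar", "drums", "vocals", "bass"}),
    branches=[
        ConditionBranch(2, frozenset({"vocals"}), "song"),
        ConditionBranch(1, frozenset({"guitar"}), "rock"),
    ],
)
video = BranchTree(
    features=frozenset({"frame", "subtitle"}),
    branches=[ConditionBranch(1, frozenset({"subtitle"}), "film")],
)
print(first_type_labels({"guitar", "vocals", "drums"}, [music, video], 0.5))
```

Here only the first tree passes the threshold, and its priority-1 branch is satisfied before the priority-2 branch is considered.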
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610601570.5A CN106294563B (en) | 2016-07-27 | 2016-07-27 | Method and apparatus for processing multimedia data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610601570.5A CN106294563B (en) | 2016-07-27 | 2016-07-27 | Method and apparatus for processing multimedia data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106294563A true CN106294563A (en) | 2017-01-04 |
CN106294563B CN106294563B (en) | 2019-09-17 |
Family
ID=57662641
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610601570.5A Active CN106294563B (en) | Method and apparatus for processing multimedia data | 2016-07-27 | 2016-07-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106294563B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108536787A (en) * | 2018-03-29 | 2018-09-14 | 优酷网络技术(北京)有限公司 | content identification method and device |
CN109739955A (en) * | 2019-01-24 | 2019-05-10 | 北京诸葛找房信息技术有限公司 | Source of houses label automatic extracting device and its method based on participle with multimode matching |
CN112395261A (en) * | 2019-08-16 | 2021-02-23 | 中国移动通信集团浙江有限公司 | Service recommendation method and device, computing equipment and computer storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101894125A (en) * | 2010-05-13 | 2010-11-24 | 复旦大学 | Content-based video classification method |
CN102262659A (en) * | 2011-07-15 | 2011-11-30 | 北京航空航天大学 | Audio label disseminating method based on content calculation |
US8504573B1 (en) * | 2008-08-21 | 2013-08-06 | Adobe Systems Incorporated | Management of smart tags via hierarchy |
CN104317891A (en) * | 2014-10-23 | 2015-01-28 | 华为软件技术有限公司 | Method and device for tagging pages |
CN104657422A (en) * | 2015-01-16 | 2015-05-27 | 北京邮电大学 | Classification decision tree-based intelligent content distribution classification method |
CN104794179A (en) * | 2015-04-07 | 2015-07-22 | 无锡天脉聚源传媒科技有限公司 | Video quick indexing method and device based on knowledge tree |
CN105072460A (en) * | 2015-07-15 | 2015-11-18 | 中国科学技术大学先进技术研究院 | Information annotation and association method, system and device based on VCE |
Also Published As
Publication number | Publication date |
---|---|
CN106294563B (en) | 2019-09-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Interiano et al. | Musical trends and predictability of success in contemporary songs in and out of the top charts | |
US7788279B2 (en) | System and method for storing and retrieving non-text-based information | |
US7544881B2 (en) | Music-piece classifying apparatus and method, and related computer program | |
US7624012B2 (en) | Method and apparatus for automatically generating a general extraction function calculable on an input signal, e.g. an audio signal to extract therefrom a predetermined global characteristic value of its contents, e.g. a descriptor | |
CN103793446A (en) | Music video generation method and system | |
CN108255840B (en) | Song recommendation method and system | |
CN109299245B (en) | Method and device for recalling knowledge points | |
CN107464555A (en) | Background sound is added to the voice data comprising voice | |
CN101470732A (en) | Auxiliary word stock generation method and apparatus | |
Smith et al. | Towards a Hybrid Recommendation System for a Sound Library. | |
CN108766451B (en) | Audio file processing method and device and storage medium | |
CN107918657A (en) | The matching process and device of a kind of data source | |
CN105679324A (en) | Voiceprint identification similarity scoring method and apparatus | |
Rizo et al. | A Pattern Recognition Approach for Melody Track Selection in MIDI Files. | |
CN106294563B (en) | 2019-09-17 | Method and apparatus for processing multimedia data |
CN111444380A (en) | Music search sorting method, device, equipment and storage medium | |
US20090132508A1 (en) | System and method for associating a category label of one user with a category label defined by another user | |
CN111078859A (en) | Author recommendation method based on reference times | |
Mueller | Where’d you get that idea? Determinants of creativity and impact in popular music | |
CN109471951A (en) | Lyrics generation method, device, equipment and storage medium neural network based | |
KR20200129873A (en) | Test music recommendation apparatus, music source meta data building apparatus | |
Konev et al. | The program complex for vocal recognition | |
CN114461885A (en) | Song quality evaluation method, device and storage medium | |
Vatolkin et al. | Partition based feature processing for improved music classification | |
Shier et al. | Spiegelib: An automatic synthesizer programming library |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |