CN106294563A - Method and apparatus for processing multimedia data - Google Patents


Info

Publication number
CN106294563A
Authority
CN
China
Legal status
Granted
Application number
CN201610601570.5A
Other languages
Chinese (zh)
Other versions
CN106294563B (en)
Inventor
胡伟凤 (Hu Weifeng)
高雪松 (Gao Xuesong)
Current Assignee
Hisense Group Co Ltd
Original Assignee
Hisense Group Co Ltd
Application filed by Hisense Group Co Ltd
Priority to CN201610601570.5A
Publication of CN106294563A
Application granted
Publication of CN106294563B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and an apparatus for processing multimedia data. They address the problem that manually annotating multimedia data with label information consumes enormous manpower and time and yields relatively low accuracy. The method includes: receiving multimedia data to be processed; determining, from the feature information of the multimedia data and the feature information corresponding to each branch tree of a pre-generated tree structure, a coverage rate between the multimedia data and each branch tree, where the coverage rate represents how similar the multimedia data is to that branch tree; determining the branch trees whose coverage rate exceeds a first preset threshold and, among the judgment-condition branches contained in each such branch tree, determining the judgment-condition branch satisfied by the feature information of the multimedia data; and taking the value of the leaf node in that judgment-condition branch as first-class label information of the multimedia data. The label information of multimedia data can thus be determined quickly and accurately.

Description

Method and apparatus for processing multimedia data
Technical field
The present invention relates to the field of information technology, and in particular to a method and an apparatus for processing multimedia data.
Background
Driven by information technology, multimedia data is growing explosively, and making proper use of it can greatly improve the services of an intelligent interactive system. Users interact with the system through the human-computer interface it provides. Users are therefore both the service objects of the intelligent interactive system and an important source of its data.
In a big-data setting, an intelligent interactive system can pick out, from a massive amount of multimedia data, the items a user is interested in and recommend them. Because the system recommends multimedia data according to its label information, it can recommend suitable multimedia data accurately only if that label information is accurate. In existing music players, music experts manually add label information to each piece of audio data (such as songs and operas) in the music library, so that the player can recommend songs, operas and other content of interest to its users according to these labels. However, manual annotation consumes enormous manpower and time, and its accuracy is relatively low.
Summary of the invention
Embodiments of the present invention provide a method and an apparatus for processing multimedia data. They address the problem that manually annotating multimedia data with label information consumes enormous manpower and time and yields relatively low accuracy.
In a first aspect, the present invention provides a method for processing multimedia data, the method including:
receiving multimedia data to be processed;
determining, from the feature information of the multimedia data and the feature information corresponding to each branch tree of a pre-generated tree structure, a coverage rate between the multimedia data and each branch tree, where the coverage rate represents how similar the multimedia data is to the branch tree;
determining the branch trees whose coverage rate is greater than a first preset threshold and, among the judgment-condition branches contained in such a branch tree, determining the judgment-condition branch satisfied by the feature information of the multimedia data;
taking the value of the leaf node in that judgment-condition branch as first-class label information of the multimedia data.
In a possible implementation, determining, among the judgment-condition branches contained in the branch tree, the judgment-condition branch satisfied by the feature information of the multimedia data includes:
matching the feature information of the multimedia data against the judgment conditions of the judgment-condition branches one by one, in the priority order of those branches;
if at least one piece of feature information of the multimedia data matches the judgment condition of a judgment-condition branch, determining that the feature information of the multimedia data satisfies that branch.
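The priority-ordered matching above can be sketched in Python as follows. This is a minimal illustration that rests on assumptions the text does not state. A branch is modelled by the hypothetical `ConditionBranch` class, and a smaller `priority` number means a higher priority. Matching stops at the first satisfied branch, although returning every satisfied branch would be an equally plausible reading.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ConditionBranch:
    """Hypothetical model of one judgment-condition branch of a branch tree."""
    priority: int          # smaller number = higher priority (assumption)
    condition: frozenset   # feature values that satisfy the branch
    leaf_value: str        # value of the leaf node reached through the branch


def match_branch(features: Iterable[str],
                 branches: List[ConditionBranch]) -> Optional[ConditionBranch]:
    """Return the first branch, in priority order, satisfied by the features.

    A branch counts as satisfied when at least one feature of the multimedia
    data matches its judgment condition; None means no branch is satisfied.
    """
    feature_set = set(features)
    for branch in sorted(branches, key=lambda b: b.priority):
        if feature_set & branch.condition:
            return branch
    return None


# Example with made-up branches and features.
branches = [
    ConditionBranch(priority=2, condition=frozenset({"guitar", "drums"}), leaf_value="band"),
    ConditionBranch(priority=1, condition=frozenset({"rock"}), leaf_value="rock"),
]
print(match_branch(["Zheng Jun", "rock", "guitar"], branches).leaf_value)  # rock
```

Both branches are satisfied in the example, so the priority order decides which leaf value is used.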
In a possible implementation, the method further includes:
determining, among the first-class labels of the multimedia data, the first-class labels that belong to the same category and are mutually exclusive;
if there is more than one such label, retaining only one of them.
In a possible implementation, the method further includes:
determining second-class label information of the multimedia data operated on by a user, according to the network logs of the operations the user performs on multimedia data.
In a possible implementation, determining the second-class label information of the multimedia data operated on by the user, according to the network logs of the operations the user performs on multimedia data, includes:
for each log set, determining in chronological order whether the multimedia data corresponding to the network logs that belong to the same operation contain a specific label. Here the log set contains K network logs, where K is an integer greater than 0. The specific label is a label contained in the multimedia data corresponding to at least K/A network logs in the log set, and A is a preset second threshold;
if all three of the following hold, determining the specific label as second-class label information of the multimedia data corresponding to the network logs located between those found in the j-th and (j+1)-th determinations:
(a) the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs all contain the specific label;
(b) the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs all contain the specific label;
(c) the number of network logs located chronologically in the log set between the network logs found in the j-th determination and those found in the (j+1)-th determination is less than a preset fourth threshold.
Here j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a preset third threshold.
In a possible implementation, the method further includes:
for multimedia data to which the second-class label information has been added, recording the time at which it was added;
once the time elapsed since then exceeds a set time threshold, deleting the second-class label information from the multimedia data.
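Below is a minimal sketch of recording and expiring second-class labels. It assumes the time threshold is measured as the time elapsed since the label was added. `MediaItem`, its methods and the parameter names are hypothetical. Explicit `now` values are passed only to make the example deterministic.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MediaItem:
    """Hypothetical multimedia record that keeps second-class labels with the time they were added."""
    media_id: str
    second_class_labels: Dict[str, float] = field(default_factory=dict)  # label -> time added (epoch s)

    def add_second_class_label(self, label: str, now: Optional[float] = None) -> None:
        # Record the time at which the second-class label is added.
        self.second_class_labels[label] = time.time() if now is None else now

    def expire_second_class_labels(self, time_threshold: float, now: Optional[float] = None) -> None:
        # Delete every second-class label attached for longer than time_threshold seconds.
        now = time.time() if now is None else now
        self.second_class_labels = {
            label: added_at
            for label, added_at in self.second_class_labels.items()
            if now - added_at <= time_threshold
        }


item = MediaItem("001")
item.add_second_class_label("workout", now=0.0)
item.add_second_class_label("study", now=50.0)
item.expire_second_class_labels(time_threshold=60.0, now=100.0)
print(sorted(item.second_class_labels))  # ['study']
```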
In a possible implementation, determining the coverage rate between the multimedia data and each branch tree, from the feature information of the multimedia data and the feature information corresponding to each branch tree of the pre-generated tree structure, includes:
for the branch tree, determining the number M of pieces of feature information contained in the intersection of the feature information of the multimedia data and the feature information corresponding to the branch tree;
then using either of the following:
(a) determining the number N1 of pieces of feature information contained in the union of the feature information of the multimedia data and the feature information corresponding to the branch tree, and determining the coverage rate between the multimedia data and the branch tree from the ratio of M to N1;
(b) determining the sum N2 of the number of pieces of feature information of the multimedia data and the number of pieces of feature information corresponding to the branch tree, and determining the coverage rate between the multimedia data and the branch tree from the ratio of M to N2.
In a second aspect, the present invention further provides an apparatus for processing multimedia data, the apparatus including:
a receiving module, configured to receive multimedia data to be processed;
a branch tree determining module, configured to determine, from the feature information of the multimedia data and the feature information corresponding to each branch tree of a pre-generated tree structure, a coverage rate between the multimedia data and each branch tree, where the coverage rate represents how similar the multimedia data is to the branch tree;
a branch determining module, configured to determine the branch trees whose coverage rate is greater than a first preset threshold and, among the judgment-condition branches contained in such a branch tree, determine the judgment-condition branch satisfied by the feature information of the multimedia data;
a label determining module, configured to take the value of the leaf node in that judgment-condition branch as first-class label information of the multimedia data.
In a possible implementation, the branch determining module is specifically configured to:
match the feature information of the multimedia data against the judgment conditions of the judgment-condition branches one by one, in the priority order of those branches;
if at least one piece of feature information of the multimedia data matches the judgment condition of a judgment-condition branch, determine that the feature information of the multimedia data satisfies that branch.
In a possible implementation, the label determining module is further configured to:
determine, among the first-class labels of the multimedia data, the first-class labels that belong to the same category and are mutually exclusive;
if there is more than one such label, retain only one of them.
In a possible implementation, the label determining module is further configured to:
determine second-class label information of the multimedia data operated on by a user, according to the network logs of the operations the user performs on multimedia data.
In a possible implementation, the label determining module is specifically configured to:
for each log set, determine in chronological order whether the multimedia data corresponding to the network logs that belong to the same operation contain a specific label. Here the log set contains K network logs, where K is an integer greater than 0. The specific label is a label contained in the multimedia data corresponding to at least K/A network logs in the log set, and A is a preset second threshold;
if all three of the following hold, determine the specific label as second-class label information of the multimedia data corresponding to the network logs located between those found in the j-th and (j+1)-th determinations:
(a) the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs all contain the specific label;
(b) the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs all contain the specific label;
(c) the number of network logs located chronologically in the log set between the network logs found in the j-th determination and those found in the (j+1)-th determination is less than a preset fourth threshold.
Here j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a preset third threshold.
In a possible implementation, the branch tree determining module is specifically configured to:
for the branch tree, determine the number M of pieces of feature information contained in the intersection of the feature information of the multimedia data and the feature information corresponding to the branch tree;
then use either of the following:
(a) determine the number N1 of pieces of feature information contained in the union of the feature information of the multimedia data and the feature information corresponding to the branch tree, and determine the coverage rate between the multimedia data and the branch tree from the ratio of M to N1;
(b) determine the sum N2 of the number of pieces of feature information of the multimedia data and the number of pieces of feature information corresponding to the branch tree, and determine the coverage rate between the multimedia data and the branch tree from the ratio of M to N2.
The method and apparatus provided by the embodiments of the present invention work as follows:
1. Multimedia data to be processed is received.
2. A coverage rate between the multimedia data and each branch tree of a pre-generated tree structure is determined from the feature information of the multimedia data and the feature information corresponding to each branch tree.
3. The branch trees whose coverage rate exceeds a first preset threshold are identified. Among the judgment-condition branches of each such branch tree, the branch satisfied by the feature information of the multimedia data is determined.
4. The value of the leaf node in that branch is taken as first-class label information of the multimedia data.
As a result, the label information of multimedia data can be determined quickly and accurately. In addition, more than one branch tree may have a coverage rate above the first preset threshold, so more than one label may be determined for the multimedia data. Its labels therefore cover it more comprehensively, and recommendations based on them are more accurate.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method for processing multimedia data according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a tree structure according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a branch tree of a tree structure according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of another method for processing multimedia data according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of an apparatus for processing multimedia data according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings. It should be understood that the embodiments described here only illustrate and explain the present invention and do not limit it.
In the embodiment shown in Fig. 1, a method for processing multimedia data is provided, the method including:
S11: Receive multimedia data to be processed.
In this step, the received multimedia data may be uploaded by a user or obtained from a database. The embodiments of the present invention do not restrict how the multimedia data is obtained.
Optionally, the multimedia data includes but is not limited to audio data (such as songs and operas) and video data (such as TV series and films).
Taking a song as an example of multimedia data, the feature information that characterizes the song includes: song title, singer's name, instruments used, rhythm, beat, music genre, popularity, lyricist, composer, key lyrics, and so on. For example, the feature sequence of the song whose identifier (ID) is 001 is: [miss, Zheng Jun, pop, elopement, school, blues, rock and roll, Baidu, guitar, waist drum, saxophone, Chang'an].
S12: Determine, from the feature information of the multimedia data and the feature information corresponding to each branch tree of the pre-generated tree structure, the coverage rate between the multimedia data and each branch tree, where the coverage rate represents how similar the multimedia data is to the branch tree.
S13: Determine the branch trees whose coverage rate is greater than the first preset threshold and, among the judgment-condition branches contained in each such branch tree, determine the judgment-condition branch satisfied by the feature information of the multimedia data.
S14: Take the value of the leaf node in that judgment-condition branch as first-class label information of the multimedia data.
Specifically, S13 and S14 are performed for each branch tree whose coverage rate is greater than the first preset threshold. Because more than one branch tree may exceed this threshold, the multimedia data can obtain one or more labels.
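Steps S12 to S14 can be strung together as in the following sketch. It represents each branch tree with the hypothetical `BranchTree` class, which holds a feature set and its judgment-condition branches in priority order. It uses the M/N1 coverage variant described below. The tree contents and the threshold of 0.3 are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass
class Branch:
    condition: Set[str]   # judgment condition: feature values that satisfy the branch
    leaf_value: str       # leaf-node value, used as a first-class label


@dataclass
class BranchTree:
    features: Set[str]      # feature information corresponding to this branch tree
    branches: List[Branch]  # judgment-condition branches, highest priority first


def coverage(item: Set[str], tree_features: Set[str]) -> float:
    # Coverage rate as M / N1 (intersection size over union size).
    union = item | tree_features
    return len(item & tree_features) / len(union) if union else 0.0


def first_class_labels(item_features: Sequence[str], tree: List[BranchTree],
                       first_threshold: float) -> List[str]:
    features = set(item_features)
    labels = []
    for branch_tree in tree:
        # S12 + S13: only branch trees whose coverage rate exceeds the first threshold are used.
        if coverage(features, branch_tree.features) <= first_threshold:
            continue
        # S13: first branch (priority order) matched by at least one feature.
        for branch in branch_tree.branches:
            if features & branch.condition:
                labels.append(branch.leaf_value)  # S14: leaf value becomes a first-class label
                break
    return labels


tree = [
    BranchTree({"rock", "guitar", "drums", "bass"},
               [Branch({"rock"}, "rock"), Branch({"guitar"}, "band music")]),
    BranchTree({"piano", "violin", "classical"}, [Branch({"piano"}, "piano")]),
]
song = ["Zheng Jun", "rock", "guitar", "saxophone"]
print(first_class_labels(song, tree, first_threshold=0.3))  # ['rock']
```

The first branch tree shares 2 of 6 distinct features with the song (coverage 1/3), so it passes the 0.3 threshold. The second shares none and is skipped.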
In this embodiment, the first-class labels of the multimedia data are derived automatically from the branch trees whose coverage rate exceeds the first preset threshold, so label information can be determined quickly and accurately. Because more than one branch tree may exceed that threshold, the multimedia data may receive more than one label. Its labels are therefore more comprehensive, and recommendations based on them are more accurate.
In a possible implementation, S12 determines the coverage rate between the multimedia data and each branch tree, from the feature information of the multimedia data and the feature information corresponding to each branch tree of the pre-generated tree structure. Two implementations are possible:
1. For each branch tree, determine the number M of pieces of feature information in the intersection of the feature information of the multimedia data and the feature information corresponding to the branch tree. Also determine the number N1 of pieces of feature information in their union. Then determine the coverage rate between the multimedia data and the branch tree from the ratio of M to N1.
For example, the ratio of M to N1 is taken directly as the coverage rate between the multimedia data and the branch tree.
2. For each branch tree, determine M as above. Also determine the total N2 of the number of pieces of feature information of the multimedia data plus the number of pieces of feature information corresponding to the branch tree. Then determine the coverage rate between the multimedia data and the branch tree from the ratio of M to N2.
For example, the ratio of M to N2 is taken directly as the coverage rate between the multimedia data and the branch tree.
As another example, different branch trees may have different numbers of pieces of feature information. To make the coverage rates between the multimedia data and different branch trees more comparable, the ratio of M to N2 is multiplied by 2 and the result is taken as the coverage rate.
The embodiments of the present invention are not limited to these two ways of determining the coverage rate, and other ways may also be used. Any way that can determine the degree of coverage between two sequences falls within the scope of protection of the present invention.
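The coverage-rate variants above can be compared side by side with the sketch below. It assumes that feature information can be treated as a set, so duplicates are ignored and N2 counts distinct features. For reference, M/N1 is the Jaccard index and 2M/N2 is the Dice coefficient. Doubling M/N2 rescales it so that two identical feature sets score 1 instead of 0.5. The function name and the example features are hypothetical.

```python
from typing import Dict, Iterable


def coverage_rates(item_features: Iterable[str], tree_features: Iterable[str]) -> Dict[str, float]:
    """Compute the M/N1, M/N2 and 2M/N2 coverage-rate variants.

    Features are treated as sets (duplicates ignored), which is an assumption.
    """
    a, b = set(item_features), set(tree_features)
    m = len(a & b)         # M: pieces of feature information in the intersection
    n1 = len(a | b)        # N1: pieces of feature information in the union
    n2 = len(a) + len(b)   # N2: sum of the two feature counts
    return {
        "M/N1": m / n1 if n1 else 0.0,
        "M/N2": m / n2 if n2 else 0.0,
        "2M/N2": 2 * m / n2 if n2 else 0.0,
    }


song = ["Zheng Jun", "rock", "guitar", "saxophone"]
branch_tree = ["rock", "guitar", "drums", "bass"]
print(coverage_rates(song, branch_tree))
# {'M/N1': 0.3333333333333333, 'M/N2': 0.25, '2M/N2': 0.5}
```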
In a possible implementation, the method further includes:
determining, among the first-class labels of the multimedia data, the first-class labels that belong to the same category and are mutually exclusive;
if there is more than one such label, retaining only one of them.
Specifically, the first-class labels of the multimedia data are filtered according to a preset mutual-exclusion rule. Among the first-class labels, those that belong to the same category and are mutually exclusive are filtered so that only one remains, which makes the first-class labels of the multimedia data more accurate. For example, suppose the first-class labels obtained in S11 to S14 include the mutually exclusive language labels Korean, Japanese and Chinese. A piece of multimedia data can carry only one language label, so one of the three is selected and the other two are deleted.
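Filtering with a preset mutual-exclusion rule could look like the sketch below. Here the rule is assumed to be a dictionary that maps each category to its mutually exclusive labels. The survivor is picked at random, which corresponds to mode 1 below. Passing a seeded `random.Random` makes the choice reproducible.

```python
import random
from typing import Dict, List, Optional, Set


def filter_mutually_exclusive(labels: List[str],
                              exclusive_groups: Dict[str, Set[str]],
                              rng: Optional[random.Random] = None) -> List[str]:
    """Keep one label from each category of mutually exclusive first-class labels.

    exclusive_groups is a hypothetical encoding of the preset mutual-exclusion
    rule: category name -> labels of that category that exclude each other.
    The survivor is picked at random.
    """
    rng = rng or random.Random()
    kept = list(labels)
    for group in exclusive_groups.values():
        conflicting = [label for label in kept if label in group]
        if len(conflicting) > 1:
            survivor = rng.choice(conflicting)
            # Drop every other label of this category; labels of other categories stay.
            kept = [label for label in kept if label not in group or label == survivor]
    return kept


rules = {"language": {"Korean", "Japanese", "Chinese"}}
labels = ["Korean", "Japanese", "Chinese", "rock"]
print(filter_mutually_exclusive(labels, rules, random.Random(0)))
```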
Optionally, when more than one first-class label belongs to the same category and is mutually exclusive, one of them can be retained in either of the following ways:
Mode 1: If more than one first-class label belongs to the same category and is mutually exclusive, randomly select one of these labels and delete the others.
Mode 2: If more than one first-class label belongs to the same category and is mutually exclusive, select one of them according to the first-class labels of at least one other category, and delete the others.
In this mode, the choice among the mutually exclusive labels is based on first-class labels of other categories, so the retained first-class label is more accurate.
For example, suppose the mutually exclusive language labels of a song include Korean, Japanese and Chinese. The choice can then be made according to the first-class label corresponding to the singer's name. If the singer's name is Chinese, Chinese is selected from the language labels; if it is Korean, Korean is selected; and if it is Japanese, Japanese is selected. As another example, the choice can be made the same way from the first-class label corresponding to the song title: a Chinese title selects Chinese, a Korean title selects Korean, and a Japanese title selects Japanese.
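Mode 2 can be sketched as follows. The sketch assumes the other category's label has already been reduced to a single language hint. `SINGER_LANGUAGE` is a hypothetical lookup that stands in for however the language of a singer's name is actually derived. Leaving the labels unchanged when the hint does not resolve the conflict is also an assumption.

```python
from typing import List, Optional, Set


def resolve_by_other_category(labels: List[str], exclusive_group: Set[str],
                              hint: Optional[str]) -> List[str]:
    """Keep the conflicting label that agrees with a label of another category.

    `hint` is the language suggested by another category (e.g. the singer's name).
    If there is no conflict, or the hint is missing or not among the conflicting
    labels, the labels are returned unchanged (an assumption; the text does not
    cover that case).
    """
    conflicting = [label for label in labels if label in exclusive_group]
    if len(conflicting) <= 1 or hint not in conflicting:
        return list(labels)
    return [label for label in labels if label not in exclusive_group or label == hint]


# Hypothetical lookup from a singer's name to the language of that name.
SINGER_LANGUAGE = {"Zheng Jun": "Chinese"}
LANGUAGES = {"Korean", "Japanese", "Chinese"}

labels = ["Korean", "Japanese", "Chinese", "rock"]
print(resolve_by_other_category(labels, LANGUAGES, SINGER_LANGUAGE.get("Zheng Jun")))
# ['Chinese', 'rock']
```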
Based on any of the above embodiments, in a possible implementation, the method further includes:
determining second-class label information of the multimedia data operated on by a user, according to the network logs of the operations the user performs on multimedia data.
Specifically, network logs are analyzed to form modification rules based on user behavior, and these rules determine second-class labels of multimedia data. Recommendations of multimedia data to a user that rely on its label information (both first-class and second-class labels) are then more accurate.
Optionally, the second-class labels of the multimedia data operated on by the user are determined periodically from the network logs. In each set cycle, they are determined from the network logs that record the operations the user performed on multimedia data during that cycle. For example, the network logs of each day are analyzed to determine the second-class labels of the multimedia data operated on by the user.
Optionally, a network log includes but is not limited to at least one of the following:
the identifier of the operated multimedia data, the identifier of the performed operation, the time at which the operation was performed, and the label information (first-class and second-class labels) of the operated multimedia data.
Optionally, operations performed on multimedia data include but are not limited to adding to favorites, deleting, and playing.
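The log fields listed above, and the split into per-cycle log sets, might be represented as in this sketch. `NetworkLog` and `build_log_sets` are hypothetical names. One day is used as the cycle, following the daily-statistics example. Grouping each day's logs by operation is an assumption about how a log set is formed.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class NetworkLog:
    """Hypothetical network-log record with the fields listed in the text."""
    media_id: str           # identifier of the operated multimedia data
    operation: str          # identifier of the operation, e.g. "play", "favorite", "delete"
    timestamp: datetime     # time at which the operation was performed
    labels: FrozenSet[str]  # first- and second-class labels of the multimedia data


def build_log_sets(logs: List[NetworkLog]) -> Dict[Tuple[str, str], List[NetworkLog]]:
    """Split logs into one log set per (day, operation), each sorted chronologically."""
    groups: Dict[Tuple[str, str], List[NetworkLog]] = defaultdict(list)
    for entry in logs:
        groups[(entry.timestamp.date().isoformat(), entry.operation)].append(entry)
    return {key: sorted(group, key=lambda e: e.timestamp) for key, group in groups.items()}


logs = [
    NetworkLog("001", "play", datetime(2016, 7, 1, 9, 0), frozenset({"rock"})),
    NetworkLog("002", "play", datetime(2016, 7, 1, 8, 0), frozenset({"pop"})),
    NetworkLog("003", "favorite", datetime(2016, 7, 1, 10, 0), frozenset({"rock"})),
    NetworkLog("004", "play", datetime(2016, 7, 2, 9, 0), frozenset({"rock"})),
]
for key, log_set in build_log_sets(logs).items():
    print(key, [entry.media_id for entry in log_set])
```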
In a possible implementation, determining the second-class label information of the multimedia data operated on by the user, according to the network logs of the operations the user performs on multimedia data, includes:
for each log set, determining in chronological order whether the multimedia data corresponding to the network logs that belong to the same operation contain a specific label. Here the log set contains K network logs, where K is an integer greater than 0. The specific label is a label contained in the multimedia data corresponding to at least K/A network logs in the log set, and A is a preset second threshold;
if all three of the following hold, determining the specific label as second-class label information of the multimedia data corresponding to the network logs located between those found in the j-th and (j+1)-th determinations:
(a) the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs all contain the specific label;
(b) the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs all contain the specific label;
(c) the number of network logs located chronologically in the log set between the network logs found in the j-th determination and those found in the (j+1)-th determination is less than a preset fourth threshold.
Here j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a preset third threshold.
Specifically, for each log set, it is determined in chronological order, log by log, whether the first-type label information of the multimedia data corresponding to each network log contains the specific label information. If, after the j-th determination finds P1 consecutive network logs whose multimedia data contains the specific label information, fewer than the fourth threshold of consecutive network logs follow whose multimedia data does not contain it, and the (j+1)-th determination then again finds P2 consecutive network logs whose multimedia data contains it, the specific label information is added to the second-type label information of the multimedia data corresponding to those intervening network logs. If instead the number of consecutive network logs not containing it is greater than or equal to the fourth threshold before the (j+1)-th determination finds P2 consecutive network logs containing it, no processing is performed; the scan continues to check whether the label information of the multimedia data corresponding to the following network logs contains the specific label information, and the above process is repeated until the last network log in the log set.
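The rule above can be sketched as follows: for each log in time order, record whether its multimedia data carries the specific label; keep only runs longer than the third threshold (P1, P2); and, between consecutive kept runs separated by fewer logs than the fourth threshold, assign the label to the intervening logs. The function names and the boolean-list input are hypothetical; this is a minimal reading of the rule, not the patent's implementation.

```python
from typing import List, Tuple


def tagged_runs(flags: List[bool]) -> List[Tuple[int, int]]:
    """Half-open [start, end) ranges of consecutive logs whose media carries the label."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, has_label in enumerate(flags + [False]):  # sentinel closes a trailing run
        if has_label and start is None:
            start = i
        elif not has_label and start is not None:
            runs.append((start, i))
            start = None
    return runs


def fill_label_gaps(flags: List[bool], third_threshold: int, fourth_threshold: int) -> List[int]:
    """Indices of logs whose media should receive the label as second-type label information."""
    # only runs longer than the third threshold count as a "determination" (P1, P2)
    runs = [r for r in tagged_runs(flags) if r[1] - r[0] > third_threshold]
    filled: List[int] = []
    for (_, end_j), (start_next, _) in zip(runs, runs[1:]):
        if start_next - end_j < fourth_threshold:  # gap shorter than the fourth threshold
            filled.extend(i for i in range(end_j, start_next) if not flags[i])
    return filled
```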
Optionally, the network logs can be divided into multiple network log groups in units of time according to their time information, for example in units of one hour; then, from the divided groups, those that contain more than K network logs and in which the multimedia data operated on by at least K/A network logs contains specific label information are selected (these are the log sets). Alternatively, the network logs can be divided, in time order, into multiple network log groups in units of a number of network logs, for example K logs per group; then, from the divided groups, those in which the multimedia data operated on by at least K/A network logs contains specific label information are selected (these are the log sets).
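A possible sketch of the time-based variant: logs are bucketed by the hour, a bucket qualifies as a log set when it holds more than K logs, and its specific labels are those present in the multimedia data of at least K/A logs. The `(time, labels)` tuple format and the name `log_sets_by_hour` are assumptions made for this illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, List, Set, Tuple

Log = Tuple[datetime, Set[str]]  # hypothetical: (operation time, labels of the operated media)


def log_sets_by_hour(logs: List[Log], K: int, A: float) -> Dict[datetime, Set[str]]:
    """Map each qualifying hourly log set to its specific label information."""
    groups: Dict[datetime, List[Log]] = defaultdict(list)
    for t, labels in sorted(logs, key=lambda entry: entry[0]):
        groups[t.replace(minute=0, second=0, microsecond=0)].append((t, labels))
    result: Dict[datetime, Set[str]] = {}
    for hour, group in groups.items():
        if len(group) <= K:  # a log set must contain more than K network logs
            continue
        # number of logs whose media carries each label
        counts = Counter(label for _, labels in group for label in labels)
        specific = {label for label, n in counts.items() if n >= K / A}
        if specific:
            result[hour] = specific
    return result
```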
Optionally, since not all user operations are regular, the second-type label information of multimedia data can be determined based on a set operation only, which makes it more accurate. For example, the second-type label information of multimedia data is determined based on play operations.
Optionally, to make the determined second-type label information more accurate, the method further includes:
For any multimedia data item to which second-type label information has been added: if, within a set number of processed log sets, the number of occurrences of the item carrying a given second-type label and the total number of occurrences of the item do not satisfy a set constraint, that second-type label is deleted from the item.
Here, the total number of occurrences of the item is the sum of its occurrences without that second-type label added and its occurrences with that label added.
Specifically, multiple log sets are used to judge whether the determined second-type label information is accurate. For any multimedia data item to which a second-type label has been added, within the set number of processed log sets: if the number of occurrences of the item carrying that label and the total number of occurrences of the item satisfy the set constraint, the item is considered to legitimately carry that label; if they do not, the item is considered not to carry it.
Optionally, the set constraint is that the number of occurrences of the item carrying the second-type label is greater than half of the total number of occurrences of the item.
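The majority constraint above can be sketched as a single pass over the processed log sets. In this sketch each log set is assumed, hypothetically, to be a list of `(media_id, labels added in that set)` pairs, and `second_type` maps each item to its currently attached second-type labels.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

LogSet = List[Tuple[str, Set[str]]]  # hypothetical: (media_id, second-type labels added in this set)


def prune_second_type_labels(log_sets: List[LogSet],
                             second_type: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Drop a second-type label unless the item carries it in more than half of its occurrences."""
    total: Counter = Counter()       # occurrences of each item, with or without the label
    with_label: Counter = Counter()  # occurrences of each (item, label) pair
    for log_set in log_sets:
        for media_id, added in log_set:
            total[media_id] += 1
            for label in added:
                with_label[(media_id, label)] += 1
    return {
        media_id: {lab for lab in labels if with_label[(media_id, lab)] > total[media_id] / 2}
        for media_id, labels in second_type.items()
    }
```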
For example, the network logs are split by time, i.e., divided into multiple log sets in units of time, and time-context dependencies are extracted and analyzed for each log set. For each specific label information tag contained in the multimedia data operated on by the network logs in a log set, a maximal scene set C_tag is determined in chronological order, as follows:
1) Starting from the first network log, in chronological order, in the log set that includes the specific label information tag, expand the scene set: assuming the first network log including tag is g_i in the log set, add g_i to C_tag;
2) If network log g_{i+1} also contains tag, add g_{i+1} to C_tag and repeat this step; otherwise go to step 3);
3) Starting from g_{i+1}, count onward the consecutive network logs that do not contain tag. If this sequence is g_{i+1}, g_{i+2}, ..., g_{i+k}, continue from g_{i+k+1} to find the consecutive network logs that contain tag, up to g_{i+k+p}. If p > k/2, add network logs g_{i+1} through g_{i+k+p} to C_tag; otherwise go to step 4).
4) Take the first network log containing tag at or after g_{i+k+1} as the initial record of a new scene set; if this network log is not the last network log in the log set, return to step 2), otherwise go to step 5).
5) If the size of C_tag is greater than the minimum scene queue threshold Φ, add tag, together with the identifier of the first network log containing tag, to the feature information of the multimedia data operated on by every network log in C_tag that does not contain tag; otherwise this loop ends.
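The five steps above can be sketched over a boolean list that marks which logs in a log set carry tag. Two readings are assumptions, since the text does not state them: after a gap is bridged in step 3) the scan resumes at step 2), and the Φ test of step 5) is applied to each scene set as it closes. The function name is hypothetical.

```python
from typing import List, Tuple


def expand_scene_sets(flags: List[bool], phi: int) -> List[Tuple[int, List[int]]]:
    """Return (index of first tagged log, untagged log indices to receive tag) per qualifying scene set."""
    n = len(flags)
    out: List[Tuple[int, List[int]]] = []
    i = next((x for x in range(n) if flags[x]), n)  # step 1: first log containing tag
    while i < n:
        scene = [i]
        j = i + 1
        while j < n:
            if flags[j]:                  # step 2: extend with consecutive tagged logs
                scene.append(j)
                j += 1
                continue
            k = 0                         # step 3: length k of the untagged run ...
            while j + k < n and not flags[j + k]:
                k += 1
            p = 0                         # ... and length p of the tagged run after it
            while j + k + p < n and flags[j + k + p]:
                p += 1
            if p > k / 2:                 # bridge the gap (assumption: then resume step 2)
                scene.extend(range(j, j + k + p))
                j += k + p
            else:
                break
        if len(scene) > phi:              # step 5 (assumption: checked per scene set)
            out.append((scene[0], [x for x in scene if not flags[x]]))
        # step 4: the next scene set starts at the first tagged log at or after j
        i = next((x for x in range(j, n) if flags[x]), n)
    return out
```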
Further, statistics are computed over all multimedia data to which tag has been added: if the number of occurrences of a multimedia item with tag added is greater than 1/2 of the total number of occurrences of that item, the tag attribute is added to the item, with the current time recorded as its modifytime attribute.
Optionally, the method further includes:
For multimedia data to which second-type label information has been added, recording the time at which the second-type label information was added;
Once the time elapsed since the recorded time exceeds a set time threshold, deleting the second-type label information from the multimedia data.
Specifically, the process loops continuously: during global dynamic scene management, the time attribute modifytime added in this stage (i.e., the time at which tag was added) is checked, and if the time attribute of any multimedia item exceeds the validity-period threshold, the second-type label information added to that item and its time attribute are deleted.
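A minimal sketch of this expiry check, mirroring the `{addtag, modifytime}` records used in the worked example later in this description. As an assumption, `modifytime` is taken to be a full `datetime` rather than a bare time-of-day string, and the validity period is passed in as a `timedelta`; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def expire_dynamic_tags(dynamic_tags: List[Dict], now: datetime, validity: timedelta) -> List[Dict]:
    """Keep only second-type labels added within the validity period.

    Assumption: each entry looks like {"addtag": str, "modifytime": datetime}.
    """
    return [tag for tag in dynamic_tags if now - tag["modifytime"] <= validity]
```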
Based on any of the above embodiments, the embodiment of the present invention can generate the decision tree through the following steps:
According to the label categories to which the pre-configured label information of the sample data belongs, the sample data is divided into at least two data sets, each data set corresponding to one branch tree of the decision tree;
For each data set, according to the feature categories to which the feature information of the sample data in the data set belongs, the sample data in the data set is divided into at least one category group, and the information gain ratio of each category group is calculated, the information gain ratio being determined from the information entropy of the feature information of the sample data in the category group. The category group with the largest information gain ratio is selected in turn as the split attribute, and the decision-condition branches of the branch tree are built from the feature information of the sample data in that category group; the leaf nodes of the decision-condition branches are the label information of the sample data in the data set.
The larger the information gain ratio of a category group, the higher the priority of the decision-condition branch corresponding to that group.
For example, for songs, the feature categories include, but are not limited to, instrument, era of release and singer/album. As another example, for opera, the feature categories include, but are not limited to, singing style, instrument and role.
Optionally, when building the decision-condition branches of the branch tree, if at least two category groups currently share the largest information gain ratio, one of them is selected as the split attribute, for example at random.
In one possible implementation, dividing the sample data into at least two data sets according to the label categories of the pre-configured label information specifically includes:
Determining the coverage rate of any two sample data items according to their feature information;
If the coverage rate is greater than a set fifth threshold, merging the two sample data items into a data group and returning to the step of determining coverage rates, until every coverage rate determined is less than or equal to the fifth threshold; each data group finally obtained is then taken as one data set.
This process is called pre-pruning: each sample data item is first taken as a leaf node of the decision tree, and pre-pruning is then applied so that unrelated sample data is assigned to different branch trees, as follows:
a) Compute the coverage rate (coverage_rate) between any two sample data items.
Assume the feature information sequence of sample data 1 is L1 = [l11, l12, l13, ...] and that of sample data 2 is L2 = [l21, l22, l23, ...]. The coverage rate between sample data 1 and sample data 2 is coverage_rate_12 = 2 * len(L1 ∩ L2) / len(L1 + L2), where len(L1 ∩ L2) is the number of elements in L1 ∩ L2 and len(L1 + L2) is the number of elements in L1 plus the number of elements in L2.
b) Merge any two sample data items whose coverage_rate > ω (the fifth threshold).
c) Recompute the coverage rates.
d) Repeat steps b) and c) until no sample data items or data groups can be merged.
The pre-pruning process reduces the number of initial branch trees, which reduces the computation of the subsequent decision classification and improves its efficiency.
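Steps a) to d) can be sketched as a greedy merge loop using the coverage_rate formula of step a). Two details are assumptions not fixed by the text: the pair with the highest coverage rate is merged first, and a merged data group's feature information is the union of its members' features. Feature lists are treated as sets, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def coverage_rate(a: Set[str], b: Set[str]) -> float:
    """2 * |a ∩ b| / (|a| + |b|), as in step a)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def pre_prune(samples: Dict[str, Set[str]], omega: float) -> List[List[str]]:
    """Merge samples/data groups while some pair has coverage_rate > omega."""
    groups: List[Tuple[List[str], Set[str]]] = [([name], set(f)) for name, f in samples.items()]
    while True:
        best, pair = omega, None
        for i, j in combinations(range(len(groups)), 2):
            rate = coverage_rate(groups[i][1], groups[j][1])
            if rate > best:  # step b): only pairs above the fifth threshold qualify
                best, pair = rate, (i, j)
        if pair is None:     # step d): nothing left to merge
            return [members for members, _ in groups]
        i, j = pair
        merged = (groups[i][0] + groups[j][0], groups[i][1] | groups[j][1])  # assumption: union
        groups = [g for idx, g in enumerate(groups) if idx not in pair] + [merged]
```

Each returned group becomes one data set, i.e. one branch tree.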
Optionally, before pre-pruning, the method further includes:
Normalizing the feature information of the sample data;
Performing de-assimilation on the normalized feature information of the sample data.
Normalization maps feature information that denotes the same feature, across the feature information of all sample data, to a single value; for example, the feature values blues, rhythm and blues, Blues and R&B are all normalized to the blues type. De-assimilation removes feature information that is contained in the feature information of every sample data item.
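Both preprocessing steps can be sketched in a few lines. The synonym table passed to `normalize` is a hypothetical input (the source does not say how equivalent feature values are recognized), and `deassimilate` drops exactly the features shared by every sample.

```python
from typing import Dict, Set


def normalize(features: Set[str], synonyms: Dict[str, str]) -> Set[str]:
    """Map equivalent feature values to one canonical value (synonym table is hypothetical)."""
    return {synonyms.get(f, f) for f in features}


def deassimilate(samples: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Remove features that every sample contains; they cannot discriminate between samples."""
    if not samples:
        return {}
    common = set.intersection(*samples.values())
    return {name: feats - common for name, feats in samples.items()}
```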
In one possible implementation, determining the coverage rate of any two sample data items according to their feature information includes:
For any two sample data items, determining the number M of feature information items in the intersection of their feature information and the total number N2 of feature information items of the two items, and determining their coverage rate according to the ratio of M to N2; or
For any two sample data items, determining the number M of feature information items in the intersection of their feature information and the number N1 of feature information items in the union of their feature information, and determining their coverage rate according to the ratio of M to N1.
For example, the ratio of M to N2 is taken as the coverage rate of the two items; or twice the ratio of M to N2 is taken as the coverage rate; or the ratio of M to N1 is taken as the coverage rate.
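The three coverage-rate variants listed above differ only in the denominator. A minimal sketch, treating each feature list as a set (an assumption) and selecting the variant by a hypothetical string parameter:

```python
from typing import Set


def coverage(a: Set[str], b: Set[str], variant: str = "dice") -> float:
    """Coverage rate of two feature sets under one of the three variants."""
    m = len(a & b)                  # M: features in the intersection
    if variant == "union":          # M / N1, N1 = size of the union
        n1 = len(a | b)
        return m / n1 if n1 else 0.0
    n2 = len(a) + len(b)            # N2: total number of features of both items
    if n2 == 0:
        return 0.0
    if variant == "sum":            # M / N2
        return m / n2
    if variant == "dice":           # 2 * M / N2, the coverage_rate formula of step a)
        return 2 * m / n2
    raise ValueError(f"unknown variant: {variant}")
```

The 2M/N2 form equals 1 for identical sets, so a single threshold such as ω reads the same way for items of different sizes.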
In the embodiment of the present invention, the information gain ratio of each category group measures how well the feature information of the sample data in that group classifies the label information: the larger the information gain ratio, the stronger this ability. Building the decision-condition branches with split attributes selected by information gain ratio overcomes a drawback of selecting split attributes by information gain, which tends to favor categories with many attribute values.
In one possible implementation, the information gain ratio of each category group is calculated according to the following formulas:
$$\mathrm{GainRatio}(A,C)=\frac{\mathrm{Gain}(A,C)}{\mathrm{SplitInfo}(A,C)+\epsilon}$$
$$\mathrm{SplitInfo}(A,C)=-\sum_{i=1}^{m_n}\frac{\mathrm{num}_c}{\mathrm{num}_x}\log_2\frac{\mathrm{num}_c}{\mathrm{num}_x}$$
$$\mathrm{Gain}(A,C)=E(A)-\sum\frac{\mathrm{num}_c}{\mathrm{num}_x}E(c)=\sum\left(-\frac{m_n}{\mathrm{num}_x}\log_2\frac{m_n}{\mathrm{num}_x}\right)-\sum\frac{\mathrm{num}_c}{\mathrm{num}_x}E(c)$$
where A denotes the label set corresponding to a data set; C denotes a class within A, i.e., a class of feature information (a category group); ε is the mean of SplitInfo(A, C) over all category groups; n is the number of category groups and m_n is the number of sample data items in a category group; c denotes a feature information value in C; num_c is the number of sample data items containing feature information c; and num_x is the number of sample data items in the data set.
Here $E(c)=-\sum_{I=1}^{X}p(I)\log p(I)$, where X is the number of data sets obtained by division according to the pre-configured classification rules, and p(I) is the probability that feature information c in the sample data appears in the I-th data set. For example, if s sample data items contain feature information a, and m of those s items belong to category I, then p(I) = m/s.
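A sketch of the gain-ratio computation for one data set, under stated assumptions. E(A) is read as the entropy of the labels in the data set (the worked example below gives E(role) = E(singing style) = E(instrument) = 1 for two Lu opera and two Beijing opera samples, which matches this reading); each feature value c of a class contributes with weight num_c/num_x; ε is the mean SplitInfo over all classes; and `choose_split` breaks ties at random as described for the split attribute. The input format and function names are hypothetical, and the sketch is not claimed to reproduce the example's intermediate figures.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Sample = Tuple[str, Set[str]]  # hypothetical: (label, feature information)


def entropy(counts: Iterable[int]) -> float:
    counts = [c for c in counts if c]
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts)


def gain_ratios(samples: List[Sample], feature_class: Dict[str, str]) -> Dict[str, float]:
    """GainRatio per feature class (category group) of one data set."""
    num_x = len(samples)
    e_a = entropy(Counter(label for label, _ in samples).values())  # assumption: label entropy
    # feature class C -> feature value c -> label counts of the samples containing c
    stats: Dict[str, Dict[str, Counter]] = defaultdict(lambda: defaultdict(Counter))
    for label, feats in samples:
        for c in feats:
            if c in feature_class:
                stats[feature_class[c]][c][label] += 1
    gain: Dict[str, float] = {}
    split: Dict[str, float] = {}
    for cls, per_feature in stats.items():
        gain[cls], split[cls] = e_a, 0.0
        for label_counts in per_feature.values():
            w = sum(label_counts.values()) / num_x          # num_c / num_x
            gain[cls] -= w * entropy(label_counts.values())  # subtract weighted E(c)
            split[cls] -= w * math.log2(w)
    eps = sum(split.values()) / len(split)  # epsilon: mean SplitInfo over all classes
    return {cls: (gain[cls] / (split[cls] + eps) if split[cls] + eps else 0.0) for cls in gain}


def choose_split(ratios: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Pick the class with the largest gain ratio, breaking ties at random."""
    rng = rng or random.Random()
    best = max(ratios.values())
    return rng.choice(sorted(cls for cls, r in ratios.items() if r == best))
```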
For example, take audio data with 8 sample data items, denoted m1 to m8. First, the feature information of each sample data item is obtained from its description information; after the obtained feature information is aggregated, the feature information sequence of each sample data item is as follows:
m1: [blues, acoustic guitar, electric guitar, harmonica, harmony, chanting, improvisation, original, Blues, comfort, tension, tearful lament, relief];
m2: [rock and roll, passion, rhythm, frame drum, bass, solo, guitar];
m3: [Lu opera, young male role, young female role, clown, zhuiqin, loud and clear];
m4: [light music, soothing, graceful, easy, comfortable, piano, cello, saxophone, harp, flute, clarinet, panpipes, violin, popular instruments, mandolin, harmonica, accordion, xylophone, triangle, hand bells, maracas, Mantovani];
m5: [Beijing opera, sheng, dan, jing, chou, Mei school, Ma school, Beijing-dialect spoken parts, jinghu];
m6: [Bandari, soothing, graceful, easy, comfortable, violin, piano, xylophone, triangle];
m7: [Beijing opera, gong and drum, loud and clear, sheng, dan, jing];
m8: [Lu opera, gong and drum, loud and clear, young female role, clown].
The first element of each feature sequence above is the category of the sample data and does not take part in subsequent processing.
Then, the coverage rate between every two sample data items is calculated, and the sample data is merged based on the resulting coverage rates. Assume the 8 sample data items are finally divided into three data sets, [(m1, m2), (m3, m5, m7, m8), (m4, m6)]; as shown in Fig. 2, the whole decision tree is thus divided into three branch trees. For each branch tree, the information entropy E of each category group in the branch tree is calculated:
Taking data set (m3, m5, m7, m8) as an example, the four sample data items have only two labels, Lu opera and Beijing opera. Using the pre-configured knowledge table, the information entropy E of the feature information of the sample data in (m3, m5, m7, m8) is calculated as follows:
E(young male role, young female role, clown) = -(1/2)log2(1/2) - (1/2)log2(1/2) = 1;
E(sheng, dan, jing) = -(1/2)log2(1/2) - (1/2)log2(1/2) = 1;
E(zhuiqin) = -(1/2)log2(1/2) - (1/2)log2(1/2) = 1;
E(gong and drum) = -(1/2)log2(1/2) - (1/2)log2(1/2) = 1;
E(loud and clear) = -(1/3)log2(1/3) - (1/2)log2(1/2) = 1.0266;
E(Ma school) = -(1/1)log2(1/1) - (0/1)log2(0/1) = 0.
Then, the information gain of each category group in the data set is calculated. The feature information of the sample data in (m3, m5, m7, m8) is divided into the role, singing style and instrument classes, where:
Gain(instrument) = E(instrument) - (2/4)E(zhuiqin) - (2/4)E(gong and drum) = -(1/2)log2(1/2) - (1/2)log2(1/2) - 1/2 - 1/2 = 0;
Gain(singing style) = E(singing style) - (1/4)E(Ma school) - (3/4)E(loud and clear) = 1 - (1/4)*0 - (3/4)*1.0266 = 0.231;
Gain(role) = E(role) - (2/4)E(young male role, young female role, clown) - (2/4)E(sheng, dan, jing) = 1 - (2/4)*1 - (2/4)*1 = 0.
Then, the information gain ratio of each category group is calculated as follows:
GainRatio (musical instrument)=0;
GainRatio (role)=0;
GainRatio (singing style)=0.3113/ (SplitInfo+ ε)=0.3113/ (2.8775+1.63)=0.07.
Then, the attribute with the largest information gain ratio is selected in turn as the split attribute to build the decision-condition branches of the branch tree; the resulting structure is shown in Fig. 3. Building the decision-condition branches terminates when every label has been distinguished, or when the degree to which the labels of the sample data cover the category group reaches a preset coverage threshold.
The processing flow of the method provided by the embodiment of the present invention is shown in Fig. 4 and includes:
1) Preprocessing: when multimedia data is input, its description information is preprocessed (cleaning, aggregation, etc.) to obtain the feature information of the multimedia data;
2) Decision-tree discrimination: based on the feature information of the multimedia data and the pre-generated decision tree, the label information of the multimedia data is determined;
3) Modify-rule processing: the label information of the multimedia data is updated for the first time according to the Modify rules, which are obtained by analyzing network logs;
4) Mutual-exclusion-rule processing: the label information of the multimedia data is updated a second time according to the mutual exclusion rules, and the multimedia data is stored in the database; optionally, the mutual exclusion table can be obtained by analyzing network logs, or it can be pre-configured;
5) Resource output: based on the label information of the multimedia data and the user's preferences, multimedia data is recommended to the user from the database.
For example, assume the feature information of the currently input multimedia data mi is [loud and clear, gong and drum, sheng, dan, jing, chou, erhu]. mi is input into the decision tree model shown in Fig. 3, and its coverage rate with each of the three branch trees is calculated; whenever a coverage rate is greater than the set first threshold, mi flows into that branch tree. A multimedia data item can therefore belong to multiple branch trees and carry label information of several different categories. Assume the calculation places mi only in the branch tree corresponding to (m3, m5, m7, m8). Next, the singing-style feature information of mi, [loud and clear], is compared with the first-level decision-condition branches of this branch tree, i.e., its coverage rates with the first-level branches [Ma school] and [loud and clear] are calculated; since the coverage rate with the right-hand [loud and clear] branch is greater than with the left-hand branch, mi enters the right-hand branch. Branching continues in the same way until mi reaches the Beijing opera leaf node, and the Beijing opera label information is added to mi.
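The routing in this example can be sketched as follows: an item enters every branch tree whose coverage rate exceeds the first threshold, then at each level follows the decision-condition branch it covers best, until a leaf label is reached. The `Node`/`BranchTree` structures, the use of the M/N1 coverage variant and the "earlier branch wins ties" rule are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


def coverage(a: Set[str], b: Set[str]) -> float:
    """Assumption: the M / N1 variant (shared features over the union)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Node:
    label: Optional[str] = None  # set on leaf nodes only
    # (decision-condition feature set, child node), listed in priority order
    branches: List[Tuple[Set[str], "Node"]] = field(default_factory=list)


@dataclass
class BranchTree:
    features: Set[str]  # feature information corresponding to the branch tree
    root: Node


def classify(item: Set[str], trees: List[BranchTree], first_threshold: float) -> List[str]:
    """First-type labels of an item; it may flow into several branch trees."""
    labels: List[str] = []
    for tree in trees:
        if coverage(item, tree.features) <= first_threshold:
            continue  # the item does not flow into this branch tree
        node = tree.root
        while node.label is None:
            # follow the best-covered branch; max() keeps the earlier branch on ties
            _, node = max(node.branches, key=lambda br: coverage(item, br[0]))
        labels.append(node.label)
    return labels
```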
In the adaptive stage based on users' explicit feedback, suppose mi appears in a scene set [[t1, m1, g1], [t2, m2, g2], [t3, m3, g3], ..., [ti, mi, gi], ...], and the label information of mi is {tags: [tag1, tag2, ...], Dinamictags: [{addtag: 'p.prand', modifytime: '11:30:10'}, {addtag: 'after supper', modifytime: '17:20:00'}]}. The number of network logs in which mi occurs is recorded as totlenum, and the number of times a tag label was added to mi across all C_tag sets is recorded as addnum. If addnum > (1/2) totlenum, the label tag is added to mi, where tag here stands for 'p.prand' or 'after supper'.
The above method flow can be implemented as a software program stored in a storage medium; when the stored software program is invoked, the above method steps are performed.
Based on the same inventive concept, the embodiment of the present invention further provides a multimedia data processing apparatus. Since the principle by which the apparatus solves the problem is similar to that of the multimedia data processing method above, the implementation of the apparatus can refer to that of the method, and repeated parts are not described again.
In the embodiment shown in Fig. 5, a multimedia data processing apparatus is provided, including:
a receiving module 51, configured to receive multimedia data to be processed;
a branch tree determination module 52, configured to determine, according to the feature information of the multimedia data and the feature information corresponding to each branch tree of the pre-generated decision tree, the coverage rate of the multimedia data with each branch tree, the coverage rate representing the degree of similarity between the multimedia data and each branch tree;
a branch determination module 53, configured to determine the branch trees whose coverage rate is greater than a first preset threshold and to determine, among the decision-condition branches contained in those branch trees, the decision-condition branches satisfied by the feature information of the multimedia data;
a label determination module 54, configured to determine the values of the leaf nodes in those decision-condition branches as the first-type label information of the multimedia data.
Optionally, the branch determination module 53 is specifically configured to:
match the feature information of the multimedia data against the decision conditions of the decision-condition branches in turn, in the priority order of the branches;
if at least one feature information item of the multimedia data matches the decision condition of a decision-condition branch, determine that the feature information of the multimedia data satisfies that branch.
Optionally, the label determination module 54 is further configured to:
determine, in the first-type label information of the multimedia data, the first-type label information items that belong to the same category and are mutually exclusive;
if there is more than one such item, retain only one of them.
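The mutual-exclusion step can be sketched as a filter that keeps at most one label from each group of same-category, mutually exclusive labels. The source does not say which label survives; as an assumption, this sketch keeps the one that appears first in the label list, and the group format is hypothetical.

```python
from typing import List, Set


def resolve_mutex(labels: List[str], mutex_groups: List[Set[str]]) -> List[str]:
    """Keep at most one label per mutually exclusive group.

    Assumption: the label appearing first in `labels` is the one retained.
    """
    kept: List[str] = []
    for label in labels:
        group = next((g for g in mutex_groups if label in g), None)
        if group is not None and any(k in group for k in kept):
            continue  # a mutually exclusive label from this group is already kept
        kept.append(label)
    return kept
```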
Based on any of the above embodiments, optionally, the label determination module 54 is further configured to:
determine the second-type label information of the multimedia data operated on by a user, according to the network logs that record the operations the user performs on multimedia data.
In one possible implementation, the label determination module 54 is specifically configured to:
for each log set, determine in chronological order whether the multimedia data corresponding to the network logs of the same operation contains specific label information, where the log set contains more than K network logs, K is an integer greater than 0, the specific label information is first-type label information contained in the multimedia data corresponding to at least K/A network logs in the log set, and A is a set second threshold;
if the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs all contains the specific label information, the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs all contains it, and the number of network logs lying, in chronological order within the log set, between the network logs of the j-th determination and those of the (j+1)-th determination is less than a set fourth threshold, determine the specific label information as second-type label information of the multimedia data corresponding to the network logs lying between the j-th and the (j+1)-th determinations; here j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a set third threshold.
Further, the label determination module 54 is also configured to:
for multimedia data to which second-type label information has been added, record the time at which the second-type label information was added;
once the time elapsed since the recorded time exceeds a set time threshold, delete the second-type label information from the multimedia data.
Based on any of the above embodiments, optionally, the branch tree determination module 52 is specifically configured to:
for a branch tree, determine the number M of feature information items in the intersection of the feature information of the multimedia data and the feature information corresponding to the branch tree;
determine the number N1 of feature information items in the union of the feature information of the multimedia data and the feature information corresponding to the branch tree, and determine the coverage rate of the multimedia data with the branch tree according to the ratio of M to N1; or determine the total number N2 as the number of feature information items of the multimedia data plus the number of feature information items corresponding to the branch tree, and determine the coverage rate of the multimedia data with the branch tree according to the ratio of M to N2.
Based on any of the above embodiments, optionally, the apparatus further includes:
a modeling module 55, configured to generate the tree through the following steps:
dividing the sample data into at least two data sets according to the label categories to which the preconfigured label information of the sample data belongs, each data set corresponding to one branch tree of the tree;
for each data set, dividing the sample data in the data set into at least one classification group according to the feature categories to which the feature information of the contained sample data belongs, and calculating the information gain ratio of each classification group, the information gain ratio being determined from the information entropy of the feature information of the sample data in the classification group; successively selecting the classification group with the largest information gain ratio as the split attribute, and building the judgment-condition branches of the branch tree from the feature information of the sample data contained in that classification group, where the leaf nodes of the judgment-condition branches are the label information of the sample data in the data set.
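The description does not spell out the gain-ratio formula. The sketch below assumes the standard C4.5 definition: the information gain of splitting on a feature category, divided by the split information (the entropy of that category's values). Samples are assumed to be dictionaries mapping feature category to value, and `entropy`, `gain_ratio` and `best_split` are hypothetical helpers.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(samples: List[Dict[str, str]], labels: List[str], attr: str) -> float:
    """C4.5-style gain ratio of splitting `samples` on feature category `attr`."""
    n = len(samples)
    groups: Dict[str, List[str]] = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(sample[attr], []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy([sample[attr] for sample in samples])
    return gain / split_info if split_info > 0 else 0.0


def best_split(samples: List[Dict[str, str]], labels: List[str], attrs: List[str]) -> str:
    # The category with the largest gain ratio becomes the split attribute.
    return max(attrs, key=lambda attr: gain_ratio(samples, labels, attr))


samples = [
    {"genre": "cartoon", "lang": "zh"},
    {"genre": "cartoon", "lang": "en"},
    {"genre": "news", "lang": "zh"},
    {"genre": "news", "lang": "en"},
]
labels = ["kids", "kids", "adult", "adult"]
print(best_split(samples, labels, ["genre", "lang"]))  # genre
```

Applying `best_split` again to each resulting subset, and stopping when a subset carries a single label, would yield judgment-condition branches with labels at the leaves, in the spirit of the step described above.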
Optionally, the modeling module 55 is specifically configured to:
determine, according to the feature information of any two pieces of sample data, the coverage rate between those two pieces of sample data;
if the coverage rate is greater than a set fifth threshold, merge the two pieces of sample data into a data group and return to the step of determining coverage rates, until every determined coverage rate is less than or equal to the fifth threshold; each data group finally obtained is then taken as one data set.
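A minimal sketch of the iterative merging step. The patent defines coverage for two pieces of sample data; to keep merging once groups have formed, this sketch assumes that a group's feature information is the union of its members' features and uses the M/N1 (intersection over union) coverage. `merge_into_data_sets` is a hypothetical name, and merging the first qualifying pair found on each pass is one of several possible orders.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_into_data_sets(samples: List[Set[str]], fifth_threshold: float) -> List[List[int]]:
    """Repeatedly merge any two groups whose coverage exceeds the fifth threshold."""
    groups = [[i] for i in range(len(samples))]  # indices of the samples in each group
    features = [set(s) for s in samples]         # assumed: group features = union of members'
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if jaccard(features[i], features[j]) > fifth_threshold:
                    groups[i].extend(groups.pop(j))
                    features[i] |= features.pop(j)
                    merged = True
                    break
            if merged:
                break  # restart the scan, i.e. return to the coverage-determining step
    return groups


samples = [{"a", "b"}, {"a", "b", "c"}, {"x", "y"}, {"x", "y"}]
print(merge_into_data_sets(samples, 0.5))  # [[0, 1], [2, 3]]
```

This brute-force version rescans all pairs after every merge, which is simple but roughly cubic in the number of samples; a large sample pool would call for a more efficient clustering structure.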
In this embodiment, the receiving module 51, the branch tree determining module 52, the branch determining module 53, the label determining module 54 and the modeling module 55 are presented in the form of functional modules. A "module" here may refer to an application-specific integrated circuit (ASIC), a circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or another device that can provide the above functions. In a simple embodiment, those skilled in the art will appreciate that the receiving module 51 and the modeling module 55 may be implemented by the processor, memory, input interface and similar components of a computer device, while the branch tree determining module 52, the branch determining module 53 and the label determining module 54 may be implemented by the processor, memory and similar components of the computer device.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, may make further changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these changes and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to encompass them.

Claims (11)

1. A method for processing multimedia data, characterized in that the method comprises:
receiving multimedia data to be processed;
determining, according to the feature information of the multimedia data and the feature information corresponding to each branch tree of a previously generated tree, a coverage rate between the multimedia data and each branch tree, wherein the coverage rate represents the degree of similarity between the multimedia data and each branch tree;
determining a branch tree whose coverage rate is greater than a first preset threshold, and determining, from the judgment-condition branches contained in that branch tree, a judgment-condition branch satisfied by the feature information of the multimedia data;
determining the value of the leaf node in the judgment-condition branch as first-class label information of the multimedia data.
2. The method according to claim 1, characterized in that determining, from the judgment-condition branches contained in the branch tree, the judgment-condition branch satisfied by the feature information of the multimedia data comprises:
matching, in the priority order of the judgment-condition branches, the feature information of the multimedia data against the judgment condition of each judgment-condition branch in turn;
if at least one piece of feature information of the multimedia data matches the judgment condition of any judgment-condition branch, determining that the feature information of the multimedia data satisfies that judgment-condition branch.
3. The method according to claim 1, characterized in that the method further comprises:
determining, among the first-class label information of the multimedia data, the first-class label information that belongs to the same category and is mutually exclusive;
if the number of pieces of first-class label information belonging to the same category and being mutually exclusive is greater than 1, retaining only one of them.
4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
determining, according to network logs of operations performed by a user on multimedia data, second-class label information of the multimedia data operated on by the user.
5. The method according to claim 4, characterized in that determining, according to network logs of operations performed by a user on multimedia data, the second-class label information of the multimedia data operated on by the user comprises:
for each log set, determining in chronological order whether the multimedia data corresponding to the network logs belonging to the same operation contain specific label information, wherein the number of network logs contained in the log set is greater than K, K is an integer greater than 0, the specific label information is first-class label information contained in the multimedia data corresponding to at least K/A network logs in the log set, and A is a set second threshold;
if the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs all contain the specific label information, the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs all contain the specific label information, and the number of network logs in the log set that lie, in chronological order, between the network logs found in the j-th determination and those found in the (j+1)-th determination is less than a set fourth threshold, determining the specific label information as second-class label information of the multimedia data corresponding to the network logs lying between those two groups of network logs, where j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a set third threshold.
6. The method according to claim 5, characterized in that the method further comprises:
recording, in multimedia data to which second-class label information has been added, time information indicating when the second-class label information was added;
after the time elapsed since the recorded time exceeds a set time threshold, deleting the second-class label information from the multimedia data.
7. The method according to claim 1, characterized in that determining, according to the feature information of the multimedia data and the feature information corresponding to each branch tree of the previously generated tree, the coverage rate between the multimedia data and each branch tree comprises:
for each branch tree, determining the number M of pieces of feature information contained in the intersection of the feature information of the multimedia data and the feature information corresponding to the branch tree;
determining the number N1 of pieces of feature information contained in the union of the feature information of the multimedia data and the feature information corresponding to the branch tree, and taking the ratio of M to N1 as the coverage rate between the multimedia data and the branch tree; or determining the total number N2, equal to the number of pieces of feature information of the multimedia data plus the number of pieces of feature information corresponding to the branch tree, and taking the ratio of M to N2 as the coverage rate between the multimedia data and the branch tree.
8. An apparatus for processing multimedia data, characterized in that the apparatus comprises:
a receiving module, configured to receive multimedia data to be processed;
a branch tree determining module, configured to determine, according to the feature information of the multimedia data and the feature information corresponding to each branch tree of a previously generated tree, a coverage rate between the multimedia data and each branch tree, wherein the coverage rate represents the degree of similarity between the multimedia data and each branch tree;
a branch determining module, configured to determine a branch tree whose coverage rate is greater than a first preset threshold, and to determine, from the judgment-condition branches contained in that branch tree, a judgment-condition branch satisfied by the feature information of the multimedia data;
a label determining module, configured to determine the value of the leaf node in the judgment-condition branch as first-class label information of the multimedia data.
9. The apparatus according to claim 8, characterized in that the branch determining module is specifically configured to:
match, in the priority order of the judgment-condition branches, the feature information of the multimedia data against the judgment condition of each judgment-condition branch in turn;
if at least one piece of feature information of the multimedia data matches the judgment condition of any judgment-condition branch, determine that the feature information of the multimedia data satisfies that judgment-condition branch.
10. The apparatus according to claim 8, characterized in that the label determining module is further configured to:
determine, among the first-class label information of the multimedia data, the first-class label information that belongs to the same category and is mutually exclusive;
if the number of pieces of first-class label information belonging to the same category and being mutually exclusive is greater than 1, retain only one of them.
11. The apparatus according to claim 8, characterized in that the branch tree determining module is specifically configured to:
for each branch tree, determine the number M of pieces of feature information contained in the intersection of the feature information of the multimedia data and the feature information corresponding to the branch tree;
determine the number N1 of pieces of feature information contained in the union of the feature information of the multimedia data and the feature information corresponding to the branch tree, and take the ratio of M to N1 as the coverage rate between the multimedia data and the branch tree; or determine the total number N2, equal to the number of pieces of feature information of the multimedia data plus the number of pieces of feature information corresponding to the branch tree, and take the ratio of M to N2 as the coverage rate between the multimedia data and the branch tree.
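Claims 1 to 3 together describe the labeling pipeline. The following is a minimal Python sketch, not the applicant's code, assuming that feature information is a set of strings, that a judgment condition is satisfied when at least one feature falls in its set of accepted values, that coverage uses the M/N1 variant of claim 7, and that a lower `priority` number means higher priority. `Branch`, `BranchTree`, `first_class_labels` and `drop_exclusive_duplicates` are hypothetical names, and keeping the first label of a mutually exclusive category is only one possible choice; claim 3 requires only that a single one be retained.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Branch:
    condition: Set[str]  # hypothetical: feature values that satisfy this judgment condition
    leaf: str            # leaf-node value, used as a first-class label
    priority: int = 0    # assumed convention: a lower value is checked first


@dataclass
class BranchTree:
    features: Set[str]   # feature information associated with the branch tree
    branches: List[Branch]


def coverage(a: Set[str], b: Set[str]) -> float:
    """M / N1 variant: shared features over all distinct features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def first_class_labels(features: Set[str], trees: List[BranchTree],
                       first_threshold: float) -> List[str]:
    labels: List[str] = []
    for tree in trees:
        if coverage(features, tree.features) <= first_threshold:
            continue  # claim 1: only trees above the first preset threshold are examined
        for branch in sorted(tree.branches, key=lambda br: br.priority):
            if features & branch.condition:  # claim 2: at least one feature matches
                labels.append(branch.leaf)
                break  # assumed: the first satisfied branch in priority order wins
    return labels


def drop_exclusive_duplicates(labels: List[str], category_of: Dict[str, str],
                              exclusive: Set[str]) -> List[str]:
    """Claim 3: keep one label per mutually exclusive category (here, the first one)."""
    kept: List[str] = []
    seen: Set[str] = set()
    for label in labels:
        cat = category_of.get(label)
        if cat in exclusive:
            if cat in seen:
                continue  # this category already has a retained label
            seen.add(cat)
        kept.append(label)
    return kept


tree_a = BranchTree({"cartoon", "animal", "kids"},
                    [Branch({"animal"}, "animal-cartoon", priority=1),
                     Branch({"cartoon"}, "cartoon", priority=0)])
tree_b = BranchTree({"news", "finance"}, [Branch({"finance"}, "finance")])
print(first_class_labels({"cartoon", "animal"}, [tree_a, tree_b], first_threshold=0.5))  # ['cartoon']
print(drop_exclusive_duplicates(["kids", "comedy", "adult"],
                                {"kids": "audience", "adult": "audience", "comedy": "genre"},
                                {"audience"}))  # ['kids', 'comedy']
```

In the usage example both branches of `tree_a` would match, so the priority order decides the label; `tree_b` is skipped because its coverage is 0.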
CN201610601570.5A 2016-07-27 2016-07-27 A kind for the treatment of method and apparatus of multi-medium data Active CN106294563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610601570.5A CN106294563B (en) 2016-07-27 2016-07-27 A kind for the treatment of method and apparatus of multi-medium data


Publications (2)

Publication Number Publication Date
CN106294563A 2017-01-04
CN106294563B 2019-09-17

Family

ID=57662641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610601570.5A Active CN106294563B (en) 2016-07-27 2016-07-27 A kind for the treatment of method and apparatus of multi-medium data

Country Status (1)

Country Link
CN (1) CN106294563B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108536787A (en) * 2018-03-29 2018-09-14 优酷网络技术(北京)有限公司 content identification method and device
CN109739955A (en) * 2019-01-24 2019-05-10 北京诸葛找房信息技术有限公司 Source of houses label automatic extracting device and its method based on participle with multimode matching
CN112395261A (en) * 2019-08-16 2021-02-23 中国移动通信集团浙江有限公司 Service recommendation method and device, computing equipment and computer storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894125A (en) * 2010-05-13 2010-11-24 复旦大学 Content-based video classification method
CN102262659A (en) * 2011-07-15 2011-11-30 北京航空航天大学 Audio label disseminating method based on content calculation
US8504573B1 (en) * 2008-08-21 2013-08-06 Adobe Systems Incorporated Management of smart tags via hierarchy
CN104317891A (en) * 2014-10-23 2015-01-28 华为软件技术有限公司 Method and device for tagging pages
CN104657422A (en) * 2015-01-16 2015-05-27 北京邮电大学 Classification decision tree-based intelligent content distribution classification method
CN104794179A (en) * 2015-04-07 2015-07-22 无锡天脉聚源传媒科技有限公司 Video quick indexing method and device based on knowledge tree
CN105072460A (en) * 2015-07-15 2015-11-18 中国科学技术大学先进技术研究院 Information annotation and association method, system and device based on VCE



Also Published As

Publication number Publication date
CN106294563B (en) 2019-09-17

Similar Documents

Publication Publication Date Title
Interiano et al. Musical trends and predictability of success in contemporary songs in and out of the top charts
US7788279B2 (en) System and method for storing and retrieving non-text-based information
US7544881B2 (en) Music-piece classifying apparatus and method, and related computer program
US7624012B2 (en) Method and apparatus for automatically generating a general extraction function calculable on an input signal, e.g. an audio signal to extract therefrom a predetermined global characteristic value of its contents, e.g. a descriptor
CN103793446A (en) Music video generation method and system
CN108255840B (en) Song recommendation method and system
CN109299245B (en) Method and device for recalling knowledge points
CN107464555A (en) Background sound is added to the voice data comprising voice
CN101470732A (en) Auxiliary word stock generation method and apparatus
Smith et al. Towards a Hybrid Recommendation System for a Sound Library.
CN108766451B (en) Audio file processing method and device and storage medium
CN107918657A (en) The matching process and device of a kind of data source
CN105679324A (en) Voiceprint identification similarity scoring method and apparatus
Rizo et al. A Pattern Recognition Approach for Melody Track Selection in MIDI Files.
CN106294563B (en) A kind for the treatment of method and apparatus of multi-medium data
CN111444380A (en) Music search sorting method, device, equipment and storage medium
US20090132508A1 (en) System and method for associating a category label of one user with a category label defined by another user
CN111078859A (en) Author recommendation method based on reference times
Mueller Where’d you get that idea? Determinants of creativity and impact in popular music
CN109471951A (en) Lyrics generation method, device, equipment and storage medium neural network based
KR20200129873A (en) Test music recommendation apparatus, music source meta data building apparatus
Konev et al. The program complex for vocal recognition
CN114461885A (en) Song quality evaluation method, device and storage medium
Vatolkin et al. Partition based feature processing for improved music classification
Shier et al. Spiegelib: An automatic synthesizer programming library

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant