CN106294563B - Method and apparatus for processing multimedia data - Google Patents

Method and apparatus for processing multimedia data

Info

Publication number
CN106294563B
Authority
CN
China
Prior art keywords
medium data
branch
characteristic information
information
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610601570.5A
Other languages
Chinese (zh)
Other versions
CN106294563A (en)
Inventor
胡伟凤
高雪松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Group Co Ltd
Original Assignee
Hisense Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Group Co Ltd
Priority to CN201610601570.5A
Publication of CN106294563A
Application granted
Publication of CN106294563B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Abstract

The invention discloses a method and apparatus for processing multimedia data, which address the problem that manually annotating the label information of multimedia data consumes a great deal of manpower and time and yields low accuracy. The method includes: receiving multimedia data to be processed; determining the coverage between the multimedia data and each branch tree of a pre-generated tree structure according to the characteristic information of the multimedia data and the characteristic information corresponding to each branch tree, wherein the coverage indicates the degree of similarity between the multimedia data and each branch tree; determining the branch trees whose coverage is greater than a first preset threshold and, from the judgment-condition branches included in each such branch tree, determining the judgment-condition branch that the characteristic information of the multimedia data satisfies; and determining the value of the leaf node in that judgment-condition branch as first-type label information of the multimedia data, so that the label information of the multimedia data can be determined quickly and accurately.

Description

Method and apparatus for processing multimedia data
Technical field
The present invention relates to the field of information technology, and in particular to a method and apparatus for processing multimedia data.
Background
Driven by information technology, multimedia data is growing explosively, and making sensible use of it allows the services of an intelligent interactive system to achieve twice the result with half the effort. Users interact through the human-machine interface that the intelligent interactive system provides, so the user is both the party the system serves and an important source of its data.
Against the background of big data, an intelligent interactive system can recommend, from massive amounts of multimedia data, the multimedia data a user is interested in. The system makes these recommendations according to the label information of the multimedia data, so it can recommend suitable multimedia data accurately only if the label information is accurate. In existing music players, music experts manually add label information to each piece of audio data (such as songs and operas) in the music library, so that the player can recommend songs, operas and other content of interest to its users according to that label information. However, manual annotation consumes a great deal of manpower and time, and its accuracy is low.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for processing multimedia data, which address the problem that manually annotating the label information of multimedia data consumes a great deal of manpower and time and yields low accuracy.
In a first aspect, the present invention provides a method for processing multimedia data, the method comprising:
receiving multimedia data to be processed;
determining the coverage between the multimedia data and each branch tree of a pre-generated tree structure according to the characteristic information of the multimedia data and the characteristic information corresponding to each branch tree, wherein the coverage indicates the degree of similarity between the multimedia data and each branch tree;
determining the branch trees whose coverage is greater than a first preset threshold and, from the judgment-condition branches included in each such branch tree, determining the judgment-condition branch that the characteristic information of the multimedia data satisfies;
determining the value of the leaf node in the judgment-condition branch as first-type label information of the multimedia data.
In a possible embodiment, determining, from the judgment-condition branches included in the branch tree, the judgment-condition branch that the characteristic information of the multimedia data satisfies comprises:
matching the characteristic information of the multimedia data against the judgment conditions of the judgment-condition branches one by one, in the priority order of the judgment-condition branches;
if at least one piece of characteristic information of the multimedia data matches the judgment condition of any judgment-condition branch, determining that the characteristic information of the multimedia data satisfies that judgment-condition branch.
In a possible embodiment, the method further comprises:
determining, among the first-type label information of the multimedia data, the first-type label information that belongs to the same category and is mutually exclusive;
if the number of pieces of first-type label information that belong to the same category and are mutually exclusive is greater than 1, retaining only one of them.
In a possible embodiment, the method further comprises:
determining second-type label information of the multimedia data operated on by a user according to the network logs of the operations the user performs on multimedia data.
In a possible embodiment, determining the second-type label information of the multimedia data operated on by the user according to the network logs of the operations the user performs on multimedia data comprises:
for each log set, determining in chronological order whether the multimedia data corresponding to the network logs that belong to the same operation contains specific label information, wherein the number of network logs included in the log set is greater than K, K is an integer greater than 0, the specific label information is first-type label information contained in the multimedia data corresponding to at least K/A network logs in the log set, and A is a set second threshold;
if the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs contains the specific label information, the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs contains the specific label information, and the number of network logs that lie, in chronological order within the log set, between the network logs found by the j-th determination and those found by the (j+1)-th determination is less than a set fourth threshold, determining the specific label information as second-type label information of the multimedia data corresponding to the network logs that lie between the network logs found by the j-th determination and those found by the (j+1)-th determination, where j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a set third threshold.
In a possible embodiment, the method further comprises:
recording, in multimedia data to which second-type label information has been added, time information on when the second-type label information was added;
deleting the second-type label information from the multimedia data after the time information exceeds a set time threshold.
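As an illustration only, the expiry of second-type label information might be sketched in Python as follows. The function name `drop_expired`, the dictionary layout and the example labels are hypothetical, and the sketch assumes that "exceeding the time threshold" means that the time elapsed since the label was added is greater than the threshold.

```python
def drop_expired(second_type_labels, now, max_age):
    """Return the labels that are still within the allowed age.

    second_type_labels maps a label to the time at which it was added,
    expressed in the same unit as `now` and `max_age` (an assumption).
    """
    return {
        label: added_at
        for label, added_at in second_type_labels.items()
        if now - added_at <= max_age
    }


# Hypothetical example: "workout" was added 900 time units ago, "sleep" 100.
remaining = drop_expired({"workout": 100, "sleep": 900}, now=1000, max_age=500)
```

With a threshold of 500, only the more recently added label would remain in `remaining`.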
In a possible embodiment, determining the coverage between the multimedia data and each branch tree according to the characteristic information of the multimedia data and the characteristic information corresponding to each branch tree of the pre-generated tree structure comprises:
for the branch tree, determining the number M of pieces of characteristic information contained in the intersection of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree;
determining the number N1 of pieces of characteristic information contained in the union of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree, and determining the coverage between the multimedia data and the branch tree according to the ratio of M to N1; or determining the total N2 of the number of pieces of characteristic information of the multimedia data and the number of pieces of characteristic information corresponding to the branch tree, and determining the coverage between the multimedia data and the branch tree according to the ratio of M to N2.
In a second aspect, the present invention further provides an apparatus for processing multimedia data, the apparatus comprising:
a receiving module, configured to receive multimedia data to be processed;
a branch-tree determining module, configured to determine the coverage between the multimedia data and each branch tree of a pre-generated tree structure according to the characteristic information of the multimedia data and the characteristic information corresponding to each branch tree, wherein the coverage indicates the degree of similarity between the multimedia data and each branch tree;
a branch determining module, configured to determine the branch trees whose coverage is greater than a first preset threshold and, from the judgment-condition branches included in each such branch tree, determine the judgment-condition branch that the characteristic information of the multimedia data satisfies;
a label determining module, configured to determine the value of the leaf node in the judgment-condition branch as first-type label information of the multimedia data.
In a possible embodiment, the branch determining module is specifically configured to:
match the characteristic information of the multimedia data against the judgment conditions of the judgment-condition branches one by one, in the priority order of the judgment-condition branches;
if at least one piece of characteristic information of the multimedia data matches the judgment condition of any judgment-condition branch, determine that the characteristic information of the multimedia data satisfies that judgment-condition branch.
In a possible embodiment, the label determining module is further configured to:
determine, among the first-type label information of the multimedia data, the first-type label information that belongs to the same category and is mutually exclusive;
if the number of pieces of first-type label information that belong to the same category and are mutually exclusive is greater than 1, retain only one of them.
In a possible embodiment, the label determining module is further configured to:
determine second-type label information of the multimedia data operated on by a user according to the network logs of the operations the user performs on multimedia data.
In a possible embodiment, the label determining module is specifically configured to:
for each log set, determine in chronological order whether the multimedia data corresponding to the network logs that belong to the same operation contains specific label information, wherein the number of network logs included in the log set is greater than K, K is an integer greater than 0, the specific label information is first-type label information contained in the multimedia data corresponding to at least K/A network logs in the log set, and A is a set second threshold;
if the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs contains the specific label information, the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs contains the specific label information, and the number of network logs that lie, in chronological order within the log set, between the network logs found by the j-th determination and those found by the (j+1)-th determination is less than a set fourth threshold, determine the specific label information as second-type label information of the multimedia data corresponding to the network logs that lie between the network logs found by the j-th determination and those found by the (j+1)-th determination, where j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a set third threshold.
In a possible embodiment, the branch-tree determining module is specifically configured to:
for the branch tree, determine the number M of pieces of characteristic information contained in the intersection of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree;
determine the number N1 of pieces of characteristic information contained in the union of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree, and determine the coverage between the multimedia data and the branch tree according to the ratio of M to N1; or determine the total N2 of the number of pieces of characteristic information of the multimedia data and the number of pieces of characteristic information corresponding to the branch tree, and determine the coverage between the multimedia data and the branch tree according to the ratio of M to N2.
In the multimedia data processing method and apparatus provided by the embodiments of the present invention, multimedia data to be processed is received; the coverage between the multimedia data and each branch tree of a pre-generated tree structure is determined according to the characteristic information of the multimedia data and the characteristic information corresponding to each branch tree; the branch trees whose coverage is greater than a first preset threshold are determined and, from the judgment-condition branches included in each such branch tree, the judgment-condition branch that the characteristic information of the multimedia data satisfies is determined; and the value of the leaf node in that judgment-condition branch is determined as first-type label information of the multimedia data, so that the label information of the multimedia data can be determined quickly and accurately. In addition, because more than one branch tree may have a coverage greater than the first preset threshold, more than one piece of label information may be determined for the multimedia data, so that the label information covers the multimedia data more comprehensively and recommendations based on it are more accurate.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method for processing multimedia data provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of a tree structure provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a branch tree of the tree structure provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of another method for processing multimedia data provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of an apparatus for processing multimedia data provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings. It should be understood that the embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.
In the embodiment shown in Fig. 1, a method for processing multimedia data is provided, the method comprising:
S11: receiving multimedia data to be processed.
In this step, the received multimedia data may be uploaded by a user or obtained from a database; the embodiments of the present invention do not limit how the multimedia data is obtained.
Optionally, the multimedia data includes but is not limited to audio data (such as songs and operas) and video data (such as TV series and films).
Taking a song as an example of multimedia data, the characteristic information that characterizes the song includes: song title, singer name, instruments used, rhythm, beat, music type, popularity, lyricist, composer, key lyrics and so on. For example, the sequence formed by the characteristic information of the song whose identifier (ID) is 001 is: [longing, Zheng Jun, pop, elopement, campus, blues, rock, Baidu, guitar, waist drum, saxophone, Chang'an].
S12: determining the coverage between the multimedia data and each branch tree of a pre-generated tree structure according to the characteristic information of the multimedia data and the characteristic information corresponding to each branch tree, wherein the coverage indicates the degree of similarity between the multimedia data and each branch tree.
S13: determining the branch trees whose coverage is greater than a first preset threshold and, from the judgment-condition branches included in each such branch tree, determining the judgment-condition branch that the characteristic information of the multimedia data satisfies.
S14: determining the value of the leaf node in the judgment-condition branch as first-type label information of the multimedia data.
Specifically, S13 and S14 are performed for every branch tree whose coverage is greater than the first preset threshold. Because more than one branch tree may have a coverage greater than the first preset threshold, at least one piece of label information is determined for the multimedia data.
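As an illustration only, steps S11 to S14 might be sketched in Python as follows. The classes `ConditionBranch` and `BranchTree`, the function names and the example trees are hypothetical and do not appear in the embodiment. The sketch assumes that a lower priority number is checked first, that a branch is satisfied when at least one feature of the item matches its condition, that the first satisfied branch of a tree supplies the label, and that coverage is the ratio of the intersection size to the union size.

```python
from dataclasses import dataclass, field


@dataclass
class ConditionBranch:
    priority: int         # lower value is checked first (assumption)
    condition: frozenset  # feature values that satisfy this branch
    leaf_label: str       # value of the leaf node of this branch


@dataclass
class BranchTree:
    features: frozenset   # characteristic information of the branch tree
    branches: list = field(default_factory=list)


def coverage(item, tree_features):
    """Intersection size divided by union size (one possible coverage)."""
    union = item | tree_features
    return len(item & tree_features) / len(union) if union else 0.0


def first_type_labels(item_features, trees, threshold):
    """S12 to S14: collect one leaf value per sufficiently similar tree."""
    item = frozenset(item_features)
    labels = []
    for tree in trees:
        if coverage(item, tree.features) <= threshold:
            continue  # S13 only considers trees above the threshold
        for branch in sorted(tree.branches, key=lambda b: b.priority):
            if item & branch.condition:  # at least one feature matches
                labels.append(branch.leaf_label)
                break
    return labels


# Hypothetical trees and item.
genre_tree = BranchTree(
    frozenset({"rock", "blues", "guitar", "pop"}),
    [ConditionBranch(1, frozenset({"rock"}), "rock music"),
     ConditionBranch(2, frozenset({"pop"}), "pop music")],
)
instrument_tree = BranchTree(
    frozenset({"piano", "violin", "cello"}),
    [ConditionBranch(1, frozenset({"piano"}), "piano piece")],
)
song = ["rock", "pop", "guitar", "Zheng Jun"]
labels = first_type_labels(song, [genre_tree, instrument_tree], threshold=0.3)
```

In this example the item shares three of five distinct features with the genre tree and none with the instrument tree, so only the genre tree is evaluated, and its highest-priority matching branch supplies the label.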
In this embodiment of the present invention, multimedia data to be processed is received; the coverage between the multimedia data and each branch tree of a pre-generated tree structure is determined according to the characteristic information of the multimedia data and the characteristic information corresponding to each branch tree; the branch trees whose coverage is greater than a first preset threshold are determined and, from the judgment-condition branches included in each such branch tree, the judgment-condition branch that the characteristic information of the multimedia data satisfies is determined; and the value of the leaf node in that judgment-condition branch is determined as first-type label information of the multimedia data, so that the label information of the multimedia data can be determined quickly and accurately. In addition, because more than one branch tree may have a coverage greater than the first preset threshold, more than one piece of label information may be determined for the multimedia data, so that the label information covers the multimedia data more comprehensively and recommendations based on it are more accurate.
In a possible embodiment, determining in S12 the coverage between the multimedia data and each branch tree, according to the characteristic information of the multimedia data and the characteristic information corresponding to each branch tree of the pre-generated tree structure, includes the following two possible implementations:
Mode one: for each branch tree, determine the number M of pieces of characteristic information contained in the intersection of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree, and determine the number N1 of pieces of characteristic information contained in the union of the two; then determine the coverage between the multimedia data and the branch tree according to the ratio of M to N1.
For example, the ratio of M to N1 is taken directly as the coverage between the multimedia data and the branch tree.
Mode two: for each branch tree, determine the number M of pieces of characteristic information contained in the intersection of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree, and determine the total N2 of the number of pieces of characteristic information of the multimedia data and the number of pieces of characteristic information corresponding to the branch tree; then determine the coverage between the multimedia data and the branch tree according to the ratio of M to N2.
For example, the ratio of M to N2 is taken directly as the coverage between the multimedia data and the branch tree.
As another example, because different branch trees may have different numbers of pieces of characteristic information, the ratio of M to N2 is multiplied by 2 and the result is taken as the coverage between the multimedia data and the branch tree, which makes the coverages of the multimedia data with different branch trees more comparable.
Of course, the embodiments of the present invention are not limited to the two modes above for determining coverage. Other modes may also be used, and any mode that can determine the degree of coverage between two sequences falls within the scope of protection of the present invention.
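As an illustration only, the two modes of computing coverage described above might be sketched in Python as follows. The function names `coverage_union` and `coverage_total` and the example feature lists are hypothetical, and the sketch assumes that duplicate features within one sequence are counted once.

```python
def coverage_union(item_features, tree_features):
    """Mode one: M / N1, where N1 is the size of the union."""
    item, tree = set(item_features), set(tree_features)
    union = item | tree
    if not union:
        return 0.0
    return len(item & tree) / len(union)


def coverage_total(item_features, tree_features, scale=1):
    """Mode two: scale * M / N2, where N2 = |item| + |tree|.

    scale=2 gives the variant in which the ratio is multiplied by 2.
    """
    item, tree = set(item_features), set(tree_features)
    total = len(item) + len(tree)
    if total == 0:
        return 0.0
    return scale * len(item & tree) / total


# Hypothetical example: M = 2, N1 = 6, N2 = 8.
song = ["longing", "Zheng Jun", "pop", "rock", "guitar"]
tree = ["rock", "guitar", "drums"]
mode_one = coverage_union(song, tree)
mode_two = coverage_total(song, tree)
mode_two_scaled = coverage_total(song, tree, scale=2)
```

Mode one corresponds to the Jaccard index of the two feature sets, and mode two with the factor of 2 corresponds to the Sørensen-Dice coefficient, which equals 1 when the two sets are identical.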
In a possible embodiment, the method further includes:
determining, among the first-type label information of the multimedia data, the first-type label information that belongs to the same category and is mutually exclusive;
if the number of pieces of first-type label information that belong to the same category and are mutually exclusive is greater than 1, retaining only one of them.
Specifically, the first-type label information of the multimedia data is filtered according to set mutual-exclusion rules: of the first-type label information that belongs to the same category and is mutually exclusive, only one piece is retained, which makes the first-type label information of the multimedia data more accurate. For example, suppose the first-type label information obtained through S11 to S14 that belongs to the language category and is mutually exclusive includes Korean, Japanese and Chinese. Because only one language label can be added to a given piece of multimedia data, one of the three pieces of first-type label information is selected and the other two are deleted.
Optionally, if the number of pieces of first-type label information that belong to the same category and are mutually exclusive is greater than 1, retaining only one of them includes the following possible implementations:
Mode 1: if the number of pieces of first-type label information that belong to the same category and are mutually exclusive is greater than 1, one piece of label information is selected at random from them, and the others are deleted.
Mode 2: if the number of pieces of first-type label information that belong to the same category and are mutually exclusive is greater than 1, one piece of label information is selected from them according to the first-type label information of at least one other category, and the others are deleted.
In this mode, the selection among the first-type label information that belongs to the same category and is mutually exclusive is based on the first-type label information of other categories, which makes the retained first-type label information more accurate.
For example, if the first-type label information of a song that belongs to the language category and is mutually exclusive includes Korean, Japanese and Chinese, the selection can be made according to the first-type label information corresponding to the singer's name for the song. Specifically, if the singer's name is in Chinese, Chinese is selected from the first-type label information belonging to the language category; if the singer's name is in Korean, Korean is selected; and if the singer's name is in Japanese, Japanese is selected. As another example, the selection can be made according to the first-type label information corresponding to the song title. Specifically, if the song title is in Chinese, Chinese is selected from the first-type label information belonging to the language category; if the song title is in Korean, Korean is selected; and if the song title is in Japanese, Japanese is selected.
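As an illustration only, the filtering of mutually exclusive first-type label information might be sketched in Python as follows. The function `resolve_exclusive` and its `hint` parameter are hypothetical: `hint` stands for the label suggested by another category, such as the language of the singer's name (mode 2), and a random choice is used when no usable hint is given (mode 1).

```python
import random


def resolve_exclusive(labels, exclusive_group, hint=None, rng=random):
    """Keep a single label from a mutually exclusive group."""
    conflicting = [label for label in labels if label in exclusive_group]
    if len(conflicting) <= 1:
        return list(labels)  # nothing to filter
    # Mode 2: follow the hint from another category when it applies;
    # mode 1: otherwise fall back to a random choice.
    keep = hint if hint in conflicting else rng.choice(conflicting)
    return [
        label for label in labels
        if label not in exclusive_group or label == keep
    ]


# Hypothetical example with a language group.
languages = {"Korean", "Japanese", "Chinese"}
labels = ["rock", "Korean", "Japanese", "Chinese"]
with_hint = resolve_exclusive(labels, languages, hint="Chinese")
without_hint = resolve_exclusive(labels, languages)
```

Labels outside the exclusive group are left untouched in both cases; only the choice of the retained language label differs between the two modes.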
On the basis of any of the above embodiments, in a possible embodiment, the method further includes:
determining second-type label information of the multimedia data operated on by a user according to the network logs of the operations the user performs on multimedia data.
Specifically, by analyzing the network logs, modification (modify) rules are formed on the basis of user behavior and used to determine the second-type label information of the multimedia data, so that recommending multimedia data to the user on the basis of its label information (including first-type and second-type label information) is more accurate.
Optionally, the second-type label information of the multimedia data operated on by the user is determined periodically according to the network logs of the operations the user performs on multimedia data. That is, for each set period, the second-type label information is determined according to the network logs that record the operations the user performed on multimedia data within that period. For example, the network logs are collected daily to determine the second-type label information of the multimedia data operated on by the user.
Optionally, a network log includes but is not limited to at least one of the following items of information:
the identification information of the multimedia data operated on, the identification information of the operation performed, the time information of the operation, and the label information (including first-type and second-type label information) of the multimedia data operated on.
Optionally, the operations performed on multimedia data include but are not limited to: a favorite (collect) operation, a delete operation, a play operation and so on.
In a possible embodiment, determining the second-type label information of the multimedia data operated on by the user according to the network logs of the operations the user performs on multimedia data includes:
for each log set, determining in chronological order whether the multimedia data corresponding to the network logs that belong to the same operation contains specific label information, wherein the number of network logs included in the log set is greater than K, K is an integer greater than 0, the specific label information is first-type label information contained in the multimedia data corresponding to at least K/A network logs in the log set, and A is a set second threshold;
if the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs contains the specific label information, the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs contains the specific label information, and the number of network logs that lie, in chronological order within the log set, between the network logs found by the j-th determination and those found by the (j+1)-th determination is less than a set fourth threshold, determining the specific label information as second-type label information of the multimedia data corresponding to the network logs that lie between the network logs found by the j-th determination and those found by the (j+1)-th determination, where j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a set third threshold.
Specifically, for each log set, it is determined in chronological order, log by log, whether the first-type label information of the multimedia data corresponding to each network log in the log set includes the specific label information. If the j-th determination finds P1 consecutive network logs whose multimedia data contains the specific label information, these are followed by fewer consecutive network logs than the fourth threshold whose multimedia data does not contain the specific label information, and the (j+1)-th determination then finds P2 consecutive network logs whose multimedia data contains the specific label information, the specific label information is determined as second-type label information of the multimedia data corresponding to those intervening network logs. If instead the P1 consecutive network logs found by the j-th determination are followed by a number of consecutive network logs greater than or equal to the fourth threshold whose multimedia data does not contain the specific label information, before the (j+1)-th determination finds P2 consecutive network logs whose multimedia data contains it, no processing is performed. The method then continues to determine whether the label information of the multimedia data corresponding to the subsequent network logs includes the specific label information, and repeats the above process until the last network log in the log set.
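As an illustration only, the gap-filling rule described above might be sketched in Python as follows. The function `fill_gaps` is hypothetical. Its input is a chronological list of booleans, one per network log in a log set, indicating whether the corresponding multimedia data already carries the specific label information. The sketch assumes that only runs longer than the third threshold count as the P1 and P2 runs, and that any shorter runs lying between two such runs are treated as part of the gap.

```python
def fill_gaps(has_label, third_threshold, fourth_threshold):
    """Return indices of logs whose media item should gain the label."""
    flags = list(has_label)
    # Collect maximal runs of consecutive logs whose item has the label.
    runs, start = [], None
    for i, flag in enumerate(flags + [False]):  # sentinel closes last run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i - 1))
            start = None
    # Only runs longer than the third threshold qualify as P1 / P2.
    long_runs = [(s, e) for s, e in runs if e - s + 1 > third_threshold]
    tagged = []
    for (_, prev_end), (next_start, _) in zip(long_runs, long_runs[1:]):
        gap = range(prev_end + 1, next_start)
        if len(gap) < fourth_threshold:
            tagged.extend(i for i in gap if not flags[i])
    return tagged


# Hypothetical log set: runs of 3, 4 and 3 labelled logs, gaps of 2 and 4.
T, F = True, False
flags = [T, T, T, F, F, T, T, T, T, F, F, F, F, T, T, T]
to_tag = fill_gaps(flags, third_threshold=2, fourth_threshold=3)
```

With these thresholds only the first gap, which contains two logs, is shorter than the fourth threshold, so only the two logs in it would receive the second-type label information; the second gap of four logs is left unchanged.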
Optionally, the network logs can be divided into multiple network log groups by time unit according to their time information, for example in units of 1 hour; from the groups so divided, those groups are then selected in which the number of network logs is greater than K and the multimedia data operated on by at least K/A network logs include the specific label information (these groups are the log sets). Alternatively, the network logs can be ordered by their time information and divided into multiple network log groups by count, for example K network logs per group; from the groups so divided, those groups are then selected in which the multimedia data operated on by at least K/A of the included network logs include the specific label information (these groups are the log sets).
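The hourly grouping described above can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: each network log is reduced to a hypothetical `(timestamp, labels)` pair, and the names `select_log_sets`, `k` and `a` are introduced here only to stand for the selection step, K and A.

```python
from collections import defaultdict
from datetime import datetime


def select_log_sets(logs, tag, k, a):
    """Group (timestamp, labels) entries by hour and keep qualifying groups.

    A group qualifies when it holds more than k logs and at least k / a of
    them operate on multimedia data whose labels include `tag`.
    """
    groups = defaultdict(list)
    for timestamp, labels in logs:
        # Truncate the timestamp to the start of its hour (assumed time unit).
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        groups[hour].append((timestamp, labels))

    log_sets = []
    for hour in sorted(groups):
        group = sorted(groups[hour], key=lambda entry: entry[0])
        tagged = sum(1 for _, labels in group if tag in labels)
        if len(group) > k and tagged >= k / a:
            log_sets.append(group)
    return log_sets
```

Grouping by count instead of by time would only change how `groups` is filled; the selection test stays the same.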
Optionally, since not all user operations exhibit regularity, the second-class label information of the multimedia data can be determined based on a set operation, which makes the determined second-class label information more accurate. For example, the second-class label information of the multimedia data is determined based on the play operation.
Optionally, to make the determined second-class label information more accurate, the method further includes:
For any multimedia data to which second-class label information has been added, if, in a set number of processed log sets, the number of occurrences of the multimedia data with that second-class label information added and the number of occurrences of the multimedia data do not satisfy a set constraint condition, deleting that second-class label information from the multimedia data.
Here, the number of occurrences of the multimedia data is the sum of the number of occurrences without that second-class label information added and the number of occurrences with it added.
Specifically, based on multiple log sets, in order to judge whether the determined second-class label information is accurate, a set number of processed log sets is examined for any multimedia data to which any second-class label information has been added. If the number of occurrences of the multimedia data with the second-class label information added and the number of occurrences of the multimedia data satisfy the set constraint condition, the multimedia data is considered to carry that second-class label information; if they do not satisfy the constraint condition, the multimedia data is considered not to carry it.
Optionally, the set constraint condition is: the number of occurrences of the multimedia data with the second-class label information added is greater than half of the number of occurrences of the multimedia data.
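The optional majority constraint can be illustrated with a short sketch. The function name and the boolean-list representation are assumptions made for this example: each entry stands for one occurrence of the same multimedia data in the processed log sets, and is `True` when the second-class label was added on that occurrence.

```python
def keep_second_class_label(occurrences):
    """Return True when the label was added on more than half of the occurrences.

    occurrences: list of bools, one per occurrence of the multimedia data in
    the processed log sets (True = the second-class label was added there).
    """
    total = len(occurrences)   # occurrences with and without the label
    added = sum(occurrences)   # occurrences with the label added
    return added > total / 2
```

When the function returns `False`, the label would be deleted from the multimedia data under the constraint described above.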
For example, the network logs are split by time, i.e. divided into multiple log sets by time unit. For each log set, time-context dependencies are extracted and analyzed, and for a specific label information tag included in the multimedia data operated on by the network logs in the log set, a maximal scene set Ctag is determined in chronological order, as follows:
1) In chronological order, starting from the first network log in this log set that includes the specific label information tag, the scene set is expanded. Assuming that the first network log including tag is the g_i-th network log in the log set, the g_i-th network log is added to Ctag;
2) If the g_(i+1)-th network log also includes tag, the g_(i+1)-th network log is added to Ctag and this step is repeated; otherwise go to step 3);
3) Starting from g_(i+1), count the consecutive following network logs that do not include tag. If the sequence not including tag is g_(i+1), g_(i+2), ..., g_(i+k), continue from g_(i+k+1) to look for consecutive network logs that include tag, and let these run up to g_(i+k+p). If p > k/2, the network logs g_(i+1) to g_(i+k+p) are added to Ctag; otherwise go to step 4);
4) The first network log including tag from g_(i+k+1) onward is taken as the starting record of this scene set. If that network log is not the last network log in the log set, return to step 2); otherwise go to step 5);
5) If the size of Ctag is greater than the minimum scene queue threshold Φ, this tag, together with the identification information of the first network log including tag, is added to the characteristic information of the multimedia data operated on by every network log in Ctag that does not include tag; otherwise this loop ends.
Further, statistics are computed over all multimedia data to which tag has been added: if the number of occurrences of a multimedia item with tag added is greater than 1/2 of the number of occurrences of that multimedia item, the tag attribute is added to the multimedia item, with the current time as its modifytime attribute.
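Steps 1) to 5) above can be sketched in Python as follows. This is one possible reading, not the definitive algorithm: the log set is reduced to a chronological list of booleans (the hypothetical `has_tag`), `min_scene_size` stands for Φ, and when a gap is too long the current scene is closed and a new scene starts at the next tagged log, which is an assumption about how step 4) is meant.

```python
def fill_scene_gaps(has_tag, min_scene_size):
    """Return indices of logs without the tag that should receive it.

    has_tag: chronological list of bools, True where the multimedia data
    operated on by that network log already includes the tag.
    """
    n = len(has_tag)
    to_label = []
    i = 0
    while i < n:
        if not has_tag[i]:
            i += 1
            continue
        scene = [i]                      # step 1: first log that has the tag
        j = i + 1
        while j < n:
            if has_tag[j]:               # step 2: extend with tagged logs
                scene.append(j)
                j += 1
                continue
            k = 0                        # step 3: length of the untagged gap
            while j + k < n and not has_tag[j + k]:
                k += 1
            p = 0                        # length of the tagged run after it
            while j + k + p < n and has_tag[j + k + p]:
                p += 1
            if p > k / 2:
                scene.extend(range(j, j + k + p))
                j += k + p
            else:
                break                    # step 4: gap too long, close scene
        if len(scene) > min_scene_size:  # step 5: scene is large enough
            to_label.extend(idx for idx in scene if not has_tag[idx])
        i = j                            # next scene starts after this one
    return to_label
```

For instance, with `has_tag = [True, True, False, True, True, False, False, False, False, True]` and `min_scene_size = 3`, only index 2 is bridged: the single-log gap is followed by two tagged logs, whereas the four-log gap is followed by only one.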
Optionally, the method further includes:
In the multimedia data to which second-class label information has been added, recording the time information of when the second-class label information was added;
After the time information exceeds a set time threshold, deleting the second-class label information from the multimedia data.
Specifically, the global dynamic scene management process loops continuously and checks the time attribute modifytime added at this stage (i.e. the time information of adding tag); if the time attribute of any multimedia item exceeds the validity-period threshold, the second-class label information and the time attribute added to that multimedia item are deleted.
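The expiry check can be sketched as below. The record layout (a list of dictionaries with `addtag` and `modifytime` keys) mirrors the attribute names used in this description, but the function name, the use of `datetime` objects and the `validity` parameter are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def drop_expired_tags(dynamic_tags, now, validity):
    """Keep only second-class labels whose age does not exceed `validity`.

    dynamic_tags: list of dicts such as
        {"addtag": "after supper", "modifytime": datetime(...)}
    now: current time; validity: timedelta used as the validity threshold.
    """
    return [entry for entry in dynamic_tags
            if now - entry["modifytime"] <= validity]
```

Removing the whole dictionary deletes both the added label and its time attribute, which matches the deletion described above.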
Based on any of the above embodiments, the tree in the embodiments of the present invention can be generated by the following steps:
According to the label category to which the label information of preconfigured sample data belongs, dividing the sample data into at least two data sets, each data set corresponding to one branch tree of the tree;
For each data set, according to the feature category to which the characteristic information of the sample data in the data set belongs, dividing the sample data in the data set into at least one category group and calculating the information gain ratio of each category group, the information gain ratio being determined from the information entropy of the characteristic information of the sample data in the category group; selecting, in turn, the category group with the largest information gain ratio as the split attribute, and constructing the judgment-condition branches of the branch tree from the characteristic information of the sample data in that category group, the leaf nodes of the judgment-condition branches being the label information of the sample data in the data set.
The larger the information gain ratio of a category group, the higher the priority of the judgment-condition branch corresponding to that category group.
For example, for songs, the feature categories to which characteristic information belongs include but are not limited to an instrument category, a release-era category, a singer/album category, and so on. As another example, for opera, the feature categories include but are not limited to a singing-style category, an instrument category, a role category, and so on.
Optionally, when constructing the judgment-condition branches of the branch tree, if at least two category groups currently share the largest information gain ratio, one of them is selected as the split attribute, for example at random.
In one possible implementation, dividing the sample data into at least two data sets according to the label category to which the label information of the preconfigured sample data belongs specifically includes:
Determining the coverage rate of any two sample data according to their characteristic information;
If the coverage rate is greater than a set fifth threshold, merging the two sample data into a data group and returning to the step of determining the coverage rate, until every determined coverage rate is less than or equal to the fifth threshold, whereupon each finally obtained data group is determined as one data set.
The above process is called pre-pruning: the sample data are first taken as the leaf nodes of the tree and then pre-pruned, so that unrelated sample data are assigned to different branch trees, as follows:
a) Calculate the coverage rate (coverage_rate) between any two sample data.
Assume that the characteristic information sequence of sample data 1 is L1 = [l11, l12, l13, ...] and that of sample data 2 is L2 = [l21, l22, l23, ...]. The coverage rate between sample data 1 and sample data 2 is then coverage_rate12 = 2 * len(L1 ∩ L2) / len(L1 + L2), where len(L1 ∩ L2) is the number of elements in L1 ∩ L2 and len(L1 + L2) is the sum of the number of elements in L1 and the number of elements in L2.
b) Merge any two sample data with coverage_rate > ω (i.e. the fifth threshold).
c) Recalculate the coverage rates.
d) Repeat steps b) and c) until no sample data or data groups can be merged.
The pre-pruning process above reduces the number of initial branch trees, which reduces the computation of the subsequent decision classification and improves its processing efficiency.
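Steps a) to d) can be sketched as a simple greedy merge. The description does not say how the characteristic information of a merged data group is formed; this sketch assumes it is the union of its members' characteristic information, and the names `coverage_rate`, `pre_prune` and `omega` are illustrative only.

```python
def coverage_rate(f1, f2):
    """2 * |F1 ∩ F2| / (|F1| + |F2|), as in step a)."""
    f1, f2 = set(f1), set(f2)
    total = len(f1) + len(f2)
    return 2 * len(f1 & f2) / total if total else 0.0


def pre_prune(samples, omega):
    """Merge samples whose coverage rate exceeds omega.

    samples: dict mapping a sample name to its list of characteristic
    information. Returns a list of groups, each a list of sample names.
    """
    groups = [([name], set(features)) for name, features in samples.items()]
    merged = True
    while merged:                        # step d): repeat until stable
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if coverage_rate(groups[i][1], groups[j][1]) > omega:
                    # step b): merge the pair (assumed: union of features)
                    names = groups[i][0] + groups[j][0]
                    features = groups[i][1] | groups[j][1]
                    del groups[j]
                    groups[i] = (names, features)
                    merged = True
                    break                # step c): recompute from scratch
            if merged:
                break
    return [names for names, _ in groups]
```

Each resulting group would then correspond to one data set, and hence to one branch tree.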
Optionally, before pre-pruning, the method further includes:
Normalizing the characteristic information of the sample data;
Performing de-assimilation on the normalized characteristic information of the sample data.
Normalization unifies characteristic information that belongs to the same feature category across the characteristic information of all sample data; for example, the transliterated form of "blues", "rhythm and blues", "Blues" and "R&B" are all normalized to the blues type. De-assimilation removes, from the characteristic information of the sample data, any characteristic information that all sample data include.
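A minimal sketch of the two preprocessing steps follows. The synonym table and both function names are assumptions made for this example; the description does not specify how equivalent characteristic information is recognized.

```python
def normalize(features, synonyms):
    """Map each feature to its canonical form and drop duplicates, keeping order."""
    seen, result = set(), []
    for feature in features:
        canonical = synonyms.get(feature, feature)
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result


def remove_shared(samples):
    """De-assimilation: drop features that every sample contains."""
    if not samples:
        return {}
    shared = set.intersection(*(set(f) for f in samples.values()))
    return {name: [f for f in features if f not in shared]
            for name, features in samples.items()}
```

Features shared by every sample carry no information for separating the samples, which is why removing them before computing coverage rates is reasonable.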
In one possible implementation, determining the coverage rate of any two sample data according to their characteristic information includes:
For any two sample data, determining the number M of characteristic information items in the intersection of their characteristic information, and determining the total number N2 of their characteristic information items; and determining the coverage rate of the two sample data according to the ratio of M to N2; or
For any two sample data, determining the number M of characteristic information items in the intersection of their characteristic information, and determining the number N1 of characteristic information items in the union of their characteristic information; and determining the coverage rate of the two sample data according to the ratio of M to N1.
For example, the ratio of M to N2 is taken as the coverage rate of the two sample data. As another example, the value obtained by multiplying the ratio of M to N2 by 2 is taken as the coverage rate of the two sample data. As a further example, the ratio of M to N1 is taken as the coverage rate of the two sample data.
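The alternative coverage-rate definitions can be compared side by side. In this sketch N1 is taken as the size of the union and N2 as the sum of the two sizes, following the apparatus description later in this document; the function name and dictionary keys are illustrative.

```python
def coverage_variants(f1, f2):
    """Return the three coverage-rate variants for two feature lists."""
    s1, s2 = set(f1), set(f2)
    m = len(s1 & s2)            # M: size of the intersection
    n1 = len(s1 | s2)           # N1: size of the union
    n2 = len(s1) + len(s2)      # N2: total number of features
    return {
        "m_over_n1": m / n1 if n1 else 0.0,
        "m_over_n2": m / n2 if n2 else 0.0,
        "two_m_over_n2": 2 * m / n2 if n2 else 0.0,
    }
```

The variant 2M/N2 reaches 1 for identical feature sets, as does M/N1, whereas M/N2 peaks at 0.5, so the threshold chosen has to match the variant in use.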
In the embodiments of the present invention, the information gain ratio of each category group can be used to judge how well the characteristic information of the sample data in that category group determines the label category. The larger the information gain ratio of a category group, the stronger the ability of the characteristic information of its sample data to determine the label category. Selecting split attributes by information gain ratio to construct the judgment-condition branches overcomes the shortcoming of information gain, which tends to favor attributes with many categories when selecting a split attribute.
In one possible implementation, the information gain ratio of each category group is calculated according to the following formula:
Here, A denotes the label set corresponding to each data set, and C denotes each class of label in A; ε is the average of SplitInfo(A, C) over all category groups; N is the number of category groups; m_n is the number of sample data in a category group; C denotes each class of characteristic information, and c denotes a characteristic information item in C; num_c is the number of sample data containing characteristic information c; and num_x is the number of sample data in the data set.
Here, E(c) = sum(-p(I) * log(p(I))), I = 1, 2, ..., X, where X is the number of data groups divided according to the preconfigured classification rule (i.e. the number of data sets), and p(I) is the probability that sample data containing characteristic information c appear in the I-th data set. For example, if s sample data contain characteristic information a, and m of those s sample data belong to category I, then p(I) = m/s.
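The entropy E(c) can be computed directly from class counts. This short sketch assumes base-2 logarithms, which is what the worked example in this description uses; the function name and the count-list input are illustrative.

```python
import math


def entropy(class_counts):
    """E = sum(-p(I) * log2(p(I))) over the classes with non-zero count.

    class_counts: for one characteristic information item, the number of
    sample data containing it in each class, e.g. [m_1, m_2, ...].
    """
    total = sum(class_counts)
    return -sum((n / total) * math.log2(n / total)
                for n in class_counts if n)
```

Classes with a zero count are skipped, following the usual convention that 0 * log(0) contributes nothing to the sum.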
For example, taking audio data as an illustration, assume there are 8 sample data, denoted m1 to m8. First, the characteristic information of the sample data is obtained from the description information of the sample data; after the obtained characteristic information is cleaned and aggregated, the characteristic information sequence of each sample data is obtained, as follows:
m1: [blues, acoustic guitar, electric guitar, harmonica, harmony, chanting, improvisation, original, Blues, comfort, tension, lament, relief];
m2: [rock, passion, rhythm, drum kit, bass, solo, guitar];
m3: [Lu opera, xiaosheng, xiaodan, xiaochou, zhuiqin, loud and clear];
m4: [light music, soothing, graceful, relaxed, comfortable, piano, cello, saxophone, harp, flute, clarinet, panpipes, violin, popular instruments, mandolin, harmonica, accordion, xylophone, triangle, hand bells, maracas, Mantovani];
m5: [Beijing opera, sheng, dan, jing, chou, Mei school, Ma school, jingbai, huqin];
m6: [Bandari, soothing, graceful, relaxed, comfortable, violin, piano, xylophone, triangle];
m7: [Beijing opera, gong and drum, loud and clear, sheng, dan, jing];
m8: [Lu opera, gong and drum, loud and clear, xiaodan, xiaochou].
The first element of each sequence above is the category to which the sample data belongs and does not take part in subsequent processing.
Then the coverage rate between any two sample data is calculated, and the sample data are merged based on the coverage rates obtained. Assume the 8 sample data are finally divided into three data sets: [(m1, m2), (m3, m5, m7, m8), (m4, m6)]; as shown in Fig. 2, the whole decision tree is divided into three branch trees. For each branch tree, the information entropy E of each category group in the branch tree is calculated:
Taking the data set (m3, m5, m7, m8) as an example, the four sample data have only two label information values, namely Lu opera and Beijing opera. Combined with the preconfigured knowledge table, the information entropy E of the characteristic information of the sample data in (m3, m5, m7, m8) is calculated as follows:
E(xiaosheng, xiaodan, xiaochou) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1;
E(sheng, dan, jing) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1;
E(zhuiqin) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1;
E(gong and drum) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1;
E(loud and clear) = -(1/3)*log2(1/3) - (1/2)*log2(1/2) = 1.0266;
E(Ma school) = -(1/1)*log2(1/1) - (0/1)*log2(0/1) = 0.
Then the information gain of each category group in the data set is calculated. Specifically, the characteristic information of the sample data in (m3, m5, m7, m8) is divided into a role category, a singing-style category and an instrument category, where:
Gain(instrument) = E(instrument) - (2/4)*E(zhuiqin) - (2/4)*E(gong and drum) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) - 1/2 - 1/2 = 0;
Gain(singing style) = E(singing style) - (1/4)*E(Ma school) - (3/4)*E(loud and clear) = 1 - (1/4)*0 - (3/4)*1.0266 = 0.231;
Gain(role) = E(role) - (2/4)*E(xiaosheng, xiaodan, xiaochou) - (2/4)*E(sheng, dan, jing) = 1 - (2/4)*1 - (2/4)*1 = 0.
Then the information gain ratio of each category group is calculated, specifically:
GainRatio(instrument) = 0;
GainRatio(role) = 0;
GainRatio(singing style) = 0.3113 / (SplitInfo + ε) = 0.3113 / (2.8775 + 1.63) = 0.07.
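The gain and gain-ratio calculations in this worked example can be sketched as follows. The gain-ratio form Gain / (SplitInfo + ε) is taken from the line above; the `split_info` helper uses the standard C4.5 definition, which is an assumption here because the original formula is not reproduced in this text. All function names are illustrative.

```python
import math


def information_gain(parent_entropy, parts, total):
    """Gain = E(parent) - sum(count / total * E(part)).

    parts: list of (count, entropy) pairs, one per characteristic
    information item in the category group.
    """
    return parent_entropy - sum(count / total * e for count, e in parts)


def split_info(counts):
    """Standard C4.5 split information (assumed, not taken from the source)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def gain_ratio(gain, split, epsilon):
    """GainRatio = Gain / (SplitInfo + epsilon)."""
    return gain / (split + epsilon)
```

With the role category from the example, `information_gain(1.0, [(2, 1.0), (2, 1.0)], 4)` evaluates to 0, in line with Gain(role) above.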
Then the attribute with the largest information gain ratio is selected in turn as the split attribute to construct the judgment-condition branches of the branch tree, whose structure is shown in Fig. 3. When constructing the judgment-condition branches of the branch tree, the termination condition is that all labels have been distinguished, or that the degree to which the label of any sample data covers the category group of that label reaches a preset coverage-rate threshold.
The processing flow of the method provided by the embodiments of the present invention is shown in Fig. 4 and includes:
1) Preprocessing. Specifically: when multimedia data is input, the description information of the multimedia data is preprocessed (including cleaning, aggregation and similar processing) to obtain the characteristic information of the multimedia data;
2) Discrimination stage based on the tree. Specifically: according to the characteristic information of the multimedia data, the label information of the multimedia data is determined based on the pre-generated tree;
3) Processing based on modification rules. Specifically: the label information of the multimedia data is updated for the first time according to the modification rules, where the modification rules are obtained by analyzing network logs;
4) Processing based on mutual-exclusion rules. Specifically: the label information of the multimedia data is updated for the second time according to the mutual-exclusion rules, and the multimedia data is stored in the specialized database; optionally, the mutual-exclusion table may be obtained by analyzing network logs or may be preconfigured;
5) Resource output. Specifically: according to the label information of the multimedia data and based on user preferences, multimedia data are recommended to the user from the database.
As an illustration, assume the characteristic information of the currently input multimedia data mi is [loud and clear, gong and drum, sheng, dan, jing, chou, erhu]. mi is input into the decision tree model shown in Fig. 3, and the coverage rate between mi and each of the three branch trees of the decision tree is calculated. Whenever a coverage rate is greater than the set first threshold, mi flows into that branch tree; one multimedia data can therefore belong to multiple branch trees, and so can carry label information of several different categories. Assume the calculation assigns mi only to the branch tree corresponding to (m3, m5, m7, m8). The singing-style characteristic information of mi, [loud and clear], is then determined first and compared with the first-layer judgment-condition branches of that branch tree, i.e. its coverage rate with [Ma school] and with [loud and clear] in the first-layer judgment-condition branches is calculated; since the coverage rate of mi with the right-hand [loud and clear] branch is greater than with the left-hand branch, mi enters the right-hand branch. Subsequent branches are handled in the same way until mi reaches the Beijing opera leaf node, and the Beijing opera label information is added to mi.
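The routing and descent described in this illustration can be sketched as below. The tree layout is hypothetical, since Fig. 3 is not reproduced here: each internal node is a dictionary mapping a tuple of condition features to a child, and leaves are label strings. As a simplification, the sketch compares all of the item's features with each condition instead of only the features of one category.

```python
def coverage(a, b):
    """2 * |A ∩ B| / (|A| + |B|) for two feature collections."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def route(features, branch_trees, first_threshold):
    """Names of the branch trees whose coverage rate exceeds the threshold."""
    return [name for name, tree_features in branch_trees.items()
            if coverage(features, tree_features) > first_threshold]


def descend(features, node):
    """Follow the best-covering condition at each level until a leaf label."""
    while isinstance(node, dict):
        best = max(node, key=lambda condition: coverage(features, condition))
        node = node[best]
    return node


# Hypothetical branch tree for the (m3, m5, m7, m8) data set.
opera_tree = {
    ("Ma school",): "Beijing opera",
    ("loud and clear",): {
        ("sheng", "dan", "jing"): "Beijing opera",
        ("xiaosheng", "xiaodan", "xiaochou"): "Lu opera",
    },
}
mi = ["loud and clear", "gong and drum", "sheng", "dan", "jing", "chou", "erhu"]
label = descend(mi, opera_tree)
```

Because `route` returns every branch tree above the threshold, calling `descend` once per returned tree would give an item several labels of different categories, as described above.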
In the cleaning stage based on explicit user feedback, if mi appears in some scene set [[t1, m1, g1], [t2, m2, g2], [t3, m3, g3], ..., [ti, mi, gi], ...], assume the label information of mi is {tags: [tag1, tag2, ...], Dinamictags: [{addtag: 'p.prand', modifytime: '11:30:10'}, {addtag: 'after supper', modifytime: '17:20:00'}]}. The number of network logs in which mi appears is counted and denoted totlenum, and the number of entries in all Ctag sets in which the tag label was added to mi is counted and denoted addnum. If addnum > (1/2) totlenum, the label tag is added to mi, where tag stands for 'p.prand' or 'after supper'.
The above method flow can be implemented as a software program, which can be stored in a storage medium; when the stored software program is called, the above method steps are executed.
Based on the same inventive concept, the embodiments of the present invention also provide a processing apparatus for multimedia data. Since the principle by which the apparatus solves the problem is similar to that of the above processing method for multimedia data, the implementation of the apparatus can refer to the implementation of the method, and repeated description is omitted.
In the embodiment shown in Fig. 5, a processing apparatus for multimedia data is provided, including:
A receiving module 51, configured to receive multimedia data to be processed;
A branch tree determining module 52, configured to determine, according to the characteristic information of the multimedia data and the characteristic information corresponding to each branch tree of a pre-generated tree, the coverage rate between the multimedia data and each branch tree, where the coverage rate indicates the degree of similarity between the multimedia data and each branch tree;
A branch determining module 53, configured to determine the branch tree whose coverage rate is greater than a first preset threshold, and to determine, from the judgment-condition branches included in the branch tree, the judgment-condition branch that the characteristic information of the multimedia data satisfies;
A label determining module 54, configured to determine the value of the leaf node in the judgment-condition branch as the first-class label information of the multimedia data.
Optionally, the branch determining module 53 is specifically configured to:
Match the characteristic information of the multimedia data against the judgment conditions of the judgment-condition branches in turn, in the priority order of the judgment-condition branches;
If at least one characteristic information item of the multimedia data matches the judgment condition of any judgment-condition branch, determine that the characteristic information of the multimedia data satisfies that judgment-condition branch.
Optionally, the label determining module 54 is further configured to:
Determine, in the first-class label information of the multimedia data, the first-class label information that belongs to the same category and is mutually exclusive;
If the number of first-class label information items that belong to the same category and are mutually exclusive is greater than 1, retain one of those first-class label information items.
Based on any of the above embodiments, optionally, the label determining module 54 is further configured to:
Determine, according to the network logs of operations that users perform on multimedia data, the second-class label information of the multimedia data operated on by the users.
In one possible implementation, the label determining module 54 is specifically configured to:
For each log set, determine in chronological order whether the multimedia data corresponding to the network logs belonging to the same operation include specific label information, where the number of network logs in the log set is greater than K, K is an integer greater than 0, the specific label information is first-class label information included in the multimedia data corresponding to at least K/A network logs in the log set, and A is a set second threshold;
If the j-th determination finds P1 consecutive network logs whose corresponding multimedia data include the specific label information, the (j+1)-th determination finds P2 consecutive network logs whose corresponding multimedia data include the specific label information, and the number of network logs lying, in chronological order within the log set, between the network logs found by the j-th determination and those found by the (j+1)-th determination is less than a set fourth threshold, determine the specific label information as second-class label information of the multimedia data corresponding to the network logs lying between those found by the j-th determination and those found by the (j+1)-th determination, where j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a set third threshold.
Further, the label determining module 54 is further configured to:
Record, in the multimedia data to which second-class label information has been added, the time information of when the second-class label information was added;
Delete the second-class label information from the multimedia data after the time information exceeds a set time threshold.
Based on any of the above embodiments, optionally, the branch tree determining module 52 is specifically configured to:
For the branch tree, determine the number M of characteristic information items in the intersection of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree;
Determine the number N1 of characteristic information items in the union of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree, and determine the coverage rate between the multimedia data and the branch tree according to the ratio of M to N1; or determine the total number N2 of the characteristic information items of the multimedia data and the characteristic information items corresponding to the branch tree, and determine the coverage rate between the multimedia data and the branch tree according to the ratio of M to N2.
Based on any of the above embodiments, optionally, the apparatus further includes:
A modeling module 55, configured to generate the tree by the following steps:
According to the label category to which the label information of preconfigured sample data belongs, dividing the sample data into at least two data sets, each data set corresponding to one branch tree of the tree;
For each data set, according to the feature category to which the characteristic information of the sample data in the data set belongs, dividing the sample data in the data set into at least one category group and calculating the information gain ratio of each category group, the information gain ratio being determined from the information entropy of the characteristic information of the sample data in the category group; selecting, in turn, the category group with the largest information gain ratio as the split attribute, and constructing the judgment-condition branches of the branch tree from the characteristic information of the sample data in that category group, the leaf nodes of the judgment-condition branches being the label information of the sample data in the data set.
Optionally, the modeling module 55 is specifically configured to:
Determine the coverage rate of any two sample data according to their characteristic information;
If the coverage rate is greater than a set fifth threshold, merge the two sample data into a data group and return to the step of determining the coverage rate, until every determined coverage rate is less than or equal to the fifth threshold, whereupon each finally obtained data group is determined as one data set.
In this embodiment, the receiving module 51, the branch tree determining module 52, the branch determining module 53, the label determining module 54 and the modeling module 55 are presented in the form of functional modules. Here, a "module" may refer to an application-specific integrated circuit (ASIC), a circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above functions. In a simple embodiment, those skilled in the art will appreciate that the receiving module 51 and the modeling module 55 can be implemented by the processor, memory and input interface of a computer device, and that the branch tree determining module 52, the branch determining module 53 and the label determining module 54 can be implemented by the processor and memory of a computer device.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.

Claims (10)

1. A method for processing multimedia data, characterized in that the method comprises:
receiving multimedia data to be processed;
determining a coverage rate between the multimedia data and each branch tree of a pre-generated tree according to characteristic information of the multimedia data and characteristic information corresponding to each branch tree, wherein the coverage rate indicates the degree of similarity between the multimedia data and each branch tree;
determining a branch tree whose coverage rate is greater than a first preset threshold, and determining, from the judgment condition branches included in the branch tree, the judgment condition branch that the characteristic information of the multimedia data satisfies;
determining the value of the leaf node in the judgment condition branch as first-class label information of the multimedia data;
the method further comprising:
determining, according to network logs of operations performed by a user on multimedia data, second-class label information of the multimedia data operated on by the user.
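The first-class labeling steps of claim 1 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the `Branch` and `BranchTree` classes, the predicate-style `condition`, the precomputed `coverage` mapping, and the choice to take the first satisfied branch per tree are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Branch:
    # Hypothetical representation of a judgment condition branch:
    # a predicate over the item's features plus the value of its leaf node.
    condition: Callable[[Set[str]], bool]
    leaf_label: str


@dataclass
class BranchTree:
    name: str
    branches: List[Branch]


def first_class_labels(
    features: Set[str],
    trees: List[BranchTree],
    coverage: Dict[str, float],  # assumed to be computed beforehand, per branch tree
    threshold: float,            # the "first preset threshold"
) -> List[str]:
    """Collect leaf values from branch trees whose coverage exceeds the threshold."""
    labels = []
    for tree in trees:
        if coverage[tree.name] <= threshold:
            continue  # branch tree is not similar enough to the item
        for branch in tree.branches:
            if branch.condition(features):
                labels.append(branch.leaf_label)
                break  # assumption: use the first satisfied branch per tree
    return labels
```

For example, with a "film" tree at coverage 0.6 and a "music" tree at coverage 0.1, a threshold of 0.5 means only the "film" tree's branches are evaluated.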
2. The method according to claim 1, characterized in that determining, from the judgment condition branches included in the branch tree, the judgment condition branch that the characteristic information of the multimedia data satisfies comprises:
matching the characteristic information of the multimedia data against the judgment condition of each judgment condition branch in turn, in the priority order of the judgment condition branches;
if at least one item of characteristic information of the multimedia data matches the judgment condition of any judgment condition branch, determining that the characteristic information of the multimedia data satisfies that judgment condition branch.
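The priority-ordered matching in claim 2 might look like the sketch below. The tuple layout `(priority, condition_features, leaf_label)` and the convention that a lower number means higher priority are assumptions made for illustration; the claim only requires that branches be tried in priority order and that one matching feature suffices.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical branch layout: (priority, features named by the judgment condition, leaf value).
# Assumption: a lower priority number means the branch is tried earlier.
BranchSpec = Tuple[int, Set[str], str]


def match_branch(features: Set[str], branches: List[BranchSpec]) -> Optional[str]:
    """Return the leaf value of the first branch, in priority order, that the item satisfies."""
    for _, condition, leaf in sorted(branches, key=lambda b: b[0]):
        # One shared feature is enough for the branch to count as satisfied.
        if features & condition:
            return leaf
    return None
```

Here an item with features `{"war", "comedy"}` would take a priority-1 branch conditioned on `{"action", "war"}` ahead of a priority-2 branch conditioned on `{"comedy"}`.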
3. The method according to claim 1, characterized in that the method further comprises:
determining, among the first-class label information of the multimedia data, the first-class label information that belongs to the same category and is mutually exclusive;
if the number of items of first-class label information that belong to the same category and are mutually exclusive is greater than 1, retaining one item of first-class label information among them.
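The mutual-exclusion cleanup in claim 3 can be illustrated as below. The claim does not say which of the conflicting labels is retained, so keeping the first one encountered is an assumption, as is the `exclusive_category` mapping used to declare which labels exclude one another.

```python
from typing import Dict, List


def resolve_exclusive(labels: List[str], exclusive_category: Dict[str, str]) -> List[str]:
    """Keep at most one label per mutually exclusive category.

    exclusive_category (hypothetical) maps a label to the category within which
    it excludes other labels; labels absent from the map are always kept.
    """
    kept = []
    seen_categories = set()
    for label in labels:
        category = exclusive_category.get(label)
        if category is None:
            kept.append(label)
            continue
        if category in seen_categories:
            continue  # a conflicting label from this category was already kept
        seen_categories.add(category)
        kept.append(label)
    return kept
```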
4. The method according to claim 1, characterized in that determining, according to network logs of operations performed by a user on multimedia data, second-class label information of the multimedia data operated on by the user comprises:
for each log set, determining, in chronological order, whether the multimedia data corresponding to the network logs belonging to the same operation include specific label information, wherein the number of network logs in the log set is greater than K, K is an integer greater than 0, the specific label information is first label information included in the multimedia data corresponding to at least K/A network logs in the log set, and A is a set second threshold;
if the j-th determination finds that the multimedia data corresponding to P1 consecutive network logs include the specific label information, the (j+1)-th determination finds that the multimedia data corresponding to P2 consecutive network logs include the specific label information, and the number of network logs in the log set that lie, in chronological order, between the network logs found by the j-th determination and the network logs found by the (j+1)-th determination is less than a set fourth threshold, determining the specific label information as the second-class label information of the multimedia data corresponding to the network logs located between the network logs found by the j-th determination and the network logs found by the (j+1)-th determination, where j = 1, 2, ..., L, L is a positive integer, and P1 and P2 are both greater than a set third threshold.
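One possible reading of the gap-filling rule in claim 4 is sketched below: find runs of consecutive logs whose media items already carry the specific label, and propagate the label to items in a short gap between two sufficiently long runs. The function name, the per-log `set` representation, and the decision to report only gap items that lack the label are assumptions; selecting the specific label itself (the K/A criterion) is left out.

```python
from typing import List, Set


def fill_gaps(log_labels: List[Set[str]], label: str, min_run: int, max_gap: int) -> List[int]:
    """Return indices of logs whose media item should receive `label` as a second-class label.

    log_labels: label sets of the media items, one per log, in chronological order.
    min_run:    the "third threshold"; a run must be longer than this.
    max_gap:    the "fourth threshold"; the gap must be shorter than this.
    """
    # Collect maximal runs (start, end inclusive) of consecutive logs carrying the label.
    runs = []
    start = None
    for i, labels in enumerate(log_labels):
        if label in labels:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(log_labels) - 1))

    # Only runs longer than the third threshold count as determinations.
    long_runs = [(s, e) for s, e in runs if e - s + 1 > min_run]

    tagged = []
    for (_, end_j), (start_next, _) in zip(long_runs, long_runs[1:]):
        gap = start_next - end_j - 1
        if gap < max_gap:
            tagged.extend(
                i for i in range(end_j + 1, start_next) if label not in log_labels[i]
            )
    return tagged
```

With three "kids" items, two unlabeled items, and three more "kids" items, `min_run=2` and `max_gap=3` would tag the two items in the middle; tightening `max_gap` to 2 would tag nothing.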
5. The method according to claim 4, characterized in that the method further comprises:
recording, in the multimedia data to which the second-class label information has been added, time information indicating when the second-class label information was added;
deleting the second-class label information from the multimedia data after the time information exceeds a set time threshold.
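The expiry rule in claim 5 amounts to a time-to-live on second-class labels. A minimal sketch, assuming the time information is stored as a timestamp in seconds per label and that "exceeds the time threshold" means the elapsed time since the label was added is greater than `ttl` (both are interpretations, not statements from the claim):

```python
from typing import Dict


def expire_labels(added_at: Dict[str, float], now: float, ttl: float) -> Dict[str, float]:
    """Drop second-class labels whose age exceeds the time threshold `ttl`.

    added_at (hypothetical layout) maps each second-class label to the time it was added.
    """
    return {label: t for label, t in added_at.items() if now - t <= ttl}
```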
6. The method according to claim 1, characterized in that determining the coverage rate between the multimedia data and each branch tree according to the characteristic information of the multimedia data and the characteristic information corresponding to each branch tree of the pre-generated tree comprises:
for the branch tree, determining the number M of items of characteristic information included in the intersection of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree;
determining the number N1 of items of characteristic information included in the union of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree, and determining the coverage rate between the multimedia data and the branch tree according to the ratio of M to N1; or determining the sum N2 of the number of items of characteristic information of the multimedia data and the number of items of characteristic information corresponding to the branch tree, and determining the coverage rate between the multimedia data and the branch tree according to the ratio of M to N2.
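The two coverage-rate variants in claim 6 can be written directly with set operations. The sketch assumes the characteristic information is represented as sets of strings and that the coverage rate equals the ratio itself (the claim only says it is determined "according to" the ratio); the function names are illustrative.

```python
from typing import Set


def coverage_union(item: Set[str], tree: Set[str]) -> float:
    """M / N1: size of the intersection over size of the union."""
    m = len(item & tree)
    n1 = len(item | tree)
    return m / n1 if n1 else 0.0


def coverage_total(item: Set[str], tree: Set[str]) -> float:
    """M / N2: size of the intersection over the sum of both feature counts."""
    m = len(item & tree)
    n2 = len(item) + len(tree)
    return m / n2 if n2 else 0.0
```

For an item with features {a, b, c} and a branch tree with features {b, c, d, e}, M = 2, N1 = 5 and N2 = 7, giving ratios of 2/5 and 2/7. Because N2 counts shared features twice, the second variant never exceeds 0.5, so the first preset threshold would need to be chosen with the variant in mind.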
7. An apparatus for processing multimedia data, characterized in that the apparatus comprises:
a receiving module, configured to receive multimedia data to be processed;
a branch tree determining module, configured to determine a coverage rate between the multimedia data and each branch tree of a pre-generated tree according to characteristic information of the multimedia data and characteristic information corresponding to each branch tree, wherein the coverage rate indicates the degree of similarity between the multimedia data and each branch tree;
a branch determining module, configured to determine a branch tree whose coverage rate is greater than a first preset threshold, and to determine, from the judgment condition branches included in the branch tree, the judgment condition branch that the characteristic information of the multimedia data satisfies;
a label determining module, configured to determine the value of the leaf node in the judgment condition branch as first-class label information of the multimedia data, and to determine, according to network logs of operations performed by a user on multimedia data, second-class label information of the multimedia data operated on by the user.
8. The apparatus according to claim 7, characterized in that the branch determining module is specifically configured to:
match the characteristic information of the multimedia data against the judgment condition of each judgment condition branch in turn, in the priority order of the judgment condition branches;
if at least one item of characteristic information of the multimedia data matches the judgment condition of any judgment condition branch, determine that the characteristic information of the multimedia data satisfies that judgment condition branch.
9. The apparatus according to claim 7, characterized in that the label determining module is further configured to:
determine, among the first-class label information of the multimedia data, the first-class label information that belongs to the same category and is mutually exclusive;
if the number of items of first-class label information that belong to the same category and are mutually exclusive is greater than 1, retain one item of first-class label information among them.
10. The apparatus according to claim 7, characterized in that the branch tree determining module is specifically configured to:
for the branch tree, determine the number M of items of characteristic information included in the intersection of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree;
determine the number N1 of items of characteristic information included in the union of the characteristic information of the multimedia data and the characteristic information corresponding to the branch tree, and determine the coverage rate between the multimedia data and the branch tree according to the ratio of M to N1; or determine the sum N2 of the number of items of characteristic information of the multimedia data and the number of items of characteristic information corresponding to the branch tree, and determine the coverage rate between the multimedia data and the branch tree according to the ratio of M to N2.
CN201610601570.5A 2016-07-27 2016-07-27 Method and apparatus for processing multimedia data Active CN106294563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610601570.5A CN106294563B (en) 2016-07-27 2016-07-27 Method and apparatus for processing multimedia data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610601570.5A CN106294563B (en) 2016-07-27 2016-07-27 Method and apparatus for processing multimedia data

Publications (2)

Publication Number Publication Date
CN106294563A CN106294563A (en) 2017-01-04
CN106294563B true CN106294563B (en) 2019-09-17

Family

ID=57662641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610601570.5A Active CN106294563B (en) 2016-07-27 2016-07-27 Method and apparatus for processing multimedia data

Country Status (1)

Country Link
CN (1) CN106294563B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108536787A (en) * 2018-03-29 2018-09-14 优酷网络技术(北京)有限公司 content identification method and device
CN109739955A (en) * 2019-01-24 2019-05-10 北京诸葛找房信息技术有限公司 Source of houses label automatic extracting device and its method based on participle with multimode matching
CN112395261A (en) * 2019-08-16 2021-02-23 中国移动通信集团浙江有限公司 Service recommendation method and device, computing equipment and computer storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317891A (en) * 2014-10-23 2015-01-28 华为软件技术有限公司 Method and device for tagging pages
CN104794179A (en) * 2015-04-07 2015-07-22 无锡天脉聚源传媒科技有限公司 Video quick indexing method and device based on knowledge tree

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8200669B1 (en) * 2008-08-21 2012-06-12 Adobe Systems Incorporated Management of smart tags via hierarchy
CN101894125B (en) * 2010-05-13 2012-05-09 复旦大学 Content-based video classification method
CN102262659B (en) * 2011-07-15 2013-08-21 北京航空航天大学 Audio label disseminating method based on content calculation
CN104657422B (en) * 2015-01-16 2018-05-15 北京邮电大学 A kind of content issue intelligent method for classifying based on categorised decision tree
CN105072460B (en) * 2015-07-15 2018-08-07 中国科学技术大学先进技术研究院 A kind of information labeling and correlating method based on video content element, system and equipment

Also Published As

Publication number Publication date
CN106294563A (en) 2017-01-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant