CN108629358A - Method and device for predicting object categories - Google Patents
Method and device for predicting object categories
- Publication number
- CN108629358A (application CN201710179031.1A)
- Authority
- CN
- China
- Prior art keywords
- class
- label
- object set
- prediction
- unknown
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present application provide a method and device for predicting object categories. The method includes: obtaining feature data for each object in a given object set, together with inter-object relationship data; obtaining a predicted label for each object from the feature data and the relationship data; obtaining a label transition matrix for a first known-category object set from the original labels and predicted labels of the known-category objects; sampling the unknown-category object set n times to obtain n sample sets, and combining the information of each sample set with the information of the known-category objects in the given object set to obtain n combined data sets; processing each combined data set together with the label transition matrix to obtain n classification prediction models and n updated label transition matrices; and obtaining a category prediction result for any unknown-category object A_i in the given object set from the feature data of A_i, the n classification prediction models, and the n updated label transition matrices.
Description
Technical field
The embodiments of the present application relate to the technical field of data processing, and in particular to a method and device for predicting object categories.
Background
Machine learning is a multidisciplinary field that has emerged over roughly the past 20 years, drawing on probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and other disciplines. Machine learning theory is mainly concerned with designing and analyzing algorithms that let computers "learn" automatically: such an algorithm extracts patterns from data and uses them to make predictions about unknown data. The learning methods in machine learning currently fall into three main classes: supervised learning, unsupervised learning, and semi-supervised learning.
In the prior art, when machine learning methods are used to predict the categories of unlabeled samples, the accuracy of the prediction results is low. How to predict categories accurately with machine learning methods has therefore become a pressing problem for those skilled in the art.
Summary
To solve the above problem, embodiments of the present application provide a method and device for predicting object categories.
Specifically, the embodiments of the present application are implemented through the following technical solutions:
According to a first aspect of the embodiments of the present application, a method for predicting object categories is provided, for predicting the categories of unknown-category objects in a given object set. The method includes:
For the given object set, obtaining feature data of each object in the given object set and inter-object relationship data, where the given object set includes known-category objects and unknown-category objects, and each known-category object has an original label indicating its category;
Obtaining a predicted label for each object in the given object set from the feature data of each object and the inter-object relationship data, using a label propagation algorithm;
Obtaining a label transition matrix for a first known-category object set from the original labels and predicted labels of the known-category objects, where the first known-category object set includes some or all of the known-category objects in the given object set, and the label transition matrix represents, for each category in the first known-category object set, the probability that an original label of that category changes to each predicted label;
Sampling the unknown-category object set n times to obtain n sample sets, and combining the information of each sample set with the information of the known-category objects in the given object set to obtain n combined data sets, where the unknown-category object set includes all unknown-category objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sample sets are mutually disjoint;
For each combined data set, processing the combined data set and the label transition matrix of the first known-category object set with a label-noise-robust classification algorithm, to obtain n classification prediction models and n updated label transition matrices;
Obtaining a category prediction result for any unknown-category object A_i in the given object set from the feature data of A_i, the n classification prediction models, and the n updated label transition matrices.
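The text does not fix how the label transition matrix in the steps above is estimated. A minimal sketch, assuming the predicted class of each known-category object is the argmax of its propagated label vector and that the matrix is a row-normalized count table, so that T[i][j] approximates the probability that original class i appears as predicted class j. The function name `label_transition_matrix`, the input layout, and the toy values are hypothetical.

```python
def label_transition_matrix(original, predicted, num_classes):
    """Estimate T[i][j] ~ P(predicted class j | original class i).

    original:  original class index of each known-category object
    predicted: propagated label vector of the same objects, in the same order
    """
    counts = [[0] * num_classes for _ in range(num_classes)]
    for y, p in zip(original, predicted):
        if not any(p):  # no labelled neighbour reached this object: skip it
            continue
        j = max(range(num_classes), key=lambda k: p[k])  # assumed: argmax class
        counts[y][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Classes with no counted objects keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy data (made up): three known objects of class 2, one of class 3.
T = label_transition_matrix(
    original=[2, 2, 2, 3],
    predicted=[[0, 0, 0.8, 0], [0, 0, 0.9, 0.1], [0, 0.5, 0.2, 0], [0, 0, 0.1, 0.7]],
    num_classes=4,
)
```

Here two of the three class-2 objects keep class 2 after propagation and one drifts to class 1, so row 2 of T is roughly [0, 1/3, 2/3, 0]. A soft estimate that averages the propagated vectors per original class would be an equally plausible reading.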
In the embodiments of the present application, obtaining the predicted label of each object in the given object set from the feature data of each object and the inter-object relationship data, using a label propagation algorithm, includes:
Representing the feature data of each object in the given object set as a feature row vector;
Calculating, from the feature row vectors and the inter-object relationship data, the cosine similarity between the two feature row vectors of every pair of objects in the given object set that have a direct relationship;
Passing, according to the cosine similarities and the inter-object relationship data, the original labels of the known-category objects in the given object set to each object in the given object set, to obtain the predicted label of each object in the given object set.
In the embodiments of the present application, sampling the unknown-category object set n times to obtain n sample sets, and combining the information of each sample set with the information of the known-category objects in the given object set to obtain n combined data sets, includes:
Sampling the unknown-category object set S three times, drawing 30% of its objects each time, to obtain three sample sets M1, M2, and M3;
Combining sample set M1 with a second known-category object set D to obtain combined data set F1 = {M1, D}, where the second known-category object set D includes all known-category objects in the given object set;
Combining sample set M2 with the second known-category object set D to obtain combined data set F2 = {M2, D}; and
Combining sample set M3 with the second known-category object set D to obtain combined data set F3 = {M3, D}.
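A minimal sketch of the sampling and combination above, assuming sampling without replacement from one shuffle, so that the three 30% samples are mutually disjoint as the method requires when n > 1. The helper names `disjoint_samples` and `combine`, and the dict layout used for each combined set, are illustrative assumptions rather than part of the source.

```python
import random


def disjoint_samples(unknown_objects, n=3, fraction=0.3, seed=None):
    """Draw n mutually disjoint samples, each holding `fraction` of the objects."""
    if n * fraction > 1:
        raise ValueError("n disjoint samples of this size cannot fit in the set")
    rng = random.Random(seed)
    shuffled = list(unknown_objects)
    rng.shuffle(shuffled)
    # The small epsilon guards against floating-point error in len * fraction.
    size = int(len(shuffled) * fraction + 1e-9)
    # Consecutive, non-overlapping slices of one shuffle are disjoint by construction.
    return [shuffled[k * size:(k + 1) * size] for k in range(n)]


def combine(samples, known_objects):
    """Build F_k = {M_k, D}: each sample paired with the full known-category set D."""
    return [{"M": sample, "D": list(known_objects)} for sample in samples]


S = [f"u{i}" for i in range(10)]  # unknown-category objects (toy ids)
D = ["A"]                         # all known-category objects
M1, M2, M3 = disjoint_samples(S, seed=0)
F1, F2, F3 = combine([M1, M2, M3], D)
```

With 10 unknown objects each sample holds 3 of them and one object is left unused; every combined set carries the whole of D, matching F1 = {M1, D} through F3 = {M3, D}.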
In the embodiments of the present application, obtaining the category prediction result of the unknown-category object A_i from the feature data of any unknown-category object A_i in the given object set, the n classification prediction models, and the n updated label transition matrices includes:
Inputting the feature data of any unknown-category object A_i in the given object set into a of the n classification prediction models to obtain a first-type multi-class probability vectors, and inputting the predicted label of A_i into b of the n updated label transition matrices to obtain b second-type multi-class probability vectors, where a and b are preset values not greater than n;
Obtaining the category prediction result of the unknown-category object A_i from the obtained first-type and second-type multi-class probability vectors.
In the embodiments of the present application, obtaining the category prediction result of A_i from the obtained first-type and second-type multi-class probability vectors includes:
Averaging the a first-type multi-class probability vectors and the b second-type multi-class probability vectors to obtain a target multi-class probability vector;
Determining the category corresponding to the largest value in the target multi-class probability vector as the category of the unknown-category object A_i.
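The source does not say how a predicted label is "input into" an updated label transition matrix. A sketch, assuming it means multiplying the predicted label (as a row vector) by the matrix, followed by the element-wise averaging and maximum described above. `second_type_vector` and `predict_category` are hypothetical names, and all numbers are made up. Because a propagated label need not sum to 1, the averaged vector is better read as a score than as a strict distribution.

```python
def second_type_vector(pred_label, transition):
    """Assumed reading: q = p . T, i.e. q[j] = sum_i p[i] * T[i][j]."""
    num_classes = len(transition[0])
    return [sum(pred_label[i] * transition[i][j] for i in range(len(pred_label)))
            for j in range(num_classes)]


def predict_category(first_type, second_type, categories):
    """Average the a + b vectors element-wise and pick the category at the maximum."""
    vectors = list(first_type) + list(second_type)
    target = [sum(column) / len(vectors) for column in zip(*vectors)]
    best = max(range(len(target)), key=lambda j: target[j])
    return categories[best], target


categories = ["post-60s", "post-70s", "post-80s", "post-90s"]
first = [[0.1, 0.2, 0.6, 0.1], [0.0, 0.3, 0.5, 0.2]]  # outputs of a = 2 models (made up)
T = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0.9, 0.1], [0, 0, 0, 1]]  # one updated matrix (made up)
q = second_type_vector([0, 0, 0.8, 0], T)  # b = 1 second-type vector
label, target = predict_category(first, [q], categories)
```

With these inputs q is about [0, 0, 0.72, 0.08], the averaged score for post-80s is about 0.61, and the predicted category is post-80s.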
In the embodiments of the present application, the objects are users, and the category includes the user's age bracket, preferred mode of travel, travel time period, consumption level, or consumption tendency.
In the embodiments of the present application, the objects are users, and the feature data include the user's historical travel location information or the applications installed on the user's terminal device.
In the embodiments of the present application, the objects are users, and the inter-object relationship data include data describing red-packet-giving relationships between users, or data describing friend relationships between users.
According to a second aspect of the embodiments of the present application, a device for predicting object categories is provided, for predicting the categories of unknown-category objects in a given object set. The device includes:
A set data acquisition module, configured to obtain, for the given object set, feature data of each object in the given object set and inter-object relationship data, where the given object set includes known-category objects and unknown-category objects, and each known-category object has an original label indicating its category;
A predicted label acquisition module, configured to obtain the predicted label of each object in the given object set using a label propagation algorithm, from the feature data of each object and the inter-object relationship data obtained by the set data acquisition module;
A label transition matrix acquisition module, configured to obtain a label transition matrix for a first known-category object set from the original labels and predicted labels of the known-category objects, where the first known-category object set includes some or all of the known-category objects in the given object set, and the label transition matrix represents, for each category in the first known-category object set, the probability that an original label of that category changes to each predicted label;
A combination module, configured to sample the unknown-category object set n times to obtain n sample sets, and to combine the information of each sample set with the information of the known-category objects in the given object set to obtain n combined data sets, where the unknown-category object set includes all unknown-category objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sample sets are mutually disjoint;
A training module, configured to process, for each combined data set obtained by the combination module, the combined data set and the label transition matrix of the first known-category object set with a label-noise-robust classification algorithm, to obtain n classification prediction models and n updated label transition matrices;
A category prediction module, configured to obtain a category prediction result for any unknown-category object A_i in the given object set from the feature data of A_i, the n classification prediction models, and the n updated label transition matrices.
In the embodiments of the present application, the predicted label acquisition module includes:
A vector representation submodule, configured to represent the feature data of each object in the given object set as a feature row vector;
A similarity calculation submodule, configured to calculate, from the feature row vectors obtained by the vector representation submodule and the inter-object relationship data, the cosine similarity between the two feature row vectors of every pair of objects in the given object set that have a direct relationship;
A label propagation submodule, configured to pass, according to the cosine similarities calculated by the similarity calculation submodule and the inter-object relationship data, the original labels of the known-category objects in the given object set to each object in the given object set, to obtain the predicted label of each object in the given object set.
In the embodiments of the present application, the combination module includes:
A sampling submodule, configured to sample the unknown-category object set S three times, drawing 30% of its objects each time, to obtain three sample sets M1, M2, and M3;
A combination submodule, configured to combine sample set M1 with a second known-category object set D to obtain combined data set F1 = {M1, D}, where the second known-category object set D includes all known-category objects in the given object set;
to combine sample set M2 with the second known-category object set D to obtain combined data set F2 = {M2, D}; and
to combine sample set M3 with the second known-category object set D to obtain combined data set F3 = {M3, D}.
In the embodiments of the present application, the category prediction module includes:
A multi-class probability vector acquisition submodule, configured to input the feature data of any unknown-category object A_i in the given object set into a of the n classification prediction models to obtain a first-type multi-class probability vectors, and to input the predicted label of A_i into b of the n updated label transition matrices to obtain b second-type multi-class probability vectors, where a and b are preset values not greater than n;
A category prediction submodule, configured to obtain the category prediction result of A_i from the first-type and second-type multi-class probability vectors obtained by the multi-class probability vector acquisition submodule.
In the embodiments of the present application, the category prediction submodule includes:
A multi-class probability vector acquisition unit, configured to average the a first-type multi-class probability vectors and the b second-type multi-class probability vectors to obtain a target multi-class probability vector;
A category prediction unit, configured to determine the category corresponding to the largest value in the target multi-class probability vector as the category of the unknown-category object A_i.
In the embodiments of the present application, the objects are users, and the category includes the user's age bracket, preferred mode of travel, travel time period, consumption level, or consumption tendency.
In the embodiments of the present application, the objects are users, and the feature data include the user's historical travel location information or the applications installed on the user's terminal device.
In the embodiments of the present application, the objects are users, and the inter-object relationship data include data describing red-packet-giving relationships between users, or data describing friend relationships between users.
According to a third aspect of the embodiments of the present application, a computer storage medium is provided, in which program instructions are stored. The program instructions include:
For a given object set, obtaining feature data of each object in the given object set and inter-object relationship data, where the given object set includes known-category objects and unknown-category objects, and each known-category object has an original label indicating its category;
Obtaining a predicted label for each object in the given object set from the feature data of each object and the inter-object relationship data, using a label propagation algorithm;
Obtaining a label transition matrix for a first known-category object set from the original labels and predicted labels of the known-category objects, where the first known-category object set includes some or all of the known-category objects in the given object set, and the label transition matrix represents, for each category in the first known-category object set, the probability that an original label of that category changes to each predicted label;
Sampling the unknown-category object set n times to obtain n sample sets, and combining the information of each sample set with the information of the known-category objects in the given object set to obtain n combined data sets, where the unknown-category object set includes all unknown-category objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sample sets are mutually disjoint;
For each combined data set, processing the combined data set and the label transition matrix of the first known-category object set with a label-noise-robust classification algorithm, to obtain n classification prediction models and n updated label transition matrices;
Obtaining a category prediction result for any unknown-category object A_i in the given object set from the feature data of A_i, the n classification prediction models, and the n updated label transition matrices.
In the embodiments of the present application, semi-supervised learning (label propagation) and supervised learning (classification) can be combined to predict the categories of unknown-category objects. The embodiments feed the predicted labels produced by label propagation into supervised learning, and predict the categories of unknown-category objects by building multiple models through sampling, thereby improving the accuracy and stability of the prediction results.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the embodiments of the present application.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute part of this specification, illustrate embodiments consistent with the present application and, together with the specification, serve to explain the principles of the invention.
Figure 1A is a flowchart of a method for predicting object categories according to an exemplary embodiment of the present application;
Figure 1B is a flowchart of an implementation of step 120 in Figure 1A according to an exemplary embodiment of the present application;
Fig. 1C is an example diagram of the method shown in Figure 1A according to an exemplary embodiment of the present application;
Fig. 2 is a flowchart of another method for predicting object categories according to an exemplary embodiment of the present application;
Fig. 3 is a block diagram of a device for predicting object categories according to an exemplary embodiment of the present application;
Fig. 4 is a block diagram of another device for predicting object categories according to an exemplary embodiment of the present application;
Fig. 5 is a block diagram of another device for predicting object categories according to an exemplary embodiment of the present application;
Fig. 6 is a block diagram of another device for predicting object categories according to an exemplary embodiment of the present application;
Fig. 7 is a block diagram of another device for predicting object categories according to an exemplary embodiment of the present application;
Fig. 8 is a schematic structural diagram of a device for predicting object categories according to an exemplary embodiment of the present application.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the embodiments of the present application; rather, they are merely examples of devices and methods consistent with some aspects of the embodiments of the present application, as detailed in the appended claims.
The terms used in the embodiments of the present application are for the purpose of describing particular embodiments only and are not intended to limit the embodiments of the present application. The singular forms "a", "said", and "the" used in the embodiments of the present application and in the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the embodiments of the present application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
Machine learning is a multidisciplinary field that has emerged over roughly the past 20 years, drawing on probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and other disciplines. Machine learning theory is mainly concerned with designing and analyzing algorithms that let computers "learn" automatically: such an algorithm extracts patterns from data and uses them to make predictions about unknown data. The learning methods in machine learning currently fall into three main classes: supervised learning, unsupervised learning, and semi-supervised learning. In the prior art, when machine learning methods are used to predict the categories of unlabeled samples, the accuracy of the prediction results is low. To solve this problem, embodiments of the present application provide a method and device for predicting object categories.
The method for predicting object categories provided by the embodiments of the present application is introduced first.
As shown in Figure 1A, which is a flowchart of a method for predicting object categories according to an exemplary embodiment of the present application, the method is used to predict the categories of unknown-category objects in a given object set and may include the following steps:
In step 110, for the given object set, the feature data of each object in the given object set and the inter-object relationship data are obtained, where the given object set includes known-category objects and unknown-category objects, and each known-category object has an original label indicating its category.
In the embodiments of the present application, the given object set includes multiple objects. In practical applications an object may be a user or an event; the embodiments of the present application do not limit this.
In the embodiments of the present application, inter-object relationship data are data describing the associations between objects in the given object set. For example, when the objects are users, the relationship data may describe friend relationships between users, specifically user friend-relationship data from network applications (such as social, travel, and video applications); alternatively, they may describe red-packet-giving relationships between users, specifically red-packet-giving relationship data from network applications. When the objects are events, the relationship data may be event-association data from network applications.
In the embodiments of the present application, when the objects are users, the category may include the user's age bracket, preferred mode of travel, travel time period, consumption level, or consumption tendency; when the objects are events, the category may be the probability that the event occurs. The embodiments of the present application do not limit this.
In the embodiments of the present application, when the objects are users, the feature data of an object may include the user's historical travel location information or the applications installed on the user's terminal device. The historical travel location information may be obtained from a travel application with which the user is registered; for a terminal device running the Android system, the installed applications may be obtained from the terminal device used by the user.
In the embodiments of the present application, known-category objects have original labels, and unknown-category objects do not.
For example, suppose we have user A, A's QQ friends (for example users B, C, and D), and the friend-relationship data between them; A's age bracket is known and the age brackets of A's QQ friends are unknown. The age brackets of A's QQ friends are then predicted from A's age bracket and the friend-relationship data. In this case, the given object set includes four objects (users A, B, C, and D), the inter-object relationship data are the friend-relationship data between user A and A's QQ friends, user A is a known-category object, users B, C, and D are unknown-category objects, and the category is the user's age bracket.
In step 120, the predicted label of each object in the given object set is obtained from the feature data of each object and the inter-object relationship data, using a label propagation algorithm.
In the embodiments of the present application, after the feature data and the inter-object relationship data of the objects in the given object set are processed with the label propagation algorithm, each known-category object in the given object set has both an original label and a predicted label, while each unknown-category object has only a predicted label.
In an optional implementation provided by the embodiments of the present application, as shown in Figure 1B, step 120 may include the following steps:
In step 121, the feature data of each object in the given object set are represented as a feature row vector.
In the embodiments of the present application, a feature row vector is a vectorization of the feature data. Each element of the feature row vector corresponds to one feature of the object and takes the value 1 or 0: 1 means the object has the feature, and 0 means it does not.
In step 122, the cosine similarity between the two feature row vectors of every pair of objects in the given object set that have a direct relationship is calculated from the feature row vectors and the inter-object relationship data.
In the embodiments of the present application, all objects in the given object set that have direct relationships can be determined from the inter-object relationship data. Taking the red-packet-giving relationship as an example, a direct relationship means that one user has directly given a red packet to the other.
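The pairwise computation in step 122 can be sketched as follows, assuming the direct relationships are supplied as an edge list of object pairs (for example, one pair per red-packet-giving relationship) and the feature rows as 0/1 lists. Only pairs on the edge list are scored; `cosine` and `edge_similarities` are illustrative names, not taken from the source.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def edge_similarities(features, edges):
    """features: object -> 0/1 feature row; edges: (x, y) pairs with a direct relationship."""
    return {(x, y): cosine(features[x], features[y]) for x, y in edges}


features = {"A": [1, 1, 0, 0], "B": [1, 0, 0, 0], "C": [0, 0, 1, 1]}  # toy rows
sims = edge_similarities(features, [("A", "B"), ("A", "C")])
```

A and B share one of A's two visited features, giving about 0.707; A and C share none, giving 0.0.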
In step 123, the original labels of the known-category objects in the given object set are passed to each object in the given object set according to the calculated cosine similarities and the inter-object relationship data, yielding the predicted label of each object in the given object set.
Steps 121 to 123 are illustrated below using age-bracket prediction for users in a travel scenario.
Suppose the inter-object relationship data are red-packet-giving relationships; that is, a relational network of multiple users is built from the red packets sent between pairs of users. The network includes four passengers, A, B, C, and D. Passenger A has provided an age bracket, while passengers B, C, and D have not.
The feature data are the historical travel location information of passengers A, B, C, and D. Taking Beijing as an example, the city can be divided into 100 travel locations. When the feature data are represented as a feature row vector, the element for a travel location is 1 if the passenger has been to that location and 0 otherwise. For example, passenger A's feature row vector is [0,0,1 … 1,0]. It contains 100 elements, each corresponding to one travel location: 1 means passenger A has been to that location, and 0 means passenger A has not.
Take the prediction of four age brackets as an example: post-60s, post-70s, post-80s, and post-90s (born in the 1960s, 1970s, 1980s, and 1990s respectively). Passenger A's age bracket is known to be post-80s, so A's label probability matrix is [0,0,1,0]. Its first element is the probability that A is post-60s (0), the second the probability that A is post-70s (0), the third the probability that A is post-80s (1), and the fourth the probability that A is post-90s (0).
As above, the feature row vector of passenger A is [0, 0, 1, …, 1, 0]; likewise, passengers B, C and D each have a feature row vector. The closeness between every two passengers that have a direct red-packet-giving relationship is then calculated, where closeness is defined as the cosine similarity of their two feature row vectors.
For example, passengers A and B have a direct red-packet-giving relationship. Calculating the cosine similarity between the feature row vectors of passenger A and passenger B gives 0.8, so the label probability matrix of passenger B is [0, 0, 0.8, 0]. That is, the prediction label of passenger B is: probability 0 of being post-60s, 0 of being post-70s, 0.8 of being post-80s, and 0 of being post-90s. The prediction labels of passengers C and D can be calculated in the same way.
It should be noted that every object in the given object set, whether or not it has an original label, undergoes label propagation and obtains a corresponding prediction label.
In addition to the label propagation algorithm in the above embodiment, the embodiments of the present application may use any label propagation algorithm in the related art to obtain the prediction label of each object; the embodiments of the present application do not limit this.
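As one example of a label propagation algorithm from the related art, the sketch below uses the classic iterative scheme: propagate labels over a row-normalised similarity graph and re-clamp the known labels after each round. This is a swapped-in technique, not necessarily the one used in the embodiment. Returning a neighbour-only estimate, so that known-class objects also receive a prediction label that can differ from their original label, is likewise an assumption.

```python
import numpy as np


def propagate_labels(W, Y, known, n_iter=50):
    """Iterative label propagation on a similarity graph.

    W     : (N, N) symmetric cosine similarities; W[i, j] > 0 only for
            directly related objects; zero diagonal.
    Y     : (N, K) label probability vectors; one-hot rows for known-class
            objects, zero rows for unknown-class objects.
    known : (N,) boolean mask of known-class objects.
    Returns an (N, K) prediction label for every object, computed from its
    neighbours only, so known-class objects also get a prediction label.
    """
    W = np.asarray(W, dtype=float)
    Y = np.asarray(Y, dtype=float)
    deg = W.sum(axis=1, keepdims=True)
    # Row-normalise; isolated objects keep an all-zero row.
    P = np.divide(W, deg, out=np.zeros_like(W), where=deg > 0)
    F = Y.copy()
    for _ in range(n_iter):
        F = P @ F
        F[known] = Y[known]  # clamp the original labels of known objects
    return P @ F


# Chain A - B - C - D; only A is known (post-80s, class index 2).
W = np.array([[0.0, 0.8, 0.0, 0.0],
              [0.8, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.9],
              [0.0, 0.0, 0.9, 0.0]])
Y = np.zeros((4, 4))
Y[0, 2] = 1.0
known = np.array([True, False, False, False])
pred = propagate_labels(W, Y, known)
```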
In step 130, according to the original labels and prediction labels of the known-class objects, the label variation matrix of the first known-class object set is obtained. The first known-class object set includes some or all of the known-class objects in the given object set, and the label variation matrix indicates the probability that each class of original label in the first known-class object set varies to each prediction label.
It should be noted that, for ease of understanding and description, the above example uses only a small number of objects. In practical applications, the given object set generally contains a large number of objects. To improve computational efficiency, in the embodiments of the present application the first known-class object set can be split off from the given object set according to a certain percentage, i.e., the first known-class object set contains part, rather than all, of the known-class objects in the given object set. In this case, step 130 may specifically include:
According to a preset percentage, splitting the first known-class object set off from the given object set, where the number of objects in the first known-class object set = the total number of known-class objects in the given object set × the preset percentage.
For example, if the preset percentage is 10% and the given object set contains 1000 known-class objects, the first known-class object set contains 100 objects.
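The split rule above amounts to the following Python sketch. Random selection without replacement, the fixed seed and the function name are assumptions, since the text does not say how the subset is chosen.

```python
import numpy as np


def split_first_known_set(known_ids, preset_pct, seed=0):
    """Randomly draw the first known-class object set, with
    size = total known-class objects * preset percentage (rounded)."""
    known_ids = np.asarray(known_ids)
    k = int(round(len(known_ids) * preset_pct))
    rng = np.random.default_rng(seed)
    return rng.choice(known_ids, size=k, replace=False)


# 1000 known-class objects and a 10% preset percentage -> 100 objects.
first_known = split_first_known_set(np.arange(1000), 0.10)
```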
In the embodiments of the present application, each object in the first known-class object set has both an original label and a prediction label. The label variation matrix of the first known-class object set can be calculated with methods in the prior art: first, calculate for each object in the first known-class object set the probability that its original label varies to its prediction label; then, from these per-object probabilities, calculate the label variation matrix of the first known-class object set.
For ease of understanding, take predicting the four age brackets post-60s, post-70s, post-80s and post-90s as an example: the label variation matrix is then a 4 × 4 matrix.
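The two-stage computation can be sketched in Python as follows, with entry T[i, j] read as the probability that original label i varies to prediction label j. Normalising each soft prediction label into a distribution, averaging these per original class, and falling back to an identity row for a class with no objects are all assumptions; the text only says that existing methods may be used.

```python
import numpy as np


def label_variation_matrix(original, predicted, n_classes):
    """Estimate T[i, j] = P(prediction label j | original label i).

    original  : (N,) integer original labels of the first known-class set.
    predicted : (N, K) prediction labels from label propagation.
    """
    original = np.asarray(original)
    predicted = np.asarray(predicted, dtype=float)
    totals = predicted.sum(axis=1)
    valid = totals > 0  # skip objects that received no label mass
    # Per-object probability of varying to each class.
    dist = predicted[valid] / totals[valid][:, None]
    T = np.eye(n_classes)  # fallback row for classes with no objects
    for i in range(n_classes):
        rows = dist[original[valid] == i]
        if len(rows):
            T[i] = rows.mean(axis=0)
    return T


# Illustrative 4-class example (post-60s .. post-90s -> indices 0 .. 3).
T = label_variation_matrix(
    [2, 2, 3, 0],
    [[0, 0, 0.8, 0], [0, 0, 0.5, 0.5], [0, 0, 0, 1], [1, 0, 0, 0]],
    n_classes=4,
)
```

Here row 2 averages [0, 0, 1, 0] and [0, 0, 0.5, 0.5], giving [0, 0, 0.75, 0.25].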
In step 140, the unknown-class object set is sampled n times to obtain n sample sets, and the information of each sample set is combined with the information of the known-class objects in the given object set to obtain n combined data sets. The unknown-class object set includes all of the unknown-class objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sample sets are mutually disjoint.
In the embodiments of the present application, n may be 2, 3, 4, 5, etc., and no two sample sets share any object.
To make full use of the unknown-class objects in the unknown-class object set while balancing the use of computing resources, in a preferred embodiment, step 140 may include steps S10, S11, S12 and S13:
In S10, the unknown-class object set S is sampled three times, extracting 30% of the objects each time, to obtain three sample sets M1, M2 and M3;
In S11, sample set M1 is combined with the second known-class object set D to obtain combined data set F1 = {M1, D}, where the second known-class object set D includes all of the known-class objects in the given object set;
In S12, sample set M2 is combined with the second known-class object set D to obtain combined data set F2 = {M2, D};
In S13, sample set M3 is combined with the second known-class object set D to obtain combined data set F3 = {M3, D}.
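Steps S10 to S13 generalise to any n. The Python sketch below assumes objects are represented by integer IDs and that sampling is uniformly random (the text does not fix the sampling scheme); the function name is hypothetical. Taking consecutive slices of one random permutation is a simple way to guarantee that the sample sets are mutually disjoint.

```python
import numpy as np


def build_combined_sets(unknown_ids, known_ids, n=3, frac=0.3, seed=0):
    """Draw n mutually disjoint samples M_1..M_n, each `frac` of the
    unknown-class object set S, and pair each with the known set D."""
    unknown_ids = np.asarray(unknown_ids)
    size = int(round(len(unknown_ids) * frac))
    if n * size > len(unknown_ids):
        raise ValueError("n * frac must not exceed 1 for disjoint samples")
    perm = np.random.default_rng(seed).permutation(unknown_ids)
    samples = [perm[k * size:(k + 1) * size] for k in range(n)]  # M_1..M_n
    D = np.asarray(known_ids)
    return [np.concatenate([M, D]) for M in samples]  # F_k = {M_k, D}


# 100 unknown-class objects (IDs 0-99) and 20 known-class objects (100-119).
F = build_combined_sets(np.arange(100), np.arange(100, 120))
```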
In step 150, for each combined data set, a label-noise-robust classification algorithm is used to process the combined data set and the label variation matrix of the first known-class object set, yielding n classification prediction models and n updated label variation matrices.
In the embodiments of the present application, the label-noise-robust classification algorithm is highly tolerant of uncertainty in the prediction labels; it may be, for example, a label-noise-robust multiclass logistic regression (rmLR) algorithm. In addition, to address the uncertainty of the label propagation results, the stability of the prediction results is improved by building multiple models through sampling.
In the embodiments of the present application, any label-noise-robust multiclass logistic regression (rmLR) algorithm in the related art may be used to perform multiclass training on the combined data set and the label variation matrix of the first known-class object set; the training itself can be implemented as a Spark application.
Specifically, the output classifier is compatible with logistic regression, where x_q is the feature row vector of object q and w_k is the classifier for class k.
The objective function is then defined, in which p(y = k | x_n, w_k) is modeled by a softmax function.
The objective function is then optimized with L-BFGS while the label variation matrix is held fixed; Spark's L-BFGS requires an implementation of Gradient that returns both the gradient and the objective function value.
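The formulas for this step are not reproduced in this text. As a hedged illustration only, the sketch below implements the standard label-noise-robust multiclass logistic regression objective from the literature: an observed (propagated) label ỹ_n is modelled as p(ỹ_n | x_n) = Σ_j T[j, ỹ_n] · softmax(x_n W)_j, with the label variation matrix T held fixed. The negative log-likelihood and its gradient are returned together, mirroring what a Spark Gradient implementation supplies to L-BFGS. SciPy's L-BFGS-B stands in for Spark here; using the arg-max of each prediction label as ỹ_n, the L2 penalty `reg`, and all function names are assumptions, and the update of T is not shown.

```python
import numpy as np
from scipy.optimize import minimize


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def rmlr_loss_and_grad(w_flat, X, y_obs, T):
    """Negative log-likelihood of observed labels under a fixed label
    variation matrix T (T[j, k] = P(observed k | true j)) and its gradient.
    Assumes every column of T used by y_obs has a positive entry."""
    d, K = X.shape[1], T.shape[0]
    W = w_flat.reshape(d, K)
    P = softmax(X @ W)          # P[n, j] = p(y = j | x_n, w_j)
    Q = P * T[:, y_obs].T       # Q[n, j] = P[n, j] * T[j, y_obs[n]]
    lik = Q.sum(axis=1)         # p(observed label | x_n)
    loss = -np.log(lik).sum()
    R = Q / lik[:, None]        # posterior over the true class
    grad = X.T @ (P - R)        # d loss / d W
    return loss, grad.ravel()


def fit_rmlr(X, y_obs, T, reg=1e-3):
    """Fit weights W (d x K) with L-BFGS while T stays fixed."""
    d, K = X.shape[1], T.shape[0]

    def objective(w):
        loss, g = rmlr_loss_and_grad(w, X, y_obs, T)
        return loss + 0.5 * reg * (w @ w), g + reg * w

    res = minimize(objective, np.zeros(d * K), jac=True, method="L-BFGS-B")
    return res.x.reshape(d, K)


def predict_proba(W, X):
    return softmax(X @ W)
```

With T equal to the identity matrix the objective reduces to ordinary multinomial logistic regression, which is a convenient sanity check.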
Corresponding to the preferred embodiment of step 140, three classification prediction models and three updated label variation matrices can be obtained by training.
In step 160, the class prediction result of any unknown-class object A_i in the given object set is obtained according to the feature data of A_i, the n classification prediction models and the n updated label variation matrices.
Taking passenger age-bracket prediction as an example, the basic information collected about passengers is very limited; in many cases only a mobile phone number is available, and passengers of known class (i.e., with a known age bracket) are very few relative to the total number of passengers. The method provided by the embodiments of the present application makes full use of the relationship network data generated when passengers share red packets, effectively enlarging the training set and thereby improving prediction accuracy.
For ease of understanding, the technical solution of the embodiments of the present application is described with reference to the example shown in Fig. 1C. As shown in Fig. 1C, the given object set includes known-class objects and unknown-class objects; known-class objects have original labels and unknown-class objects do not. After label propagation over the known-class and unknown-class objects, known-class objects have both original labels and prediction labels, while unknown-class objects have only prediction labels. The known-class objects are processed to obtain the label variation matrix of the known-class object set. The known-class and unknown-class objects are sampled and combined n times to obtain combined data set 1, combined data set 2, …, combined data set n. Training on the label variation matrix of the known-class object set together with combined data set 1 yields classification prediction model 1 and updated label variation matrix 1; training on it together with combined data set 2 yields classification prediction model 2 and updated label variation matrix 2; and training on it together with combined data set n yields classification prediction model n and updated label variation matrix n. Finally, the feature data of any unknown-class object are input into the above classification prediction models and updated label variation matrices to obtain the class prediction result.
As can be seen from the above embodiment, the method combines semi-supervised learning (label propagation) with supervised learning (classification) to predict the classes of unknown-class objects. The embodiments of the present application effectively incorporate the prediction labels obtained by label propagation into supervised learning, and predict the classes of unknown-class objects by building multiple models through sampling, thereby improving the stability and accuracy of the prediction results.
As shown in Fig. 2, Fig. 2 is a flowchart of another method for predicting object classes according to an exemplary embodiment of the present application. The method may include the following steps:
In step 210, for a given object set, the feature data and object relationship data of each object in the given object set are obtained. The given object set includes known-class objects and unknown-class objects, and each known-class object has an original label indicating its class.
In step 220, according to the feature data and object relationship data of each object in the given object set, a label propagation algorithm is used to obtain the prediction label of each object in the given object set.
In step 230, according to the original labels and prediction labels of the known-class objects, the label variation matrix of the first known-class object set is obtained. The first known-class object set includes some or all of the known-class objects in the given object set, and the label variation matrix indicates the probability that each class of original label in the first known-class object set varies to each prediction label.
In step 240, the unknown-class object set is sampled n times to obtain n sample sets, and the information of each sample set is combined with the information of the known-class objects in the given object set to obtain n combined data sets. The unknown-class object set includes all of the unknown-class objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sample sets are mutually disjoint.
In step 250, for each combined data set, a label-noise-robust classification algorithm is used to process the combined data set and the label variation matrix of the first known-class object set, yielding n classification prediction models and n updated label variation matrices.
Steps 210 to 250 in this embodiment are similar to steps 110 to 150 in the embodiment shown in Fig. 1A and are not repeated here; see the embodiment shown in Fig. 1A for details.
In step 260, the feature data of any unknown-class object A_i in the given object set are input into a of the n classification prediction models to obtain a first-type multiclass probability vectors, and the prediction label of A_i is input into b of the n updated label variation matrices to obtain b second-type multiclass probability vectors, where a and b are preset values not greater than n.
Preferably, in the embodiments of the present application, a and b both equal n, so that the n classification prediction models and n updated label variation matrices generated in step 250 are fully used.
For example, if three classification prediction models and three updated label variation matrices are generated in step 250, then in this step the feature data of A_i are input into the three classification prediction models and the prediction label of A_i is input into the three updated label variation matrices, yielding three first-type and three second-type multiclass probability vectors.
In step 270, the class prediction result of the unknown-class object A_i is obtained according to the obtained first-type and second-type multiclass probability vectors.
In a preferred embodiment, step 270 may include:
Averaging the a first-type multiclass probability vectors and the b second-type multiclass probability vectors to obtain a target multiclass probability vector, and determining the class corresponding to the largest element of the target multiclass probability vector as the class of the unknown-class object A_i.
Alternatively, the target multiclass probability vector may be obtained as a weighted sum of the a first-type and b second-type multiclass probability vectors, and the class corresponding to its largest element is determined as the class of the unknown-class object A_i.
It should be noted that, although the operations of the method of the embodiments of the present application are described in a particular order in the drawings, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed to achieve the desired result. Rather, the steps depicted in the flowcharts may be executed in a different order. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
Corresponding to the foregoing embodiments of the method for predicting object classes, the embodiments of the present application further provide embodiments of an apparatus for predicting object classes.
As shown in Fig. 3, Fig. 3 is a block diagram of an apparatus for predicting object classes according to an exemplary embodiment of the present application. The apparatus is used to predict the classes of unknown-class objects in a given object set and may include:
A set data acquisition module 310, configured to obtain, for the given object set, the feature data and object relationship data of each object in the given object set, where the given object set includes known-class objects and unknown-class objects, and each known-class object has an original label indicating its class;
In the embodiments of the present application, the given object set includes multiple objects; in practical applications, an object may be a user or an event, which the embodiments of the present application do not limit.
In the embodiments of the present application, object relationship data are data describing the associations among the objects in the given object set. For example, when the objects are users, the object relationship data may describe friend relationships among users, specifically user friend-relationship data from network applications (such as social, travel and video applications); alternatively, they may describe red-packet-giving relationships among users, specifically red-packet-giving relationship data from network applications. When the objects are events, the object relationship data may be event association data from network applications.
In the embodiments of the present application, when the objects are users, the class may be the user's age bracket, preferred travel mode, travel time period, consumption level or consumption propensity, etc.; when the objects are events, the class may be the probability that the event occurs. The embodiments of the present application do not limit this.
In the embodiments of the present application, when the objects are users, the feature data of an object may include the user's historical trip location information or the applications installed on the user's terminal device, etc. The user's historical trip location information can be obtained from the travel application the user has registered with; for a terminal device running Android, the installed applications can be obtained from the terminal device used by the user.
In the embodiments of the present application, known-class objects have original labels, and unknown-class objects do not.
For example, consider the friend-relationship data between user A and user A's QQ friends (e.g., users B, C and D). User A's age bracket is known, the age brackets of user A's QQ friends are unknown, and the latter are to be predicted from user A's age bracket and the friend-relationship data. Here the given object set includes the four objects user A, user B, user C and user D; the object relationship data are the friend-relationship data between user A and user A's QQ friends (users B, C and D); user A is a known-class object; users B, C and D are unknown-class objects; and the class is the user's age bracket.
A prediction label acquisition module 320, configured to obtain, using a label propagation algorithm, the prediction label of each object in the given object set according to the feature data and object relationship data of each object in the given object set obtained by the set data acquisition module 310;
In the embodiments of the present application, after the feature data and object relationship data of each object in the given object set are processed with a label propagation algorithm, known-class objects in the given object set have both original labels and prediction labels, while unknown-class objects have only prediction labels.
A label variation matrix acquisition module 330, configured to obtain the label variation matrix of a first known-class object set according to the original labels and prediction labels of the known-class objects, where the first known-class object set includes some or all of the known-class objects in the given object set, and the label variation matrix indicates the probability that each class of original label in the first known-class object set varies to each prediction label;
It should be noted that, for ease of understanding and description, the above example uses only a small number of objects. In practical applications, the given object set generally contains a large number of objects. To improve computational efficiency, in the embodiments of the present application the first known-class object set can be split off from the given object set according to a certain percentage, i.e., the first known-class object set contains part, rather than all, of the known-class objects in the given object set. In this case, the processing may specifically include:
According to a preset percentage, splitting the first known-class object set off from the given object set, where the number of objects in the first known-class object set = the total number of known-class objects in the given object set × the preset percentage.
For example, if the preset percentage is 10% and the given object set contains 1000 known-class objects, the first known-class object set contains 100 objects.
In the embodiments of the present application, each object in the first known-class object set has both an original label and a prediction label. The label variation matrix of the first known-class object set can be calculated with methods in the prior art: first, calculate for each object in the first known-class object set the probability that its original label varies to its prediction label; then, from these per-object probabilities, calculate the label variation matrix of the first known-class object set.
For ease of understanding, take predicting the four age brackets post-60s, post-70s, post-80s and post-90s as an example: the label variation matrix is then a 4 × 4 matrix.
A combination module 340, configured to sample the unknown-class object set n times to obtain n sample sets, and to combine the information of each sample set with the information of the known-class objects in the given object set to obtain n combined data sets, where the unknown-class object set includes all of the unknown-class objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sample sets are mutually disjoint;
In the embodiments of the present application, n may be 2, 3, 4, 5, etc., and no two sample sets share any object.
A training module 350, configured to, for each combined data set obtained by the combination module 340, process the combined data set and the label variation matrix of the first known-class object set using a label-noise-robust classification algorithm, to obtain n classification prediction models and n updated label variation matrices;
In the embodiments of the present application, the label-noise-robust classification algorithm is highly tolerant of uncertainty in the prediction labels; it may be, for example, a label-noise-robust multiclass logistic regression (rmLR) algorithm. In addition, to address the uncertainty of the label propagation results, the stability of the prediction results is improved by building multiple models through sampling.
In the embodiments of the present application, any label-noise-robust multiclass logistic regression (rmLR) algorithm in the related art may be used to perform multiclass training on the combined data set and the label variation matrix of the first known-class object set; the training itself can be implemented as a Spark application.
Specifically, the output classifier is compatible with logistic regression, where x_q is the feature row vector of object q and w_k is the classifier for class k.
The objective function is then defined, in which p(y = k | x_n, w_k) is modeled by a softmax function.
The objective function is then optimized with L-BFGS while the label variation matrix is held fixed; Spark's L-BFGS requires an implementation of Gradient that returns both the gradient and the objective function value.
A class prediction module 360, configured to obtain the class prediction result of any unknown-class object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label variation matrices.
Taking passenger age-bracket prediction as an example, the basic information collected about passengers is very limited; in many cases only a mobile phone number is available, and passengers of known class (i.e., with a known age bracket) are very few relative to the total number of passengers. The method provided by the embodiments of the present application makes full use of the relationship network data generated when passengers share red packets, effectively enlarging the training set and thereby improving prediction accuracy.
As can be seen from the above embodiment, the apparatus combines semi-supervised learning (label propagation) with supervised learning (classification) to predict the classes of unknown-class objects. The embodiments of the present application effectively incorporate the prediction labels obtained by label propagation into supervised learning, and predict the classes of unknown-class objects by building multiple models through sampling, thereby improving the stability and accuracy of the prediction results.
As shown in Fig. 4, Fig. 4 is a block diagram of another apparatus for predicting object classes according to an exemplary embodiment of the present application. On the basis of the embodiment shown in Fig. 3, the prediction label acquisition module 320 may include:
A vector characterization submodule 321, configured to express the feature data of each object in the given object set as a feature row vector;
In the embodiments of the present application, a feature row vector is a vectorization of the feature data: each element corresponds to one feature of the object and takes the value 1 or 0, where 1 means the object has the feature and 0 means it does not.
A similarity calculation submodule 322, configured to calculate, according to the feature row vectors obtained by the vector characterization submodule 321 and the object relationship data, the cosine similarity between the feature row vectors of every pair of objects in the given object set that have a direct relationship;
In the embodiments of the present application, all pairs of objects in the given object set that have a direct relationship can be determined from the object relationship data. Taking red-packet-giving relationships as an example, a direct relationship means that one user directly gave a red packet to the other.
A label propagation submodule 323, configured to propagate, according to the cosine similarities calculated by the similarity calculation submodule 322 and the object relationship data, the original labels of the known-class objects in the given object set to each object in the given object set, to obtain the prediction label of each object in the given object set.
Steps 121 to 123 are illustrated below with the example of predicting users' age brackets in a travel scenario.
Suppose the object relationship data are red-packet-giving relationships, i.e., the red packets sent between pairs of users form a relationship network containing multiple users. The network contains four passengers: passenger A, passenger B, passenger C and passenger D. Passenger A has provided an age bracket; passengers B, C and D have not.
The feature data are the historical trip location information of the four passengers A, B, C and D. Taking Beijing as an example, the city can be divided into 100 trip locations. When the feature data are expressed as a feature row vector, the element for a trip location is 1 if the passenger has been to that location and 0 otherwise. For example, the feature row vector of passenger A is [0, 0, 1, …, 1, 0]; it contains 100 elements, each corresponding to one trip location. An element value of 1 means that passenger A has been to the corresponding trip location, and a value of 0 means that passenger A has not.
Take predicting four age brackets as an example: post-60s, post-70s, post-80s and post-90s (i.e., born in the 1960s, 1970s, 1980s and 1990s). The age bracket of passenger A is known to be post-80s, so the label probability matrix of passenger A is [0, 0, 1, 0]. Its first element is the probability that passenger A is post-60s (0), the second is the probability that passenger A is post-70s (0), the third is the probability that passenger A is post-80s (1), and the fourth is the probability that passenger A is post-90s (0).
As above, the feature row vector of passenger A is [0, 0, 1, …, 1, 0]; likewise, passengers B, C and D each have a feature row vector. The closeness between every two passengers that have a direct red-packet-giving relationship is then calculated, where closeness is defined as the cosine similarity of their two feature row vectors.
For example, passengers A and B have a direct red-packet-giving relationship. Calculating the cosine similarity between the feature row vectors of passenger A and passenger B gives 0.8, so the label probability matrix of passenger B is [0, 0, 0.8, 0]. That is, the prediction label of passenger B is: probability 0 of being post-60s, 0 of being post-70s, 0.8 of being post-80s, and 0 of being post-90s. The prediction labels of passengers C and D can be calculated in the same way.
It should be noted that every object in the given object set, whether or not it has an original label, undergoes label propagation and obtains a corresponding prediction label.
As shown in Fig. 5, Fig. 5 is a block diagram of another apparatus for predicting object classes according to an exemplary embodiment of the present application. On the basis of the embodiment shown in Fig. 3 or Fig. 4, to make full use of the unknown-class objects in the unknown-class object set while balancing the use of computing resources, the combination module 340 may include:
A sampling submodule 341, configured to sample the unknown-class object set S three times, extracting 30% of the objects each time, to obtain three sample sets M1, M2 and M3;
A combination submodule 342, configured to combine sample set M1 with the second known-class object set D to obtain combined data set F1 = {M1, D}, where the second known-class object set D includes all of the known-class objects in the given object set;
to combine sample set M2 with the second known-class object set D to obtain combined data set F2 = {M2, D};
and to combine sample set M3 with the second known-class object set D to obtain combined data set F3 = {M3, D}.
As shown in Fig. 6, Fig. 6 is a block diagram of another object class prediction device according to an exemplary embodiment of the present application. This device builds on any of the embodiments shown in Figs. 3 to 5, and its class prediction module 360 may include:
a multi-class probability vector obtaining submodule 361, configured to:
- input the feature data of any unknown-class object A_i in the given object set into a of the n classification prediction models, obtaining first-type multi-class probability vectors (a in total); and
- input the prediction label of the unknown-class object A_i into b of the n updated label variation matrices, obtaining second-type multi-class probability vectors (b in total).
Here a and b are preset values not greater than n.
Preferably, in the embodiments of the present application, a and b both equal n, so that full use is made of the n classification prediction models and the n updated label variation matrices generated in step 250.
a class prediction submodule 362, configured to obtain the class prediction result of the unknown-class object A_i from the first-type and second-type multi-class probability vectors obtained by the multi-class probability vector obtaining submodule 361.
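The two kinds of multi-class probability vectors can be illustrated as follows. The application does not state how a prediction label is "input into" an updated label variation matrix. This sketch assumes that T[i][j] is the probability that original class i varies into predicted class j. With a uniform prior over classes, it then scores class i as the sum over j of T[i][j] times the j-th entry of the prediction label. Hypothetical stand-in functions replace the trained models.

```python
def normalize(vector):
    """Scale a non-negative vector to sum to 1 (uniform if it sums to 0)."""
    total = sum(vector)
    if total <= 0:
        return [1.0 / len(vector)] * len(vector)
    return [x / total for x in vector]


def second_type_vector(prediction_label, variation_matrix):
    """Class distribution for A_i from its prediction label and one matrix T.

    Assumption: T[i][j] = probability that original class i varies into
    predicted class j; with a uniform prior, class i scores sum_j T[i][j] * y[j].
    """
    scores = [sum(t * y for t, y in zip(row, prediction_label)) for row in variation_matrix]
    return normalize(scores)


def collect_vectors(features, prediction_label, models, matrices, a, b):
    """Return a first-type vectors (from models) and b second-type vectors (from matrices)."""
    if not (0 <= a <= len(models) and 0 <= b <= len(matrices)):
        raise ValueError("a and b must be between 0 and n")
    first = [normalize(model(features)) for model in models[:a]]
    second = [second_type_vector(prediction_label, T) for T in matrices[:b]]
    return first, second


# Hypothetical stand-ins for two trained classification prediction models.
models = [lambda x: [0.7, 0.3], lambda x: [0.6, 0.4]]
matrices = [[[0.9, 0.1], [0.2, 0.8]], [[0.8, 0.2], [0.1, 0.9]]]
first, second = collect_vectors([1.0, 2.0], [0.0, 1.0], models, matrices, a=2, b=2)
```

Under this assumption, a prediction label concentrated on class j favors the original classes that most often vary into j.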
As shown in Fig. 7, Fig. 7 is a block diagram of another object class prediction device according to an exemplary embodiment of the present application. This device builds on the embodiment shown in Fig. 6, and its class prediction submodule 362 may include:
a target multi-class probability vector obtaining unit 3621, configured to average the a first-type multi-class probability vectors and the b second-type multi-class probability vectors to obtain a target multi-class probability vector; and
a class prediction unit 3622, configured to determine the class corresponding to the label with the largest value in the target multi-class probability vector as the class of the unknown-class object A_i.
Alternatively, the target multi-class probability vector may be obtained as a weighted sum of the a first-type and b second-type multi-class probability vectors. The class corresponding to the label with the largest value in the target multi-class probability vector is then determined as the class of the unknown-class object A_i.
For how the functions and effects of each module in the above device are implemented, refer to the implementation of the corresponding steps in the above method. The details are not repeated here.
Because the device embodiments essentially correspond to the method embodiments, refer to the description of the method embodiments for the relevant details. The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate. Components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application. A person of ordinary skill in the art can understand and implement them without creative effort.
An embodiment of the present application further provides a computer storage medium storing program instructions. The program instructions include the following steps:
- For a given object set, obtaining the feature data of each object in the given object set and the inter-object relationship data. The given object set includes known-class objects and unknown-class objects, and each known-class object has an original label indicating its class.
- Obtaining a prediction label for each object in the given object set by a label propagation algorithm, according to the feature data of each object and the inter-object relationship data.
- Obtaining a label variation matrix of a first known-class object set according to the original labels and prediction labels of the known-class objects. The first known-class object set includes some or all of the known-class objects in the given object set. For each class in the first known-class object set, the label variation matrix indicates the probability that an original label of that class varies into each prediction label.
- Sampling an unknown-class object set n times to obtain n sample sets, and combining the information of each sample set with the information of the known-class objects in the given object set to obtain n combined data sets. The unknown-class object set includes all of the unknown-class objects in the given object set, and n is a preset value not less than 1. When n > 1, the n sample sets are mutually disjoint.
- For each combined data set, processing the combined data set and the label variation matrix of the first known-class object set by a label-noise-tolerant classification algorithm, which yields n classification prediction models and n updated label variation matrices.
- Obtaining a class prediction result for any unknown-class object A_i in the given object set from the feature data of A_i, the n classification prediction models and the n updated label variation matrices.
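The application defines the label variation matrix by what it represents rather than by how it is computed. One plausible estimate, assumed here, uses soft counts. Each known-class object's prediction label is normalized and added to the row of its original class. Each row is then normalized, so that T[i][j] approximates the probability that original label i varies into prediction label j. Defaulting empty rows to the identity (no variation) is also an assumption, and the function name is hypothetical.

```python
def label_variation_matrix(original_labels, prediction_labels, num_classes):
    """Estimate T[i][j] = P(prediction label j | original label i) by soft counts.

    original_labels: dict object_id -> class index (first known-class object set)
    prediction_labels: dict object_id -> label probability vector
    """
    counts = [[0.0] * num_classes for _ in range(num_classes)]
    for obj, i in original_labels.items():
        vec = prediction_labels.get(obj)
        if vec is None or sum(vec) <= 0:
            continue  # no usable propagated evidence for this object
        total = sum(vec)
        for j, p in enumerate(vec):
            counts[i][j] += p / total
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total > 0:
            matrix.append([c / total for c in row])
        else:
            # Assumed default: a class with no evidence does not vary.
            matrix.append([1.0 if j == i else 0.0 for j in range(num_classes)])
    return matrix


T = label_variation_matrix(
    {"a": 0, "b": 0, "c": 0, "d": 1},
    {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [0.0, 0.4], "d": [0.0, 1.0]},
    num_classes=2,
)
print(T)  # [[0.5, 0.5], [0.0, 1.0]]
```

A hard-count variant would instead add 1 to the column of each object's largest prediction-label entry. The soft version keeps the propagated confidence.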
The embodiments of the present application may take the form of a computer program product implemented on one or more storage media containing program code, including but not limited to magnetic disk storage, CD-ROM and optical storage. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to:
- phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM) and other types of random access memory (RAM);
- read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory or other memory technologies;
- compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage;
- magnetic cassettes, magnetic tape or disk storage, or other magnetic storage devices;
- any other non-transmission medium that can be used to store information accessible by a computing device.
As shown in Fig. 8, Fig. 8 is a schematic structural diagram of a device for predicting object classes according to an exemplary embodiment of the present application. For example, the device 800 may be provided as a server. Referring to Fig. 8, the device 800 includes a processing component 822, which in turn includes one or more processors. It also includes memory resources, represented by a memory 832, that store instructions executable by the processing component 822, such as application programs. The application programs stored in the memory 832 may include one or more modules, each corresponding to a set of instructions. The processing component 822 is configured to execute these instructions to perform the class prediction method provided by the embodiments of the present application. The method includes the following steps:
- For a given object set, obtaining the feature data of each object in the given object set and the inter-object relationship data. The given object set includes known-class objects and unknown-class objects, and each known-class object has an original label indicating its class.
- Obtaining a prediction label for each object in the given object set by a label propagation algorithm, according to the feature data of each object and the inter-object relationship data.
- Obtaining a label variation matrix of a first known-class object set according to the original labels and prediction labels of the known-class objects. The first known-class object set includes some or all of the known-class objects in the given object set. The label variation matrix indicates the probability that the original label of each class in the first known-class object set varies into each prediction label.
- Sampling an unknown-class object set n times to obtain n sample sets, and combining the information of each sample set with the information of the known-class objects in the given object set to obtain n combined data sets. The unknown-class object set includes all of the unknown-class objects in the given object set, and n is a preset value not less than 1. When n > 1, the n sample sets are mutually disjoint.
- For each combined data set, processing the combined data set and the label variation matrix of the first known-class object set by a label-noise-tolerant classification algorithm, which yields n classification prediction models and n updated label variation matrices.
- Obtaining a class prediction result for any unknown-class object A_i in the given object set from the feature data of A_i, the n classification prediction models and the n updated label variation matrices.
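The application does not name its label-noise-tolerant classification algorithm. Purely as a swapped-in illustration, the sketch below uses forward loss correction, a known technique for training with noisy labels. A softmax regression predicts clean-class probabilities p, and the loss is computed on q = T^T p, so that the label variation matrix T absorbs the expected label noise. The application also produces an "updated label variation matrix". Here, re-estimating T from the trained model's posteriors is a simple heuristic assumed to stand in for that step. All function names are hypothetical.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, b, x):
    """Clean-class probabilities p = softmax(W x + b)."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)])


def train_forward_corrected(X, noisy_y, T, epochs=200, lr=0.1, seed=0):
    """Softmax regression trained with forward loss correction (assumed stand-in).

    T[i][j] is the probability that clean class i is observed as label j.
    Per-sample loss: -log q[y], where q[j] = sum_i p[i] * T[i][j].
    """
    num_classes, dim = len(T), len(X[0])
    rng = random.Random(seed)
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for n in order:
            x, y = X[n], noisy_y[n]
            p = predict_proba(W, b, x)
            # Guard against a column of T that is all zeros.
            q_y = max(sum(p[i] * T[i][y] for i in range(num_classes)), 1e-12)
            for k in range(num_classes):
                # Gradient of -log q_y with respect to logit k.
                g = p[k] * (1.0 - T[k][y] / q_y)
                for d in range(dim):
                    W[k][d] -= lr * g * x[d]
                b[k] -= lr * g
    return W, b


def update_variation_matrix(W, b, X, noisy_y, num_classes):
    """Heuristic re-estimate of T: row i sums p_i(x) per observed label, then normalizes."""
    counts = [[0.0] * num_classes for _ in range(num_classes)]
    for x, y in zip(X, noisy_y):
        p = predict_proba(W, b, x)
        for i in range(num_classes):
            counts[i][y] += p[i]
    updated = []
    for row in counts:
        total = sum(row)
        updated.append([c / total for c in row] if total > 0 else [1.0 / num_classes] * num_classes)
    return updated


# Toy combined data set: two features, two classes, labels treated as possibly noisy.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [0, 0, 1, 1]
T = [[0.8, 0.2], [0.2, 0.8]]
W, b = train_forward_corrected(X, y, T)
T_updated = update_variation_matrix(W, b, X, y, num_classes=2)
```

In the method described above, this training would run once per combined data set, yielding the n models and the n updated matrices.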
The device 800 may further include a power supply component 826 configured to perform power management of the device 800, a wired or wireless network interface 850 configured to connect the device 800 to a network, and an input/output (I/O) interface 858. The device 800 may operate based on an operating system stored in the memory 832, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
An exemplary embodiment further provides a non-transitory computer-readable storage medium including instructions, for example the memory 832. The processing component 822 of the device 800 can execute these instructions to perform the object class prediction method provided by the embodiments of the present application. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Other embodiments of the present application will readily occur to those skilled in the art after considering the specification and practicing the disclosure herein. The present application is intended to cover any variations, uses or adaptations of its embodiments that follow their general principles, including common knowledge or customary technical means in the art that are not disclosed herein. The specification and examples are to be regarded as illustrative only. The true scope and spirit of the embodiments of the present application are indicated by the following claims.
It should be understood that the embodiments of the present application are not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from their scope. The scope of the embodiments of the present application is limited only by the appended claims.
Claims (17)
1. A method for predicting object classes, used to predict the classes of unknown-class objects in a given object set, characterized in that the method comprises:
for the given object set, obtaining feature data of each object in the given object set and inter-object relationship data, wherein the given object set includes known-class objects and unknown-class objects, and each known-class object has an original label indicating its class;
obtaining, by a label propagation algorithm, a prediction label of each object in the given object set according to the feature data of each object in the given object set and the inter-object relationship data;
obtaining a label variation matrix of a first known-class object set according to the original labels and prediction labels of the known-class objects, wherein the first known-class object set includes some or all of the known-class objects in the given object set, and the label variation matrix indicates the probability that the original label of each class in the first known-class object set varies into each prediction label;
sampling an unknown-class object set n times to obtain n sample sets, and combining the information of each sample set with the information of the known-class objects in the given object set to obtain n combined data sets, wherein the unknown-class object set includes all of the unknown-class objects in the given object set, n is a preset value not less than 1, and, when n > 1, the n sample sets are mutually disjoint;
for each combined data set, processing the combined data set and the label variation matrix of the first known-class object set by a label-noise-tolerant classification algorithm to obtain n classification prediction models and n updated label variation matrices; and
obtaining a class prediction result of any unknown-class object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label variation matrices.
2. The method according to claim 1, characterized in that obtaining, by the label propagation algorithm, the prediction label of each object in the given object set according to the feature data of each object in the given object set and the inter-object relationship data comprises:
expressing the feature data of each object in the given object set as a feature row vector;
calculating, according to the feature row vectors and the inter-object relationship data, the cosine similarity between the two feature row vectors of every pair of objects in the given object set that have a direct relationship; and
passing, according to the cosine similarities and the inter-object relationship data, the original labels of the known-class objects in the given object set to each object in the given object set, to obtain the prediction label of each object in the given object set.
3. The method according to claim 1, characterized in that sampling the unknown-class object set n times to obtain n sample sets, and combining the information of each sample set with the information of the known-class objects in the given object set to obtain n combined data sets, comprises:
sampling the unknown-class object set S three times, drawing 30% of the objects each time, to obtain three sample sets M1, M2 and M3;
combining sample set M1 with a second known-class object set D to obtain a combined data set F1, wherein F1 is {M1, D}, and the second known-class object set D includes all of the known-class objects in the given object set;
combining sample set M2 with the second known-class object set D to obtain a combined data set F2, wherein F2 is {M2, D}; and
combining sample set M3 with the second known-class object set D to obtain a combined data set F3, wherein F3 is {M3, D}.
4. The method according to claim 1, characterized in that obtaining the class prediction result of any unknown-class object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label variation matrices comprises:
inputting the feature data of any unknown-class object A_i in the given object set into a of the n classification prediction models to obtain first-type multi-class probability vectors (a in total), and inputting the prediction label of the unknown-class object A_i into b of the n updated label variation matrices to obtain second-type multi-class probability vectors (b in total), wherein a and b are preset values not greater than n; and
obtaining the class prediction result of the unknown-class object A_i according to the obtained first-type and second-type multi-class probability vectors.
5. The method according to claim 4, characterized in that obtaining the class prediction result of the unknown-class object A_i according to the obtained first-type and second-type multi-class probability vectors comprises:
averaging the a first-type multi-class probability vectors and the b second-type multi-class probability vectors to obtain a target multi-class probability vector; and
determining the class corresponding to the label with the largest value in the target multi-class probability vector as the class of the unknown-class object A_i.
6. The method according to claim 1, characterized in that the objects are users, and the classes include: the age bracket of a user, the travel mode preference of a user, the travel time period of a user, the consumption level of a user, or the consumption propensity of a user.
7. The method according to claim 1, characterized in that the objects are users, and the feature data include: historical trip location information of a user, or the installation status of application programs on a user's terminal device.
8. The method according to claim 1, characterized in that the objects are users, and the inter-object relationship data include: data describing red-packet giving relationships between users, or data describing friend relationships between users.
9. A device for predicting object classes, used to predict the classes of unknown-class objects in a given object set, characterized in that the device comprises:
a set data obtaining module, configured to obtain, for the given object set, feature data of each object in the given object set and inter-object relationship data, wherein the given object set includes known-class objects and unknown-class objects, and each known-class object has an original label indicating its class;
a prediction label obtaining module, configured to obtain, by a label propagation algorithm, a prediction label of each object in the given object set according to the feature data of each object and the inter-object relationship data obtained by the set data obtaining module;
a label variation matrix obtaining module, configured to obtain a label variation matrix of a first known-class object set according to the original labels and prediction labels of the known-class objects, wherein the first known-class object set includes some or all of the known-class objects in the given object set, and the label variation matrix indicates the probability that the original label of each class in the first known-class object set varies into each prediction label;
a combination module, configured to sample an unknown-class object set n times to obtain n sample sets, and to combine the information of each sample set with the information of the known-class objects in the given object set to obtain n combined data sets, wherein the unknown-class object set includes all of the unknown-class objects in the given object set, n is a preset value not less than 1, and, when n > 1, the n sample sets are mutually disjoint;
a training module, configured to process, for each combined data set obtained by the combination module, the combined data set and the label variation matrix of the first known-class object set by a label-noise-tolerant classification algorithm to obtain n classification prediction models and n updated label variation matrices; and
a class prediction module, configured to obtain a class prediction result of any unknown-class object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label variation matrices.
10. The device according to claim 9, characterized in that the prediction label obtaining module comprises:
a vector representation submodule, configured to express the feature data of each object in the given object set as a feature row vector;
a similarity calculation submodule, configured to calculate, according to the feature row vectors obtained by the vector representation submodule and the inter-object relationship data, the cosine similarity between the two feature row vectors of every pair of objects in the given object set that have a direct relationship; and
a label passing submodule, configured to pass, according to the cosine similarities calculated by the similarity calculation submodule and the inter-object relationship data, the original labels of the known-class objects in the given object set to each object in the given object set, to obtain the prediction label of each object in the given object set.
11. The device according to claim 9, characterized in that the combination module comprises:
a sampling submodule, configured to sample the unknown-class object set S three times, drawing 30% of the objects each time, to obtain three sample sets M1, M2 and M3; and
a combination submodule, configured to combine sample set M1 with a second known-class object set D to obtain a combined data set F1, wherein F1 is {M1, D}, and the second known-class object set D includes all of the known-class objects in the given object set;
to combine sample set M2 with the second known-class object set D to obtain a combined data set F2, wherein F2 is {M2, D};
and to combine sample set M3 with the second known-class object set D to obtain a combined data set F3, wherein F3 is {M3, D}.
12. The device according to claim 9, characterized in that the class prediction module comprises:
a multi-class probability vector obtaining submodule, configured to input the feature data of any unknown-class object A_i in the given object set into a of the n classification prediction models to obtain first-type multi-class probability vectors (a in total), and to input the prediction label of the unknown-class object A_i into b of the n updated label variation matrices to obtain second-type multi-class probability vectors (b in total), wherein a and b are preset values not greater than n; and
a class prediction submodule, configured to obtain the class prediction result of the unknown-class object A_i according to the first-type and second-type multi-class probability vectors obtained by the multi-class probability vector obtaining submodule.
13. The device according to claim 12, characterized in that the class prediction submodule comprises:
a target multi-class probability vector obtaining unit, configured to average the a first-type multi-class probability vectors and the b second-type multi-class probability vectors to obtain a target multi-class probability vector; and
a class prediction unit, configured to determine the class corresponding to the label with the largest value in the target multi-class probability vector as the class of the unknown-class object A_i.
14. The device according to claim 9, characterized in that the objects are users, and the classes include: the age bracket of a user, the travel mode preference of a user, the travel time period of a user, the consumption level of a user, or the consumption propensity of a user.
15. The device according to claim 9, characterized in that the objects are users, and the feature data include: historical trip location information of a user, or the installation status of application programs on a user's terminal device.
16. The device according to claim 9, characterized in that the objects are users, and the inter-object relationship data include: data describing red-packet giving relationships between users, or data describing friend relationships between users.
17. A computer storage medium, characterized in that the storage medium stores program instructions, the program instructions comprising:
for a given object set, obtaining feature data of each object in the given object set and inter-object relationship data, wherein the given object set includes known-class objects and unknown-class objects, and each known-class object has an original label indicating its class;
obtaining, by a label propagation algorithm, a prediction label of each object in the given object set according to the feature data of each object in the given object set and the inter-object relationship data;
obtaining a label variation matrix of a first known-class object set according to the original labels and prediction labels of the known-class objects, wherein the first known-class object set includes some or all of the known-class objects in the given object set, and the label variation matrix indicates the probability that the original label of each class in the first known-class object set varies into each prediction label;
sampling an unknown-class object set n times to obtain n sample sets, and combining the information of each sample set with the information of the known-class objects in the given object set to obtain n combined data sets, wherein the unknown-class object set includes all of the unknown-class objects in the given object set, n is a preset value not less than 1, and, when n > 1, the n sample sets are mutually disjoint;
for each combined data set, processing the combined data set and the label variation matrix of the first known-class object set by a label-noise-tolerant classification algorithm to obtain n classification prediction models and n updated label variation matrices; and
obtaining a class prediction result of any unknown-class object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label variation matrices.
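Claims 7 and 15 name a user's historical trip locations and the installation status of application programs as feature data. Claim 2 expresses each object's feature data as a feature row vector. The following is a minimal sketch of that conversion. It assumes a multi-hot encoding for installed apps and a normalized trip-count histogram over regions; the vocabularies and the user record layout are hypothetical.

```python
def multi_hot(installed_apps, vocabulary):
    """1.0 where the app at that vocabulary position is installed, else 0.0."""
    installed = set(installed_apps)
    return [1.0 if app in installed else 0.0 for app in vocabulary]


def trip_histogram(trip_regions, regions):
    """Share of a user's historical trips that fall in each region."""
    counts = [trip_regions.count(r) for r in regions]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(regions)


def feature_row_vector(user, app_vocab, regions):
    """Concatenate app-installation and trip-location features into one row vector."""
    return multi_hot(user["apps"], app_vocab) + trip_histogram(user["trips"], regions)


app_vocab = ["maps", "music", "bank"]  # hypothetical app vocabulary
regions = ["north", "south"]           # hypothetical trip regions
user = {"apps": ["maps", "bank"], "trips": ["north", "north", "south", "north"]}
print(feature_row_vector(user, app_vocab, regions))  # [1.0, 0.0, 1.0, 0.75, 0.25]
```

Vectors built this way can be fed directly to the cosine-similarity step of claim 2.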
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710179031.1A CN108629358B (en) | 2017-03-23 | 2017-03-23 | Object class prediction method and device |
CN201880020197.1A CN110447039A (en) | 2017-03-23 | 2018-03-16 | The system and method for predicting object type |
PCT/CN2018/079348 WO2018171531A1 (en) | 2017-03-23 | 2018-03-16 | System and method for predicting classification for object |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710179031.1A CN108629358B (en) | 2017-03-23 | 2017-03-23 | Object class prediction method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108629358A true CN108629358A (en) | 2018-10-09 |
CN108629358B CN108629358B (en) | 2020-12-25 |
Family
ID=63585880
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710179031.1A Active CN108629358B (en) | 2017-03-23 | 2017-03-23 | Object class prediction method and device |
CN201880020197.1A Pending CN110447039A (en) | 2017-03-23 | 2018-03-16 | The system and method for predicting object type |
Country Status (2)
Country | Link |
---|---|
CN (2) | CN108629358B (en) |
WO (1) | WO2018171531A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110060247A (en) * | 2019-04-18 | 2019-07-26 | 深圳市深视创新科技有限公司 | Robust deep neural network learning method for coping with sample labeling errors |
CN111611429A (en) * | 2019-02-25 | 2020-09-01 | 北京嘀嘀无限科技发展有限公司 | Data annotation method and device, electronic equipment and computer readable storage medium |
CN113811915A (en) * | 2019-02-26 | 2021-12-17 | 北京嘀嘀无限科技发展有限公司 | Unified order serving and fleet management for online shared travel platform |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11645693B1 (en) * | 2020-02-28 | 2023-05-09 | Amazon Technologies, Inc. | Complementary consumer item selection |
US11526700B2 (en) | 2020-06-29 | 2022-12-13 | International Business Machines Corporation | Annotating unlabeled data using classifier error rates |
CN112132178B (en) * | 2020-08-19 | 2023-10-13 | 深圳云天励飞技术股份有限公司 | Object classification method, device, electronic equipment and storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103714139A (en) * | 2013-12-20 | 2014-04-09 | 华南理工大学 | Parallel data mining method for identifying a mass of mobile client bases |
CN104572733A (en) * | 2013-10-22 | 2015-04-29 | 腾讯科技(深圳)有限公司 | User interest tag classification method and device |
CN105069129A (en) * | 2015-06-24 | 2015-11-18 | 合肥工业大学 | Self-adaptive multi-label prediction method |
CN105184326A (en) * | 2015-09-30 | 2015-12-23 | 广东工业大学 | Active learning multi-label social network data analysis method based on graph data |
CN105446988A (en) * | 2014-06-30 | 2016-03-30 | 华为技术有限公司 | Classification predicting method and device |
CN105608471A (en) * | 2015-12-28 | 2016-05-25 | 苏州大学 | Robust transductive label estimation and data classification method and system |
US20160379133A1 (en) * | 2015-06-23 | 2016-12-29 | Microsoft Technology Licensing, Llc | Reasoning classification based on feature pertubation |
CN106452809A (en) * | 2015-08-04 | 2017-02-22 | 北京奇虎科技有限公司 | Data processing method and device |
CN106446191A (en) * | 2016-09-30 | 2017-02-22 | 浙江工业大学 | Logistic regression based multi-feature network popular tag prediction method |
CN106504029A (en) * | 2016-11-08 | 2017-03-15 | 山东大学 | Gas station sales forecasting method based on customer group behavior analysis |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090092299A1 (en) * | 2007-10-03 | 2009-04-09 | Siemens Medical Solutions Usa, Inc. | System and Method for Joint Classification Using Feature Space Cluster Labels |
US8386574B2 (en) * | 2009-10-29 | 2013-02-26 | Xerox Corporation | Multi-modality classification for one-class classification in social networks |
CN103605990B (en) * | 2013-10-23 | 2017-02-08 | 江苏大学 | Integrated multi-classifier fusion classification method and integrated multi-classifier fusion classification system based on graph clustering label propagation |
CN104750875B (en) * | 2015-04-23 | 2018-03-02 | 苏州大学 | Machine error data classification method and system |
CN105930411A (en) * | 2016-04-18 | 2016-09-07 | 苏州大学 | Classifier training method, classifier and sentiment classification system |
2017
- 2017-03-23: CN application CN201710179031.1A, patent CN108629358B, Active
2018
- 2018-03-16: CN application CN201880020197.1A, publication CN110447039A, Pending
- 2018-03-16: WO application PCT/CN2018/079348, publication WO2018171531A1, Application Filing
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572733A (en) * | 2013-10-22 | 2015-04-29 | 腾讯科技(深圳)有限公司 | User interest tag classification method and device |
CN103714139A (en) * | 2013-12-20 | 2014-04-09 | 华南理工大学 | Parallel data mining method for identifying a mass of mobile client bases |
CN105446988A (en) * | 2014-06-30 | 2016-03-30 | 华为技术有限公司 | Classification predicting method and device |
US20160379133A1 (en) * | 2015-06-23 | 2016-12-29 | Microsoft Technology Licensing, Llc | Reasoning classification based on feature pertubation |
CN105069129A (en) * | 2015-06-24 | 2015-11-18 | 合肥工业大学 | Self-adaptive multi-label prediction method |
CN106452809A (en) * | 2015-08-04 | 2017-02-22 | 北京奇虎科技有限公司 | Data processing method and device |
CN105184326A (en) * | 2015-09-30 | 2015-12-23 | 广东工业大学 | Active learning multi-label social network data analysis method based on graph data |
CN105608471A (en) * | 2015-12-28 | 2016-05-25 | 苏州大学 | Robust transductive label estimation and data classification method and system |
CN106446191A (en) * | 2016-09-30 | 2017-02-22 | 浙江工业大学 | Logistic regression based multi-feature network popular tag prediction method |
CN106504029A (en) * | 2016-11-08 | 2017-03-15 | 山东大学 | Gas station sales forecasting method based on customer group behavior analysis |
Non-Patent Citations (3)
Title |
---|
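JUI-SHENG CHOU: "Comparison of multilabel classification models to forecast project dispute resolutions", Expert Systems with Applications * |
Liu Lie (刘列) et al.: "Research on User Tag Prediction in Social Networks" (社交网络用户标签预测研究), Journal of Chinese Information Processing (中文信息学报) * |
Xin Tinglin (辛霆麟): "Research and Application of Link Prediction Algorithms Based on Label Propagation" (基于标签传播的链路预测算法研究与应用), China Master's Theses Full-text Database, Basic Sciences (中国优秀硕士学位论文全文数据库-基础科学辑) * |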
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111611429A (en) * | 2019-02-25 | 2020-09-01 | 北京嘀嘀无限科技发展有限公司 | Data annotation method and device, electronic equipment and computer readable storage medium |
CN111611429B (en) * | 2019-02-25 | 2023-05-12 | 北京嘀嘀无限科技发展有限公司 | Data labeling method, device, electronic equipment and computer readable storage medium |
CN113811915A (en) * | 2019-02-26 | 2021-12-17 | 北京嘀嘀无限科技发展有限公司 | Unified order serving and fleet management for online shared travel platform |
CN113811915B (en) * | 2019-02-26 | 2024-05-31 | 北京嘀嘀无限科技发展有限公司 | Unified order dispatch and fleet management for online shared travel platform |
CN110060247A (en) * | 2019-04-18 | 2019-07-26 | 深圳市深视创新科技有限公司 | Robust deep neural network learning method for coping with sample labeling errors |
Also Published As
Publication number | Publication date |
---|---|
WO2018171531A1 (en) | 2018-09-27 |
CN110447039A (en) | 2019-11-12 |
CN108629358B (en) | 2020-12-25 |
Similar Documents
Publication | Title |
---|---|
CN108629358A (en) | Method and device for predicting object type | |
Han et al. | Joint air quality and weather prediction based on multi-adversarial spatiotemporal networks | |
CN108717408B (en) | Sensitive word real-time monitoring method, electronic equipment, storage medium and system | |
CN105005589B (en) | Text classification method and apparatus | |
CN110163647B (en) | Data processing method and device | |
CN107862022B (en) | Culture resource recommendation system | |
Lee et al. | When twitter meets foursquare: tweet location prediction using foursquare | |
CN108090216B (en) | Label prediction method, device and storage medium | |
Bian et al. | Predicting trending messages and diffusion participants in microblogging network | |
US9286379B2 (en) | Document quality measurement | |
CN112749330B (en) | Information pushing method, device, computer equipment and storage medium | |
Ruan et al. | GADM: Manual fake review detection for O2O commercial platforms | |
CN110990718A (en) | Social network model building module of company image improving system | |
CN109462578A (en) | Threat intelligence use and propagation method based on statistical learning | |
CN107392311A (en) | Method and apparatus for sequence cutting | |
CN103617146B (en) | Machine learning method and device based on hardware resource consumption | |
Demertzis et al. | A machine hearing framework for real-time streaming analytics using Lambda architecture | |
CN115952343A (en) | Social robot detection method based on multi-relation graph convolutional network | |
Ozdikis et al. | Spatial statistics of term co-occurrences for location prediction of tweets | |
Cheng et al. | ISC: An iterative social based classifier for adult account detection on twitter | |
CN110909258A (en) | Information recommendation method, device, equipment and storage medium | |
Lu et al. | Predicting viral news events in online media | |
Kotzias et al. | Addressing the Sparsity of Location Information on Twitter. | |
Wang et al. | Abnormal trajectory detection based on geospatial consistent modeling | |
CN110008975B (en) | Social network "water army" (paid poster) detection method based on immune danger theory |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||