CN108629358A - Method and device for predicting object categories - Google Patents


Info

Publication number
CN108629358A
Authority
CN
China
Prior art keywords
class
label
object set
prediction
unknown
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710179031.1A
Other languages
Chinese (zh)
Other versions
CN108629358B (en)
Inventor
秦志伟
卓呈祥
谭伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN201710179031.1A (patent/CN108629358B/en)
Priority to CN201880020197.1A (patent/CN110447039A/en)
Priority to PCT/CN2018/079348 (patent/WO2018171531A1/en)
Publication of CN108629358A
Application granted
Publication of CN108629358B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a method and device for predicting object categories. The method includes: obtaining feature data of each object in a given object set and inter-object relationship data; obtaining a predicted label for each object according to the feature data and the inter-object relationship data; obtaining a label transition matrix of a first known-category object set according to the original labels and predicted labels of the known-category objects; sampling an unknown-category object set n times to obtain n sample sets, and combining the information of each sample set with the information of the known-category objects in the given object set to obtain n combined data sets; processing, for each combined data set, the combined data set and the label transition matrix to obtain n classification prediction models and n updated label transition matrices; and obtaining a category prediction result for any unknown-category object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label transition matrices.

Description

Method and device for predicting object categories
Technical field
The embodiments of the present application relate to the technical field of data processing, and in particular to a method and device for predicting object categories.
Background
Machine learning is a multidisciplinary field that has emerged over the past twenty-odd years, drawing on probability theory, statistics, approximation theory, convex analysis, computational complexity theory and other subjects. Machine learning theory is mainly concerned with designing and analyzing algorithms that let computers "learn" automatically, that is, algorithms that automatically extract patterns from data and use those patterns to make predictions about unknown data. Machine learning methods currently fall into three main classes: supervised learning, unsupervised learning and semi-supervised learning.
In the prior art, when machine learning methods are used to predict the categories of unlabeled samples, prediction accuracy is low. How to predict categories accurately with machine learning methods has therefore become an urgent problem for those skilled in the art.
Summary
To solve the above problem, the embodiments of the present application provide a method and device for predicting object categories.
Specifically, the embodiments of the present application are implemented through the following technical solutions:
According to a first aspect of the embodiments of the present application, a method for predicting object categories is provided, for predicting the categories of unknown-category objects in a given object set. The method includes:
for the given object set, obtaining feature data of each object in the given object set and inter-object relationship data, where the given object set includes known-category objects and unknown-category objects, and each known-category object has an original label indicating its category;
obtaining, with a label propagation algorithm, a predicted label for each object in the given object set according to the feature data of each object in the given object set and the inter-object relationship data;
obtaining a label transition matrix of a first known-category object set according to the original labels and predicted labels of the known-category objects, where the first known-category object set includes some or all of the known-category objects in the given object set, and the label transition matrix represents, for each category in the first known-category object set, the probability that an original label of that category changes into each predicted label;
sampling an unknown-category object set n times to obtain n sample sets, and combining the information of each sample set with the information of the known-category objects in the given object set to obtain n combined data sets, where the unknown-category object set includes all unknown-category objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sample sets are mutually disjoint;
for each combined data set, processing the combined data set and the label transition matrix of the first known-category object set with a label-noise-tolerant classification algorithm, to obtain n classification prediction models and n updated label transition matrices;
obtaining a category prediction result for any unknown-category object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label transition matrices.
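The patent does not spell out how the label transition matrix is estimated. A minimal sketch, assuming categories are integer indices and each predicted label is a probability-like vector such as [0, 0, 0.8, 0], is to accumulate the predicted label vectors of the known-category objects per original category and normalize each row. The function name `label_transition_matrix` and the identity-row fallback are hypothetical choices, not the author's method.

```python
def label_transition_matrix(original, predicted, num_classes):
    """Estimate T[i][j]: probability that an object whose original label is i
    receives predicted label j.

    original:  class indices of the known-category objects
    predicted: predicted label vectors of the same objects (e.g. [0, 0, 0.8, 0])
    """
    sums = [[0.0] * num_classes for _ in range(num_classes)]
    for label, vec in zip(original, predicted):
        for j, p in enumerate(vec):
            sums[label][j] += p
    matrix = []
    for i, row in enumerate(sums):
        total = sum(row)
        if total == 0:
            # Assumption: a category with no usable evidence keeps its label (identity row).
            matrix.append([1.0 if j == i else 0.0 for j in range(num_classes)])
        else:
            matrix.append([v / total for v in row])
    return matrix


# Hypothetical data: two objects of category 0, one of category 1, none of category 2.
T = label_transition_matrix(
    original=[0, 0, 1],
    predicted=[[0.6, 0.2, 0.0], [0.4, 0.0, 0.0], [0.0, 0.9, 0.1]],
    num_classes=3,
)
```

Each row of `T` sums to 1, so it can be read directly as the per-category change probabilities described in the step above.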
In an embodiment of the present application, obtaining the predicted label of each object in the given object set with a label propagation algorithm, according to the feature data of each object in the given object set and the inter-object relationship data, includes:
representing the feature data of each object in the given object set as a feature row vector;
calculating, according to the feature row vectors and the inter-object relationship data, the cosine similarity between the two feature row vectors of every pair of directly related objects in the given object set;
passing, according to the cosine similarities and the inter-object relationship data, the original labels of the known-category objects in the given object set to each object in the given object set, to obtain the predicted label of each object in the given object set.
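These steps can be sketched as a single-hop pass that is consistent with the worked passenger example later in this description. Each directly related pair exchanges its original labels, scaled by the cosine similarity of the two feature row vectors. This is a minimal sketch under that assumption. The patent may iterate the propagation or normalize the scores, and the names `cosine` and `propagate_labels` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature row vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def propagate_labels(features, edges, known_labels, num_classes):
    """Single-hop label passing over direct relationships.

    features:     dict object_id -> feature row vector
    edges:        (u, v) pairs of directly related objects
    known_labels: dict object_id -> class index, known-category objects only
    Returns dict object_id -> predicted label vector (unnormalized scores).
    """
    predicted = {obj: [0.0] * num_classes for obj in features}
    for u, v in edges:
        sim = cosine(features[u], features[v])
        # Each endpoint passes its original label (if it has one) to the other.
        if u in known_labels:
            predicted[v][known_labels[u]] += sim
        if v in known_labels:
            predicted[u][known_labels[v]] += sim
    return predicted
```

In this sketch a known-category object only receives a predicted label from known-category neighbours. That matters because the label transition matrix compares each known object's original label with its predicted label.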
In an embodiment of the present application, sampling the unknown-category object set n times to obtain n sample sets, and combining the information of each sample set with the information of the known-category objects in the given object set to obtain n combined data sets, includes:
sampling the unknown-category object set S three times, extracting 30% of its objects each time, to obtain three sample sets M1, M2 and M3;
combining sample set M1 with a second known-category object set D to obtain a combined data set F1 = {M1, D}, where the second known-category object set D includes all known-category objects in the given object set;
combining sample set M2 with the second known-category object set D to obtain a combined data set F2 = {M2, D};
and combining sample set M3 with the second known-category object set D to obtain a combined data set F3 = {M3, D}.
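A minimal sketch of the three-way sampling follows. It assumes objects are hashable identifiers and that disjointness is achieved by shuffling S once and cutting consecutive 30% slices. The shuffling strategy, the seed, the object names and the dict layout of a combined data set are all illustrative assumptions.

```python
import random


def disjoint_samples(unknown, n=3, fraction=0.3, seed=0):
    """Draw n mutually disjoint samples, each holding `fraction` of the unknown set."""
    if n * fraction > 1:
        raise ValueError("n disjoint samples of this size do not fit in the set")
    shuffled = list(unknown)
    random.Random(seed).shuffle(shuffled)
    size = int(len(shuffled) * fraction)
    # Consecutive slices of one shuffled list can never share an object.
    return [shuffled[k * size:(k + 1) * size] for k in range(n)]


def combine(samples, known):
    """Pair every sample set M_k with the full known-category set D, i.e. F_k = {M_k, D}."""
    return [{"sample": list(m), "known": list(known)} for m in samples]


S = [f"user{i}" for i in range(100)]  # hypothetical unknown-category objects
D = ["userA", "userB"]                # hypothetical known-category objects
M1, M2, M3 = disjoint_samples(S)
F = combine([M1, M2, M3], D)
```

Because the three 30% samples cover only 90% of S, some unknown-category objects appear in no combined data set. They still receive predictions at inference time, since every unknown object has feature data and a predicted label.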
In an embodiment of the present application, obtaining the category prediction result of any unknown-category object A_i in the given object set, according to the feature data of A_i, the n classification prediction models and the n updated label transition matrices, includes:
inputting the feature data of A_i into a of the n classification prediction models to obtain a multi-class probability vectors of the first type, and inputting the predicted label of A_i into b of the n updated label transition matrices to obtain b multi-class probability vectors of the second type, where a and b are preset values not greater than n;
obtaining the category prediction result of A_i according to the obtained first-type and second-type multi-class probability vectors.
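The patent does not state how a predicted label is "input into" a label transition matrix. The sketch below follows one hedged reading. It scores each original category i by sum_j T[i][j] * p[j], which measures how well category i explains the observed predicted label p, and then normalizes the scores. The classification prediction models are stood in for by plain callables. Taking the first a models and the first b matrices is also an assumption, and all names and numbers are hypothetical.

```python
def second_type_vector(predicted_label, transition):
    """Map a predicted label vector p through one label transition matrix T.

    Assumed rule: score_i = sum_j T[i][j] * p[j], then normalize to sum to 1.
    """
    scores = [sum(t * p for t, p in zip(row, predicted_label)) for row in transition]
    total = sum(scores)
    k = len(scores)
    return [s / total for s in scores] if total > 0 else [1.0 / k] * k


def collect_vectors(features, predicted_label, models, matrices, a, b):
    """Return (a first-type vectors, b second-type vectors) for one object A_i."""
    first = [model(features) for model in models[:a]]
    second = [second_type_vector(predicted_label, t) for t in matrices[:b]]
    return first, second


T = [[0.8, 0.2], [0.1, 0.9]]                           # hypothetical 2-category matrix
models = [lambda x: [0.7, 0.3], lambda x: [0.6, 0.4]]  # stand-in classifiers
first, second = collect_vectors([1, 0, 1], [0.0, 1.0], models, [T, T], a=1, b=2)
```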
In an embodiment of the present application, obtaining the category prediction result of A_i according to the obtained first-type and second-type multi-class probability vectors includes:
averaging the a first-type multi-class probability vectors and the b second-type multi-class probability vectors to obtain a target multi-class probability vector;
determining the category corresponding to the largest element of the target multi-class probability vector as the category of A_i.
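A minimal sketch of this fusion step follows. It assumes each multi-class probability vector is a list ordered like a shared `categories` list. Ties go to the earliest category, which is an arbitrary choice. The function name, the age-bracket names and the vectors are hypothetical.

```python
def fuse_and_decide(first_vectors, second_vectors, categories):
    """Average all probability vectors element-wise and pick the argmax category."""
    vectors = list(first_vectors) + list(second_vectors)
    k = len(categories)
    target = [sum(v[i] for v in vectors) / len(vectors) for i in range(k)]
    best = max(range(k), key=lambda i: target[i])
    return categories[best], target


categories = ["post-60s", "post-70s", "post-80s", "post-90s"]
first = [[0.1, 0.2, 0.6, 0.1], [0.2, 0.2, 0.5, 0.1]]  # a = 2 model outputs
second = [[0.0, 0.3, 0.4, 0.3]]                       # b = 1 matrix output
label, target = fuse_and_decide(first, second, categories)
```

Here the target vector is roughly [0.1, 0.233, 0.5, 0.167], so the object is assigned to "post-80s".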
In an embodiment of the present application, the objects are users, and the category includes the user's age bracket, travel mode preference, travel time period, consumption level or consumption propensity.
In an embodiment of the present application, the objects are users, and the feature data includes the user's historical trip location information or the applications installed on the user's terminal device.
In an embodiment of the present application, the objects are users, and the inter-object relationship data includes data describing red-envelope (hongbao) sending relationships between users, or data describing friend relationships between users.
According to a second aspect of the embodiments of the present application, a device for predicting object categories is provided, for predicting the categories of unknown-category objects in a given object set. The device includes:
a set data obtaining module, configured to obtain, for the given object set, feature data of each object in the given object set and inter-object relationship data, where the given object set includes known-category objects and unknown-category objects, and each known-category object has an original label indicating its category;
a predicted label obtaining module, configured to obtain, with a label propagation algorithm, a predicted label for each object in the given object set according to the feature data and the inter-object relationship data obtained by the set data obtaining module;
a label transition matrix obtaining module, configured to obtain a label transition matrix of a first known-category object set according to the original labels and predicted labels of the known-category objects, where the first known-category object set includes some or all of the known-category objects in the given object set, and the label transition matrix represents, for each category in the first known-category object set, the probability that an original label of that category changes into each predicted label;
a combination module, configured to sample an unknown-category object set n times to obtain n sample sets, and to combine the information of each sample set with the information of the known-category objects in the given object set to obtain n combined data sets, where the unknown-category object set includes all unknown-category objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sample sets are mutually disjoint;
a training module, configured to process, for each combined data set obtained by the combination module, the combined data set and the label transition matrix of the first known-category object set with a label-noise-tolerant classification algorithm, to obtain n classification prediction models and n updated label transition matrices;
a category prediction module, configured to obtain a category prediction result for any unknown-category object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label transition matrices.
In an embodiment of the present application, the predicted label obtaining module includes:
a vector representation submodule, configured to represent the feature data of each object in the given object set as a feature row vector;
a similarity calculation submodule, configured to calculate, according to the feature row vectors obtained by the vector representation submodule and the inter-object relationship data, the cosine similarity between the two feature row vectors of every pair of directly related objects in the given object set;
a label passing submodule, configured to pass, according to the cosine similarities calculated by the similarity calculation submodule and the inter-object relationship data, the original labels of the known-category objects in the given object set to each object in the given object set, to obtain the predicted label of each object in the given object set.
In an embodiment of the present application, the combination module includes:
a sampling submodule, configured to sample the unknown-category object set S three times, extracting 30% of its objects each time, to obtain three sample sets M1, M2 and M3;
a combining submodule, configured to combine sample set M1 with a second known-category object set D to obtain a combined data set F1 = {M1, D}, where the second known-category object set D includes all known-category objects in the given object set;
to combine sample set M2 with the second known-category object set D to obtain a combined data set F2 = {M2, D};
and to combine sample set M3 with the second known-category object set D to obtain a combined data set F3 = {M3, D}.
In an embodiment of the present application, the category prediction module includes:
a multi-class probability vector obtaining submodule, configured to input the feature data of any unknown-category object A_i in the given object set into a of the n classification prediction models to obtain a multi-class probability vectors of the first type, and to input the predicted label of A_i into b of the n updated label transition matrices to obtain b multi-class probability vectors of the second type, where a and b are preset values not greater than n;
a category prediction submodule, configured to obtain the category prediction result of A_i according to the first-type and second-type multi-class probability vectors obtained by the multi-class probability vector obtaining submodule.
In an embodiment of the present application, the category prediction submodule includes:
a multi-class probability vector obtaining unit, configured to average the a first-type multi-class probability vectors and the b second-type multi-class probability vectors to obtain a target multi-class probability vector;
a category prediction unit, configured to determine the category corresponding to the largest element of the target multi-class probability vector as the category of A_i.
In the embodiment of the present application, the object is user;The classification includes:The age bracket of user, the trip side of user Formula preference, trip period, the level of consumption of user or the consumption propensity of user of user.
In the embodiment of the present application, the object is user;The characteristic includes:The history trip location information of user Or in the terminal device of user application program installation situation.
In the embodiment of the present application, the object is user;The object relationship data include:For describe each user it Between red packet provide the data of relationship, or data for describing the friend relation between each user.
According to a third aspect of the embodiments of the present application, a computer storage medium is provided. The storage medium stores program instructions, and the program instructions include:
for a given object set, obtaining feature data of each object in the given object set and inter-object relationship data, where the given object set includes known-category objects and unknown-category objects, and each known-category object has an original label indicating its category;
obtaining, with a label propagation algorithm, a predicted label for each object in the given object set according to the feature data of each object in the given object set and the inter-object relationship data;
obtaining a label transition matrix of a first known-category object set according to the original labels and predicted labels of the known-category objects, where the first known-category object set includes some or all of the known-category objects in the given object set, and the label transition matrix represents, for each category in the first known-category object set, the probability that an original label of that category changes into each predicted label;
sampling an unknown-category object set n times to obtain n sample sets, and combining the information of each sample set with the information of the known-category objects in the given object set to obtain n combined data sets, where the unknown-category object set includes all unknown-category objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sample sets are mutually disjoint;
for each combined data set, processing the combined data set and the label transition matrix of the first known-category object set with a label-noise-tolerant classification algorithm, to obtain n classification prediction models and n updated label transition matrices;
obtaining a category prediction result for any unknown-category object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label transition matrices.
In the embodiments of the present application, semi-supervised learning (label propagation) and supervised learning (classification) can be combined to predict the categories of unknown-category objects. The embodiments feed the predicted labels produced by label propagation into supervised learning, and predict the categories of unknown-category objects with multiple models built from multiple samples, thereby improving the accuracy and stability of the prediction results.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the embodiments of the present application.
Brief description of the drawings
The accompanying drawings, which are incorporated into and form part of this specification, show embodiments consistent with the present application and, together with the specification, serve to explain the principles of the invention.
Figure 1A is a flowchart of a method for predicting object categories according to an exemplary embodiment of the present application;
Figure 1B is a flowchart of an implementation of step 120 in Figure 1A according to an exemplary embodiment of the present application;
Figure 1C is an example diagram of the method shown in Figure 1A according to an exemplary embodiment of the present application;
Figure 2 is a flowchart of another method for predicting object categories according to an exemplary embodiment of the present application;
Figure 3 is a block diagram of a device for predicting object categories according to an exemplary embodiment of the present application;
Figure 4 is a block diagram of another device for predicting object categories according to an exemplary embodiment of the present application;
Figure 5 is a block diagram of another device for predicting object categories according to an exemplary embodiment of the present application;
Figure 6 is a block diagram of another device for predicting object categories according to an exemplary embodiment of the present application;
Figure 7 is a block diagram of another device for predicting object categories according to an exemplary embodiment of the present application;
Figure 8 is a schematic structural diagram of a device for predicting object categories according to an exemplary embodiment of the present application.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the embodiments of the present application; rather, they are merely examples of devices and methods consistent with some aspects of the embodiments of the present application as detailed in the appended claims.
The terms used in the embodiments of the present application are for the purpose of describing particular embodiments only and are not intended to limit the embodiments of the present application. The singular forms "a", "said" and "the" used in the embodiments and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the embodiments of the present application, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "while" or "in response to determining".
As noted in the background, machine learning methods fall mainly into supervised, unsupervised and semi-supervised learning, and in the prior art the categories these methods predict for unlabeled samples have low accuracy. To solve this problem, the embodiments of the present application provide a method and device for predicting object categories.
The method for predicting object categories provided by the embodiments of the present application is introduced first.
As shown in Figure 1A, which is a flowchart of a method for predicting object categories according to an exemplary embodiment of the present application, the method is used to predict the categories of unknown-category objects in a given object set and may include the following steps:
In step 110, for the given object set, feature data of each object in the given object set and inter-object relationship data are obtained, where the given object set includes known-category objects and unknown-category objects, and each known-category object has an original label indicating its category.
In the embodiments of the present application, the given object set includes multiple objects. In practical applications, an object may be a user or an event; the embodiments of the present application do not limit this.
In the embodiments of the present application, inter-object relationship data refers to data describing the associations between the objects in the given object set. For example, when the objects are users, the inter-object relationship data may describe friend relationships between users, specifically user friend relationship data from network applications (such as social, travel and video applications); alternatively, it may describe red-envelope sending relationships between users, specifically red-envelope sending data from network applications. When the objects are events, the inter-object relationship data may be event association data from network applications.
In the embodiments of the present application, when the objects are users, the category may include the user's age bracket, travel mode preference, travel time period, consumption level or consumption propensity; when the objects are events, the category may be the probability that an event occurs. The embodiments of the present application do not limit this.
In the embodiments of the present application, when the objects are users, the feature data of an object may include the user's historical trip location information or the applications installed on the user's terminal device. The historical trip location information may be obtained from the travel application in which the user is registered; for a terminal device running Android, the installed applications may be read from the terminal device used by the user.
In the embodiments of the present application, known-category objects have original labels and unknown-category objects do not.
For example, suppose we have the friend relationship data between user A and user A's QQ friends (users B, C and D), user A's age bracket is known, and the age brackets of A's QQ friends are unknown. To predict the age brackets of A's QQ friends from A's age bracket and the friend relationship data, the given object set contains four objects (users A, B, C and D), the inter-object relationship data is the friend relationship data between user A and A's QQ friends, user A is a known-category object, users B, C and D are unknown-category objects, and the category is the user's age bracket.
In step 120, a predicted label for each object in the given object set is obtained with a label propagation algorithm according to the feature data of each object in the given object set and the inter-object relationship data.
In the embodiments of the present application, after the feature data of each object in the given object set and the inter-object relationship data are processed with the label propagation algorithm, the known-category objects in the given object set have both original labels and predicted labels, while the unknown-category objects have only predicted labels.
In an optional implementation provided by the embodiments of the present application, as shown in Figure 1B, step 120 may include the following steps:
In step 121, the feature data of each object in the given object set is represented as a feature row vector.
In the embodiments of the present application, a feature row vector is a vectorization of the feature data. Each element corresponds to one feature of the object and takes the value 1 or 0: 1 means the object has the feature, and 0 means it does not.
In step 122, the cosine similarity between the two feature row vectors of every pair of directly related objects in the given object set is calculated according to the feature row vectors and the inter-object relationship data.
In the embodiments of the present application, all directly related pairs of objects in the given object set can be determined from the inter-object relationship data. Taking red-envelope sending relationships as an example, a direct relationship means that one object has sent a red envelope directly to the other.
In step 123, according to the calculated cosine similarities and the inter-object relationship data, the original labels of the known-category objects in the given object set are passed to each object in the given object set, to obtain the predicted label of each object in the given object set.
In scene of going on a journey, for the age bracket prediction of user, step 121~step 123 to be described.
It is known that object relationship data, which are red packet, provides relationship, i.e. the structure that gives bonus between two two users includes multiple The relational network of user, the relational network include:Tetra- passenger A, passenger B, passenger C and passenger D passengers, passenger A are provided with Age bracket, passenger B, passenger C and passenger D do not provide age bracket.
The feature data is the historical trip location information of the four passengers A, B, C, and D. Taking Beijing as an example, 100 trip locations can be defined. When the feature data is represented as feature row vectors, the element for a trip location is 1 if the passenger has been to that location and 0 otherwise. For example, the feature row vector of passenger A is [0, 0, 1, …, 1, 0]. It contains 100 elements, each corresponding to one trip location: a value of 1 means passenger A has been to that location, and a value of 0 means passenger A has not.
Take the prediction of 4 age groups as an example: post-60s, post-70s, post-80s, and post-90s (born in the 1960s, 1970s, 1980s, and 1990s, respectively). Passenger A is known to be post-80s, so the label probability matrix of passenger A is [0, 0, 1, 0]. The first element is the probability that passenger A is post-60s (0), the second is the probability that passenger A is post-70s (0), the third is the probability that passenger A is post-80s (1), and the fourth is the probability that passenger A is post-90s (0).
Just as passenger A has the feature row vector [0, 0, 1, …, 1, 0], passengers B, C, and D each have a feature row vector. The closeness of each pair of passengers with a direct red-packet relationship is then calculated as the cosine similarity of their two feature row vectors.
For example, passengers A and B have a direct red-packet relationship. Calculating the cosine similarity between the feature row vectors of passengers A and B gives 0.8, so the label probability matrix of passenger B is [0, 0, 0.8, 0]. That is, the predicted label of passenger B assigns probability 0 to post-60s, 0 to post-70s, 0.8 to post-80s, and 0 to post-90s. The predicted labels of passengers C and D can be calculated in the same way.
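The arithmetic of this example can be checked with a short sketch. The 6-element feature vectors are hypothetical stand-ins for the 100-element vectors. They are chosen so that the two passengers share 4 of their 5 visited locations, which gives a cosine similarity of 4/5 = 0.8. Passenger B's label vector is then passenger A's one-hot label scaled by that similarity, as in the example.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)  # product of the two Euclidean norms
    return dot / norm if norm else 0.0


AGE_GROUPS = ["post-60s", "post-70s", "post-80s", "post-90s"]
label_a = [0.0, 0.0, 1.0, 0.0]   # passenger A is known to be post-80s
features_a = [1, 1, 1, 1, 1, 0]  # hypothetical trip-location vectors
features_b = [0, 1, 1, 1, 1, 1]

sim_ab = cosine_similarity(features_a, features_b)  # 4 / (sqrt(5) * sqrt(5)) = 0.8
label_b = [sim_ab * p for p in label_a]             # B receives A's label scaled by 0.8
print([round(p, 2) for p in label_b])  # [0.0, 0.0, 0.8, 0.0]
```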
It should be noted that every object in the given object set, whether or not it has an original label, undergoes label propagation and obtains a corresponding predicted label.
Besides the label propagation algorithm in the above embodiment, this embodiment may use any label propagation algorithm in the related art to obtain the predicted label of each object. This is not limited herein.
In step 130, a label variation matrix of a first known-class object set is obtained according to the original labels and predicted labels of the known-class objects. The first known-class object set includes some or all of the known-class objects in the given object set. The label variation matrix indicates, for the original labels of each class in the first known-class object set, the probability of varying to each predicted label.
It should be noted that, for ease of understanding and description, the above example uses only a small number of objects. In practical applications, the given object set generally contains a large number of objects. To improve computational efficiency, in this embodiment the first known-class object set may be split off from the given object set by a certain percentage. In that case the first known-class object set contains part, rather than all, of the known-class objects in the given object set, and the step may specifically include:
Splitting the first known-class object set off from the given object set according to a preset percentage, where the number of objects in the first known-class object set = the total number of known-class objects in the given object set × the preset percentage.
For example, if the preset percentage is 10% and the given object set contains 1000 known-class objects, the first known-class object set contains 100 objects.
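A minimal sketch of the percentage split follows. It assumes the known-class objects are chosen uniformly at random, because the text specifies only how many objects to take, not which ones. The function name and seed parameter are hypothetical.

```python
import random


def split_first_known_set(known_ids, preset_percentage, seed=0):
    """Pick round(total * preset_percentage) known-class object IDs at random."""
    known_ids = list(known_ids)
    k = round(len(known_ids) * preset_percentage)
    rng = random.Random(seed)        # seeded so the split is reproducible
    return rng.sample(known_ids, k)  # sampling without replacement


first_known = split_first_known_set(range(1000), 0.10)
print(len(first_known))  # 100
```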
In this embodiment, each object in the first known-class object set has both an original label and a predicted label. The label variation matrix of the first known-class object set may be calculated with a method from the prior art. First, for each object in the first known-class object set, the probability that its original label varies to its predicted label is calculated. The label variation matrix is then calculated from these per-object probabilities.
For ease of understanding, take the prediction of the 4 age groups (post-60s, post-70s, post-80s, and post-90s) as an example. The label variation matrix is then a 4 × 4 matrix whose entry in row i and column j is the probability that original label i varies to predicted label j.
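The sketch below shows one simple estimator consistent with this description. It is an assumption, not necessarily the prior-art method the text refers to. It averages each object's normalized predicted-label vector within the object's original class, so that row i estimates the probability that original label i varies to each predicted label. Classes with no examples get an identity row, which is a further assumption.

```python
import numpy as np


def label_variation_matrix(original, predicted, num_classes):
    """original: (N,) original class indices; predicted: (N, K) predicted label vectors.
    Returns a K x K matrix M with M[i, j] ~ P(predicted label j | original label i)."""
    original = np.asarray(original, dtype=int)
    predicted = np.asarray(predicted, dtype=float)
    M = np.zeros((num_classes, num_classes))
    for i, p in zip(original, predicted):
        total = p.sum()
        if total > 0:
            M[i] += p / total  # each object contributes its own variation probabilities
    row_sums = M.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    # Assumption: a class with no examples keeps an identity row (no observed variation).
    return np.where(row_sums > 0, M / safe, np.eye(num_classes))


M = label_variation_matrix(
    original=[2, 2, 0],  # two post-80s objects and one post-60s object
    predicted=[[0, 0, 0.8, 0.2], [0, 0, 1, 0], [1, 0, 0, 0]],
    num_classes=4,
)
```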
In step 140, the unknown-class object set is sampled n times to obtain n sample sets. The information of each sample set is combined with the information of the known-class objects in the given object set to obtain n combined data sets. The unknown-class object set contains all of the unknown-class objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sample sets are mutually disjoint.
In this embodiment, n may be, for example, 2, 3, 4, or 5, and no two sample sets share any object.
To make full use of the unknown-class objects in the unknown-class object set while still using computing resources efficiently, in a preferred embodiment step 140 above may include steps S10, S11, S12, and S13:
In S10,3 sampling are carried out to unknown class collection S, extract 30% object every time, obtains 3 sampling collection M1, M2 and M3;
In S11, sample set M1 is combined with a second known-class object set D to obtain combined data set F1 = {M1, D}, where the second known-class object set D includes all of the known-class objects in the given object set;
In S12, sample set M2 is combined with the second known-class object set D to obtain combined data set F2 = {M2, D};
In S13, sample set M3 is combined with the second known-class object set D to obtain combined data set F3 = {M3, D}.
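Steps S10 to S13 can be sketched as below. The sketch assumes objects are identified by IDs and that the three 30% draws are made from one shuffled copy of S without replacement, which keeps the sample sets disjoint. Function names and the example IDs are hypothetical.

```python
import random


def disjoint_samples(unknown_ids, n=3, fraction=0.30, seed=0):
    """Draw n mutually disjoint sample sets, each holding `fraction` of the unknown-class set."""
    ids = list(unknown_ids)
    if n * fraction > 1:
        raise ValueError("n disjoint samples of this size cannot fit in the set")
    rng = random.Random(seed)
    rng.shuffle(ids)
    k = int(len(ids) * fraction)  # objects per sample set
    return [ids[i * k:(i + 1) * k] for i in range(n)]


def combine(sample_sets, known_set_d):
    """F_i = {M_i, D}: each sample set joined with all known-class objects."""
    return [list(m) + list(known_set_d) for m in sample_sets]


S = range(100)       # hypothetical unknown-class object IDs
D = range(100, 150)  # hypothetical known-class object IDs
M1, M2, M3 = disjoint_samples(S)
F1, F2, F3 = combine([M1, M2, M3], D)
```

Because every combined set reuses all of D but a different slice of S, each model trained later sees the same known-class objects and different unknown-class objects.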
In step 150, for each combined data set, a label-noise-tolerant classification algorithm is applied to the combined data set and the label variation matrix of the first known-class object set. This yields n classification prediction models and n updated label variation matrices.
In this embodiment, the label-noise-tolerant classification algorithm is highly tolerant of uncertainty in the predicted labels. It may be, for example, the label-noise-robust multiclass logistic regression (rmLR) algorithm. In addition, to counter the uncertainty of the label propagation results, the stability of the prediction results is improved by sampling to build multiple models.
In this embodiment, any label-noise-robust multiclass logistic regression (rmLR) algorithm in the related art may be used for multiclass training on the combined data set and the label variation matrix of the first known-class object set. The training process may be implemented in Spark.
The algorithm is as follows. The output classifier is compatible with logistic regression, where $x_q$ is the feature row vector of object q and $w_k$ is the classifier for class k.
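The objective function is then determined, where $p(y = k \mid x_n, w_k)$ is modeled by a softmax function, $p(y = k \mid x_n, w_k) = \exp(w_k^{\top} x_n) / \sum_{j=1}^{K} \exp(w_j^{\top} x_n)$, with K the number of classes.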
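The objective is not spelled out above. One plausible form is sketched below. It assumes the training follows the label-noise-robust logistic regression of Bootkrajang and Kabán, in which the label variation matrix supplies the flip probabilities $\omega_{kj}$ = P(observed label j | true label k). It illustrates that assumption and is not the objective of this filing.

```latex
$$p(y = k \mid x_n, w_k) = \frac{\exp(w_k^{\top} x_n)}{\sum_{j=1}^{K} \exp(w_j^{\top} x_n)}$$
$$p(\tilde{y} = j \mid x_n) = \sum_{k=1}^{K} \omega_{kj}\, p(y = k \mid x_n, w_k)$$
$$\mathcal{L}(w_1, \dots, w_K) = \sum_{n=1}^{N} \sum_{j=1}^{K} \tilde{y}_{nj} \log \sum_{k=1}^{K} \omega_{kj}\, p(y = k \mid x_n, w_k)$$
```

Here $\tilde{y}_{nj}$ is the observed (possibly propagated) label of object n, and $\mathcal{L}$ is maximized over the classifiers $w_k$ with $\omega$ held fixed. In that formulation $\omega$ is then re-estimated, which would plausibly correspond to the "updated" label variation matrices.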
The objective function is then optimized with L-BFGS while the label variation matrix is held fixed. Spark's LBFGS optimizer requires a Gradient implementation that returns both the gradient and the objective function value.
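A minimal NumPy sketch of the "gradient plus objective value" contract described here follows. It assumes the rmLR likelihood, in which noisy-label probabilities are obtained by mixing softmax outputs with a fixed label variation matrix Omega, where Omega[k, j] = P(observed label j | true label k). The function name and signature are hypothetical. It returns the negative log-likelihood and its gradient, the form an L-BFGS minimizer expects.

```python
import numpy as np


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def rmlr_loss_and_grad(w_flat, X, Y_obs, Omega):
    """X: (N, D) feature row vectors; Y_obs: (N, K) observed (possibly propagated) labels;
    Omega: (K, K) label variation matrix, held fixed during this optimization;
    w_flat: flattened (D, K) weights, one column w_k per class.
    Returns (negative log-likelihood, gradient w.r.t. w_flat)."""
    D = X.shape[1]
    K = Omega.shape[0]
    W = w_flat.reshape(D, K)
    P = softmax(X @ W)        # p(y = k | x_n): clean-label probabilities
    Q = P @ Omega + 1e-12     # p(y~ = j | x_n): noisy-label probabilities
    loss = -np.sum(Y_obs * np.log(Q))
    # R[n, m] = sum_j Y_obs[n, j] * Omega[m, j] / Q[n, j]
    R = (Y_obs / Q) @ Omega.T
    G = P * R - P * Y_obs.sum(axis=1, keepdims=True)  # d log-likelihood / d scores
    grad = -(X.T @ G)         # chain rule through scores = X @ W
    return loss, grad.ravel()
```

If SciPy is available, the function could be passed to `scipy.optimize.minimize(..., jac=True, method="L-BFGS-B")` with `args=(X, Y_obs, Omega)`. In Spark MLlib, a per-example version of the same computation would go in a subclass of `Gradient`, whose `compute` method returns the gradient and the loss.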
In the preferred embodiment of step 140, training yields 3 classification prediction models and 3 updated label variation matrices.
In step 160, the class prediction result of any unknown-class object A_i in the given object set is obtained according to the feature data of A_i, the n classification prediction models, and the n updated label variation matrices.
Taking passenger age-group prediction as an example, the basic information collected about passengers is very limited. In many cases only a mobile phone number is available, and passengers of known class (those whose age group is known) are few relative to the total number of passengers. The method provided by this embodiment makes full use of the relationship network data generated when passengers share red packets to effectively enlarge the training set, thereby improving prediction accuracy.
For ease of understanding, the technical solution of this embodiment is described with reference to the example shown in Fig. 1C. As shown in Fig. 1C, the given object set includes known-class objects, which have original labels, and unknown-class objects, which do not. After label propagation over the known-class and unknown-class objects, each known-class object has both an original label and a predicted label, while each unknown-class object has only a predicted label. The known-class objects are processed to obtain the label variation matrix of the known-class object set. The unknown-class objects are sampled n times and each sample is combined with the known-class objects, giving combined data sets 1, 2, …, n. Training on the label variation matrix of the known-class object set together with combined data set 1 yields classification prediction model 1 and updated label variation matrix 1. Training with combined data set 2 yields classification prediction model 2 and updated label variation matrix 2, and so on, until training with combined data set n yields classification prediction model n and updated label variation matrix n. Finally, the feature data of any unknown-class object is input into these classification prediction models and updated label variation matrices to obtain its class prediction result.
As can be seen from the above embodiment, the method combines two machine learning approaches, semi-supervised learning (label propagation) and supervised learning (classification), to predict the classes of unknown-class objects. This embodiment effectively incorporates the predicted labels obtained by label propagation into supervised learning, and predicts the classes of unknown-class objects with multiple models built by sampling. This improves the stability and accuracy of the prediction results.
As shown in Fig. 2, Fig. 2 is a flowchart of another object class prediction method according to an exemplary embodiment of the present application. The method may include the following steps:
In step 210, for the given object set, the feature data of each object in the given object set and the object relationship data are obtained. The given object set includes known-class objects and unknown-class objects, and each known-class object has an original label indicating its class.
In step 220, the predicted label of each object in the given object set is obtained with a label propagation algorithm according to the feature data of each object in the given object set and the object relationship data.
In step 230, a label variation matrix of a first known-class object set is obtained according to the original labels and predicted labels of the known-class objects. The first known-class object set includes some or all of the known-class objects in the given object set. The label variation matrix indicates, for the original labels of each class in the first known-class object set, the probability of varying to each predicted label.
In step 240, the unknown-class object set is sampled n times to obtain n sample sets. The information of each sample set is combined with the information of the known-class objects in the given object set to obtain n combined data sets. The unknown-class object set contains all of the unknown-class objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sample sets are mutually disjoint.
In step 250, for each combined data set, a label-noise-tolerant classification algorithm is applied to the combined data set and the label variation matrix of the first known-class object set. This yields n classification prediction models and n updated label variation matrices.
Steps 210 to 250 in this embodiment are similar to steps 110 to 150 in the embodiment shown in Fig. 1A and are not repeated here. For details, see the embodiment shown in Fig. 1A.
In step 260, the feature data of any unknown-class object A_i in the given object set is input into a of the n classification prediction models to obtain a first-type multi-class probability vectors. The predicted label of A_i is input into b of the n updated label variation matrices to obtain b second-type multi-class probability vectors. Here a and b are preset values not greater than n.
Preferably, in this embodiment, a and b are both n, so that all n classification prediction models and n updated label variation matrices generated in step 250 are used.
For example, suppose 3 classification prediction models and 3 updated label variation matrices are generated in step 250. In this step, the feature data of A_i is input into the 3 classification prediction models and the predicted label of A_i is input into the 3 updated label variation matrices. This yields 3 first-type multi-class probability vectors and 3 second-type multi-class probability vectors.
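The text does not state how a predicted label is "input into" a label variation matrix. One reading, sketched below as an assumption, treats the matrix as P(predicted label j | true label k) and inverts it with Bayes' rule under a uniform prior over true classes. Under that reading, a predicted-label vector q maps to a probability vector over true classes. The function name and the uniform-prior choice are hypothetical.

```python
import numpy as np


def second_type_vector(predicted_label, omega):
    """predicted_label: (K,) predicted label vector q of an unknown-class object.
    omega: (K, K) updated label variation matrix, omega[k, j] = P(predicted j | true k).
    Returns a (K,) probability vector over true classes (uniform prior assumed)."""
    q = np.asarray(predicted_label, dtype=float)
    omega = np.asarray(omega, dtype=float)
    col = omega.sum(axis=0, keepdims=True)
    # Column j becomes P(true k | predicted j) under a uniform prior over k.
    posterior = np.divide(omega, col, out=np.zeros_like(omega), where=col > 0)
    v = posterior @ q
    total = v.sum()
    return v / total if total > 0 else v


v = second_type_vector([0, 0, 0.8, 0], np.eye(4))  # identity matrix: no label variation
```

With an identity matrix the result is the normalized predicted label. With a noisier matrix, probability mass shifts toward the true classes that most often produce the observed predicted label.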
In step 270, the class prediction result of A_i is obtained according to the first-type and second-type multi-class probability vectors.
In a preferred embodiment, step 270 above may include:
Averaging the a first-type and the b second-type multi-class probability vectors to obtain a target multi-class probability vector, and determining the class whose label has the largest value in the target multi-class probability vector as the class of A_i.
Alternatively, the target multi-class probability vector may be obtained as a weighted sum of the a first-type and b second-type multi-class probability vectors. The class whose label has the largest value in the target vector is then determined as the class of A_i.
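Both fusion rules can be sketched in a few lines. The function name, example vectors, and weights are hypothetical. The class order follows the age-group example.

```python
import numpy as np

AGE_GROUPS = ["post-60s", "post-70s", "post-80s", "post-90s"]


def fuse_and_predict(first_type, second_type, weights=None):
    """Stack all a + b probability vectors; average them (weights=None) or
    take a weighted sum, then return (target vector, index of the largest entry)."""
    vectors = np.vstack([np.asarray(v, dtype=float) for v in list(first_type) + list(second_type)])
    if weights is None:
        target = vectors.mean(axis=0)
    else:
        w = np.asarray(weights, dtype=float)
        target = (w[:, None] * vectors).sum(axis=0)
    return target, int(np.argmax(target))


first = [[0.1, 0.2, 0.6, 0.1], [0.2, 0.2, 0.5, 0.1]]  # from a = 2 classification models
second = [[0.1, 0.1, 0.3, 0.5]]                       # from b = 1 label variation matrix
target, k = fuse_and_predict(first, second)
print(AGE_GROUPS[k])  # post-80s
```

The weighted variant lets the two sources be trusted unequally. For example, `weights=[0.1, 0.1, 0.8]` on the same vectors favours the second-type vector and selects post-90s instead.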
It should be noted that although the drawings describe the operations of the method of this embodiment in a particular order, this does not require or imply that the operations must be performed in that order, or that all of the operations shown must be performed to achieve the desired result. On the contrary, the steps depicted in the flowcharts may be executed in a different order. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
Corresponding to the foregoing embodiments of the object class prediction method, the present application further provides embodiments of an object class prediction apparatus.
As shown in Fig. 3, Fig. 3 is a block diagram of an object class prediction apparatus according to an exemplary embodiment of the present application. The apparatus predicts the classes of unknown-class objects in a given object set and may include:
a set data obtaining module 310, configured to obtain, for the given object set, the feature data of each object in the given object set and the object relationship data, where the given object set includes known-class objects and unknown-class objects, and each known-class object has an original label indicating its class;
In this embodiment, the given object set includes multiple objects. In practical applications, an object may be a user or an event, which is not limited herein.
In this embodiment, object relationship data refers to data describing the association relationships between objects in the given object set. For example, when the objects are users, the object relationship data may describe friend relationships between users, specifically user friend-relationship data from network applications such as social, travel, and video applications. Alternatively, it may describe red-packet distribution relationships between users, specifically red-packet distribution data from network applications. When the objects are events, the object relationship data may be event association data from network applications.
In this embodiment, when the object is a user, the class may be, for example, the user's age group, preferred travel mode, travel time period, consumption level, or consumption propensity. When the object is an event, the class may be the probability that the event occurs. This is not limited herein.
In this embodiment, when the object is a user, the feature data may include the user's historical trip location information or the applications installed on the user's terminal device. The historical trip location information may be obtained from the travel application with which the user is registered. For an Android terminal device, the installed applications may be obtained from the terminal device used by the user.
In this embodiment, known-class objects have original labels, and unknown-class objects do not.
For example, suppose the friend-relationship data between user A and user A's QQ friends (such as users B, C, and D) is available, user A's age group is known, and the age groups of user A's QQ friends are unknown. The age groups of user A's QQ friends are then predicted from user A's age group and the relationship data. In this case, the given object set includes four objects, users A, B, C, and D. The object relationship data is the friend-relationship data between user A and user A's QQ friends, user A is a known-class object, users B, C, and D are unknown-class objects, and the class is the user's age group.
a predicted label obtaining module 320, configured to obtain the predicted label of each object in the given object set with a label propagation algorithm, according to the feature data of each object and the object relationship data obtained by the set data obtaining module 310;
In this embodiment of the present application, the label propagation algorithm processes the feature data of each object in the given object set together with the relationship data between objects. Afterwards, each known-class object in the given object set has both an original label and a predicted label, and each unknown-class object has only a predicted label.
a label variation matrix obtaining module 330, configured to obtain the label variation matrix of a first known-class object set according to the original labels and predicted labels of the known-class objects, where the first known-class object set includes some or all of the known-class objects in the given object set, and the label variation matrix indicates, for the original labels of each class in the first known-class object set, the probability of varying to each predicted label;
It should be noted that, for ease of understanding and description, the above example uses only a small number of objects. In practical applications, the given object set generally contains a large number of objects. To improve computational efficiency, in this embodiment the first known-class object set may be split off from the given object set by a certain percentage. In that case the first known-class object set contains part, rather than all, of the known-class objects in the given object set, and the step may specifically include:
Splitting the first known-class object set off from the given object set according to a preset percentage, where the number of objects in the first known-class object set = the total number of known-class objects in the given object set × the preset percentage.
For example, if the preset percentage is 10% and the given object set contains 1000 known-class objects, the first known-class object set contains 100 objects.
In this embodiment, each object in the first known-class object set has both an original label and a predicted label. The label variation matrix of the first known-class object set may be calculated with a method from the prior art. First, for each object in the first known-class object set, the probability that its original label varies to its predicted label is calculated. The label variation matrix is then calculated from these per-object probabilities.
For ease of understanding, take the prediction of the 4 age groups (post-60s, post-70s, post-80s, and post-90s) as an example. The label variation matrix is then a 4 × 4 matrix whose entry in row i and column j is the probability that original label i varies to predicted label j.
a combining module 340, configured to sample the unknown-class object set n times to obtain n sample sets, and to combine the information of each sample set with the information of the known-class objects in the given object set to obtain n combined data sets, where the unknown-class object set contains all of the unknown-class objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sample sets are mutually disjoint;
In this embodiment, n may be, for example, 2, 3, 4, or 5, and no two sample sets share any object.
a training module 350, configured to process, for each combined data set obtained by the combining module 340, the combined data set and the label variation matrix of the first known-class object set with a label-noise-tolerant classification algorithm, to obtain n classification prediction models and n updated label variation matrices;
In this embodiment, the label-noise-tolerant classification algorithm is highly tolerant of uncertainty in the predicted labels. It may be, for example, the label-noise-robust multiclass logistic regression (rmLR) algorithm. In addition, to counter the uncertainty of the label propagation results, the stability of the prediction results is improved by sampling to build multiple models.
In this embodiment, any label-noise-robust multiclass logistic regression (rmLR) algorithm in the related art may be used for multiclass training on the combined data set and the label variation matrix of the first known-class object set. The training process may be implemented in Spark.
The algorithm is as follows. The output classifier is compatible with logistic regression, where $x_q$ is the feature row vector of object q and $w_k$ is the classifier for class k.
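The objective function is then determined, where $p(y = k \mid x_n, w_k)$ is modeled by a softmax function, $p(y = k \mid x_n, w_k) = \exp(w_k^{\top} x_n) / \sum_{j=1}^{K} \exp(w_j^{\top} x_n)$, with K the number of classes.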
The objective function is then optimized with L-BFGS while the label variation matrix is held fixed. Spark's LBFGS optimizer requires a Gradient implementation that returns both the gradient and the objective function value.
a class prediction module 360, configured to obtain the class prediction result of any unknown-class object A_i in the given object set according to the feature data of A_i, the n classification prediction models, and the n updated label variation matrices.
Taking passenger age-group prediction as an example, the basic information collected about passengers is very limited. In many cases only a mobile phone number is available, and passengers of known class (those whose age group is known) are few relative to the total number of passengers. The method provided by this embodiment makes full use of the relationship network data generated when passengers share red packets to effectively enlarge the training set, thereby improving prediction accuracy.
As can be seen from the above embodiment, the apparatus combines two machine learning approaches, semi-supervised learning (label propagation) and supervised learning (classification), to predict the classes of unknown-class objects. This embodiment effectively incorporates the predicted labels obtained by label propagation into supervised learning, and predicts the classes of unknown-class objects with multiple models built by sampling. This improves the stability and accuracy of the prediction results.
As shown in Fig. 4, Fig. 4 is a block diagram of another object class prediction apparatus according to an exemplary embodiment of the present application. On the basis of the embodiment shown in Fig. 3, the predicted label obtaining module 320 may include:
a vector characterization submodule 321, configured to represent the feature data of each object in the given object set as a feature row vector;
In this embodiment, the feature row vector is a vectorization of the feature data. Each element of the feature row vector corresponds to one feature of the object and takes the value 1 or 0: a value of 1 means the object has that feature, and a value of 0 means it does not.
a similarity calculation submodule 322, configured to calculate, according to the feature row vectors obtained by the vector characterization submodule 321 and the object relationship data, the cosine similarity between the two feature row vectors of every pair of objects in the given object set that have a direct relationship;
In this embodiment, all pairs of objects in the given object set that have a direct relationship can be determined from the object relationship data. Taking the red-packet distribution relationship as an example, a direct relationship means that one object has directly sent a red packet to the other.
a label propagation submodule 323, configured to propagate the original labels of the known-class objects in the given object set to every object in the given object set, according to the cosine similarities calculated by the similarity calculation submodule 322 and the object relationship data, to obtain the predicted label of each object in the given object set.
Steps 121 to 123 are described below, taking the prediction of users' age groups in a travel scenario as an example.
Suppose the object relationship data is the red-packet distribution relationship, i.e., red packets sent between pairs of users form a relationship network containing multiple users. This network includes four passengers, A, B, C, and D. Passenger A has provided an age group, and passengers B, C, and D have not.
The feature data is the historical trip location information of the four passengers A, B, C, and D. Taking Beijing as an example, 100 trip locations can be defined. When the feature data is represented as feature row vectors, the element for a trip location is 1 if the passenger has been to that location and 0 otherwise. For example, the feature row vector of passenger A is [0, 0, 1, …, 1, 0]. It contains 100 elements, each corresponding to one trip location: a value of 1 means passenger A has been to that location, and a value of 0 means passenger A has not.
Take the prediction of 4 age groups as an example: post-60s, post-70s, post-80s, and post-90s (born in the 1960s, 1970s, 1980s, and 1990s, respectively). Passenger A is known to be post-80s, so the label probability matrix of passenger A is [0, 0, 1, 0]. The first element is the probability that passenger A is post-60s (0), the second is the probability that passenger A is post-70s (0), the third is the probability that passenger A is post-80s (1), and the fourth is the probability that passenger A is post-90s (0).
Just as passenger A has the feature row vector [0, 0, 1, …, 1, 0], passengers B, C, and D each have a feature row vector. The closeness of each pair of passengers with a direct red-packet relationship is then calculated as the cosine similarity of their two feature row vectors.
For example, passengers A and B have a direct red-packet relationship. Calculating the cosine similarity between the feature row vectors of passengers A and B gives 0.8, so the label probability matrix of passenger B is [0, 0, 0.8, 0]. That is, the predicted label of passenger B assigns probability 0 to post-60s, 0 to post-70s, 0.8 to post-80s, and 0 to post-90s. The predicted labels of passengers C and D can be calculated in the same way.
It should be noted that every object in the given object set, whether or not it has an original label, undergoes label propagation and obtains a corresponding predicted label.
As shown in Fig. 5, Fig. 5 is a block diagram of another object class prediction apparatus according to an exemplary embodiment of the present application. On the basis of the embodiment shown in Fig. 3 or Fig. 4, and to make full use of the unknown-class objects in the unknown-class object set while still using computing resources efficiently, the combining module 340 may include:
A sampling submodule 341, configured to sample the unknown-class object set S three times, drawing 30% of the objects each time, to obtain three sampled sets M1, M2 and M3; and
A combination submodule 342, configured to combine the sampled set M1 with a second known-class object set D to obtain a combined data set F1 = {M1, D}, where the second known-class object set D contains all known-class objects in the given object set;
to combine the sampled set M2 with the second known-class object set D to obtain a combined data set F2 = {M2, D}; and
to combine the sampled set M3 with the second known-class object set D to obtain a combined data set F3 = {M3, D}.
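The sampling and combination performed by submodules 341 and 342 might look like the following sketch. The function names, the toy object ids and the fixed random seed are hypothetical. Drawing the three 30% samples as consecutive slices of one shuffled list is one simple way (an assumption, not stated in the source) to keep M1, M2 and M3 mutually disjoint, as the claims require when n > 1; together the three samples then cover 90% of S.

```python
import random


def disjoint_samples(unknown_ids, n=3, fraction=0.3, seed=0):
    """Draw n mutually disjoint samples, each holding about `fraction` of S."""
    if n * fraction > 1:
        raise ValueError("n disjoint samples of this size do not fit in S")
    ids = list(unknown_ids)
    random.Random(seed).shuffle(ids)  # one shuffle, then non-overlapping slices
    size = min(round(len(ids) * fraction), len(ids) // n)
    return [ids[k * size:(k + 1) * size] for k in range(n)]


def combine(samples, known_ids):
    """Build the combined data sets F_k = {M_k, D}."""
    return [list(m) + list(known_ids) for m in samples]


S = [f"u{i}" for i in range(10)]  # hypothetical unknown-class object ids
D = ["k0", "k1"]                  # hypothetical ids of all known-class objects
M1, M2, M3 = disjoint_samples(S)
F1, F2, F3 = combine([M1, M2, M3], D)
print(len(M1), len(F1))  # 3 5
```

Each combined set reuses all of D, so every classifier trained later sees the full labelled data plus a different slice of the unlabelled data.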
As shown in Fig. 6, which is a block diagram of another apparatus for predicting an object class according to an exemplary embodiment of the present application, this embodiment builds on any of the embodiments shown in Figs. 3 to 5, and the class prediction module 360 may include:
A multi-class probability vector acquisition submodule 361, configured to input the feature data of any unknown-class object A_i in the given object set into a of the n classification prediction models to obtain a first-type multi-class probability vectors, and to input the prediction label of A_i into b of the n updated label variation matrices to obtain b second-type multi-class probability vectors, where a and b are preset values not greater than n;
Preferably, in this embodiment of the present application, a and b both equal n, so that the n classification prediction models and the n updated label variation matrices generated in step 250 are fully used.
A class prediction submodule 362, configured to obtain the class prediction result of the unknown-class object A_i from the first-type and second-type multi-class probability vectors obtained by the multi-class probability vector acquisition submodule 361.
As shown in Fig. 7, which is a block diagram of another apparatus for predicting an object class according to an exemplary embodiment of the present application, this embodiment builds on the embodiment shown in Fig. 6, and the class prediction submodule 362 may include:
A multi-class probability vector obtaining unit 3621, configured to average the a first-type multi-class probability vectors and the b second-type multi-class probability vectors to obtain a target multi-class probability vector; and
A class prediction unit 3622, configured to determine the class corresponding to the largest entry of the target multi-class probability vector as the class of the unknown-class object A_i.
Alternatively, the target multi-class probability vector may be obtained as a weighted sum of the a first-type and b second-type multi-class probability vectors, and the class corresponding to its largest entry is then determined as the class of the unknown-class object A_i.
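The decision rule of units 3621 and 3622, together with the weighted-sum alternative, is sketched below. The source does not say how a prediction label is "input into" a label variation matrix; purely for illustration, this sketch assumes T[i][j] approximates P(prediction label j | true class i), scores true class i as the i-th entry of T multiplied by the prediction-label vector, and normalizes the result. The classifier outputs, matrix values and function names are all hypothetical, and the example uses a = 2 and b = 1 for brevity.

```python
def normalize(v):
    """Scale a non-negative vector to sum to 1 (uniform if it is all zeros)."""
    total = sum(v)
    if total <= 0:
        return [1.0 / len(v)] * len(v)
    return [x / total for x in v]


def mat_vec(T, y):
    """Return T @ y for a square matrix T stored as nested lists."""
    return [sum(t * yj for t, yj in zip(row, y)) for row in T]


def second_type_vectors(pred_label, matrices):
    """Assumed mapping: score of true class i = sum_j T[i][j] * pred_label[j], normalized."""
    return [normalize(mat_vec(T, pred_label)) for T in matrices]


def combine_and_decide(first_vecs, second_vecs, weights=None):
    """Average (or weight) the a + b vectors; return (target vector, argmax index)."""
    vecs = list(first_vecs) + list(second_vecs)
    if weights is None:
        weights = [1.0 / len(vecs)] * len(vecs)  # plain mean, as in claim 5
    num_classes = len(vecs[0])
    target = [sum(w * v[k] for w, v in zip(weights, vecs)) for k in range(num_classes)]
    best = max(range(num_classes), key=lambda k: target[k])
    return target, best


CLASSES = ["post-60s", "post-70s", "post-80s", "post-90s"]
# Hypothetical outputs of a = 2 classification prediction models for A_i.
first = [[0.1, 0.2, 0.6, 0.1], [0.2, 0.2, 0.5, 0.1]]
# Hypothetical updated label variation matrix (b = 1).
T1 = [[0.9, 0.1, 0.0, 0.0],
      [0.1, 0.8, 0.1, 0.0],
      [0.0, 0.1, 0.8, 0.1],
      [0.0, 0.0, 0.1, 0.9]]
second = second_type_vectors([0.0, 0.0, 0.8, 0.0], [T1])  # A_i's prediction label
target, best = combine_and_decide(first, second)
print(CLASSES[best])  # post-80s
```

Passing a `weights` list of length a + b gives the weighted-sum variant; omitting it gives the plain mean.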
For how the functions and effects of each module in the above apparatus are implemented, refer to the implementation of the corresponding steps in the above method; the details are not repeated here.
Since the apparatus embodiments essentially correspond to the method embodiments, refer to the description of the method embodiments for related details. The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected as actually needed to achieve the purpose of the solution of the embodiments of the present application. A person of ordinary skill in the art can understand and implement them without creative effort.
The embodiments of the present application further provide a computer storage medium storing program instructions, the program instructions including: for a given object set, obtaining feature data and inter-object relationship data of each object in the given object set, where the given object set includes known-class objects and unknown-class objects, and each known-class object has an original label indicating its class; obtaining, by a label propagation algorithm, a prediction label of each object in the given object set according to the feature data and the inter-object relationship data of each object; obtaining a label variation matrix of a first known-class object set according to the original labels and prediction labels of the known-class objects, where the first known-class object set includes some or all of the known-class objects in the given object set, and the label variation matrix indicates the probability that an original label of each class in the first known-class object set varies to a prediction label; sampling an unknown-class object set n times to obtain n sampled sets, and combining the information of each sampled set with the information of the known-class objects in the given object set to obtain n combined data sets, where the unknown-class object set includes all unknown-class objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sampled sets are mutually disjoint; for each combined data set, processing the combined data set and the label variation matrix of the first known-class object set by a label-noise-tolerant classification algorithm to obtain n classification prediction models and n updated label variation matrices; and obtaining a class prediction result of any unknown-class object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label variation matrices.
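One way to obtain the label variation matrix from the original labels and prediction labels of known-class objects is sketched below. It is an assumption-laden illustration rather than the patented procedure: each known object's prediction label is normalized into a distribution, row i of the matrix averages these distributions over objects whose original class is i, objects with an all-zero prediction label are skipped, and a class with no usable objects gets an identity row. The function name and the toy data are hypothetical.

```python
def label_variation_matrix(original, predicted, num_classes):
    """Estimate T[i][j], the probability that original class i varies to prediction label j.

    original: object id -> original class index (known-class objects only)
    predicted: object id -> prediction label vector from label propagation
    """
    counts = [[0.0] * num_classes for _ in range(num_classes)]
    for obj, i in original.items():
        vec = predicted.get(obj)
        if vec is None or sum(vec) <= 0:
            continue  # assumption: no usable prediction label, skip this object
        total = sum(vec)
        for j, p in enumerate(vec):
            counts[i][j] += p / total
    matrix = []
    for i, row in enumerate(counts):
        s = sum(row)
        if s > 0:
            matrix.append([x / s for x in row])
        else:
            # Assumption: a class with no evidence is treated as never varying.
            matrix.append([1.0 if j == i else 0.0 for j in range(num_classes)])
    return matrix


# Hypothetical data: classes 0..3, three known objects plus one unknown object B.
original = {"A": 2, "C": 2, "E": 3}
predicted = {
    "A": [0.0, 0.0, 0.9, 0.1],
    "C": [0.0, 0.0, 0.5, 0.5],
    "E": [0.0, 0.0, 0.0, 0.7],
    "B": [0.0, 0.0, 0.8, 0.0],  # ignored: B has no original label
}
T = label_variation_matrix(original, predicted, 4)
print(T[2])  # approximately [0.0, 0.0, 0.7, 0.3]
```

Restricting `original` to a subset of the known objects corresponds to building the matrix from only part of the known-class objects, which the description permits for the first known-class object set.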
The embodiments of the present application may take the form of a computer program product implemented on one or more storage media containing program code (including but not limited to disk storage, CD-ROM and optical storage). Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
As shown in Fig. 8, which is a schematic structural diagram of an apparatus for predicting an object class according to an exemplary embodiment of the present application, the apparatus 800 may, for example, be provided as a server. Referring to Fig. 8, the apparatus 800 includes a processing component 822, which further includes one or more processors, and memory resources represented by a memory 832 for storing instructions executable by the processing component 822, such as application programs. The application programs stored in the memory 832 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 822 is configured to execute the instructions to perform the class prediction method provided by the embodiments of the present application, the method including: for a given object set, obtaining feature data and inter-object relationship data of each object in the given object set, where the given object set includes known-class objects and unknown-class objects, and each known-class object has an original label indicating its class; obtaining, by a label propagation algorithm, a prediction label of each object in the given object set according to the feature data and the inter-object relationship data of each object; obtaining a label variation matrix of a first known-class object set according to the original labels and prediction labels of the known-class objects, where the first known-class object set includes some or all of the known-class objects in the given object set, and the label variation matrix indicates the probability that an original label of each class in the first known-class object set varies to a prediction label; sampling an unknown-class object set n times to obtain n sampled sets, and combining the information of each sampled set with the information of the known-class objects in the given object set to obtain n combined data sets, where the unknown-class object set includes all unknown-class objects in the given object set, n is a preset value not less than 1, and when n > 1 the n sampled sets are mutually disjoint; for each combined data set, processing the combined data set and the label variation matrix of the first known-class object set by a label-noise-tolerant classification algorithm to obtain n classification prediction models and n updated label variation matrices; and obtaining a class prediction result of any unknown-class object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label variation matrices.
The apparatus 800 may further include a power supply component 826 configured to perform power management of the apparatus 800, a wired or wireless network interface 850 configured to connect the apparatus 800 to a network, and an input/output (I/O) interface 858. The apparatus 800 may operate based on an operating system stored in the memory 832, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is further provided, for example the memory 832 including instructions, which can be executed by the processing component 822 of the apparatus 800 to perform the above method for predicting an object class provided by the embodiments of the present application. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Other embodiments of the present application will readily occur to those skilled in the art upon considering the specification and practicing the disclosure herein. The embodiments of the present application are intended to cover any variations, uses or adaptations that follow the general principles of the present application and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be considered illustrative only, and the true scope and spirit of the embodiments of the present application are indicated by the following claims.
It should be understood that the embodiments of the present application are not limited to the precise structures described above and shown in the drawings, and various modifications and changes may be made without departing from their scope. The scope of the embodiments of the present application is limited only by the appended claims.

Claims (17)

1. A method for predicting an object class, used to predict the class of unknown-class objects in a given object set, wherein the method comprises:
for the given object set, obtaining feature data and inter-object relationship data of each object in the given object set, wherein the given object set comprises known-class objects and unknown-class objects, and each known-class object has an original label indicating its class;
obtaining, by a label propagation algorithm, a prediction label of each object in the given object set according to the feature data and the inter-object relationship data of each object in the given object set;
obtaining a label variation matrix of a first known-class object set according to the original labels and the prediction labels of the known-class objects, wherein the first known-class object set comprises some or all of the known-class objects in the given object set, and the label variation matrix indicates the probability that an original label of each class in the first known-class object set varies to a prediction label;
sampling an unknown-class object set n times to obtain n sampled sets, and combining the information of each sampled set with the information of the known-class objects in the given object set to obtain n combined data sets, wherein the unknown-class object set comprises all unknown-class objects in the given object set, n is a preset value not less than 1, and when n > 1, the n sampled sets are mutually disjoint;
for each combined data set, processing the combined data set and the label variation matrix of the first known-class object set by a label-noise-tolerant classification algorithm to obtain n classification prediction models and n updated label variation matrices; and
obtaining a class prediction result of any unknown-class object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label variation matrices.
2. The method according to claim 1, wherein obtaining, by the label propagation algorithm, the prediction label of each object in the given object set according to the feature data and the inter-object relationship data of each object in the given object set comprises:
representing the feature data of each object in the given object set as a feature row vector;
calculating, according to the feature row vectors and the inter-object relationship data, the cosine similarity between the feature row vectors of every two objects in the given object set that have a direct relationship; and
propagating, according to the cosine similarities and the inter-object relationship data, the original labels of the known-class objects in the given object set to each object in the given object set, to obtain the prediction label of each object in the given object set.
3. The method according to claim 1, wherein sampling the unknown-class object set n times to obtain n sampled sets, and combining the information of each sampled set with the information of the known-class objects in the given object set to obtain n combined data sets, comprises:
sampling the unknown-class object set S three times, drawing 30% of the objects each time, to obtain three sampled sets M1, M2 and M3;
combining the sampled set M1 with a second known-class object set D to obtain a combined data set F1 = {M1, D}, wherein the second known-class object set D comprises all known-class objects in the given object set;
combining the sampled set M2 with the second known-class object set D to obtain a combined data set F2 = {M2, D}; and
combining the sampled set M3 with the second known-class object set D to obtain a combined data set F3 = {M3, D}.
4. The method according to claim 1, wherein obtaining the class prediction result of any unknown-class object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label variation matrices comprises:
inputting the feature data of any unknown-class object A_i in the given object set into a of the n classification prediction models to obtain a first-type multi-class probability vectors, and inputting the prediction label of A_i into b of the n updated label variation matrices to obtain b second-type multi-class probability vectors, wherein a and b are preset values not greater than n; and
obtaining the class prediction result of the unknown-class object A_i according to the obtained first-type and second-type multi-class probability vectors.
5. The method according to claim 4, wherein obtaining the class prediction result of the unknown-class object A_i according to the obtained first-type and second-type multi-class probability vectors comprises:
averaging the a first-type multi-class probability vectors and the b second-type multi-class probability vectors to obtain a target multi-class probability vector; and
determining the class corresponding to the largest entry of the target multi-class probability vector as the class of the unknown-class object A_i.
6. The method according to claim 1, wherein the object is a user, and the class comprises: the age bracket of the user, the travel mode preference of the user, the travel time period of the user, the consumption level of the user, or the consumption propensity of the user.
7. The method according to claim 1, wherein the object is a user, and the feature data comprises: historical travel location information of the user, or the installation status of applications on the terminal device of the user.
8. The method according to claim 1, wherein the object is a user, and the inter-object relationship data comprises: data describing red-packet-giving relationships between users, or data describing friend relationships between users.
9. An apparatus for predicting an object class, used to predict the class of unknown-class objects in a given object set, wherein the apparatus comprises:
a set data acquisition module, configured to obtain, for the given object set, feature data and inter-object relationship data of each object in the given object set, wherein the given object set comprises known-class objects and unknown-class objects, and each known-class object has an original label indicating its class;
a prediction label acquisition module, configured to obtain, by a label propagation algorithm, a prediction label of each object in the given object set according to the feature data and the inter-object relationship data of each object obtained by the set data acquisition module;
a label variation matrix acquisition module, configured to obtain a label variation matrix of a first known-class object set according to the original labels and the prediction labels of the known-class objects, wherein the first known-class object set comprises some or all of the known-class objects in the given object set, and the label variation matrix indicates the probability that an original label of each class in the first known-class object set varies to a prediction label;
a combination module, configured to sample an unknown-class object set n times to obtain n sampled sets, and to combine the information of each sampled set with the information of the known-class objects in the given object set to obtain n combined data sets, wherein the unknown-class object set comprises all unknown-class objects in the given object set, n is a preset value not less than 1, and when n > 1, the n sampled sets are mutually disjoint;
a training module, configured to process, for each combined data set obtained by the combination module, the combined data set and the label variation matrix of the first known-class object set by a label-noise-tolerant classification algorithm to obtain n classification prediction models and n updated label variation matrices; and
a class prediction module, configured to obtain a class prediction result of any unknown-class object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label variation matrices.
10. The apparatus according to claim 9, wherein the prediction label acquisition module comprises:
a vector representation submodule, configured to represent the feature data of each object in the given object set as a feature row vector;
a similarity calculation submodule, configured to calculate, according to the feature row vectors obtained by the vector representation submodule and the inter-object relationship data, the cosine similarity between the feature row vectors of every two objects in the given object set that have a direct relationship; and
a label propagation submodule, configured to propagate, according to the cosine similarities calculated by the similarity calculation submodule and the inter-object relationship data, the original labels of the known-class objects in the given object set to each object in the given object set, to obtain the prediction label of each object in the given object set.
11. The apparatus according to claim 9, wherein the combination module comprises:
a sampling submodule, configured to sample the unknown-class object set S three times, drawing 30% of the objects each time, to obtain three sampled sets M1, M2 and M3; and
a combination submodule, configured to combine the sampled set M1 with a second known-class object set D to obtain a combined data set F1 = {M1, D}, wherein the second known-class object set D comprises all known-class objects in the given object set;
to combine the sampled set M2 with the second known-class object set D to obtain a combined data set F2 = {M2, D}; and
to combine the sampled set M3 with the second known-class object set D to obtain a combined data set F3 = {M3, D}.
12. The apparatus according to claim 9, wherein the class prediction module comprises:
a multi-class probability vector acquisition submodule, configured to input the feature data of any unknown-class object A_i in the given object set into a of the n classification prediction models to obtain a first-type multi-class probability vectors, and to input the prediction label of A_i into b of the n updated label variation matrices to obtain b second-type multi-class probability vectors, wherein a and b are preset values not greater than n; and
a class prediction submodule, configured to obtain the class prediction result of the unknown-class object A_i according to the first-type and second-type multi-class probability vectors obtained by the multi-class probability vector acquisition submodule.
13. The apparatus according to claim 12, wherein the class prediction submodule comprises:
a multi-class probability vector obtaining unit, configured to average the a first-type multi-class probability vectors and the b second-type multi-class probability vectors to obtain a target multi-class probability vector; and
a class prediction unit, configured to determine the class corresponding to the largest entry of the target multi-class probability vector as the class of the unknown-class object A_i.
14. The apparatus according to claim 9, wherein the object is a user, and the class comprises: the age bracket of the user, the travel mode preference of the user, the travel time period of the user, the consumption level of the user, or the consumption propensity of the user.
15. The apparatus according to claim 9, wherein the object is a user, and the feature data comprises: historical travel location information of the user, or the installation status of applications on the terminal device of the user.
16. The apparatus according to claim 9, wherein the object is a user, and the inter-object relationship data comprises: data describing red-packet-giving relationships between users, or data describing friend relationships between users.
17. A computer storage medium, wherein the storage medium stores program instructions, and the program instructions comprise:
for a given object set, obtaining feature data and inter-object relationship data of each object in the given object set, wherein the given object set comprises known-class objects and unknown-class objects, and each known-class object has an original label indicating its class;
obtaining, by a label propagation algorithm, a prediction label of each object in the given object set according to the feature data and the inter-object relationship data of each object in the given object set;
obtaining a label variation matrix of a first known-class object set according to the original labels and the prediction labels of the known-class objects, wherein the first known-class object set comprises some or all of the known-class objects in the given object set, and the label variation matrix indicates the probability that an original label of each class in the first known-class object set varies to a prediction label;
sampling an unknown-class object set n times to obtain n sampled sets, and combining the information of each sampled set with the information of the known-class objects in the given object set to obtain n combined data sets, wherein the unknown-class object set comprises all unknown-class objects in the given object set, n is a preset value not less than 1, and when n > 1, the n sampled sets are mutually disjoint;
for each combined data set, processing the combined data set and the label variation matrix of the first known-class object set by a label-noise-tolerant classification algorithm to obtain n classification prediction models and n updated label variation matrices; and
obtaining a class prediction result of any unknown-class object A_i in the given object set according to the feature data of A_i, the n classification prediction models and the n updated label variation matrices.
CN201710179031.1A 2017-03-23 2017-03-23 Object class prediction method and device Active CN108629358B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201710179031.1A CN108629358B (en) 2017-03-23 2017-03-23 Object class prediction method and device
CN201880020197.1A CN110447039A (en) 2017-03-23 2018-03-16 System and method for predicting object type
PCT/CN2018/079348 WO2018171531A1 (en) 2017-03-23 2018-03-16 System and method for predicting classification for object

Publications (2)

Publication Number Publication Date
CN108629358A true CN108629358A (en) 2018-10-09
CN108629358B CN108629358B (en) 2020-12-25

Family

ID=63585880

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710179031.1A Active CN108629358B (en) 2017-03-23 2017-03-23 Object class prediction method and device
CN201880020197.1A Pending CN110447039A (en) 2017-03-23 2018-03-16 System and method for predicting object type

Country Status (2)

Country Link
CN (2) CN108629358B (en)
WO (1) WO2018171531A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110060247A (en) * 2019-04-18 2019-07-26 Shenzhen Shenshi Innovation Technology Co., Ltd. Robust deep neural network learning method for coping with sample labeling errors
CN111611429A (en) * 2019-02-25 2020-09-01 Beijing Didi Infinity Technology and Development Co Ltd Data annotation method and device, electronic equipment and computer readable storage medium
CN113811915A (en) * 2019-02-26 2021-12-17 Beijing Didi Infinity Technology and Development Co Ltd Unified order serving and fleet management for online shared travel platform

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11645693B1 (en) * 2020-02-28 2023-05-09 Amazon Technologies, Inc. Complementary consumer item selection
US11526700B2 (en) 2020-06-29 2022-12-13 International Business Machines Corporation Annotating unlabeled data using classifier error rates
CN112132178B (en) * 2020-08-19 2023-10-13 Shenzhen Intellifusion Technologies Co., Ltd. Object classification method, device, electronic equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714139A (en) * 2013-12-20 2014-04-09 South China University of Technology Parallel data mining method for identifying massive mobile customer groups
CN104572733A (en) * 2013-10-22 2015-04-29 Tencent Technology (Shenzhen) Co., Ltd. User interest tag classification method and device
CN105069129A (en) * 2015-06-24 2015-11-18 Hefei University of Technology Self-adaptive multi-label prediction method
CN105184326A (en) * 2015-09-30 2015-12-23 Guangdong University of Technology Active learning multi-label social network data analysis method based on graph data
CN105446988A (en) * 2014-06-30 2016-03-30 Huawei Technologies Co., Ltd. Classification prediction method and device
CN105608471A (en) * 2015-12-28 2016-05-25 Soochow University Robust transductive label estimation and data classification method and system
US20160379133A1 (en) * 2015-06-23 2016-12-29 Microsoft Technology Licensing, Llc Reasoning classification based on feature pertubation
CN106452809A (en) * 2015-08-04 2017-02-22 Beijing Qihoo Technology Co., Ltd. Data processing method and device
CN106446191A (en) * 2016-09-30 2017-02-22 Zhejiang University of Technology Logistic regression based multi-feature network popular tag prediction method
CN106504029A (en) * 2016-11-08 2017-03-15 Shandong University Gas station sales forecasting method based on customer group behavior analysis

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090092299A1 (en) * 2007-10-03 2009-04-09 Siemens Medical Solutions Usa, Inc. System and Method for Joint Classification Using Feature Space Cluster Labels
US8386574B2 (en) * 2009-10-29 2013-02-26 Xerox Corporation Multi-modality classification for one-class classification in social networks
CN103605990B (en) * 2013-10-23 2017-02-08 Jiangsu University Integrated multi-classifier fusion classification method and system based on graph clustering label propagation
CN104750875B (en) * 2015-04-23 2018-03-02 Soochow University Machine error data classification method and system
CN105930411A (en) * 2016-04-18 2016-09-07 Soochow University Classifier training method, classifier and sentiment classification system

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572733A (en) * 2013-10-22 2015-04-29 腾讯科技(深圳)有限公司 User interest tag classification method and device
CN103714139A (en) * 2013-12-20 2014-04-09 华南理工大学 Parallel data mining method for identifying a mass of mobile client bases
CN105446988A (en) * 2014-06-30 2016-03-30 华为技术有限公司 Classification predicting method and device
US20160379133A1 (en) * 2015-06-23 2016-12-29 Microsoft Technology Licensing, Llc Reasoning classification based on feature pertubation
CN105069129A (en) * 2015-06-24 2015-11-18 合肥工业大学 Self-adaptive multi-label prediction method
CN106452809A (en) * 2015-08-04 2017-02-22 北京奇虎科技有限公司 Data processing method and device
CN105184326A (en) * 2015-09-30 2015-12-23 广东工业大学 Active learning multi-label social network data analysis method based on graph data
CN105608471A (en) * 2015-12-28 2016-05-25 苏州大学 Robust transductive label estimation and data classification method and system
CN106446191A (en) * 2016-09-30 2017-02-22 浙江工业大学 Logistic regression based multi-feature network popular tag prediction method
CN106504029A (en) * 2016-11-08 2017-03-15 山东大学 Gas station sales forecasting method based on customer group behavior analysis

Non-Patent Citations (3)

Title
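Jui-Sheng Chou: "Comparison of multilabel classification models to forecast project dispute resolutions", Expert Systems with Applications *
Liu Lie (刘列) et al.: "Research on user tag prediction in social networks" (社交网络用户标签预测研究), Journal of Chinese Information Processing (中文信息学报) *
Xin Tinglin (辛霆麟): "Research and application of link prediction algorithms based on label propagation" (基于标签传播的链路预测算法研究与应用), China Master's Theses Full-text Database, Basic Sciences (中国优秀硕士学位论文全文数据库-基础科学辑) *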

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN111611429A (en) * 2019-02-25 2020-09-01 北京嘀嘀无限科技发展有限公司 Data annotation method and device, electronic equipment and computer readable storage medium
CN111611429B (en) * 2019-02-25 2023-05-12 北京嘀嘀无限科技发展有限公司 Data labeling method, device, electronic equipment and computer readable storage medium
CN113811915A (en) * 2019-02-26 2021-12-17 北京嘀嘀无限科技发展有限公司 Unified order serving and fleet management for online shared travel platform
CN113811915B (en) * 2019-02-26 2024-05-31 北京嘀嘀无限科技发展有限公司 Unified order dispatch and fleet management for online shared travel platform
CN110060247A (en) * 2019-04-18 2019-07-26 深圳市深视创新科技有限公司 Robust deep neural network learning method for coping with sample labeling errors

Also Published As

Publication number Publication date
WO2018171531A1 (en) 2018-09-27
CN110447039A (en) 2019-11-12
CN108629358B (en) 2020-12-25

Similar Documents

Publication Publication Date Title
CN108629358A (en) The prediction technique and device of object type
Han et al. Joint air quality and weather prediction based on multi-adversarial spatiotemporal networks
CN108717408B (en) Sensitive word real-time monitoring method, electronic equipment, storage medium and system
CN105005589B (en) Text classification method and apparatus
CN110163647B (en) Data processing method and device
CN107862022B (en) Culture resource recommendation system
Lee et al. When twitter meets foursquare: tweet location prediction using foursquare
CN108090216B (en) Label prediction method, device and storage medium
Bian et al. Predicting trending messages and diffusion participants in microblogging network
US9286379B2 (en) Document quality measurement
CN112749330B (en) Information pushing method, device, computer equipment and storage medium
Ruan et al. GADM: Manual fake review detection for O2O commercial platforms
CN110990718A (en) Social network model building module of company image improving system
CN109462578A (en) Threat intelligence use and propagation method based on statistical learning
CN107392311A (en) Sequence segmentation method and apparatus
CN103617146B (en) Machine learning method and device based on hardware resource consumption
Demertzis et al. A machine hearing framework for real-time streaming analytics using Lambda architecture
CN115952343A (en) Social robot detection method based on multi-relation graph convolutional network
Ozdikis et al. Spatial statistics of term co-occurrences for location prediction of tweets
Cheng et al. ISC: An iterative social based classifier for adult account detection on twitter
CN110909258A (en) Information recommendation method, device, equipment and storage medium
Lu et al. Predicting viral news events in online media
Kotzias et al. Addressing the Sparsity of Location Information on Twitter.
Wang et al. Abnormal trajectory detection based on geospatial consistent modeling
CN110008975B (en) Social network "water army" (paid poster) detection method based on immune danger theory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant