CN110020121A - Software crowdsourcing item recommendation method and system based on transfer learning - Google Patents
- Publication number
- CN110020121A (application CN201710959395.1A)
- Authority
- CN
- China
- Prior art keywords
- feature
- project
- source domain
- data
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/103—Workflow collaboration or project management
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Entrepreneurship & Innovation (AREA)
- General Physics & Mathematics (AREA)
- Operations Research (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Quality & Reliability (AREA)
- Marketing (AREA)
- Economics (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A software crowdsourcing project recommendation method and system based on transfer learning. First, data from the software crowdsourcing platform that needs recommendations is collected as target-domain data and data from other software crowdsourcing platforms as source-domain data; both project data and developer data are collected from each platform. Then, feature vectors are built for developers and for projects so that each is modeled separately, and these feature vectors form the feature data sets. Next, based on the relationship between the target-domain feature space and the source-domain feature space, a new mapped feature space is established through feature mapping, and the feature vectors of the source domain and the target domain are mapped into it, aligning the features of the two domains. Finally, using the developer and project feature vectors in the mapped feature space, the similarity between developers and projects is computed for the target domain and for the source domain, an instance-transfer classification algorithm is applied to train the recommender system model, and the model is used to recommend projects.
Description
Technical field
The present invention relates to a technology in the field of software crowdsourcing, specifically a software crowdsourcing project recommendation method and system based on transfer learning.
Background technique
In recent years, with the development of the software crowdsourcing industry, more and more users have joined software crowdsourcing platforms to publish requirements and look for work. The basic workflow of a software crowdsourcing platform is: 1) a project publisher publishes a project on the crowdsourcing platform and offers a corresponding reward; 2) developers on the platform browse projects and register for those they are interested in; 3) the project publisher selects a suitable person from the registrants to develop the project; 4) the developer completes the project and receives the reward. Recommender systems on such platforms, however, always face a serious platform cold-start problem: on a newly built platform, too little user and project data has accumulated. As a result, the training and prediction of the recommender system are limited by data scarcity and it cannot function.
The cold-start problem is important and common. For recommender systems it can be refined into the item cold-start problem, the user cold-start problem and the platform cold-start problem. When a new item arrives it has not been used by any user, so it cannot be recommended to anyone; this is the item cold-start problem. Similarly, when a new user arrives, the user has not rated any item, so collaborative filtering cannot make recommendations for that user; this is the user cold-start problem. When a recommender platform lacks accumulated collaborative data and the data volume does not reach the scale needed to build a model, the recommendation algorithm model cannot be trained; this is the platform cold-start problem.
Summary of the invention
To address the defects of the prior art, namely that it cannot solve the platform cold-start problem, judges knowledge fit too simply, and defines project categories too coarsely, the present invention proposes a software crowdsourcing project recommendation method and system based on transfer learning. It uses data from other crowdsourcing platforms to assist the training of the recommendation algorithm on the data-scarce platform, which solves the platform cold-start problem of the recommender system and improves the accuracy of project recommendation.
The present invention is achieved by the following technical solutions:
The present invention relates to a software crowdsourcing project recommendation method based on transfer learning. First, data from the software crowdsourcing platform that needs recommendations is collected as target-domain data and data from other software crowdsourcing platforms as source-domain data; both project data and developer data are collected from each platform. Then, feature vectors are built for developers and for projects so that each is modeled separately, and these feature vectors form the feature data sets. Next, based on the relationship between the target-domain feature space and the source-domain feature space, a new mapped feature space is established through feature mapping, and the feature vectors of the source domain and the target domain are mapped into it, aligning the features of the two domains. Finally, using the developer and project feature vectors in the mapped feature space, the similarity between developers and projects is computed for the target domain and for the source domain, an instance-transfer classification algorithm is applied to train the recommender system model, and the model is used to recommend projects.
The project data includes the labels of a project and the description of the project. The developer data includes the IDs of the projects related to each person, covering the projects the person registered for, won the bid for, and delivered.
The developers and projects of the target domain lie in the target-domain feature space; correspondingly, the developers and projects of the source domain lie in the source-domain feature space.
The modeling specifically includes the following steps:
1) A BOW vector is built from the labels of each project in the target domain and the source domain. The BOW vectors of all source-domain projects form the matrix $TM_{s,t}$, which represents the label content of all projects on the source-domain platform; the BOW vectors of all target-domain projects form the matrix $TM_{t,t}$, which represents the label content of all projects on the target-domain platform;
2) The TF-IDF value of each word w in each project description is calculated with the existing TF-IDF method, and the TF-IDF values of all words form the description vector of the project. The description vectors of all source-domain projects form the matrix $KM_{s,t}$, which represents the description content of all projects on the source-domain platform; the description vectors of all target-domain projects form the matrix $KM_{t,t}$, which represents the description content of all projects on the target-domain platform;
3) The exposure Pop(j) of each project j is calculated;
4) The label vector matrix $TM_{s,u}$ and the keyword vector matrix $KM_{s,u}$ of all source-domain personnel are built, as are the label vector matrix $TM_{t,u}$ and the keyword vector matrix $KM_{t,u}$ of all target-domain personnel.
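The TF value is $tf_{w,d} = n_{w,d} / \sum_{w'} n_{w',d}$, the IDF value is $idf_w = \log\left(n / |\{j : w \in d_j\}|\right)$, and the TF-IDF value is $tfidf_{w,d} = tf_{w,d} \cdot idf_w$, where $n_{w,d}$ is the number of times word w occurs in project description d, $|\{j : w \in d_j\}|$ is the number of project descriptions that contain word w, and, as in the standard TF-IDF definition, $n$ is the total number of project descriptions.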
The exposure is $Pop(j) = \alpha(t_s, t_c) \cdot Pop_r(j) + \beta(t_s, t_c)$, where $Pop_r(j) = \mathrm{Sum}(U_{:,j})$, $\alpha(t_s, t_c)$ and $\beta(t_s, t_c)$ are two time weighting factors, $t_s$ is the release time of the project, and $\beta(t_s, t_c) = \ln(t_c - t_s + 1)$.
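The label vector matrix and the keyword vector matrix of personnel are computed from the label vectors and keyword vectors of the projects related to each person, where $\lambda_A$, $\lambda_B$ and $\lambda_F$ are the registration weight, the bid-winning weight and the completion weight, and $A_u$, $B_u$ and $F_u$ denote the sets of projects that person u registered for, won the bid for, and completed, respectively.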
Establishing the new mapped feature space specifically includes the following steps:
1) The important features and unimportant features of the source domain and the target domain are merged into an important feature set $F'_{t,f}$ and an unimportant feature set $F'_{k,f}$; a pivot feature set $P_t$ is chosen from $F'_{t,f}$ and a pivot feature set $P_k$ from $F'_{k,f}$, giving the pivot feature set P;
2) The pivot features are clustered with the k-means clustering algorithm into m clusters, and the cluster centers are taken as the final pivot features;
3) The weighted feature set $D_{:} = KM_{:,t} + TM_{:,t} \cdot W$ is calculated, where $TM_{:,t}$ is the $TM_{s,t}$ or $TM_{t,t}$ obtained during modeling, $KM_{:,t}$ is the $KM_{s,t}$ or $KM_{t,t}$ obtained during modeling, and W is a weight;
4) Linear classifiers are trained to compute the correlation between each pivot feature p and all other words w, which performs the pivot feature mapping and yields the weight matrix $W = [w'_1, \dots, w'_m]$;
5) Singular value decomposition is applied to the weight matrix $W = [w'_1, \dots, w'_m]$ to compute the correlation between pivot features and non-pivot features.
The similarity is obtained by adding the label vector matrix TM to the keyword vector matrix KM to obtain a unified vector DM, from which the similarity matrix SM is built; positive and negative examples are then constructed. Specifically, a positive example is built from a project t and a person u that have a registration relationship. To build negative examples, some projects must be chosen from those the person never registered for, and the similarity p computed; the present invention preferentially uses projects with high exposure that have no registration relationship with the person, and the person-project pair with its similarity forms a negative example. Compared with random person-project pairs as negative examples, negatives with higher exposure reduce the probability that the person did not register simply because the project was never found, and so better represent what the person is not interested in. Therefore, the projects in the project set T are first sorted by the calculated exposure Pop(t) and then checked in that order for a registration relationship with the person: if there is one, the next project is checked; if not, a negative example is built from the current project and the person.
The training uses the TrAdaBoost algorithm to train the recommender system model.
The recommender system model applies the logistic regression algorithm to make recommendations.
The present invention also relates to a system implementing the above method, comprising a collection module, a feature vector module, a mapping module and a recommendation module, wherein: the collection module collects the target-domain data and the source-domain data; the feature vector module receives the target-domain data and the source-domain data and builds the feature vectors of developers and projects; the mapping module establishes the feature mapping between the target domain and the source domain and uses it to establish the mapped feature space; and the recommendation module computes the similarity between personnel and projects for the target domain and the source domain, applies the instance-transfer classification algorithm to train the recommender system model, and recommends projects.
Description of the drawings
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 to Fig. 4 compare R@5, R@10 and R@15 on the software crowdsourcing platform in the embodiment;
Fig. 5 to Fig. 7 compare P@5, P@10 and P@15 on the software crowdsourcing platform in the embodiment.
Specific embodiment
As shown in Fig. 1, this embodiment relates to a software crowdsourcing project recommendation method based on transfer learning, comprising the following steps:
1) Data from the software crowdsourcing platform that needs recommendations is collected as target-domain data, and data from other software crowdsourcing platforms as source-domain data.
Data on a software crowdsourcing platform is divided into two classes of entities. The first class is the user set User, which is further divided into contract-taking parties and contract-issuing parties: User = {Rec, Sent}, where Rec is the set of contract-taking users and Sent is the set of contract-issuing users. The second class is the project set T. Assume the platform has m contract-taking parties and n projects, i.e. |Rec| = m and |T| = n. The relationship between a project and a user can be expressed as R = {issue, register, win bid, deliver}. The data on the platform is finally represented as (u, t, r) ∈ User × T × R. A utility matrix U is given, together with all projects T in U, where for each project t ∈ T, d represents the labels of the project and k represents its keywords. A model is learned from the historical data (u, t, r) ∈ User × T × R; given a contract-taking party u ∈ Rec as input, the model recommends a suitable set of projects j ∈ T.
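The triple representation and the utility matrix can be illustrated with a minimal sketch. The function name, the relation string "register" and the orientation of U (rows are users, columns are projects, chosen so that a column sum counts the users related to a project) are assumptions made for illustration, not details stated in the text.

```python
def build_utility_matrix(triples, users, projects, relation="register"):
    """Return U as a list of rows; U[u][t] is 1 when user u has `relation` with project t.

    Orientation (users x projects) is an assumption of this sketch.
    """
    user_idx = {u: i for i, u in enumerate(users)}
    proj_idx = {t: j for j, t in enumerate(projects)}
    U = [[0] * len(projects) for _ in users]
    for u, t, r in triples:
        if r == relation:
            U[user_idx[u]][proj_idx[t]] = 1
    return U

# Hypothetical (u, t, r) records from User x T x R.
triples = [("u1", "t1", "register"), ("u2", "t1", "win"), ("u2", "t2", "register")]
U = build_utility_matrix(triples, ["u1", "u2"], ["t1", "t2"])
```

Only the registration relation is encoded here; the other relations could be stored in separate matrices built the same way.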
2) From the target-domain data and source-domain data collected in the previous step, feature vectors are built for developers and for projects so that each is modeled separately, and these feature vectors form the feature data sets. The developers and projects of the target domain lie in the target-domain feature space; correspondingly, the developers and projects of the source domain lie in the source-domain feature space.
2.1) A BOW vector is built for each project in the target-domain data and the source-domain data. The BOW vectors of all source-domain projects form the matrix $TM_{s,t}$, which represents the label content of all projects on the source-domain platform; the BOW vectors of all target-domain projects form the matrix $TM_{t,t}$, which represents the label content of all projects on the target-domain platform. BOW is the bag-of-words model, which converts a set of words into a binary vector containing only {0, 1}: the vector is first initialized to all zeros, then the word set is checked, and if a word w occurs in the word set, the element of the vector corresponding to w is set to 1. The label vector of each project is its BOW vector. After all n projects are vectorized, a matrix of size $n \times |D_t|$ is obtained. This matrix represents the label content of all projects on the platform and is denoted $TM_{:,t}$.
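The binary bag-of-words step above can be sketched as follows. The helper names and the sample labels are hypothetical; the vocabulary is simply the sorted union of all labels, which is one possible choice of $D_t$.

```python
def build_vocabulary(tag_sets):
    """Map every distinct label to a column index (sorted for determinism)."""
    vocab = sorted(set().union(*tag_sets))
    return {w: i for i, w in enumerate(vocab)}

def bow_vector(tags, vocab):
    """Start from an all-zero vector and set the element for each present word to 1."""
    vec = [0] * len(vocab)
    for w in tags:
        if w in vocab:
            vec[vocab[w]] = 1
    return vec

# Hypothetical project labels; each row of TM is one project's label vector.
project_tags = [{"java", "web"}, {"android", "java"}]
vocab = build_vocabulary(project_tags)
TM = [bow_vector(tags, vocab) for tags in project_tags]
# TM == [[0, 1, 1], [1, 1, 0]] with columns (android, java, web)
```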
2.2) The TF-IDF value of each word w in each project description is calculated. The TF value is $tf_{w,d} = n_{w,d} / \sum_{w'} n_{w',d}$, the IDF value is $idf_w = \log\left(n / |\{j : w \in d_j\}|\right)$, and the TF-IDF value is $tfidf_{w,d} = tf_{w,d} \cdot idf_w$, where $n_{w,d}$ is the number of times word w occurs in project description d, $|\{j : w \in d_j\}|$ is the number of project descriptions that contain word w, and, as in the standard TF-IDF definition, $n$ is the total number of project descriptions. For a project description d, an all-zero vector of dimension |D| is constructed; then $tfidf_{w,d}$ is calculated for each word w in d, and the element of the vector corresponding to w is replaced with $tfidf_{w,d}$. After all n project descriptions are vectorized, a matrix of size $n \times |D_k|$ is obtained. This matrix represents the description content of all projects on the platform and is denoted KM. The description vectors of all source-domain projects form the matrix $KM_{s,t}$, which represents the description content of all projects on the source-domain platform; the description vectors of all target-domain projects form the matrix $KM_{t,t}$, which represents the description content of all projects on the target-domain platform.
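A minimal sketch of building the description matrix KM follows. It assumes the standard TF-IDF form (term count divided by description length, and the logarithm of the number of descriptions divided by the document frequency), because the formula images are not preserved in this text; the function name and the sample tokens are hypothetical.

```python
import math
from collections import Counter

def tfidf_matrix(descriptions):
    """descriptions: list of token lists. Returns (vocab, KM), one KM row per description."""
    vocab = sorted({w for d in descriptions for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    n_docs = len(descriptions)
    # Document frequency |{j : w in d_j}|: count each word once per description.
    df = Counter(w for d in descriptions for w in set(d))
    KM = []
    for d in descriptions:
        row = [0.0] * len(vocab)          # all-zero vector of vocabulary size
        for w, n_wd in Counter(d).items():
            tf = n_wd / len(d)            # assumed standard term frequency
            idf = math.log(n_docs / df[w])
            row[index[w]] = tf * idf
        KM.append(row)
    return vocab, KM

vocab_k, KM = tfidf_matrix([["java", "web", "java"], ["java", "app"]])
```

A word that occurs in every description ("java" here) gets an IDF of zero and therefore contributes nothing to any row.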
2.3) The exposure Pop(j) of each project j is calculated. The exposure is $Pop(j) = \alpha(t_s, t_c) \cdot Pop_r(j) + \beta(t_s, t_c)$, where $Pop_r(j) = \mathrm{Sum}(U_{:,j})$, $\alpha(t_s, t_c)$ and $\beta(t_s, t_c)$ are two time weighting factors, $t_s$ is the release time of the project, and $\beta(t_s, t_c) = \ln(t_c - t_s + 1)$.
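The exposure formula can be sketched as below. The expression for α is not preserved in this text, so the sketch takes it as a caller-supplied function; the constant α used in the example is a placeholder, not the patent's definition. Treating $t_c$ as the current time and measuring both times in the same unit are also assumptions.

```python
import math

def raw_popularity(U, j):
    """Pop_r(j) = Sum(U[:, j]): column sum of the utility matrix for project j."""
    return sum(row[j] for row in U)

def beta(t_s, t_c):
    """beta(t_s, t_c) = ln(t_c - t_s + 1), as given in the text."""
    return math.log(t_c - t_s + 1)

def exposure(U, j, t_s, t_c, alpha):
    """Pop(j) = alpha(t_s, t_c) * Pop_r(j) + beta(t_s, t_c).

    `alpha` is a function supplied by the caller because its formula is not given here.
    """
    return alpha(t_s, t_c) * raw_popularity(U, j) + beta(t_s, t_c)

# Placeholder alpha (assumption): a constant weight of 1.0.
U_demo = [[1, 0], [1, 1]]
pop_0 = exposure(U_demo, 0, t_s=0, t_c=0, alpha=lambda ts, tc: 1.0)
```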
2.4) The label vector matrix $TM_{u,:}$ and the keyword vector matrix $KM_{u,:}$ of all personnel are built. Both are computed from the label vectors and keyword vectors of the projects related to each person, where $\lambda_A$, $\lambda_B$ and $\lambda_F$ are the registration weight, the bid-winning weight and the completion weight, set here to 0.5, 0.3 and 0.2 respectively, and $A_u$, $B_u$ and $F_u$ denote the sets of projects that person u registered for, won the bid for, and completed, respectively.
3) Based on the relationship between the target-domain feature space and the source-domain feature space, a new mapped feature space is established through feature mapping, and the feature vectors of the source domain and the target domain are mapped into it, aligning the features of the two domains.
3.1) The important features and unimportant features of the source domain and the target domain are merged into an important feature set $F'_{t,f}$ and an unimportant feature set $F'_{k,f}$; a pivot feature set $P_t$ is chosen from $F'_{t,f}$ and a pivot feature set $P_k$ from $F'_{k,f}$, giving the pivot feature set P.
For an instance $d = (x_k, x_t) \in D_{s,f}$, a subset $X'_k = X_k \cap D_{T,t}$ is chosen from its full unimportant feature set $X_k = \{x_k\}$. An instance $d = (x_k, x_t) \in D_{T,f}$ is handled in the same way as a source-domain instance, with a subset $X'_k = X_k \cap D_{S,t}$ chosen from its full unimportant feature set $X_k = \{x_k\}$. In other words, the subset consists of the unimportant features of d that appear among the important features of the other domain. Because these words are part of the important feature word set of the other domain, they can be considered to play a larger role in representing the characteristics of a project, so this part of the unimportant features is extracted.
Features in $X'_k$ whose value is below a threshold are deleted. A word that occurs in a project is not necessarily important, whereas an important feature must be able to represent the project. The remaining unimportant features in $X'_k$ are upgraded to important features, and the important feature set $F'_{t,f}$ and the unimportant feature set $F'_{k,f}$ of the source domain and the target domain are returned. Pivot features then need to be obtained from the important features.
From the important feature set $F'_{t,f} = F'_{s,t,f} \cup F'_{t,t,f}$, a feature subset $P_t = F'_{s,t,f} \cap F'_{t,t,f}$ is chosen; features in $P_t$ whose frequency in the data set is below a threshold are deleted, and the remaining features in $P_t$ are output as the pivot feature set of the important features.
From the unimportant feature set $F'_{k,f} = F'_{s,k,f} \cup F'_{t,k,f}$, a subset $P'_k = F'_{s,k,f} \cap F'_{t,k,f}$ is chosen, and from $P'_k$ a subset $P_k$ is chosen that contains the word features with the highest mutual information relative to the class labels in the similarity data set. All word features whose mutual information is below a threshold are then deleted, and the remaining features in $P_k$ are output as the pivot feature set of the unimportant features.
3.2) The pivot features are clustered with the k-means clustering algorithm into m clusters, and the cluster centers are taken as the final pivot features. The distance between two vectors a and b is $d = \sum_i (a_i - b_i)^2$. Here m = 150 is selected.
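The clustering step can be sketched with a plain k-means loop that uses the squared distance given above and returns the cluster centers. The initialization by random sampling, the fixed iteration count and the toy points are assumptions of this sketch; the text does not specify them.

```python
import random

def sq_dist(a, b):
    """d = sum_i (a_i - b_i)^2, the distance used in the text."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, m, iters=20, seed=0):
    """Cluster `points` into m clusters and return the m cluster centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, m)]  # assumed initialization
    for _ in range(iters):
        clusters = [[] for _ in range(m)]
        for p in points:
            nearest = min(range(m), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers

# Toy pivot-feature vectors; the embodiment uses m = 150 on real data.
pts = [[0, 0], [0, 1], [10, 10], [10, 11]]
final_pivots = kmeans(pts, m=2)
```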
3.3) The weighted feature set $D_{:} = KM_{:,t} + TM_{:,t} \cdot W$ is calculated, where $TM_{:,t}$ is the $TM_{s,t}$ or $TM_{t,t}$ obtained during modeling, $KM_{:,t}$ is the $KM_{s,t}$ or $KM_{t,t}$ obtained during modeling, and W is a weight.
3.4) Linear classifiers are trained to compute the correlation between each pivot feature p and all other words w, which performs the pivot feature mapping and yields the weight matrix $W = [w'_1, \dots, w'_m]$. The weight-SCL algorithm is used for this. Each linear classifier predicts, from the other words, whether $w_s$ or $w_t$ occurs in a document. For each pivot feature p ∈ P a training set $D = \{(\mathrm{Mask}(x, p), \mathrm{Value}(x, p)) \mid x \in D_u\}$ is created. The function Mask(x, p) returns a copy of x in which the value of pivot feature p is set to zero, which is equivalent to deleting the pivot feature from the feature space. Value(x, p) returns +1 if the value of pivot feature p in x is non-zero, and -1 otherwise. For each D, the corresponding linear classifier is trained by minimizing $w'_l = \arg\min_w \left( \sum_j L(w \cdot x_j, p_l(x_j)) + \lambda \|w\|^2 \right)$. The weight matrix $W = [w'_1, \dots, w'_m]$ is finally obtained.
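The Mask and Value functions and the training of one pivot predictor can be sketched as follows. The text does not name the loss L, so this sketch assumes a squared loss and averages it over the examples for numerical stability; the function names, hyperparameters and toy data are likewise hypothetical.

```python
def mask(x, p):
    """Mask(x, p): copy of x with pivot feature p set to zero."""
    masked = list(x)
    masked[p] = 0.0
    return masked

def value(x, p):
    """Value(x, p): +1 if pivot feature p is non-zero in x, otherwise -1."""
    return 1.0 if x[p] != 0 else -1.0

def train_pivot_predictor(data, p, lam=0.1, lr=0.1, epochs=300):
    """Fit w for pivot p by gradient descent on mean squared loss + lam * ||w||^2.

    The squared loss and the averaging over examples are assumptions of this sketch.
    """
    train = [(mask(x, p), value(x, p)) for x in data]
    n, dim = len(train), len(data[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [2.0 * lam * wi for wi in w]          # gradient of the L2 penalty
        for x, y in train:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / n        # gradient of the mean squared loss
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Toy documents: feature 0 is the pivot and co-occurs with feature 1, never with feature 2.
docs = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
w_pivot = train_pivot_predictor(docs, p=0)
```

Repeating this for each of the m pivot features and stacking the resulting vectors as columns gives the weight matrix W. In the toy data the learned weight for feature 1 is positive and the weight for feature 2 is negative, reflecting their correlation with the pivot.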
3.5) Singular value decomposition is applied to the weight matrix $W = [w'_1, \dots, w'_m]$ to compute the correlation between pivot features and non-pivot features. The weight-SCL algorithm identifies this correlation by computing the singular value decomposition of the $|V| \times m$ parameter matrix W. W encodes the correlation structure between pivot features and non-pivot features in the form of multiple linear classifiers. The columns of U (the left singular vectors of W) therefore identify the common substructure of these classifiers. Selecting the columns of U associated with the largest singular values gives the part of this substructure most strongly correlated with W. θ is defined as those columns of U associated with the k largest singular values.
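The decomposition step can be sketched with NumPy. Taking θ as the transpose of the first k columns of U follows the usual structural correspondence learning formulation and is an assumption here, since the expression for θ is not preserved in this text; the function names are hypothetical.

```python
import numpy as np

def shared_projection(W, k):
    """W: |V| x m matrix whose columns are pivot-predictor weight vectors.

    Returns theta (k x |V|): the left singular vectors belonging to the
    k largest singular values (numpy returns them in descending order).
    """
    U_left, singular_values, Vt = np.linalg.svd(W, full_matrices=False)
    return U_left[:, :k].T

def project(X, theta):
    """Map feature rows X (n x |V|) into the k-dimensional shared space."""
    return X @ theta.T

# Random stand-in for a trained weight matrix with |V| = 6 words and m = 3 pivots.
W_demo = np.random.default_rng(0).normal(size=(6, 3))
theta = shared_projection(W_demo, k=2)
```

Source-domain and target-domain feature vectors projected with the same θ end up in one shared space, which is what makes the later similarity computation comparable across domains.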
4) Using the mapped feature space, the similarity between personnel and projects is computed for the target domain and for the source domain, and an instance-transfer classification algorithm is applied to train the recommender system model.
The similarity is obtained as follows:
4.1) The label vector matrix TM is added to the keyword vector matrix KM to obtain the unified vector DM, from which the similarity matrix SM is built.
For an instance s in the similarity matrix SM, an arbitrary project t and person u are first taken and their similarity vector p is constructed using $d = |f_{u,i} - f_{t,i}|$ and $p_i = m - d$. That is, for the i-th element $p_i$ of p, the absolute distance d between the i-th element $f_{u,i}$ of the person feature vector $f_u$ and the i-th element $f_{t,i}$ of the project feature vector $f_t$ indicates similarity: the lower the absolute distance, the higher the similarity. The absolute distance is subtracted from a constant m to obtain the similarity; m is called the normalization maximum. To prevent $p_i$ from being negative, m is recommended to be the maximum value occurring in the feature vectors.
There are four cases. Case one: the project has a certain feature but the person does not. Case two: the person has a certain feature but the project does not. Case three: neither the project nor the person has a certain feature. Case four: both the project and the person have a certain feature.
The similarities calculated in case one and case two would be identical. In practice, however, the similarity in case one should be lower than in case two, for an obvious reason: the requirements of a project matter more than the abilities of a person. Therefore, when case one is detected, the present invention multiplies the absolute distance by a coefficient a greater than 1 so that the similarity becomes lower. This coefficient a is called the negative feature coefficient. The similarity formula in case one is revised to $p_i = m - a \cdot d$.
The similarities calculated in case three and case four would likewise be identical. In practice, however, the similarity in case four should be higher than in case three. The case where both the project and the person have a certain feature is less common and deserves more weight: it shows that a requirement of the project can be met by the person. The case where neither has a certain feature is more common: it only shows that the project has no such requirement and the person has no such ability, which is unimportant. Therefore, when case three is detected, a coefficient z greater than 0 is subtracted so that the similarity becomes lower. This coefficient z is called the zero feature coefficient. The similarity formula in case three is revised to $p_i = m - d - z$.
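The four cases can be combined into one function, sketched below. Treating a non-zero entry as "has the feature" is an assumption of this sketch; the default values of m, a and z are the ones reported for the embodiment.

```python
def similarity_vector(f_u, f_t, m=10, a=2, z=8):
    """Element-wise similarity between a person vector f_u and a project vector f_t."""
    p = []
    for fu, ft in zip(f_u, f_t):
        d = abs(fu - ft)
        if ft != 0 and fu == 0:
            p.append(m - a * d)   # case one: project has the feature, person lacks it
        elif ft == 0 and fu == 0:
            p.append(m - d - z)   # case three: neither has the feature
        else:
            p.append(m - d)       # cases two and four: unmodified formula
    return p

# One feature per case, in the order case one, two, three, four.
p_demo = similarity_vector([0, 1, 0, 1], [1, 0, 0, 1])
# p_demo == [8, 9, 2, 10]
```

The result shows the intended ordering: case one scores below case two, and case three scores below case four.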
Once the similarity $p_i$ of the i-th feature between a person and a project is obtained, all elements of the similarity vector p are calculated in the same way, constructing the vector p. The similarity vector p of a project and a person is used as the feature of the corresponding instance s in the similarity matrix SM, and whether the person registered for the project is used as the label. If the project and the person have a registration relationship, the label l is set to 1 and the instance is called a positive example; if there is no registration relationship, the label l is set to 0 and the instance is called a negative example. Each project-person instance s thus has a feature vector p and a label l, i.e. $s = (p, l)$ with $l \in \{0, 1\}$. In addition, the ratio between the numbers of positive and negative examples is defined as k, where $|S_1|$ is the number of positive examples and $|S_0|$ is the number of negative examples.
4.2) Positive and negative examples are constructed. In a positive example the person and the project have a registration relationship: a project t and a person u are selected whose entry in the utility matrix satisfies $U_{t,u} = 1$. For negative examples, projects are chosen from those the person never registered for and the similarity p is computed; projects with high exposure and no registration relationship with the person are used, and the person-project pair with its similarity forms a negative example. The projects in the project set T are first sorted by exposure Pop(t) and then checked in that order for a registration relationship with the person: if there is one, the next project is checked; if not, a negative example is built from the current project and the person.
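The exposure-ordered selection of negative examples can be sketched as follows. The function name, the set-of-pairs representation of registrations and the explicit cap `n_neg` on the number of negatives per person are assumptions made for illustration.

```python
def build_negative_examples(user, projects, pop, registered, n_neg):
    """Return up to n_neg (user, project, 0) tuples, preferring high-exposure projects.

    pop: dict project -> exposure Pop(t); registered: set of (user, project) pairs.
    """
    ranked = sorted(projects, key=lambda t: pop[t], reverse=True)
    negatives = []
    for t in ranked:
        if len(negatives) >= n_neg:
            break
        if (user, t) in registered:
            continue  # a registration exists, so check the next project
        negatives.append((user, t, 0))  # label 0 marks a negative example
    return negatives

# Hypothetical exposures and registrations.
pop_demo = {"t1": 5.0, "t2": 9.0, "t3": 7.0}
negs = build_negative_examples("u", ["t1", "t2", "t3"], pop_demo, {("u", "t2")}, n_neg=2)
# negs == [("u", "t3", 0), ("u", "t1", 0)]
```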
Values were assigned to the normalization maximum m, the negative feature coefficient a, the zero feature coefficient z and the positive-to-negative example ratio k; the values m = 10, a = 2, z = 8 and k = 1.2 were chosen because they gave the best system performance.
4.3) The existing TrAdaBoost algorithm is combined with the logistic regression algorithm to train the recommender system model.
5) The recommender system model is applied to recommend projects, using the logistic regression algorithm. The algorithm uses the sigmoid function $g(z) = \frac{1}{1 + e^{-z}}$, and the prediction function is $h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$. When a vector x is input to $h_\theta(x)$, the resulting value is the probability that the result is 1: $P(y = 1 \mid x; \theta) = h_\theta(x)$.
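The prediction step can be sketched as below: the sigmoid of $\theta^T x$ gives the registration probability, and candidate projects are ranked by it. The parameter vector and the candidate similarity vectors in the example are hypothetical, and how θ is trained (TrAdaBoost with logistic regression in the text) is outside this sketch.

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(theta, x):
    """h_theta(x) = g(theta^T x) = P(y = 1 | x; theta)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def recommend(theta, candidates, k):
    """candidates: dict project -> similarity vector. Returns the k most probable projects."""
    ranked = sorted(candidates, key=lambda t: predict_proba(theta, candidates[t]), reverse=True)
    return ranked[:k]

# Hypothetical trained parameters and candidate similarity vectors for one person.
top = recommend([1.0, -1.0], {"t1": [2, 0], "t2": [0, 2], "t3": [1, 1]}, k=2)
# top == ["t1", "t3"]
```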
This embodiment also relates to a system implementing the above method, comprising a collection module, a feature vector module, a mapping module and a recommendation module, wherein: the collection module collects the target-domain data and the source-domain data; the feature vector module receives the target-domain data and the source-domain data and builds the feature vectors of developers and projects; the mapping module establishes the feature mapping between the target domain and the source domain and uses it to establish the mapped feature space; and the recommendation module computes the similarity between personnel and projects for the target domain and the source domain, applies the instance-transfer classification algorithm to train the recommender system model, and recommends projects.
Compared with the prior art, the proposed software crowdsourcing project recommendation method and system based on transfer learning uses data from other crowdsourcing platforms to assist the training of the recommendation algorithm on the data-scarce platform, which solves the platform cold-start problem of the recommender system and improves the accuracy of project recommendation.
Two well-known Chinese software crowdsourcing platforms, Zhubajie (猪八戒) and Jiefanghao (解放号), were chosen for data crawling and comparative testing. Each platform has its own characteristic data, but the tests used only the project description, the project labels, and the three kinds of relationship between projects and contract-taking parties: registration, bid-winning and completion. Details of the data on the two platforms are shown in Table 1. From the Zhubajie website, 6000 projects and 10597 contract-taking parties were crawled, with 48769 registration relationships between contract-taking parties and projects; from Jiefanghao, 2800 projects and 5231 contract-taking parties were crawled, with 15498 registration relationships.
Because the Jiefanghao data is far smaller than the Zhubajie data, the Zhubajie data is used as source-domain data and the Jiefanghao data as target-domain data, transferring data from the Zhubajie platform to the Jiefanghao platform.
Table 1. Data crawled from the two well-known Chinese crowdsourcing platforms
Two metrics, P@k and R@k, are selected to evaluate the recommendation algorithms. P@k reflects recommendation precision, that is, how many of the top-k recommended projects have a registration relationship with the person; R@k reflects recall, that is, how many of the projects in the test set appear among the top-k recommended projects.
When a person u is input to a recommendation algorithm, the algorithm returns a set of projects, each with a corresponding score that represents the degree of match between the person and the project. The algorithm sorts the projects by score to obtain a recommendation sequence.
Assume that the sequence obtained after sorting is $l_u$, and define a function $h(k, l_u)$ that takes a sequence and returns the top-k items in the list. The two metrics P@k and R@k are used to assess the recommendation effect, where $Test_u$ denotes the set of projects corresponding to person u in the test set and Dev denotes the set of persons.
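The two metrics can be sketched in code as follows. This is a minimal illustration, assuming that P@k and R@k are averaged per person over Dev (precision divides the hit count by k, recall divides it by the size of $Test_u$); the exact formulas are not recoverable from this text, and the function and variable names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def top_k(k: int, ranked: List[str]) -> List[str]:
    """h(k, l_u): return the first k items of a ranked list."""
    return ranked[:k]


def precision_recall_at_k(
    rankings: Dict[str, List[str]],
    test_sets: Dict[str, Set[str]],
    k: int,
) -> Tuple[float, float]:
    """Return (P@k, R@k) averaged over persons with a non-empty test set.

    Assumption: per-person (macro) averaging over Dev.
    """
    users = [u for u in rankings if test_sets.get(u)]
    if not users or k <= 0:
        return 0.0, 0.0
    p_sum = 0.0
    r_sum = 0.0
    for u in users:
        # Number of top-k recommended projects that the person registered for.
        hits = len(set(top_k(k, rankings[u])) & test_sets[u])
        p_sum += hits / k
        r_sum += hits / len(test_sets[u])
    return p_sum / len(users), r_sum / len(users)


# Toy data: two persons, ranked project ids, and their test-set projects.
rankings = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z", "w"]}
test_sets = {"u1": {"a", "c"}, "u2": {"w"}}
p_at_2, r_at_2 = precision_recall_at_k(rankings, test_sets, k=2)
```

Persons without any test-set project are skipped here so that recall is always well defined; a different treatment would change the averages.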
The methods chosen for comparison with the proposed recommender system are the content-based nearest-neighbor algorithm (ICBNN), which computes similarity from the content matrix, where $sim_{i,i'}$ denotes the similarity between items and $I_k$ gives the top-k items most similar to item i; and the SCL algorithm. The ICBNN algorithm is trained and tested only on the Jiefanghao data set, whereas the SCL algorithm and the multi-source recommendation algorithm are trained on both the Jiefanghao and Zhubajie data sets and tested on the Jiefanghao data set.
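A content-based nearest-neighbor baseline of this kind can be sketched as below. The scoring rule is an assumption, since the original formula is not present in this text: similarity is taken to be cosine similarity between rows of the content matrix, and a candidate project is scored by summing its similarity to those of its top-k neighbors that the person has already registered for. All function names are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(content):
    """Pairwise cosine similarity between the rows of a content matrix."""
    content = np.asarray(content, dtype=float)
    norms = np.linalg.norm(content, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = content / norms
    return unit @ unit.T


def top_k_similar(sim, i, k):
    """I_k: indices of the k items most similar to item i, excluding i itself."""
    order = np.argsort(-sim[i], kind="stable")
    return [int(j) for j in order if j != i][:k]


def score_items(sim, registered, k):
    """Score every unregistered item for one person (assumed scoring rule)."""
    reg = set(registered)
    scores = {}
    for i in range(sim.shape[0]):
        if i in reg:
            continue
        neighbors = top_k_similar(sim, i, k)
        scores[i] = float(sum(sim[i, j] for j in neighbors if j in reg))
    return scores


# Toy content matrix: items 0 and 1 are near-duplicates, item 2 is unrelated.
content = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
sim = cosine_similarity_matrix(content)
scores = score_items(sim, registered=[0], k=1)
```

Sorting `scores` in descending order then gives the recommendation sequence for that person.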
Figures 2 to 4 compare R@5, R@10, and R@15 on the software crowdsourcing platform; Figures 5 to 7 compare P@5, P@10, and P@15 on the software crowdsourcing platform. The data show that the four algorithms rank almost identically on P and R: the single-source recommendation algorithm scores slightly higher than the ICBNN algorithm, with no large gap; the multi-source recommendation algorithm scores clearly higher than the SCL algorithm; and the multi-source recommendation algorithm scores much higher than the single-source recommendation algorithm.
The test data are analyzed as follows. Because the multi-source recommendation algorithm uses the Weight-SCL algorithm proposed for the two-category feature problem, it performs much better than the baseline SCL algorithm. Compared with the single-source recommendation algorithm, the multi-source recommendation algorithm also transfers the data from the Zhubajie platform, which amounts to a large expansion of the data set; its recommendation performance is therefore improved by 1.2X relative to the ICBNN method.
Those skilled in the art may make local adjustments to the above specific implementation in different ways without departing from the principle and purpose of the invention. The protection scope of the invention is defined by the claims and is not limited by the above specific implementation; each implementation within that scope is bound by the invention.
Claims (9)
1. A software crowdsourcing item recommendation method based on transfer learning, characterized in that: first, the data of the software crowdsourcing platform that requires recommendations are collected as target-domain data and the data of other software crowdsourcing platforms as source-domain data, the project data and developer data on the software crowdsourcing platforms being collected at the same time; then feature vectors of developers and projects are established to model the developers and the projects respectively, and these feature vectors form the respective feature data sets; next, based on the relative relationship between the target-domain feature space and the source-domain feature space, a new mapped feature space is established by feature mapping, and the feature vectors of the source domain and the target domain are mapped into the new feature space to align the features of the target domain and the source domain; finally, the developer feature vectors and project feature vectors in the mapped feature space are used to compute the similarity between developers and projects in the target domain and in the source domain respectively, an instance-transfer classification algorithm is applied to train a recommender system model, and the recommender system model is applied to recommend projects.
2. The software crowdsourcing item recommendation method based on transfer learning according to claim 1, characterized in that the modeling specifically comprises the following steps:
1) establishing a BOW vector for the tags of each project in the target domain and the source domain respectively, the BOW vectors of all source-domain projects forming the matrix $TM_{s,t}$ that represents the tag content of all projects on the source-domain platform, and the BOW vectors of all target-domain projects forming the matrix $TM_{t,t}$ that represents the tag content of all projects on the target-domain platform;
2) computing the TF-IDF value of each word w in each project description using the existing TF-IDF method, the TF-IDF values of all words forming the description vector of the project, the description vectors of all source-domain projects forming the matrix $KM_{s,t}$ that represents the description content of all projects on the source-domain platform, and the description vectors of all target-domain projects forming the matrix $KM_{t,t}$ that represents the description content of all projects on the target-domain platform;
3) computing the exposure Pop(j) of each project j;
4) establishing the tag vector matrix $TM_{s,u}$ and the keyword vector matrix $KM_{s,u}$ of all staff in the source domain, and establishing the tag vector matrix $TM_{t,u}$ and the keyword vector matrix $KM_{t,u}$ of all staff in the target domain.
3. The software crowdsourcing item recommendation method based on transfer learning according to claim 2, characterized in that the exposure $Pop(j) = \alpha(t_s, t_c) \cdot Pop_r(j) + \beta(t_s, t_c)$, wherein: $Pop_r(j) = Sum(U_{:,j})$; $\alpha(t_s, t_c)$ and $\beta(t_s, t_c)$ are two time weighting factors; $t_s$ is the release time of the project; and $\beta(t_s, t_c) = \ln(t_c - t_s + 1)$.
4. The software crowdsourcing item recommendation method and system based on transfer learning according to claim 3, characterized in that, in the tag vector matrix and the keyword vector matrix, $\lambda_A$, $\lambda_B$, and $\lambda_F$ are respectively the registration weight, the bid-winning weight, and the completion weight, and $A_u$, $B_u$, and $F_u$ respectively denote the sets of projects that staff member u has registered for, won the bid for, and completed.
5. The software crowdsourcing item recommendation method based on transfer learning according to claim 4, characterized in that establishing the new mapped feature space specifically comprises the following steps:
1) merging the important features and the unimportant features of the source domain and the target domain into an important feature set $F'_{t,f}$ and an unimportant feature set $F'_{k,f}$, selecting a pivot feature set $P_t$ from the important feature set $F'_{t,f}$ and a pivot feature set $P_k$ from the unimportant feature set $F'_{k,f}$, to obtain the pivot feature set P;
2) clustering with the k-means clustering algorithm to obtain m clusters, and taking the cluster centers as the final pivot features;
3) computing the weighted feature set $D_{:} = KM_{:,t} + TM_{:,t} * W$, wherein $TM_{:,t}$ is the $TM_{s,t}$ or $TM_{t,t}$ obtained in the modeling process, $KM_{:,t}$ is the $KM_{s,t}$ or $KM_{t,t}$ obtained in the modeling process, and W is a weight;
4) computing the correlation between each pivot feature p and all other words w by training linear classifiers, so as to perform the pivot feature mapping and obtain the weight matrix $W = [w'_1] \ldots [w'_m]$;
5) performing singular value decomposition on the weight matrix $W = [w'_1] \ldots [w'_m]$ to compute the correlation between pivot features and non-pivot features.
6. The software crowdsourcing item recommendation method based on transfer learning according to claim 5, characterized in that the similarity is obtained by adding the tag vector matrix TM to the keyword vector matrix KM to obtain a unified vector DM, thereby establishing a similarity matrix SM, and then constructing positive and negative examples.
7. The software crowdsourcing item recommendation method based on transfer learning according to claim 1, characterized in that the training is realized by applying the TrAdaBoost algorithm.
8. The software crowdsourcing item recommendation method based on transfer learning according to claim 1, characterized in that the recommender system model applies a logistic regression algorithm to make recommendations.
9. A system for implementing the method of any of the above claims, characterized by comprising a collection module, a feature vector module, a mapping module, and a recommendation module, wherein: the collection module collects the target-domain data and the source-domain data; the feature vector module receives the target-domain data and the source-domain data and establishes the feature vectors of developers and projects; the mapping module establishes the feature mapping between the target domain and the source domain and establishes the mapped feature space by feature mapping; and the recommendation module computes the similarity between staff and projects in the target domain and in the source domain respectively, applies the instance-transfer classification algorithm to train the recommender system model, and recommends projects.
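Steps 4) and 5) of claim 5 (training one linear predictor per pivot feature, stacking the weight vectors into W, and applying singular value decomposition) can be illustrated with the sketch below. It is not the claimed implementation: ridge regression stands in for the unspecified linear classifier, the pivot indices are assumed to be given, and the function names, the `ridge` parameter, and the choice of h retained directions are hypothetical.

```python
import numpy as np


def learn_pivot_projection(X, pivot_idx, h, ridge=1.0):
    """Learn a projection from non-pivot features to h shared directions.

    X: (n_samples, n_features) matrix pooled from both domains.
    pivot_idx: column indices of the pivot features.
    """
    X = np.asarray(X, dtype=float)
    pivots = list(pivot_idx)
    pivot_set = set(pivots)
    nonpivot_idx = [j for j in range(X.shape[1]) if j not in pivot_set]
    X_np = X[:, nonpivot_idx]
    # One +1/-1 target per pivot: does the pivot feature occur in the sample?
    Y = np.where(X[:, pivots] > 0, 1.0, -1.0)
    # Ridge regression in closed form (assumption: stands in for the classifier).
    A = X_np.T @ X_np + ridge * np.eye(X_np.shape[1])
    W = np.linalg.solve(A, X_np.T @ Y)  # columns are the vectors w'_1 ... w'_m
    # SVD of W; the top-h left singular vectors span the shared subspace.
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    theta = U[:, :h].T
    return theta, nonpivot_idx


def map_features(X, theta, nonpivot_idx):
    """Append the projected non-pivot features to the original features."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, X[:, nonpivot_idx] @ theta.T])


# Toy data: 4 samples, 4 binary features, the first two used as pivots.
X = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 0, 0, 1]], dtype=float)
theta, nonpivot_idx = learn_pivot_projection(X, pivot_idx=[0, 1], h=1)
X_mapped = map_features(X, theta, nonpivot_idx)
```

Source-domain and target-domain samples would both be passed through `map_features` with the same `theta`, so that they share the appended coordinates.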
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710959395.1A CN110020121A (en) | 2017-10-16 | 2017-10-16 | Software crowdsourcing item recommendation method and system based on transfer learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110020121A (en) | 2019-07-16 |
Family
ID=67186635
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710959395.1A Pending CN110020121A (en) | 2017-10-16 | 2017-10-16 | Software crowdsourcing item recommendation method and system based on transfer learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110020121A (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2860672A2 (en) * | 2013-10-10 | 2015-04-15 | Deutsche Telekom AG | Scalable cross domain recommendation system |
CN103530428A (en) * | 2013-11-04 | 2014-01-22 | 武汉大学 | Same-occupation type recommendation method based on developer practical skill similarity |
WO2015192655A1 (en) * | 2014-06-20 | 2015-12-23 | 华为技术有限公司 | Method and device for establishing and using user recommendation model in social network |
EP2983123A1 (en) * | 2014-07-17 | 2016-02-10 | Deutsche Telekom AG | Self transfer learning recommendation method and system |
CN105447145A (en) * | 2015-11-25 | 2016-03-30 | 天津大学 | Item-based transfer learning recommendation method and recommendation apparatus thereof |
US20170220951A1 (en) * | 2016-02-02 | 2017-08-03 | Xerox Corporation | Adapting multiple source classifiers in a target domain |
CN106201465A (en) * | 2016-06-23 | 2016-12-07 | 扬州大学 | Software project personalized recommendation method towards open source community |
CN106227767A (en) * | 2016-07-15 | 2016-12-14 | 华侨大学 | A kind of based on the adaptive collaborative filtering method of field dependency |
Non-Patent Citations (5)
Title |
---|
JIANGANG ZHU等: "A Learning to Rank Framework for Developer Recommendation in Software Crowdsourcing", 《2015 ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE (APSEC)》 * |
NING LI等: "Task Recommendation with Developer Social Network in Software Crowdsourcing", 《2016 23RD ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE (APSEC)》 * |
刘桂峰等: "一种改进的多源域多视角学习算法", 《青岛大学学报(自然科学版)》 * |
柯良文等: "基于用户特征迁移的协同过滤推荐", 《计算机工程》 * |
董爱美等: "基于迁移共享空间的分类新算法", 《计算机研究与发展》 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111159542A (en) * | 2019-12-12 | 2020-05-15 | 中国科学院深圳先进技术研究院 | Cross-domain sequence recommendation method based on self-adaptive fine-tuning strategy |
CN111324812A (en) * | 2020-02-20 | 2020-06-23 | 深圳前海微众银行股份有限公司 | Federal recommendation method, device, equipment and medium based on transfer learning |
CN111932108A (en) * | 2020-08-06 | 2020-11-13 | 北京航空航天大学杭州创新研究院 | Developer recommendation method oriented to group software process |
CN111932108B (en) * | 2020-08-06 | 2022-07-19 | 北京航空航天大学杭州创新研究院 | Developer recommendation method oriented to group software process |
CN112396092A (en) * | 2020-10-26 | 2021-02-23 | 北京航空航天大学 | Crowdsourcing developer recommendation method and device |
CN112396092B (en) * | 2020-10-26 | 2023-09-29 | 北京航空航天大学 | Crowdsourcing developer recommendation method and device |
CN112417288A (en) * | 2020-11-25 | 2021-02-26 | 南京大学 | Task cross-domain recommendation method for crowdsourcing software testing |
CN112417288B (en) * | 2020-11-25 | 2024-04-12 | 南京大学 | Task cross-domain recommendation method for crowdsourcing software test |
CN112767009B (en) * | 2020-12-31 | 2023-07-18 | 上海梦创双杨数据科技股份有限公司 | Training project recommendation method based on WeChat public number training registration |
CN112767009A (en) * | 2020-12-31 | 2021-05-07 | 上海梦创双杨数据科技股份有限公司 | Training item recommendation method based on WeChat public number training registration |
CN113222073A (en) * | 2021-06-09 | 2021-08-06 | 支付宝(杭州)信息技术有限公司 | Method and device for training transfer learning model and recommendation model |
CN113343087A (en) * | 2021-06-09 | 2021-09-03 | 南京星云数字技术有限公司 | Method and system for acquiring marketing user |
US20230053820A1 (en) * | 2021-08-19 | 2023-02-23 | Red Hat, Inc. | Generating a build process for building software in a target environment |
US11995420B2 (en) * | 2021-08-19 | 2024-05-28 | Red Hat, Inc. | Generating a build process for building software in a target environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190716 |