CN110020121A - Software crowdsourcing item recommendation method and system based on transfer learning - Google Patents

Software crowdsourcing item recommendation method and system based on transfer learning

Publication number
CN110020121A
Authority
CN
China
Prior art keywords
feature
project
source domain
data
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710959395.1A
Other languages
Chinese (zh)
Inventor
阎姝含
沈备军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201710959395.1A
Publication of CN110020121A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A software crowdsourcing project recommendation method and system based on transfer learning. First, data from the software crowdsourcing platform that needs recommendations are collected as target-domain data, and data from other software crowdsourcing platforms are collected as source-domain data; both project data and developer data are collected from each platform. Next, feature vectors are built for developers and for projects so that each is modeled separately, and these feature vectors form the feature data sets. Then, based on the relationship between the target-domain feature space and the source-domain feature space, a new mapped feature space is established by feature mapping, and the feature vectors of the source and target domains are mapped into it, aligning the features of the two domains. Finally, using the developer and project feature vectors in the mapped feature space, the developer-project similarity is computed for the target domain and the source domain separately, an instance-transfer classification algorithm is applied to train the recommender system model, and the model is used to recommend projects.

Description

Software crowdsourcing item recommendation method and system based on transfer learning
Technical field
The present invention relates to a technology in the field of software crowdsourcing, specifically a software crowdsourcing project recommendation method and system based on transfer learning.
Background technique
In recent years, with the development of the software crowdsourcing industry, more and more users have joined software crowdsourcing platforms to publish requirements and look for work. The basic workflow of a software crowdsourcing platform is: 1) a project publisher publishes a project on the platform and offers a reward; 2) developers on the platform browse the projects and register for those that interest them; 3) the project publisher selects suitable people from the registrants to develop the project; 4) the developers complete the project and receive the reward. Recommender systems, however, have always suffered from a serious platform cold-start problem: on a newly built platform, little user and project data has accumulated. As a result, the training and prediction of the recommender system are limited by data scarcity and the system cannot function.
The cold-start problem is important and common. For recommender systems it can be refined into the item cold-start problem, the user cold-start problem and the platform cold-start problem. When a new item arrives, no user has used it yet, so it cannot be recommended to any user; this is the item cold-start problem. Similarly, when a new user arrives, the user has not rated any item, so collaborative filtering cannot make recommendations for that user; this is the user cold-start problem. When a recommender platform lacks accumulated collaborative data, the data never reaches the scale needed to build a model and the recommendation algorithm cannot be trained; this is the platform cold-start problem.
Summary of the invention
The prior art cannot solve the platform cold-start problem, judges knowledge fit too simply, and defines project categories too coarsely. To address these defects, the present invention proposes a software crowdsourcing project recommendation method and system based on transfer learning. It uses data from other crowdsourcing platforms to assist the training of the recommendation algorithm on the data-scarce platform, which solves the platform cold-start problem of the recommender system and significantly improves the accuracy of project recommendation.
The present invention is achieved by the following technical solutions:
The present invention relates to a software crowdsourcing project recommendation method based on transfer learning. First, data from the software crowdsourcing platform that needs recommendations are collected as target-domain data and data from other software crowdsourcing platforms as source-domain data; both project data and developer data are collected from each platform. Next, feature vectors are built for developers and for projects so that each is modeled separately, and these feature vectors form the feature data sets. Then, based on the relationship between the target-domain feature space and the source-domain feature space, a new mapped feature space is established by feature mapping, and the feature vectors of the source and target domains are mapped into it, aligning the features of the two domains. Finally, using the developer and project feature vectors in the mapped feature space, the developer-project similarity is computed for the target domain and the source domain separately, an instance-transfer classification algorithm is applied to train the recommender system model, and the model is used to recommend projects.
The project data include the tags and the description of each project; the developer data include the projects related to each person, that is, the projects the person registered for, won the bid for, and delivered.
The developers and projects of the target domain lie in the target-domain feature space; correspondingly, the developers and projects of the source domain lie in the source-domain feature space.
The modeling specifically comprises the following steps:
1) A BOW vector is built for the tags of each project in the target domain and in the source domain. The BOW vectors of all source-domain projects form the matrix TM_{s,t}, which represents the tag content of all projects on the source-domain platform; the BOW vectors of all target-domain projects form the matrix TM_{t,t}, which represents the tag content of all projects on the target-domain platform;
2) The TF-IDF value of each word w in each project description is computed with the existing TF-IDF method, and the TF-IDF values of all words form the description vector of the project. The description vectors of all source-domain projects form the matrix KM_{s,t}, which represents the description content of all projects on the source-domain platform; the description vectors of all target-domain projects form the matrix KM_{t,t}, which represents the description content of all projects on the target-domain platform;
3) The exposure Pop(j) of each project j is computed;
4) The tag vector matrix TM_{s,u} and keyword vector matrix KM_{s,u} of all people in the source domain are built, as are the tag vector matrix TM_{t,u} and keyword vector matrix KM_{t,u} of all people in the target domain.
The TF value tf_{w,d} and the IDF value idf_w are combined into the TF-IDF value tfidf_{w,d} = tf_{w,d} * idf_w, where n_{w,d} is the number of times word w occurs in project description d and |{j : w ∈ d_j}| is the number of project descriptions that contain word w.
The exposure is Pop(j) = α(t_s, t_c) * Pop_r(j) + β(t_s, t_c), where Pop_r(j) = Sum(U_{:,j}), α(t_s, t_c) and β(t_s, t_c) are two time weighting factors, t_s is the publication time of the project, and β(t_s, t_c) = ln(t_c - t_s + 1).
In the formulas for the tag vector matrix and the keyword vector matrix of the people, λ_A, λ_B and λ_F are the registration weight, the bid-winning weight and the completion weight, and A_u, B_u and F_u denote the sets of projects that person u registered for, won the bid for, and completed, respectively.
Establishing the new mapped feature space specifically comprises the following steps:
1) The important features and insignificant features of the source and target domains are merged into the important feature set F'_{t,f} and the insignificant feature set F'_{k,f}; an axis feature set P_t is chosen from the important feature set F'_{t,f} and an axis feature set P_k from the insignificant feature set F'_{k,f}, giving the axis feature set P;
2) The axis features are clustered with the k-means clustering algorithm into m clusters, and the cluster centers are used as the final axis features;
3) The weighted feature set D_: = KM_{:,t} + TM_{:,t} * W is computed, where TM_{:,t} is the TM_{s,t} or TM_{t,t} obtained during modeling, KM_{:,t} is the KM_{s,t} or KM_{t,t} obtained during modeling, and W is a weight;
4) Linear classifiers are trained to compute the correlation between each axis feature p and every other word w, which performs the axis feature mapping and yields the weight matrix W = [w'_1, …, w'_m];
5) The singular value decomposition of the weight matrix W = [w'_1, …, w'_m] is computed to obtain the correlation between axis features and non-axis features.
The similarity is obtained by adding the tag vector matrix TM and the keyword vector matrix KM to get the unified vector DM, from which the similarity matrix SM is built; positive and negative examples are then constructed. A positive example is built from a project t and a person u that have a registration relationship. For a negative example, similarity vectors p must be computed for some of the projects the person has not registered for; the invention preferentially pairs the person with high-exposure projects that the person has not registered for. Compared with pairing the person with random projects, a high-exposure negative example is less likely to be a project the person simply never found, so it better represents what the person is not interested in. Therefore the projects in the project set T are first sorted by their computed exposure Pop(t); the sorted projects are then checked in order for a registration relationship with the person. If one exists, the next project is checked; if not, the current project and the person form a negative example.
The training uses the TrAdaBoost algorithm to train the recommender system model.
The recommender system model applies the logistic regression algorithm to make recommendations.
The present invention also relates to a system that implements the above method, comprising a collection module, a feature vector module, a mapping module and a recommendation module, wherein: the collection module collects the target-domain data and the source-domain data; the feature vector module receives the target-domain data and source-domain data and builds the feature vectors of developers and projects; the mapping module establishes the feature mapping between the target domain and the source domain and uses it to build the mapped feature space; the recommendation module computes the similarity between people and projects for the target domain and the source domain separately, applies the instance-transfer classification algorithm to train the recommender system model, and recommends projects.
Description of the drawings
Fig. 1 is a flow chart of the method of the present invention;
Figs. 2 to 4 compare R@5, R@10 and R@15 on the software crowdsourcing platform in the embodiment;
Figs. 5 to 7 compare P@5, P@10 and P@15 on the software crowdsourcing platform in the embodiment.
Specific embodiment
As shown in Fig. 1, this embodiment relates to a software crowdsourcing project recommendation method based on transfer learning, comprising the following steps:
1) Data from the software crowdsourcing platform that needs recommendations are collected as target-domain data, and data from other software crowdsourcing platforms as source-domain data.
The data on a software crowdsourcing platform fall into two kinds of entities. The first is the user set User, which is further divided into contractors (who take on projects) and publishers (who post projects): User = {Rec, Sent}, where Rec is the set of contractor users and Sent is the set of publisher users. The second is the project set T. Assume the platform has m contractors and n projects, i.e. |Rec| = m and |T| = n. The relationship between a project and a user can be expressed as R = {publish, register, win bid, deliver}. The invention therefore represents the data on the platform as (u, t, r) ∈ User × T × R. A utility matrix U is given over all projects T, where for each project d denotes the tags of the project and k denotes its keywords. A model is learned from the historical data (u, t, r) ∈ User × T × R; given a contractor u ∈ Rec as input, the model recommends suitable projects j ∈ T.
2) From the target-domain data and source-domain data collected in the previous step, feature vectors are built for developers and for projects so that each is modeled separately, and these feature vectors form the feature data sets. The developers and projects of the target domain lie in the target-domain feature space; correspondingly, the developers and projects of the source domain lie in the source-domain feature space.
2.1) A BOW vector is built for each project in the target-domain data and the source-domain data. The BOW vectors of all source-domain projects form the matrix TM_{s,t}, which represents the tag content of all projects on the source-domain platform; the BOW vectors of all target-domain projects form the matrix TM_{t,t}, which represents the tag content of all projects on the target-domain platform. BOW is the bag-of-words model, which turns a set of words into a binary vector containing only {0, 1}. The vector is first initialized to all zeros; the word set is then checked, and if a word w occurs in it, the element of the vector corresponding to w is set to 1. The tag vector of each project is this BOW vector. After all n projects have been vectorized, a matrix of size n * |D_t| is obtained. This matrix represents the tag content of all projects on the platform and is denoted TM_{:,t}.
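As an illustration of step 2.1, the following is a minimal Python sketch of the binary bag-of-words tag matrix. The function names, the alphabetical ordering of the vocabulary and the sample tags are assumptions made for the example and are not taken from the patent.

```python
def build_vocabulary(tag_sets):
    """Index every tag seen on the platform (alphabetical order is an assumption)."""
    vocab = sorted({tag for tags in tag_sets for tag in tags})
    return {tag: idx for idx, tag in enumerate(vocab)}


def bow_vector(tags, vocab_index):
    """Start from an all-zero vector and set the element of each occurring tag to 1."""
    vec = [0] * len(vocab_index)
    for tag in tags:
        idx = vocab_index.get(tag)
        if idx is not None:
            vec[idx] = 1
    return vec


def tag_matrix(tag_sets):
    """Stack one BOW vector per project into an n x |D_t| matrix (list of rows)."""
    vocab_index = build_vocabulary(tag_sets)
    return [bow_vector(tags, vocab_index) for tags in tag_sets], vocab_index


# Hypothetical tags of three projects.
projects = [["java", "web"], ["python", "web"], ["java"]]
TM, vocab_index = tag_matrix(projects)
```

Each row of `TM` corresponds to one project and each column to one tag, matching the n * |D_t| shape described above.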
2.2) The TF-IDF value of each word w in each project description is computed. The TF value tf_{w,d} and the IDF value idf_w are combined into the TF-IDF value tfidf_{w,d} = tf_{w,d} * idf_w, where n_{w,d} is the number of times word w occurs in project description d and |{j : w ∈ d_j}| is the number of project descriptions that contain word w. For a project description d, an all-zero vector of dimension |D| is constructed; tfidf_{w,d} is then computed for each word w in d, and the value in the dimension of the vector corresponding to w is replaced with tfidf_{w,d}. After all n project descriptions have been vectorized, a matrix of size n * |D_k| is obtained. This matrix represents the description content of all projects on the platform and is denoted KM. The description vectors of all source-domain projects form the matrix KM_{s,t}, which represents the description content of all projects on the source-domain platform; the description vectors of all target-domain projects form the matrix KM_{t,t}, which represents the description content of all projects on the target-domain platform.
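The description matrix of step 2.2 can be sketched in Python as follows. The TF and IDF formulas are not legible in this text, so the sketch assumes the common definitions tf_{w,d} = n_{w,d} / Σ_k n_{k,d} and idf_w = ln(N / |{j : w ∈ d_j}|), with N the number of descriptions; the function name and sample descriptions are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(descriptions):
    """Build an n x |D_k| TF-IDF matrix from tokenized project descriptions.

    Assumed (standard) definitions, not confirmed by the source text:
      tf  = n_wd / total number of words in d
      idf = ln(N / number of descriptions containing w)
    """
    vocab = sorted({w for d in descriptions for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    n_docs = len(descriptions)
    # Document frequency: in how many descriptions each word appears.
    doc_freq = Counter(w for d in descriptions for w in set(d))

    matrix = []
    for d in descriptions:
        counts = Counter(d)
        total = sum(counts.values())
        row = [0.0] * len(vocab)  # all-zero vector of dimension |D|
        for w, n_wd in counts.items():
            tf = n_wd / total
            idf = math.log(n_docs / doc_freq[w])
            row[index[w]] = tf * idf
        matrix.append(row)
    return matrix, index


# Hypothetical tokenized descriptions of three projects.
docs = [["build", "web", "shop"], ["web", "crawler"], ["web", "shop", "shop"]]
KM, index = tfidf_matrix(docs)
```

Under these assumed definitions a word that occurs in every description, such as "web" here, gets weight 0, so it does not contribute to the description vectors.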
2.3) The exposure Pop(j) of each project j is computed: Pop(j) = α(t_s, t_c) * Pop_r(j) + β(t_s, t_c), where Pop_r(j) = Sum(U_{:,j}), α(t_s, t_c) and β(t_s, t_c) are two time weighting factors, t_s is the publication time of the project, and β(t_s, t_c) = ln(t_c - t_s + 1).
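A short Python sketch of the exposure computation follows. The formula for α(t_s, t_c) is not legible in this text, so it is passed in as a caller-supplied function; the constant α used in the example, the time unit (days) and the layout of U as one row per person are all assumptions made for illustration.

```python
import math


def beta(t_s, t_c):
    """Time weighting factor given in the text: ln(t_c - t_s + 1)."""
    return math.log(t_c - t_s + 1)


def raw_popularity(U, j):
    """Pop_r(j) = Sum(U[:, j]): column sum of the utility matrix for project j."""
    return sum(row[j] for row in U)


def exposure(U, j, t_s, t_c, alpha):
    """Pop(j) = alpha(t_s, t_c) * Pop_r(j) + beta(t_s, t_c).

    `alpha` is a hypothetical placeholder for the time weighting factor whose
    formula is not reproduced in the source.
    """
    return alpha(t_s, t_c) * raw_popularity(U, j) + beta(t_s, t_c)


# Hypothetical utility matrix: rows are people, columns are projects.
U = [[1, 0], [1, 1], [0, 1]]
constant_alpha = lambda t_s, t_c: 1.0  # placeholder, NOT the patent's alpha
pop_0 = exposure(U, 0, t_s=5, t_c=5, alpha=constant_alpha)
pop_1 = exposure(U, 1, t_s=0, t_c=9, alpha=constant_alpha)
```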
2.4) The tag vector matrix TM_{u,:} and the keyword vector matrix KM_{u,:} of all people are built.
In the formulas for the tag vector matrix and the keyword vector matrix, λ_A, λ_B and λ_F are the registration weight, the bid-winning weight and the completion weight, set here to 0.5, 0.3 and 0.2; A_u, B_u and F_u denote the sets of projects that person u registered for, won the bid for, and completed, respectively.
3) Based on the relationship between the target-domain feature space and the source-domain feature space, a new mapped feature space is established by feature mapping, and the feature vectors of the source and target domains are mapped into it, aligning the features of the two domains.
3.1) The important features and insignificant features of the source and target domains are merged into the important feature set F'_{t,f} and the insignificant feature set F'_{k,f}; an axis feature set P_t is chosen from the important feature set F'_{t,f} and an axis feature set P_k from the insignificant feature set F'_{k,f}, giving the axis feature set P.
For a source-domain instance d = (x_k, x_t) ∈ D_{S,f}, a subset X'_k = X_k ∩ D_{T,t} is selected from its insignificant feature set X_k = {x_k}. A target-domain instance d = (x_k, x_t) ∈ D_{T,f} is handled in the same way as a source-domain instance: a subset X'_k = X_k ∩ D_{S,t} is selected from its insignificant feature set X_k = {x_k}. In other words, the insignificant features of d that appear among the important features of the other domain are selected. Because these words belong to the important-feature vocabulary of the other domain, they can be considered to play a larger role in characterizing a project, so these insignificant features are extracted.
Features in X'_k whose value is below a threshold are deleted. A word that occurs in a project is not necessarily important, whereas an important feature must be able to represent the project. The insignificant features remaining in X'_k are promoted to important features, and the important feature set F'_{t,f} and the insignificant feature set F'_{k,f} of the source and target domains are returned. Axis features are then obtained from the important features.
From the important feature set F'_{t,f} = F'_{s,t,f} ∪ F'_{t,t,f}, a feature subset P_t = F'_{s,t,f} ∩ F'_{t,t,f} is selected. Features in P_t whose frequency in the data set is below a threshold are deleted, and the features remaining in P_t are output as the axis feature set of the important features.
From the insignificant feature set F'_{k,f} = F'_{s,k,f} ∪ F'_{t,k,f}, a subset P'_k = F'_{s,k,f} ∩ F'_{t,k,f} is selected. From P'_k a subset P_k is selected that contains the word features with the highest mutual information with respect to the class labels in the similarity data set. Word features whose mutual information is below a threshold are then deleted, and the features remaining in P_k are output as the axis feature set of the insignificant features.
3.2) The axis features are clustered with the k-means clustering algorithm into m clusters, and the cluster centers are used as the final axis features. The distance between two vectors a and b is d = Σ_i (a_i - b_i)^2. Here m = 150 is chosen.
3.3) The weighted feature set D_: = KM_{:,t} + TM_{:,t} * W is computed, where TM_{:,t} is the TM_{s,t} or TM_{t,t} obtained during modeling, KM_{:,t} is the KM_{s,t} or KM_{t,t} obtained during modeling, and W is a weight.
3.4) Linear classifiers are trained to compute the correlation between each axis feature p and every other word w, which performs the axis feature mapping and yields the weight matrix W = [w'_1, …, w'_m]. The weight-SCL algorithm is used: it trains linear classifiers to compute the correlation between each axis feature and all other words. Each linear classifier predicts, from the other words, whether the words w_s, w_t occur in a document. For each axis feature p ∈ P a training set D is created: D = {(Mask(x, p), Value(x, p)) | x ∈ D_u}. The function Mask(x, p) returns a copy of x in which the value of axis feature p is set to zero, which is equivalent to deleting that axis feature from the feature space. Value(x, p) returns +1 if the value of axis feature p in x is non-zero, and -1 otherwise. For each D, the corresponding linear classifier is trained by minimizing w'_l = argmin_w (Σ_j L(w * x_j, p_l(x_j)) + λ ||w||^2). The result is the weight matrix W = [w'_1, …, w'_m].
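The construction of the training set for one axis feature can be sketched in Python as below. The function names and the sample data are hypothetical, instances are assumed to be plain lists indexed by feature position, and the training of the linear classifier itself is not shown.

```python
def mask(x, p):
    """Return a copy of x with axis feature p set to zero (x itself is not modified)."""
    masked = list(x)
    masked[p] = 0
    return masked


def value(x, p):
    """Return +1 if axis feature p is non-zero in x, otherwise -1."""
    return 1 if x[p] != 0 else -1


def axis_training_set(D_u, p):
    """D = {(Mask(x, p), Value(x, p)) | x in D_u} for one axis feature index p."""
    return [(mask(x, p), value(x, p)) for x in D_u]


# Hypothetical instances with three features; feature 0 is the axis feature.
D_u = [[1, 0, 2], [0, 3, 0]]
training_set = axis_training_set(D_u, p=0)
```

Each pair in `training_set` holds the masked instance as input and the presence of the axis feature as the target, so a classifier trained on it has to predict the axis feature from the remaining words.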
3.5) The singular value decomposition of the weight matrix W = [w'_1, …, w'_m] is computed to obtain the correlation between axis features and non-axis features. The weight-SCL algorithm identifies this correlation by computing the singular value decomposition of the |V| × m parameter matrix W = [w'_1, …, w'_m]. W encodes the correlation structure between axis features and non-axis features in the form of multiple linear classifiers. The columns of U from the decomposition therefore identify the substructures that these classifiers have in common. Selecting the columns of U associated with the largest singular values yields the substructures that account for the most correlation in W. θ is defined as those columns of U associated with the k largest singular values.
4) Using the mapped feature space, the similarity between people and projects is computed for the target domain and the source domain separately, and an instance-transfer classification algorithm is applied to train the recommender system model.
The similarity is obtained as follows:
4.1) The tag vector matrix TM and the keyword vector matrix KM are added to obtain the unified vector DM, from which the similarity matrix SM is built.
For an instance s in the similarity matrix SM, a project t and a person u are first taken and their similarity vector p is constructed using d = |f_{u,i} - f_{t,i}| and p_i = m - d. That is, the i-th element p_i of p expresses similarity through the absolute distance d between the i-th element f_{u,i} of the person feature vector f_u and the i-th element f_{t,i} of the project feature vector f_t: the smaller the absolute distance, the higher the similarity. The similarity is obtained by subtracting the absolute distance from a constant m, called the homogenization maximum. To keep p_i from becoming negative, m is recommended to be the maximum value in the feature vectors.
There are four cases. Case one: the project has a certain feature but the person does not. Case two: the person has a certain feature but the project does not. Case three: neither the project nor the person has the feature. Case four: both the project and the person have the feature.
By the formula, cases one and two yield the same similarity. In practice, however, the similarity in case one should be lower than in case two, for an obvious reason: what the project requires matters more than what the person can do. Therefore, when case one is detected, the invention multiplies the absolute distance by a coefficient a greater than 1 so that the similarity is lower. This coefficient a is called the negative feature coefficient. The similarity formula for case one becomes p_i = m - a * d.
By the formula, cases three and four also yield the same similarity. In practice, however, the similarity in case four should be higher than in case three. A project and a person sharing a feature is less common and deserves more attention, because it shows that the person can meet a requirement of the project. Neither having a feature is far more common: it only shows that the project does not have a certain requirement, so it does not matter that the person lacks the ability. Therefore, when case three is detected, a coefficient z greater than 0 is subtracted so that the similarity is lower. This coefficient z is called the zero feature coefficient. The similarity formula for case three becomes p_i = m - d - z.
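The four cases can be put together in a short Python sketch. It assumes that "having a feature" means a non-zero value in the feature vector, which the text does not state explicitly; the function name is hypothetical, and the default values of m, a and z are the ones reported for this embodiment.

```python
def similarity_vector(f_u, f_t, m=10.0, a=2.0, z=8.0):
    """Element-wise similarity between a person vector f_u and a project vector f_t.

    Assumption: a feature is "present" when its value is non-zero.
      case 1 (project has it, person lacks it): p_i = m - a * d
      case 3 (neither has it):                  p_i = m - d - z
      cases 2 and 4:                            p_i = m - d
    """
    p = []
    for fu, ft in zip(f_u, f_t):
        d = abs(fu - ft)
        if ft != 0 and fu == 0:      # case 1: unmet project requirement
            p.append(m - a * d)
        elif ft == 0 and fu == 0:    # case 3: feature absent on both sides
            p.append(m - d - z)
        else:                        # case 2 or case 4
            p.append(m - d)
    return p


# One element per case, in order: case 1, case 2, case 3, case 4.
person = [0, 1, 0, 1]
project = [1, 0, 0, 1]
p = similarity_vector(person, project)
```

With these values the element for case one is lower than for case two, and the element for case three is lower than for case four, which is the ordering the text argues for.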
Once the similarity p_i of the i-th feature between a person and a project has been obtained, all elements of the similarity vector p are computed in the same way, which gives the vector p. The similarity vector p of a project and a person is used as the feature of the corresponding instance s in the similarity matrix SM, and whether a registration exists is used as the label of SM. If a registration relationship exists between the project and the person, the label l is set to 1, and such an instance is called a positive example; if no registration relationship exists, the label l is set to 0, and such an instance is called a negative example. Every instance s of a project and a person thus has a feature vector p and a label l, i.e. s = (p, l) with l ∈ {0, 1}. In addition, the ratio of positive to negative examples is defined as k, where |S_1| is the number of positive examples and |S_0| is the number of negative examples.
4.2) Positive and negative examples are constructed. In a positive example the person and the project have a registration relationship: a project t and a person u are chosen whose entry in the utility matrix satisfies U_{t,u} = 1. For a negative example, similarity vectors p are computed for some of the projects the person has not registered for, pairing the person with high-exposure projects that the person has not registered for. The projects in the project set T are first sorted by their exposure Pop(t); the sorted projects are then checked in order for a registration relationship with the person. If one exists, the next project is checked; if not, the current project and the person form a negative example.
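The selection procedure of step 4.2 can be sketched in Python as follows. The sketch indexes the utility matrix as U[u][t] (person first), sorts by descending exposure, and takes a fixed number of negatives per person through the hypothetical parameter n_neg_per_person; the text itself controls the balance through the ratio k, whose exact formula is not legible here. It returns (person, project, label) triples rather than similarity vectors.

```python
def build_examples(U, pop, n_neg_per_person):
    """Return (positives, negatives) as lists of (person, project, label) triples.

    U[u][t] == 1 means person u registered for project t (assumed layout).
    pop[t] is the exposure Pop(t) of project t.
    n_neg_per_person is a hypothetical parameter, not taken from the patent.
    """
    # Projects ordered from highest to lowest exposure.
    ranked = sorted(range(len(pop)), key=lambda t: pop[t], reverse=True)

    positives, negatives = [], []
    for u, row in enumerate(U):
        positives.extend((u, t, 1) for t, r in enumerate(row) if r == 1)
        picked = 0
        for t in ranked:
            if picked == n_neg_per_person:
                break
            if row[t] == 1:
                continue  # registered: check the next project
            negatives.append((u, t, 0))  # unregistered, high exposure
            picked += 1
    return positives, negatives


# Hypothetical data: two people, three projects.
U = [[1, 0, 0], [0, 1, 1]]
pop = [5.0, 9.0, 7.0]
positives, negatives = build_examples(U, pop, n_neg_per_person=1)
```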
The four values, namely the homogenization maximum m, the negative feature coefficient a, the zero feature coefficient z and the positive-to-negative ratio k, were set to m = 10, a = 2, z = 8 and k = 1.2, the values at which system performance was best.
4.3) The existing TrAdaBoost algorithm is applied in combination with the logistic regression algorithm to train the recommender system model.
5) The recommender system model is applied to recommend projects, using the logistic regression algorithm. The algorithm uses the sigmoid function g(z) = 1 / (1 + e^(-z)), and the prediction function is h_θ(x) = g(θ^T x). Feeding a vector x into h_θ(x) gives the probability that the result is 1: P(y = 1 | x; θ) = h_θ(x).
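The prediction step can be illustrated with a minimal Python sketch. It assumes the parameter vector θ has already been trained (TrAdaBoost training is not shown), and the function names, the sample θ and the candidate vectors are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable sigmoid g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(theta, x):
    """h_theta(x) = g(theta^T x), read as P(y = 1 | x; theta)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def recommend(theta, candidates, top_k):
    """Rank candidate projects for one person by predicted probability.

    `candidates` maps a project id to the person-project similarity vector p.
    """
    ranked = sorted(candidates.items(),
                    key=lambda kv: predict_proba(theta, kv[1]),
                    reverse=True)
    return [project_id for project_id, _ in ranked[:top_k]]


# Hypothetical trained parameters and similarity vectors.
theta = [1.0, -1.0]
candidates = {"a": [8, 2], "b": [2, 10], "c": [9, 9]}
top = recommend(theta, candidates, top_k=2)
```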
This embodiment also relates to a system that implements the above method, comprising a collection module, a feature vector module, a mapping module and a recommendation module, wherein: the collection module collects the target-domain data and the source-domain data; the feature vector module receives the target-domain data and source-domain data and builds the feature vectors of developers and projects; the mapping module establishes the feature mapping between the target domain and the source domain and uses it to build the mapped feature space; the recommendation module computes the similarity between people and projects for the target domain and the source domain separately, applies the instance-transfer classification algorithm to train the recommender system model, and recommends projects.
Compared with the prior art, the proposed software crowdsourcing project recommendation method and system based on transfer learning improves recommendation accuracy: it uses data from other crowdsourcing platforms to assist the training of the recommendation algorithm on the data-scarce platform, which solves the platform cold-start problem of the recommender system.
Two well-known Chinese software crowdsourcing platforms, Zhubajie (猪八戒) and Jiefanghao (解放号), were chosen for data crawling and comparative experiments. Each platform has its own characteristic data, but the experiments use only the project descriptions, the project tags, and the three kinds of relationship between projects and contractors: registration, bid winning and completion. The data on the two platforms are detailed in the table below. From the Zhubajie website, 6000 projects and 10597 contractors were crawled, with 48769 registration relationships between contractors and projects; from Jiefanghao, 2800 projects and 5231 contractors were crawled, with 15498 registration relationships.
Because Jiefanghao has far less data than Zhubajie, the Zhubajie data are used as source-domain data and the Jiefanghao data as target-domain data, and data are transferred from the Zhubajie platform to the Jiefanghao platform.
Table 1. Data crawled from the two well-known Chinese crowdsourcing platforms
Two indices, P@k and R@k, are used to evaluate the recommendation algorithms. P@k reflects recommendation precision, i.e. how many of the top-k recommended projects have a registration relationship with the person; R@k reflects recall, i.e. how many of the projects in the test set appear among the top-k recommendations.
Given a person u as input, the recommendation algorithm returns a set of projects, each with a corresponding trained value (score) representing how well the person and the project match. The algorithm sorts the projects by this value to obtain a recommendation sequence.
Let l_u denote the sequence obtained after sorting, and define the function h(k, l_u), which takes a sequence as input and returns the top-k items of the list. The two indices P@k and R@k, computed from h(k, l_u), Test_u and Dev, are used to assess the recommendation effect, where Test_u denotes the set of projects corresponding to person u in the test set and Dev denotes the set of persons.
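As a minimal sketch of how the two indices could be computed, the Python code below assumes the usual definitions of precision and recall at k, averaged over all persons in Dev: P@k as the mean of |h(k, l_u) ∩ Test_u| / k and R@k as the mean of |h(k, l_u) ∩ Test_u| / |Test_u|. The averaging over persons, the function names and the dictionary-based inputs are assumptions made for illustration and are not taken from the text.

```python
def rank_projects(scores):
    """Sort project ids by descending score to obtain the sequence l_u."""
    ordered = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [project for project, _ in ordered]


def top_k(ranking, k):
    """h(k, l_u): return the first k items of a ranked list."""
    return ranking[:k]


def precision_at_k(rankings, test_sets, k):
    """Mean over persons of |top-k ∩ Test_u| / k (assumed definition)."""
    total = 0.0
    for person, ranking in rankings.items():
        hits = len(set(top_k(ranking, k)) & test_sets[person])
        total += hits / k
    return total / len(rankings)


def recall_at_k(rankings, test_sets, k):
    """Mean over persons of |top-k ∩ Test_u| / |Test_u| (assumed definition).

    Persons with an empty test set are skipped to avoid division by zero.
    """
    values = []
    for person, ranking in rankings.items():
        relevant = test_sets[person]
        if not relevant:
            continue
        hits = len(set(top_k(ranking, k)) & relevant)
        values.append(hits / len(relevant))
    return sum(values) / len(values) if values else 0.0
```

Here `rankings` maps each person to the sorted project list l_u and `test_sets` maps each person to Test_u as a Python set.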
The recommender-system methods chosen for comparison are the content-based nearest neighbor algorithm (referred to below as ICBNN), which computes similarity from the content matrix, where sim_{i,i'} denotes the similarity between items i and i' and I_k gives the top-k items most similar to item i; and the SCL algorithm. The ICBNN algorithm is trained and tested only on the Jiefanghao data set, whereas the SCL algorithm and the multi-source recommendation algorithm are trained on both the Jiefanghao and Zhubajie data sets and tested on the Jiefanghao data set.
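The scoring formula of the content-based baseline is not reproduced in the text, so the following is only a sketch of one common form of content-based nearest-neighbor scoring: a candidate item is scored by summing its similarity to those of its top-k most similar items that the person has already registered for. The use of cosine similarity, the sparse dictionary vectors and all function names are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_neighbours(item, content, k):
    """I_k: the k items most similar to `item`, with their similarities."""
    sims = [(other, cosine(content[item], content[other]))
            for other in content if other != item]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]


def icbnn_score(item, registered, content, k):
    """Sum sim(item, i') over the top-k neighbours i' the person registered for."""
    return sum(sim for other, sim in top_k_neighbours(item, content, k)
               if other in registered)
```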
Figures 2 to 4 compare R@5, R@10 and R@15 on the software crowdsourcing platforms; Figures 5 to 7 compare P@5, P@10 and P@15. The data show that the four algorithms follow almost the same pattern on P and R: the single-source recommendation algorithm scores slightly higher than the ICBNN algorithm, although the gap is small; the multi-source recommendation algorithm scores clearly higher than the SCL algorithm; and the multi-source recommendation algorithm scores much higher than the single-source recommendation algorithm.
The test data can be analyzed as follows. Because the multi-source recommendation algorithm uses the Weight-SCL algorithm, which was proposed to handle the two classes of features, it performs considerably better than the baseline SCL algorithm. Compared with the single-source recommendation algorithm, the multi-source recommendation algorithm also transfers the data from the Zhubajie platform, which greatly expands the data set; as a result, the recommendation performance improves by 1.2X relative to the ICBNN method.
Those skilled in the art may locally adjust the above specific implementation in different ways without departing from the principle and purpose of the invention. The scope of protection of the invention is defined by the claims and is not limited by the above specific implementation; every implementation within that scope is bound by the invention.

Claims (9)

1. A software crowdsourcing project recommendation method based on transfer learning, characterized in that: first, the data of the software crowdsourcing platform for which recommendations are needed is collected as target domain data and the data of other software crowdsourcing platforms is collected as source domain data, the project data and developer data on the software crowdsourcing platforms being collected at the same time; then feature vectors of developers and projects are built to model the developers and the projects respectively, and these feature vectors form the respective feature data sets; next, based on the relative relationship between the target domain feature space and the source domain feature space, a new mapped feature space is established by feature mapping, and the feature vectors of the source domain and the target domain are mapped into the new feature space to align the features of the target domain and the source domain; finally, using the developer feature vectors and project feature vectors in the mapped feature space, the developer-project similarity is computed separately for the target domain and the source domain, an instance-transfer classification algorithm is applied to train the recommender system model, and the recommender system model is used to recommend projects.
2. The software crowdsourcing project recommendation method based on transfer learning according to claim 1, characterized in that the modeling specifically comprises the following steps:
1) building a BOW vector from the tags of each project in the target domain and the source domain; the BOW vectors of all source domain projects form the matrix TM_{s,t}, which represents the tag content of all projects on the source domain platform; the BOW vectors of all target domain projects form the matrix TM_{t,t}, which represents the tag content of all projects on the target domain platform;
2) computing the TF-IDF value of each word w in each project description using an existing TF-IDF method, and forming the description vector of the project from the TF-IDF values of all words; the description vectors of all source domain projects form the matrix KM_{s,t}, which represents the description content of all projects on the source domain platform; the description vectors of all target domain projects form the matrix KM_{t,t}, which represents the description content of all projects on the target domain platform;
3) computing the exposure Pop(j) of each project j;
4) building the tag vector matrix TM_{s,u} and the keyword vector matrix KM_{s,u} of all persons in the source domain, and building the tag vector matrix TM_{t,u} and the keyword vector matrix KM_{t,u} of all persons in the target domain.
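Steps 1) and 2) above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the claim only refers to "an existing TF-IDF method", so the particular variant used here (term frequency normalised by document length, idf = ln(N / df)) is an assumption, as are the function names and the pre-tokenised inputs.

```python
import math
from collections import Counter


def bow_matrix(tag_lists, vocabulary):
    """Build a TM-style matrix: one row per project, one column per tag."""
    rows = []
    for tags in tag_lists:
        counts = Counter(tags)
        rows.append([counts[tag] for tag in vocabulary])
    return rows


def tfidf_matrix(documents, vocabulary):
    """Build a KM-style matrix of TF-IDF values from tokenised descriptions."""
    n_docs = len(documents)
    # Document frequency: number of descriptions containing each word.
    df = {word: sum(1 for doc in documents if word in doc) for word in vocabulary}
    rows = []
    for doc in documents:
        counts = Counter(doc)
        row = []
        for word in vocabulary:
            tf = counts[word] / len(doc) if doc else 0.0
            idf = math.log(n_docs / df[word]) if df[word] else 0.0
            row.append(tf * idf)
        rows.append(row)
    return rows
```

The same two functions would be applied once to the source domain projects and once to the target domain projects, ideally over a shared vocabulary so that the resulting matrices have comparable columns.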
3. The software crowdsourcing project recommendation method based on transfer learning according to claim 2, characterized in that the exposure Pop(j) = α(t_s, t_c) * Pop_r(j) + β(t_s, t_c), wherein: Pop_r(j) = Sum(U_{:,j}); α(t_s, t_c) and β(t_s, t_c) are two time weighting factors; t_s is the release time of the project; and β(t_s, t_c) = ln(t_c − t_s + 1).
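The exposure formula can be sketched in Python as below. Several points are assumptions: U is taken to be a person-by-project relation matrix, t_c is taken to be the current time in the same unit as t_s, and, because the expression for α is not given in this text, `alpha` is passed in by the caller as a hypothetical function rather than guessed.

```python
import math


def raw_popularity(U, j):
    """Pop_r(j) = Sum(U[:, j]): column sum of the relation matrix U."""
    return sum(row[j] for row in U)


def beta(ts, tc):
    """beta(ts, tc) = ln(tc - ts + 1), as stated in the claim."""
    return math.log(tc - ts + 1)


def exposure(U, j, ts, tc, alpha):
    """Pop(j) = alpha(ts, tc) * Pop_r(j) + beta(ts, tc).

    `alpha` is a caller-supplied function (hypothetical placeholder).
    """
    return alpha(ts, tc) * raw_popularity(U, j) + beta(ts, tc)
```

For example, `exposure(U, 0, ts=0, tc=9, alpha=lambda ts, tc: 1.0)` adds ln(10) to the raw registration count of project 0.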
4. The software crowdsourcing project recommendation method and system based on transfer learning according to claim 3, characterized in that, in the tag vector matrix and the keyword vector matrix: λ_A, λ_B and λ_F are respectively the registration weight, the bid-winning weight and the completion weight, and A_u, B_u and F_u respectively denote the sets of projects that person u has registered for, won the bid for, and completed.
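The expressions for the two person matrices are not reproduced in this text. The sketch below therefore assumes one plausible form, in which a person's vector is the weighted sum of the vectors of the projects in A_u, B_u and F_u, weighted by λ_A, λ_B and λ_F respectively. This form, the function name and the argument names are all assumptions for illustration only.

```python
def person_vector(project_vectors, registered, won, completed,
                  lam_a, lam_b, lam_f):
    """Assumed form: lam_a * sum(A_u) + lam_b * sum(B_u) + lam_f * sum(F_u).

    `project_vectors` maps a project id to its tag (or keyword) vector.
    """
    dim = len(next(iter(project_vectors.values())))
    vector = [0.0] * dim
    weighted_sets = ((lam_a, registered), (lam_b, won), (lam_f, completed))
    for weight, projects in weighted_sets:
        for project in projects:
            for d, value in enumerate(project_vectors[project]):
                vector[d] += weight * value
    return vector
```

Stacking the vectors of all persons row by row would then give a TM_{s,u}-style or KM_{s,u}-style matrix, depending on whether tag or keyword vectors are passed in.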
5. The software crowdsourcing project recommendation method based on transfer learning according to claim 4, characterized in that establishing the new mapped feature space specifically comprises the following steps:
1) merging the important features and the unimportant features of the source domain and the target domain into an important feature set F'_{t,f} and an unimportant feature set F'_{k,f}; selecting the axis feature set P_t from the important feature set F'_{t,f} and the axis feature set P_k from the unimportant feature set F'_{k,f} to obtain the axis feature set P;
2) clustering with the k-means clustering algorithm to obtain m clusters, and taking the cluster centers as the final axis features;
3) computing the weighted feature set D := KM_{:,t} + TM_{:,t} * W, wherein TM_{:,t} is the TM_{s,t} or TM_{t,t} obtained during modeling, KM_{:,t} is the KM_{s,t} or KM_{t,t} obtained during modeling, and W is the weight;
4) computing the correlation between each axis feature p and every other word w by training linear classifiers, thereby performing the axis feature mapping and obtaining the weight matrix W = [w'_1] … [w'_m];
5) applying singular value decomposition to the weight matrix W = [w'_1] … [w'_m] to compute the correlation between axis features and non-axis features.
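For orientation, steps 4) and 5) resemble the construction used in structural correspondence learning, assuming that this is what "SCL" refers to here. The following is a sketch of that standard construction, not of the claimed method: the truncation rank h, the projection θ and the augmented representation are assumptions that the claim does not state.

```latex
W = [\, w'_1, \dots, w'_m \,] \in \mathbb{R}^{d \times m},
\qquad
W = U \Sigma V^{\top},
\qquad
\theta = \left( U_{[:,\,1:h]} \right)^{\top} \in \mathbb{R}^{h \times d},
\qquad
x \;\mapsto\; \begin{bmatrix} x \\ \theta x \end{bmatrix}
```

Each column w'_i holds the weights of the linear classifier trained to predict axis feature i from the remaining d features; the top h left singular vectors of W span the shared low-dimensional space into which both source domain and target domain feature vectors x are projected.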
6. The software crowdsourcing project recommendation method based on transfer learning according to claim 5, characterized in that the similarity is obtained by adding the tag vector matrix TM and the keyword vector matrix KM to obtain the unified vector DM, establishing the similarity matrix SM from it, and then constructing positive and negative examples.
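A minimal sketch of this step is given below. It assumes that TM and KM have the same shape so that they can be added element-wise, that the similarity measure is cosine similarity, and that a positive example is a person-project pair with a known registration relationship while every other pair is negative. None of these choices is stated in the claim, and the function names are illustrative.

```python
import math


def add_matrices(tm, km):
    """DM = TM + KM, element-wise (assumes equal shapes)."""
    return [[a + b for a, b in zip(row_t, row_k)]
            for row_t, row_k in zip(tm, km)]


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(dm_persons, dm_projects):
    """SM[u][j]: similarity between person u and project j."""
    return [[cosine(person, project) for project in dm_projects]
            for person in dm_persons]


def build_examples(sm, registered):
    """Return ((u, j), similarity, label) triples; label 1 marks a positive."""
    return [((u, j), sm[u][j], 1 if (u, j) in registered else 0)
            for u in range(len(sm)) for j in range(len(sm[u]))]
```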
7. The software crowdsourcing project recommendation method based on transfer learning according to claim 1, characterized in that the training is implemented by applying the TrAdaBoost algorithm.
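To illustrate what applying TrAdaBoost involves, the sketch below shows a single instance-weight update as the algorithm is commonly described: misclassified source domain instances are down-weighted by a fixed factor, and misclassified target domain instances are up-weighted according to the weighted error on the target domain. How the patent configures the algorithm (base learner, number of rounds, handling of edge cases) is not stated, so the function name, arguments and the error check are assumptions.

```python
import math


def tradaboost_update(src_w, tgt_w, src_err, tgt_err, n_rounds):
    """One TrAdaBoost weight update.

    src_err and tgt_err hold the per-instance loss |h(x) - y| in [0, 1]
    for the source domain and target domain training instances.
    """
    # Fixed factor for source instances; depends on their count and the rounds.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(src_w)) / n_rounds))
    # Weighted error of the current classifier on the target instances only.
    eps = sum(w * e for w, e in zip(tgt_w, tgt_err)) / sum(tgt_w)
    if not 0.0 < eps < 0.5:
        # This sketch simply rejects degenerate errors instead of clamping them.
        raise ValueError("target error must lie strictly between 0 and 0.5")
    beta_tgt = eps / (1.0 - eps)
    # Misclassified source instances lose weight; misclassified target ones gain.
    new_src = [w * beta_src ** e for w, e in zip(src_w, src_err)]
    new_tgt = [w * beta_tgt ** (-e) for w, e in zip(tgt_w, tgt_err)]
    return new_src, new_tgt
```

In a full training loop the combined weights would be normalised before each round and the base classifier retrained on the reweighted source and target instances.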
8. The software crowdsourcing project recommendation method based on transfer learning according to claim 1, characterized in that the recommender system model applies a logistic regression algorithm to make recommendations.
9. A system for implementing the method of any one of the preceding claims, characterized by comprising a collection module, a feature vector module, a mapping module and a recommendation module, wherein: the collection module collects target domain data and source domain data; the feature vector module receives the target domain data and the source domain data and builds the feature vectors of developers and projects; the mapping module establishes the feature mapping between the target domain and the source domain and uses it to build the mapped feature space; and the recommendation module separately computes the developer-project similarity for the target domain and the source domain, trains the recommender system model with an instance-transfer classification algorithm, and recommends projects.
CN201710959395.1A 2017-10-16 2017-10-16 Software crowdsourcing item recommendation method and system based on transfer learning Pending CN110020121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710959395.1A CN110020121A (en) 2017-10-16 2017-10-16 Software crowdsourcing item recommendation method and system based on transfer learning

Publications (1)

Publication Number Publication Date
CN110020121A true CN110020121A (en) 2019-07-16

Family

ID=67186635

Country Status (1)

Country Link
CN (1) CN110020121A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530428A (en) * 2013-11-04 2014-01-22 武汉大学 Same-occupation type recommendation method based on developer practical skill similarity
EP2860672A2 (en) * 2013-10-10 2015-04-15 Deutsche Telekom AG Scalable cross domain recommendation system
WO2015192655A1 (en) * 2014-06-20 2015-12-23 华为技术有限公司 Method and device for establishing and using user recommendation model in social network
EP2983123A1 (en) * 2014-07-17 2016-02-10 Deutsche Telekom AG Self transfer learning recommendation method and system
CN105447145A (en) * 2015-11-25 2016-03-30 天津大学 Item-based transfer learning recommendation method and recommendation apparatus thereof
CN106201465A (en) * 2016-06-23 2016-12-07 扬州大学 Software project personalized recommendation method towards open source community
CN106227767A (en) * 2016-07-15 2016-12-14 华侨大学 A kind of based on the adaptive collaborative filtering method of field dependency
US20170220951A1 (en) * 2016-02-02 2017-08-03 Xerox Corporation Adapting multiple source classifiers in a target domain

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
JIANGANG ZHU等: "A Learning to Rank Framework for Developer Recommendation in Software Crowdsourcing", 《2015 ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE (APSEC)》 *
NING LI等: "Task Recommendation with Developer Social Network in Software Crowdsourcing", 《2016 23RD ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE (APSEC)》 *
刘桂峰等: "一种改进的多源域多视角学习算法", 《青岛大学学报(自然科学版)》 *
柯良文等: "基于用户特征迁移的协同过滤推荐", 《计算机工程》 *
董爱美等: "基于迁移共享空间的分类新算法", 《计算机研究与发展》 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159542A (en) * 2019-12-12 2020-05-15 中国科学院深圳先进技术研究院 Cross-domain sequence recommendation method based on self-adaptive fine-tuning strategy
CN111324812A (en) * 2020-02-20 2020-06-23 深圳前海微众银行股份有限公司 Federal recommendation method, device, equipment and medium based on transfer learning
CN111932108A (en) * 2020-08-06 2020-11-13 北京航空航天大学杭州创新研究院 Developer recommendation method oriented to group software process
CN111932108B (en) * 2020-08-06 2022-07-19 北京航空航天大学杭州创新研究院 Developer recommendation method oriented to group software process
CN112396092A (en) * 2020-10-26 2021-02-23 北京航空航天大学 Crowdsourcing developer recommendation method and device
CN112396092B (en) * 2020-10-26 2023-09-29 北京航空航天大学 Crowdsourcing developer recommendation method and device
CN112417288A (en) * 2020-11-25 2021-02-26 南京大学 Task cross-domain recommendation method for crowdsourcing software testing
CN112417288B (en) * 2020-11-25 2024-04-12 南京大学 Task cross-domain recommendation method for crowdsourcing software test
CN112767009B (en) * 2020-12-31 2023-07-18 上海梦创双杨数据科技股份有限公司 Training project recommendation method based on WeChat public number training registration
CN112767009A (en) * 2020-12-31 2021-05-07 上海梦创双杨数据科技股份有限公司 Training item recommendation method based on WeChat public number training registration
CN113222073A (en) * 2021-06-09 2021-08-06 支付宝(杭州)信息技术有限公司 Method and device for training transfer learning model and recommendation model
CN113343087A (en) * 2021-06-09 2021-09-03 南京星云数字技术有限公司 Method and system for acquiring marketing user
US20230053820A1 (en) * 2021-08-19 2023-02-23 Red Hat, Inc. Generating a build process for building software in a target environment
US11995420B2 (en) * 2021-08-19 2024-05-28 Red Hat, Inc. Generating a build process for building software in a target environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190716

WD01 Invention patent application deemed withdrawn after publication