CN115186174A - Crowdsourcing task personalized recommendation method and system based on machine learning - Google Patents
- Publication number
- CN115186174A (application CN202210620698.1A)
- Authority
- CN
- China
- Prior art keywords
- task
- worker
- workers
- tasks
- motivation
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06311—Scheduling, planning or task assignment for a person or group
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Economics (AREA)
- Mathematical Physics (AREA)
- Entrepreneurship & Innovation (AREA)
- Computing Systems (AREA)
- Strategic Management (AREA)
- Evolutionary Computation (AREA)
- Operations Research (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Quality & Reliability (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Tourism & Hospitality (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Game Theory and Decision Science (AREA)
- Educational Administration (AREA)
- Development Economics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a crowdsourcing task personalized recommendation method and system based on machine learning, and relates to the technical field of task scheduling. Drawing on the existing literature on crowdsourcing contests and the information available on crowdsourcing platforms, the invention constructs a comprehensive, fine-grained worker feature recognition system based on the Fogg behavior model, which integrates motivation and ability theories. In addition, worker historical data and task data over a period of time are crawled and current worker-task interaction data are constructed, forming a worker data set that covers both the past and current information of each worker, so that worker characteristics are measured more accurately. This helps the platform quickly identify high-value workers, generate a recommendation list based on the probability of being selected, and ultimately achieve personalized recommendation of crowdsourcing tasks.
Description
Technical Field
The invention relates to the technical field of task scheduling, in particular to a crowdsourcing task personalized recommendation method and system based on machine learning.
Background
Crowdsourcing refers to the practice by which a company or organization outsources tasks previously performed by its employees to an unspecified public of volunteers on a free and voluntary basis. Crowdsourcing is a form of social production in which value is created jointly by users. It is a typical application scenario of collective-intelligence collaboration that can greatly improve task completion efficiency and fully leverage the collective effect of groups in society.
In traditional task recommendation methods, recommendation proceeds as follows: task data and worker data are first obtained from a crowdsourcing platform and a feature vector table of worker ability is output; a task recommendation list is then generated from the task bid records of workers with similar ability and the bid records of the target worker.
However, the existing recommendation algorithm has the following defects:
1. Worker characteristics are measured coarsely: motivation and ability are usually identified along a single dimension, or the determinants and predictors of these characteristics are not fully identified on the basis of relevant theory;
2. Traditional recommendation usually relies only on historical task and worker data, whereas the interaction characteristics between workers and tasks change over time; the worker feature data acquired are therefore limited, and worker characteristics cannot be measured effectively;
3. Recommendation research has progressed from feature modeling of single attributes, to joint modeling of user-task interaction, to mining implicit feedback information in features. Most existing methods still remain at single-attribute and joint-interaction modeling, and only a small proportion have begun to focus on implicit feedback information.
Disclosure of Invention
(I) Technical problem to be solved
In view of the shortcomings of the prior art, the invention provides a crowdsourcing task personalized recommendation method and system based on machine learning, which addresses the poor accuracy of existing methods in identifying worker characteristics.
(II) Technical solution
To achieve the above purpose, the invention adopts the following technical solution:
In a first aspect, a crowdsourcing task personalized recommendation method based on machine learning is provided, and the method includes:
acquiring worker historical feature data and task feature data, and acquiring worker-task interaction feature data based on the task feature data;
training a worker classification model that considers worker motivation and worker ability, based on the preprocessed worker historical feature data and worker-task interaction feature data;
based on the trained worker classification model, screening out workers with high motivation and high ability from the feature data set of the workers to be classified, and recommending tasks to those workers.
Further, acquiring worker historical feature data and task feature data, and acquiring worker-task interaction feature data based on the task feature data, includes:
S101, crawling all worker historical feature data and task feature data over a period of time from a crowdsourcing platform;
S102, preprocessing the task feature data and encoding it to obtain a task feature data set;
S103, splitting the task feature data set in chronological order at a ratio of 1:1 into a training set and a test set, and acquiring worker-task interaction feature data based on the training set;
S104, preprocessing the worker historical feature data and encoding it to obtain a worker historical feature data set;
S105, preprocessing and encoding the worker-task interaction feature data.
Further, preprocessing the task feature data includes:
retaining tasks whose number of bidders is greater than or equal to M1;
removing tasks that are incomplete or were not recruited through competition;
removing tasks missing key features such as the task winning reward, task release time, and task deadline;
de-duplicating repeated task feature data in the list;
removing outliers from the task list.
Preprocessing the worker historical feature data includes:
removing workers whose past number of elected suggestions is less than or equal to M2;
removing workers whose past total number of suggestions is less than or equal to M3.
Preprocessing the worker-task interaction feature data includes:
removing workers whose number of participated tasks is less than M4;
removing workers whose number of won tasks is less than M5;
imputing missing values.
Further, the training of the worker classification model considering worker motivation and worker ability based on the preprocessed worker historical feature data and the worker-task interaction feature data comprises:
S201, constructing the ability index and motivation index of workers;
S202, determining the weight of each variable of the worker ability index and of the worker motivation index to obtain the motivation value and ability value of each worker;
S203, obtaining a threshold line model that serves as the criterion for identifying high-motivation, high-ability workers.
Further, the worker ability index includes: inherent ability, professional ability, general experience, professional experience, diversity, and complexity;
the worker motivation index includes: enjoyment and fun, work autonomy, task complexity, and self-marketing/sense of belonging;
the variables of inherent ability include: platform rating, platform certification level, and past number of wins;
the variables of professional ability include: the number of certified categories, the number of certified subcategories, the number of certified skills, and the number of skills held for more than ten years;
the variables of general experience include: the past total number of suggestions, the number of recently participated tasks, the number of recently won tasks, and the hourly rate;
the variables of professional experience include: the number of certified industries, the number of recently participated tasks contained in the certified industries, and the number of recently won tasks contained in the certified industries;
the variables of diversity include: the number of distinct task categories recently participated in, the number of distinct task subcategories recently participated in, the number of distinct task categories recently won, and the number of distinct task subcategories recently won;
the variables of complexity include: the average number of bidders, the average number of suggestions, the average number of collections, and the average number of views of recently participated tasks;
the variables of enjoyment and fun include: the average task reward and the task approval rate;
the variables of work autonomy include: the average recruitment time and the task completion rate;
the variables of task complexity include: the average ratio of collections to bidders for participated tasks, the average ratio of collections to bidders for won tasks, the average ratio of views to collections for participated tasks, and the average ratio of views to collections for won tasks;
the variables of self-marketing/sense of belonging include: the positive review rate and the number of reviews.
Further, the determining the weight of each variable of the worker's ability index and the weight of each variable of the motivation index to obtain the motivation value and the ability value of each worker includes:
determining the weight of each variable based on a CRITIC objective weighting method;
and the calculation formulas of the motivation value and the capability value of the worker are as follows:
a_i denotes the ability value of the ith worker;
m_i denotes the motivation value of the ith worker;
x_ij denotes the value of the jth ability variable of the ith worker;
y_ik denotes the value of the kth motivation variable of the ith worker.
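The motivation and ability formulas themselves are not reproduced in this text. As a minimal sketch, assuming min-max normalized variables, the standard CRITIC weights (C_j = σ_j · Σ_k (1 − r_jk), w_j = C_j / Σ C) and a weighted-sum aggregation a_i = Σ_j w_j · x_ij (and likewise m_i = Σ_k w_k · y_ik), the computation could look like the following. The function names are hypothetical, and the weighted-sum form is an assumption rather than the patent's stated formula.

```python
from statistics import mean, pstdev


def _pearson(u, v):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv) if su and sv else 0.0


def critic_weights(columns):
    """CRITIC weights; columns[j] holds variable j for every worker (normalized)."""
    m = len(columns)
    info = []
    for j in range(m):
        sigma = pstdev(columns[j])  # contrast strength: standard deviation
        conflict = sum(1 - _pearson(columns[j], columns[k]) for k in range(m))
        info.append(sigma * conflict)  # information content C_j
    total = sum(info)
    return [c / total for c in info]


def weighted_scores(columns, weights):
    """Assumed aggregation (weighted sum): score_i = sum_j w_j * x_ij."""
    n = len(columns[0])
    return [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(n)]
```

Applied to the ability columns x_1 to x_22 this would yield a_i for each worker, and applied to y_1 to y_10 it would yield m_i. A constant column receives zero weight because its standard deviation is zero.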
Further, the threshold line model includes:
Motivation = m_i - M_f + β
Ability = a_i - A_f + α
where
M_f is the Kolmogorov mean of worker motivation;
A_f is the Kolmogorov mean of worker ability;
α is the adjustment parameter on the ability axis;
β is the adjustment parameter on the motivation axis.
A loss function, loss, is constructed; α and β are trained with five-fold cross-validation until the loss converges, and the averages of α and β over the folds are taken as the final result.
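The Kolmogorov mean is the quasi-arithmetic mean M_f = f^(-1)((1/n) Σ f(x_i)) for some generator function f. The text does not state which f is used, nor the exact form of the loss, so the sketch below only illustrates the two well-defined pieces: computing a Kolmogorov mean for a chosen generator (identity and log/exp are illustrative assumptions) and averaging the per-fold α and β values, which are taken as inputs from a hypothetical fitting step.

```python
import math
from statistics import mean


def kolmogorov_mean(xs, f, f_inv):
    """Kolmogorov (quasi-arithmetic) mean: f_inv(average of f(x))."""
    return f_inv(sum(f(x) for x in xs) / len(xs))


def average_fold_params(fold_params):
    """Average the (alpha, beta) pairs fitted on each cross-validation fold."""
    alphas, betas = zip(*fold_params)
    return mean(alphas), mean(betas)


# Illustrative generators (assumptions): identity gives the arithmetic mean,
# log/exp gives the geometric mean.
motivations = [1.0, 4.0, 16.0]
m_f_arith = kolmogorov_mean(motivations, lambda x: x, lambda x: x)
m_f_geo = kolmogorov_mean(motivations, math.log, math.exp)
```

The choice of generator changes how strongly extreme workers pull the reference point M_f or A_f; the geometric form, for example, is less sensitive to a few very large values.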
Further, screening out workers with high motivation and high ability from the feature data set of the workers to be classified, based on the trained worker classification model, includes: identifying a worker as high-motivation and high-ability when both the worker's motivation value and ability value lie above the threshold line.
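A minimal sketch of this screening rule follows. It assumes that "above the threshold line" means both threshold-model coordinates, Motivation = m_i - M_f + β and Ability = a_i - A_f + α, are positive; that reading, the function name, and the input layout are illustrative assumptions.

```python
def classify_high_high(scores, m_f, a_f, alpha, beta):
    """Return IDs of workers lying above the threshold line on both axes.

    scores maps a worker ID to (m_i, a_i). "Above the threshold line" is read
    here as Motivation > 0 and Ability > 0 (an assumed interpretation).
    """
    selected = []
    for worker_id, (m_i, a_i) in scores.items():
        motivation = m_i - m_f + beta  # Motivation = m_i - M_f + beta
        ability = a_i - a_f + alpha    # Ability = a_i - A_f + alpha
        if motivation > 0 and ability > 0:
            selected.append(worker_id)
    return selected
```

A positive β lowers the effective motivation threshold to M_f - β, so raising β admits more workers along the motivation axis; α plays the same role on the ability axis.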
In a second aspect, a system for personalized recommendation of crowdsourced tasks based on machine learning is provided, the system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when executing the computer program.
(III) Beneficial effects
The invention provides a crowdsourcing task personalized recommendation method and system based on machine learning. Compared with the prior art, the method has the following beneficial effects:
Drawing on the existing literature on crowdsourcing contests and the information available on crowdsourcing platforms, the invention constructs a comprehensive, fine-grained worker feature recognition system based on the Fogg behavior model, which integrates motivation and ability theories. In addition, worker historical data and task data over a period of time are crawled and current worker-task interaction data are constructed, forming a worker data set that covers both the past and current information of each worker, so that worker characteristics are measured more accurately. This helps the platform quickly identify high-value workers, generate a recommendation list based on the probability of being selected, and ultimately achieve personalized recommendation of crowdsourcing tasks.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
By providing a crowdsourcing task personalized recommendation method and system based on machine learning, the embodiments of the application address the poor accuracy of existing methods in identifying worker characteristics.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Example 1:
As shown in FIG. 1, the present invention provides a crowdsourcing task personalized recommendation method based on machine learning, which comprises:
S1, acquiring worker historical feature data and task feature data, and acquiring worker-task interaction feature data based on the task feature data;
S2, training a worker classification model that considers worker motivation and worker ability, based on the preprocessed worker historical feature data and worker-task interaction feature data;
S3, based on the trained worker classification model, screening out workers with high motivation and high ability from the feature data set of the workers to be classified, and recommending tasks to those workers.
The beneficial effects of this embodiment are as follows:
drawing on the existing literature on crowdsourcing contests and the information available on crowdsourcing platforms, the invention constructs a comprehensive, fine-grained worker feature identification system based on the Fogg behavior model, which integrates motivation and ability theories. In addition, worker historical data and task data over a period of time are crawled and current worker-task interaction data are constructed, forming a worker data set that covers both the past and current information of each worker, so that worker characteristics are measured more accurately. This helps the platform quickly identify high-value workers, generate a recommendation list based on the probability of being selected, and ultimately achieve personalized recommendation of crowdsourcing tasks.
The following describes the implementation process of the embodiment of the present invention in detail:
S1, acquiring worker historical feature data and task feature data, and acquiring worker-task interaction feature data based on the task feature data.
The method specifically comprises the following steps:
S101, crawling all worker historical feature data and task feature data over a period of time from a crowdsourcing platform.
In a specific implementation, a lightweight Python-based crawler framework can serve as the base. Web page data are extracted and parsed with XPath and CSS selector expressions; a Redis database serves as the distributed shared crawl queue; a MongoDB database serves as the data store; the Selenium automated testing tool is integrated; and middleware such as random User-Agent rotation, Alibaba Cloud proxy IPs, and a self-built proxy IP pool is used. The crawler is deployed to a cloud server and crawls the historical feature data of all workers in a given category, achieving large-scale, real-time, incremental crawling of worker historical feature data and task feature data on the crowdsourcing platform.
As shown in Table 1, the worker historical feature data include: worker ID, platform certification level, platform rating, positive review rate, task completion rate, task approval rate, hourly rate, certified categories, certified industries, certified skills, skill duration, past total number of suggestions (the total number of task solutions submitted in the past), and past total number of elected suggestions (the number of task solutions ranked first).
The task feature data include: task name, task recruitment status, task type, task category, task subcategory, task industry, task release time, task deadline, task winning reward (the monetary reward for the solution ranked first), task participation reward (depending on the task, and to attract more workers to bid, the reward may be divided into a winning reward and a participation reward: most of the reward goes to the first-ranked solution, and a smaller part goes to solutions other than the first), number of task suggestions (the number of solutions submitted by workers for the task), number of task bidders, number of task winners, number of task collections, and number of task views. In the present invention, "winning" denotes that a solution is selected.
TABLE 1
S102, after the crawled data are obtained, the task feature data are first preprocessed and encoded to obtain a task feature data set.
In a specific implementation, preprocessing mainly consists of data screening followed by data cleaning.
The data screening includes:
1) Retaining tasks whose number of bidders is greater than or equal to M1.
2) Removing tasks that are incomplete or whose recruitment type is not competition (this embodiment targets contest-type tasks only).
Project-type tasks: the reward is a range, the submission is a proposal, and a contract must subsequently be signed to complete the work.
Contest-type tasks: the reward is a fixed amount, and a submitted suggestion can win it.
Task-type tasks: a reward is paid upon completing the task (e.g., filling in a questionnaire).
3) Removing tasks missing key features such as the task winning reward, task release time, and task deadline.
The data cleaning includes:
1) De-duplicating repeated task feature data in the list;
2) Removing outliers from the task list.
M1 is a preset threshold value and can be set according to actual needs.
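The screening and cleaning rules of S102 can be sketched as a single filtering pass. The record fields, the use of the task ID as the de-duplication key, and the outlier rule (a deadline earlier than the release time, or a non-positive reward) are illustrative assumptions; the text does not define how outliers are detected.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Task:
    task_id: str
    recruit_type: str  # hypothetical field: "contest", "project", or "task"
    completed: bool
    num_bidders: int
    winning_reward: Optional[float]
    release_time: Optional[datetime]
    deadline: Optional[datetime]


def screen_and_clean(tasks, m1):
    """Apply the S102 screening and cleaning rules (field names are illustrative)."""
    seen = set()
    kept = []
    for t in tasks:
        # Screening 1): keep tasks with at least M1 bidders.
        if t.num_bidders < m1:
            continue
        # Screening 2): keep completed, contest-type tasks only.
        if not t.completed or t.recruit_type != "contest":
            continue
        # Screening 3): drop tasks missing key features.
        if t.winning_reward is None or t.release_time is None or t.deadline is None:
            continue
        # Cleaning 1): de-duplicate on task ID.
        if t.task_id in seen:
            continue
        # Cleaning 2): assumed outlier rule, deadline before release or non-positive reward.
        if t.deadline < t.release_time or t.winning_reward <= 0:
            continue
        seen.add(t.task_id)
        kept.append(t)
    return kept
```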
For data encoding, in a specific implementation, a Python runtime environment can first be set up, and an appropriate encoding method applied to each type of preprocessed data:
label encoding is used for categorical and ordinal data, so that each category is represented by a single number;
numeric data are encoded directly;
normalization is also required: because sample features differ in type and scale and their absolute values vary widely, features with small value ranges would otherwise be overlooked; normalization improves the model's convergence speed and accuracy.
Thus, a task feature data set can be obtained.
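A minimal sketch of the two encoding operations described above follows. The text does not name a normalization method; min-max scaling to [0, 1] is assumed here. The label encoder assigns codes in first-seen order, which differs from library encoders that sort categories first.

```python
def label_encode(values):
    """Map each distinct category to an integer code in first-seen order."""
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
    return [codes[v] for v in values], codes


def min_max_normalize(values):
    """Rescale numeric values to [0, 1] (assumed method); a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, a task category column such as ["design", "writing", "design"] becomes [0, 1, 0], and a reward column [10, 20, 30] becomes [0.0, 0.5, 1.0], so that rewards no longer dominate small-range features such as rates.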
S103, splitting the task feature data set in chronological order at a ratio of 1:1 into a training set and a test set, and deriving the worker-task interaction feature data from the task training set with Python's pandas and numpy packages.
As shown in Table 2, for each worker the worker-task interaction feature data include: names of participated tasks, number of participated tasks, number of participated task categories, number of participated task subcategories, number of participated task industries, names of won tasks, number of won tasks, number of won task categories, number of won task subcategories, number of won task industries, average reward of participated tasks, average working time of participated tasks, average reward of won tasks, and average working time of won tasks.
TABLE 2
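The chronological 1:1 split and the per-worker aggregation can be sketched as follows. The text uses pandas and numpy; this sketch uses only the standard library to keep it self-contained, and it computes only a subset of the Table 2 features. The dictionary keys ("release_time", "category", "reward", "bidders", "winners") are hypothetical field names.

```python
from collections import defaultdict
from datetime import datetime


def chronological_split(tasks, ratio=0.5):
    """Sort tasks by release time and cut at `ratio`; 0.5 gives the 1:1 split."""
    ordered = sorted(tasks, key=lambda t: t["release_time"])
    cut = int(len(ordered) * ratio)
    return ordered[:cut], ordered[cut:]


def interaction_features(train_tasks):
    """Aggregate per-worker participation and winning statistics."""
    stats = defaultdict(lambda: {"n_tasks": 0, "categories": set(), "reward_sum": 0.0,
                                 "n_won": 0, "won_reward_sum": 0.0})
    for t in train_tasks:
        for w in t["bidders"]:
            s = stats[w]
            s["n_tasks"] += 1
            s["categories"].add(t["category"])
            s["reward_sum"] += t["reward"]
        for w in t["winners"]:
            s = stats[w]
            s["n_won"] += 1
            s["won_reward_sum"] += t["reward"]
    return {
        w: {
            "n_tasks": s["n_tasks"],
            "n_categories": len(s["categories"]),
            "avg_reward": s["reward_sum"] / s["n_tasks"] if s["n_tasks"] else 0.0,
            "n_won": s["n_won"],
            "avg_won_reward": s["won_reward_sum"] / s["n_won"] if s["n_won"] else 0.0,
        }
        for w, s in stats.items()
    }


# Illustrative records (hypothetical data).
tasks = [
    {"release_time": datetime(2022, 1, 3), "category": "logo", "reward": 100.0,
     "bidders": ["u1", "u2"], "winners": ["u1"]},
    {"release_time": datetime(2022, 1, 1), "category": "web", "reward": 300.0,
     "bidders": ["u1"], "winners": []},
    {"release_time": datetime(2022, 1, 4), "category": "web", "reward": 50.0,
     "bidders": ["u3"], "winners": []},
    {"release_time": datetime(2022, 1, 2), "category": "logo", "reward": 200.0,
     "bidders": ["u2"], "winners": ["u2"]},
]
train, test = chronological_split(tasks)
features = interaction_features(train)
```

Splitting by release time rather than at random keeps later tasks out of the feature-building window, so the test half reflects how workers behave on tasks they had not yet seen.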
S104, preprocessing the crawled worker historical feature data and encoding it to obtain a worker historical feature data set.
In a specific implementation, preprocessing mainly consists of data screening followed by data cleaning.
The data screening includes
removing the following worker historical feature data:
1) Workers whose past number of elected suggestions is less than or equal to M2.
2) Workers whose past total number of suggestions is less than or equal to M3.
For encoding the worker historical feature data, in a specific implementation, a Python runtime environment can first be set up, and an appropriate encoding method applied to each type of preprocessed data:
label encoding is used for categorical and ordinal data, so that each category is represented by a single number;
numeric data are encoded directly;
normalization is also required: because sample features differ in type and scale and their absolute values vary widely, features with small value ranges would otherwise be overlooked; normalization improves the model's convergence speed and accuracy.
The worker historical feature data set is thus obtained. It can be divided as follows:
the worker historical feature data set is split into a training set and a test set at a ratio of 8:2, and the training set is further divided into training and validation folds using k-fold cross-validation.
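The 8:2 split followed by k-fold cross-validation can be sketched with index lists. Two details are assumptions not stated in the text: the 8:2 split is taken as a seeded random shuffle, and k = 5 is used (matching the five-fold setting mentioned for the threshold model).

```python
import random


def train_test_split_indices(n, test_ratio=0.2, seed=0):
    """Shuffle indices 0..n-1 with a fixed seed and hold out test_ratio of them."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_ratio))
    return idx[n_test:], idx[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists; fold sizes differ by at most one."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield train, val
```

Each training index appears in exactly one validation fold, so every worker in the 80% training portion is used for validation once.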
S105, preprocessing and encoding the worker-task interaction feature data.
In a specific implementation, preprocessing mainly consists of data screening followed by data cleaning.
The data screening includes
removing the following worker-task interaction feature data:
1) Workers whose number of participated tasks is less than M4;
2) Workers whose number of won tasks is less than M5.
M2 to M5 are preset thresholds and can be set according to actual needs.
The data cleaning includes:
imputing missing values. This embodiment does not limit the imputation method. For example, the similarity between workers can be computed with the Jaccard similarity on the worker-task interaction feature data: for any two workers A and B, the Jaccard similarity is the number of elements in the intersection of A and B divided by the number of elements in their union, |A ∩ B| / |A ∪ B|. The missing value is then filled with the similarity-weighted average of the corresponding values of the other workers.
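A minimal sketch of the Jaccard-weighted imputation follows, assuming each worker is represented by the set of task IDs they participated in and that workers with zero similarity contribute nothing. The function names and the set representation are illustrative assumptions.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; two empty sets are treated here as similarity 0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def impute(target, task_sets, values):
    """Fill values[target] with the Jaccard-weighted mean over workers that have a value.

    task_sets maps worker ID -> set of participated task IDs; values maps
    worker ID -> feature value or None. Returns None if no similar worker exists.
    """
    num = den = 0.0
    for other, val in values.items():
        if other == target or val is None:
            continue
        w = jaccard(task_sets[target], task_sets[other])
        num += w * val
        den += w
    return num / den if den > 0 else None
```

For example, with task sets A = {1, 2, 3}, B = {2, 3, 4}, and C = {5}, worker A is half similar to B and not at all similar to C, so A's missing value is taken entirely from B.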
For the encoding operation, the same encoding method of the worker history feature data and the task feature data may be adopted.
S2, training a worker classification model that considers worker motivation and worker ability, based on the preprocessed worker historical feature data and worker-task interaction feature data.
Specifically, this includes the following steps:
S201, constructing the ability index and motivation index of workers.
For worker ability, worker features and worker-task interaction features are organized under six indicators (inherent ability, professional ability, general experience, professional experience, diversity, and complexity) to obtain the worker ability index.
Inherent ability is determined by variables x_1 to x_3;
professional ability is determined by variables x_4 to x_7;
general experience is determined by variables x_8 to x_11;
professional experience is determined by variables x_12 to x_14;
diversity is determined by variables x_15 to x_18;
complexity is determined by variables x_19 to x_22.
For worker motivation, the motivation index is extracted along four dimensions: enjoyment and fun, work autonomy, task complexity, and self-marketing/sense of belonging.
Here, enjoyment and fun is determined by variables $y_1$ to $y_2$;
work autonomy is determined by variables $y_3$ to $y_4$;
task complexity is determined by variables $y_5$ to $y_8$;
self-marketing/sense of belonging is determined by variables $y_9$ to $y_{10}$.
The worker motivation and ability indices and their variables are shown in Table 3:
TABLE 3
S202, determining the weight of each variable of the worker's ability indices and of the motivation indices, to obtain the motivation value and the ability value of each worker.
In a specific implementation, the objective weight of each variable can be determined with the CRITIC objective weighting method.
CRITIC objective weighting method: the objective weight of a variable is measured jointly from the contrast intensity of each evaluation variable and the conflict between variables. The method considers both the variability of each variable and the correlation between variables. The larger a variable's information content, the more important the variable. In this way the method makes full use of the objective properties of the data for evaluation.
Contrast intensity is the spread of a variable's values across the evaluated objects (here, workers). It is expressed by the standard deviation: the larger the standard deviation, the higher the weight. In this embodiment:
$x_{ij}$ denotes the value of the $j$-th ability variable of the $i$-th worker;
$y_{ik}$ denotes the value of the $k$-th motivation variable of the $i$-th worker;
Conflict between variables is expressed by the correlation coefficient. The stronger the positive correlation between two variables, the smaller their conflict and the lower their weight. In this embodiment:
$r_{ij}$ denotes the correlation coefficient between the $i$-th and $j$-th ability variables, and the conflict is expressed as $(1-r_{ij})$;
The parameters are calculated as follows:
The information content of a variable is the product of its contrast intensity and its conflict:
Therefore, the objective weight of each variable is calculated as follows:
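The formula images for these three steps are not reproduced in this text. The sketch below assumes the textbook CRITIC formulation:
- contrast intensity is the population standard deviation $\sigma_j$ of each variable;
- conflict is $R_j = \sum_k (1 - r_{jk})$, using the Pearson correlation;
- information content is $C_j = \sigma_j R_j$;
- the weight is $W_j = C_j / \sum_j C_j$.

It also assumes the input columns have already been normalized. The helper names are illustrative.

```python
import math
from typing import List, Sequence

def _mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)

def _std(v: Sequence[float]) -> float:
    """Population standard deviation (contrast intensity)."""
    m = _mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

def _corr(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation; 0.0 when either column is constant."""
    mu, mv = _mean(u), _mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return cov / den if den else 0.0

def critic_weights(matrix: Sequence[Sequence[float]]) -> List[float]:
    """matrix[i][j]: value of variable j for worker i (assumed already normalized)."""
    cols = [list(col) for col in zip(*matrix)]
    sigma = [_std(c) for c in cols]                                    # contrast intensity
    conflict = [sum(1 - _corr(cj, ck) for ck in cols) for cj in cols]  # sum of (1 - r_jk)
    info = [s * r for s, r in zip(sigma, conflict)]                    # information content C_j
    total = sum(info)
    return [c / total for c in info]                                   # weights sum to 1
```

The function raises `ZeroDivisionError` only in the degenerate case where every variable is constant or all variables are perfectly correlated.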
The motivation value and the ability value of each worker are calculated as follows:
$a_i$ denotes the actual ability value of the $i$-th worker;
$m_i$ denotes the actual motivation value of the $i$-th worker;
$p$ denotes the number of ability variables;
$q$ denotes the number of motivation variables.
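The formula image is not reproduced here. Given the definitions of $x_{ij}$, $y_{ik}$, $p$ and $q$, one plausible reconstruction treats each value as the CRITIC-weighted sum of the worker's variables. This is an assumption, and the weight symbols $w_j$ (ability variables) and $w'_k$ (motivation variables) are introduced here only for illustration:

$$a_i = \sum_{j=1}^{p} w_j\,x_{ij}, \qquad m_i = \sum_{k=1}^{q} w'_k\,y_{ik}$$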
S203, constructing a threshold line model that serves as the criterion for identifying high-motivation, high-ability workers.
The threshold line is a theoretical boundary. Above it, a worker's motivation and ability are high enough that a single signal-type trigger suffices to bring about the behavior change. The correct position of this boundary is inherently tied to the worker data set.
Based on the conceptual description of the Fogg Behavior Model, the fixed threshold line is defined mathematically as follows:
The threshold line is then repositioned to the central tendency of the data set, measured by the Kolmogorov mean. A threshold-line displacement criterion is also applied to better place triggers and identify high-motivation, high-ability workers.
where $M_f$ is the Kolmogorov mean of worker motivation and $A_f$ is the Kolmogorov mean of worker ability. They are calculated as follows:
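The formula itself is not reproduced in this text. In general, the Kolmogorov (quasi-arithmetic) mean of $x_1,\dots,x_n$ under a continuous, strictly monotonic function $f$ is $f^{-1}\big(\tfrac{1}{n}\sum_i f(x_i)\big)$. The text does not state which $f$ the method uses, so the sketch below takes $f$ and its inverse as parameters. The sample motivation values are made up.

```python
import math
from typing import Callable, Sequence

def kolmogorov_mean(values: Sequence[float],
                    f: Callable[[float], float],
                    f_inv: Callable[[float], float]) -> float:
    """Quasi-arithmetic mean f_inv(mean(f(x))); f must be continuous and strictly monotonic."""
    return f_inv(sum(f(x) for x in values) / len(values))

motivation = [0.2, 0.8, 0.5, 0.5]                                   # hypothetical m_i values
m_arith = kolmogorov_mean(motivation, lambda x: x, lambda x: x)     # identity f: arithmetic mean
m_geo = kolmogorov_mean(motivation, math.log, math.exp)             # log f: geometric mean
```

Choosing $f$ changes how strongly extreme workers pull the threshold line. For example, a logarithmic $f$ damps the influence of a few very high values.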
Taking the application domain into account, an adjustment parameter $\beta$ is added to motivation and an adjustment parameter $\alpha$ is added to ability, to shift the threshold line. A loss function lost is constructed, and $\alpha$ and $\beta$ are trained with five-fold cross-validation until lost converges. The averages of $\alpha$ and $\beta$ over the folds are taken as the training result. The specific formulas are as follows:
$$\text{Motivation} = m_i - M_f + \beta$$
$$\text{Ability} = a_i - A_f + \alpha$$
where
Motivation denotes the motivation value with respect to the threshold line;
Ability denotes the ability value with respect to the threshold line;
$m_p$ denotes the predicted motivation value of the worker;
$m_i$ denotes the actual motivation value of the $i$-th worker;
$\alpha$ denotes the adjustment parameter along the ability axis;
$\beta$ denotes the adjustment parameter along the motivation axis;
lost is the constructed loss function.
Finally, the values of alpha and beta are obtained.
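The loss function lost is not reproduced in this text, so the sketch below only illustrates the fold-averaging procedure: fit $(\alpha, \beta)$ on each training split, score on the held-out fold, and average the fitted parameters.

Three parts of it are stand-ins:
- the loss is supplied by the caller;
- the optimizer is plain gradient descent with central-difference gradients, not necessarily the method's own training procedure;
- the `toy_loss` in the test is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# lost(alpha, beta, fold_data) -> float; the actual loss is supplied by the caller.
Loss = Callable[[float, float, Sequence], float]

def fit_fold(loss: Loss, data: Sequence, lr: float = 0.1,
             steps: int = 500, eps: float = 1e-6) -> Tuple[float, float]:
    """Minimise loss over (alpha, beta) by gradient descent with central-difference gradients."""
    alpha = beta = 0.0
    for _ in range(steps):
        g_a = (loss(alpha + eps, beta, data) - loss(alpha - eps, beta, data)) / (2 * eps)
        g_b = (loss(alpha, beta + eps, data) - loss(alpha, beta - eps, data)) / (2 * eps)
        alpha, beta = alpha - lr * g_a, beta - lr * g_b
    return alpha, beta

def train_alpha_beta(loss: Loss, records: Sequence, k: int = 5):
    """k-fold CV: fit on k-1 folds, score on the held-out fold, average alpha and beta."""
    folds = [list(records[i::k]) for i in range(k)]
    fits: List[Tuple[float, float]] = []
    val_losses: List[float] = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        a, b = fit_fold(loss, train)
        fits.append((a, b))
        val_losses.append(loss(a, b, folds[i]))  # held-out loss for this fold
    alpha = sum(a for a, _ in fits) / k
    beta = sum(b for _, b in fits) / k
    return alpha, beta, val_losses
```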
The Fogg Behavior Model (FBM) is an empirical behavioral model in which a behavior occurs only when three elements converge at the same moment: motivation, ability, and a trigger. When a behavior does not occur, at least one of the three elements is missing. In the model, the vertical axis represents the level of motivation and the horizontal axis represents the level of ability. A threshold line serves as the guideline for behavior change.
S3, based on the trained worker classification model, screening out high-motivation, high-ability workers from the feature data set of the workers to be classified, and recommending tasks to them.
In a specific implementation, the high-motivation, high-ability workers in the test set are identified with the trained classification model as follows:
For example, consider the $i$-th worker, with actual motivation value $m_i$ and actual ability value $a_i$. If both of the following conditions hold:
the worker is identified as a high-motivation, high-ability worker.
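The two conditions are not reproduced in this text. Claim 8 states that a worker is selected when both the motivation value and the ability value lie above the threshold line. The minimal sketch below assumes this means both shifted coordinates, $\text{Motivation} = m_i - M_f + \beta$ and $\text{Ability} = a_i - A_f + \alpha$, are positive. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

def is_high_motivation_high_ability(m_i: float, a_i: float, M_f: float, A_f: float,
                                    alpha: float, beta: float) -> bool:
    """Assumed rule: both shifted coordinates lie above the (shifted) threshold line."""
    motivation = m_i - M_f + beta
    ability = a_i - A_f + alpha
    return motivation > 0 and ability > 0

def select_workers(workers: Dict[str, Tuple[float, float]], M_f: float, A_f: float,
                   alpha: float, beta: float) -> List[str]:
    """workers: worker_id -> (m_i, a_i); returns ids classified as high motivation and ability."""
    return [wid for wid, (m, a) in workers.items()
            if is_high_motivation_high_ability(m, a, M_f, A_f, alpha, beta)]
```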
After identification, the bid-winning probability $P_{ij}$ of the $j$-th high-motivation, high-ability worker participating in the $i$-th task is calculated. Tasks are then ranked for each worker according to this probability and to the number of winners each task allows, producing the final recommendation list.
where
$P_{ij}$ denotes the bid-winning probability of the $j$-th worker participating in the $i$-th task;
$a_{ij}$ denotes the ability value of the $j$-th worker in the $i$-th task;
$a_{ik}$ denotes the ability value of the $k$-th worker in the $i$-th task;
$W_i$ denotes the number of workers participating in the $i$-th task.
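The formula image is not reproduced here. Given these definitions, one natural form is a share of the task's total ability, $P_{ij} = a_{ij} / \sum_{k=1}^{W_i} a_{ik}$, which the sketch below assumes. The ranking rule is also an assumption: a task is recommended to a worker only if that worker ranks within the task's allowed number of winners, and each worker's list is sorted by descending probability. Function and argument names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

def win_probabilities(ability: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """ability[task][worker] = a_ij; returns P[task][worker] = a_ij / sum_k a_ik (assumed form)."""
    probs = {}
    for task, scores in ability.items():
        total = sum(scores.values())
        probs[task] = {w: (a / total if total else 0.0) for w, a in scores.items()}
    return probs

def recommend(ability: Dict[str, Dict[str, float]],
              winners_allowed: Dict[str, int]) -> Dict[str, List[str]]:
    """Per worker, tasks where the worker is within the allowed winners, best odds first."""
    probs = win_probabilities(ability)
    lists = defaultdict(list)
    for task, p in probs.items():
        ranked = sorted(p, key=p.get, reverse=True)  # workers by descending P_ij
        for w in ranked[:winners_allowed[task]]:
            lists[w].append((p[w], task))
    return {w: [t for _, t in sorted(items, reverse=True)] for w, items in lists.items()}
```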
Example 2
The invention also provides a crowd-sourced task personalized recommendation system based on machine learning, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the method when executing the computer program.
It can be understood that, the crowdsourcing task personalized recommendation system based on machine learning provided by the embodiment of the present invention corresponds to the crowdsourcing task personalized recommendation method based on machine learning, and relevant content explanations, examples, beneficial effects and the like of the system can refer to corresponding content in the crowdsourcing task personalized recommendation method based on machine learning, and details are not repeated here.
In summary, compared with the prior art, the invention has the following beneficial effects:
(1) Drawing on the existing literature on crowdsourcing contests and on the information available on crowdsourcing platforms, the invention builds a complete, fine-grained worker feature identification system. This system is based on the Fogg Behavior Model, which integrates motivation and ability theories. Historical worker data and task data over a period of time are crawled, and the current worker-task interaction data are constructed. Together they form a worker data set covering both past and current information about each worker, so that worker characteristics are measured more accurately. This helps the platform quickly identify high-value workers and generate a recommendation list based on the bid-winning probability, ultimately achieving personalized recommendation of crowdsourcing tasks.
It should be noted that, through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions in essence or part contributing to the prior art may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments. In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
The above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (9)
1. A crowd-sourced task personalized recommendation method based on machine learning is characterized by comprising the following steps:
acquiring historical characteristic data of workers and task characteristic data; acquiring worker-task interaction feature data based on the task feature data;
and training a worker classification model considering worker motivation and worker capability based on the preprocessed worker historical characteristic data and the worker-task interaction characteristic data.
Based on the trained worker classification model, workers with high motivation and high capacity are screened out from the characteristic data set of the workers to be classified, and task recommendation is carried out on the workers.
2. The crowd-sourced task personalized recommendation method based on machine learning as claimed in claim 1, wherein acquiring the worker historical feature data and task feature data, and acquiring worker-task interaction feature data based on the task feature data, comprises:
S101, crawling all historical worker feature data and task feature data over a period of time from a crowdsourcing platform;
S102, preprocessing and encoding the task feature data to obtain a task feature data set;
S103, dividing the task feature data set chronologically at a ratio of 1:1 into a training set and a test set, and acquiring worker-task interaction feature data based on the training set;
S104, preprocessing and encoding the worker historical feature data to obtain a worker historical feature data set;
and S105, preprocessing the worker-task interaction characteristic data and coding the data.
3. The method for personalized recommendation of crowdsourced tasks based on machine learning according to claim 2, wherein the preprocessing of task feature data comprises:
selecting task feature data of tasks whose number of bidders is greater than or equal to M1;
removing tasks that were not completed or were not recruited through competitive bidding;
removing tasks missing key features such as the task reward, task release time, and task deadline;
deduplicating repeated task feature records in the list;
removing outliers from the task list;
the preprocessing of the historical characteristic data of the workers comprises the following steps:
removing workers whose number of past selected proposals is less than or equal to M2;
removing workers whose number of past recommendations is less than or equal to M3;
the preprocessing of the worker-task interaction feature data comprises the following steps:
eliminating workers with the number of the participating tasks less than M4;
removing workers with the number of winning bid tasks less than M5;
and completing the missing value.
4. The method for personalized recommendation of crowdsourced tasks based on machine learning according to claim 1, wherein training of a worker classification model considering worker motivation and worker ability based on the preprocessed worker historical feature data and worker-task interaction feature data comprises:
S201, constructing an ability index and a motivation index of a worker;
S202, determining the weight of each variable of the worker's ability index and of the motivation index to obtain the motivation value and the ability value of each worker;
and S203, acquiring a threshold line model for being used as a division standard of high-motivation and high-capacity workers.
5. The method for personalized recommendation of crowdsourced tasks based on machine learning of claim 4, wherein the capability index of the worker comprises: intrinsic ability, professional ability, general experience, professional experience, diversity, complexity;
the motivation indices of the worker include: enjoyment and fun, work autonomy, task complexity, and self-marketing/sense of belonging;
and the variables of intrinsic capability include: platform scoring, platform authentication level and past bid-winning times;
the variables of professional ability include: the number of certified categories, the number of certified subcategories, the number of certified skills, and the number of skills with more than ten years of experience;
the variables of general experience include: the number of past task proposals, the number of recently participated tasks, the number of recently won tasks, and the hourly rate;
the variables of professional experience include: the number of certified industries, the number of recently participated tasks covered by those industries, and the number of recently won tasks covered by those industries;
the variables of the diversity include: the number of recent participation in different task categories, the number of recent participation in different task sub-categories, the number of recent bid-winning different task categories, and the number of recent bid-winning different task sub-categories;
the variables of complexity include: average number of bidders participating in the task recently, average suggested number of tasks participating recently, average number of collections participating in the task recently, and average number of browsing times of tasks participating recently;
the variables of enjoyment and fun include: the average task reward and the task approval rate;
the variables of work autonomy include: average recruitment time, task completion rate;
the variables of task complexity include: the ratio of the average collection number of participants to the number of bidders of the tasks, the ratio of the average collection number of the winning tasks to the number of bidders of the tasks, the ratio of the average browsing number of participants to the collection number of the tasks, and the ratio of the average browsing number of the winning tasks to the collection number of the tasks;
the variables of self-marketing/sense of attribution include: favorable comment rate and number of comments.
6. The method for personalized recommendation of crowdsourcing tasks based on machine learning according to claim 4, wherein the determining weights of the variables of the ability indicators and the weights of the variables of the motivation indicators of the workers to obtain the motivation value and the ability value of each worker comprises:
determining the weight of each variable based on a CRITIC objective weighting method;
and the calculation formulas of the motivation value and the capability value of the worker are as follows:
$a_i$ denotes the ability value of the $i$-th worker;
$m_i$ denotes the motivation value of the $i$-th worker;
$x_{ij}$ denotes the value of the $j$-th ability variable of the $i$-th worker;
$y_{ik}$ denotes the value of the $k$-th motivation variable of the $i$-th worker.
7. The method for personalized recommendation of crowdsourcing tasks based on machine learning according to claim 4, wherein the threshold line model comprises:
$$\text{Motivation} = m_i - M_f + \beta$$
$$\text{Ability} = a_i - A_f + \alpha$$
wherein,
$a_i$ denotes the ability value of the $i$-th worker;
$m_i$ denotes the motivation value of the $i$-th worker;
$M_f$ denotes the Kolmogorov mean of worker motivation;
$A_f$ denotes the Kolmogorov mean of worker ability;
$\alpha$ denotes the adjustment parameter along the ability axis;
$\beta$ denotes the adjustment parameter along the motivation axis;
and a loss function lost is constructed; $\alpha$ and $\beta$ are trained with five-fold cross-validation until lost converges, and the averages of $\alpha$ and $\beta$ are taken as the final result.
8. The crowd-sourced task personalized recommendation method based on machine learning according to claim 7, wherein screening out high-motivation, high-ability workers from the feature data set of workers to be classified based on the trained worker classification model comprises: identifying a worker to be classified as a high-motivation, high-ability worker when both the worker's motivation value and ability value lie above the threshold line.
9. A system for personalized recommendation of crowdsourced tasks based on machine learning, the system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210620698.1A CN115186174A (en) | 2022-06-02 | 2022-06-02 | Crowdsourcing task personalized recommendation method and system based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210620698.1A CN115186174A (en) | 2022-06-02 | 2022-06-02 | Crowdsourcing task personalized recommendation method and system based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115186174A (en) | 2022-10-14 |
Family
ID=83513470
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210620698.1A Pending CN115186174A (en) | 2022-06-02 | 2022-06-02 | Crowdsourcing task personalized recommendation method and system based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115186174A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116307294A (en) * | 2023-05-22 | 2023-06-23 | 合肥城市云数据中心股份有限公司 | LBS space crowdsourcing task allocation method based on differential privacy and firefly improvement |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Cheung et al. | A multi-perspective knowledge-based system for customer service management | |
Srivastava et al. | Intelligent employee retention system for attrition rate analysis and churn prediction: An ensemble machine learning and multi-criteria decision-making approach | |
Jordan et al. | Empirical game-theoretic analysis of the TAC supply chain game | |
CN108804319A (en) | A kind of recommendation method for improving Top-k crowdsourcing test platform tasks | |
Arinze | Selecting appropriate forecasting models using rule induction | |
Nassar et al. | Fuzzy clustering validity for contractor performance evaluation: Application to UAE contractors | |
CN112686693A (en) | Method, system, equipment and storage medium for predicting marginal electricity price of electric power spot market | |
CN114357284B (en) | Crowd-sourced task personalized recommendation method and system based on deep learning | |
CN115619571A (en) | Financing planning method, system and device | |
CN115186174A (en) | Crowdsourcing task personalized recommendation method and system based on machine learning | |
CN113672797A (en) | Content recommendation method and device | |
Widjaja et al. | Application of ROC Criteria Prioritization Technique in Employee Performance Appraisal Evaluation | |
CN117807452A (en) | Ordering method, device, equipment and storage medium based on target matching | |
CN114548494A (en) | Visual cost data prediction intelligent analysis system | |
Perng et al. | A service quality improvement dynamic decision support system for refurbishment contractors | |
Pan et al. | Cognitive stress and learning economic order quantity inventory management: An experimental investigation | |
Nadeem et al. | Probec: A Product hunting tool | |
Overby et al. | How reduced search costs and the distribution of bidder participation affect auction prices | |
Ma et al. | A Deep Choice Model for Hiring Outcome Prediction in Online Labor Markets | |
Kar | An approach for prioritizing supplier selection criteria through consensus building using Analytic Hierarchy Process and Fuzzy set theory | |
CN113570455A (en) | Stock recommendation method and device, computer equipment and storage medium | |
Wang et al. | Predicting project success using ANN-ensemble classificaiton models | |
Wang et al. | The application of deep learning algorithm in marketing intelligence | |
Garavaglia | Methodological issues and models in'History Friendly'Simulations | |
Paul et al. | Selection of the most optimal contractor in Indian Construction Industry using TOPSIS and Extended TOPSIS model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||