CN115186174A

CN115186174A - Crowdsourcing task personalized recommendation method and system based on machine learning

Info

Publication number: CN115186174A
Application number: CN202210620698.1A
Authority: CN
Inventors: 彭张林; 杨威; 张强; 陆效农; 万德全; 陈媛媛; 胡欣如
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2022-06-02
Filing date: 2022-06-02
Publication date: 2022-10-14

Abstract

The invention provides a crowdsourcing task personalized recommendation method and system based on machine learning, and relates to the technical field of task scheduling. Drawing on the existing literature on crowdsourcing contests and on the information available on crowdsourcing platforms, the invention constructs a complete and fine-grained worker feature identification system based on the Fogg Behavior Model, which integrates motivation and ability theories. In addition, the historical data of workers and the task data over a period of time are crawled, and current worker-task interaction data is constructed from them, forming a worker data set that covers both the past and the current information of the workers, so that worker characteristics are measured more accurately. This helps the platform quickly identify high-value workers, generate a recommendation list based on the bid-winning probability, and finally achieve personalized recommendation of crowdsourcing tasks.

Description

Crowdsourcing task personalized recommendation method and system based on machine learning

Technical Field

The invention relates to the technical field of task scheduling, in particular to a crowdsourcing task personalized recommendation method and system based on machine learning.

Background

Crowdsourcing refers to the practice in which a company or organization outsources work previously performed by its employees to an unspecified group of people on a free and voluntary basis. Crowdsourcing is a form of social production in which value is created jointly by users. It is a typical application scenario of swarm-intelligence collaboration, can greatly improve task completion efficiency, and makes full use of the collective effect of the group.

A traditional task recommendation method works as follows: task data and worker data are first obtained from the crowdsourcing platform and a feature vector table of worker ability is output; a task recommendation list is then generated from the bidding records of workers with similar ability and the bidding records of the target worker.

However, existing recommendation algorithms have the following shortcomings:

1. The criteria for measuring worker characteristics are coarse: motivation and ability are usually identified along a single dimension, or the determinants and relevant predictors of these characteristics are not fully identified on the basis of relevant theory;

2. Traditional recommendation is often based only on the historical data of tasks and workers, whereas the interaction characteristics between workers and tasks change over time; the worker feature data that can be acquired is therefore limited, and worker characteristics cannot be measured effectively;

3. Recommendation research has progressed from feature modeling of single attributes, to joint interaction between users and tasks, and on to capturing implicit feedback information in the features. Most existing work still remains at the modeling of single attributes and joint interactions, and only a small fraction has begun to pay attention to implicit feedback information.

Disclosure of Invention

(I) technical problem to be solved

Aiming at the defects of the prior art, the invention provides a crowdsourcing task personalized recommendation method and system based on machine learning, and solves the problem that the accuracy of the existing method for worker feature recognition is poor.

(II) technical scheme

In order to realize the purpose, the invention is realized by the following technical scheme:

in a first aspect, a crowd-sourced task personalized recommendation method based on machine learning is provided, and the method includes:

acquiring historical characteristic data of workers and task characteristic data; acquiring worker-task interaction characteristic data based on the task characteristic data;

and training a worker classification model considering worker motivation and worker capability based on the preprocessed worker historical characteristic data and the worker-task interaction characteristic data.

Based on the trained worker classification model, workers with high motivation and high capacity are screened out from the characteristic data set of the workers to be classified, and task recommendation is carried out on the workers.

Further, acquiring historical worker feature data and task feature data; and acquiring worker-task interaction feature data based on the task feature data, including:

s101, crawling all worker historical feature data and task feature data within a period of time from a crowdsourcing platform;

s102, preprocessing the task characteristic data, and coding the task characteristic data to obtain a task characteristic data set;

s103, dividing the task feature data set, in chronological order and at a ratio of 1:1, into a training set and a test set, and acquiring worker-task interaction feature data based on the training set;

s104, preprocessing the historical characteristic data of the workers, and coding to obtain a historical characteristic data set of the workers;

and S105, preprocessing the worker-task interaction characteristic data and coding.

Further, the preprocessing the task feature data includes:

retaining task feature data for tasks whose number of bidders is greater than or equal to M1;

removing tasks whose recruitment is not completed and tasks that are not contest-type tasks;

removing tasks with missing key features such as the task winning reward, task release time and task deadline;

duplicate removal is carried out on repeated task characteristic data in the list;

removing abnormal values in the task list;

the preprocessing of the historical characteristic data of the workers comprises the following steps:

removing workers whose past number of winning proposals is less than or equal to M2;

removing workers whose past number of submitted proposals is less than or equal to M3;

the preprocessing of the worker-task interaction feature data comprises the following steps:

removing workers whose number of participated tasks is less than M4;

removing workers whose number of winning tasks is less than M5;

and filling in missing values.

Further, the training of the worker classification model considering worker motivation and worker ability based on the preprocessed worker historical feature data and the worker-task interaction feature data comprises:

s201, constructing a capability index and a motivation index of a worker;

s202, determining the weight of each variable of the capability index of the worker and the weight of each variable of the motivation index to obtain the motivation value and the capability value of each worker;

and S203, acquiring a threshold line model used as a division standard of high-motivation and high-capacity workers.

Further, the ability indexes of the worker include: inherent ability, professional ability, general experience, professional experience, diversity and complexity;

the motivation indexes of the worker include: fun and enjoyment, work autonomy, task complexity, and self-marketing/sense of belonging;

the variables of inherent ability include: platform rating, platform certification level and past number of bid wins;

the variables of professional ability include: number of certified categories, number of certified sub-categories, number of certified skills, and number of skills held for more than ten years;

the variables of general experience include: past number of task proposals, number of recently participated tasks, number of recent winning tasks, and hourly rate;

the variables of professional experience include: number of certified industries, number of industries covered by recently participated tasks, and number of industries covered by recent winning tasks;

the variables of diversity include: number of different task categories recently participated in, number of different task sub-categories recently participated in, number of different task categories recently won, and number of different task sub-categories recently won;

the variables of complexity include: average number of bidders of recently participated tasks, average number of proposals of recently participated tasks, average number of favorites of recently participated tasks, and average number of views of recently participated tasks;

the variables of fun and enjoyment include: average task reward and task approval rate;

the variables of work autonomy include: average recruitment time and task completion rate;

the variables of task complexity include: ratio of the average number of favorites to the number of bidders for participated tasks, ratio of the average number of favorites to the number of bidders for winning tasks, ratio of the average number of views to the number of favorites for participated tasks, and ratio of the average number of views to the number of favorites for winning tasks;

the variables of self-marketing/sense of belonging include: favorable review rate and number of reviews.

Further, the determining the weight of each variable of the worker's ability index and the weight of each variable of the motivation index to obtain the motivation value and the ability value of each worker includes:

determining the weight of each variable based on a CRITIC objective weighting method;

and the motivation value and the ability value of each worker are calculated as follows:

$$a_i = \sum_{j=1}^{p} w_j^{x}\, x_{ij}, \qquad m_i = \sum_{k=1}^{q} w_k^{y}\, y_{ik}$$

wherein

$a_i$ represents the ability value of the ith worker;

$m_i$ represents the motivation value of the ith worker;

$w_j^{x}$ represents the objective weight of the jth ability variable, and p is the number of ability variables;

$w_k^{y}$ represents the objective weight of the kth motivation variable, and q is the number of motivation variables;

$x_{ij}$ represents the value of the jth ability variable of the ith worker;

$y_{ik}$ represents the value of the kth motivation variable of the ith worker.

Further, the threshold line model includes:

$$\text{Motivation} = m_i - M_f + \beta$$

$$\text{Ability} = a_i - A_f + \alpha$$

wherein,

$M_f$ is the Kolmogorov mean of worker motivation;

$A_f$ is the Kolmogorov mean of worker ability;

α represents the adjustment parameter on the ability axis;

β represents the adjustment parameter on the motivation axis;

and a loss function lost is constructed, α and β are trained with five-fold cross-validation until the loss function converges, and the averages of α and β over the folds are taken as the final result.

Further, screening out high-motivation, high-ability workers from the feature data set of the workers to be classified based on the trained worker classification model comprises: when both the motivation value and the ability value of a worker to be classified lie above the threshold line, identifying the worker as a high-motivation, high-ability worker.

In a second aspect, a system for personalized recommendation of crowdsourced tasks based on machine learning is provided, the system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when executing the computer program.

(III) advantageous effects

The invention provides a crowdsourcing task personalized recommendation method and system based on machine learning. Compared with the prior art, the method has the following beneficial effects:

Drawing on the existing literature on crowdsourcing contests and on the information available on crowdsourcing platforms, the invention constructs a complete and fine-grained worker feature identification system based on the Fogg Behavior Model, which integrates motivation and ability theories. In addition, the historical data of workers and the task data over a period of time are crawled, and current worker-task interaction data is constructed from them, forming a worker data set that covers both the past and the current information of the workers, so that worker characteristics are measured more accurately. This helps the platform quickly identify high-value workers, generate a recommendation list based on the bid-winning probability, and finally achieve personalized recommendation of crowdsourcing tasks.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flow chart of an embodiment of the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below. Obviously, the described embodiments are some, but not all, embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.

The embodiment of the application solves the problem that the accuracy of the existing method for identifying the characteristics of workers is poor by providing the crowdsourcing task personalized recommendation method and system based on machine learning.

In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.

Example 1:

as shown in fig. 1, the present invention provides a method for personalized recommendation of a crowdsourcing task based on machine learning, which comprises:

s1, acquiring historical characteristic data of workers and task characteristic data; acquiring worker-task interaction characteristic data based on the task characteristic data;

and S2, training a worker classification model considering worker motivation and worker capability based on the preprocessed worker historical characteristic data and the worker-task interaction characteristic data.

And S3, based on the trained worker classification model, screening out workers with high motivation and high capacity from the characteristic data set of the workers to be classified, and recommending tasks for the workers.

The beneficial effects of this embodiment are as follows:

Drawing on the existing literature on crowdsourcing contests and on the information available on crowdsourcing platforms, the invention constructs a complete and fine-grained worker feature identification system based on the Fogg Behavior Model, which integrates motivation and ability theories. In addition, the historical data of workers and the task data over a period of time are crawled, and current worker-task interaction data is constructed from them, forming a worker data set that covers both the past and the current information of the workers, so that worker characteristics are measured more accurately. This helps the platform quickly identify high-value workers, generate a recommendation list based on the bid-winning probability, and finally achieve personalized recommendation of crowdsourcing tasks.

The following describes the implementation process of the embodiment of the present invention in detail:

s1, acquiring historical characteristic data of workers and task characteristic data; and obtaining worker-task interaction feature data based on the task feature data.

The method specifically comprises the following steps:

s101, crawling all historical worker feature data and task feature data in a period of time from a crowd-sourcing platform.

In a specific implementation, Scrapy, a lightweight Python-based crawler framework, can be used as the basis. Web page data is extracted and parsed with XPath and CSS expressions; a Redis database serves as the distributed shared crawl queue and a MongoDB database as the data store; the Selenium automated testing tool is integrated; and middleware such as random User-Agent rotation, Alibaba Cloud proxy IPs and a self-built proxy IP pool is used. The crawler is deployed on a cloud server and crawls the historical feature data of all workers in a given category, achieving large-scale, real-time incremental crawling of worker historical feature data and task feature data on the crowdsourcing platform.

As shown in Table 1, the worker historical feature data includes: worker ID, platform certification level, platform rating, worker favorable review rate, task completion rate, task approval rate, hourly rate, certified categories, certified industries, certified skills, skill duration, past total number of proposals (the total number of task proposals submitted in the past), and past total number of winning proposals (the number of task proposals ranked first).

The task feature data includes: task name, task recruitment status, task type, task category, task sub-category, task industry, task release time, task deadline, task winning reward (the monetary reward for the task proposal ranked first), task participation reward (for some tasks, in order to attract more workers to bid, the reward is divided into a winning reward and a participation reward: the larger share goes to the first-ranked proposal and a smaller share goes to the other participating proposals), number of task proposals (the number of proposals submitted by workers for the task), number of bidders, number of winners, number of favorites, and number of views. In the present invention, winning a bid means that a worker's proposal is selected.

TABLE 1

S102, after the crawled data are obtained, firstly preprocessing the task characteristic data, and coding the task characteristic data to obtain a task characteristic data set.

In a specific implementation, preprocessing mainly consists of data screening followed by data cleaning.

The data screening comprises:

1) Retaining task feature data for tasks whose number of bidders is greater than or equal to M1.

2) Removing tasks whose recruitment is not completed and tasks that are not contest-type tasks (this embodiment addresses contest-type tasks only). The task types are distinguished as follows:

Project-type tasks: the task reward is a range, the submission is a proposal, and a contract must be signed afterwards.

Contest-type tasks: the task reward is a fixed amount, and a submitted proposal can win the reward.

Task-type tasks: the reward is obtained simply by completing the task (for example, filling in a questionnaire).

3) Removing tasks with missing key features such as the task winning reward, task release time and task deadline.

The data cleaning comprises:

1) Removing duplicate task feature records from the list;

2) Removing outliers from the task list.

M1 is a preset threshold value and can be set according to actual needs.
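The screening and cleaning rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the field names (`task_id`, `num_bidders`, `task_type`, `status`, `winning_reward`, `release_time`, `deadline`), the value `M1 = 3`, and the reading of "abnormal value" as a non-positive reward or a deadline that precedes the release time are all assumptions, not details given in this embodiment.

```python
from datetime import datetime

M1 = 3  # hypothetical value for the preset bidder threshold M1
KEY_FIELDS = ("winning_reward", "release_time", "deadline")  # assumed field names


def preprocess_tasks(tasks, m1=M1):
    """Screen and clean crawled task records (a list of dicts)."""
    seen_ids = set()
    kept = []
    for task in tasks:
        # Screening 1: keep tasks with at least m1 bidders.
        if task.get("num_bidders", 0) < m1:
            continue
        # Screening 2: keep only contest-type tasks whose recruitment is completed.
        if task.get("task_type") != "contest" or task.get("status") != "completed":
            continue
        # Screening 3: drop tasks with missing key features.
        if any(task.get(field) is None for field in KEY_FIELDS):
            continue
        # Cleaning 1: drop duplicates of a task that was already kept.
        if task["task_id"] in seen_ids:
            continue
        # Cleaning 2 (assumed outlier rule): reward must be positive and
        # the deadline must fall after the release time.
        if task["winning_reward"] <= 0 or task["deadline"] <= task["release_time"]:
            continue
        seen_ids.add(task["task_id"])
        kept.append(task)
    return kept


def make_task(task_id, bidders=5, task_type="contest", reward=500.0,
              release=datetime(2022, 1, 1), deadline=datetime(2022, 1, 10)):
    """Build one hypothetical task record for the example below."""
    return {"task_id": task_id, "num_bidders": bidders, "task_type": task_type,
            "status": "completed", "winning_reward": reward,
            "release_time": release, "deadline": deadline}


example_tasks = [
    make_task(1),
    make_task(2, bidders=1),                     # too few bidders
    make_task(3, task_type="project"),           # not a contest-type task
    make_task(4, reward=None),                   # missing key feature
    make_task(1),                                # duplicate of task 1
    make_task(6, deadline=datetime(2021, 12, 1)),  # deadline before release
    make_task(7),
]
cleaned = preprocess_tasks(example_tasks)
print([task["task_id"] for task in cleaned])
```

Only tasks 1 and 7 survive in this example; every other record trips exactly one of the rules.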

For data encoding, in a specific implementation, a Python runtime environment can first be set up, and an encoding scheme suited to each kind of preprocessed data is then applied:

label encoding is used for categorical and graded data, so that each class is represented by a number;

data that are already specific numerical values are used directly;

normalization is also required: because the sample features differ in type and scale and their absolute values differ greatly, features with a small value range would otherwise be ignored; normalizing the data improves the convergence speed and the accuracy of the model.

Thus, a task feature data set can be obtained.
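As a rough illustration of the two encoding steps just described, the sketch below label-encodes a categorical column and rescales a numeric column. The embodiment does not say which normalization is used; min-max scaling to [0, 1] is an assumption made here, and the column contents are invented for the example.

```python
def label_encode(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


def min_max_normalize(values):
    """Rescale numeric values to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical task columns.
categories = ["logo design", "web design", "logo design", "copywriting"]
rewards = [500.0, 2000.0, 1000.0, 500.0]

encoded_categories, category_codes = label_encode(categories)
normalized_rewards = min_max_normalize(rewards)
print(encoded_categories, normalized_rewards)
```

For graded data such as certification levels, the codes would normally be assigned in rank order rather than in order of first appearance, so that the numeric code preserves the grade ordering.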

S103, dividing the task feature data set, in chronological order and at a ratio of 1:1, into a training set and a test set, and using the task training set to obtain the worker-task interaction feature data with the pandas and numpy packages of Python.

As shown in Table 2, the worker-task interaction feature data includes, for each worker: names of participated tasks, number of participated tasks, number of task categories, number of task sub-categories, number of task industries, names of winning tasks, number of winning tasks, number of winning-task categories, number of winning-task sub-categories, number of winning-task industries, average task reward, average task working time, average winning-task reward, and average winning-task working time.

TABLE 2
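The chronological 1:1 split of S103 and the per-worker aggregation behind Table 2 might look roughly like the following. The embodiment uses pandas and numpy; plain Python is used here only to keep the sketch dependency-free. The record layout (`release_time`, `category`, `winning_reward`, `bidders`, `winners`) is assumed, and only a subset of the Table 2 features is computed.

```python
from collections import defaultdict


def chronological_split(tasks):
    """Sort tasks by release time and split them 1:1 into (train, test)."""
    ordered = sorted(tasks, key=lambda t: t["release_time"])
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def interaction_features(train_tasks):
    """Aggregate per-worker participation and winning statistics."""
    stats = defaultdict(lambda: {
        "tasks": 0, "categories": set(), "reward_sum": 0.0,
        "won_tasks": 0, "won_categories": set(), "won_reward_sum": 0.0,
    })
    for task in train_tasks:
        for worker_id in task["bidders"]:
            s = stats[worker_id]
            s["tasks"] += 1
            s["categories"].add(task["category"])
            s["reward_sum"] += task["winning_reward"]
            if worker_id in task["winners"]:
                s["won_tasks"] += 1
                s["won_categories"].add(task["category"])
                s["won_reward_sum"] += task["winning_reward"]
    features = {}
    for worker_id, s in stats.items():
        features[worker_id] = {
            "num_tasks": s["tasks"],
            "num_categories": len(s["categories"]),
            "avg_reward": s["reward_sum"] / s["tasks"],
            "num_won_tasks": s["won_tasks"],
            "num_won_categories": len(s["won_categories"]),
            # None marks a missing value to be filled in during S105.
            "avg_won_reward": (s["won_reward_sum"] / s["won_tasks"]
                               if s["won_tasks"] else None),
        }
    return features


# Hypothetical tasks; release_time is a day index for brevity.
tasks = [
    {"release_time": 3, "category": "web", "winning_reward": 900.0,
     "bidders": ["w2", "w3"], "winners": ["w2"]},
    {"release_time": 1, "category": "logo", "winning_reward": 100.0,
     "bidders": ["w1", "w2"], "winners": ["w1"]},
    {"release_time": 4, "category": "logo", "winning_reward": 200.0,
     "bidders": ["w1"], "winners": ["w1"]},
    {"release_time": 2, "category": "web", "winning_reward": 300.0,
     "bidders": ["w1", "w3"], "winners": ["w3"]},
]
train_tasks, test_tasks = chronological_split(tasks)
features = interaction_features(train_tasks)
print(features["w1"])
```

Building the interaction features from the earlier half only keeps the later half untouched for evaluation.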

S104, preprocessing the crawled worker historical feature data and encoding it to obtain the worker historical feature data set.

The data screening removes the historical feature data of the following workers:

1) Workers whose past number of winning proposals is less than or equal to M2.

2) Workers whose past number of submitted proposals is less than or equal to M3.

To encode the worker historical feature data, in a specific implementation, a Python runtime environment can first be set up, and an encoding scheme suited to each kind of preprocessed data is then applied; data that are already specific numerical values are used directly.

The worker historical feature data set is thus obtained. It can be divided as follows:

the worker historical feature data set is divided into a training set and a test set at a ratio of 8:2, and the training set is further divided into training and validation subsets by k-fold cross-validation.
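The 8:2 division followed by k-fold cross-validation can be sketched as below. Shuffling before the split, the fixed seed, and k = 5 are assumptions made for the example; the embodiment only specifies the 8:2 ratio and the use of k-fold cross-validation.

```python
import random


def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle and split items into (train, test) at (1 - test_ratio):test_ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(items, k=5):
    """Yield (train, validation) pairs; each item is used for validation exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation


worker_ids = list(range(100))  # hypothetical worker identifiers
train_ids, test_ids = train_test_split(worker_ids)
folds = list(k_fold(train_ids, k=5))
print(len(train_ids), len(test_ids), [len(v) for _, v in folds])
```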

S105, preprocessing and encoding the worker-task interaction feature data.

The data screening removes the worker-task interaction feature data of the following workers:

1) Workers whose number of participated tasks is less than M4;

2) Workers whose number of winning tasks is less than M5.

M2 to M5 are preset thresholds and can be set according to actual needs.

The data cleansing includes:

Filling in missing values. This embodiment does not limit the method used. For example, the similarity between workers can be computed from the worker-task interaction feature data with the Jaccard similarity, that is, for any two workers A and B, the number of elements in the intersection of A and B divided by the number of elements in the union of A and B; the missing value is then filled with the similarity-weighted average of the corresponding values of the other workers.
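A minimal sketch of this Jaccard-weighted fill is given below. It assumes that each worker is represented by the set of task identifiers the worker took part in and that the missing feature is filled from workers for whom that feature is known; both choices, and the feature name `avg_won_reward`, are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def impute(target, workers, feature):
    """Fill a missing feature of `target` with the Jaccard-weighted mean of known values."""
    weighted_sum = 0.0
    weight_total = 0.0
    for worker_id, record in workers.items():
        if worker_id == target or record.get(feature) is None:
            continue
        similarity = jaccard(workers[target]["tasks"], record["tasks"])
        weighted_sum += similarity * record[feature]
        weight_total += similarity
    # None is returned when no similar worker has a known value.
    return weighted_sum / weight_total if weight_total else None


# Hypothetical workers: the task-id sets are the basis of the similarity.
workers = {
    "A": {"tasks": {1, 2, 3}, "avg_won_reward": None},
    "B": {"tasks": {2, 3, 4}, "avg_won_reward": 400.0},
    "C": {"tasks": {3, 5, 6, 7}, "avg_won_reward": 100.0},
}
filled = impute("A", workers, "avg_won_reward")
print(filled)
```

Here the similarity of A to B is 2/4 and of A to C is 1/6, so B's value dominates the weighted average.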

For the encoding operation, the same encoding method of the worker history feature data and the task feature data may be adopted.

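S2, training a worker classification model considering worker motivation and worker ability based on the preprocessed worker historical feature data and the worker-task interaction feature data.

Specifically, this comprises the following steps: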

s201, constructing the capability index and motivation index of workers.

For worker ability, the ability indexes of the worker are extracted from the worker features and the worker-task interaction features along six indexes: inherent ability, professional ability, general experience, professional experience, diversity and complexity.

Wherein inherent ability is determined by variables x₁ to x₃;

professional ability is determined by variables x₄ to x₇;

general experience is determined by variables x₈ to x₁₁;

professional experience is determined by variables x₁₂ to x₁₄;

diversity is determined by variables x₁₅ to x₁₈;

complexity is determined by variables x₁₉ to x₂₂.

For worker motivation, the motivation indexes of the worker are extracted along four indexes: fun and enjoyment, work autonomy, task complexity, and self-marketing/sense of belonging.

Wherein fun and enjoyment is determined by variables y₁ to y₂;

work autonomy is determined by variables y₃ to y₄;

task complexity is determined by variables y₅ to y₈;

self-marketing/sense of belonging is determined by variables y₉ to y₁₀.

Worker motivation and ability predictors and related variables are shown in table 3:

TABLE 3
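For readers who want the index-to-variable mapping in machine-readable form, the sketch below lists the indexes with the variables enumerated for them in the disclosure. The variable names are paraphrased, and the order of variables within each index (and hence the exact x and y numbering) is assumed to follow the order in which the disclosure lists them.

```python
ABILITY_INDEXES = {
    "inherent ability": [
        "platform rating", "platform certification level", "past number of bid wins",
    ],
    "professional ability": [
        "number of certified categories", "number of certified sub-categories",
        "number of certified skills", "number of skills held for more than ten years",
    ],
    "general experience": [
        "past number of task proposals", "number of recently participated tasks",
        "number of recent winning tasks", "hourly rate",
    ],
    "professional experience": [
        "number of certified industries",
        "number of industries in recently participated tasks",
        "number of industries in recent winning tasks",
    ],
    "diversity": [
        "task categories recently participated in",
        "task sub-categories recently participated in",
        "task categories recently won", "task sub-categories recently won",
    ],
    "complexity": [
        "average bidders of recent tasks", "average proposals of recent tasks",
        "average favorites of recent tasks", "average views of recent tasks",
    ],
}

MOTIVATION_INDEXES = {
    "fun and enjoyment": ["average task reward", "task approval rate"],
    "work autonomy": ["average recruitment time", "task completion rate"],
    "task complexity": [
        "favorites/bidders ratio of participated tasks",
        "favorites/bidders ratio of winning tasks",
        "views/favorites ratio of participated tasks",
        "views/favorites ratio of winning tasks",
    ],
    "self-marketing/sense of belonging": ["favorable review rate", "number of reviews"],
}


def number_variables(indexes, prefix):
    """Return {symbol: (index name, variable name)}, numbering variables consecutively."""
    numbered = {}
    for index_name, variables in indexes.items():
        for variable in variables:
            numbered[f"{prefix}{len(numbered) + 1}"] = (index_name, variable)
    return numbered


ability_variables = number_variables(ABILITY_INDEXES, "x")        # x1 .. x22
motivation_variables = number_variables(MOTIVATION_INDEXES, "y")  # y1 .. y10
print(ability_variables["x12"], motivation_variables["y5"])
```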

S202, determining the weight of each variable of the worker's ability index and the weight of each variable of the motivation index to obtain the motivation value and the ability value of each worker.

In specific implementation, objective weights of variables can be comprehensively measured based on the CRITIC objective weighting method.

CRITIC objective weighting method: the objective weight of each variable is measured comprehensively from the contrast strength of the variable and the conflict between variables. The method considers the variability of each variable as well as the correlation between variables, so a larger numerical value does not by itself make a variable more important; the evaluation relies entirely on the objective properties of the data.

The contrast strength refers to the magnitude of the differences among the values that the evaluated workers take on the same variable. It is expressed by the standard deviation: the larger the standard deviation, the higher the weight. In this embodiment:

$x_{ij}$ represents the value of the jth ability variable of the ith worker;

$\bar{x}_j$ represents the mean of the jth ability variable;

$\sigma_j^{x}$ represents the standard deviation of the jth ability variable;

$y_{ik}$ represents the value of the kth motivation variable of the ith worker;

$\bar{y}_k$ represents the mean of the kth motivation variable;

$\sigma_k^{y}$ represents the standard deviation of the kth motivation variable.

The conflict between variables is expressed by the correlation coefficient: the stronger the positive correlation between two variables, the smaller the conflict and the lower the weight. In this embodiment:

$r_{ij}$ represents the correlation coefficient between the ith and the jth ability variables, and their conflict is expressed as $(1 - r_{ij})$;

$r'_{lk}$ represents the correlation coefficient between the lth and the kth motivation variables, and their conflict is expressed as $(1 - r'_{lk})$;

$R_j^{x}$ represents the sum of the conflicts between the jth ability variable and the other ability variables;

$R_k^{y}$ represents the sum of the conflicts between the kth motivation variable and the other motivation variables.

The parameters are calculated as follows, where n is the number of workers:

$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad \sigma_j^{x} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij}-\bar{x}_j\right)^{2}}$$

$$\bar{y}_k = \frac{1}{n}\sum_{i=1}^{n} y_{ik}, \qquad \sigma_k^{y} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(y_{ik}-\bar{y}_k\right)^{2}}$$

$$R_j^{x} = \sum_{i=1}^{p}\left(1 - r_{ij}\right), \qquad R_k^{y} = \sum_{l=1}^{q}\left(1 - r'_{lk}\right)$$

The amount of information contained in a variable is the product of its contrast strength and its conflict:

$$C_j^{x} = \sigma_j^{x}\, R_j^{x}, \qquad C_k^{y} = \sigma_k^{y}\, R_k^{y}$$

The objective weight of each variable is therefore calculated as:

$$w_j^{x} = \frac{C_j^{x}}{\sum_{j'=1}^{p} C_{j'}^{x}}, \qquad w_k^{y} = \frac{C_k^{y}}{\sum_{k'=1}^{q} C_{k'}^{y}}$$

The motivation value and the ability value of each worker are calculated as:

$$a_i = \sum_{j=1}^{p} w_j^{x}\, x_{ij}, \qquad m_i = \sum_{k=1}^{q} w_k^{y}\, y_{ik}$$

wherein

$a_i$ represents the actual ability value of the ith worker;

$m_i$ represents the actual motivation value of the ith worker;

p represents the number of ability variables;

q represents the number of motivation variables;

$C_j^{x}$ represents the amount of information contained in the jth ability variable;

$w_j^{x}$ represents the objective weight of the jth ability variable;

$C_k^{y}$ represents the amount of information contained in the kth motivation variable;

$w_k^{y}$ represents the objective weight of the kth motivation variable.
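The CRITIC weighting described in S202 can be sketched in plain Python as follows. The sketch assumes the variables have already been normalized, uses the Pearson correlation coefficient and the sample standard deviation (divisor n - 1), and assumes that no column is constant; the example matrix is invented. The same function applies unchanged to the motivation variables.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def critic_weights(rows):
    """CRITIC weights; rows[i][j] is the normalized value of variable j for worker i."""
    n, p = len(rows), len(rows[0])
    cols = [[row[j] for row in rows] for j in range(p)]
    information = []
    for j in range(p):
        mean = sum(cols[j]) / n
        # Contrast strength: sample standard deviation of variable j.
        std = math.sqrt(sum((v - mean) ** 2 for v in cols[j]) / (n - 1))
        # Conflict: sum of (1 - r) over all variables (the j-with-j term is zero).
        conflict = sum(1.0 - pearson(cols[i], cols[j]) for i in range(p))
        information.append(std * conflict)
    total = sum(information)
    return [c / total for c in information]


def weighted_values(rows, weights):
    """Per-worker value: weighted sum of that worker's variable values."""
    return [sum(w * v for w, v in zip(weights, row)) for row in rows]


# Hypothetical normalized ability matrix: 4 workers x 3 variables.
ability_rows = [
    [0.0, 1.0, 0.2],
    [0.5, 0.4, 0.9],
    [1.0, 0.0, 0.4],
    [0.2, 0.7, 1.0],
]
ability_weights = critic_weights(ability_rows)
ability_values = weighted_values(ability_rows, ability_weights)
print(ability_weights, ability_values)
```

A variable that varies widely across workers and is weakly (or negatively) correlated with the others carries more information and therefore receives a larger weight.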

S203, constructing a threshold line model that serves as the criterion for separating out high-motivation, high-ability workers.

The threshold line is a theoretical boundary: for workers whose values lie above it, motivation and ability are high enough that a single signal-type trigger suffices to bring about a change in behavior. The correct position of this boundary is intrinsically linked to the worker data set.

Based on the conceptual description of the Fogg Behavior Model, the mathematical model of the fixed threshold line is defined as follows:

The threshold line is repositioned according to the central tendency of the data set, measured by the Kolmogorov mean, and a threshold-line displacement criterion is applied so as to better implement triggers and identify high-motivation, high-ability workers.

Wherein

$M_f$ is the Kolmogorov mean of worker motivation,

$A_f$ is the Kolmogorov mean of worker ability,

and the specific calculation formula is as follows:

To adapt the equations to the application domain, β and α are added to the motivation and the ability, respectively, to adjust the displacement of the threshold line. A loss function lost is constructed, and α and β are trained with five-fold cross-validation until the loss function converges; the averages of α and β over the folds are taken as the training result. The specific calculation formulas are as follows:

$$\text{Motivation} = m_i - M_f + \beta$$

$$\text{Ability} = a_i - A_f + \alpha$$

wherein,

Motivation represents the motivation value corresponding to the threshold line;

Ability represents the ability value corresponding to the threshold line;

$m_p$ represents the predicted motivation value of the worker;

$m_i$ represents the actual motivation value of the ith worker;

α represents the adjustment parameter on the ability axis;

β represents the adjustment parameter on the motivation axis;

lost is the constructed loss function.

Finally, the values of α and β are obtained.

The Fogg Behavior Model is abbreviated FBM. It is an empirical behavior model according to which a behavior occurs only when three elements converge at the same moment: motivation, ability, and a trigger. When a behavior does not occur, at least one of the three elements is missing. The horizontal and vertical axes correspond to the levels of motivation and ability, respectively, and a threshold line serves as the guideline for behavior change.
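The Kolmogorov means $M_f$ and $A_f$ that position the threshold line belong to the family of quasi-arithmetic means, which in general take the form $f^{-1}\left(\frac{1}{n}\sum_i f(v_i)\right)$ for a generating function f. Which generating function this embodiment uses is not stated in the text, so the sketch below only illustrates the general form with three common choices of f; the sample values are hypothetical.

```python
import math


def kolmogorov_mean(values, f, f_inverse):
    """Quasi-arithmetic (Kolmogorov) mean: f_inverse of the average of f(v)."""
    return f_inverse(sum(f(v) for v in values) / len(values))


# Hypothetical motivation values m_i of three workers.
motivation_values = [0.2, 0.4, 0.8]

# f(v) = v gives the arithmetic mean.
arithmetic = kolmogorov_mean(motivation_values, lambda v: v, lambda v: v)
# f(v) = log v gives the geometric mean (values must be positive).
geometric = kolmogorov_mean(motivation_values, math.log, math.exp)
# f(v) = v^2 gives the quadratic mean (values must be non-negative).
quadratic = kolmogorov_mean(motivation_values, lambda v: v * v, math.sqrt)

print(arithmetic, geometric, quadratic)
```

The choice of generating function changes how strongly extreme workers pull the centre of the data set, and therefore where the threshold line ends up before α and β are applied.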

S3, based on the trained worker classification model, screening out high-motivation, high-ability workers from the feature data set of the workers to be classified, and recommending tasks to them.

In a specific implementation, the high-motivation, high-ability workers in the test set are identified with the trained classification model as follows:

for example, for the ith worker with actual motivation value $m_i$ and ability value $a_i$, if both values lie above the threshold line at the same time, the worker is identified as a high-motivation, high-ability worker.
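One possible reading of this identification rule is sketched below. It assumes that "above the threshold line" means that both shifted scores, Motivation = m_i - M_f + β and Ability = a_i - A_f + α, are positive; this interpretation, the function name, and all numeric values are assumptions made for illustration.

```python
def is_high_motivation_high_ability(m_i, a_i, m_f, a_f, alpha, beta):
    """Return True when both shifted scores are positive (assumed reading of the rule)."""
    motivation = m_i - m_f + beta   # Motivation = m_i - M_f + beta
    ability = a_i - a_f + alpha     # Ability = a_i - A_f + alpha
    return motivation > 0 and ability > 0


# Hypothetical trained parameters and Kolmogorov means.
M_F, A_F = 0.5, 0.6
ALPHA, BETA = 0.05, -0.1

# Hypothetical workers: (motivation value m_i, ability value a_i).
candidates = {"w1": (0.7, 0.6), "w2": (0.55, 0.9), "w3": (0.9, 0.5)}
selected = [
    worker_id
    for worker_id, (m_i, a_i) in candidates.items()
    if is_high_motivation_high_ability(m_i, a_i, M_F, A_F, ALPHA, BETA)
]
print(selected)
```

In this example only w1 clears both axes: w2 falls short on motivation and w3 on ability.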

After identification, the bid-winning probability $P_{ij}$ of the jth high-motivation, high-ability worker participating in the ith task is calculated, the tasks are ranked for recommendation to each worker according to the bid-winning probability and the number of winners allowed by each task, and the final recommendation list is generated:

$$P_{ij} = \frac{a_{ij}}{\sum_{k=1}^{W_i} a_{ik}}$$

Wherein,

$P_{ij}$ represents the bid-winning probability of the jth worker participating in the ith task;

$a_{ij}$ represents the ability value of the jth worker in the ith task;

$a_{ik}$ represents the ability value of the kth worker in the ith task;

$W_i$ represents the number of workers participating in the ith task.
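A sketch of how a recommendation list could be produced from these quantities is given below. It assumes that the bid-winning probability is the worker's ability value divided by the sum of the ability values of all workers in the task, and that a task is recommended only when the worker's rank within the task does not exceed the number of winners the task allows. The embodiment does not spell out how the allowed number of winners enters the ranking, so that rule, like the data layout, is an assumption.

```python
def win_probabilities(ability_by_worker):
    """Share of each worker's ability in the total ability of the task's participants."""
    total = sum(ability_by_worker.values())
    return {worker: ability / total for worker, ability in ability_by_worker.items()}


def recommend(tasks, worker_id, worker_ability):
    """Return [(task_id, probability)] sorted by probability, highest first."""
    scored = []
    for task_id, task in tasks.items():
        abilities = dict(task["participants"])
        abilities[worker_id] = worker_ability  # the candidate joins the field
        probabilities = win_probabilities(abilities)
        own = probabilities[worker_id]
        # Rank 1 means nobody in the task has a higher probability.
        rank = 1 + sum(1 for p in probabilities.values() if p > own)
        if rank <= task["allowed_winners"]:
            scored.append((task_id, own))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical open tasks with the ability values of current participants.
open_tasks = {
    "T1": {"allowed_winners": 1, "participants": {"w2": 0.9, "w3": 0.3}},
    "T2": {"allowed_winners": 1, "participants": {"w4": 0.2, "w5": 0.2}},
    "T3": {"allowed_winners": 2, "participants": {"w2": 0.9, "w6": 0.1}},
}
recommendations = recommend(open_tasks, "w1", 0.6)
print(recommendations)
```

In this example T1 is dropped because a stronger participant occupies its single winning slot, while T3 is kept because it allows two winners.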

Example 2

The invention also provides a crowd-sourced task personalized recommendation system based on machine learning, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the method when executing the computer program.

It can be understood that, the crowdsourcing task personalized recommendation system based on machine learning provided by the embodiment of the present invention corresponds to the crowdsourcing task personalized recommendation method based on machine learning, and relevant content explanations, examples, beneficial effects and the like of the system can refer to corresponding content in the crowdsourcing task personalized recommendation method based on machine learning, and details are not repeated here.

In summary, compared with the prior art, the invention has the following beneficial effects:

(1) According to the invention, a complete and fine-grained worker feature identification system is constructed based on the Fogg Behavior Model, which integrates motivation and capability theories, drawing on the existing crowdsourcing-competition literature and the information available on the crowdsourcing platform. Meanwhile, historical worker data and task data over a period of time are crawled and current worker-task interaction data is constructed, forming a worker data set that covers both past and current information about workers, so that worker characteristics are measured more accurately. This helps the platform quickly identify high-value workers, generate a recommendation list based on the bid-winning probability, and finally achieve personalized recommendation of crowdsourcing tasks.

It should be noted that, from the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions, in essence or in the part contributing over the prior art, may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in some parts of the embodiments. In this document, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.

The above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A crowd-sourced task personalized recommendation method based on machine learning is characterized by comprising the following steps:

acquiring historical characteristic data of workers and task characteristic data; acquiring worker-task interaction feature data based on the task feature data;

2. The crowd-sourced task personalized recommendation method based on machine learning as claimed in claim 1, wherein acquiring the historical characteristic data of workers and the task characteristic data, and acquiring the worker-task interaction feature data based on the task feature data, comprises:

S104, preprocessing and encoding the historical characteristic data of workers to obtain a historical characteristic data set of the workers;

3. The method for personalized recommendation of crowdsourced tasks based on machine learning according to claim 2, wherein the preprocessing of task feature data comprises:

rejecting tasks which are not completed or are not competitively recruited;

removing abnormal values in the task list;

rejecting workers with past recommended number less than or equal to M3;

eliminating workers with the number of the participating tasks less than M4;

removing workers with the number of winning bid tasks less than M5;

and filling in missing values.
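The filtering steps listed in this claim can be sketched as below. This is an illustrative sketch only: the record field names (`completed`, `competitive`, `past_recommended`, `tasks_joined`, `tasks_won`) are hypothetical, outlier removal is omitted because no criterion is given, and filling missing values with the mean of the known values is an assumption, since the claim does not name an imputation method.

```python
from statistics import mean


def filter_tasks(tasks: list[dict]) -> list[dict]:
    """Drop tasks that are unfinished or not run as competitive bidding."""
    return [t for t in tasks if t["completed"] and t["competitive"]]


def filter_workers(workers: list[dict], m3: int, m4: int, m5: int) -> list[dict]:
    """Apply the M3, M4, and M5 thresholds of the claim."""
    return [
        w for w in workers
        if w["past_recommended"] > m3   # reject if <= M3
        and w["tasks_joined"] >= m4     # reject if < M4
        and w["tasks_won"] >= m5        # reject if < M5
    ]


def fill_missing(rows: list[dict], key: str) -> list[dict]:
    """Fill None entries of one field with the mean of the known values."""
    known = [r[key] for r in rows if r.get(key) is not None]
    if not known:
        # Nothing to impute from; return copies unchanged.
        return [dict(r) for r in rows]
    fill = mean(known)
    return [{**r, key: fill if r.get(key) is None else r[key]} for r in rows]
```

The thresholds `m3`, `m4`, and `m5` are passed in as parameters because the claim leaves their values open.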

4. The method for personalized recommendation of crowdsourced tasks based on machine learning according to claim 1, wherein training of a worker classification model considering worker motivation and worker ability based on the preprocessed worker historical feature data and worker-task interaction feature data comprises:

S201, constructing the capability indicators and motivation indicators of workers;

and S203, obtaining a threshold line model to serve as the dividing criterion for high-motivation, high-capability workers.

5. The method for personalized recommendation of crowdsourced tasks based on machine learning of claim 4, wherein the capability index of the worker comprises: intrinsic ability, professional ability, general experience, professional experience, diversity, complexity;

and the variables of intrinsic capability include: platform scoring, platform authentication level and past bid-winning times;

the variables of general experience include: past task suggestion quantity, recent participation task quantity, recent bid-winning task quantity, and hourly rate;

the variables of complexity include: average number of bidders participating in the task recently, average suggested number of tasks participating recently, average number of collections participating in the task recently, and average number of browsing times of tasks participating recently;

the variables of self-marketing/sense of attribution include: favorable comment rate and number of comments.

6. The method for personalized recommendation of crowdsourcing tasks based on machine learning according to claim 4, wherein the determining weights of the variables of the ability indicators and the weights of the variables of the motivation indicators of the workers to obtain the motivation value and the ability value of each worker comprises:

$a_i$ denotes the capability value of the i-th worker;

$m_i$ denotes the motivation value of the i-th worker;

representing the objective weight magnitude of the jth capacity variable;

an objective weight magnitude representing a kth motive variable;

$x_{ij}$ denotes the value of the j-th capability variable of the i-th worker;
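The formulas for the capability and motivation values are not reproduced in this text. The symbol definitions suggest a weighted sum of variable values, which the following sketch assumes; the function name and the argument names are hypothetical, and the way the objective weights are derived is not shown here.

```python
from typing import Sequence


def weighted_score(values: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed form: score_i = sum over j of weight_j * x_ij."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values))
```

Under this assumption the same function yields the capability value when given the capability variables and their weights, and the motivation value when given the motivation variables and their weights.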

7. The method for personalized recommendation of crowdsourcing tasks based on machine learning according to claim 4, wherein the threshold line model comprises:

$$\text{Motivation} = m_i - M_f + \beta$$

$$\text{Ability} = a_i - A_f + \alpha$$

wherein,

$a_i$ denotes the capability value of the i-th worker;

$m_i$ denotes the motivation value of the i-th worker;

$M_f$ denotes the Kolmogorov mean of worker motivation;

$A_f$ denotes the Kolmogorov mean of worker capability;

$\alpha$ denotes the adjustment parameter on the capability axis;

$\beta$ denotes the adjustment parameter on the motivation axis;

and constructing a loss function, training $\alpha$ and $\beta$ with five-fold cross-validation until the loss function converges, and taking the averages of $\alpha$ and $\beta$ over the folds as the final result.
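For reference, the Kolmogorov mean (also called the quasi-arithmetic mean) applies a function f to each value, averages the results, and maps the average back through the inverse of f. The sketch below shows this general definition; which function f the method uses is not stated in this text, so the choice of f in the usage note is purely an illustrative assumption.

```python
import math
from typing import Callable, Sequence


def kolmogorov_mean(
    values: Sequence[float],
    f: Callable[[float], float],
    f_inv: Callable[[float], float],
) -> float:
    """Quasi-arithmetic mean: f_inv of the arithmetic mean of f(v)."""
    if not values:
        raise ValueError("values must not be empty")
    return f_inv(sum(f(v) for v in values) / len(values))


# Illustrative choice only: f = log gives the geometric mean.
geometric = kolmogorov_mean([1.0, 4.0], math.log, math.exp)
```

With the identity function as f the result reduces to the ordinary arithmetic mean, so the threshold quantities for motivation and capability can be computed by passing the respective worker values and the chosen f.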

8. The method for personalized recommendation of crowdsourcing tasks based on machine learning according to claim 7, wherein screening out workers with high motivation and high capability from the feature data set of workers to be classified, based on the trained worker classification model, comprises: identifying a worker to be classified as a high-motivation, high-capability worker when both the motivation value and the capability value of the worker exceed the threshold line.

9. A system for personalized recommendation of crowdsourced tasks based on machine learning, the system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the method according to any one of claims 1 to 8.