Disclosure of Invention
In order to overcome the defects of the prior art, the embodiments of the disclosure provide a design method for a big data case evaluation model, a talent evaluation method, and a talent base construction and recommendation method, so that cases are effectively scored according to comments or processing opinions, and case evaluation and talent evaluation are quantified. The technical scheme is as follows:
in a first aspect, a method for designing a big data case evaluation model is provided, and the method includes the following steps:
collecting and storing historical cases, scores, comments and processing opinions thereof, basic information of an applicant and processing information according to the problem category;
judging the event processing state, and selecting all cases whose processing state is "completed" as historical cases;
among the historical cases, those already scored are used directly as satisfaction scores; for historical cases without a score, the satisfaction score is calculated from the corresponding comment using a natural language processing (NLP) algorithm and a multinomial logistic regression model;
extracting the feature words of the comment using the word segmentation tool word2vec, and then performing satisfaction scoring using multinomial logistic regression, specifically as follows:
selecting a certain number of training and test samples from the historical cases by random sampling, where the number of selected samples is N and the random seed is denoted R; among the training samples, cases with a given score are used directly as training samples, while samples without a score are scored comment by comment through a text labeling tool;
firstly, obtaining model parameters by using an MLR regression equation through training set samples, wherein the model parameters are as follows:
setting the comment text of a case as X_j, j = 1, …, N_train, where N_train is the number of training samples; the corresponding satisfaction score is denoted S_j, and each feature word extracted by the word2vec tool is denoted x_ij, i = 1, …, n, where n is the total number of words in the comment text X_j; letting x_ij occur m times in the text, the frequency of occurrence of x_ij in the text is px_ij = m/n;
obtaining, through the score of the comment text, the score corresponding to each feature word x_ij in the text, denoted S_j; the score of a feature word is the score of the comment text in which it appears, that is, each comment text has a corresponding triple set (x_ij, px_ij, S_j);
obtaining the weight coefficients W of the feature words through the MLR regression equation, and calculating the satisfaction score S_k of each comment text in the test set through the model parameters trained on the training set:
S_k = Σ_{i=1}^{n} w_ik · px_ik, w_ik ∈ W,
where n is the number of words in the text and k = 1, …, N_test;
calculating the prediction accuracy P by comparing each predicted test-set satisfaction score S_k with its corresponding true score, k = 1, …, N_test, where N_test is the number of test samples.
Different training sets and test sets are obtained by traversing the random seed R in the random sampling, yielding the corresponding MLR weight coefficients W and model accuracies; the model with the highest accuracy is taken as the scoring model.
Preferably, the method further comprises optimizing the satisfaction score S_k, specifically as follows:
denoting the number of question categories as L and the number of cases under category l as M_l, l = 1, …, L, with corresponding scores S_jl, j = 1, …, M_l, and corresponding processing period t months; the score is optimized by applying score-lifting weight coefficients: a processing-period lifting coefficient is designed according to the relation between the processing period t and the score, and an occurrence-frequency lifting coefficient is designed according to the relation between the number of texts M_l under each category and the score, yielding the optimized score, where a and b ∈ (0, 1) are the weight coefficients;
calculating the prediction accuracy P by comparing each optimized test-set satisfaction score with its corresponding true score;
different training sets and test sets are obtained by traversing the random seed R in the random sampling, yielding the corresponding MLR weight coefficients W; the parameters a and b are then traversed, the corresponding model accuracies are obtained, and the model with the highest accuracy is taken as the scoring model.
Further, for a case without a score or evaluation, its processing opinion is denoted X; the processing opinion X is matched against the processing opinions Y of the historical cases, and the score of the historical case with the most similar processing opinion is used, specifically: the matching degree of the two processing opinions is calculated and a threshold is set, and if the matching degree exceeds the threshold, the unscored case is scored using the scored case;
calculating the matching degree of the processing opinions X and Y of the two cases specifically as follows:
calculating the association degree between a processing opinion and its feature words: each feature word extracted from the processing opinion X is denoted x_i; counting the number m of occurrences of x_i in the text of X and the total number n of feature words in X gives the frequency of occurrence px_i = m/n;
the association degree corr(X, x_i) between the processing opinion X and its feature word x_i is determined from this frequency;
calculating the association degree of the feature words x and y: an association coefficient and feature-word frequency coefficients are determined using mutual information; when x and y each occur frequently on their own but rarely occur together, their association is small, and when x and y each occur frequently and also frequently co-occur, their association is large; the association coefficient and the frequency coefficients of the feature words are designed accordingly, yielding the association degree corr(x, y) of the feature words x and y,
wherein Nx is the total number of cases containing the feature word x, Ny is the total number of cases containing the feature word y, and Nxy is the total number of cases containing both x and y;
the matching degree corr(X, Y) of the two processing opinions is then obtained,
where l1 and l2 are the numbers of feature words in the processing opinions X and Y respectively, and y_j denotes a feature word of the processing opinion Y;
a matching-degree threshold T is set, and when corr(X, Y) ≥ T, the score of the processing opinion Y is used as the score of the processing opinion X.
Preferably, the method further comprises: adding synonyms u_i and near-synonyms v_i of x_i, with corresponding association degrees:
corr(X, u_i) = corr(X, x_i),
corr(X, v_i) = k1 · corr(X, x_i), k1 ∈ (0, 1),
where k1 is the near-synonym association coefficient;
the matching degree corr(X, Y) of the two processing opinions is then obtained,
where U is the synonym set of the processing opinion X, V is the near-synonym set of the processing opinion X, G is the synonym set of the processing opinion Y, H is the near-synonym set of the processing opinion Y, u_i ∈ U, v_i ∈ V, g_j ∈ G, h_j ∈ H, m1 and m2 are the numbers of words in the sets U and G respectively, and n1 and n2 are the numbers of words in the sets V and H respectively;
a matching-degree threshold T is set, and when corr(X, Y) ≥ T, the score of the processing opinion Y is used as the score of the processing opinion X.
Further, if corr (X, Y) < T, the comment is replaced with the processing opinion according to the scoring method based on the comment, the feature words of the processing opinion are extracted by using the word segmentation algorithm word2vec, and then the satisfaction scoring is performed by using the multi-term logistic regression MLR.
In a second aspect, a talent evaluation method based on the big data case evaluation model is provided, the method comprising the following steps:
according to the case scores obtained by the design method of the big data case evaluation model in any possible implementation, the comprehensive score S_ql of each case handler q under each question category l is obtained from that handler's case scores in the category,
where Q is the total number of handlers, L is the number of question categories, M_l is the number of cases under question category l, and a satisfaction score is assigned to each case of the handler under each category;
the comprehensive score S_q of each case handler q is further obtained.
Preferably, the method further comprises: further optimizing the satisfaction score results under each category through the processing periods and occurrence frequencies of the different categories:
denoting the average processing period corresponding to each question category as t_l months and the number of cases under each category as M_l, the score is optimized by applying score-lifting weight coefficients: a processing-period lifting coefficient is designed according to the relation between the processing period t_l and the score, and an occurrence-frequency lifting coefficient is designed according to the relation between the number of texts M_l under each category and the score, yielding the optimized comprehensive score S_ql of each case handler q under each category, where a and b ∈ (0, 1) are the weight coefficients;
the optimized comprehensive score S_q of each case handler q is then obtained.
in a third aspect, a talent base construction method based on the big data case evaluation model is provided, the method comprising the following steps: according to the talent evaluation method of the big data case evaluation model in any possible implementation, the comprehensive score S_ql of each case handler q under each category and the comprehensive score S_q of each case handler q are obtained;
according to whether S_ql exceeds a given threshold 1, and the ranking of S_ql, a talent base for question category l (or its corresponding field) is constructed;
according to whether S_q exceeds a given threshold 2, and the ranking of S_q, a talent base for the comprehensive field is constructed.
Preferably, the method further comprises: setting the thresholds of "expert", "proficient", "skilled", and "learning" in the talent base respectively, so as to obtain five classes of talent bases.
In a fourth aspect, a talent base recommendation method for the big data case evaluation model is provided, comprising the following steps: according to a talent base constructed by the talent base construction method of the big data case evaluation model in any possible implementation, for a new case, the "expert" and "proficient" members of the talent base for the corresponding question category are matched first, then the "skilled" and "learning" members; if no matching object exists, the comprehensive-field talent base is matched in the same way.
Compared with the prior art, one of the technical schemes has the following beneficial effects:
the cases are effectively scored according to the comments or processing opinions, case evaluation and talent evaluation are quantified, and the scoring model is further optimized by combining the difficulty of event processing with the number of existing reference cases; an efficient and intelligent talent screening mechanism is provided for the departments involved in case processing, and an accurate, big-data-driven talent base is formed according to each worker's areas of excellence. For a new case, the most suitable worker can be intelligently matched and recommended according to talent ability, greatly improving working efficiency and preventing a minor matter from growing into a small incident, or a small incident into a major one, because of a worker's limited ability; a reasonable performance assessment mechanism can also be set according to the rankings in the talent base.
Detailed Description
In order to clarify the technical solution and the working principle of the present invention, the embodiments of the present disclosure will be described in further detail below. All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It should be noted that the solution of the present invention can be used in any field whose mode of operation is similar to case evaluation.
In a first aspect, an embodiment of the present disclosure provides a method for designing a big data case evaluation model, where the method includes the following steps:
Historical cases, applicant basic information and processing information are collected through an OA system and stored by question category, separated according to the processing track (i.e. the processing states: not processed, in progress/supervised, and completed). The current question categories mainly comprise 415 subdivided areas, such as urban and rural construction, land contracting, and returning farmland to forest.
Judging the event processing state, and selecting all cases with the processing states of being finished as historical cases;
and calculating the satisfaction scores of all cases after the case processing through a natural language processing NLP algorithm and a multi-term logistic regression model.
In the case processing process, the applicant may score the case directly, for example on a five-level scale: 4 (very satisfied), 3 (satisfied), 2 (neutral), 1 (unsatisfied), 0 (very unsatisfied). Most of the time, however, the applicant does not give a score directly but only leaves a comment; in this case, in order to quantify case evaluation and talent evaluation, the case needs to be scored according to the comment.
Extracting the feature words of the comment using the word segmentation tool word2vec, and then performing satisfaction scoring using multinomial logistic regression (MLR), as follows:
A certain number of samples (covering both the training set and the test set) are selected from the historical cases by random sampling; the number of selected samples is N (here 10000), and the random seed is denoted R. Among these, cases with a given star rating can be used directly as training samples, while samples without a rating are scored comment by comment on the five-level scale through a text labeling tool.
The training sample data is split into a test set and a training set by the 80/20 rule: the test-set sample size is N_test = ceil(0.2 · N) and the training-set sample size is N_train = N − N_test, where ceil denotes rounding up.
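The split above (test size ceil(0.2 · N), remainder for training, controlled by the random seed R) can be sketched as follows; the function and variable names are illustrative, not from the source:

```python
import math
import random

def split_train_test(samples, seed, test_ratio=0.2):
    """Shuffle with a fixed random seed R, then split into training and
    test sets; the test-set size is rounded up (ceil), as in the text."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = math.ceil(test_ratio * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

train, test = split_train_test(list(range(10000)), seed=42)
```

Fixing the seed makes each split reproducible, which matters later when different seeds R are traversed to compare models.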
Firstly, obtaining model parameters by using an MLR regression equation through training set samples, wherein the model parameters are as follows:
Suppose the comment text is X_j, j = 1, …, N_train, with corresponding satisfaction score denoted S_j, j = 1, …, N_train, and each feature word extracted by the word2vec tool denoted x_ij, i = 1, …, n, where n is the total number of words in the comment text X_j; letting x_ij occur m times in the text, the frequency of occurrence of x_ij in the text is px_ij = m/n.
Through the score of the comment text, the score corresponding to each feature word x_ij in the text is obtained and denoted S_j (the score of a feature word is the score of the comment text in which it appears); that is, each comment text has a corresponding triple set (x_ij, px_ij, S_j).
The weight coefficients W of the feature words can be obtained through the MLR regression equation; the regression input is the feature words and the output is the satisfaction score, as shown in the following table:
The satisfaction score of each comment text in the test set is calculated through the model parameters trained on the training set (the weight coefficients W of the feature words) and denoted S_k, k = 1, …, N_test, with the calculation formula:
S_k = Σ_{i=1}^{n} w_ik · px_ik, w_ik ∈ W,
where n is the number of words in the text.
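Scoring a test comment from trained word weights w_ik and word frequencies px_ik can be sketched as follows. The MLR fit itself is not reproduced here; as a loudly labeled simplification, each feature word's weight is taken to be the average score of the comments containing it, standing in for the trained coefficients (all names and the toy data are illustrative):

```python
from collections import defaultdict

def train_word_weights(texts, scores):
    """Stand-in for the MLR weight fit: each feature word's weight is the
    average satisfaction score of the comments it appears in (the score of
    a feature word is the score of its comment text)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for words, s in zip(texts, scores):
        for w in set(words):
            totals[w] += s
            counts[w] += 1
    return {w: totals[w] / counts[w] for w in totals}

def score_comment(words, weights, default=2.0):
    """Weighted word-frequency score: sum over words of weight * (m/n),
    where m/n is the word's frequency in this comment."""
    if not words:
        return default
    n = len(words)
    return sum(weights.get(w, default) * words.count(w) / n for w in set(words))

train_texts = [["fast", "helpful"], ["slow", "rude"], ["fast", "ok"]]
train_scores = [4, 0, 3]
W = train_word_weights(train_texts, train_scores)
```

Unseen words fall back to a neutral default weight; a real fit would learn all coefficients jointly from the labeled triples.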
The processing level of a handler is related not only to ordinary scores but also to the type and difficulty of the cases handled; for example, a handler who successfully handles many difficult cases should see a correspondingly larger improvement in rating. Different question categories (hereinafter: categories) differ in processing difficulty: in practice, frequently encountered problems are solved quickly, while rarely encountered or brand-new cases are more troublesome; simple cases are finished quickly, whereas troublesome cases have longer processing periods. Extensive preliminary research revealed a latent association between the number of cases in each category and their processing difficulty. The satisfaction score result is therefore further optimized through the event processing period and the event occurrence frequency: for troublesome cases (longer processing period), the demands on the handler's level are high, so the corresponding satisfaction score is raised accordingly; for rare (infrequently occurring) cases, the demands on the handler's level are likewise higher, so the corresponding satisfaction score is raised accordingly. Preferably, the satisfaction score S is further optimized as follows:
Denoting the number of question categories as L and the number of cases under category l as M_l, l = 1, …, L, with corresponding scores S_jl, j = 1, …, M_l, and corresponding processing period t months, the score is optimized by applying score-lifting weight coefficients: a processing-period lifting coefficient is designed according to the relation between the processing period t and the score, and an occurrence-frequency lifting coefficient is designed according to the relation between the number of texts M_l under each category and the score.
This achieves the goal of optimizing the satisfaction score; in lifting a score, the lifted value must not exceed the maximum score of 4.
The optimized score is thus obtained, where a and b ∈ (0, 1) are the weight coefficients.
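The exact lifting-coefficient formulas appear only in the source's drawings and are not reproduced here; the sketch below assumes a simple illustrative form in which a longer processing period t and a rarer category (small case count M_l) both raise the score through weights a, b ∈ (0, 1), capped at the maximum score of 4. The normalizers t_ref and m_ref are hypothetical, not from the source:

```python
def lift_score(s, t, m_l, a=0.3, b=0.3, t_ref=12, m_ref=100):
    """Hypothetical score lifting: the processing-period lifting coefficient
    grows with the period t (months), the occurrence-frequency lifting
    coefficient grows as the category's case count m_l shrinks, and the
    lifted score is capped at the maximum score of 4."""
    period_lift = a * min(t / t_ref, 1.0)            # processing-period lifting coefficient
    freq_lift = b * (1.0 - min(m_l / m_ref, 1.0))    # occurrence-frequency lifting coefficient
    return min(4.0, s * (1.0 + period_lift + freq_lift))
```

Any monotone form with the same directions would serve; the cap enforces the rule that a lifted score never exceeds 4.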
The prediction accuracy P is calculated as follows: the predicted values of the test-set samples are obtained, with corresponding actual scores recorded for k = 1, …, N_test, where N_test is the number of test samples;
different training sets and test sets are obtained by traversing the random seed R in the random sampling, yielding the corresponding MLR model parameters W; the parameters a and b are then traversed, the corresponding model accuracies are obtained, and the model with the highest accuracy is taken as the scoring model.
Preferably, in practice there are many cases for which the applicant gives neither a score nor any evaluation. In this situation, for a processing opinion X without a score or evaluation, the processing opinion X is matched against the processing opinion of each case Y among the historical cases, and the score of the historical case with the most similar processing opinion is used; specifically, the matching degree of the two processing opinions is calculated and a threshold is set, and if the matching degree exceeds the threshold, the unscored case is scored using the scored case.
Calculating the matching degree of the processing opinions X and Y of the two case texts specifically includes:
calculating the association degree between a processing opinion and its feature words: for the processing opinion X, each extracted feature word is denoted x_i; counting the number m of occurrences of x_i in the text of X and the total number n of feature words in X gives the frequency of occurrence px_i = m/n.
The association degree corr(X, x_i) between the processing opinion X and its feature word x_i is determined from this frequency.
Calculating the association degree of the feature words x and y: by the mutual-information principle, when x occurs frequently and y occurs frequently but x and y rarely occur together, their association degree is considered very small; when x occurs frequently, y occurs frequently, and x and y also frequently occur together, their association degree is larger. The association coefficient and the frequency coefficients of the feature words are designed accordingly, yielding the association degree corr(x, y) of the feature words x and y,
wherein Nx is the total number of cases containing the feature word x, Ny is the total number of cases containing the feature word y, and Nxy is the total number of cases containing both x and y.
The matching degree corr(X, Y) of the two processing opinions is then obtained,
where l1 and l2 are the numbers of feature words in the processing opinions X and Y respectively, and y_j denotes a feature word of the processing opinion Y.
A matching-degree threshold T is set; when corr(X, Y) ≥ T, the two processing opinions are considered similar and the score of the processing opinion Y is used as the score of the processing opinion X; here T is taken as 0.6.
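The patent's exact association and matching formulas are given only in its drawings; as an illustrative stand-in under that caveat, a pointwise-mutual-information-style association over the case counts Nx, Ny, Nxy, averaged over the l1 × l2 feature-word pairs of the two opinions, can be sketched as:

```python
import math

def word_assoc(nx, ny, nxy, total):
    """PMI-style stand-in for the association degree of feature words x and
    y: large when x and y co-occur often relative to their individual case
    counts Nx and Ny (nxy = cases containing both), small or zero otherwise."""
    if nxy == 0:
        return 0.0
    return max(0.0, math.log((nxy * total) / (nx * ny)))

def matching_degree(words_x, words_y, case_counts, pair_counts, total):
    """Average the pairwise association over the l1 x l2 feature-word pairs
    of the two processing opinions X and Y."""
    pairs = [(x, y) for x in words_x for y in words_y]
    if not pairs:
        return 0.0
    s = sum(word_assoc(case_counts[x], case_counts[y],
                       pair_counts.get(frozenset((x, y)), 0), total)
            for x, y in pairs)
    return s / len(pairs)
```

A case is then scored from its nearest-matching historical opinion when this value clears the threshold T (0.6 in the text).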
Preferably, in order to make the association calculation more reasonable, word-sense analysis is added: synonyms u_i and near-synonyms v_i of x_i are included (there may of course be one or more synonyms u_i and near-synonyms v_i), with corresponding association degrees:
corr(X, u_i) = corr(X, x_i),
corr(X, v_i) = k1 · corr(X, x_i), k1 ∈ (0, 1),
where k1 is the near-synonym association coefficient, taken here as 0.6.
The matching degree corr(X, Y) of the two processing opinions is then obtained,
where U is the synonym set of the processing opinion X, V is the near-synonym set of the processing opinion X, G is the synonym set of the processing opinion Y, H is the near-synonym set of the processing opinion Y, u_i ∈ U, v_i ∈ V, g_j ∈ G, h_j ∈ H, m1 and m2 are the numbers of words in the sets U and G respectively, and n1 and n2 are the numbers of words in the sets V and H respectively.
And setting a matching degree threshold T, and when corr (X, Y) is more than or equal to T, using the score of the processing opinion Y as the score of the processing opinion X.
Further, if corr (X, Y) < T, the comment is replaced with the processing opinion according to the scoring method based on the comment, the feature words of the processing opinion are extracted first by using the word segmentation algorithm word2vec, and then the satisfaction scoring is performed by using the multi-term logistic regression (MLR).
In a second aspect, an embodiment of the present disclosure provides a talent evaluation method for the big data case evaluation model, the method comprising the following steps:
case scores are obtained according to the design method of the big data case evaluation model; across all question categories, the cases of each handler are graded to obtain the comprehensive score S_ql of each case handler q under each category
(the point scale here is five levels: 4 (expert), 3 (proficient), 2 (skilled), 1 (learning), 0 (novice)),
where Q is the total number of handlers, L is the number of question categories, M_l is the number of cases under each category, and a satisfaction score is assigned to each case of the handler under each category.
The higher the value of S_ql, the stronger the case-handling ability of handler q in question category l, i.e. the better that handler is at cases in the field corresponding to the category.
The comprehensive score S_q of each case handler q is then obtained from the per-category scores.
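The source's formulas for the per-category comprehensive score and for S_q are not reproduced in the text; assuming they are simple means (an assumption, noted again in the comments), a sketch is:

```python
def category_scores(case_scores):
    """case_scores: {category l: [satisfaction scores of handler q's cases]}.
    Assumed per-category comprehensive score S_ql: mean of the case scores."""
    return {l: sum(v) / len(v) for l, v in case_scores.items() if v}

def overall_score(per_category):
    """Assumed comprehensive score S_q: mean of the per-category scores S_ql."""
    return sum(per_category.values()) / len(per_category)

per_cat = category_scores({"land": [4, 2], "forestry": [3]})
```

A weighted mean (e.g. by case count M_l) would be an equally plausible reading; only the aggregation direction is fixed by the text.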
Preferably, since different question categories (hereinafter: categories) differ in processing difficulty: frequently encountered categories are solved quickly while rarely encountered ones are more troublesome, and simple categories are finished quickly while troublesome ones usually have long processing periods, the satisfaction score results under each category are further optimized through the category processing period and the category occurrence frequency: for troublesome question categories (longer processing period), the demands on the handler's level are high, so the corresponding satisfaction score is raised accordingly; for rare (infrequently occurring) question categories, the demands on the handler's level are likewise higher, so the corresponding satisfaction score is raised accordingly.
Denote the average processing period corresponding to each question category as t_l months and the number of cases under each category as M_l. The score is optimized by applying score-lifting weight coefficients: a processing-period lifting coefficient is designed according to the relation between the processing period t_l and the score, and an occurrence-frequency lifting coefficient is designed according to the relation between the number of texts M_l under each category and the score, yielding the optimized comprehensive score S_ql of each case handler q under each category, where a and b ∈ (0, 1) are the weight coefficients.
The optimized comprehensive score S_q of each case handler q is then obtained.
In a third aspect, an embodiment of the present disclosure provides a talent base construction method for the big data case evaluation model, the method comprising the following steps:
according to the talent evaluation method of the big data case evaluation model, the comprehensive score S_ql of each case handler q under each category and the comprehensive score S_q of each case handler q are obtained;
according to whether S_ql exceeds a given threshold 1, and the ranking of S_ql, a talent base for question category l (or its corresponding field) is constructed;
according to whether S_q exceeds a given threshold 2, and the ranking of S_q, a talent base for the comprehensive field is constructed.
The higher the value of S_ql, the stronger the case-handling ability of handler q in question category l, i.e. the better that handler is at cases in the field corresponding to the category. From the score value of S_ql or S_q, each person's comprehensive ability and areas of excellence can be given; an example result table follows. We define scores over 3.6 as "expert", over 3 as "proficient", over 2.4 as "skilled", over 1 as "learning", and "novice" otherwise.
Preferably, the thresholds of "expert", "proficient", "skilled", "learning", and "novice" in the talent base are set respectively, obtaining five classes of talent bases.
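Using the thresholds just defined (over 3.6 expert, over 3 proficient, over 2.4 skilled, over 1 learning, otherwise novice), the five-level grading can be written directly:

```python
def grade(score):
    """Map a comprehensive score to the five talent-base levels using the
    thresholds from the text: >3.6 expert, >3 proficient, >2.4 skilled,
    >1 learning, otherwise novice."""
    if score > 3.6:
        return "expert"
    if score > 3:
        return "proficient"
    if score > 2.4:
        return "skilled"
    if score > 1:
        return "learning"
    return "novice"
```

Applying this to S_ql yields per-category talent bases, and to S_q the comprehensive-field base.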
in a fourth aspect, an embodiment of the present disclosure provides a talent base recommendation method for a big data case evaluation model, where the method includes:
For a new case, the "expert" and "proficient" members of the talent base for the corresponding question category are matched first, then the "skilled" and "learning" members; if no matching object exists, the comprehensive-field talent base is matched in the same way.
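A minimal sketch of this tiered matching, assuming each talent base is a mapping from level to a ranked list of handlers (the data layout and names are illustrative, not from the source):

```python
def recommend(category, category_pools, general_pool):
    """Tiered matching: try "expert" then "proficient" in the talent base of
    the case's question category, then "skilled" then "learning"; if no
    match is found, repeat the same search over the comprehensive-field
    talent base."""
    tiers = ["expert", "proficient", "skilled", "learning"]
    for pool in (category_pools.get(category, {}), general_pool):
        for tier in tiers:
            if pool.get(tier):
                return pool[tier][0]  # highest-ranked handler in that tier
    return None

pools = {"land": {"expert": [], "proficient": ["bob"]}}
general = {"skilled": ["ann"]}
```

Falling back to the comprehensive-field base guarantees a recommendation whenever any ranked handler exists.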
The invention has been described above by way of example. Obviously, the specific implementation of the invention is not limited by the above description; various insubstantial modifications made using the method concepts and technical solutions of the invention, or direct application of the concepts and technical solutions of the invention to other occasions without improvement, fall within the protection scope of the invention.