CN112581036B - Design method of big data case evaluation model, talent evaluation method, talent library construction and recommendation method - Google Patents


Info

Publication number
CN112581036B
Authority
CN
China
Prior art keywords
score
processing
case
talent
opinion
Legal status
Active
Application number
CN202011625659.8A
Other languages
Chinese (zh)
Other versions
CN112581036A (en)
Inventor
熊林海 (Xiong Linhai)
周金明 (Zhou Jinming)
Current Assignee
Nanjing Inspector Intelligent Technology Co Ltd
Original Assignee
Nanjing Inspector Intelligent Technology Co Ltd
Application filed by Nanjing Inspector Intelligent Technology Co Ltd
Publication of CN112581036A
Application granted
Publication of CN112581036B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06398 Performance of employee with respect to a job function

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Educational Administration (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • Artificial Intelligence (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a design method for a big data case evaluation model, a talent evaluation method, and a talent library construction and recommendation method. The design method comprises: collecting and storing, by problem category, historical cases together with their scores, comments and processing opinions, the applicants' basic information, and the processing information; determining the processing state of each case and selecting all cases whose processing state is "completed" as historical cases; and, among the historical cases, using the score directly as the satisfaction score where the applicant gave one, and otherwise calculating the satisfaction score from the corresponding comment with a natural language processing (NLP) algorithm and a multinomial logistic regression model. With this evaluation model, cases can be scored effectively from their comments or processing opinions, making case evaluation and talent evaluation quantifiable.

Description

Design method of big data case evaluation model, talent evaluation method, talent library construction and recommendation method
Technical Field
The invention relates to the fields of natural language processing and intelligent case handling, and in particular to a design method for a big data case evaluation model, a talent evaluation method, and a talent library construction and recommendation method.
Background
At present, case handling departments in China rely mainly on manual processing, and therefore heavily on the knowledge and professional skill of their staff; the routing of cases to the responsible unit depends entirely on the experience of the registering staff. A talent screening mechanism is urgently needed to identify the areas in which each department and each staff member excel. During case processing the applicant may score a case directly, but most applicants give only a comment without a score, so the processing result cannot be evaluated quantitatively. An intelligent recommendation algorithm could automatically match each case with the staff best skilled at handling that kind of event, reducing the burden on both case handlers and applicants, improving case handling efficiency, and raising public satisfaction with case handling services.
Disclosure of Invention
To overcome the shortcomings of the prior art, embodiments of the disclosure provide a design method for a big data case evaluation model, a talent evaluation method, and a talent library construction and recommendation method, so that cases can be scored effectively from their comments or processing opinions and case evaluation and talent evaluation become quantifiable. The technical solution is as follows:
In a first aspect, a method for designing a big data case evaluation model is provided, comprising the following steps:
collecting and storing, by problem category, historical cases together with their scores, comments and processing opinions, the applicants' basic information, and the processing information;
determining the processing state of each case, and selecting all cases whose processing state is "completed" as historical cases;
among the historical cases, using the applicant's score directly as the satisfaction score for scored cases, and, for unscored cases, calculating the satisfaction score from the corresponding comment with a natural language processing (NLP) algorithm and a multinomial logistic regression (MLR) model;
extracting the feature words of each comment with word2vec and then scoring satisfaction with multinomial logistic regression, specifically:
randomly selecting a number of training and test samples from the historical cases by random sampling, where the number of selected samples is N and the random seed is denoted R; among these samples, cases that already have a score can be used directly, while each comment of an unscored sample must be scored with a text labeling tool;
first, the model parameters are obtained from the training samples with the MLR regression equation, as follows:
let the comment text of a case be $X_j$, $j = 1,\dots,N_{train}$, where $N_{train}$ is the number of training samples, and denote the corresponding satisfaction score $S_j$. Each feature word extracted by word2vec is denoted $x_{ij}$, $i = 1,\dots,n$, where $n$ is the total number of words in the comment text $X_j$. If $x_{ij}$ occurs $m$ times in the text, its frequency of occurrence is
$p_{ij} = m / n$.
The score of the comment text is assigned to each feature word $x_{ij}$ in it, i.e. the score of a feature word is the score $S_j$ of the comment text in which it appears. Each comment text therefore has a corresponding set of triples
$\{(x_{ij}, p_{ij}, S_j) \mid i = 1,\dots,n\}$.
The weight coefficients $W$ of the feature words are obtained from the MLR regression equation, and the trained parameters are then used to calculate the satisfaction score $S_k$ of each comment text in the test set:
[formula image not reproduced in the source text]
where $w_{ik} \in W$, $n$ is the number of words in the text, and $k = 1,\dots,N_{test}$.
The prediction accuracy P is then calculated by comparing each test-set satisfaction score $S_k$ with the corresponding true score:
[symbol for the true score and formula for P not reproduced in the source text]
where $N_{test}$ is the number of test samples.
Traversing the random seed R of the random sampling yields different training and test sets, and hence different MLR weight coefficients W and model accuracies; the model with the highest accuracy is taken as the scoring model.
Preferably, the method further optimizes the satisfaction score $S_k$, specifically:
let $L$ be the number of problem categories and $M_l$, $l = 1,\dots,L$, the number of cases in category $l$, with corresponding scores $S_{jl}$, $j = 1,\dots,M_l$, and a processing period of $t$ months. The score is optimized by adding score-lifting weight coefficients. A processing-period lifting coefficient is designed from the relationship between the processing period $t$ and the score:
[formula image not reproduced in the source text]
An occurrence-frequency lifting coefficient is designed from the relationship between the number of texts $M_l$ in each category and the score:
[formula image not reproduced in the source text]
The optimized score is then:
[formula images not reproduced in the source text]
where $a, b \in (0,1)$ are weight coefficients;
the prediction accuracy P is then calculated by comparing each optimized test-set satisfaction score with the corresponding true score:
[symbols and formula for P not reproduced in the source text]
Traversing the random seed R yields different training and test sets and the corresponding MLR weight coefficients W; the parameters a and b are then traversed, the accuracy of each resulting model is computed, and the model with the highest accuracy is taken as the scoring model.
Further, for a case with neither a score nor a comment, its processing opinion, denoted X, is matched against the processing opinions Y of the historical cases, and the score of the historical case with the most similar processing opinion is used. Specifically, the matching degree of the two processing opinions is calculated and compared with a threshold; if the matching degree reaches the threshold, the scored case's score is assigned to the unscored case;
the matching degree of the processing opinions X and Y of two cases is calculated as follows:
calculate the degree of association between a processing opinion and its feature words: denote each feature word extracted from processing opinion X as $x_i$, count the number of occurrences $m$ of $x_i$ in X and the total number $n$ of feature words in X, giving the frequency of occurrence $p_{x_i} = m/n$;
the degree of association between processing opinion X and its feature word $x_i$ is:
[formula image not reproduced in the source text]
calculate the relevance of feature words x and y. A relevance coefficient and feature-word frequency coefficients are determined using mutual information: when x and y each occur frequently but rarely occur together, their relevance is low; when x and y each occur frequently and also frequently occur together, their relevance is high. The relevance coefficient is designed as
[formula image not reproduced in the source text]
and the frequency coefficients of the two feature words are
[formula images not reproduced in the source text]
so that the relevance of feature words x and y is:
[formula image not reproduced in the source text]
where $N_x$ is the number of cases containing feature word x, $N_y$ is the number of cases containing feature word y, and $N_{xy}$ is the number of cases containing both x and y;
the matching degree corr(X, Y) of the two processing opinions is:
[formula image not reproduced in the source text]
where $l_1$ and $l_2$ are the numbers of words in processing opinions X and Y respectively, and $y_j$ is a feature word of case text Y;
a matching degree threshold T is set, and when corr(X, Y) ≥ T, the score of processing opinion Y is used as the score of processing opinion X.
Preferably, the method further comprises adding synonyms $u_i$ and near-synonyms $v_i$ of $x_i$, with the corresponding degrees of association:
$\mathrm{corr}(X, u_i) = \mathrm{corr}(X, x_i)$,
$\mathrm{corr}(X, v_i) = k_1 \cdot \mathrm{corr}(X, x_i)$, $k_1 \in (0,1)$,
where $k_1$ is the association coefficient for near-synonyms;
the matching degree corr(X, Y) of the two processing opinions is then:
[formula image not reproduced in the source text]
where U and V are the synonym and near-synonym sets of processing opinion X, G and H are the synonym and near-synonym sets of processing opinion Y, $u_i \in U$, $v_i \in V$, $g_j \in G$, $h_j \in H$, $m_1$ and $m_2$ are the numbers of words in sets U and G, and $n_1$ and $n_2$ are the numbers of words in sets V and H;
a matching degree threshold T is set, and when corr(X, Y) ≥ T, the score of processing opinion Y is used as the score of processing opinion X.
Further, if corr(X, Y) < T, the comment-based scoring method is applied with the processing opinion in place of the comment: the feature words of the processing opinion are extracted with word2vec, and satisfaction is then scored with multinomial logistic regression (MLR).
In a second aspect, a talent evaluation method based on the big data case evaluation model is provided, comprising the following steps:
from the case scores obtained by the design method of the big data case evaluation model in any of the possible implementations, the composite score of each case handler q in each category is obtained from that handler's case scores in each problem category:
[formula images not reproduced in the source text]
where Q is the total number of case handlers, L is the number of problem categories, $M_l$ is the number of cases in problem category l, and the summed terms [symbol not reproduced in the source text] are the satisfaction scores of the handler's cases in each category;
the composite score $S_q$ of each case handler q is then obtained:
[formula image not reproduced in the source text]
Preferably, the method further comprises optimizing the satisfaction score results in each category using the processing periods and occurrence frequencies of the different categories:
let the average processing period of problem category l be $t_l$ months and the number of cases in each category be $M_l$. The score is optimized by adding score-lifting weight coefficients: a processing-period lifting coefficient is designed from the relationship between $t_l$ and the score,
[formula image not reproduced in the source text]
and an occurrence-frequency lifting coefficient is designed from the relationship between the number of texts $M_l$ in each category and the score,
[formula image not reproduced in the source text]
giving the optimized composite score of each case handler q in each category:
[formula images not reproduced in the source text]
where $a, b \in (0,1)$ are weight coefficients;
the optimized composite score $S_q$ of each case handler q is:
[formula image not reproduced in the source text]
In a third aspect, a talent library construction method based on the big data case evaluation model is provided, comprising the following steps: the composite score of each case handler q in each category [symbol not reproduced in the source text] and the composite score $S_q$ of each case handler q are obtained by the talent evaluation method of the big data case evaluation model in any of the possible implementations;
a talent library for problem category l (or the corresponding field) is constructed according to whether each handler's category score exceeds a given threshold 1 and according to the ranking of those scores;
a talent library for the comprehensive field is constructed according to whether $S_q$ exceeds a given threshold 2 and according to the ranking of $S_q$.
Preferably, the method further comprises setting separate thresholds for the "expert", "proficient", "skilled", "familiar" and "novice" levels in the talent library, giving five types of talent library.
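The library construction in the third aspect, together with the level thresholds above, can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the input dictionaries, the strict "greater than" comparison against thresholds 1 and 2, and the example cut-offs in `LEVELS` are hypothetical, since the source gives no threshold values.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]  # (handler q, score), highest score first

# Hypothetical level cut-offs on the 0-4 scale; the source does not state values.
LEVELS: Sequence[Tuple[float, str]] = (
    (3.5, "expert"), (2.5, "proficient"), (1.5, "skilled"), (0.5, "familiar"), (0.0, "novice"),
)

def build_category_libraries(
    category_scores: Dict[str, Dict[str, float]],  # category l -> {handler q: category score}
    threshold_1: float,
) -> Dict[str, Ranked]:
    """Keep handlers whose category score exceeds threshold 1, ranked by score."""
    libraries: Dict[str, Ranked] = {}
    for category, scores in category_scores.items():
        kept = [(q, s) for q, s in scores.items() if s > threshold_1]
        libraries[category] = sorted(kept, key=lambda item: item[1], reverse=True)
    return libraries

def build_general_library(composite: Dict[str, float], threshold_2: float) -> Ranked:
    """Comprehensive-field library: handlers whose S_q exceeds threshold 2, ranked."""
    kept = [(q, s) for q, s in composite.items() if s > threshold_2]
    return sorted(kept, key=lambda item: item[1], reverse=True)

def assign_level(score: float, levels: Sequence[Tuple[float, str]] = LEVELS) -> str:
    """Return the first level whose cut-off the score reaches (levels listed high to low)."""
    for cutoff, label in levels:
        if score >= cutoff:
            return label
    return levels[-1][1]  # scores below every cut-off fall into the lowest level
```

Sorting after filtering keeps each library ordered best first, which is what a later matching step would scan.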
In a fourth aspect, a talent library recommendation method based on the big data case evaluation model is provided, comprising the following steps: using the talent library constructed by the talent library construction method of the big data case evaluation model in any of the possible implementations, a new case is first matched against the "expert" and "proficient" levels of the talent library for the corresponding problem category, and then against the "skilled" and "familiar" levels; if no match is found, the comprehensive-field talent library is searched in the same way.
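A rough sketch of this tiered matching follows. It assumes, hypothetically, that each library is a list of (handler, level) pairs already ranked best first and that the level labels match those used above; returning `None` when nothing matches is also an illustrative choice.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Library = List[Tuple[str, str]]  # (handler q, level), already ranked best first

# Matching order described in the fourth aspect: the top two levels first, then the next two.
TIERS: Sequence[Tuple[str, ...]] = (("expert", "proficient"), ("skilled", "familiar"))

def first_match(library: Library, tiers: Sequence[Tuple[str, ...]] = TIERS) -> Optional[str]:
    """Return the highest-ranked handler in the earliest tier that has anyone."""
    for tier in tiers:
        for handler, level in library:
            if level in tier:
                return handler
    return None

def recommend(category: str, category_libraries: Dict[str, Library],
              general_library: Library) -> Optional[str]:
    """Try the library of the case's problem category, then the comprehensive-field library."""
    match = first_match(category_libraries.get(category, []))
    if match is None:
        match = first_match(general_library)
    return match
```

For example, with a category library `[("A", "skilled"), ("B", "expert")]`, handler B is chosen because the first tier is exhausted before the second is considered.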
Compared with the prior art, the technical solution has the following beneficial effects:
cases are scored effectively from their comments or processing opinions, making case evaluation and talent evaluation quantifiable, and the scoring model is further optimized by taking into account how difficult an event is to process and how many comparable existing cases are available for reference. Departments that handle cases gain an efficient, intelligent talent screening mechanism, and an accurate, big-data-driven talent library is built from the areas in which staff excel. For a new case, the most skilled staff member can be matched and recommended automatically according to staff ability profiles, which greatly improves work efficiency and prevents minor incidents from escalating into larger ones because of insufficient staff ability; a reasonable performance assessment mechanism can also be set up from the talent library rankings.
Detailed Description
To clarify the technical solution and working principle of the invention, embodiments of the disclosure are described in further detail below. All of the optional technical solutions above may be combined arbitrarily to form optional embodiments of the disclosure, which are not described again here.
It should be noted that the solution of the invention can be applied in any field whose mode of operation is similar to case evaluation.
In a first aspect, an embodiment of the present disclosure provides a method for designing a big data case evaluation model, where the method includes the following steps:
Historical cases, applicants' basic information and processing information are collected through an OA system and stored by problem category, and are stored separately according to processing track (i.e. processing state: not processed, being handled/under supervision, and completed). The problem categories currently comprise 415 subdivided areas, such as urban and rural construction, land contracting, and returning farmland to forest.
The processing state of each case is determined, and all cases whose processing state is "completed" are selected as historical cases;
the satisfaction scores of all completed cases are then calculated with a natural language processing (NLP) algorithm and a multinomial logistic regression model.
During case processing, the applicant may score a case directly, for example on a five-level scale: 4 points (very satisfied), 3 points (satisfied), 2 points (neutral), 1 point (dissatisfied) and 0 points (very dissatisfied). Most of the time, however, the applicant gives only a comment and no score; to quantify case evaluation and talent evaluation, such cases must be scored from their comments.
The feature words of the comment are extracted with word2vec, and satisfaction is then scored with multinomial logistic regression (MLR), as follows:
A number of samples (covering both the training set and the test set) are selected from the historical cases by random sampling; the number of selected samples is N (here 10000) and the random seed is denoted R. Cases that already have a star rating can be used directly as samples, while each comment of a sample without a direct star rating must be scored on the five-level scale with a text labeling tool.
The sample data are split into a test set and a training set in a 20/80 ratio: the test set size is $N_{test} = \lceil 0.2 N \rceil$ and the training set size is $N_{train} = N - N_{test}$, where $\lceil\cdot\rceil$ denotes rounding up.
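The sampling and 20/80 split can be sketched as below. This is an illustrative Python version only: the function name and the use of `random.Random` as the seeded generator are assumptions, not the source's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def sample_and_split(cases: Sequence[T], n: int, seed: int) -> Tuple[List[T], List[T]]:
    """Draw N cases with random seed R, then split them 80/20 into (train, test)."""
    rng = random.Random(seed)            # a fixed seed R makes the draw reproducible
    sample = rng.sample(list(cases), n)  # N distinct cases, in random order
    n_test = math.ceil(0.2 * n)          # N_test = ceil(0.2 * N)
    return sample[n_test:], sample[:n_test]  # N_train = N - N_test
```

Because the draw depends only on R, calling it with a different seed yields the different training and test sets that the later model selection step traverses.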
First, the model parameters are obtained from the training samples with the MLR regression equation, as follows:
let the comment texts be $X_j$, $j = 1,\dots,N_{train}$, with corresponding satisfaction scores $S_j$, $j = 1,\dots,N_{train}$. Each feature word extracted by word2vec is denoted $x_{ij}$, $i = 1,\dots,n$, where $n$ is the total number of words in the comment text $X_j$. If $x_{ij}$ occurs $m$ times in the text, its frequency of occurrence is
$p_{ij} = m / n$.
The score of the comment text is assigned to each feature word $x_{ij}$ in it (the score of a feature word is the score $S_j$ of the comment text in which it appears), so each comment text has a corresponding set of triples
$\{(x_{ij}, p_{ij}, S_j) \mid i = 1,\dots,n\}$.
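Building the triple set for one comment is straightforward once the comment has been split into words. The sketch below assumes, hypothetically, that segmentation has already produced a token list; it only computes the frequency $p_{ij} = m/n$ and attaches the comment's score to every feature word.

```python
from collections import Counter
from typing import List, Set, Tuple

def feature_triples(tokens: List[str], score: int) -> Set[Tuple[str, float, int]]:
    """Build {(x_ij, p_ij, S_j)} for one comment: word, frequency m/n, and comment score."""
    n = len(tokens)           # n: total number of words in the comment
    counts = Counter(tokens)  # m: occurrences of each distinct word
    return {(word, m / n, score) for word, m in counts.items()}
```

For a four-word comment in which "slow" appears twice, "slow" receives frequency 2/4 = 0.5 and inherits the comment's score.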
The weight coefficients $W$ of the feature words are obtained from the MLR regression equation, with the feature words as the regression input and the satisfaction score as the output, as shown in the following table:
[table image not reproduced in the source text]
The trained parameters of the training set (the feature-word weight coefficients W) are used to calculate the satisfaction score of each comment text in the test set, denoted $S_k$, $k = 1,\dots,N_{test}$:
[formula image not reproduced in the source text]
where $w_{ik} \in W$ and $n$ is the number of words in the text.
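The source gives its regression and scoring formulas only as images. As a stand-in, the sketch below trains a standard softmax (multinomial logistic) regression on the word frequencies $p_{ij}$ with plain batch gradient descent and predicts one of the five grades 0 to 4. The learning rate, epoch count, bias feature and sparse dictionary representation are assumptions, and this is not claimed to be the patent's exact model.

```python
import math
from typing import Dict, List

Features = Dict[str, float]  # feature word x_ij -> frequency p_ij
BIAS = "<bias>"              # hypothetical constant feature acting as an intercept

def softmax(z: List[float]) -> List[float]:
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def logits(weights: List[Dict[str, float]], x: Features) -> List[float]:
    feats = {**x, BIAS: 1.0}
    return [sum(w.get(f, 0.0) * v for f, v in feats.items()) for w in weights]

def train_mlr(samples: List[Features], labels: List[int], n_classes: int = 5,
              lr: float = 0.5, epochs: int = 200) -> List[Dict[str, float]]:
    """Batch gradient descent on multinomial cross-entropy; one weight dict per score grade."""
    weights: List[Dict[str, float]] = [{} for _ in range(n_classes)]
    for _ in range(epochs):
        grads: List[Dict[str, float]] = [{} for _ in range(n_classes)]
        for x, y in zip(samples, labels):
            probs = softmax(logits(weights, x))
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == y else 0.0)  # dLoss/dlogit_c
                for f, v in {**x, BIAS: 1.0}.items():
                    grads[c][f] = grads[c].get(f, 0.0) + err * v
        for c in range(n_classes):
            for f, g in grads[c].items():
                weights[c][f] = weights[c].get(f, 0.0) - lr * g / len(samples)
    return weights

def predict(weights: List[Dict[str, float]], x: Features) -> int:
    """Predicted satisfaction grade: the class with the largest logit."""
    z = logits(weights, x)
    return max(range(len(z)), key=z.__getitem__)
```

Treating the five grades as classes matches the fixed 0 to 4 scale; for the 10000-sample setting described above, a library implementation would be the practical choice.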
A handler's level is reflected not only in the ordinary scores but also in the type and difficulty of the cases handled: for example, if handler A has dealt with many difficult cases, A's level should be rated correspondingly higher. Different problem categories (categories for short) differ in processing difficulty: in practice, familiar problems can be resolved quickly, whereas rarely encountered or entirely new cases are more troublesome; simple cases are closed quickly, while troublesome cases have longer processing periods. Extensive preliminary research established a latent association between the number of cases of each type and their processing difficulty. The satisfaction score results are therefore further optimized using the event processing period and the event occurrence frequency: troublesome cases (longer processing period) demand a higher handler level, so their satisfaction scores are raised accordingly; rare cases (low occurrence frequency) also demand a higher handler level, so their satisfaction scores are likewise raised. Preferably, the satisfaction score S is further optimized as follows:
let $L$ be the number of problem categories and $M_l$, $l = 1,\dots,L$, the number of cases in category $l$, with corresponding scores $S_{jl}$, $j = 1,\dots,M_l$, and a processing period of $t$ months. The score is optimized by adding score-lifting weight coefficients. A processing-period lifting coefficient is designed from the relationship between the processing period $t$ and the score:
[formula image not reproduced in the source text]
An occurrence-frequency lifting coefficient is designed from the relationship between the number of texts $M_l$ in each category and the score:
[formula image not reproduced in the source text]
This achieves the goal of optimizing the satisfaction score; when a score is raised, it must not exceed the maximum score of 4 points.
The optimized score is then:
[formula images not reproduced in the source text]
where $a, b \in (0,1)$ are weight coefficients.
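Because the lifting-coefficient formulas survive only as images, the sketch below uses hypothetical stand-ins: a period lift proportional to $t / t_{max}$, a frequency lift of $1 - M_l / M_{max}$, and a multiplicative combination with weights $a, b \in (0,1)$. Only the cap at the maximum score of 4 and the range of a and b come from the text.

```python
MAX_SCORE = 4.0  # highest grade on the 0-4 scale; lifted scores may not exceed it

def period_lift(t_months: float, t_max: float) -> float:
    """Hypothetical: grows with the processing period (harder cases take longer)."""
    return t_months / t_max

def frequency_lift(m_l: int, m_max: int) -> float:
    """Hypothetical: grows as the category becomes rarer (fewer cases M_l)."""
    return 1.0 - m_l / m_max

def lifted_score(score: float, p_lift: float, f_lift: float, a: float, b: float) -> float:
    """Hypothetical combination S' = S * (1 + a*p_lift + b*f_lift), capped at 4."""
    if not (0.0 < a < 1.0 and 0.0 < b < 1.0):
        raise ValueError("weight coefficients a and b must lie in (0, 1)")
    return min(MAX_SCORE, score * (1.0 + a * p_lift + b * f_lift))
```

Whatever the true lift formulas are, applying `min` last is one simple way to honor the stated 4-point ceiling.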
The prediction accuracy P is calculated as follows. From
[formula image not reproduced in the source text]
the predicted values of the test-set samples are obtained,
[symbol not reproduced in the source text]
and the corresponding actual scores are denoted
[symbol not reproduced in the source text]
$k = 1,\dots,N_{test}$, where $N_{test}$ is the number of test samples;
[formula for P not reproduced in the source text]
Traversing the random seed R yields different training and test sets and the corresponding MLR model parameters W; the parameters a and b are then traversed, the accuracy of each resulting model is computed, and the model with the highest accuracy is taken as the scoring model.
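The traversal over R, a and b amounts to a grid search that keeps the most accurate model. In the sketch below the accuracy P is assumed, since its formula is an image in the source, to be the share of test samples whose rounded prediction equals the true grade; `evaluate` is a hypothetical callback that trains and tests one configuration.

```python
from itertools import product
from typing import Any, Callable, Iterable, Optional, Sequence, Tuple

def accuracy(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Assumed P: share of test samples whose rounded prediction equals the true grade.
    Note that Python's round() rounds halves to the nearest even integer."""
    hits = sum(1 for p, s in zip(predicted, actual) if round(p) == round(s))
    return hits / len(actual)

Evaluate = Callable[[int, float, float], Tuple[float, Any]]  # (seed, a, b) -> (P, model)

def select_model(seeds: Iterable[int], a_grid: Sequence[float], b_grid: Sequence[float],
                 evaluate: Evaluate) -> Optional[Tuple[float, int, float, float, Any]]:
    """Grid over seed R and weights a, b; keep the (P, R, a, b, model) with the highest P."""
    best = None
    for seed, a, b in product(seeds, a_grid, b_grid):
        p, model = evaluate(seed, a, b)
        if best is None or p > best[0]:
            best = (p, seed, a, b, model)
    return best
```

Keeping the seed alongside a and b records which random split produced the winning model, so it can be reproduced.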
Preferably: in practice, in many cases the applicant gives neither a score nor any comment. For such a case, its processing opinion X is matched against the processing opinions of the historical cases Y, and the score of the historical case with the most similar processing opinion is used. Specifically, the matching degree of the two processing opinions is calculated and compared with a threshold; if the matching degree reaches the threshold, the scored case's score is assigned to the unscored case.
The matching degree of the processing opinions X and Y of two case texts is calculated as follows:
calculate the degree of association between a processing opinion and its feature words: for processing opinion X, denote each extracted feature word $x_i$, count the number of occurrences $m$ of $x_i$ in X and the total number $n$ of feature words in X, giving the frequency of occurrence $p_{x_i} = m/n$.
The degree of association between processing opinion X and its feature word $x_i$ is:
[formula image not reproduced in the source text]
Calculate the relevance of feature words x and y. Following the principle of mutual information, the relevance is considered very small when x and y each occur frequently but rarely occur together, and larger when x and y each occur frequently and also frequently occur together. The relevance coefficient is designed as
[formula image not reproduced in the source text]
and the frequency coefficients of the two feature words are
[formula images not reproduced in the source text]
so that the relevance of feature words x and y is:
[formula image not reproduced in the source text]
where $N_x$ is the number of cases containing feature word x, $N_y$ is the number of cases containing feature word y, and $N_{xy}$ is the number of cases containing both x and y;
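The counts $N_x$, $N_y$ and $N_{xy}$ can be gathered in one pass over the cases. Since the patent's own relevance formula (which also involves frequency coefficients) is an image, the sketch below substitutes standard pointwise mutual information, $\log\frac{N_{xy} N}{N_x N_y}$, which shows the behavior described: high when x and y co-occur more often than chance. The total case count $N$ is an assumption not defined in the text.

```python
import math
from typing import Iterable, Set, Tuple

def cooccurrence_counts(cases: Iterable[Set[str]], x: str, y: str) -> Tuple[int, int, int, int]:
    """Return (N_x, N_y, N_xy, N): cases containing x, y, both, and the total number of cases."""
    n_x = n_y = n_xy = n = 0
    for words in cases:
        n += 1
        has_x, has_y = x in words, y in words
        n_x += has_x
        n_y += has_y
        n_xy += has_x and has_y
    return n_x, n_y, n_xy, n

def pmi(n_x: int, n_y: int, n_xy: int, n: int) -> float:
    """Pointwise mutual information log(P(x,y) / (P(x)P(y))); -inf if x and y never co-occur."""
    if n_xy == 0:
        return float("-inf")
    return math.log((n_xy * n) / (n_x * n_y))
```

PMI is used here only as a familiar, well-defined stand-in for the mutual-information idea the text describes.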
the matching degree corr(X, Y) of the two processing opinions is:
[formula image not reproduced in the source text]
where $l_1$ and $l_2$ are the numbers of words in processing opinions X and Y respectively, and $y_j$ is a feature word of case text Y.
A matching degree threshold T is set; when corr(X, Y) ≥ T, the two processing opinions are considered similar and the score of processing opinion Y is used as the score of processing opinion X. Here T = 0.6.
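The threshold rule can be sketched as: find the most similar historical opinion and transfer its score only if the matching degree reaches T = 0.6. Because the corr(X, Y) formula is an image in the source, the sketch plugs in a hypothetical Jaccard word-overlap function; any other similarity can be passed in its place.

```python
from typing import Callable, List, Optional, Sequence, Tuple

T_MATCH = 0.6  # matching-degree threshold T from the text

Tokens = List[str]
Corr = Callable[[Tokens, Tokens], float]

def jaccard(x: Tokens, y: Tokens) -> float:
    """Hypothetical stand-in for corr(X, Y): word-set overlap in [0, 1]."""
    sx, sy = set(x), set(y)
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0

def transfer_score(opinion_x: Tokens, history: Sequence[Tuple[Tokens, int]],
                   corr: Corr = jaccard, threshold: float = T_MATCH) -> Optional[int]:
    """Score of the most similar historical opinion Y if corr(X, Y) >= T, else None."""
    best_corr, best_score = float("-inf"), None
    for opinion_y, score_y in history:
        c = corr(opinion_x, opinion_y)
        if c > best_corr:
            best_corr, best_score = c, score_y
    return best_score if best_corr >= threshold else None
```

Returning `None` below the threshold leaves room for the fallback in which the processing opinion itself is scored.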
Preferably, to make the relevance calculation more reasonable, word-sense analysis is added: synonyms $u_i$ and near-synonyms $v_i$ of $x_i$ are added (each $x_i$ may have one or more synonyms and near-synonyms), with the corresponding degrees of association:
$\mathrm{corr}(X, u_i) = \mathrm{corr}(X, x_i)$,
$\mathrm{corr}(X, v_i) = k_1 \cdot \mathrm{corr}(X, x_i)$, $k_1 \in (0,1)$,
where $k_1$ is the association coefficient for near-synonyms, here set to 0.6.
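The two association rules above translate directly into a dictionary expansion. The sketch assumes hypothetical synonym and near-synonym lookup tables as inputs, and it adopts one policy the text leaves open: a word that is already a feature word, or is reached twice, keeps the first value assigned to it.

```python
from typing import Dict, Iterable, Mapping

K1 = 0.6  # near-synonym association coefficient k1 from the text

def expand_associations(assoc: Mapping[str, float],
                        synonyms: Mapping[str, Iterable[str]],
                        near_synonyms: Mapping[str, Iterable[str]],
                        k1: float = K1) -> Dict[str, float]:
    """Add corr(X, u_i) = corr(X, x_i) for synonyms and k1 * corr(X, x_i) for near-synonyms."""
    expanded = dict(assoc)  # original feature words keep their own association
    for word, value in assoc.items():
        for u in synonyms.get(word, ()):
            expanded.setdefault(u, value)
        for v in near_synonyms.get(word, ()):
            expanded.setdefault(v, k1 * value)
    return expanded
```

With corr(X, "repair") = 0.5, a synonym such as "fix" gets 0.5 and a near-synonym such as "maintain" gets 0.6 × 0.5 = 0.3.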
The matching degree corr(X, Y) of the two processing opinions is then:
[formula image not reproduced in the source text]
where U and V are the synonym and near-synonym sets of processing opinion X, G and H are the synonym and near-synonym sets of processing opinion Y, $u_i \in U$, $v_i \in V$, $g_j \in G$, $h_j \in H$, $m_1$ and $m_2$ are the numbers of words in sets U and G, and $n_1$ and $n_2$ are the numbers of words in sets V and H.
A matching degree threshold T is set; when corr(X, Y) ≥ T, the score of processing opinion Y is used as the score of processing opinion X.
Further, if corr(X, Y) < T, the comment-based scoring method is applied with the processing opinion in place of the comment: the feature words of the processing opinion are first extracted with word2vec, and satisfaction is then scored with multinomial logistic regression (MLR).
In a second aspect, an embodiment of the present disclosure provides a talent evaluation method for a big data case evaluation model, where the method includes the following steps:
case scores are obtained by the design method of the big data case evaluation model, and the case scores of each handler under each question category are aggregated to obtain the composite score of each case handler q under each category,
[formula image]
(the scale here has five levels: 4 points (expert), 3 points (proficient), 2 points (skilled), 1 point (familiar), 0 points (novice)):
[formula image]
where Q is the total number of case handlers, L is the number of question categories, M_l is the number of cases under each category, and
[formula image]
is the satisfaction score of each of the handler's cases under each category.
[formula image]
The higher this value, the stronger case handler q's ability to handle cases in question category l, i.e., the better q is at handling cases in the field corresponding to that category.
The composite score S_q of each case handler q is:
[formula image]
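The per-category and composite formulas appear only as images. As an illustration only, the sketch below assumes that handler q's score in category l is the mean of q's case satisfaction scores in l, and that S_q is the unweighted mean of q's per-category scores; both aggregation rules and the function names are assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def category_scores(records: Iterable[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """records: (handler q, question category l, satisfaction score of one case).

    ASSUMPTION: q's score in category l is the mean of q's case scores in l.
    """
    buckets: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for handler, category, score in records:
        buckets[handler][category].append(score)
    return {q: {cat: mean(vals) for cat, vals in cats.items()} for q, cats in buckets.items()}


def composite_scores(per_category: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """ASSUMPTION: S_q is the unweighted mean of q's per-category scores."""
    return {q: mean(list(cats.values())) for q, cats in per_category.items()}
```

A weighted mean (for example by the number of cases M_l) would be an equally plausible reading; the unweighted version is used here only because it is the simplest.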
Preferably, because different question categories (categories for short) differ in processing difficulty, the satisfaction score under each category is further optimized using the category's processing period and occurrence frequency. Categories that were common in the past can be resolved quickly, whereas categories that were rarely encountered are more troublesome and usually take longer to process. For categories with a longer processing period, the demands on the handler's level are higher, so the corresponding satisfaction score is raised; for rare categories (very low occurrence frequency), the demands on the handler's level are also higher, so the corresponding satisfaction score is raised as well.
Let the average processing period of each question category l be t_l months and the number of cases under each category be M_l. The score is optimized by adding score-lifting weighting factors. Based on the relationship between the processing period t_l and the score, the processing-period lifting coefficient is designed as
[formula image]
Based on the relationship between the number of texts M_l under each category and the score, the occurrence-frequency lifting coefficient is designed as
[formula image]
This gives the optimized composite score of each case handler q under each category:
[formula image]
[formula image]
where a, b ∈ (0, 1) are weight coefficients.
The optimized composite score S_q of each case handler q is:
[formula image]
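The actual lifting coefficients are given only as images, so the sketch below uses hypothetical stand-ins that follow the stated direction of the effect: a period term t_l / max(t) that grows with the processing period, and a rarity term 1 - M_l / ΣM that grows as a category becomes rarer, combined as base · (1 + a·period + b·rarity) with a, b ∈ (0, 1). None of these forms, nor the default a = b = 0.5, is taken from the patent.

```python
def optimized_category_score(base: float, period_months: float, n_cases: int,
                             max_period: float, total_cases: int,
                             a: float = 0.5, b: float = 0.5) -> float:
    """Raise a handler's category score for long-period and rare categories.

    HYPOTHETICAL lift terms (the patent's own coefficients are not reproduced):
    period_lift = t_l / max_t grows with the category's processing period t_l;
    rarity_lift = 1 - M_l / total grows as the category's case count M_l shrinks.
    """
    if not (0 < a < 1 and 0 < b < 1):
        raise ValueError("a and b must lie in (0, 1)")
    period_lift = period_months / max_period
    rarity_lift = 1.0 - n_cases / total_cases
    return base * (1.0 + a * period_lift + b * rarity_lift)
```

Note that this multiplicative form can push a score above the 0 to 4 scale; any renormalization before applying the level cut-offs is left open here.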
In a third aspect, an embodiment of the present disclosure provides a talent library construction method for a big data case evaluation model, comprising the following steps:
obtaining, by the talent evaluation method of the big data case evaluation model, the composite score of each case handler q under each category,
[formula image]
and the composite score S_q of each case handler q;
constructing the talent library of question category l (or its corresponding field) according to whether
[formula image]
exceeds a given threshold 1 and according to the ranking of the
[formula image]
values;
constructing the talent library of the comprehensive field according to whether S_q exceeds a given threshold 2 and according to the ranking of the S_q values.
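Library construction reduces to a threshold filter followed by a sort. In this sketch the values of threshold 1 and threshold 2, which the text does not fix, are passed in as parameters, ranking is assumed to run from highest to lowest score, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_category_library(scores: Dict[str, Dict[str, float]], category: str,
                           threshold_1: float) -> List[Tuple[str, float]]:
    """Handlers whose score in `category` exceeds threshold 1, highest score first."""
    members = [(q, cats[category]) for q, cats in scores.items()
               if category in cats and cats[category] > threshold_1]
    return sorted(members, key=lambda item: item[1], reverse=True)


def build_comprehensive_library(composite: Dict[str, float],
                                threshold_2: float) -> List[Tuple[str, float]]:
    """Handlers whose composite score S_q exceeds threshold 2, highest score first."""
    members = [(q, s) for q, s in composite.items() if s > threshold_2]
    return sorted(members, key=lambda item: item[1], reverse=True)
```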
[formula image]
The higher this value, the stronger case handler q's ability to handle cases in question category l, i.e., the better q is at handling cases in the field corresponding to that category. From the value of
[formula image]
or S_q, each person's overall ability and areas of expertise can be determined; an example results table is given below. Scores above 3.6 are defined as expert, above 3 as proficient, above 2.4 as skilled, above 1 as familiar, and all others as novice.
Preferably, separate thresholds are set for the 'expert', 'proficient', 'skilled', 'familiar' and 'novice' levels of the talent library, yielding five types of talent library.
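The level cut-offs stated above (strictly above 3.6, 3, 2.4 and 1) translate directly into a lookup, and grouping handlers by level gives the five talent libraries. The names are hypothetical, and reading "above" as a strict comparison is an assumption.

```python
from typing import Dict, List

# Cut-offs from the text: >3.6 expert, >3 proficient, >2.4 skilled, >1 familiar, else novice.
LEVELS = [(3.6, "expert"), (3.0, "proficient"), (2.4, "skilled"), (1.0, "familiar")]


def level_of(score: float) -> str:
    """Map a score on the 0 to 4 scale to its talent level."""
    for cutoff, name in LEVELS:
        if score > cutoff:
            return name
    return "novice"


def split_into_libraries(scores: Dict[str, float]) -> Dict[str, List[str]]:
    """Group handlers into the five talent libraries, best score first within each."""
    libraries: Dict[str, List[str]] = {name: [] for _, name in LEVELS}
    libraries["novice"] = []
    for q, s in sorted(scores.items(), key=lambda item: item[1], reverse=True):
        libraries[level_of(s)].append(q)
    return libraries
```

For instance, a score of exactly 3.6 is not "above 3.6", so `level_of(3.6)` returns "proficient".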
In a fourth aspect, an embodiment of the present disclosure provides a talent library recommendation method for a big data case evaluation model, comprising:
for a new case, first matching the 'expert' and 'proficient' members of the talent library for the corresponding question category, then the 'skilled' and 'familiar' members; if there is no match, the talent library of the comprehensive field is searched in the same way.
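The recommendation cascade can be sketched as tiered lookups, assuming each library maps a level name to handlers already sorted from best to worst. The data layout and the function name are assumptions; the tier order follows the text.

```python
from typing import Dict, List

# Matching order from the text: first "expert" and "proficient", then "skilled" and "familiar".
TIERS = [("expert", "proficient"), ("skilled", "familiar")]


def recommend(category_libraries: Dict[str, Dict[str, List[str]]],
              comprehensive_library: Dict[str, List[str]],
              category: str) -> List[str]:
    """Return the candidates of the first non-empty tier, trying the library of the
    case's question category first and the comprehensive-field library second."""
    for library in (category_libraries.get(category, {}), comprehensive_library):
        for tier in TIERS:
            candidates = [q for level in tier for q in library.get(level, [])]
            if candidates:
                return candidates
    return []  # no match in either library
```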
The invention has been described above by way of example. Evidently, the specific implementation of the invention is not limited to the manner described above: insubstantial modifications made using the method concepts and technical solutions of the invention, and direct application of the concept and technical solutions of the invention to other settings without improvement, all fall within the protection scope of the invention.

Claims (10)

1. A design method of a big data case evaluation model is characterized by comprising the following steps:
collecting and storing, by question category, historical cases with their scores, comments and processing opinions, together with basic information on the applicant and processing information;
determining the processing state of each case, and selecting all cases whose processing state is "completed" as historical cases;
among the historical cases, using the score of each rated case directly as its satisfaction score, and, for unrated cases, calculating the satisfaction score from the corresponding comment with a natural language processing (NLP) algorithm and a multinomial logistic regression model;
extracting the feature words of the comment with the word2vec tool, and then scoring satisfaction with multinomial logistic regression (MLR), specifically comprising the following steps:
randomly selecting a certain number of training samples and test samples from the historical cases by random sampling, the number of selected samples being N and the random seed being denoted R; among the training samples, cases with given scores can be used directly, while for samples without scores each comment must be scored with a text labeling tool;
first, obtaining the model parameters by fitting the MLR regression equation on the training-set samples, as follows:
letting the comment text of a case be X_j, j = 1, …, N_train, where N_train is the number of training samples, with corresponding satisfaction score S_j; denoting each feature word extracted by the word2vec tool as x_ij, i = 1, …, n, where n is the total number of words in comment text X_j; if x_ij occurs m times in the text, its frequency of occurrence in the text is p_xij = m/n;
assigning each feature word x_ij in the comment text the score S_j of that text, i.e., the score of a feature word is the score of the comment text in which it appears, so that each comment text yields a corresponding set of triples {(x_11, p_x11, S_1), (x_21, p_x21, S_1), …};
obtaining the weight coefficients W of the feature words from the MLR regression equation, and calculating the satisfaction score S_k of each test-set comment text with the model parameters trained on the training set:
[formula image]
where w_ik ∈ W, N is the number of words in the text, and k = 1, …, N_test;
calculating the prediction accuracy P, where the true score corresponding to the test-set satisfaction score S_k is
[formula image]
[formula image]
with N_test being the number of test samples;
obtaining different training and test sets by traversing the random seed R of the random sampling, thereby obtaining the corresponding MLR weight coefficients W and model accuracies, and taking the model with the highest accuracy as the scoring model.
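The triple construction and MLR fitting of claim 1 can be sketched together. The sketch assumes feature words have already been extracted (the claim uses word2vec for that step, which is outside the sketch), places the frequencies p_xij on a fixed vocabulary, and fits a standard softmax (multinomial) logistic regression by batch gradient descent in NumPy, with each satisfaction level encoded as a class index. The scoring equation for S_k is only an image, so predicting the most probable class is one possible reading, not necessarily the patent's; the learning rate, epoch count, L2 penalty and all function names are assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

import numpy as np


def comment_triples(words: Sequence[str], score: int) -> List[Tuple[str, float, int]]:
    """(x_ij, p_xij, S_j) for one comment X_j: p_xij = m / n, and every word inherits S_j."""
    n = len(words)
    return [(w, m / n, score) for w, m in Counter(words).items()]


def to_feature_row(triples: List[Tuple[str, float, int]], vocab: Sequence[str]) -> np.ndarray:
    """Place the frequencies p_xij on a fixed vocabulary; words outside it are dropped."""
    index = {w: i for i, w in enumerate(vocab)}
    row = np.zeros(len(vocab))
    for word, p, _ in triples:
        if word in index:
            row[index[word]] = p
    return row


def fit_mlr(X: np.ndarray, y: np.ndarray, n_classes: int,
            lr: float = 0.5, epochs: int = 2000, l2: float = 1e-3) -> np.ndarray:
    """Softmax regression by batch gradient descent; returns W of shape (vocab + 1, classes)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append a constant bias feature
    W = np.zeros((Xb.shape[1], n_classes))
    Y = np.eye(n_classes)[y]                          # one-hot targets
    for _ in range(epochs):
        logits = Xb @ W
        logits -= logits.max(axis=1, keepdims=True)   # stabilise exp()
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)     # softmax probabilities
        grad = Xb.T @ (probs - Y) / Xb.shape[0] + l2 * W
        W -= lr * grad
    return W


def predict_mlr(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Most probable class index (satisfaction level) for each row of X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)
```

In practice a library implementation of multinomial logistic regression would replace the hand-written loop; it is spelled out here only to show what "obtaining W through the MLR regression equation" involves.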
2. The design method of a big data case evaluation model according to claim 1, further comprising further optimizing the satisfaction score S_k, specifically as follows:
denoting the number of question categories as L and the number of cases under each category as M_l, l = 1, …, L, with corresponding scores S_jl, j = 1, …, M_l, l = 1, …, L, and a processing period of t months; optimizing the score by adding score-lifting weighting factors, the processing-period lifting coefficient being designed, according to the relationship between the processing period t and the score, as
[formula image]
and the occurrence-frequency lifting coefficient being designed, according to the relationship between the number of texts M_l under each category and the score, as:
[formula image]
so that the optimized score is
[formula image]
[formula image]
where a, b ∈ (0, 1) are weight coefficients;
calculating the prediction accuracy P, where the true score corresponding to the optimized test-set satisfaction score
[formula image]
is
[formula image]
[formula image];
obtaining different training and test sets by traversing the random seed R of the random sampling, thereby obtaining the corresponding MLR weight coefficients W, then traversing the parameters a and b to obtain the corresponding model accuracies, and taking the model with the highest accuracy as the scoring model.
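The model selection of claims 1 and 2 (traverse the seed R, fit W once per split, then traverse a and b) can be sketched as a nested search. Here `fit` and `predict` are hypothetical callbacks standing in for the MLR training and the lifted scoring, the accuracy P is taken as the fraction of exactly matched scores (the claimed accuracy formula is only an image), and ties keep the earliest combination; all of these are assumptions.

```python
import itertools
import random
from typing import Any, Callable, List, Optional, Sequence, Tuple

Sample = Tuple[Any, int]  # (comment features, true score)


def accuracy(predicted: Sequence[int], true: Sequence[int]) -> float:
    """Prediction accuracy P: share of test samples whose predicted score equals the true score."""
    if len(true) == 0:
        return 0.0
    return sum(p == t for p, t in zip(predicted, true)) / len(true)


def seeded_split(samples: List[Sample], seed: int,
                 n_train: int) -> Tuple[List[Sample], List[Sample]]:
    """Random sampling with seed R: shuffle a copy, then cut into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def search(samples: List[Sample], seeds: Sequence[int], a_values: Sequence[float],
           b_values: Sequence[float], n_train: int,
           fit: Callable[[List[Sample]], Any],
           predict: Callable[[Any, Any, float, float], int],
           ) -> Tuple[Optional[Tuple[int, float, float]], float]:
    """Traverse seed R, then a and b in (0, 1); keep the combination with the highest P."""
    best_params: Optional[Tuple[int, float, float]] = None
    best_p = -1.0
    for seed in seeds:
        train, test = seeded_split(samples, seed, n_train)
        model = fit(train)  # MLR weights W for this split
        for a, b in itertools.product(a_values, b_values):
            if not (0 < a < 1 and 0 < b < 1):
                continue  # weights must lie in the open interval (0, 1)
            p = accuracy([predict(model, x, a, b) for x, _ in test], [t for _, t in test])
            if p > best_p:  # ties keep the earliest combination
                best_params, best_p = (seed, a, b), p
    return best_params, best_p
```

Fitting once per seed and reusing W across the (a, b) grid mirrors the order in claim 2, where W is obtained first and a and b are traversed afterwards.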
3. The design method of a big data case evaluation model according to claim 1 or 2, characterized in that, for cases with neither a score nor a comment, the processing opinion is denoted X and is matched against the processing opinions Y of the historical cases, and the score of the historical case with the most similar processing opinion is used; specifically, the matching degree of the two processing opinions is calculated and a threshold is set, and if the matching degree reaches the threshold, the unscored case is scored with the score of the scored case;
the matching degree between the processing opinions X and Y of the two cases is calculated as follows:
calculating the degree of association between a processing opinion and its feature words: denoting each feature word extracted from processing opinion X as x_i, and counting the number m of occurrences of x_i in X and the total number n of feature words in X, so that the frequency of occurrence of x_i is p_xi = m/n;
the degree of association between processing opinion X and its feature word x_i is:
[formula image]
calculating the degree of association between feature words x and y, the correlation coefficient and the feature-word frequency coefficients being determined using mutual information, with the correlation coefficient designed as
[formula image]
and the frequency coefficients of the feature words being
[formula image]
and
[formula image]
so that the degree of association between feature words x and y is:
[formula image]
where Nx is the total number of cases containing feature word x, Ny is the total number of cases containing feature word y, and Nxy is the total number of cases containing both x and y;
the matching degree corr(X, Y) of the two processing opinions is:
[formula image]
where l1 and l2 are the numbers of words in processing opinions X and Y, respectively, and y_j is a feature word of text Y;
setting a matching-degree threshold T, and when corr(X, Y) ≥ T, using the score of processing opinion Y as the score of processing opinion X.
4. The design method of a big data case evaluation model according to claim 3, characterized in that the method further comprises: adding synonyms u_i and near-synonyms v_i of x_i, with the corresponding degrees of association:
corr(X, u_i) = corr(X, x_i),
corr(X, v_i) = k1 · corr(X, x_i), k1 ∈ (0, 1),
where k1 is the association coefficient for near-synonyms;
the matching degree corr(X, Y) of the two processing opinions is:
[formula image]
where U and V are the synonym and near-synonym sets of processing opinion X, G and H are the synonym and near-synonym sets of processing opinion Y, u_i ∈ U, v_i ∈ V, g_j ∈ G and h_j ∈ H; m1 and m2 are the numbers of words in sets U and G, respectively, and n1 and n2 are the numbers of words in sets V and H, respectively;
setting a matching-degree threshold T, and when corr(X, Y) ≥ T, using the score of processing opinion Y as the score of processing opinion X.
5. The method according to claim 4, characterized in that, if corr(X, Y) < T, the comment-based scoring method is applied with the processing opinion in place of the comment: the feature words of the processing opinion are extracted with word2vec, and satisfaction is then scored with multinomial logistic regression (MLR).
6. A talent evaluation method of a big data case evaluation model, characterized by comprising the following steps:
obtaining case scores by the design method of a big data case evaluation model according to any one of claims 1 to 5, and obtaining, from the case scores of each handler under each question category, the composite score of each case handler q under each category,
[formula image]
:
[formula image]
where Q is the total number of case handlers, L is the number of question categories, M_l is the number of cases under question category l, and
[formula image]
is the satisfaction score of each of the handler's cases under each category;
further obtaining the composite score S_q of each case handler q:
[formula image]
7. The talent evaluation method of a big data case evaluation model according to claim 6, further comprising: further optimizing the satisfaction score results under each category according to the processing periods and occurrence frequencies of the different categories:
denoting the average processing period of each question category as t_l months and the number of cases under each category as M_l, and optimizing the score by adding score-lifting weighting factors, the processing-period lifting coefficient being designed, according to the relationship between the processing period t_l and the score, as
[formula image]
and the occurrence-frequency lifting coefficient being designed, according to the relationship between the number of texts M_l under each category and the score, as
[formula image]
thereby obtaining the optimized composite score of each case handler q under each category:
[formula image]
[formula image]
where a, b ∈ (0, 1) are weight coefficients;
the optimized composite score S_q of each case handler q being:
[formula image]
8. A talent library construction method of a big data case evaluation model, characterized by comprising the following steps:
obtaining, by the talent evaluation method of a big data case evaluation model according to claim 6 or 7, the composite score of each case handler q under each category,
[formula image]
and the composite score S_q of each case handler q;
constructing the talent library of question category l according to whether
[formula image]
exceeds a given threshold 1 and according to the ranking of the
[formula image]
values;
constructing the talent library of the comprehensive field according to whether S_q exceeds a given threshold 2 and according to the ranking of the S_q values.
9. The talent library construction method of a big data case evaluation model according to claim 8, characterized by further comprising: setting separate thresholds for the "expert", "proficient", "skilled", "familiar" and "novice" levels of the talent library to obtain five types of talent library.
10. A talent library recommendation method of a big data case evaluation model, characterized by comprising the following steps:
for a new case, using the talent library constructed by the talent library construction method of a big data case evaluation model according to claim 8 or 9, first matching the "expert" and "proficient" members of the talent library for the corresponding question category, then the "skilled" and "familiar" members, and, if there is no match, matching against the talent library of the comprehensive field in the same way.
CN202011625659.8A 2020-09-30 2020-12-31 Design method of big data case evaluation model, talent evaluation method, talent library construction and recommendation method Active CN112581036B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011058457X 2020-09-30
CN202011058457 2020-09-30

Publications (2)

Publication Number Publication Date
CN112581036A (en) 2021-03-30
CN112581036B (en) 2021-09-24

Family

ID=75144926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011625659.8A Active CN112581036B (en) 2020-09-30 2020-12-31 Design method of big data case evaluation model, talent evaluation method, talent library construction and recommendation method

Country Status (1)

Country Link
CN (1) CN112581036B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636447A (en) * 2015-01-21 2015-05-20 上海天呈医流科技股份有限公司 Intelligent evaluation method and system for medical instrument B2B website users
CN110717654A (en) * 2019-09-17 2020-01-21 合肥工业大学 Product quality evaluation method and system based on user comments

Also Published As

Publication number Publication date
CN112581036A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN112700325A (en) Method for predicting online credit return customers based on Stacking ensemble learning
CN109711424B (en) Behavior rule acquisition method, device and equipment based on decision tree
CN110188047A (en) A kind of repeated defects report detection method based on binary channels convolutional neural networks
CN112966962A (en) Electric business and enterprise evaluation method
CN111583012B (en) Method for evaluating default risk of credit, debt and debt main body by fusing text information
CN113051291A (en) Work order information processing method, device, equipment and storage medium
Chakrabarty A regression approach to distribution and trend analysis of quarterly foreign tourist arrivals in India
CN107545038A (en) A kind of file classification method and equipment
CN112016769B (en) Method and device for managing relative person risk prediction and information recommendation
CN104850868A (en) Customer segmentation method based on k-means and neural network cluster
CN111738856A (en) Stock public opinion investment decision analysis method and device
TWI477987B (en) Methods for sentimental analysis of news text
CN115544348A (en) Intelligent mass information searching system based on Internet big data
CN107992613A (en) A kind of Text Mining Technology protection of consumers' rights index analysis method based on machine learning
CN113837578B (en) Grid supervision, management and evaluation method for power supervision enterprise
CN113592197A (en) Household service recommendation system and method
Tiruneh et al. Feature selection for construction organizational competencies impacting performance
CN112581036B (en) Design method of big data case evaluation model, talent evaluation method, talent library construction and recommendation method
CN115660608B (en) One-stop innovative entrepreneurship incubation method
CN113538021A (en) Machine learning algorithm for store continuity prediction of shopping mall
CN111507528A (en) Stock long-term trend prediction method based on CNN-LSTM
WO2023061174A1 (en) Method and apparatus for constructing risk prediction model for autism spectrum disorder
CN116703328A (en) Project review method and system
CN113657726B (en) Personnel risk analysis method based on random forest
CN113887994A (en) Failure mode risk assessment method and system based on Internet comment mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Design method, talent evaluation method, talent pool construction and recommendation method of a big data case evaluation model

Effective date of registration: 20220705

Granted publication date: 20210924

Pledgee: China Construction Bank Corporation Nanjing Jianye sub branch

Pledgor: Nanjing inspector Intelligent Technology Co.,Ltd.

Registration number: Y2022980009897

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20230720

Granted publication date: 20210924

Pledgee: China Construction Bank Corporation Nanjing Jianye sub branch

Pledgor: Nanjing inspector Intelligent Technology Co.,Ltd.

Registration number: Y2022980009897

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Design method, talent evaluation method, talent pool construction and recommendation method of a big data case evaluation model

Effective date of registration: 20230803

Granted publication date: 20210924

Pledgee: China Construction Bank Corporation Nanjing Jianye sub branch

Pledgor: Nanjing inspector Intelligent Technology Co.,Ltd.

Registration number: Y2023980050832